[pgpool-general: 439] Re: strange load balancing issue in Solaris

Tatsuo Ishii ishii at postgresql.org
Wed May 9 20:56:09 JST 2012


Thanks for the hint. Attached is a patch trying to fix the
problem. Can you please try it?
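
For anyone reading without the attachment, here is a minimal sketch of
one portable way to scale the value. It is not necessarily what
child.patch does. It assumes random() never exceeds 2**31 - 1, which is
what both man pages quoted below say, and divides by that bound instead
of RAND_MAX. The names RANDOM_UPPER_BOUND, scale_random() and
pick_point() are made up for illustration and are not part of pgpool.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Assumption: random() returns at most 2**31 - 1, as stated in both man
 * pages quoted in this thread. RAND_MAX bounds rand(), not random(), and
 * the log quoted below shows it is only 32767 on the reporter's system.
 */
#define RANDOM_UPPER_BOUND 2147483647.0

/* Hypothetical helper: map a random() result into [0, total_weight]. */
double scale_random(long value, double total_weight)
{
    assert(value >= 0);
    return ((double) value / RANDOM_UPPER_BOUND) * total_weight;
}

/* Hypothetical stand-in for the "r = ..." line in select_load_balancing_node(). */
double pick_point(double total_weight)
{
    return scale_random(random(), total_weight);
}
```

In select_load_balancing_node() the line computing r would then read
r = pick_point(total_weight); so that r stays between 0 and
total_weight whatever RAND_MAX happens to be.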
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Yes, the issue is with the random() function.
> 
> Looks like I have solved the problem by using rand().
> 
> Regards,
> Aravinth
> 
> 
> On Wed, May 9, 2012 at 4:02 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
> 
>> Thanks. Apparently random() on Solaris can return a value beyond
>> RAND_MAX! It's easy to fix the problem, but I would like to do it with
>> respect to portability. Any idea?
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese: http://www.sraoss.co.jp
>>
>> > From the Solaris 10 (x86) man page:
>> >
>> >
>> > SYNOPSIS
>> >      #include <stdlib.h>
>> >
>> >      long random(void);
>> >
>> >      void srandom(unsigned int seed);
>> >
>> >      char  *initstate(unsigned  int  seed,  char  *state,  size_t
>> >      size);
>> >
>> >      char *setstate(const char *state);
>> >
>> > DESCRIPTION
>> >      The random() function uses  a  nonlinear  additive  feedback
>> >      random-number generator employing a default state array size
>> >      of 31  long  integers  to  return  successive  pseudo-random
>> >      numbers  in the range from 0 to 2**31 -1. The period of this
>> >      random-number generator is approximately 16 x (2 **31   -1).
>> >      The  size  of  the  state array determines the period of the
>> >      random-number generator. Increasing  the  state  array  size
>> >      increases the period.
>> >
>> >      The srandom() function initializes the current  state  array
>> >      using the value of seed.
>> >
>> >
>> > (...)
>> >
>> >
>> >
>> > Regards,
>> > Rafal
>> >
>> >
>> >
>> > -----Original Message-----
>> > From: pgpool-general-bounces at pgpool.net [mailto:pgpool-general-bounces at pgpool.net] On Behalf Of Tatsuo Ishii
>> > Sent: Wednesday, May 09, 2012 11:44 AM
>> > To: caravinth at gmail.com
>> > Cc: pgpool-general at pgpool.net
>> > Subject: [pgpool-general: 431] Re: strange load balancing issue in Solaris
>> >
>> > Thanks.
>> >
>> > 2012-05-09 14:31:48 LOG:   pid 22459: r: 268356063.000000 total_weight: 32767.000000
>> >
>> > This is really weird. Here pgpool calculates this:
>> >
>> >       r = (((double)random())/RAND_MAX) * total_weight;
>> >
>> > Total weight is the same as RAND_MAX.  It seems your random() returns
>> > a value bigger than RAND_MAX, which does not make sense because the
>> > man page of random(3) on my Linux says:
>> >
>> >        The random() function uses a non-linear additive feedback random
>> >        number generator employing a default table of size 31 long
>> >        integers to return successive pseudo-random numbers in the range
>> >        from 0 to RAND_MAX.  The period of this random number generator
>> >        is very large, approximately 16 * ((2^31) - 1).
>> >
>> > What does your man page for random() say on your system?
>> > --
>> > Tatsuo Ishii
>> > SRA OSS, Inc. Japan
>> > English: http://www.sraoss.co.jp/index_en.php
>> > Japanese: http://www.sraoss.co.jp
>> >
>> >> Sorry, I missed it.
>> >>
>> >> Here is the log file.
>> >>
>> >> --Aravinth
>> >>
>> >>
>> >> On Wed, May 9, 2012 at 2:07 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>> >>
>> >>> > The code you have sent is the same as in child.c.
>> >>>
>> >>> No.
>> >>>
>> >>>        pool_log("r: %f total_weight: %f", r, total_weight);
>> >>>
>> >>> You need to add the line above to get useful information.
>> >>> --
>> >>> Tatsuo Ishii
>> >>> SRA OSS, Inc. Japan
>> >>> English: http://www.sraoss.co.jp/index_en.php
>> >>> Japanese: http://www.sraoss.co.jp
>> >>>
>> >>>
>> >>> > I have attached the log file. Please check
>> >>> >
>> >>> >
>> >>> > --Aravinth
>> >>> >
>> >>> >
>> >>> > On Tue, May 8, 2012 at 6:20 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>> >>> >
>> >>> >> I suspect there's some portability issue with the load balance code. The
>> >>> >> actual source code is in select_load_balancing_node (child.c).
>> >>> >> Please modify the source code and connect to pgpool by using psql.
>> >>> >> Please send the log output.
>> >>> >> --
>> >>> >> Tatsuo Ishii
>> >>> >> SRA OSS, Inc. Japan
>> >>> >> English: http://www.sraoss.co.jp/index_en.php
>> >>> >> Japanese: http://www.sraoss.co.jp
>> >>> >>
>> >>> >> int select_load_balancing_node(void)
>> >>> >> {
>> >>> >>        int selected_slot;
>> >>> >>        double total_weight,r;
>> >>> >>        int i;
>> >>> >>
>> >>> >>        /* choose a backend in random manner with weight */
>> >>> >>        selected_slot = MASTER_NODE_ID;
>> >>> >>        total_weight = 0.0;
>> >>> >>
>> >>> >>        for (i=0;i<NUM_BACKENDS;i++)
>> >>> >>        {
>> >>> >>                if (VALID_BACKEND(i))
>> >>> >>                {
>> >>> >>                        total_weight += BACKEND_INFO(i).backend_weight;
>> >>> >>                }
>> >>> >>        }
>> >>> >>        r = (((double)random())/RAND_MAX) * total_weight;
>> >>> >>        pool_log("r: %f total_weight: %f", r, total_weight);  /* <-- add this */
>> >>> >>
>> >>> >>        total_weight = 0.0;
>> >>> >>        for (i=0;i<NUM_BACKENDS;i++)
>> >>> >>        {
>> >>> >>                if (VALID_BACKEND(i) && BACKEND_INFO(i).backend_weight > 0.0)
>> >>> >>                {
>> >>> >>                        if(r >= total_weight)
>> >>> >>                                selected_slot = i;
>> >>> >>                        else
>> >>> >>                                break;
>> >>> >>                        total_weight += BACKEND_INFO(i).backend_weight;
>> >>> >>                }
>> >>> >>        }
>> >>> >>
>> >>> >>        pool_debug("select_load_balancing_node: selected backend id is %d", selected_slot);
>> >>> >>        return selected_slot;
>> >>> >> }
>> >>> >>
>> >>> >>
>> >>> >> > Hi Tatsuo, Thanks for the reply.
>> >>> >> >
>> >>> >> > The normalized weights are 0.5 for both nodes and the selected node is
>> >>> >> > always the same node. So I suppose it's srandom().
>> >>> >> >
>> >>> >> >
>> >>> >> > Any idea how to solve this srandom() issue?
>> >>> >> >
>> >>> >> >
>> >>> >> > Thanks and Regards,
>> >>> >> > Aravinth
>> >>> >> >
>> >>> >> >
>> >>> >> > ________________________________
>> >>> >> >  From: Tatsuo Ishii <ishii at postgresql.org>
>> >>> >> > To: aravinth at mafiree.com
>> >>> >> > Cc: pgpool-general at pgpool.net
>> >>> >> > Sent: Tuesday, May 1, 2012 4:41 AM
>> >>> >> > Subject: Re: [pgpool-general: 396] strange load balancing issue in Solaris
>> >>> >> >
>> >>> >> > First of all please check that the "normalized" weights are as you
>> >>> >> > expected. Run "show pool_status;" and see the "backend_weight0" and
>> >>> >> > "backend_weight1" sections. You will see floating point numbers, which
>> >>> >> > are the normalized weights between 0.0 and 1.0. If you see both are
>> >>> >> > 0.5, primary and standby are given the same weight.
>> >>> >> >
>> >>> >> > If they are ok, I suspect the srandom() function behaves differently
>> >>> >> > from other platforms. Pgpool-II chooses the load balance node by using
>> >>> >> > srandom(). select_load_balancing_node() is the function which is
>> >>> >> > responsible for selecting the load balance node. If you run pgpool-II
>> >>> >> > with the -d (debug) option, you will see the following in the log:
>> >>> >> >
>> >>> >> >     pool_debug("select_load_balancing_node: selected backend id is %d", selected_slot);
>> >>> >> >
>> >>> >> > If backend_weight in show pool_status are fine but the line above
>> >>> >> > always shows the same number, it is a sign that we have a problem with
>> >>> >> > srandom().
>> >>> >> > --
>> >>> >> > Tatsuo Ishii
>> >>> >> > SRA OSS, Inc. Japan
>> >>> >> > English: http://www.sraoss.co.jp/index_en.php
>> >>> >> > Japanese: http://www.sraoss.co.jp
>> >>> >> >
>> >>> >> >> Hi All,
>> >>> >> >>
>> >>> >> >> I am facing a strange issue in load balancing with replication mode set
>> >>> >> >> to true in Solaris. The load balancing algorithm always selects the same
>> >>> >> >> node, whatever the backend weight may be.
>> >>> >> >>
>> >>> >> >> Here is the scenario.
>> >>> >> >>
>> >>> >> >> I have pgpool installed on 1 server
>> >>> >> >> 2 postgres nodes on 2 other servers
>> >>> >> >> replication mode set to true and load balancing set to true
>> >>> >> >> backend weight of the 2 nodes is 1.
>> >>> >> >>
>> >>> >> >> When I fire the queries manually using different connections or using
>> >>> >> >> pgbench, all the queries hit the same node. The load balancing algorithm
>> >>> >> >> always selects the same node.
>> >>> >> >> Changing the backend weight has no effect. Only when I set the backend
>> >>> >> >> weight to 0 do hits go to the other server.
>> >>> >> >>
>> >>> >> >>
>> >>> >> >> I face this issue only in Solaris. The same setup on other servers
>> >>> >> >> (CentOS, RHEL, Ubuntu, etc.) does the load balancing perfectly.
>> >>> >> >>
>> >>> >> >> I also tried various postgres versions and pgpool versions with the same
>> >>> >> >> result. But every version runs fine on other servers.
>> >>> >> >>
>> >>> >> >> Has anyone faced this issue?
>> >>> >> >>
>> >>> >> >> Any information would be highly helpful.
>> >>> >> >>
>> >>> >> >> Regards,
>> >>> >> >> Aravinth
>> >>> >> _______________________________________________
>> >>> >> pgpool-general mailing list
>> >>> >> pgpool-general at pgpool.net
>> >>> >> http://www.pgpool.net/mailman/listinfo/pgpool-general
>> >>> >>
>> >>>
>>
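
To see why the reported r pins the choice to one node, the second loop
of select_load_balancing_node() quoted above can be tried in isolation.
This is only a sketch: select_slot() and its weights array are
hypothetical stand-ins for the pgpool macros, every backend is assumed
valid, and slot 0 stands in for MASTER_NODE_ID.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Stand-in for the second loop of select_load_balancing_node().
 * weights[] replaces BACKEND_INFO(i).backend_weight; VALID_BACKEND(i)
 * is assumed true for every slot.
 */
int select_slot(const double *weights, size_t n, double r)
{
    int selected_slot = 0;      /* stands in for MASTER_NODE_ID */
    double cumulative = 0.0;    /* running total of weights seen so far */
    size_t i;

    assert(weights != NULL);

    for (i = 0; i < n; i++) {
        if (weights[i] > 0.0) {
            if (r >= cumulative)
                selected_slot = (int) i;
            else
                break;
            cumulative += weights[i];
        }
    }
    return selected_slot;
}
```

With two equal weights that sum to 32767 (an assumption chosen to match
total_weight in the log), r = 268356063 is never below the running
total, so the loop runs to the end and the last slot wins every time.
Any r between 0 and 32767 lands on either slot as expected.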
-------------- next part --------------
A non-text attachment was scrubbed...
Name: child.patch
Type: text/x-patch
Size: 745 bytes
Desc: not available
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20120509/f57cfd62/attachment.bin>

