[pgpool-general: 676] Re: strange load balancing issue in Solaris

Tatsuo Ishii ishii at postgresql.org
Fri Jun 29 18:02:26 JST 2012


It seems the trouble occurs after pgpool receives signal 1 (HUP).
Did you run "pgpool reload"?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
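
For readers wondering how "signal 1" is read from the quoted log lines
("child 1413 exits with status 1 by signal 1"): a parent process can
decode this from the wait status of a child. Below is a minimal sketch
in plain POSIX C. It is not pgpool code, and the helper name
child_termination_signal() is made up for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not pgpool code): fork a child that kills itself
 * with SIGHUP, then decode the wait status in the parent.
 * Returns the terminating signal number, 0 if the child exited normally,
 * or -1 if fork() or waitpid() failed.
 */
static int child_termination_signal(void)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;                  /* fork() itself failed, e.g. ENOMEM */

    if (pid == 0)
    {
        signal(SIGHUP, SIG_DFL);    /* make sure the default action applies */
        raise(SIGHUP);              /* default action terminates the process */
        _exit(1);                   /* only reached if SIGHUP was blocked */
    }

    if (waitpid(pid, &status, 0) != pid)
        return -1;

    /* WTERMSIG() is only meaningful when WIFSIGNALED() is true */
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On a typical system the helper returns SIGHUP, whose number is 1 on
Solaris and Linux, which matches the "by signal 1" part of the log.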

> Guys,
> 
> I am facing another issue on the same Solaris system.
> 
> I have initialized 300 pre-forked connections using num_init_children in
> streaming replication mode. Everything works perfectly for a few hours.
> 
> After a few hours the connections drop with the error below. Also, pgpool
> doesn't allow any new connections.
> 
> Any ideas, guys?
> 
> 
> 
> 
> 2012-06-07 08:31:15 DEBUG: pid 927: fork a new child pid 1645
> 2012-06-07 08:31:15 DEBUG: pid 927: child 1413 exits with status 1 by signal 1
> 2012-06-07 08:31:15 DEBUG: pid 1645: I am 1645
> 2012-06-07 08:31:15 DEBUG: pid 1645: pool_initialize_private_backend_status: initialize backend status
> 2012-06-07 08:31:15 DEBUG: pid 927: fork a new child pid 1646
> 2012-06-07 08:31:15 DEBUG: pid 927: child 1410 exits with status 1 by signal 1
> 2012-06-07 08:31:15 DEBUG: pid 1646: I am 1646
> 2012-06-07 08:31:15 DEBUG: pid 1646: pool_initialize_private_backend_status: initialize backend status
> 2012-06-07 08:31:17 ERROR: pid 927: fork() failed. reason: Not enough space
> 2012-06-07 08:31:17 DEBUG: pid 2012-06-07 08:31:172012-06-07
> 08:31:1716382012-06-07 08:31:172012-06-07 08:31:17 DEBUG: pid 2012-06-07
> 08:31:172012-06-07 08:31:172012-06
> -07 08:31:17 DEBUG: pid 2012-06-07 08:31:172012-06-07 08:31:172012-06-07
> 08:31:172012-06-07 08:31:17: 2012-06-07 08:31:172012-06-07
> 08:31:172012-06-07 08:31:172012-06-0
> 7 08:31:172012-06-07 08:31:172012-06-07 08:31:17 DEBUG: pid 2012-06-07
> 08:31:172012-06-07 08:31:172012-06-07 08:31:172012-06-07 08:31:172012-06-07
> 08:31:172012-06-07 08
> :31:172012-06-07 08:31:17 DEBUG: pid 2012-06-07 08:31:172012-06-07
> 08:31:172012-06-07 08:31:172012-06-07 08:31:1716402012-06-07
> 08:31:172012-06-07 08:31:17 DEBUG: pid 2
> 012-06-07 08:31:172012-06-07 08:31:172012-06-07 08:31:172012-06-07
> 08:31:172012-06-07 08:31:17 DEBUG: pid  DEBUG: pid 1637 DEBUG: pid  DEBUG:
> pid  DEBUG: pid  DEBUG: pi
> d child received shutdown request signal  DEBUG: pid  DEBUG: pid  DEBUG:
> pid  DEBUG: pid  DEBUG: pid  DEBUG: pid 1641 DEBUG: pid  DEBUG: pid  DEBUG:
> pid  DEBUG: pid  DE
> BUG: pid  DEBUG: pid  DEBUG: pid 1642 DEBUG: pid  DEBUG: pid  DEBUG: pid
> DEBUG: pid :  DEBUG: pid  DEBUG: pid 1636 DEBUG: pid  DEBUG: pid  DEBUG:
> pid  DEBUG: pid  DEBU
> G: pid 16341646: 163016351629163115163316431628164516441639:
> 1632162716251617161516261610: 1614162216181613child received shutdown
> request signal 16191612: 161116231621
> 16241620: : child received shutdown request signal : : : :
> : : : : : : child received shutdown request signal : : : : : : : child
> received shutdown request signal : : : : 15: : child received shutdown
> request signal : : : : : child received shutdown request signal child
> received shutdown request signal 15child received shutdown request signal
> child received shutdown request signal child received shutdown request
> signal child received shutdown request signal child received shutdown
> request signal child received shutdown request signal child received
> shutdown request signal child received shutdown request signal child
> received shutdown request signal child received shutdown request signal
> 15child received shutdown request signal child received shutdown request
> signal child received shutdown request signal child received shutdown
> request signal child received shutdown request signal child received
> shutdown request signal child received shutdown request signal 15child
> received shutdown request signal child received shutdown request signal
> child received shutdown request signal child received shutdown request
> signal
> child received shutdown request signal child received shutdown request
> signal 15child received shutdown request signal child received shutdown
> request signal child received shutdown request signal child received
> shutdown request signal child received shutdown request signal 1515
> 15151515151515151515
> 15151515151515
> 151515151515
> 1515151515
> 
> 
> 
> Regards,
> Aravinth
> 
> 
> On Thu, May 10, 2012 at 8:15 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
> 
>> Good. Fix committed in master/V3_1_STABLE/V3_0_STABLE.
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese: http://www.sraoss.co.jp
>>
>> > It's working.
>> >
>> > Regards,
>> > Aravinth
>> >
>> >
>> > On Wed, May 9, 2012 at 5:26 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>> >
>> >> Thanks for the hint. Attached is a patch trying to fix the
>> >> problem. Can you please try it?
>> >> --
>> >> Tatsuo Ishii
>> >> SRA OSS, Inc. Japan
>> >> English: http://www.sraoss.co.jp/index_en.php
>> >> Japanese: http://www.sraoss.co.jp
>> >>
>> >> > Yes, the issue is with the random() function.
>> >> >
>> >> > Looks like I have solved the problem by using rand() instead.
>> >> >
>> >> > Regards,
>> >> > Aravinth
>> >> >
>> >> >
>> >> > On Wed, May 9, 2012 at 4:02 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>> >> >
>> >> >> Thanks. Apparently random() on Solaris can return a value beyond
>> >> >> RAND_MAX! It's easy to fix the problem, but I would like to do it
>> >> >> with respect to portability. Any ideas?
>> >> >> --
>> >> >> Tatsuo Ishii
>> >> >> SRA OSS, Inc. Japan
>> >> >> English: http://www.sraoss.co.jp/index_en.php
>> >> >> Japanese: http://www.sraoss.co.jp
>> >> >>
>> >> >> > From the Solaris 10 (x86) man page:
>> >> >> >
>> >> >> >
>> >> >> > SYNOPSIS
>> >> >> >      #include <stdlib.h>
>> >> >> >
>> >> >> >      long random(void);
>> >> >> >
>> >> >> >      void srandom(unsigned int seed);
>> >> >> >
>> >> >> >      char  *initstate(unsigned  int  seed,  char  *state,  size_t
>> >> >> >      size);
>> >> >> >
>> >> >> >      char *setstate(const char *state);
>> >> >> >
>> >> >> > DESCRIPTION
>> >> >> >      The random() function uses  a  nonlinear  additive  feedback
>> >> >> >      random-number generator employing a default state array size
>> >> >> >      of 31  long  integers  to  return  successive  pseudo-random
>> >> >> >      numbers  in the range from 0 to 2**31 -1. The period of this
>> >> >> >      random-number generator is approximately 16 x (2 **31   -1).
>> >> >> >      The  size  of  the  state array determines the period of the
>> >> >> >      random-number generator. Increasing  the  state  array  size
>> >> >> >      increases the period.
>> >> >> >
>> >> >> >      The srandom() function initializes the current  state  array
>> >> >> >      using the value of seed.
>> >> >> >
>> >> >> >
>> >> >> > (...)
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > Regards,
>> >> >> > Rafal
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > -----Original Message-----
>> >> >> > From: pgpool-general-bounces at pgpool.net [mailto:pgpool-general-bounces at pgpool.net] On Behalf Of Tatsuo Ishii
>> >> >> > Sent: Wednesday, May 09, 2012 11:44 AM
>> >> >> > To: caravinth at gmail.com
>> >> >> > Cc: pgpool-general at pgpool.net
>> >> >> > Subject: [pgpool-general: 431] Re: strange load balancing issue in Solaris
>> >> >> >
>> >> >> > Thanks.
>> >> >> >
>> >> >> > 2012-05-09 14:31:48 LOG:   pid 22459: r: 268356063.000000 total_weight: 32767.000000
>> >> >> >
>> >> >> > This is really weird. Here pgpool calculates this:
>> >> >> >
>> >> >> >       r = (((double)random())/RAND_MAX) * total_weight;
>> >> >> >
>> >> >> > Total weight is the same as RAND_MAX.  It seems your random() returns
>> >> >> > a value bigger than RAND_MAX, which does not make sense because the
>> >> >> > man page of random(3) on my Linux says:
>> >> >> >
>> >> >> >        The random() function uses a non-linear additive feedback random
>> >> >> >        number generator employing a default table of size 31 long
>> >> >> >        integers to return successive pseudo-random numbers in the range
>> >> >> >        from 0 to RAND_MAX.  The period of this random number generator
>> >> >> >        is very large, approximately 16 * ((2^31) - 1).
>> >> >> >
>> >> >> > What does your man page for random() say on your system?
>> >> >> > --
>> >> >> > Tatsuo Ishii
>> >> >> > SRA OSS, Inc. Japan
>> >> >> > English: http://www.sraoss.co.jp/index_en.php
>> >> >> > Japanese: http://www.sraoss.co.jp
>> >> >> >
>> >> >> >> Sorry, I missed it.
>> >> >> >>
>> >> >> >> Here is the log file.
>> >> >> >>
>> >> >> >> --Aravinth
>> >> >> >>
>> >> >> >>
>> >> >> >> On Wed, May 9, 2012 at 2:07 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>> >> >> >>
>> >> >> >>> > The code you have sent is the same as in child.c.
>> >> >> >>>
>> >> >> >>> No.
>> >> >> >>>
>> >> >> >>>        pool_log("r: %f total_weight: %f", r, total_weight);
>> >> >> >>>
>> >> >> >>> You need to add the line above to get useful information.
>> >> >> >>> --
>> >> >> >>> Tatsuo Ishii
>> >> >> >>> SRA OSS, Inc. Japan
>> >> >> >>> English: http://www.sraoss.co.jp/index_en.php
>> >> >> >>> Japanese: http://www.sraoss.co.jp
>> >> >> >>>
>> >> >> >>>
>> >> >> >>> > I have attached the log file. Please check
>> >> >> >>> >
>> >> >> >>> >
>> >> >> >>> > --Aravinth
>> >> >> >>> >
>> >> >> >>> >
>> >> >> >>> > On Tue, May 8, 2012 at 6:20 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>> >> >> >>> >
>> >> >> >>> >> I suspect there's some portability issue with the load balance code.
>> >> >> >>> >> The actual source code is in select_load_balancing_node() (child.c).
>> >> >> >>> >> Please modify the source code and connect to pgpool by using psql.
>> >> >> >>> >> Please send the log output.
>> >> >> >>> >> --
>> >> >> >>> >> Tatsuo Ishii
>> >> >> >>> >> SRA OSS, Inc. Japan
>> >> >> >>> >> English: http://www.sraoss.co.jp/index_en.php
>> >> >> >>> >> Japanese: http://www.sraoss.co.jp
>> >> >> >>> >>
>> >> >> >>> >> int select_load_balancing_node(void)
>> >> >> >>> >> {
>> >> >> >>> >>        int selected_slot;
>> >> >> >>> >>        double total_weight,r;
>> >> >> >>> >>        int i;
>> >> >> >>> >>
>> >> >> >>> >>        /* choose a backend in random manner with weight */
>> >> >> >>> >>        selected_slot = MASTER_NODE_ID;
>> >> >> >>> >>        total_weight = 0.0;
>> >> >> >>> >>
>> >> >> >>> >>        for (i=0;i<NUM_BACKENDS;i++)
>> >> >> >>> >>        {
>> >> >> >>> >>                if (VALID_BACKEND(i))
>> >> >> >>> >>                {
>> >> >> >>> >>                        total_weight += BACKEND_INFO(i).backend_weight;
>> >> >> >>> >>                }
>> >> >> >>> >>        }
>> >> >> >>> >>        r = (((double)random())/RAND_MAX) * total_weight;
>> >> >> >>> >>        pool_log("r: %f total_weight: %f", r, total_weight);   /* <-- add this */
>> >> >> >>> >>
>> >> >> >>> >>        total_weight = 0.0;
>> >> >> >>> >>        for (i=0;i<NUM_BACKENDS;i++)
>> >> >> >>> >>        {
>> >> >> >>> >>                if (VALID_BACKEND(i) && BACKEND_INFO(i).backend_weight > 0.0)
>> >> >> >>> >>                {
>> >> >> >>> >>                        if(r >= total_weight)
>> >> >> >>> >>                                selected_slot = i;
>> >> >> >>> >>                        else
>> >> >> >>> >>                                break;
>> >> >> >>> >>                        total_weight += BACKEND_INFO(i).backend_weight;
>> >> >> >>> >>                }
>> >> >> >>> >>        }
>> >> >> >>> >>
>> >> >> >>> >>        pool_debug("select_load_balancing_node: selected backend id is %d",
>> >> >> >>> >>                   selected_slot);
>> >> >> >>> >>        return selected_slot;
>> >> >> >>> >> }
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>> >> > Hi Tatsuo, Thanks for the reply.
>> >> >> >>> >> >
>> >> >> >>> >> > The normalized weights are 0.5 for both nodes and the selected
>> >> >> >>> >> > node is always the same node. I suppose then it's srandom().
>> >> >> >>> >> >
>> >> >> >>> >> >
>> >> >> >>> >> > Any idea how to solve this srandom() issue?
>> >> >> >>> >> >
>> >> >> >>> >> >
>> >> >> >>> >> > Thanks and Regards,
>> >> >> >>> >> > Aravinth
>> >> >> >>> >> >
>> >> >> >>> >> >
>> >> >> >>> >> > ________________________________
>> >> >> >>> >> >  From: Tatsuo Ishii <ishii at postgresql.org>
>> >> >> >>> >> > To: aravinth at mafiree.com
>> >> >> >>> >> > Cc: pgpool-general at pgpool.net
>> >> >> >>> >> > Sent: Tuesday, May 1, 2012 4:41 AM
>> >> >> >>> >> > Subject: Re: [pgpool-general: 396] strange load balancing issue in Solaris
>> >> >> >>> >> >
>> >> >> >>> >> > First of all, please check that the "normalized" weights are as you
>> >> >> >>> >> > expect. Run "show pool_status;" and look at the "backend_weight0"
>> >> >> >>> >> > and "backend_weight1" entries. You will see floating point numbers,
>> >> >> >>> >> > which are the normalized weights between 0.0 and 1.0. If both are
>> >> >> >>> >> > 0.5, the primary and standby are given the same weight.
>> >> >> >>> >> >
>> >> >> >>> >> > If they are OK, I suspect the behavior of the srandom() function is
>> >> >> >>> >> > different from other platforms. Pgpool-II chooses the load balance
>> >> >> >>> >> > node by using srandom(). select_load_balancing_node() is the
>> >> >> >>> >> > function responsible for selecting the load balance node. If you
>> >> >> >>> >> > run pgpool-II with the -d (debug) option, you will see the output
>> >> >> >>> >> > of the following call in the log:
>> >> >> >>> >> >
>> >> >> >>> >> >     pool_debug("select_load_balancing_node: selected backend id is %d", selected_slot);
>> >> >> >>> >> >
>> >> >> >>> >> > If backend_weight in show pool_status is fine but the line above
>> >> >> >>> >> > always shows the same number, it is a sign that we have a problem
>> >> >> >>> >> > with srandom().
>> >> >> >>> >> > --
>> >> >> >>> >> > Tatsuo Ishii
>> >> >> >>> >> > SRA OSS, Inc. Japan
>> >> >> >>> >> > English: http://www.sraoss.co.jp/index_en.php
>> >> >> >>> >> > Japanese: http://www.sraoss.co.jp
>> >> >> >>> >> >
>> >> >> >>> >> >> Hi All,
>> >> >> >>> >> >>
>> >> >> >>> >> >> I am facing a strange issue in load balancing with replication
>> >> >> >>> >> >> mode set to true on Solaris. The load balancing algorithm always
>> >> >> >>> >> >> selects the same node, whatever the backend weight may be.
>> >> >> >>> >> >>
>> >> >> >>> >> >> Here is the scenario.
>> >> >> >>> >> >>
>> >> >> >>> >> >> I have pgpool installed on 1 server
>> >> >> >>> >> >> 2 postgres nodes on 2 other servers
>> >> >> >>> >> >> replication mode set to true and load balancing set to true
>> >> >> >>> >> >> backend weight of the 2 nodes is 1.
>> >> >> >>> >> >>
>> >> >> >>> >> >> When I fire the queries manually using different connections or
>> >> >> >>> >> >> using pgbench, all the queries hit the same node. The load
>> >> >> >>> >> >> balancing algorithm always selects the same node.
>> >> >> >>> >> >> Changing the backend weight has no effect. Only when I set the
>> >> >> >>> >> >> backend weight to 0 do hits go to the other server.
>> >> >> >>> >> >>
>> >> >> >>> >> >>
>> >> >> >>> >> >> I face this issue only on Solaris. The same setup on other
>> >> >> >>> >> >> servers (CentOS, RHEL, Ubuntu etc.) does the load balancing
>> >> >> >>> >> >> perfectly.
>> >> >> >>> >> >>
>> >> >> >>> >> >> I also tried various postgres versions and pgpool versions with
>> >> >> >>> >> >> the same result. But every version runs fine on other servers.
>> >> >> >>> >> >>
>> >> >> >>> >> >> Has anyone faced this issue?
>> >> >> >>> >> >>
>> >> >> >>> >> >> Any information would be highly helpful.
>> >> >> >>> >> >>
>> >> >> >>> >> >> Regards,
>> >> >> >>> >> >> Aravinth
>> >> >> >>> >> _______________________________________________
>> >> >> >>> >> pgpool-general mailing list
>> >> >> >>> >> pgpool-general at pgpool.net
>> >> >> >>> >> http://www.pgpool.net/mailman/listinfo/pgpool-general
>> >> >> >>> >>
>> >> >> >>>
>> >> >>
>> >>
>> >>
>> >>
>>
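
For reference, the mismatch diagnosed in the thread above can be
reproduced with a little arithmetic. The quoted Solaris man page says
random() returns values up to 2**31 - 1, while the logged total_weight
(said to equal RAND_MAX) is 32767, so random()/RAND_MAX can be far
larger than 1. The sketch below shows one portable way to scale, by
dividing by the documented range of random() instead of RAND_MAX. This
is only an illustration under those assumptions, not necessarily the
fix that was committed; scale_random(), pick_weighted() and
RANDOM_MAX_VALUE are hypothetical names, not pgpool identifiers.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumption: random() returns values in [0, 2^31 - 1], as stated in the
 * Solaris man page quoted above, regardless of what RAND_MAX is.
 */
#define RANDOM_MAX_VALUE 2147483647.0

/* Map a raw random() result onto the range [0, total_weight]. */
static double scale_random(long raw, double total_weight)
{
    assert(total_weight > 0.0);
    return ((double) raw / RANDOM_MAX_VALUE) * total_weight;
}

/*
 * Pick the index whose cumulative weight band contains r.
 * Entries with weight <= 0 are never chosen. If r is beyond the total
 * weight, the last entry with positive weight wins, which is the
 * "always the same node" symptom. Returns -1 if no entry qualifies.
 */
static int pick_weighted(const double *weights, size_t n, double r)
{
    double upper = 0.0;     /* upper edge of the current band */
    int selected = -1;
    size_t i;

    for (i = 0; i < n; i++)
    {
        if (weights[i] <= 0.0)
            continue;
        selected = (int) i;
        upper += weights[i];
        if (r < upper)
            break;
    }
    return selected;
}
```

With two equal weights summing to 32767, the logged r of 268356063 is
beyond every band, so the last node is always picked. Scaling the same
raw value with scale_random() gives roughly 4095, which falls inside
the first node's band.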

