[pgpool-general: 4679] Re: Scaling Issues
Tatsuo Ishii
ishii at postgresql.org
Thu May 12 08:38:44 JST 2016
Great! Thanks for the report!
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
> Just upgraded to 3.5.2 this week- am seeing a hundred-fold reduction in
> server load and everything is running faster than before. Thanks guys!
>
> On Saturday, October 10, 2015, Joe Schaefer <joesuf4 at gmail.com> wrote:
>
>> It's new to me too. It doesn't seem to happen for our four-node
>> application, just with the two-node one. I don't know if I can come up
>> with a test case since it only seems to happen when our application exceeds
>> about 60 users.
>>
>> The pgpool configuration on both sets of nodes is essentially the same, so
>> I'm at a loss what to do for the two-node application other than to disable
>> load-balancing.
>>
>> On Sat, Oct 10, 2015 at 7:05 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>>
>>> > Hi,
>>> >
>>> > Love the product so far but have run into a few bugs that will soon be
>>> > showstoppers for us. Here's what we've found:
>>> >
>>> > Our application provides persistent connections to a web socket app,
>>> > potentially supporting hundreds of concurrent users. At one site we
>>> > have num_init_children set to 2048 because we are replicating
>>> > hundreds of databases across four different master-slave postgres
>>> > nodes. The problem here isn't really a big deal, but it's something
>>> > that should be worked out, because no comparable server-side software
>>> > still suffers from thundering-herd issues the way pgpool does. Our
>>> > typical load average on the pgpool master is around 200-400. CPU idle
>>> > is fine, so we're not concerned, other than that the load average
>>> > statistics for that host are out of whack.
>>>
>>> The thundering-herd problem has been discussed recently (see
>>> [pgpool-general: 3934] accept() scalability issues)
>>> http://www.pgpool.net/pipermail/pgpool-general/2015-August/003992.html
>>>
>>> Next major version of pgpool-II will have a parameter to overcome the
>>> problem.
>>>
>>> http://www.pgpool.net/pipermail/pgpool-committers/2015-October/002711.html
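>>>
>>> From the commit, the parameter is serialize_accept (my reading of the
>>> change; the name or default may still differ in the 3.5 release). A
>>> sketch of the pgpool.conf setting:
>>>
>>> serialize_accept = on
>>>                    # Serialize accept() among the child processes so
>>>                    # that only one child waits on the listening socket
>>>                    # at a time, avoiding the thundering herd when a
>>>                    # new connection arrives. Default is off.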
>>>
>>> > Secondly, we have a two-postgres-node setup with tens of databases
>>> > and num_init_children set to 512. It is basically the same
>>> > application, but for a different site. What happens here is far more
>>> > concerning: when we see about 60+ concurrent users (and hence have
>>> > about 200 pgpool backend connections), pgpool stops correctly
>>> > shipping write queries to the master. Instead it ships a certain
>>> > percentage to the slave, which breaks our application completely.
>>>
>>> This is new to me. Is there any test case to reproduce the problem?
>>>
>>> > Of the two, my real concern lies with the second situation, as users
>>> > are impacted by pgpool's behavior during heavy load periods. This
>>> > doesn't seem to happen under light load with fewer than 20 users and
>>> > hence fewer than 100 backend connections.
>>> >
>>> > Has anyone seen anything similar, and does anyone know how to work
>>> > around it? For kicks, here are my whitelist rules for SQL functions:
>>> >
>>> > white_function_list = 'random,count,extract,date_part'
>>> >                       # Comma separated list of function names
>>> >                       # that don't write to database
>>> >                       # Regexp are accepted
>>> > black_function_list = 'currval,lastval,nextval,setval,.*'
>>> >                       # Comma separated list of function names
>>> >                       # that write to database
>>> >                       # Regexp are accepted
>>>
>>> Best regards,
>>> --
>>> Tatsuo Ishii
>>> SRA OSS, Inc. Japan
>>> English: http://www.sraoss.co.jp/index_en.php
>>> Japanese: http://www.sraoss.co.jp
>>>
>>
>>
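P.S. On the function lists quoted above: because black_function_list ends
with the catch-all '.*', any function not matched by white_function_list is
treated as writing to the database, so queries calling unlisted functions
are routed to the master. A rough sketch of that matching logic (an
illustration under my assumptions, notably whitelist-first precedence; not
pgpool's actual code, which uses POSIX regexps in C):

```python
import re

# The lists from the pgpool.conf quoted above, split on commas.
WHITE = ['random', 'count', 'extract', 'date_part']
BLACK = ['currval', 'lastval', 'nextval', 'setval', '.*']

def writes_to_db(func_name):
    """Return True if a query calling func_name should go to the master."""
    # Whitelist first (my assumption): functions declared read-only
    # may be load-balanced to the slave.
    if any(re.fullmatch(p, func_name) for p in WHITE):
        return False
    # Anything the blacklist matches is assumed to write; the trailing
    # '.*' makes it a catch-all, so unknown functions hit the master.
    return any(re.fullmatch(p, func_name) for p in BLACK)

print(writes_to_db('count'))    # False: whitelisted, safe to load-balance
print(writes_to_db('nextval'))  # True: sequence functions write
print(writes_to_db('now'))      # True: caught by the '.*' catch-all
```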