[pgpool-general: 3998] Re: accept() scalability issues

Sun Aug 23 10:24:09 JST 2015

>> On 21 Aug 2015, at 00:50, Tatsuo Ishii <ishii at postgresql.org> wrote:
>> 
>> I was hoping to see improvements in pgbench -C or a similar use case
>> (for example ab), which shows the connection overhead from client to
>> pgpool-II.
> 
> Sorry, we don’t have any systems suitable for benchmarking - no free spare metal and the VM hosts are all pretty loaded.
> 
> The measurement that will improve the most is latency, since there are far fewer context switches and much less CPU load, meaning that the time between a completed TCP handshake and a pgpool child being attached to the socket is noticeably lower.
> 
> This is corroborated in this graph: http://i.imgur.com/d0IafG0.png
> 
> It shows the average SQL query runtime of a PHP application without connection pooling - each request will create a new database connection.
> 
> mean_90 shows the truncated mean value after the top 10% of the query runtimes have been removed.
> upper_90 shows the 90th percentile of the query runtimes.
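
For readers unfamiliar with these two statistics, the sketch below
computes them for one batch of timings. It assumes the number of kept
samples is 90% of the sample count rounded to the nearest integer; the
names pct_stats and stats_90 are made up for illustration and are not
taken from statsd or pgpool-II.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical result type: truncated mean and upper bound of the kept samples. */
struct pct_stats {
    double mean;   /* mean of the lowest 90% of the values (mean_90)   */
    double upper;  /* largest value among the lowest 90% (upper_90)    */
};

/* Ascending comparison for qsort(). */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sorts values in place, drops the top 10%, and summarises what is left.
 * Returns zeros for an empty input. */
static struct pct_stats stats_90(double *values, size_t n)
{
    struct pct_stats s = {0.0, 0.0};
    if (n == 0)
        return s;

    qsort(values, n, sizeof values[0], cmp_double);

    /* Assumed rounding rule: keep round(0.9 * n) samples, at least one. */
    size_t keep = (n * 9 + 5) / 10;
    if (keep == 0)
        keep = 1;

    double sum = 0.0;
    for (size_t i = 0; i < keep; i++)
        sum += values[i];

    s.mean = sum / (double)keep;
    s.upper = values[keep - 1];
    return s;
}
```

With ten runtimes of 1 to 10 ms, the slowest sample is dropped, so the
truncated mean is 5 ms and the upper bound is 9 ms.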

In the graph, the green line seems to drop from 2.0 to 1.0 at around
10 PM on 08/20. Is this because you installed the patch?

> See https://github.com/etsy/statsd/pull/499/files for a thorough explanation of the values.
> 
> The ~0.5ms to 1ms drop in mean_90 is most likely the saved connection overhead. The lowered upper_90 hints that there were cases where the overall system load had adverse effects on some queries (e.g. pgpool processes not being scheduled on a CPU in time).
> 
> Since the TPS of a single process is determined directly by system latency, we probably doubled the TPS of our system (2ms to 1ms). The maximum achievable throughput of the system has also increased massively due to the lowered CPU utilisation.
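
The arithmetic behind the doubling can be written out as a small
sketch. It assumes a single client process issues requests strictly one
after another, so its throughput is the reciprocal of the per-request
latency; tps_per_process is a hypothetical helper, not something from
pgpool-II or pgbench.

```c
/* Transactions per second for one process that runs requests back to back.
 * latency_ms: time for one complete request in milliseconds (must be > 0). */
static double tps_per_process(double latency_ms)
{
    return 1000.0 / latency_ms;
}
```

Under that assumption, 2 ms per request gives 500 TPS per process and
1 ms gives 1000 TPS per process.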

This makes sense. Even without TPS figures, the latency improvement
can explain the performance improvement. Thanks!

> 
> Hope this suffices. If you absolutely need synthetic pgbench runs, I’d have to check if we can decommission a few systems to have a proper testing environment.

No, I'm fine.

BTW, as I said before, the previous patch broke the child_life_time
functionality. So I came up with a different patch which does not break
child_life_time.

This time I introduced a global counter which indicates how many child
processes are concurrently issuing accept(2). If it is greater than the
maximum (currently set to 1; you can tweak it if you like, look for
"MAX_ACCEPT_CHILD" in pool.h), the child gives up issuing accept(). If
not, it increments the counter and issues accept().
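
The gating idea can be sketched roughly as follows. This is not the
code from the attached patch: only MAX_ACCEPT_CHILD is a name from the
patch, while try_enter_accept and leave_accept are hypothetical helpers,
and the use of C11 atomics is an assumption made to keep the example
short. The sketch admits at most MAX_ACCEPT_CHILD callers at a time;
the exact comparison used in the patch may differ. For forked children
the counter would also have to live in shared memory, which is not
shown here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Name taken from the mail; the value 1 is the default it mentions. */
#define MAX_ACCEPT_CHILD 1

/* Hypothetical helper: try to claim one accept slot.
 * Returns true if the caller may go on to call accept(),
 * false if enough children are already accepting. */
static bool try_enter_accept(atomic_int *accepting)
{
    int cur = atomic_load(accepting);
    while (cur < MAX_ACCEPT_CHILD) {
        /* On failure the CAS reloads cur with the current value. */
        if (atomic_compare_exchange_weak(accepting, &cur, cur + 1))
            return true;
    }
    return false;
}

/* Hypothetical helper: release the slot once accept() has returned. */
static void leave_accept(atomic_int *accepting)
{
    int prev = atomic_fetch_sub(accepting, 1);
    assert(prev > 0);  /* leaving without entering would be a bug */
    (void)prev;        /* silence unused warning when NDEBUG is set */
}
```

A child would call try_enter_accept() after being woken up, call
accept() only if it returns true, and call leave_accept() afterwards;
a child that gets false simply goes back to waiting.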

However, unlike the previous patch, all child processes concurrently
issue select(2) and will be woken up when an event arrives on the file
descriptors, so heavy context switching might happen. But this might be
over-thinking, because even with the previous patch all the processes
that are waiting to acquire the semaphore are woken up when the holder
of the semaphore releases it anyway.

I hope the new patch performs as well as the previous patch.

The attached patch is against the v3.3 stable tree.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
-------------- next part --------------
A non-text attachment was scrubbed...
Name: accept_lock_v3.patch
Type: text/x-patch
Size: 4286 bytes
Desc: not available
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20150823/f5da7910/attachment.bin>