View Issue Details

ID: 0000315
Project: Pgpool-II
Category: Enhancement
View Status: public
Last Update: 2017-08-29 09:25
Reporter: aarevalo
Assigned To: Muhammad Usama
Priority: normal
Severity: major
Reproducibility: always
Status: closed
Resolution: open
Product Version: 3.6.4
Summary: 0000315: High CPU usage when committing large transactions and using the in-(shared-)memory cache
Description:
Pgpool version: 3.6.4
OS: Ubuntu Server 16

In memory cache settings:

    memory_cache_enabled = on
    memqcache_method = 'shmem'
    memqcache_total_size = 2147483648 # 2 GB
    memqcache_max_num_cache = 1000000
    memqcache_auto_cache_invalidation = on
    memqcache_maxcache = 409600 # default, 400 KB


We have detected that some transactions, which involve a large number of queries and data rows, cause pgpool child processes to get stuck for up to several minutes just after they receive the "COMMIT" statement. During this time, each process consumes 100% of its CPU until it is ready to accept the next connection.

We have tracked down the point where they get "stuck"; it is always in the same place:

#0  AllocSetFree (context=0x12e3b50, pointer=0x53e5bb0) at ../../src/utils/mmgr/aset.c:965
#1  0x0000000000437cf7 in pool_discard_buffer (buffer=0x53d4a80) at query_cache/pool_memqcache.c:2928
#2  0x0000000000439a25 in pool_discard_temp_query_cache (temp_cache=0x53d4620) at query_cache/pool_memqcache.c:2817
#3  0x0000000000439aa5 in pool_discard_query_cache_array (cache_array=0xeeb5a40) at query_cache/pool_memqcache.c:2750
#4  0x0000000000439b9f in pool_reset_memqcache_buffer () at query_cache/pool_memqcache.c:1677
#5  0x000000000043cf15 in pool_handle_query_cache (backend=backend@entry=0x131e490, query=query@entry=0x14f6dc0 "COMMIT", node=node@entry=0x134e4a0, state=<optimized out>) at query_cache/pool_memqcache.c:3285
#6  0x0000000000435fbc in ReadyForQuery (frontend=frontend@entry=0x131ff10, backend=backend@entry=0x131e490, send_ready=send_ready@entry=1 '\001', cache_commit=cache_commit@entry=1 '\001') at protocol/pool_proto_modules.c:1942
#7  0x000000000043662c in ProcessBackendResponse (frontend=frontend@entry=0x131ff10, backend=backend@entry=0x131e490, state=state@entry=0x7fffae4523fc, num_fields=num_fields@entry=0x7fffae4523fa) at protocol/pool_proto_modules.c:2567
#8  0x000000000042ae9e in pool_process_query (frontend=0x131ff10, backend=0x131e490, reset_request=reset_request@entry=0) at protocol/pool_process_query.c:303
#9  0x0000000000425781 in do_child (fds=fds@entry=0x11e64c0) at protocol/child.c:377
#10 0x0000000000409a45 in fork_a_child (fds=0x11e64c0, id=89) at main/pgpool_main.c:755
#11 0x000000000040ac4e in reaper () at main/pgpool_main.c:2525
#12 0x000000000040d9ad in pool_sleep (second=<optimized out>) at main/pgpool_main.c:2741
#13 0x000000000040f9ef in PgpoolMain (discard_status=discard_status@entry=0 '\000', clear_memcache_oidmaps=clear_memcache_oidmaps@entry=0 '\000') at main/pgpool_main.c:533
#14 0x00000000004081ec in main (argc=<optimized out>, argv=<optimized out>) at main/main.c:300

After some investigation, it appears that the processes consume huge amounts of CPU trying to free previously allocated chunks of memory: each free loops over a singly linked list. Since this piece of code comes from the PostgreSQL project, we found that PostgreSQL has since applied optimizations in this area (https://github.com/postgres/postgres/commit/ff97741bc810390db6dd4da0f31ee1e93c8d3abb) that could be back-ported to Pgpool-II.
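For illustration only (this is a sketch, not the actual aset.c or Pgpool-II code; names are invented): with a singly linked block list, unlinking the block that backs a freed chunk requires scanning from the head of the list, so N frees cost O(N^2). Keeping a back-pointer, as we understand the PostgreSQL change to do, makes each unlink constant-time:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an allocator's per-block header. */
typedef struct Block
{
    struct Block *next;
    struct Block *prev;         /* used only by the doubly linked variant */
} Block;

/*
 * O(n) unlink: with only forward links, finding the predecessor of
 * 'target' means walking from the head.  This is what makes freeing
 * many large chunks quadratic overall.  Assumes 'target' is in the list.
 */
static void
unlink_singly(Block **head, Block *target)
{
    Block *prev = NULL;
    Block *cur = *head;

    while (cur != target)
    {
        prev = cur;
        cur = cur->next;
    }
    if (prev)
        prev->next = cur->next;
    else
        *head = cur->next;
}

/*
 * O(1) unlink: the back-pointer removes the need to scan, so each
 * free touches only the block's immediate neighbors.
 */
static void
unlink_doubly(Block **head, Block *target)
{
    if (target->prev)
        target->prev->next = target->next;
    else
        *head = target->next;
    if (target->next)
        target->next->prev = target->prev;
}
```

Each call to unlink_singly() pays for the scan, which is why the cost shows up at COMMIT time when pool_reset_memqcache_buffer() discards the whole temporary cache array at once.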

The in-memory cache works really well most of the time, but at peak times it makes our system totally unresponsive, as it becomes a bottleneck, even on dedicated hardware (16x Xeon E5-2689v4 @ 3.10GHz cores, 64 GB RAM).
Tags: No tags attached.

Activities

Muhammad Usama

2017-06-30 00:56

developer   ~0001562

Hi,
I have imported all the changes made to PostgreSQL's memory manager API since it was originally brought into Pgpool-II.

https://git.postgresql.org/gitweb?p=pgpool2.git;a=commitdiff;h=85392b89b5791cb3dceb59c6567f47911758467e

Can you please check whether it improves performance for the case you mentioned? You will need to build Pgpool-II from the source code to perform the test.
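Building from source for such a test might look like the following (a sketch only: the clone URL matches the commitdiff link above, but the install prefix and the choice of checking out that exact commit are assumptions to adapt locally):

```shell
# Fetch the Pgpool-II sources and check out the commit with the
# imported memory manager changes (commit hash from the link above).
git clone https://git.postgresql.org/git/pgpool2.git
cd pgpool2
git checkout 85392b89b5791cb3dceb59c6567f47911758467e

# Configure, build, and install; the prefix here is an example.
./configure --prefix=/usr/local/pgpool
make
sudo make install
```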



Thanks,
Best Regards

aarevalo

2017-07-07 16:52

reporter   ~0001578

We have compiled pgpool, and it has been running in our production environment under heavy load (a sustained 20K+ TPS) for two days now.
We haven't seen the previous behaviour, so it is now working as expected.
Moreover, we see an improvement in CPU usage: it now uses about 25% less processor resources (measuring load average and user CPU time), probably related to the improvements in memory management.

Thanks for the great work!

Issue History

Date Modified Username Field Change
2017-06-20 01:47 aarevalo New Issue
2017-06-20 11:20 t-ishii Assigned To => Muhammad Usama
2017-06-20 11:20 t-ishii Status new => assigned
2017-06-30 00:56 Muhammad Usama Note Added: 0001562
2017-07-07 16:52 aarevalo Note Added: 0001578
2017-08-29 09:25 pengbo Status assigned => closed