View Issue Details

ID: 0000315
Project: Pgpool-II
Category: Enhancement
View Status: public
Last Update: 2017-08-29 09:25
Reporter: aarevalo
Assigned To: Muhammad Usama
Priority: normal
Severity: major
Reproducibility: always
Status: closed
Resolution: open
Product Version: 3.6.4
Summary: 0000315: High CPU usage when committing large transactions and using the in-(shared-)memory cache
Description:
Pgpool version: 3.6.4
OS: Ubuntu Server 16

In memory cache settings:

    memory_cache_enabled = on
    memqcache_method = 'shmem'
    memqcache_total_size = 2147483648 # 2 GB
    memqcache_max_num_cache = 1000000
    memqcache_auto_cache_invalidation = on
    memqcache_maxcache = 409600 # default, 400 KB


We have detected that some transactions, which involve a large number of queries and data rows, cause pgpool child processes to get stuck for up to several minutes just after they receive the "COMMIT" statement. During this time, each process consumes 100% of its CPU until it is ready to accept the next connection.

We have tracked down the point where they get "stuck"; it is always in the same place:

#0  AllocSetFree (context=0x12e3b50, pointer=0x53e5bb0) at ../../src/utils/mmgr/aset.c:965
#1  0x0000000000437cf7 in pool_discard_buffer (buffer=0x53d4a80) at query_cache/pool_memqcache.c:2928
#2  0x0000000000439a25 in pool_discard_temp_query_cache (temp_cache=0x53d4620) at query_cache/pool_memqcache.c:2817
#3  0x0000000000439aa5 in pool_discard_query_cache_array (cache_array=0xeeb5a40) at query_cache/pool_memqcache.c:2750
#4  0x0000000000439b9f in pool_reset_memqcache_buffer () at query_cache/pool_memqcache.c:1677
#5  0x000000000043cf15 in pool_handle_query_cache (backend=backend@entry=0x131e490, query=query@entry=0x14f6dc0 "COMMIT", node=node@entry=0x134e4a0, state=<optimized out>) at query_cache/pool_memqcache.c:3285
#6  0x0000000000435fbc in ReadyForQuery (frontend=frontend@entry=0x131ff10, backend=backend@entry=0x131e490, send_ready=send_ready@entry=1 '\001', cache_commit=cache_commit@entry=1 '\001') at protocol/pool_proto_modules.c:1942
#7  0x000000000043662c in ProcessBackendResponse (frontend=frontend@entry=0x131ff10, backend=backend@entry=0x131e490, state=state@entry=0x7fffae4523fc, num_fields=num_fields@entry=0x7fffae4523fa) at protocol/pool_proto_modules.c:2567
#8  0x000000000042ae9e in pool_process_query (frontend=0x131ff10, backend=0x131e490, reset_request=reset_request@entry=0) at protocol/pool_process_query.c:303
#9  0x0000000000425781 in do_child (fds=fds@entry=0x11e64c0) at protocol/child.c:377
#10 0x0000000000409a45 in fork_a_child (fds=0x11e64c0, id=89) at main/pgpool_main.c:755
#11 0x000000000040ac4e in reaper () at main/pgpool_main.c:2525
#12 0x000000000040d9ad in pool_sleep (second=<optimized out>) at main/pgpool_main.c:2741
#13 0x000000000040f9ef in PgpoolMain (discard_status=discard_status@entry=0 '\000', clear_memcache_oidmaps=clear_memcache_oidmaps@entry=0 '\000') at main/pgpool_main.c:533
#14 0x00000000004081ec in main (argc=<optimized out>, argv=<optimized out>) at main/main.c:300

After some investigation, it appears that the processes consume huge amounts of CPU trying to free previously allocated chunks of memory: each free loops over a singly linked list. Since this piece of code comes from the PostgreSQL project, we found that PostgreSQL has since applied optimizations in this area (https://github.com/postgres/postgres/commit/ff97741bc810390db6dd4da0f31ee1e93c8d3abb) that could be back-ported to Pgpool-II.
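For illustration only (this is a sketch, not the actual aset.c or Pgpool-II code; names are invented): with a singly linked block list, unlinking the block that backs a freed chunk requires scanning from the head of the list, so N frees cost O(N^2). Keeping a back-pointer, as we understand the PostgreSQL change to do, makes each unlink constant-time:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an allocator's per-block header. */
typedef struct Block
{
    struct Block *next;
    struct Block *prev;         /* used only by the doubly linked variant */
} Block;

/*
 * O(n) unlink: with only forward links, finding the predecessor of
 * 'target' means walking from the head.  This is what makes freeing
 * many large chunks quadratic overall.  Assumes 'target' is in the list.
 */
static void
unlink_singly(Block **head, Block *target)
{
    Block *prev = NULL;
    Block *cur = *head;

    while (cur != target)
    {
        prev = cur;
        cur = cur->next;
    }
    if (prev)
        prev->next = cur->next;
    else
        *head = cur->next;
}

/*
 * O(1) unlink: the back-pointer removes the need to scan, so each
 * free touches only the block's immediate neighbors.
 */
static void
unlink_doubly(Block **head, Block *target)
{
    if (target->prev)
        target->prev->next = target->next;
    else
        *head = target->next;
    if (target->next)
        target->next->prev = target->prev;
}
```

Each call to unlink_singly() pays for the scan, which is why the cost shows up at COMMIT time when pool_reset_memqcache_buffer() discards the whole temporary cache array at once.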

The in-memory cache works really well most of the time, but at peak times it makes our system totally unresponsive, as it becomes a bottleneck, even on dedicated hardware (16x Xeon E5-2689v4 @ 3.10GHz cores, 64 GB RAM).
Tags: No tags attached.

Activities

Muhammad Usama

2017-06-30 00:56

developer   ~0001562

Hi,
I have imported all the changes made to PostgreSQL's memory manager API since it was originally brought into Pgpool-II.

https://git.postgresql.org/gitweb?p=pgpool2.git;a=commitdiff;h=85392b89b5791cb3dceb59c6567f47911758467e

Can you please check whether it improves performance for the case you mentioned? You will need to build Pgpool-II from the source code to perform the test.
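Building from source for such a test might look like the following (a sketch only: the clone URL matches the commitdiff link above, but the install prefix and the choice of checking out that exact commit are assumptions to adapt locally):

```shell
# Fetch the Pgpool-II sources and check out the commit with the
# imported memory manager changes (commit hash from the link above).
git clone https://git.postgresql.org/git/pgpool2.git
cd pgpool2
git checkout 85392b89b5791cb3dceb59c6567f47911758467e

# Configure, build, and install; the prefix here is an example.
./configure --prefix=/usr/local/pgpool
make
sudo make install
```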



Thanks,
Best Regards

aarevalo

2017-07-07 16:52

reporter   ~0001578

We have compiled pgpool, and it has been running in our production environment under heavy load (a sustained 20K+ TPS) for two days now.
We haven't seen the previous behaviour, so it is now working as expected.
Moreover, we see an improvement in CPU usage: it now uses about 25% less processor resources (measuring load average and user CPU time), probably related to the improvements in memory management.

Thanks for the great work!

Issue History

Date Modified Username Field Change
2017-06-20 01:47 aarevalo New Issue
2017-06-20 11:20 t-ishii Assigned To => Muhammad Usama
2017-06-20 11:20 t-ishii Status new => assigned
2017-06-30 00:56 Muhammad Usama Note Added: 0001562
2017-07-07 16:52 aarevalo Note Added: 0001578
2017-08-29 09:25 pengbo Status assigned => closed