View Issue Details

ID: 0000129
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2016-01-29 23:43
Reporter: eldad
Assigned To: Muhammad Usama
Priority: high
Severity: crash
Reproducibility: always
Status: closed
Resolution: unable to reproduce
Platform: Linux
OS: RHEL
OS Version: 6.3
Summary: 0000129: pgpool child out of memory
Description: After upgrading from 3.3.3 to 3.4.0-3, queries fail with an out-of-memory exception.
The same hardware and configuration are used.
We have a dedicated host with 16 GB of memory; only 3 GB is in use by pgpool when these errors occur (see Additional Information).
We have 2 pgpool and 4 postgres servers.
I'm attaching the conf file.
Additional Information:
2015-02-15 01:03:01: pid 25048: DETAIL: connecting host=postgres-01.lab port=60184
TopMemoryContext: 234560 total in 25 blocks; 8392 free (7 chunks); 226168 used
  pgpool_child_main_loop: 24576 total in 2 blocks; 12080 free (8 chunks); 12496 used
    child_query_process: 0 total in 0 blocks; 0 free (0 chunks); 0 used
      QueryContextMemoryContext: 8192 total in 3 blocks; 3280 free (0 chunks); 4912 used
      pool_process_query: 8192 total in 1 blocks; 7872 free (0 chunks); 320 used
    SessionContext: 1024 total in 1 blocks; 664 free (0 chunks); 360 used
  pgpool_main_loop: 8192 total in 1 blocks; 8160 free (5 chunks); 32 used
  ErrorContext: 8192 total in 1 blocks; 8160 free (5 chunks); 32 used
2015-02-15 01:03:01: pid 30653: ERROR: out of memory
2015-02-15 01:03:01: pid 30653: DETAIL: Failed on request of size 2121728.
2015-02-15 01:03:01: pid 30653: LOG: garbage data from frontend after receiving terminate message ignored
2015-02-15 01:03:02: pid 29341: DETAIL: connecting host=postgres-02.lab port=43964
TopMemoryContext: 13202336 total in 79 blocks; 10872 free (12 chunks); 13191464 used
  pgpool_child_main_loop: 24576 total in 2 blocks; 12080 free (8 chunks); 12496 used
    child_query_process: 0 total in 0 blocks; 0 free (0 chunks); 0 used
      QueryContextMemoryContext: 4096 total in 2 blocks; 1408 free (1 chunks); 2688 used
      pool_process_query: 0 total in 0 blocks; 0 free (0 chunks); 0 used
      QueryContextMemoryContext: 4096 total in 2 blocks; 152 free (1 chunks); 3944 used
      pool_process_query: 8192 total in 1 blocks; 8160 free (1 chunks); 32 used
    SessionContext: 1024 total in 1 blocks; 592 free (0 chunks); 432 used
  pgpool_main_loop: 8192 total in 1 blocks; 8160 free (5 chunks); 32 used
  ErrorContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
2015-02-15 01:03:03: pid 30489: DETAIL: connecting host=postgres-01.lab port=60197
TopMemoryContext: 289600 total in 35 blocks; 6848 free (10 chunks); 282752 used
  pgpool_child_main_loop: 24576 total in 2 blocks; 12080 free (8 chunks); 12496 used
    child_query_process: 0 total in 0 blocks; 0 free (0 chunks); 0 used
      QueryContextMemoryContext: 4096 total in 2 blocks; 152 free (1 chunks); 3944 used
      pool_process_query: 8192 total in 1 blocks; 8000 free (0 chunks); 192 used
    SessionContext: 1024 total in 1 blocks; 728 free (0 chunks); 296 used
  pgpool_main_loop: 8192 total in 1 blocks; 8160 free (5 chunks); 32 used
  ErrorContext: 8192 total in 1 blocks; 8160 free (4 chunks); 32 used
2015-02-15 01:03:03: pid 30489: ERROR: out of memory
2015-02-15 01:03:03: pid 30489: DETAIL: Failed on request of size 2121728.
2015-02-15 01:03:03: pid 30489: LOG: garbage data from frontend after receiving terminate message ignored
2015-02-15 01:03:03: pid 30621: DETAIL: connecting host=postgres-02.lab port=43971
TopMemoryContext: 391488 total in 54 blocks; 5000 free (19 chunks); 386488 used
  pgpool_child_main_loop: 24576 total in 2 blocks; 12816 free (4 chunks); 11760 used
  pgpool_main_loop: 8192 total in 1 blocks; 8160 free (5 chunks); 32 used
  ErrorContext: 8192 total in 1 blocks; 8160 free (4 chunks); 32 used
2015-02-15 01:03:03: pid 30621: ERROR: out of memory
2015-02-15 01:03:03: pid 30621: DETAIL: Failed on request of size 1024.
2015-02-15 01:03:03: pid 26997: WARNING: child process with pid: 30621 was terminated by segmentation fault
2015-02-15 01:03:03: pid 26997: FATAL: failed to fork a child
2015-02-15 01:03:03: pid 26997: DETAIL: system call fork() failed with reason: Cannot allocate memory
Tags: No tags attached.

Activities

eldad

2015-02-15 20:31

reporter  

pgpool.conf (30,619 bytes)

t-ishii

2015-02-15 20:42

developer   ~0000514

pgpool-II 3.4.1 fixed memory leak bugs. Please try 3.4.1.
http://pgpool.net/mediawiki/index.php/Yum_Repository

eldad

2015-02-15 22:47

reporter   ~0000515

Hi,

Just upgraded to 3.4.1-2, same problem.

2015-02-15 03:44:30: pid 12557: ERROR: out of memory
2015-02-15 03:44:30: pid 12557: DETAIL: Failed on request of size 2121728.
2015-02-15 03:44:43: pid 12516: ERROR: out of memory
2015-02-15 03:44:43: pid 12516: DETAIL: Failed on request of size 1472000.
2015-02-15 03:44:44: pid 12244: FATAL: failed to fork a child
2015-02-15 03:44:44: pid 12244: DETAIL: system call fork() failed with reason: Cannot allocate memory

After the last message, pgpool shut itself down.
It seems that when it reaches 3-3.5 GB of usage, it crashes.
The host has 16 GB.
I checked limits.conf and sysctl.conf; they look OK, see below:
limits.conf:
postgres soft nproc 4500
postgres hard nproc 4500
postgres soft nofile 65536
postgres hard nofile 65536
postgres soft memlock 15728640
postgres hard memlock 15728640

sysctl.conf:
kernel.shmmax=16106127360
kernel.shmall=8244661
kernel.sem=4010 569420 4010 142

No errors found in the postgres logs.

Please advise, it's urgent.

Many thanks,
Eldad

Muhammad Usama

2015-02-16 23:51

developer   ~0000516

Hello Eldad!

I have been trying to reproduce the issue, but with no luck so far. I see no memory leak, and pgpool-II 3.3 and 3.4.1 show almost identical memory consumption in my testing.

From the details above, this does not look like a memory leak: the log you shared suggests the process is holding only ~1 MB of memory, yet it reports the "out of memory" error:

"2015-02-15 01:03:03: pid 30489: ERROR: out of memory
2015-02-15 01:03:03: pid 30489: DETAIL: Failed on request of size 2121728.
2015-02-15 01:03:03: pid 30489: LOG: garbage data from frontend after receiving terminate message ignored
2015-02-15 01:03:03: pid 30621: DETAIL: connecting host=postgres-02.lab port=43971
TopMemoryContext: 391488 total in 54 blocks; 5000 free (19 chunks); 386488 used"

For further investigation, can you please share some more information: which OS you are using, and how you hit the out-of-memory issue. Specifically, the ps output of all the pgpool-II processes, the output of top, the contents of /proc/meminfo, and how long it takes before pgpool-II runs out of memory. Also, is there any specific type of query that causes this issue?
Lastly, as you mentioned, pgpool-II 3.4 starts reporting "out of memory" when it reaches around 3 to 3.5 GB of memory. Do you have any data on how much total memory pgpool-II 3.3 used with a similar configuration and load?
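The diagnostics requested above could be collected with something like the following sketch (the output file names and the process-name pattern are illustrative assumptions, not part of the original report):

```shell
# Collect the pgpool-II memory diagnostics requested above.
# The '[p]gpool' pattern avoids matching the grep process itself.
ps aux | grep '[p]gpool' > pgpool_processes.txt || true  # empty if pgpool is not running
top -b -n 1 > top.txt            # one batch-mode snapshot of top
cat /proc/meminfo > meminfo.txt  # kernel memory statistics
```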

Thanks
Best regards!
Muhammad Usama

eldad

2015-02-17 00:18

reporter   ~0000517

Hi Muhammad,

I downgraded pgpool to the 3.3.3-1 version I had before, but the problem persists, so I'm not sure what the root cause is.
In this version I get these error messages for all the child processes:
2015-02-16 07:09:00 ERROR: pid 7318: pool_create_relcache: cannot allocate memory 2121728
2015-02-16 07:09:00 ERROR: pid 7318: is_system_catalog: pool_create_relcache error

It happens 2-3 minutes after the load tests start on the application side.
I'm attaching the information you requested in the files.
I use RHEL 6.3.
Previously, the pgpool processes usually took 2-3 GB overall.

Thanks for your help,
Eldad

eldad

2015-02-17 00:18

reporter  

processes.txt (34,889 bytes)   

eldad

2015-02-17 00:18

reporter  

meminfo.txt (1,212 bytes)   

eldad

2015-02-17 00:19

reporter  

top.txt (3,329 bytes)   

eldad

2015-02-17 00:25

reporter   ~0000518

From the error log I see that the memory exception always occurs in pool_create_relcache.

I think this query is related:
 SELECT n.nspname as "Schema",
          c.relname as "Name",
          CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm' THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign table' END as "Type",
          pg_catalog.pg_get_userbyid(c.relowner) as "Owner"
        FROM pg_catalog.pg_class c
             LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
        WHERE c.relkind IN ('r','v','m','S','f','')
              AND n.nspname <> 'pg_catalog'
              AND n.nspname <> 'information_schema'
              AND n.nspname !~ '^pg_toast'
          AND pg_catalog.pg_table_is_visible(c.oid)
        ORDER BY 1,2;

Muhammad Usama

2015-02-17 01:07

developer   ~0000519

Hi
Thanks for providing the information. This appears to be an OS limit setting issue.

Can you please share the output of "ulimit -a"?

eldad

2015-02-17 23:53

reporter   ~0000520

Hi,

here is the output:

core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 127456
max locked memory (kbytes, -l) 15728640
max memory size (kbytes, -m) unlimited
open files (-n) 65536
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 4500
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

eldad

2015-02-18 22:41

reporter   ~0000521

Hi,

I think I found the root cause of this problem; it looks very strange.
The trusted_servers parameter in pgpool.conf can hold a list of servers used for connectivity checks by the watchdog process.
Currently, each pgpool has the other pgpool node in this parameter.
But if the other pgpool is completely down (rebooting, for example), then the active pgpool fails the watchdog lifecheck and the ping to the trusted server, so it thinks it has lost the network and shuts itself down.
To fix it, I added more servers to the list (our DB servers), so even if 1-2 servers are down the ping will still work (I have a total of 5 servers in the list).

Apparently this is what caused pgpool to break apart; at some point we got an out-of-memory exception for all child processes,
even though this parameter is relevant only to the watchdog process.
My guess is that this is a memory leak in the process that runs over the child process memory.
The bug exists in both the old version (3.3.3) and the new version (3.4.1).
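For reference, the workaround described above would correspond to a pgpool.conf fragment along these lines (the hostnames below are placeholders, not the reporter's actual hosts; trusted_servers is the comma-separated list the watchdog lifecheck pings):

```
# Watchdog connectivity-check targets. Listing several hosts means the
# lifecheck ping still succeeds when one or two of them are down.
# Hostnames here are hypothetical examples.
trusted_servers = 'pgpool-02.lab,postgres-01.lab,postgres-02.lab,postgres-03.lab,postgres-04.lab'
```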

Regards,
Eldad

Muhammad Usama

2016-01-29 23:43

developer   ~0000643

I am not able to reproduce the issue, and according to the reporter, he is able to run pgpool-II without any problem after the configuration changes. Also, the system trace shared by the user does not suggest a memory leak in pgpool-II. So I am closing this issue.

Issue History

Date Modified Username Field Change
2015-02-15 20:31 eldad New Issue
2015-02-15 20:31 eldad File Added: pgpool.conf
2015-02-15 20:39 t-ishii Assigned To => t-ishii
2015-02-15 20:39 t-ishii Status new => feedback
2015-02-15 20:42 t-ishii Note Added: 0000514
2015-02-15 22:47 eldad Note Added: 0000515
2015-02-15 22:47 eldad Status feedback => assigned
2015-02-16 23:51 Muhammad Usama Note Added: 0000516
2015-02-17 00:18 eldad Note Added: 0000517
2015-02-17 00:18 eldad File Added: processes.txt
2015-02-17 00:18 eldad File Added: meminfo.txt
2015-02-17 00:19 eldad File Added: top.txt
2015-02-17 00:25 eldad Note Added: 0000518
2015-02-17 01:07 Muhammad Usama Note Added: 0000519
2015-02-17 23:53 eldad Note Added: 0000520
2015-02-18 22:41 eldad Note Added: 0000521
2016-01-20 13:38 t-ishii Assigned To t-ishii => Muhammad Usama
2016-01-29 23:43 Muhammad Usama Note Added: 0000643
2016-01-29 23:43 Muhammad Usama Status assigned => closed
2016-01-29 23:43 Muhammad Usama Resolution open => unable to reproduce