View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000129 | Pgpool-II | Bug | public | 2015-02-15 20:31 | 2016-01-29 23:43 |
| Reporter | eldad | Assigned To | Muhammad Usama | ||
| Priority | high | Severity | crash | Reproducibility | always |
| Status | closed | Resolution | unable to reproduce | ||
| Platform | Linux | OS | RHEL | OS Version | 6.3 |
| Summary | 0000129: pgpool child out of memory | ||||
| Description | After upgrading from 3.3.3 to 3.4.0-3, queries fail with an out-of-memory exception. The same hardware and configuration is used. We have a dedicated host with 16 GB of memory; only 3 GB is in use by pgpool when these errors occur (see Additional Information). We have 2 pgpool and 4 postgres servers. I'm attaching the conf file. | ||||
| Additional Information | 2015-02-15 01:03:01: pid 25048: DETAIL: connecting host=postgres-01.lab port=60184 TopMemoryContext: 234560 total in 25 blocks; 8392 free (7 chunks); 226168 used pgpool_child_main_loop: 24576 total in 2 blocks; 12080 free (8 chunks); 12496 used child_query_process: 0 total in 0 blocks; 0 free (0 chunks); 0 used QueryContextMemoryContext: 8192 total in 3 blocks; 3280 free (0 chunks); 4912 used pool_process_query: 8192 total in 1 blocks; 7872 free (0 chunks); 320 used SessionContext: 1024 total in 1 blocks; 664 free (0 chunks); 360 used pgpool_main_loop: 8192 total in 1 blocks; 8160 free (5 chunks); 32 used ErrorContext: 8192 total in 1 blocks; 8160 free (5 chunks); 32 used 2015-02-15 01:03:01: pid 30653: ERROR: out of memory 2015-02-15 01:03:01: pid 30653: DETAIL: Failed on request of size 2121728. 2015-02-15 01:03:01: pid 30653: LOG: garbage data from frontend after receiving terminate message ignored 2015-02-15 01:03:02: pid 29341: DETAIL: connecting host=postgres-02.lab port=43964 TopMemoryContext: 13202336 total in 79 blocks; 10872 free (12 chunks); 13191464 used pgpool_child_main_loop: 24576 total in 2 blocks; 12080 free (8 chunks); 12496 used child_query_process: 0 total in 0 blocks; 0 free (0 chunks); 0 used QueryContextMemoryContext: 4096 total in 2 blocks; 1408 free (1 chunks); 2688 used pool_process_query: 0 total in 0 blocks; 0 free (0 chunks); 0 used QueryContextMemoryContext: 4096 total in 2 blocks; 152 free (1 chunks); 3944 used pool_process_query: 8192 total in 1 blocks; 8160 free (1 chunks); 32 used SessionContext: 1024 total in 1 blocks; 592 free (0 chunks); 432 used pgpool_main_loop: 8192 total in 1 blocks; 8160 free (5 chunks); 32 used ErrorContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used 2015-02-15 01:03:03: pid 30489: DETAIL: connecting host=postgres-01.lab port=60197 TopMemoryContext: 289600 total in 35 blocks; 6848 free (10 chunks); 282752 used pgpool_child_main_loop: 24576 total in 2 blocks; 12080 free (8 chunks); 12496 used child_query_process: 0 total in 0 blocks; 0 free (0 chunks); 0 used QueryContextMemoryContext: 4096 total in 2 blocks; 152 free (1 chunks); 3944 used pool_process_query: 8192 total in 1 blocks; 8000 free (0 chunks); 192 used SessionContext: 1024 total in 1 blocks; 728 free (0 chunks); 296 used pgpool_main_loop: 8192 total in 1 blocks; 8160 free (5 chunks); 32 used ErrorContext: 8192 total in 1 blocks; 8160 free (4 chunks); 32 used 2015-02-15 01:03:03: pid 30489: ERROR: out of memory 2015-02-15 01:03:03: pid 30489: DETAIL: Failed on request of size 2121728. 2015-02-15 01:03:03: pid 30489: LOG: garbage data from frontend after receiving terminate message ignored 2015-02-15 01:03:03: pid 30621: DETAIL: connecting host=postgres-02.lab port=43971 TopMemoryContext: 391488 total in 54 blocks; 5000 free (19 chunks); 386488 used pgpool_child_main_loop: 24576 total in 2 blocks; 12816 free (4 chunks); 11760 used pgpool_main_loop: 8192 total in 1 blocks; 8160 free (5 chunks); 32 used ErrorContext: 8192 total in 1 blocks; 8160 free (4 chunks); 32 used 2015-02-15 01:03:03: pid 30621: ERROR: out of memory 2015-02-15 01:03:03: pid 30621: DETAIL: Failed on request of size 1024. 2015-02-15 01:03:03: pid 26997: WARNING: child process with pid: 30621 was terminated by segmentation fault 2015-02-15 01:03:03: pid 26997: FATAL: failed to fork a child 2015-02-15 01:03:03: pid 26997: DETAIL: system call fork() failed with reason: Cannot allocate memory | ||||
| Tags | No tags attached. | ||||
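The dumps in Additional Information use the same memory-context format as PostgreSQL ("name: N total in M blocks; F free (C chunks); U used"). When reading such a dump it can help to sum the per-context totals; a small sketch (not part of pgpool, just a parsing aid) in Python:

```python
import re

# Matches pgpool/PostgreSQL memory-context dump lines such as
# "TopMemoryContext: 234560 total in 25 blocks; 8392 free (7 chunks); 226168 used"
CTX = re.compile(
    r"(\w+):\s*(\d+) total in (\d+) blocks?; (\d+) free \((\d+) chunks?\); (\d+) used"
)

def total_context_bytes(dump):
    """Sum the 'total' byte counts of every memory context found in a dump string."""
    return sum(int(m.group(2)) for m in CTX.finditer(dump))

dump = ("TopMemoryContext: 234560 total in 25 blocks; 8392 free (7 chunks); 226168 used "
        "ErrorContext: 8192 total in 1 blocks; 8160 free (5 chunks); 32 used")
print(total_context_bytes(dump))  # 242752
```

Applied to the log above, the totals come to well under a megabyte per child, which is what makes the "out of memory" errors surprising.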
|
|
pgpool-II 3.4.1 fixed some memory-leak bugs. Please try 3.4.1: http://pgpool.net/mediawiki/index.php/Yum_Repository
|
|
Hi,

Just upgraded to 3.4.1-2; same problem:

```
2015-02-15 03:44:30: pid 12557: ERROR: out of memory
2015-02-15 03:44:30: pid 12557: DETAIL: Failed on request of size 2121728.
2015-02-15 03:44:43: pid 12516: ERROR: out of memory
2015-02-15 03:44:43: pid 12516: DETAIL: Failed on request of size 1472000.
2015-02-15 03:44:44: pid 12244: FATAL: failed to fork a child
2015-02-15 03:44:44: pid 12244: DETAIL: system call fork() failed with reason: Cannot allocate memory
```

After the last message pgpool shut itself down. It seems that when it gets to 3-3.5 GB of usage it crashes, although the host has 16 GB. I checked limits.conf and sysctl.conf and they look OK, see below:

limits.conf:

```
postgres soft nproc 4500
postgres hard nproc 4500
postgres soft nofile 65536
postgres hard nofile 65536
postgres soft memlock 15728640
postgres hard memlock 15728640
```

sysctl.conf:

```
kernel.shmmax=16106127360
kernel.shmall=8244661
kernel.sem=4010 569420 4010 142
```

No errors found in the postgres logs. Please advise, it's urgent.

Many thanks,
Eldad
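For reference, a fork() failure with "Cannot allocate memory" on a 16 GB box that is only ~3 GB in use often points at the kernel's commit limit (e.g. with vm.overcommit_memory=2) rather than at the application. The headroom can be read from the standard CommitLimit and Committed_AS fields of /proc/meminfo; a sketch with illustrative numbers (not taken from this report):

```python
import re

def parse_meminfo(text):
    """Parse /proc/meminfo-style 'Key: value kB' lines into a dict of kB values."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"(\w+):\s+(\d+)\s*kB", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields

def commit_headroom_kb(meminfo):
    """kB the kernel will still allow to be committed (CommitLimit - Committed_AS)."""
    return meminfo["CommitLimit"] - meminfo["Committed_AS"]

# Illustrative sample; in practice read open('/proc/meminfo').read() instead.
sample = """MemTotal:       16331252 kB
CommitLimit:     8165624 kB
Committed_AS:    7900000 kB
"""
info = parse_meminfo(sample)
print(commit_headroom_kb(info))  # 265624
```

A near-zero headroom would explain fork() and large malloc() requests failing long before physical memory is exhausted.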
|
|
Hello Eldad!

I am trying to reproduce the issue, but with no luck so far. I am seeing no memory leak, and memory consumption of pgpool-II 3.3 and 3.4.1 is almost identical in my testing.

From the details above this does not look like a memory leak, because the log you shared shows the process holding only ~1 MB of memory and yet reporting the error "out of memory":

```
2015-02-15 01:03:03: pid 30489: ERROR: out of memory
2015-02-15 01:03:03: pid 30489: DETAIL: Failed on request of size 2121728.
2015-02-15 01:03:03: pid 30489: LOG: garbage data from frontend after receiving terminate message ignored
2015-02-15 01:03:03: pid 30621: DETAIL: connecting host=postgres-02.lab port=43971
TopMemoryContext: 391488 total in 54 blocks; 5000 free (19 chunks); 386488 used
```

For further investigation, can you please share some more information: which OS you are using and how you hit the out-of-memory condition. Specifically, the ps output of all the pgpool-II processes, the output of top, the contents of /proc/meminfo, and how long it takes before pgpool-II runs out of memory. Also, is there any specific type of query that causes this issue?

Lastly, you mentioned that pgpool-II 3.4 starts reporting "out of memory" when it reaches around 3 to 3.5 GB of memory. Do you have any data on how much total memory pgpool-II 3.3 used with a similar configuration and load?

Thanks and best regards!
Muhammad Usama
|
|
Hi Muhammad,

I downgraded pgpool to the 3.3.3-1 I had before, but the problem persists, so I'm not sure what the root cause is here. In this version I get these error messages for all the child processes:

```
2015-02-16 07:09:00 ERROR: pid 7318: pool_create_relcache: cannot allocate memory 2121728
2015-02-16 07:09:00 ERROR: pid 7318: is_system_catalog: pool_create_relcache error
```

It happens 2-3 minutes after the load tests start from the application side. I'm attaching the information you asked for in the files. I use RHEL 6.3. Previously the pgpool processes usually took 2-3 GB overall.

Thanks for your help,
Eldad
|
|
From the error log I see that the memory exception is always in pool_create_relcache. I think this query is related:

```sql
SELECT n.nspname as "Schema",
       c.relname as "Name",
       CASE c.relkind
         WHEN 'r' THEN 'table'
         WHEN 'v' THEN 'view'
         WHEN 'm' THEN 'materialized view'
         WHEN 'i' THEN 'index'
         WHEN 'S' THEN 'sequence'
         WHEN 's' THEN 'special'
         WHEN 'f' THEN 'foreign table'
       END as "Type",
       pg_catalog.pg_get_userbyid(c.relowner) as "Owner"
FROM pg_catalog.pg_class c
     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r','v','m','S','f','')
      AND n.nspname <> 'pg_catalog'
      AND n.nspname <> 'information_schema'
      AND n.nspname !~ '^pg_toast'
      AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 1,2;
```
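pool_create_relcache is the allocator behind pgpool's internal relation cache, which catalog lookups like the one above populate. If the cache itself were suspected, its size is tunable; a hypothetical pgpool.conf fragment (the relcache_size parameter exists per the pgpool-II documentation, the value is illustrative):

```
# pgpool.conf -- relation cache tuning (illustrative value)
relcache_size = 256   # number of relation-cache entries kept per child process
```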
|
|
Hi,

Thanks for providing the information. This appears to be an OS limit-setting issue. Can you please share the output of "ulimit -a"?
|
|
Hi, here is the output:

```
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 127456
max locked memory       (kbytes, -l) 15728640
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4500
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
```
|
|
Hi,

I think I found the root cause of this problem, and it looks very strange. The trusted_servers parameter in pgpool.conf can hold a list of servers used for connectivity checks by the watchdog process. Currently each pgpool has only the other pgpool node in this parameter. But if the other pgpool is completely down (a reboot, for example), then the active pgpool fails the watchdog lifecheck ping to the trusted server, so it thinks it has lost the network and shuts itself down. To fix it I added more servers to the list (our DB servers), so that even if 1-2 servers are down the ping still works (I now have 5 servers in the list in total).

Apparently this is what caused pgpool to break apart, and at some point we got out-of-memory exceptions in all child processes, even though this parameter is relevant only to the watchdog process. My guess is there is a memory leak in that process which runs over the child processes' memory. The bug exists in both the old (3.3.3) and the new version (3.4.1).

Regards,
Eldad
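The workaround described above corresponds to the watchdog's trusted_servers parameter; a sketch with placeholder hostnames (the parameter and its comma-separated format are per the pgpool-II documentation):

```
# pgpool.conf -- watchdog section (hostnames are placeholders)
trusted_servers = 'gw1.lab,db1.lab,db2.lab,db3.lab'
# The lifecheck pings each listed host; pgpool concludes it has lost the
# network only when none of them respond, so listing several hosts keeps
# a single down peer from triggering a shutdown.
```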
|
|
I am not able to reproduce the issue, and per the reporter he is able to run pgpool-II without any problem after his configuration changes. Also, the system trace shared by the user does not suggest a memory leak in pgpool-II, so I am closing this issue.
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2015-02-15 20:31 | eldad | New Issue | |
| 2015-02-15 20:31 | eldad | File Added: pgpool.conf | |
| 2015-02-15 20:39 | t-ishii | Assigned To | => t-ishii |
| 2015-02-15 20:39 | t-ishii | Status | new => feedback |
| 2015-02-15 20:42 | t-ishii | Note Added: 0000514 | |
| 2015-02-15 22:47 | eldad | Note Added: 0000515 | |
| 2015-02-15 22:47 | eldad | Status | feedback => assigned |
| 2015-02-16 23:51 | Muhammad Usama | Note Added: 0000516 | |
| 2015-02-17 00:18 | eldad | Note Added: 0000517 | |
| 2015-02-17 00:18 | eldad | File Added: processes.txt | |
| 2015-02-17 00:18 | eldad | File Added: meminfo.txt | |
| 2015-02-17 00:19 | eldad | File Added: top.txt | |
| 2015-02-17 00:25 | eldad | Note Added: 0000518 | |
| 2015-02-17 01:07 | Muhammad Usama | Note Added: 0000519 | |
| 2015-02-17 23:53 | eldad | Note Added: 0000520 | |
| 2015-02-18 22:41 | eldad | Note Added: 0000521 | |
| 2016-01-20 13:38 | t-ishii | Assigned To | t-ishii => Muhammad Usama |
| 2016-01-29 23:43 | Muhammad Usama | Note Added: 0000643 | |
| 2016-01-29 23:43 | Muhammad Usama | Status | assigned => closed |
| 2016-01-29 23:43 | Muhammad Usama | Resolution | open => unable to reproduce |