View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000028 | Pgpool-II | Bug | public | 2012-10-16 20:56 | 2013-02-01 10:24 |
| Reporter | anilth | Assigned To | nagata | ||
| Priority | normal | Severity | minor | Reproducibility | always |
| Status | closed | Resolution | fixed | ||
| Platform | CentOS | OS | CentOS 6 | OS Version | 6 |
| Summary | 0000028: general protection & Segfault | ||||
| Description | I have pgpool-II version 3.2.1 (namameboshi) installed on CentOS 6, with two PostgreSQL 9.2 backend servers. The following postgresql92 packages are installed on the pgpool server: postgresql92-libs-9.2.1-1PGDG.rhel6.x86_64 postgresql92-devel-9.2.1-1PGDG.rhel6.x86_64 postgresql92-9.2.1-1PGDG.rhel6.x86_64 In pgpool.conf I have only set replication and load balancing to on; the rest is default. Every now and then I get messages like these in the log: Oct 15 12:53:52 server1 kernel: pgpool[4432] general protection ip:46f02d sp:7ffff8641260 error:0 in pgpool[400000+ef000] Oct 15 12:57:26 server1 kernel: pgpool[2123] general protection ip:46f02d sp:7ffff86412a0 error:0 in pgpool[400000+ef000] Oct 15 13:12:39 server1 kernel: pgpool[2223] general protection ip:46f02d sp:7ffff86412a0 error:0 in pgpool[400000+ef000] Oct 15 14:01:51 server1 kernel: pgpool[4490]: segfault at 20 ip 000000000046f02d sp 00007ffff8641260 error 4 in pgpool[400000+ef000] Oct 15 14:11:33 server1 kernel: pgpool[4474]: segfault at 20 ip 000000000046f02d sp 00007ffff8641260 error 4 in pgpool[400000+ef000] Oct 15 14:34:28 server1 kernel: pgpool[4591] general protection ip:46f02d sp:7ffff8641260 error:0 in pgpool[400000+ef000] Oct 15 14:54:47 server1 kernel: pgpool[4577] general protection ip:46f02d sp:7ffff8641260 error:0 in pgpool[400000+ef000] Oct 15 14:59:25 server1 kernel: pgpool[4545] general protection ip:46f02d sp:7ffff8641260 error:0 in pgpool[400000+ef000] Oct 15 15:45:14 server1 kernel: pgpool[1953]: segfault at 20 ip 000000000046f02d sp 00007ffff86412a0 error 4 in pgpool[400000+ef000] When this happens, my web interface (PHP) stops and reports a connection timeout. After a few refreshes in the browser it comes back again. | ||||
| Additional Information | This happens when I am updating records in bulk via the web interface. With the pgAdmin tool I see no problem, but with datastudio it is really bad. | ||||
| Tags | No tags attached. | ||||
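
The kernel lines in the description can be decoded mechanically. The sketch below is not part of the original report; the function name `decode_fault` and the returned keys are made up for illustration. It assumes the x86 page-fault error-code layout (bit 0: page present, bit 1: write, bit 2: user mode), under which `error 4` means a user-mode read of an unmapped page. The faulting address 0x20 is consistent with reading a field through a null pointer, though that is an interpretation, not something the log proves. For the `general protection` lines only the pid and code offset are extracted.

```python
import re

# Matches both kernel formats that appear in the report:
#   pgpool[4490]: segfault at 20 ip 000000000046f02d sp ... error 4 in pgpool[400000+ef000]
#   pgpool[4432] general protection ip:46f02d sp:... error:0 in pgpool[400000+ef000]
FAULT_RE = re.compile(
    r"(?P<proc>[^\s\[]+)\[(?P<pid>\d+)\]:? "
    r"(?:segfault at (?P<addr>[0-9a-f]+)|(?P<gp>general protection)) "
    r"ip[: ](?P<ip>[0-9a-f]+) sp[: ](?P<sp>[0-9a-f]+) "
    r"error[: ](?P<err>[0-9a-f]+) in "
    r"(?P<obj>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_fault(line):
    """Return a dict describing one kernel fault line, or None if it does not match."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    info = {
        "pid": int(m.group("pid")),
        "kind": "general protection" if m.group("gp") else "segfault",
        "fault_addr": None,
        # Offset of the faulting instruction inside the mapped object.
        "offset": ip - base,
    }
    if info["kind"] == "segfault":
        err = int(m.group("err"), 16)
        info["fault_addr"] = int(m.group("addr"), 16)
        info["page_present"] = bool(err & 1)  # 0 = page not mapped
        info["write"] = bool(err & 2)         # 0 = read access
        info["user_mode"] = bool(err & 4)     # 1 = fault raised in user space
    return info
```

Every fault line in the description carries the same `ip`, so they all decode to the same code offset, which suggests a single faulting instruction rather than several unrelated crashes.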
|
|
Could you provide the following information? - the pgpool log - a backtrace - (if possible) the query you executed and the number of rows updated. If you have a core file, you can get a backtrace as follows: run `gdb pgpool core-file`, then enter `bt` at the `(gdb)` prompt. |
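
The same backtrace can be collected without an interactive session. This is a minimal sketch, assuming gdb is on the PATH and using its `--batch` and `-ex` options; the helper names `backtrace_command` and `print_backtrace` are hypothetical and not part of pgpool.

```python
import subprocess


def backtrace_command(binary, core):
    """Build a non-interactive gdb invocation that runs "bt" and exits."""
    return ["gdb", "--batch", "-ex", "bt", binary, core]


def print_backtrace(binary, core):
    """Run gdb and return whatever it printed for the backtrace."""
    result = subprocess.run(
        backtrace_command(binary, core),
        capture_output=True,
        text=True,
        check=False,  # gdb may exit non-zero if the core file is unreadable
    )
    return result.stdout
```

For example, `print_backtrace("pgpool", "core.10049")` would correspond to the manual session above, with the core file name taken from the reporter's later note.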
|
|
|
|
|
I have attached the pgpool log file. I created the core file with: gdb --pid=10049 (gdb) generate-core-file, then gdb pgpool core.10049 (gdb) bt, and I get: (gdb) bt #0 0x00007fd60c811ce3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:82 #1 0x0000000000403c28 in pool_pause (timeout=<value optimized out>) at main.c:2461 #2 0x0000000000407b51 in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:855 This is very strange: when I browse the web interface it does not complain, but if I wait for a while doing nothing and then go back to browsing, I get a connection timeout in the browser and then a segfault in the pgpool log. Please tell me if I am doing something wrong when generating the backtrace. |
|
|
I have also uploaded pgpool.conf Maybe this will help more, Here is the dmesg output this morning. pgpool[1887]: segfault at 20 ip 000000000046f02d sp 00007fff41a2e080 error 4 in pgpool[400000+ef000] pgpool[1613]: segfault at 20 ip 000000000046f02d sp 00007fff41a2e0c0 error 4 in pgpool[400000+ef000] pgpool[1677]: segfault at 20 ip 000000000046f02d sp 00007fff41a2e0c0 error 4 in pgpool[400000+ef000] pgpool[1840]: segfault at 20 ip 000000000046f02d sp 00007fff41a2e080 error 4 in pgpool[400000+ef000] pgpool[2220]: segfault at 20 ip 000000000046f02d sp 00007ffff86412a0 error 4 in pgpool[400000+ef000] pgpool[2006]: segfault at 20 ip 000000000046f02d sp 00007ffff86412a0 error 4 in pgpool[400000+ef000] pgpool[2081]: segfault at 20 ip 000000000046f02d sp 00007ffff86412a0 error 4 in pgpool[400000+ef000] pgpool[2008]: segfault at 20 ip 000000000046f02d sp 00007ffff86412a0 error 4 in pgpool[400000+ef000] pgpool[4432] general protection ip:46f02d sp:7ffff8641260 error:0 in pgpool[400000+ef000] pgpool[2123] general protection ip:46f02d sp:7ffff86412a0 error:0 in pgpool[400000+ef000] pgpool[2223] general protection ip:46f02d sp:7ffff86412a0 error:0 in pgpool[400000+ef000] pgpool[4490]: segfault at 20 ip 000000000046f02d sp 00007ffff8641260 error 4 in pgpool[400000+ef000] pgpool[4474]: segfault at 20 ip 000000000046f02d sp 00007ffff8641260 error 4 in pgpool[400000+ef000] pgpool[4591] general protection ip:46f02d sp:7ffff8641260 error:0 in pgpool[400000+ef000] pgpool[4577] general protection ip:46f02d sp:7ffff8641260 error:0 in pgpool[400000+ef000] pgpool[4545] general protection ip:46f02d sp:7ffff8641260 error:0 in pgpool[400000+ef000] pgpool[1953]: segfault at 20 ip 000000000046f02d sp 00007ffff86412a0 error 4 in pgpool[400000+ef000] pgpool[30003] general protection ip:46f02d sp:7fff1071e9e0 error:0 in pgpool[400000+ef000] pgpool[762] general protection ip:46f02d sp:7fff1071e9e0 error:0 in pgpool[400000+ef000] pgpool[735] general protection ip:46f02d sp:7fff1071e9e0 error:0 in 
pgpool[400000+ef000] pgpool[758] general protection ip:46f02d sp:7fff1071e9e0 error:0 in pgpool[400000+ef000] pgpool[738] general protection ip:46f02d sp:7fff1071e9e0 error:0 in pgpool[400000+ef000] pgpool[29538] general protection ip:46f02d sp:7fff1071ea20 error:0 in pgpool[400000+ef000] pgpool[990] general protection ip:46f02d sp:7fff1071e9e0 error:0 in pgpool[400000+ef000] pgpool[10197] general protection ip:46f02d sp:7ffff6cea770 error:0 in pgpool[400000+ef000] pgpool[11490] general protection ip:46f02d sp:7ffff6cea730 error:0 in pgpool[400000+ef000] pgpool[11943] general protection ip:46f02d sp:7ffff6cea730 error:0 in pgpool[400000+ef000] pgpool[10309] general protection ip:46f02d sp:7ffff6cea770 error:0 in pgpool[400000+ef000] pgpool[11958] general protection ip:46f02d sp:7ffff6cea730 error:0 in pgpool[400000+ef000] pgpool[10231] general protection ip:46f02d sp:7ffff6cea770 error:0 in pgpool[400000+ef000] pgpool[20123] general protection ip:46f02d sp:7fff697a66c0 error:0 in pgpool[400000+ef000] pgpool[20119]: segfault at 20 ip 000000000046f02d sp 00007fff697a66c0 error 4 in pgpool[400000+ef000] pgpool[20107]: segfault at 20 ip 000000000046f02d sp 00007fff697a66c0 error 4 in pgpool[400000+ef000] pgpool[20189]: segfault at 20 ip 000000000046f02d sp 00007fff697a66c0 error 4 in pgpool[400000+ef000] pgpool[20484]: segfault at 20 ip 000000000046f02d sp 00007fff697a6680 error 4 in pgpool[400000+ef000] pgpool[20195]: segfault at 20 ip 000000000046f02d sp 00007fff697a66c0 error 4 in pgpool[400000+ef000] pgpool[20471]: segfault at 20 ip 000000000046f02d sp 00007fff697a6680 error 4 in pgpool[400000+ef000] pgpool[20219]: segfault at 20 ip 000000000046f02d sp 00007fff697a66c0 error 4 in pgpool[400000+ef000] pgpool[20079] general protection ip:46f02d sp:7fff697a66c0 error:0 in pgpool[400000+ef000] pgpool[20501] general protection ip:46f02d sp:7fff697a6680 error:0 in pgpool[400000+ef000] pgpool[20213]: segfault at 20 ip 000000000046f02d sp 00007fff697a66c0 error 4 in 
pgpool[400000+ef000] pgpool[20506] general protection ip:46f02d sp:7fff697a6680 error:0 in pgpool[400000+ef000] pgpool[20069] general protection ip:46f02d sp:7fff697a66c0 error:0 in pgpool[400000+ef000] pgpool[20061]: segfault at 20 ip 000000000046f02d sp 00007fff697a66c0 error 4 in pgpool[400000+ef000] pgpool[20059]: segfault at 20 ip 000000000046f02d sp 00007fff697a66c0 error 4 in pgpool[400000+ef000] pgpool[20534]: segfault at 20 ip 000000000046f02d sp 00007fff697a6680 error 4 in pgpool[400000+ef000] pgpool[20155]: segfault at 20 ip 000000000046f02d sp 00007fff697a66c0 error 4 in pgpool[400000+ef000] pgpool[20563]: segfault at 20 ip 000000000046f02d sp 00007fff697a6680 error 4 in pgpool[400000+ef000] pgpool[20169]: segfault at 20 ip 000000000046f02d sp 00007fff697a66c0 error 4 in pgpool[400000+ef000] pgpool[20490]: segfault at 20 ip 000000000046f02d sp 00007fff697a6680 error 4 in pgpool[400000+ef000] pgpool[20215]: segfault at 20 ip 000000000046f02d sp 00007fff697a66c0 error 4 in pgpool[400000+ef000] pgpool[20565]: segfault at 20 ip 000000000046f02d sp 00007fff697a6680 error 4 in pgpool[400000+ef000] pgpool[20579]: segfault at 20 ip 000000000046f02d sp 00007fff697a6680 error 4 in pgpool[400000+ef000] pgpool[20462]: segfault at 20 ip 000000000046f02d sp 00007fff697a6680 error 4 in pgpool[400000+ef000] pgpool[20585] general protection ip:46f02d sp:7fff697a6680 error:0 in pgpool[400000+ef000] pgpool[20483]: segfault at 20 ip 000000000046f02d sp 00007fff697a6680 error 4 in pgpool[400000+ef000] pgpool[21003]: segfault at 20 ip 000000000046f02d sp 00007fff697a6680 error 4 in pgpool[400000+ef000] |
|
|
|
|
|
I ran the backtrace again just now and got the following: (gdb) bt #0 0x00007f6f0bc1bce3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:82 #1 0x0000000000403c28 in pool_pause (timeout=<value optimized out>) at main.c:2461 #2 0x0000000000407b51 in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:855 Could it be related to any of the following? 1. I installed pgpool on a virtual server; in my case it runs on KVM. 2. The pgpool server has 20 GB of memory and a 50 GB disk. 3. There is no firewall. This is one very simple query that segfaults every now and then: Oct 19 14:39:53 pgpool pgpool[20109]: DB node id: 1 backend pid: 2049 statement: Execute: SELECT kerfi_sidur.id#012FROM kerfi_sidur#012WHERE (kerfi_sidur.forrit = $1#012OR kerfi_sidur.path = $2#012); Oct 19 14:39:53 pgpool pgpool[20109]: statement: SELECT TYPNAME FROM PG_TYPE WHERE OID=23 Oct 19 14:39:53 pgpool pgpool[20109]: DB node id: 1 backend pid: 2049 statement: SELECT TYPNAME FROM PG_TYPE WHERE OID=23 Oct 19 14:39:53 pgpool pgpool[20109]: statement: DEALLOCATE pdo_stmt_0000002d Oct 19 14:39:53 pgpool pgpool[20109]: DB node id: 1 backend pid: 2049 statement: DEALLOCATE pdo_stmt_0000002d Oct 19 14:39:53 pgpool kernel: pgpool[20109] general protection ip:46f02d sp:7fff697a66c0 error:0 in pgpool[400000+ef000] Oct 19 14:39:53 pgpool pgpool[19920]: Child process 20109 was terminated by segmentation fault |
|
|
Thank you for providing the information. I'm investigating this now. |
|
|
Any news on this? It is becoming a problem at my site: every now and then I get a segfault. My development environment is PHP with a PostgreSQL backend (where I deployed pgpool for load balancing). What other information would you like to see? It would be good to know how to fix it. |
|
|
Right, today I installed a completely new pgpool server, on a physical machine this time, and tried again. I still get the segfault. I do not know whether this is related to my backend or to pgpool itself. This is what I get: Oct 24 15:23:25 pgpool pgpool[22425]: DB node id: 0 backend pid: 3270 statement: DEALLOCATE pdo_stmt_00000010 Oct 24 15:23:25 pgpool pgpool[22347]: Child process 22425 was terminated by segmentation fault Oct 24 15:23:25 pgpool kernel: pgpool[22425]: segfault at 20 ip 000000000046f02d sp 00007fffa0400800 error 4 in pgpool[400000+ef000] The version I use: pgpool-II version 3.2.1 (namameboshi). I used the following commands to install it: ./configure --prefix=/opt/pgpool --with-pgsql="/usr/pgsql-9.2/" make make install What next step do you recommend? Small update: I uploaded another file for this investigation. Still: Core was generated by `./pgpool'. #0 0x00007f7af6f8bce3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:82 82 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS) (gdb) bt #0 0x00007f7af6f8bce3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:82 #1 0x0000000000403c28 in pool_pause (timeout=<value optimized out>) at main.c:2461 #2 0x0000000000407b51 in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:855 Update: I have now uploaded the strace from when the segfault happened this morning. |
|
|
|
|
|
|
|
|
I have been testing further, and I think there might be some timeout function that fires every now and then. Could you please take a look into it, or let me know what else I should do? |
|
|
|
|
|
I think the backtrace you showed is that of the parent process. Could you show the backtrace of the child process that was terminated by the segfault? BTW, the log.tar that you uploaded might be broken. I get a binary file "message" but I cannot read it. Could you please check it and provide the pgpool log again? |
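
One way to tell which process is the crashed child is the parent's own log line, "Child process N was terminated by segmentation fault". The sketch below is an illustration only, with the hypothetical helper name `last_statement_of_crashed_children`; it assumes the syslog line layout quoted earlier in this thread. It pairs each crashed child pid with the last statement that child logged before dying.

```python
import re

# Line written by the pgpool parent when a child dies.
CRASH_RE = re.compile(r"Child process (\d+) was terminated by segmentation fault")
# Statement lines written by a child, with or without the "DB node id" prefix.
STMT_RE = re.compile(
    r"pgpool\[(\d+)\]: (?:DB node id: \d+ backend pid: \d+ )?statement: (.*)$"
)


def last_statement_of_crashed_children(lines):
    """Map each crashed child pid to the last statement it logged (or None)."""
    last_stmt = {}
    crashed = {}
    for raw in lines:
        line = raw.rstrip("\n")
        stmt = STMT_RE.search(line)
        if stmt:
            last_stmt[int(stmt.group(1))] = stmt.group(2)
            continue
        crash = CRASH_RE.search(line)
        if crash:
            pid = int(crash.group(1))
            crashed[pid] = last_stmt.get(pid)
    return crashed
```

The pids returned are the ones whose core files or backtraces are relevant; a backtrace taken from any other pid, such as the parent, only shows a process waiting in `select`.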
|
|
Here is the backtrace of the child process: (gdb) bt #0 0x00007fa678230ce3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:82 #1 0x0000000000403c28 in pool_pause (timeout=<value optimized out>) at main.c:2461 #2 0x0000000000407b51 in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:855 GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6) Copyright (C) 2010 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>... "/root/pgpool": not in executable format: File format not recognized Missing separate debuginfo for the main executable file Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/6e/2139480beb4554c1accfa463963d2528617a07 [New Thread 16924] Core was generated by `./pgpool'. #0 0x00007fa678230ce3 in ?? () (gdb) bt #0 0x00007fa678230ce3 in ?? () #1 0x0000000000403c28 in ?? () #2 0x0000000000000080 in ?? () #3 0x0000000000000000 in ?? () I sent the log file to your email address. |
|
|
Any news on this? |
|
|
Thanks for providing the log and backtrace. I looked at the log. It seems that the segfault sometimes occurs after executing "DEALLOCATE pdo_stmt_xxxxxx". However, I do not yet understand why the segfault occurs. The output of your backtrace says "Missing separate debuginfo...." Can you recompile pgpool without -O2 to disable optimization and get the backtrace again? You need to edit the Makefile to remove the -O2 flag. |
|
|
Here is the new backtrace of the pgpool parent process: Loaded symbols for /lib64/libnss_dns-2.12.so Core was generated by `pgpool: wait for connection request'. #0 0x00007f8f9ed29ce3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:82 82 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS) (gdb) bt #0 0x00007f8f9ed29ce3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:82 #1 0x0000000000409eec in do_accept (unix_fd=5, inet_fd=6, timeout=0x7fff04f837b0) at child.c:515 #2 0x000000000040961f in do_child (unix_fd=5, inet_fd=6) at child.c:185 #3 0x0000000000405f15 in fork_a_child (unix_fd=5, inet_fd=6, id=285) at main.c:1243 #4 0x00000000004048d5 in main (argc=1, argv=0x7fff04f87a68) at main.c:661 After a while I still get the segfault: Nov 7 14:01:36 pgpool pgpool[3941]: DB node id: 0 backend pid: 22173 statement: DEALLOCATE pdo_stmt_00000010 Nov 7 14:01:36 pgpool kernel: pgpool[3941]: segfault at 20 ip 000000000049ec17 sp 00007fff04f82e50 error 4 in pgpool[400000+12d000] Nov 7 14:01:36 pgpool pgpool[3645]: Child process 3941 was terminated by segmentation fault Then I took a backtrace of the child process that segfaulted and got: Reading symbols from /lib64/libnss_dns.so.2...Reading symbols from /usr/lib/debug/lib64/libnss_dns-2.12.so.debug...done. done. Loaded symbols for /lib64/libnss_dns.so.2 0x00007f8f9ed29ce3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:82 82 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS) (gdb) bt #0 0x00007f8f9ed29ce3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:82 #1 0x0000000000408a85 in pool_pause (timeout=0x7fff04f87910) at main.c:2461 #2 0x0000000000404f7a in main (argc=1, argv=0x7fff04f87a68) at main.c:855 I have uploaded the new log file to the same location and will send you the link by email. Is there something I am doing totally wrong? Or please tell me what further information you need. |
|
|
I have run valgrind on pgpool; please take a look in case it helps. The file is uploaded here now. |
|
|
|
|
|
Hi Nagata, have you had a chance to look into the latest log and the valgrind.log? Any news? |
|
|
I checked the logs, but I still cannot understand why the segfault occurs, and I cannot reproduce it. One thing I noticed is that the processes leading to the segfault are connected from a particular host, "thor". (Though it may be coincidental and unrelated.) From another angle, could you check the following? 1. Is there any error message in PostgreSQL's log? 2. Is the file descriptor limit (ulimit -n) high enough to accept the connections? |
|
|
Thanks Nagata. Here is the ulimit output: [root@pgpool bin]# ulimit -n 1024 [root@pgpool bin]# ulimit unlimited Regarding the PostgreSQL log: I checked both backend servers and saw: LOG: incomplete startup packet LOG: incomplete startup packet LOG: incomplete startup packet I am not sure if this relates to the segfault issue. As for the thor you mentioned: thor is the web server (where our web applications run) that connects to pgpool. I have posted the valgrind log; is that helpful? |
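
The descriptor limit can also be read from inside a process. The sketch below assumes Linux and Python's standard `resource` module; the helper names are hypothetical, and how many descriptors a pgpool child actually needs is not stated in this thread, so `needed` is left to the caller.

```python
import os
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_descriptor_count(pid="self"):
    """Count open descriptors of a process via /proc (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def has_headroom(needed, soft_limit):
    """True if `needed` descriptors fit under the given soft limit."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return needed < soft_limit
```

With the value reported above, `has_headroom(n, 1024)` answers whether a process that needs `n` descriptors stays under the `ulimit -n` limit. The bare `ulimit` output (`unlimited`) is the file-size limit and does not bear on this question.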
|
|
"incomplete startup packet" occurs when something connects to PostgreSQL and disconnects without sending anything. Are there any other processes connecting to port 5432? And when does the message occur? If it occurs periodically, there may be a monitoring system. If it occurs concurrently with the segfault, the message would be due to pgpool. |
|
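
The behaviour described in the note above can be reproduced with a few lines. This is a sketch under the assumption that a bare TCP connect followed by a close is what the server sees from a port check or from a client that dies before sending its startup packet; the function name `connect_and_hang_up` is made up for illustration.

```python
import socket


def connect_and_hang_up(host, port, timeout=2.0):
    """Open a TCP connection and close it without sending a single byte."""
    with socket.create_connection((host, port), timeout=timeout):
        pass  # leaving the block closes the socket; nothing was written
```

Pointing this at a backend's port 5432 on a test system would be one way to check whether the PostgreSQL version in use logs the message for such connections, and comparing timestamps would show whether the real messages line up with the segfaults.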
|
In addition, could you try the following cases? - using pgpool 3.1.x - using another version of PostgreSQL - installing PostgreSQL from source files. |
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2012-10-16 20:56 | anilth | New Issue | |
| 2012-10-17 10:52 | nagata | Assigned To | => nagata |
| 2012-10-17 10:52 | nagata | Status | new => assigned |
| 2012-10-17 11:00 | nagata | Note Added: 0000107 | |
| 2012-10-18 01:29 | anilth | File Added: log.tar | |
| 2012-10-18 01:35 | anilth | Note Added: 0000108 | |
| 2012-10-19 01:06 | anilth | Note Edited: 0000108 | |
| 2012-10-19 18:12 | anilth | Note Added: 0000111 | |
| 2012-10-19 20:04 | anilth | Note Edited: 0000111 | |
| 2012-10-19 20:04 | anilth | File Added: pgpool.conf | |
| 2012-10-19 20:16 | anilth | Note Added: 0000112 | |
| 2012-10-19 23:41 | anilth | Note Edited: 0000112 | |
| 2012-10-22 10:11 | nagata | Note Added: 0000119 | |
| 2012-10-23 17:55 | anilth | Note Added: 0000122 | |
| 2012-10-25 00:28 | anilth | Note Added: 0000123 | |
| 2012-10-25 22:37 | anilth | Note Edited: 0000123 | |
| 2012-10-25 22:38 | anilth | File Added: gdb | |
| 2012-10-31 18:29 | anilth | Note Edited: 0000123 | |
| 2012-10-31 18:29 | anilth | File Added: strace.txt | |
| 2012-11-01 19:31 | anilth | Note Added: 0000133 | |
| 2012-11-01 19:36 | anilth | File Added: strace.tar | |
| 2012-11-01 20:38 | nagata | Note Added: 0000134 | |
| 2012-11-01 21:36 | anilth | Note Added: 0000135 | |
| 2012-11-01 21:36 | anilth | Note Edited: 0000135 | |
| 2012-11-04 21:56 | anilth | Note Added: 0000138 | |
| 2012-11-05 19:39 | nagata | Note Added: 0000139 | |
| 2012-11-07 23:12 | anilth | Note Added: 0000140 | |
| 2012-11-08 20:20 | anilth | Note Added: 0000141 | |
| 2012-11-08 20:20 | anilth | File Added: valgrind.log | |
| 2012-11-12 23:28 | anilth | Note Added: 0000143 | |
| 2012-11-14 20:20 | anilth | Note Edited: 0000143 | |
| 2012-11-15 14:45 | nagata | Note Added: 0000153 | |
| 2012-11-15 18:09 | anilth | Note Added: 0000154 | |
| 2012-11-27 20:45 | nagata | Note Added: 0000171 | |
| 2012-11-27 20:48 | nagata | Note Added: 0000172 | |
| 2012-12-18 16:40 | nagata | Status | assigned => confirmed |
| 2012-12-18 16:41 | nagata | Status | confirmed => assigned |
| 2013-02-01 10:24 | nagata | Status | assigned => closed |
| 2013-02-01 10:24 | nagata | Resolution | open => fixed |