View Issue Details

IDProjectCategoryView StatusLast Update
0000265Pgpool-IIBugpublic2017-01-12 09:46
Reportera.dushechkin Assigned Tot-ishii  
PrioritynormalSeverityminorReproducibilityrandom
Status closedResolutionopen 
Platformx86_64OSUbuntuOS Version14.04
Product Version3.4.6 
Summary0000265: pgpool-II sometimes hangs on restart
DescriptionOn rare occasion when pgpool-II is restarted by Ansible deploy script, restart hangs indefinitely. This was spotted on initial deploy only (no application connections), just custom monitoring shell script asking for pgpool-II status running mostly "pcp_command node_info" stuff.

GDB reports following stacktrace:

(gdb) bt
#0 __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
0000001 0x00007f216943c684 in _L_lock_533 () from /lib/x86_64-linux-gnu/libc.so.6
0000002 0x00007f216943c0e2 in __GI___vsyslog_chk (pri=<optimized out>, flag=1, fmt=0x4ba720 "[%lu-%d] %s", ap=ap@entry=0x7fff250d9c68) at ../misc/syslog.c:258
0000003 0x00007f216943c532 in __syslog_chk (pri=pri@entry=6, flag=flag@entry=1, fmt=fmt@entry=0x4ba720 "[%lu-%d] %s") at ../misc/syslog.c:129
0000004 0x000000000046caeb in syslog (__fmt=0x4ba720 "[%lu-%d] %s", __pri=6) at /usr/include/x86_64-linux-gnu/bits/syslog.h:31
0000005 write_syslog (level=6, line=<optimized out>) at ../../src/utils/error/elog.c:1554
0000006 0x000000000046f863 in send_message_to_server_log (edata=0x74a2c0 <errordata+384>) at ../../src/utils/error/elog.c:2166
0000007 EmitErrorReport () at ../../src/utils/error/elog.c:1125
0000008 0x000000000046d9ce in errfinish (dummy=<optimized out>) at ../../src/utils/error/elog.c:430
0000009 0x0000000000425ad7 in die (sig=15) at protocol/child.c:923
0000010 <signal handler called>
0000011 __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:93
0000012 0x00007f216943c684 in _L_lock_533 () from /lib/x86_64-linux-gnu/libc.so.6
0000013 0x00007f216943c0e2 in __GI___vsyslog_chk (pri=<optimized out>, flag=1, fmt=0x4ba720 "[%lu-%d] %s", ap=ap@entry=0x7fff250da8e8) at ../misc/syslog.c:258
0000014 0x00007f216943c532 in __syslog_chk (pri=pri@entry=6, flag=flag@entry=1, fmt=fmt@entry=0x4ba720 "[%lu-%d] %s") at ../misc/syslog.c:129
0000015 0x000000000046caeb in syslog (__fmt=0x4ba720 "[%lu-%d] %s", __pri=6) at /usr/include/x86_64-linux-gnu/bits/syslog.h:31
0000016 write_syslog (level=6, line=<optimized out>) at ../../src/utils/error/elog.c:1554
0000017 0x000000000046f863 in send_message_to_server_log (edata=0x74a200 <errordata+192>) at ../../src/utils/error/elog.c:2166
0000018 EmitErrorReport () at ../../src/utils/error/elog.c:1125
0000019 0x000000000046d9ce in errfinish (dummy=<optimized out>) at ../../src/utils/error/elog.c:430
0000020 0x0000000000425ad7 in die (sig=3) at protocol/child.c:923
0000021 <signal handler called>
0000022 0x00007f21694433fd in __libc_send (fd=4, buf=0x1248d70, n=119, flags=-1, flags@entry=16384) at ../sysdeps/unix/sysv/linux/x86_64/send.c:27
0000023 0x00007f216943c271 in __GI___vsyslog_chk (pri=<optimized out>, flag=1, fmt=0x4ba720 "[%lu-%d] %s", ap=ap@entry=0x7fff250db4e8) at ../misc/syslog.c:279
0000024 0x00007f216943c532 in __syslog_chk (pri=pri@entry=3, flag=flag@entry=1, fmt=fmt@entry=0x4ba720 "[%lu-%d] %s") at ../misc/syslog.c:129
0000025 0x000000000046caeb in syslog (__fmt=0x4ba720 "[%lu-%d] %s", __pri=3) at /usr/include/x86_64-linux-gnu/bits/syslog.h:31
0000026 write_syslog (level=3, line=<optimized out>) at ../../src/utils/error/elog.c:1554
0000027 0x000000000046f863 in send_message_to_server_log (edata=0x74a140 <errordata>) at ../../src/utils/error/elog.c:2166
0000028 EmitErrorReport () at ../../src/utils/error/elog.c:1125
0000029 0x000000000046d9ce in errfinish (dummy=<optimized out>) at ../../src/utils/error/elog.c:430
0000030 0x0000000000432c89 in new_connection (p=p@entry=0x1240c20) at protocol/pool_connection_pool.c:855
0000031 0x0000000000432d15 in pool_create_cp () at protocol/pool_connection_pool.c:251
0000032 0x0000000000424f2a in connect_backend (sp=sp@entry=0x1246b40, frontend=frontend@entry=0x1242b20) at protocol/child.c:864
#33 0x00000000004273ee in get_backend_connection (frontend=0x1242b20) at protocol/child.c:2326
0000034 do_child (fds=fds@entry=0x12401b0) at protocol/child.c:334
0000035 0x0000000000406c4c in fork_a_child (fds=0x12401b0, id=23) at main/pgpool_main.c:678
0000036 0x000000000040aec5 in PgpoolMain (discard_status=discard_status@entry=0 '\000', clear_memcache_oidmaps=clear_memcache_oidmaps@entry=0 '\000') at main/pgpool_main.c:278
0000037 0x000000000040548f in main (argc=<optimized out>, argv=<optimized out>) at main/main.c:310
TagsNo tags attached.

Activities

t-ishii

2016-12-06 13:04

developer  

child_v3_4.diff (790 bytes)   
child_v3_4.diff (790 bytes)   

t-ishii

2016-12-06 13:06

developer   ~0001204

It seems the signal handler is interrupted by another signal and it causes a dealock in kernel.
Attached patch fixes the problem by blocking interruption inside the signal handler.

Can you please try it out?

a.dushechkin

2016-12-07 11:38

reporter   ~0001207

Hello. Thank you for quick response, but this issue is really rare. I'll try to statistically/artifically reproduce it this weekend.

t-ishii

2017-01-12 09:45

developer   ~0001290

I have committed fixes to all supported branches. The real issue was malloc() was interrupted by signal, and the signal handler calls elog/ereport which in turn calls malloc(). The fix was removing elog/ereport calls from signal handlers.
The fix for 3.4 branch will appear in the next minor version: 3.4.10.
(the schedule for the release is not decided yet though).

I'm going to close the issue for now.

Issue History

Date Modified Username Field Change
2016-12-05 17:39 a.dushechkin New Issue
2016-12-05 18:08 t-ishii Assigned To => Muhammad Usama
2016-12-05 18:08 t-ishii Status new => assigned
2016-12-06 13:04 t-ishii File Added: child_v3_4.diff
2016-12-06 13:06 t-ishii Note Added: 0001204
2016-12-06 13:06 t-ishii Assigned To Muhammad Usama => t-ishii
2016-12-07 07:38 t-ishii Status assigned => feedback
2016-12-07 11:38 a.dushechkin Note Added: 0001207
2016-12-07 11:38 a.dushechkin Status feedback => assigned
2016-12-07 14:21 t-ishii Status assigned => feedback
2017-01-12 09:45 t-ishii Note Added: 0001290
2017-01-12 09:46 t-ishii Status feedback => closed