[pgpool-general: 8087] PgPool 4.3.1 segfaults

Vitaly Voronov wizard1024 at gmail.com
Wed Apr 20 04:59:26 JST 2022


Hello PgPool Team,

I've seen repeated segfaults in our test environment.

Environment:
1. 2 app servers, each running a standalone pgpool.
2. 2 DB servers running PostgreSQL 14 with streaming replication.
3. All servers are hardware servers at Hetzner.

PgPool was installed from the PgPool RPM repository:
$ rpm -qa | grep pgpool
pgpool-II-release-4.3-1.noarch
pgpool-II-pg14-4.3.1-1pgdg.rhel7.x86_64
pgpool-II-pg14-debuginfo-4.3.1-1pgdg.rhel7.x86_64

The strange part is that we saw segfaults only on the second application node.
The main differences between the nodes are the kernel version:
node1: Linux app1.dev-cluster.local 3.10.0-1127.18.2.el7.x86_64 #1 SMP Sun Jul 26 15:27:06 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
node2: Linux app2.dev-cluster.local 3.10.0-862.11.6.el7.x86_64 #1 SMP Tue Aug 14 21:49:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
and sysctl.conf:
node1:
# Redirect 10k RPS
# Default 128
net.core.somaxconn = 10240
# Default 1000
net.core.netdev_max_backlog = 10240

node2:
# the kernel's socket backlog limit
net.core.somaxconn = 1024
# maximum size of the receive queue
net.core.netdev_max_backlog=3000

pgpool.conf:
backend_clustering_mode = 'streaming_replication'
socket_dir = '/var/run/pgpool'
pcp_socket_dir = '/var/run/pgpool'
backend_hostname0 = '10.10.10.2'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/pgsql/14/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_application_name0 = 'hezner-db1'
backend_hostname1 = '10.10.10.1'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/pgsql/14/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_application_name1 = 'hezner-db2'
enable_pool_hba = on
pool_passwd = 'pool_passwd'
num_init_children = 600
max_pool = 1
child_max_connections = 1
connection_cache = off
load_balance_mode = off
sr_check_period = 0
sr_check_user = 'postgres'
sr_check_database = 'postgres'
health_check_period = 3
health_check_timeout = 1
health_check_user = 'postgres'
health_check_max_retries = 3
health_check_retry_delay = 1
connect_timeout = 1000
failover_command = '/etc/pgpool-II/failover.sh %d %h %p %D %m %H %M %P %r %R %N %S'
failover_on_backend_error = off
failover_on_backend_shutdown = off
hostname0 = ''
wd_ipc_socket_dir = '/var/run/pgpool'

Segfaults logged over the last day (times in JST):
[Tue Apr 19 01:10:03 2022] pgpool[12136]: segfault at 14 ip 0000000000435935 sp 00007ffd4836f920 error 6 in pgpool[400000+21a000]
[Tue Apr 19 01:28:02 2022] pgpool[23275]: segfault at 14 ip 000000000043532e sp 00007ffd4836f920 error 6 in pgpool[400000+21a000]
[Tue Apr 19 01:45:31 2022] pgpool[32436]: segfault at 14 ip 000000000043532e sp 00007ffd4836f920 error 6 in pgpool[400000+21a000]
[Tue Apr 19 06:14:26 2022] pgpool[4583]: segfault at 14 ip 000000000043532e sp 00007ffd4836f920 error 6 in pgpool[400000+21a000]
[Tue Apr 19 07:05:02 2022] pgpool[27061]: segfault at 14 ip 0000000000435935 sp 00007ffd4836f920 error 6 in pgpool[400000+21a000]
[Tue Apr 19 12:04:56 2022] pgpool[21666]: segfault at 14 ip 000000000043532e sp 00007ffd4836f920 error 6 in pgpool[400000+21a000]
[Tue Apr 19 16:24:56 2022] pgpool[26156]: segfault at 14 ip 000000000043532e sp 00007ffd4836f920 error 6 in pgpool[400000+21a000]
[Tue Apr 19 18:27:02 2022] pgpool[21477]: segfault at 14 ip 0000000000435935 sp 00007ffd4836f920 error 6 in pgpool[400000+21a000]
[Tue Apr 19 21:10:05 2022] pgpool[10103]: segfault at 14 ip 0000000000435935 sp 00007ffdfc6a0e00 error 6 in pgpool[400000+21a000]
[Wed Apr 20 02:51:23 2022] pgpool[16108]: segfault at 14 ip 000000000043532e sp 00007ffdfc6a0e00 error 6 in pgpool[400000+21a000]
[Wed Apr 20 02:55:45 2022] pgpool[16874]: segfault at 14 ip 000000000043532e sp 00007ffdfc6a0e00 error 6 in pgpool[400000+21a000]
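
As a reading aid for the kernel lines above, here is a minimal Python sketch (not part of pgpool; the names `parse_segfaults` and `decode_error` are made up for illustration). It pulls the fields out of each line and decodes the error value. It assumes the usual x86 page-fault error-code bits (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode) and that the address and error fields are printed in hex. The two sample lines are copied from the log above.

```python
import re
from collections import Counter

# Two sample lines copied from the kernel log above.
SAMPLE = """\
[Tue Apr 19 01:10:03 2022] pgpool[12136]: segfault at 14 ip
0000000000435935 sp 00007ffd4836f920 error 6 in pgpool[400000+21a000]
[Tue Apr 19 01:28:02 2022] pgpool[23275]: segfault at 14 ip
000000000043532e sp 00007ffd4836f920 error 6 in pgpool[400000+21a000]
"""

SEGFAULT_RE = re.compile(
    r"(?P<prog>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_error(code):
    """Decode the low three bits of an x86 page-fault error code (assumed layout)."""
    return {
        "cause": "protection violation" if code & 1 else "page not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
    }


def parse_segfaults(text):
    """Yield one dict per kernel segfault message found in text."""
    # Collapse all whitespace so messages wrapped by the mail client still match.
    flat = " ".join(text.split())
    for m in SEGFAULT_RE.finditer(flat):
        yield {
            "pid": int(m["pid"]),
            "addr": int(m["addr"], 16),
            "ip": int(m["ip"], 16),
            "error": decode_error(int(m["err"], 16)),
        }


events = list(parse_segfaults(SAMPLE))
by_ip = Counter(hex(e["ip"]) for e in events)  # crashes per instruction pointer

for e in events:
    print(e["pid"], hex(e["addr"]), hex(e["ip"]), e["error"])
print(by_ip)
```

Under these assumptions, `segfault at 14 ... error 6` reads as a user-mode write to the unmapped address 0x14. That would be consistent with a store through a NULL pointer at a small offset, but this is only an interpretation and is not confirmed here.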

I was only able to get the last 3 core dumps. All three were the same.
The last core dump:
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/pgpool...Reading symbols from
/usr/lib/debug/usr/bin/pgpool.debug...done.
done.
[New LWP 16874]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `pgpool: wait for connection request             '.
Program terminated with signal 11, Segmentation fault.
#0  0x000000000043532e in do_child (fds=fds@entry=0x17593e0) at protocol/child.c:333
333                     proc_info->wait_for_connect = 0;
Missing separate debuginfos, use: debuginfo-install
audit-libs-2.8.4-4.el7.x86_64 cyrus-sasl-lib-2.1.26-23.el7.x86_64
glibc-2.17-260.el7_6.3.x86_64 keyutils-libs-1.5.8-3.el7.x86_64
krb5-libs-1.15.1-19.el7.x86_64 libcap-ng-0.7.5-4.el7.x86_64
libcom_err-1.42.9-12.el7_5.x86_64 libgcc-4.8.5-36.el7.x86_64
libmemcached-1.0.16-5.el7.x86_64 libselinux-2.5-14.1.el7.x86_64
libstdc++-4.8.5-36.el7.x86_64 nspr-4.19.0-1.el7_5.x86_64
nss-3.36.0-5.el7_5.x86_64 nss-softokn-freebl-3.36.0-5.el7_5.x86_64
nss-util-3.36.0-1.el7_5.x86_64 openldap-2.4.44-15.el7_5.x86_64
openssl-libs-1.0.2k-21.el7_9.x86_64 pam-1.1.8-22.el7.x86_64
pcre-8.32-17.el7.x86_64 postgresql14-libs-14.2-1PGDG.rhel7.x86_64
zlib-1.2.7-18.el7.x86_64
(gdb) bt full
#0  0x000000000043532e in do_child (fds=fds@entry=0x17593e0) at protocol/child.c:333
        sp = <optimized out>
        saddr = {addr = {ss_family = 0, __ss_padding = '\000' <repeats 117
times>, __ss_align = 0}, salen = 128}
        local_sigjmp_buf = {{__jmpbuf = {24482792, 7991924142289912070,
211, 0, 139985774064536, 1, -7993048770381283066, 7991923603094831366},
__mask_was_saved = 1, __saved_mask = {__val = {0, 140728838264384,
                139985973211296, 140728838264848, 5212294, 140728838264400,
139985969563067, 42949672960, 680, 140728838264368, 0, 5085120, 0,
140724603453444, 3440, 5085122}}}}
        backend = 0x0
        now = {tv_sec = 1650390758, tv_usec = 681992}
        tz = {tz_minuteswest = 0, tz_dsttime = 0}
        connections_count = 0
        psbuf =
"|\022y\001\000\000\000\000\370\026j\374\375\177\000\000\b\026j\374\375\177\000\000|\022y\001\000\000\000\000\020\027j\374\375\177\000\000
\027j\374\375\177\000\000\002@\021\006Q\177\000\000W\323\377\005Q\177\000\000\001\200\255\373\000\000\377\000}\022y\001\000\000\000\000}\022y\001\000\000\000\000|\022y\001\000\000\000\000|\022y\001\000\000\000\000|\022y\001\000\000\000\000|\022y\001\070\067\066\063`\025j\374\375\177\000\000:xu\001\000\000\000\000`\025j\374\375\177\000\000\300\227M\000\000\000\000\000
\027j\374\375\177\000\000\001\000\000\000\000\000\000\000\335\003\000\000\000\000\000\000:xu\001\000\000\000\000\065\022\n\006Q\177\000\000\001\200\255\373\060\060\060\060"...
        proc_info = <optimized out>
        walk = <optimized out>
#1  0x000000000040b7e5 in fork_a_child (fds=0x17593e0, id=211) at main/pgpool_main.c:686
        pid = 0
#2  0x000000000040c2a3 in reaper () at main/pgpool_main.c:2509
        new_pid = 0
        shutdown_system = 0 '\000'
        restart_child = 1 '\001'
        found = 1 '\001'
        process_health_check = 0 '\000'
        pid = 9100
        status = 256
        i = 211
#3  0x0000000000412d7d in PgpoolMain (discard_status=discard_status@entry=0 '\000', clear_memcache_oidmaps=clear_memcache_oidmaps@entry=0 '\000') at main/pgpool_main.c:477
        i = 2
        local_sigjmp_buf = {{__jmpbuf = {1, 7991926004629963014, 9898,
140728838265744, 5, 1, -7993048769884258042, 7991923584716177670},
__mask_was_saved = 1, __saved_mask = {__val = {18446744066192964103, 0,
                139986006267048, 140728838265648, 140728838265632,
461466061, 4204876, 4294967295, 139985969301896, 139985969290592,
139986006197448, 5080360, 18446744073709551615, 0, 0, 4693792}}}}
        first = 0 '\000'
#4  0x0000000000409b4a in main (argc=<optimized out>, argv=<optimized out>) at main/main.c:365
        opt = <optimized out>
        debug_level = <optimized out>
        optindex = 0
        discard_status = 0 '\000'
        clear_memcache_oidmaps = 0 '\000'
        pcp_conf_file_path = "/etc/pgpool-II/pcp.conf", '\000' <repeats
8169 times>
        conf_file_path = "/etc/pgpool-II/pgpool.conf", '\000' <repeats 8166
times>
        hba_file_path = "/etc/pgpool-II/pool_hba.conf", '\000' <repeats
8164 times>
        pool_passwd_key_file_path = "/var/lib/pgsql/.pgpoolkey\000
\000\000\000\000\000\002\000\000\000\006\000\000\000\270\r\006\000\000\000\000\000\270\r&\000\000\000\000\000\270\r&\000\000\000\000\000\340\001\000\000\000\000\000\000\340\001\000\000\000\000\000\000\b\000\000\000\000\000\000\000\004\000\000\000\004\000\000\000\310\001\000\000\000\000\000\000\310\001\000\000\000\000\000\000\310\001\000\000\000\000\000\000$\000\000\000\000\000\000\000$\000\000\000\000\000\000\000\004\000\000\000\000\000\000\000P\345td\004\000\000\000\340\321\005\000\000\000\000\000\340\321\005\000\000\000\000\000\340\321\005\000\000\000\000\000\344\004\000\000\000\000\000\000\344\004\000\000\000\000\000\000\004\000\000\000\000\000\000\000"...
        long_options = {{name = 0x4d87d6 "hba-file", has_arg = 1, flag =
0x0, val = 97}, {name = 0x4d87df "debug", has_arg = 0, flag = 0x0, val =
100}, {name = 0x4d87e5 "config-file", has_arg = 1, flag = 0x0, val = 102},
          {name = 0x4d87f1 "key-file", has_arg = 1, flag = 0x0, val = 107},
{name = 0x4d87fa "pcp-file", has_arg = 1, flag = 0x0, val = 70}, {name =
0x4d8803 "help", has_arg = 0, flag = 0x0, val = 104}, {
            name = 0x4deab0 "mode", has_arg = 1, flag = 0x0, val = 109},
{name = 0x4d8808 "dont-detach", has_arg = 0, flag = 0x0, val = 110}, {name
= 0x4d8814 "discard-status", has_arg = 0, flag = 0x0, val = 68}, {
            name = 0x4d8823 "clear-oidmaps", has_arg = 0, flag = 0x0, val =
67}, {name = 0x4d8831 "debug-assertions", has_arg = 0, flag = 0x0, val =
120}, {name = 0x4ed79c "version", has_arg = 0, flag = 0x0, val = 118}, {
            name = 0x0, has_arg = 0, flag = 0x0, val = 0}}