[pgpool-general: 8072] Segfaults on PgPool-II 4.3.1

Gautam Bellary gautam at pulsasensors.com
Wed Apr 6 02:16:09 JST 2022


Hello PgPool Team,

We've been seeing regular segfaults from PgPool-II 4.3.1, which appear to
have started after we upgraded to this version. Our logs show three
distinct cases ("segfault at 14", "segfault at 24" and "segfault at 0"),
and each one occurs several times a day on the master/leader instance in
our production and test clusters. While these segfaults do not always have
an immediate impact on connections between our servers and the database,
we have observed an increasing number of "bad connection" issues over
time, which we did not see before upgrading to PgPool 4.3.1.

Details about our environment and the log lines for each segfault are
included below, and we've attached the core dump backtraces from gdb for
two of the three segfaults ("segfault at 14" and "segfault at 24"),
including `bt` and `bt full`. We'll add details for "segfault at 0" here or
in a new thread once we are able to capture a core dump for it.

Thanks,
Gautam

*Environment details (ubuntu-focal-20.04-amd64):*

   - Cluster contains 3 PgPool nodes and 3 PostgreSQL nodes, all on AWS EC2
   instances; pgpool.conf attached.
   - $ uname -a
   Linux ip-172-30-166-230 5.4.0-1038-aws #40-Ubuntu SMP Fri Feb 5 23:50:40
   UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
   - # SHOW POOL_VERSION;
        pool_version
   -----------------------
    4.3.1 (tamahomeboshi)

*Log lines for segfaults:*
ISSUE 1 ("segfault at 14") - backtrace attached as
"segfault_at_14_backtrace_20220405.txt":
[350739.877169] pgpool[1157084]: segfault at 14 ip 000055b950530f25 sp 00007ffc9e560f50 error 6 in pgpool[55b950503000+d9000]
[350739.877178] Code: 00 00 89 ef be 06 00 00 00 c7 44 24 50 01 00 00 00 e8 ff 3e fd ff 85 c0 0f 88 a6 0f 00 00 89 ef e8 30 d0 03 00 48 8b 44 24 08 <c7> 40 14 00 00 00 00 4c 8d a4 24 20 01 00 00 48 8d 35 65 19 23 00

ISSUE 2 ("segfault at 24") - backtrace attached as
"segfault_at_24_backtrace_20220405.txt":
[410790.718731] pgpool[1354761]: segfault at 24 ip 000055b9505308fa sp 00007ffc9e560f50 error 6 in pgpool[55b950503000+d9000]
[410790.718740] Code: 80 78 4c 01 75 0a 83 78 50 00 0f 84 cd 05 00 00 31 f6 48 8d 3d 0d ac 0b 00 e8 72 c8 02 00 e8 7d 42 fd ff 89 c7 e8 d6 89 fd ff <c7> 40 24 00 00 00 00 48 8b 05 38 89 1e 00 80 78 4c 01 0f 84 20 01

ISSUE 3 ("segfault at 0") - backtrace unavailable at this time:
[ 4932.453010] pgpool[53241]: segfault at 0 ip 000055ae84d76163 sp 00007ffd94e8c970 error 4 in pgpool[55ae84d2c000+d9000]
[ 4932.453018] Code: 01 c5 41 8b 45 10 85 c0 0f 8f f1 00 00 00 89 35 d3 60 1e 00 48 8d 35 d0 60 1e 00 48 8d 7e fc 89 0d c6 60 1e 00 e8 4d bd ff ff <8b> 38 4c 8d 68 18 83 ef 18 48 63 ff 48 89 3b e8 99 21 02 00 48 8b
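
For anyone triaging these, the kernel lines above can be broken into their
fields mechanically. The following is only a rough sketch in Python, not
part of PgPool or any official tooling: the function name parse_segfault
and the dictionary keys are made up for illustration, and it assumes each
kernel message has been joined onto a single line if a mail client wrapped
it. The error value is interpreted using the standard x86 page-fault error
code bits (bit 0: page was present, bit 1: write access, bit 2: user mode).

```python
import re

# Matches the x86 "segfault at ..." kernel message format seen in the logs above.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Parse one kernel segfault line; return None if it does not match.

    Hypothetical helper for illustration only.
    """
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)  # the kernel prints the error code in hex
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    return {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        # Offset of the instruction pointer inside the mapping the kernel names.
        "ip_offset_in_mapping": ip - base,
        "page_present": bool(err & 1),          # bit 0
        "access": "write" if err & 2 else "read",  # bit 1
        "user_mode": bool(err & 4),             # bit 2
    }


if __name__ == "__main__":
    line = ("[350739.877169] pgpool[1157084]: segfault at 14 ip 000055b950530f25 "
            "sp 00007ffc9e560f50 error 6 in pgpool[55b950503000+d9000]")
    print(parse_segfault(line))
```

Read this way, "error 6" in issues 1 and 2 is a user-mode write to an
unmapped page and "error 4" in issue 3 is a user-mode read of an unmapped
page. The fault addresses (0x14, 0x24 and 0) are all very small, which
would be consistent with an access through a NULL pointer at a small
offset. Note that the offset computed here is relative to the mapped
region the kernel reports, which is not necessarily the base that
coredumpctl uses when it prints "pgpool + 0x...".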
-------------- next part --------------
Details

[410790.718731] pgpool[1354761]: segfault at 24 ip 000055b9505308fa sp 00007ffc9e560f50 error 6 in pgpool[55b950503000+d9000]
[410790.718740] Code: 80 78 4c 01 75 0a 83 78 50 00 0f 84 cd 05 00 00 31 f6 48 8d 3d 0d ac 0b 00 e8 72 c8 02 00 e8 7d 42 fd ff 89 c7 e8 d6 89 fd ff <c7> 40 24 00 00 00 00 48 8b 05 38 89 1e 00 80 78 4c 01 0f 84 20 01


ubuntu@ip-172-30-171-146:~$ sudo coredumpctl gdb 1354761
           PID: 1354761 (pgpool)
           UID: 1001 (postgres)
           GID: 1001 (postgres)
        Signal: 11 (SEGV)
     Timestamp: Fri 2022-04-01 12:14:53 UTC (6h ago)
  Command Line: pgpool: wait for connection request
    Executable: /usr/local/pgpool/bin/pgpool
 Control Group: /system.slice/pgpool.service
          Unit: pgpool.service
         Slice: system.slice
       Boot ID: 68518f09e84c48528c745b99da817a0f
    Machine ID: ec2968ffa1ed3f46cc4f8ce331485ff1
      Hostname: ip-172-30-171-146
       Storage: /var/lib/systemd/coredump/core.pgpool.1001.68518f09e84c48528c745b99da817a0f.1354761.1648815293000000000000.lz4
       Message: Process 1354761 (pgpool) of user 1001 dumped core.
                
                Stack trace of thread 1354761:
                #0  0x000055b9505308fa set_process_status (pgpool + 0x398fa)
                #1  0x000055b950507576 fork_a_child (pgpool + 0x10576)
                #2  0x000055b95050807b reaper (pgpool + 0x1107b)
                #3  0x000055b95050ef20 PgpoolMain (pgpool + 0x17f20)
                #4  0x000055b950505566 main (pgpool + 0xe566)
                #5  0x00007f587967a0b3 __libc_start_main (libc.so.6 + 0x240b3)
                #6  0x000055b950505b1e _start (pgpool + 0xeb1e)

GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/pgpool/bin/pgpool...
[New LWP 1354761]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `pgpool: wait for'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  set_process_status (status=WAIT_FOR_CONNECT) at protocol/child.c:2113
2113        proc_info->status = status;
(gdb) bt
#0  set_process_status (status=WAIT_FOR_CONNECT) at protocol/child.c:2113
#1  wait_for_new_connections (saddr=0x7ffc9e5610f0, fds=0x55b95104bf40) at protocol/child.c:1488
#2  do_child (fds=fds@entry=0x55b95104bf40) at protocol/child.c:332
#3  0x000055b950507576 in fork_a_child (fds=0x55b95104bf40, id=48) at main/pgpool_main.c:686
#4  0x000055b95050807b in reaper () at main/pgpool_main.c:2509
#5  0x000055b95050ef20 in PgpoolMain (discard_status=<optimized out>, clear_memcache_oidmaps=<optimized out>) at main/pgpool_main.c:477
#6  0x000055b950505566 in main (argc=<optimized out>, argv=<optimized out>) at main/main.c:365
(gdb) bt full
#0  set_process_status (status=WAIT_FOR_CONNECT) at protocol/child.c:2113
        proc_info = 0x0
#1  wait_for_new_connections (saddr=0x7ffc9e5610f0, fds=0x55b95104bf40) at protocol/child.c:1488
        numfds = <optimized out>
        afd = <optimized out>
        walk = 0x55b95104bf4c
        proc_info = 0x0
        rmask = {fds_bits = {0 <repeats 14 times>, 8246126533403043939, 7599578576440353647}}
        fd = 0
        timeoutdata = {tv_sec = 0, tv_usec = 0}
        save_errno = <optimized out>
        on = 2036433016
        timeout = <optimized out>
        rmask = <optimized out>
        numfds = <optimized out>
        save_errno = <optimized out>
        fd = <optimized out>
        afd = <optimized out>
        walk = <optimized out>
        on = <optimized out>
        timeout = <optimized out>
        timeoutdata = <optimized out>
        proc_info = <optimized out>
        sts = <optimized out>
        seconds = <optimized out>
        elevel_ = <optimized out>
        elevel_ = <optimized out>
        elevel_ = <optimized out>
        __d = <optimized out>
        elevel_ = <optimized out>
        __d = <optimized out>
        elevel_ = <optimized out>
#2  do_child (fds=fds@entry=0x55b95104bf40) at protocol/child.c:332
        sp = <optimized out>
        front_end_fd = <optimized out>
        saddr = {addr = {ss_family = 0, __ss_padding = '\000' <repeats 17 times>, " ", '\000' <repeats 92 times>, "\063\343\005Q\271U\000", __ss_align = 206158430224}, 
          salen = 2656442528}
        con_count = <optimized out>
        local_sigjmp_buf = {{__jmpbuf = {94254416576332, -530653384382068941, 0, 140722964929360, 94254405635737, 1353942, -6041260252762192077, -530653153064106189}, 
            __mask_was_saved = 1, __saved_mask = {__val = {0 <repeats 16 times>}}}}
        backend = 0x0
        now = {tv_sec = 1648815293, tv_usec = 586125}
        tz = {tz_minuteswest = 0, tz_dsttime = 0}
        connections_count = 0
        psbuf = '\000' <repeats 1008 times>...
        proc_info = 0x0
        walk = <optimized out>
#3  0x000055b950507576 in fork_a_child (fds=0x55b95104bf40, id=48) at main/pgpool_main.c:686
        pid = 0
        elevel_ = <optimized out>
#4  0x000055b95050807b in reaper () at main/pgpool_main.c:2509
        exiting_process_name = 0x55b9505dce9d "child"
        new_pid = <optimized out>
        shutdown_system = <optimized out>
        restart_child = <optimized out>
        found = 1 '\001'
        process_health_check = 0 '\000'
        pid = 1353942
        status = 256
        i = 48
#5  0x000055b95050ef20 in PgpoolMain (discard_status=<optimized out>, clear_memcache_oidmaps=<optimized out>) at main/pgpool_main.c:477
        i = 3
        local_sigjmp_buf = {{__jmpbuf = {94254406965376, -6041260252344858829, 94254407050368, 2, 5432, 140722964929952, -6041260252340664525, -530653160400206029}, __mask_was_saved = 1, __saved_mask = {__val = {18446744066192964103, 0, 529823011, 94254416847448, 140722964930568, 140722964930608, 140722964929968, 140722964929984, 140017978348009, 0, 0, 2, 0, 0, 94254416846528, 140722964930512}}}}
        first = 0 '\000'
#6  0x000055b950505566 in main (argc=<optimized out>, argv=<optimized out>) at main/main.c:365
        opt = <optimized out>
        debug_level = <optimized out>
        optindex = 0
        discard_status = 0 '\000'
        clear_memcache_oidmaps = 0 '\000'
        pcp_conf_file_path = "/usr/local/pgpool/etc/pcp.conf", '\000' <repeats 8162 times>
        conf_file_path = "/usr/local/pgpool/etc/pgpool.conf", '\000' <repeats 8159 times>
        hba_file_path = "/usr/local/pgpool/etc/pool_hba.conf", '\000' <repeats 8157 times>
        pool_passwd_key_file_path = "/psql/.pgpoolkey", '\000' <repeats 2808 times>...
        long_options = {{name = 0x55b9505dc2bb "hba-file", has_arg = 1, flag = 0x0, val = 97}, {name = 0x55b9505dc2c4 "debug", has_arg = 0, flag = 0x0, val = 100}, {name = 0x55b9505dc2ca "config-file", has_arg = 1, flag = 0x0, val = 102}, {name = 0x55b9505dc2d6 "key-file", has_arg = 1, flag = 0x0, val = 107}, {name = 0x55b9505dc2df "pcp-file", has_arg = 1, flag = 0x0, val = 70}, {name = 0x55b9505dc2e8 "help", has_arg = 0, flag = 0x0, val = 104}, {name = 0x55b9505e2204 "mode", has_arg = 1, flag = 0x0, val = 109}, {name = 0x55b9505dc2ed "dont-detach", has_arg = 0, flag = 0x0, val = 110}, {name = 0x55b9505dc2f9 "discard-status", has_arg = 0, flag = 0x0, val = 68}, {name = 0x55b9505dc308 "clear-oidmaps", has_arg = 0, flag = 0x0, val = 67}, {name = 0x55b9505dc316 "debug-assertions", has_arg = 0, flag = 0x0, val = 120}, {name = 0x55b9505f018c "version", has_arg = 0, flag = 0x0, val = 118}, {name = 0x0, has_arg = 0, flag = 0x0, val = 0}}
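
A note on reading the "Code:" line at the top of this attachment: the byte
in angle brackets is the first byte of the faulting instruction. The sketch
below (Python, purely illustrative) pulls out those bytes and decodes just
two simple 32-bit MOV encodings, C7 /0 (store an immediate to memory) and
8B /r (load from memory), with a plain base register and no prefixes or SIB
byte. The helper names faulting_bytes and decode_simple_mov are invented
for this example. It is not a disassembler; objdump or gdb remain the
authoritative tools.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def faulting_bytes(code_line):
    """Return the bytes from the <..>-marked byte onward as a list of ints."""
    tokens = code_line.split("Code:", 1)[1].split()
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return [int(t.strip("<>"), 16) for t in tokens[start:]]


def decode_simple_mov(b):
    """Decode C7 /0 (store imm32) or 8B /r (load) with a simple [base+disp8].

    Only ModRM mod=00 or mod=01 without SIB or RIP-relative addressing is
    handled; anything else returns None. Illustrative helper only.
    """
    opcode, modrm = b[0], b[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod not in (0, 1) or rm == 4 or (mod == 0 and rm == 5):
        return None
    disp, pos = 0, 2
    if mod == 1:
        # disp8 is a signed byte
        disp = b[2] - 256 if b[2] >= 128 else b[2]
        pos = 3
    if opcode == 0xC7 and reg == 0:
        imm = int.from_bytes(bytes(b[pos:pos + 4]), "little")
        return {"op": "store", "base": REG64[rm], "disp": disp, "imm": imm}
    if opcode == 0x8B:
        return {"op": "load", "base": REG64[rm], "disp": disp, "dest": REG32[reg]}
    return None


if __name__ == "__main__":
    line = "Code: e8 d6 89 fd ff <c7> 40 24 00 00 00 00 48 8b 05 38 89 1e 00"
    print(decode_simple_mov(faulting_bytes(line)))
```

For the "segfault at 24" case the marked bytes (c7 40 24 00 00 00 00)
decode as a store of 0 to [rax + 0x24]. With rax equal to 0, that is a
write to address 0x24, which lines up with both the kernel message and the
`proc_info = 0x0` shown in the backtrace above. The "segfault at 14" bytes
decode the same way with a displacement of 0x14, and the "segfault at 0"
bytes (8b 38) decode as a load from [rax].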
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pgpool_20220405.conf
Type: application/octet-stream
Size: 40593 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20220405/9751c54e/attachment-0001.obj>
-------------- next part --------------
Details

[367055.274377] pgpool[1211301]: segfault at 14 ip 000055b950530f25 sp 00007ffc9e560f50 error 6 in pgpool[55b950503000+d9000]
[367055.274387] Code: 00 00 89 ef be 06 00 00 00 c7 44 24 50 01 00 00 00 e8 ff 3e fd ff 85 c0 0f 88 a6 0f 00 00 89 ef e8 30 d0 03 00 48 8b 44 24 08 <c7> 40 14 00 00 00 00 4c 8d a4 24 20 01 00 00 48 8d 35 65 19 23 00


ubuntu@ip-172-30-171-146:~$ sudo coredumpctl gdb 1211301
           PID: 1211301 (pgpool)
           UID: 1001 (postgres)
           GID: 1001 (postgres)
        Signal: 11 (SEGV)
     Timestamp: Fri 2022-04-01 00:05:58 UTC (18h ago)
  Command Line: pgpool: wait for connection request
    Executable: /usr/local/pgpool/bin/pgpool
 Control Group: /system.slice/pgpool.service
          Unit: pgpool.service
         Slice: system.slice
       Boot ID: 68518f09e84c48528c745b99da817a0f
    Machine ID: ec2968ffa1ed3f46cc4f8ce331485ff1
      Hostname: ip-172-30-171-146
       Storage: /var/lib/systemd/coredump/core.pgpool.1001.68518f09e84c48528c745b99da817a0f.1211301.1648771558000000000000.lz4
       Message: Process 1211301 (pgpool) of user 1001 dumped core.
                
                Stack trace of thread 1211301:
                #0  0x000055b950530f25 do_child (pgpool + 0x39f25)
                #1  0x000055b950507576 fork_a_child (pgpool + 0x10576)
                #2  0x000055b95050807b reaper (pgpool + 0x1107b)
                #3  0x000055b95050ef20 PgpoolMain (pgpool + 0x17f20)
                #4  0x000055b950505566 main (pgpool + 0xe566)
                #5  0x00007f587967a0b3 __libc_start_main (libc.so.6 + 0x240b3)
                #6  0x000055b950505b1e _start (pgpool + 0xeb1e)

GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Reading symbols from /usr/local/pgpool/bin/pgpool...
[New LWP 1211301]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `pgpool: wait for'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000055b950530f25 in do_child (fds=fds@entry=0x55b95104bf40) at protocol/child.c:1695
1695        return afd;
(gdb) bt
#0  0x000055b950530f25 in do_child (fds=fds@entry=0x55b95104bf40) at protocol/child.c:1695
#1  0x000055b950507576 in fork_a_child (fds=0x55b95104bf40, id=17) at main/pgpool_main.c:686
#2  0x000055b95050807b in reaper () at main/pgpool_main.c:2509
#3  0x000055b95050ef20 in PgpoolMain (discard_status=<optimized out>, clear_memcache_oidmaps=<optimized out>) at main/pgpool_main.c:477
#4  0x000055b950505566 in main (argc=<optimized out>, argv=<optimized out>) at main/main.c:365
(gdb) bt full
#0  0x000055b950530f25 in do_child (fds=fds@entry=0x55b95104bf40) at protocol/child.c:1695
        sp = <optimized out>
        front_end_fd = 6
        saddr = {addr = {ss_family = 2, __ss_padding = "\322\000\254\036\245\063", '\000' <repeats 111 times>, __ss_align = 0}, salen = 16}
        con_count = <optimized out>
        local_sigjmp_buf = {{__jmpbuf = {94254416576332, -530653384382068941, 0, 140722964929360, 94254405635737, 1210957, -6041260252762192077, -530653153064106189}, 
            __mask_was_saved = 1, __saved_mask = {__val = {0 <repeats 16 times>}}}}
        backend = 0x0
        now = {tv_sec = 1648771538, tv_usec = 48166}
        tz = {tz_minuteswest = 0, tz_dsttime = 0}
        connections_count = 0
        psbuf = '\000' <repeats 1008 times>...
        proc_info = 0x0
        walk = <optimized out>
#1  0x000055b950507576 in fork_a_child (fds=0x55b95104bf40, id=17) at main/pgpool_main.c:686
        pid = 0
        elevel_ = <optimized out>
#2  0x000055b95050807b in reaper () at main/pgpool_main.c:2509
        exiting_process_name = 0x55b9505dce9d "child"
        new_pid = <optimized out>
        shutdown_system = <optimized out>
        restart_child = <optimized out>
        found = 1 '\001'
        process_health_check = 0 '\000'
        pid = 1210957
        status = 256
        i = 17
#3  0x000055b95050ef20 in PgpoolMain (discard_status=<optimized out>, clear_memcache_oidmaps=<optimized out>) at main/pgpool_main.c:477
        i = 3
        local_sigjmp_buf = {{__jmpbuf = {94254406965376, -6041260252344858829, 94254407050368, 2, 5432, 140722964929952, -6041260252340664525, -530653160400206029}, 
            __mask_was_saved = 1, __saved_mask = {__val = {18446744066192964103, 0, 529823011, 94254416847448, 140722964930568, 140722964930608, 140722964929968, 
                140722964929984, 140017978348009, 0, 0, 2, 0, 0, 94254416846528, 140722964930512}}}}
        first = 0 '\000'
#4  0x000055b950505566 in main (argc=<optimized out>, argv=<optimized out>) at main/main.c:365
        opt = <optimized out>
        debug_level = <optimized out>
        optindex = 0
        discard_status = 0 '\000'
        clear_memcache_oidmaps = 0 '\000'
        pcp_conf_file_path = "/usr/local/pgpool/etc/pcp.conf", '\000' <repeats 8162 times>
        conf_file_path = "/usr/local/pgpool/etc/pgpool.conf", '\000' <repeats 8159 times>
        hba_file_path = "/usr/local/pgpool/etc/pool_hba.conf", '\000' <repeats 8157 times>
        pool_passwd_key_file_path = "/psql/.pgpoolkey", '\000' <repeats 2808 times>...
        long_options = {{name = 0x55b9505dc2bb "hba-file", has_arg = 1, flag = 0x0, val = 97}, {name = 0x55b9505dc2c4 "debug", has_arg = 0, flag = 0x0, val = 100}, {
            name = 0x55b9505dc2ca "config-file", has_arg = 1, flag = 0x0, val = 102}, {name = 0x55b9505dc2d6 "key-file", has_arg = 1, flag = 0x0, val = 107}, {
            name = 0x55b9505dc2df "pcp-file", has_arg = 1, flag = 0x0, val = 70}, {name = 0x55b9505dc2e8 "help", has_arg = 0, flag = 0x0, val = 104}, {
            name = 0x55b9505e2204 "mode", has_arg = 1, flag = 0x0, val = 109}, {name = 0x55b9505dc2ed "dont-detach", has_arg = 0, flag = 0x0, val = 110}, {
            name = 0x55b9505dc2f9 "discard-status", has_arg = 0, flag = 0x0, val = 68}, {name = 0x55b9505dc308 "clear-oidmaps", has_arg = 0, flag = 0x0, val = 67}, {
            name = 0x55b9505dc316 "debug-assertions", has_arg = 0, flag = 0x0, val = 120}, {name = 0x55b9505f018c "version", has_arg = 0, flag = 0x0, val = 118}, {
            name = 0x0, has_arg = 0, flag = 0x0, val = 0}}

