View Issue Details

ID: 0000750
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2022-10-24 16:21
Reporter: fjcasero
Assigned To: kawamoto
Priority: normal
Severity: major
Reproducibility: sometimes
Status: assigned
Resolution: open
Platform: Linux
OS: Red Hat Enterprise Linux Server
OS Version: 7.9
Product Version: 4.3.1
Summary: 0000750: child process with pid: 22220 was terminated by segmentation fault
Description
We have two Pgpool-II 4.3.1 nodes on Red Hat 7.9 set up as a failover
cluster. Each node runs a PostgreSQL 12.9 server in a hot standby
configuration with streaming replication.

We are not using pgpool for load balancing; HAProxy balances load between the two nodes. Every 5 seconds,
HAProxy checks whether pgpool is running on a server by running the query 'SELECT now()' through the psql CLI.

Core files are generated even when no client is connected (only HAProxy, via the psql CLI).

It seems that the reaper handler may have something to do with the core files generated:

2022-03-21 08:41:11.813: main pid 23830: DEBUG: reaper handler
2022-03-21 08:41:11.813: main pid 23830: LOCATION: pgpool_main.c:2352
2022-03-21 08:41:11.813: main pid 23830: WARNING: child process with pid: 20048 was terminated by segmentation fault

2022-03-23 05:14:09.279: main pid 22074: DEBUG: reaper handler
2022-03-23 05:14:09.279: main pid 22074: LOCATION: pgpool_main.c:2352
2022-03-23 05:14:09.280: main pid 22074: WARNING: child process with pid: 22220 was terminated by segmentation fault

2022-03-23 05:14:25.430: main pid 22074: DEBUG: reaper handler
2022-03-23 05:14:25.430: main pid 22074: LOCATION: pgpool_main.c:2352
2022-03-23 05:14:25.431: main pid 22074: WARNING: child process with pid: 22197 was terminated by segmentation fault
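
To see how often children crash, the WARNING lines can be pulled out of the log. The following is a minimal sketch that assumes the log line format shown above; the function name `segfault_pids` is hypothetical and not part of Pgpool-II.

```shell
# Print the PID of every child process that the pgpool main process
# reported as terminated by a segmentation fault.
# Reads the log files given as arguments, or stdin when none are given.
segfault_pids() {
    awk '/terminated by segmentation fault/ {
        # The child PID is the field that follows the literal "pid:".
        for (i = 1; i < NF; i++)
            if ($i == "pid:") { print $(i + 1); break }
    }' "$@"
}
```

With the log_directory from the attached pgpool.conf, it could be called as `segfault_pids /var/log/pgpool_log/pgpool-*.log`.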

Backtrace
-------------------
[postgres]$ gdb pgpool core.22220
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/pgpool...Reading symbols from /usr/lib/debug/usr/bin/pgpool.debug...done.
done.
[New LWP 22220]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `pgpool: wait for connection request '.
Program terminated with signal 11, Segmentation fault.
#0 0x000000000043532e in do_child (fds=fds@entry=0xf87200) at protocol/child.c:333
333 proc_info->wait_for_connect = 0;
Missing separate debuginfos, use: debuginfo-install audit-libs-2.8.5-4.el7.x86_64 cyrus-sasl-lib-2.1.26-23.el7.x86_64 glibc-2.17-325.el7_9.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-51.el7_9.x86_64 libcap-ng-0.7.5-4.el7.x86_64 libcom_err-1.42.9-19.el7.x86_64 libgcc-4.8.5-44.el7.x86_64 libmemcached-1.0.16-5.el7.x86_64 libselinux-2.5-15.el7.x86_64 libstdc++-4.8.5-44.el7.x86_64 nspr-4.32.0-1.el7_9.x86_64 nss-3.67.0-4.el7_9.x86_64 nss-softokn-freebl-3.67.0-3.el7_9.x86_64 nss-util-3.67.0-1.el7_9.x86_64 openldap-2.4.44-24.el7_9.x86_64 openssl-libs-1.0.2k-24.el7_9.x86_64 pam-1.1.8-23.el7.x86_64 pcre-8.32-17.el7.x86_64 postgresql12-libs-12.9-1PGDG.rhel7.x86_64 zlib-1.2.7-19.el7_9.x86_64
(gdb) bt
#0 0x000000000043532e in do_child (fds=fds@entry=0xf87200) at protocol/child.c:333
#1 0x000000000040b7e5 in fork_a_child (fds=0xf87200, id=131) at main/pgpool_main.c:686
#2 0x00000000004127e0 in PgpoolMain (discard_status=discard_status@entry=1 '\001',
    clear_memcache_oidmaps=clear_memcache_oidmaps@entry=0 '\000') at main/pgpool_main.c:410
#3 0x0000000000409b4a in main (argc=<optimized out>, argv=<optimized out>) at main/main.c:365
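
The faulting statement writes through `proc_info`, so the value of that pointer at the time of the crash may be worth recording for each core file. The following is a sketch of a non-interactive gdb run, assuming the same binary and core file as above. The wrapper `inspect_core` and the `GDB` override variable are hypothetical; `-batch` and `-ex` are standard gdb options. In an optimized build some values may print as `<optimized out>`.

```shell
# inspect_core EXECUTABLE COREFILE
# Dump the full backtrace, then the pointer used on the faulting line.
inspect_core() {
    "${GDB:-gdb}" -batch \
        -ex 'bt full' \
        -ex 'frame 0' \
        -ex 'print proc_info' \
        -ex 'info locals' \
        "$1" "$2"
}
```

For the core file above this would be `inspect_core pgpool core.22220`.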

Steps To Reproduce: Not known

Additional Information

Environment information:
=========================
Pgpool
--------
pgpool-II version 4.3.1 (tamahomeboshi)
pgpool-II-pg12.x86_64 4.3.1-1pgdg.rhel7
pgpool-II-pg12-extensions.x86_64 4.3.1-1pgdg.rhel7
Mode -> streaming replication
The configuration file we are using is attached. An example core file is also attached; its backtrace is shown in the Description.

O.S.
----
Red Hat Enterprise Linux Server release 7.9 (Maipo)
Linux 3.10.0-1160.53.1.el7.x86_64 x86_64
libc version -> (GNU libc) 2.17
HAProxy version 2.4.8

Postgres:
-----------------
postgresql12-server.x86_64 12.9-1PGDG.rhel7
postgresql12.x86_64 12.9-1PGDG.rhel7
postgresql12-contrib.x86_64 12.9-1PGDG.rhel7
postgresql12-libs.x86_64 12.9-1PGDG.rhel7

Application language
----------------------
Chrome -> wildfly-11.0.0.Final -> Java 1.8
haproxy -> psql cli

haproxy configuration
-------------------------------
option external-check
    external-check command /usr/local/sbin/postgreschk_qry
    default-server inter 5s fall 3 maxconn 250
    server server-a.domain.com server-a.domain.com:9999 check port 9999
    server server-b.domain.com server-b.domain.com:9999 check port 9999

/usr/local/sbin/postgreschk_qry
----------------------------------------------
/usr/pgsql-12/bin/psql -t -h $PGSQL_HOST -p $PGSQL_PORT -U postgres -w -c 'SELECT now();'
if [ $? -eq 0 ]; then
   exit 0
else
   exit 1
fi
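
Because the check runs every 5 seconds, a psql call that hangs could overlap with the next one. The following is a sketch of the same check with a connection timeout, written as a function. The name `check_pgpool`, the `PSQL` override variable and the 3-second default are assumptions and not part of the reported setup. `PGCONNECT_TIMEOUT` is the standard libpq environment variable, and `-X` stops psql from reading a psqlrc file.

```shell
# check_pgpool HOST PORT
# Succeeds only if a trivial query can be run through pgpool.
check_pgpool() {
    # Connection timeout in seconds, read by libpq; 3 is an assumed default.
    PGCONNECT_TIMEOUT="${PGCONNECT_TIMEOUT:-3}" \
        "${PSQL:-/usr/pgsql-12/bin/psql}" -X -t -w \
        -h "$1" -p "$2" -U postgres \
        -c 'SELECT now();' >/dev/null 2>&1
}
```

A script that uses it would end with `check_pgpool "$PGSQL_HOST" "$PGSQL_PORT" || exit 1`, which keeps the 0/1 exit status of the original.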

Tags: error, segfault

Activities

fjcasero (reporter), 2022-03-30 19:08

Attachment: pgpool.conf (49,578 bytes)
# ----------------------------
# pgPool-II configuration file
# ----------------------------
#
# This file consists of lines of the form:
#
#   name = value
#
# Whitespace may be used.  Comments are introduced with "#" anywhere on a line.
# The complete list of parameter names and allowed values can be found in the
# pgPool-II documentation.
#
# This file is read on server startup and when the server receives a SIGHUP
# signal.  If you edit the file on a running system, you have to SIGHUP the
# server for the changes to take effect, or use "pgpool reload".  Some
# parameters, which are marked below, require a server shutdown and restart to
# take effect.
#

#------------------------------------------------------------------------------
# BACKEND CLUSTERING MODE
# Choose one of: 'streaming_replication', 'native_replication',
#	'logical_replication', 'slony', 'raw' or 'snapshot_isolation'
# (change requires restart)
#------------------------------------------------------------------------------

backend_clustering_mode = 'streaming_replication'

#------------------------------------------------------------------------------
# CONNECTIONS
#------------------------------------------------------------------------------

# - pgpool Connection Settings -

#listen_addresses = 'localhost'
listen_addresses = '*'
                                   # Host name or IP address to listen on:
                                   # '*' for all, '' for no TCP/IP connections
                                   # (change requires restart)
port = 9999
                                   # Port number
                                   # (change requires restart)
socket_dir = '/var/run/postgresql'
                                   # Unix domain socket path
                                   # The Debian package defaults to
                                   # /var/run/postgresql
                                   # (change requires restart)
reserved_connections = 5
                                   # Number of reserved connections.
                                   # Pgpool-II does not accept connections if over
                                   # num_init_children - reserved_connections.


# - pgpool Communication Manager Connection Settings -

#pcp_listen_addresses = '*'
                                   # Host name or IP address for pcp process to listen on:
                                   # '*' for all, '' for no TCP/IP connections
                                   # (change requires restart)
#pcp_port = 9898
                                   # Port number for pcp
                                   # (change requires restart)
pcp_socket_dir = '/var/run/postgresql'
                                   # Unix domain socket path for pcp
                                   # The Debian package defaults to
                                   # /var/run/postgresql
                                   # (change requires restart)
#listen_backlog_multiplier = 2
                                   # Set the backlog parameter of listen(2) to
                                   # num_init_children * listen_backlog_multiplier.
                                   # (change requires restart)
#serialize_accept = off
                                   # whether to serialize accept() call to avoid thundering herd problem
                                   # (change requires restart)

# - Backend Connection Settings -

backend_hostname0 = 'server-a.domain.com'
                                   # Host name or IP address to connect to for backend 0
backend_port0 = 5432
                                   # Port number for backend 0
backend_weight0 = 1
                                   # Weight for backend 0 (only in load balancing mode)
backend_data_directory0 = '/var/lib/pgsql/12/data'
                                   # Data directory for backend 0
backend_flag0 = 'ALLOW_TO_FAILOVER'
                                   # Controls various backend behavior
                                   # ALLOW_TO_FAILOVER, DISALLOW_TO_FAILOVER
                                   # or ALWAYS_PRIMARY
backend_application_name0 = 'server-a.domain.com'
                                   # walsender's application_name, used for "show pool_nodes" command
backend_hostname1 = 'server-b.domain.com'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/pgsql/12/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_application_name1 = 'server-b.domain.com'

# - Authentication -

enable_pool_hba = on
                                   # Use pool_hba.conf for client authentication
#pool_passwd = 'pool_passwd'
                                   # File name of pool_passwd for md5 authentication.
                                   # "" disables pool_passwd.
                                   # (change requires restart)
#authentication_timeout = 1min
                                   # Delay in seconds to complete client authentication
                                   # 0 means no timeout.

#allow_clear_text_frontend_auth = off
                                   # Allow Pgpool-II to use clear text password authentication
                                   # with clients, when pool_passwd does not
                                   # contain the user password

# - SSL Connections -

#ssl = off
                                   # Enable SSL support
                                   # (change requires restart)
#ssl_key = 'server.key'
                                   # SSL private key file
                                   # (change requires restart)
#ssl_cert = 'server.crt'
                                   # SSL public certificate file
                                   # (change requires restart)
#ssl_ca_cert = ''
                                   # Single PEM format file containing
                                   # CA root certificate(s)
                                   # (change requires restart)
#ssl_ca_cert_dir = ''
                                   # Directory containing CA root certificate(s)
                                   # (change requires restart)
#ssl_crl_file = ''
                                   # SSL certificate revocation list file
                                   # (change requires restart)

#ssl_ciphers = 'HIGH:MEDIUM:+3DES:!aNULL'
                                   # Allowed SSL ciphers
                                   # (change requires restart)
#ssl_prefer_server_ciphers = off
                                   # Use server's SSL cipher preferences,
                                   # rather than the client's
                                   # (change requires restart)
#ssl_ecdh_curve = 'prime256v1'
                                   # Name of the curve to use in ECDH key exchange
#ssl_dh_params_file = ''
                                   # Name of the file containing Diffie-Hellman parameters used
                                   # for so-called ephemeral DH family of SSL cipher.
#ssl_passphrase_command=''
                                   # Sets an external command to be invoked when a passphrase
                                   # for decrypting an SSL file needs to be obtained
                                   # (change requires restart)

#------------------------------------------------------------------------------
# POOLS
#------------------------------------------------------------------------------

# - Concurrent session and pool size -

num_init_children = 250
                                   # Number of concurrent sessions allowed
                                   # (change requires restart)
max_pool = 4
                                   # Number of connection pool caches per connection
                                   # (change requires restart)

# - Life time -

#child_life_time = 5min
                                   # Pool exits after being idle for this many seconds
#child_max_connections = 0
                                   # Pool exits after receiving that many connections
                                   # 0 means no exit
#connection_life_time = 0
                                   # Connection to backend closes after being idle for this many seconds
                                   # 0 means no close
#client_idle_limit = 0
                                   # Client is disconnected after being idle for that many seconds
                                   # (even inside an explicit transactions!)
                                   # 0 means no disconnection


#------------------------------------------------------------------------------
# LOGS
#------------------------------------------------------------------------------

# - Where to log -

log_destination = 'stderr'
                                   # Where to log
                                   # Valid values are combinations of stderr,
                                   # and syslog. Default to stderr.

# - What to log -

#log_line_prefix = '%m: %a pid %p: '   # printf-style string to output at beginning of each log line.

#log_connections = off
                                   # Log connections
#log_disconnections = off
                                   # Log disconnections
#log_hostname = off
                                   # Hostname will be shown in ps status
                                   # and in logs if connections are logged
#log_statement = off
                                   # Log all statements
#log_per_node_statement = off
                                   # Log all statements
                                   # with node and backend informations
#log_client_messages = off
                                   # Log any client messages
#log_standby_delay = 'if_over_threshold'
                                   # Log standby delay
                                   # Valid values are combinations of always,
                                   # if_over_threshold, none

# - Syslog specific -

#syslog_facility = 'LOCAL0'
                                   # Syslog local facility. Default to LOCAL0
#syslog_ident = 'pgpool'
                                   # Syslog program identification string
                                   # Default to 'pgpool'

# - Debug -

log_error_verbosity = verbose          # terse, default, or verbose messages

client_min_messages = debug5           # values in order of decreasing detail:
                                        #   debug5
                                        #   debug4
                                        #   debug3
                                        #   debug2
                                        #   debug1
                                        #   log
                                        #   notice
                                        #   warning
                                        #   error

log_min_messages = debug5             # values in order of decreasing detail:
                                        #   debug5
                                        #   debug4
                                        #   debug3
                                        #   debug2
                                        #   debug1
                                        #   info
                                        #   notice
                                        #   warning
                                        #   error
                                        #   log
                                        #   fatal
                                        #   panic

# This is used when logging to stderr:
logging_collector = on
                                        # Enable capturing of stderr
                                        # into log files.
                                        # (change requires restart)

# -- Only used if logging_collector is on ---

log_directory = '/var/log/pgpool_log'
                                        # directory where log files are written,
                                        # can be absolute
log_filename = 'pgpool-%Y-%m-%d_%H%M%S.log'
                                        # log file name pattern,
                                        # can include strftime() escapes

#log_file_mode = 0600
                                        # creation mode for log files,
                                        # begin with 0 to use octal notation

log_truncate_on_rotation = on
                                        # If on, an existing log file with the
                                        # same name as the new log file will be
                                        # truncated rather than appended to.
                                        # But such truncation only occurs on
                                        # time-driven rotation, not on restarts
                                        # or size-driven rotation.  Default is
                                        # off, meaning append to existing files
                                        # in all cases.

log_rotation_age = 10d
                                        # Automatic rotation of logfiles will
                                        # happen after that (minutes)time.
                                        # 0 disables time based rotation.
log_rotation_size = 10MB
                                        # Automatic rotation of logfiles will
                                        # happen after that much (KB) log output.
                                        # 0 disables size based rotation.
#------------------------------------------------------------------------------
# FILE LOCATIONS
#------------------------------------------------------------------------------

#pid_file_name = '/var/run/pgpool/pgpool.pid'
                                   # PID file name
                                   # Can be specified as relative to the
                                   # location of pgpool.conf file or
                                   # as an absolute path
                                   # (change requires restart)
#logdir = '/tmp'
                                   # Directory of pgPool status file
                                   # (change requires restart)


#------------------------------------------------------------------------------
# CONNECTION POOLING
#------------------------------------------------------------------------------

connection_cache = on
                                   # Activate connection pools
                                   # (change requires restart)

                                   # Semicolon separated list of queries
                                   # to be issued at the end of a session
                                   # The default is for 8.3 and later
#reset_query_list = 'ABORT; DISCARD ALL'
                                   # The following one is for 8.2 and before
#reset_query_list = 'ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT'


#------------------------------------------------------------------------------
# REPLICATION MODE
#------------------------------------------------------------------------------

#replicate_select = off
                                   # Replicate SELECT statements
                                   # when in replication mode
                                   # replicate_select is higher priority than
                                   # load_balance_mode.

#insert_lock = off
                                   # Automatically locks a dummy row or a table
                                   # with INSERT statements to keep SERIAL data
                                   # consistency
                                   # Without SERIAL, no lock will be issued
#lobj_lock_table = ''
                                   # When rewriting lo_creat command in
                                   # replication mode, specify table name to
                                   # lock

# - Degenerate handling -

#replication_stop_on_mismatch = off
                                   # On disagreement with the packet kind
                                   # sent from backend, degenerate the node
                                   # which is most likely "minority"
                                   # If off, just force to exit this session

#failover_if_affected_tuples_mismatch = off
                                   # On disagreement with the number of affected
                                   # tuples in UPDATE/DELETE queries, then
                                   # degenerate the node which is most likely
                                   # "minority".
                                   # If off, just abort the transaction to
                                   # keep the consistency


#------------------------------------------------------------------------------
# LOAD BALANCING MODE
#------------------------------------------------------------------------------

load_balance_mode = off
                                   # Activate load balancing mode
                                   # (change requires restart)
#ignore_leading_white_space = on
                                   # Ignore leading white spaces of each query
#read_only_function_list = ''
                                   # Comma separated list of function names
                                   # that don't write to database
                                   # Regexp are accepted
#write_function_list = ''
                                   # Comma separated list of function names
                                   # that write to database
                                   # Regexp are accepted
                                   # If both read_only_function_list and write_function_list
                                   # is empty, function's volatile property is checked.
                                   # If it's volatile, the function is regarded as a
                                   # writing function.

#primary_routing_query_pattern_list = ''
                                   # Semicolon separated list of query patterns
                                   # that should be sent to primary node
                                   # Regexp are accepted
                                   # valid for streaming replication mode only.

#database_redirect_preference_list = ''
                                   # comma separated list of pairs of database and node id.
                                   # example: 'postgres:primary,mydb[0-4]:1,mydb[5-9]:2'
                                   # valid for streaming replication mode only.

#app_name_redirect_preference_list = ''
                                   # comma separated list of pairs of app name and node id.
                                   # example: 'psql:primary,myapp[0-4]:1,myapp[5-9]:standby'
                                   # valid for streaming replication mode only.
#allow_sql_comments = off
                                   # if on, ignore SQL comments when judging if load balance or
                                   # query cache is possible.
                                   # If off, SQL comments effectively prevent the judgment
                                   # (pre 3.4 behavior).

#disable_load_balance_on_write = 'transaction'
                                   # Load balance behavior when write query is issued
                                   # in an explicit transaction.
                                   #
                                   # Valid values:
                                   #
                                   # 'transaction' (default):
                                   #     if a write query is issued, subsequent
                                   #     read queries will not be load balanced
                                   #     until the transaction ends.
                                   #
                                   # 'trans_transaction':
                                   #     if a write query is issued, subsequent
                                   #     read queries in an explicit transaction
                                   #     will not be load balanced until the session ends.
                                   #
                                   # 'dml_adaptive':
                                   #     Queries on the tables that have already been
                                   #     modified within the current explicit transaction will
                                   #     not be load balanced until the end of the transaction.
                                   #
                                   # 'always':
                                   #     if a write query is issued, read queries will
                                   #     not be load balanced until the session ends.
                                   #
                                   # Note that any query not in an explicit transaction
                                   # is not affected by the parameter except 'always'.

#dml_adaptive_object_relationship_list= ''
                                   # comma separated list of object pairs
                                   # [object]:[dependent-object], to disable load balancing
                                   # of dependent objects within the explicit transaction
                                   # after WRITE statement is issued on (depending-on) object.
                                   #
                                   # example: 'tb_t1:tb_t2,insert_tb_f_func():tb_f,tb_v:my_view'
                                   # Note: function name in this list must also be present in
                                   # the write_function_list
                                   # only valid for disable_load_balance_on_write = 'dml_adaptive'.

#statement_level_load_balance = off
                                   # Enables statement level load balancing

#------------------------------------------------------------------------------
# STREAMING REPLICATION MODE
#------------------------------------------------------------------------------

# - Streaming -

sr_check_period = 10
                                   # Streaming replication check period
                                   # Disabled (0) by default
sr_check_user = 'pgpool'
                                   # Streaming replication check user
                                   # This is necessary even if you disable streaming
                                   # replication delay check by sr_check_period = 0
sr_check_password = ''
                                   # Password for streaming replication check user
                                   # Leaving it empty will make Pgpool-II to first look for the
                                   # Password in pool_passwd file before using the empty password

#sr_check_database = 'postgres'
                                   # Database name for streaming replication check
#delay_threshold = 0
                                   # Threshold before not dispatching query to standby node
                                   # Unit is in bytes
                                   # Disabled (0) by default
#prefer_lower_delay_standby = off
                                   # If delay_threshold is set larger than 0, Pgpool-II send to
                                   # the primary when selected node is delayed over delay_threshold.
                                   # If this is set to on, Pgpool-II send query to other standby
                                   # delayed lower.

# - Special commands -

follow_primary_command = '/var/lib/pgsql/12/pgpool/follow_primary.sh %d %h %p %D %m %H %M %P %r %R'
                                   # Executes this command after main node failover
                                   # Special values:
                                   #   %d = failed node id
                                   #   %h = failed node host name
                                   #   %p = failed node port number
                                   #   %D = failed node database cluster path
                                   #   %m = new main node id
                                   #   %H = new main node hostname
                                   #   %M = old main node id
                                   #   %P = old primary node id
                                   #   %r = new main port number
                                   #   %R = new main database cluster path
                                   #   %N = old primary node hostname
                                   #   %S = old primary node port number
                                   #   %% = '%' character

#------------------------------------------------------------------------------
# HEALTH CHECK GLOBAL PARAMETERS
#------------------------------------------------------------------------------

health_check_period = 5
                                   # Health check period
                                   # Disabled (0) by default
health_check_timeout = 30
                                   # Health check timeout
                                   # 0 means no timeout
health_check_user = 'pgpool'
                                   # Health check user
health_check_password = ''
                                   # Password for health check user
                                   # Leaving it empty will make Pgpool-II to first look for the
                                   # Password in pool_passwd file before using the empty password

#health_check_database = ''
                                   # Database name for health check. If '', tries 'postgres' first,
health_check_max_retries = 3
                                   # Maximum number of times to retry a failed health check before giving up.
#health_check_retry_delay = 1
                                   # Amount of time to wait (in seconds) between retries.
#connect_timeout = 10000
                                   # Timeout value in milliseconds before giving up to connect to backend.
                                   # Default is 10000 ms (10 second). Flaky network user may want to increase
                                   # the value. 0 means no timeout.
                                   # Note that this value is not only used for health check,
                                   # but also for ordinary connection to backend.

#------------------------------------------------------------------------------
# HEALTH CHECK PER NODE PARAMETERS (OPTIONAL)
#------------------------------------------------------------------------------
#health_check_period0 = 0
#health_check_timeout0 = 20
#health_check_user0 = 'nobody'
#health_check_password0 = ''
#health_check_database0 = ''
#health_check_max_retries0 = 0
#health_check_retry_delay0 = 1
#connect_timeout0 = 10000

#------------------------------------------------------------------------------
# FAILOVER AND FAILBACK
#------------------------------------------------------------------------------

failover_command = '/var/lib/pgsql/12/pgpool/failover.sh %d %h %p %D %m %H %M %P %r %R %N %S'
                                   # Executes this command at failover
                                   # Special values:
                                   #   %d = failed node id
                                   #   %h = failed node host name
                                   #   %p = failed node port number
                                   #   %D = failed node database cluster path
                                   #   %m = new main node id
                                   #   %H = new main node hostname
                                   #   %M = old main node id
                                   #   %P = old primary node id
                                   #   %r = new main port number
                                   #   %R = new main database cluster path
                                   #   %N = old primary node hostname
                                   #   %S = old primary node port number
                                   #   %% = '%' character
#failback_command = ''
                                   # Executes this command at failback.
                                   # Special values:
                                   #   %d = failed node id
                                   #   %h = failed node host name
                                   #   %p = failed node port number
                                   #   %D = failed node database cluster path
                                   #   %m = new main node id
                                   #   %H = new main node hostname
                                   #   %M = old main node id
                                   #   %P = old primary node id
                                   #   %r = new main port number
                                   #   %R = new main database cluster path
                                   #   %N = old primary node hostname
                                   #   %S = old primary node port number
                                   #   %% = '%' character

#failover_on_backend_error = on
                                   # Initiates failover when reading/writing to the
                                   # backend communication socket fails
                                   # If set to off, pgpool will report an
                                   # error and disconnect the session.

#failover_on_backend_shutdown = off
                                   # Initiates failover when backend is shutdown,
                                   # or backend process is killed.
                                   # If set to off, pgpool will report an
                                   # error and disconnect the session.

#detach_false_primary = off
                                   # Detach false primary if on. Only
                                   # valid in streaming replication
                                   # mode and with PostgreSQL 9.6 or
                                   # after.

search_primary_node_timeout = 5min
                                   # Timeout in seconds to search for the
                                   # primary node when a failover occurs.
                                   # 0 means no timeout, keep searching
                                   # for a primary node forever.

#------------------------------------------------------------------------------
# ONLINE RECOVERY
#------------------------------------------------------------------------------

recovery_user = 'pgpool'
                                   # Online recovery user
recovery_password = ''
                                   # Online recovery password
                                   # Leaving it empty will make Pgpool-II to first look for the
                                   # Password in pool_passwd file before using the empty password

recovery_1st_stage_command = 'recovery_1st_stage'
                                   # Executes a command in first stage
#recovery_2nd_stage_command = ''
                                   # Executes a command in second stage
#recovery_timeout = 90
                                   # Timeout in seconds to wait for the
                                   # recovering node's postmaster to start up
                                   # 0 means no wait
#client_idle_limit_in_recovery = 0
                                   # Client is disconnected after being idle
                                   # for that many seconds in the second stage
                                   # of online recovery
                                   # 0 means no disconnection
                                   # -1 means immediate disconnection

#auto_failback = off
                                   # Detached backend node is reattached automatically
                                   # if replication_state is 'streaming'.
#auto_failback_interval = 1min
                                   # Min interval of executing auto_failback in
                                   # seconds.

#------------------------------------------------------------------------------
# WATCHDOG
#------------------------------------------------------------------------------

# - Enabling -

use_watchdog = on
                                    # Activates watchdog
                                    # (change requires restart)

# - Connection to upstream servers -

#trusted_servers = ''
                                    # trusted server list which are used
                                    # to confirm network connection
                                    # (hostA,hostB,hostC,...)
                                    # (change requires restart)
#ping_path = '/bin'
                                    # ping command path
                                    # (change requires restart)

# - Watchdog communication Settings -

hostname0 = 'server-a.domain.com'
                                    # Host name or IP address of pgpool node
                                    # for watchdog connection
                                    # (change requires restart)
wd_port0 = 9001
                                    # Port number for watchdog service
                                    # (change requires restart)
pgpool_port0 = 9999
                                    # Port number for pgpool
                                    # (change requires restart)

hostname1 = 'server-b.domain.com'
wd_port1 = 9001
pgpool_port1 = 9999

#hostname2 = ''
#wd_port2 = 9001
#pgpool_port2 = 9999

#wd_priority = 1
                                    # priority of this watchdog in leader election
                                    # (change requires restart)

#wd_authkey = ''
                                    # Authentication key for watchdog communication
                                    # (change requires restart)

wd_ipc_socket_dir = '/var/run/postgresql'
                                    # Unix domain socket path for watchdog IPC socket
                                    # The Debian package defaults to
                                    # /var/run/postgresql
                                    # (change requires restart)


# - Virtual IP control Setting -

delegate_IP = ''
                                    # delegate IP address
                                    # If this is empty, the virtual IP is never brought up.
                                    # (change requires restart)
#if_cmd_path = '/sbin'
                                    # path to the directory where if_up/down_cmd exists
                                    # If if_up/down_cmd starts with "/", if_cmd_path will be ignored.
                                    # (change requires restart)
#if_up_cmd = '/usr/bin/sudo /sbin/ip addr add $_IP_$/24 dev eth0 label eth0:0'
                                    # startup delegate IP command
                                    # (change requires restart)
#if_down_cmd = '/usr/bin/sudo /sbin/ip addr del $_IP_$/24 dev eth0'
                                    # shutdown delegate IP command
                                    # (change requires restart)
#arping_path = '/usr/sbin'
                                    # arping command path
                                    # If arping_cmd starts with "/", arping_path will be ignored.
                                    # (change requires restart)
#arping_cmd = '/usr/bin/sudo /usr/sbin/arping -U $_IP_$ -w 1 -I eth0'
                                    # arping command
                                    # (change requires restart)

# - Behavior on escalation Setting -

#clear_memqcache_on_escalation = on
                                    # Clear all the query cache on shared memory
                                    # when standby pgpool escalates to active pgpool
                                    # (= virtual IP holder).
                                    # This should be off if client connects to pgpool
                                    # not using virtual IP.
                                    # (change requires restart)
wd_escalation_command = ''
                                    # Executes this command at escalation on new active pgpool.
                                    # (change requires restart)
#wd_de_escalation_command = ''
                                    # Executes this command when leader pgpool resigns from being leader.
                                    # (change requires restart)

# - Watchdog consensus settings for failover -

#failover_when_quorum_exists = on
                                    # Only perform backend node failover
                                    # when the watchdog cluster holds the quorum
                                    # (change requires restart)

#failover_require_consensus = on
                                    # Perform failover when majority of Pgpool-II nodes
                                    # agrees on the backend node status change
                                    # (change requires restart)

#allow_multiple_failover_requests_from_node = off
                                    # A Pgpool-II node can cast multiple votes
                                    # for building the consensus on failover
                                    # (change requires restart)


enable_consensus_with_half_votes = on
                                    # apply majority rule for consensus and quorum computation
                                    # at 50% of votes in a cluster with even number of nodes.
                                    # when enabled the existence of quorum and consensus
                                    # on failover is resolved after receiving half of the
                                    # total votes in the cluster, otherwise both these
                                    # decisions require at least one more vote than
                                    # half of the total votes.
                                    # (change requires restart)

# - Watchdog cluster membership settings for quorum computation -

#wd_remove_shutdown_nodes = off
                                    # when enabled cluster membership of properly shutdown
                                    # watchdog nodes gets revoked. After that the node does
                                    # not count towards the quorum and consensus computations

#wd_lost_node_removal_timeout = 0s
                                    # Timeout after which the cluster membership of LOST watchdog
                                    # nodes gets revoked. After that the node does not
                                    # count towards the quorum and consensus computations
                                    # setting timeout to 0 will never revoke the membership
                                    # of LOST nodes

#wd_no_show_node_removal_timeout = 0s
                                    # Time to wait for Watchdog node to connect to the cluster.
                                    # After that time the cluster membership of NO-SHOW node gets
                                    # revoked and it does not count towards the quorum and
                                    # consensus computations
                                    # setting timeout to 0 will not revoke the membership
                                    # of NO-SHOW nodes


# - Lifecheck Setting -

# -- common --

#wd_monitoring_interfaces_list = ''
                                    # Comma separated list of interfaces names to monitor.
                                    # if any interface from the list is active the watchdog will
                                    # consider the network is fine
                                    # 'any' to enable monitoring on all interfaces except loopback
                                    # '' to disable monitoring
                                    # (change requires restart)

wd_lifecheck_method = 'heartbeat'
                                    # Method of watchdog lifecheck ('heartbeat' or 'query' or 'external')
                                    # (change requires restart)
wd_interval = 10
                                    # lifecheck interval (sec) > 0
                                    # (change requires restart)

# -- heartbeat mode --

heartbeat_hostname0 = 'server-a.domain.com'
                                    # Host name or IP address used
                                    # for sending heartbeat signal.
                                    # (change requires restart)
heartbeat_port0 = 9694
                                    # Port number used for receiving/sending heartbeat signal
                                    # Usually this is the same as heartbeat_portX.
                                    # (change requires restart)
heartbeat_device0 = ''
                                    # Name of NIC device (such as 'eth0')
                                    # used for sending/receiving heartbeat
                                    # signal to/from destination 0.
                                    # This works only when this is not empty
                                    # and pgpool has root privilege.
                                    # (change requires restart)

heartbeat_hostname1 = 'server-b.domain.com'
heartbeat_port1 = 9694
heartbeat_device1 = ''
#heartbeat_hostname2 = ''
#heartbeat_port2 = 9694
#heartbeat_device2 = ''

wd_heartbeat_keepalive = 2
                                    # Interval time of sending heartbeat signal (sec)
                                    # (change requires restart)
wd_heartbeat_deadtime = 30
                                    # Deadtime interval for heartbeat signal (sec)
                                    # (change requires restart)

# -- query mode --

#wd_life_point = 3
                                    # lifecheck retry times
                                    # (change requires restart)
#wd_lifecheck_query = 'SELECT 1'
                                    # lifecheck query to pgpool from watchdog
                                    # (change requires restart)
#wd_lifecheck_dbname = 'template1'
                                    # Database name connected for lifecheck
                                    # (change requires restart)
#wd_lifecheck_user = 'nobody'
                                    # watchdog user monitoring pgpools in lifecheck
                                    # (change requires restart)
#wd_lifecheck_password = ''
                                    # Password for watchdog user in lifecheck
                                    # Leaving it empty will make Pgpool-II to first look for the
                                    # Password in pool_passwd file before using the empty password
                                    # (change requires restart)

#------------------------------------------------------------------------------
# OTHERS
#------------------------------------------------------------------------------
#relcache_expire = 0
                                   # Life time of relation cache in seconds.
                                   # 0 means no cache expiration(the default).
                                   # The relation cache is used to cache the
                                   # query result against PostgreSQL system
                                   # catalog to obtain various information
                                   # including table structures or if it's a
                                   # temporary table or not. The cache is
                                   # maintained in a pgpool child local memory
                                   # and being kept as long as it survives.
                                   # If someone modifies the table by using
                                   # ALTER TABLE or some such, the relcache is
                                   # not consistent anymore.
                                   # For this purpose, relcache_expire
                                   # controls the life time of the cache.
#relcache_size = 256
                                   # Number of relation cache
                                   # entry. If you see frequently:
                                   # "pool_search_relcache: cache replacement happend"
                                   # in the pgpool log, you might want to increase this number.

#check_temp_table = catalog
                                   # Temporary table check method. catalog, trace or none.
                                   # Default is catalog.

#check_unlogged_table = on
                                   # If on, enable unlogged table check in SELECT statements.
                                   # This initiates queries against system catalog of primary/main
                                   # thus increases load of primary.
                                   # If you are absolutely sure that your system never uses unlogged tables
                                   # and you want to save access to primary/main, you could turn this off.
                                   # Default is on.
#enable_shared_relcache = on
                                   # If on, relation cache stored in memory cache,
                                   # the cache is shared among child processes.
                                   # Default is on.
                                   # (change requires restart)

#relcache_query_target = primary
                                   # Target node to send relcache queries. Default is primary node.
                                   # If load_balance_node is specified, queries will be sent to load balance node.
#------------------------------------------------------------------------------
# IN MEMORY QUERY MEMORY CACHE
#------------------------------------------------------------------------------
#memory_cache_enabled = off
                                   # If on, use the memory cache functionality, off by default
                                   # (change requires restart)
#memqcache_method = 'shmem'
                                   # Cache storage method. either 'shmem'(shared memory) or
                                   # 'memcached'. 'shmem' by default
                                   # (change requires restart)
#memqcache_memcached_host = 'localhost'
                                   # Memcached host name or IP address. Mandatory if
                                   # memqcache_method = 'memcached'.
                                   # Defaults to localhost.
                                   # (change requires restart)
#memqcache_memcached_port = 11211
                                   # Memcached port number. Mandatory if memqcache_method = 'memcached'.
                                   # Defaults to 11211.
                                   # (change requires restart)
#memqcache_total_size = 64MB
                                   # Total memory size in bytes for storing memory cache.
                                   # Mandatory if memqcache_method = 'shmem'.
                                   # Defaults to 64MB.
                                   # (change requires restart)
#memqcache_max_num_cache = 1000000
                                   # Total number of cache entries. Mandatory
                                   # if memqcache_method = 'shmem'.
                                   # Each cache entry consumes 48 bytes on shared memory.
                                   # Defaults to 1,000,000(45.8MB).
                                   # (change requires restart)
#memqcache_expire = 0
                                   # Memory cache entry life time specified in seconds.
                                   # 0 means infinite life time. 0 by default.
                                   # (change requires restart)
#memqcache_auto_cache_invalidation = on
                                   # If on, invalidation of query cache is triggered by corresponding
                                   # DDL/DML/DCL(and memqcache_expire).  If off, it is only triggered
                                   # by memqcache_expire.  on by default.
                                   # (change requires restart)
#memqcache_maxcache = 400kB
                                   # Maximum SELECT result size in bytes.
                                   # Must be smaller than memqcache_cache_block_size. Defaults to 400KB.
                                   # (change requires restart)
#memqcache_cache_block_size = 1MB
                                   # Cache block size in bytes. Mandatory if memqcache_method = 'shmem'.
                                   # Defaults to 1MB.
                                   # (change requires restart)
#memqcache_oiddir = '/var/log/pgpool/oiddir'
                                   # Temporary work directory to record table oids
                                   # (change requires restart)
#cache_safe_memqcache_table_list = ''
                                   # Comma separated list of table names to memcache
                                   # that don't write to database
                                   # Regexp are accepted
#cache_unsafe_memqcache_table_list = ''
                                   # Comma separated list of table names not to memcache
                                   # that don't write to database
                                   # Regexp are accepted
pgpool.conf (49,578 bytes)   

kawamoto

2022-04-01 09:34

developer   ~0004016

>Core was generated by `pgpool: wait for connection request '.
>Program terminated with signal 11, Segmentation fault.
>#0 0x000000000043532e in do_child (fds=fds@entry=0xf87200) at protocol/child.c:333
>333 proc_info->wait_for_connect = 0;
proc_info is an area reserved in shared memory that stores child process information.
It seems that the information in the shared memory was invalid, or the address of proc_info that the child process had was invalid.

So far, no such bug has been reported.
Would you please upload the pgpool log file?
Also, if it can be reproduced, would you please tell me how to reproduce it?

fjcasero

2022-04-04 17:53

reporter   ~0004020

fjcasero

2022-04-04 18:07

reporter   ~0004021

Thanks kawamoto.

I've attached the pgpool log file pgpool-2022-03-23_051334.zip.

I'm working on how to reproduce it.

Best regards.

fjcasero

2022-04-06 03:33

reporter   ~0004024

I've run pgpool under valgrind with the following command line:

/usr/bin/valgrind --tool=memcheck --track-origins=yes --trace-children=yes --time-stamp=yes --show-error-list=yes -v --log-file="/var/log/valgrind/pgpool-%p.log" /usr/bin/pgpool -f /etc/pgpool-II/pgpool.conf

Two cores have been generated after a situation that left the cluster without a primary backend.

---- BACKTRACE -------
gdb pgpool pgpool-9019.log.core.9019
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/pgpool...Reading symbols from /usr/lib/debug/usr/bin/pgpool.debug...done.
done.
[New LWP 9019]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `'.
Program terminated with signal 11, Segmentation fault.
#0 0x000000000043532e in do_child (fds=fds@entry=0xa7f6c80) at protocol/child.c:333
333 proc_info->wait_for_connect = 0;
Missing separate debuginfos, use: debuginfo-install audit-libs-2.8.5-4.el7.x86_64 cyrus-sasl-lib-2.1.26-24.el7_9.x86_64 glibc-2.17-325.el7_9.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-51.el7_9.x86_64 libcap-ng-0.7.5-4.el7.x86_64 libcom_err-1.42.9-19.el7.x86_64 libgcc-4.8.5-44.el7.x86_64 libmemcached-1.0.16-5.el7.x86_64 libselinux-2.5-15.el7.x86_64 libstdc++-4.8.5-44.el7.x86_64 nspr-4.32.0-1.el7_9.x86_64 nss-3.67.0-4.el7_9.x86_64 nss-softokn-freebl-3.67.0-3.el7_9.x86_64 nss-util-3.67.0-1.el7_9.x86_64 openldap-2.4.44-25.el7_9.x86_64 openssl-libs-1.0.2k-25.el7_9.x86_64 pam-1.1.8-23.el7.x86_64 pcre-8.32-17.el7.x86_64 postgresql12-libs-12.10-1PGDG.rhel7.x86_64 zlib-1.2.7-19.el7_9.x86_64
(gdb) t a a bt full

Thread 1 (LWP 9019):
#0 0x000000000043532e in do_child (fds=fds@entry=0xa7f6c80) at protocol/child.c:333
        sp = <optimized out>
        saddr = {addr = {ss_family = 0, __ss_padding = '\000' <repeats 117 times>, __ss_align = 0}, salen = 128}
        local_sigjmp_buf = {{__jmpbuf = {176123020, 6421331532936416509, 7, 1, 1, 137422125524,
              6422936810320727293, 6421332106390572285}, __mask_was_saved = 1, __saved_mask = {__val = {0,
                137422124640, 116732032, 137422125104, 0, 137422124640, 113080123, 42949672960, 268, 175973680, 0,
                5085120, 0, 133143986181, 3440, 5085122}}}}
        backend = 0x0
        now = {tv_sec = 1649171120, tv_usec = 405556}
        tz = {tz_minuteswest = 0, tz_dsttime = 0}
        connections_count = 0
        psbuf = '\000' <repeats 32 times>, "\b\000\000\000\060\000\000\000 <\377\376\037\000\000\000P;\377\376\037", '\000' <repeats 62 times>, "\061\065\063\062\060\200\065\377\376\037\000\000\000\024\315x\n\000\000\000\000\200\065\377\376\037\000\000\000\300\227M\000\000\000\000\000@7\377\376\037\000\000\000\001\000\000\000\000\000\000\000\323\002\000\000\000\000\000\000\024\315x\n\000\000\000\000\365O\312\006\000\000\000\000\001\200\255\373\000\000\000\000"...
        proc_info = <optimized out>
        walk = <optimized out>
#1 0x000000000040b7e5 in fork_a_child (fds=0xa7f6c80, id=7) at main/pgpool_main.c:686
        pid = 0
#2 0x000000000040d099 in sync_backend_from_watchdog () at main/pgpool_main.c:4277
        restart = <optimized out>
        primary_changed = <optimized out>
        node_status_was_changed_to_down = <optimized out>
        node_status_was_changed_to_up = <optimized out>
        need_to_restart_children = 1 '\001'
        partial_restart = 0 '\000'
        reload_master_node_id = <optimized out>
        down_node_ids = {0, 0, 48, 48, 15320, 0, 0, 0, 175687862, 0, 170603846, 0, 808631616, 12338, 113537236, 0,
          176165088, 0, 790, 0, 67443888, 0, 175687874, 0, 5212377, 0, -16827744, 31, 2, 0, 4585167, 0, -16827488,
          31, 790, 0, 0, 0, 5212377, 0, 175679040, 0, 4883060, 0, 4, 0, -16827488, 31, 67443888, 0, 4883378, 0,
          32, 48, -16827488, 31, -16827712, 31, -2057830400, 1337486158, 1649171120, 0, 261833, 0, 5084094, 0,
          1295, 0, 0, 0, 113099469, 0, 774910522, 3225138, 175679040, 0, -16826880, 31, 4686049, 0, 175687640, 0,
          38, 1024, 0, 31, 0, 0, 8, 48, -16827360, 31, -16827568, 31, 1024, 0, 0, 0, -2057830400, 1337486158, 1,
          0, 8611616, 0, 0, 0, 8611616, 0, 0, 0, 175679384, 0, 1, 0, 4691897, 0, 175687640, 0, 264, 1024, 0, 0,
          -2057830400, 1337486158, 7, 0, 0, 0}
        down_node_ids_index = <optimized out>
        i = 7
        backendStatus = <optimized out>
#3 0x000000000040f912 in sigusr1_interrupt_processor () at main/pgpool_main.c:1296
No locals.
#4 0x0000000000412c4f in PgpoolMain (discard_status=discard_status@entry=0 '\000',
    clear_memcache_oidmaps=clear_memcache_oidmaps@entry=0 '\000') at main/pgpool_main.c:477
        i = 2
        local_sigjmp_buf = {{__jmpbuf = {1, 6421345790940318973, 9898, 137422126592, 6, 2, 6422936809838382333,
              6421332090443042045}, __mask_was_saved = 1, __saved_mask = {__val = {18446744066192964103,
                18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615,
                18446744073709551615, 18446744073709551615, 461466061, 4204876, 4294967295, 0, 112807296,
                67388600, 137422126576, 137422126560, 415520739}}}}
        first = 0 '\000'
#5 0x0000000000409b4a in main (argc=<optimized out>, argv=<optimized out>) at main/main.c:365
        opt = <optimized out>
        debug_level = <optimized out>
        optindex = 0
        discard_status = 0 '\000'
        clear_memcache_oidmaps = 0 '\000'
        pcp_conf_file_path = "/etc/pgpool-II/pcp.conf", '\000' <repeats 8169 times>
        conf_file_path = "/etc/pgpool-II/pgpool.conf", '\000' <repeats 8166 times>
        hba_file_path = "/etc/pgpool-II/pool_hba.conf", '\000' <repeats 8164 times>
        pool_passwd_key_file_path = "/var/lib/pgsql/.pgpoolkey\000D\371\365پE\034\200!\033\064\333,\343H\340\071\266\000\000\000\000\021\000\000\000\034\000\000\000\004\000\000\000\b\000\000\000@\204\t\000\202!\020\240\250@\000\023\304Ё\030\002\"\030@\030\"D\v\200\030\b\001\">\226\002\034\000\000\000\035\000\000\000\037\000\000\000\"\000\000\000$\000\000\000'\000\000\000+\000\000\000\060\000\000\000\061\000\000\000\062\000\000\000\065\000\000\000\000\000\000\000\066\000\000\000\000\000\000\000\071\000\000\000<\000\000\000=\000\000\000\271\201\272\305\352\323\357\016\021\177\027\257\354\a\262\236\370\262E2\271\215\361\016f\267\251\177\331qX\034ʇ\345 bG\032"...
        long_options = {{name = 0x4d87d6 "hba-file", has_arg = 1, flag = 0x0, val = 97}, {name = 0x4d87df "debug",
            has_arg = 0, flag = 0x0, val = 100}, {name = 0x4d87e5 "config-file", has_arg = 1, flag = 0x0,
            val = 102}, {name = 0x4d87f1 "key-file", has_arg = 1, flag = 0x0, val = 107}, {
            name = 0x4d87fa "pcp-file", has_arg = 1, flag = 0x0, val = 70}, {name = 0x4d8803 "help", has_arg = 0,
            flag = 0x0, val = 104}, {name = 0x4deab0 "mode", has_arg = 1, flag = 0x0, val = 109}, {
            name = 0x4d8808 "dont-detach", has_arg = 0, flag = 0x0, val = 110}, {name = 0x4d8814 "discard-status",
            has_arg = 0, flag = 0x0, val = 68}, {name = 0x4d8823 "clear-oidmaps", has_arg = 0, flag = 0x0,
            val = 67}, {name = 0x4d8831 "debug-assertions", has_arg = 0, flag = 0x0, val = 120}, {
            name = 0x4ed79c "version", has_arg = 0, flag = 0x0, val = 118}, {name = 0x0, has_arg = 0, flag = 0x0,
            val = 0}}
(gdb)


valgrind log is attached (valgrind-pgpool-9019.log).

==00:02:57:29.890 9019== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==00:02:57:29.890 9019== Access not within mapped region at address 0x14
==00:02:57:29.890 9019== at 0x43532E: do_child (child.c:333)
==00:02:57:29.890 9019== by 0x40B7E4: fork_a_child (pgpool_main.c:686)
==00:02:57:29.890 9019== by 0x40D098: sync_backend_from_watchdog (pgpool_main.c:4277)
==00:02:57:29.890 9019== by 0x40F911: sigusr1_interrupt_processor (pgpool_main.c:1296)
==00:02:57:29.890 9019== by 0x412C4E: PgpoolMain (pgpool_main.c:477)
==00:02:57:29.890 9019== by 0x409B49: main (main.c:365)
==00:02:57:29.890 9019== If you believe this happened as a result of a stack
==00:02:57:29.890 9019== overflow in your program's main thread (unlikely but
==00:02:57:29.890 9019== possible), you can try to increase the size of the
==00:02:57:29.890 9019== main thread stack using the --main-stacksize= flag.
==00:02:57:29.890 9019== The main thread stack size used in this run was 8388608.
--00:02:57:30.450 9019-- Discarding syms at 0xab8b1b0-0xab92501 in /usr/lib64/libnss_files-2.17.so (have_dinfo 1)

==00:02:57:29.890 9019== Invalid write of size 4
==00:02:57:29.890 9019== at 0x43532E: do_child (child.c:333)
==00:02:57:29.890 9019== by 0x40B7E4: fork_a_child (pgpool_main.c:686)
==00:02:57:29.890 9019== by 0x40D098: sync_backend_from_watchdog (pgpool_main.c:4277)
==00:02:57:29.890 9019== by 0x40F911: sigusr1_interrupt_processor (pgpool_main.c:1296)
==00:02:57:29.890 9019== by 0x412C4E: PgpoolMain (pgpool_main.c:477)
==00:02:57:29.890 9019== by 0x409B49: main (main.c:365)
==00:02:57:29.890 9019== Address 0x14 is not stack'd, malloc'd or (recently) free'd

I think the call pool_get_process_info(getpid()) is returning NULL.
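
The faulting address 0x14 and the 4-byte write size in the valgrind output are consistent with a store to a struct member through a NULL pointer, because the faulting address is then simply the member's offset. The sketch below uses a hypothetical struct, ProcessInfoSketch. Its field order is an assumption and is not copied from pgpool's source, and the member name wait_for_connect is borrowed from a log line later in this thread. It only illustrates that an int placed after a pid_t, a time_t and a char lands at offset 0x14 on x86_64 Linux.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical stand-in for a process table entry; the real layout may differ. */
typedef struct
{
    pid_t  pid;               /* offset 0 */
    time_t start_time;        /* offset 8 on x86_64 (4 bytes of padding before it) */
    char   connected;         /* offset 16 */
    int    wait_for_connect;  /* offset 20 = 0x14 (3 bytes of padding before it) */
} ProcessInfoSketch;

/* Address that a store to base->wait_for_connect would touch.
 * Computed arithmetically, so nothing is dereferenced. */
uintptr_t
wait_for_connect_address(uintptr_t base)
{
    return base + offsetof(ProcessInfoSketch, wait_for_connect);
}

/* Aborts if this ABI does not lay the sketch struct out as assumed above. */
void
check_sketch_layout(void)
{
    assert(offsetof(ProcessInfoSketch, wait_for_connect) == 0x14);
    assert(sizeof(((ProcessInfoSketch *) 0)->wait_for_connect) == 4);
}
```

With a NULL base, wait_for_connect_address(0) gives 0x14 under these layout assumptions, which matches "Invalid write of size 4" at "Address 0x14".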

Best regards.
pgpool-9019.log (12,328 bytes)   
==00:02:57:19.082 9019== Memcheck, a memory error detector
==00:02:57:19.082 9019== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==00:02:57:19.082 9019== Using Valgrind-3.15.0-608cb11914-20190413 and LibVEX; rerun with -h for copyright info
==00:02:57:19.082 9019== Command: /usr/bin/pgpool -f /etc/pgpool-II/pgpool.conf
==00:02:57:19.082 9019== Parent PID: 15320
==00:02:57:19.082 9019== 
--00:02:57:19.082 9019-- 
--00:02:57:19.082 9019-- Valgrind options:
--00:02:57:19.082 9019--    --tool=memcheck
--00:02:57:19.082 9019--    --track-origins=yes
--00:02:57:19.082 9019--    --trace-children=yes
--00:02:57:19.082 9019--    --time-stamp=yes
--00:02:57:19.082 9019--    --show-error-list=yes
--00:02:57:19.082 9019--    -v
--00:02:57:19.082 9019--    --log-file=/var/log/valgrind/pgpool-%p.log
--00:02:57:19.082 9019-- Contents of /proc/version:
--00:02:57:19.082 9019--   Linux version 3.10.0-1160.59.1.el7.x86_64 (mockbuild@x86-vm-37.build.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) ) #1 SMP Wed Feb 16 12:17:35 UTC 2022
--00:02:57:19.082 9019-- 
--00:02:57:19.082 9019-- Arch and hwcaps: AMD64, LittleEndian, amd64-cx16-rdtscp-sse3-ssse3-avx-f16c-rdrand
--00:02:57:19.082 9019-- Page sizes: currently 4096, max supported 4096
--00:02:57:19.082 9019-- Valgrind library directory: /usr/libexec/valgrind
==00:02:57:19.087 9019== embedded gdbserver: reading from /tmp/vgdb-pipe-from-vgdb-to-9019-by-postgres-on-chris-oneview-a.pprism.acs
==00:02:57:19.087 9019== embedded gdbserver: writing to   /tmp/vgdb-pipe-to-vgdb-from-9019-by-postgres-on-chris-oneview-a.pprism.acs
==00:02:57:19.087 9019== embedded gdbserver: shared mem   /tmp/vgdb-pipe-shared-mem-vgdb-9019-by-postgres-on-chris-oneview-a.pprism.acs
==00:02:57:19.087 9019== 
==00:02:57:19.087 9019== TO CONTROL THIS PROCESS USING vgdb (which you probably
==00:02:57:19.087 9019== don't want to do, unless you know exactly what you're doing,
==00:02:57:19.087 9019== or are doing some strange experiment):
==00:02:57:19.087 9019==   /usr/libexec/valgrind/../../bin/vgdb --pid=9019 ...command...
==00:02:57:19.087 9019== 
==00:02:57:19.087 9019== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==00:02:57:19.087 9019==   /path/to/gdb /usr/bin/pgpool
==00:02:57:19.087 9019== and then give GDB the following command
==00:02:57:19.087 9019==   target remote | /usr/libexec/valgrind/../../bin/vgdb --pid=9019
==00:02:57:19.087 9019== --pid is optional if only one valgrind process is running
==00:02:57:19.087 9019== 
==00:02:57:29.890 9019== Invalid write of size 4
==00:02:57:29.890 9019==    at 0x43532E: do_child (child.c:333)
==00:02:57:29.890 9019==    by 0x40B7E4: fork_a_child (pgpool_main.c:686)
==00:02:57:29.890 9019==    by 0x40D098: sync_backend_from_watchdog (pgpool_main.c:4277)
==00:02:57:29.890 9019==    by 0x40F911: sigusr1_interrupt_processor (pgpool_main.c:1296)
==00:02:57:29.890 9019==    by 0x412C4E: PgpoolMain (pgpool_main.c:477)
==00:02:57:29.890 9019==    by 0x409B49: main (main.c:365)
==00:02:57:29.890 9019==  Address 0x14 is not stack'd, malloc'd or (recently) free'd
==00:02:57:29.890 9019== 
==00:02:57:29.890 9019== 
==00:02:57:29.890 9019== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==00:02:57:29.890 9019==  Access not within mapped region at address 0x14
==00:02:57:29.890 9019==    at 0x43532E: do_child (child.c:333)
==00:02:57:29.890 9019==    by 0x40B7E4: fork_a_child (pgpool_main.c:686)
==00:02:57:29.890 9019==    by 0x40D098: sync_backend_from_watchdog (pgpool_main.c:4277)
==00:02:57:29.890 9019==    by 0x40F911: sigusr1_interrupt_processor (pgpool_main.c:1296)
==00:02:57:29.890 9019==    by 0x412C4E: PgpoolMain (pgpool_main.c:477)
==00:02:57:29.890 9019==    by 0x409B49: main (main.c:365)
==00:02:57:29.890 9019==  If you believe this happened as a result of a stack
==00:02:57:29.890 9019==  overflow in your program's main thread (unlikely but
==00:02:57:29.890 9019==  possible), you can try to increase the size of the
==00:02:57:29.890 9019==  main thread stack using the --main-stacksize= flag.
==00:02:57:29.890 9019==  The main thread stack size used in this run was 8388608.
--00:02:57:30.450 9019-- Discarding syms at 0xab8b1b0-0xab92501 in /usr/lib64/libnss_files-2.17.so (have_dinfo 1)
==00:02:57:30.459 9019== 
==00:02:57:30.459 9019== HEAP SUMMARY:
==00:02:57:30.459 9019==     in use at exit: 289,260 bytes in 49 blocks
==00:02:57:30.459 9019==   total heap usage: 784 allocs, 735 frees, 440,995 bytes allocated
==00:02:57:30.459 9019== 
==00:02:57:30.459 9019== Searching for pointers to 49 not-freed blocks
==00:02:57:30.714 9019== Checked 136,431,632 bytes
==00:02:57:30.714 9019== 
==00:02:57:30.715 9019== LEAK SUMMARY:
==00:02:57:30.715 9019==    definitely lost: 224 bytes in 1 blocks
==00:02:57:30.715 9019==    indirectly lost: 2,356 bytes in 27 blocks
==00:02:57:30.715 9019==      possibly lost: 0 bytes in 0 blocks
==00:02:57:30.715 9019==    still reachable: 286,680 bytes in 21 blocks
==00:02:57:30.715 9019==         suppressed: 0 bytes in 0 blocks
==00:02:57:30.715 9019== Rerun with --leak-check=full to see details of leaked memory
==00:02:57:30.715 9019== 
==00:02:57:30.715 9019== ERROR SUMMARY: 596 errors from 7 contexts (suppressed: 0 from 0)
==00:02:57:30.715 9019== 
==00:02:57:30.715 9019== 1 errors in context 1 of 7:
==00:02:57:30.715 9019== Invalid write of size 4
==00:02:57:30.715 9019==    at 0x43532E: do_child (child.c:333)
==00:02:57:30.715 9019==    by 0x40B7E4: fork_a_child (pgpool_main.c:686)
==00:02:57:30.715 9019==    by 0x40D098: sync_backend_from_watchdog (pgpool_main.c:4277)
==00:02:57:30.715 9019==    by 0x40F911: sigusr1_interrupt_processor (pgpool_main.c:1296)
==00:02:57:30.715 9019==    by 0x412C4E: PgpoolMain (pgpool_main.c:477)
==00:02:57:30.715 9019==    by 0x409B49: main (main.c:365)
==00:02:57:30.715 9019==  Address 0x14 is not stack'd, malloc'd or (recently) free'd
==00:02:57:30.715 9019== 
==00:02:57:30.715 9019== 
==00:02:57:30.715 9019== 1 errors in context 2 of 7:
==00:02:57:30.715 9019== Conditional jump or move depends on uninitialised value(s)
==00:02:57:30.715 9019==    at 0x4C2D108: strlen (vg_replace_strmem.c:461)
==00:02:57:30.715 9019==    by 0x4A84D3: appendStringInfoString (stringinfo.c:167)
==00:02:57:30.715 9019==    by 0x476A32: log_line_prefix.isra.1 (elog.c:2154)
==00:02:57:30.715 9019==    by 0x479634: send_message_to_server_log (elog.c:2210)
==00:02:57:30.715 9019==    by 0x479634: EmitErrorReport (elog.c:1140)
==00:02:57:30.715 9019==    by 0x47779D: errfinish (elog.c:440)
==00:02:57:30.715 9019==    by 0x416692: SysLogger_Start (pgpool_logger.c:548)
==00:02:57:30.715 9019==    by 0x411E72: PgpoolMain (pgpool_main.c:274)
==00:02:57:30.715 9019==    by 0x409B49: main (main.c:365)
==00:02:57:30.715 9019==  Uninitialised value was created by a stack allocation
==00:02:57:30.715 9019==    at 0x4768B2: log_line_prefix.isra.1 (elog.c:1990)
==00:02:57:30.715 9019== 
==00:02:57:30.715 9019== 
==00:02:57:30.715 9019== 1 errors in context 3 of 7:
==00:02:57:30.715 9019== Syscall param open(filename) points to uninitialised byte(s)
==00:02:57:30.715 9019==    at 0x6C7E900: __open_nocancel (in /usr/lib64/libc-2.17.so)
==00:02:57:30.715 9019==    by 0x6C0A51F: _IO_file_fopen@@GLIBC_2.2.5 (in /usr/lib64/libc-2.17.so)
==00:02:57:30.715 9019==    by 0x6BFDCA3: __fopen_internal (in /usr/lib64/libc-2.17.so)
==00:02:57:30.715 9019==    by 0x41A618: SetPgpoolNodeId (pool_config_variables.c:4960)
==00:02:57:30.715 9019==    by 0x41A930: config_post_processor (pool_config_variables.c:4642)
==00:02:57:30.715 9019==    by 0x41855A: pool_get_config (pool_config.l:485)
==00:02:57:30.715 9019==    by 0x40989D: main (main.c:231)
==00:02:57:30.715 9019==  Address 0x1ffeff2110 is on thread 1's stack
==00:02:57:30.715 9019==  in frame #3, created by SetPgpoolNodeId (pool_config_variables.c:4947)
==00:02:57:30.715 9019==  Uninitialised value was created by a stack allocation
==00:02:57:30.715 9019==    at 0x41A586: SetPgpoolNodeId (pool_config_variables.c:4947)
==00:02:57:30.715 9019== 
==00:02:57:30.715 9019== 
==00:02:57:30.715 9019== 15 errors in context 4 of 7:
==00:02:57:30.715 9019== Conditional jump or move depends on uninitialised value(s)
==00:02:57:30.715 9019==    at 0x6BDC079: vfprintf (in /usr/lib64/libc-2.17.so)
==00:02:57:30.715 9019==    by 0x6CA4FF4: __vsnprintf_chk (in /usr/lib64/libc-2.17.so)
==00:02:57:30.715 9019==    by 0x6CA4F57: __snprintf_chk (in /usr/lib64/libc-2.17.so)
==00:02:57:30.715 9019==    by 0x41A609: UnknownInlinedFun (stdio2.h:64)
==00:02:57:30.715 9019==    by 0x41A609: SetPgpoolNodeId (pool_config_variables.c:4955)
==00:02:57:30.715 9019==    by 0x41A930: config_post_processor (pool_config_variables.c:4642)
==00:02:57:30.715 9019==    by 0x41855A: pool_get_config (pool_config.l:485)
==00:02:57:30.715 9019==    by 0x40989D: main (main.c:231)
==00:02:57:30.715 9019==  Uninitialised value was created by a stack allocation
==00:02:57:30.715 9019==    at 0x41A586: SetPgpoolNodeId (pool_config_variables.c:4947)
==00:02:57:30.715 9019== 
==00:02:57:30.715 9019== 
==00:02:57:30.715 9019== 20 errors in context 5 of 7:
==00:02:57:30.715 9019== Conditional jump or move depends on uninitialised value(s)
==00:02:57:30.715 9019==    at 0x4C2D108: strlen (vg_replace_strmem.c:461)
==00:02:57:30.715 9019==    by 0x4A84D3: appendStringInfoString (stringinfo.c:167)
==00:02:57:30.715 9019==    by 0x476A32: log_line_prefix.isra.1 (elog.c:2154)
==00:02:57:30.715 9019==    by 0x4797F1: send_message_to_server_log (elog.c:2203)
==00:02:57:30.715 9019==    by 0x4797F1: EmitErrorReport (elog.c:1140)
==00:02:57:30.715 9019==    by 0x47779D: errfinish (elog.c:440)
==00:02:57:30.715 9019==    by 0x44DEEA: pool_shared_memory_cache_size (pool_memqcache.c:2059)
==00:02:57:30.715 9019==    by 0x411F08: initialize_shared_mem_objects (pgpool_main.c:3499)
==00:02:57:30.715 9019==    by 0x411F08: PgpoolMain (pgpool_main.c:283)
==00:02:57:30.715 9019==    by 0x409B49: main (main.c:365)
==00:02:57:30.715 9019==  Uninitialised value was created by a stack allocation
==00:02:57:30.715 9019==    at 0x4768B2: log_line_prefix.isra.1 (elog.c:1990)
==00:02:57:30.715 9019== 
==00:02:57:30.715 9019== 
==00:02:57:30.715 9019== 279 errors in context 6 of 7:
==00:02:57:30.715 9019== Conditional jump or move depends on uninitialised value(s)
==00:02:57:30.715 9019==    at 0x4C2D108: strlen (vg_replace_strmem.c:461)
==00:02:57:30.715 9019==    by 0x4A84D3: appendStringInfoString (stringinfo.c:167)
==00:02:57:30.715 9019==    by 0x476A32: log_line_prefix.isra.1 (elog.c:2154)
==00:02:57:30.715 9019==    by 0x479849: send_message_to_server_log (elog.c:2241)
==00:02:57:30.715 9019==    by 0x479849: EmitErrorReport (elog.c:1140)
==00:02:57:30.715 9019==    by 0x47779D: errfinish (elog.c:440)
==00:02:57:30.715 9019==    by 0x416692: SysLogger_Start (pgpool_logger.c:548)
==00:02:57:30.715 9019==    by 0x411E72: PgpoolMain (pgpool_main.c:274)
==00:02:57:30.715 9019==    by 0x409B49: main (main.c:365)
==00:02:57:30.715 9019==  Uninitialised value was created by a stack allocation
==00:02:57:30.715 9019==    at 0x4768B2: log_line_prefix.isra.1 (elog.c:1990)
==00:02:57:30.715 9019== 
==00:02:57:30.715 9019== 
==00:02:57:30.715 9019== 279 errors in context 7 of 7:
==00:02:57:30.715 9019== Conditional jump or move depends on uninitialised value(s)
==00:02:57:30.715 9019==    at 0x4C2D108: strlen (vg_replace_strmem.c:461)
==00:02:57:30.715 9019==    by 0x4A84D3: appendStringInfoString (stringinfo.c:167)
==00:02:57:30.715 9019==    by 0x476A32: log_line_prefix.isra.1 (elog.c:2154)
==00:02:57:30.715 9019==    by 0x479564: send_message_to_server_log (elog.c:2174)
==00:02:57:30.715 9019==    by 0x479564: EmitErrorReport (elog.c:1140)
==00:02:57:30.715 9019==    by 0x47779D: errfinish (elog.c:440)
==00:02:57:30.715 9019==    by 0x416692: SysLogger_Start (pgpool_logger.c:548)
==00:02:57:30.715 9019==    by 0x411E72: PgpoolMain (pgpool_main.c:274)
==00:02:57:30.715 9019==    by 0x409B49: main (main.c:365)
==00:02:57:30.715 9019==  Uninitialised value was created by a stack allocation
==00:02:57:30.715 9019==    at 0x4768B2: log_line_prefix.isra.1 (elog.c:1990)
==00:02:57:30.715 9019== 
==00:02:57:30.715 9019== ERROR SUMMARY: 596 errors from 7 contexts (suppressed: 0 from 0)

fjcasero

2022-04-08 18:11

reporter   ~0004025

I've uploaded a new log file.

I compiled pgpool with the following changes:
Modified child.c to check and log whether proc_info is NULL.
Modified pool_get_process_info to log when it returns NULL.

log extract
-------------
2022-04-07 14:36:03.046: child pid 24697: LOG: pool_get_process_info is returning NULL for pid: 24697
2022-04-07 14:36:03.046: child pid 24697: LOCATION: pgpool_main.c:2622
2022-04-07 14:36:03.046: child pid 24697: LOG: pool_get_process_info returned NULL for pid: 24697
2022-04-07 14:36:03.046: child pid 24697: LOCATION: child.c:165
2022-04-07 14:36:03.047: main pid 22984: LOG: fork a new child process with pid: 24697
2022-04-07 14:36:03.047: main pid 22984: LOCATION: pgpool_main.c:2554
2022-04-07 14:36:07.167: watchdog pid 22999: LOG: new IPC connection received
2022-04-07 14:36:07.167: watchdog pid 22999: LOCATION: watchdog.c:3447
2022-04-07 14:36:17.203: watchdog pid 22999: LOG: new IPC connection received
2022-04-07 14:36:17.203: watchdog pid 22999: LOCATION: watchdog.c:3447
2022-04-07 14:36:27.240: watchdog pid 22999: LOG: new IPC connection received
2022-04-07 14:36:27.241: watchdog pid 22999: LOCATION: watchdog.c:3447
2022-04-07 14:36:37.274: watchdog pid 22999: LOG: new IPC connection received
2022-04-07 14:36:37.274: watchdog pid 22999: LOCATION: watchdog.c:3447
2022-04-07 14:36:47.319: watchdog pid 22999: LOG: new IPC connection received
2022-04-07 14:36:47.320: watchdog pid 22999: LOCATION: watchdog.c:3447
2022-04-07 14:36:57.352: watchdog pid 22999: LOG: new IPC connection received
2022-04-07 14:36:57.352: watchdog pid 22999: LOCATION: watchdog.c:3447
2022-04-07 14:37:01.616: child pid 24697: LOG: proc_info is NULL before proc_info->wait_for_connect = 0
2022-04-07 14:37:01.617: child pid 24697: LOCATION: child.c:355
2022-04-07 14:37:01.633: pcp_main pid 11193: LOG: forked new pcp worker, pid=25202 socket=7
2022-04-07 14:37:01.633: pcp_main pid 11193: LOCATION: pcp_child.c:299
2022-04-07 14:37:01.636: pcp_main pid 11193: LOG: PCP process with pid: 25202 exit with SUCCESS.
2022-04-07 14:37:01.636: pcp_main pid 11193: LOCATION: pcp_child.c:355
2022-04-07 14:37:01.636: pcp_main pid 11193: LOG: PCP process with pid: 25202 exits with status 0
2022-04-07 14:37:01.636: pcp_main pid 11193: LOCATION: pcp_child.c:369
2022-04-07 14:37:01.648: pcp_main pid 11193: LOG: forked new pcp worker, pid=25206 socket=7
2022-04-07 14:37:01.648: pcp_main pid 11193: LOCATION: pcp_child.c:299
2022-04-07 14:37:01.678: pcp_main pid 11193: LOG: PCP process with pid: 25206 exit with SUCCESS.
2022-04-07 14:37:01.678: pcp_main pid 11193: LOCATION: pcp_child.c:355
2022-04-07 14:37:01.678: pcp_main pid 11193: LOG: PCP process with pid: 25206 exits with status 0
2022-04-07 14:37:01.678: pcp_main pid 11193: LOCATION: pcp_child.c:369
2022-04-07 14:37:01.711: pcp_main pid 11193: LOG: forked new pcp worker, pid=25229 socket=7
2022-04-07 14:37:01.711: pcp_main pid 11193: LOCATION: pcp_child.c:299
2022-04-07 14:37:01.733: pcp_main pid 11193: LOG: PCP process with pid: 25229 exit with SUCCESS.
2022-04-07 14:37:01.733: pcp_main pid 11193: LOCATION: pcp_child.c:355
2022-04-07 14:37:01.733: pcp_main pid 11193: LOG: PCP process with pid: 25229 exits with status 0
2022-04-07 14:37:01.733: pcp_main pid 11193: LOCATION: pcp_child.c:369
2022-04-07 14:37:02.276: main pid 22984: WARNING: child process with pid: 24697 was terminated by segmentation fault
2022-04-07 14:37:02.276: main pid 22984: LOCATION: pgpool_main.c:2404

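I think the problem is at pgpool_main.c:2509 (the line marked with =>):

if (!switching && !exiting && restart_child)
{
=>  process_info[i].pid = fork_a_child(fds, i);
    process_info[i].start_time = time(NULL);

In the new child process, fork_a_child calls do_child (child.c), which runs ProcessInfo *proc_info = pool_get_process_info(getpid()). The parent stores the child's pid in process_info[i].pid only after fork_a_child returns, so the child can perform the lookup before its pid is in the table. In that case the lookup finds nothing, and that is why proc_info is NULL.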
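
The ordering problem described above can be modelled without forking at all. This is a minimal sketch and not pgpool's code: SlotSketch, lookup_by_pid, lookup_by_slot and reset_wait_for_connect are hypothetical names. It shows that a lookup by pid returns NULL while the parent has not yet stored the pid, whereas a lookup by slot index does not depend on that store.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, reduced process table entry. */
typedef struct
{
    pid_t pid;               /* stale or 0 until the parent stores fork()'s result */
    int   wait_for_connect;
} SlotSketch;

/* Lookup by pid: returns NULL when no slot carries this pid yet. */
SlotSketch *
lookup_by_pid(SlotSketch *table, size_t nslots, pid_t pid)
{
    assert(table != NULL);
    for (size_t i = 0; i < nslots; i++)
    {
        if (table[i].pid == pid)
            return &table[i];
    }
    return NULL;
}

/* Lookup by slot index: independent of whether the pid has been stored. */
SlotSketch *
lookup_by_slot(SlotSketch *table, size_t nslots, size_t slot)
{
    assert(table != NULL);
    return slot < nslots ? &table[slot] : NULL;
}

/* Guarded store: skips the write instead of dereferencing NULL.
 * Returns 1 if the field was written, 0 otherwise. */
int
reset_wait_for_connect(SlotSketch *entry)
{
    if (entry == NULL)
        return 0;
    entry->wait_for_connect = 0;
    return 1;
}
```

In this model the child already knows its slot index (fork_a_child receives it as id in the backtrace), so an index-based lookup would avoid the window. Whether the actual fix takes this approach is not stated in this thread.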

The backtrace shows "357 protocol/child.c: No such file or directory." because the debuginfo on the machine has not been updated with the changes that I made.

-- backtrace ---
 gdb pgpool /var/tmp/core.24697
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/pgpool...done.
[New LWP 24697]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `pgpool: wait for connection request '.
Program terminated with signal 11, Segmentation fault.
#0 do_child (fds=fds@entry=0x16b28b0) at protocol/child.c:357
357 protocol/child.c: No such file or directory.
Missing separate debuginfos, use: debuginfo-install cyrus-sasl-lib-2.1.26-24.el7_9.x86_64 glibc-2.17-325.el7_9.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-51.el7_9.x86_64 libcom_err-1.42.9-19.el7.x86_64 libselinux-2.5-15.el7.x86_64 nspr-4.32.0-1.el7_9.x86_64 nss-3.67.0-4.el7_9.x86_64 nss-softokn-freebl-3.67.0-3.el7_9.x86_64 nss-util-3.67.0-1.el7_9.x86_64 openldap-2.4.44-25.el7_9.x86_64 openssl-libs-1.0.2k-25.el7_9.x86_64 pcre-8.32-17.el7.x86_64 postgresql12-libs-12.10-1PGDG.rhel7.x86_64 zlib-1.2.7-19.el7_9.x86_64
(gdb) bt
#0 do_child (fds=fds@entry=0x16b28b0) at protocol/child.c:357
#1 0x000000000040a125 in fork_a_child (fds=0x16b28b0, id=6) at main/pgpool_main.c:686
#2 0x000000000040aa05 in reaper () at main/pgpool_main.c:2509
#3 0x00000000004118ed in PgpoolMain (discard_status=discard_status@entry=1 '\001',
    clear_memcache_oidmaps=clear_memcache_oidmaps@entry=0 '\000') at main/pgpool_main.c:477
#4 0x0000000000408434 in main (argc=<optimized out>, argv=<optimized out>) at main/main.c:365

It's hard to say how to reproduce this because the problem depends on timing. I think that any operation that causes a child worker to be recreated, that is, any situation that calls fork_a_child, could produce this problem on a machine with multiple cores.

The machine I'm testing on has eight cores.

Best regards
pgpool-2022-04-07_053105.log (2,861,514 bytes)

kawamoto

2022-04-20 10:14

developer   ~0004026

Thank you for providing information.

We changed the shared memory access at the point where the segfault occurred to a safer method.
https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=f1710d86b61de476372a0086b977647156af415c

If possible, could you please use the current V4_3_STABLE code to check that the segfault no longer occurs?

fjcasero

2022-04-26 17:59

reporter   ~0004028

Thank you, Kawamoto.

I saw the changes in child.c and I'm sure that will solve the problem.

I downloaded the V4_3_STABLE branch and compiled it, but haven't been able to test it yet on the machine where the core dumps were generated.

Best regards.

Issue History

Date Modified Username Field Change
2022-03-30 19:08 fjcasero New Issue
2022-03-30 19:08 fjcasero Tag Attached: error
2022-03-30 19:08 fjcasero Tag Attached: segfault
2022-03-30 19:08 fjcasero File Added: pgpool.conf
2022-03-31 17:38 kawamoto Assigned To => kawamoto
2022-03-31 17:38 kawamoto Status new => assigned
2022-04-01 09:34 kawamoto Note Added: 0004016
2022-04-04 17:53 fjcasero Note Added: 0004020
2022-04-04 17:53 fjcasero File Added: pgpool-2022-03-23_051334.zip
2022-04-04 18:07 fjcasero Note Added: 0004021
2022-04-06 03:33 fjcasero Note Added: 0004024
2022-04-06 03:33 fjcasero File Added: pgpool-9019.log
2022-04-08 18:11 fjcasero Note Added: 0004025
2022-04-08 18:11 fjcasero File Added: pgpool-2022-04-07_053105.log
2022-04-20 10:14 kawamoto Note Added: 0004026
2022-04-26 17:59 fjcasero Note Added: 0004028