View Issue Details

ID: 0000337
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2019-01-30 10:10
Reporter: ann francis
Assigned To: pengbo
Priority: urgent
Severity: major
Reproducibility: always
Status: assigned
Resolution: open
Product Version: 3.4.6
Target Version:
Fixed in Version:
Summary: 0000337: PgPool health check hangs at pool_check_fd() resulting in pgpool zombie child processes
Description: I have a master-slave pgpool setup with 2 pgpools on top of 3 postgres nodes.

When one of the postgres nodes is not responding (it is not even possible to connect to that machine), any new connection attempt to pgpool gets stuck in the pool_check_fd() function in pool_process_query.c at:

                  fds = select(fd+1, &readmask, NULL, &exceptmask, timeoutp);

I tried setting the timeout to 62 seconds by calling pool_set_timeout() in do_health_check() before the call to make_persistent_db_connection(), but that ended up executing the failover scripts on both pgpools.

Below is an extract from the pgpool logs:
-------------------------------------------------------------------------
PGPool - 1

2017-08-28 04:56:54: pid 24427: ERROR: unable to read data from DB node 0
2017-08-28 04:56:54: pid 24427: DETAIL: pool_check_fd call failed with an error "Interrupted system call"
2017-08-28 04:56:54: pid 24427: LOCATION: pool_stream.c:172
2017-08-28 04:57:56: pid 24427: ERROR: unable to read data from DB node 0
2017-08-28 04:57:56: pid 24427: DETAIL: pool_check_fd call failed with an error "Interrupted system call"
2017-08-28 04:57:56: pid 24427: LOCATION: pool_stream.c:172
2017-08-28 04:57:56: pid 24427: LOG: setting backend node 2 status to NODE DOWN
2017-08-28 04:57:56: pid 24427: LOCATION: pgpool_main.c:537
2017-08-28 04:57:56: pid 24427: LOG: watchdog notifying to start interlocking
2017-08-28 04:57:56: pid 24427: LOCATION: wd_interlock.c:80
2017-08-28 04:57:56: pid 24427: LOG: watchdog became a new lock holder
2017-08-28 04:57:56: pid 24427: LOCATION: wd_interlock.c:247
2017-08-28 04:58:06: pid 24427: WARNING: watchdog start interlocking, timed out
2017-08-28 04:58:06: pid 24427: LOCATION: wd_interlock.c:120
2017-08-28 04:58:06: pid 24427: LOG: starting degeneration. shutdown host 10.122.12.3(5432)


PGPool - 2
2017-08-28 04:57:16: pid 12400: ERROR: unable to read data from DB node 0
2017-08-28 04:57:16: pid 12400: DETAIL: pool_check_fd call failed with an error "Interrupted system call"
2017-08-28 04:57:16: pid 12400: LOCATION: pool_stream.c:172
2017-08-28 04:57:56: pid 12403: LOG: received degenerate backend request for node_id: 2 from pid [12403]
2017-08-28 04:57:56: pid 12403: LOCATION: pgpool_main.c:1104
2017-08-28 04:58:18: pid 12400: ERROR: unable to read data from DB node 0
2017-08-28 04:58:18: pid 12400: DETAIL: pool_check_fd call failed with an error "Interrupted system call"
2017-08-28 04:58:18: pid 12400: LOCATION: pool_stream.c:172
2017-08-28 04:58:18: pid 12400: LOG: setting backend node 2 status to NODE DOWN
2017-08-28 04:58:18: pid 12400: LOCATION: pgpool_main.c:537
2017-08-28 04:58:18: pid 12400: LOG: watchdog notifying to start interlocking
2017-08-28 04:58:18: pid 12400: LOCATION: wd_interlock.c:80
2017-08-28 04:58:18: pid 12400: LOG: watchdog became a new lock holder
2017-08-28 04:58:18: pid 12400: LOCATION: wd_interlock.c:247
2017-08-28 04:58:28: pid 12400: WARNING: watchdog start interlocking, timed out
2017-08-28 04:58:28: pid 12400: LOCATION: wd_interlock.c:120
2017-08-28 04:58:28: pid 12400: LOG: starting degeneration. shutdown host 10.122.12.3(5432)
2017-08-28 04:58:28: pid 12400: LOCATION: pgpool_main.c:1527
2017-08-28 04:58:28: pid 12400: LOG: Restart all children
Tags: pgpool health check

Activities

There are no notes attached to this issue.

Issue History

Date Modified Username Field Change
2017-08-29 17:03 ann francis New Issue
2017-08-29 17:03 ann francis Tag Attached: pgpool health check
2019-01-30 10:10 administrator Assigned To => pengbo
2019-01-30 10:10 administrator Status new => assigned