View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000204 | Pgpool-II | Bug | public | 2016-06-02 12:14 | 2016-06-08 21:16 |
| Reporter | harukat | Assigned To | nagata | ||
| Priority | normal | Severity | minor | Reproducibility | random |
| Status | resolved | Resolution | fixed | ||
| Product Version | 3.4.5 | ||||
| Summary | 0000204: healthcheck get stuck | ||||
| Description | Pgpool-II's main process got stuck in the health check when the database server machine was powered down. #0 0x0000003eb70e13e3 in select () from /lib64/libc.so.6 #1 0x000000000042ba2b in pool_check_fd () #2 0x0000000000466f9b in pool_read () #3 0x0000000000467300 in pool_read_with_error () #4 0x000000000042366f in s_do_auth () #5 0x0000000000423e8a in make_persistent_db_connection () #6 0x0000000000405f4a in do_health_check () #7 0x0000000000409f5a in PgpoolMain () #8 0x0000000000404501 in main () It occurs rarely in my tests. I reproduced it with gdb and iptables. | ||||
| Steps To Reproduce | PostgreSQL version: 9.4.5 pgpool-II version: V3_4_STABLE head (at 2016-06-02 10:00 JST) postgresql.conf: listen_addresses = '*' pg_hba.conf: host all all 10.10.10.72/32 trust [pghost]$ pg_ctl start [pghost]$ createuser nobody pgpool.conf (changed points from pgpool.conf.sample): backend_hostname0 = '10.10.10.126' debug_level = 1 health_check_period = 1 health_check_max_retries = 2 log_min_messages = debug1 pid_file_name = '/tmp/pgpool2/pgpool.pid' logdir = '/tmp/pgpool2' [poolhost]$ pgpool -D -n &>/tmp/pgpool2/log # start [poolhost]$ ps x [poolhost]$ gdb -p 10673 (gdb) b send_startup_packet Breakpoint 1 at 0x4236a0: file protocol/child.c, line 638. (gdb) c Continuing. Breakpoint 1, send_startup_packet (cp=0x1548fe0) at protocol/child.c:638 [pghost]# iptables -A OUTPUT -p tcp --sport 5432 -j DROP [poolhost] (gdb) c Continuing. pgpool.log: 2016-06-02 10:58:34: pid 10673: DEBUG: starting health check 2016-06-02 10:58:34: pid 10673: DEBUG: health check: clearing alarm 2016-06-02 10:58:34: pid 10673: DEBUG: Backend DB node 0 status is 1 2016-06-02 10:58:34: pid 10673: DEBUG: Trying to make persistent DB connection to backend node 0 having status 1 2016-06-02 10:58:34: pid 10673: DEBUG: SSL is requested but SSL support is not available 2016-06-02 10:58:34: pid 10673: DEBUG: authenticate kind = 0 2016-06-02 10:58:34: pid 10673: DEBUG: authenticate backend: key data received 2016-06-02 10:58:34: pid 10673: DEBUG: authenticate backend: transaction state: I 2016-06-02 10:58:34: pid 10673: DEBUG: persistent DB connection to backend node 0 having status 1 is successful 2016-06-02 10:58:34: pid 10673: DEBUG: health check: clearing alarm 2016-06-02 10:58:34: pid 10673: DEBUG: health check: clearing alarm {stop and continue by gdb here} 2016-06-02 10:58:35: pid 10673: DEBUG: starting health check 2016-06-02 10:58:35: pid 10673: DEBUG: health check: clearing alarm {stop and continue by gdb here} 2016-06-02 10:58:39: pid 10673: DEBUG: starting health check 
2016-06-02 10:58:39: pid 10673: DEBUG: health check: clearing alarm 2016-06-02 10:58:39: pid 10673: DEBUG: Backend DB node 0 status is 1 2016-06-02 10:58:39: pid 10673: DEBUG: Trying to make persistent DB connection to backend node 0 having status 1 2016-06-02 10:58:39: pid 10673: DEBUG: SSL is requested but SSL support is not available {get stuck long time here} 2016-06-02 11:14:11: pid 10673: LOG: notice_backend_error: called from pgpool main. ignored. 2016-06-02 11:14:11: pid 10673: WARNING: child_exit: called from invalid process. ignored. 2016-06-02 11:14:11: pid 10673: ERROR: unable to read data from DB node 0 2016-06-02 11:14:11: pid 10673: DETAIL: socket read failed with an error "Success" 2016-06-02 11:14:11: pid 10673: DEBUG: health check: clearing alarm 2016-06-02 11:14:11: pid 10673: DEBUG: Backend DB node 0 status is 1 2016-06-02 11:14:11: pid 10673: DEBUG: Trying to make persistent DB connection to backend node 0 having status 1 2016-06-02 11:17:01: pid 10673: LOG: failed to connect to PostgreSQL server on "10.10.10.126:5432", timed out {noticed eventually} | ||||
| Tags | No tags attached. | ||||
I think we need to set a timeout, as in the following code. I am not sure where the best place to set it is.

    diff --git a/src/protocol/child.c b/src/protocol/child.c
    index 239b181..98e3487 100644
    --- a/src/protocol/child.c
    +++ b/src/protocol/child.c
    @@ -1267,10 +1267,13 @@ POOL_CONNECTION_POOL_SLOT *make_persistent_db_connection
         PG_TRY();
         {
             send_startup_packet(cp);
    +        pool_set_timeout(30);
             s_do_auth(cp, password);
    +        pool_set_timeout(0);
         }
         PG_CATCH();
         {
    +        pool_set_timeout(0);
             pool_close(cp->con);

Alternatively, the following approach could be considered:

    diff --git a/src/protocol/pool_process_query.c b/src/protocol/pool_process_query
    index 33b752e..eeed37c 100644
    --- a/src/protocol/pool_process_query.c
    +++ b/src/protocol/pool_process_query.c
    @@ -970,6 +970,8 @@ int pool_check_fd(POOL_CONNECTION *cp)
             fds = select(fd+1, &readmask, NULL, &exceptmask, timeoutp);
             if (fds == -1)
             {
    +            if (processState == PERFORMING_HEALTH_CHECK && errno == EINTR)
    +                return 1;
                 if (errno == EAGAIN || errno == EINTR)
                     continue;

Thanks, committed. http://git.postgresql.org/gitweb?p=pgpool2.git;a=commitdiff;h=ed9f2900f1b611f5cfd52e8f758c3616861e60c0

| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2016-06-02 12:14 | harukat | New Issue | |
| 2016-06-02 19:46 | harukat | Note Added: 0000846 | |
| 2016-06-02 19:48 | harukat | Note Edited: 0000846 | |
| 2016-06-03 10:08 | nagata | Assigned To | => nagata |
| 2016-06-03 10:08 | nagata | Status | new => assigned |
| 2016-06-08 21:16 | nagata | Note Added: 0000851 | |
| 2016-06-08 21:16 | nagata | Status | assigned => resolved |
| 2016-06-08 21:16 | nagata | Resolution | open => fixed |