View Issue Details

ID: 0000204
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2016-06-08 21:16
Reporter: harukat
Assigned To: nagata
Priority: normal
Severity: minor
Reproducibility: random
Status: resolved
Resolution: fixed
Product Version: 3.4.5
Target Version:
Fixed in Version:
Summary: 0000204: healthcheck get stuck
Description:
The Pgpool-II main process got stuck in the health check
when the database server machine was powered down.

#0 0x0000003eb70e13e3 in select () from /lib64/libc.so.6
#1 0x000000000042ba2b in pool_check_fd ()
#2 0x0000000000466f9b in pool_read ()
#3 0x0000000000467300 in pool_read_with_error ()
#4 0x000000000042366f in s_do_auth ()
#5 0x0000000000423e8a in make_persistent_db_connection ()
#6 0x0000000000405f4a in do_health_check ()
#7 0x0000000000409f5a in PgpoolMain ()
#8 0x0000000000404501 in main ()

It occurs only rarely in my tests.
I reproduced it with gdb and iptables.
Steps To Reproduce:
PostgreSQL version: 9.4.5
pgpool-II version: V3_4_STABLE head (at 2016-06-02 10:00 JST)

postgresql.conf:
 listen_addresses = '*'
pg_hba.conf:
 host all all 10.10.10.72/32 trust

[pghost]$ pg_ctl start
[pghost]$ createuser nobody

pgpool.conf (changes from pgpool.conf.sample):
 backend_hostname0 = '10.10.10.126'
 debug_level = 1
 health_check_period = 1
 health_check_max_retries = 2
 log_min_messages = debug1
 pid_file_name = '/tmp/pgpool2/pgpool.pid'
 logdir = '/tmp/pgpool2'

[poolhost]$ pgpool -D -n &>/tmp/pgpool2/log # start
[poolhost]$ ps x
[poolhost]$ gdb -p 10673
(gdb) b send_startup_packet
Breakpoint 1 at 0x4236a0: file protocol/child.c, line 638.
(gdb) c
Continuing.

Breakpoint 1, send_startup_packet (cp=0x1548fe0) at protocol/child.c:638

[pghost]# iptables -A OUTPUT -p tcp --sport 5432 -j DROP

[poolhost]
(gdb) c
Continuing.

pgpool.log:
2016-06-02 10:58:34: pid 10673: DEBUG: starting health check
2016-06-02 10:58:34: pid 10673: DEBUG: health check: clearing alarm
2016-06-02 10:58:34: pid 10673: DEBUG: Backend DB node 0 status is 1
2016-06-02 10:58:34: pid 10673: DEBUG: Trying to make persistent DB connection to backend node 0 having status 1
2016-06-02 10:58:34: pid 10673: DEBUG: SSL is requested but SSL support is not available
2016-06-02 10:58:34: pid 10673: DEBUG: authenticate kind = 0
2016-06-02 10:58:34: pid 10673: DEBUG: authenticate backend: key data received
2016-06-02 10:58:34: pid 10673: DEBUG: authenticate backend: transaction state: I
2016-06-02 10:58:34: pid 10673: DEBUG: persistent DB connection to backend node 0 having status 1 is successful
2016-06-02 10:58:34: pid 10673: DEBUG: health check: clearing alarm
2016-06-02 10:58:34: pid 10673: DEBUG: health check: clearing alarm

 {stopped at the breakpoint and continued with gdb here}

2016-06-02 10:58:35: pid 10673: DEBUG: starting health check
2016-06-02 10:58:35: pid 10673: DEBUG: health check: clearing alarm

 {stopped at the breakpoint and continued with gdb here}

2016-06-02 10:58:39: pid 10673: DEBUG: starting health check
2016-06-02 10:58:39: pid 10673: DEBUG: health check: clearing alarm
2016-06-02 10:58:39: pid 10673: DEBUG: Backend DB node 0 status is 1
2016-06-02 10:58:39: pid 10673: DEBUG: Trying to make persistent DB connection to backend node 0 having status 1
2016-06-02 10:58:39: pid 10673: DEBUG: SSL is requested but SSL support is not available

 {stuck for a long time here}

2016-06-02 11:14:11: pid 10673: LOG: notice_backend_error: called from pgpool main. ignored.
2016-06-02 11:14:11: pid 10673: WARNING: child_exit: called from invalid process. ignored.
2016-06-02 11:14:11: pid 10673: ERROR: unable to read data from DB node 0
2016-06-02 11:14:11: pid 10673: DETAIL: socket read failed with an error "Success"
2016-06-02 11:14:11: pid 10673: DEBUG: health check: clearing alarm
2016-06-02 11:14:11: pid 10673: DEBUG: Backend DB node 0 status is 1
2016-06-02 11:14:11: pid 10673: DEBUG: Trying to make persistent DB connection to backend node 0 having status 1
2016-06-02 11:17:01: pid 10673: LOG: failed to connect to PostgreSQL server on "10.10.10.126:5432", timed out

 {the failure was finally noticed}
Tags: No tags attached.

Activities

harukat

2016-06-02 19:46

developer   ~0000846

Last edited: 2016-06-02 19:48

I think we need to set a timeout, as in the following code.
I am not sure where the best place to set the timeout is.

diff --git a/src/protocol/child.c b/src/protocol/child.c
index 239b181..98e3487 100644
--- a/src/protocol/child.c
+++ b/src/protocol/child.c
@@ -1267,10 +1267,13 @@ POOL_CONNECTION_POOL_SLOT *make_persistent_db_connection
        PG_TRY();
        {
                send_startup_packet(cp);
+               pool_set_timeout(30);
                s_do_auth(cp, password);
+               pool_set_timeout(0);
        }
        PG_CATCH();
        {
+               pool_set_timeout(0);
                pool_close(cp->con);
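
To illustrate the idea behind this patch: a read guarded by a bounded select() gives up instead of blocking forever when the peer silently disappears. The following is only a minimal standalone sketch, not pgpool-II code. The names wait_readable and read_with_timeout are hypothetical, and it assumes that pool_set_timeout() ends up bounding the select() wait in pool_check_fd(), which the excerpt above does not show.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait until fd is readable or timeout_sec elapses.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 */
int wait_readable(int fd, int timeout_sec)
{
    assert(timeout_sec >= 0);

    for (;;)
    {
        fd_set readmask;
        struct timeval tv;

        FD_ZERO(&readmask);
        FD_SET(fd, &readmask);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;

        int fds = select(fd + 1, &readmask, NULL, NULL, &tv);
        if (fds == -1)
        {
            /* Simplification: a retry starts the full timeout again. */
            if (errno == EINTR || errno == EAGAIN)
                continue;
            return -1;
        }
        return fds > 0 ? 1 : 0;
    }
}

/*
 * Hypothetical helper: read at most len bytes, giving up after timeout_sec.
 * Returns the byte count (0 means EOF), -1 on error, -2 on timeout.
 */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_sec)
{
    int ready = wait_readable(fd, timeout_sec);

    if (ready == 0)
        return -2;
    if (ready < 0)
        return -1;
    return read(fd, buf, len);
}
```

With a timeout in place, a backend that stops answering after the startup packet makes the authentication read fail after the configured number of seconds, so the caller can treat the node as unreachable.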



Alternatively, the following approach could be considered.

diff --git a/src/protocol/pool_process_query.c b/src/protocol/pool_process_query
index 33b752e..eeed37c 100644
--- a/src/protocol/pool_process_query.c
+++ b/src/protocol/pool_process_query.c
@@ -970,6 +970,8 @@ int pool_check_fd(POOL_CONNECTION *cp)
                fds = select(fd+1, &readmask, NULL, &exceptmask, timeoutp);
                if (fds == -1)
                {
+                       if (processState == PERFORMING_HEALTH_CHECK && errno == EINTR)
+                               return 1;
                        if (errno == EAGAIN || errno == EINTR)
                                continue;
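
The reasoning behind this second patch, as I understand it: the health check timeout is delivered as a signal, select() then fails with EINTR, and a loop that always retries on EINTR never lets the timeout take effect. The following standalone sketch shows that pattern outside of pgpool-II. The names on_alarm and check_fd_with_alarm are hypothetical, and the return convention (1 means "gave up") is only an assumption modelled on the "return 1" in the diff.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

static volatile sig_atomic_t alarm_fired = 0;

/* Signal handler: only records that the time limit was reached. */
static void on_alarm(int signo)
{
    (void) signo;
    alarm_fired = 1;
}

/*
 * Hypothetical helper: block until fd is readable, but give up when a
 * SIGALRM scheduled limit_sec seconds ahead interrupts the wait.
 * Returns 0 if readable, 1 if interrupted by the alarm, -1 on error.
 */
int check_fd_with_alarm(int fd, unsigned int limit_sec)
{
    struct sigaction sa, old;
    int result;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, &old) == -1)
        return -1;

    alarm_fired = 0;
    alarm(limit_sec);

    for (;;)
    {
        fd_set readmask;

        FD_ZERO(&readmask);
        FD_SET(fd, &readmask);

        /* No select() timeout here: only the alarm can end the wait. */
        int fds = select(fd + 1, &readmask, NULL, NULL, NULL);
        if (fds == -1)
        {
            if (errno == EINTR && alarm_fired)
            {
                result = 1;     /* time limit reached: stop retrying */
                break;
            }
            if (errno == EINTR || errno == EAGAIN)
                continue;       /* unrelated interruption: retry */
            result = -1;
            break;
        }
        result = 0;
        break;
    }

    alarm(0);                       /* cancel a pending alarm */
    sigaction(SIGALRM, &old, NULL); /* restore the previous handler */
    return result;
}
```

Note that this sketch has a small race: if the alarm fires before select() is entered, the wait is not interrupted. A select() timeout or pselect() with a blocked signal mask avoids that.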

nagata

2016-06-08 21:16

developer   ~0000851

Thanks, committed.

http://git.postgresql.org/gitweb?p=pgpool2.git;a=commitdiff;h=ed9f2900f1b611f5cfd52e8f758c3616861e60c0

Issue History

Date Modified Username Field Change
2016-06-02 12:14 harukat New Issue
2016-06-02 19:46 harukat Note Added: 0000846
2016-06-02 19:48 harukat Note Edited: 0000846
2016-06-03 10:08 nagata Assigned To => nagata
2016-06-03 10:08 nagata Status new => assigned
2016-06-08 21:16 nagata Note Added: 0000851
2016-06-08 21:16 nagata Status assigned => resolved
2016-06-08 21:16 nagata Resolution open => fixed