[pgpool-hackers: 1768] Sometimes watchdog regression test fails.

Fri Aug 19 09:42:00 JST 2016

Hi Usama,

Sometimes watchdog regression test fails.

When it fails, regression/log/004.watchdog is as follows:
-------------------------------------------------------------------
creating master pgpool...Satrting set up in raw mode
creating startall and shutdownall
creating failover script
creating database cluster /home/t-ishii/work/pgpool-II/current/pgpool2/src/test/regression/tests/004.watchdog/master/data0...done.
update postgreql.conf
creating pgpool_remote_start
creating basebackup.sh
start all
waiting for pgpool-II coming up...done.
create extensions
Pager usage is off.
 node_id | hostname | port  | status | lb_weight |  role  | select_cnt | load_balance_node | replication_delay 
---------+----------+-------+--------+-----------+--------+------------+-------------------+-------------------
 0       | /tmp     | 11002 | up     | 1.000000  | master | 0          | true              | 0
(1 row)

shutdown all

pgpool-II setting for raw mode is done.
To start the whole system, use /home/t-ishii/work/pgpool-II/current/pgpool2/src/test/regression/tests/004.watchdog/master/startall.
To shutdown the whole system, use /home/t-ishii/work/pgpool-II/current/pgpool2/src/test/regression/tests/004.watchdog/master/shutdownall.
pcp command user name is "t-ishii", password is "t-ishii".
Each PostgreSQL, pgpool-II and pcp port is as follows:
#1 port is 11002
pgpool port is 11000
pcp port is 11001
The info above is in README.port.
master setup done.
waiting for server to start....9766 2016-08-19 09:27:24 JST LOG:  redirecting log output to logging collector process
9766 2016-08-19 09:27:24 JST HINT:  Future log output will appear in directory "pg_log".
 done
server started
creating standby pgpool...Waiting for the pgpool master...
Master brought up successfully.
Waiting for the standby to join cluster...
[check] 1 times
[check] 2 times
[check] 3 times
[check] 4 times
[check] 5 times
[check] 6 times
[check] 7 times
[check] 8 times
[check] 9 times
[check] 10 times
2016-08-19 09:28:00: pid 9977: LOG:  stop request sent to pgpool. waiting for termination...
.done.
Checking if the Standby pgpool-II detected the master shutdown...
Master shutdown detected.
Checking if the Standby pgpool-II takes over the master responsibility...
[check] 1 times
[check] 2 times
[check] 3 times
[check] 4 times
[check] 5 times
Standby successfully became the new master.
2016-08-19 09:28:11: pid 10008: LOG:  stop request sent to pgpool. waiting for termination...
.done.
2016-08-19 09:28:12: pid 10012: FATAL:  could not read pid file
waiting for server to shut down.... done
server stopped
3 out of 4 successfull
-------------------------------------------------------------------

It seems whenever the test fails, that is always at the same place.

Waiting for the standby to join cluster...
[check] 1 times
[check] 2 times
[check] 3 times
[check] 4 times
[check] 5 times
[check] 6 times
[check] 7 times
[check] 8 times
[check] 9 times
[check] 10 times

Any idea?

Best regards,

Pgpool-II log attached.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
-------------- next part --------------
2016-08-19 09:27:25: pid 9775: LOG:  Backend status file /home/t-ishii/work/pgpool-II/current/pgpool2/src/test/regression/tests/004.watchdog/master/log/pgpool_status discarded
2016-08-19 09:27:25: pid 9775: LOG:  waiting for watchdog to initialize
2016-08-19 09:27:25: pid 9780: LOG:  setting the local watchdog node name to "Linux_tishii-CF-SX3HE4BP_11000"
2016-08-19 09:27:25: pid 9780: LOG:  watchdog cluster configured with 1 remote nodes
2016-08-19 09:27:25: pid 9780: LOG:  watchdog remote node:0 on localhost:21104
2016-08-19 09:27:25: pid 9780: LOG:  interface monitoring is disabled in watchdog
2016-08-19 09:27:25: pid 9780: LOG:  IPC socket path: "/tmp/.s.PGPOOLWD_CMD.21004"
2016-08-19 09:27:25: pid 9780: LOG:  watchdog node state changed from [DEAD] to [LOADING]
2016-08-19 09:27:29: pid 9780: LOG:  watchdog node state changed from [LOADING] to [JOINING]
2016-08-19 09:27:34: pid 9780: LOG:  watchdog node state changed from [JOINING] to [INITIALIZING]
2016-08-19 09:27:35: pid 9780: LOG:  I am the only alive node in the watchdog cluster
2016-08-19 09:27:35: pid 9780: HINT:  skiping stand for coordinator state
2016-08-19 09:27:35: pid 9780: LOG:  watchdog node state changed from [INITIALIZING] to [MASTER]
2016-08-19 09:27:35: pid 9780: LOG:  I am announcing my self as master/coordinator watchdog node
2016-08-19 09:27:40: pid 9780: LOG:  I am the cluster leader node
2016-08-19 09:27:40: pid 9780: DETAIL:  our declare coordinator message is accepted by all nodes
2016-08-19 09:27:40: pid 9780: LOG:  I am the cluster leader node. Starting escalation process
2016-08-19 09:27:40: pid 9775: LOG:  watchdog process is initialized
2016-08-19 09:27:40: pid 9780: LOG:  escalation process started with PID:9843
2016-08-19 09:27:40: pid 9780: LOG:  new IPC connection received
2016-08-19 09:27:40: pid 9843: LOG:  watchdog: escalation started
2016-08-19 09:27:40: pid 9780: LOG:  processing master node data request from IPC socket
2016-08-19 09:27:40: pid 9780: LOG:  new IPC connection received
2016-08-19 09:27:40: pid 9775: LOG:  I am the master watchdog node
2016-08-19 09:27:40: pid 9775: DETAIL:  using the local backend node status
2016-08-19 09:27:40: pid 9844: LOG:  2 watchdog nodes are configured for lifecheck
2016-08-19 09:27:40: pid 9844: LOG:  watchdog nodes ID:0 Name:"Linux_tishii-CF-SX3HE4BP_11000"
2016-08-19 09:27:40: pid 9844: DETAIL:  Host:"localhost" WD Port:21004 pgpool-II port:11000
2016-08-19 09:27:40: pid 9844: LOG:  watchdog nodes ID:1 Name:"Not_Set"
2016-08-19 09:27:40: pid 9844: DETAIL:  Host:"localhost" WD Port:21104 pgpool-II port:11100
2016-08-19 09:27:40: pid 9780: LOG:  watchdog escalation process with pid: 9843 exit with SUCCESS.
2016-08-19 09:27:40: pid 9775: LOG:  Setting up socket for 127.0.0.1:11000
2016-08-19 09:27:40: pid 9775: LOG:  pgpool-II successfully started. version 3.6-alpha1 (subaruboshi)
2016-08-19 09:27:40: pid 9877: LOG:  DB node id: 0 backend pid: 9883 statement:  DISCARD ALL
2016-08-19 09:27:40: pid 9780: LOG:  new watchdog node connection is received from "127.0.0.1:41884"
2016-08-19 09:27:41: pid 9845: LOG:  set SO_REUSEPORT option to the socket
2016-08-19 09:27:41: pid 9845: LOG:  creating watchdog heartbeat receive socket.
2016-08-19 09:27:41: pid 9845: DETAIL:  set SO_REUSEPORT
2016-08-19 09:27:41: pid 9846: LOG:  set SO_REUSEPORT option to the socket
2016-08-19 09:27:41: pid 9846: LOG:  creating socket for sending heartbeat
2016-08-19 09:27:41: pid 9846: DETAIL:  set SO_REUSEPORT
2016-08-19 09:27:47: pid 9780: LOG:  new outbond connection to localhost:21104 
2016-08-19 09:27:50: pid 9844: LOG:  watchdog: lifecheck started
2016-08-19 09:28:00: pid 9775: LOG:  received fast shutdown request
2016-08-19 09:28:00: pid 9775: LOG:  shutdown request. closing listen socket
2016-08-19 09:28:00: pid 9852: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9853: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9854: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9847: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9876: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9875: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9780: LOG:  Watchdog is shutting down
2016-08-19 09:28:00: pid 9978: LOG:  watchdog: de-escalation started
2016-08-19 09:28:00: pid 9878: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9877: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9874: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9873: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9872: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9871: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9870: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9869: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9868: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9867: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9866: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9865: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9864: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9863: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9862: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9861: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9860: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9859: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9858: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9857: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9856: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9855: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9851: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9850: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9849: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9848: LOG:  child process received shutdown request signal 2
-------------- next part --------------
2016-08-19 09:27:40: pid 9886: LOG:  Backend status file /home/t-ishii/work/pgpool-II/current/pgpool2/src/test/regression/tests/004.watchdog/master/log/pgpool_status discarded
2016-08-19 09:27:40: pid 9886: LOG:  waiting for watchdog to initialize
2016-08-19 09:27:40: pid 9890: LOG:  setting the local watchdog node name to "Linux_tishii-CF-SX3HE4BP_11100"
2016-08-19 09:27:40: pid 9890: LOG:  watchdog cluster configured with 1 remote nodes
2016-08-19 09:27:40: pid 9890: LOG:  watchdog remote node:0 on localhost:21004
2016-08-19 09:27:40: pid 9890: LOG:  interface monitoring is disabled in watchdog
2016-08-19 09:27:40: pid 9890: LOG:  IPC socket path: "/tmp/.s.PGPOOLWD_CMD.21104"
2016-08-19 09:27:40: pid 9890: LOG:  watchdog node state changed from [DEAD] to [LOADING]
2016-08-19 09:27:40: pid 9890: LOG:  new outbond connection to localhost:21004 
2016-08-19 09:27:40: pid 9890: LOG:  watchdog node state changed from [LOADING] to [INITIALIZING]
2016-08-19 09:27:41: pid 9890: LOG:  watchdog node state changed from [INITIALIZING] to [STANDBY]
2016-08-19 09:27:41: pid 9886: LOG:  watchdog process is initialized
2016-08-19 09:27:41: pid 9890: LOG:  new IPC connection received
2016-08-19 09:27:41: pid 9890: LOG:  processing master node data request from IPC socket
2016-08-19 09:27:41: pid 9890: LOG:  data request from IPC socket is forwarded to master watchdog node "Linux_tishii-CF-SX3HE4BP_11000"
2016-08-19 09:27:41: pid 9890: DETAIL:  waiting for the reply from master node...
2016-08-19 09:27:41: pid 9890: LOG:  new IPC connection received
2016-08-19 09:27:41: pid 9886: LOG:  master watchdog node "Linux_tishii-CF-SX3HE4BP_11000" returned status for 1 backend nodes
2016-08-19 09:27:41: pid 9886: LOG:  primary node on master watchdog node "Linux_tishii-CF-SX3HE4BP_11000" is -1
2016-08-19 09:27:41: pid 9891: LOG:  2 watchdog nodes are configured for lifecheck
2016-08-19 09:27:41: pid 9891: LOG:  watchdog nodes ID:0 Name:"Linux_tishii-CF-SX3HE4BP_11100"
2016-08-19 09:27:41: pid 9891: DETAIL:  Host:"localhost" WD Port:21104 pgpool-II port:11100
2016-08-19 09:27:41: pid 9891: LOG:  watchdog nodes ID:1 Name:"Linux_tishii-CF-SX3HE4BP_11000"
2016-08-19 09:27:41: pid 9891: DETAIL:  Host:"localhost" WD Port:21004 pgpool-II port:11000
2016-08-19 09:27:41: pid 9886: LOG:  Setting up socket for 127.0.0.1:11100
2016-08-19 09:27:41: pid 9886: LOG:  pgpool-II successfully started. version 3.6-alpha1 (subaruboshi)
2016-08-19 09:27:42: pid 9892: LOG:  set SO_REUSEPORT option to the socket
2016-08-19 09:27:42: pid 9892: LOG:  creating watchdog heartbeat receive socket.
2016-08-19 09:27:42: pid 9892: DETAIL:  set SO_REUSEPORT
2016-08-19 09:27:42: pid 9893: LOG:  set SO_REUSEPORT option to the socket
2016-08-19 09:27:42: pid 9893: LOG:  creating socket for sending heartbeat
2016-08-19 09:27:42: pid 9893: DETAIL:  set SO_REUSEPORT
2016-08-19 09:27:47: pid 9890: LOG:  new watchdog node connection is received from "127.0.0.1:2186"
2016-08-19 09:27:51: pid 9891: LOG:  watchdog: lifecheck started
2016-08-19 09:28:00: pid 9890: LOG:  read from socket failed, remote end closed the connection
2016-08-19 09:28:00: pid 9890: LOG:  remote node "Linux_tishii-CF-SX3HE4BP_11000" is shutting down
2016-08-19 09:28:00: pid 9890: LOG:  watchdog cluster has lost the coordinator node
2016-08-19 09:28:00: pid 9890: LOG:  watchdog node state changed from [STANDBY] to [JOINING]
2016-08-19 09:28:00: pid 9890: LOG:  failed to send command id, Socket:8 Type:[NODE INFO], Command_ID:11, data Length:165
2016-08-19 09:28:00: pid 9890: DETAIL:  Broken pipe
2016-08-19 09:28:05: pid 9890: LOG:  watchdog node state changed from [JOINING] to [INITIALIZING]
2016-08-19 09:28:06: pid 9890: LOG:  I am the only alive node in the watchdog cluster
2016-08-19 09:28:06: pid 9890: HINT:  skiping stand for coordinator state
2016-08-19 09:28:06: pid 9890: LOG:  watchdog node state changed from [INITIALIZING] to [MASTER]
2016-08-19 09:28:06: pid 9890: LOG:  I am announcing my self as master/coordinator watchdog node
2016-08-19 09:28:11: pid 9890: LOG:  I am the cluster leader node
2016-08-19 09:28:11: pid 9890: DETAIL:  our declare coordinator message is accepted by all nodes
2016-08-19 09:28:11: pid 9890: LOG:  I am the cluster leader node. Starting escalation process
2016-08-19 09:28:11: pid 9890: LOG:  escalation process started with PID:10006
2016-08-19 09:28:11: pid 10006: LOG:  watchdog: escalation started
2016-08-19 09:28:11: pid 9890: LOG:  watchdog escalation process with pid: 10006 exit with SUCCESS.
2016-08-19 09:28:11: pid 9886: LOG:  received fast shutdown request
2016-08-19 09:28:11: pid 9886: LOG:  shutdown request. closing listen socket
2016-08-19 09:28:11: pid 9896: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:11: pid 9894: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:11: pid 9899: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:11: pid 9890: LOG:  Watchdog is shutting down
2016-08-19 09:28:11: pid 9898: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:11: pid 9897: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:11: pid 9903: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:11: pid 9902: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:11: pid 9901: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:11: pid 9924: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:11: pid 10009: LOG:  watchdog: de-escalation started
equest signal 2
2016-08-19 09:28:11: pid 9922: LOG:  child process received shutdown request signal 2
2016-08-19 09:28:11: pid 9925: LOG:  child process received shutdown request signal 2