[pgpool-hackers: 1768] Sometimes watchdog regression test fails.
Tatsuo Ishii
ishii at sraoss.co.jp
Fri Aug 19 09:42:00 JST 2016
Hi Usama,
Sometimes watchdog regression test fails.
When it fails, regression/log/004.watchdog is as follows:
-------------------------------------------------------------------
creating master pgpool...Satrting set up in raw mode
creating startall and shutdownall
creating failover script
creating database cluster /home/t-ishii/work/pgpool-II/current/pgpool2/src/test/regression/tests/004.watchdog/master/data0...done.
update postgreql.conf
creating pgpool_remote_start
creating basebackup.sh
start all
waiting for pgpool-II coming up...done.
create extensions
Pager usage is off.
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+----------+-------+--------+-----------+--------+------------+-------------------+-------------------
0 | /tmp | 11002 | up | 1.000000 | master | 0 | true | 0
(1 row)
shutdown all
pgpool-II setting for raw mode is done.
To start the whole system, use /home/t-ishii/work/pgpool-II/current/pgpool2/src/test/regression/tests/004.watchdog/master/startall.
To shutdown the whole system, use /home/t-ishii/work/pgpool-II/current/pgpool2/src/test/regression/tests/004.watchdog/master/shutdownall.
pcp command user name is "t-ishii", password is "t-ishii".
Each PostgreSQL, pgpool-II and pcp port is as follows:
#1 port is 11002
pgpool port is 11000
pcp port is 11001
The info above is in README.port.
master setup done.
waiting for server to start....9766 2016-08-19 09:27:24 JST LOG: redirecting log output to logging collector process
9766 2016-08-19 09:27:24 JST HINT: Future log output will appear in directory "pg_log".
done
server started
creating standby pgpool...Waiting for the pgpool master...
Master brought up successfully.
Waiting for the standby to join cluster...
[check] 1 times
[check] 2 times
[check] 3 times
[check] 4 times
[check] 5 times
[check] 6 times
[check] 7 times
[check] 8 times
[check] 9 times
[check] 10 times
2016-08-19 09:28:00: pid 9977: LOG: stop request sent to pgpool. waiting for termination...
.done.
Checking if the Standby pgpool-II detected the master shutdown...
Master shutdown detected.
Checking if the Standby pgpool-II takes over the master responsibility...
[check] 1 times
[check] 2 times
[check] 3 times
[check] 4 times
[check] 5 times
Standby successfully became the new master.
2016-08-19 09:28:11: pid 10008: LOG: stop request sent to pgpool. waiting for termination...
.done.
2016-08-19 09:28:12: pid 10012: FATAL: could not read pid file
waiting for server to shut down.... done
server stopped
3 out of 4 successfull
-------------------------------------------------------------------
It seems whenever the test fails, that is always at the same place.
Waiting for the standby to join cluster...
[check] 1 times
[check] 2 times
[check] 3 times
[check] 4 times
[check] 5 times
[check] 6 times
[check] 7 times
[check] 8 times
[check] 9 times
[check] 10 times
Any idea?
Best regards,
Pgpool-II log attached.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
-------------- next part --------------
2016-08-19 09:27:25: pid 9775: LOG: Backend status file /home/t-ishii/work/pgpool-II/current/pgpool2/src/test/regression/tests/004.watchdog/master/log/pgpool_status discarded
2016-08-19 09:27:25: pid 9775: LOG: waiting for watchdog to initialize
2016-08-19 09:27:25: pid 9780: LOG: setting the local watchdog node name to "Linux_tishii-CF-SX3HE4BP_11000"
2016-08-19 09:27:25: pid 9780: LOG: watchdog cluster configured with 1 remote nodes
2016-08-19 09:27:25: pid 9780: LOG: watchdog remote node:0 on localhost:21104
2016-08-19 09:27:25: pid 9780: LOG: interface monitoring is disabled in watchdog
2016-08-19 09:27:25: pid 9780: LOG: IPC socket path: "/tmp/.s.PGPOOLWD_CMD.21004"
2016-08-19 09:27:25: pid 9780: LOG: watchdog node state changed from [DEAD] to [LOADING]
2016-08-19 09:27:29: pid 9780: LOG: watchdog node state changed from [LOADING] to [JOINING]
2016-08-19 09:27:34: pid 9780: LOG: watchdog node state changed from [JOINING] to [INITIALIZING]
2016-08-19 09:27:35: pid 9780: LOG: I am the only alive node in the watchdog cluster
2016-08-19 09:27:35: pid 9780: HINT: skiping stand for coordinator state
2016-08-19 09:27:35: pid 9780: LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
2016-08-19 09:27:35: pid 9780: LOG: I am announcing my self as master/coordinator watchdog node
2016-08-19 09:27:40: pid 9780: LOG: I am the cluster leader node
2016-08-19 09:27:40: pid 9780: DETAIL: our declare coordinator message is accepted by all nodes
2016-08-19 09:27:40: pid 9780: LOG: I am the cluster leader node. Starting escalation process
2016-08-19 09:27:40: pid 9775: LOG: watchdog process is initialized
2016-08-19 09:27:40: pid 9780: LOG: escalation process started with PID:9843
2016-08-19 09:27:40: pid 9780: LOG: new IPC connection received
2016-08-19 09:27:40: pid 9843: LOG: watchdog: escalation started
2016-08-19 09:27:40: pid 9780: LOG: processing master node data request from IPC socket
2016-08-19 09:27:40: pid 9780: LOG: new IPC connection received
2016-08-19 09:27:40: pid 9775: LOG: I am the master watchdog node
2016-08-19 09:27:40: pid 9775: DETAIL: using the local backend node status
2016-08-19 09:27:40: pid 9844: LOG: 2 watchdog nodes are configured for lifecheck
2016-08-19 09:27:40: pid 9844: LOG: watchdog nodes ID:0 Name:"Linux_tishii-CF-SX3HE4BP_11000"
2016-08-19 09:27:40: pid 9844: DETAIL: Host:"localhost" WD Port:21004 pgpool-II port:11000
2016-08-19 09:27:40: pid 9844: LOG: watchdog nodes ID:1 Name:"Not_Set"
2016-08-19 09:27:40: pid 9844: DETAIL: Host:"localhost" WD Port:21104 pgpool-II port:11100
2016-08-19 09:27:40: pid 9780: LOG: watchdog escalation process with pid: 9843 exit with SUCCESS.
2016-08-19 09:27:40: pid 9775: LOG: Setting up socket for 127.0.0.1:11000
2016-08-19 09:27:40: pid 9775: LOG: pgpool-II successfully started. version 3.6-alpha1 (subaruboshi)
2016-08-19 09:27:40: pid 9877: LOG: DB node id: 0 backend pid: 9883 statement: DISCARD ALL
2016-08-19 09:27:40: pid 9780: LOG: new watchdog node connection is received from "127.0.0.1:41884"
2016-08-19 09:27:41: pid 9845: LOG: set SO_REUSEPORT option to the socket
2016-08-19 09:27:41: pid 9845: LOG: creating watchdog heartbeat receive socket.
2016-08-19 09:27:41: pid 9845: DETAIL: set SO_REUSEPORT
2016-08-19 09:27:41: pid 9846: LOG: set SO_REUSEPORT option to the socket
2016-08-19 09:27:41: pid 9846: LOG: creating socket for sending heartbeat
2016-08-19 09:27:41: pid 9846: DETAIL: set SO_REUSEPORT
2016-08-19 09:27:47: pid 9780: LOG: new outbond connection to localhost:21104
2016-08-19 09:27:50: pid 9844: LOG: watchdog: lifecheck started
2016-08-19 09:28:00: pid 9775: LOG: received fast shutdown request
2016-08-19 09:28:00: pid 9775: LOG: shutdown request. closing listen socket
2016-08-19 09:28:00: pid 9852: LOG: child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9853: LOG: child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9854: LOG: child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9847: LOG: child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9876: LOG: child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9875: LOG: child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9780: LOG: Watchdog is shutting down
2016-08-19 09:28:00: pid 9978: LOG: watchdog: de-escalation started
2016-08-19 09:28:00: pid 9878: LOG: child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9877: LOG: child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9874: LOG: child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9873: LOG: child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9872: LOG: child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9871: LOG: child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9870: LOG: child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9869: LOG: child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9868: LOG: child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9867: LOG: child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9866: LOG: child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9865: LOG: child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9864: LOG: child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9863: LOG: child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9862: LOG: child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9861: LOG: child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9860: LOG: child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9859: LOG: child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9858: LOG: child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9857: LOG: child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9856: LOG: child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9855: LOG: child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9851: LOG: child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9850: LOG: child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9849: LOG: child process received shutdown request signal 2
2016-08-19 09:28:00: pid 9848: LOG: child process received shutdown request signal 2
-------------- next part --------------
2016-08-19 09:27:40: pid 9886: LOG: Backend status file /home/t-ishii/work/pgpool-II/current/pgpool2/src/test/regression/tests/004.watchdog/master/log/pgpool_status discarded
2016-08-19 09:27:40: pid 9886: LOG: waiting for watchdog to initialize
2016-08-19 09:27:40: pid 9890: LOG: setting the local watchdog node name to "Linux_tishii-CF-SX3HE4BP_11100"
2016-08-19 09:27:40: pid 9890: LOG: watchdog cluster configured with 1 remote nodes
2016-08-19 09:27:40: pid 9890: LOG: watchdog remote node:0 on localhost:21004
2016-08-19 09:27:40: pid 9890: LOG: interface monitoring is disabled in watchdog
2016-08-19 09:27:40: pid 9890: LOG: IPC socket path: "/tmp/.s.PGPOOLWD_CMD.21104"
2016-08-19 09:27:40: pid 9890: LOG: watchdog node state changed from [DEAD] to [LOADING]
2016-08-19 09:27:40: pid 9890: LOG: new outbond connection to localhost:21004
2016-08-19 09:27:40: pid 9890: LOG: watchdog node state changed from [LOADING] to [INITIALIZING]
2016-08-19 09:27:41: pid 9890: LOG: watchdog node state changed from [INITIALIZING] to [STANDBY]
2016-08-19 09:27:41: pid 9886: LOG: watchdog process is initialized
2016-08-19 09:27:41: pid 9890: LOG: new IPC connection received
2016-08-19 09:27:41: pid 9890: LOG: processing master node data request from IPC socket
2016-08-19 09:27:41: pid 9890: LOG: data request from IPC socket is forwarded to master watchdog node "Linux_tishii-CF-SX3HE4BP_11000"
2016-08-19 09:27:41: pid 9890: DETAIL: waiting for the reply from master node...
2016-08-19 09:27:41: pid 9890: LOG: new IPC connection received
2016-08-19 09:27:41: pid 9886: LOG: master watchdog node "Linux_tishii-CF-SX3HE4BP_11000" returned status for 1 backend nodes
2016-08-19 09:27:41: pid 9886: LOG: primary node on master watchdog node "Linux_tishii-CF-SX3HE4BP_11000" is -1
2016-08-19 09:27:41: pid 9891: LOG: 2 watchdog nodes are configured for lifecheck
2016-08-19 09:27:41: pid 9891: LOG: watchdog nodes ID:0 Name:"Linux_tishii-CF-SX3HE4BP_11100"
2016-08-19 09:27:41: pid 9891: DETAIL: Host:"localhost" WD Port:21104 pgpool-II port:11100
2016-08-19 09:27:41: pid 9891: LOG: watchdog nodes ID:1 Name:"Linux_tishii-CF-SX3HE4BP_11000"
2016-08-19 09:27:41: pid 9891: DETAIL: Host:"localhost" WD Port:21004 pgpool-II port:11000
2016-08-19 09:27:41: pid 9886: LOG: Setting up socket for 127.0.0.1:11100
2016-08-19 09:27:41: pid 9886: LOG: pgpool-II successfully started. version 3.6-alpha1 (subaruboshi)
2016-08-19 09:27:42: pid 9892: LOG: set SO_REUSEPORT option to the socket
2016-08-19 09:27:42: pid 9892: LOG: creating watchdog heartbeat receive socket.
2016-08-19 09:27:42: pid 9892: DETAIL: set SO_REUSEPORT
2016-08-19 09:27:42: pid 9893: LOG: set SO_REUSEPORT option to the socket
2016-08-19 09:27:42: pid 9893: LOG: creating socket for sending heartbeat
2016-08-19 09:27:42: pid 9893: DETAIL: set SO_REUSEPORT
2016-08-19 09:27:47: pid 9890: LOG: new watchdog node connection is received from "127.0.0.1:2186"
2016-08-19 09:27:51: pid 9891: LOG: watchdog: lifecheck started
2016-08-19 09:28:00: pid 9890: LOG: read from socket failed, remote end closed the connection
2016-08-19 09:28:00: pid 9890: LOG: remote node "Linux_tishii-CF-SX3HE4BP_11000" is shutting down
2016-08-19 09:28:00: pid 9890: LOG: watchdog cluster has lost the coordinator node
2016-08-19 09:28:00: pid 9890: LOG: watchdog node state changed from [STANDBY] to [JOINING]
2016-08-19 09:28:00: pid 9890: LOG: failed to send command id, Socket:8 Type:[NODE INFO], Command_ID:11, data Length:165
2016-08-19 09:28:00: pid 9890: DETAIL: Broken pipe
2016-08-19 09:28:05: pid 9890: LOG: watchdog node state changed from [JOINING] to [INITIALIZING]
2016-08-19 09:28:06: pid 9890: LOG: I am the only alive node in the watchdog cluster
2016-08-19 09:28:06: pid 9890: HINT: skiping stand for coordinator state
2016-08-19 09:28:06: pid 9890: LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
2016-08-19 09:28:06: pid 9890: LOG: I am announcing my self as master/coordinator watchdog node
2016-08-19 09:28:11: pid 9890: LOG: I am the cluster leader node
2016-08-19 09:28:11: pid 9890: DETAIL: our declare coordinator message is accepted by all nodes
2016-08-19 09:28:11: pid 9890: LOG: I am the cluster leader node. Starting escalation process
2016-08-19 09:28:11: pid 9890: LOG: escalation process started with PID:10006
2016-08-19 09:28:11: pid 10006: LOG: watchdog: escalation started
2016-08-19 09:28:11: pid 9890: LOG: watchdog escalation process with pid: 10006 exit with SUCCESS.
2016-08-19 09:28:11: pid 9886: LOG: received fast shutdown request
2016-08-19 09:28:11: pid 9886: LOG: shutdown request. closing listen socket
2016-08-19 09:28:11: pid 9896: LOG: child process received shutdown request signal 2
2016-08-19 09:28:11: pid 9894: LOG: child process received shutdown request signal 2
2016-08-19 09:28:11: pid 9899: LOG: child process received shutdown request signal 2
2016-08-19 09:28:11: pid 9890: LOG: Watchdog is shutting down
2016-08-19 09:28:11: pid 9898: LOG: child process received shutdown request signal 2
2016-08-19 09:28:11: pid 9897: LOG: child process received shutdown request signal 2
2016-08-19 09:28:11: pid 9903: LOG: child process received shutdown request signal 2
2016-08-19 09:28:11: pid 9902: LOG: child process received shutdown request signal 2
2016-08-19 09:28:11: pid 9901: LOG: child process received shutdown request signal 2
2016-08-19 09:28:11: pid 9924: LOG: child process received shutdown request signal 2
2016-08-19 09:28:11: pid 10009: LOG: watchdog: de-escalation started
equest signal 2
2016-08-19 09:28:11: pid 9922: LOG: child process received shutdown request signal 2
2016-08-19 09:28:11: pid 9925: LOG: child process received shutdown request signal 2
More information about the pgpool-hackers
mailing list