View Issue Details

IDProjectCategoryView StatusLast Update
0000219Pgpool-IIBugpublic2016-07-26 07:32
Reporterstivi21 Assigned To 
PriorityhighSeveritymajorReproducibilityalways
Status closedResolutionopen 
OSUbuntuOS Version14.04 
Summary0000219: Inconsistent status of Postgresql nodes in pgPool instances after restart. Watchdog not syncing status.
DescriptionPgpool: 3.5.3
Postgresql: 9.4
Pgpool configuration: replication mode, using watchdog but without VIP, haproxy instead.
Environment: Amazon Web Services, EC2

Host0 (vir-stag-pg-00): pgpool and postgres instance (master)
Host1 (vir-stag-pg-01): pgpool and postgres instance (slave)
Host2 (vir-stag-pg-02): pgpool instance (for quorum)
Steps To Reproduce1) All hosts are working with statuses:
# Status od postgresql (the same status of all instances od pgpool):
 node_id | hostname | port | status | lb_weight | role | select_cnt
---------+----------------+------+--------+-----------+---------+------------
 0 | vir-stag-pg-00 | 5432 | 2 | 0.500000 | primary | 1023309
 1 | vir-stag-pg-01 | 5432 | 3 | 0.500000 | standby | 0

# Watchdog status (from Host0):
3 YES Linux_ip-10-0-1-129_5433 vir-stag-pg-00

Linux_ip-10-0-1-129_5433 vir-stag-pg-00 5433 9000 4 MASTER
Linux_ip-10-0-112-190_5433 vir-stag-pg-01 5433 9000 7 STANDBY
Linux_ip-10-0-2-132_5433 vir-stag-pg-02 5433 9000 7 STANDBY

2) Host1 going to stop for few minutes, then start again

3) After this, on Host0 and Host2 are the same status, where postgres from host2 have status 3:
 node_id | hostname | port | status | lb_weight | role | select_cnt
---------+----------------+------+--------+-----------+---------+------------
 0 | vir-stag-pg-00 | 5432 | 2 | 0.500000 | primary | 1026367
 1 | vir-stag-pg-01 | 5432 | 3 | 0.500000 | standby | 0

On Host1 both postgres nodes has status 2:
 node_id | hostname | port | status | lb_weight | role | select_cnt
---------+----------------+------+--------+-----------+---------+------------
 0 | vir-stag-pg-00 | 5432 | 2 | 0.500000 | primary | 5
 1 | vir-stag-pg-01 | 5432 | 2 | 0.500000 | standby | 0

Status of watchdog is ok on all pgpool instances:
3 YES Linux_ip-10-0-1-129_5433 vir-stag-pg-00

Linux_ip-10-0-1-129_5433 vir-stag-pg-00 5433 9000 4 MASTER
Linux_ip-10-0-112-190_5433 vir-stag-pg-01 5433 9000 7 STANDBY
Linux_ip-10-0-2-132_5433 vir-stag-pg-02 5433 9000 7 STANDBY
Additional InformationThe same situation, when I restarted Host0, but after restart pgpool on this host see 2 masters postgresql (backause slave was promoted to master), but both are active, another pgpool instances have first pgpool node as detached.



Logs from pgpool on Host2 (node "Linux_ip-10-0-112-190_5433" is pgpool on restarted Host1):

### Host1 stopped
Jul 15 09:15:07 ip-10-0-2-132 pgpool[7154]: [40-1] 2016-07-15 09:15:07: pid 7154: LOG: read from socket failed, remote end closed the connection
Jul 15 09:15:07 ip-10-0-2-132 pgpool: 2016-07-15 09:15:07: pid 7154: LOG: read from socket failed, remote end closed the connection
Jul 15 09:15:07 ip-10-0-2-132 pgpool[7154]: [41-1] 2016-07-15 09:15:07: pid 7154: LOG: remote node "Linux_ip-10-0-112-190_5433" is shutting down
Jul 15 09:15:07 ip-10-0-2-132 pgpool: 2016-07-15 09:15:07: pid 7154: LOG: remote node "Linux_ip-10-0-112-190_5433" is shutting down
Jul 15 09:15:07 ip-10-0-2-132 pgpool[7154]: [42-1] 2016-07-15 09:15:07: pid 7154: LOG: read from socket failed, remote end closed the connection
Jul 15 09:15:07 ip-10-0-2-132 pgpool: 2016-07-15 09:15:07: pid 7154: LOG: read from socket failed, remote end closed the connection
Jul 15 09:15:08 ip-10-0-2-132 pgpool[7154]: [43-1] 2016-07-15 09:15:08: pid 7154: LOG: received degenerate backend request for node_id: 1 from pid [7154]
Jul 15 09:15:08 ip-10-0-2-132 pgpool: 2016-07-15 09:15:08: pid 7154: LOG: received degenerate backend request for node_id: 1 from pid [7154]
Jul 15 09:15:08 ip-10-0-2-132 pgpool[7150]: [44-1] 2016-07-15 09:15:08: pid 7150: LOG: starting degeneration. shutdown host vir-stag-pg-01(5432)
Jul 15 09:15:08 ip-10-0-2-132 pgpool: 2016-07-15 09:15:08: pid 7150: LOG: starting degeneration. shutdown host vir-stag-pg-01(5432)
Jul 15 09:15:08 ip-10-0-2-132 pgpool[7154]: [44-1] 2016-07-15 09:15:08: pid 7154: LOG: sending watchdog response
Jul 15 09:15:08 ip-10-0-2-132 pgpool[7154]: [44-2] 2016-07-15 09:15:08: pid 7154: DETAIL: failover request from other pgpool is canceled because of switching
Jul 15 09:15:08 ip-10-0-2-132 pgpool: 2016-07-15 09:15:08: pid 7154: LOG: sending watchdog response
Jul 15 09:15:08 ip-10-0-2-132 pgpool: 2016-07-15 09:15:08: pid 7154: DETAIL: failover request from other pgpool is canceled because of switching
Jul 15 09:15:08 ip-10-0-2-132 pgpool[7150]: [45-1] 2016-07-15 09:15:08: pid 7150: LOG: Restart all children
Jul 15 09:15:08 ip-10-0-2-132 pgpool: 2016-07-15 09:15:08: pid 7150: LOG: Restart all children
Jul 15 09:15:08 ip-10-0-2-132 pgpool[7156]: [5-1] 2016-07-15 09:15:08: pid 7156: LOG: child process received shutdown request signal 3
Jul 15 09:15:08 ip-10-0-2-132 pgpool: 2016-07-15 09:15:08: pid 7156: LOG: child process received shutdown request signal 3
[ cut a lot of the same lines ]
Jul 15 09:15:08 ip-10-0-2-132 pgpool[7154]: [45-1] 2016-07-15 09:15:08: pid 7154: LOG: new IPC connection received
Jul 15 09:15:08 ip-10-0-2-132 pgpool[7154]: [46-1] 2016-07-15 09:15:08: pid 7154: LOG: processing sync request from IPC socket
Jul 15 09:15:08 ip-10-0-2-132 pgpool[7154]: [47-1] 2016-07-15 09:15:08: pid 7154: LOG: sync request from IPC socket is forwarded to master watchdog node "Linux_ip-10-0-1-129_5433"
Jul 15 09:15:08 ip-10-0-2-132 pgpool[7154]: [47-2] 2016-07-15 09:15:08: pid 7154: DETAIL: waiting for the reply from master node...
Jul 15 09:15:08 ip-10-0-2-132 pgpool: 2016-07-15 09:15:08: pid 7303: LOG: child process received shutdown request signal 3
[ cut a lot of the same lines ]
Jul 15 09:15:08 ip-10-0-2-132 pgpool: 2016-07-15 09:15:08: pid 7154: LOG: new IPC connection received
Jul 15 09:15:08 ip-10-0-2-132 pgpool: 2016-07-15 09:15:08: pid 7154: LOG: processing sync request from IPC socket
Jul 15 09:15:08 ip-10-0-2-132 pgpool: 2016-07-15 09:15:08: pid 7154: LOG: sync request from IPC socket is forwarded to master watchdog node "Linux_ip-10-0-1-129_5433"
Jul 15 09:15:08 ip-10-0-2-132 pgpool: 2016-07-15 09:15:08: pid 7154: DETAIL: waiting for the reply from master node...
Jul 15 09:15:08 ip-10-0-2-132 pgpool[7150]: [46-1] 2016-07-15 09:15:08: pid 7150: LOG: execute command: /etc/pgpool2/failover_stream.sh 1 vir-stag-pg-00 0
Jul 15 09:15:08 ip-10-0-2-132 pgpool: 2016-07-15 09:15:08: pid 7150: LOG: execute command: /etc/pgpool2/failover_stream.sh 1 vir-stag-pg-00 0
Jul 15 09:15:08 ip-10-0-2-132 pgpool: Fri Jul 15 09:15:08 UTC 2016
Jul 15 09:15:08 ip-10-0-2-132 pgpool: Failed node: 1
Jul 15 09:15:08 ip-10-0-2-132 pgpool: + [ 1 != 0 ]
Jul 15 09:15:08 ip-10-0-2-132 pgpool: + echo [INFO] Slave node is down. New master not promoted!
Jul 15 09:15:08 ip-10-0-2-132 pgpool: [INFO] Slave node is down. New master not promoted!
Jul 15 09:15:08 ip-10-0-2-132 pgpool: + exit 0
Jul 15 09:15:08 ip-10-0-2-132 pgpool[7154]: [48-1] 2016-07-15 09:15:08: pid 7154: LOG: new IPC connection received
Jul 15 09:15:08 ip-10-0-2-132 pgpool[7154]: [49-1] 2016-07-15 09:15:08: pid 7154: LOG: processing sync request from IPC socket
Jul 15 09:15:08 ip-10-0-2-132 pgpool: 2016-07-15 09:15:08: pid 7154: LOG: new IPC connection received
Jul 15 09:15:08 ip-10-0-2-132 pgpool[7154]: [50-1] 2016-07-15 09:15:08: pid 7154: LOG: sync request from IPC socket is forwarded to master watchdog node "Linux_ip-10-0-1-129_5433"
Jul 15 09:15:08 ip-10-0-2-132 pgpool: 2016-07-15 09:15:08: pid 7154: LOG: processing sync request from IPC socket
Jul 15 09:15:08 ip-10-0-2-132 pgpool[7154]: [50-2] 2016-07-15 09:15:08: pid 7154: DETAIL: waiting for the reply from master node...
Jul 15 09:15:08 ip-10-0-2-132 pgpool: 2016-07-15 09:15:08: pid 7154: LOG: sync request from IPC socket is forwarded to master watchdog node "Linux_ip-10-0-1-129_5433"
Jul 15 09:15:08 ip-10-0-2-132 pgpool: 2016-07-15 09:15:08: pid 7154: DETAIL: waiting for the reply from master node...
Jul 15 09:15:08 ip-10-0-2-132 pgpool[7150]: [47-1] 2016-07-15 09:15:08: pid 7150: LOG: failover: set new primary node: 0
Jul 15 09:15:08 ip-10-0-2-132 pgpool[7150]: [48-1] 2016-07-15 09:15:08: pid 7150: LOG: failover: set new master node: 0
Jul 15 09:15:08 ip-10-0-2-132 pgpool: 2016-07-15 09:15:08: pid 7150: LOG: failover: set new primary node: 0
Jul 15 09:15:08 ip-10-0-2-132 pgpool: 2016-07-15 09:15:08: pid 7150: LOG: failover: set new master node: 0
Jul 15 09:15:08 ip-10-0-2-132 pgpool[7361]: [5-1] 2016-07-15 09:15:08: pid 7361: LOG: worker process received restart request
Jul 15 09:15:08 ip-10-0-2-132 pgpool: 2016-07-15 09:15:08: pid 7361: LOG: worker process received restart request
Jul 15 09:15:08 ip-10-0-2-132 pgpool[7154]: [51-1] 2016-07-15 09:15:08: pid 7154: LOG: new IPC connection received
Jul 15 09:15:08 ip-10-0-2-132 pgpool: 2016-07-15 09:15:08: pid 7154: LOG: new IPC connection received
Jul 15 09:15:08 ip-10-0-2-132 pgpool[7154]: [52-1] 2016-07-15 09:15:08: pid 7154: LOG: processing sync request from IPC socket
Jul 15 09:15:08 ip-10-0-2-132 pgpool[7154]: [53-1] 2016-07-15 09:15:08: pid 7154: LOG: sync request from IPC socket is forwarded to master watchdog node "Linux_ip-10-0-1-129_5433"
Jul 15 09:15:08 ip-10-0-2-132 pgpool: 2016-07-15 09:15:08: pid 7154: LOG: processing sync request from IPC socket
Jul 15 09:15:08 ip-10-0-2-132 pgpool[7154]: [53-2] 2016-07-15 09:15:08: pid 7154: DETAIL: waiting for the reply from master node...
Jul 15 09:15:08 ip-10-0-2-132 pgpool: 2016-07-15 09:15:08: pid 7154: LOG: sync request from IPC socket is forwarded to master watchdog node "Linux_ip-10-0-1-129_5433"
Jul 15 09:15:08 ip-10-0-2-132 pgpool: 2016-07-15 09:15:08: pid 7154: DETAIL: waiting for the reply from master node...
Jul 15 09:15:08 ip-10-0-2-132 pgpool[7150]: [49-1] 2016-07-15 09:15:08: pid 7150: LOG: failover done. shutdown host vir-stag-pg-01(5432)
Jul 15 09:15:08 ip-10-0-2-132 pgpool: failover done. shutdown host vir-stag-pg-01(5432)2016-07-15 09:15:08: pid 7150: LOG: failover done. shutdown host vir-stag-pg-01(5432)
Jul 15 09:15:09 ip-10-0-2-132 pgpool[7360]: [1334-1] 2016-07-15 09:15:09: pid 7360: LOG: restart request received in pcp child process
Jul 15 09:15:09 ip-10-0-2-132 pgpool: 2016-07-15 09:15:09: pid 7360: LOG: restart request received in pcp child process
Jul 15 09:15:09 ip-10-0-2-132 pgpool[7150]: [50-1] 2016-07-15 09:15:09: pid 7150: LOG: PCP child 7360 exits with status 0 in failover()
Jul 15 09:15:09 ip-10-0-2-132 pgpool: 2016-07-15 09:15:09: pid 7150: LOG: PCP child 7360 exits with status 0 in failover()
Jul 15 09:15:09 ip-10-0-2-132 pgpool[7150]: [51-1] 2016-07-15 09:15:09: pid 7150: LOG: fork a new PCP child pid 28732 in failover()
Jul 15 09:15:09 ip-10-0-2-132 pgpool: 2016-07-15 09:15:09: pid 7150: LOG: fork a new PCP child pid 28732 in failover()
Jul 15 09:15:09 ip-10-0-2-132 pgpool[7150]: [52-1] 2016-07-15 09:15:09: pid 7150: LOG: child process with pid: 7156 exits with status 0
Jul 15 09:15:09 ip-10-0-2-132 pgpool[7150]: [53-1] 2016-07-15 09:15:09: pid 7150: LOG: child process with pid: 7156 exited with success and will not be restarted
Jul 15 09:15:09 ip-10-0-2-132 pgpool: 2016-07-15 09:15:09: pid 7150: LOG: child process with pid: 7156 exits with status 0
Jul 15 09:15:09 ip-10-0-2-132 pgpool[7150]: [54-1] 2016-07-15 09:15:09: pid 7150: LOG: child process with pid: 7157 exits with status 0
Jul 15 09:15:09 ip-10-0-2-132 pgpool: 2016-07-15 09:15:09: pid 7150: LOG: child process with pid: 7156 exited with success and will not be restarted
Jul 15 09:15:09 ip-10-0-2-132 pgpool[7150]: [55-1] 2016-07-15 09:15:09: pid 7150: LOG: child process with pid: 7157 exited with success and will not be restarted
Jul 15 09:15:09 ip-10-0-2-132 pgpool: 2016-07-15 09:15:09: pid 7150: LOG: child process with pid: 7157 exits with status 0
[ cut a lot of the same lines ]



### Host1 started
Jul 15 09:15:41 ip-10-0-2-132 pgpool[7155]: [8-1] 2016-07-15 09:15:41: pid 7155: LOG: informing the node status change to watchdog
Jul 15 09:15:41 ip-10-0-2-132 pgpool[7155]: [8-2] 2016-07-15 09:15:41: pid 7155: DETAIL: node id :2 status = "NODE DEAD" message:"No heartbeat signal from node"
Jul 15 09:15:41 ip-10-0-2-132 pgpool: 2016-07-15 09:15:41: pid 7155: LOG: informing the node status change to watchdog
Jul 15 09:15:41 ip-10-0-2-132 pgpool: 2016-07-15 09:15:41: pid 7155: DETAIL: node id :2 status = "NODE DEAD" message:"No heartbeat signal from node"
Jul 15 09:15:41 ip-10-0-2-132 pgpool[7154]: [54-1] 2016-07-15 09:15:41: pid 7154: LOG: new IPC connection received
Jul 15 09:15:41 ip-10-0-2-132 pgpool[7154]: [55-1] 2016-07-15 09:15:41: pid 7154: LOG: received node status change ipc message
Jul 15 09:15:41 ip-10-0-2-132 pgpool: 2016-07-15 09:15:41: pid 7154: LOG: new IPC connection received
Jul 15 09:15:41 ip-10-0-2-132 pgpool[7154]: [55-2] 2016-07-15 09:15:41: pid 7154: DETAIL: No heartbeat signal from node
Jul 15 09:15:41 ip-10-0-2-132 pgpool[7154]: [56-1] 2016-07-15 09:15:41: pid 7154: LOG: remote node "Linux_ip-10-0-112-190_5433" is shutting down
Jul 15 09:15:41 ip-10-0-2-132 pgpool: 2016-07-15 09:15:41: pid 7154: LOG: received node status change ipc message
Jul 15 09:15:41 ip-10-0-2-132 pgpool: 2016-07-15 09:15:41: pid 7154: DETAIL: No heartbeat signal from node
Jul 15 09:15:41 ip-10-0-2-132 pgpool: 2016-07-15 09:15:41: pid 7154: LOG: remote node "Linux_ip-10-0-112-190_5433" is shutting down
Jul 15 09:42:21 ip-10-0-2-132 pgpool[7154]: [57-1] 2016-07-15 09:42:21: pid 7154: LOG: new watchdog node connection is received from "10.0.112.190:33746"
Jul 15 09:42:21 ip-10-0-2-132 pgpool: 2016-07-15 09:42:21: pid 7154: LOG: new watchdog node connection is received from "10.0.112.190:33746"
Jul 15 09:42:21 ip-10-0-2-132 pgpool[7154]: [58-1] 2016-07-15 09:42:21: pid 7154: LOG: new outbond connection to vir-stag-pg-01:9000
Jul 15 09:42:21 ip-10-0-2-132 pgpool: 2016-07-15 09:42:21: pid 7154: LOG: new outbond connection to vir-stag-pg-01:9000
Jul 15 09:42:31 ip-10-0-2-132 pgpool[7155]: [9-1] 2016-07-15 09:42:31: pid 7155: LOG: informing the node status change to watchdog
Jul 15 09:42:31 ip-10-0-2-132 pgpool[7155]: [9-2] 2016-07-15 09:42:31: pid 7155: DETAIL: node id :2 status = "NODE ALIVE" message:"Heartbeat signal found"
Jul 15 09:42:31 ip-10-0-2-132 pgpool: 2016-07-15 09:42:31: pid 7155: LOG: informing the node status change to watchdog
Jul 15 09:42:31 ip-10-0-2-132 pgpool: 2016-07-15 09:42:31: pid 7155: DETAIL: node id :2 status = "NODE ALIVE" message:"Heartbeat signal found"
Jul 15 09:42:31 ip-10-0-2-132 pgpool[7154]: [59-1] 2016-07-15 09:42:31: pid 7154: LOG: new IPC connection received
Jul 15 09:42:31 ip-10-0-2-132 pgpool[7154]: [60-1] 2016-07-15 09:42:31: pid 7154: LOG: received node status change ipc message
Jul 15 09:42:31 ip-10-0-2-132 pgpool: 2016-07-15 09:42:31: pid 7154: LOG: new IPC connection received
Jul 15 09:42:31 ip-10-0-2-132 pgpool[7154]: [60-2] 2016-07-15 09:42:31: pid 7154: DETAIL: Heartbeat signal found
Jul 15 09:42:31 ip-10-0-2-132 pgpool: 2016-07-15 09:42:31: pid 7154: LOG: received node status change ipc message
Jul 15 09:42:31 ip-10-0-2-132 pgpool: 2016-07-15 09:42:31: pid 7154: DETAIL: Heartbeat signal found
TagsNo tags attached.

Activities

stivi21

2016-07-15 19:46

reporter   ~0000920

This report is duplicated by my mistake, please remove or close it.

Issue History

Date Modified Username Field Change
2016-07-15 19:41 stivi21 New Issue
2016-07-15 19:46 stivi21 Note Added: 0000920
2016-07-26 07:32 t-ishii Status new => closed