View Issue Details

ID: 0000348
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2017-11-09 16:51
Reporter: cohavisi
Assigned To: Muhammad Usama
Priority: urgent
Severity: major
Reproducibility: always
Status: feedback
Resolution: open
Platform: Oracle VirtualBox
OS: CentOS
OS Version: 6.8
Product Version: 3.6.6
Target Version:
Fixed in Version:
Summary: 0000348: pgpool failover scenario ERROR - pgpool 3.6.6 (postgres 9.3.5)
Description:
Hi,
I have two PostgreSQL servers (9.3.5), each with Pgpool-II (3.6.6) installed on the same node.


server1: 192.168.15.55
server2: 192.168.15.56
pgpool VIP: 192.168.15.59

# psql -U postgres -h 192.168.15.59 -p 9999 -c 'show pool_nodes'
 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+---------------+------+--------+-----------+---------+------------+-------------------+-------------------
 0 | 192.168.15.55 | 5432 | up | 0.500000 | standby | 0 | false | 0
 1 | 192.168.15.56 | 5432 | up | 0.500000 | primary | 0 | true | 0
(2 rows)

The error appears in a disaster recovery scenario:

1. The primary PostgreSQL and the master pgpool are on server1; the pgpool VIP (192.168.15.59) is established on this node.

2. server1 goes down due to a power failure (server1 is dead).

3. pgpool on server2 is promoted, and the VIP (192.168.15.59) is established there.

4. pgpool performs the PostgreSQL failover action; PostgreSQL on server2 is promoted to primary.

5. When the faulty node is started up again, its pgpool starts as STANDBY, and PostgreSQL is recovered via the pcp_recovery_node tool.
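For reference, step 5 boils down to a single PCP call. A hedged sketch of its shape follows; the PCP user "pcp_admin" and the default PCP port 9898 are assumptions, not taken from the attached configuration:

```shell
# Hedged sketch: the shape of the pcp_recovery_node call used to re-attach
# and re-sync the recovered node. "pcp_admin" and port 9898 are placeholders.
build_recovery_cmd() {
    vip=$1       # pgpool delegate IP
    node_id=$2   # backend node id as shown by "show pool_nodes"
    echo "pcp_recovery_node -h $vip -p 9898 -U pcp_admin -n $node_id"
}

# Example: re-attach node 0 through the VIP
build_recovery_cmd 192.168.15.59 0
```

The command prompts for the PCP password (or reads ~/.pcppass) and triggers the recovery_1st_stage script configured on the primary.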

When this procedure is performed multiple times (crashing the MASTER pgpool and the primary PostgreSQL), for example:

example:
server1 (STANDBY pgpool and secondary postgres)
server2 (MASTER pgpool and primary postgres)

the STANDBY pgpool node (server1) is not elected as MASTER, and no PostgreSQL failback occurs.


Only when the faulty node (server2) is started up again is the STANDBY pgpool (server1) elected as MASTER, with server2 becoming STANDBY; but no primary PostgreSQL is available, because no failback command runs!
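The "lack of failback command" points at pgpool.conf's failover hooks. A minimal, hedged sketch of the relevant settings follows; the script paths are hypothetical placeholders, while failover_command/failback_command and the %-substitutions are pgpool's standard mechanism:

```ini
# Hedged pgpool.conf sketch -- script paths are hypothetical placeholders.
# %d = failed node id, %h = failed host, %m = new master node id,
# %H = new master host (pgpool's standard substitution characters).

# Run when a backend is detached: promote the surviving standby.
failover_command = '/etc/pgpool-II/failover.sh %d %h %m %H'

# Run when a node is re-attached; without such a hook, a recovered
# node is not automatically integrated back (the symptom described above).
failback_command = '/etc/pgpool-II/failback.sh %d %h'
```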


Logs are attached, as well as the pgpool.conf files of both servers.
Steps To Reproduce:
1. The primary PostgreSQL and the master pgpool are on server1; the pgpool VIP (192.168.15.59) is established on this node.

2. server1 goes down due to a power failure (server1 is dead).

3. pgpool on server2 is promoted, and the VIP (192.168.15.59) is established there.

4. pgpool performs the PostgreSQL failover action; PostgreSQL on server2 is promoted to primary.

5. When the faulty node is started up again, its pgpool starts as STANDBY, and PostgreSQL is recovered via the pcp_recovery_node tool.

When this procedure is performed multiple times (crashing the MASTER pgpool and the primary PostgreSQL), for example:

server1 (STANDBY pgpool and secondary postgres)
server2 (MASTER pgpool and primary postgres)

the STANDBY pgpool node (server1) is not elected as MASTER, and no PostgreSQL failback occurs.
Additional Information:

Log of the STANDBY pgpool that was not elected as MASTER:

2017-09-21 09:53:08: pid 2014: WARNING: checking setuid bit of if_up_cmd
2017-09-21 09:53:08: pid 2014: DETAIL: ifup[/sbin/ifconfig] doesn't have setuid bit
2017-09-21 09:53:08: pid 2014: WARNING: checking setuid bit of if_down_cmd
2017-09-21 09:53:08: pid 2014: DETAIL: ifdown[/sbin/ifconfig] doesn't have setuid bit
2017-09-21 09:53:08: pid 2014: WARNING: checking setuid bit of arping command
2017-09-21 09:53:08: pid 2014: DETAIL: arping[/usr/sbin/arping] doesn't have setuid bit
2017-09-21 09:53:08: pid 2014: LOG: waiting for watchdog to initialize
2017-09-21 09:53:08: pid 2016: LOG: setting the local watchdog node name to "1.1.1.55:9999 Linux mgrdb-55"
2017-09-21 09:53:08: pid 2016: LOG: watchdog cluster is configured with 1 remote nodes
2017-09-21 09:53:08: pid 2016: LOG: watchdog remote node:0 on 1.1.1.56:9000
2017-09-21 09:53:08: pid 2016: LOG: interface monitoring is disabled in watchdog
2017-09-21 09:53:08: pid 2016: FATAL: failed to create watchdog command server socket
2017-09-21 09:53:08: pid 2016: DETAIL: bind on "/tmp/.s.PGPOOLWD_CMD.9000" failed with reason: "Address already in use"
2017-09-21 09:53:08: pid 2014: LOG: watchdog child process with pid: 2016 exits with status 768
2017-09-21 09:53:08: pid 2014: FATAL: watchdog child process exit with fatal error. exiting pgpool-II
2017-09-21 09:53:08: pid 2018: LOG: setting the local watchdog node name to "1.1.1.55:9999 Linux mgrdb-55"
2017-09-21 09:53:08: pid 2018: LOG: watchdog cluster is configured with 1 remote nodes
2017-09-21 09:53:08: pid 2018: LOG: watchdog remote node:0 on 1.1.1.56:9000
2017-09-21 09:53:08: pid 2018: LOG: interface monitoring is disabled in watchdog
2017-09-21 09:53:08: pid 2018: LOG: watchdog node state changed from [DEAD] to [LOADING]
2017-09-21 09:53:08: pid 2018: LOG: new outbond connection to 1.1.1.56:9000
2017-09-21 09:53:08: pid 2018: LOG: setting the remote node "1.1.1.56:9999 Linux mgrdb-56" as watchdog cluster master
2017-09-21 09:53:08: pid 2018: LOG: watchdog node state changed from [LOADING] to [INITIALIZING]
2017-09-21 09:53:09: pid 2018: LOG: watchdog node state changed from [INITIALIZING] to [STANDBY]
2017-09-21 09:53:09: pid 2018: LOG: successfully joined the watchdog cluster as standby node
2017-09-21 09:53:09: pid 2018: DETAIL: our join coordinator request is accepted by cluster leader node "1.1.1.56:9999 Linux mgrdb-56"
2017-09-21 09:53:10: pid 2018: LOG: new watchdog node connection is received from "1.1.1.56:40168"
2017-09-21 09:53:10: pid 2018: LOG: new node joined the cluster hostname:"1.1.1.56" port:9000 pgpool_port:9999
2017-09-21 09:55:01: pid 2018: LOG: watchdog received the failover command from "1.1.1.56:9999 Linux mgrdb-56"
2017-09-21 09:55:01: pid 2018: LOG: received failback request for node_id: 0 from pid [2018] wd_failover_id [42]
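The FATAL above ("Address already in use" on /tmp/.s.PGPOOLWD_CMD.9000) typically indicates a stale watchdog command socket that survived the power loss. A hedged cleanup sketch for the recovered node, to be run before restarting pgpool (the socket path follows pgpool's default naming; verify it against your wd_port):

```shell
# Hedged sketch: remove a stale watchdog command socket left behind by a
# hard crash; pgpool cannot bind /tmp/.s.PGPOOLWD_CMD.<wd_port> otherwise.
cleanup_stale_wd_socket() {
    dir=$1       # socket directory (default /tmp)
    wd_port=$2   # watchdog port, 9000 in this setup
    sock="$dir/.s.PGPOOLWD_CMD.$wd_port"
    if [ -e "$sock" ]; then
        echo "removing stale socket: $sock"
        rm -f "$sock"
    fi
}

# Example (run while pgpool is stopped): cleanup_stale_wd_socket /tmp 9000
```

Only remove the socket while pgpool is confirmed stopped; if another pgpool instance is still running, the "Address already in use" error is legitimate.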




When the faulty node (server2) is started again, this node is elected as MASTER:

2017-09-21 10:49:11: pid 2018: LOG: new watchdog node connection is received from "1.1.1.56:30363"
2017-09-21 10:49:11: pid 2018: LOG: new node joined the cluster hostname:"1.1.1.56" port:9000 pgpool_port:9999
2017-09-21 10:49:11: pid 2018: LOG: failed to write watchdog packet to socket
2017-09-21 10:49:11: pid 2018: DETAIL: Connection reset by peer
2017-09-21 10:49:11: pid 2018: WARNING: the coordinator as per our record is not coordinator anymore
2017-09-21 10:49:11: pid 2018: DETAIL: re-initializing the cluster
2017-09-21 10:49:11: pid 2018: LOG: watchdog node state changed from [STANDBY] to [JOINING]
2017-09-21 10:49:11: pid 2018: LOG: unassigning the remote node "1.1.1.56:9999 Linux mgrdb-56" from watchdog cluster master
2017-09-21 10:49:11: pid 2018: LOG: new outbond connection to 1.1.1.56:9000
2017-09-21 10:49:11: pid 2018: LOG: watchdog node state changed from [JOINING] to [INITIALIZING]
2017-09-21 10:49:11: pid 2018: LOG: remote node "1.1.1.56:9999 Linux mgrdb-56" is shutting down
2017-09-21 10:49:12: pid 2018: LOG: I am the only alive node in the watchdog cluster
2017-09-21 10:49:12: pid 2018: HINT: skiping stand for coordinator state
2017-09-21 10:49:12: pid 2018: LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
2017-09-21 10:49:12: pid 2018: LOG: I am announcing my self as master/coordinator watchdog node
2017-09-21 10:49:16: pid 2018: LOG: I am the cluster leader node
2017-09-21 10:49:16: pid 2018: DETAIL: our declare coordinator message is accepted by all nodes
2017-09-21 10:49:16: pid 2018: LOG: setting the local node "1.1.1.55:9999 Linux mgrdb-55" as watchdog cluster master
2017-09-21 10:49:16: pid 2018: LOG: I am the cluster leader node. Starting escalation process
2017-09-21 10:49:16: pid 2018: LOG: escalation process started with PID:10980
2017-09-21 10:49:16: pid 10980: LOG: watchdog: escalation started
2017-09-21 10:49:20: pid 10980: LOG: successfully acquired the delegate IP:"192.168.15.59"
2017-09-21 10:49:20: pid 10980: DETAIL: 'if_up_cmd' returned with success
2017-09-21 10:49:20: pid 2018: LOG: watchdog escalation process with pid: 10980 exit with SUCCESS.
2017-09-21 10:49:25: pid 2018: LOG: new watchdog node connection is received from "1.1.1.56:32923"
2017-09-21 10:49:25: pid 2018: LOG: new node joined the cluster hostname:"1.1.1.56" port:9000 pgpool_port:9999
2017-09-21 10:49:25: pid 2018: LOG: new outbond connection to 1.1.1.56:9000
2017-09-21 10:49:26: pid 2018: LOG: adding watchdog node "1.1.1.56:9999 Linux mgrdb-56" to the standby list
2017-09-21 10:50:15: pid 2018: LOG: remote node "1.1.1.56:9999 Linux mgrdb-56" is shutting down
2017-09-21 10:50:15: pid 2018: LOG: removing watchdog node "1.1.1.56:9999 Linux mgrdb-56" from the standby list
Tags: No tags attached.

Activities

cohavisi

2017-10-05 15:43

reporter  

pgpool_error.tar.gz (25,422 bytes)

t-ishii

2017-11-09 13:08

developer   ~0001816

I think Pgpool-II 3.6.7 fixes some bugs similar to your issue. Please try 3.6.7.

Issue History

Date Modified Username Field Change
2017-10-05 15:43 cohavisi New Issue
2017-10-05 15:43 cohavisi File Added: pgpool_error.tar.gz
2017-11-09 13:08 t-ishii Note Added: 0001816
2017-11-09 16:51 t-ishii Assigned To => Muhammad Usama
2017-11-09 16:51 t-ishii Status new => feedback