View Issue Details

ID:              0000179
Project:         Pgpool-II
Category:        Bug
View Status:     public
Last Update:     2016-03-28 10:57
Reporter:        wangzhenbo
Assigned To:     Muhammad Usama
Priority:        normal
Severity:        minor
Reproducibility: always
Status:          assigned
Resolution:      open
Platform:        linux
OS:              ubuntu
OS Version:      12.04
Product Version:
Target Version:
Fixed in Version:
Summary: 0000179: When two degenerated backends are attached in quick succession, the second attach does not succeed.
Description
Pgpool: 3.4.4
PostgreSQL: 9.1.2
mode: master-slave streaming replication
node_id 1: primary node
node_id 0: standby node
node_id 2: standby node

Mar 22 09:51:59 puppetserver pgpool[155963]: [776-1] 2016-03-22 09:51:59: pid 155963: LOG: received failback request for node_id: 0 from pid [155963]
Mar 22 09:51:59 puppetserver pgpool[71082]: [843-1] 2016-03-22 09:51:59: pid 71082: LOG: watchdog notifying to start interlocking
Mar 22 09:51:59 puppetserver pgpool[71082]: [844-1] 2016-03-22 09:51:59: pid 71082: LOG: watchdog became a new lock holder
Mar 22 09:51:59 puppetserver pgpool[71095]: [1423-1] 2016-03-22 09:51:59: pid 71095: LOG: sending watchdog response
Mar 22 09:51:59 puppetserver pgpool[71095]: [1423-2] 2016-03-22 09:51:59: pid 71095: DETAIL: WD_STAND_FOR_LOCK_HOLDER received but lock holder already exists
Mar 22 09:51:59 puppetserver pgpool[71095]: [1424-1] 2016-03-22 09:51:59: pid 71095: LOG: sending watchdog response
Mar 22 09:51:59 puppetserver pgpool[71095]: [1424-2] 2016-03-22 09:51:59: pid 71095: DETAIL: WD_STAND_FOR_LOCK_HOLDER received but lock holder already exists
Mar 22 09:51:59 puppetserver pgpool[155963]: [777-1] 2016-03-22 09:51:59: pid 155963: LOG: received failback request for node_id: 2 from pid [155963]
Mar 22 09:51:59 puppetserver pgpool[155963]: [778-1] 2016-03-22 09:51:59: pid 155963: LOG: failback request for node_id: 2 from pid [155963] is canceled by other pgpool
Steps To Reproduce
1. Set node_id 1 as the primary.
2. Set up streaming replication from node_id 1 to node_id 0 and node_id 2.
3. Execute the following two commands back to back:
   pcp_attach_node 5 localhost 9898 postgres postgres 0
   pcp_attach_node 5 localhost 9898 postgres postgres 2
4. node_id 2 remains in status 3 (down):
postgres=# SHOW pool_nodes;
 node_id | hostname | port | status | lb_weight | role
---------+----------------+------+--------+-----------+---------
 0 | 192.168.80.165 | 5432 | 2 | 0.333333 | standby
 1 | 192.168.80.163 | 5432 | 2 | 0.333333 | primary
 2 | 192.168.80.162 | 5432 | 3 | 0.333333 | standby
(3 rows)
Tags: No tags attached.

Activities

Muhammad Usama

2016-03-27 02:35

developer   ~0000727

You have not shared the pgpool configuration file, but from the log messages it appears that you have the watchdog enabled and at least two pgpool nodes connected through it. When the watchdog is enabled, all node-related commands are replicated to and processed by every connected pgpool node, and the next node (failback/failover) command succeeds only after the previous command has been completely processed by all pgpool nodes.

The message "Mar 22 09:51:59 puppetserver pgpool[155963]: [778-1] 2016-03-22 09:51:59: pid 155963: LOG: failback request for node_id: 2 from pid [155963] is canceled by other pgpool" in the log you shared means that the second failback command failed because the other pgpool node declined the request while it was still processing the first command.

The workaround is to leave some time between issuing multiple node commands, especially when the watchdog is enabled on pgpool.
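This workaround can be sketched as a small wrapper around the pcp_attach_node invocations already shown in the reproduction steps. The retry count and pause length are illustrative assumptions, not values from pgpool documentation, and a stub function stands in for pcp_attach_node on machines where the pcp binaries are not installed:

```shell
#!/bin/sh
# Workaround sketch: issue pcp node commands one at a time, retrying with
# a pause so each watchdog-replicated command can finish on all pgpool
# nodes before the next one is sent. Retry count and pause are illustrative.

# Stand-in so the sketch also runs where pgpool is not installed.
if ! command -v pcp_attach_node >/dev/null 2>&1; then
    pcp_attach_node() { echo "pcp_attach_node $*"; }
fi

run_with_retry() {
    # $1 = max attempts, $2 = pause in seconds, remaining args = command
    attempts=$1; pause=$2; shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $i failed; retrying in ${pause}s" >&2
        sleep "$pause"
        i=$((i + 1))
    done
    return 1
}

# Attach the two detached standbys sequentially, pausing between them so
# the other pgpool node finishes processing the first failback request.
run_with_retry 3 2 pcp_attach_node 5 localhost 9898 postgres postgres 0
sleep 2
run_with_retry 3 2 pcp_attach_node 5 localhost 9898 postgres postgres 2
```

The pause between the two attach commands is the essential part: it gives every pgpool node connected through the watchdog time to finish the interlocking for the first failback before the second one arrives.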

wangzhenbo

2016-03-28 10:46

reporter   ~0000728

backend_hostname0 = '192.168.80.165'
                                   # Host name or IP address to connect to for backend 0
backend_port0 = 5432
                                   # Port number for backend 0
backend_weight0 = 1
                                   # Weight for backend 0 (only in load balancing mode)
backend_data_directory0 = '/var/lib/postgresql/9.1/main/'
                                   # Data directory for backend 0
backend_flag0 = 'ALLOW_TO_FAILOVER'
                                   # Controls various backend behavior
                                   # ALLOW_TO_FAILOVER or DISALLOW_TO_FAILOVER
backend_hostname1 = '192.168.80.163'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/postgresql/9.1/main/'
backend_flag1 = 'ALLOW_TO_FAILOVER'

backend_hostname2 = '192.168.80.162'
backend_port2 = 5432
backend_weight2 = 1
backend_data_directory2 = '/var/lib/postgresql/9.1/main/'
backend_flag2 = 'ALLOW_TO_FAILOVER'
...
follow_master_command = 'sleep 3;/var/lib/postgresql/9.1/main/pcp_recovery_nodes.sh %d %m %P ;sleep 120'
...
failover_command = '/var/lib/postgresql/9.1/main/failover_stream.sh %d %m %P'
...
failback_command = 'sleep 10'
fail_over_on_backend_error = off
use_watchdog = on
heartbeat_destination0 = '192.168.80.163'
heartbeat_destination_port0 = 9694
heartbeat_device0 = 'eth1'
heartbeat_destination1 = '192.168.80.162'
heartbeat_destination_port1 = 9694
heartbeat_device1 = 'eth1'

wangzhenbo

2016-03-28 10:57

reporter   ~0000729

I think so, too.
So now I call "pcp_attach_node 5 localhost 9898 postgres postgres $recovery_node" after "pcp_recovery_node 5 localhost 9898 postgres postgres $recovery_node" for every node being recovered.

This resolves the problem.

Still, I wonder whether the problem could be solved in the mechanism itself, rather than by this workaround.
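The reporter's sequencing fix can be sketched as follows. The node IDs and connection arguments mirror the pcp commands already shown in this report; the stub functions are assumptions that stand in where the real pcp binaries are not installed:

```shell
#!/bin/sh
# Sketch of the reporter's fix: for each node being recovered, run
# pcp_recovery_node and then pcp_attach_node strictly in sequence, so the
# watchdog finishes replicating one command before the next is issued.

# Stand-ins so the sketch also runs where the pcp binaries are absent.
if ! command -v pcp_recovery_node >/dev/null 2>&1; then
    pcp_recovery_node() { echo "pcp_recovery_node $*"; }
fi
if ! command -v pcp_attach_node >/dev/null 2>&1; then
    pcp_attach_node() { echo "pcp_attach_node $*"; }
fi

# Node IDs 0 and 2 are the two standbys from this report.
for recovery_node in 0 2; do
    pcp_recovery_node 5 localhost 9898 postgres postgres "$recovery_node" || exit 1
    pcp_attach_node 5 localhost 9898 postgres postgres "$recovery_node" || exit 1
done
```

Because each pcp command runs to completion (and is checked) before the next begins, the watchdog on the other pgpool nodes never sees a second failback request while the first is still being interlocked.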

Issue History

Date Modified Username Field Change
2016-03-22 11:44 wangzhenbo New Issue
2016-03-23 13:12 t-ishii Assigned To => Muhammad Usama
2016-03-23 13:12 t-ishii Status new => assigned
2016-03-27 02:35 Muhammad Usama Note Added: 0000727
2016-03-28 10:46 wangzhenbo Note Added: 0000728
2016-03-28 10:57 wangzhenbo Note Added: 0000729