View Issue Details

ID:              0000179
Project:         Pgpool-II
Category:        Bug
View Status:     public
Last Update:     2016-03-28 10:57
Reporter:        wangzhenbo
Assigned To:     Muhammad Usama
Priority:        normal
Severity:        minor
Reproducibility: always
Status:          assigned
Resolution:      open
Platform:        linux
OS:              ubuntu
OS Version:      12.04
Product Version:
Target Version:
Fixed in Version:
Summary: 0000179: When two degenerated backends are attached in quick succession, the second attach does not succeed.
Description
Pgpool: 3.4.4
PostgreSQL: 9.1.2
mode: master-slave streaming replication
node_id 1: primary node
node_id 0: standby node
node_id 2: standby node

Mar 22 09:51:59 puppetserver pgpool[155963]: [776-1] 2016-03-22 09:51:59: pid 155963: LOG: received failback request for node_id: 0 from pid [155963]
Mar 22 09:51:59 puppetserver pgpool[71082]: [843-1] 2016-03-22 09:51:59: pid 71082: LOG: watchdog notifying to start interlocking
Mar 22 09:51:59 puppetserver pgpool[71082]: [844-1] 2016-03-22 09:51:59: pid 71082: LOG: watchdog became a new lock holder
Mar 22 09:51:59 puppetserver pgpool[71095]: [1423-1] 2016-03-22 09:51:59: pid 71095: LOG: sending watchdog response
Mar 22 09:51:59 puppetserver pgpool[71095]: [1423-2] 2016-03-22 09:51:59: pid 71095: DETAIL: WD_STAND_FOR_LOCK_HOLDER received but lock holder already exists
Mar 22 09:51:59 puppetserver pgpool[71095]: [1424-1] 2016-03-22 09:51:59: pid 71095: LOG: sending watchdog response
Mar 22 09:51:59 puppetserver pgpool[71095]: [1424-2] 2016-03-22 09:51:59: pid 71095: DETAIL: WD_STAND_FOR_LOCK_HOLDER received but lock holder already exists
Mar 22 09:51:59 puppetserver pgpool[155963]: [777-1] 2016-03-22 09:51:59: pid 155963: LOG: received failback request for node_id: 2 from pid [155963]
Mar 22 09:51:59 puppetserver pgpool[155963]: [778-1] 2016-03-22 09:51:59: pid 155963: LOG: failback request for node_id: 2 from pid [155963] is canceled by other pgpool
Steps To Reproduce
1. Set node_id 1 as the primary.
2. Set up streaming replication from node_id 1 to node_id 0 and node_id 2.
3. Execute the following two commands back to back:
   pcp_attach_node 5 localhost 9898 postgres postgres 0
   pcp_attach_node 5 localhost 9898 postgres postgres 2
4. node_id 2 remains in status 3 (down):
postgres=# SHOW pool_nodes;
 node_id | hostname | port | status | lb_weight | role
---------+----------------+------+--------+-----------+---------
 0 | 192.168.80.165 | 5432 | 2 | 0.333333 | standby
 1 | 192.168.80.163 | 5432 | 2 | 0.333333 | primary
 2 | 192.168.80.162 | 5432 | 3 | 0.333333 | standby
(3 rows)
Tags: No tags attached.

Activities

Muhammad Usama

2016-03-27 02:35

developer   ~0000727

You have not shared the pgpool configuration file, but from the log messages it appears that you have the watchdog enabled and at least two pgpool nodes connected through it. When the watchdog is enabled, all node-related commands are replicated to and processed by every connected pgpool node, and the next node (failback/failover) command succeeds only after the previous command has been completely processed by all pgpool nodes.

The message "Mar 22 09:51:59 puppetserver pgpool[155963]: [778-1] 2016-03-22 09:51:59: pid 155963: LOG: failback request for node_id: 2 from pid [155963] is canceled by other pgpool" in the log you shared means that the second failback command failed because the other pgpool node declined the request while it was still processing the first command.

The workaround is to leave some time between issuing multiple node commands, especially when the watchdog is enabled on pgpool.
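This workaround can be sketched as a small wrapper around the pcp_attach_node invocations already shown in the reproduction steps. The retry count and pause length are illustrative assumptions, not values from pgpool documentation, and a stub function stands in for pcp_attach_node on machines where the pcp binaries are not installed:

```shell
#!/bin/sh
# Workaround sketch: issue pcp node commands one at a time, retrying with
# a pause so each watchdog-replicated command can finish on all pgpool
# nodes before the next one is sent. Retry count and pause are illustrative.

# Stand-in so the sketch also runs where pgpool is not installed.
if ! command -v pcp_attach_node >/dev/null 2>&1; then
    pcp_attach_node() { echo "pcp_attach_node $*"; }
fi

run_with_retry() {
    # $1 = max attempts, $2 = pause in seconds, remaining args = command
    attempts=$1; pause=$2; shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $i failed; retrying in ${pause}s" >&2
        sleep "$pause"
        i=$((i + 1))
    done
    return 1
}

# Attach the two detached standbys sequentially, pausing between them so
# the other pgpool node finishes processing the first failback request.
run_with_retry 3 2 pcp_attach_node 5 localhost 9898 postgres postgres 0
sleep 2
run_with_retry 3 2 pcp_attach_node 5 localhost 9898 postgres postgres 2
```

The pause between the two attach commands is the essential part: it gives every pgpool node connected through the watchdog time to finish the interlocking for the first failback before the second one arrives.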

wangzhenbo

2016-03-28 10:46

reporter   ~0000728

backend_hostname0 = '192.168.80.165'
                                   # Host name or IP address to connect to for backend 0
backend_port0 = 5432
                                   # Port number for backend 0
backend_weight0 = 1
                                   # Weight for backend 0 (only in load balancing mode)
backend_data_directory0 = '/var/lib/postgresql/9.1/main/'
                                   # Data directory for backend 0
backend_flag0 = 'ALLOW_TO_FAILOVER'
                                   # Controls various backend behavior
                                   # ALLOW_TO_FAILOVER or DISALLOW_TO_FAILOVER
backend_hostname1 = '192.168.80.163'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/postgresql/9.1/main/'
backend_flag1 = 'ALLOW_TO_FAILOVER'

backend_hostname2 = '192.168.80.162'
backend_port2 = 5432
backend_weight2 = 1
backend_data_directory2 = '/var/lib/postgresql/9.1/main/'
backend_flag2 = 'ALLOW_TO_FAILOVER'
...
follow_master_command = 'sleep 3;/var/lib/postgresql/9.1/main/pcp_recovery_nodes.sh %d %m %P ;sleep 120'
...
failover_command = '/var/lib/postgresql/9.1/main/failover_stream.sh %d %m %P'
...
failback_command = 'sleep 10'
fail_over_on_backend_error = off
use_watchdog = on
heartbeat_destination0 = '192.168.80.163'
heartbeat_destination_port0 = 9694
heartbeat_device0 = 'eth1'
heartbeat_destination1 = '192.168.80.162'
heartbeat_destination_port1 = 9694
heartbeat_device1 = 'eth1'

wangzhenbo

2016-03-28 10:57

reporter   ~0000729

I think so, too.
So now I call "pcp_attach_node 5 localhost 9898 postgres postgres $recovery_node" after "pcp_recovery_node 5 localhost 9898 postgres postgres $recovery_node" for every node being recovered.

This resolves the problem.

Still, I wonder whether the problem could be solved in the mechanism itself, rather than by this workaround.
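The reporter's sequencing fix can be sketched as follows. The node IDs and connection arguments mirror the pcp commands already shown in this report; the stub functions are assumptions that stand in where the real pcp binaries are not installed:

```shell
#!/bin/sh
# Sketch of the reporter's fix: for each node being recovered, run
# pcp_recovery_node and then pcp_attach_node strictly in sequence, so the
# watchdog finishes replicating one command before the next is issued.

# Stand-ins so the sketch also runs where the pcp binaries are absent.
if ! command -v pcp_recovery_node >/dev/null 2>&1; then
    pcp_recovery_node() { echo "pcp_recovery_node $*"; }
fi
if ! command -v pcp_attach_node >/dev/null 2>&1; then
    pcp_attach_node() { echo "pcp_attach_node $*"; }
fi

# Node IDs 0 and 2 are the two standbys from this report.
for recovery_node in 0 2; do
    pcp_recovery_node 5 localhost 9898 postgres postgres "$recovery_node" || exit 1
    pcp_attach_node 5 localhost 9898 postgres postgres "$recovery_node" || exit 1
done
```

Because each pcp command runs to completion (and is checked) before the next begins, the watchdog on the other pgpool nodes never sees a second failback request while the first is still being interlocked.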

Issue History

Date Modified Username Field Change
2016-03-22 11:44 wangzhenbo New Issue
2016-03-23 13:12 t-ishii Assigned To => Muhammad Usama
2016-03-23 13:12 t-ishii Status new => assigned
2016-03-27 02:35 Muhammad Usama Note Added: 0000727
2016-03-28 10:46 wangzhenbo Note Added: 0000728
2016-03-28 10:57 wangzhenbo Note Added: 0000729