View Issue Details
Field | Value
---|---
ID | 0000179
Project | Pgpool-II
Category | Bug
View Status | public
Date Submitted | 2016-03-22 11:44
Last Update | 2016-03-28 10:57
Reporter | wangzhenbo
Assigned To | Muhammad Usama
Priority | normal
Severity | minor
Reproducibility | always
Status | assigned
Resolution | open
Platform | linux
OS | ubuntu
OS Version | 12.04
Product Version |
Target Version |
Fixed in Version |
Summary | 0000179: When two degenerated backends are attached back to back, the second attach does not succeed.
Description |
Pgpool: 3.4.4
Postgresql: 9.1.2
mode: master-slave streaming replication
node_id1: primary node
node_id0: slave node
node_id2: slave node

```
Mar 22 09:51:59 puppetserver pgpool[155963]: [776-1] 2016-03-22 09:51:59: pid 155963: LOG: received failback request for node_id: 0 from pid [155963]
Mar 22 09:51:59 puppetserver pgpool[71082]: [843-1] 2016-03-22 09:51:59: pid 71082: LOG: watchdog notifying to start interlocking
Mar 22 09:51:59 puppetserver pgpool[71082]: [844-1] 2016-03-22 09:51:59: pid 71082: LOG: watchdog became a new lock holder
Mar 22 09:51:59 puppetserver pgpool[71095]: [1423-1] 2016-03-22 09:51:59: pid 71095: LOG: sending watchdog response
Mar 22 09:51:59 puppetserver pgpool[71095]: [1423-2] 2016-03-22 09:51:59: pid 71095: DETAIL: WD_STAND_FOR_LOCK_HOLDER received but lock holder already exists
Mar 22 09:51:59 puppetserver pgpool[71095]: [1424-1] 2016-03-22 09:51:59: pid 71095: LOG: sending watchdog response
Mar 22 09:51:59 puppetserver pgpool[71095]: [1424-2] 2016-03-22 09:51:59: pid 71095: DETAIL: WD_STAND_FOR_LOCK_HOLDER received but lock holder already exists
Mar 22 09:51:59 puppetserver pgpool[155963]: [777-1] 2016-03-22 09:51:59: pid 155963: LOG: received failback request for node_id: 2 from pid [155963]
Mar 22 09:51:59 puppetserver pgpool[155963]: [778-1] 2016-03-22 09:51:59: pid 155963: LOG: failback request for node_id: 2 from pid [155963] is canceled by other pgpool
```
Steps To Reproduce |
1. Set node_id1 as the primary.
2. Build streaming replication from node_id0/node_id2 to node_id1.
3. Execute the two commands back to back:

```
pcp_attach_node 5 localhost 9898 postgres postgres 0
pcp_attach_node 5 localhost 9898 postgres postgres 2
```

4. node_id2 stays in status 3 (down):

```
postgres=# SHOW pool_nodes;
 node_id |    hostname    | port | status | lb_weight |  role
---------+----------------+------+--------+-----------+---------
 0       | 192.168.80.165 | 5432 | 2      | 0.333333  | standby
 1       | 192.168.80.163 | 5432 | 2      | 0.333333  | primary
 2       | 192.168.80.162 | 5432 | 3      | 0.333333  | standby
(3 rows)
```
Tags | No tags attached. | ||||
Note 0000727 | Muhammad Usama | 2016-03-27 02:35
You have not shared the pgpool config file, but from the log messages you shared it appears that you have enabled the watchdog and that at least two pgpool nodes are connected through it. When the watchdog is enabled, all node-related commands are replicated to and processed by all connected pgpool nodes, and the next node (failback/failover) command can only succeed once the previous command has been completely processed by all pgpool nodes. The message "Mar 22 09:51:59 puppetserver pgpool[155963]: [778-1] 2016-03-22 09:51:59: pid 155963: LOG: failback request for node_id: 2 from pid [155963] is canceled by other pgpool" in the log you shared means that the second failback command failed because the other pgpool node declined the request while it was still processing the first command. The workaround for this is to leave some time between issuing multiple node commands, especially when the watchdog is enabled on pgpool.
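For illustration, a minimal sketch of that workaround in shell, assuming pgpool-II 3.4's positional pcp syntax and the credentials from the report; the polling helper, retry limit, and node list are placeholders and not part of pgpool:

```bash
#!/bin/bash
# Sketch of the workaround: attach degenerated nodes one at a time and
# wait until each failback has been fully processed before issuing the
# next pcp command. Assumes pgpool-II 3.4 positional pcp syntax.

PCP="5 localhost 9898 postgres postgres"

wait_until_attached() {
    local node=$1
    for _ in $(seq 1 30); do           # retry limit is arbitrary
        # pcp_node_info prints: hostname port status weight;
        # status 1 or 2 means up, 3 means down/degenerated.
        status=$(pcp_node_info $PCP "$node" | awk '{print $3}')
        if [ "$status" = "1" ] || [ "$status" = "2" ]; then
            return 0
        fi
        sleep 2
    done
    return 1
}

pcp_attach_node $PCP 0
wait_until_attached 0 || exit 1

pcp_attach_node $PCP 2
wait_until_attached 2 || exit 1
```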
Note 0000728 | wangzhenbo | 2016-03-28 10:46
```
backend_hostname0 = '192.168.80.165'
                                   # Host name or IP address to connect to for backend 0
backend_port0 = 5432
                                   # Port number for backend 0
backend_weight0 = 1
                                   # Weight for backend 0 (only in load balancing mode)
backend_data_directory0 = '/var/lib/postgresql/9.1/main/'
                                   # Data directory for backend 0
backend_flag0 = 'ALLOW_TO_FAILOVER'
                                   # Controls various backend behavior
                                   # ALLOW_TO_FAILOVER or DISALLOW_TO_FAILOVER
backend_hostname1 = '192.168.80.163'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/postgresql/9.1/main/'
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_hostname2 = '192.168.80.162'
backend_port2 = 5432
backend_weight2 = 1
backend_data_directory2 = '/var/lib/postgresql/9.1/main/'
backend_flag2 = 'ALLOW_TO_FAILOVER'
...
follow_master_command = 'sleep 3;/var/lib/postgresql/9.1/main/pcp_recovery_nodes.sh %d %m %P ;sleep 120'
...
failover_command = '/var/lib/postgresql/9.1/main/failover_stream.sh %d %m %P'
...
failback_command = 'sleep 10'
fail_over_on_backend_error = off
use_watchdog = on
heartbeat_destination0 = '192.168.80.163'
heartbeat_destination_port0 = 9694
heartbeat_device0 = 'eth1'
heartbeat_destination1 = '192.168.80.162'
heartbeat_destination_port1 = 9694
heartbeat_device1 = 'eth1'
```
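As a hedged aside on this configuration: since use_watchdog = on, one sanity check before scripting back-to-back node commands is pcp_watchdog_info (available since pgpool-II 3.3), because a peer pgpool that is unreachable or still interlocking is the one that cancels the replicated failback. The non-localhost address below is inferred from heartbeat_destination0 and is an assumption:

```bash
# Sketch: confirm each pgpool node answers over pcp before issuing
# replicated failback/failover commands. The peer address is inferred
# from heartbeat_destination0 above (assumption).
pcp_watchdog_info 5 localhost 9898 postgres postgres
pcp_watchdog_info 5 192.168.80.163 9898 postgres postgres
```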
Note 0000729 | wangzhenbo | 2016-03-28 10:57
I think so. So for every recovery node I now call "pcp_attach_node 5 localhost 9898 postgres postgres $recovery_node" after "pcp_recovery_node 5 localhost 9898 postgres postgres $recovery_node", and this resolves the problem. Still, I wonder whether the problem could be solved in the mechanism itself rather than by the caller.
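For illustration, a minimal sketch of that per-node sequence, reusing the pcp arguments from the report; the node list and error handling are placeholders:

```bash
#!/bin/bash
# Sketch of the reporter's workaround: for each node, run
# pcp_recovery_node and then pcp_attach_node before moving to the next
# node, so each failback is fully processed before the next is issued.
PCP="5 localhost 9898 postgres postgres"

for recovery_node in 0 2; do     # node IDs are placeholders
    pcp_recovery_node $PCP "$recovery_node" || exit 1
    pcp_attach_node   $PCP "$recovery_node" || exit 1
done
```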
Date Modified | Username | Field | Change |
---|---|---|---|
2016-03-22 11:44 | wangzhenbo | New Issue | |
2016-03-23 13:12 | t-ishii | Assigned To | => Muhammad Usama |
2016-03-23 13:12 | t-ishii | Status | new => assigned |
2016-03-27 02:35 | Muhammad Usama | Note Added: 0000727 | |
2016-03-28 10:46 | wangzhenbo | Note Added: 0000728 | |
2016-03-28 10:57 | wangzhenbo | Note Added: 0000729 |