View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000251 | Pgpool-II | Bug | public | 2016-09-27 21:32 | 2017-08-29 09:37 |
| Reporter | supp_k | Assigned To | Muhammad Usama | ||
| Priority | high | Severity | major | Reproducibility | always |
| Status | closed | Resolution | open | ||
| Platform | x86 64 | OS | CentOS | OS Version | 6.x |
| Product Version | 3.5.3 | ||||
| Summary | 0000251: failover_command not executed | ||||
| Description | failover_command is not executed when the server hosting the master database dies. Environment: Server 1: pgpool 3.5.4 (master + VIP). Server 2: pgpool 3.5.4, postgresql (master database). Server 3: pgpool 3.5.4, postgresql (standby read-only database). | ||||
| Steps To Reproduce | 1) Set up the environment described above. 2) Shut down Server 2. Result: failover_command is not executed by any server. | ||||
| Tags | No tags attached. | ||||
|
|
Hi, is there any solution available? We have the same problem and it is crucial for us. After the pgpool2 slave (with the pg db master) is gone (network partitioning), the pgpool2 master (without a pg db) says "FOLLOW MASTER lock is currently LOCKED" and does not perform the failover. Versions: pgpool2-3.6.2-1.pgdg16.04+ on Ubuntu 16.04.2 LTS. Thanks. Regards |
|
|
Can you please share the Pgpool-II log files and the pgpool.conf file? |
|
|
Hi, please find the logs and conf files attached. Note: the network partitioning was simulated with iptables, which is why there are "Operation not permitted" errors. node1: PG master with pgpool slave. node2: PG standby with pgpool slave. node3: only pgpool (at the moment master, but with low priority). Another point: even though node3 has a lower pgpool priority and there is another node with a higher priority, node3 is still selected as master. Is that the correct behavior? |
|
|
As per the attached logs, Pgpool-II is correctly performing the failover. There is a permission issue (with your setup) on the pgpool-failover-pgpool95prod.sh file that prevents Pgpool-II from executing the failover command, so it may appear that the failover is not performed by Pgpool-II, but other than that everything seems to be working. See the error message in the attached node1 log (node1_pgpool.log line 251): Apr 15 13:50:41 enepg01 pgpool[3358]: sh: 1: /usr/local/sbin/pgpool-failover-pgpool95prod.sh: Permission denied. The messages "FOLLOW MASTER lock is currently LOCKED" in the node2 log are perfectly normal. These log messages are generated by the Pgpool-II watchdog when a Pgpool-II node inquires about the current status of the failover lock from the watchdog. Also, the node3 log in the attached zip file is empty. If it contains an error you are referring to, please reattach the log for Pgpool-II node 3. |
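The "Permission denied" line above means the shell could not execute the script file. A minimal sketch of a pre-flight check follows, meant to be run as the OS account that runs Pgpool-II. The function name `check_failover_script` is hypothetical, and the path in the usage comment is simply the one from the quoted log line.

```shell
#!/bin/sh
# Hypothetical helper: report whether a failover script exists and can be
# executed by the current user. Run it as the account that runs Pgpool-II.
check_failover_script() {
    script="$1"
    if [ ! -e "$script" ]; then
        echo "missing: $script"
        return 1
    fi
    if [ ! -x "$script" ]; then
        echo "not executable: $script"
        return 1
    fi
    echo "ok: $script"
}

# Usage (path taken from the log line above):
#   check_failover_script /usr/local/sbin/pgpool-failover-pgpool95prod.sh
```

The check only covers the script file itself. Commands called inside the script (for example ssh) can still fail for their own permission reasons.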
|
|
Regarding your question about watchdog priority: a node with a lower priority should only be selected as the cluster master/leader if it becomes master before the nodes with higher priority join the watchdog cluster. However, there was an issue in the current versions, which I have already fixed, that could cause a lower priority node to become the master in some cases even when higher priority nodes are contesting to be the master/leader node. |
|
|
I have fixed the permissions of failover_command on the enepg01 node, but this was not the issue, since this is the node which I cut off from the network. I tried a second time and the result is the same, with the following situation: top node: [S pgp], no db. Bottom left node: [S pgp/M db]. Bottom right node: [M pgp/S db], with the vIP. I fenced (cut the connection completely) the bottom left node [S pgp/M db], and I would expect pgpool2 to recognize that the PG DB master node is down and to start the failover by running failover_command, which promotes the PG DB slave on the bottom right node. This has not happened. Please see the attached logs. |
|
|
In the latest logs you shared, everything is working as expected: Pgpool-II correctly detects the failure of backend node 0 ("PostgreSQL server on pg95prod01.enectiva.intranet:5435") and performs the failover on it. I think what is causing the confusion is that Pgpool-II node#3 (Pgpool-II on pgpool95prod03.enectiva.intranet) gets selected as the master/coordinator, and since the recent enhancements in the watchdog, Pgpool-II makes sure that only the watchdog master/coordinator node executes the failover, failback and follow_master commands. The problem is that failover_command in the pgpool.conf file for Pgpool-II node#3 (node3_pgpool.conf) is empty. So even after the failover is correctly performed (by Pgpool-II node#3), the PG standby never gets promoted to PG master (failover_command was supposed to do that), and it appears as if Pgpool-II did not perform the failover. Also, the watchdog bug where the Pgpool-II node with the highest wd_priority is not reliably selected as the master/coordinator node is fixed in the latest code base. You can try the latest Pgpool-II code, which makes sure that the correct Pgpool-II node gets selected as the watchdog cluster leader. |
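Since only the watchdog master/coordinator runs the command, every Pgpool-II node that can become coordinator needs a non-empty failover_command. The sketch below shows one way to check a configuration file for that. The helper name `get_failover_command` is hypothetical, and the parsing assumes the common single-quoted `failover_command = '...'` form, taking the last uncommented assignment if there are several.

```shell
#!/bin/sh
# Hypothetical helper: print the value of failover_command from a pgpool.conf
# style file, or fail if the setting is absent or empty.
get_failover_command() {
    conf="$1"
    # Keep only uncommented assignments, strip the key, the "=" sign and the
    # surrounding single quotes, then take the last match (assumption about
    # how duplicate entries are resolved).
    value=$(sed -n "s/^[[:space:]]*failover_command[[:space:]]*=[[:space:]]*'\(.*\)'.*$/\1/p" "$conf" | tail -n 1)
    if [ -z "$value" ]; then
        echo "failover_command is empty or not set in $conf" >&2
        return 1
    fi
    printf '%s\n' "$value"
}

# Usage, to be repeated on every Pgpool-II node (file name from this thread):
#   get_failover_command node3_pgpool.conf
```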
|
|
I have modified the failover script to connect from the 3rd (witness) node to the node with PG and promote it to PG master. And finally it worked. Thank you very much for your help. Which version of pgpool2 fixes the selection priority bug? I could not find it. I found only version 2.7beta, but I would not like to install it into production. |
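The actual script was not posted in this thread. As a rough sketch of the approach described above, a failover script running on a node without a local database could promote the standby over ssh. Everything below is an assumption for illustration: the script path, the function name `failover`, the `DRY_RUN` switch, passwordless ssh as the `postgres` user, and `pg_ctl` being on the remote PATH.

```shell
#!/bin/sh
# Sketch of a failover script for a Pgpool-II node without a local database.
# Assumed pgpool.conf entry (script path is hypothetical):
#   failover_command = '/path/to/failover.sh %d %P %H %R'
# %d = failed node id, %P = old primary node id,
# %H = new master host name, %R = new master database cluster path.
failover() {
    failed_node_id="$1"
    old_primary_id="$2"
    new_master_host="$3"
    new_master_pgdata="$4"

    # Only promote when the failed backend was the primary.
    if [ "$failed_node_id" != "$old_primary_id" ]; then
        echo "node $failed_node_id was a standby, nothing to promote"
        return 0
    fi

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it (hypothetical test switch).
        echo "ssh -o BatchMode=yes postgres@$new_master_host pg_ctl promote -D $new_master_pgdata"
    else
        # Assumes passwordless ssh as postgres and pg_ctl on the remote PATH.
        ssh -o BatchMode=yes "postgres@$new_master_host" \
            "pg_ctl promote -D '$new_master_pgdata'"
    fi
}

# In a real script file, finish with:
#   failover "$@"
```

Running it once with `DRY_RUN=1` on the witness node shows the command that would be sent, which helps confirm the placeholders are passed in the expected order.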
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2016-09-27 21:32 | supp_k | New Issue | |
| 2016-09-28 09:55 | t-ishii | Assigned To | => Muhammad Usama |
| 2016-09-28 09:55 | t-ishii | Status | new => assigned |
| 2017-04-15 21:21 | slivik | Note Added: 0001422 | |
| 2017-04-17 19:01 | Muhammad Usama | Note Added: 0001424 | |
| 2017-04-17 21:44 | slivik | File Added: pgpool.tgz | |
| 2017-04-17 21:44 | slivik | Note Added: 0001425 | |
| 2017-04-17 23:20 | Muhammad Usama | Note Added: 0001426 | |
| 2017-04-17 23:27 | Muhammad Usama | Note Added: 0001427 | |
| 2017-04-18 05:53 | slivik | File Added: pgpool_logs.tgz | |
| 2017-04-18 05:53 | slivik | Note Added: 0001429 | |
| 2017-04-18 21:09 | Muhammad Usama | Note Added: 0001434 | |
| 2017-04-21 06:20 | slivik | File Added: pgpool_logs2.tgz | |
| 2017-04-21 06:20 | slivik | Note Added: 0001446 | |
| 2017-08-29 09:37 | pengbo | Status | assigned => closed |