View Issue Details

ID: 0000251
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2017-08-29 09:37
Reporter: supp_k
Assigned To: Muhammad Usama
Priority: high
Severity: major
Reproducibility: always
Status: closed
Resolution: open
Platform: x86 64
OS: CentOS
OS Version: 6.x
Product Version: 3.5.3
Summary: 0000251: failover_command not executed
Description: failover_command is not executed when the server hosting the master database dies:

Environment:
Server 1.
  - pgpool 3.5.4 (Master + VIP)

Server 2.
  - pgpool 3.5.4
  - postgresql (master database)

Server 3.
  - pgpool 3.5.4
  - postgresql (standby readonly database)
Steps To Reproduce:
1) Set up the environment described above
2) Shut down Server 2

failover_command is not executed by any server.

Tags: No tags attached.

Activities

slivik

2017-04-15 21:21

reporter   ~0001422

Hi, is there any solution available?
We have the same problem and it is kind of crucial for us.

After the pgpool2 slave (with the PG DB master) is gone (network partitioning), the pgpool2 master (without a PG DB) says "FOLLOW MASTER lock is currently LOCKED" and does not perform failover.

pgpool2-3.6.2-1.pgdg16.04+
Ubuntu 16.04.2 LTS


Thanks
Regards

Muhammad Usama

2017-04-17 19:01

developer   ~0001424

Can you please share the Pgpool-II log files and the pgpool.conf file?

slivik

2017-04-17 21:44

reporter   ~0001425

Hi, please find logs and conf files attached.
Note: network partitioning was simulated with iptables, which is why there are "Operation not permitted" errors.

node1: PG master with pgpool slave
node2: PG standby with pgpool slave
node3: only pgpool (at the moment master, but with low prio)

Another point: even though node3 has a lower pgpool priority and there is another node with a higher priority, it is still selected as master. Is that correct behavior?
pgpool.tgz (38,178 bytes)

Muhammad Usama

2017-04-17 23:20

developer   ~0001426

As per the attached logs, Pgpool-II is performing the failover correctly. There is a permission issue (in your setup) on the pgpool-failover-pgpool95prod.sh file that prevents Pgpool-II from executing the failover command, so it may appear that Pgpool-II did not perform the failover, but other than that everything seems to be working.

See the error message in the attached node1 log (node1_pgpool.log line 251):
Apr 15 13:50:41 enepg01 pgpool[3358]: sh: 1: /usr/local/sbin/pgpool-failover-pgpool95prod.sh: Permission denied
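
A quick way to confirm this kind of problem is to check whether the script exists and is executable by the account that runs Pgpool-II. The following is only a minimal sketch: the helper name check_failover_script is hypothetical and not part of Pgpool-II, and it assumes you run it as the same user that Pgpool-II runs as.

```sh
#!/bin/sh
# Hypothetical helper (not part of Pgpool-II): report whether a failover
# script can be executed by the current user.
# Assumption: this is run as the same OS user that runs Pgpool-II.
check_failover_script() {
    script="$1"

    # The file must exist before its permissions matter.
    if [ ! -e "$script" ]; then
        echo "missing: $script"
        return 2
    fi

    # -x is true only if the current user may execute the file.
    if [ ! -x "$script" ]; then
        echo "not executable: $script"
        return 1
    fi

    echo "ok: $script"
    return 0
}
```

For example, check_failover_script /usr/local/sbin/pgpool-failover-pgpool95prod.sh would report "not executable" in the situation shown in the log line above; the file owner can then add the execute bit with chmod.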

The "FOLLOW MASTER lock is currently LOCKED" messages in the node2 log are perfectly normal. The Pgpool-II watchdog generates them when a Pgpool-II node asks the watchdog for the current status of the failover lock.

Also, the node3 log in the attached archive is empty. If you are referring to an error in that log, please reattach the log for Pgpool-II node 3.

Muhammad Usama

2017-04-17 23:27

developer   ~0001427

Regarding your question about watchdog priority: a node with a lower priority should only be selected as the cluster master/leader if it becomes master before the nodes with higher priority join the watchdog cluster. However, there was an issue in the current versions, which I have already fixed, that could cause a lower-priority node to become the master in some cases even when higher-priority nodes are contesting to be the master/leader node.

slivik

2017-04-18 05:53

reporter   ~0001429

I have fixed the permissions of the failover_command script on the enepg01 node, but this was not the issue, since that is the node I cut off from the network.

I tried it a second time and the result is the same, with the following situation:

         +-------+
         | S pgp |
         |       |
         +-------+

             vIP
+-------+ +-------+
| S pgp | | M pgp |
| M db  | | S db  |
+-------+ +-------+

I fenced (cut the connection completely) the bottom-left node [S pgp/M db], and I would expect pgpool2 to recognize that the PG DB master node is down and start failover by running failover_command, which promotes the PG DB slave on the bottom-right node. This has not happened.

Please, see the attached logs.
pgpool_logs.tgz (11,434 bytes)

Muhammad Usama

2017-04-18 21:09

developer   ~0001434

In the latest logs you shared, everything is working as expected: Pgpool-II is correctly detecting the failure of backend node 0 "PostgreSQL server on pg95prod01.enectiva.intranet:5435" and performing the failover on it.

I think what is causing the confusion is that Pgpool-II node#3 (Pgpool-II on pgpool95prod03.enectiva.intranet) gets selected as the master/coordinator, and since the recent enhancements in the watchdog, Pgpool-II makes sure that only the watchdog master/coordinator node executes the failover, failback and follow_master commands.
The problem is that failover_command in the pgpool.conf file (node3_pgpool.conf) for Pgpool-II node#3 is empty. So even after the failover is correctly performed (by Pgpool-II node#3), the PG standby never gets promoted to PG master (as failover_command was supposed to do that), and it appears as if Pgpool-II did not perform the failover.
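
Since any Pgpool-II node can end up as the watchdog master/coordinator, it may help to check that failover_command is set on every node. The following is a rough sketch only: the helper name get_failover_command is hypothetical, and it assumes the value is written in single quotes on one line, as in the sample pgpool.conf.

```sh
#!/bin/sh
# Hypothetical helper (not part of Pgpool-II): print the failover_command
# value from a pgpool.conf file, or fail if it is empty or unset.
# Assumption: the value is single-quoted on one line, for example
#   failover_command = '/path/to/failover.sh %d %H'
get_failover_command() {
    conf="$1"

    # Keep only uncommented failover_command lines, strip the key and the
    # quotes, and use the last occurrence in the file.
    value=$(sed -n "s/^[[:space:]]*failover_command[[:space:]]*=[[:space:]]*'\(.*\)'.*$/\1/p" "$conf" | tail -n 1)

    if [ -z "$value" ]; then
        echo "failover_command is empty or unset in $conf" >&2
        return 1
    fi

    printf '%s\n' "$value"
}
```

Running get_failover_command against the pgpool.conf of each node (including a witness node that hosts no database) would flag an empty setting like the one in node3_pgpool.conf.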

Also, the watchdog bug where it does not reliably select the Pgpool-II node with the highest wd_priority as the master/coordinator node is fixed in the latest code base. You can try the latest Pgpool-II code, which makes sure that the correct Pgpool-II node gets selected as the watchdog cluster leader.

slivik

2017-04-21 06:20

reporter   ~0001446

I have modified the failover script to connect from the 3rd (witness) node to the node with PG and promote it to PG master, and it finally worked. Thank you very much for your help.
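
For anyone with the same setup, the idea can be sketched roughly as follows. This is not the actual script from the attachments, only an illustration: the function name promote_command, the postgres SSH user, the script path in the comment, and passwordless SSH from the witness node are all assumptions. The function only prints the remote command, so it can be inspected before being executed.

```sh
#!/bin/sh
# Hypothetical failover script sketch (not the script from this issue).
# Assumed pgpool.conf entry, using Pgpool-II's failover_command placeholders:
#   failover_command = '/path/to/failover.sh %d %P %H %R'
#   %d = failed node id, %P = old primary node id,
#   %H = new master host name, %R = new master database cluster path
promote_command() {
    failed_node_id="$1"
    old_primary_id="$2"
    new_master_host="$3"
    new_master_pgdata="$4"

    # If a standby failed, there is nothing to promote.
    if [ "$failed_node_id" != "$old_primary_id" ]; then
        return 0
    fi

    # Print the command that would promote the standby on the remote host.
    # Assumption: the postgres user can SSH to that host without a password.
    printf 'ssh -T postgres@%s pg_ctl -D %s promote\n' \
        "$new_master_host" "$new_master_pgdata"
}
```

A real script would execute the printed command (and log the outcome) instead of only printing it; printing first makes it easier to test the script by hand on the witness node.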

Which version of pgpool2 fixes the selection priority bug? I could not find it. I found only version 2.7beta, but I would not like to install it into production.
pgpool_logs2.tgz (8,394 bytes)

Issue History

Date Modified     Username        Field                          Change
2016-09-27 21:32  supp_k          New Issue
2016-09-28 09:55  t-ishii         Assigned To                    => Muhammad Usama
2016-09-28 09:55  t-ishii         Status                         new => assigned
2017-04-15 21:21  slivik          Note Added: 0001422
2017-04-17 19:01  Muhammad Usama  Note Added: 0001424
2017-04-17 21:44  slivik          File Added: pgpool.tgz
2017-04-17 21:44  slivik          Note Added: 0001425
2017-04-17 23:20  Muhammad Usama  Note Added: 0001426
2017-04-17 23:27  Muhammad Usama  Note Added: 0001427
2017-04-18 05:53  slivik          File Added: pgpool_logs.tgz
2017-04-18 05:53  slivik          Note Added: 0001429
2017-04-18 21:09  Muhammad Usama  Note Added: 0001434
2017-04-21 06:20  slivik          File Added: pgpool_logs2.tgz
2017-04-21 06:20  slivik          Note Added: 0001446
2017-08-29 09:37  pengbo          Status                         assigned => closed