[pgpool-committers: 3514] pgpool: Fixing a problem with the watchdog failover_command locking mec

Muhammad Usama m.usama at gmail.com
Tue Sep 20 04:45:53 JST 2016

Fixing a problem with the watchdog failover_command locking mechanism

Starting with pgpool-II 3.5, the watchdog used separate, individual locks for
each node-failover command (failover, failback and follow-master). Each lock was
acquired just before executing the respective failover script and released as
soon as the script finished. Although this technique was very efficient, it had
a problem: if the failover command takes very little time and finishes before
the lock request from another pgpool-II node arrives, that node is also granted
the lock, because the first node has already released it. Consequently, both
nodes end up executing the failover script.
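
The race can be reproduced with a small single-process model, written in C
since pgpool-II is a C code base. This is a hypothetical sketch, not pgpool-II
code: `per_command_locks`, `try_acquire`, `handle_failover` and the two
`simulate_*` functions are invented for illustration, and the network delay
between nodes is modelled only by the order of the calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the pre-fix scheme (not pgpool-II code): one lock
 * per failover command, held only while that command's script runs. */
typedef enum { CMD_FAILOVER, CMD_FAILBACK, CMD_FOLLOW_MASTER, CMD_COUNT } failover_cmd;

typedef struct {
    int owner[CMD_COUNT]; /* node id holding each command lock, -1 if free */
    int executions;       /* number of times a failover script was run */
} per_command_locks;

static void init_locks(per_command_locks *l)
{
    for (int i = 0; i < CMD_COUNT; i++)
        l->owner[i] = -1;
    l->executions = 0;
}

static bool try_acquire(per_command_locks *l, failover_cmd cmd, int node)
{
    if (l->owner[cmd] != -1)
        return false; /* another node is running this command right now */
    l->owner[cmd] = node;
    return true;
}

static void release(per_command_locks *l, failover_cmd cmd, int node)
{
    assert(l->owner[cmd] == node); /* only the owner may release */
    l->owner[cmd] = -1;
}

/* One node's complete handling: lock, run the script, unlock at once. */
static void handle_failover(per_command_locks *l, int node)
{
    if (!try_acquire(l, CMD_FAILOVER, node))
        return;
    l->executions++; /* stands in for running the failover script */
    release(l, CMD_FAILOVER, node);
}

/* Node 1's lock request arrives after node 0's script has finished. */
int simulate_fast_script(void)
{
    per_command_locks l;
    init_locks(&l);
    handle_failover(&l, 0);
    handle_failover(&l, 1); /* finds the lock free again and runs too */
    return l.executions;
}

/* Node 1's lock request arrives while node 0 is still running the script. */
int simulate_slow_script(void)
{
    per_command_locks l;
    init_locks(&l);
    bool got0 = try_acquire(&l, CMD_FAILOVER, 0);
    bool got1 = try_acquire(&l, CMD_FAILOVER, 1); /* refused: lock is held */
    l.executions += got0 + got1;
    release(&l, CMD_FAILOVER, 0);
    return l.executions;
}
```

In this model `simulate_fast_script()` returns 2 while `simulate_slow_script()`
returns 1: whether the script runs once or twice depends only on timing.
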
To fix this, we are reverting to the tested failover interlocking design used
before pgpool-II 3.5. In that design, the node that becomes the lock-holder
locks all the commands at the start of the failover, and each command lock is
released after that command's execution finishes. Only the lock-holder node is
allowed to acquire or release the individual command locks. That way the
lock-holder node keeps its lock-holder status for the whole span of the
failover execution, and the system becomes far less sensitive to timing.
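
The lock-holder scheme can be modelled the same way. Again this is a
hypothetical sketch: `failover_interlock`, `failover_start`,
`release_command_lock`, `run_command`, `failover_end` and
`simulate_late_request` are illustrative names, not the functions changed in
`watchdog.c`. A non-holder node simply skips the command here, standing in for
waiting on the holder and then not running the script itself.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { CMD_FAILOVER, CMD_FAILBACK, CMD_FOLLOW_MASTER, CMD_COUNT } failover_cmd;

/* Hypothetical model of the restored scheme: one node becomes lock-holder
 * for the whole failover and all command locks are taken up front. */
typedef struct {
    int  holder;                /* node id of the lock-holder, -1 if none */
    bool locked[CMD_COUNT];     /* per-command locks, owned by the holder */
    int  executions[CMD_COUNT]; /* how often each command's script ran */
} failover_interlock;

static void interlock_init(failover_interlock *il)
{
    il->holder = -1;
    for (int i = 0; i < CMD_COUNT; i++) {
        il->locked[i] = false;
        il->executions[i] = 0;
    }
}

/* At failover start the first requesting node becomes lock-holder and every
 * command lock is acquired at once. Returns true if `node` is the holder. */
static bool failover_start(failover_interlock *il, int node)
{
    if (il->holder == -1) {
        il->holder = node;
        for (int i = 0; i < CMD_COUNT; i++)
            il->locked[i] = true;
    }
    return il->holder == node;
}

/* Only the lock-holder may release an individual command lock. */
static void release_command_lock(failover_interlock *il, int node, failover_cmd cmd)
{
    assert(node == il->holder);
    il->locked[cmd] = false;
}

static void run_command(failover_interlock *il, int node, failover_cmd cmd)
{
    if (node != il->holder)
        return;                /* non-holders never run the script */
    if (!il->locked[cmd])
        return;                /* already executed and released */
    il->executions[cmd]++;     /* stands in for running the failover script */
    release_command_lock(il, node, cmd);
}

/* Lock-holder status, and any unused command locks, are dropped only here. */
static void failover_end(failover_interlock *il, int node)
{
    if (node != il->holder)
        return;
    for (int i = 0; i < CMD_COUNT; i++)
        il->locked[i] = false;
    il->holder = -1;
}

/* Same timing as the fast-script race: node 1 asks after node 0's script ran. */
int simulate_late_request(void)
{
    failover_interlock il;
    interlock_init(&il);
    bool holder0 = failover_start(&il, 0); /* node 0 becomes lock-holder */
    run_command(&il, 0, CMD_FAILOVER);     /* runs and releases quickly */
    bool holder1 = failover_start(&il, 1); /* node 0 is still the holder */
    run_command(&il, 1, CMD_FAILOVER);     /* skipped: not the holder */
    failover_end(&il, 0);
    assert(holder0 && !holder1);
    return il.executions[CMD_FAILOVER];
}
```

Because lock-holder status lasts from `failover_start` to `failover_end` rather
than only for the script's run time, the late request no longer wins the lock
and `simulate_late_request()` returns 1.
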

The issue was identified by Yugo <nagata at sraoss.co.jp>



Modified Files
src/include/pool.h                     |   4 +-
src/include/watchdog/wd_ipc_commands.h |  14 +-
src/include/watchdog/wd_ipc_defines.h  |  32 ++-
src/main/pgpool_main.c                 |  64 +++---
src/watchdog/watchdog.c                | 395 +++++++++++++++++++++------------
src/watchdog/wd_commands.c             |  79 +++----
6 files changed, 351 insertions(+), 237 deletions(-)
