[pgpool-general: 364] Re: Multiple pgpool servers failover

Matt Wise matt at nextdoor.com
Sat Apr 21 00:03:23 JST 2012

  I have thought about going down this path a few times... but I keep staying away from it because of the KISS method. It's too damn complicated, and too likely to fail at the worst possible time. As it is, I have yet to find a doc that clearly explains exactly how and when all of the *_command options in the PGPool config file get run, and with what arguments. I've been working on our PGPool auto-failover scripts for weeks, and I still do not feel confident enough to turn them on in production.

  I strongly believe that PGPool needs a "pure load balancing" mode: one where it pays attention to which server is the master and disables traffic to downed (or behind-in-replication) slaves, but does not interfere with any of the operations that a database/ops team may be doing on the servers themselves. There are too many cases where we do not want auto-failover yet, and it's simply not controllable in PGPool.

  As an example... if I issue a "restart" to Postgres, I do not want PGPool to initiate a failover! A 5-second outage while I restart the master is FAR better than the much longer outage of re-coordinating and re-syncing our slaves!

  Another example is the one you have been discussing, where we want more than 1 PGPool server. A single PGPool server is too much of a single point of failure for us — so we have two. The second one needs to be "dumb" and just bounce traffic around. Nothing else.


On Apr 20, 2012, at 6:41 AM, Lou Kamenov wrote:

> Hey there, 
> My plan was ideally to put two pgpool instances behind a stateless load balancer; the problem of which pgpool performs the failover can, I believe, be handled by a well-crafted pgpool failover command that introduces a synchronization mechanism. 
> E.g. pgpool1 detects the failing master and runs the command: before touching the trigger file it flocks a file on the next backend, and after the promotion it touches another file, say 'completed-failover', containing the name of the new backend. 
> The second pgpool then detects the failing master and executes the failover command as well, but waits on the lock to be released, after which it checks for the existence of the completed-failover file and decides whether it should touch the trigger or not. 
> It's a bit clunky and it doesn't handle a scenario of a cascading failure where backend1 could fail as well. 
> Possibly the solution could be just to keep a standby pgpool server and switch IPs with the failing pgpool. 
> I'm just throwing ideas right now. 
> Thanks
> Lou
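
A minimal sketch of the flock-based coordination Lou describes, written as a shell failover_command. All paths and hostnames are hypothetical, and the promotion step is reduced to a local touch standing in for something like `ssh $new_master touch <trigger_file>`:

```shell
#!/bin/sh
# Sketch (not production code) of flock-based failover coordination:
# both pgpool hosts run this as their failover_command, and the flock
# guarantees only the first one through actually promotes a new master.
do_failover() {
    new_master="$1"                       # e.g. pgpool's %H placeholder
    lockfile=/tmp/pgpool-failover.lock
    donefile=/tmp/completed-failover

    (
        flock -x 9                        # serialize the two pgpools
        if [ -f "$donefile" ]; then
            exit 0                        # other pgpool already promoted
        fi
        touch "/tmp/trigger-$new_master"  # stand-in for the real promotion
        echo "$new_master" > "$donefile"  # record the new master's name
    ) 9>"$lockfile"
}
```

If both pgpools call this for the same failure, only the first invocation promotes; the second finds completed-failover and does nothing. As Lou notes, this still doesn't cover the cascading case where the chosen backend fails too.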
> On 2012-04-20, at 9:10, Matt Wise <matt at nextdoor.com> wrote:
>> Ludwig,
>>   The problem with that (and that was our original idea) is that DISALLOW does not function in the way we want. When DISALLOW is set, if any server in your pool of 4 DB servers goes down, access to all of them hangs. Worse yet, if the master changes from, say, DB1 to DB2 (and you bring DB1 back up as a slave of DB2), the non-controlling PGPool in DISALLOW mode will never notice.
>> —Matt
>> On Apr 19, 2012, at 10:48 PM, Ludwig Adam wrote:
>>> Dear Lou, why would you use multiple pgpool instances to control failover?
>>> Perhaps it would be a solution to have one instance of pgpool to set to FAILOVER and the others to DISALLOW...? 
>>> Ludwig 
>>> Mobil gesendet.
>>> -----Original Message----- 
>>> From: Lou Kamenov [kamenovl at defx.org]
>>> Received: Freitag, 20 Apr. 2012, 3:54
>>> To: Matt Wise [matt at nextdoor.com]
>>> CC: pgpool-general at pgpool.net [pgpool-general at pgpool.net]
>>> Subject: [pgpool-general: 360] Re: Multiple pgpool servers failover
>>> On Thu, Apr 19, 2012 at 9:32 PM, Matt Wise <matt at nextdoor.com> wrote:
>>> [..]
>>> > If DISALLOW_TO_FAILOVER is set and ..
>>> >   a) one of the slaves fails: that slave is taken out of rotation until
>>> > it's back up
>>> >   b) the master fails: all connections hang until the master is back, OR a
>>> > new master is detected. pgpool goes into a loop looking for new masters.
>>> I was thinking more about case (b); my problem is essentially an
>>> atomic fail-over,
>>> where we are ensured that it is triggered only once.
>>> I will give it a shot and post back my findings.
>>> If anyone else has any ideas, please send them over ;)
>>> cheers,
>>> Lou
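
For reference, DISALLOW_TO_FAILOVER is set per backend via the backend_flag parameters in pgpool.conf. A fragment along these lines (the hostname and numbering here are made up for illustration):

```
# pgpool.conf fragment (hypothetical host): backend 0 is flagged so that
# its failure does not trigger pgpool's failover machinery
backend_hostname0 = 'db1'
backend_port0 = 5432
backend_weight0 = 1
backend_flag0 = 'DISALLOW_TO_FAILOVER'
```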
>>> _______________________________________________
>>> pgpool-general mailing list
>>> pgpool-general at pgpool.net
>>> http://www.pgpool.net/mailman/listinfo/pgpool-general
