[pgpool-general: 5958] Re: Split-brain remedy

Pierre Timmermans ptim007 at yahoo.com
Thu Mar 1 04:53:56 JST 2018


I am using pgpool inside a Docker container, so I cannot tell what the service command will say.
I think you should have a look at the pgpool log file at the moment you unplug the interface: it will probably say something about not being able to reach the trusted_servers and excluding itself from the cluster (I am not sure). You can also start pgpool in debug mode to get extra logging. I think I validated that in the past, but I cannot find the doc anymore.
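For example, one common way to run pgpool in the foreground with debug logging (the exact flags and log destination may vary by version and installation):

pgpool -n -d > /tmp/pgpool.log 2>&1 &

Here -n keeps pgpool from daemonizing and -d turns on debug output.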
You can also execute the following command:
pcp_watchdog_info -h <ip pgpool> -p 9898 -w
It will return information about the watchdog, among other things the cluster quorum status.
NB: due to a packaging bug, if you installed pgpool from the PostgreSQL yum repositories (and not from pgpool.net), pcp_watchdog_info will not be in the PATH (it ends up in a directory somewhere, I forgot which).
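If you hit that packaging issue, a generic way to locate the binary (the package name below is only a guess, adjust it for your pgpool/PostgreSQL version):

rpm -ql pgpool-II-96 | grep pcp_watchdog_info
find / -name pcp_watchdog_info 2>/dev/null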


Pierre 

    On Wednesday, February 28, 2018, 5:37:49 PM GMT+1, Alexander Dorogensky <amazinglifetime at gmail.com> wrote:  
 
 With 'trusted_servers' configured, when I unplug 10.0.0.1 it kills pgpool, i.e. 'service pgpool status' reports 'pgpool dead but subsys locked'.
Is that how it should be?

Plug/unplug = ifconfig eth0 up/down


On Tue, Feb 27, 2018 at 1:49 PM, Pierre Timmermans <ptim007 at yahoo.com> wrote:

To prevent this split-brain scenario (caused by a network partition) you can use the trusted_servers configuration parameter. It takes a list of servers that pgpool can use to determine whether a node is suffering a network partition. If a node cannot reach any of the servers in the list, it will assume it is isolated (by a network partition) and will not promote itself to master.
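For reference, a minimal sketch of the relevant pgpool.conf settings (the addresses are placeholders, and ping_path should point to wherever ping lives on your system):

trusted_servers = '10.0.0.254,dns1.example.com'
ping_path = '/bin'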
In general, when you have only two nodes, I believe it is not safe to do automatic failover, unless you have some kind of fencing mechanism (meaning you can shut down a failed node and prevent it from coming back after a failure).
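As a rough illustration only, a fencing step run from the surviving node could be as crude as the following (assumes passwordless root ssh and init-script service names; real deployments normally use a dedicated STONITH/fencing device):

ssh -o ConnectTimeout=5 root@<failed node> 'service pgpool stop; service postgresql stop'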
Pierre 

    On Tuesday, February 27, 2018, 7:58:55 PM GMT+1, Alexander Dorogensky <amazinglifetime at gmail.com> wrote:  
 
 Hi All,

I have a 10.0.0.1/10.0.0.2 master/hot standby configuration with streaming replication, where each node runs PostgreSQL and pgpool with watchdog enabled.

I shut down the network interface on 10.0.0.1 and wait until 10.0.0.2 triggers failover and promotes itself to master through my failover script.

Now the watchdogs on 10.0.0.1 and 10.0.0.2 are out of sync: they have conflicting views of which node has failed, and both think they are master.

When I bring back the network interface on 10.0.0.1, 'show pool_nodes' says that 10.0.0.1 is master/up and 10.0.0.2 is standby/down. 

I want 10.0.0.1 to be standby and 10.0.0.2 to be master. 

I've been playing with the failover script, e.g.:

#!/bin/sh
# Sketch only; gateway address, service names and the recovery helper are placeholders.
GATEWAY=10.0.0.254
if ! ping -c 1 -W 3 "$GATEWAY" > /dev/null 2>&1; then
    # cannot reach the gateway: this node is isolated, so step down
    service pgpool stop; service postgresql stop
elif [ "$(psql -At -c 'select pg_is_in_recovery()')" = "t" ]; then
    # network is fine and this node is a standby: promote it
    pg_ctl promote -D "$PGDATA"
    # retry recovery of the failed node (base backup) every minute; the
    # hypothetical recover_failed_node.sh removes this entry when it succeeds
    (crontab -l 2>/dev/null; echo '* * * * * /usr/local/bin/recover_failed_node.sh') | crontab -
fi
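For the recovery job, the hypothetical recover_failed_node.sh above could be built around pg_basebackup, run on the failed node against the new master (the address and data directory are placeholders, and the old data directory must be cleared first):

pg_basebackup -h 10.0.0.2 -D /var/lib/pgsql/data -X stream -P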

Can you please help me with this? Any ideas would be highly appreciated.

Regards, Alex
_______________________________________________
pgpool-general mailing list
pgpool-general at pgpool.net
http://www.pgpool.net/mailman/listinfo/pgpool-general

