[pgpool-general: 5959] Re: Split-brain remedy
Alexander Dorogensky
amazinglifetime at gmail.com
Thu Mar 1 07:01:06 JST 2018
It looks like a pgpool child crashes (see the log excerpt below), but I'm not sure.
So the question remains: is this a bug or expected behavior?
DEBUG: watchdog trying to ping host "10.0.0.100"
WARNING: watchdog failed to ping host"10.0.0.100"
DETAIL: ping process exits with code: 2
WARNING: watchdog lifecheck, failed to connect to any trusted servers
LOG: informing the node status change to watchdog
DETAIL: node id :0 status = "NODE DEAD" message:"trusted server is unreachable"
LOG: new IPC connection received
LOCATION: watchdog.c:3319
LOG: received node status change ipc message
DETAIL: trusted server is unreachable
DEBUG: processing node status changed to DEAD event for node ID:0
STATE MACHINE INVOKED WITH EVENT = THIS NODE LOST Current State = MASTER
WARNING: watchdog lifecheck reported, we are disconnected from the network
DETAIL: changing the state to LOST
DEBUG: removing all watchdog nodes from the standby list
DETAIL: standby list contains 1 nodes
LOG: watchdog node state changed from [MASTER] to [LOST]
DEBUG: STATE MACHINE INVOKED WITH EVENT = STATE CHANGED Current State = LOST
FATAL: system has lost the network
LOG: Watchdog is shutting down
DEBUG: sending packet, watchdog node:[10.0.0.2:5432 Linux alex2] command id:[67] type:[INFORM I AM GOING DOWN] state:[LOST]
DEBUG: sending watchdog packet to socket:7, type:[X], command ID:67, data Length:0
DEBUG: sending watchdog packet, command id:[67] type:[INFORM I AM GOING DOWN] state :[LOST]
DEBUG: new cluster command X issued with command id 67
LOG: watchdog: de-escalation started
DEBUG: shmem_exit(-1): 0 callbacks to make
DEBUG: proc_exit(-1): 0 callbacks to make
DEBUG: shmem_exit(3): 0 callbacks to make
DEBUG: proc_exit(3): 1 callbacks to make
DEBUG: exit(3)
DEBUG: shmem_exit(-1): 0 callbacks to make
DEBUG: proc_exit(-1): 0 callbacks to make
DEBUG: reaper handler
DEBUG: watchdog child process with pid: 30288 exit with FATAL ERROR. pgpool-II will be shutdown
LOG: watchdog child process with pid: 30288 exits with status 768
FATAL: watchdog child process exit with fatal error. exiting pgpool-II
LOG: setting the local watchdog node name to "10.0.0.1:5432 Linux alex1"
LOG: watchdog cluster is configured with 1 remote nodes
LOG: watchdog remote node:0 on 10.0.0.2:9000
LOG: interface monitoring is disabled in watchdog
DEBUG: pool_write: to backend: 0 kind:X
DEBUG: pool_flush_it: flush size: 5
...
DEBUG: shmem_exit(-1): 0 callbacks to make
...
DEBUG: lifecheck child receives shutdown request signal 2, forwarding to all children
DEBUG: lifecheck child receives fast shutdown request
DEBUG: watchdog heartbeat receiver child receives shutdown request signal 2
DEBUG: shmem_exit(-1): 0 callbacks to make
DEBUG: proc_exit(-1): 0 callbacks to make
...
On Wed, Feb 28, 2018 at 1:53 PM, Pierre Timmermans <ptim007 at yahoo.com>
wrote:
> I am using pgpool inside a Docker container, so I cannot tell what the
> service command will say.
>
> I think you should have a look at the pgpool log file at the moment you
> unplug the interface: it will probably say that it cannot reach the
> trusted servers and that it will exclude itself from the cluster (I am
> not sure). You can also start pgpool in debug mode to get extra logging.
> I think I validated this in the past, but I cannot find the documentation
> anymore.
>
> You can also execute the following command:
>
> pcp_watchdog_info -h <ip pgpool> -p 9898 -w
>
> It will return information about the watchdog, among other things the
> cluster quorum status.
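>
> For the two nodes in this thread that would be, for example (assuming
> pcp_watchdog_info is on the PATH and -w can find a ~/.pcppass entry):
>
> pcp_watchdog_info -h 10.0.0.1 -p 9898 -w
> pcp_watchdog_info -h 10.0.0.2 -p 9898 -w
>
> In a healthy cluster both nodes should report the same master; during a
> split brain each side claims the master role for itself.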
>
> nb: due to a bug in the packaging by postgres, if you installed pgpool
> from the postgres yum repositories (and not from pgpool) then
> pcp_watchdog_info will not be on the path (it is in a directory
> somewhere, I forgot which)
>
>
>
> Pierre
>
>
> On Wednesday, February 28, 2018, 5:37:49 PM GMT+1, Alexander Dorogensky
> <amazinglifetime at gmail.com> wrote:
>
>
> With 'trusted_servers' configured, when I unplug 10.0.0.1 it kills pgpool,
> i.e. 'service pgpool status' reports 'pgpool dead but subsys locked'.
> Is that how it should be?
>
> Plug/unplug = ifconfig eth0 up/down
>
>
>
> On Tue, Feb 27, 2018 at 1:49 PM, Pierre Timmermans <ptim007 at yahoo.com>
> wrote:
>
> To prevent this split brain scenario (caused by a network partition) you
> can use the configuration trusted_servers. This setting is a list of
> servers that pgpool can use to determine if a node is suffering a network
> partition or not. If a node cannot reach any of the servers in the list,
> then it will assume it is isolated (by a network partition) and will not
> promote itself to master.
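>
> For example, in pgpool.conf something like the following (10.0.0.100 is
> the trusted host already visible in your log; the second address and the
> ping_path value are just placeholders for your setup):
>
> trusted_servers = '10.0.0.100,10.0.0.254'
> ping_path = '/bin'
>
> Each host in trusted_servers must answer ping; if none of them does, the
> node considers itself isolated rather than assuming the other node died.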
>
> In general, when you have only two nodes, I believe it is not safe to do
> automatic failover unless you have some kind of fencing mechanism
> (meaning: you can shut down a failed node and prevent it from coming back
> after a failure).
>
> Pierre
>
>
> On Tuesday, February 27, 2018, 7:58:55 PM GMT+1, Alexander Dorogensky
> <amazinglifetime at gmail.com> wrote:
>
>
> Hi All,
>
> I have a 10.0.0.1/10.0.0.2 master/hot standby configuration with
> streaming replication, where each node runs pgpool with watchdog enabled
> and postgres.
>
> I shut down the network interface on 10.0.0.1 and wait until 10.0.0.2
> triggers failover and promotes itself to master through my failover script.
>
> Now the watchdogs on 10.0.0.1 and 10.0.0.2 are out of sync, have
> conflicting views on which node has failed and both think they are master.
>
> When I bring back the network interface on 10.0.0.1, 'show pool_nodes'
> says that 10.0.0.1 is master/up and 10.0.0.2 is standby/down.
>
> I want 10.0.0.1 to be standby and 10.0.0.2 to be master.
>
> I've been playing with the failover script, e.g.:
>
> if (default network gateway is pingable) {
>     shut down pgpool and postgres
> } else if (this node is standby) {
>     promote this node to master
>     create a job that runs every minute and tries to recover the failed
>     node (base backup)
>     cancel the job upon successful recovery
> }
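>
> A rough bash sketch of that idea (untested; the gateway address, data
> directory, and the recovery.conf check for "is this node a standby" are
> assumptions for this setup, valid for PostgreSQL before version 12):
>
> #!/bin/bash
> GATEWAY=10.0.0.254              # placeholder: default gateway to test
> PGDATA=/var/lib/pgsql/data      # placeholder: postgres data directory
>
> if ping -c 1 -W 2 "$GATEWAY" >/dev/null 2>&1; then
>     # gateway reachable: step down, as in the pseudocode above
>     pgpool -m fast stop
>     pg_ctl -D "$PGDATA" stop -m fast
> elif [ -f "$PGDATA/recovery.conf" ]; then
>     # this node is a standby: promote it; a cron job would then retry a
>     # base backup of the failed node until it succeeds and remove itself
>     pg_ctl -D "$PGDATA" promote
> fi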
>
> Can you please help me with this? Any ideas would be highly appreciated.
>
> Regards, Alex
> _______________________________________________
> pgpool-general mailing list
> pgpool-general at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-general
>
>
>