[pgpool-general: 5785] Re: After failback, standby goes to primary, and primary goes to standby

Bo Peng pengbo at sraoss.co.jp
Tue Oct 24 09:59:19 JST 2017


Hello,

> > > Then, Postgresql service is started in node 0.
> > > node 0 is standby, with status 3
> > > node 1 is primary, with status 2

How did you start node0 in this step?
I think you missed the "recovery" step.


> I have checked these steps but attaching node1 (after failover without
> recovery) instead of node 0, and I can't reproduce this situation with node
> 1. Do you know if this behaviour is by design of Pgpool? Why is it
> necessary to use pcp_recovery_node instead of pcp_attach_node?

If you recover the node as a standby and then attach it to pgpool, 
"pcp_recovery_node" is not necessary.


Let's confirm the scenario of performing a failover and then recovering the downed backend node as a standby.

1. Start node0 and node1

   node0 : primary
   node1 : standby

2. Stop node0, and failover occurs

   node0 : down
   node1 : primary  <= failover

3. Recover node0 as a standby

   node0 : standby
   node1 : primary
   
   There are two ways to recover the downed node.
    
    (1) Recover node0 as a standby by using "pcp_recovery_node".

        "pcp_recovery_node" will recover the downed node and attach it to pgpool.
        But to use the command, you need to configure the 'recovery_1st_stage_command'
        parameter (see the sketches after this list).

        Please see the following document for more details about configuring Pgpool-II online recovery.

        http://www.pgpool.net/docs/latest/en/html/example-cluster.html

    (2) Recover node0 as a standby by using a command such as "pg_basebackup",
        then attach the node to pgpool (see the second sketch after this list).
        Because pgpool has already detached the node, you need to attach it again
        so that pgpool knows it is back. Without attaching the node, its status
        will stay "down", even if it is running as a standby.

   If you just start the downed PostgreSQL node by using "pg_ctl start" without recovery,
   the node will be started as a primary.
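
   As a minimal sketch of (1), assuming node id 0, the default pcp port 9898 and
   a PCP user "pgpool" (all of these are assumptions; the option syntax is the
   pgpool-II 3.5+ style, 3.4.x uses positional arguments):

       # pgpool.conf -- parameters used by online recovery
       recovery_user = 'postgres'
       recovery_password = 'postgres'
       recovery_1st_stage_command = 'recovery_1st_stage'  # script placed in the primary's data directory

       # run online recovery of node 0; this takes a base backup from the
       # primary, starts node0 as a standby and attaches it to pgpool
       pcp_recovery_node -h localhost -p 9898 -U pgpool -n 0 -W

   And a minimal sketch of (2), run on node0, assuming a PostgreSQL 9.6 yum
   installation on CentOS 7, node1 as the current primary and a replication
   user "repl" (the service name, paths and user names are assumptions):

       # stop the old cluster and rebuild it from the current primary (node1);
       # run pg_basebackup as the postgres OS user
       systemctl stop postgresql-9.6
       rm -rf /var/lib/pgsql/9.6/data/*
       pg_basebackup -h node1 -p 5432 -U repl -D /var/lib/pgsql/9.6/data -X stream -R
       systemctl start postgresql-9.6   # -R wrote recovery.conf, so node0 starts as a standby

       # then let pgpool know the node is back, with pcp_attach_node as shown above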
   

On Mon, 23 Oct 2017 22:13:44 +0200
Lucas Luengas <lucasluengas at gmail.com> wrote:

> Hello Bo.
> Thank you for your answer.
> 
> I have checked these steps but attaching node1 (after failover without
> recovery) instead of node 0, and I can't reproduce this situation with node
> 1. Do you know if this behaviour is by design of Pgpool? Why is it
> necessary to use pcp_recovery_node instead of pcp_attach_node?
> 
> Kind regards.
> 
> On Sat, Oct 21, 2017 at 1:30 AM, Bo Peng <pengbo at sraoss.co.jp> wrote:
> 
> > Hello,
> >
> > If you want to start node0 (old primary) as standby,
> > you should use pcp_recovery_node to recover node0 as a standby.
> >
> > If you just restart node0 after failover without recovery,
> > it will run as primary.
> >
> > On Fri, 20 Oct 2017 18:41:36 +0200
> > Lucas Luengas <lucasluengas at gmail.com> wrote:
> >
> > > Hello
> > > I am testing Pgpool 3.4.13 with Postgresql-9.6, with streaming replication
> > > and watchdog, on Centos 7. I have two servers. Each server has Pgpool and
> > > Postgresql installed. I have installed pgpool from the yum repository.
> > >
> > > Node 0 is primary, with status 2
> > > Node 1 is standby, with status 2
> > >
> > > If Postgresql service is stopped in node 0, then:
> > > node 0 is standby, with status 3
> > > node 1 is primary, with status 2. (failover)
> > >
> > > Then, Postgresql service is started in node 0.
> > > node 0 is standby, with status 3
> > > node 1 is primary, with status 2
> > >
> > > Then, I attach node 0 using pcp_attach_node command.
> > > node 0 is primary, with status 2.
> > > node 1 is standby, with status 2.
> > > Node 0 was changed to primary and node 1 was changed to standby. Why? Do I
> > > have any error in my setup?
> > > I think the correct result should be:
> > > node 0 is standby, with status 2
> > > node 1 is primary, with status 2
> > >
> > > I have repeated the previous steps with pgpool 3.4.12, 3.4.11, 3.4.10 and 3.4.9
> > > with the same configuration and the same servers. I get the same results.
> > > Also, I have repeated the steps with pgpool 3.6.6 and I get the same results.
> > >
> > > Some log lines during failback
> > >
> > > Oct 20 13:41:03 localhost pgpool[9687]: [128-1] 2017-10-20 13:41:03: pid
> > > 9687: LOG:  received failback request for node_id: 0 from pid [9687]
> > > Oct 20 13:41:03 localhost pgpool[4913]: [255-1] 2017-10-20 13:41:03: pid
> > > 4913: LOG:  watchdog notifying to start interlocking
> > > Oct 20 13:41:03 localhost pgpool[4913]: [256-1] 2017-10-20 13:41:03: pid
> > > 4913: LOG:  starting fail back. reconnect host 192.168.0.136(5432)
> > > Oct 20 13:41:03 localhost pgpool[4913]: [257-1] 2017-10-20 13:41:03: pid
> > > 4913: LOG:  Node 1 is not down (status: 2)
> > > Oct 20 13:41:04 localhost pgpool[4913]: [258-1] 2017-10-20 13:41:04: pid
> > > 4913: LOG:  Do not restart children because we are failbacking node id 0
> > > host: 192.168.0.136 port: 5432 and we are in streaming replication mode and
> > > not all backends were down
> > > Oct 20 13:41:04 localhost pgpool[4913]: [259-1] 2017-10-20 13:41:04: pid
> > > 4913: LOG:  find_primary_node_repeatedly: waiting for finding a primary node
> > > Oct 20 13:41:04 localhost pgpool[4913]: [260-1] 2017-10-20 13:41:04: pid
> > > 4913: LOG:  find_primary_node: checking backend no 0
> > > Oct 20 13:41:04 localhost pgpool[4913]: [260-2]
> > > Oct 20 13:41:04 localhost pgpool[4913]: [261-1] 2017-10-20 13:41:04: pid
> > > 4913: LOG:  find_primary_node: primary node id is 0
> > > Oct 20 13:41:04 localhost pgpool[4913]: [262-1] 2017-10-20 13:41:04: pid
> > > 4913: LOG:  watchdog notifying to end interlocking
> > > Oct 20 13:41:04 localhost pgpool[4913]: [263-1] 2017-10-20 13:41:04: pid
> > > 4913: LOG:  failover: set new primary node: 0
> > > Oct 20 13:41:04 localhost pgpool[4913]: [264-1] 2017-10-20 13:41:04: pid
> > > 4913: LOG:  failover: set new master node: 0
> > > Oct 20 13:41:04 localhost pgpool[4913]: [265-1] 2017-10-20 13:41:04: pid
> > > 4913: LOG:  failback done. reconnect host 192.168.0.136(5432)
> > > Oct 20 13:41:04 localhost pgpool[9688]: [194-1] 2017-10-20 13:41:04: pid
> > > 9688: LOG:  worker process received restart request
> > > Oct 20 13:41:05 localhost pgpool[9687]: [129-1] 2017-10-20 13:41:05: pid
> > > 9687: LOG:  restart request received in pcp child process
> > > Oct 20 13:41:05 localhost pgpool[4913]: [266-1] 2017-10-20 13:41:05: pid
> > > 4913: LOG:  PCP child 9687 exits with status 256 in failover()
> > > Oct 20 13:41:05 localhost pgpool[4913]: [267-1] 2017-10-20 13:41:05: pid
> > > 4913: LOG:  fork a new PCP child pid 10410 in failover()
> > > Oct 20 13:41:05 localhost pgpool[4913]: [268-1] 2017-10-20 13:41:05: pid
> > > 4913: LOG:  worker child process with pid: 9688 exits with status 256
> > > Oct 20 13:41:05 localhost pgpool[4913]: [269-1] 2017-10-20 13:41:05: pid
> > > 4913: LOG:  fork a new worker child process with pid: 10411
> > > Oct 20 13:41:10 localhost pgpool[9692]: [202-1] 2017-10-20 13:41:10: pid
> > > 9692: LOG:  selecting backend connection
> > > Oct 20 13:41:10 localhost pgpool[9692]: [202-2] 2017-10-20 13:41:10: pid
> > > 9692: DETAIL:  failback event detected, discarding existing connections
> > >
> > > Kind regards
> >
> >
> > --
> > Bo Peng <pengbo at sraoss.co.jp>
> > SRA OSS, Inc. Japan
> >
> >


-- 
Bo Peng <pengbo at sraoss.co.jp>
SRA OSS, Inc. Japan


