[pgpool-general: 5787] Re: After failback, standby goes to primary, and primary goes to standby

Bo Peng pengbo at sraoss.co.jp
Wed Oct 25 08:45:52 JST 2017


Hello Lucas,

> Excuse me, I think my explanation was not complete.
> In this step I started node0 with the command "systemctl start postgresql-9.6".
> I had not recovered node0 beforehand, and I had not re-established
> replication between the two nodes. I know that in a production environment
> replication between both nodes is necessary, but I skipped it because I
> wanted to check that pcp_attach_node would fail and node0 would stay in
> status 3, since there is no replication between the nodes.

Sorry for my misunderstanding.

Now I understand your question.

> > > > > Then, Postgresql service is started in node 0.
> > > > > node 0 is standby, with status 3
> > > > > node 1 is primary, with status 2
> > > > >
> > > > > Then, I attach node 0 using pcp_attach_node command.
> > > > > node 0 is primary, with status 2.
> > > > > node 1 is standby, with status 2.
> > > > > Node 0 was changed to primary and node 1 was changed to standby. Why?

When you attach node0 to Pgpool-II, the status of node0 changes from 3 (down) to 2 (up).

The reason why node0 became primary and node1 became standby is that
Pgpool-II chooses the attached (not down) node with the smallest id as
primary. As your log lines "find_primary_node: checking backend no 0" and
"find_primary_node: primary node id is 0" show, Pgpool-II checks the
backends in ascending node id order and picks the first one that reports
it is not in recovery. In your case there is no replication, so both nodes
run as standalone primaries; because "id 0 < id 1", Pgpool-II chose node0
as primary and set the other node as standby. After that, Pgpool-II sends
all write queries to the primary node0.

This is by design: if more than one attached node is running as a primary,
Pgpool-II will choose the node with the smallest id as the primary.
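You can verify this yourself with something like the following (just a
sketch; the pgpool port, user, and the <...> host names are examples you
should adjust to your environment):

    # status (and, in recent versions, role) of each backend as seen by Pgpool-II
    psql -h <pgpool_host> -p 9999 -U postgres -c "SHOW pool_nodes;"

    # ask each backend directly; this is the same check Pgpool-II uses to
    # find the primary. 'f' on both nodes means both run as primaries.
    psql -h <node0_host> -p 5432 -U postgres -c "SELECT pg_is_in_recovery();"
    psql -h <node1_host> -p 5432 -U postgres -c "SELECT pg_is_in_recovery();"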


On Tue, 24 Oct 2017 21:05:54 +0200
Lucas Luengas <lucasluengas at gmail.com> wrote:

> Hello Bo
> 
> On Tue, Oct 24, 2017 at 2:59 AM, Bo Peng <pengbo at sraoss.co.jp> wrote:
> 
> > Hello,
> >
> > > > > Then, Postgresql service is started in node 0.
> > > > > node 0 is standby, with status 3
> > > > > node 1 is primary, with status 2
> >
> > How did you start node0 in this step?
> > I think you missed the "recovery" step.
> >
> >
> Excuse me, I think my explanation was not complete.
> In this step I started node0 with the command "systemctl start postgresql-9.6".
> I had not recovered node0 beforehand, and I had not re-established
> replication between the two nodes. I know that in a production environment
> replication between both nodes is necessary, but I skipped it because I
> wanted to check that pcp_attach_node would fail and node0 would stay in
> status 3, since there is no replication between the nodes.
> 
> >
> > > I have checked these steps but attaching node1 (after failover without
> > > recovery) instead of node0, and I can't reproduce this situation with
> > > node1. Do you know if this behaviour is by design of Pgpool? Why is it
> > > necessary to use pcp_recovery_node instead of pcp_attach_node?
> >
> > If you recover the node as a standby and then attach it to pgpool,
> > "pcp_recovery_node" is not necessary.
> >
> >
> > Let's confirm the scenario of performing a failover and then recovering
> > the downed backend node as a standby.
> >
> > 1. Start node0 and node1
> >
> >    node0 : primary
> >    node1 : standby
> >
> > 2. Stop node0, and failover occurs
> >
> >    node0 : down
> >    node1 : primary  <= failover
> >
> > 3. Recover node0 as standby
> >
> >    node0 : standby
> >    node1 : primary
> >
> >    There are two ways to recover the downed node.
> >
> >     (1) Recover node0 as a standby by using "pcp_recovery_node".
> >
> >         "pcp_recovery_node" will recover the downed node and attach it to
> >         pgpool. But to use the command, you need to configure the
> >         'recovery_1st_stage_command' parameter.
> >
> >         Please see the following document for more details about
> >         configuring Pgpool-II online recovery.
> >
> >         http://www.pgpool.net/docs/latest/en/html/example-cluster.html
> >
> >     (2) Recover node0 as a standby by using a command such as
> >         "pg_basebackup", then attach the node to pgpool. Because pgpool
> >         has already detached the node, you need to attach the node to
> >         pgpool again, to let pgpool know about the node. Without
> >         attaching the node, its status will stay "down", even if it is
> >         running as a standby.
> >
> >    If you just start the downed PostgreSQL node by using "pg_ctl start"
> >    without recovery, the node will be started as a primary.
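> >    For example (a sketch only; the pcp port, user, password, replication
> >    user and data directory below are examples, adjust them to your setup;
> >    this uses the positional pcp syntax of Pgpool-II 3.4, which changed to
> >    option style in 3.5):
> >
> >        # way (1): recover and attach node0 in one step
> >        pcp_recovery_node 10 localhost 9898 postgres postgres 0
> >
> >        # way (2): take a base backup from the current primary (node1)
> >        # into an emptied data directory, create recovery.conf, start
> >        # PostgreSQL, then attach node0
> >        pg_basebackup -h <node1_host> -p 5432 -U repl -D /var/lib/pgsql/9.6/data -X stream
> >        systemctl start postgresql-9.6
> >        pcp_attach_node 10 localhost 9898 postgres postgres 0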
> >
> >
> Hello Bo.
> Thank you for your explanation.
> I use way number 2, with a shell script that runs the PostgreSQL
> pg_basebackup command.
> For this test, I did not use any recovery method; I only started the
> PostgreSQL database service and ran pcp_attach_node on node0. I know this
> is incorrect. I only wanted to check that if I run pcp_attach_node on a
> node while there is no replication between the two nodes, pgpool will
> fail during the attach process. I don't understand why pgpool made node0
> primary and node1 standby, when previously node0 was standby, node1 was
> primary, and there was no replication between the nodes.
> Excuse me, I think my first email was not clear enough.
> 
> 
> 
> >
> > On Mon, 23 Oct 2017 22:13:44 +0200
> > Lucas Luengas <lucasluengas at gmail.com> wrote:
> >
> > > Hello Bo.
> > > Thank you for your answer.
> > >
> > > I have checked these steps but attaching node1 (after failover without
> > > recovery) instead of node0, and I can't reproduce this situation with
> > > node1. Do you know if this behaviour is by design of Pgpool? Why is it
> > > necessary to use pcp_recovery_node instead of pcp_attach_node?
> > >
> > > Kind regards.
> > >
> > > On Sat, Oct 21, 2017 at 1:30 AM, Bo Peng <pengbo at sraoss.co.jp> wrote:
> > >
> > > > Hello,
> > > >
> > > > If you want to start node0 (the old primary) as a standby,
> > > > you should use pcp_recovery_node to recover node0 as a standby.
> > > >
> > > > If you just restart node0 after failover without recovery,
> > > > it will run as a primary.
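> > > >
> > > > This is because, in PostgreSQL 9.6, whether a node starts as a
> > > > primary or a standby is decided by the presence of a recovery.conf
> > > > file in its data directory. A minimal sketch (the connection values
> > > > are examples only):
> > > >
> > > >     # $PGDATA/recovery.conf on node0
> > > >     standby_mode = 'on'
> > > >     primary_conninfo = 'host=<node1_host> port=5432 user=repl'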
> > > >
> > > > On Fri, 20 Oct 2017 18:41:36 +0200
> > > > Lucas Luengas <lucasluengas at gmail.com> wrote:
> > > >
> > > > > Hello
> > > > > I am testing Pgpool 3.4.13 with Postgresql-9.6, with streaming
> > > > > replication and watchdog, on Centos 7. I have two servers. Each
> > > > > server has Pgpool and Postgresql installed. I have installed pgpool
> > > > > from the yum repository.
> > > > >
> > > > > Node 0 is primary, with status 2
> > > > > Node 1 is standby, with status 2
> > > > >
> > > > > If Postgresql service is stopped in node 0, then:
> > > > > node 0 is standby, with status 3
> > > > > node 1 is primary, with status 2. (failover)
> > > > >
> > > > > Then, Postgresql service is started in node 0.
> > > > > node 0 is standby, with status 3
> > > > > node 1 is primary, with status 2
> > > > >
> > > > > Then, I attach node 0 using pcp_attach_node command.
> > > > > node 0 is primary, with status 2.
> > > > > node 1 is standby, with status 2.
> > > > > Node 0 was changed to primary and node 1 was changed to standby.
> > > > > Why? Do I have any error in my setup?
> > > > > I think the correct result should be:
> > > > > node 0 is standby, with status 2
> > > > > node 1 is primary, with status 2
> > > > >
> > > > > I have repeated the previous steps with pgpool 3.4.12, 3.4.11,
> > > > > 3.4.10 and 3.4.9, with the same configuration and the same servers.
> > > > > I get the same results.
> > > > > Also, I have repeated the steps with pgpool 3.6.6 and I get the
> > > > > same results.
> > > > >
> > > > > Some log lines during failback:
> > > > >
> > > > > Oct 20 13:41:03 localhost pgpool[9687]: [128-1] 2017-10-20 13:41:03: pid 9687: LOG:  received failback request for node_id: 0 from pid [9687]
> > > > > Oct 20 13:41:03 localhost pgpool[4913]: [255-1] 2017-10-20 13:41:03: pid 4913: LOG:  watchdog notifying to start interlocking
> > > > > Oct 20 13:41:03 localhost pgpool[4913]: [256-1] 2017-10-20 13:41:03: pid 4913: LOG:  starting fail back. reconnect host 192.168.0.136(5432)
> > > > > Oct 20 13:41:03 localhost pgpool[4913]: [257-1] 2017-10-20 13:41:03: pid 4913: LOG:  Node 1 is not down (status: 2)
> > > > > Oct 20 13:41:04 localhost pgpool[4913]: [258-1] 2017-10-20 13:41:04: pid 4913: LOG:  Do not restart children because we are failbacking node id 0 host: 192.168.0.136 port: 5432 and we are in streaming replication mode and not all backends were down
> > > > > Oct 20 13:41:04 localhost pgpool[4913]: [259-1] 2017-10-20 13:41:04: pid 4913: LOG:  find_primary_node_repeatedly: waiting for finding a primary node
> > > > > Oct 20 13:41:04 localhost pgpool[4913]: [260-1] 2017-10-20 13:41:04: pid 4913: LOG:  find_primary_node: checking backend no 0
> > > > > Oct 20 13:41:04 localhost pgpool[4913]: [260-2]
> > > > > Oct 20 13:41:04 localhost pgpool[4913]: [261-1] 2017-10-20 13:41:04: pid 4913: LOG:  find_primary_node: primary node id is 0
> > > > > Oct 20 13:41:04 localhost pgpool[4913]: [262-1] 2017-10-20 13:41:04: pid 4913: LOG:  watchdog notifying to end interlocking
> > > > > Oct 20 13:41:04 localhost pgpool[4913]: [263-1] 2017-10-20 13:41:04: pid 4913: LOG:  failover: set new primary node: 0
> > > > > Oct 20 13:41:04 localhost pgpool[4913]: [264-1] 2017-10-20 13:41:04: pid 4913: LOG:  failover: set new master node: 0
> > > > > Oct 20 13:41:04 localhost pgpool[4913]: [265-1] 2017-10-20 13:41:04: pid 4913: LOG:  failback done. reconnect host 192.168.0.136(5432)
> > > > > Oct 20 13:41:04 localhost pgpool[9688]: [194-1] 2017-10-20 13:41:04: pid 9688: LOG:  worker process received restart request
> > > > > Oct 20 13:41:05 localhost pgpool[9687]: [129-1] 2017-10-20 13:41:05: pid 9687: LOG:  restart request received in pcp child process
> > > > > Oct 20 13:41:05 localhost pgpool[4913]: [266-1] 2017-10-20 13:41:05: pid 4913: LOG:  PCP child 9687 exits with status 256 in failover()
> > > > > Oct 20 13:41:05 localhost pgpool[4913]: [267-1] 2017-10-20 13:41:05: pid 4913: LOG:  fork a new PCP child pid 10410 in failover()
> > > > > Oct 20 13:41:05 localhost pgpool[4913]: [268-1] 2017-10-20 13:41:05: pid 4913: LOG:  worker child process with pid: 9688 exits with status 256
> > > > > Oct 20 13:41:05 localhost pgpool[4913]: [269-1] 2017-10-20 13:41:05: pid 4913: LOG:  fork a new worker child process with pid: 10411
> > > > > Oct 20 13:41:10 localhost pgpool[9692]: [202-1] 2017-10-20 13:41:10: pid 9692: LOG:  selecting backend connection
> > > > > Oct 20 13:41:10 localhost pgpool[9692]: [202-2] 2017-10-20 13:41:10: pid 9692: DETAIL:  failback event detected, discarding existing connections
> > > > >
> > > > > Kind regards
> > > >
> > > >
> > > > --
> > > > Bo Peng <pengbo at sraoss.co.jp>
> > > > SRA OSS, Inc. Japan
> > > >
> > > >
> >
> >
> > --
> > Bo Peng <pengbo at sraoss.co.jp>
> > SRA OSS, Inc. Japan
> >
> >


-- 
Bo Peng <pengbo at sraoss.co.jp>
SRA OSS, Inc. Japan


