[pgpool-general: 5786] Re: After failback, standby goes to primary, and primary goes to standby

Lucas Luengas lucasluengas at gmail.com
Wed Oct 25 04:05:54 JST 2017


Hello Bo

On Tue, Oct 24, 2017 at 2:59 AM, Bo Peng <pengbo at sraoss.co.jp> wrote:

> Hello,
>
> > > > Then, the PostgreSQL service is started on node 0.
> > > > node 0 is standby, with status 3
> > > > node 1 is primary, with status 2
>
> How did you start node0 in this step?
> I think you missed the "recovery" step.
>
>
Excuse me, I think my explanation was not complete.
In this step I started node0 with the command "systemctl start postgresql-9.6".
I had not recovered node0 beforehand, nor had I set up replication between
the two nodes. I know that in a production environment replication between
both nodes is necessary, but I skipped it here because I wanted to check
that pcp_attach_node would fail and node0 would stay in status 3, since
there is no replication between the nodes.
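
Concretely, the sequence I ran was roughly this (the PCP port, user and
password are examples from my setup):

    # on node0: start PostgreSQL without any prior recovery
    systemctl start postgresql-9.6

    # try to re-attach node 0 to pgpool
    # (pgpool 3.4 pcp syntax: timeout host port user password node_id)
    pcp_attach_node 10 localhost 9898 pgpool secret 0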

>
> > I have checked these steps but attaching node1 (after failover without
> > recovery) instead of node 0, and I can't reproduce this situation with
> > node 1. Do you know if this behaviour is by design of Pgpool? Why is it
> > necessary to use pcp_recovery_node instead of pcp_attach_node?
>
> If you recover the node as a standby and then attach it to pgpool,
> "pcp_recovery_node" is not necessary.
>
>
> Let's confirm the scenario of doing failover and recover downed backend
> node as a standby.
>
> 1. Start node0 and node1
>
>    node0 : primary
>    node1 : standby
>
> 2. Stop node0, and failover occurs
>
>    node0 : down
>    node1 : primary  <= failover
>
> 3. Recover node0 as standby
>
>    node0 : standby
>    node1 : primary
>
>    There are two ways to recover the downed node.
>
>     (1) Recover node0 as standby by using "pcp_recovery_node".
>
>         "pcp_recovery_node" will recover the downed node and attach it to
> pgpool.
>         But to use the commad,you need configure
> 'recovery_1st_stage_command' parameter.
>
>         Please see the following document for more details about
>         configuring Pgpool-II online recovery.
>
>         http://www.pgpool.net/docs/latest/en/html/example-cluster.html
>
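>         For example, a minimal sketch of this way (the PCP port, user,
>         password and script name below are placeholders, not a tested
>         configuration):
>
>             # pgpool.conf
>             recovery_user = 'postgres'
>             recovery_password = 'postgres'
>             recovery_1st_stage_command = 'recovery_1st_stage'
>
>             # the recovery_1st_stage script must be placed in the
>             # database cluster directory ($PGDATA) of the primary node
>
>             # recover and re-attach node 0 in one step
>             # (pgpool 3.4 pcp syntax: timeout host port user password node_id)
>             pcp_recovery_node 10 localhost 9898 pgpool secret 0
>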
>     (2) Recover node0 as a standby by using a command such as
>         "pg_basebackup", then attach the node to pgpool. Because
>         pgpool has already detached the node, you need to attach it
>         again to let pgpool know about it. Without attaching the
>         node, its status will stay "down", even if it is running as
>         a standby.
>
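>         A rough sketch of this way (hostnames, paths and the
>         replication user below are examples only):
>
>             # on node0: take a fresh base backup from the new primary
>             rm -rf /var/lib/pgsql/9.6/data/*
>             pg_basebackup -h node1 -U repl_user -D /var/lib/pgsql/9.6/data -X stream
>
>             # create recovery.conf so node0 starts as a standby, then
>             # start PostgreSQL and attach the node to pgpool again
>             systemctl start postgresql-9.6
>             pcp_attach_node 10 localhost 9898 pgpool secret 0
>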
>    If you just start the downed PostgreSQL node by using "pg_ctl start"
>    without recovery, the node will start as a primary.
>
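>    As far as I know, in streaming replication mode pgpool decides which
>    backend is the primary by asking each attached backend whether it is
>    in recovery, starting from node 0. You can run the same check by hand:
>
>        psql -h 192.168.0.136 -p 5432 -U postgres -c "SELECT pg_is_in_recovery();"
>        # returns 'f' on a primary, 't' on a standby
>
>    A node started without recovery, i.e. without recovery.conf, answers
>    'f', so it can be picked up as the new primary.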
>
Hello Bo.
Thank you for your explanation.
I use way number 2, with a shell script that runs the PostgreSQL
pg_basebackup command.
For this test, I did not use either recovery way; I only started the
PostgreSQL database service and ran pcp_attach_node on node0. I know that
is incorrect. I only wanted to check that if I run pcp_attach_node on a
node while there is no replication between the two nodes, pgpool fails
during the attach process. I don't understand why pgpool makes node0
primary and node1 standby when node0 was previously the standby, node1
was the primary, and there is no replication between the nodes.
Excuse me, I think my first email was not clear enough.
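
For completeness, when I do recover node0 properly, my script writes a
recovery.conf on node0 before starting it, roughly like this (host and
user are examples):

    # recovery.conf on node0 (PostgreSQL 9.6)
    standby_mode = 'on'
    primary_conninfo = 'host=node1 port=5432 user=repl_user'

Without that file, PostgreSQL 9.6 starts read/write, that is, as a primary.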



>
> On Mon, 23 Oct 2017 22:13:44 +0200
> Lucas Luengas <lucasluengas at gmail.com> wrote:
>
> > Hello Bo.
> > Thank you for your answer.
> >
> > I have checked these steps but attaching node1 (after failover without
> > recovery) instead of node 0, and I can't reproduce this situation with
> > node 1. Do you know if this behaviour is by design of Pgpool? Why is it
> > necessary to use pcp_recovery_node instead of pcp_attach_node?
> >
> > Kind regards.
> >
> > On Sat, Oct 21, 2017 at 1:30 AM, Bo Peng <pengbo at sraoss.co.jp> wrote:
> >
> > > Hello,
> > >
> > > If you want to start node0 (old primary) as a standby,
> > > you should use pcp_recovery_node to recover node0 as a standby.
> > >
> > > If you just restart node0 after failover without recovery,
> > > it will run as primary.
> > >
> > > On Fri, 20 Oct 2017 18:41:36 +0200
> > > Lucas Luengas <lucasluengas at gmail.com> wrote:
> > >
> > > > Hello
> > > > I am testing Pgpool 3.4.13 with PostgreSQL 9.6, with streaming
> > > > replication and watchdog, on CentOS 7. I have two servers. Each
> > > > server has Pgpool and PostgreSQL installed. I installed pgpool
> > > > from the yum repository.
> > > >
> > > > Node 0 is primary, with status 2
> > > > Node 1 is standby, with status 2
> > > >
> > > > If the PostgreSQL service is stopped on node 0, then:
> > > > node 0 is standby, with status 3
> > > > node 1 is primary, with status 2. (failover)
> > > >
> > > > Then, the PostgreSQL service is started on node 0.
> > > > node 0 is standby, with status 3
> > > > node 1 is primary, with status 2
> > > >
> > > > Then, I attach node 0 using the pcp_attach_node command.
> > > > node 0 is primary, with status 2.
> > > > node 1 is standby, with status 2.
> > > > Node 0 was changed to primary and node 1 was changed to standby.
> > > > Why? Do I have an error in my setup?
> > > > I think the correct result should be:
> > > > node 0 is standby, with status 2
> > > > node 1 is primary, with status 2
> > > >
> > > > I have repeated the previous steps with pgpool 3.4.12, 3.4.11,
> > > > 3.4.10 and 3.4.9, with the same configuration and the same servers,
> > > > and I get the same results. I have also repeated the steps with
> > > > pgpool 3.6.6 and get the same results.
> > > >
> > > > Some log lines during the failback:
> > > >
> > > > Oct 20 13:41:03 localhost pgpool[9687]: [128-1] 2017-10-20 13:41:03: pid 9687: LOG:  received failback request for node_id: 0 from pid [9687]
> > > > Oct 20 13:41:03 localhost pgpool[4913]: [255-1] 2017-10-20 13:41:03: pid 4913: LOG:  watchdog notifying to start interlocking
> > > > Oct 20 13:41:03 localhost pgpool[4913]: [256-1] 2017-10-20 13:41:03: pid 4913: LOG:  starting fail back. reconnect host 192.168.0.136(5432)
> > > > Oct 20 13:41:03 localhost pgpool[4913]: [257-1] 2017-10-20 13:41:03: pid 4913: LOG:  Node 1 is not down (status: 2)
> > > > Oct 20 13:41:04 localhost pgpool[4913]: [258-1] 2017-10-20 13:41:04: pid 4913: LOG:  Do not restart children because we are failbacking node id 0 host: 192.168.0.136 port: 5432 and we are in streaming replication mode and not all backends were down
> > > > Oct 20 13:41:04 localhost pgpool[4913]: [259-1] 2017-10-20 13:41:04: pid 4913: LOG:  find_primary_node_repeatedly: waiting for finding a primary node
> > > > Oct 20 13:41:04 localhost pgpool[4913]: [260-1] 2017-10-20 13:41:04: pid 4913: LOG:  find_primary_node: checking backend no 0
> > > > Oct 20 13:41:04 localhost pgpool[4913]: [260-2]
> > > > Oct 20 13:41:04 localhost pgpool[4913]: [261-1] 2017-10-20 13:41:04: pid 4913: LOG:  find_primary_node: primary node id is 0
> > > > Oct 20 13:41:04 localhost pgpool[4913]: [262-1] 2017-10-20 13:41:04: pid 4913: LOG:  watchdog notifying to end interlocking
> > > > Oct 20 13:41:04 localhost pgpool[4913]: [263-1] 2017-10-20 13:41:04: pid 4913: LOG:  failover: set new primary node: 0
> > > > Oct 20 13:41:04 localhost pgpool[4913]: [264-1] 2017-10-20 13:41:04: pid 4913: LOG:  failover: set new master node: 0
> > > > Oct 20 13:41:04 localhost pgpool[4913]: [265-1] 2017-10-20 13:41:04: pid 4913: LOG:  failback done. reconnect host 192.168.0.136(5432)
> > > > Oct 20 13:41:04 localhost pgpool[9688]: [194-1] 2017-10-20 13:41:04: pid 9688: LOG:  worker process received restart request
> > > > Oct 20 13:41:05 localhost pgpool[9687]: [129-1] 2017-10-20 13:41:05: pid 9687: LOG:  restart request received in pcp child process
> > > > Oct 20 13:41:05 localhost pgpool[4913]: [266-1] 2017-10-20 13:41:05: pid 4913: LOG:  PCP child 9687 exits with status 256 in failover()
> > > > Oct 20 13:41:05 localhost pgpool[4913]: [267-1] 2017-10-20 13:41:05: pid 4913: LOG:  fork a new PCP child pid 10410 in failover()
> > > > Oct 20 13:41:05 localhost pgpool[4913]: [268-1] 2017-10-20 13:41:05: pid 4913: LOG:  worker child process with pid: 9688 exits with status 256
> > > > Oct 20 13:41:05 localhost pgpool[4913]: [269-1] 2017-10-20 13:41:05: pid 4913: LOG:  fork a new worker child process with pid: 10411
> > > > Oct 20 13:41:10 localhost pgpool[9692]: [202-1] 2017-10-20 13:41:10: pid 9692: LOG:  selecting backend connection
> > > > Oct 20 13:41:10 localhost pgpool[9692]: [202-2] 2017-10-20 13:41:10: pid 9692: DETAIL:  failback event detected, discarding existing connections
> > > >
> > > > Kind regards
> > >
> > >
> > > --
> > > Bo Peng <pengbo at sraoss.co.jp>
> > > SRA OSS, Inc. Japan
> > >
> > >
>
>
> --
> Bo Peng <pengbo at sraoss.co.jp>
> SRA OSS, Inc. Japan
>
>