[pgpool-general: 5783] Re: After failback, standby goes to primary, and primary goes to standy

Sat Oct 21 08:30:32 JST 2017

Hello,

If you want node0 (the old primary) to rejoin as a standby,
you should use pcp_recovery_node to recover node0 as a standby.
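
As a rough sketch of that step, the script below builds the
pcp_recovery_node command line and prints it. It only runs the command
when RUN=1 is set. The host, port and user values are assumptions for
illustration, not values taken from your setup. The option-style syntax
shown is the one used by Pgpool-II 3.5 and later. As far as I know,
3.4 and earlier take positional arguments instead (timeout, host, port,
user, password, node ID), so please check the manual for your version.
Online recovery also needs recovery_1st_stage_command and the related
recovery settings in pgpool.conf.

```shell
#!/bin/sh
# Hypothetical connection settings; adjust them for your installation.
PCP_HOST=localhost   # host where the pgpool PCP listener runs (assumption)
PCP_PORT=9898        # default PCP port
PCP_USER=postgres    # a user registered in pcp.conf (assumption)
NODE_ID=0            # the old primary that should come back as a standby

# Build the command line (option-style syntax, Pgpool-II 3.5 and later).
recovery_cmd() {
    printf 'pcp_recovery_node -h %s -p %s -U %s -n %s\n' \
        "$PCP_HOST" "$PCP_PORT" "$PCP_USER" "$NODE_ID"
}

# Print the command by default; execute it only when RUN=1 is set.
if [ "${RUN:-0}" = 1 ]; then
    eval "$(recovery_cmd)"
else
    recovery_cmd
fi
```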

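If you just restart node0 after failover without recovery,
it will run as a primary, and Pgpool-II will treat it as the primary
once it is attached (your log shows find_primary_node selecting node 0).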
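
One way to check this before running pcp_attach_node is to ask the
restarted node whether it is in recovery. The helper below is only a
sketch: classify_role is a hypothetical name, and the user in the usage
comment is an assumption (the host and port are the ones in your log).
PostgreSQL's pg_is_in_recovery() returns "t" on a standby and "f" on a
primary.

```shell
#!/bin/sh
# Map the output of "SELECT pg_is_in_recovery()" to a replication role.
# "t" means the server is in recovery (standby); "f" means it is a primary.
classify_role() {
    case "$1" in
        t) echo standby ;;
        f) echo primary ;;
        *) echo unknown ;;
    esac
}

# Usage on a live system (the user name is an assumption):
#   classify_role "$(psql -h 192.168.0.136 -p 5432 -U postgres -Atc 'SELECT pg_is_in_recovery()')"
# Attach node 0 only if this reports "standby".
```

If it reports "primary", the node was restarted without recovery and
attaching it gives the result you saw.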

On Fri, 20 Oct 2017 18:41:36 +0200
Lucas Luengas <lucasluengas at gmail.com> wrote:

> Hello
> I am testing Pgpool 3.4.13 with PostgreSQL 9.6, with streaming replication
> and watchdog, on CentOS 7. I have two servers. Each server has Pgpool and
> PostgreSQL installed. I installed Pgpool from the yum repository.
> 
> Node 0 is primary, with status 2
> Node 1 is standby, with status 2
> 
> If the PostgreSQL service is stopped on node 0, then:
> node 0 is standby, with status 3
> node 1 is primary, with status 2. (failover)
> 
> Then, the PostgreSQL service is started on node 0.
> node 0 is standby, with status 3
> node 1 is primary, with status 2
> 
> Then, I attach node 0 using pcp_attach_node command.
> node 0 is primary, with status 2.
> node 1 is standby, with status 2.
> Node 0 was changed to primary and node 1 was changed to standby. Why? Is
> there an error in my setup?
> I think the correct result should be:
> node 0 is standby, with status 2
> node 1 is primary, with status 2
> 
> I have repeated the previous steps with pgpool 3.4.12, 3.4.11, 3.4.10 and
> 3.4.9 with the same configuration and the same servers. I get the same results.
> I have also repeated the steps with pgpool 3.6.6 and I get the same results.
> 
> Some log lines during failback:
> 
> Oct 20 13:41:03 localhost pgpool[9687]: [128-1] 2017-10-20 13:41:03: pid
> 9687: LOG:  received failback request for node_id: 0 from pid [9687]
> Oct 20 13:41:03 localhost pgpool[4913]: [255-1] 2017-10-20 13:41:03: pid
> 4913: LOG:  watchdog notifying to start interlocking
> Oct 20 13:41:03 localhost pgpool[4913]: [256-1] 2017-10-20 13:41:03: pid
> 4913: LOG:  starting fail back. reconnect host 192.168.0.136(5432)
> Oct 20 13:41:03 localhost pgpool[4913]: [257-1] 2017-10-20 13:41:03: pid
> 4913: LOG:  Node 1 is not down (status: 2)
> Oct 20 13:41:04 localhost pgpool[4913]: [258-1] 2017-10-20 13:41:04: pid
> 4913: LOG:  Do not restart children because we are failbacking node id 0
> host: 192.168.0.136 port: 5432 and we are in streaming replication mode and
> not all backends were down
> Oct 20 13:41:04 localhost pgpool[4913]: [259-1] 2017-10-20 13:41:04: pid
> 4913: LOG:  find_primary_node_repeatedly: waiting for finding a primary node
> Oct 20 13:41:04 localhost pgpool[4913]: [260-1] 2017-10-20 13:41:04: pid
> 4913: LOG:  find_primary_node: checking backend no 0
> Oct 20 13:41:04 localhost pgpool[4913]: [260-2]
> Oct 20 13:41:04 localhost pgpool[4913]: [261-1] 2017-10-20 13:41:04: pid
> 4913: LOG:  find_primary_node: primary node id is 0
> Oct 20 13:41:04 localhost pgpool[4913]: [262-1] 2017-10-20 13:41:04: pid
> 4913: LOG:  watchdog notifying to end interlocking
> Oct 20 13:41:04 localhost pgpool[4913]: [263-1] 2017-10-20 13:41:04: pid
> 4913: LOG:  failover: set new primary node: 0
> Oct 20 13:41:04 localhost pgpool[4913]: [264-1] 2017-10-20 13:41:04: pid
> 4913: LOG:  failover: set new master node: 0
> Oct 20 13:41:04 localhost pgpool[4913]: [265-1] 2017-10-20 13:41:04: pid
> 4913: LOG:  failback done. reconnect host 192.168.0.136(5432)
> Oct 20 13:41:04 localhost pgpool[9688]: [194-1] 2017-10-20 13:41:04: pid
> 9688: LOG:  worker process received restart request
> Oct 20 13:41:05 localhost pgpool[9687]: [129-1] 2017-10-20 13:41:05: pid
> 9687: LOG:  restart request received in pcp child process
> Oct 20 13:41:05 localhost pgpool[4913]: [266-1] 2017-10-20 13:41:05: pid
> 4913: LOG:  PCP child 9687 exits with status 256 in failover()
> Oct 20 13:41:05 localhost pgpool[4913]: [267-1] 2017-10-20 13:41:05: pid
> 4913: LOG:  fork a new PCP child pid 10410 in failover()
> Oct 20 13:41:05 localhost pgpool[4913]: [268-1] 2017-10-20 13:41:05: pid
> 4913: LOG:  worker child process with pid: 9688 exits with status 256
> Oct 20 13:41:05 localhost pgpool[4913]: [269-1] 2017-10-20 13:41:05: pid
> 4913: LOG:  fork a new worker child process with pid: 10411
> Oct 20 13:41:10 localhost pgpool[9692]: [202-1] 2017-10-20 13:41:10: pid
> 9692: LOG:  selecting backend connection
> Oct 20 13:41:10 localhost pgpool[9692]: [202-2] 2017-10-20 13:41:10: pid
> 9692: DETAIL:  failback event detected, discarding existing connections
> 
> Kind regards

-- 
Bo Peng <pengbo at sraoss.co.jp>
SRA OSS, Inc. Japan