[pgpool-hackers: 4392] Re: Proposal: promoting standby which is most up to date to become the new Primary

Wed Aug 30 12:11:05 JST 2023

Ishii-san,

Thank you for your comments.

> (a) Track replication delay and keep it.
> 
> At the time when failover script gets called, apparently we cannot
> collect replication delay information since the existing primary
> already went down. Probably we have to save the last replication delay
> information collected by streaming replication check process to
> somewhere (probably to shared memory).

SR worker process checkes periodically the standby delay
and save it as "BACKEND_INFO(i).standby_delay".
I think we can use it, isn't it?

On Tue, 29 Aug 2023 20:24:42 +0900 (JST)
Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> Hi,
> 
> > Hi,
> > 
> > Currently, pgpool promotes the standby node with minimal node id to become the new primary.
> 
> Actually that's not pgpool's decision, but the failover script's
> choice.
> 
> > If there is replication lag between primary and the standby node,
> > it may cause data loss after failover.
> 
> Yeah, that's not good.
> 
> > I propose to promote the standby, which is most up to date, to become the new Primary.
> > This feature is able to prevent data loss.
> > 
> > I am not sure which senario is better. Do you have any ideas?
> > 
> > 1. Change get_next_main_node() to return the standby which is most up to date
> 
> I suggest not to change get_next_main_node(). As I already said
> above, it's not pgpool but the failover script that promotes the
> main node. For the purpose (promoting the least delay standby node),
> separate information should be provided.
> 
> > 2. Add a new PCP command to return the standby which is most up to date.
> >    Modify the failover sample script to run the PCP command and set the result to NEW_MAIN_NODE_ID.
> 
> Let's step back. I think for the purpose we need to implement two
> infrastructure.
> 
> (a) Track replication delay and keep it.
> 
> At the time when failover script gets called, apparently we cannot
> collect replication delay information since the existing primary
> already went down. Probably we have to save the last replication delay
> information collected by streaming replication check process to
> somewhere (probably to shared memory).
> 
> (b) Pass the replication delay info to failover script.
> 
> Once the information is collected in (a), it needs be handed to the
> failover script so that the script can choose the appropriate standby
> node to promote. Inventing new PCP command is an idea. Another idea is
> adding new parameter to the failover script. For example, we could
> have new parameter: %l which is a string representing the least delay
> standby node information. Example:
> 
> 1 host1 5432 /usr/local/pgsql/data
> 
> which means:
> 
> node id = 1
> hostname = host1
> port number = 5432
> database cluster path = /usr/local/pgsql/data
> 
> What do you think?
> --
> Tatsuo Ishii
> SRA OSS LLC
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp

-- 
Bo Peng <pengbo at sraoss.co.jp>
SRA OSS LLC
TEL: 03-5979-2701 FAX: 03-5979-2702
URL: https://www.sraoss.co.jp/