[pgpool-general: 9029] Re: Failure to perform failover when primary lost network connectivity

Bo Peng pengbo at sraoss.co.jp
Tue Feb 27 09:30:35 JST 2024


Hi,

Pgpool can handle failover even in case of network failure.

I think this issue may be caused by the setting of follow primary script.

Could you share the following information?

- pgpool.conf
- the scripts specified in failover_command and follow_primary_command
- pgpool logs

On Mon, 26 Feb 2024 10:57:56 +0800
Ang Sei Heng <seiheng at sendquick.com> wrote:

> Hello,
> 
> We recently embark into Pgpool-II as  failover and HA solution for our 
> PostgreSQL database cluster. We use the example here:
> 
> https://www.pgpool.net/docs/latest/en/html/example-configs.html
> 
> The cluster is running smoothly and we are able to perform 
> pcp_recovery_node, pcp_attach_node, pcp_detach_node and pcp_promote_node 
> without issue.
> 
> However, when we perform a test scenario where we need to simulate a 
> lost network connectivity at the primary postgres database node, both 
> standby systems went into shutdown instead of taking over the role of 
> primary.
> 
>  From the logs, we found these:
> 
> 2024-02-26 09:56:55.009 +08:  pid 20700: LOG:  database system was 
> interrupted; last known up at 2024-02-26 09:56:52 +08
> Warning: Permanently added 'pgnode2' (ED25519) to the list of known hosts.
> scp: /opt/postgres/pgpool/data/archivedir/00000039.history: No such file 
> or directory
> 2024-02-26 09:56:55.445 +08:  pid 20700: LOG:  entering standby mode
> Warning: Permanently added 'pgnode2' (ED25519) to the list of known hosts.
> 2024-02-26 09:56:55.573 +08:  pid 20700: LOG:  restored log file 
> "00000038.history" from archive
> Warning: Permanently added 'pgnode2' (ED25519) to the list of known hosts.
> scp: /opt/postgres/pgpool/data/archivedir/0000003800000000000000F2: No 
> such file or directory
> 2024-02-26 09:56:55.759 +08:  pid 20700: LOG:  redo starts at 0/F2000028
> Warning: Permanently added 'pgnode2' (ED25519) to the list of known hosts.
> scp: /opt/postgres/pgpool/data/archivedir/0000003800000000000000F3: No 
> such file or directory
> 2024-02-26 09:56:55.912 +08:  pid 20700: LOG:  consistent recovery state 
> reached at 0/F2000138
> 2024-02-26 09:56:55.912 +08:  pid 20696: LOG:  database system is ready 
> to accept read-only connections
> Warning: Permanently added 'pgnode2' (ED25519) to the list of known hosts.
> scp: /opt/postgres/pgpool/data/archivedir/0000003800000000000000F3: No 
> such file or directory
> 2024-02-26 09:56:56.165 +08:  pid 20720: LOG:  started streaming WAL 
> from primary at 0/F3000000 on timeline 56
> 2024-02-26 09:58:48.097 +08:  pid 20720: FATAL:  terminating walreceiver 
> due to timeout
> ssh: connect to host pgnode2 port 22: No route to host
> scp: Connection closed
> 2024-02-26 09:58:50.684 +08:  pid 20700: FATAL:  could not restore file 
> "00000039.history" from archive: child process exited with exit code 255
> 2024-02-26 09:58:50.685 +08:  pid 20696: LOG:  startup process (PID 
> 20700) exited with exit code 1
> 2024-02-26 09:58:50.685 +08:  pid 20696: LOG:  terminating any other 
> active server processes
> 2024-02-26 09:58:50.685 +08:  pid 20696: LOG:  shutting down due to 
> startup process failure
> 2024-02-26 09:58:50.687 +08:  pid 20696: LOG:  database system is shut down
> 
> Here, the pgnode2 was the primary database node, when we disconnected 
> its network connectivity, both standby nodes showed the logs above. It 
> seems, when the system was trying to perform failover, it tried to copy 
> the some files from primary node, pgnode2, which no longer accessible. 
> This cause the failover to fail and resulting the system not able to start.
> 
> We are not able to run pg_ctl start in both standby nodes.
> 
> Our questions here:
> 
> 1) Does this means, in network failure, Pgpool-II will not be able to 
> handle it?
> 
> 2) Is it possible to disable recovery process of copying files from dead 
> primary node in the event of network failure (for example, a sudden 
> power failure)?
> 
> 
> Any advice is welcome.
> 
> 
> 
> Thanks,
> Sei Heng
> 
> 
> -- 
> 
> SendQuick Pte Ltd
> 76 Playfair Road, #08-01,
> LHK 2 Building, Singapore 367996
> Tel: +65 6280 2881
> Fax: +65 6280 6882
> 
> CONFIDENTIALITY NOTE:
> This email and any files transmitted with it is intended only for the use
> of the person(s) to whom it is addressed,and may contain information that
> is privileged, confidential and exempt from disclosure under applicable
> law. If you are not the intended recipient, please immediately notify the
> sender and delete the email. If you are not the intended recipient please
> do not disclose, copy, distribute or take any action in reliance on the
> contents of this e-mail. Thank you.
> 
> _______________________________________________
> pgpool-general mailing list
> pgpool-general at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-general
> 
> _______________________________________________
> pgpool-general mailing list
> pgpool-general at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-general


-- 
Bo Peng <pengbo at sraoss.co.jp>
SRA OSS LLC
TEL: 03-5979-2701 FAX: 03-5979-2702
URL: https://www.sraoss.co.jp/


More information about the pgpool-general mailing list