[pgpool-general: 8646] Re: online recovery causes node to lose replication state.

Todd Stein todd.stein at microfocus.com
Thu Mar 16 20:36:11 JST 2023


Yes, I'll send them.  It may take a little while to clean up the hostnames and addresses.

-----Original Message-----
From: pgpool-general <pgpool-general-bounces at pgpool.net> On Behalf Of Tatsuo Ishii
Sent: Wednesday, March 15, 2023 11:02 PM
To: Todd Stein <todd.stein at microfocus.com>
Cc: pgpool-general at pgpool.net
Subject: [pgpool-general: 8644] Re: online recovery causes node to lose replication state.

To investigate the issue we need pgpool.conf, postgresql.conf, and the PostgreSQL logs. Can you share them? (Of course, you can hide passwords and other sensitive information.)

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp/


> Hi All,
> I'm running a 3-node cluster based on PostgreSQL 14 and pgpool 4.4.1.  I'm having an issue while testing failover scenarios.
> 
> At start of scenario:
> 
> node_id |           hostname           | port | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
> ---------+------------------------------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
> 0       | catvmtspg04a.domain.net | 5432 | up     | up        | 0.333333  | primary | primary | 0          | true              | 0                 |                   |                        | 2023-03-15 12:56:31
> 1       | catvmtspg04b.domain.net | 5432 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2023-03-15 12:56:38
> 2       | catvmtspg04c.domain.net | 5432 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2023-03-15 13:46:29
> (3 rows)
> 
> After stopping postgres on node 0:
> 
> node_id |           hostname           | port | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
> ---------+------------------------------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
> 0       | catvmtspg04a.domain.net | 5432 | down   | down      | 0.333333  | standby | unknown | 0          | false             | 0                 |                   |                        | 2023-03-15 12:34:05
> 1       | catvmtspg04b.domain.net | 5432 | up     | up        | 0.333333  | primary | primary | 0          | true              | 0                 |                   |                        | 2023-03-15 12:34:05
> 2       | catvmtspg04c.domain.net | 5432 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2023-03-15 12:34:09
> (3 rows)
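> 
> (For reference, the stop on node 0 was simply shutting postgres down; something along these lines, where the service name and data directory are guesses for a stock PostgreSQL 14 install rather than what this cluster actually uses:)
> 
> # on catvmtspg04a (node 0)
> sudo systemctl stop postgresql-14
> # or, as the postgres user:
> pg_ctl -D /var/lib/pgsql/14/data stop -m fast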
> 
> After running recovery on node 0:
> pcp_recovery_node -h 10.78.121.3 -p 9898 -U postgres -n 0
> node_id |           hostname           | port | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
> ---------+------------------------------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
> 0       | catvmtspg04a.domain.net | 5432 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2023-03-15 12:55:59
> 1       | catvmtspg04b.domain.net | 5432 | up     | up        | 0.333333  | primary | primary | 0          | true              | 0                 |                   |                        | 2023-03-15 12:34:05
> 2       | catvmtspg04c.domain.net | 5432 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2023-03-15 12:34:09
> (3 rows)
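> 
> (For completeness: pcp_recovery_node drives pgpool's online recovery, which depends on the recovery settings in pgpool.conf, the script named by recovery_1st_stage_command, and the pgpool_remote_start script, both installed in the primary's data directory. The values below are only illustrative, not my actual configuration:)
> 
> # pgpool.conf -- online recovery settings (illustrative values)
> recovery_user = 'postgres'
> recovery_password = ''
> recovery_1st_stage_command = 'recovery_1st_stage'
> recovery_timeout = 90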
> 
> After promoting node 0 (pcp_promote_node -s does a switchover, detaching the current primary, node 1) so that node 0 becomes primary again:
> pcp_promote_node -s -n 0 -U pgpool
> Note that I lose the replication_state and replication_sync_state on node 2; I would like to know why this happens (see the check sketched after the table below).
> node_id |           hostname           | port | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
> ---------+------------------------------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
> 0       | catvmtspg04a.domain.net | 5432 | up     | up        | 0.333333  | primary | primary | 0          | true              | 0                 |                   |                        | 2023-03-15 12:56:31
> 1       | catvmtspg04b.domain.net | 5432 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2023-03-15 12:56:38
> 2       | catvmtspg04c.domain.net | 5432 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0                 |                   |                        | 2023-03-15 12:56:38
> (3 rows)
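> 
> (As I understand it, pgpool fills replication_state and replication_sync_state from pg_stat_replication on the current primary, so a quick cross-check is to query node 0 directly, for example:)
> 
> psql -h catvmtspg04a.domain.net -p 5432 -U postgres -c "SELECT application_name, client_addr, state, sync_state FROM pg_stat_replication;"
> 
> (If node 2 is missing from that output, its walreceiver is presumably still attached to the old primary on node 1, which would explain the blank columns.)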
> 
> My process to get the cluster back to normal is below.  I don't understand why the recovery process fails at first...  eventually, it works...
> pcp_recovery_node -h 10.78.121.3 -p 9898 -U postgres -n 2
> 
> ERROR:  recovery is checking if postmaster is started
> DETAIL:  postmaster on hostname:"catvmtspg04c.domain.net" database:"template1" user:"postgres" failed to start in 0 second
> 
> [postgres at catvmtspg04a pgpool-II]$ pcp_recovery_node -h 10.78.121.3 -p 9898 -U postgres -n 2
> Password:
> ERROR:  recovery is checking if postmaster is started
> DETAIL:  postmaster on hostname:"catvmtspg04c.domain.net" database:"template1" user:"postgres" failed to start in 0 second
> 
> [postgres at catvmtspg04a pgpool-II]$ pcp_recovery_node -h 10.78.121.3 -p 9898 -U postgres -n 2
> Password:
> pcp_recovery_node -- Command Successful
> Cluster back to normal.
> [postgres at catvmtspg04a pgpool-II]$ psql -h 10.78.121.3 -p 9999 -U pgpool postgres -c "show pool_nodes"
> Password for user pgpool:
> node_id |           hostname           | port | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
> ---------+------------------------------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
> 0       | catvmtspg04a.domain.net | 5432 | up     | up        | 0.333333  | primary | primary | 0          | true              | 0                 |                   |                        | 2023-03-15 12:56:31
> 1       | catvmtspg04b.domain.net | 5432 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2023-03-15 12:56:38
> 2       | catvmtspg04c.domain.net | 5432 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2023-03-15 13:46:29
> (3 rows)
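> 
> (One thing I notice in the failed attempts above: the DETAIL says "failed to start in 0 second", and if I read that message right the number it reports is pgpool's recovery_timeout, so a value of 0 would mean pgpool gives up immediately instead of waiting for the remotely started postmaster to come up -- which would fit the "fails a couple of times, then succeeds" pattern. This is just a guess on my part; if it applies, the change in pgpool.conf would be something like:)
> 
> recovery_timeout = 90    # wait up to 90 seconds for the recovered node's postmaster to accept connections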
> 
> 
> Regards,
> 
> Todd Stein
> OpsBridge Technical Success
> OpenText
> (Cell) +1 (941) 248-8752
> tstein2 at opentext.com
> 
_______________________________________________
pgpool-general mailing list
pgpool-general at pgpool.net
http://www.pgpool.net/mailman/listinfo/pgpool-general

