[pgpool-general: 8247] Re: pcp_recovery_node command fails

Fri Jun 24 23:00:21 JST 2022

Looks like I have the same extension.
postgres=# \dx pgpool_recovery
                          List of installed extensions
      Name       | Version | Schema |                Description
-----------------+---------+--------+-------------------------------------------
 pgpool_recovery | 1.4     | public | recovery functions for pgpool-II for V4.3
(1 row)

postgres=#

I want to run the pcp_recovery_node command:
/usr/bin/pcp_recovery_node -d -U postgres -h 16.78.121.246 -p 9898 -n 0

AFAIK the first step (stage) in the pcp_recovery_node process is to run the following:
recovery_1st_stage_command = '/var/lib/pgsql/12/data/recovery_1st_stage'
then the pgpool_remote_start script is run.

When the pcp_recovery_node command is run, the recovery_1st_stage script receives the following list of arguments:
	PRIMARY_NODE_PGDATA=/var/lib/pgsql/12/data %R
	DEST_NODE_HOST=catvmdxcpg12a.ftc.hpeswlab.net %h
	DEST_NODE_PGDATA=/var/lib/pgsql/12/data %D
	PRIMARY_NODE_PORT=5432 %r
	DEST_NODE_ID=0 %d
	DEST_NODE_PORT=5432 %p
	PRIMARY_NODE_HOST=catvmdxcpg12b.ftc.hpeswlab.net %H

When the pgpool_remote_start script is run, it receives the following list of arguments:
	DEST_NODE_HOST=catvmdxcpg12a.ftc.hpeswlab.net %h
	DEST_NODE_PGDATA=/var/lib/pgsql/12/data %D
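
As a rough illustration of the argument order listed above, here is a minimal sketch of how a script could map the seven positional arguments to named variables. The function name map_recovery_args is hypothetical and is not part of pgpool-II; the variable names follow the trace output of the sample recovery_1st_stage script (MAIN_NODE_* rather than PRIMARY_NODE_*).

```shell
#!/bin/sh
# Hypothetical helper (not part of pgpool-II): print the name=value
# mapping for the seven positional arguments, in the order listed above.
map_recovery_args() {
    if [ "$#" -ne 7 ]; then
        echo "map_recovery_args: expected 7 arguments, got $#" >&2
        return 1
    fi
    echo "MAIN_NODE_PGDATA=$1"   # data directory on the main (primary) node
    echo "DEST_NODE_HOST=$2"     # host name of the node being recovered
    echo "DEST_NODE_PGDATA=$3"   # data directory on the node being recovered
    echo "MAIN_NODE_PORT=$4"     # PostgreSQL port on the main node
    echo "DEST_NODE_ID=$5"       # pgpool node id of the node being recovered
    echo "DEST_NODE_PORT=$6"     # PostgreSQL port on the node being recovered
    echo "MAIN_NODE_HOST=$7"     # host name of the main node
}
```

Calling it with the same values that are passed to the script by hand makes it easy to spot an argument that landed in the wrong position, for example a port value that does not match the one configured for the destination node.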

When I run /usr/bin/pcp_recovery_node, the following error is sent to stdout.
	-bash-4.2$ /usr/bin/pcp_recovery_node -U postgres -h 16.78.121.246 -p 9898 -n 0
	Password:
	ERROR:  executing recovery, execution of command failed at "1st stage"
	DETAIL:  command:"recovery_1st_stage"

However, if I run the two scripts manually with the arguments, the process works.

-bash-4.2$ $PGDATA/recovery_1st_stage /var/lib/pgsql/12/data catvmdxcpg12a.ftc.hpeswlab.net /var/lib/pgsql/12/data 5432 0 9898 catvmdxcpg12b.ftc.hpeswlab.net
+ MAIN_NODE_PGDATA=/var/lib/pgsql/12/data
+ DEST_NODE_HOST=catvmdxcpg12a.ftc.hpeswlab.net
+ DEST_NODE_PGDATA=/var/lib/pgsql/12/data
+ MAIN_NODE_PORT=5432
+ DEST_NODE_ID=0
+ DEST_NODE_PORT=9898
+ MAIN_NODE_HOST=catvmdxcpg12b.ftc.hpeswlab.net
+ PGHOME=/usr/pgsql-12
+ ARCHIVEDIR=/var/lib/pgsql/archivedir
+ REPLUSER=repl
+ MAX_DURATION=60
+ echo recovery_1st_stage: start: pg_basebackup for Standby node 0
recovery_1st_stage: start: pg_basebackup for Standby node 0
...
...
recovery_1st_stage: end: recovery_1st_stage is completed successfully
+ exit 0

Next, manually run pgpool_remote_start:
-bash-4.2$ $PGDATA/pgpool_remote_start catvmdxcpg12a.ftc.hpeswlab.net /var/lib/pgsql/12/data
+ DEST_NODE_HOST=catvmdxcpg12a.ftc.hpeswlab.net
+ DEST_NODE_PGDATA=/var/lib/pgsql/12/data
+ PGHOME=/usr/pgsql-12
+ echo pgpool_remote_start: start: remote start Standby node catvmdxcpg12a.ftc.hpeswlab.net
pgpool_remote_start: start: remote start Standby node catvmdxcpg12a.ftc.hpeswlab.net
+ ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@catvmdxcpg12a.ftc.hpeswlab.net -i /var/lib/pgsql/.ssh/id_rsa_pgpool ls /tmp
Warning: Permanently added 'catvmdxcpg12a.ftc.hpeswlab.net,16.78.126.184' (ECDSA) to the list of known hosts.
+ '[' 0 -ne 0 ']'
+ ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@catvmdxcpg12a.ftc.hpeswlab.net -i /var/lib/pgsql/.ssh/id_rsa_pgpool '
    /usr/pgsql-12/bin/pg_ctl -l /dev/null -w -D /var/lib/pgsql/12/data start
'
Warning: Permanently added 'catvmdxcpg12a.ftc.hpeswlab.net,16.78.126.184' (ECDSA) to the list of known hosts.
waiting for server to start.... done
server started
+ '[' 0 -ne 0 ']'
+ echo pgpool_remote_start: end: PostgreSQL on catvmdxcpg12a.ftc.hpeswlab.net is started successfully.
pgpool_remote_start: end: PostgreSQL on catvmdxcpg12a.ftc.hpeswlab.net is started successfully.
+ exit 0
-bash-4.2$

The postgres server did not start! (At least, not according to systemctl status postgresql-12;
running pg_ctl status shows that it is running.)
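
A likely reason for the mismatch is that pgpool_remote_start launches the server with pg_ctl over ssh, so systemd did not start the process and does not track it under the postgresql-12 unit. The two views can be compared side by side with a small sketch like the following. The function name compare_pg_status is hypothetical; it relies only on systemctl is-active and pg_ctl status.

```shell
#!/bin/sh
# Hypothetical helper: report whether systemd and pg_ctl agree about
# the state of a PostgreSQL server.
# Arguments: systemd unit name, path to pg_ctl, data directory.
compare_pg_status() {
    unit=$1
    pg_ctl=$2
    pgdata=$3

    # systemctl is-active exits 0 only when systemd itself considers
    # the unit active.
    if systemctl is-active --quiet "$unit"; then
        sd_state=active
    else
        sd_state=inactive
    fi

    # pg_ctl status exits 0 when a server is running in the data directory.
    if "$pg_ctl" status -D "$pgdata" >/dev/null 2>&1; then
        pg_state=running
    else
        pg_state=stopped
    fi

    echo "systemd=$sd_state pg_ctl=$pg_state"
}
```

With the paths from this setup, the call would be: compare_pg_status postgresql-12 /usr/pgsql-12/bin/pg_ctl /var/lib/pgsql/12/data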

Looking at the replication_delay for node 0 shows a value of 67109080.
All of the files in $PGDATA have a very recent time stamp indicating that pg_basebackup had run.

Regards,

Todd Stein

-----Original Message-----
From: Tatsuo Ishii <ishii at sraoss.co.jp> 
Sent: Thursday, June 23, 2022 7:46 PM
To: Todd Stein <todd.stein at microfocus.com>
Cc: jon.schewe at raytheon.com; pgpool-general at pgpool.net
Subject: Re: [pgpool-general: 8244] Re: pcp_recovery_node command fails

> Many responses recommended installing the pgpool_recovery extension; I had already done that as part of the install.  My install was done with RPMs.
> 
> ERROR:  extension "pgpool_recovery" already exists
> 2022-06-23 16:29:25.782 EDT [21981] STATEMENT:  CREATE EXTENSION pgpool_recovery;
> 
> The recovery_1st_stage script came from a sample provided with the RPM version.  The only thing I should need to do with it is to adjust the path of $PGHOME.

Apparently either the wrong version of the pgpool_recovery extension was installed, or the extension was not installed at all. You can check with the following command, using psql on the primary PostgreSQL:

test=# \dx pgpool_recovery
                          List of installed extensions
      Name       | Version | Schema |                Description
-----------------+---------+--------+-------------------------------------------
 pgpool_recovery | 1.4     | public | recovery functions for pgpool-II for V4.3
(1 row)

> Regards,
> 
> Todd Stein
> 
> -----Original Message-----
> From: Todd Stein
> Sent: Thursday, June 23, 2022 4:08 PM
> To: Jon SCHEWE <jon.schewe at raytheon.com>; pgpool-general at pgpool.net
> Subject: RE: pcp_recovery_node command fails
> 
> This is the stdout:
> ERROR:  executing recovery, execution of command failed at "1st stage"
> DETAIL:  command:"recovery_1st_stage"
> 
> The pgpool logs don't have much useful info.  Even when I set them to debug, it's not very helpful.
> 
> This seems to be a pretty common issue. Lots of people post about it, but I've not seen a resolution yet.
> 
> The postgres log is actually more useful:
> ERROR:  function pgpool_recovery(unknown, unknown, unknown, unknown, integer, unknown, unknown) does not exist at character 8
> 2022-06-23 16:03:53.740 EDT [25708] HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
> 2022-06-23 16:03:53.740 EDT [25708] STATEMENT:  SELECT pgpool_recovery('recovery_1st_stage', 'nodea', '/var/lib/pgsql/12/data', '5432', 0, '5432', 'nodeb')
> 
> 
> Regards,
> 
> Todd Stein
> 
> -----Original Message-----
> From: pgpool-general <pgpool-general-bounces at pgpool.net> On Behalf Of Jon SCHEWE
> Sent: Thursday, June 23, 2022 3:34 PM
> To: Todd Stein <todd.stein at microfocus.com>; pgpool-general at pgpool.net
> Subject: [pgpool-general: 8242] Re: pcp_recovery_node command fails
> 
>> I'm trying to use pcp_recovery_node for online recovery in a pgpool/postgresql-12 cluster.
>> 
>> My cluster has PostgreSQL 12.8 and pgpool 4.3.2 running on CentOS 7.9 Linux.
>> 
>> I've tried so many things, I'll not go into those details just yet.
>> 
>> To start with, here is the output of the pcp_recovery_node command:
>> 
>> pcp_recovery_node -U postgres -h <VIP> -p 9898 -n 0
>> Password:
>> ERROR:  executing recovery, execution of command failed at "1st stage"
>> DETAIL:  command:"recovery_1st_stage"
>> 
> 
> Do you see anything in your logs about the errors? Usually this is either on stdout from the service or in /var/log/pgpool...
> I'm guessing that your recovery_1st_stage script either isn't defined or isn't doing what you expect.
> _______________________________________________
> pgpool-general mailing list
> pgpool-general at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-general