[pgpool-general: 8693] Fwd: Online recovery - Command failed at 1st stage

Prasanna Venkadesh prasmailme at gmail.com
Thu Mar 30 03:48:04 JST 2023


Postgres version: 12.2
pgpool-II version: 4.1.4 (karasukihoshi)
OS: RHEL 7

I have two instances of Postgres and one instance of pgpool-II running. I
am following the tutorial here to set up HA for postgres.
https://www.pgpool.net/docs/41/en/html/example-cluster.html

I have created the required users (pgpool, repl), enabled passwordless SSH,
configured pgpool to use streaming replication and brought up the primary
postgres server. I am able to connect to pgpool via port 9999 and show
pool_nodes shows both postgres instances.

The primary's status is shown as up and the standby is shown as down as
expected. Following the next steps to setup a standby server using online
recovery mechanism fails with "command failed at 1st stage".

Please find the Postgres log from the primary instance below.
One thing I observed is that the PRIMARY_NODE_HOST value is empty and due
to which  pg_basebackup and psql commands are failing because -h has no
value.

Why would the value for PRIMARY_NODE_HOST go missing while all other
variables have values?

+ PRIMARY_NODE_PGDATA=/var/lib/pgsql/12/data
+ DEST_NODE_HOST=172.16.2.149 <callto:172.16.2.149>
+ DEST_NODE_PGDATA=/var/lib/pgsql/12/data
+ PRIMARY_NODE_PORT=5432
+ DEST_NODE_ID=1
+ DEST_NODE_PORT=15432
+ PRIMARY_NODE_HOST=
+ PGHOME=/var/lib/pgsql/12
+ PGBIN=/usr/pgsql-12
+ ARCHIVEDIR=/var/lib/pgsql/12/archivedir
+ REPLUSER=repl
+ REPL_SLOT_NAME=172_16_2_149
+ POSTGRESQL_STARTUP_USER=postgres
+ SSH_KEY_FILE=id_rsa_pgpool
+ SSH_OPTIONS='-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null
-i ~/.ssh/id_rsa_pgpool'
+ echo recovery_1st_stage: start: pg_basebackup for Standby node 1
recovery_1st_stage: start: pg_basebackup for Standby node 1
+ ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i
'~/.ssh/id_rsa_pgpool' postgres at 172.16.2.149 <callto:172.16.2.149> ls /tmp
Warning: Permanently added '172.16.2.149 <callto:172.16.2.149>' (ECDSA) to
the list of known hosts.
+ '[' 0 -ne 0 ']'
++ /usr/pgsql-12/bin/initdb -V
++ awk '{print $3}'
++ sed 's/\([0-9]*\)[a-zA-Z].*/\1/'
++ sed 's/\..*//'
+ PGVERSION=12
+ '[' 12 -ge 12 ']'
+ RECOVERYCONF=/var/lib/pgsql/12/data/myrecovery.conf
+ /usr/pgsql-12/bin/psql -h -p 5432 postgres -c 'SELECT
pg_create_physical_replication_slot('\''172_16_2_149'\'');'
+ '[' 2 -ne 0 ']'
+ echo ERROR: recovery_1st_stage: create replication slot '"172_16_2_149"'
failed. You may need to create replication slot manually.
ERROR: recovery_1st_stage: create replication slot "172_16_2_149" failed.
You may need to create replication slot manually.
++ echo /var/lib/pgsql/12/data/myrecovery.conf
++ sed -e 's/\//\\\//g'
++ echo /var/lib/pgsql/12/data/myrecovery.conf
++ sed -e 's/\//\\\//g'
+ ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i
'~/.ssh/id_rsa_pgpool' postgres at 172.16.2.149 <callto:172.16.2.149> '

    set -o errexit

    [ -d "/var/lib/pgsql/12/data" ] && rm -rf /var/lib/pgsql/12/data
    [ -d "/var/lib/pgsql/12/archivedir" ] && rm -rf
/var/lib/pgsql/12/archivedir/*

    /usr/pgsql-12/bin/pg_basebackup -h  -U repl -p 5432 -D
/var/lib/pgsql/12/data -X stream

    cat > /var/lib/pgsql/12/data/myrecovery.conf << EOT
primary_conninfo = '\''host= port=5432 user=repl application_name=
172.16.2.149 <callto:172.16.2.149>
passfile='\'''\''/var/lib/pgsql/.pgpass'\'''\'''\''
recovery_target_timeline = '\''latest'\''
restore_command = '\''scp -o StrictHostKeyChecking=no -o
UserKnownHostsFile=/dev/null -i ~/.ssh/id_rsa_pgpool
:/var/lib/pgsql/12/archivedir/%f %p'\''
primary_slot_name = '\''172_16_2_149'\''
EOT

    if [ 12 -ge 12 ]; then
        sed -i -e "\$ainclude_if_exists =
'\''\/var\/lib\/pgsql\/12\/data\/myrecovery.conf'\''"  -e
"/^include_if_exists =
'\''\/var\/lib\/pgsql\/12\/data\/myrecovery.conf'\''/d"
/var/lib/pgsql/12/data/postgresql.conf
        touch /var/lib/pgsql/12/data/standby.signal
    else
        echo "standby_mode = '\''on'\''" >>
/var/lib/pgsql/12/data/myrecovery.conf
    fi
    sed -i -e "s/#*port = .*/port = 15432/" -e "s@#*archive_command =
.*@archive_command = '\''cp \"%p\"
\"/var/lib/pgsql/12/archivedir/%f\"'\''@"
/var/lib/pgsql/12/data/postgresql.conf
'
Warning: Permanently added '172.16.2.149 <callto:172.16.2.149>' (ECDSA) to
the list of known hosts.
pg_basebackup: error: too many command-line arguments (first is "repl")
Try "pg_basebackup --help" for more information.
+ '[' 1 -ne 0 ']'
+ /usr/pgsql-12/bin/psql -h -p 5432 postgres -c 'SELECT
pg_drop_replication_slot('\''172_16_2_149'\'');'
+ '[' 2 -ne 0 ']'
+ echo ERROR: recovery_1st_stage: drop replication slot '"172_16_2_149"'
failed. You may need to drop replication slot manually.
ERROR: recovery_1st_stage: drop replication slot "172_16_2_149" failed. You
may need to drop replication slot manually.
+ echo ERROR: recovery_1st_stage: end: pg_basebackup failed. online
recovery failed
ERROR: recovery_1st_stage: end: pg_basebackup failed. online recovery failed
+ exit 1
2023-03-29 14 <callto:2023-03-29%2014>:11:26.654 EDT [7708] ERROR:
pgpool_recovery failed
2023-03-29 14 <callto:2023-03-29%2014>:11:26.654 EDT [7708] STATEMENT:
SELECT pgpool_recovery('recovery_1st_stage', '172.16.2.149
<callto:172.16.2.149>', '/var/lib/pgsql/12/data', '5432', 1, '15432')



-- 
Thanks & Regards,
Prasanna Venkadesh.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20230330/5447696e/attachment.htm>


More information about the pgpool-general mailing list