[pgpool-general: 7851] Re: PGPOOL II - ERROR: executing recovery, execution of command failed at "1st stage"

Pavel Křehlík krehlik at techsys.cz
Fri Nov 5 16:20:53 JST 2021


> Hello all,
> 
> I use Pgpool-II 4.2.5 on CentOS 7 with PostgreSQL 13.
> I am setting up two nodes according to https://www.pgpool.net/docs/42/en/html/example-cluster.html
> I am at section 8.2.9.1, "Set up PostgreSQL standby server", and I have a problem with pcp_recovery_node.
> The delegate IP is 192.168.8.139, and the node I need to recover is 192.168.8.137.
> 
> This is the command:
> pcp_recovery_node -h 192.168.8.139 -p 9898 -U pgpool -n 0 -d
> Password:
> DEBUG: recv: tos="m", len=8
> DEBUG: recv: tos="r", len=21
> DEBUG: send: tos="D", len=6
> DEBUG: recv: tos="E", len=130
> ERROR:  executing recovery, execution of command failed at "1st stage"
> DETAIL:  command:"recovery_1st_stage"
> 
> DEBUG: send: tos="X", len=4
> 
> And here is the log:
> 2021-11-04 10:04:38: pid 28665: LOG:  starting recovering node 0
> 2021-11-04 10:04:38: pid 28665: LOG:  executing recovery
> 2021-11-04 10:04:38: pid 28665: DETAIL:  starting recovery command: "SELECT pgpool_recovery('recovery_1st_stage', '192.168.8.137', '/var/lib/pgsql/13/data', '5432', 0, '5432')"
> 2021-11-04 10:04:38: pid 28665: LOG:  executing recovery
> 2021-11-04 10:04:38: pid 28665: DETAIL:  disabling statement_timeout
> 2021-11-04 10:04:39: pid 28665: ERROR:  executing recovery, execution of command failed at "1st stage"
> 2021-11-04 10:04:39: pid 28665: DETAIL:  command:"recovery_1st_stage"
> 
> I also tried running this from pgAdmin: SELECT pgpool_recovery('recovery_1st_stage', '192.168.8.137', '/var/lib/pgsql/13/data', '5432', 0, '5432')
> ERROR: pgpool_recovery failed
> SQL state: XX000
> 
> The recovery_1st_stage file is located in the /var/lib/pgsql/13/data directory with permissions 0755 and owner/group postgres/postgres.
> What could be wrong? Could you help me?

To find out what's wrong, we need the PostgreSQL log on the primary node
(perhaps node 1?). Can you share it?

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Hi,
it looks like there is a problem with PRIMARY_NODE_HOST: it should be 192.168.8.138, but it is set to the hostname 'CentOS7-TW-POSTGRES13'. But why?
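
A likely cause, judging from the `++ hostname` and `+ PRIMARY_NODE_HOST=CentOS7-TW-POSTGRES13` lines in the trace below: the script takes the primary's address from the output of `hostname`, so whatever name the machine reports is handed to psql and pg_basebackup. A minimal sketch of one workaround, assuming you are free to edit your copy of recovery_1st_stage; the function name and the PRIMARY_HOST_OVERRIDE variable are hypothetical and not part of the sample script:

```shell
#!/bin/bash
# Hypothetical helper; neither the function nor PRIMARY_HOST_OVERRIDE exists
# in the pgpool sample script.
# Prints the address the standby should use to reach the primary:
# the override if it is set and non-empty, otherwise the local hostname.
primary_node_host() {
    if [ -n "${PRIMARY_HOST_OVERRIDE:-}" ]; then
        printf '%s\n' "$PRIMARY_HOST_OVERRIDE"
    else
        hostname
    fi
}

# Example use inside the script (192.168.8.138 is the primary in this thread):
#   PRIMARY_HOST_OVERRIDE=192.168.8.138
#   PRIMARY_NODE_HOST=$(primary_node_host)
```

With the override set to 192.168.8.138, both the psql call on the primary and the pg_basebackup call on the standby would use an address that needs no name resolution.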

Here is the PostgreSQL log:

+ PRIMARY_NODE_PGDATA=/var/lib/pgsql/13/data
+ DEST_NODE_HOST=192.168.8.137
+ DEST_NODE_PGDATA=/var/lib/pgsql/13/data
+ PRIMARY_NODE_PORT=5432
+ DEST_NODE_ID=0
+ DEST_NODE_PORT=5432
++ hostname
+ PRIMARY_NODE_HOST=CentOS7-TW-POSTGRES13
+ PGHOME=/usr/pgsql-13
+ ARCHIVEDIR=/var/lib/pgsql/archivedir
+ REPLUSER=repl
+ REPL_SLOT_NAME=192_168_8_137
+ echo recovery_1st_stage: start: pg_basebackup for Standby node 0
recovery_1st_stage: start: pg_basebackup for Standby node 0
+ ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@192.168.8.137 -i /var/lib/pgsql/.ssh/id_rsa_pgpool ls /tmp
Warning: Permanently added '192.168.8.137' (ECDSA) to the list of known hosts.
+ '[' 0 -ne 0 ']'
++ awk '{print $3}'
++ sed 's/\..*//'
++ sed 's/\([0-9]*\)[a-zA-Z].*/\1/'
++ /usr/pgsql-13/bin/initdb -V
+ PGVERSION=13
+ '[' 13 -ge 12 ']'
+ RECOVERYCONF=/var/lib/pgsql/13/data/myrecovery.conf
+ /usr/pgsql-13/bin/psql -h CentOS7-TW-POSTGRES13 -p 5432 -c 'SELECT pg_create_physical_replication_slot('\''192_168_8_137'\'');'
2021-11-04 14:30:18.954 CET [22519] FATAL:  no pg_hba.conf entry for host "fe80::1943:60ca:b919:488a%ens192", user "postgres", database "postgres", SSL off
+ '[' 2 -ne 0 ']'
+ echo ERROR: recovery_1st_stage: create replication slot '"192_168_8_137"' failed. You may need to create replication slot manually.
ERROR: recovery_1st_stage: create replication slot "192_168_8_137" failed. You may need to create replication slot manually.
++ echo /var/lib/pgsql/13/data/myrecovery.conf
++ sed -e 's/\//\\\//g'
++ echo /var/lib/pgsql/13/data/myrecovery.conf
++ sed -e 's/\//\\\//g'
+ ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@192.168.8.137 -i /var/lib/pgsql/.ssh/id_rsa_pgpool '

    set -o errexit

    rm -rf /var/lib/pgsql/13/data
    rm -rf /var/lib/pgsql/archivedir/*

    /usr/pgsql-13/bin/pg_basebackup -h CentOS7-TW-POSTGRES13 -U repl -p 5432 -D /var/lib/pgsql/13/data -X stream

    cat > /var/lib/pgsql/13/data/myrecovery.conf << EOT
primary_conninfo = '\''host=CentOS7-TW-POSTGRES13 port=5432 user=repl application_name=192.168.8.137 passfile='\'''\''/var/lib/pgsql/.pgpass'\'''\'''\''
recovery_target_timeline = '\''latest'\''
restore_command = '\''scp CentOS7-TW-POSTGRES13:/var/lib/pgsql/archivedir/%f %p'\''
primary_slot_name = '\''192_168_8_137'\''
EOT

    if [ 13 -ge 12 ]; then
        sed -i -e "\$ainclude_if_exists = '\''\/var\/lib\/pgsql\/13\/data\/myrecovery.conf'\''"                -e "/^include_if_exists = '\''\/var\/lib\/pgsql\/13\/data\/myrecovery.conf'\''/d" /var/lib/pgsql/13/data/postgresql.conf
        touch /var/lib/pgsql/13/data/standby.signal
    else
        echo "standby_mode = '\''on'\''" >> /var/lib/pgsql/13/data/myrecovery.conf
    fi

    sed -i "s/#*port = .*/port = 5432/" /var/lib/pgsql/13/data/postgresql.conf
'
Warning: Permanently added '192.168.8.137' (ECDSA) to the list of known hosts.
pg_basebackup: error: could not translate host name "CentOS7-TW-POSTGRES13" to address: Name or service not known
+ '[' 1 -ne 0 ']'
+ /usr/pgsql-13/bin/psql -h CentOS7-TW-POSTGRES13 -p 5432 -c 'SELECT pg_drop_replication_slot('\''192_168_8_137'\'');'
2021-11-04 14:30:19.326 CET [22530] FATAL:  no pg_hba.conf entry for host "fe80::1943:60ca:b919:488a%ens192", user "postgres", database "postgres", SSL off
+ '[' 2 -ne 0 ']'
+ echo ERROR: recovery_1st_stage: drop replication slot '"192_168_8_137"' failed. You may need to drop replication slot manually.
ERROR: recovery_1st_stage: drop replication slot "192_168_8_137" failed. You may need to drop replication slot manually.
+ echo ERROR: recovery_1st_stage: end: pg_basebackup failed. online recovery failed
ERROR: recovery_1st_stage: end: pg_basebackup failed. online recovery failed
+ exit 1
2021-11-04 14:30:19.328 CET [22509] ERROR:  pgpool_recovery failed
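
Two separate failures are visible in this log. On the primary, psql resolved CentOS7-TW-POSTGRES13 to an IPv6 link-local address (the fe80:: address in the FATAL lines) that pg_hba.conf has no entry for. On the standby, pg_basebackup could not resolve the name at all. One way to address both, sketched here as an assumption rather than a confirmed fix, is to map the hostname to the primary's IPv4 address in /etc/hosts on both nodes. The helper names below are hypothetical, and pg_hba.conf on the primary would still need entries that match 192.168.8.138 (user postgres) and 192.168.8.137 (replication for user repl):

```shell
#!/bin/bash
# Hypothetical helpers for keeping a hosts-format file consistent.

# Succeeds if NAME appears as a hostname or alias (any field after the
# address) on a non-comment line of FILE.
has_host_entry() {
    local file=$1 name=$2
    awk -v name="$name" '
        { sub(/#.*/, "") }    # strip trailing comments
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit found ? 0 : 1 }
    ' "$file" 2>/dev/null
}

# Appends "IP NAME" to FILE unless NAME is already mapped there.
add_host_entry() {
    local file=$1 ip=$2 name=$3
    has_host_entry "$file" "$name" && return 0
    printf '%s %s\n' "$ip" "$name" >> "$file"
}

# Example (run as root on both nodes):
#   add_host_entry /etc/hosts 192.168.8.138 CentOS7-TW-POSTGRES13
```

Note that the helper leaves an existing mapping untouched, so if the name is already listed against a different address (for example a loopback address), that line would have to be corrected by hand.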

Regards
Pavel

