View Issue Details

ID: 0000128
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2015-04-22 21:08
Reporter: jotheroot
Assigned To: Muhammad Usama
Priority: normal
Severity: minor
Reproducibility: have not tried
Status: assigned
Resolution: open
Platform: Linux
OS: Debian
OS Version: 7.3
Product Version:
Target Version:
Fixed in Version:
Summary: 0000128: Error during online recovery

Description

Hi,

I chose the category Bug, but it's possible that this is a user problem ;).

When I perform an online recovery, it fails with the following error on the second node:

Starting PostgreSQL 9.3 database server: main
The PostgreSQL server failed to start. Please check the log output:
2015-01-06 15:57:56 CET LOG: database system was interrupted while in recovery at log time 2015-01-06 15:36:44 CET
2015-01-06 15:57:56 CET HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2015-01-06 15:57:56 CET LOG: starting archive recovery
2015-01-06 15:57:56 CET LOG: restored log file "00000009.history" from archive
scp: /var/lib/postgresql/9.3/main/pg_xlog/00000009000000130000003A: No such file or directory
2015-01-06 15:57:56 CET LOG: redo starts at 13/3A000028
2015-01-06 15:57:56 CET LOG: record with zero length at 13/3A07D3F0
2015-01-06 15:57:56 CET LOG: redo done at 13/3A07D3C0
2015-01-06 15:57:56 CET LOG: last completed transaction was at log time 2015-01-06 15:36:52.032545+01
scp: /var/lib/postgresql/9.3/main/pg_xlog/00000009000000130000003A: No such file or directory
2015-01-06 15:57:56 CET FATAL: WAL ends before end of online backup
2015-01-06 15:57:56 CET HINT: Online backup started with pg_start_backup() must be ended with pg_stop_backup(), and all WAL up to that point must be available at recovery.
2015-01-06 15:57:56 CET LOG: startup process (PID 11150) exited with exit code 1
2015-01-06 15:57:56 CET LOG: terminating any other active server processes
... failed!

Some information:

On the master node, before online recovery

-rw------- 1 postgres postgres 16777216 Jan 6 15:36 000000090000001300000039
-rw------- 1 postgres postgres 16777216 Jan 6 15:36 00000009000000130000003A
-rw------- 1 postgres postgres 16777216 Jan 6 14:56 00000009000000130000003B
-rw------- 1 postgres postgres 16777216 Jan 6 15:02 00000009000000130000003C
-rw------- 1 postgres postgres 16777216 Jan 6 15:10 00000009000000130000003D
-rw------- 1 postgres postgres 16777216 Jan 6 15:17 00000009000000130000003E
-rw------- 1 postgres postgres 16777216 Jan 6 15:24 00000009000000130000003F
-rw------- 1 postgres postgres 16777216 Jan 6 15:31 000000090000001300000040

Still on the master node when it failed

-rw------- 1 postgres postgres 16777216 Jan 6 15:48 00000009000000130000003B
-rw------- 1 postgres postgres 16777216 Jan 6 15:49 00000009000000130000003C
-rw------- 1 postgres postgres 16777216 Jan 6 15:10 00000009000000130000003D
-rw------- 1 postgres postgres 16777216 Jan 6 15:17 00000009000000130000003E
-rw------- 1 postgres postgres 16777216 Jan 6 15:24 00000009000000130000003F
-rw------- 1 postgres postgres 16777216 Jan 6 15:31 000000090000001300000040
-rw------- 1 postgres postgres 16777216 Jan 6 15:36 000000090000001300000041
-rw------- 1 postgres postgres 16777216 Jan 6 15:43 000000090000001300000042

thanks for your answer

best regards,
Additional Information

pgpool version: pgpool-II version 3.3.2 (tokakiboshi)

PostgreSQL version (on both nodes): 9.3.3

replication mode: on

copy-base-backup:

#! /bin/sh
DATA=$1
RECOVERY_TARGET=$2
RECOVERY_DATA=$3

psql -c "select pg_start_backup('pgpool-recovery')" postgres
echo "restore_command = 'scp -o StrictHostKeyChecking=no $(hostname):/var/lib/postgresql/9.3/main/pg_xlog/%f %p'" > /var/lib/postgresql/9.3/main/recovery.conf
tar -C /var/lib/postgresql/9.3/ -zcf /var/lib/postgresql/9.3/main.tar.gz main
psql -c 'select pg_stop_backup()' postgres
scp -o StrictHostKeyChecking=no /var/lib/postgresql/9.3/main.tar.gz $RECOVERY_TARGET:/var/lib/postgresql/9.3

pgpool_recovery_pitr:

#! /bin/sh
# Online recovery 2nd stage script
#
datadir=$1 # master database cluster
DEST=$2 # hostname of the DB node to be recovered
DESTDIR=$3 # database cluster of the DB node to be recovered
port=5432 # PostgreSQL port number
archdir=/var/lib/postgresql/9.3/main/pg_xlog/ # archive log directory

# Force to flush current value of sequences to xlog
psql -p $port -t -c 'SELECT datname FROM pg_database WHERE NOT datistemplate AND datallowconn' template1 |
while read i
do
  if [ "$i" != "" ];then
    psql -p $port -c "SELECT setval(oid, nextval(oid)) FROM pg_class WHERE relkind = 'S'" $i
  fi
done

psql -p $port -c "SELECT pgpool_switch_xlog('$archdir')" template1

pgpool_remote_start:

#!/bin/sh
DEST=$1
DESTDIR=$2
PGCTL=/usr/lib/postgresql/9.3/bin/pg_ctl

# Deploy the backup
ssh -o StrictHostKeyChecking=no -T $DEST 'cd /var/lib/postgresql/9.3/; tar zxf main.tar.gz' 2>/dev/null 1>/dev/null < /dev/null
# Start the PostgreSQL server
ssh -o StrictHostKeyChecking=no -T $DEST $PGCTL -w -D /etc/postgresql/9.3/main/ start 2>/dev/null 1>/dev/null < /dev/null &
Tags: No tags attached.

Activities

arnold_s

2015-02-20 01:19

reporter   ~0000522

I haven't read your backup script, but I think this is not a pgpool problem;
it looks like a wal_keep_segments setting on the backend server. You have to keep enough segments. Once they are recycled, they are gone when you need them.
(If you move to 9.4, you can also use a replication slot.)
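
A minimal sketch of how this could be checked on the master, assuming the paths from the report. The helper name wal_segment_present is hypothetical and not part of pgpool-II or PostgreSQL; it only tests whether a given WAL segment file still exists in a directory. The commented psql call shows the current wal_keep_segments value on a 9.3 server.

```sh
#!/bin/sh
# Hypothetical helper (not shipped with pgpool-II or PostgreSQL):
# prints "present" if the named WAL segment file exists in the given
# directory, "missing" otherwise.
wal_segment_present() {
    segment=$1   # e.g. 00000009000000130000003A
    waldir=$2    # e.g. /var/lib/postgresql/9.3/main/pg_xlog
    if [ -f "$waldir/$segment" ]; then
        echo "present"
    else
        echo "missing"
    fi
}

# Example usage on the master (paths and segment name taken from the report):
# wal_segment_present 00000009000000130000003A /var/lib/postgresql/9.3/main/pg_xlog
#
# Show how many finished segments the server is asked to keep in pg_xlog:
# psql -At -c 'SHOW wal_keep_segments' postgres
```

If the segment named in the scp error is reported as missing while the standby still needs it, that would be consistent with the segment having been recycled before the restore_command could fetch it.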

Issue History

Date Modified Username Field Change
2015-01-07 02:26 jotheroot New Issue
2015-02-20 01:19 arnold_s Note Added: 0000522
2015-04-22 21:08 Muhammad Usama Assigned To => Muhammad Usama
2015-04-22 21:08 Muhammad Usama Status new => assigned