[Pgpool-general] Second stage online recovery with PITR problems on pgpool 3.0.3 / postgresql 9.0.4

Toshihiro Kitagawa kitagawa at sraoss.co.jp
Wed Sep 14 04:33:57 UTC 2011


Hi,

> [2011-09-13 12:43:01 CEST]-[]-[31877|] LOG:  invalid record length at 1/21000020

Was your PostgreSQL 9.0.4 built by gcc 4.6.0?

gcc 4.6.0 has a bug that causes this error.
See the following thread for more details:
http://archives.postgresql.org/pgsql-hackers/2011-06/msg00661.php
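
One way to answer the compiler question is to look at the string returned by SELECT version(). The snippet below is only a minimal sketch: it assumes that string contains "compiled by" followed by the compiler version, and the sample string is a hypothetical illustration, not output taken from this thread.

```python
import re


def compiler_version(version_string):
    """Return the first dotted version number after 'compiled by', or None."""
    _, marker, tail = version_string.partition("compiled by")
    if not marker:
        return None
    match = re.search(r"\d+\.\d+(?:\.\d+)?", tail)
    return match.group(0) if match else None


# Hypothetical sample of SELECT version() output (an assumption, not from the thread).
sample = ("PostgreSQL 9.0.4 on x86_64-unknown-linux-gnu, "
          "compiled by GCC gcc (GCC) 4.6.0, 64-bit")

if __name__ == "__main__":
    print("built with gcc 4.6.0:", compiler_version(sample) == "4.6.0")
```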

-- 
Toshihiro Kitagawa
SRA OSS, Inc. Japan

On Tue, 13 Sep 2011 13:12:32 +0200
Nikola Ivačič <nikola.ivacic at gmail.com> wrote:

> I have a problem with the 2nd stage of the online PITR recovery procedure.
> The data received in the second stage, after the base backup and prior
> to the WAL switch, gets lost.
> 
> I've managed to isolate the problem to PostgreSQL alone, without
> pgpool-II running:
> - stop failed node
> //1st stage
> - start backup
> - rsync files to failed node
> - stop backup
> - do intentional insert in master node
> //2nd stage
> - do pg_switch_xlog (tested also pgpool_xlog_switch with same results)
> - rsync archive WAL files to failed node
> - start failed node
> 
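
The steps listed above can be sketched as an ordered command plan. This is a dry run that only builds and prints the commands; it is not the script used in the thread. The host name, paths, backup label, and the test table `t` are hypothetical placeholders, and the target node is assumed to already have a recovery.conf that restores from the archive directory.

```python
import shlex


def recovery_plan(pgdata, archive, target):
    """Return the commands for the two-stage procedure, in order.

    pgdata, archive, and target are hypothetical placeholders.
    """
    def sql(statement):
        # Run one SQL statement on the master through psql.
        return ["psql", "-c", statement]

    return [
        # stop failed node
        ["ssh", target, "pg_ctl", "-D", pgdata, "stop", "-m", "fast"],
        # 1st stage: base backup
        sql("SELECT pg_start_backup('online_recovery')"),
        ["rsync", "-a", pgdata + "/", target + ":" + pgdata + "/"],
        sql("SELECT pg_stop_backup()"),
        # intentional insert on the master (table t is hypothetical)
        sql("INSERT INTO t VALUES (1)"),
        # 2nd stage: switch WAL segment, ship the archive, start the node
        sql("SELECT pg_switch_xlog()"),
        ["rsync", "-a", archive + "/", target + ":" + archive + "/"],
        ["ssh", target, "pg_ctl", "-D", pgdata, "start"],
    ]


if __name__ == "__main__":
    for command in recovery_plan("/var/lib/postgres/data",
                                 "/var/lib/postgres/archive", "alice"):
        print(shlex.join(command))
```

Keeping the plan as data makes the ordering easy to inspect: the insert happens after pg_stop_backup and before pg_switch_xlog, so it is exactly the kind of change that should arrive through the last archived WAL file.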
> The failed node starts fine and performs recovery, but for the last WAL
> file it always reports an "invalid record length" error and falls back
> to the last known good WAL file (the one created in the backup step).
> 
> Log from failed node when I do restore (increasing verbosity reveals
> no more information):
> [2011-09-13 12:42:58 CEST]-[]-[31877|] LOG:  database system was
> interrupted; last known up at 2011-09-13 12:40:46 CEST
> [2011-09-13 12:42:58 CEST]-[]-[31877|] LOG:  creating missing WAL
> directory "pg_xlog/archive_status"
> [2011-09-13 12:42:58 CEST]-[]-[31877|] LOG:  starting archive recovery
> [2011-09-13 12:42:58 CEST]-[postgres]-[31882|] FATAL:  the database
> system is starting up
> [2011-09-13 12:42:59 CEST]-[]-[31877|] LOG:  restored log file
> "000000020000000100000020" from archive
> [2011-09-13 12:42:59 CEST]-[]-[31877|] LOG:  redo starts at 1/20000078
> [2011-09-13 12:42:59 CEST]-[]-[31877|] LOG:  consistent recovery state
> reached at 1/21000000
> [2011-09-13 12:42:59 CEST]-[postgres]-[31886|] FATAL:  the database
> system is starting up
> [2011-09-13 12:43:00 CEST]-[postgres]-[31887|] FATAL:  the database
> system is starting up
> [2011-09-13 12:43:01 CEST]-[]-[31877|] LOG:  restored log file
> "000000020000000100000021" from archive
> [2011-09-13 12:43:01 CEST]-[]-[31877|] LOG:  invalid record length at 1/21000020
> [2011-09-13 12:43:01 CEST]-[]-[31877|] LOG:  redo done at 1/200000A0
> [2011-09-13 12:43:01 CEST]-[postgres]-[31890|] FATAL:  the database
> system is starting up
> [2011-09-13 12:43:01 CEST]-[]-[31877|] LOG:  restored log file
> "000000020000000100000020" from archive
> [2011-09-13 12:43:01 CEST]-[]-[31877|] LOG:  selected new timeline ID: 3
> [2011-09-13 12:43:01 CEST]-[]-[31877|] LOG:  archive recovery complete
> [2011-09-13 12:43:01 CEST]-[]-[31883|] LOG:  checkpoint starting:
> end-of-recovery immediate wait
> [2011-09-13 12:43:02 CEST]-[]-[31883|] LOG:  checkpoint complete:
> wrote 0 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0
> recycled; write=0.000 s, sync=0.000 s, total=0.659 s
> [2011-09-13 12:43:02 CEST]-[]-[31876|] LOG:  database system is ready
> to accept connections
> [2011-09-13 12:43:02 CEST]-[]-[31896|] LOG:  autovacuum launcher started
> 
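
To relate the positions in the log to the restored files, a log position such as 1/21000020 can be mapped to a WAL segment file name. This is a minimal sketch, assuming the default 16 MB segment size and the file naming used by PostgreSQL 9.0 (timeline, log id, and segment number, each as 8 hex digits); the function name is made up for illustration.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB segments


def wal_file_for(timeline, position):
    """Map a log position like '1/21000020' to its WAL segment file name."""
    log_id_hex, offset_hex = position.split("/")
    log_id = int(log_id_hex, 16)
    segment = int(offset_hex, 16) // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (timeline, log_id, segment)


if __name__ == "__main__":
    # Positions taken from the log above, on timeline 2.
    print(wal_file_for(2, "1/21000020"))
    print(wal_file_for(2, "1/200000A0"))
```

Under these assumptions, the position of the "invalid record length" message falls in segment ...21, the file restored last, while "redo done" falls in segment ...20.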
> I ran md5sum on the 000000020000000100000021 WAL file in the archive
> directory on both the master and the target node, and the file is
> identical on both nodes.
> 
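
The checksum comparison described above can be reproduced with a short script. This is a sketch only, with the two file paths supplied by the caller. Matching digests show that the transfer did not alter the file; they say nothing about whether the records inside it are valid.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file(path_a, path_b):
    """True if both files have the same MD5 digest."""
    return md5_of(path_a) == md5_of(path_b)
```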
> So my questions are:
> Did I miss something, or did I get the procedure wrong?
> Is the online recovery with PITR procedure still valid as it is
> presented in the manual?
> Can I replace the pg_switch_xlog with another pg_start_backup and
> pg_stop_backup call, and what are the performance implications in this
> case?
> 
> Software versions:
> I'm using PostgreSQL 9.0.4 on both nodes, with the same OS.
> Restore master:
> Linux miho 3.0-ARCH #1 SMP PREEMPT Wed Aug 17 21:55:57 CEST 2011
> x86_64 Intel(R) Core(TM) i7 CPU 930 @ 2.80GHz GenuineIntel GNU/Linux
> Restore target:
> Linux alice 3.0-ARCH #1 SMP PREEMPT Wed Aug 17 21:55:57 CEST 2011
> x86_64 Intel(R) Core(TM) i7 CPU 930 @ 2.80GHz GenuineIntel GNU/Linux
> 
> Thanks for the help.
> Nikola