[Pgpool-general] Second stage online recovery with PITR problems on pgpool 3.0.3 / postgresql 9.0.4

Wed Sep 14 06:35:45 UTC 2011

Yep ... I found the same post on the web and upgraded to PostgreSQL
9.1 (and to pgpool-II 3.1, while I was at it).
Everything works now.

Thanks

On Wed, Sep 14, 2011 at 06:33, Toshihiro Kitagawa <kitagawa at sraoss.co.jp> wrote:
> Hi,
>
>> [2011-09-13 12:43:01 CEST]-[]-[31877|] LOG:  invalid record length at 1/21000020
>
> Was your PostgreSQL 9.0.4 built by gcc 4.6.0?
>
> gcc 4.6.0 has a bug which causes this error.
> See the following thread for more details:
> http://archives.postgresql.org/pgsql-hackers/2011-06/msg00661.php
>
> --
> Toshihiro Kitagawa
> SRA OSS, Inc. Japan
>
> On Tue, 13 Sep 2011 13:12:32 +0200
> Nikola Ivačič <nikola.ivacic at gmail.com> wrote:
>
>> I have a problem with the second stage of the online PITR recovery
>> procedure. Data received in the second stage, after the base backup and
>> before the WAL switch, gets lost.
>>
>> I've managed to isolate the problem to PostgreSQL alone, without
>> pgpool-II running:
>> - stop failed node
>> //1st stage
>> - start backup
>> - rsync files to failed node
>> - stop backup
>> - do intentional insert in master node
>> //2nd stage
>> - do pg_switch_xlog (tested also pgpool_xlog_switch with same results)
>> - rsync archive WAL files to failed node
>> - start failed node
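The steps above can be written out as an ordered command list, which makes it easier to see where the test insert falls relative to the WAL switch. This is only a sketch: the host name, paths, backup label, `postgres` user, rsync options and the `recovery_commands` helper are all assumptions, not taken from the original scripts. It also assumes a recovery.conf with a suitable restore_command already exists on the target.

```python
from typing import List


def psql(sql: str) -> List[str]:
    # Assumption: psql runs locally on the master as the "postgres" user.
    return ["psql", "-U", "postgres", "-c", sql]


def recovery_commands(master_data: str, target_host: str,
                      target_data: str, archive_dir: str) -> List[List[str]]:
    """Return the manual two-stage recovery steps as argv lists (nothing is run)."""
    return [
        # stop failed node
        ["ssh", target_host, "pg_ctl", "-D", target_data, "stop", "-m", "fast"],
        # 1st stage: base backup
        psql("SELECT pg_start_backup('online_recovery')"),
        ["rsync", "-a", "--delete", "--exclude=postmaster.pid",
         master_data + "/", f"{target_host}:{target_data}/"],
        psql("SELECT pg_stop_backup()"),
        # (the intentional test insert on the master happens here)
        # 2nd stage: force a WAL switch, ship the archive, start the node
        psql("SELECT pg_switch_xlog()"),
        ["rsync", "-a", archive_dir + "/", f"{target_host}:{archive_dir}/"],
        ["ssh", target_host, "pg_ctl", "-D", target_data, "start"],
    ]


if __name__ == "__main__":
    # Hypothetical paths; "alice" is the restore target named in this report.
    for argv in recovery_commands("/var/lib/postgres/data", "alice",
                                  "/var/lib/postgres/data",
                                  "/var/lib/postgres/archive"):
        print(" ".join(argv))
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere. Each list could be passed to `subprocess.run(argv, check=True)` once the paths are adjusted.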
>>
>> The failed node starts fine and performs recovery, but for the last WAL
>> file it always reports an "invalid record length" error and falls back
>> to the last known good WAL file (the one created in the backup step).
>>
>> Log from failed node when I do restore (increasing verbosity reveals
>> no more information):
>> [2011-09-13 12:42:58 CEST]-[]-[31877|] LOG:  database system was
>> interrupted; last known up at 2011-09-13 12:40:46 CEST
>> [2011-09-13 12:42:58 CEST]-[]-[31877|] LOG:  creating missing WAL
>> directory "pg_xlog/archive_status"
>> [2011-09-13 12:42:58 CEST]-[]-[31877|] LOG:  starting archive recovery
>> [2011-09-13 12:42:58 CEST]-[postgres]-[31882|] FATAL:  the database
>> system is starting up
>> [2011-09-13 12:42:59 CEST]-[]-[31877|] LOG:  restored log file
>> "000000020000000100000020" from archive
>> [2011-09-13 12:42:59 CEST]-[]-[31877|] LOG:  redo starts at 1/20000078
>> [2011-09-13 12:42:59 CEST]-[]-[31877|] LOG:  consistent recovery state
>> reached at 1/21000000
>> [2011-09-13 12:42:59 CEST]-[postgres]-[31886|] FATAL:  the database
>> system is starting up
>> [2011-09-13 12:43:00 CEST]-[postgres]-[31887|] FATAL:  the database
>> system is starting up
>> [2011-09-13 12:43:01 CEST]-[]-[31877|] LOG:  restored log file
>> "000000020000000100000021" from archive
>> [2011-09-13 12:43:01 CEST]-[]-[31877|] LOG:  invalid record length at 1/21000020
>> [2011-09-13 12:43:01 CEST]-[]-[31877|] LOG:  redo done at 1/200000A0
>> [2011-09-13 12:43:01 CEST]-[postgres]-[31890|] FATAL:  the database
>> system is starting up
>> [2011-09-13 12:43:01 CEST]-[]-[31877|] LOG:  restored log file
>> "000000020000000100000020" from archive
>> [2011-09-13 12:43:01 CEST]-[]-[31877|] LOG:  selected new timeline ID: 3
>> [2011-09-13 12:43:01 CEST]-[]-[31877|] LOG:  archive recovery complete
>> [2011-09-13 12:43:01 CEST]-[]-[31883|] LOG:  checkpoint starting:
>> end-of-recovery immediate wait
>> [2011-09-13 12:43:02 CEST]-[]-[31883|] LOG:  checkpoint complete:
>> wrote 0 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0
>> recycled; write=0.000 s, sync=0.000 s, total=0.659 s
>> [2011-09-13 12:43:02 CEST]-[]-[31876|] LOG:  database system is ready
>> to accept connections
>> [2011-09-13 12:43:02 CEST]-[]-[31896|] LOG:  autovacuum launcher started
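To relate the positions in the log above to the archived file names, a WAL location of the form `logid/offset` can be mapped to its segment file. The sketch below assumes the default 16 MB segment size and takes timeline 2 from the restored file names; the helper name `wal_file_for_lsn` is made up for illustration.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB WAL segments


def wal_file_for_lsn(timeline: int, lsn: str):
    """Map a 'logid/offset' WAL location to (segment file name, byte offset in it)."""
    log_id_hex, offset_hex = lsn.split("/")
    log_id = int(log_id_hex, 16)
    offset = int(offset_hex, 16)
    segment = offset // WAL_SEGMENT_SIZE
    name = f"{timeline:08X}{log_id:08X}{segment:08X}"
    return name, offset % WAL_SEGMENT_SIZE


print(wal_file_for_lsn(2, "1/21000020"))  # ('000000020000000100000021', 32)
print(wal_file_for_lsn(2, "1/200000A0"))  # ('000000020000000100000020', 160)
```

Under those assumptions, the "invalid record length" position lies 32 bytes into the last restored file, while "redo done" still points into the previous segment. That would be consistent with the reported loss of everything written after the base backup.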
>>
>> I've done md5sum of 000000020000000100000021 WAL file in archive dir
>> on master and target node, and the file is the same on both nodes.
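The same checksum comparison can be scripted, for example to check every archived segment rather than one. A minimal sketch using only the standard library follows; the function names are illustrative, and it assumes both copies are reachable as local paths (for instance after fetching the remote file).

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file(path_a: str, path_b: str) -> bool:
    """True when both files have identical MD5 digests."""
    return file_md5(path_a) == file_md5(path_b)
```

Matching digests only show that the copy was faithful. They say nothing about whether the segment was already unreadable on the master.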
>>
>> So my questions are:
>> Did I miss something, or did I get the procedure wrong?
>> Is the online recovery with PITR procedure still valid as presented
>> in the manual?
>> Can I replace pg_switch_xlog with another pg_start_backup and
>> pg_stop_backup call, and what are the performance implications in
>> that case?
>>
>> Software versions:
>> I'm using PostgreSQL 9.0.4 on both nodes, with the same OS.
>> Restore master:
>> Linux miho 3.0-ARCH #1 SMP PREEMPT Wed Aug 17 21:55:57 CEST 2011
>> x86_64 Intel(R) Core(TM) i7 CPU 930 @ 2.80GHz GenuineIntel GNU/Linux
>> Restore target:
>> Linux alice 3.0-ARCH #1 SMP PREEMPT Wed Aug 17 21:55:57 CEST 2011
>> x86_64 Intel(R) Core(TM) i7 CPU 930 @ 2.80GHz GenuineIntel GNU/Linux
>>
>> Thanks for help.
>> Nikola