[pgpool-general: 1325] Re: hot recovery of a node

Gilbert Soucy gsoucy at 36pix.com
Thu Jan 17 22:57:54 JST 2013


Thank you for the reply.

In the node 1 log, I dont find a "last completed transaction" line as you
have in your example.

For this to happen, am I right to say that I need to rsync the files in
pg_xlog (node 0) to my archive folder ( /thumbnails/postgres_archives/ ) on
node 1 ? My stage 2 script does not do that right now.

I am not completely comfortable with what the files in pg_xlog contain
exactly, at the various stages. Also, I find it strange that the log show
many "No such file or directory", even in your example.

Thanks

Gilbert

~~~~
LOg file for postgres on node 1 during recovery:

2013-01-16 19:50:20 EST : LOG:  database system was interrupted; last known
up at 2013-01-16 19:50:16 EST
2013-01-16 19:50:20 EST : LOG:  starting archive recovery
2013-01-16 19:50:20 EST : LOG:  restore_command = 'cp
 /thumbnails/postgres_archives/pg_log/%f %p'
2013-01-16 19:50:20 EST : DEBUG:  executing restore command "cp
 /thumbnails/postgres_archives/pg_log/00000002.history
pg_xlog/RECOVERYHISTORY"
cp: cannot stat `/thumbnails/postgres_archives/pg_log/00000002.history': No
such file or directory
2013-01-16 19:50:20 EST : DEBUG:  could not restore file "00000002.history"
from archive: return code 256
2013-01-16 19:50:20 EST : DEBUG:  history of timeline 2 is (i 2 1)
2013-01-16 19:50:20 EST : DEBUG:  executing restore command "cp
 /thumbnails/postgres_archives/pg_log/00000002000000040000008D.00000020.backup
pg_xlog/RECOVERY
HISTORY"
cp: cannot stat
`/thumbnails/postgres_archives/pg_log/00000002000000040000008D.00000020.backup':
No such file or directory
2013-01-16 19:50:20 EST : DEBUG:  could not restore file
"00000002000000040000008D.00000020.backup" from archive: return code 256
2013-01-16 19:50:20 EST : DEBUG:  executing restore command "cp
 /thumbnails/postgres_archives/pg_log/00000002000000040000008D
pg_xlog/RECOVERYXLOG"
cp: cannot stat
`/thumbnails/postgres_archives/pg_log/00000002000000040000008D': No such
file or directory
2013-01-16 19:50:20 EST : DEBUG:  could not restore file
"00000002000000040000008D" from archive: return code 256
2013-01-16 19:50:20 EST : DEBUG:  checkpoint record is at 4/8D000020
2013-01-16 19:50:20 EST : DEBUG:  redo record is at 4/8D000020; shutdown
FALSE
2013-01-16 19:50:20 EST : DEBUG:  next transaction ID: 0/1625; next OID:
17797
2013-01-16 19:50:20 EST : DEBUG:  next MultiXactId: 1; next
MultiXactOffset: 0
2013-01-16 19:50:20 EST : LOG:  automatic recovery in progress
2013-01-16 19:50:20 EST : LOG:  record with zero length at 4/8D000068
2013-01-16 19:50:20 EST : LOG:  redo is not required
2013-01-16 19:50:20 EST : DEBUG:  executing restore command "cp
 /thumbnails/postgres_archives/pg_log/00000002000000040000008D
pg_xlog/RECOVERYXLOG"
cp: cannot stat
`/thumbnails/postgres_archives/pg_log/00000002000000040000008D': No such
file or directory
2013-01-16 19:50:20 EST : DEBUG:  could not restore file
"00000002000000040000008D" from archive: return code 256
2013-01-16 19:50:20 EST : DEBUG:  executing restore command "cp
 /thumbnails/postgres_archives/pg_log/00000003.history
pg_xlog/RECOVERYHISTORY"
2013-01-16 19:50:20 EST : LOG:  restored log file "00000003.history" from
archive
2013-01-16 19:50:20 EST : DEBUG:  executing restore command "cp
 /thumbnails/postgres_archives/pg_log/00000004.history
pg_xlog/RECOVERYHISTORY"
2013-01-16 19:50:20 EST : LOG:  restored log file "00000004.history" from
archive
2013-01-16 19:50:20 EST : DEBUG:  executing restore command "cp
 /thumbnails/postgres_archives/pg_log/00000005.history
pg_xlog/RECOVERYHISTORY"
2013-01-16 19:50:20 EST : LOG:  restored log file "00000005.history" from
archive
2013-01-16 19:50:20 EST : DEBUG:  executing restore command "cp
 /thumbnails/postgres_archives/pg_log/00000006.history
pg_xlog/RECOVERYHISTORY"
cp: cannot stat `/thumbnails/postgres_archives/pg_log/00000006.history': No
such file or directory
2013-01-16 19:50:20 EST : DEBUG:  could not restore file "00000006.history"
from archive: return code 256
2013-01-16 19:50:20 EST : LOG:  selected new timeline ID: 6
2013-01-16 19:50:20 EST : DEBUG:  executing restore command "cp
 /thumbnails/postgres_archives/pg_log/00000002.history
pg_xlog/RECOVERYHISTORY"
cp: cannot stat `/thumbnails/postgres_archives/pg_log/00000002.history': No
such file or directory
2013-01-16 19:50:20 EST : DEBUG:  could not restore file "00000002.history"
from archive: return code 256
2013-01-16 19:50:20 EST : LOG:  archive recovery complete
2013-01-16 19:50:20 EST : DEBUG:  recycled transaction log file
"00000002000000040000008C"
2013-01-16 19:50:20 EST : DEBUG:  transaction ID wrap limit is 2147484295,
limited by database "template1"
2013-01-16 19:50:20 EST : DEBUG:  shmem_exit(0): 3 callbacks to make
2013-01-16 19:50:20 EST : DEBUG:  proc_exit(0): 2 callbacks to make
2013-01-16 19:50:20 EST : DEBUG:  exit(0)
2013-01-16 19:50:20 EST : DEBUG:  shmem_exit(-1): 0 callbacks to make
2013-01-16 19:50:20 EST : DEBUG:  proc_exit(-1): 0 callbacks to make
2013-01-16 19:50:20 EST : DEBUG:  reaping dead processes
2013-01-16 19:50:20 EST : LOG:  autovacuum launcher started
2013-01-16 19:50:20 EST : LOG:  database system is ready to accept
connections


On Wed, Jan 16, 2013 at 10:06 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:

> > Hello
> >
> > I hope this is the place whee I can ask general questions about pgpool. I
> > could not find a forum. Otheriwse, please let me know where I should go
> to
> > get help
>
> Yes, this is the perfect place to ask general questions about
> pgpool(there's no forum).
>
> > Here is my question:
> >
> > I am trying to setup pgpool II with 2 postgres nodes using replication
> and
> > load balancing. I have setup the watch dog to switch the virtual IP
> > automatically in case of failure. The 2 postgres nodes run on 2 identical
> > computers.
> >
> > Everything works fine except for the recovery which I dont quite
> undestand.
> > I have installed and configured the scripts:
> >
> > basebackup.sh
> > pgpool_remote_start
> > pgpool_recovery_pitr
> >
> >
> > Then I simulate a failure : I shutdown one of the postgres nodes. This
> gets
> > detected fine and pgpool is still functional with the remaining node
> (which
> > can still accept connections).
> >
> > When I try to recover, it does not work as I expected. I thought that we
> > could recover *while pgpool keeps accepting requests*. (at least for
> stage
> > 1 of the recover). So, I simulated this by inserting a new line in a
> table
> > of my database just before the end of the basebackup.sh script  (just
> *after
> > * the rsync and *before* the pg_stop_backup() ).
> >
> > So, my basebackup.sh script looks like:
> > ~~~~~~~~~~
> >
> > ....
> >
> > *psql -U postgres -p 5432 -c "insert into users values (3000, 'def1');"
> > myDB      # this line was added to basebackup.sh to simulate a
> transaction
> > during a recover, stage 1      *
> >
> > psql -U postgres -p 5432 -c "SELECT pg_stop_backup()" postgres
> > ~~~~~~~~~~~~
> >
> > I then try to recover my failed node with:
> >
> > /usr/local/bin/pcp_recovery_node -d 15 192.168.0.110 9898 pgpool pgpool 1
> >
> > Which succeeds. However, when I look at the content of the databases,
> only
> > the master database has the new line  "3000  test" (purposely added in my
> > basebackup.sh script). The recovered database has all other lines but not
> > this one. So the recovery is not complete in my opinion (or I dont
> > understand what it is supposed to do).
> >
> > Am I wrong in expecting that this should work (i.e. after the recover,
> the
> > 2 database should be completely in sync) ?
>
> Works fine for me. I guess you have problem with archive recovery
> setting of PostgreSQL.
>
> Below is my recovered PostgreSQL log.  Do you find something like
> "restored log file "000000030000000100000021" from archive" in your
> PostgreSQL log(in your case DB node 1)?
>
> 31055 2013-01-17 11:38:14 JST LOG:  database system was interrupted; last
> known up at 2013-01-17 11:38:11 JST
> 31055 2013-01-17 11:38:14 JST LOG:  creating missing WAL directory
> "pg_xlog/archive_status"
> cp: cannot stat `/home/t-ishii/work/
> git.postgresql.org/test.repli/archivedir/00000003.history': No such file
> or directory
> 31055 2013-01-17 11:38:14 JST LOG:  starting archive recovery
> 31055 2013-01-17 11:38:14 JST LOG:  restored log file
> "000000030000000100000020" from archive
> 31055 2013-01-17 11:38:14 JST LOG:  redo starts at 1/20000020
> 31055 2013-01-17 11:38:14 JST LOG:  consistent recovery state reached at
> 1/21000000
> 31053 2013-01-17 11:38:14 JST LOG:  database system is ready to accept
> read only connections
> 31055 2013-01-17 11:38:14 JST LOG:  restored log file
> "000000030000000100000021" from archive
> cp: cannot stat `/home/t-ishii/work/
> git.postgresql.org/test.repli/archivedir/000000030000000100000022': No
> such file or directory
> 31055 2013-01-17 11:38:14 JST LOG:  could not open file
> "pg_xlog/000000030000000100000022" (log file 1, segment 34): No such file
> or directory
> 31055 2013-01-17 11:38:14 JST LOG:  redo done at 1/21000E48
> 31055 2013-01-17 11:38:14 JST LOG:  last completed transaction was at log
> time 2013-01-17 11:38:12.376859+09
> 31055 2013-01-17 11:38:14 JST LOG:  restored log file
> "000000030000000100000021" from archive
> cp: cannot stat `/home/t-ishii/work/
> git.postgresql.org/test.repli/archivedir/00000004.history': No such file
> or directory
> 31055 2013-01-17 11:38:14 JST LOG:  selected new timeline ID: 4
> cp: cannot stat `/home/t-ishii/work/
> git.postgresql.org/test.repli/archivedir/00000003.history': No such file
> or directory
> 31055 2013-01-17 11:38:15 JST LOG:  archive recovery complete
> 31053 2013-01-17 11:38:15 JST LOG:  database system is ready to accept
> connections
> 31067 2013-01-17 11:38:15 JST LOG:  autovacuum launcher started
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20130117/a012b29f/attachment-0001.html>


More information about the pgpool-general mailing list