[pgpool-general: 1324] Re: hot recovery of a node

Tatsuo Ishii ishii at postgresql.org
Thu Jan 17 12:06:50 JST 2013


> Hello
> 
> I hope this is the place whee I can ask general questions about pgpool. I
> could not find a forum. Otheriwse, please let me know where I should go to
> get help

Yes, this is the perfect place to ask general questions about
pgpool(there's no forum).

> Here is my question:
> 
> I am trying to setup pgpool II with 2 postgres nodes using replication and
> load balancing. I have setup the watch dog to switch the virtual IP
> automatically in case of failure. The 2 postgres nodes run on 2 identical
> computers.
> 
> Everything works fine except for the recovery which I dont quite undestand.
> I have installed and configured the scripts:
> 
> basebackup.sh
> pgpool_remote_start
> pgpool_recovery_pitr
> 
> 
> Then I simulate a failure : I shutdown one of the postgres nodes. This gets
> detected fine and pgpool is still functional with the remaining node (which
> can still accept connections).
> 
> When I try to recover, it does not work as I expected. I thought that we
> could recover *while pgpool keeps accepting requests*. (at least for stage
> 1 of the recover). So, I simulated this by inserting a new line in a table
> of my database just before the end of the basebackup.sh script  (just *after
> * the rsync and *before* the pg_stop_backup() ).
> 
> So, my basebackup.sh script looks like:
> ~~~~~~~~~~
> 
> ....
> 
> *psql -U postgres -p 5432 -c "insert into users values (3000, 'def1');"
> myDB      # this line was added to basebackup.sh to simulate a transaction
> during a recover, stage 1      *
> 
> psql -U postgres -p 5432 -c "SELECT pg_stop_backup()" postgres
> ~~~~~~~~~~~~
> 
> I then try to recover my failed node with:
> 
> /usr/local/bin/pcp_recovery_node -d 15 192.168.0.110 9898 pgpool pgpool 1
> 
> Which succeeds. However, when I look at the content of the databases, only
> the master database has the new line  "3000  test" (purposely added in my
> basebackup.sh script). The recovered database has all other lines but not
> this one. So the recovery is not complete in my opinion (or I dont
> understand what it is supposed to do).
> 
> Am I wrong in expecting that this should work (i.e. after the recover, the
> 2 database should be completely in sync) ?

Works fine for me. I guess you have problem with archive recovery
setting of PostgreSQL.

Below is my recovered PostgreSQL log.  Do you find something like
"restored log file "000000030000000100000021" from archive" in your
PostgreSQL log(in your case DB node 1)?

31055 2013-01-17 11:38:14 JST LOG:  database system was interrupted; last known up at 2013-01-17 11:38:11 JST
31055 2013-01-17 11:38:14 JST LOG:  creating missing WAL directory "pg_xlog/archive_status"
cp: cannot stat `/home/t-ishii/work/git.postgresql.org/test.repli/archivedir/00000003.history': No such file or directory
31055 2013-01-17 11:38:14 JST LOG:  starting archive recovery
31055 2013-01-17 11:38:14 JST LOG:  restored log file "000000030000000100000020" from archive
31055 2013-01-17 11:38:14 JST LOG:  redo starts at 1/20000020
31055 2013-01-17 11:38:14 JST LOG:  consistent recovery state reached at 1/21000000
31053 2013-01-17 11:38:14 JST LOG:  database system is ready to accept read only connections
31055 2013-01-17 11:38:14 JST LOG:  restored log file "000000030000000100000021" from archive
cp: cannot stat `/home/t-ishii/work/git.postgresql.org/test.repli/archivedir/000000030000000100000022': No such file or directory
31055 2013-01-17 11:38:14 JST LOG:  could not open file "pg_xlog/000000030000000100000022" (log file 1, segment 34): No such file or directory
31055 2013-01-17 11:38:14 JST LOG:  redo done at 1/21000E48
31055 2013-01-17 11:38:14 JST LOG:  last completed transaction was at log time 2013-01-17 11:38:12.376859+09
31055 2013-01-17 11:38:14 JST LOG:  restored log file "000000030000000100000021" from archive
cp: cannot stat `/home/t-ishii/work/git.postgresql.org/test.repli/archivedir/00000004.history': No such file or directory
31055 2013-01-17 11:38:14 JST LOG:  selected new timeline ID: 4
cp: cannot stat `/home/t-ishii/work/git.postgresql.org/test.repli/archivedir/00000003.history': No such file or directory
31055 2013-01-17 11:38:15 JST LOG:  archive recovery complete
31053 2013-01-17 11:38:15 JST LOG:  database system is ready to accept connections
31067 2013-01-17 11:38:15 JST LOG:  autovacuum launcher started
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


More information about the pgpool-general mailing list