[pgpool-general: 1980] trouble with recovery of a downed node
sean at compusult.net
Fri Aug 2 04:11:40 JST 2013
I am having difficulty achieving a consistent database recovery in my
three-PostgreSQL, two-pgpool setup (native replication). In a previous
post I mentioned that I often get inconsistent counts of updated rows
for a table that is probably being updated during the recovery.
The technique I'm using is a daily backup with continuous archiving as
As a result my setup has no recovery_1st_stage_command,
pgpool_recovery_pitr as the second phase command, and my
pgpool_remote_start takes care of the complete recovery and restart as
in that blog post.
I would include my exact scripts but I don't want to be one of those
people who say "here's my setup, now fix it". :-) Instead I'd like to
better understand the recovery phases and how they impact database
availability. Here is the order of things as I understand it:
1) Start recovery by running the stage 1 script. Existing and new
connections, and all queries and updates, are allowed. Updates are
written to the WAL.
2) Stop accepting connections and queries. (Question: Are existing
connections terminated at this point?)
3) Run the stage 2 script. pgpool_recovery_pitr does
pgpool_switch_xlog() as recommended, so all WAL are archived by the time
4) Run pgpool_remote_start.
a) This copies the base backup and archived WALs, and writes
b) Since in postgresql.conf I have hot_standby=off, connections
are blocked until recovery completes.
c) My pgpool_remote_start starts PostgreSQL on the recovered node
in a synchronous fashion, so it does not terminate until the PostgreSQL
startup is finished.
5) Connections are allowed once more.
Is this correct? With this flow I can't see how the newly recovered
node could be out of sync with the master. But the cluster behaves as
if an update has not been recorded in the archived WAL, or an update
took place on the master and the other slave somewhere between step 2
and step 5.
pgpool-II's logs of the recovery look correct:
2013-08-01 16:16:09 LOG: pid 25265: send_failback_request: fail back 1
th node request from pid 25265
2013-08-01 16:16:09 LOG: pid 25257: wd_start_interlock: start interlocking
2013-08-01 16:16:09 LOG: pid 25265: wd_send_response:
WD_STAND_FOR_LOCK_HOLDER received it
2013-08-01 16:16:09 LOG: pid 25257: starting fail back. reconnect host
2013-08-01 16:16:11 LOG: pid 25257: Restart all children
2013-08-01 16:16:11 LOG: pid 25257: wd_end_interlock: end interlocking
2013-08-01 16:16:12 LOG: pid 25257: failover: set new primary node: -1
2013-08-01 16:16:12 LOG: pid 25257: failover: set new master node: 0
2013-08-01 16:16:12 LOG: pid 6894: worker process received restart request
2013-08-01 16:16:12 LOG: pid 25257: failback done. reconnect host
2013-08-01 16:16:13 LOG: pid 6893: pcp child process received restart
2013-08-01 16:16:13 LOG: pid 25257: PCP child 6893 exits with status
256 in failover()
2013-08-01 16:16:13 LOG: pid 25257: fork a new PCP child pid 7374 in
2013-08-01 16:16:13 LOG: pid 25257: worker child 6894 exits with
2013-08-01 16:16:13 LOG: pid 25257: fork a new worker child pid 7375
But no matter how I do it, I get something like this after running my
app for a while:
2013-08-01 17:08:28 ERROR: pid 6597: pgpool detected difference of the
number of inserted, updated or deleted tuples. Possible last query was:
" UPDATE ws_cached_searches SET ... "
2013-08-01 17:08:28 LOG: pid 6597: CommandComplete: Number of affected
tuples are: 1 0 1
2013-08-01 17:08:28 LOG: pid 6597: ReadyForQuery: Degenerate backends: 1
2013-08-01 17:08:28 LOG: pid 6597: ReadyForQuery: Number of affected
tuples are: 1 0 1
2013-08-01 17:08:28 LOG: pid 25257: starting degeneration. shutdown
It varies whether the complaint is for the node just recovered, or the
I feel like I'm missing something critical in the backup or recovery
process, but it is eluding me. Can anyone help?
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 275 bytes
Desc: not available
More information about the pgpool-general