[pgpool-general: 2638] pcp_recovery_node failing in stage 2
Sean Hogan
sean at compusult.net
Fri Mar 21 00:18:05 JST 2014
Hi,
In my current setup I have a pair of pgpool 3.3.2 instances with two
backend PostgreSQL 9.2.4 servers, all running on CentOS 6.4. The
PostgreSQL data directories are quite large (144GB). I have run into a
situation where pcp_recovery_node consistently fails with a BackendError.
The stage 1 recovery command is a script called do-base-backup.sh that
runs an rsync as follows:
rsync -Cacvv --delete \
--exclude postmaster.pid --exclude postmaster.opts \
--exclude recovery.done \
--exclude pg_log/\* --exclude pg_xlog/\* \
$SOURCE/ $DESTINATION/ 2>&1 |
mailx -s "rsync verbose output" sean at compusult.net
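For readers less familiar with the bundled flags, here are the same rsync options with their long names, written as a bash array. This is only an annotated restatement of the command above; the array name rsync_opts is hypothetical. One side note grounded in rsync's documentation: with -c, rsync compares a checksum for every file whose size matches on both sides, which means reading that data on both ends, so on a 144GB data directory it likely accounts for a good part of the run time.

```shell
#!/bin/bash
# The options from do-base-backup.sh, one per line (rsync_opts is a hypothetical name).
rsync_opts=(
    -C            # --cvs-exclude: skip files matching CVS's default ignore list
    -a            # --archive: recursive; keep symlinks, perms, times, owner/group, devices
    -c            # --checksum: compare same-size files by checksum, not size and mtime
    -vv           # --verbose given twice: extra detail in the mailed log
    --delete      # remove destination files that no longer exist on the source
    --exclude postmaster.pid --exclude postmaster.opts
    --exclude recovery.done
    --exclude 'pg_log/*' --exclude 'pg_xlog/*'   # quoted, like pg_log/\* in the original
)
```

The array could then be passed to rsync as "${rsync_opts[@]}" ahead of the "$SOURCE/" and "$DESTINATION/" arguments.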
For some reason this rsync fails after some minutes (typically 10 to 12)
with exit code 255, which is not one of rsync's documented exit codes.
The verbose rsync output says this:
Killed by signal 2.
rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
rsync: connection unexpectedly closed (50735 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
Googling has not turned up anything helpful other than bugs involving
large files in older versions of rsync. I'm fairly certain that is not
the cause here, especially because of the "Killed by signal 2" (signal 2
is SIGINT), which suggests some sort of timeout on the pgpool end.
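To confirm which signal "Killed by signal 2." refers to, bash's kill -l builtin can translate either a signal number or a shell exit status above 128 (which bash reports as 128 plus the signal number) into a signal name. The helper name signal_name below is hypothetical. That message wording most likely comes from the ssh client that rsync uses as its remote shell, assuming DESTINATION is a remote host; that is an assumption, not something the log above states.

```shell
#!/bin/bash
# Hypothetical helper: turn a signal number, or a bash exit status above 128
# (128 + signal number), into a signal name using the kill -l builtin.
signal_name() {
    kill -l "$1"
}

signal_name 2     # the number from "Killed by signal 2."; prints INT
```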
The exact command line I'm using to recover the second database node is:
sudo -u postgres /usr/local/bin/pcp_recovery_node 10000 psql01 9898 postgres XXXXXX 1
With such a large timeout value (10000 seconds) I shouldn't be hitting a
timeout there.
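For reference, here is the same pcp_recovery_node call with each positional argument given a name. The argument order (timeout, host, port, user, password, node id) is assumed from the pgpool-II 3.x pcp tools, where the timeout is the pcp client's connection timeout in seconds; the variable names are hypothetical. This timeout governs how long the pcp client waits, and whether any separate limit applies inside pgpool or PostgreSQL is exactly the open question in this message.

```shell
#!/bin/bash
# Named arguments for the call above (order assumed from the pgpool-II 3.x pcp tools).
pcp_timeout=10000   # pcp connection timeout in seconds; the pcp command exits when it expires
pcp_host=psql01     # host running pgpool's pcp listener
pcp_port=9898       # pcp port (pgpool's default pcp_port)
pcp_user=postgres   # pcp user
pcp_pass=XXXXXX     # redacted in the original message
node_id=1           # backend node to recover

pcp_args=("$pcp_timeout" "$pcp_host" "$pcp_port" "$pcp_user" "$pcp_pass" "$node_id")

# Print the command instead of running it, so the mapping can be checked safely.
echo sudo -u postgres /usr/local/bin/pcp_recovery_node "${pcp_args[@]}"
```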
The odd thing, which makes me point the finger at either pgpool or
pcp_recovery_node, is that if I run do-base-backup.sh manually it works
fine (and takes much, much longer, as expected).
Does pgpool have some internal limit on how long it will wait for the
first-stage command to run? I've attached the log file, but it isn't very
informative. (Note that the do-base-backup.sh script isn't
communicating the rsync failure back to pgpool, so pgpool goes ahead and
runs stage 2. Of course, that fails because not everything has been
synced.)
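On why the failure is not reported: in bash, without set -o pipefail, a pipeline's exit status is that of its last command, so a pipeline of the form "rsync ... | mailx ..." returns mailx's status and a failed rsync goes unnoticed. The sketch below, assuming the script runs under bash and that pgpool aborts recovery when the stage-1 script exits non-zero (as the note above implies), mails the output but returns rsync's own status using the PIPESTATUS array. The function run_and_mail and the variables MAIL_CMD and MAIL_TO are hypothetical names for this sketch, not pgpool settings. Putting set -o pipefail near the top of the script is a simpler alternative; it makes the pipeline return the rightmost non-zero status.

```shell
#!/bin/bash
# Sketch: mail a command's output but keep the command's own exit status.
# Requires bash: PIPESTATUS is not available in POSIX sh.

# Hypothetical settings for this sketch, not pgpool parameters.
MAIL_CMD=${MAIL_CMD:-mailx}
MAIL_TO=${MAIL_TO:-postgres@localhost}

# run_and_mail SUBJECT COMMAND [ARGS...]
# Runs COMMAND, pipes its stdout and stderr to the mail command, and
# returns COMMAND's exit status rather than the mail command's.
run_and_mail() {
    local subject=$1
    shift
    local rc
    "$@" 2>&1 | "$MAIL_CMD" -s "$subject" "$MAIL_TO"
    rc=${PIPESTATUS[0]}   # exit status of the first command in the pipeline
    return "$rc"
}

# Example use inside do-base-backup.sh (not executed by this sketch):
# run_and_mail "rsync verbose output" \
#     rsync -Cacvv --delete \
#     --exclude postmaster.pid --exclude postmaster.opts \
#     --exclude recovery.done \
#     --exclude 'pg_log/*' --exclude 'pg_xlog/*' \
#     "$SOURCE/" "$DESTINATION/" || exit $?
```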
Thanks,
Sean
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pgpool.log
Type: text/x-log
Size: 21887 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20140320/fb3abe15/attachment.bin>