<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <div class="moz-cite-prefix">Sorry, the subject line should have

      said stage <b>1</b>.<br>

      <br>

      On 14-03-20 12:48 PM, Sean Hogan wrote:<br>

    </div>

    <blockquote cite="mid:532B06AD.9050507@compusult.net" type="cite">Hi,

      <br>

      <br>

      In my setup at the moment I have a pair of version 3.3.2 pgpool

      instances with two backend PostgreSQL 9.2.4 servers, all running

      on CentOS 6.4.  The PostgreSQL data directories are quite large -

      144GB.  I have run into a situation where pcp_recovery_node

      consistently fails with a BackendError.

      <br>

      <br>

      The stage 1 recovery command is a script called do-base-backup.sh

      that runs an rsync as follows:

      <br>

      <br>

          rsync -Cacvv --delete \

      <br>

                  --exclude postmaster.pid --exclude postmaster.opts \

      <br>

                  --exclude recovery.done \

      <br>

                  --exclude pg_log/\* --exclude pg_xlog/\* \

      <br>

                  $SOURCE/ $DESTINATION/ 2>&1 |

      <br>

          mailx -s "rsync verbose output" <a class="moz-txt-link-abbreviated" href="mailto:sean@compusult.net">sean@compusult.net</a>

      <br>

      <br>

      For some reason this rsync is failing after some minutes

      (typically 10 to 12) with undocumented exit code 255.  The verbose

      rsync logging says this:

      <br>

      <br>

          Killed by signal 2.

      <br>

          rsync: writefd_unbuffered failed to write 4 bytes to socket

      [sender]: Broken pipe (32)

      <br>

          rsync: connection unexpectedly closed (50735 bytes received so

      far) [sender]

      <br>

          rsync error: unexplained error (code 255) at io.c(600)

      [sender=3.0.6]

      <br>

      <br>

      Googling has not brought up anything helpful other than bugs with

      large files in older versions of rsync.  I'm fairly certain that

      is not the case here, especially because of the "Killed by signal

      2", which is suggestive of some sort of timeout on the pgpool end.

      <br>

      <br>
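      For reference, signal 2 is SIGINT on Linux, so the "Killed by
      signal 2" line means some process was interrupted rather than
      crashing by itself. The message does not say who sent the signal.
      A quick way to check the number-to-name mapping on the host (a
      sketch; the variable name is only illustrative):
      <br>
<pre wrap="">
```shell
#!/bin/sh
# Look up the name of signal number 2 (printed without the SIG prefix).
signame=$(kill -l 2)
echo "signal 2 is SIG$signame"
```
</pre>
      <br>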

      The specific command line I'm using to recover the second database

      node is:

      <br>

      <br>

          sudo -u postgres /usr/local/bin/pcp_recovery_node 10000 psql01

      9898 postgres XXXXXX 1

      <br>

      <br>

      With a timeout of 10000 seconds I shouldn't be hitting a timeout

      there.

      <br>

      <br>

      The weird thing, which makes me point the finger at either pgpool

      or pcp_recovery_node, is that if I run do-base-backup.sh manually

      it works fine (and takes much longer, as expected).

      <br>

      <br>

      Does pgpool have some internal limit on how long it will wait for

      the 1st stage command to run?  I've attached the log file but it

      isn't very informative.  (Note that the do-base-backup.sh script

      isn't communicating the rsync failure back to pgpool, so pgpool

      goes ahead and runs stage 2.  Of course, that fails because not

      everything has been synced.)

      <br>

      <br>
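      A minimal sketch of how do-base-backup.sh could report the rsync
      failure, assuming pgpool treats a non-zero exit status from the
      stage 1 command as a failed recovery. In a pipeline such as
      "rsync ... | mailx ...", $? reflects mailx, not rsync, so the
      sketch writes the output to a temporary file first. The names
      sync_step, report_step and run_stage1 are hypothetical stand-ins,
      not part of the actual script:
      <br>
<pre wrap="">
```shell
#!/bin/sh
# Sketch only: sync_step and report_step are hypothetical stand-ins for the
# rsync and mailx commands in do-base-backup.sh.
sync_step()   { echo "pretend rsync output"; return 255; }
report_step() { cat "$1" > /dev/null; }   # e.g. mail the log file

run_stage1() {
    log=$(mktemp) || return 1
    # Capture output in a file instead of a pipe, so that $? is the status
    # of the sync itself rather than of the reporting command.
    sync_step > "$log" 2>&1
    status=$?
    report_step "$log"
    rm -f "$log"
    return "$status"
}
```
</pre>
      The real script would then finish with "run_stage1" followed by
      "exit $?", so that a failed rsync stops the recovery before stage
      2 is attempted.
      <br>
      <br>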

      Thanks,

      <br>

      Sean

      <br>

      <br>


    </blockquote>

    <br>

  </body>

</html>