[Pgpool-general] Cannot add node after failure

Fernando Morgenstern fernando at consultorpc.com
Thu Dec 17 12:40:32 UTC 2009


Hello,

Sorry for the lack of replies.

Before checking your email, I realized that recovery was only working
on the nodes where pgpool had been compiled. I decided to compile it on
the others and now it works OK (just compiled, didn't run pgpool there).

During the last day, I have been simulating recoveries by killing
postgres and shutting down servers randomly. Most of the time, recovery
completes correctly, but in some specific cases pcp_recovery_node
reports that the command is complete although the recovery was not
actually done (e.g. some databases that I created while the failed node
was down are not present when it starts up).
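
One way to check for this after each recovery is to compare the
database list on the recovered node with the list on a healthy node.
A minimal sketch, assuming psql can connect directly to each backend
as user postgres; list_dbs, missing_dbs and the host names in the usage
note are hypothetical, not part of pgpool:

```shell
#! /bin/sh
# Sketch only: list_dbs and missing_dbs are hypothetical helper names.

# list_dbs HOST: print the database names on one backend, one per line.
list_dbs() {
    psql -h "$1" -U postgres -At -c "SELECT datname FROM pg_database" postgres
}

# missing_dbs REF_FILE NODE_FILE: print names that appear in REF_FILE
# but not in NODE_FILE (whole-line, fixed-string comparison).
missing_dbs() {
    grep -Fxv -f "$2" "$1"
}
```

For example, "list_dbs node0 > ref.txt; list_dbs node1 > node1.txt;
missing_dbs ref.txt node1.txt" would print the databases that did not
reach node1.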

I tried to isolate the log messages from when this behaviour happens;
here they are: http://pastebin.ca/1718115

I really don't see anything different, but the fact is that some data  
is missing on node 1, which is being recovered.

Could you advise me on what I should check, or point out anything that
looks wrong to you?



By the way, I am using pgpool_recovery as both the 1st and 2nd stage
recovery command in pgpool, and the script looks like this:

#! /bin/sh

# Usage: pgpool_recovery datadir remote_host remote_datadir
if [ $# -ne 3 ]
then
     echo "pgpool_recovery datadir remote_host remote_datadir"
     exit 1
fi

datadir=$1
DEST=$2
DESTDIR=$3

# Copy each part of the data directory to the remote node, in parallel.
rsync -aurz --delete -e ssh "$datadir/global/" "$DEST:$DESTDIR/global/" &
rsync -aurz --delete -e ssh "$datadir/base/" "$DEST:$DESTDIR/base/" &
rsync -aurz --delete -e ssh "$datadir/pg_multixact/" "$DEST:$DESTDIR/pg_multixact/" &
rsync -aurz --delete -e ssh "$datadir/pg_subtrans/" "$DEST:$DESTDIR/pg_subtrans/" &
rsync -aurz --delete -e ssh "$datadir/pg_clog/" "$DEST:$DESTDIR/pg_clog/" &
rsync -aurz --delete -e ssh "$datadir/pg_xlog/" "$DEST:$DESTDIR/pg_xlog/" &
rsync -aurz --delete -e ssh "$datadir/pg_twophase/" "$DEST:$DESTDIR/pg_twophase/" &
# Wait for all background rsync jobs to finish.
wait
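
One thing that may be worth checking in a script like this: a bare
"wait" with no arguments returns 0 no matter how the background jobs
exited, so a failed rsync would still look like success to the caller.
A minimal sketch of waiting on each job and propagating failures,
assuming the same three arguments; wait_all and sync_dirs are
hypothetical helper names, not part of pgpool:

```shell
#! /bin/sh
# Sketch only: wait_all and sync_dirs are hypothetical helpers.

# wait_all PID...: wait for every PID; return 1 if any of them failed.
wait_all() {
    rc=0
    for pid in "$@"
    do
        wait "$pid" || rc=1
    done
    return $rc
}

# sync_dirs DATADIR DEST DESTDIR: rsync each subdirectory in parallel,
# then report failure if any single rsync failed.
sync_dirs() {
    pids=""
    for d in global base pg_multixact pg_subtrans pg_clog pg_xlog pg_twophase
    do
        rsync -aurz --delete -e ssh "$1/$d/" "$2:$3/$d/" &
        pids="$pids $!"
    done
    # $pids is left unquoted on purpose so that it splits into one PID per argument.
    wait_all $pids
}
```

In the script, the rsync lines and the final wait could then be
replaced by: sync_dirs "$datadir" "$DEST" "$DESTDIR" || exit 1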

Regards,
---

Fernando Marcelo
www.consultorpc.com
fernando at consultorpc.com
Tel: +34 902 998971
Fax: +34 91 7903701


On 16/12/2009, at 06:47, Tatsuo Ishii wrote:

>> Hello,
>>
>> Thanks for your info!
>>
>> I was able to make some progress with node recovery when using
>> pgpool_recovery for both recovery commands.
>>
>> I am able to recover most of the time, but sometimes it fails with
>> the following error:
>>
>> $ pcp_recovery_node  -d 90 localhost 9898 postgres ******* 2
>> DEBUG: send: tos="R", len=46
>> DEBUG: recv: tos="r", len=21, data=AuthenticationOK
>> DEBUG: send: tos="D", len=6
>> DEBUG: recv: tos="e", len=20, data=recovery failed
>> DEBUG: command failed. reason=recovery failed
>> BackendError
>> DEBUG: send: tos="X", len=4
>>
>> pgpool log
>>
>> 2009-12-15 20:10:56 DEBUG: pid 8747: pcp_child: received PCP packet type of service 'M'
>> 2009-12-15 20:10:56 DEBUG: pid 8747: pcp_child: salt sent to the client
>> 2009-12-15 20:10:56 DEBUG: pid 8747: pcp_child: received PCP packet type of service 'R'
>> 2009-12-15 20:10:56 DEBUG: pid 8747: pcp_child: authentication OK
>> 2009-12-15 20:10:56 DEBUG: pid 8747: pcp_child: received PCP packet type of service 'O'
>> 2009-12-15 20:10:56 DEBUG: pid 8747: pcp_child: start online recovery
>> 2009-12-15 20:10:56 LOG:   pid 8747: starting recovering node 2
>> 2009-12-15 20:10:56 DEBUG: pid 8747: exec_checkpoint: start checkpoint
>> 2009-12-15 20:10:56 DEBUG: pid 8747: exec_checkpoint: finish checkpoint
>> 2009-12-15 20:10:56 LOG:   pid 8747: CHECKPOINT in the 1st stage done
>> 2009-12-15 20:10:56 LOG:   pid 8747: starting recovery command: "SELECT pgpool_recovery('pgpool_recovery', 'im-pp3', '/usr/local/pgsql/data')"
>> 2009-12-15 20:10:56 DEBUG: pid 8747: exec_recovery: start recovery
>> 2009-12-15 20:10:56 ERROR: pid 8747: exec_recovery: pgpool_recovery command failed at 1st stage
>> 2009-12-15 20:10:56 DEBUG: pid 8747: exec_recovery: finish recovery
>> 2009-12-15 20:10:56 DEBUG: pid 8747: pcp_child: received PCP packet type of service 'X'
>> 2009-12-15 20:10:56 DEBUG: pid 8747: pcp_child: client disconnecting. close connection
>> 2009-12-15 20:11:22 DEBUG: pid 8446: starting health checking
>>
>> Unfortunately I am not sure what this error means. Did it fail at
>> "SELECT pgpool_recovery('pgpool_recovery', 'im-pp3', '/usr/local/pgsql/data')"?
>> How can I find the reason?
>
> Recovery command "pgpool_recovery" failed for some reason. Check the
> PostgreSQL log on the master node. If it is not clear, try adding -x
> to the shell line in your pgpool_recovery script, i.e.
>
> #! /bin/sh -x
>
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
>
>> Best Regards,
>> ---
>>
>> Fernando Marcelo
>> www.consultorpc.com
>> fernando at consultorpc.com
>>
>>
>> On 15/12/2009, at 13:36, Jaume Sabater wrote:
>>
>>> On Tue, Dec 15, 2009 at 4:20 PM, Fernando Morgenstern
>>> <fernando at consultorpc.com> wrote:
>>>
>>>> While reading pgpool manual i found this:
>>>> Note that there is a restriction about online recovery. If
>>>> pgpool-II works on multiple hosts, online recovery does not work
>>>> correctly, because pgpool-II stops clients on the 2nd stage of
>>>> online recovery. If there are some pgpool hosts, pgpool-II excepted
>>>> for receiving online recovery request cannot block connections.
>>>
>>> It means running two or more pgpool-II instances simultaneously,
>>> which won't be your case since, with Heartbeat, you'll configure
>>> pgpool-II as a resource, hence it will only be active in one node
>>> at a given time.
>>>
>>> -- 
>>> Jaume Sabater
>>> http://linuxsilo.net/
>>>
>>> "Ubi sapientas ibi libertas"
>>


