[Pgpool-general] Cannot add node after failure

Tatsuo Ishii ishii at sraoss.co.jp
Wed Dec 16 08:47:22 UTC 2009


> Hello,
> 
> Thanks for your info!
> 
> I made some progress with node recovery by using pgpool_recovery for
> both recovery commands.
> 
> Recovery succeeds most of the time, but sometimes it fails with the
> following error:
> 
> $ pcp_recovery_node  -d 90 localhost 9898 postgres ******* 2
> DEBUG: send: tos="R", len=46
> DEBUG: recv: tos="r", len=21, data=AuthenticationOK
> DEBUG: send: tos="D", len=6
> DEBUG: recv: tos="e", len=20, data=recovery failed
> DEBUG: command failed. reason=recovery failed
> BackendError
> DEBUG: send: tos="X", len=4
> 
> pgpool log
> 
> 2009-12-15 20:10:56 DEBUG: pid 8747: pcp_child: received PCP packet type of service 'M'
> 2009-12-15 20:10:56 DEBUG: pid 8747: pcp_child: salt sent to the client
> 2009-12-15 20:10:56 DEBUG: pid 8747: pcp_child: received PCP packet type of service 'R'
> 2009-12-15 20:10:56 DEBUG: pid 8747: pcp_child: authentication OK
> 2009-12-15 20:10:56 DEBUG: pid 8747: pcp_child: received PCP packet type of service 'O'
> 2009-12-15 20:10:56 DEBUG: pid 8747: pcp_child: start online recovery
> 2009-12-15 20:10:56 LOG:   pid 8747: starting recovering node 2
> 2009-12-15 20:10:56 DEBUG: pid 8747: exec_checkpoint: start checkpoint
> 2009-12-15 20:10:56 DEBUG: pid 8747: exec_checkpoint: finish checkpoint
> 2009-12-15 20:10:56 LOG:   pid 8747: CHECKPOINT in the 1st stage done
> 2009-12-15 20:10:56 LOG:   pid 8747: starting recovery command: "SELECT pgpool_recovery('pgpool_recovery', 'im-pp3', '/usr/local/pgsql/data')"
> 2009-12-15 20:10:56 DEBUG: pid 8747: exec_recovery: start recovery
> 2009-12-15 20:10:56 ERROR: pid 8747: exec_recovery: pgpool_recovery command failed at 1st stage
> 2009-12-15 20:10:56 DEBUG: pid 8747: exec_recovery: finish recovery
> 2009-12-15 20:10:56 DEBUG: pid 8747: pcp_child: received PCP packet type of service 'X'
> 2009-12-15 20:10:56 DEBUG: pid 8747: pcp_child: client disconnecting. close connection
> 2009-12-15 20:11:22 DEBUG: pid 8446: starting health checking
> 
> Unfortunately I am not sure what this error means. Did it fail at
> "SELECT pgpool_recovery('pgpool_recovery', 'im-pp3', '/usr/local/pgsql/data')"?
> How can I find the reason?

The recovery command "pgpool_recovery" failed for some reason. Check the
PostgreSQL log on the master node. If the cause is still not clear, add
-x to the shell line of your pgpool_recovery script so that each command
is printed as it runs, i.e.

#! /bin/sh -x
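
If the trace is hard to find among other output, one option is to send
it to a file of its own. The following is only a sketch of that idea, not
the script shipped with pgpool-II: the TRACE_LOG path is an assumed
location, and the three arguments are assumed to be the master data
directory, the host to recover, and the data directory on that host.

```shell
#! /bin/sh
# Hypothetical top of a pgpool_recovery script with tracing sent to a file.
# TRACE_LOG is an assumed path; use any file the postgres user can write to.
TRACE_LOG="${TRACE_LOG:-/tmp/pgpool_recovery_trace.log}"

# "set -x" writes its trace to stderr, so append stderr to the log first.
exec 2>>"$TRACE_LOG"
set -x

# Record how the script was called. The argument meanings are assumed:
# $1 = master data directory, $2 = host to recover, $3 = its data directory.
echo "recovery script called with $# argument(s): $*" >&2

# ... the actual recovery steps of your script would follow here ...
```

After a failed pcp_recovery_node run, the last traced command in that
file should be the one that failed.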

--
Tatsuo Ishii
SRA OSS, Inc. Japan

> Best Regards,
> ---
> 
> Fernando Marcelo
> www.consultorpc.com
> fernando at consultorpc.com
> 
> 
> On 15/12/2009, at 13:36, Jaume Sabater wrote:
> 
> > On Tue, Dec 15, 2009 at 4:20 PM, Fernando Morgenstern
> > <fernando at consultorpc.com> wrote:
> >
> >> While reading the pgpool manual I found this:
> >> Note that there is a restriction about online recovery. If pgpool-II
> >> works on multiple hosts, online recovery does not work correctly,
> >> because pgpool-II stops clients on the 2nd stage of online recovery.
> >> If there are some pgpool hosts, pgpool-II excepted for receiving
> >> online recovery request cannot block connections.
> >
> > It refers to running two or more pgpool-II instances simultaneously.
> > That won't be your case: with Heartbeat you'll configure pgpool-II
> > as a resource, so it will only be active on one node at any given
> > time.
> >
> > -- 
> > Jaume Sabater
> > http://linuxsilo.net/
> >
> > "Ubi sapientas ibi libertas"
> 
> _______________________________________________
> Pgpool-general mailing list
> Pgpool-general at pgfoundry.org
> http://pgfoundry.org/mailman/listinfo/pgpool-general

