[Pgpool-general] failback setup problem

Uwe Bartels uwe.bartels at gmail.com
Thu Apr 28 06:40:01 UTC 2011


Hi Tatsuo,

OK, now it's working fine. Thanks for your help.

After getting through that initial setup for the first time, I'd like to
give you some feedback about the information that is (at least for me)
missing from the documentation.

- Please document the control flow during recovery, e.g.
    connect to master server with recovery_user/recovery_password
      (connection check)
    run recovery_1st_stage_command
    run checkpoint
    run recovery_2nd_stage_command
    ...
    run failback_command

- Please document that recovery_1st_stage_command and
recovery_2nd_stage_command are system calls made by the current postgres
master server in the PGDATA directory, and that the failback_command is a
shell script command or system call made from the pgpool server. I had to
search the source code to find this out.
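
The two points above can be summarised in a small table-like sketch. It only models the order and the "runs on" location as described in this message; it does not call pgpool, and the "..." step between recovery_2nd_stage_command and failback_command is deliberately left out because the message does not say what it is. The names Step, RECOVERY_FLOW and run_flow are made up for illustration.

```python
from typing import Callable, List, NamedTuple


class Step(NamedTuple):
    name: str
    runs_on: str  # who executes the step, as described in the message


# Order and locations as listed in the message. The elided "..." entry
# before failback_command is intentionally not filled in.
RECOVERY_FLOW = [
    Step("connection check (recovery_user/recovery_password)",
         "pgpool, connecting to the master"),
    Step("recovery_1st_stage_command", "master postgres server, in PGDATA"),
    Step("checkpoint", "SQL issued to the master"),
    Step("recovery_2nd_stage_command", "master postgres server, in PGDATA"),
    Step("failback_command", "pgpool server"),
]


def run_flow(execute: Callable[[Step], bool]) -> List[str]:
    """Run the steps in order and stop at the first one that fails.

    Returns the names of the steps that completed.
    """
    completed = []
    for step in RECOVERY_FLOW:
        if not execute(step):
            break
        completed.append(step.name)
    return completed


if __name__ == "__main__":
    for step in RECOVERY_FLOW:
        print(f"{step.name:<55} runs on: {step.runs_on}")
```

With this model, a failing connection check (as in the "could not connect master node" error quoted below) means none of the later steps run, which matches what was observed.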

I take a different approach to recovering the postgres server: I recover
from an existing backup. I do that because it is faster and does not put
additional I/O load on the server that has just been activated. I guess (or
hope) most people will have an existing backup.
So my question is whether recovering the failed server via an SQL command
is the optimal approach. What if both servers have failed? Am I then unable
to use pcp-tools or pgpoolAdmin for the recovery?
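
As a rough illustration of the backup-based approach, a first-stage script could ignore the master's data directory and ask the failed node to restore itself from backup. This is a minimal sketch under assumptions: that pgpool passes three arguments to the script (master data directory, hostname of the node to recover, data directory of that node), which is how I understand the pgpool-II 3.x documentation, and that a site-specific restore script exists on the target node. RESTORE_SCRIPT and build_restore_command are hypothetical names, and the script only prints the command instead of running it.

```python
#!/usr/bin/env python3
import shlex
import sys
from typing import List

# Hypothetical: path of the site's existing restore script on the failed node.
RESTORE_SCRIPT = "/usr/local/bin/restore_from_backup.sh"


def build_restore_command(master_pgdata: str, target_host: str,
                          target_pgdata: str) -> List[str]:
    """Build the ssh command that triggers a restore on the failed node.

    master_pgdata is accepted but unused: the data comes from the backup,
    not from the running master, so no extra I/O load is put on it.
    """
    return ["ssh", "-T", target_host, RESTORE_SCRIPT, target_pgdata]


if __name__ == "__main__" and len(sys.argv) == 4:
    # Dry run: show what would be executed for the arguments received.
    print(shlex.join(build_restore_command(*sys.argv[1:4])))
```

Whether such a script is sufficient on its own for streaming replication is not covered here; it only shows where an existing restore procedure could be hooked in.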

I'm asking because I worked for several years as the person responsible
for IT production, and I learned a little about how administrators
think/work. They are happy if they have a (or better, ONE) defined recovery
procedure.
Where am I going with this? I'm asking whether it would make sense to recode
or reduce the recovery procedure code to one system call, e.g.
failback_command. Most people have their backup and restore functionality
coded and ready for training and/or disaster. If they could simply use this
very same functionality within pgpoolAdmin, that would be great.

It might be that I have overlooked something (as before) and this is already
possible. If so, please tell me how.

Best Regards,
Uwe


On 26 April 2011 07:34, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> > thanks, that message already helped. I tried to recover the postgres
> > server with the failback_command.
>
> You are welcome.
>
> > I wasn't aware of these recovery_* parameters yet.
> > So I use the recovery_* parameters for recovering the failed postgres
> > server.
>
> pgpool-II connects to the backend to issue some SQL commands, including
> CHECKPOINT. The recovery_* parameters define the user and password for
> that connection. Usually they are those of the PostgreSQL superuser
> (postgres).
>
> > And the failback_command is for attaching the postgres server to
> > pgpool, right?
>
> If you want to do something special, for example mailing to DBA, then
> you might want to specify it.  Otherwise you can leave it empty.
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>
> > Best Regards,
> > Uwe
> >
> >
> > On 26 April 2011 01:06, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >
> >> > I'm using pgpool-II 3.0.3 with streaming replication.
> >> > I coded the failback scenario/script for the slave server and the
> >> > script itself works fine.
> >> >
> >> > I now configured the failback script in pgpool.conf and during
> >> > testing an error message comes up:
> >> > 2011-04-09 06:42:18 LOG:   pid 16863: starting recovering node 1
> >> > 2011-04-09 06:42:18 ERROR: pid 16863: start_recover: could not connect
> >> > master node.
> >> >
> >> > [root@adt-web01 pgpool-II-3.0.3]# pcp_node_info 10 adt-web01 9898 postgres postgres 0
> >> > adt-db01 5432 1 0.500000
> >> > [root@adt-web01 pgpool-II-3.0.3]# pcp_node_info 10 adt-web01 9898 postgres postgres 1
> >> > adt-db02 5432 3 0.500000
> >> >
> >> > pcp commands and pgpooladmin report that the master is up and
> >> > running and I'm able to connect to the master directly and through
> >> > pgpool.
> >> > So what's wrong? So far everything else works fine.
> >>
> >> Assuming you have set recovery_user and recovery_password correctly,
> >> I'm not sure what's going on. IMO, the error message is very rare. It's
> >> so rare that there's a bug in the error path which had not been found
> >> for a long time. Can you please try the attached patch? The patch adds
> >> a little bit of useful info to the error message above.
> >> --
> >> Tatsuo Ishii
> >> SRA OSS, Inc. Japan
> >> English: http://www.sraoss.co.jp/index_en.php
> >> Japanese: http://www.sraoss.co.jp
> >>
>