[Pgpool-general] failback setup problem

Sat Apr 30 07:44:44 UTC 2011

> Hi Tatsuo,
> 
> OK, now it's working fine. Thanks for your help.
> 
> Having got through the initial setup for the first time, I'd like to
> give you some feedback about information that is (at least for me)
> missing from the documentation.
> 
> - Please document the control flow during recovery, e.g.:
> connect to the master server with recovery_user/recovery_password
> (connection check)
> run recovery_1st_stage_command
> run checkpoint
> run recovery_2nd_stage_command
> ...
> run failback_command
> 
> - Please document that recovery_1st_stage_command and
> recovery_2nd_stage_command are system calls executed by the current
> PostgreSQL master server in its PGDATA directory, and that
> failback_command is a shell command or system call executed on the
> pgpool server. I had to search the source code to find this out.

Sorry for the inconvenience. I will add this information to the docs as
you suggested.
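The recovery sequence Uwe lists above behaves like a fail-stop pipeline:
each step runs only if the previous one succeeded. The following is a
minimal Python model of that list, not pgpool-II's implementation. The
step names mirror his message, the step(s) he elided as "..." are not
modelled, and the convention that an action returns False on failure is
an assumption of this sketch.

```python
from typing import Callable, List, Tuple

# A step is a (name, action) pair; the action returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_recovery(steps: List[Step]) -> List[str]:
    """Run the steps in order, stopping at the first failure.

    Returns the names of the steps that completed. Returning False from an
    action to signal failure is a convention of this sketch only.
    """
    done: List[str] = []
    for name, action in steps:
        if not action():
            print(f"recovery aborted at: {name}")
            break
        done.append(name)
    return done


def succeed() -> bool:
    return True


def fail() -> bool:
    return False


if __name__ == "__main__":
    # Step names mirror the poster's list; his "..." is not modelled.
    steps: List[Step] = [
        ("connect to master (recovery_user/recovery_password)", succeed),
        ("recovery_1st_stage_command", succeed),
        ("CHECKPOINT", succeed),
        ("recovery_2nd_stage_command", fail),  # simulated failure
        ("failback_command", succeed),
    ]
    print(run_recovery(steps))
```

With the simulated failure in the second stage, the model never reaches
failback_command. Whether pgpool-II behaves exactly this way in every
mode is not something this sketch establishes.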

> I have a different approach to recovering the postgres server: I restore
> from an existing backup. I do that because it is faster and doesn't put
> additional I/O load on the newly activated server. I guess (or hope) most
> people will have an existing backup.
>
> So my question is: is the approach of recovering the failed server via an
> SQL command optimal? What if both servers fail? Then I can't use the pcp
> tools or pgpoolAdmin for recovery, can I?

I'm not sure what you are trying to do here. If "backup" means one
created by pg_dumpall, I don't think your approach works. Streaming
replication requires a base backup (binary backup), which is taken
between calls to pg_start_backup() and pg_stop_backup().
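To illustrate the difference from a pg_dumpall dump: a base backup is a
file-level copy of the master's data directory made while the server is
in backup mode. The Python sketch below assumes it would run as
recovery_1st_stage_command and receive three arguments (master data
directory, recovery target host, target data directory), as in the
example scripts of the pgpool-II documentation of that era. It also
assumes psql on the master, rsync over ssh to the target, and a
hypothetical backup label and exclude list. The command runner is
injectable so the sequence can be inspected without a database.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner executes one command; injectable so the sequence can be tested.
Runner = Callable[[Sequence[str]], None]


def default_runner(cmd: Sequence[str]) -> None:
    # check=True raises CalledProcessError if the command exits non-zero.
    subprocess.run(list(cmd), check=True)


def psql(sql: str) -> List[str]:
    # -X ignores ~/.psqlrc; "postgres" is an assumed maintenance database.
    return ["psql", "-X", "-d", "postgres", "-c", sql]


def copy_base_backup(datadir: str, desthost: str, destdir: str,
                     run: Runner = default_runner) -> None:
    """Copy the master's data directory to the target inside backup mode."""
    run(psql("SELECT pg_start_backup('pgpool recovery', true)"))
    try:
        # Hypothetical exclude list: files that are per-node or runtime state.
        run(["rsync", "-a", "--delete", "-e", "ssh",
             "--exclude", "postmaster.pid",
             "--exclude", "postmaster.opts",
             "--exclude", "postgresql.conf",
             datadir.rstrip("/") + "/",
             f"{desthost}:{destdir.rstrip('/')}/"])
    finally:
        # End backup mode even if the copy failed, so the master is not
        # left in backup mode.
        run(psql("SELECT pg_stop_backup()"))
```

A real recovery_1st_stage_command would also need an entry point that
passes sys.argv[1:4] to copy_base_backup and exits non-zero on failure,
plus whatever standby configuration the target needs; both are left out
here. Starting and stopping the backup from separate psql sessions relies
on exclusive backup mode, which PostgreSQL 15 removed.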

> I'm asking because I worked for several years as the person responsible
> for IT production, and I learned a little about how administrators think
> and work. They are happy if they have a (or better, ONE) defined recovery
> procedure.
> What am I getting at? I'm asking whether it would make sense to recode the
> recovery procedure, or reduce it to a single system call such as
> failback_command. Most people have their backup and restore functionality
> scripted and ready for training and/or disaster. If they could simply use
> that very same functionality within pgpoolAdmin, that would be great.
> 
> It might be that I have overlooked something (as before) and this is
> already possible. If so, please tell me how.
> 
> Best Regards,
> Uwe
> 
> 
> On 26 April 2011 07:34, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> 
>> > thanks, that message already helped. I tried to recover the postgres
>> > server with the failback_command.
>>
>> You are welcome.
>>
>> > I hadn't noticed these recovery_* parameters before.
>> > So I use the recovery_* parameters to recover the failed postgres
>> > server.
>>
>> pgpool-II connects to the backend to issue some SQL commands, including
>> CHECKPOINT. The recovery_* parameters define the user and password for
>> that connection. Usually they are set to the PostgreSQL superuser
>> (postgres).
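Because the "could not connect master node" error quoted further down
concerns exactly this connection, it can help to test the recovery_*
credentials by hand from the pgpool-II host. This Python sketch only
approximates what pgpool-II does: it shells out to psql, passes the
password through the PGPASSWORD environment variable, connects to
template1 (assumed here because the pgpool-II docs have you install the
pgpool-recovery functions there), and issues CHECKPOINT, which only a
superuser could run in PostgreSQL versions of that era. Host, port and
user values are placeholders.

```python
import os
import subprocess
from typing import List, Optional


def checkpoint_command(host: str, port: int, user: str) -> List[str]:
    # -w: never prompt for a password, fail instead (unattended check).
    # -X: ignore ~/.psqlrc so local settings do not interfere.
    return ["psql", "-X", "-w", "-h", host, "-p", str(port), "-U", user,
            "-d", "template1", "-c", "CHECKPOINT"]


def check_recovery_login(host: str, port: int, user: str,
                         password: Optional[str]) -> bool:
    """Return True if user/password can connect and run CHECKPOINT."""
    env = dict(os.environ)
    if password:
        env["PGPASSWORD"] = password
    try:
        result = subprocess.run(checkpoint_command(host, port, user),
                                env=env, capture_output=True, text=True)
    except FileNotFoundError:  # psql is not installed on this host
        return False
    if result.returncode != 0:
        print(result.stderr.strip())  # psql's own explanation of the failure
    return result.returncode == 0
```

For the setup in this thread, a call would look like
check_recovery_login("adt-db01", 5432, "postgres", "...") with the real
recovery_password in place of the placeholder.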
>>
>> > And the failback_command is for attaching the postgres server to
>> > pgpool, right?
>>
>> If you want to do something special, for example sending mail to the
>> DBA, you might want to specify it. Otherwise you can leave it empty.
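As an illustration of the mail-the-DBA case, here is a minimal Python
sketch of a script that failback_command could call. It assumes a
configuration line such as
failback_command = '/usr/local/bin/notify_failback.py %d %h', where the
script path is hypothetical and %d and %h are pgpool-II's placeholders
for the attached node's id and host name. It also assumes an SMTP server
on localhost and placeholder sender and recipient addresses.

```python
import smtplib
import sys
from email.message import EmailMessage

# Placeholder addresses (assumptions); replace with real ones.
SENDER = "pgpool@example.com"
RECIPIENT = "dba@example.com"


def build_notice(node_id: str, host: str) -> EmailMessage:
    """Compose the notification for a node that was attached again."""
    msg = EmailMessage()
    msg["From"] = SENDER
    msg["To"] = RECIPIENT
    msg["Subject"] = f"pgpool-II: node {node_id} ({host}) failed back"
    msg.set_content(
        f"Backend node {node_id} on {host} was re-attached to pgpool-II.\n")
    return msg


def send_notice(msg: EmailMessage, smtp_host: str = "localhost") -> None:
    # Assumes an SMTP server that accepts mail without authentication.
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)


if __name__ == "__main__":
    # Expects exactly the two arguments passed via "%d %h".
    if len(sys.argv) == 3:
        send_notice(build_notice(sys.argv[1], sys.argv[2]))
    else:
        print("usage: notify_failback.py NODE_ID HOST", file=sys.stderr)
```

Keeping message construction separate from sending makes the text easy
to check without a mail server.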
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese: http://www.sraoss.co.jp
>>
>> > Best Regards,
>> > Uwe
>> >
>> >
>> > On 26 April 2011 01:06, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> >
>> >> > I'm using pgpool-II 3.0.3 with streaming replication.
>> >> > I coded the failback scenario/script for the slave server and the
>> >> > script itself works fine.
>> >> >
>> >> > I have now configured the failback script in pgpool.conf, and during
>> >> > testing an error message comes up:
>> >> > 2011-04-09 06:42:18 LOG:   pid 16863: starting recovering node 1
>> >> > 2011-04-09 06:42:18 ERROR: pid 16863: start_recover: could not connect master node.
>> >> >
>> >> > [root@adt-web01 pgpool-II-3.0.3]# pcp_node_info 10 adt-web01 9898 postgres postgres 0
>> >> > adt-db01 5432 1 0.500000
>> >> > [root@adt-web01 pgpool-II-3.0.3]# pcp_node_info 10 adt-web01 9898 postgres postgres 1
>> >> > adt-db02 5432 3 0.500000
>> >> >
>> >> > pcp commands and pgpoolAdmin report that the master is up and running,
>> >> > and I'm able to connect to the master directly and through pgpool.
>> >> > So what's wrong? So far everything else works fine.
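Each pcp_node_info output line quoted above holds the host name, port,
status and load-balance weight. Per the pgpool-II documentation, status
1 means up with no connections yet, 2 means up with pooled connections,
and 3 means down, so node 1 (adt-db02, status 3) is the node being
recovered. This Python helper is a hypothetical sketch for scripting
such checks; it assumes exactly four whitespace-separated fields, as in
the 3.0.3 output shown here.

```python
from typing import NamedTuple

# Status codes as documented for pcp_node_info in pgpool-II.
STATUS = {
    0: "initializing (not shown by PCP)",
    1: "up, no connections yet",
    2: "up, connections pooled",
    3: "down",
}


class NodeInfo(NamedTuple):
    host: str
    port: int
    status: int
    weight: float

    @property
    def is_up(self) -> bool:
        return self.status in (1, 2)

    def describe(self) -> str:
        return STATUS.get(self.status, f"unknown status {self.status}")


def parse_node_info(line: str) -> NodeInfo:
    """Parse one line such as 'adt-db02 5432 3 0.500000'."""
    host, port, status, weight = line.split()
    return NodeInfo(host, int(port), int(status), float(weight))


if __name__ == "__main__":
    for raw in ("adt-db01 5432 1 0.500000", "adt-db02 5432 3 0.500000"):
        node = parse_node_info(raw)
        print(node.host, node.describe())
```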
>> >>
>> >> Assuming you have set recovery_user and recovery_password correctly, I'm
>> >> not sure what's going on. IMO, this error message is very rare. It is so
>> >> rare that a bug in the error path went unnoticed for a long time. Can
>> >> you please try the attached patch? The patch adds a little more useful
>> >> information to the error message above.
>> >>
>>