[Pgpool-general] failback setup problem

Tatsuo Ishii ishii at sraoss.co.jp
Sun May 1 09:42:50 UTC 2011


>> Hi Tatsuo,
>> 
>> OK, now it's working fine. Thanks for your help.
>> 
>> After getting through the initial setup for the first time, I'd like to
>> give you some feedback about information that is (at least for me)
>> missing from the documentation.
>> 
>> - please document the control flow during recovery, e.g.
>>   - connect to the master server with recovery_user/recovery_password
>>     (connection check)
>>   - run recovery_1st_stage_command
>>   - run checkpoint
>>   - run recovery_2nd_stage_command
>>   - ...
>>   - run failback_command
>> 
>> - please document that recovery_1st_stage_command and
>> recovery_2nd_stage_command are system calls made by the current postgres
>> master server in the PGDATA directory, and that the failback_command is a
>> shell script command or system call made from the pgpool server. I had to
>> search for this in the source code.
> 
> Sorry for the inconvenience. I will add info to the docs as you suggested.

Done.
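
As a rough illustration of the flow listed above, here is a dry-run
sketch. It only prints the steps in the order given in the quoted list,
together with where each one runs according to this thread. The function
name is made up for illustration, the "..." entry is kept as a placeholder
because the original list does not spell it out, and the exact order in a
given pgpool-II release may differ.

```shell
#!/bin/sh
# Dry-run sketch: prints the recovery steps instead of running anything.
# Step names are taken from the list quoted in this message; recovery_flow
# itself is a hypothetical helper, not part of pgpool-II.

recovery_flow() {
    printf '%s\n' \
        "pgpool -> master: connect as recovery_user/recovery_password (connection check)" \
        "master (in PGDATA): recovery_1st_stage_command" \
        "pgpool -> master: CHECKPOINT" \
        "master (in PGDATA): recovery_2nd_stage_command" \
        "... (steps elided in the original list)" \
        "pgpool server: failback_command"
}

recovery_flow
```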

>> I have a different approach to recovering the postgres server. I'm
>> recovering from an existing backup. I do that because it is faster and it
>> doesn't put additional I/O load on the just-activated server. I guess (or
>> hope) most people will have an existing backup.
>>
>> So my question is: is the approach of recovering the failed server via an
>> SQL command optimal? What if both servers have failed? Would I then be
>> unable to use the pcp tools or pgpoolAdmin for recovery?
> 
> I'm not sure what you are trying to do here. If "backup" means it was
> created by pg_dumpall, I don't think your approach works. Streaming
> replication requires a base backup (binary backup), which is managed by
> pg_start_backup/pg_stop_backup.
> 
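To make the base backup point concrete, here is a minimal sketch of a
file-level copy bracketed by pg_start_backup/pg_stop_backup. It is a dry
run by default and only prints the commands. The host name, the paths, the
rsync options and the helper names run and base_backup are assumptions for
illustration, not values from this thread.

```shell
#!/bin/sh
# Sketch of a base backup for streaming replication.
# With DRY_RUN=1 (the default here) the commands are only printed.
# Host name and paths at the bottom are placeholders.

DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

base_backup() {
    src_dir=$1     # data directory on the running master
    dest_host=$2   # host that is being recovered
    dest_dir=$3    # data directory on that host

    # Tell the master that a file-level backup is starting.
    run psql -c "SELECT pg_start_backup('base backup sketch', true)" postgres || return 1

    # Copy the data directory while the master keeps running.
    run rsync -a --delete --exclude postmaster.pid "$src_dir/" "$dest_host:$dest_dir/"
    rc=$?

    # Always end backup mode, even if the copy failed.
    run psql -c "SELECT pg_stop_backup()" postgres
    return $rc
}

base_backup /var/lib/pgsql/data standby.example.com /var/lib/pgsql/data
```

A pg_dumpall dump is a logical SQL dump, so it cannot take the place of the
file copy in the middle step.
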
>> I'm asking because I worked for several years as the person responsible
>> for IT production, and I learned a little about how administrators
>> think and work. They are happy if they have a (or better, ONE) defined
>> recovery procedure.
>> What am I getting at? I'm asking whether it would make sense to recode or
>> reduce the recovery procedure code to one system call, e.g. failback_command.
>> Most people have their backup and restore functionality coded and ready for
>> training and/or disaster. If they could simply use this very same
>> functionality within pgpoolAdmin, that would be great.
>> 
>> It might be that I have overlooked something (as before) and this is
>> already possible. If so, please tell me how.
>> 
>> Best Regards,
>> Uwe
>> 
>> 
>> On 26 April 2011 07:34, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> 
>>> > thanks, that message already helped. I tried to recover the postgres
>>> > server with the failback_command.
>>>
>>> You are welcome.
>>>
>>> > I wasn't aware of these recovery_* parameters yet.
>>> > So I use the recovery_* parameters for recovering the failed postgres
>>> > server.
>>>
>>> pgpool-II connects to the backend to issue some SQL commands, including
>>> CHECKPOINT. The recovery_* parameters define the user and password for
>>> that connection. Usually they are for the PostgreSQL superuser (postgres).
>>>
>>> > And the failback_command is for attaching the postgres server to
>>> > pgpool, right?
>>>
>>> If you want to do something special, for example mailing the DBA, then
>>> you might want to specify it.  Otherwise you can leave it empty.
>>> --
>>> Tatsuo Ishii
>>> SRA OSS, Inc. Japan
>>> English: http://www.sraoss.co.jp/index_en.php
>>> Japanese: http://www.sraoss.co.jp
>>>
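For the "mailing the DBA" case mentioned above, a failback hook could look
roughly like the sketch below. It assumes pgpool.conf passes the node id
and host name through the %d and %h placeholders, for example
failback_command = '/path/to/failback_notify.sh %d %h'. The script name,
the mail address and the SEND_MAIL switch are hypothetical. By default the
script only prints the message, so it has no side effects.

```shell
#!/bin/sh
# Hypothetical failback hook: builds a notice for the DBA.
# Arguments: $1 = node id (%d), $2 = host name (%h).
# DBA_ADDRESS and SEND_MAIL are assumptions made for this sketch.

DBA_ADDRESS=${DBA_ADDRESS:-dba@example.com}

failback_message() {
    node_id=$1
    host=$2
    printf 'pgpool-II: node %s (%s) has been attached again\n' "$node_id" "$host"
}

notify_dba() {
    msg=$(failback_message "$1" "$2")
    if [ "${SEND_MAIL:-0}" = "1" ] && command -v mail >/dev/null 2>&1; then
        # Send the notice only when explicitly enabled.
        printf '%s\n' "$msg" | mail -s "pgpool-II failback" "$DBA_ADDRESS"
    else
        # Default: print the notice instead of mailing it.
        printf '%s\n' "$msg"
    fi
}

# Fall back to sample values when called without arguments.
notify_dba "${1:-1}" "${2:-adt-db02}"
```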
>>> > Best Regards,
>>> > Uwe
>>> >
>>> >
>>> > On 26 April 2011 01:06, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>>> >
>>> >> > I'm using pgpool-II 3.0.3 with streaming replication.
>>> >> > I coded the failback scenario/script for the slave server and the
>>> >> > script itself works fine.
>>> >> >
>>> >> > I have now configured the failback script in pgpool.conf, and during
>>> >> > testing an error message comes up:
>>> >> > 2011-04-09 06:42:18 LOG:   pid 16863: starting recovering node 1
>>> >> > 2011-04-09 06:42:18 ERROR: pid 16863: start_recover: could not connect master node.
>>> >> >
>>> >> > [root@adt-web01 pgpool-II-3.0.3]# pcp_node_info 10 adt-web01 9898 postgres postgres 0
>>> >> > adt-db01 5432 1 0.500000
>>> >> > [root@adt-web01 pgpool-II-3.0.3]# pcp_node_info 10 adt-web01 9898 postgres postgres 1
>>> >> > adt-db02 5432 3 0.500000
>>> >> >
>>> >> > pcp commands and pgpoolAdmin report that the master is up and running,
>>> >> > and I'm able to connect to the master directly and through pgpool.
>>> >> > So what's wrong? So far everything else works fine.
>>> >>
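The pcp_node_info output quoted above has four fields: host name, port,
status and load balance weight. The sketch below decodes the status field,
using the status codes as pgpool-II documents them for this command. The
function name describe_node is made up for illustration.

```shell
#!/bin/sh
# Decode one line of pcp_node_info output: "host port status weight".

describe_node() {
    # Split the line into its whitespace-separated fields.
    set -- $1
    host=$1 port=$2 status=$3
    case $status in
        0) text="initializing" ;;
        1) text="up, no connections yet" ;;
        2) text="up, connections pooled" ;;
        3) text="down" ;;
        *) text="unknown status" ;;
    esac
    printf '%s:%s is %s (status %s)\n' "$host" "$port" "$text" "$status"
}

describe_node "adt-db01 5432 1 0.500000"
describe_node "adt-db02 5432 3 0.500000"
```

Read this way, node 0 (adt-db01) is up and node 1 (adt-db02) is down, which
matches the log line about recovering node 1.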
>>> >> Assuming you have set recovery_user and recovery_password correctly,
>>> >> I'm not sure what's going on. IMO, the error message is very rare. It's
>>> >> so rare that there's a bug in the error path which had not been found
>>> >> for a long time. Can you please try the attached patch? The patch adds
>>> >> a little more useful info to the error message above.
>>> >> --
>>> >> Tatsuo Ishii
>>> >> SRA OSS, Inc. Japan
>>> >> English: http://www.sraoss.co.jp/index_en.php
>>> >> Japanese: http://www.sraoss.co.jp
>>> >>
>>>
