[Pgpool-general] failover done, now need help in online recovery - Please help!

Tue Apr 19 23:02:34 UTC 2011

> and I cannot even stop it because there is no such process. But since the port 
> is listening, I'm unable to start the pgpool again. 
> 
> 
> BTW, my "recovery_2nd_stage_command" is empty. I hope this is not an issue.

This is not a problem.

BTW can you show the PostgreSQL log of both old primary and new
primary? This might help me diagnose the problem.
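
On the port that is still listening: the pgpool children you describe (parent id '1') inherited the listening socket, so the port stays busy until they exit. A minimal sketch for finding them, assuming their process title contains "pgpool:" as in your ps output. The function name is only illustrative, not part of pgpool-II.

```shell
#!/bin/sh
# List PIDs of pgpool child processes whose parent is init (PPID 1),
# i.e. children left behind after the pgpool parent process went away.
# Reads "pid ppid args" lines on stdin, as printed by: ps -eo pid,ppid,args
# NOTE: orphaned_pgpool_children is a hypothetical helper name.
orphaned_pgpool_children() {
    # $2 is the PPID column; the header line ("PPID") never equals 1.
    awk '$2 == 1 && /pgpool:/ { print $1 }'
}

# Usage (inspect the list first, then terminate only what you recognize):
#   ps -eo pid,ppid,args | orphaned_pgpool_children
#   ps -eo pid,ppid,args | orphaned_pgpool_children | xargs -r kill
```

The second usage line assumes GNU xargs (for -r), which CentOS ships. Once those processes are gone, the port should be free and pgpool should start again.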
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> ________________________________
> From: Sandeep Thakkar <sandeeptt at yahoo.com>
> To: Tatsuo Ishii <ishii at sraoss.co.jp>
> Cc: pgpool-general at pgfoundry.org
> Sent: Tue, April 19, 2011 2:52:40 PM
> Subject: Re: [Pgpool-general] failover done, now need help in online recovery - 
> Please help!
> 
> 
> Yes, I can connect to pgpool on port 9999. I even checked port 9999 using
> the netstat command to see if it is listening, and yes, it is.
> 
> 
> But then I see that the pgpool parent process is not running, and the parent id of
> all the child processes (pgpool: wait for connection request) is '1'. Also, as I said,
> the terminal where I execute pcp_recovery_node does not return. I have to press
> Control-C in that shell.
> 
> 
> 
> 
> ________________________________
> From: Tatsuo Ishii <ishii at sraoss.co.jp>
> To: sandeeptt at yahoo.com
> Cc: pgpool-general at pgfoundry.org
> Sent: Tue, April 19, 2011 2:19:57 PM
> Subject: Re: [Pgpool-general] failover done, now need help in online recovery - 
> Please help!
> 
> After recovering the old primary (5432), can you connect to pgpool without
> problems? If so, this is normal. Pgpool needs to restart all child
> processes after recovery.
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
> 
>> Alright. Once the failover is done successfully and the standby is promoted to
>> new primary, I start the recovery of the old primary (5432).
>>
>> The old primary is getting recovered, but I found that pgpool is getting killed.
>> Why? Here is the pgpool log:
>> ......
>> 2011-04-19 11:23:42 LOG:   pid 15894: send_failback_request: fail back 0 th node request from pid 15894
>> 2011-04-19 11:23:42 DEBUG: pid 21977: s_do_auth: auth kind: 0
>> 2011-04-19 11:23:42 ERROR: pid 21977: s_do_auth: unknown response "E" before processing BackendKeyData
>> 2011-04-19 11:23:42 ERROR: pid 21977: s_do_auth: unknown response "^@" before processing BackendKeyData
>> 2011-04-19 11:23:42 ERROR: pid 21977: s_do_auth: unknown response "^@" before processing BackendKeyData
>> 2011-04-19 11:23:42 ERROR: pid 21977: s_do_auth: unknown response "^@" before processing BackendKeyData
>> 2011-04-19 11:23:42 ERROR: pid 21977: s_do_auth: unknown response "V" before processing BackendKeyData
>> 2011-04-19 11:23:42 DEBUG: pid 21977: s_do_auth: parameter status data received
>> 2011-04-19 11:23:42 ERROR: pid 21977: pool_read2: failed to realloc
>> 2011-04-19 11:23:42 DEBUG: pid 15861: failover_handler called
>> 2011-04-19 11:23:42 DEBUG: pid 15861: failover_handler: starting to select new master node
>> 2011-04-19 11:23:42 LOG:   pid 15861: starting fail back. reconnect host localhost(5432)
>> 2011-04-19 11:23:42 LOG:   pid 15861: execute command: touch /home/sandeep/PostgreSQL9.0/inst/bin/../failback.log
>> 2011-04-19 11:23:42 DEBUG: pid 20267: child received shutdown request signal 3
>> 2011-04-19 11:23:42 DEBUG: pid 15861: failover_handler: kill 20267
>> 2011-04-19 11:23:42 DEBUG: pid 15861: failover_handler: kill 20268
>> 2011-04-19 11:23:42 DEBUG: pid 15861: failover_handler: kill 20269
>> 2011-04-19 11:23:42 DEBUG: pid 20268: child received shutdown request signal 3
>> 2011-04-19 11:23:42 DEBUG: pid 20270: child received shutdown request signal 3
>> .....
>> 2011-04-19 11:23:42 LOG:   pid 15861: failover_handler: set new master node: 0
>> 2011-04-19 11:23:42 DEBUG: pid 22011: I am 22011
>> 2011-04-19 11:23:42 LOG:   pid 15861: failback done. reconnect host localhost(5432)
>> 
>> My command to start recovery is:
>> pcp_recovery_node  -d 20 localhost 9898 pg pg 0
>> 
>> 
>> The other thing I noticed is that the above command does not return; I have to
>> press Control-C to get the prompt back. I'm working on CentOS 64-bit.
>> Thanks for your help.
>> 
>> 
>> 
>> ________________________________
>> From: Sandeep Thakkar <sandeeptt at yahoo.com>
>> To: Sandeep Thakkar <sandeeptt at yahoo.com>; pgpool-general at pgfoundry.org
>> Sent: Thu, April 14, 2011 3:28:34 PM
>> Subject: Re: [Pgpool-general] failover done, now need help in online recovery
>> 
>> 
>> Can we bring the primary server up again? I found in the doc that "In
>> master/slave mode with streaming replication, online recovery can be performed.
>> Only a standby node can be recovered. You cannot recover the primary node. To
>> recover the primary node, you have to stop all DB nodes and pgpool-II, and then
>> restore it from a backup."
>> 
>> 
>> 
>> 
>> So, can't we restore the primary without taking the standby (new primary) down?
>>
>> 
>> 
>> Thanks.
>> 
>> 
>> 
>> 
>> ________________________________
>> From: Sandeep Thakkar <sandeeptt at yahoo.com>
>> To: pgpool-general at pgfoundry.org
>> Sent: Wed, April 13, 2011 3:05:25 PM
>> Subject: [Pgpool-general]  failover done, now need help in online recovery
>> 
>> 
>> Hi,
>> 
>> Failover:
>> I have one Master (PG9.0), one Standby (PG9.0) and one instance of pgpool-II
>> (3.0.3) on the same box. I created a recovery.conf in the Standby and made all
>> the other required settings in pgpool.conf and postgresql.conf. To mimic the
>> failover scenario, I killed the Master server and found that the failover process
>> started successfully. The Standby left recovery mode and was promoted to
>> primary (read-write). I can even execute write queries on the Standby (new Primary)
>> now.
>> 
>> Online recovery:
>> To take it forward, I want to bring my old primary up, and once up, it should
>> behave as a Standby (read-only). One step I know of is to execute
>> basebackup.sh on the new Primary, which will copy its base directory to the
>> old primary's data directory. Then what? I do not have a recovery.conf on the
>> old primary yet. Do I need one there? What else do I need to do?
>> 
>> Thanks
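
On the recovery.conf question above: with PostgreSQL 9.0 streaming replication, a node only starts as a standby if its data directory contains a recovery.conf, so the old primary needs one after the base backup is copied over. One option is to have basebackup.sh write it. A minimal sketch, assuming the helper name write_recovery_conf and every argument value shown in the usage line (path, host, port, user) are placeholders rather than settings from this thread:

```shell
#!/bin/sh
# Write a minimal PostgreSQL 9.0 recovery.conf into a data directory so the
# server starts as a streaming-replication standby.
# NOTE: write_recovery_conf is a hypothetical helper, not part of pgpool-II.
# Arguments (values are assumptions for illustration):
#   $1 data directory of the node being recovered
#   $2 host of the current primary
#   $3 port of the current primary
#   $4 replication user
write_recovery_conf() {
    cat > "$1/recovery.conf" <<EOF
standby_mode = 'on'
primary_conninfo = 'host=$2 port=$3 user=$4'
EOF
}

# Usage (placeholder values):
#   write_recovery_conf /path/to/old_primary/data localhost 5433 repl_user
```

standby_mode and primary_conninfo are the standard 9.0 parameters. A trigger_file line could be added in the same way if this node should be promotable later.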