[Pgpool-hackers] New patches for pcp_promote_node

Wed Mar 9 09:01:08 UTC 2011

On 09/03/2011 00:45, Tatsuo Ishii wrote:
> Thanks for clarification. That sounds reasonable to me.
>
> BTW, after applying the patches I got the following errors while doing
> online recovery.  In my testing node 0 is in down status and is the
> recovery target. Node 1 is up and running as primary node. This worked
> perfectly before applying your patches. Thoughts?
>
> 2011-03-09 08:37:17 ERROR: pid 15531: health check failed. 0 th host /tmp at port 5433 is down
> 2011-03-09 08:37:17 LOG:   pid 15531: set 0 th backend down status
> 2011-03-09 08:37:17 LOG:   pid 15531: starting degeneration. shutdown host /tmp(5433)
> 2011-03-09 08:37:17 LOG:   pid 15531: execute command: /usr/local/etc/failover.sh 0 "/tmp" 5433 /usr/local/pgsql/data 1 0 "/tmp" 0
> 2011-03-09 08:37:17 LOG:   pid 15531: find_primary_node: 1 node is standby
> 2011-03-09 08:37:17 LOG:   pid 15531: find_primary_node: no primary node found
> 2011-03-09 08:37:17 LOG:   pid 15531: Primary node id saved: -1
> 2011-03-09 08:37:17 LOG:   pid 15531: failover done. shutdown host /tmp(5433)
> 2011-03-09 08:37:34 LOG:   pid 15566: starting recovering node 0
> 2011-03-09 08:37:34 ERROR: pid 15566: start_recover: could not connect master node.

Have you applied the entire patch, with no rejects? I ask because this
error appears when the change in find_primary_node() has not been made.
Please take a look; you must have:

    SELECT pg_is_in_recovery() AND pgpool_walrecrunning()

replaced by:

    SELECT not pg_is_in_recovery() AND not pgpool_walrecrunning()

and the response comparison strcmp(res->data[0], "t") replaced by
strcmp(res->data[0], "f").
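
To show why a half-applied patch would give the "1 node is standby" line
in the log above, here is a minimal sketch in C. It is not the pgpool
source: all function names are hypothetical, the two queries are modelled
as plain boolean expressions, and it assumes that a match in the strcmp()
comparison is what flags a node as standby.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Text form of a PostgreSQL boolean as returned to the client. */
const char *bool_text(bool v) { return v ? "t" : "f"; }

/* Old query: SELECT pg_is_in_recovery() AND pgpool_walrecrunning() */
const char *old_query(bool in_recovery, bool walrec_running)
{
    return bool_text(in_recovery && walrec_running);
}

/* New query: SELECT not pg_is_in_recovery() AND not pgpool_walrecrunning() */
const char *new_query(bool in_recovery, bool walrec_running)
{
    return bool_text(!in_recovery && !walrec_running);
}

/* ASSUMPTION: a match against the marker flags the node as standby. */
bool flagged_standby(const char *result, const char *marker)
{
    return strcmp(result, marker) == 0;
}

/* A primary is neither in recovery nor running the WAL receiver. */
void check_primary_case(void)
{
    /* Unpatched: old query compared with "t" does not flag the primary. */
    assert(!flagged_standby(old_query(false, false), "t"));
    /* Fully patched: new query compared with "f" does not flag it either. */
    assert(!flagged_standby(new_query(false, false), "f"));
    /* Half patched: new query with the old "t" flags the primary as standby. */
    assert(flagged_standby(new_query(false, false), "t"));
}
```

Under these assumptions, the query and the comparison have to change
together: if only the query hunk is applied, the running primary is
reported as standby and no primary node is found.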

Could you please check that? I will check again on my side to see if I
forgot something in the patch.

Regards,

-- 
Gilles Darold
http://dalibo.com - http://dalibo.org