[Pgpool-hackers] pgpool-II 3.0.4 release

Wed May 25 13:52:08 UTC 2011

>> We are about to release pgpool-II 3.0.4, the latest version of the
>> pgpool-II 3.0 stable tree. The release is currently scheduled for
>> Monday, May 30.
>> 
>> If you would like to add fixes to this release, please propose them.
> 
> See "Detaching, then attaching node results in weird status". I'm still
> waiting for your answer on this one.

I have been looking into it. Here is the log from when pcp_detach_node was executed:

2011-05-25 22:39:37 LOG:   pid 17063: starting degeneration. shutdown host /tmp(5433)
2011-05-25 22:39:37 ERROR: pid 17063: failover_handler: no valid DB node found
2011-05-25 22:39:37 LOG:   pid 17063: Restart all children
2011-05-25 22:39:37 LOG:   pid 17063: execute command: /usr/local/etc/failover.sh 0 "/tmp" 5433 /usr/local/pgsql/data -1 0 "" 0
2011-05-25 22:39:37 LOG:   pid 17063: find_primary_node_repeatedly: waiting for finding a primary node

The log stopped here for a while. After 60 seconds, the following lines appeared:

2011-05-25 22:40:37 LOG:   pid 17063: failover: set new primary node: -1
2011-05-25 22:40:37 LOG:   pid 17694: do_child: failback event found. restart myself.
2011-05-25 22:40:37 LOG:   pid 17695: do_child: failback event found. restart myself.
2011-05-25 22:40:37 LOG:   pid 17696: do_child: failback event found. restart myself.
2011-05-25 22:40:37 LOG:   pid 17063: failover done. shutdown host /tmp(5433)

If I execute pcp_attach_node without waiting for 60 seconds, the
problem occurs. However, if I wait for 60 seconds until the "failover
done." message appears, the problem does not occur.
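To make "wait until failover done." concrete, here is a minimal sketch, assuming the only signal available to an operator is pgpool's log text. The helper name failover_finished and the approach of scanning the log are illustrative assumptions, not part of pgpool-II; the two marker strings are copied from the log lines above.

```python
# Hypothetical helper, NOT part of pgpool-II. It judges, purely from log
# text, whether the failover started by pcp_detach_node has completed.
FAILOVER_START = "starting degeneration."  # first line logged by the failover
FAILOVER_DONE = "failover done."           # logged once the failover completes


def failover_finished(log_lines):
    """Return True if no failover is still in progress in log_lines.

    A failover counts as in progress from a "starting degeneration." line
    until a later "failover done." line appears.
    """
    in_progress = False
    for line in log_lines:
        if FAILOVER_START in line:
            in_progress = True
        elif FAILOVER_DONE in line:
            in_progress = False
    return not in_progress
```

One could poll the log with a check like this and run pcp_attach_node only after it returns True; during the 60-second find_primary_node_repeatedly window it returns False.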

So it seems we need to wait for find_primary_node_repeatedly to finish
before we issue pcp_attach_node. This suggests that your fix might not
be appropriate, because it does not address this timing issue.
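The ordering constraint can be modeled with a threading.Event. This is a hypothetical sketch, not pgpool-II code: the NodeState class is invented, the 60-second search is shortened to 0.2 seconds, and it assumes (unconfirmed, since the root cause is still being investigated) that the late-finishing failover overwrites the status set by an early attach.

```python
import threading
import time

# Hypothetical model, NOT pgpool-II code. The 60 s primary-node search
# is shortened to 0.2 s so the example runs quickly.
SEARCH_SECONDS = 0.2


class NodeState:
    def __init__(self):
        self.status = "up"
        self.failover_done = threading.Event()

    def detach(self):
        """Model of pcp_detach_node: the status changes at once, but the
        failover keeps running in a background thread."""
        self.failover_done.clear()
        self.status = "down"

        def finish_failover():
            # Stands in for find_primary_node_repeatedly.
            time.sleep(SEARCH_SECONDS)
            # ASSUMPTION: completing the failover re-applies the detached
            # status, clobbering anything an early attach did.
            self.status = "down"
            # Corresponds to the "failover done." log message.
            self.failover_done.set()

        worker = threading.Thread(target=finish_failover)
        worker.start()
        return worker

    def attach(self, wait=True):
        """Model of pcp_attach_node. With wait=False it may run in the
        middle of the failover and have its result overwritten."""
        if wait:
            self.failover_done.wait()
        self.status = "up"
```

Attaching without waiting leaves the node "down" in this model, while waiting for the event first leaves it "up", which mirrors the behavior observed above.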

I'm going to keep on looking into this...
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp