[Pgpool-hackers] pgpool-II 3.0.4 release

Wed May 25 20:13:17 UTC 2011

On 05/25/2011 03:52 PM, Tatsuo Ishii wrote:
>>> We are about to release pgpool-II 3.0.4, the latest version of the
>>> pgpool-II 3.0 stable tree. The release is currently scheduled for
>>> Monday, May 30.
>>>
>>> If you would like to add fixes to this release, please propose them.
>>
>> See "Detaching, then attaching node results in weird status". I'm still
>> waiting for your answer on this one.
> 
> I have been looking into it. Here is the log from when pcp_detach_node was executed:
> 
> 2011-05-25 22:39:37 LOG:   pid 17063: starting degeneration. shutdown host /tmp(5433)
> 2011-05-25 22:39:37 ERROR: pid 17063: failover_handler: no valid DB node found
> 2011-05-25 22:39:37 LOG:   pid 17063: Restart all children
> 2011-05-25 22:39:37 LOG:   pid 17063: execute command: /usr/local/etc/failover.sh 0 "/tmp" 5433 /usr/local/pgsql/data -1 0 "" 0
> 2011-05-25 22:39:37 LOG:   pid 17063: find_primary_node_repeatedly: waiting for finding a primary node
> 
> The log stopped here for a while. After 60 seconds, the following appeared:
> 
> 2011-05-25 22:40:37 LOG:   pid 17063: failover: set new primary node: -1
> 2011-05-25 22:40:37 LOG:   pid 17694: do_child: failback event found. restart myself.
> 2011-05-25 22:40:37 LOG:   pid 17695: do_child: failback event found. restart myself.
> 2011-05-25 22:40:37 LOG:   pid 17696: do_child: failback event found. restart myself.
> 2011-05-25 22:40:37 LOG:   pid 17063: failover done. shutdown host /tmp(5433)
> 
> If I do not wait for 60 seconds and execute pcp_attach_node, the
> problem happens. However, if I wait for 60 seconds and see the
> "failover done." message, the problem does not occur.
> 
> So it seems we need to wait for find_primary_node_repeatedly to finish
> before we issue pcp_attach_node. This suggests that your fix might not
> be appropriate, because it does not address this "timing" issue.
> 
> I'm going to keep on looking into this...
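
The timing condition described above can be sketched as a scan over log lines. This is only an illustration, assuming the "starting degeneration" and "failover done" messages bracket the failover as in the quoted log; failover_in_progress() is a hypothetical helper, not a pgpool-II function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of pgpool-II): returns true when the most
 * recent "starting degeneration" line has not yet been followed by a
 * "failover done" line. The marker strings are taken from the quoted log.
 */
static bool failover_in_progress(const char *const *lines, size_t n)
{
    bool in_progress = false;

    assert(lines != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (strstr(lines[i], "starting degeneration") != NULL)
            in_progress = true;      /* a failover has begun */
        else if (strstr(lines[i], "failover done") != NULL)
            in_progress = false;     /* that failover has completed */
    }
    return in_progress;
}
```

Under this assumption, pcp_attach_node would only be issued once failover_in_progress() returns false for the log collected so far.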

That's actually not the issue I'm talking about. I'm on 3.0, with a
single backend and no failover script. See the attached config.

When I run pcp_detach_node, I get this:

2011-05-25 20:24:12 LOG:   pid 31861: notice_backend_error: 0 fail over request from pid 31861
2011-05-25 20:24:12 LOG:   pid 31828: starting degeneration. shutdown host localhost(5432)
2011-05-25 20:24:12 ERROR: pid 31828: failover_handler: no valid DB node found
2011-05-25 20:24:12 LOG:   pid 31828: failover done. shutdown host localhost(5432)

That seems fine to me. Then I run pcp_attach_node, and I get this:

2011-05-25 20:25:23 LOG:   pid 31861: send_failback_request: fail back 0 th node request from pid 31861
2011-05-25 20:25:23 ERROR: pid 31861: send_failback_request: node 0 is alive.

I was mistaken about the "node 0 is alive" message. I thought it meant
that node 0 is NOW up. What it really means is that pgpool thought it was
ALREADY alive (hence the ERROR level of the message in the
send_failback_request function). Digging further into this issue, I
finally found that the VALID_BACKEND macro returns true when it should
return false. In fact, there is already this comment in
get_next_master_node():

        /*
         * Do not use VALID_BACKEND macro in raw mode.
         * VALID_BACKEND return true only if the argument is master
         * node id. In other words, standby nodes are false. So need
         * to check backend status without VALID_BACKEND.
         */

And I'm actually in raw mode. VALID_BACKEND is used in so many places
that changing it would be really dangerous. So I'm not sure what we
should really do here. I have a patch that fixes my issue cleanly, but
I'm not sure it's the best way to do this. See the attached patch.
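
To make the raw-mode behaviour concrete, here is a minimal model of it. It is a sketch under assumptions, not the pgpool-II source: struct pool_model, valid_backend_like() and node_is_up() are hypothetical names, and the model assumes the master node id is still 0 after the detach.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 4

/* Hypothetical status values for this model only. */
enum node_status { NODE_DOWN = 0, NODE_UP = 1 };

/* Hypothetical, simplified stand-in for pgpool's shared state. */
struct pool_model {
    bool raw_mode;                       /* raw mode, as in my setup */
    int master_node_id;                  /* node treated as master */
    enum node_status status[MAX_NODES];  /* recorded per-node status */
};

/*
 * Models what the comment in get_next_master_node() describes: in raw
 * mode the result depends only on whether node_id is the master node id,
 * not on the recorded status of that node.
 */
static bool valid_backend_like(const struct pool_model *p, int node_id)
{
    assert(node_id >= 0 && node_id < MAX_NODES);
    if (p->raw_mode)
        return node_id == p->master_node_id;
    return p->status[node_id] == NODE_UP;
}

/* Status-only check, which is what the attach path needs in raw mode. */
static bool node_is_up(const struct pool_model *p, int node_id)
{
    assert(node_id >= 0 && node_id < MAX_NODES);
    return p->status[node_id] == NODE_UP;
}
```

With raw_mode set, master_node_id 0 and status[0] marked down, valid_backend_like() still reports node 0 as valid while node_is_up() reports it as down, which matches the "node 0 is alive" error seen after a detach.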

BTW, when I run pcp_attach_node, the node gets status 2, but pgpool
doesn't check whether a PostgreSQL backend is actually available. I'm not
sure whether we want to do something about this too. Why doesn't it check
that the backend is available? It doesn't check at startup either. I find
this really weird, but I'm sure there is a reason.

-- 
Guillaume
 http://www.postgresql.fr
 http://dalibo.com
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: pgpool.conf
URL: <http://pgfoundry.org/pipermail/pgpool-hackers/attachments/20110525/e42b99ee/attachment.ksh>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: main.patch
Type: text/x-patch
Size: 1628 bytes
Desc: not available
URL: <http://pgfoundry.org/pipermail/pgpool-hackers/attachments/20110525/e42b99ee/attachment.bin>