[Pgpool-hackers] pgpool-II 3.0.4 release

Thu May 26 07:06:03 UTC 2011

On 05/26/2011 12:57 AM, Tatsuo Ishii wrote:
>>> So it seems we need to wait for find_primary_node_repeatedly to finish
>>> before we issue pcp_attach_node. This suggests that your fix might not
>>> be appropriate because it does not address this "timing"
>>> issue.
>>>
>>> I'm going to keep on looking into this...
>>
>> That's actually not the issue I'm talking about. I'm on 3.0, with a
>> single backend, no failover script. See my attached config.
> 
> Ok. pgpool-II-3.0-stable does not use find_primary_node_repeatedly()
> and it does not have the problem I'm talking about.
> 

Yes.

>> When I do the pcp_detach_node, I have this:
>>
>> 2011-05-25 20:24:12 LOG:   pid 31861: notice_backend_error: 0 fail over
>> request from pid 31861
>> 2011-05-25 20:24:12 LOG:   pid 31828: starting degeneration. shutdown
>> host localhost(5432)
>> 2011-05-25 20:24:12 ERROR: pid 31828: failover_handler: no valid DB node
>> found
>> 2011-05-25 20:24:12 LOG:   pid 31828: failover done. shutdown host
>> localhost(5432)
>>
>> Which seems fine to me. Then I do the pcp_attach_node, and I got this:
>>
>> 2011-05-25 20:25:23 LOG:   pid 31861: send_failback_request: fail back 0
>> th node request from pid 31861
>> 2011-05-25 20:25:23 ERROR: pid 31861: send_failback_request: node 0 is
>> alive.
>>
>> I was mistaken about the "node 0 is alive" message. I thought it meant
>> that node 0 is NOW up. What it really means is that pgpool thought it
>> was ALREADY alive (hence the ERROR message level in the
>> send_failback_request function). Digging deeper into this issue, I
>> finally found that the VALID_BACKEND macro returns true when it should
>> return false. Actually, there is already this comment in
>> get_next_master_node():
>>
>>         /*
>>          * Do not use VALID_BACKEND macro in raw mode.
>>          * VALID_BACKEND return true only if the argument is master
>>          * node id. In other words, standby nodes are false. So need
>>          * to check backend status without VALID_BACKEND.
>>          */
>>
>> And I'm actually in raw mode. VALID_BACKEND is used so widely that it
>> would be really dangerous to change it. So, I'm not sure what we really
>> should do here. I've got a patch that fixes my issue cleanly, but I'm
>> not sure it's the best way to do this. See the attached patch.
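
To make the raw mode behaviour described above concrete, here is a
minimal sketch in C. It is a simplified model, not the pgpool source:
the struct, the enum, and both function names are hypothetical, and the
non-raw branch is an assumption. It only encodes what the quoted comment
states, namely that in raw mode the check depends on the master node id
and not on the node's recorded status.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for pgpool's backend state. These names are
 * illustrative only and are NOT copied from the pgpool source. */
enum node_status { NODE_UNUSED, NODE_CONNECT_WAIT, NODE_UP, NODE_DOWN };

#define MAX_NODES 4

struct pool_model {
    bool raw_mode;                      /* true when running in raw mode */
    int master_node_id;                 /* node id treated as master */
    enum node_status status[MAX_NODES]; /* recorded status per node */
};

/* Status-only check, in the spirit of "check backend status without
 * VALID_BACKEND" from the quoted comment. */
static bool node_is_up(const struct pool_model *p, int node_id)
{
    return p->status[node_id] == NODE_UP ||
           p->status[node_id] == NODE_CONNECT_WAIT;
}

/* Models the behaviour the quoted comment describes: in raw mode the
 * answer depends only on whether node_id is the master node id, so the
 * recorded status is never consulted. The non-raw branch is an
 * assumption made for this sketch. */
static bool valid_backend_model(const struct pool_model *p, int node_id)
{
    if (p->raw_mode)
        return node_id == p->master_node_id;
    return node_is_up(p, node_id);
}
```

With raw_mode set, master_node_id 0, and node 0 marked NODE_DOWN,
valid_backend_model() still returns true for node 0 while node_is_up()
returns false. That matches the symptom above: the node was detached,
but a later check still treats it as alive.
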
> 
> My suggestion is to leave this as it is for 3.0.4. I think we need more
> time to investigate it. Let's continue the work after 3.0.4 is released.
> We already have critical issues such as "unnamed statement not found"
> with 3.0.3, and I have personally sent the 3.0-STABLE CVS tarball, at
> their request, to users who were troubled by this issue. If we delay
> the 3.0.4 release, more and more questions/requests of this kind will
> come in. I don't want to be troubled...
> 

I agree. I have no problem with dealing with this for 3.0.5, or even 3.1.

If you have a list of open items for 3.0.4, can you share it, so that we
can help you close some?

>> BTW, when I do a pcp_attach_node, I get status 2, but it didn't
>> check whether a PostgreSQL backend was available. Not sure we want to
>> do something about this too. Why doesn't it check whether the backend is
>> available? It doesn't do so at startup either. I find this really weird,
>> but I'm sure there is a reason.
> 
> It's a design decision. pcp_attach_node is supposed to be used by a
> human (or a smart management tool) who should know what he/she is
> doing. That means he/she should make sure the backend is actually
> usable: being up and running is not enough. For example, in
> replication mode, it must be synced with the other backends before
> pcp_attach_node is used.

Fair enough. I didn't check the docs, but they should say so. Will look
into this.

-- 
Guillaume
 http://www.postgresql.fr
 http://dalibo.com