[Pgpool-hackers] pgpool-II 3.0.4 release

Guillaume Lelarge guillaume at lelarge.info
Fri Jun 10 18:20:59 UTC 2011


On Thu, 2011-05-26 at 09:06 +0200, Guillaume Lelarge wrote:
> On 05/26/2011 12:57 AM, Tatsuo Ishii wrote:
> [...]
> >> When I do the pcp_detach_node, I have this:
> >>
> >> 2011-05-25 20:24:12 LOG:   pid 31861: notice_backend_error: 0 fail over
> >> request from pid 31861
> >> 2011-05-25 20:24:12 LOG:   pid 31828: starting degeneration. shutdown
> >> host localhost(5432)
> >> 2011-05-25 20:24:12 ERROR: pid 31828: failover_handler: no valid DB node
> >> found
> >> 2011-05-25 20:24:12 LOG:   pid 31828: failover done. shutdown host
> >> localhost(5432)
> >>
> >> Which seems fine to me. Then I do the pcp_attach_node, and I got this:
> >>
> >> 2011-05-25 20:25:23 LOG:   pid 31861: send_failback_request: fail back 0
> >> th node request from pid 31861
> >> 2011-05-25 20:25:23 ERROR: pid 31861: send_failback_request: node 0 is
> >> alive.
> >>
> >> I was mistaken about the "node 0 is alive" message. I thought it meant
> >> that node 0 is NOW up. What it really means is that pgpool thought it was
> >> ALREADY alive (hence the ERROR message level in the
> >> send_failback_request function). Digging deeper into this issue, I finally
> >> found that the VALID_BACKEND macro returns true when it should return
> >> false. Actually, there is already this comment in get_next_master_node():
> >>
> >>         /*
> >>          * Do not use VALID_BACKEND macro in raw mode.
> >>          * VALID_BACKEND return true only if the argument is master
> >>          * node id. In other words, standby nodes are false. So need
> >>          * to check backend status without VALID_BACKEND.
> >>          */
> >>
> >> And I'm actually in raw mode. VALID_BACKEND is used so much that it would
> >> be really dangerous to change it. So, I'm not sure what we really should
> >> do here. I've got a patch that fixes my issue cleanly, but I'm not sure
> >> it's the best way to do this. See the attached patch.
> > 
> > My suggestion is to leave this as it is for 3.0.4. I think we need more
> > time to investigate it. Let's continue the work after 3.0.4 is released.
> > We already have critical issues such as "unnamed statement not found"
> > with 3.0.3, and I have personally sent the 3.0-STABLE CVS tarball to
> > users who were troubled by this issue, at their request. If we delay
> > the 3.0.4 release, more and more questions and requests of this kind
> > will be coming. I don't want to be troubled...
> > 
> 
> I agree. I have no problem with dealing with this for 3.0.5, or even 3.1.
> 

Now that 3.0.4 is out, maybe it's the right time to work on this.

This issue is really a bad one. This week I received a mail from one of
our customers, complaining that the online recovery process doesn't work
because it thinks the node is still alive. And guess what... it uses the
VALID_BACKEND macro, even though pgpool was running in raw mode.

What could we do about this? My patch fixes the previous error, but not
this one. I would now be more in favor of a VALID_RAW_BACKEND macro.
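
To make the idea concrete, here is a minimal sketch of how such a macro
could differ from VALID_BACKEND. Everything in it (the node_status enum,
raw_mode, master_node_id, backend_status, node_status_is_up) is a
hypothetical stand-in, not pgpool's actual code, and the VALID_BACKEND
shown is only a simplified model of the raw-mode behavior described in the
get_next_master_node() comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for pgpool's internal state (assumptions). */
#define MAX_BACKENDS 4

typedef enum { NODE_UNUSED, NODE_CONNECT_WAIT, NODE_UP, NODE_DOWN } node_status;

static bool raw_mode = true;        /* assumed: pgpool is running in raw mode */
static int master_node_id = 0;      /* assumed: node 0 is the master */
static node_status backend_status[MAX_BACKENDS] = { NODE_UP };

/* True only if the stored status says the node is usable. */
static bool node_status_is_up(int id)
{
    assert(id >= 0 && id < MAX_BACKENDS);
    return backend_status[id] == NODE_UP ||
           backend_status[id] == NODE_CONNECT_WAIT;
}

/*
 * Simplified model of the behavior the existing comment describes:
 * in raw mode the answer depends only on whether the node is the
 * master, not on its status.
 */
#define VALID_BACKEND(id) \
    (raw_mode ? ((id) == master_node_id) : node_status_is_up(id))

/* Proposed (hypothetical) variant: always look at the real status. */
#define VALID_RAW_BACKEND(id) (node_status_is_up(id))
```

With node 0 detached (status NODE_DOWN) in raw mode, the modeled
VALID_BACKEND(0) still evaluates to true, while VALID_RAW_BACKEND(0)
evaluates to false, which is the answer a failback request or online
recovery would need. Adding a separate macro also leaves the many existing
VALID_BACKEND call sites untouched.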


-- 
Guillaume
  http://blog.guillaume.lelarge.info
  http://www.dalibo.com


