[Pgpool-hackers] Health check retries (patch)
Matt Solnit
msolnit at soasta.com
Fri Nov 18 21:28:44 UTC 2011
Hi everyone. In August, I wrote to the pgpool-general list (see below) asking if there was any
way to have pgpool-II retry a failed health check before promoting the slave.
I'm attaching a patch that adds this functionality. Would anyone care to review it? We've been
using it successfully in production for about 3 months now, and it's working great.
This is my first time submitting a patch to PostgreSQL or PgPool, so go easy :-).
Some comments:
- The purpose of this feature is to allow pgpool-II to handle brief networking interruptions
without being "fooled" into thinking that the master node is down and the slave needs to
be promoted.
- This patch adds two new configuration settings.
- The "health_check_max_retries" setting is the maximum number of times to retry a health
check before giving up.
- The "health_check_retry_delay" setting is the amount of time (in seconds) to sleep between retries.
- The feature is turned *off* by default (health_check_max_retries defaults to 0, or no retries).
Patch is against git HEAD revision (commit 58043c962b8305507de0f450be74c24cbe4c8430).
Please let me know if you have any questions or comments.
-- Matt
Begin forwarded message:
> From: Matt Solnit <msolnit at soasta.com>
> Subject: Re: [Pgpool-general] Can pgpool-II retry failed health checks?
> Date: August 4, 2011 10:57:27 PM PDT
> To: Guillaume Lelarge <guillaume at lelarge.info>
> Cc: "pgpool-general at pgfoundry.org" <pgpool-general at pgfoundry.org>
>
> On Aug 4, 2011, at 10:54 PM, Guillaume Lelarge wrote:
>
>> On Fri, 2011-08-05 at 00:17 -0400, Matt Solnit wrote:
>>> On Jul 29, 2011, at 10:37 PM, Matthew Solnit wrote:
>>>
>>>> Hi everyone. I'm using pgpool-II 3.0.4 with PostgreSQL 9.0.2, in streaming replication mode. We've had
>>>> a couple of cases where pgpool-II got a network timeout while performing a health check on the master
>>>> node, and then immediately initiated failover and promoted the slave. This was a problem in our case
>>>> because the master was actually fine -- there was just a temporary network "hiccup" that caused a timeout.
>>>>
>>>> Is there any way to configure pgpool-II to retry in this case? I couldn't find one in the documentation.
>>>>
>>>> I did see the "Unplugged Wire" thread (http://pgfoundry.org/pipermail/pgpool-general/2010-March/002589.html),
>>>> which indicates that there was a single retry at one point, which was removed. But what I am more interested
>>>> in is a configurable number of retries, with a configurable delay between retries.
>>>>
>>>> -- Matt
>>>
>>> Hi everyone. I just wanted to try one more time to get an answer for this :-). We would really, really
>>> like to find a solution.
>>>
>>
>> That kind of configuration doesn't exist right now, but could be interesting
>> to add to a future release.
>>
>>
>> --
>> Guillaume
>> http://blog.guillaume.lelarge.info
>> http://www.dalibo.com
>>
>
> Thanks. That's what I thought, but it's good to have it confirmed.
>
> -- Matt
-------------- next part --------------
A non-text attachment was scrubbed...
Name: health_check_retries.patch
Type: application/octet-stream
Size: 8300 bytes
Desc: health_check_retries.patch
URL: <http://pgfoundry.org/pipermail/pgpool-hackers/attachments/20111118/3cc2b019/attachment.obj>