[Pgpool-hackers] Health check retries (patch)
Tatsuo Ishii
ishii at sraoss.co.jp
Sat Nov 19 04:53:25 UTC 2011
Matt,
Thank you! The patch looks pretty good. I committed it with a few
modifications:
http://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=55199bdfa7630cf9a5142703ef85ee7695bb4221
1) While retrying, emit a LOG message (rather than a DEBUG message). This is
more useful for DBAs because it makes clear that pgpool is trying to
recover. Here is a sample message:
2011-11-19 13:23:12 LOG: pid 10375: health check retry sleep time: 1 second(s)
2) After a successful retry, emit a LOG message:
2011-11-19 13:23:19 LOG: pid 10375: after some retrying backend returned to healthy state
BTW, to make the new feature work better, I think it's best to turn
off fail_over_on_backend_error. Even while health checking is retrying,
an error writing to a backend socket causes an immediate failover if
fail_over_on_backend_error is set to on.
Also, new_connection() was fixed because it caused an immediate failover
when trying to connect to a backend, even though fail_over_on_backend_error
was set to off.
Could you provide English documentation for this?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
> Hi everyone. In August, I wrote to the pgpool-general list (see below) asking if there was any
> way to have pgpool-II retry a failed health check before promoting the slave.
>
> I'm attaching a patch that adds this functionality. Would anyone care to review it? We've been
> using it successfully in production for about 3 months now, and it's working great.
>
> This is my first time submitting a patch to PostgreSQL or PgPool, so go easy :-).
>
> Some comments:
> - The purpose of this feature is to allow pgpool-II to handle brief networking interruptions
> without being "fooled" into thinking that the master node is down and the slave needs to
> be promoted.
> - This patch adds two new configuration settings.
> - The "health_check_max_retries" setting is the maximum number of times to retry a health
> check before giving up.
> - The "health_check_retry_delay" is the amount of time (in seconds) to sleep between retries.
> - The feature is turned *off* by default (health_check_max_retries defaults to 0, or no retries).
>
> Patch is against git HEAD revision (commit 58043c962b8305507de0f450be74c24cbe4c8430).
>
> Please let me know if you have any questions or comments.
>
> -- Matt
>
> Begin forwarded message:
>
>> From: Matt Solnit <msolnit at soasta.com>
>> Subject: Re: [Pgpool-general] Can pgpool-II retry failed health checks?
>> Date: August 4, 2011 10:57:27 PM PDT
>> To: Guillaume Lelarge <guillaume at lelarge.info>
>> Cc: "pgpool-general at pgfoundry.org" <pgpool-general at pgfoundry.org>
>>
>> On Aug 4, 2011, at 10:54 PM, Guillaume Lelarge wrote:
>>
>>> On Fri, 2011-08-05 at 00:17 -0400, Matt Solnit wrote:
>>>> On Jul 29, 2011, at 10:37 PM, Matthew Solnit wrote:
>>>>
>>>>> Hi everyone. I'm using pgpool-II 3.0.4 with PostgreSQL 9.0.2, in streaming replication mode. We've had
>>>>> a couple of cases where pgpool-II got a network timeout while performing a health check on the master
>>>>> node, and then immediately initiated failover and promoted the slave. This was a problem in our case
>>>>> because the master was actually fine -- there was just a temporary network "hiccup" that caused a timeout.
>>>>>
>>>>> Is there any way to configure pgpool-II to retry in this case? I couldn't find one in the documentation.
>>>>>
>>>>> I did see the "Unplugged Wire" thread (http://pgfoundry.org/pipermail/pgpool-general/2010-March/002589.html),
>>>>> which indicates that there was a single retry at one point, which was removed. But what I am more interested
>>>>> in is a configurable number of retries, with a configurable delay between retries.
>>>>>
>>>>> -- Matt
>>>>
>>>> Hi everyone. I just wanted to try one more time to get an answer for this :-). We would really, really
>>>> like to find a solution.
>>>>
>>>
>>> That kind of configuration doesn't exist right now, but it could be interesting
>>> to add to a future release.
>>>
>>>
>>> --
>>> Guillaume
>>> http://blog.guillaume.lelarge.info
>>> http://www.dalibo.com
>>>
>>
>> Thanks. That's what I thought, but it's good to have it confirmed.
>>
>> -- Matt