[pgpool-general: 182] Re: Healthcheck timeout not always respected

Stevo Slavić sslavic at gmail.com
Fri Jan 20 01:09:13 JST 2012


Tatsuo,

Here are the patches which should be applied to the current pgpool head to
fix this issue:

Fixes-health-check-timeout.patch
Fixes-health-check-retrying-after-failover.patch
Fixes-clearing-exitrequest-flag.patch

The quirk I noticed in the logs is resolved as well - after failover, pgpool
would perform a health check and report it as the (max retries + 1)th health
check, which was confusing. I've adjusted it so that after failover pgpool
performs, and reports, a new health check cycle.

I've tested it and it works well - raw mode, backends set to disallow
failover, failover on backend error disabled, and health checks configured
with retries (30sec interval, 5sec timeout, 2 retries, 10sec delay between
retries).
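
For reference, the health check part of the pgpool.conf used in that test
looks roughly like this (parameter names as in pgpool-II 3.1; the values are
the ones quoted above):

# health check: 30sec interval, 5sec timeout, 2 retries, 10sec retry delay
health_check_period      = 30
health_check_timeout     = 5
health_check_max_retries = 2
health_check_retry_delay = 10

# failover should be driven by the health check only
fail_over_on_backend_error = off
backend_flag0              = 'DISALLOW_TO_FAILOVER'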

Please test, and if confirmed OK, include the patches in the next release.

Kind regards,
Stevo.


2012/1/16 Stevo Slavić <sslavic at gmail.com>

> Here are pgpool.log, strace.out, and pgpool.conf from a test with my
> latest patch for the health check timeout applied. It works well, except
> for a single quirk: after failover completed, the log reported that a 3rd
> health check retry was done (even though only 2 are configured, see
> pgpool.conf) and that the backend had returned to a healthy state. The
> interesting part of the log file follows:
>
> Jan 16 01:31:45 sslavic pgpool[1163]: 2012-01-16 01:31:45 DEBUG: pid 1163:
> retrying 3 th health checking
> Jan 16 01:31:45 sslavic pgpool[1163]: 2012-01-16 01:31:45 DEBUG: pid 1163:
> health_check: 0 th DB node status: 3
> Jan 16 01:31:45 sslavic pgpool[1163]: 2012-01-16 01:31:45 LOG:   pid 1163:
> after some retrying backend returned to healthy state
> Jan 16 01:32:15 sslavic pgpool[1163]: 2012-01-16 01:32:15 DEBUG: pid 1163:
> starting health checking
> Jan 16 01:32:15 sslavic pgpool[1163]: 2012-01-16 01:32:15 DEBUG: pid 1163:
> health_check: 0 th DB node status: 3
>
>
> As can be seen in pgpool.conf, only one backend is configured. pgpool
> failed over correctly once the health check max retries had been reached
> (pgpool degraded that single backend to status 3 and restarted the child
> processes).
>
> After this quirk was logged, the subsequent health check logs were as
> expected. Apart from those couple of odd log entries, everything seems to
> be OK. Maybe the quirk is a corner case of the single-backend
> configuration. Tomorrow I'll check whether it occurs with a dual-backend
> configuration.
>
> Regards,
> Stevo.
>
>
> 2012/1/16 Stevo Slavić <sslavic at gmail.com>
>
>> Hello Tatsuo,
>>
>> Unfortunately, with your patch, when A is on
>> (pool_config->health_check_period > 0) and B is on, failover will still be
>> disallowed once the retry count is exhausted, because B is on.
>>
>> Nenad's patch allows failover to be triggered only by the health check.
>> Here is a patch which includes Nenad's fix and also fixes the issue of the
>> health check timeout not being respected.
>>
>> Key points in the fix for respecting the health check timeout are:
>> - in pool_connection_pool.c, connect_inet_domain_socket_by_port: before
>> trying to connect, the file descriptor is set to non-blocking mode, and the
>> non-blocking error codes EINPROGRESS and EALREADY are handled (please
>> verify the changes here, especially regarding closing the fd)
>> - in main.c, health_check_timer_handler has been changed to signal
>> exit_request to the health-check-initiated connect_inet_domain_socket_by_port
>> call (please verify this; maybe there is a better way for
>> connect_inet_domain_socket_by_port to check whether
>> health_check_timer_expired has been set to 1)
>>
>> These changes effectively make the connect attempt non-blocking and
>> repeated (see the sketch below) until:
>> - the connection is made, or
>> - an unhandled connection error condition is reached, or
>> - the health check timer alarm has been raised, or
>> - some other exit request (shutdown) has been issued.
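>>
>> A minimal sketch of that loop (not the actual patch: only
>> health_check_timer_expired and exit_request are names from pgpool/this
>> thread, the rest is simplified and assumes the usual <fcntl.h>, <errno.h>
>> and <unistd.h> includes):
>>
>> /* put the health check socket into non-blocking mode */
>> fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
>>
>> for (;;)
>> {
>>     if (health_check_timer_expired || exit_request)
>>     {
>>         close(fd);                /* timer alarm or shutdown: give up */
>>         return -1;
>>     }
>>
>>     if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) == 0
>>         || errno == EISCONN)
>>         break;                    /* connection is made */
>>
>>     if (errno != EINPROGRESS && errno != EALREADY && errno != EINTR)
>>     {
>>         close(fd);                /* unhandled connection error */
>>         return -1;
>>     }
>>
>>     /* still connecting; the real code would select()/poll() here rather
>>        than spin */
>> }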
>>
>>
>> Kind regards,
>> Stevo.
>>
>> 2012/1/15 Tatsuo Ishii <ishii at postgresql.org>
>>
>>> Ok, let me clarify use cases regarding failover.
>>>
>>> Currently there are three parameters:
>>> a) health_check
>>> b) DISALLOW_TO_FAILOVER
>>> c) fail_over_on_backend_error
>>>
>>> The sources of errors which can trigger failover are: 1) health check,
>>> 2) write to backend socket, 3) read from backend socket. I represent
>>> 1) as A, 2) as B, and 3) as C.
>>>
>>> 1) trigger failover if A or B or C is error
>>> a = on, b = off, c = on
>>>
>>> 2) trigger failover only when B or C is error
>>> a = off, b = off, c = on
>>>
>>> 3) trigger failover only when B is error
>>> Impossible. Because C error always triggers failover.
>>>
>>> 4) trigger failover only when C is error
>>> a = off, b = off, c = off
>>>
>>> 5) trigger failover only when A is error(Stevo wants this)
>>> Impossible. Because C error always triggers failover.
>>>
>>> 6) never trigger failover
>>> Impossible. Because C error always triggers failover.
>>>
>>> As you can see, C is the problem here (look at #3, #5 and #6)
>>>
>>> If we implemented this:
>>> >> However I think we should disable failover if DISALLOW_TO_FAILOVER set
>>> >> in case of reading data from backend. This should have been done when
>>> >> DISALLOW_TO_FAILOVER was introduced because this is exactly what
>>> >> DISALLOW_TO_FAILOVER tries to accomplish. What do you think?
>>>
>>> 1) trigger failover if A or B or C is error
>>> a = on, b = off, c = on
>>>
>>> 2) trigger failover only when B or C is error
>>> a = off, b = off, c = on
>>>
>>> 3) trigger failover only when B is error
>>> a = off, b = on, c = on
>>>
>>> 4) trigger failover only when C is error
>>> a = off, b = off, c = off
>>>
>>> 5) trigger failover only when A is error(Stevo wants this)
>>> a = on, b = on, c = off
>>>
>>> 6) never trigger failover
>>> a = off, b = on, c = off
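>>>
>>> Put differently, the table reads as this rule (a sketch only, not actual
>>> pgpool code; a_on/b_on/c_on stand for the three parameters above):
>>>
>>> /* returns non-zero if an error from the given source triggers failover */
>>> int trigger_failover_on(char src, int a_on, int b_on, int c_on)
>>> {
>>>     switch (src)
>>>     {
>>>     case 'A': return a_on;   /* health check error: needs health_check */
>>>     case 'B': return c_on;   /* write error: needs fail_over_on_backend_error */
>>>     case 'C': return !b_on;  /* read error: blocked by DISALLOW_TO_FAILOVER */
>>>     }
>>>     return 0;
>>> }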
>>>
>>> So it seems my patch will solve all the problems including yours.
>>> (timeout while retrying is another issue of course).
>>> --
>>> Tatsuo Ishii
>>> SRA OSS, Inc. Japan
>>> English: http://www.sraoss.co.jp/index_en.php
>>> Japanese: http://www.sraoss.co.jp
>>>
>>> > I agree, fail_over_on_backend_error isn't useful, just adds confusion
>>> by
>>> > overlapping with DISALLOW_TO_FAILOVER.
>>> >
>>> > With your patch or without it, it is not possible to failover only on
>>> > health check (max retries) failure. With Nenad's patch, that part
>>> works ok
>>> > and I think that patch is semantically ok - failover occurs even though
>>> > DISALLOW_TO_FAILOVER is set for backend but only when health check is
>>> > configured too. Configuring health check without failover on failed
>>> health
>>> > check has no purpose. Also health check configured with allowed
>>> failover on
>>> > any condition other than health check (max retries) failure has no
>>> purpose.
>>> >
>>> > Kind regards,
>>> > Stevo.
>>> >
>>> > 2012/1/15 Tatsuo Ishii <ishii at postgresql.org>
>>> >
>>> >> fail_over_on_backend_error has different meaning from
>>> >> DISALLOW_TO_FAILOVER. From the doc:
>>> >>
>>> >>  If true, and an error occurs when writing to the backend
>>> >>  communication, pgpool-II will trigger the fail over procedure . This
>>> >>  is the same behavior as of pgpool-II 2.2.x or earlier. If set to
>>> >>  false, pgpool will report an error and disconnect the session.
>>> >>
>>> >> This means that if pgpool fails to read from the backend, it will trigger
>>> >> failover even if fail_over_on_backend_error is off. So unconditionally
>>> >> disabling failover would introduce a backward incompatibility.
>>> >>
>>> >> However I think we should disable failover if DISALLOW_TO_FAILOVER set
>>> >> in case of reading data from backend. This should have been done when
>>> >> DISALLOW_TO_FAILOVER was introduced because this is exactly what
>>> >> DISALLOW_TO_FAILOVER tries to accomplish. What do you think?
>>> >> --
>>> >> Tatsuo Ishii
>>> >> SRA OSS, Inc. Japan
>>> >> English: http://www.sraoss.co.jp/index_en.php
>>> >> Japanese: http://www.sraoss.co.jp
>>> >>
>>> >> > For a moment I thought we could have set fail_over_on_backend_error
>>> to
>>> >> off,
>>> >> > and have backends set with ALLOW_TO_FAILOVER flag. But then I
>>> looked in
>>> >> > code.
>>> >> >
>>> >> > In child.c there is a loop child process goes through in its
>>> lifetime.
>>> >> When
>>> >> > fatal error condition occurs before child process exits it will call
>>> >> > notice_backend_error which will call degenerate_backend_set which
>>> will
>>> >> not
>>> >> > take into account fail_over_on_backend_error is set to off, causing
>>> >> backend
>>> >> > to be degenerated and failover to occur. That's why we have
>>> backends set
>>> >> > with DISALLOW_TO_FAILOVER but with our patch applied, health check
>>> could
>>> >> > cause failover to occur as expected.
>>> >> >
>>> >> > Maybe it would be enough just to modify degenerate_backend_set, to
>>> take
>>> >> > fail_over_on_backend_error into account just like it already takes
>>> >> > DISALLOW_TO_FAILOVER into account.
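>>> >> >
>>> >> > A hypothetical sketch of that idea (not the actual pgpool source; the
>>> >> > helper names are made up, only degenerate_backend_set and the two
>>> >> > settings are real):
>>> >> >
>>> >> > /* inside degenerate_backend_set(), for each node in node_id_set: */
>>> >> > if (backend_has_flag(node_id, DISALLOW_TO_FAILOVER))
>>> >> >     continue;                              /* existing check */
>>> >> > if (!pool_config->fail_over_on_backend_error)
>>> >> >     continue;                              /* proposed new check */
>>> >> > mark_backend_down(node_id);                /* degrade, then failover */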
>>> >> >
>>> >> > Kind regards,
>>> >> > Stevo.
>>> >> >
>>> >> > 2012/1/15 Stevo Slavić <sslavic at gmail.com>
>>> >> >
>>> >> >> Yes and that behaviour which you describe as expected, is not what
>>> we
>>> >> >> want. We want pgpool to degrade backend0 and failover when
>>> configured
>>> >> max
>>> >> >> health check retries have failed, and to failover only in that
>>> case, so
>>> >> not
>>> >> >> sooner e.g. connection/child error condition, but as soon as max
>>> health
>>> >> >> check retries have been attempted.
>>> >> >>
>>> >> >> Maybe examples will be more clear.
>>> >> >>
>>> >> >> Imagine two nodes (node 1 and node 2). On each node a single
>>> pgpool and
>>> >> a
>>> >> >> single backend. Apps/clients access db through pgpool on their own
>>> node.
>>> >> >> Two backends are configured in postgres native streaming
>>> replication.
>>> >> >> pgpools are used in raw mode. Both pgpools have same backend as
>>> >> backend0,
>>> >> >> and same backend as backend1.
>>> >> >> initial state: both backends are up and pgpool can access them,
>>> clients
>>> >> >> connect to their pgpool and do their work on master backend,
>>> backend0.
>>> >> >>
>>> >> >> 1st case: unmodified/non-patched pgpool 3.1.1 is used, backends are
>>> >> >> configured with ALLOW_TO_FAILOVER flag
>>> >> >> - temporary network outage happens between pgpool on node 2 and
>>> backend0
>>> >> >> - error condition is reported by child process, and since
>>> >> >> ALLOW_TO_FAILOVER is set, pgpool performs failover without giving
>>> >> chance to
>>> >> >> pgpool health check retries to control whether backend is just
>>> >> temporarily
>>> >> >> inaccessible
>>> >> >> - failover command on node 2 promotes standby backend to a new
>>> master -
>>> >> >> split brain occurs, with two masters
>>> >> >>
>>> >> >>
>>> >> >> 2nd case: unmodified/non-patched pgpool 3.1.1 is used, backends are
>>> >> >> configured with DISALLOW_TO_FAILOVER
>>> >> >> - temporary network outage happens between pgpool on node 2 and
>>> backend0
>>> >> >> - error condition is reported by child process, and since
>>> >> >> DISALLOW_TO_FAILOVER is set, pgpool does not perform failover
>>> >> >> - health check gets a chance to check backend0 condition,
>>> determines
>>> >> that
>>> >> >> it's not accessible, there will be no health check retries because
>>> >> >> DISALLOW_TO_FAILOVER is set, no failover occurs ever
>>> >> >>
>>> >> >>
>>> >> >> 3rd case, pgpool 3.1.1 + patch you've sent applied, and backends
>>> >> >> configured with DISALLOW_TO_FAILOVER
>>> >> >> - temporary network outage happens between pgpool on node 2 and
>>> backend0
>>> >> >> - error condition is reported by child process, and since
>>> >> >> DISALLOW_TO_FAILOVER is set, pgpool does not perform failover
>>> >> >> - health check gets a chance to check backend0 condition,
>>> determines
>>> >> that
>>> >> >> it's not accessible, health check retries happen, and even after
>>> max
>>> >> >> retries, no failover happens since failover is disallowed
>>> >> >>
>>> >> >>
>>> >> >> 4th expected behaviour, pgpool 3.1.1 + patch we sent, and backends
>>> >> >> configured with DISALLOW_TO_FAILOVER
>>> >> >> - temporary network outage happens between pgpool on node 2 and
>>> backend0
>>> >> >> - error condition is reported by child process, and since
>>> >> >> DISALLOW_TO_FAILOVER is set, pgpool does not perform failover
>>> >> >> - health check gets a chance to check backend0 condition,
>>> determines
>>> >> that
>>> >> >> it's not accessible, health check retries happen, before a max
>>> retry
>>> >> >> network condition is cleared, retry happens, and backend0 remains
>>> to be
>>> >> >> master, no failover occurs, temporary network issue did not cause
>>> split
>>> >> >> brain
>>> >> >> - after some time, temporary network outage happens again between
>>> pgpool
>>> >> >> on node 2 and backend0
>>> >> >> - error condition is reported by child process, and since
>>> >> >> DISALLOW_TO_FAILOVER is set, pgpool does not perform failover
>>> >> >> - health check gets a chance to check backend0 condition,
>>> determines
>>> >> that
>>> >> >> it's not accessible, health check retries happen, after max retries
>>> >> >> backend0 is still not accessible, failover happens, standby is new
>>> >> master
>>> >> >> and backend0 is degraded
>>> >> >>
>>> >> >> Kind regards,
>>> >> >> Stevo.
>>> >> >>
>>> >> >>
>>> >> >> 2012/1/15 Tatsuo Ishii <ishii at postgresql.org>
>>> >> >>
>>> >> >>> In my test environment, the patch works as expected. I have two
>>> >> >>> backends. Health check retry conf is as follows:
>>> >> >>>
>>> >> >>> health_check_max_retries = 3
>>> >> >>> health_check_retry_delay = 1
>>> >> >>>
>>> >> >>> 5 09:17:20 LOG:   pid 21411: Backend status file
>>> /home/t-ishii/work/
>>> >> >>> git.postgresql.org/test/log/pgpool_status discarded
>>> >> >>> 2012-01-15 09:17:20 LOG:   pid 21411: pgpool-II successfully
>>> started.
>>> >> >>> version 3.2alpha1 (hatsuiboshi)
>>> >> >>> 2012-01-15 09:17:20 LOG:   pid 21411: find_primary_node: primary
>>> node
>>> >> id
>>> >> >>> is 0
>>> >> >>> -- backend1 was shutdown
>>> >> >>>
>>> >> >>> 2012-01-15 09:17:50 ERROR: pid 21445:
>>> >> connect_unix_domain_socket_by_port:
>>> >> >>> connect() failed to /tmp/.s.PGSQL.11001: No such file or directory
>>> >> >>> 2012-01-15 09:17:50 ERROR: pid 21445:
>>> make_persistent_db_connection:
>>> >> >>> connection to /tmp(11001) failed
>>> >> >>> 2012-01-15 09:17:50 ERROR: pid 21445: check_replication_time_lag:
>>> could
>>> >> >>> not connect to DB node 1, check sr_check_user and
>>> sr_check_password
>>> >> >>> 2012-01-15 09:17:50 ERROR: pid 21411:
>>> >> connect_unix_domain_socket_by_port:
>>> >> >>> connect() failed to /tmp/.s.PGSQL.11001: No such file or directory
>>> >> >>> 2012-01-15 09:17:50 ERROR: pid 21411:
>>> make_persistent_db_connection:
>>> >> >>> connection to /tmp(11001) failed
>>> >> >>> 2012-01-15 09:17:50 ERROR: pid 21411:
>>> >> connect_unix_domain_socket_by_port:
>>> >> >>> connect() failed to /tmp/.s.PGSQL.11001: No such file or directory
>>> >> >>> 2012-01-15 09:17:50 ERROR: pid 21411:
>>> make_persistent_db_connection:
>>> >> >>> connection to /tmp(11001) failed
>>> >> >>> -- health check failed
>>> >> >>>
>>> >> >>> 2012-01-15 09:17:50 ERROR: pid 21411: health check failed. 1 th
>>> host
>>> >> /tmp
>>> >> >>> at port 11001 is down
>>> >> >>> -- start retrying
>>> >> >>> 2012-01-15 09:17:50 LOG:   pid 21411: health check retry sleep
>>> time: 1
>>> >> >>> second(s)
>>> >> >>> 2012-01-15 09:17:51 ERROR: pid 21411:
>>> >> connect_unix_domain_socket_by_port:
>>> >> >>> connect() failed to /tmp/.s.PGSQL.11001: No such file or directory
>>> >> >>> 2012-01-15 09:17:51 ERROR: pid 21411:
>>> make_persistent_db_connection:
>>> >> >>> connection to /tmp(11001) failed
>>> >> >>> 2012-01-15 09:17:51 ERROR: pid 21411: health check failed. 1 th
>>> host
>>> >> /tmp
>>> >> >>> at port 11001 is down
>>> >> >>> 2012-01-15 09:17:51 LOG:   pid 21411: health check retry sleep
>>> time: 1
>>> >> >>> second(s)
>>> >> >>> 2012-01-15 09:17:52 ERROR: pid 21411:
>>> >> connect_unix_domain_socket_by_port:
>>> >> >>> connect() failed to /tmp/.s.PGSQL.11001: No such file or directory
>>> >> >>> 2012-01-15 09:17:52 ERROR: pid 21411:
>>> make_persistent_db_connection:
>>> >> >>> connection to /tmp(11001) failed
>>> >> >>> 2012-01-15 09:17:52 ERROR: pid 21411: health check failed. 1 th
>>> host
>>> >> /tmp
>>> >> >>> at port 11001 is down
>>> >> >>> 2012-01-15 09:17:52 LOG:   pid 21411: health check retry sleep
>>> time: 1
>>> >> >>> second(s)
>>> >> >>> 2012-01-15 09:17:53 ERROR: pid 21411:
>>> >> connect_unix_domain_socket_by_port:
>>> >> >>> connect() failed to /tmp/.s.PGSQL.11001: No such file or directory
>>> >> >>> 2012-01-15 09:17:53 ERROR: pid 21411:
>>> make_persistent_db_connection:
>>> >> >>> connection to /tmp(11001) failed
>>> >> >>> 2012-01-15 09:17:53 ERROR: pid 21411: health check failed. 1 th
>>> host
>>> >> /tmp
>>> >> >>> at port 11001 is down
>>> >> >>> 2012-01-15 09:17:53 LOG:   pid 21411: health_check: 1 failover is
>>> >> canceld
>>> >> >>> because failover is disallowed
>>> >> >>> -- after 3 retries, pgpool wanted to failover, but gave up because
>>> >> >>> DISALLOW_TO_FAILOVER is set for backend1
>>> >> >>>
>>> >> >>> 2012-01-15 09:18:00 ERROR: pid 21445:
>>> >> connect_unix_domain_socket_by_port:
>>> >> >>> connect() failed to /tmp/.s.PGSQL.11001: No such file or directory
>>> >> >>> 2012-01-15 09:18:00 ERROR: pid 21445:
>>> make_persistent_db_connection:
>>> >> >>> connection to /tmp(11001) failed
>>> >> >>> 2012-01-15 09:18:00 ERROR: pid 21445: check_replication_time_lag:
>>> could
>>> >> >>> not connect to DB node 1, check sr_check_user and
>>> sr_check_password
>>> >> >>> 2012-01-15 09:18:03 ERROR: pid 21411:
>>> >> connect_unix_domain_socket_by_port:
>>> >> >>> connect() failed to /tmp/.s.PGSQL.11001: No such file or directory
>>> >> >>> 2012-01-15 09:18:03 ERROR: pid 21411:
>>> make_persistent_db_connection:
>>> >> >>> connection to /tmp(11001) failed
>>> >> >>> 2012-01-15 09:18:03 ERROR: pid 21411: health check failed. 1 th
>>> host
>>> >> /tmp
>>> >> >>> at port 11001 is down
>>> >> >>> 2012-01-15 09:18:03 LOG:   pid 21411: health check retry sleep
>>> time: 1
>>> >> >>> second(s)
>>> >> >>> 2012-01-15 09:18:04 ERROR: pid 21411:
>>> >> connect_unix_domain_socket_by_port:
>>> >> >>> connect() failed to /tmp/.s.PGSQL.11001: No such file or directory
>>> >> >>> 2012-01-15 09:18:04 ERROR: pid 21411:
>>> make_persistent_db_connection:
>>> >> >>> connection to /tmp(11001) failed
>>> >> >>> 2012-01-15 09:18:04 ERROR: pid 21411: health check failed. 1 th
>>> host
>>> >> /tmp
>>> >> >>> at port 11001 is down
>>> >> >>> 2012-01-15 09:18:04 LOG:   pid 21411: health check retry sleep
>>> time: 1
>>> >> >>> second(s)
>>> >> >>> 2012-01-15 09:18:05 LOG:   pid 21411: after some retrying backend
>>> >> >>> returned to healthy state
>>> >> >>> -- started backend1 and pgpool succeeded in health checking.
>>> Resumed
>>> >> >>> using backend1
>>> >> >>> --
>>> >> >>> Tatsuo Ishii
>>> >> >>> SRA OSS, Inc. Japan
>>> >> >>> English: http://www.sraoss.co.jp/index_en.php
>>> >> >>> Japanese: http://www.sraoss.co.jp
>>> >> >>>
>>> >> >>> > Hello Tatsuo,
>>> >> >>> >
>>> >> >>> > Thank you for the patch and effort, but unfortunately this
>>> change
>>> >> won't
>>> >> >>> > work for us. We need to set disallow failover to prevent
>>> failover on
>>> >> >>> child
>>> >> >>> > reported connection errors (it's ok if few clients lose their
>>> >> >>> connection or
>>> >> >>> > can not connect), and still have pgpool perform failover but
>>> only on
>>> >> >>> failed
>>> >> >>> > health check (if configured, after max retries threshold has
>>> been
>>> >> >>> reached).
>>> >> >>> >
>>> >> >>> > Maybe it would be best to add an extra value for backend_flag -
>>> >> >>> > ALLOW_TO_FAILOVER_ON_HEALTH_CHECK or
>>> >> >>> DISALLOW_TO_FAILOVER_ON_CHILD_ERROR.
>>> >> >>> > It should behave same as DISALLOW_TO_FAILOVER is set, with only
>>> >> >>> difference
>>> >> >>> > in behaviour when health check (if set, max retries) has failed
>>> -
>>> >> unlike
>>> >> >>> > DISALLOW_TO_FAILOVER, this new flag should allow failover in
>>> this
>>> >> case
>>> >> >>> only.
>>> >> >>> >
>>> >> >>> > Without this change health check (especially health check
>>> retries)
>>> >> >>> doesn't
>>> >> >>> > make much sense - child error is more likely to occur on
>>> (temporary)
>>> >> >>> > backend failure then health check and will or will not cause
>>> >> failover to
>>> >> >>> > occur depending on backend flag, without giving health check
>>> retries
>>> >> a
>>> >> >>> > chance to determine if failure was temporary or not, risking
>>> split
>>> >> brain
>>> >> >>> > situation with two masters just because of temporary network
>>> link
>>> >> >>> hiccup.
>>> >> >>> >
>>> >> >>> > Our main problem remains though with the health check timeout
>>> not
>>> >> being
>>> >> >>> > respected in these special conditions we have. Maybe Nenad can
>>> help
>>> >> you
>>> >> >>> > more to reproduce the issue on your environment.
>>> >> >>> >
>>> >> >>> > Kind regards,
>>> >> >>> > Stevo.
>>> >> >>> >
>>> >> >>> > 2012/1/13 Tatsuo Ishii <ishii at postgresql.org>
>>> >> >>> >
>>> >> >>> >> Thanks for pointing it out.
>>> >> >>> >> Yes, checking DISALLOW_TO_FAILOVER before retrying is wrong.
>>> >> >>> >> However, after retry count over, we should check
>>> >> DISALLOW_TO_FAILOVER I
>>> >> >>> >> think.
>>> >> >>> >> Attached is the patch attempt to fix it. Please try.
>>> >> >>> >> --
>>> >> >>> >> Tatsuo Ishii
>>> >> >>> >> SRA OSS, Inc. Japan
>>> >> >>> >> English: http://www.sraoss.co.jp/index_en.php
>>> >> >>> >> Japanese: http://www.sraoss.co.jp
>>> >> >>> >>
>>> >> >>> >> > pgpool is being used in raw mode - just for (health check
>>> based)
>>> >> >>> failover
>>> >> >>> >> > part, so applications are not required to restart when
>>> standby
>>> >> gets
>>> >> >>> >> > promoted to new master. Here is pgpool.conf file and a very
>>> small
>>> >> >>> patch
>>> >> >>> >> > we're using applied to pgpool 3.1.1 release.
>>> >> >>> >> >
>>> >> >>> >> > We have to have DISALLOW_TO_FAILOVER set for the backend
>>> since any
>>> >> >>> child
>>> >> >>> >> > process that detects condition that master/backend0 is not
>>> >> >>> available, if
>>> >> >>> >> > DISALLOW_TO_FAILOVER was not set, will degenerate backend
>>> without
>>> >> >>> giving
>>> >> >>> >> > health check a chance to retry. We need health check with
>>> retries
>>> >> >>> because
>>> >> >>> >> > condition that backend0 is not available could be temporary
>>> >> (network
>>> >> >>> >> > glitches to the remote site where master is, or deliberate
>>> >> failover
>>> >> >>> of
>>> >> >>> >> > master postgres service from one node to the other on remote
>>> site
>>> >> -
>>> >> >>> in
>>> >> >>> >> both
>>> >> >>> >> > cases remote means remote to the pgpool that is going to
>>> perform
>>> >> >>> health
>>> >> >>> >> > checks and ultimately the failover) and we don't want
>>> standby to
>>> >> be
>>> >> >>> >> > promoted as easily to a new master, to prevent temporary
>>> network
>>> >> >>> >> conditions
>>> >> >>> >> > which could occur frequently to frequently cause split brain
>>> with
>>> >> two
>>> >> >>> >> > masters.
>>> >> >>> >> >
>>> >> >>> >> > But then, with DISALLOW_TO_FAILOVER set, without the patch
>>> health
>>> >> >>> check
>>> >> >>> >> > will not retry and will thus give only one chance to backend
>>> (if
>>> >> >>> health
>>> >> >>> >> > check ever occurs before child process failure to connect to
>>> the
>>> >> >>> >> backend),
>>> >> >>> >> > rendering retry settings effectively to be ignored. That's
>>> where
>>> >> this
>>> >> >>> >> patch
>>> >> >>> >> > comes into action - enables health check retries while child
>>> >> >>> processes
>>> >> >>> >> are
>>> >> >>> >> > prevented to degenerate backend.
>>> >> >>> >> >
>>> >> >>> >> > I don't think, but I could be wrong, that this patch
>>> influences
>>> >> the
>>> >> >>> >> > behavior we're seeing with unwanted health check attempt
>>> delays.
>>> >> >>> Also,
>>> >> >>> >> > knowing this, maybe pgpool could be patched or some other
>>> support
>>> >> be
>>> >> >>> >> built
>>> >> >>> >> > into it to cover this use case.
>>> >> >>> >> >
>>> >> >>> >> > Regards,
>>> >> >>> >> > Stevo.
>>> >> >>> >> >
>>> >> >>> >> >
>>> >> >>> >> > 2012/1/12 Tatsuo Ishii <ishii at postgresql.org>
>>> >> >>> >> >
>>> >> >>> >> >> I have accepted the moderation request. Your post should be
>>> sent
>>> >> >>> >> shortly.
>>> >> >>> >> >> Also I have raised the post size limit to 1MB.
>>> >> >>> >> >> I will look into this...
>>> >> >>> >> >> --
>>> >> >>> >> >> Tatsuo Ishii
>>> >> >>> >> >> SRA OSS, Inc. Japan
>>> >> >>> >> >> English: http://www.sraoss.co.jp/index_en.php
>>> >> >>> >> >> Japanese: http://www.sraoss.co.jp
>>> >> >>> >> >>
>>> >> >>> >> >> > Here is the log file and strace output file (this time in
>>> an
>>> >> >>> archive,
>>> >> >>> >> >> > didn't know about 200KB constraint on post size which
>>> requires
>>> >> >>> >> moderator
>>> >> >>> >> >> > approval). Timings configured are 30sec health check
>>> interval,
>>> >> >>> 5sec
>>> >> >>> >> >> > timeout, and 2 retries with 10sec retry delay.
>>> >> >>> >> >> >
>>> >> >>> >> >> > It takes a lot more than 5sec from started health check to
>>> >> >>> sleeping
>>> >> >>> >> 10sec
>>> >> >>> >> >> > for first retry.
>>> >> >>> >> >> >
>>> >> >>> >> >> > Seen in code (main.x, health_check() function), within
>>> (retry)
>>> >> >>> attempt
>>> >> >>> >> >> > there is inner retry (first with postgres database then
>>> with
>>> >> >>> >> template1)
>>> >> >>> >> >> and
>>> >> >>> >> >> > that part doesn't seem to be interrupted by alarm.
>>> >> >>> >> >> >
>>> >> >>> >> >> > Regards,
>>> >> >>> >> >> > Stevo.
>>> >> >>> >> >> >
>>> >> >>> >> >> > 2012/1/12 Stevo Slavić <sslavic at gmail.com>
>>> >> >>> >> >> >
>>> >> >>> >> >> >> Here is the log file and strace output file. Timings
>>> >> configured
>>> >> >>> are
>>> >> >>> >> >> 30sec
>>> >> >>> >> >> >> health check interval, 5sec timeout, and 2 retries with
>>> 10sec
>>> >> >>> retry
>>> >> >>> >> >> delay.
>>> >> >>> >> >> >>
>>> >> >>> >> >> >> It takes a lot more than 5sec from started health check
>>> to
>>> >> >>> sleeping
>>> >> >>> >> >> 10sec
>>> >> >>> >> >> >> for first retry.
>>> >> >>> >> >> >>
>>> >> >>> >> >> >> Seen in code (main.x, health_check() function), within
>>> (retry)
>>> >> >>> >> attempt
>>> >> >>> >> >> >> there is inner retry (first with postgres database then
>>> with
>>> >> >>> >> template1)
>>> >> >>> >> >> and
>>> >> >>> >> >> >> that part doesn't seem to be interrupted by alarm.
>>> >> >>> >> >> >>
>>> >> >>> >> >> >> Regards,
>>> >> >>> >> >> >> Stevo.
>>> >> >>> >> >> >>
>>> >> >>> >> >> >>
>>> >> >>> >> >> >> 2012/1/11 Tatsuo Ishii <ishii at postgresql.org>
>>> >> >>> >> >> >>
>>> >> >>> >> >> >>> Ok, I will do it. In the mean time you could use
>>> "strace -tt
>>> >> -p
>>> >> >>> PID"
>>> >> >>> >> >> >>> to see which system call is blocked.
>>> >> >>> >> >> >>> --
>>> >> >>> >> >> >>> Tatsuo Ishii
>>> >> >>> >> >> >>> SRA OSS, Inc. Japan
>>> >> >>> >> >> >>> English: http://www.sraoss.co.jp/index_en.php
>>> >> >>> >> >> >>> Japanese: http://www.sraoss.co.jp
>>> >> >>> >> >> >>>
>>> >> >>> >> >> >>> > OK, got the info - key point is that ip forwarding is
>>> >> >>> disabled for
>>> >> >>> >> >> >>> security
>>> >> >>> >> >> >>> > reasons. Rules in iptables are not important,
>>> iptables can
>>> >> be
>>> >> >>> >> >> stopped,
>>> >> >>> >> >> >>> or
>>> >> >>> >> >> >>> > previously added rules removed.
>>> >> >>> >> >> >>> >
>>> >> >>> >> >> >>> > Here are the steps to reproduce (kudos to my colleague
>>> >> Nenad
>>> >> >>> >> >> Bulatovic
>>> >> >>> >> >> >>> for
>>> >> >>> >> >> >>> > providing this):
>>> >> >>> >> >> >>> >
>>> >> >>> >> >> >>> > 1.) make sure that ip forwarding is off:
>>> >> >>> >> >> >>> >     echo 0 > /proc/sys/net/ipv4/ip_forward
>>> >> >>> >> >> >>> > 2.) create IP alias on some interface (and have
>>> postgres
>>> >> >>> listen on
>>> >> >>> >> >> it):
>>> >> >>> >> >> >>> >     ip addr add x.x.x.x/yy dev ethz
>>> >> >>> >> >> >>> > 3.) set backend_hostname0 to aforementioned IP
>>> >> >>> >> >> >>> > 4.) start pgpool and monitor health checks
>>> >> >>> >> >> >>> > 5.) remove IP alias:
>>> >> >>> >> >> >>> >     ip addr del x.x.x.x/yy dev ethz
>>> >> >>> >> >> >>> >
>>> >> >>> >> >> >>> >
>>> >> >>> >> >> >>> > Here is the interesting part in pgpool log after this:
>>> >> >>> >> >> >>> > 2012-01-11 17:38:04 DEBUG: pid 24358: starting health
>>> >> checking
>>> >> >>> >> >> >>> > 2012-01-11 17:38:04 DEBUG: pid 24358: health_check: 0
>>> th DB
>>> >> >>> node
>>> >> >>> >> >> >>> status: 2
>>> >> >>> >> >> >>> > 2012-01-11 17:38:04 DEBUG: pid 24358: health_check: 1
>>> th DB
>>> >> >>> node
>>> >> >>> >> >> >>> status: 1
>>> >> >>> >> >> >>> > 2012-01-11 17:38:34 DEBUG: pid 24358: starting health
>>> >> checking
>>> >> >>> >> >> >>> > 2012-01-11 17:38:34 DEBUG: pid 24358: health_check: 0
>>> th DB
>>> >> >>> node
>>> >> >>> >> >> >>> status: 2
>>> >> >>> >> >> >>> > 2012-01-11 17:41:43 DEBUG: pid 24358: health_check: 0
>>> th DB
>>> >> >>> node
>>> >> >>> >> >> >>> status: 2
>>> >> >>> >> >> >>> > 2012-01-11 17:41:46 ERROR: pid 24358: health check
>>> failed.
>>> >> 0
>>> >> >>> th
>>> >> >>> >> host
>>> >> >>> >> >> >>> > 192.168.2.27 at port 5432 is down
>>> >> >>> >> >> >>> > 2012-01-11 17:41:46 LOG:   pid 24358: health check
>>> retry
>>> >> sleep
>>> >> >>> >> time:
>>> >> >>> >> >> 10
>>> >> >>> >> >> >>> > second(s)
>>> >> >>> >> >> >>> >
>>> >> >>> >> >> >>> > That pgpool was configured with health check interval
>>> of
>>> >> >>> 30sec,
>>> >> >>> >> 5sec
>>> >> >>> >> >> >>> > timeout, and 10sec retry delay with 2 max retries.
>>> >> >>> >> >> >>> >
>>> >> >>> >> >> >>> > Making use of libpq instead for connecting to db in
>>> health
>>> >> >>> checks
>>> >> >>> >> IMO
>>> >> >>> >> >> >>> > should resolve it, but you'll best determine which
>>> call
>>> >> >>> exactly
>>> >> >>> >> gets
>>> >> >>> >> >> >>> > blocked waiting. Btw, psql with PGCONNECT_TIMEOUT env
>>> var
>>> >> >>> >> configured
>>> >> >>> >> >> >>> > respects that env var timeout.
>>> >> >>> >> >> >>> >
>>> >> >>> >> >> >>> > Regards,
>>> >> >>> >> >> >>> > Stevo.
>>> >> >>> >> >> >>> >
>>> >> >>> >> >> >>> > On Wed, Jan 11, 2012 at 11:15 AM, Stevo Slavić <
>>> >> >>> sslavic at gmail.com
>>> >> >>> >> >
>>> >> >>> >> >> >>> wrote:
>>> >> >>> >> >> >>> >
>>> >> >>> >> >> >>> >> Tatsuo,
>>> >> >>> >> >> >>> >>
>>> >> >>> >> >> >>> >> Did you restart iptables after adding rule?
>>> >> >>> >> >> >>> >>
>>> >> >>> >> >> >>> >> Regards,
>>> >> >>> >> >> >>> >> Stevo.
>>> >> >>> >> >> >>> >>
>>> >> >>> >> >> >>> >>
>>> >> >>> >> >> >>> >> On Wed, Jan 11, 2012 at 11:12 AM, Stevo Slavić <
>>> >> >>> >> sslavic at gmail.com>
>>> >> >>> >> >> >>> wrote:
>>> >> >>> >> >> >>> >>
>>> >> >>> >> >> >>> >>> Looking into this to verify if these are all
>>> necessary
>>> >> >>> changes
>>> >> >>> >> to
>>> >> >>> >> >> have
>>> >> >>> >> >> >>> >>> port unreachable message silently rejected
>>> (suspecting
>>> >> some
>>> >> >>> >> kernel
>>> >> >>> >> >> >>> >>> parameter tuning is needed).
>>> >> >>> >> >> >>> >>>
>>> >> >>> >> >> >>> >>> Just to clarify it's not a problem that host is
>>> being
>>> >> >>> detected
>>> >> >>> >> by
>>> >> >>> >> >> >>> pgpool
>>> >> >>> >> >> >>> >>> to be down, but the timing when that happens. On
>>> >> environment
>>> >> >>> >> where
>>> >> >>> >> >> >>> issue is
>>> >> >>> >> >> >>> >>> reproduced pgpool as part of health check attempt
>>> tries
>>> >> to
>>> >> >>> >> connect
>>> >> >>> >> >> to
>>> >> >>> >> >> >>> >>> backend and hangs for tcp timeout instead of being
>>> >> >>> interrupted
>>> >> >>> >> by
>>> >> >>> >> >> >>> timeout
>>> >> >>> >> >> >>> >>> alarm. Can you verify/confirm please the health
>>> check
>>> >> retry
>>> >> >>> >> timings
>>> >> >>> >> >> >>> are not
>>> >> >>> >> >> >>> >>> delayed?
>>> >> >>> >> >> >>> >>>
>>> >> >>> >> >> >>> >>> Regards,
>>> >> >>> >> >> >>> >>> Stevo.
>>> >> >>> >> >> >>> >>>
>>> >> >>> >> >> >>> >>>
>>> >> >>> >> >> >>> >>> On Wed, Jan 11, 2012 at 10:50 AM, Tatsuo Ishii <
>>> >> >>> >> >> ishii at postgresql.org
>>> >> >>> >> >> >>> >wrote:
>>> >> >>> >> >> >>> >>>
>>> >> >>> >> >> >>> >>>> Ok, I did:
>>> >> >>> >> >> >>> >>>>
>>> >> >>> >> >> >>> >>>> # iptables -A FORWARD -j REJECT --reject-with
>>> >> >>> >> >> icmp-port-unreachable
>>> >> >>> >> >> >>> >>>>
>>> >> >>> >> >> >>> >>>> on the host where pgpoo is running. And pull
>>> network
>>> >> cable
>>> >> >>> from
>>> >> >>> >> >> >>> >>>> backend0 host network interface. Pgpool detected
>>> the
>>> >> host
>>> >> >>> being
>>> >> >>> >> >> down
>>> >> >>> >> >> >>> >>>> as expected...
>>> >> >>> >> >> >>> >>>> --
>>> >> >>> >> >> >>> >>>> Tatsuo Ishii
>>> >> >>> >> >> >>> >>>> SRA OSS, Inc. Japan
>>> >> >>> >> >> >>> >>>> English: http://www.sraoss.co.jp/index_en.php
>>> >> >>> >> >> >>> >>>> Japanese: http://www.sraoss.co.jp
>>> >> >>> >> >> >>> >>>>
>>> >> >>> >> >> >>> >>>> > Backend is not destination of this message,
>>> pgpool
>>> >> host
>>> >> >>> is,
>>> >> >>> >> and
>>> >> >>> >> >> we
>>> >> >>> >> >> >>> >>>> don't
>>> >> >>> >> >> >>> >>>> > want it to ever get it. With command I've sent
>>> you
>>> >> rule
>>> >> >>> will
>>> >> >>> >> be
>>> >> >>> >> >> >>> >>>> created for
>>> >> >>> >> >> >>> >>>> > any source and destination.
>>> >> >>> >> >> >>> >>>> >
>>> >> >>> >> >> >>> >>>> > Regards,
>>> >> >>> >> >> >>> >>>> > Stevo.
>>> >> >>> >> >> >>> >>>> >
>>> >> >>> >> >> >>> >>>> > On Wed, Jan 11, 2012 at 10:38 AM, Tatsuo Ishii <
>>> >> >>> >> >> >>> ishii at postgresql.org>
>>> >> >>> >> >> >>> >>>> wrote:
>>> >> >>> >> >> >>> >>>> >
>>> >> >>> >> >> >>> >>>> >> I did following:
>>> >> >>> >> >> >>> >>>> >>
>>> >> >>> >> >> >>> >>>> >> Do following on the host where pgpool is
>>> running on:
>>> >> >>> >> >> >>> >>>> >>
>>> >> >>> >> >> >>> >>>> >> # iptables -A FORWARD -j REJECT --reject-with
>>> >> >>> >> >> >>> icmp-port-unreachable -d
>>> >> >>> >> >> >>> >>>> >> 133.137.177.124
>>> >> >>> >> >> >>> >>>> >> (133.137.177.124 is the host where backend is
>>> running
>>> >> >>> on)
>>> >> >>> >> >> >>> >>>> >>
>>> >> >>> >> >> >>> >>>> >> Pull network cable from backend0 host network
>>> >> interface.
>>> >> >>> >> Pgpool
>>> >> >>> >> >> >>> >>>> >> detected the host being down as expected. Am I
>>> >> missing
>>> >> >>> >> >> something?
>>> >> >>> >> >> >>> >>>> >> --
>>> >> >>> >> >> >>> >>>> >> Tatsuo Ishii
>>> >> >>> >> >> >>> >>>> >> SRA OSS, Inc. Japan
>>> >> >>> >> >> >>> >>>> >> English: http://www.sraoss.co.jp/index_en.php
>>> >> >>> >> >> >>> >>>> >> Japanese: http://www.sraoss.co.jp
>>> >> >>> >> >> >>> >>>> >>
>>> >> >>> >> >> >>> >>>> >> > Hello Tatsuo,
>>> >> >>> >> >> >>> >>>> >> >
>>> >> >>> >> >> >>> >>>> >> > With backend0 on one host just configure
>>> following
>>> >> >>> rule on
>>> >> >>> >> >> other
>>> >> >>> >> >> >>> >>>> host
>>> >> >>> >> >> >>> >>>> >> where
>>> >> >>> >> >> >>> >>>> >> > pgpool is:
>>> >> >>> >> >> >>> >>>> >> >
>>> >> >>> >> >> >>> >>>> >> > iptables -A FORWARD -j REJECT --reject-with
>>> >> >>> >> >> >>> icmp-port-unreachable
>>> >> >>> >> >> >>> >>>> >> >
>>> >> >>> >> >> >>> >>>> >> > and then have pgpool startup with health
>>> checking
>>> >> and
>>> >> >>> >> >> retrying
>>> >> >>> >> >> >>> >>>> >> configured,
>>> >> >>> >> >> >>> >>>> >> > and then pull network cable from backend0 host
>>> >> network
>>> >> >>> >> >> >>> interface.
>>> >> >>> >> >> >>> >>>> >> >
>>> >> >>> >> >> >>> >>>> >> > Regards,
>>> >> >>> >> >> >>> >>>> >> > Stevo.
>>> >> >>> >> >> >>> >>>> >> >
>>> >> >>> >> >> >>> >>>> >> > On Wed, Jan 11, 2012 at 6:27 AM, Tatsuo Ishii
>>> <
>>> >> >>> >> >> >>> ishii at postgresql.org
>>> >> >>> >> >> >>> >>>> >
>>> >> >>> >> >> >>> >>>> >> wrote:
>>> >> >>> >> >> >>> >>>> >> >
>>> >> >>> >> >> >>> >>>> >> >> I want to try to test the situation you
>>> descrived:
>>> >> >>> >> >> >>> >>>> >> >>
>>> >> >>> >> >> >>> >>>> >> >> >> > When system is configured for security
>>> >> reasons
>>> >> >>> not
>>> >> >>> >> to
>>> >> >>> >> >> >>> return
>>> >> >>> >> >> >>> >>>> >> >> destination
>>> >> >>> >> >> >>> >>>> >> >> >> > host unreachable messages, even though
>>> >> >>> >> >> >>> health_check_timeout is
>>> >> >>> >> >> >>> >>>> >> >>
>>> >> >>> >> >> >>> >>>> >> >> But I don't know how to do it. I pulled out
>>> the
>>> >> >>> network
>>> >> >>> >> >> cable
>>> >> >>> >> >> >>> and
>>> >> >>> >> >> >>> >>>> >> >> pgpool detected it as expected. Also I
>>> configured
>>> >> the
>>> >> >>> >> server
>>> >> >>> >> >> >>> which
>>> >> >>> >> >> >>> >>>> >> >> PostgreSQL is running on to disable the 5432
>>> >> port. In
>>> >> >>> >> this
>>> >> >>> >> >> case
>>> >> >>> >> >> >>> >>>> >> >> connect(2) returned EHOSTUNREACH (No route to
>>> >> host)
>>> >> >>> so
>>> >> >>> >> >> pgpool
>>> >> >>> >> >> >>> >>>> detected
>>> >> >>> >> >> >>> >>>> >> >> the error as expected.
>>> >> >>> >> >> >>> >>>> >> >>
>>> >> >>> >> >> >>> >>>> >> >> Could you please instruct me?
>>> >> >>> >> >> >>> >>>> >> >> --
>>> >> >>> >> >> >>> >>>> >> >> Tatsuo Ishii
>>> >> >>> >> >> >>> >>>> >> >> SRA OSS, Inc. Japan
>>> >> >>> >> >> >>> >>>> >> >> English:
>>> http://www.sraoss.co.jp/index_en.php
>>> >> >>> >> >> >>> >>>> >> >> Japanese: http://www.sraoss.co.jp
>>> >> >>> >> >> >>> >>>> >> >>
>>> >> >>> >> >> >>> >>>> >> >> > Hello Tatsuo,
>>> >> >>> >> >> >>> >>>> >> >> >
>>> >> >>> >> >> >>> >>>> >> >> > Thank you for replying!
>>> >> >>> >> >> >>> >>>> >> >> >
>>> >> >>> >> >> >>> >>>> >> >> > I'm not sure what exactly is blocking,
>>> just by
>>> >> >>> pgpool
>>> >> >>> >> code
>>> >> >>> >> >> >>> >>>> analysis I
>>> >> >>> >> >> >>> >>>> >> >> > suspect it is the part where a connection
>>> is
>>> >> made
>>> >> >>> to
>>> >> >>> >> the
>>> >> >>> >> >> db
>>> >> >>> >> >> >>> and
>>> >> >>> >> >> >>> >>>> it
>>> >> >>> >> >> >>> >>>> >> >> doesn't
>>> >> >>> >> >> >>> >>>> >> >> > seem to get interrupted by alarm. Tested
>>> >> thoroughly
>>> >> >>> >> health
>>> >> >>> >> >> >>> check
>>> >> >>> >> >> >>> >>>> >> >> behaviour,
>>> >> >>> >> >> >>> >>>> >> >> > it works really well when host/ip is there
>>> and
>>> >> just
>>> >> >>> >> >> >>> >>>> backend/postgres
>>> >> >>> >> >> >>> >>>> >> is
>>> >> >>> >> >> >>> >>>> >> >> > down, but not when backend host/ip is
>>> down. I
>>> >> could
>>> >> >>> >> see in
>>> >> >>> >> >> >>> log
>>> >> >>> >> >> >>> >>>> that
>>> >> >>> >> >> >>> >>>> >> >> initial
>>> >> >>> >> >> >>> >>>> >> >> > health check and each retry got delayed
>>> when
>>> >> >>> host/ip is
>>> >> >>> >> >> not
>>> >> >>> >> >> >>> >>>> reachable,
>>> >> >>> >> >> >>> >>>> >> >> > while when just backend is not listening
>>> (is
>>> >> down)
>>> >> >>> on
>>> >> >>> >> the
>>> >> >>> >> >> >>> >>>> reachable
>>> >> >>> >> >> >>> >>>> >> >> host/ip
>>> >> >>> >> >> >>> >>>> >> >> > then initial health check and all retries
>>> are
>>> >> >>> exact to
>>> >> >>> >> the
>>> >> >>> >> >> >>> >>>> settings in
>>> >> >>> >> >> >>> >>>> >> >> > pgpool.conf.
>>> >> >>> >> >> >>> >>>> >> >> >
>>> >> >>> >> >> >>> >>>> >> >> > PGCONNECT_TIMEOUT is listed as one of the
>>> libpq
>>> >> >>> >> >> environment
>>> >> >>> >> >> >>> >>>> variables
>>> >> >>> >> >> >>> >>>> >> in
>>> >> >>> >> >> >>> >>>> >> >> > the docs (see
>>> >> >>> >> >> >>> >>>> >> >>
>>> >> >>> >> http://www.postgresql.org/docs/9.1/static/libpq-envars.html)
>>> >> >>> >> >> >>> >>>> >> >> > There is equivalent parameter in libpq
>>> >> >>> >> PGconnectdbParams (
>>> >> >>> >> >> >>> see
>>> >> >>> >> >> >>> >>>> >> >> >
>>> >> >>> >> >> >>> >>>> >> >>
>>> >> >>> >> >> >>> >>>> >>
>>> >> >>> >> >> >>> >>>>
>>> >> >>> >> >> >>>
>>> >> >>> >> >>
>>> >> >>> >>
>>> >> >>>
>>> >>
>>> http://www.postgresql.org/docs/9.1/static/libpq-connect.html#LIBPQ-CONNECT-CONNECT-TIMEOUT
>>> >> >>> >> >> >>> >>>> >> >> )
>>> >> >>> >> >> >>> >>>> >> >> > At the beginning of that same page there
>>> are
>>> >> some
>>> >> >>> >> >> important
>>> >> >>> >> >> >>> >>>> infos on
>>> >> >>> >> >> >>> >>>> >> >> using
>>> >> >>> >> >> >>> >>>> >> >> > these functions.
>>> >> >>> >> >> >>> >>>> >> >> >
>>> >> >>> >> >> >>> >>>> >> >> > psql respects PGCONNECT_TIMEOUT.
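>>> >> >>> >> >> >>> >>>> >> >> >
>>> >> >>> >> >> >>> >>>> >> >> > For illustration, a libpq-based check that
>>> >> >>> >> >> >>> >>>> >> >> > honours the timeout could look roughly like
>>> >> >>> >> >> >>> >>>> >> >> > this (a sketch, not pgpool code; host, user
>>> >> >>> >> >> >>> >>>> >> >> > and dbname are placeholders):
>>> >> >>> >> >> >>> >>>> >> >> >
>>> >> >>> >> >> >>> >>>> >> >> > #include <stdio.h>
>>> >> >>> >> >> >>> >>>> >> >> > #include <libpq-fe.h>
>>> >> >>> >> >> >>> >>>> >> >> >
>>> >> >>> >> >> >>> >>>> >> >> > int main(void)
>>> >> >>> >> >> >>> >>>> >> >> > {
>>> >> >>> >> >> >>> >>>> >> >> >     /* connect_timeout is the conninfo keyword
>>> >> >>> >> >> >>> >>>> >> >> >        behind PGCONNECT_TIMEOUT */
>>> >> >>> >> >> >>> >>>> >> >> >     PGconn *c = PQconnectdb(
>>> >> >>> >> >> >>> >>>> >> >> >         "host=192.168.2.27 port=5432 "
>>> >> >>> >> >> >>> >>>> >> >> >         "dbname=template1 user=pgpool "
>>> >> >>> >> >> >>> >>>> >> >> >         "connect_timeout=5");
>>> >> >>> >> >> >>> >>>> >> >> >     int ok = (PQstatus(c) == CONNECTION_OK);
>>> >> >>> >> >> >>> >>>> >> >> >     if (!ok)
>>> >> >>> >> >> >>> >>>> >> >> >         fprintf(stderr, "health check failed: %s",
>>> >> >>> >> >> >>> >>>> >> >> >                 PQerrorMessage(c));
>>> >> >>> >> >> >>> >>>> >> >> >     PQfinish(c);
>>> >> >>> >> >> >>> >>>> >> >> >     return ok ? 0 : 1;
>>> >> >>> >> >> >>> >>>> >> >> > }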
>>> >> >>> >> >> >>> >>>> >> >> >
>>> >> >>> >> >> >>> >>>> >> >> > Regards,
>>> >> >>> >> >> >>> >>>> >> >> > Stevo.
>>> >> >>> >> >> >>> >>>> >> >> >
>>> >> >>> >> >> >>> >>>> >> >> > On Wed, Jan 11, 2012 at 12:13 AM, Tatsuo
>>> Ishii <
>>> >> >>> >> >> >>> >>>> ishii at postgresql.org>
>>> >> >>> >> >> >>> >>>> >> >> wrote:
>>> >> >>> >> >> >>> >>>> >> >> >
>>> >> >>> >> >> >>> >>>> >> >> >> > Hello pgpool community,
>>> >> >>> >> >> >>> >>>> >> >> >> >
>>> >> >>> >> >> >>> >>>> >> >> >> > When system is configured for security
>>> >> reasons
>>> >> >>> not
>>> >> >>> >> to
>>> >> >>> >> >> >>> return
>>> >> >>> >> >> >>> >>>> >> >> destination
>>> >> >>> >> >> >>> >>>> >> >> >> > host unreachable messages, even though
>>> >> >>> >> >> >>> health_check_timeout is
>>> >> >>> >> >> >>> >>>> >> >> >> configured,
>>> >> >>> >> >> >>> >>>> >> >> >> > socket call will block and alarm will
>>> not get
>>> >> >>> raised
>>> >> >>> >> >> >>> until TCP
>>> >> >>> >> >> >>> >>>> >> timeout
>>> >> >>> >> >> >>> >>>> >> >> >> > occurs.
>>> >> >>> >> >> >>> >>>> >> >> >>
>>> >> >>> >> >> >>> >>>> >> >> >> Interesting. So are you saying that
>>> read(2)
>>> >> >>> cannot be
>>> >> >>> >> >> >>> >>>> interrupted by
>>> >> >>> >> >> >>> >>>> >> >> >> alarm signal if the system is configured
>>> not to
>>> >> >>> return
>>> >> >>> >> >> >>> >>>> destination
>>> >> >>> >> >> >>> >>>> >> >> >> host unreachable message? Could you please
>>> >> guide
>>> >> >>> me
>>> >> >>> >> >> where I
>>> >> >>> >> >> >>> can
>>> >> >>> >> >> >>> >>>> get
>>> >> >>> >> >> >>> >>>> >> >> >> such that info? (I'm not a network
>>> expert).
>>> >> >>> >> >> >>> >>>> >> >> >>
>>> >> >>> >> >> >>> >>>> >> >> >> > Not a C programmer, found some info that
>>> >> select
>>> >> >>> call
>>> >> >>> >> >> >>> could be
>>> >> >>> >> >> >>> >>>> >> replace
>>> >> >>> >> >> >>> >>>> >> >> >> with
>>> >> >>> >> >> >>> >>>> >> >> >> > select/pselect calls. Maybe it would be
>>> best
>>> >> if
>>> >> >>> >> >> >>> >>>> PGCONNECT_TIMEOUT
>>> >> >>> >> >> >>> >>>> >> >> value
>>> >> >>> >> >> >>> >>>> >> >> >> > could be used here for connection
>>> timeout.
>>> >> >>> pgpool
>>> >> >>> >> has
>>> >> >>> >> >> >>> libpq as
>>> >> >>> >> >> >>> >>>> >> >> >> dependency,
>>> >> >>> >> >> >>> >>>> >> >> >> > why isn't it using libpq for the
>>> healthcheck
>>> >> db
>>> >> >>> >> connect
>>> >> >>> >> >> >>> >>>> calls, then
>>> >> >>> >> >> >>> >>>> >> >> >> > PGCONNECT_TIMEOUT would be applied?
>>> >> >>> >> >> >>> >>>> >> >> >>
>>> >> >>> >> >> >>> >>>> >> >> >> I don't think libpq uses select/pselect
>>> for
>>> >> >>> >> establishing
>>> >> >>> >> >> >>> >>>> connection,
>>> >> >>> >> >> >>> >>>> >> >> >> but using libpq instead of homebrew code
>>> seems
>>> >> to
>>> >> >>> be
>>> >> >>> >> an
>>> >> >>> >> >> >>> idea.
>>> >> >>> >> >> >>> >>>> Let me
>>> >> >>> >> >> >>> >>>> >> >> >> think about it.
>>> >> >>> >> >> >>> >>>> >> >> >>
>>> >> >>> >> >> >>> >>>> >> >> >> One question. Are you sure that libpq can
>>> deal
>>> >> >>> with
>>> >> >>> >> the
>>> >> >>> >> >> case
>>> >> >>> >> >> >>> >>>> (not to
>>> >> >>> >> >> >>> >>>> >> >> >> return destination host unreachable
>>> messages)
>>> >> by
>>> >> >>> using
>>> >> >>> >> >> >>> >>>> >> >> >> PGCONNECT_TIMEOUT?
>>> >> >>> >> >> >>> >>>> >> >> >> --
>>> >> >>> >> >> >>> >>>> >> >> >> Tatsuo Ishii
>>> >> >>> >> >> >>> >>>> >> >> >> SRA OSS, Inc. Japan
>>> >> >>> >> >> >>> >>>> >> >> >> English:
>>> http://www.sraoss.co.jp/index_en.php
>>> >> >>> >> >> >>> >>>> >> >> >> Japanese: http://www.sraoss.co.jp
>>> >> >>> >> >> >>> >>>> >> >> >>
>>> >> >>> >> >> >>> >>>> >> >>
>>> >> >>> >> >> >>> >>>> >>
>>> >> >>> >> >> >>> >>>>
>>> >> >>> >> >> >>> >>>
>>> >> >>> >> >> >>> >>>
>>> >> >>> >> >> >>> >>
>>> >> >>> >> >> >>>
>>> >> >>> >> >> >>
>>> >> >>> >> >> >>
>>> >> >>> >> >>
>>> >> >>> >>
>>> >> >>>
>>> >> >>
>>> >> >>
>>> >>
>>>
>>
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Fixes-health-check-timeout.patch
Type: application/octet-stream
Size: 2592 bytes
Desc: not available
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20120119/7a25ab27/attachment-0003.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Fixes-health-check-retrying-after-failover.patch
Type: application/octet-stream
Size: 995 bytes
Desc: not available
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20120119/7a25ab27/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Fixes-clearing-exitrequest-flag.patch
Type: application/octet-stream
Size: 1020 bytes
Desc: not available
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20120119/7a25ab27/attachment-0005.obj>

