[pgpool-general: 140] Re: Healthcheck timeout not always respected

Wed Jan 11 19:15:11 JST 2012

Tatsuo,

Did you restart iptables after adding the rule?

Regards,
Stevo.

On Wed, Jan 11, 2012 at 11:12 AM, Stevo Slavić <sslavic at gmail.com> wrote:

> I'm looking into this to verify whether these are all the changes needed to
> have the port unreachable message silently rejected (I suspect some kernel
> parameter tuning is needed).
>
> Just to clarify: the problem is not whether pgpool detects the host as
> down, but when it does. In the environment where the issue is reproduced,
> pgpool tries to connect to the backend as part of a health check attempt
> and hangs for the full TCP timeout instead of being interrupted by the
> timeout alarm. Can you please confirm that the health check retry timings
> are not delayed?
>
> Regards,
> Stevo.
>
>
> On Wed, Jan 11, 2012 at 10:50 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>
>> Ok, I did:
>>
>> # iptables -A FORWARD -j REJECT --reject-with icmp-port-unreachable
>>
>> on the host where pgpool is running, and pulled the network cable from
>> the backend0 host's network interface. Pgpool detected the host being down
>> as expected...
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese: http://www.sraoss.co.jp
>>
>> > The backend is not the destination of this message; the pgpool host is,
>> > and we don't want it to ever receive it. With the command I sent you,
>> > the rule is created for any source and destination.
>> >
>> > Regards,
>> > Stevo.
>> >
>> > On Wed, Jan 11, 2012 at 10:38 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>> >
>> >> I did the following on the host where pgpool is running:
>> >>
>> >> # iptables -A FORWARD -j REJECT --reject-with icmp-port-unreachable -d 133.137.177.124
>> >>
>> >> (133.137.177.124 is the host where the backend is running.)
>> >>
>> >> Then I pulled the network cable from the backend0 host's network
>> >> interface. Pgpool detected the host being down as expected. Am I missing
>> >> something?
>> >> --
>> >> Tatsuo Ishii
>> >> SRA OSS, Inc. Japan
>> >> English: http://www.sraoss.co.jp/index_en.php
>> >> Japanese: http://www.sraoss.co.jp
>> >>
>> >> > Hello Tatsuo,
>> >> >
>> >> > With backend0 on one host, just configure the following rule on the
>> >> > other host, where pgpool is:
>> >> >
>> >> > iptables -A FORWARD -j REJECT --reject-with icmp-port-unreachable
>> >> >
>> >> > Then start pgpool with health checking and retrying configured, and
>> >> > pull the network cable from the backend0 host's network interface.
>> >> >
>> >> > Regards,
>> >> > Stevo.
>> >> >
>> >> > On Wed, Jan 11, 2012 at 6:27 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>> >> >
>> >> >> I want to test the situation you described:
>> >> >>
>> >> >> >> > When system is configured for security reasons not to return
>> >> >> >> > destination host unreachable messages, even though
>> >> >> >> > health_check_timeout is
>> >> >>
>> >> >> But I don't know how to do it. I pulled out the network cable and
>> >> >> pgpool detected it as expected. I also configured the server that
>> >> >> PostgreSQL is running on to disable port 5432. In this case
>> >> >> connect(2) returned EHOSTUNREACH (No route to host), so pgpool
>> >> >> detected the error as expected.
>> >> >>
>> >> >> Could you please instruct me?
>> >> >> --
>> >> >> Tatsuo Ishii
>> >> >> SRA OSS, Inc. Japan
>> >> >> English: http://www.sraoss.co.jp/index_en.php
>> >> >> Japanese: http://www.sraoss.co.jp
>> >> >>
>> >> >> > Hello Tatsuo,
>> >> >> >
>> >> >> > Thank you for replying!
>> >> >> >
>> >> >> > I'm not sure what exactly is blocking; from analysing the pgpool code I
>> >> >> > suspect it is the part where a connection is made to the db, which
>> >> >> > doesn't seem to get interrupted by the alarm. I tested health check
>> >> >> > behaviour thoroughly: it works really well when the host/IP is there
>> >> >> > and just the backend/postgres is down, but not when the backend
>> >> >> > host/IP is down. I could see in the log that the initial health check
>> >> >> > and each retry got delayed when the host/IP is not reachable, while
>> >> >> > when just the backend is not listening (is down) on the reachable
>> >> >> > host/IP, the initial health check and all retries happen exactly
>> >> >> > according to the settings in pgpool.conf.
>> >> >> >
>> >> >> > PGCONNECT_TIMEOUT is listed as one of the libpq environment variables
>> >> >> > in the docs (see
>> >> >> > http://www.postgresql.org/docs/9.1/static/libpq-envars.html ).
>> >> >> > There is an equivalent connect_timeout parameter for libpq's
>> >> >> > PQconnectdbParams (see
>> >> >> > http://www.postgresql.org/docs/9.1/static/libpq-connect.html#LIBPQ-CONNECT-CONNECT-TIMEOUT
>> >> >> > ). The beginning of that same page has some important information on
>> >> >> > using these functions.
>> >> >> >
>> >> >> > psql respects PGCONNECT_TIMEOUT.
>> >> >> >
>> >> >> > Regards,
>> >> >> > Stevo.
>> >> >> >
>> >> >> > On Wed, Jan 11, 2012 at 12:13 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>> >> >> >
>> >> >> >> > Hello pgpool community,
>> >> >> >> >
>> >> >> >> > When the system is configured, for security reasons, not to return
>> >> >> >> > destination host unreachable messages, the socket call will block
>> >> >> >> > even though health_check_timeout is configured, and the alarm will
>> >> >> >> > not be raised until the TCP timeout occurs.
>> >> >> >>
>> >> >> >> Interesting. So are you saying that read(2) cannot be interrupted by
>> >> >> >> the alarm signal if the system is configured not to return
>> >> >> >> destination host unreachable messages? Could you please point me to
>> >> >> >> where I can find that information? (I'm not a network expert.)
>> >> >> >>
>> >> >> >> > I'm not a C programmer, but I found some info that the select call
>> >> >> >> > could be replaced with select/pselect calls. Maybe it would be best
>> >> >> >> > if the PGCONNECT_TIMEOUT value could be used here for the connection
>> >> >> >> > timeout. pgpool has libpq as a dependency; why isn't it using libpq
>> >> >> >> > for the healthcheck db connect calls, so that PGCONNECT_TIMEOUT
>> >> >> >> > would be applied?
>> >> >> >>
>> >> >> >> I don't think libpq uses select/pselect for establishing a connection,
>> >> >> >> but using libpq instead of homebrew code seems to be an idea. Let me
>> >> >> >> think about it.
>> >> >> >>
>> >> >> >> One question. Are you sure that libpq can deal with the case (not to
>> >> >> >> return destination host unreachable messages) by using
>> >> >> >> PGCONNECT_TIMEOUT?
>> >> >> >> --
>> >> >> >> Tatsuo Ishii
>> >> >> >> SRA OSS, Inc. Japan
>> >> >> >> English: http://www.sraoss.co.jp/index_en.php
>> >> >> >> Japanese: http://www.sraoss.co.jp
>> >> >> >>
>> >> >>
>> >>
>>
>
>
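
For illustration, the select-based connection timeout raised in the thread
can be sketched as below. This is a minimal Python sketch, not pgpool or
libpq code: the function name connect_with_timeout and its return convention
are assumptions made for the example. The socket is put in non-blocking mode
so that connect() returns immediately, and select() then bounds the wait, so
a host that never answers costs at most the chosen timeout rather than the
kernel's TCP timeout.

```python
import errno
import select
import socket


def connect_with_timeout(host, port, timeout):
    """Attempt a TCP connect, giving up after `timeout` seconds.

    Hypothetical helper for illustration only.
    Returns (True, 0) on success, otherwise (False, errno_value).
    `host` should be an IPv4 address so no DNS lookup can block.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Non-blocking mode: connect_ex() returns at once instead of hanging.
        sock.setblocking(False)
        err = sock.connect_ex((host, port))
        if err == 0:
            return True, 0
        if err not in (errno.EINPROGRESS, errno.EWOULDBLOCK):
            # Immediate failure, e.g. ECONNREFUSED or ENETUNREACH.
            return False, err

        # The handshake is in progress. Wait until the socket becomes
        # writable (connect finished, successfully or not) or time runs out.
        _, writable, failed = select.select([], [sock], [sock], timeout)
        if not writable and not failed:
            # Nothing came back in time: report a timeout ourselves.
            return False, errno.ETIMEDOUT

        # SO_ERROR holds the final result of the asynchronous connect.
        err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        return err == 0, err
    finally:
        sock.close()
```

Calling connect_with_timeout("133.137.177.124", 5432, 10.0) against a host
that silently drops packets would be expected to return
(False, errno.ETIMEDOUT) after about ten seconds, while a reachable host with
nothing listening on the port typically fails sooner with ECONNREFUSED.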