[pgpool-general: 233] Re: VM nodes marked down after snapshot

Stevo Slavić sslavic at gmail.com
Thu Feb 16 18:01:46 JST 2012


Guillaume,

welcome to the club - the pgpool health check is of little use at the
moment, because it acts as just another db client. The pgpool settings only
control how often the health check runs; they cannot give the health check
sole control over when a backend is failed over (degenerated).
The health check and its retries help avoid degenerating a backend on
temporary conditions like the one you are experiencing, but only if the
health check is the sole db client of the pgpool instance (think of an app
that makes no use of the database/pgpool at all). Even in such a lab case,
settings like the health check timeout might not be respected under all
conditions (a blocking connect call, for example). Patches exist for both
problems but have not (yet) been accepted. You can apply them yourself and
then configure the health check to survive these temporary conditions.
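
To see why a blocking connect can outlast any health check timeout, here is
a rough Python sketch of the arithmetic behind the 189 seconds Tatsuo
mentions in the message quoted below. The initial retransmission timeout of
3 seconds and the 5 SYN retries are my assumptions about the Linux defaults,
not values taken from this thread, and the function name is made up for
illustration.

```python
def syn_connect_timeout(initial_rto_s=3, syn_retries=5):
    """Seconds until a blocking connect() gives up when no SYN+ACK arrives.

    Assumed model: the first SYN is followed by syn_retries retransmissions,
    and the wait after each SYN doubles, starting at initial_rto_s.
    """
    # One wait per SYN sent: the original plus each retransmission.
    waits = [initial_rto_s * 2 ** i for i in range(syn_retries + 1)]
    return sum(waits)


# 3 + 6 + 12 + 24 + 48 + 96
print(syn_connect_timeout())  # prints 189
```

Under these assumptions a single unanswered connect blocks for about three
minutes, far longer than a typical health check timeout.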

Until the health check is fixed, you can disable failover of a backend
under all conditions (unfortunately that includes failover triggered by the
health check) by setting DISALLOW_TO_FAILOVER, and then control failover
manually or at least outside of pgpool.
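
If you go the route of controlling failover outside of pgpool, a probe along
these lines might be a starting point. It is only a sketch in Python: it
checks that a TCP connection to the backend completes within a bounded time
and reports the backend as down only after every retry has failed. The
function names, the retry count, the delay and the timeout are all
hypothetical, and it only tests TCP reachability, not whether postgres
actually answers queries.

```python
import socket
import time


def backend_reachable(host, port, timeout_s=5.0):
    """Return True if a TCP connection to host:port completes in timeout_s."""
    try:
        # The timeout applies to the connect itself, so the call cannot
        # block for the kernel's much longer default.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable hosts.
        return False


def check_with_retries(host, port, retries=3, delay_s=2.0, timeout_s=5.0):
    """Report the backend as down only after every attempt has failed."""
    for attempt in range(1 + retries):
        if backend_reachable(host, port, timeout_s):
            return True
        if attempt < retries:
            # Give a temporary condition (such as a snapshot) time to pass.
            time.sleep(delay_s)
    return False
```

For example, check_with_retries("192.168.0.5", 5432) would probe the backend
from Guillaume's log. What to do when it returns False is left to whatever
manages failover.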

A working health check is handy even once the vmware issue is gone, but it
is better to eliminate the root cause if possible. According to the vmware
docs (see http://www.vmware.com/support/ws5/doc/ws_preserve_sshot_delete.html ),
deleting a snapshot should not affect the current state of the vm. If it
does, that is a bug. It may already be fixed, so try upgrading vmware and,
if that doesn't help, contact vmware support.

Did you check the postgres logs on the failing backend for the period when
pgpool could not connect to it?

Kind regards,
Stevo.

On Thu, Feb 16, 2012 at 2:59 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:

> Sounds like a bug in vmware. Pgpool does nothing special when
> issuing the connect(2) system call. connect() sends a SYN to the peer.
> The peer should reply with SYN+ACK. If SYN+ACK is not returned, the local
> TCP/IP stack keeps on sending SYN until the timeout is reached. On timeout,
> connect() fails with a "Connection timed out" error. As far as I know,
> the timeout value is 189 seconds on a Linux system.
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>
> > Hi,
> >
> > I'm reviving this thread, as promised, now that I've found something.
> >
> > I managed to reproduce my problem by deleting a snapshot of the vm
> > hosting postgresql; pgpool runs on another machine.
> >
> > To summarize my problem, pgpool loses its connection to a postgresql on
> > a vm when a snapshot is taken or while one is being deleted. We're using
> > vmware, by the way. An odd part of this problem is that it doesn't always
> > occur; it's not systematic, probably once in every 3-4 snapshots
> > created/deleted. I thought that modifying the health check settings
> > would help, but it made no difference.
> >
> > Here's what I've found on my logs :
> >
> > 2012-02-15 16:07:05 ERROR: pid 7768: connect_inet_domain_socket: connect() failed: Connection timed out
> > 2012-02-15 16:07:05 ERROR: pid 7768: connection to 192.168.0.5(5432) failed
> > 2012-02-15 16:07:05 ERROR: pid 7768: new_connection: create_cp() failed
> > 2012-02-15 16:07:05 LOG:   pid 7768: notice_backend_error: 1 fail over request from pid 7768
> > 2012-02-15 16:07:05 LOG:   pid 20836: starting degeneration. shutdown host 192.168.0.5 (5432)
> >
> > The only way I found to work around this is to run a small script after
> > the snapshot that checks whether the node is still up. But that's not a
> > solution, it's a workaround.
> >
> > Has anybody stumbled on this kind of problem before?
> >
> > ____________________________________________________
> > Guillaume Douté
> > Administrateur Activités Transversales
> > ----------------------------------------------------
> > LINKBYNET
> > Columbia
> > 32 boulevard Vincent Gâche - 44000 Nantes
> > Tel direct : +33 (0)2 40 71 61 64
> > Tel : +33 (0)1 48 13 00 00 - Fax : +33 (0)1 48 13 31 21
> > Email : g.doute at linkbynet.com - Web : www.linkbynet.com
> > _____________________________________________________
> > On-call: http://www.linkbynet.com/astreinte/
> >
> > Before printing this e-mail, think of the environment.
> >
> > -----Original Message-----
> > From: pgpool-general-bounces at pgpool.net [mailto:pgpool-general-bounces at pgpool.net] On behalf of Guillaume DOUTE
> > Sent: Wednesday, 25 January 2012 11:26
> > To: Guillaume Lelarge
> > Cc: pgpool-general at pgpool.net
> > Subject: [pgpool-general: 195] Re: VM nodes marked down after snapshot
> >
> > Hello,
> >
> > Sorry for the late reply.
> >
> > You were right, I missed that option and it was set to 1. I set it to 0
> > and things got better. Needless to say, I felt silly.
> >
> > For some odd reason, pgpool stopped logging at some point last Friday,
> > and my problem happened again during the weekend. So unfortunately, I
> > still have no logs.
> > I will post again when I have something.
> >
> > Thanks again for your help.
> >
> > -----Original Message-----
> > From: Guillaume Lelarge [mailto:guillaume at lelarge.info]
> > Sent: Sunday, 22 January 2012 15:21
> > To: Guillaume DOUTE
> > Cc: pgpool-general at pgpool.net
> > Subject: Re: [pgpool-general: 174] Re: VM nodes marked down after snapshot
> >
> > On Tue, 2012-01-17 at 17:58 +0100, Guillaume DOUTE wrote:
> >> Thanks for your reply and your explanations,
> >>
> >> I can't understand why, but I can't reproduce my problem. Things seem
> >> quite stable, fortunately. I will reply with logs when I encounter
> >> the problem again.
> >>
> >> A side question: I don't understand why I keep getting "DEBUG" lines
> >> in my logs although I didn't launch pgpool with "-d". The logs are too
> >> verbose and get too big, so I can't leave logging enabled all the time.
> >> Is there a particular reason why pgpool behaves this way?
> >>
> >
> > You surely have debug_level set to a value higher than 0.
> >
> >
> > --
> > Guillaume
> > http://blog.guillaume.lelarge.info
> > http://www.dalibo.com
> > PostgreSQL Sessions #3: http://www.postgresql-sessions.org
> >
> > _______________________________________________
> > pgpool-general mailing list
> > pgpool-general at pgpool.net
> > http://www.pgpool.net/mailman/listinfo/pgpool-general