Guillaume,<br><br>welcome to the club: the pgpool health check is of limited use at the moment, because it acts as just another database client. With pgpool's settings you can only control how lazy or frequent the health check is; you cannot give the health check full control over when a backend is failed over (degenerated). The health check and its retries can keep a backend from being degenerated on a temporary condition like the one you are experiencing, but only if the health check is the sole database client of the pgpool instance (think of an application that does not use the database/pgpool at all). Even in such a lab case, settings like the health check timeout might not be respected under every condition of the environment, because the connect call blocks. Patches for both problems already exist but have not (yet) been accepted. You can apply them yourself and then configure the health check to ride out these temporary conditions.<br>
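To make the two ideas above concrete (a connection attempt bounded by a timeout instead of a blocking connect, and declaring a backend down only after several failed attempts), here is a minimal Python sketch. It is not pgpool code and not the patch mentioned above: it only checks TCP reachability, whereas pgpool's real health check logs in to PostgreSQL, and the names tcp_probe, backend_is_up, max_retries and retry_delay_s are hypothetical.<br>

```python
import socket
import time


def tcp_probe(host: str, port: int, timeout_s: float) -> bool:
    """Make one TCP connection attempt, abandoned after timeout_s seconds.

    The socket timeout bounds the connect itself, so the probe does not wait
    for the kernel's own SYN retry timeout (name resolution is not covered).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, unreachable, timed out, bad address, ...
        return False


def backend_is_up(host: str, port: int, timeout_s: float,
                  max_retries: int, retry_delay_s: float) -> bool:
    """Report the backend as down only after 1 + max_retries failed probes."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        if tcp_probe(host, port, timeout_s):
            return True
        if attempt < max_retries:
            time.sleep(retry_delay_s)  # give a transient stall time to pass
    return False


# Example (hypothetical values): 3 retries, 5 s apart, 10 s per attempt.
# backend_is_up("192.168.0.5", 5432, timeout_s=10, max_retries=3, retry_delay_s=5)
```

The trade-off is detection latency: a stall shorter than the whole retry window does not mark the backend down, but a real outage is also detected later.<br>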

<br>Until the health check is fixed, you can disable failover of a backend under all conditions (unfortunately including health check failures) by setting its backend_flag to DISALLOW_TO_FAILOVER, and then control failover manually, or at least outside of pgpool.<br>

<br>The health check is useful even once that VMware issue is eliminated, but it is good to eliminate the root cause where possible. According to the VMware docs (see <a href="http://www.vmware.com/support/ws5/doc/ws_preserve_sshot_delete.html">http://www.vmware.com/support/ws5/doc/ws_preserve_sshot_delete.html</a> ), deleting a snapshot should not affect the current state of the VM. If it does, that is a bug. It may already be fixed, so try upgrading VMware, and if that doesn't help, contact VMware support.<br>

<br>Did you check the PostgreSQL logs of the failing backend for the period when pgpool could not connect to it?<br><br>Kind regards,<br>Stevo.<br><br><div class="gmail_quote">On Thu, Feb 16, 2012 at 2:59 AM, Tatsuo Ishii <span dir="ltr"><<a href="mailto:ishii@postgresql.org">ishii@postgresql.org</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Sounds like a bug in VMware. Pgpool does nothing special when<br>

issuing the connect(2) system call. connect() sends a SYN to the peer. The peer<br>

should reply with a SYN+ACK. If no SYN+ACK is returned, the local<br>

TCP/IP stack keeps resending the SYN until the timeout is reached. If it times out,<br>

connect() fails with a "Connection timed out" error. As far as I know,<br>

the timeout value is 189 seconds on Linux systems.<br>
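The 189 seconds follow from exponential backoff of the SYN retransmissions. A small Python sketch of the arithmetic, assuming the older Linux defaults of a 3-second initial retransmission timeout and net.ipv4.tcp_syn_retries = 5 (newer kernels, as far as I know, default to a 1-second initial timeout and 6 retries, which gives 127 seconds). The function name is hypothetical.<br>

```python
def syn_timeout_seconds(initial_rto_s: float, syn_retries: int) -> float:
    """Time until connect() fails with ETIMEDOUT if no SYN+ACK ever arrives.

    The first SYN waits initial_rto_s; each of the syn_retries
    retransmissions waits twice as long as the one before it.
    Ignores the kernel's 120-second cap on a single wait, which these
    small retry counts never reach.
    """
    return sum(initial_rto_s * 2 ** i for i in range(syn_retries + 1))


# Assumed older defaults: 3 s initial timeout, 5 retries.
print(syn_timeout_seconds(3, 5))  # prints 189 (3 + 6 + 12 + 24 + 48 + 96)
# Assumed newer defaults: 1 s initial timeout, 6 retries.
print(syn_timeout_seconds(1, 6))  # prints 127 (1 + 2 + ... + 64)
```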

<div class="im HOEnZb">--<br>

Tatsuo Ishii<br>

SRA OSS, Inc. Japan<br>

English: <a href="http://www.sraoss.co.jp/index_en.php" target="_blank">http://www.sraoss.co.jp/index_en.php</a><br>

Japanese: <a href="http://www.sraoss.co.jp" target="_blank">http://www.sraoss.co.jp</a><br>

<br>

</div><div class="HOEnZb"><div class="h5">> Hi,<br>

><br>

> I'm bringing back this thread as promised once I've found something.<br>

><br>

> I managed to reproduce my problem by deleting a snapshot of the VM hosting PostgreSQL; pgpool runs on another machine.<br>

><br>

> To summarize my problem: pgpool loses its connection to a PostgreSQL server running on a VM when a snapshot is taken or deleted. We're using VMware, by the way. An odd part of this problem is that it doesn't always occur; it's not systematic, happening roughly once in every 3-4 snapshots created/deleted. I thought that modifying the health check settings would help, but nothing changed.<br>


><br>

> Here's what I've found on my logs :<br>

><br>

> 2012-02-15 16:07:05 ERROR: pid 7768: connect_inet_domain_socket: connect() failed: Connection timed out<br>

> 2012-02-15 16:07:05 ERROR: pid 7768: connection to 192.168.0.5(5432) failed<br>

> 2012-02-15 16:07:05 ERROR: pid 7768: new_connection: create_cp() failed<br>

> 2012-02-15 16:07:05 LOG:   pid 7768: notice_backend_error: 1 fail over request from pid 7768<br>

> 2012-02-15 16:07:05 LOG:   pid 20836: starting degeneration. shutdown host 192.168.0.5 (5432)<br>

><br>

> The only way I found to work around this is to run a small script after the snapshot that checks whether the node is still up. But that's not a solution, it's a workaround.<br>

><br>

> Has anybody stumbled on this kind of problem before?<br>

><br>

> ____________________________________________________<br>

> Guillaume Douté<br>

> Cross-Functional Activities Administrator<br>

> ----------------------------------------------------<br>

> LINKBYNET<br>

> Columbia<br>

> 32 boulevard Vincent Gâche - 44000 Nantes<br>

> Direct line: <a href="tel:%2B33%20%280%292%2040%2071%2061%2064" value="+33240716164">+33 (0)2 40 71 61 64</a><br>

> Tel : <a href="tel:%2B33%20%280%291%2048%2013%2000%2000" value="+33148130000">+33 (0)1 48 13 00 00</a> - Fax : <a href="tel:%2B33%20%280%291%2048%2013%2031%2021" value="+33148133121">+33 (0)1 48 13 31 21</a><br>

> Email : <a href="mailto:g.doute@linkbynet.com">g.doute@linkbynet.com</a> - Web : <a href="http://www.linkbynet.com" target="_blank">www.linkbynet.com</a><br>

> _____________________________________________________<br>

> On-call: <a href="http://www.linkbynet.com/astreinte/" target="_blank">http://www.linkbynet.com/astreinte/</a><br>

><br>

> Please consider the environment before printing this e-mail.<br>

><br>

> -----Original Message-----<br>

> From: <a href="mailto:pgpool-general-bounces@pgpool.net">pgpool-general-bounces@pgpool.net</a> [mailto:<a href="mailto:pgpool-general-bounces@pgpool.net">pgpool-general-bounces@pgpool.net</a>] On behalf of Guillaume DOUTE<br>

> Sent: Wednesday, January 25, 2012 11:26<br>

> To: Guillaume Lelarge<br>

> Cc: <a href="mailto:pgpool-general@pgpool.net">pgpool-general@pgpool.net</a><br>

> Subject: [pgpool-general: 195] Re: VM nodes marked down after snapshot<br>

><br>

> Hello,<br>

><br>

> Sorry for the late reply.<br>

><br>

> You were right: I had missed that option, and it was set to 1. I set it to 0 and things went better. Needless to say, I felt silly.<br>

><br>

> For some odd reason, pgpool stopped logging at a certain point last Friday, and my problem happened again over the weekend. So unfortunately, I still have no logs.<br>

> I will post again when I have something.<br>

><br>

> Thanks again for your help.<br>

><br>


><br>

> -----Original Message-----<br>

> From: Guillaume Lelarge [mailto:<a href="mailto:guillaume@lelarge.info">guillaume@lelarge.info</a>]<br>
> Sent: Sunday, January 22, 2012 15:21<br>
> To: Guillaume DOUTE<br>
> Cc: <a href="mailto:pgpool-general@pgpool.net">pgpool-general@pgpool.net</a><br>
> Subject: Re: [pgpool-general: 174] Re: VM nodes marked down after snapshot<br>


><br>

> On Tue, 2012-01-17 at 17:58 +0100, Guillaume DOUTE wrote:<br>

>> Thanks for your reply and your explanations,<br>

>><br>

>> I can't understand why, but I can't reproduce my problem. Things seem<br>

>> quite stable, fortunately. I will reply with logs when I encounter<br>

>> the problem again.<br>

>><br>

>> A side question: I don't understand why I keep getting "DEBUG" lines in my logs although I didn't launch pgpool with "-d". The logs are too verbose and grow too big, so I can't keep logging enabled all the time. Is there a particular reason pgpool behaves this way?<br>


>><br>

><br>

> You surely have debug_level set to a value higher than 0.<br>

><br>

><br>

> --<br>

> Guillaume<br>

> <a href="http://blog.guillaume.lelarge.info" target="_blank">http://blog.guillaume.lelarge.info</a><br>

> <a href="http://www.dalibo.com" target="_blank">http://www.dalibo.com</a><br>

> PostgreSQL Sessions #3: <a href="http://www.postgresql-sessions.org" target="_blank">http://www.postgresql-sessions.org</a><br>

><br>

> _______________________________________________<br>

> pgpool-general mailing list<br>

> <a href="mailto:pgpool-general@pgpool.net">pgpool-general@pgpool.net</a><br>

> <a href="http://www.pgpool.net/mailman/listinfo/pgpool-general" target="_blank">http://www.pgpool.net/mailman/listinfo/pgpool-general</a><br>


</div></div></blockquote></div><br>