<div dir="ltr">OK, so the problem showed up again. Here's what's new:<div><br></div><div>I wrote a small script (run remotely, from my PC) that performs "SELECT 1" on the same db with the same credentials every 2 seconds with a 2-second timeout, using one pool, everything just like the health check process.</div><div><br></div><div>While the problem occurred, my script showed nothing wrong: everything worked fine on its side. Same for network monitoring, and there was nothing more relevant in the pgpool and PostgreSQL logs...</div><div><br></div><div>I changed the health check process's credentials and db; nothing changed.</div><div><br></div><div>Could someone tell me how to make pgpool's health check process more verbose? (Even if it requires changing the source.)</div><div><br></div><div>The starting point is still this:</div><div><br></div><div><div>2018-04-27 18:32:16: pid 5983:LOG:  failed to connect to PostgreSQL server on "x.x.x.x:xxxx" using INET socket</div><div>2018-04-27 18:32:16: pid 5983:DETAIL:  health check timer expired</div></div><div><br></div><div>And I would like more details about this one.</div><div><br></div><div>Thanks.</div></div><div class="gmail_extra"><br><div class="gmail_quote">2018-04-27 16:48 GMT+02:00 Bud Curly <span dir="ltr"><<a href="mailto:psyckow.prod@gmail.com" target="_blank">psyckow.prod@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">So I have reduced both timeouts to 2 seconds, like this:<div>

<span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><b>health_check_timeout</b></span><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"> = 2</span></div><div><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><b>connect_timeout</b></span><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"> = 2000</span>
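<div>To line up failures seen outside pgpool with pgpool's own log, a probe can run on the same 2-second cadence as these settings, like the script described at the top of this thread. A minimal sketch, assuming a caller-supplied check() that raises on failure or timeout; run_probe is a hypothetical helper, not part of pgpool.</div>

```python
import time
from datetime import datetime


def run_probe(check, interval_s=2.0, iterations=10):
    """Call check() every interval_s seconds and return the failures.

    Hypothetical helper. Each failure is a tuple of
    (wall-clock timestamp, seconds the check took, repr of the error),
    so failures that ran close to the timeout stand out from instant ones.
    """
    failures = []
    for i in range(iterations):
        started = time.monotonic()
        try:
            check()  # expected to raise on failure or timeout
        except Exception as exc:  # any error counts as a failed probe
            elapsed = time.monotonic() - started
            stamp = datetime.now().isoformat(timespec="seconds")
            failures.append((stamp, round(elapsed, 3), repr(exc)))
        # Keep a fixed cadence: sleep only for what is left of the interval,
        # and skip the sleep after the last iteration.
        remaining = interval_s - (time.monotonic() - started)
        if remaining > 0 and i != iterations - 1:
            time.sleep(remaining)
    return failures
```

<div>For example, check could open a connection through a PostgreSQL driver with a 2-second connect timeout, run SELECT 1 and close it. Failures whose elapsed time sits near 2 seconds would suggest a stall rather than an immediate refusal.</div>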

<br></div><div><br></div><div>The problem seems to appear more frequently now (twice since I made the change, 4 hours ago). Same logs on pgpool and PostgreSQL, and same conditions (~5 inserts/second) while it happened.<div><div><br></div><div>On pgpool:</div><div><br></div><div><div>2018-04-27 16:26:27: pid 5983:LOG:  failed to connect to PostgreSQL server on "x.x.x.x:xxxx" using INET socket</div><div>2018-04-27 16:26:27: pid 5983:DETAIL:  health check timer expired</div><div>2018-04-27 16:26:27: pid 5983:ERROR:  failed to make persistent db connection</div><div>2018-04-27 16:26:27: pid 5983:DETAIL:  connection to host:"x.x.x.x:xxxx" failed</div><div>2018-04-27 16:26:27: pid 5983:LOG:  health check failed on node 0 (timeout:1)</div><div>2018-04-27 16:26:27: pid 5983:LOG:  received degenerate backend request for node_id: 0 from pid [5983]</div><div>2018-04-27 16:26:27: pid 5949:LOG:  Pgpool-II parent process has received failover request</div><div>2018-04-27 16:26:27: pid 5949:LOG:  starting degeneration. shutdown host x.x.x.x (xxxx)</div><div>2018-04-27 16:26:27: pid 5949:LOG:  Restart all children</div></div><div><br></div><div>On PostgreSQL:</div><div><br></div><div><div>2018-04-27 16:26:32.079 CEST [30525] LOG:  trigger file found: /var/lib/postgresql/9.6/main/trigger</div><div>2018-04-27 16:26:32.079 CEST [30527] FATAL:  terminating walreceiver process due to administrator command</div><div>2018-04-27 16:26:32.080 CEST [30525] LOG:  invalid record length at 3/32229D10: wanted 24, got 0</div><div>2018-04-27 16:26:32.080 CEST [30525] LOG:  redo done at 3/32229CE8</div><div>2018-04-27 16:26:32.080 CEST [30525] LOG:  last completed transaction was at log time 2018-04-27 16:26:27.093816+02</div><div>2018-04-27 16:26:32.090 CEST [30525] LOG:  selected new timeline ID: 98</div><div>2018-04-27 16:26:32.215 CEST [30525] LOG:  archive recovery complete</div><div>2018-04-27 16:26:32.230 CEST [30525] LOG:  MultiXact member wraparound protections are now enabled</div><div>2018-04-27 16:26:32.237 CEST [30524] LOG:  database system is ready to accept connections</div><div>2018-04-27 16:26:32.238 CEST [31170] LOG:  autovacuum launcher started</div></div><div><br></div><div>On the master 
PostgreSQL, I set "<b>log_min_error_statement</b> = debug5", so if there had been a problem with PostgreSQL, I would have noticed it.</div><div><br></div><div>There was nothing unusual in the TCP packets while I was monitoring.</div><div><br></div><div>I also monitored the network connection with a looped ping to x.x.x.x (the public address) from the machine; there was no variation in latency during the problem...</div><div><br></div><div>For a second I thought it could be linked to the number of pool connections allowed on pgpool and on the backend, because of the connection held by the health_check process:</div><div><br></div><div>- On pgpool:</div><div><br></div><div><div>num_init_children = 30</div><div>max_pool = 3</div></div><div><br></div><div>- On the PostgreSQL master:</div><div><br></div><div>max_connections = 100<br></div><div><br></div><div>I tried increasing these settings; it changed nothing...</div><div><br></div><div>I will try to simulate the health_check process with one pool and the same timeout and see if I get anything.</div><div><br></div><div>But I'm out of ideas right now... 
If anyone has an idea, I'll take it.</div><div><br></div><div>Thanks</div></div></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">2018-04-27 11:33 GMT+02:00 Bud Curly <span dir="ltr"><<a href="mailto:psyckow.prod@gmail.com" target="_blank">psyckow.prod@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><b style="font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;color:rgb(0,0,0);font-family:"Times New Roman";font-size:medium">> </b><span style="font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;color:rgb(0,0,0)"><font face="arial, helvetica, sans-serif">So if we had health_check_hostname0, would it help you?</font><br><br class="m_3756096171255332015m_6058052673712875173gmail-Apple-interchange-newline">

</span></div>

<b style="font-variant-ligatures:normal;font-variant-caps:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;color:rgb(0,0,0);font-family:"Times New Roman";font-size:medium"><span style="font-style:normal;color:rgb(0,0,0);font-family:arial,helvetica,sans-serif;font-size:small;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">That could be a workaround in my case, but I believe this is a network issue on my server provider's side. I don't really know their network layout, but the public IP address is provided at a higher level through NAT and is not assigned to the server's own network interface. </span></b><div>Running tracepath from the machine to its own public IP, I found that the route goes through 8 hops.</div><div><br></div><div>So, in general, using the public IP instead of loopback for local services is bad for performance.</div><div><br></div><div>A setting that would interest me is something like recovery_hostname0, 

recovery_hostname1, etc. as I need the public IP only for standby to perform pgpool_recovery().</div><div><br></div><div>Thanks :)</div><div><div><div><b style="font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;color:rgb(0,0,0);font-family:"Times New Roman";font-size:medium"><br class="m_3756096171255332015m_6058052673712875173gmail-Apple-interchange-newline">Tatsuo Ishii</b><span style="font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration-style:initial;text-decoration-color:initial;color:rgb(0,0,0);font-family:"Times New Roman";font-size:medium;background-color:rgb(255,255,255);float:none;display:inline"><span> </span></span><a href="mailto:pgpool-general%40pgpool.net?Subject=Re:%20Re%3A%20%5Bpgpool-general%3A%206060%5D%20Re%3A%20pgpool-general%20Digest%2C%20Vol%2078%2C%20Issue%2019&In-Reply-To=%3C20180427.174736.1303932718741225970.t-ishii%40sraoss.co.jp%3E" title="[pgpool-general: 6060] Re: pgpool-general Digest, Vol 78, Issue 19" style="color:rgb(17,85,204);font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);font-family:"Times New Roman";font-size:medium" target="_blank">ishii at sraoss.co.jp<span> </span></a><br 
style="font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;color:rgb(0,0,0);font-family:"Times New Roman";font-size:medium"><i style="font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;color:rgb(0,0,0);font-family:"Times New Roman";font-size:medium">Fri Apr 27 17:47:36 JST 2018</i><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><span> </span></span><br style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><pre style="font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;white-space:pre-wrap;color:rgb(0,0,0)"><span>><i> Thanks for your support :)
</i></span>
You are welcome:-)

>><i> Still I don't understand. Pgpool-II and PostgreSQL master are on the same
</i><span>><i> machine, that means you could set like "backend_hostname0 = "127.0.0.1".
</i>><i> 
</i>><i> Because I need the public address for pgpool_recovery() method to permit
</i>><i> online recovery from remote nodes. And pgpool, like the health_check
</i>><i> process, uses backend_hostname0
</i>><i> to do so.
</i></span>
Oh that makes sense.

><i> The setting health_check_hostname0 doesn't exist, though, so that is not an
</i>><i> available workaround.
</i>
So if we had health_check_hostname0, would it help you?

><i> So, according to the log, is the timeout error triggered by
</i><span>><i> "health_check_timeout = 6" or by "connect_timeout = 10000"?
</i></span>
I believe it is "health_check_timeout = 6". The connect system call waits up to
10 seconds, but health_check_timeout expires before that.

><i> I lowered both timeouts to 2 seconds and am monitoring network packets to find some
</i><span>><i> details... I'll keep you posted.
</i></span>
Thanks.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: <a href="http://www.sraoss.co.jp/index_en.php" style="color:rgb(17,85,204)" target="_blank">http://www.sraoss.co.jp/index_<wbr>en.php</a>
Japanese:<a href="http://www.sraoss.co.jp/" style="color:rgb(17,85,204)" target="_blank">http://www.sraoss.co.<wbr>jp</a></pre><br class="m_3756096171255332015m_6058052673712875173gmail-Apple-interchange-newline">
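<div>To put a number on the loopback versus public IP difference described in this message (tracepath showing 8 hops to the machine's own public address), one option is to time TCP handshakes to the PostgreSQL port over both paths. A minimal sketch: connect_latency_ms and compare_paths are hypothetical helpers, and the port and public address are placeholders you supply. It measures only the TCP handshake, not authentication or query time.</div>

```python
import socket
import time


def connect_latency_ms(host, port, timeout_s=2.0):
    """Time one TCP handshake to host:port in milliseconds; None if it fails."""
    start = time.perf_counter()
    try:
        # create_connection() returns once the three-way handshake completes.
        with socket.create_connection((host, port), timeout=timeout_s):
            pass
    except OSError:  # refused, unreachable, or timed out
        return None
    return (time.perf_counter() - start) * 1000.0


def compare_paths(port, public_host, samples=5, timeout_s=2.0):
    """Measure handshakes over loopback and over the public address."""
    results = {}
    for label, host in (("loopback", "127.0.0.1"), ("public", public_host)):
        results[label] = [
            connect_latency_ms(host, port, timeout_s) for _ in range(samples)
        ]
    return results
```

<div>Calling compare_paths(port, public_address) returns per-sample times for each path; None entries mark handshakes that failed or exceeded the timeout. PostgreSQL must be listening on the public address for the second set to succeed.</div>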

<br></div></div></div></div><div class="m_3756096171255332015HOEnZb"><div class="m_3756096171255332015h5"><div class="gmail_extra"><br><div class="gmail_quote">2018-04-27 10:44 GMT+02:00 Bud Curly <span dir="ltr"><<a href="mailto:psyckow.prod@gmail.com" target="_blank">psyckow.prod@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">

<span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">Thanks for your support :)</span><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><br></div><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial">> <span style="font-size:12.8px">Still I don't understand. 
Pgpool-II and PostgreSQL master are on the </span><span style="font-size:12.8px">same machine, that means you could set like "backend_hostname0 = </span><span style="font-size:12.8px">"127.0.0.1".</span></div><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px"><br></span></div><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px">Because I need the public address for pgpool_recovery() method to permit online recovery from remote nodes. 
</span><span style="font-size:12.8px">And pgpool, like the health_check process, uses <span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">backend_hostname0 to do so.</span></span></div><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px"><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></span></div><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px">The setting health_check_hostname0 doesn't exist, though, so that is not an available workaround.</span></div>
<div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px"><br></span></div><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px">So, according to the log, is the timeout error triggered by "health_check_timeout = 6" or by "connect_timeout = 10000"?</span></div><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px"><br></span></div><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px">I lowered both timeouts to 2 seconds and am monitoring network packets to find some details... I'll keep you posted.</span></div>
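<div>A small sketch of the comparison being asked about, based on the units used in this thread: health_check_timeout is in seconds, while connect_timeout is in milliseconds (10000 meaning 10 seconds). The function first_timer_to_fire is hypothetical, and it assumes both values are positive; it does not model a value of 0 as "disabled".</div>

```python
def first_timer_to_fire(health_check_timeout_s, connect_timeout_ms):
    """Report which limit would end a connect() that never completes.

    Hypothetical helper: health_check_timeout is given in seconds and
    connect_timeout in milliseconds, so both are converted to
    milliseconds before comparing. Both values are assumed positive.
    """
    health_check_ms = health_check_timeout_s * 1000  # seconds -> milliseconds
    if health_check_ms == connect_timeout_ms:
        return "tie"  # equal budgets: neither limit has a margin here
    if health_check_ms > connect_timeout_ms:
        return "connect_timeout"
    return "health_check_timeout"


for hc_s, ct_ms in ((6, 10000), (2, 2000)):
    print(hc_s, ct_ms, "->", first_timer_to_fire(hc_s, ct_ms))
```

<div>With the original settings (6 and 10000) this reports health_check_timeout, since 6000 ms comes before 10000 ms. With the reduced settings (2 and 2000) both budgets are 2 seconds, so this simple model cannot say which one ends the attempt.</div>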

<br></div><div class="m_3756096171255332015m_6058052673712875173HOEnZb"><div class="m_3756096171255332015m_6058052673712875173h5"><div class="gmail_extra"><br><div class="gmail_quote">2018-04-27 3:15 GMT+02:00 Tatsuo Ishii <span dir="ltr"><<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>> The Pgpool-II health check process uses a non-blocking socket for<br>
> connecting to PostgreSQL. After issuing the connect system call, it waits<br>
> for its completion using the select system call with a timeout: connect_timeout<br>
> in pgpool.conf (in your case 10 seconds). On the other hand,<br>
> health_check_timeout is 6 seconds. So after 6 seconds, an alarm interrupted<br>
> the select system call and it returned with errno == EINTR, then the log<br>
> was emitted. Not sure why the connect system call did not respond for 6<br>
> seconds.<br>
> <br>
> That's all I know from the log.<br>
<br>
</span>If you want to investigate this further, a packet dump is required.<br>
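<div>The sequence described above (non-blocking connect, then select() bounded by connect_timeout, cut short by the health check alarm) can be reproduced with a small Python sketch for experiments. This is not pgpool's code, which is written in C; probe_connect and HealthCheckTimerExpired are hypothetical names. It is Unix-only and must run in the main thread because it uses SIGALRM. Since Python retries select() after a signal unless the handler raises (PEP 475), the alarm handler raises an exception to stand in for the EINTR return.</div>

```python
import errno
import select
import signal
import socket


class HealthCheckTimerExpired(Exception):
    """Hypothetical stand-in for pgpool's health check timer firing."""


def _on_alarm(signum, frame):
    # Raising here aborts the blocked select(); in C, select() would
    # instead return -1 with errno == EINTR.
    raise HealthCheckTimerExpired()


def probe_connect(host, port, connect_timeout_ms, health_check_timeout_s):
    """Return a short string describing how a non-blocking connect() ended."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)  # connect() returns at once instead of blocking
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(health_check_timeout_s)  # overall health check timer (whole seconds)
    try:
        err = sock.connect_ex((host, port))
        if err not in (0, errno.EINPROGRESS):
            return "connect failed: " + errno.errorcode.get(err, str(err))
        # Wait until the socket is writable, for at most connect_timeout.
        _, writable, _ = select.select([], [sock], [], connect_timeout_ms / 1000.0)
        if not writable:
            return "connect_timeout expired"
        err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        if err != 0:
            return "connect failed: " + errno.errorcode.get(err, str(err))
        return "connected"
    except HealthCheckTimerExpired:
        return "health check timer expired"
    finally:
        signal.alarm(0)  # cancel the timer if it has not fired yet
        if old_handler is not None:
            signal.signal(signal.SIGALRM, old_handler)
        sock.close()
```

<div>For example, probe_connect(host, port, 10000, 6) mirrors the original 6 s / 10000 ms settings against your backend's address: if the handshake stalls, it returns "health check timer expired" after about 6 seconds, analogous to the DETAIL line in the pgpool log, while a refused connection comes back immediately as "connect failed: ...".</div>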
<div class="m_3756096171255332015m_6058052673712875173m_8626708393922286130HOEnZb"><div class="m_3756096171255332015m_6058052673712875173m_8626708393922286130h5"><br>
Best regards,<br>
--<br>
Tatsuo Ishii<br>
SRA OSS, Inc. Japan<br>
English: <a href="http://www.sraoss.co.jp/index_en.php" rel="noreferrer" target="_blank">http://www.sraoss.co.jp/index_<wbr>en.php</a><br>
Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer" target="_blank">http://www.sraoss.co.<wbr>jp</a><br>
</div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div>