<div dir="ltr">Hi<div>I downloaded version 3.2.5 and compiled it with the suggested change-set.</div><div>The problem seems to be solved. </div><div>Please update the source.</div><div><br></div><div>Thanks for your support</div>

<div>Larisa. </div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Jul 16, 2013 at 10:52 AM, Tatsuo Ishii <span dir="ltr"><<a href="mailto:ishii@postgresql.org" target="_blank">ishii@postgresql.org</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">> Hi<br>

> Thanks for your update.<br>

> I have installed the changes and compiled.<br>

> Will monitor the behavior now and report back.<br>

<br>

Thanks. Looking forward to hearing from you.<br>

<br>

> Do you recommend upgrading to 3.2.4?<br>

<br>

In fact, we recommend upgrading to 3.2.5 (the latest stable version).<br>

<br>

> Will this fix be included there?<br>

<br>

No, it is not included even in 3.2.5. It's brand new.<br>

--<br>

Tatsuo Ishii<br>

SRA OSS, Inc. Japan<br>

English: <a href="http://www.sraoss.co.jp/index_en.php" target="_blank">http://www.sraoss.co.jp/index_en.php</a><br>

Japanese: <a href="http://www.sraoss.co.jp" target="_blank">http://www.sraoss.co.jp</a><br>

<br>

> Thanks<br>

> Larisa.<br>

><br>

><br>

> On Tue, Jul 16, 2013 at 6:08 AM, Tatsuo Ishii <<a href="mailto:ishii@postgresql.org">ishii@postgresql.org</a>> wrote:<br>

><br>

>> Oops. Small correction to the patch.<br>

>><br>

>> +               if (health_check_timer_expired && getpid() != mypid)      /* has health check timer expired */<br>

>><br>

>> should be:<br>

>><br>

>> +               if (health_check_timer_expired && getpid() == mypid)      /* has health check timer expired */<br>

>><br>

>> Included is the new patch.<br>

>> --<br>

>> Tatsuo Ishii<br>

>> SRA OSS, Inc. Japan<br>

>> English: <a href="http://www.sraoss.co.jp/index_en.php" target="_blank">http://www.sraoss.co.jp/index_en.php</a><br>

>> Japanese: <a href="http://www.sraoss.co.jp" target="_blank">http://www.sraoss.co.jp</a><br>

>><br>

>> > Ok. I think I finally understand what's going on here.<br>

>> ><br>

>> > Pgpool main process (14317) started health checking at Jul 12 09:17:04.<br>

>> ><br>

>> > Jul 12 09:17:04 purple1-node1-ps pgpool[14317]: starting health checking<br>

>> ><br>

>> > The pgpool main process set a timer for 09:17:14 because you set<br>

>> > health_check_timeout to 10.  This time the health check successfully<br>

>> > completed. The timer for 09:17:14 is blocked by calling<br>

>> > signal(SIGALRM, SIG_IGN).<br>

>> ><br>

>> > Unfortunately, the child life time expired at 09:17:14, and the pgpool main<br>

>> > process was busy at that time because of this.<br>

>> ><br>

>> > Jul 12 09:17:14 purple1-node1-ps pgpool[16789]: child life 300 seconds expired<br>

>> > Jul 12 09:17:14 purple1-node1-ps pgpool[14317]: reap_handler called<br>

>> ><br>

>> > Jul 12 09:17:14 purple1-node1-ps pgpool[14317]: starting health checking<br>

>> ><br>

>> > The pgpool main process re-enabled the timer and reset the timer variable<br>

>> > (health_check_timer_expired = 0). But when the timer was re-enabled, the<br>

>> > signal handler for the timer set health_check_timer_expired to 1.  As<br>

>> > a result, pgpool thought that the health check timer had expired.<br>

>> ><br>

>> > Jul 12 09:17:14 purple1-node1-ps pgpool[14317]: health_check: health check timer has been already expired before attempting to connect to 0 th backend<br>

>> ><br>

>> > Thus failover happened even though the backend was running fine.<br>
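The sequence described above can be reproduced in isolation. What follows is a minimal sketch, not the pgpool source: only the name health_check_timer_expired comes from the thread, and the helper functions, the 50 ms and 100 ms durations, and the idea of disarming the old timer are assumptions made for illustration (the thread does not say how the actual patch handles the timer).<br>

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* Same name as in the thread; everything else here is hypothetical. */
static volatile sig_atomic_t health_check_timer_expired = 0;

static void alarm_handler(int sig)
{
    (void) sig;
    health_check_timer_expired = 1;
}

/* Install `handler` (a function, SIG_IGN or SIG_DFL) for SIGALRM. */
static void set_alarm_action(void (*handler)(int))
{
    struct sigaction sa;
    int rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGALRM, &sa, NULL);
    assert(rc == 0);
    (void) rc;
}

/* Arm a one-shot real-time timer; ms == 0 disarms it. */
static void arm_timer_ms(long ms)
{
    struct itimerval it;

    memset(&it, 0, sizeof(it));
    it.it_value.tv_sec = ms / 1000;
    it.it_value.tv_usec = (ms % 1000) * 1000;
    setitimer(ITIMER_REAL, &it, NULL);
}

/* Sleep for the full duration even if a signal interrupts nanosleep. */
static void sleep_ms(long ms)
{
    struct timespec ts;

    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
        ;
}

/* Returns the flag value that the second health check would observe. */
int second_check_sees_expiry(int cancel_old_timer)
{
    int seen;

    /* First check: arm the timeout; the backend answers at once. */
    health_check_timer_expired = 0;
    set_alarm_action(alarm_handler);
    arm_timer_ms(50);
    set_alarm_action(SIG_IGN);      /* ignored, but the timer keeps running */
    if (cancel_old_timer)
        arm_timer_ms(0);            /* assumed remedy: disarm the old timer */

    /* Second check begins before the old deadline has passed. */
    health_check_timer_expired = 0;
    set_alarm_action(alarm_handler);
    sleep_ms(100);                  /* main process busy; old timer may fire */
    seen = health_check_timer_expired;

    /* Leave the process in a clean state. */
    arm_timer_ms(0);
    set_alarm_action(SIG_DFL);
    return seen;
}
```

Calling second_check_sees_expiry(0) shows the stale timer setting the flag after it was reset, while second_check_sees_expiry(1) leaves the flag at 0 because the old timer was disarmed first.<br>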

>> ><br>

>> > Besides this,<br>

>> ><br>

>> >>> This seems very strange. The error comes here:<br>

>> ><br>

>> > I can now think of an explanation. When the child life time expires,<br>

>> > pgpool forks off a new process. If the global variable<br>

>> > health_check_timer_expired is set to 1 for the reason above, the problem<br>

>> > you saw could happen because the child process inherits this value.<br>
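The inheritance across fork() can be shown with a short sketch. The names health_check_timer_expired and mypid and the guard getpid() == mypid come from the thread; the function child_sees_expired and its structure are hypothetical and are not taken from the pgpool source.<br>

```c
#define _XOPEN_SOURCE 700
#include <assert.h>     /* only needed when asserting on the result */
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t health_check_timer_expired = 0;
static pid_t mypid;     /* pid of the main process, recorded before fork() */

/*
 * Forks a child while the flag is stale in the parent and reports what
 * the child concludes: 1 = "timer expired", 0 = "not expired", -1 = error.
 */
int child_sees_expired(int guard_with_pid)
{
    pid_t pid;
    int status;

    mypid = getpid();
    health_check_timer_expired = 1;     /* stale flag left in the parent */

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
    {
        /* The child has its own copy of the flag, still set to 1. */
        int expired = guard_with_pid
            ? (health_check_timer_expired && getpid() == mypid)
            : (health_check_timer_expired != 0);
        _exit(expired ? 1 : 0);
    }

    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    health_check_timer_expired = 0;     /* reset for the next call */
    return WEXITSTATUS(status);
}
```

Without the guard, the child treats the inherited flag as its own and reports an expired timer; with the guard, only the process whose pid equals mypid acts on the flag, so the child ignores it.<br>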

>> ><br>

>> > The included patch addresses the problems above. Could you try it out?<br>

>> > --<br>

>> > Tatsuo Ishii<br>

>> > SRA OSS, Inc. Japan<br>

>> > English: <a href="http://www.sraoss.co.jp/index_en.php" target="_blank">http://www.sraoss.co.jp/index_en.php</a><br>

>> > Japanese: <a href="http://www.sraoss.co.jp" target="_blank">http://www.sraoss.co.jp</a><br>

>> ><br>

>> >> Hi, thanks for your reply.<br>

>> >><br>

>> >>>> What kind of pgpool installation are you using?<br>

>> >>  I compiled pgpool from source code, pgpool-II-3.2.3.tar.gz<br>

>> >>>>  What kind of platform are you using?<br>

>> >> We use Ubuntu 12.04 on HP Cloud server.<br>

>> >><br>

>> >>>> What does your pgpool.conf look like?<br>

>> >> Attaching pgpool.conf<br>

>> >><br>

>> >> Also attached is the syslog file from the time of the problem. You can look for the<br>

>> >> line<br>

>> >> "Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: I am 11465 accept fd 7"<br>

>> >><br>

>> >> By the way, postgres was up and running at that time; there is nothing in<br>

>> >> its logs, and its processes' uptime shows they started a week ago...<br>

>> >><br>

>> >> Thanks in advance for your help<br>

>> >> Larisa.<br>

>> >><br>

>> >><br>

>> >><br>

>> >> On Mon, Jul 15, 2013 at 5:00 AM, Tatsuo Ishii <<a href="mailto:ishii@postgresql.org">ishii@postgresql.org</a>><br>

>> wrote:<br>

>> >><br>

>> >>> > Hi<br>

>> >>> > I am hitting the same issue as described in the mail [pgpool-general:<br>

>> >>> 1815]<br>

>> >>> > Pgpool is unable to connect backend PostgreSQL.<br>

>> >>><br>

>> >>> I guess [pgpool-general: 1815] is different from your case (my guess is that<br>

>> >>> case is somewhat related to an Amazon EC2 environment problem). Moreover,<br>

>> >>> your case seems very unique and strange.<br>

>> >>><br>

>> >>> > While connected to a single postgres node, after a while pgpool<br>

>> loses<br>

>> >>> > connection to a running postgres db, restarts all children processes<br>

>> and<br>

>> >>> > stays in running state unable to connect to db.<br>

>> >>> ><br>

>> >>> > Pgpool version 3.2.3<br>

>> >>> > Postgres version 9.2.4<br>

>> >>> ><br>

>> >>> > Part of the log:<br>

>> >>> > --------------------<br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: connection received:<br>

>> >>> > host=10.4.225.120 port=41090<br>

>> >>><br>

>> >>> Process 11465 is a pgpool child process and is responsible for actual<br>

>> >>> pgpool functions.<br>

>> >>><br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: Protocol Major: 3<br>

>> Minor:<br>

>> >>> 0<br>

>> >>> > database: hpadb user: hpauser<br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: new_connection:<br>

>> >>> connecting<br>

>> >>> > 0 backend<br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]:<br>

>> >>> > connect_inet_domain_socket_by_port: health check timer expired<br>

>> >>><br>

>> >>> This seems very strange. The error comes here:<br>

>> >>><br>

>> >>>                 if (health_check_timer_expired)         /* has health check timer expired */<br>

>> >>>                 {<br>

>> >>>                         pool_log("connect_inet_domain_socket_by_port: health check timer expired");<br>

>> >>>                         close(fd);<br>

>> >>>                         return -1;<br>

>> >>>                 }<br>

>> >>><br>

>> >>> "health_check_timer_expired" is a global variable used in pgpool main<br>

>> >>> process, which is responsible for management of pgpool, including:<br>

>> >>> health check, failover etc. The variable is only meaningful in the<br>

>> >>> main process and should not be set to non 0 in pgpool child. Moreover,<br>

>> >>> the only place that sets the variable to non 0 is the signal handler, which<br>

>> >>> is installed by the main process.<br>

>> >>><br>

>> >>> Once the error occurs, pgpool starts failover as you see.<br>

>> >>><br>

>> >>> I've never seen this kind of report before. What kind of pgpool<br>

>> >>> installation are you using? (compiled from source code or from<br>

>> >>> packages?) What kind of platform are you using? What does your<br>

>> >>> pgpool.conf look like?<br>

>> >>> --<br>

>> >>> Tatsuo Ishii<br>

>> >>> SRA OSS, Inc. Japan<br>

>> >>> English: <a href="http://www.sraoss.co.jp/index_en.php" target="_blank">http://www.sraoss.co.jp/index_en.php</a><br>

>> >>> Japanese: <a href="http://www.sraoss.co.jp" target="_blank">http://www.sraoss.co.jp</a><br>

>> >>><br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: connection to<br>

>> >>> > purple1_node1_ps(5432) failed<br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: new_connection:<br>

>> >>> create_cp()<br>

>> >>> > failed<br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]:<br>

>> degenerate_backend_set: 0<br>

>> >>> > fail over request from pid 11465<br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler<br>

>> called<br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:<br>

>> >>> starting<br>

>> >>> > to select new master node<br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: starting<br>

>> degeneration.<br>

>> >>> > shutdown host purple1_node1_ps(5432)<br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler: no<br>

>> >>> valid<br>

>> >>> > DB node found<br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: Restart all children<br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:<br>

>> kill<br>

>> >>> 4388<br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:<br>

>> kill<br>

>> >>> 9597<br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[18648]: child received<br>

>> shutdown<br>

>> >>> > request signal 3<br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[4388]: child received<br>

>> shutdown<br>

>> >>> > request signal 3<br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps rsyslogd-2177: imuxsock lost 85<br>

>> messages<br>

>> >>> > from pid 9597 due to rate-limiting<br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[9597]: child received<br>

>> shutdown<br>

>> >>> > request signal 3<br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:<br>

>> kill<br>

>> >>> 18648<br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[29409]: child received<br>

>> shutdown<br>

>> >>> > request signal 3<br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:<br>

>> kill<br>

>> >>> 29409<br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:<br>

>> kill<br>

>> >>> 11454<br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14323]: child received<br>

>> shutdown<br>

>> >>> > request signal 3<br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11454]: child received<br>

>> shutdown<br>

>> >>> > request signal 3<br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:<br>

>> kill<br>

>> >>> 14323<br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:<br>

>> kill<br>

>> >>> 22349<br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[22349]: child received<br>

>> shutdown<br>

>> >>> > request signal 3<br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:<br>

>> kill<br>

>> >>> 23617<br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:<br>

>> kill<br>

>> >>> 29410<br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[31511]: child received<br>

>> shutdown<br>

>> >>> > request signal 3<br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[29410]: child received<br>

>> shutdown<br>

>> >>> > request signal 3<br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:<br>

>> kill<br>

>> >>> 31511<br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[4385]: child received<br>

>> shutdown<br>

>> >>> > request signal 3<br>

>> >>> > Jul 12 09:32:14 purple1-node1-ps rsyslogd-2177: imuxsock lost 757<br>

>> >>> messages<br>

>> >>> > from pid 23617 due to rate-limiting<br>

>> >>> ><br>

>> >>> > Could you please explain?<br>

>> >>> > Thanks<br>

>> >>> > Larisa.<br>

>> >>><br>

>><br>

</blockquote></div><br></div>