[pgpool-general: 1898] Re: Pgpool is unable to connect backend PostgreSQL

larisa sabban larisa.sabban at gmail.com
Tue Jul 16 16:49:18 JST 2013


Hi
Thanks for your update.
I have installed the changes and compiled.
I will follow the behavior now and update you.
Do you recommend upgrading to 3.2.4? Will this fix be included there?

Thanks
Larisa.


On Tue, Jul 16, 2013 at 6:08 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:

> Oops. Small correction to the patch.
>
> +               if (health_check_timer_expired && getpid() != mypid)     /* has health check timer expired */
>
> should be:
>
> +               if (health_check_timer_expired && getpid() == mypid)     /* has health check timer expired */
>
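> With getpid() == mypid the branch can only be taken in the pgpool main
> process, i.e. the process that actually owns the health check timer
> (mypid here is presumably the pid recorded by the main process). As a
> sketch, assuming the condition guards the same check quoted further
> down in connect_inet_domain_socket_by_port:
>
>                 if (health_check_timer_expired && getpid() == mypid)   /* has health check timer expired */
>                 {
>                         pool_log("connect_inet_domain_socket_by_port: health check timer expired");
>                         close(fd);
>                         return -1;
>                 }
>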
> Included is the new patch.
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>
> > Ok. I think I finally understand what's going on here.
> >
> > The pgpool main process (14317) started health checking at Jul 12 09:17:04.
> >
> > Jul 12 09:17:04 purple1-node1-ps pgpool[14317]: starting health checking
> >
> > The pgpool main process set the timer to go off at 09:17:14 because you set
> > health_check_timeout to 10.  This time the health check completed
> > successfully, and the pending timer for 09:17:14 was blocked by calling
> > signal(SIGALRM, SIG_IGN).
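> >
> > In other words, the timer logic follows roughly this pattern (a
> > simplified sketch based on the description above, not the actual
> > pgpool source):
> >
> >     #include <signal.h>
> >     #include <unistd.h>
> >
> >     static volatile sig_atomic_t health_check_timer_expired = 0;
> >
> >     static void health_check_timer_handler(int sig)
> >     {
> >         health_check_timer_expired = 1;   /* health check took too long */
> >     }
> >
> >     static void do_health_check(void)
> >     {
> >         health_check_timer_expired = 0;
> >         signal(SIGALRM, health_check_timer_handler);
> >         alarm(10);                        /* health_check_timeout = 10 */
> >
> >         /* ... try to connect to each backend here ... */
> >
> >         /* finished in time: stop reacting to the still-armed alarm */
> >         signal(SIGALRM, SIG_IGN);
> >     }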
> >
> > Unfortunately, the child life time expired at 09:17:14, and the pgpool main
> > process was busy at that time because of this.
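> >
> > For reference, the two settings involved (values as they appear in the
> > log; the attached pgpool.conf is the authoritative source):
> >
> >     # pgpool.conf excerpt
> >     health_check_timeout = 10   # give up a health check after 10 seconds
> >     child_life_time = 300       # recycle an idle child after 300 seconds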
> >
> > Jul 12 09:17:14 purple1-node1-ps pgpool[16789]: child life 300 seconds expired
> > Jul 12 09:17:14 purple1-node1-ps pgpool[14317]: reap_handler called
> >
> > Jul 12 09:17:14 purple1-node1-ps pgpool[14317]: starting health checking
> >
> > The pgpool main process re-enabled the timer and reset the timer variable
> > (health_check_timer_expired = 0). But when the timer was re-enabled, the
> > signal handler for the timer set health_check_timer_expired to 1.  As
> > a result pgpool thought that the health check timer had expired.
> >
> > Jul 12 09:17:14 purple1-node1-ps pgpool[14317]: health_check: health check timer has been already expired before attempting to connect to 0 th backend
> >
> > Thus failover happened even though the backend was running fine.
> >
> > Besides this,
> >
> >>> This seems very strange. The error comes here:
> >
> > I can now think of an explanation. When the child life time expires,
> > pgpool forks off a new process. If the global variable
> > health_check_timer_expired is set to 1 for the reason above, the problem
> > you saw can happen because the child process inherits this value.
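> >
> > Schematically (a sketch of the inheritance, not the actual pgpool code):
> >
> >     /* in the pgpool main process, after the timer mishap above: */
> >     /* health_check_timer_expired == 1 */
> >
> >     pid_t pid = fork();
> >     if (pid == 0)
> >     {
> >             /* new child: it starts with a copy of the parent's memory,
> >              * so it inherits health_check_timer_expired == 1 and
> >              * connect_inet_domain_socket_by_port() refuses to connect
> >              * even though no timer ever expired in this child */
> >     }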
> >
> > The included patch addresses the problems above. Could you try it out?
> > --
> > Tatsuo Ishii
> > SRA OSS, Inc. Japan
> > English: http://www.sraoss.co.jp/index_en.php
> > Japanese: http://www.sraoss.co.jp
> >
> >> Hi, thanks for your reply.
> >>
> >>>> What kind of pgpool installation are you using?
> >>  I compiled pgpool from source code, pgpool-II-3.2.3.tar.gz
> >>>>  What kind of platform are you using?
> >> We use Ubuntu 12.04 on HP Cloud server.
> >>
> >>>> What does your pgpool.conf look like?
> >> Attaching pgpool.conf
> >>
> >> Also attached syslog file from the time of a problem. You can look for a
> >> line
> >> "Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: I am 11465 accept fd 7"
> >>
> >> By the way, postgres was up and running at that time; there is nothing in
> >> its logs, and the uptime of its processes shows they started a week ago...
> >>
> >> Thanks in advance for your help
> >> Larisa.
> >>
> >>
> >>
> >> On Mon, Jul 15, 2013 at 5:00 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
> >>
> >>> > Hi
> >>> > I am hitting the same issue as described in the mail [pgpool-general: 1815]
> >>> > Pgpool is unable to connect backend PostgreSQL.
> >>>
> >>> I guess [pgpool-general: 1815] is different from yours (my guess is that
> >>> case is somewhat related to an Amazon EC2 environment problem). Moreover,
> >>> your case seems very unique and strange.
> >>>
> >>> > While connected to a single postgres node, after a while pgpool loses the
> >>> > connection to a running postgres db, restarts all child processes, and
> >>> > stays in a running state, unable to connect to the db.
> >>> >
> >>> > Pgpool version 3.2.3
> >>> > Postgres version 9.2.4
> >>> >
> >>> > Part of the log:
> >>> > --------------------
> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: connection received: host=10.4.225.120 port=41090
> >>>
> >>> Process 11465 is a pgpool child process and is responsible for actual
> >>> pgpool functions.
> >>>
> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: Protocol Major: 3 Minor: 0 database: hpadb user: hpauser
> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: new_connection: connecting 0 backend
> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: connect_inet_domain_socket_by_port: health check timer expired
> >>>
> >>> This seems very strange. The error comes here:
> >>>
> >>>                 if (health_check_timer_expired)         /* has health check timer expired */
> >>>                 {
> >>>                         pool_log("connect_inet_domain_socket_by_port: health check timer expired");
> >>>                         close(fd);
> >>>                         return -1;
> >>>                 }
> >>>
> >>> "health_check_timer_expired" is a global variable used in pgpool main
> >>> process, which is responsible for management of pgpool, including:
> >>> health check, failover etc. The variable is only meaningful in the
> >>> main process and should not be set to non 0 in pgpool child. Moreover,
> >>> the only place set the variable to non 0 is the signal handler which
> >>> is set by main process.
> >>>
> >>> Once the error occurs, pgpool starts a failover, as you see.
> >>>
> >>> I've never seen this kind of report before. What kind of pgpool
> >>> installation are you using? (compiled from source code or from
> >>> packages?) What kind of platform are you using? What does your
> >>> pgpool.conf look like?
> >>> --
> >>> Tatsuo Ishii
> >>> SRA OSS, Inc. Japan
> >>> English: http://www.sraoss.co.jp/index_en.php
> >>> Japanese: http://www.sraoss.co.jp
> >>>
> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: connection to purple1_node1_ps(5432) failed
> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: new_connection: create_cp() failed
> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: degenerate_backend_set: 0 fail over request from pid 11465
> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler called
> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler: starting to select new master node
> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: starting degeneration. shutdown host purple1_node1_ps(5432)
> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler: no valid DB node found
> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: Restart all children
> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler: kill 4388
> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler: kill 9597
> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[18648]: child received shutdown request signal 3
> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[4388]: child received shutdown request signal 3
> >>> > Jul 12 09:32:14 purple1-node1-ps rsyslogd-2177: imuxsock lost 85 messages from pid 9597 due to rate-limiting
> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[9597]: child received shutdown request signal 3
> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler: kill 18648
> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[29409]: child received shutdown request signal 3
> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler: kill 29409
> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler: kill 11454
> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14323]: child received shutdown request signal 3
> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11454]: child received shutdown request signal 3
> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler: kill 14323
> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler: kill 22349
> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[22349]: child received shutdown request signal 3
> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler: kill 23617
> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler: kill 29410
> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[31511]: child received shutdown request signal 3
> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[29410]: child received shutdown request signal 3
> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler: kill 31511
> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[4385]: child received shutdown request signal 3
> >>> > Jul 12 09:32:14 purple1-node1-ps rsyslogd-2177: imuxsock lost 757 messages from pid 23617 due to rate-limiting
> >>> >
> >>> > Could you please explain?
> >>> > Thanks
> >>> > Larisa.
> >>>
>