[pgpool-general: 1908] Re: Pgpool is unable to connect backend PostgreSQL

larisa sabban larisa.sabban at gmail.com
Sun Jul 21 21:43:10 JST 2013


Hi
I downloaded version 3.2.5 and compiled suggested change-set with that.
The problem seems to be solved.
Please update the source.

Thanks for your support
Larisa.


On Tue, Jul 16, 2013 at 10:52 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:

> > Hi
> > Thanks for your update.
> > I have installed the changes and compile.
> > Will follow the behavior now and update.
>
> Thanks.Looking forward to hearing from you.
>
> > Do you recommend to upgrade to 3.2.4?
>
> Moreover, we recommend to upgrade to 3.2.5 (the latest stable version).
>
> > Will this fix be included there?
>
> No, even it is not included in 3.2.5. It's brand new.
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>
> > Thanks
> > Larisa.
> >
> >
> > On Tue, Jul 16, 2013 at 6:08 AM, Tatsuo Ishii <ishii at postgresql.org>
> wrote:
> >
> >> Oops. Small correction to the patch.
> >>
> >> +               if (health_check_timer_expired && getpid() != mypid)
> >>      /* has health check timer expired */
> >>
> >> should be:
> >>
> >> +               if (health_check_timer_expired && getpid() == mypid)
> >>      /* has health check timer expired */
> >>
> >> Included is the new patch.
> >> --
> >> Tatsuo Ishii
> >> SRA OSS, Inc. Japan
> >> English: http://www.sraoss.co.jp/index_en.php
> >> Japanese: http://www.sraoss.co.jp
> >>
> >> > Ok. I think I finally understand what's going on here.
> >> >
> >> > Pgpool main process (14317) started health checking at Jul 12
> 09:17:04.
> >> >
> >> > Jul 12 09:17:04 purple1-node1-ps pgpool[14317]: starting health
> checking
> >> >
> >> > Pgpool main process set timer at 09:17:14 because you set
> >> > health_check_timeout 10.  This time the health check successfully
> >> > completed. The timer for 09:17:14 is blocked by calling
> >> > signal(SIGALRM, SIG_IGN).
> >> >
> >> > Unfortunately child life time was expired at 09:17:14 and pgpool main
> >> > process was busy at the time because of this.
> >> >
> >> > Jul 12 09:17:14 purple1-node1-ps pgpool[16789]: child life 300 seconds
> >> expired
> >> > Jul 12 09:17:14 purple1-node1-ps pgpool[14317]: reap_handler called
> >> >
> >> > Jul 12 09:17:14 purple1-node1-ps pgpool[14317]: starting health
> checking
> >> >
> >> > Pgpool main re-enabled the timer and reset the timer variable
> >> > (health_check_timer_expired = 0). But when the timer re-enabled, the
> >> > signal handler for the timer set health_check_timer_expired to 1.  As
> >> > a result pgpool thought that health check timer was expired.
> >> >
> >> > Jul 12 09:17:14 purple1-node1-ps pgpool[14317]: health_check: health
> >> check timer has been already expired before attempting to connect to 0
> th
> >> backend
> >> >
> >> > Thus failover happend even if the backend was running fine.
> >> >
> >> > Besides this,
> >> >
> >> >>> This seems very strange. The error comes here:
> >> >
> >> > I can now think of an explanation. When child life time is expired,
> >> > pgpool fork off new process. If the global variable
> >> > health_check_timer_expired to 1 for the reason above, it is possible
> >> > the problem you saw could happen because child process inherits this.
> >> >
> >> > Includes patch addresses the problems above. Could you try it out?
> >> > --
> >> > Tatsuo Ishii
> >> > SRA OSS, Inc. Japan
> >> > English: http://www.sraoss.co.jp/index_en.php
> >> > Japanese: http://www.sraoss.co.jp
> >> >
> >> >> Hi, thanks for your reply.
> >> >>
> >> >>>> What kind of pgpool installation are you using?
> >> >>  I comlile pgpool from source code, pgpool-II-3.2.3.tar.gz
> >> >>>>  What kind of platform are you using?
> >> >> We use Ubuntu 12.04 on HP Cloud server.
> >> >>
> >> >>>> How is like your pgpool.conf?
> >> >> Attaching pgpool.conf
> >> >>
> >> >> Also attached syslog file from the time of a problem. You can look
> for a
> >> >> line
> >> >> "Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: I am 11465 accept
> fd 7"
> >> >>
> >> >> By the way postgres was up and running at that time, there is
> nothing in
> >> >> the logs from it and its procceses uptime shows week ago...
> >> >>
> >> >> Thanks in advance for your help
> >> >> Larisa.
> >> >>
> >> >>
> >> >>
> >> >> On Mon, Jul 15, 2013 at 5:00 AM, Tatsuo Ishii <ishii at postgresql.org>
> >> wrote:
> >> >>
> >> >>> > Hi
> >> >>> > I am hitting the same issue as described in the mail
> [pgpool-general:
> >> >>> 1815]
> >> >>> > Pgpool is unable to connect backend PostgreSQL.
> >> >>>
> >> >>> I guess [pgpool-general: 1815] is different from you (my guess is
> the
> >> >>> case somewhat related to Amazon EC2 environment problem). Moreover,
> >> >>> you case seems very unique and strange.
> >> >>>
> >> >>> > While connected to a single postgres node, after a while pgpool
> >> looses
> >> >>> > connection to a running postgres db, restarts all children
> processes
> >> and
> >> >>> > stays in running state unable to connect to db.
> >> >>> >
> >> >>> > Pgpool version 3.2.3
> >> >>> > Postgres version 9.2.4
> >> >>> >
> >> >>> > Part of the log:
> >> >>> > --------------------
> >> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: connection
> received:
> >> >>> > host=10.4.225.120 port=41090
> >> >>>
> >> >>> Process 11465 is a pgpool child process and is responsible for
> actual
> >> >>> pgpool functions.
> >> >>>
> >> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: Protocol Major: 3
> >> Minor:
> >> >>> 0
> >> >>> > database: hpadb user: hpauser
> >> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: new_connection:
> >> >>> connecting
> >> >>> > 0 backend
> >> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]:
> >> >>> > connect_inet_domain_socket_by_port: health check timer expired
> >> >>>
> >> >>> This seems very strange. The error comes here:
> >> >>>
> >> >>>                 if (health_check_timer_expired)         /* has
> health
> >> >>> check timer expired */
> >> >>>                 {
> >> >>>
> pool_log("connect_inet_domain_socket_by_port:
> >> >>> health check timer expired");
> >> >>>                         close(fd);
> >> >>>                         return -1;
> >> >>>                 }
> >> >>>
> >> >>> "health_check_timer_expired" is a global variable used in pgpool
> main
> >> >>> process, which is responsible for management of pgpool, including:
> >> >>> health check, failover etc. The variable is only meaningful in the
> >> >>> main process and should not be set to non 0 in pgpool child.
> Moreover,
> >> >>> the only place set the variable to non 0 is the signal handler which
> >> >>> is set by main process.
> >> >>>
> >> >>> One the error occurs, pgpool starts failover as you see.
> >> >>>
> >> >>> I've never seen this kind of report before. What kind of pgpool
> >> >>> installation are you using? (compiled from source code or from
> >> >>> packes?) What kind of platform are you using? How is like your
> >> >>> pgpool.conf?
> >> >>> --
> >> >>> Tatsuo Ishii
> >> >>> SRA OSS, Inc. Japan
> >> >>> English: http://www.sraoss.co.jp/index_en.php
> >> >>> Japanese: http://www.sraoss.co.jp
> >> >>>
> >> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: connection to
> >> >>> > purple1_node1_ps(5432) failed
> >> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: new_connection:
> >> >>> create_cp()
> >> >>> > failed
> >> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]:
> >> degenerate_backend_set: 0
> >> >>> > fail over request from pid 11465
> >> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler
> >> called
> >> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:
> >> >>> starting
> >> >>> > to select new master node
> >> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: starting
> >> degeneration.
> >> >>> > shutdown host purple1_node1_ps(5432)
> >> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:
> no
> >> >>> valid
> >> >>> > DB node found
> >> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: Restart all
> children
> >> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:
> >> kill
> >> >>> 4388
> >> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:
> >> kill
> >> >>> 9597
> >> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[18648]: child received
> >> shutdown
> >> >>> > request signal 3
> >> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[4388]: child received
> >> shutdown
> >> >>> > request signal 3
> >> >>> > Jul 12 09:32:14 purple1-node1-ps rsyslogd-2177: imuxsock lost 85
> >> messages
> >> >>> > from pid 9597 due to rate-limiting
> >> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[9597]: child received
> >> shutdown
> >> >>> > request signal 3
> >> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:
> >> kill
> >> >>> 18648
> >> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[29409]: child received
> >> shutdown
> >> >>> > request signal 3
> >> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:
> >> kill
> >> >>> 29409
> >> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:
> >> kill
> >> >>> 11454
> >> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14323]: child received
> >> shutdown
> >> >>> > request signal 3
> >> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11454]: child received
> >> shutdown
> >> >>> > request signal 3
> >> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:
> >> kill
> >> >>> 14323
> >> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:
> >> kill
> >> >>> 22349
> >> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[22349]: child received
> >> shutdown
> >> >>> > request signal 3
> >> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:
> >> kill
> >> >>> 23617
> >> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:
> >> kill
> >> >>> 29410
> >> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[31511]: child received
> >> shutdown
> >> >>> > request signal 3
> >> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[29410]: child received
> >> shutdown
> >> >>> > request signal 3
> >> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler:
> >> kill
> >> >>> 31511
> >> >>> > Jul 12 09:32:14 purple1-node1-ps pgpool[4385]: child received
> >> shutdown
> >> >>> > request signal 3
> >> >>> > Jul 12 09:32:14 purple1-node1-ps rsyslogd-2177: imuxsock lost 757
> >> >>> messages
> >> >>> > from pid 23617 due to rate-limiting
> >> >>> >
> >> >>> > Could you please explain?
> >> >>> > Thanks
> >> >>> > Larisa.
> >> >>>
> >>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20130721/73d2b74f/attachment-0001.html>


More information about the pgpool-general mailing list