[pgpool-general: 3114] Re: PGPool connection dropped when using pg_detach

James Sewell james.sewell at lisasoft.com
Thu Aug 21 14:21:48 JST 2014


So there is no way of scheduling PostgreSQL node downtime without
interrupting operation?

Is this on the roadmap? It's a pretty major shortcoming.

Cheers,
James

On Thursday, 21 August 2014, Tatsuo Ishii <ishii at postgresql.org> wrote:

> I thought I had already answered.
>
> There is no way to do it.
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
> > Any updates on this?
> >
> > What is the official procedure to remove a PostgreSQL node from PGPool
> > without damaging connections to the other PostgreSQL nodes?
> >
> > Cheers,
> >
> >
> > James Sewell,
> > PostgreSQL Team Lead / Solutions Architect
> > ______________________________________
> >
> >
> >  Level 2, 50 Queen St, Melbourne VIC 3000
> >
> > *P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099
> >
> >
> >
> > On Fri, Jul 25, 2014 at 5:02 PM, James Sewell <james.sewell at lisasoft.com>
> > wrote:
> >
> >> Hey,
> >>
> >> It seems the exact same behaviour happens when I shut down my standby
> >> PostgreSQL node.
> >>
> >>
> >> 2014-07-25 16:59:14 LOG:   pid 11297: degenerate_backend_set: 0 fail over request from pid 11297
> >> 2014-07-25 16:59:14 LOG:   pid 11286: wd_start_interlock: start interlocking
> >> 2014-07-25 16:59:15 LOG:   pid 11286: starting degeneration. shutdown host 10.51.9.227(5432)
> >> 2014-07-25 16:59:15 LOG:   pid 11286: Restart all children
> >> 2014-07-25 16:59:15 LOG:   pid 11286: find_primary_node_repeatedly: waiting for finding a primary node
> >> 2014-07-25 16:59:15 LOG:   pid 11286: find_primary_node: primary node id is 1
> >> 2014-07-25 16:59:15 LOG:   pid 11286: wd_end_interlock: end interlocking
> >> 2014-07-25 16:59:16 LOG:   pid 11286: failover: set new primary node: 1
> >> 2014-07-25 16:59:16 LOG:   pid 11286: failover: set new master node: 1
> >> 2014-07-25 16:59:16 LOG:   pid 22737: do_child: failback event found. restart myself.
> >> 2014-07-25 16:59:16 LOG:   pid 22739: do_child: failback event found. restart myself.
> >> 2014-07-25 16:59:16 LOG:   pid 22736: do_child: failback event found. restart myself.
> >> 2014-07-25 16:59:16 LOG:   pid 22740: do_child: failback event found. restart myself.
> >> 2014-07-25 16:59:16 LOG:   pid 22741: do_child: failback event found. restart myself.
> >> 2014-07-25 16:59:16 LOG:   pid 22742: do_child: failback event found. restart myself.
> >> 2014-07-25 16:59:16 LOG:   pid 22743: do_child: failback event found. restart myself.
> >> 2014-07-25 16:59:16 LOG:   pid 22745: do_child: failback event found. restart myself.
> >> 2014-07-25 16:59:16 LOG:   pid 22744: do_child: failback event found. restart myself.
> >> 2014-07-25 16:59:16 LOG:   pid 22746: do_child: failback event found. restart myself.
> >> 2014-07-25 16:59:16 LOG:   pid 22747: do_child: failback event found. restart myself.
> >> 2014-07-25 16:59:16 LOG:   pid 22748: do_child: failback event found. restart myself.
> >> 2014-07-25 16:59:16 LOG:   pid 22749: do_child: failback event found. restart myself.
> >> 2014-07-25 16:59:16 LOG:   pid 22750: do_child: failback event found. restart myself.
> >> 2014-07-25 16:59:16 LOG:   pid 22751: do_child: failback event found. restart myself.
> >> 2014-07-25 16:59:16 LOG:   pid 22753: do_child: failback event found. restart myself.
> >> 2014-07-25 16:59:16 LOG:   pid 22752: do_child: failback event found. restart myself.
> >> 2014-07-25 16:59:16 LOG:   pid 22754: do_child: failback event found. restart myself.
> >> 2014-07-25 16:59:16 LOG:   pid 22755: do_child: failback event found. restart myself.
> >> 2014-07-25 16:59:16 LOG:   pid 22756: do_child: failback event found. restart myself.
> >> 2014-07-25 16:59:16 LOG:   pid 22758: do_child: failback event found. restart myself.
> >> 2014-07-25 16:59:16 LOG:   pid 22757: do_child: failback event found. restart myself.
> >> 2014-07-25 16:59:16 LOG:   pid 22760: do_child: failback event found. restart myself.
> >> 2014-07-25 16:59:16 LOG:   pid 22759: do_child: failback event found. restart myself.
> >> 2014-07-25 16:59:16 LOG:   pid 22762: do_child: failback event found. restart myself.
> >> 2014-07-25 16:59:16 LOG:   pid 22763: do_child: failback event found. restart myself.
> >> 2014-07-25 16:59:16 LOG:   pid 22761: do_child: failback event found. restart myself.
> >> 2014-07-25 16:59:16 LOG:   pid 22764: do_child: failback event found. restart myself.
> >> 2014-07-25 16:59:16 LOG:   pid 22765: do_child: failback event found. restart myself.
> >> 2014-07-25 16:59:16 LOG:   pid 22707: worker process received restart request
> >> 2014-07-25 16:59:16 LOG:   pid 11286: failover done. shutdown host 10.51.9.227(5432)
> >> 2014-07-25 16:59:16 LOG:   pid 22766: do_child: failback event found. restart myself.
> >> 2014-07-25 16:59:17 LOG:   pid 22706: pcp child process received restart request
> >> 2014-07-25 16:59:17 LOG:   pid 11286: PCP child 22706 exits with status 256 in failover()
> >> 2014-07-25 16:59:17 LOG:   pid 11286: fork a new PCP child pid 22768 in failover()
> >>
> >>
> >> Cheers,
> >> James Sewell
> >>
> >> On Fri, Jul 25, 2014 at 3:22 PM, James Sewell <james.sewell at lisasoft.com>
> >> wrote:
> >>
> >>> Hey,
> >>>
> >>> So there is no way of manually taking a postgresql node out of a pool
> >>> without interrupting traffic to all other postgresql nodes?
> >>>
> >>> Cheers,
> >>> James Sewell
> >>>
> >>> On Fri, Jul 25, 2014 at 3:07 PM, Tatsuo Ishii <ishii at postgresql.org>
> >>> wrote:
> >>>
> >>>> I cannot imagine how you would perform maintenance on the database
> >>>> while a user is connected to it :-)
> >>>>
> >>>> BTW, I'm not sure whether this is useful for you,
> >>>> but pcp_detach_node accepts a -g option, which forces it to
> >>>> wait until all clients have disconnected.
> >>>>
> >>>> Best regards,
> >>>> --
> >>>> Tatsuo Ishii
> >>>> SRA OSS, Inc. Japan
> >>>> English: http://www.sraoss.co.jp/index_en.php
> >>>> Japanese:http://www.sraoss.co.jp
> >>>>
> >>>> > Cool,
> >>>> >
> >>>> > Thanks Tatsuo.
> >>>> >
> >>>> > Is there a way of taking a postgresql node out of a pool without causing
> >>>> > connections to other postgresql nodes to drop?
> >>>> >
> >>>> > This would be used in a situation such as database maintenance.
> >>>> >
> >>>> > Cheers,
> >>>> > James Sewell
> >>>> >
> >>>> > On Fri, Jul 25, 2014 at 2:43 PM, Tatsuo Ishii <ishii at postgresql.org>
> >>>> > wrote:
> >>>> >
> >>>> >> > Hey all,
> >>>> >> >
> >>>> >> > This is seemingly a pretty bad problem, which I uncovered as part
> >>>> >> > of my last post, so the start of the message will be similar.
> >>>> >> >
> >>>> >> > I have two pgpool nodes, with a TCP load balancer spreading
> >>>> >> > connections between them. I am using watchdog to synchronise
> >>>> >> > PostgreSQL node information between the two, and an external HA
> >>>> >> > solution (with ALLOW_TO_FAILOVER).
> >>>> >> >
> >>>> >> > If I start both my pgpool nodes up I get the following initial state:
> >>>> >> >
> >>>> >> > postgres=# show pool_nodes;
> >>>> >> >  node_id |  hostname   | port | status | lb_weight |  role
> >>>> >> > ---------+-------------+------+--------+-----------+---------
> >>>> >> >  0       | 10.10.10.1  | 5432 | 2      | 0.500000  | standby
> >>>> >> >  1       | 10.10.10.2  | 5432 | 2      | 0.500000  | primary
> >>>> >> > (2 rows)
> >>>> >> >
> >>>> >> > Now I open a PSQL connection and do the following:
> >>>> >> >
> >>>> >> > postgres=# SELECT inet_server_addr();
> >>>> >> >  inet_server_addr
> >>>> >> > ------------------
> >>>> >> >  10.10.10.2
> >>>> >> > (1 row)
> >>>> >> >
> >>>> >> > This shows I am connected to the primary.
> >>>> >> >
> >>>> >> > I can run this multiple times and I will always be connected to
> >>>> >> > the primary, as long as I don't close the psql session.
> >>>> >> >
> >>>> >> > Then from another window I run the following command:
> >>>> >> >
> >>>> >> >  pcp_detach_node 1 load_balancer 9898 postgres postgres 0
> >>>> >> >
> >>>> >> > And in the same PSQL session run the command again:
> >>>> >> >
> >>>> >> > postgres=# SELECT inet_server_addr();
> >>>> >> > SSL SYSCALL error: EOF detected
> >>>> >> > The connection to the server was lost. Attempting reset: Succeeded.
> >>>> >> >
> >>>> >> > This is strange. Why has my master connection been severed?
> >>>> >>
> >>>> >> This is the expected behavior of pcp_detach_node: it triggers a
> >>>> >> failover, and all existing sessions are disconnected.
> >>>> >>
> >>>> >> Best regards,
> >>>> >> --
> >>>> >> Tatsuo Ishii
> >>>> >> SRA OSS, Inc. Japan
> >>>> >> English: http://www.sraoss.co.jp/index_en.php
> >>>> >> Japanese:http://www.sraoss.co.jp
> >>>> >>
> >>>> >
> >>>> > --
> >>>> >
> >>>> >
> >>>> > ------------------------------
> >>>> > The contents of this email are confidential and may be subject to
> >>>> > legal or professional privilege and copyright. No representation is
> >>>> > made that this email is free of viruses or other defects. If you have
> >>>> > received this communication in error, you may not copy or distribute
> >>>> > any part of it or otherwise disclose its contents to anyone. Please
> >>>> > advise the sender of your incorrect receipt of this correspondence.
> >>>>
> >>>
> >>>
> >>
> >
>

