[pgpool-general: 4510] Re: pgpool unable to attach a node during failback command

Shay Cohavi cohavisi at gmail.com
Tue Mar 1 23:20:38 JST 2016


I'm using online recovery *only* for full-copy purposes, i.e. if a node gets
corrupted. pcp_recovery_node executes the recovery_1st_stage command, which
uses pg_basebackup (full copy).
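
For illustration only, here is a minimal sketch of the kind of pg_basebackup
invocation such a first-stage recovery script might build. The host name, data
directory, port and user are hypothetical placeholders, not values from my
setup, and the real script may pass different options.

```python
import shlex


def basebackup_command(primary_host, data_dir, port=5432, user="postgres"):
    """Build a pg_basebackup command line that takes a full copy from the primary.

    All argument values are supplied by the caller; nothing here is read
    from pgpool or PostgreSQL configuration.
    """
    return [
        "pg_basebackup",
        "-h", primary_host,   # node to copy from (the current primary)
        "-p", str(port),
        "-U", user,           # assumed to be a role allowed to replicate
        "-D", data_dir,       # target data directory on the node being rebuilt
        "-X", "stream",       # stream the WAL needed by the backup alongside it
    ]


# Hypothetical values, shown only to make the sketch concrete.
cmd = basebackup_command("primary.example", "/var/lib/pgsql/data")
print(" ".join(shlex.quote(part) for part in cmd))
```

The command is only built and printed here, not executed, so it can be
inspected before it is wired into a recovery script.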

But for a switchover between the nodes (*none* of the nodes is corrupted),
just to *switch roles:*
*1. Shut down the primary.*
*2. pgpool promotes the secondary.*
*3. Perform pcp_attach_node (old primary), which calls failback.sh:*
      failback.sh does exactly what you describe:
         a. pg_start_backup()
         b. rsync (should be fast)
         c. pg_stop_backup()
         d. creates recovery.conf
         e. starts the node.
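
The failback.sh steps a-e above can be sketched roughly as an ordered plan.
This is not the actual script: the host name, data directory, backup label,
"postgres" user and rsync options are assumptions made for illustration, and
the commands are written as if they were run on the node being re-synced
(the old primary). The recovery.conf settings shown apply to PostgreSQL
releases that still use recovery.conf.

```python
def failback_plan(new_primary, data_dir, port=5432, user="postgres"):
    """Return the failback steps, in order, as (name, action) pairs.

    Each action is either an argv list or, for recovery.conf, the file
    contents. Nothing is executed; the plan is only described.
    """
    psql = ["psql", "-h", new_primary, "-p", str(port), "-U", user, "-Atc"]
    recovery_conf = (
        "standby_mode = 'on'\n"
        f"primary_conninfo = 'host={new_primary} port={port} user={user}'\n"
    )
    return [
        # a. put the new primary into backup mode (label is hypothetical)
        ("pg_start_backup", psql + ["SELECT pg_start_backup('failback', true)"]),
        # b. pull the data directory from the new primary, skipping its pid file
        ("rsync", ["rsync", "-a", "--delete", "--exclude=postmaster.pid",
                   f"{new_primary}:{data_dir}/", f"{data_dir}/"]),
        # c. leave backup mode
        ("pg_stop_backup", psql + ["SELECT pg_stop_backup()"]),
        # d. contents to write to recovery.conf in the data directory
        ("recovery.conf", recovery_conf),
        # e. start the old primary, which now comes up as a standby
        ("start", ["pg_ctl", "-D", data_dir, "-w", "start"]),
    ]


# Hypothetical values, shown only to make the sketch concrete.
plan = failback_plan("new-primary.example", "/var/lib/pgsql/data")
for name, action in plan:
    print(name, action)
```

Keeping the steps as data makes the order easy to check: the rsync sits
between pg_start_backup() and pg_stop_backup(), and the node is started only
after recovery.conf exists.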


Any ideas or comments?


Thanks,
cohavisi


On Tue, Mar 1, 2016 at 3:37 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:

> > OK....Thanks!
> >
> > I'm trying to implement failover/failback on the nodes:
> > 1. The primary node goes down.
> > 2. pgpool promotes the secondary node, making it the primary.
> > 3. By attaching the failed node (old primary), failback.sh is called,
> > which recovers the failed node (using rsync, which is much faster) and
> > brings it online as a secondary!
>
> I don't know what failback.sh is doing but if it just runs rsync, it's
> not safe.  You should use pg_start_backup()/pg_stop_backup().
>
> BTW, if rsync is much faster for you, why don't you use it for online
> recovery as well?
>
> > From what you are saying...
> > just to make sure, I *cannot* use the failback.sh script (which is called
> > by pcp_attach_node) in order to "recover" the node and bring it online (as
> > a secondary).
>
> Ok, but failback.sh is not supposed to do what you want.
> I recommend you look into the follow master command.
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
> > Thanks,
> > cohavisi
> >
> > On Tue, Mar 1, 2016 at 2:15 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
> >
> >> I'm not sure what you want to do (especially I'm confused by
> >> "secondary": what does it mean?). Have you taken a look at the follow
> >> master script?
> >>
> >> Anyway...
> >>
> >> pcp_attach_node should be used when the PostgreSQL server is
> >> online and ready to use, not for recovering a PostgreSQL server.
> >>
> >> Best regards,
> >> --
> >> Tatsuo Ishii
> >> SRA OSS, Inc. Japan
> >> English: http://www.sraoss.co.jp/index_en.php
> >> Japanese:http://www.sraoss.co.jp
> >>
> >> > Hi,
> >> > Thanks for your reply...
> >> > I do use online recovery when a full recovery is needed (using
> >> > pg_basebackup, via pcp_recovery_node).
> >> > But I added the ability to perform a switchover between the nodes:
> >> > stop/detach the primary, let failover occur, and reattach it as
> >> > secondary (using the failback script).
> >> > But once the failback has finished, pgpool does not attach it as
> >> > secondary!!
> >> >
> >> >
> >> > Can you please advise?
> >> >
> >> > cohavisi
> >> >
> >> >
> >> > On Tue, Mar 1, 2016 at 10:41 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
> >> >
> >> >> You should use online recovery instead of pcp_attach_node.
> >> >>
> >> >> Best regards,
> >> >> --
> >> >> Tatsuo Ishii
> >> >> SRA OSS, Inc. Japan
> >> >> English: http://www.sraoss.co.jp/index_en.php
> >> >> Japanese:http://www.sraoss.co.jp
> >> >>
> >> >> > Hi,
> >> >> > I have a huge problem attaching a node (as secondary) to the pool
> >> >> > after performing pcp_attach_node.
> >> >> >
> >> >> > After failover has completed successfully and a valid primary node
> >> >> > is active, I perform a *pcp_attach (via sql)* on the faulty node in
> >> >> > order to fail it back as secondary!
> >> >> >
> >> >> > *select pcp_attach_node (0,'10.10.61.99',1200,9898,'*****','*****') *
> >> >> >
> >> >> > During this command, a failback script is executed and performs the
> >> >> > following:
> >> >> > 1. rsync between the DB nodes.
> >> >> > 2. create recovery.conf.
> >> >> > 3. start up the node (as secondary).
> >> >> >
> >> >> > *The failback can take 20 min to finish.*
> >> >> >
> >> >> > After the failback finished *successfully* (exit status 0) and the
> >> >> > node started as *secondary* (according to postgres), with streaming
> >> >> > replication,
> >> >> >
> >> >> > *pgpool reports the node status going from 1 to 3 (instead of 2).*
> >> >> >
> >> >> > *** When the failback finishes early (in less than a few minutes),
> >> >> > pgpool reports the node status as 2, as expected.*
> >> >> >
> >> >> >
> >> >> > Please advise regarding this issue...
> >> >> >
> >> >> >
> >> >> > *Thanks,*
> >> >> > *cohavisi*
> >> >>
> >>
>