[pgpool-general: 7252] Re: Query regarding failover and recovery

Tue Sep 1 15:47:55 JST 2020

Hi Tatsuo,

Thanks for the detailed explanation. I will try out your recommendations
and see the behavior.

On Tue, Sep 1, 2020 at 12:08 PM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> > Hi Tatsuo,
> >
> > Thanks for the detailed explanation. But I'm not expecting this. It is
> not
> > in my hands if pg1 reboots for some reason. I was assuming that pg2
> should
> > continue to be master and pgpool should send the requests to pg2 until
> > another failover occurs.
>
> Of course. Just rebooting pg1 does not affect to pgpool. Pgpool-II
> will keep on assuming pg1 as down.
>
> > And at any point of time, we should not have 2
> > masters. How can we avoid this situation? If we have 3 postgres nodes,
> will
> > it solve the problem?
>
> But someone could make a mistake anytime: for example, hit pg_ctl
> promote command. What can pgpool do for such a brain dead
> administrator? To avoid the confusion by such that mistake, I would
> recommend:
>
> 1) Do not use -D option of pgpool. Without the option, pgpool remeber
> what was the status of each backend. Without -D option,
>
> >> pg1: down, standby
> >> pg2: up, primary
> >>
> >> So far, so good. BUT if you restart whole system, you will find that:
> >>
> >> pg1: up, primary
> >> pg2: up, standby
>
> will become:
>
> pg1: down, standby
> pg2: up, primary
>
> So far, so good. BUT if you restart whole system, you will find that:
>
> pg1: down, standby
> pg2: up, primary
>
> 2) Enable detach_false_primary.
>
> > On Tue, Sep 1, 2020 at 11:36 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >
> >> > Hi Tatsuo,
> >> >
> >> > Thanks for letting me know the process. Just curious to know what will
> >> > happen if pg1 reboots and comes back and in the meanwhile pg2 got
> >> promoted
> >> > as master. What is the expectation in this situation ?
> >>
> >> When pg1 shudown, failover process will mark pg1 as down, and pg2 gets
> >> promoted to primary. So you have:
> >>
> >> pg1: down, standby
> >> pg2: up, primary
> >>
> >> So far, so good. BUT if you restart whole system, you will find that:
> >>
> >> pg1: up, primary
> >> pg2: up, standby
> >>
> >> Becuase both pg1 and pg2 are primary and there's no other info to
> >> decide which one is the true primary. In this case pgpool decides that
> >> the first node, which is not standby, to be primary.
> >>
> >> Actually p2 is not standby, and you will get into trouble because pg2
> >> is not a standby synched with pg1. (you will see huge
> >> relplication_delay).
> >>
> >> If you turn on detach_false_primary, pgpool will find that pg2 is not
> >> a standby and make it down.
> >>
> >> > On Tue, Sep 1, 2020 at 10:32 AM Tatsuo Ishii <ishii at sraoss.co.jp>
> wrote:
> >> >
> >> >> > Hi Tatsuo,
> >> >> >
> >> >> > Thanks for your email. Yes, I have read the message and fixed the
> pcp
> >> >> > authentication issue.
> >> >> >
> >> >> > Now, I'm facing a different issue. I will try to explain it step by
> >> step.
> >> >> >
> >> >> > Sequence:
> >> >> > 1. pg1 is primary and pg2 is standby
> >> >> > 2. Stop postgres service on pg1
> >> >> > 3. Pgpool executes failover
> >> >> > 4. Now pg2 is my primary. I verified all the configurations.
> >> >> > 5. Started postgres service on pg1
> >> >> > 6. Trying to recover failed node
> >> >> >
> >> >> > postgres at ip-172-31-39-241:/etc/pgpool2/4.0.9$ pcp_recovery_node -h
> >> >> > localhost -p 9898 -n 0
> >> >> > Password:
> >> >> > ERROR:  process recovery request failed
> >> >> > DETAIL:  primary server cannot be recovered by online recovery.
> >> >>
> >> >> You shoud have not started pg1 postgres at this
> >> >> point. pcp_recovery_node should not be executed on running node. See
> >> >> the manual:
> >> >>
> >> >> Note: The recovery target PostgreSQL server must not be running for
> >> >> performing the online recovery. If the target PostgreSQL server has
> >> >> already started, you must shut it down before starting the online
> >> >> recovery.
> >> >>
> >> >> If you have not started pg1 then pcp_recovert_node would succeed.
> >> >>
> >> >> > Log says verify_backend_node_status: decided node 0 is the true
> >> primary
> >> >> >
> >> >> > This command hangs
> >> >> > psql -U postgres -h localhost -p 9999 --pset pager=off -c "show
> >> >> pool_nodes"
> >> >> >
> >> >> > I would like to know how pgpool is considering node 0 which is pg1
> as
> >> >> true
> >> >> > primary ?
> >> >>
> >> >> Because pgpool thinks the first primary node found is the true
> primary
> >> >> node if there is
> >> >>
> >> >> > Then what did I do ?
> >> >> > 1. Cleaned everything on pg1 and manually configured it as standby.
> >> >> > 2. Started postgres service
> >> >> > 3. Log says
> >> >> > 2020-08-31 13:40:00: pid 2459: DEBUG:  do_query: extended:0
> >> query:"SELECT
> >> >> > pg_is_in_recovery()"
> >> >> > 2020-08-31 13:40:00: pid 2459: DEBUG:  verify_backend_node_status:
> >> >> there's
> >> >> > no standby node
> >> >> > 2020-08-31 13:40:00: pid 2459: DEBUG:  node status[0]: 0
> >> >> > 2020-08-31 13:40:00: pid 2459: DEBUG:  node status[1]: 1
> >> >> >
> >> >> > This command executes now
> >> >> > postgres at ip-172-31-39-241:/etc/pgpool2/4.0.9$ psql -U postgres -h
> >> >> localhost
> >> >> > -p 9999 --pset pager=off -c "show pool_nodes"
> >> >> >  node_id | hostname | port | status | lb_weight |  role   |
> >> select_cnt |
> >> >> > load_balance_node | replication_delay | last_status_change
> >> >> >
> >> >>
> >>
> ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
> >> >> >  0       | pg1      | 5432 | down   | 0.500000  | standby | 0
> >>   |
> >> >> > false             | 0                 | 2020-08-31 13:35:44
> >> >> >  1       | pg2      | 5432 | up     | 0.500000  | primary | 0
> >>   |
> >> >> > true              | 0                 | 2020-08-31 13:35:44
> >> >> > (2 rows)
> >> >> >
> >> >> > I checked between pg1 and pg2 and see that streaming replication is
> >> >> working.
> >> >> >
> >> >> > Then I used pcp_attach_node to attach standby
> >> >> >
> >> >> > postgres at ip-172-31-39-241:/etc/pgpool2/4.0.9$ pcp_attach_node -h
> >> >> localhost
> >> >> > -p 9898 -n 0
> >> >> > Password:
> >> >> > pcp_attach_node -- Command Successful
> >> >> > postgres at ip-172-31-39-241:/etc/pgpool2/4.0.9$ psql -U postgres -h
> >> >> localhost
> >> >> > -p 9999 --pset pager=off -c "show pool_nodes"
> >> >> >  node_id | hostname | port | status | lb_weight |  role   |
> >> select_cnt |
> >> >> > load_balance_node | replication_delay | last_status_change
> >> >> >
> >> >>
> >>
> ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
> >> >> >  0       | pg1      | 5432 | up     | 0.500000  | standby | 0
> >>   |
> >> >> > true              | 0                 | 2020-08-31 14:09:31
> >> >> >  1       | pg2      | 5432 | up     | 0.500000  | primary | 0
> >>   |
> >> >> > false             | 0                 | 2020-08-31 14:02:35
> >> >> > (2 rows)
> >> >> >
> >> >> > Then I tried to perform a test operation and it executed
> successfully.
> >> >> > postgres at ip-172-31-39-241:/etc/pgpool2/4.0.9$ psql -U postgres -h
> >> >> localhost
> >> >> > -p 9999 --pset pager=off -c "create database covid"
> >> >> > CREATE DATABASE
> >> >> >
> >> >> > Am I missing something in this failover process ? Please let me
> know
> >> if
> >> >> you
> >> >> > need any additional details.
> >> >> >
> >> >> > On Tue, Sep 1, 2020 at 3:03 AM Tatsuo Ishii <ishii at sraoss.co.jp>
> >> wrote:
> >> >> >
> >> >> >> Praveen,
> >> >> >>
> >> >> >> Have you read this message? I wonder if you have fixed the issue
> or
> >> not.
> >> >> >>
> >> >> >> Best regards,
> >> >> >> --
> >> >> >> Tatsuo Ishii
> >> >> >> SRA OSS, Inc. Japan
> >> >> >> English: http://www.sraoss.co.jp/index_en.php
> >> >> >> Japanese:http://www.sraoss.co.jp
> >> >> >>
> >> >> >> From: Tatsuo Ishii <ishii at sraoss.co.jp>
> >> >> >> Subject: [pgpool-general: 7208] Re: Query regarding failover and
> >> >> recovery
> >> >> >> Date: Thu, 20 Aug 2020 08:51:04 +0900 (JST)
> >> >> >> Message-ID: <
> 20200820.085104.891242161358675858.t-ishii at sraoss.co.jp
> >> >
> >> >> >>
> >> >> >> > The contents of pcp.conf looks incorrect.
> >> >> >> >
> >> >> >> >> postgres:md53175bce1d3201d16594cebf9d7eb3f9d
> >> >> >> >
> >> >> >> > The hashed password must not start with "md5".
> >> >> >> >
> >> >> >> > To create proper pcp password, please follow the instruction in
> the
> >> >> >> manual:
> >> >> >> >
> https://www.pgpool.net/docs/40/en/html/configuring-pcp-conf.html
> >> >> >> >
> >> >> >> >> Hello,
> >> >> >> >>
> >> >> >> >> Thanks for the clarification. I'm trying to execute and getting
> >> below
> >> >> >> >> error. I'm attaching configs for your reference. Can you please
> >> help
> >> >> ?
> >> >> >> >>
> >> >> >> >> postgres at pgp1:/etc/pgpool2/4.0.9$ psql -U postgres -h
> localhost
> >> -p
> >> >> 9999
> >> >> >> >> --pset pager=off -c "show pool_nodes"
> >> >> >> >>  node_id | hostname | port | status | lb_weight |  role   |
> >> >> select_cnt |
> >> >> >> >> load_balance_node | replication_delay | last_status_change
> >> >> >> >>
> >> >> >>
> >> >>
> >>
> ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
> >> >> >> >>  0       | pg1      | 5432 | down   | 0.500000  | standby | 0
> >> >>   |
> >> >> >> >> false             | 0                 | 2020-08-19 09:02:46
> >> >> >> >>  1       | pg2      | 5432 | up     | 0.500000  | primary | 0
> >> >>   |
> >> >> >> >> true              | 0                 | 2020-08-19 09:02:46
> >> >> >> >> (2 rows)
> >> >> >> >>
> >> >> >> >> postgres at pgp1:/etc/pgpool2/4.0.9$
> >> >> >> >> postgres at pgp1:/etc/pgpool2/4.0.9$
> >> >> >> >> postgres at pgp1:/etc/pgpool2/4.0.9$
> >> >> >> >> postgres at pgp1:/etc/pgpool2/4.0.9$ pcp_recovery_node -h
> localhost
> >> -p
> >> >> >> 9898 -n
> >> >> >> >> 0
> >> >> >> >> Password:
> >> >> >> >> FATAL:  authentication failed for user "postgres"
> >> >> >> >> DETAIL:  username and/or password does not match
> >> >> >> >>
> >> >> >> >> postgres at pgp1:/etc/pgpool2/4.0.9$
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> On Wed, Aug 19, 2020 at 10:22 AM Tatsuo Ishii <
> ishii at sraoss.co.jp
> >> >
> >> >> >> wrote:
> >> >> >> >>
> >> >> >> >>> > I have 3 servers with two postgres (9.6) and one pgpool
> >> (4.0.9).
> >> >> >> Postgres
> >> >> >> >>> > is configured with streaming replication.
> >> >> >> >>> > When I manually stop postgres service on primary node,
> failover
> >> >> has
> >> >> >> >>> > happened successfully.
> >> >> >> >>> > Now I started postgres service on old primary node which is
> >> >> expected
> >> >> >> to
> >> >> >> >>> be
> >> >> >> >>> > converted as slave, pgpool is not triggering
> >> >> >> recovery_1st_stage_command =
> >> >> >> >>> > 'recovery_1st_stage.sh'
> >> >> >> >>> > May I know what could be the reason ?
> >> >> >> >>>
> >> >> >> >>> That is an expected behavior. The node previously brought
> down is
> >> >> left
> >> >> >> >>> as "down" by pgoool. This is intentional. You need to issue
> >> >> >> >>> pcp_recovery_node against the node (previous primary node in
> your
> >> >> >> >>> case) to make it online again.
> >> >> >> >>>
> >> >> >> >>> When a node is brought down, there might be a reason: for
> example
> >> >> >> >>> needed to repair the hardware. So in general it's not safe to
> >> >> >> >>> automatically restart the previously down node.
> >> >> >> >>>
> >> >> >> >>> Best regards,
> >> >> >> >>> --
> >> >> >> >>> Tatsuo Ishii
> >> >> >> >>> SRA OSS, Inc. Japan
> >> >> >> >>> English: http://www.sraoss.co.jp/index_en.php
> >> >> >> >>> Japanese:http://www.sraoss.co.jp
> >> >> >> >>>
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> --
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> *Regards,*
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> *K S Praveen KumarM: +91-9986855625 *
> >> >> >> > _______________________________________________
> >> >> >> > pgpool-general mailing list
> >> >> >> > pgpool-general at pgpool.net
> >> >> >> > http://www.pgpool.net/mailman/listinfo/pgpool-general
> >> >> >>
> >> >> >
> >> >> >
> >> >> > --
> >> >> >
> >> >> >
> >> >> > *Regards,*
> >> >> >
> >> >> >
> >> >> > *K S Praveen KumarM: +91-9986855625 *
> >> >>
> >> >
> >> >
> >> > --
> >> >
> >> >
> >> > *Regards,*
> >> >
> >> >
> >> > *K S Praveen KumarM: +91-9986855625 *
> >> _______________________________________________
> >> pgpool-general mailing list
> >> pgpool-general at pgpool.net
> >> http://www.pgpool.net/mailman/listinfo/pgpool-general
> >>
> >
> >
> > --
> >
> >
> > *Regards,*
> >
> >
> > *K S Praveen KumarM: +91-9986855625 *
>

-- 

*Regards,*

*K S Praveen KumarM: +91-9986855625 *
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20200901/d3758a22/attachment-0001.html>