[pgpool-general: 7250] Re: Query regarding failover and recovery

Praveen Kumar K S praveenssit at gmail.com
Tue Sep 1 15:20:31 JST 2020


Hi Tatsuo,

Thanks for the detailed explanation. But I'm not expecting this. It is not
in my hands if pg1 reboots for some reason. I was assuming that pg2 should
continue to be master and pgpool should send the requests to pg2 until
another failover occurs. And at any point of time, we should not have 2
masters. How can we avoid this situation? If we have 3 postgres nodes, will
it solve the problem?

On Tue, Sep 1, 2020 at 11:36 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> > Hi Tatsuo,
> >
> > Thanks for letting me know the process. Just curious to know what will
> > happen if pg1 reboots and comes back and in the meanwhile pg2 got
> promoted
> > as master. What is the expectation in this situation ?
>
> When pg1 shudown, failover process will mark pg1 as down, and pg2 gets
> promoted to primary. So you have:
>
> pg1: down, standby
> pg2: up, primary
>
> So far, so good. BUT if you restart whole system, you will find that:
>
> pg1: up, primary
> pg2: up, standby
>
> Becuase both pg1 and pg2 are primary and there's no other info to
> decide which one is the true primary. In this case pgpool decides that
> the first node, which is not standby, to be primary.
>
> Actually p2 is not standby, and you will get into trouble because pg2
> is not a standby synched with pg1. (you will see huge
> relplication_delay).
>
> If you turn on detach_false_primary, pgpool will find that pg2 is not
> a standby and make it down.
>
> > On Tue, Sep 1, 2020 at 10:32 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >
> >> > Hi Tatsuo,
> >> >
> >> > Thanks for your email. Yes, I have read the message and fixed the pcp
> >> > authentication issue.
> >> >
> >> > Now, I'm facing a different issue. I will try to explain it step by
> step.
> >> >
> >> > Sequence:
> >> > 1. pg1 is primary and pg2 is standby
> >> > 2. Stop postgres service on pg1
> >> > 3. Pgpool executes failover
> >> > 4. Now pg2 is my primary. I verified all the configurations.
> >> > 5. Started postgres service on pg1
> >> > 6. Trying to recover failed node
> >> >
> >> > postgres at ip-172-31-39-241:/etc/pgpool2/4.0.9$ pcp_recovery_node -h
> >> > localhost -p 9898 -n 0
> >> > Password:
> >> > ERROR:  process recovery request failed
> >> > DETAIL:  primary server cannot be recovered by online recovery.
> >>
> >> You shoud have not started pg1 postgres at this
> >> point. pcp_recovery_node should not be executed on running node. See
> >> the manual:
> >>
> >> Note: The recovery target PostgreSQL server must not be running for
> >> performing the online recovery. If the target PostgreSQL server has
> >> already started, you must shut it down before starting the online
> >> recovery.
> >>
> >> If you have not started pg1 then pcp_recovert_node would succeed.
> >>
> >> > Log says verify_backend_node_status: decided node 0 is the true
> primary
> >> >
> >> > This command hangs
> >> > psql -U postgres -h localhost -p 9999 --pset pager=off -c "show
> >> pool_nodes"
> >> >
> >> > I would like to know how pgpool is considering node 0 which is pg1 as
> >> true
> >> > primary ?
> >>
> >> Because pgpool thinks the first primary node found is the true primary
> >> node if there is
> >>
> >> > Then what did I do ?
> >> > 1. Cleaned everything on pg1 and manually configured it as standby.
> >> > 2. Started postgres service
> >> > 3. Log says
> >> > 2020-08-31 13:40:00: pid 2459: DEBUG:  do_query: extended:0
> query:"SELECT
> >> > pg_is_in_recovery()"
> >> > 2020-08-31 13:40:00: pid 2459: DEBUG:  verify_backend_node_status:
> >> there's
> >> > no standby node
> >> > 2020-08-31 13:40:00: pid 2459: DEBUG:  node status[0]: 0
> >> > 2020-08-31 13:40:00: pid 2459: DEBUG:  node status[1]: 1
> >> >
> >> > This command executes now
> >> > postgres at ip-172-31-39-241:/etc/pgpool2/4.0.9$ psql -U postgres -h
> >> localhost
> >> > -p 9999 --pset pager=off -c "show pool_nodes"
> >> >  node_id | hostname | port | status | lb_weight |  role   |
> select_cnt |
> >> > load_balance_node | replication_delay | last_status_change
> >> >
> >>
> ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
> >> >  0       | pg1      | 5432 | down   | 0.500000  | standby | 0
>   |
> >> > false             | 0                 | 2020-08-31 13:35:44
> >> >  1       | pg2      | 5432 | up     | 0.500000  | primary | 0
>   |
> >> > true              | 0                 | 2020-08-31 13:35:44
> >> > (2 rows)
> >> >
> >> > I checked between pg1 and pg2 and see that streaming replication is
> >> working.
> >> >
> >> > Then I used pcp_attach_node to attach standby
> >> >
> >> > postgres at ip-172-31-39-241:/etc/pgpool2/4.0.9$ pcp_attach_node -h
> >> localhost
> >> > -p 9898 -n 0
> >> > Password:
> >> > pcp_attach_node -- Command Successful
> >> > postgres at ip-172-31-39-241:/etc/pgpool2/4.0.9$ psql -U postgres -h
> >> localhost
> >> > -p 9999 --pset pager=off -c "show pool_nodes"
> >> >  node_id | hostname | port | status | lb_weight |  role   |
> select_cnt |
> >> > load_balance_node | replication_delay | last_status_change
> >> >
> >>
> ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
> >> >  0       | pg1      | 5432 | up     | 0.500000  | standby | 0
>   |
> >> > true              | 0                 | 2020-08-31 14:09:31
> >> >  1       | pg2      | 5432 | up     | 0.500000  | primary | 0
>   |
> >> > false             | 0                 | 2020-08-31 14:02:35
> >> > (2 rows)
> >> >
> >> > Then I tried to perform a test operation and it executed successfully.
> >> > postgres at ip-172-31-39-241:/etc/pgpool2/4.0.9$ psql -U postgres -h
> >> localhost
> >> > -p 9999 --pset pager=off -c "create database covid"
> >> > CREATE DATABASE
> >> >
> >> > Am I missing something in this failover process ? Please let me know
> if
> >> you
> >> > need any additional details.
> >> >
> >> > On Tue, Sep 1, 2020 at 3:03 AM Tatsuo Ishii <ishii at sraoss.co.jp>
> wrote:
> >> >
> >> >> Praveen,
> >> >>
> >> >> Have you read this message? I wonder if you have fixed the issue or
> not.
> >> >>
> >> >> Best regards,
> >> >> --
> >> >> Tatsuo Ishii
> >> >> SRA OSS, Inc. Japan
> >> >> English: http://www.sraoss.co.jp/index_en.php
> >> >> Japanese:http://www.sraoss.co.jp
> >> >>
> >> >> From: Tatsuo Ishii <ishii at sraoss.co.jp>
> >> >> Subject: [pgpool-general: 7208] Re: Query regarding failover and
> >> recovery
> >> >> Date: Thu, 20 Aug 2020 08:51:04 +0900 (JST)
> >> >> Message-ID: <20200820.085104.891242161358675858.t-ishii at sraoss.co.jp
> >
> >> >>
> >> >> > The contents of pcp.conf looks incorrect.
> >> >> >
> >> >> >> postgres:md53175bce1d3201d16594cebf9d7eb3f9d
> >> >> >
> >> >> > The hashed password must not start with "md5".
> >> >> >
> >> >> > To create proper pcp password, please follow the instruction in the
> >> >> manual:
> >> >> > https://www.pgpool.net/docs/40/en/html/configuring-pcp-conf.html
> >> >> >
> >> >> >> Hello,
> >> >> >>
> >> >> >> Thanks for the clarification. I'm trying to execute and getting
> below
> >> >> >> error. I'm attaching configs for your reference. Can you please
> help
> >> ?
> >> >> >>
> >> >> >> postgres at pgp1:/etc/pgpool2/4.0.9$ psql -U postgres -h localhost
> -p
> >> 9999
> >> >> >> --pset pager=off -c "show pool_nodes"
> >> >> >>  node_id | hostname | port | status | lb_weight |  role   |
> >> select_cnt |
> >> >> >> load_balance_node | replication_delay | last_status_change
> >> >> >>
> >> >>
> >>
> ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
> >> >> >>  0       | pg1      | 5432 | down   | 0.500000  | standby | 0
> >>   |
> >> >> >> false             | 0                 | 2020-08-19 09:02:46
> >> >> >>  1       | pg2      | 5432 | up     | 0.500000  | primary | 0
> >>   |
> >> >> >> true              | 0                 | 2020-08-19 09:02:46
> >> >> >> (2 rows)
> >> >> >>
> >> >> >> postgres at pgp1:/etc/pgpool2/4.0.9$
> >> >> >> postgres at pgp1:/etc/pgpool2/4.0.9$
> >> >> >> postgres at pgp1:/etc/pgpool2/4.0.9$
> >> >> >> postgres at pgp1:/etc/pgpool2/4.0.9$ pcp_recovery_node -h localhost
> -p
> >> >> 9898 -n
> >> >> >> 0
> >> >> >> Password:
> >> >> >> FATAL:  authentication failed for user "postgres"
> >> >> >> DETAIL:  username and/or password does not match
> >> >> >>
> >> >> >> postgres at pgp1:/etc/pgpool2/4.0.9$
> >> >> >>
> >> >> >>
> >> >> >> On Wed, Aug 19, 2020 at 10:22 AM Tatsuo Ishii <ishii at sraoss.co.jp
> >
> >> >> wrote:
> >> >> >>
> >> >> >>> > I have 3 servers with two postgres (9.6) and one pgpool
> (4.0.9).
> >> >> Postgres
> >> >> >>> > is configured with streaming replication.
> >> >> >>> > When I manually stop postgres service on primary node, failover
> >> has
> >> >> >>> > happened successfully.
> >> >> >>> > Now I started postgres service on old primary node which is
> >> expected
> >> >> to
> >> >> >>> be
> >> >> >>> > converted as slave, pgpool is not triggering
> >> >> recovery_1st_stage_command =
> >> >> >>> > 'recovery_1st_stage.sh'
> >> >> >>> > May I know what could be the reason ?
> >> >> >>>
> >> >> >>> That is an expected behavior. The node previously brought down is
> >> left
> >> >> >>> as "down" by pgoool. This is intentional. You need to issue
> >> >> >>> pcp_recovery_node against the node (previous primary node in your
> >> >> >>> case) to make it online again.
> >> >> >>>
> >> >> >>> When a node is brought down, there might be a reason: for example
> >> >> >>> needed to repair the hardware. So in general it's not safe to
> >> >> >>> automatically restart the previously down node.
> >> >> >>>
> >> >> >>> Best regards,
> >> >> >>> --
> >> >> >>> Tatsuo Ishii
> >> >> >>> SRA OSS, Inc. Japan
> >> >> >>> English: http://www.sraoss.co.jp/index_en.php
> >> >> >>> Japanese:http://www.sraoss.co.jp
> >> >> >>>
> >> >> >>
> >> >> >>
> >> >> >> --
> >> >> >>
> >> >> >>
> >> >> >> *Regards,*
> >> >> >>
> >> >> >>
> >> >> >> *K S Praveen KumarM: +91-9986855625 *
> >> >> > _______________________________________________
> >> >> > pgpool-general mailing list
> >> >> > pgpool-general at pgpool.net
> >> >> > http://www.pgpool.net/mailman/listinfo/pgpool-general
> >> >>
> >> >
> >> >
> >> > --
> >> >
> >> >
> >> > *Regards,*
> >> >
> >> >
> >> > *K S Praveen KumarM: +91-9986855625 *
> >>
> >
> >
> > --
> >
> >
> > *Regards,*
> >
> >
> > *K S Praveen KumarM: +91-9986855625 *
> _______________________________________________
> pgpool-general mailing list
> pgpool-general at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-general
>


-- 


*Regards,*


*K S Praveen KumarM: +91-9986855625 *
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20200901/643f22cf/attachment-0001.html>


More information about the pgpool-general mailing list