[pgpool-general: 7248] Re: Query regarding failover and recovery

Praveen Kumar K S praveenssit at gmail.com
Tue Sep 1 14:14:25 JST 2020


Hi Tatsuo,

Thanks for letting me know the process. Just curious to know what will
happen if pg1 reboots and comes back and in the meanwhile pg2 got promoted
as master. What is the expectation in this situation ?

On Tue, Sep 1, 2020 at 10:32 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> > Hi Tatsuo,
> >
> > Thanks for your email. Yes, I have read the message and fixed the pcp
> > authentication issue.
> >
> > Now, I'm facing a different issue. I will try to explain it step by step.
> >
> > Sequence:
> > 1. pg1 is primary and pg2 is standby
> > 2. Stop postgres service on pg1
> > 3. Pgpool executes failover
> > 4. Now pg2 is my primary. I verified all the configurations.
> > 5. Started postgres service on pg1
> > 6. Trying to recover failed node
> >
> > postgres at ip-172-31-39-241:/etc/pgpool2/4.0.9$ pcp_recovery_node -h
> > localhost -p 9898 -n 0
> > Password:
> > ERROR:  process recovery request failed
> > DETAIL:  primary server cannot be recovered by online recovery.
>
> You shoud have not started pg1 postgres at this
> point. pcp_recovery_node should not be executed on running node. See
> the manual:
>
> Note: The recovery target PostgreSQL server must not be running for
> performing the online recovery. If the target PostgreSQL server has
> already started, you must shut it down before starting the online
> recovery.
>
> If you have not started pg1 then pcp_recovert_node would succeed.
>
> > Log says verify_backend_node_status: decided node 0 is the true primary
> >
> > This command hangs
> > psql -U postgres -h localhost -p 9999 --pset pager=off -c "show
> pool_nodes"
> >
> > I would like to know how pgpool is considering node 0 which is pg1 as
> true
> > primary ?
>
> Because pgpool thinks the first primary node found is the true primary
> node if there is
>
> > Then what did I do ?
> > 1. Cleaned everything on pg1 and manually configured it as standby.
> > 2. Started postgres service
> > 3. Log says
> > 2020-08-31 13:40:00: pid 2459: DEBUG:  do_query: extended:0 query:"SELECT
> > pg_is_in_recovery()"
> > 2020-08-31 13:40:00: pid 2459: DEBUG:  verify_backend_node_status:
> there's
> > no standby node
> > 2020-08-31 13:40:00: pid 2459: DEBUG:  node status[0]: 0
> > 2020-08-31 13:40:00: pid 2459: DEBUG:  node status[1]: 1
> >
> > This command executes now
> > postgres at ip-172-31-39-241:/etc/pgpool2/4.0.9$ psql -U postgres -h
> localhost
> > -p 9999 --pset pager=off -c "show pool_nodes"
> >  node_id | hostname | port | status | lb_weight |  role   | select_cnt |
> > load_balance_node | replication_delay | last_status_change
> >
> ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
> >  0       | pg1      | 5432 | down   | 0.500000  | standby | 0          |
> > false             | 0                 | 2020-08-31 13:35:44
> >  1       | pg2      | 5432 | up     | 0.500000  | primary | 0          |
> > true              | 0                 | 2020-08-31 13:35:44
> > (2 rows)
> >
> > I checked between pg1 and pg2 and see that streaming replication is
> working.
> >
> > Then I used pcp_attach_node to attach standby
> >
> > postgres at ip-172-31-39-241:/etc/pgpool2/4.0.9$ pcp_attach_node -h
> localhost
> > -p 9898 -n 0
> > Password:
> > pcp_attach_node -- Command Successful
> > postgres at ip-172-31-39-241:/etc/pgpool2/4.0.9$ psql -U postgres -h
> localhost
> > -p 9999 --pset pager=off -c "show pool_nodes"
> >  node_id | hostname | port | status | lb_weight |  role   | select_cnt |
> > load_balance_node | replication_delay | last_status_change
> >
> ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
> >  0       | pg1      | 5432 | up     | 0.500000  | standby | 0          |
> > true              | 0                 | 2020-08-31 14:09:31
> >  1       | pg2      | 5432 | up     | 0.500000  | primary | 0          |
> > false             | 0                 | 2020-08-31 14:02:35
> > (2 rows)
> >
> > Then I tried to perform a test operation and it executed successfully.
> > postgres at ip-172-31-39-241:/etc/pgpool2/4.0.9$ psql -U postgres -h
> localhost
> > -p 9999 --pset pager=off -c "create database covid"
> > CREATE DATABASE
> >
> > Am I missing something in this failover process ? Please let me know if
> you
> > need any additional details.
> >
> > On Tue, Sep 1, 2020 at 3:03 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >
> >> Praveen,
> >>
> >> Have you read this message? I wonder if you have fixed the issue or not.
> >>
> >> Best regards,
> >> --
> >> Tatsuo Ishii
> >> SRA OSS, Inc. Japan
> >> English: http://www.sraoss.co.jp/index_en.php
> >> Japanese:http://www.sraoss.co.jp
> >>
> >> From: Tatsuo Ishii <ishii at sraoss.co.jp>
> >> Subject: [pgpool-general: 7208] Re: Query regarding failover and
> recovery
> >> Date: Thu, 20 Aug 2020 08:51:04 +0900 (JST)
> >> Message-ID: <20200820.085104.891242161358675858.t-ishii at sraoss.co.jp>
> >>
> >> > The contents of pcp.conf looks incorrect.
> >> >
> >> >> postgres:md53175bce1d3201d16594cebf9d7eb3f9d
> >> >
> >> > The hashed password must not start with "md5".
> >> >
> >> > To create proper pcp password, please follow the instruction in the
> >> manual:
> >> > https://www.pgpool.net/docs/40/en/html/configuring-pcp-conf.html
> >> >
> >> >> Hello,
> >> >>
> >> >> Thanks for the clarification. I'm trying to execute and getting below
> >> >> error. I'm attaching configs for your reference. Can you please help
> ?
> >> >>
> >> >> postgres at pgp1:/etc/pgpool2/4.0.9$ psql -U postgres -h localhost -p
> 9999
> >> >> --pset pager=off -c "show pool_nodes"
> >> >>  node_id | hostname | port | status | lb_weight |  role   |
> select_cnt |
> >> >> load_balance_node | replication_delay | last_status_change
> >> >>
> >>
> ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
> >> >>  0       | pg1      | 5432 | down   | 0.500000  | standby | 0
>   |
> >> >> false             | 0                 | 2020-08-19 09:02:46
> >> >>  1       | pg2      | 5432 | up     | 0.500000  | primary | 0
>   |
> >> >> true              | 0                 | 2020-08-19 09:02:46
> >> >> (2 rows)
> >> >>
> >> >> postgres at pgp1:/etc/pgpool2/4.0.9$
> >> >> postgres at pgp1:/etc/pgpool2/4.0.9$
> >> >> postgres at pgp1:/etc/pgpool2/4.0.9$
> >> >> postgres at pgp1:/etc/pgpool2/4.0.9$ pcp_recovery_node -h localhost -p
> >> 9898 -n
> >> >> 0
> >> >> Password:
> >> >> FATAL:  authentication failed for user "postgres"
> >> >> DETAIL:  username and/or password does not match
> >> >>
> >> >> postgres at pgp1:/etc/pgpool2/4.0.9$
> >> >>
> >> >>
> >> >> On Wed, Aug 19, 2020 at 10:22 AM Tatsuo Ishii <ishii at sraoss.co.jp>
> >> wrote:
> >> >>
> >> >>> > I have 3 servers with two postgres (9.6) and one pgpool (4.0.9).
> >> Postgres
> >> >>> > is configured with streaming replication.
> >> >>> > When I manually stop postgres service on primary node, failover
> has
> >> >>> > happened successfully.
> >> >>> > Now I started postgres service on old primary node which is
> expected
> >> to
> >> >>> be
> >> >>> > converted as slave, pgpool is not triggering
> >> recovery_1st_stage_command =
> >> >>> > 'recovery_1st_stage.sh'
> >> >>> > May I know what could be the reason ?
> >> >>>
> >> >>> That is an expected behavior. The node previously brought down is
> left
> >> >>> as "down" by pgoool. This is intentional. You need to issue
> >> >>> pcp_recovery_node against the node (previous primary node in your
> >> >>> case) to make it online again.
> >> >>>
> >> >>> When a node is brought down, there might be a reason: for example
> >> >>> needed to repair the hardware. So in general it's not safe to
> >> >>> automatically restart the previously down node.
> >> >>>
> >> >>> Best regards,
> >> >>> --
> >> >>> Tatsuo Ishii
> >> >>> SRA OSS, Inc. Japan
> >> >>> English: http://www.sraoss.co.jp/index_en.php
> >> >>> Japanese:http://www.sraoss.co.jp
> >> >>>
> >> >>
> >> >>
> >> >> --
> >> >>
> >> >>
> >> >> *Regards,*
> >> >>
> >> >>
> >> >> *K S Praveen KumarM: +91-9986855625 *
> >> > _______________________________________________
> >> > pgpool-general mailing list
> >> > pgpool-general at pgpool.net
> >> > http://www.pgpool.net/mailman/listinfo/pgpool-general
> >>
> >
> >
> > --
> >
> >
> > *Regards,*
> >
> >
> > *K S Praveen KumarM: +91-9986855625 *
>


-- 


*Regards,*


*K S Praveen KumarM: +91-9986855625 *
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20200901/8da793eb/attachment-0001.html>


More information about the pgpool-general mailing list