[pgpool-general: 7244] Re: Query regarding failover and recovery
Praveen Kumar K S
praveenssit at gmail.com
Tue Sep 1 12:10:53 JST 2020
Hi Tatsuo,
Thanks for your email. Yes, I have read the message and fixed the pcp
authentication issue.
Now, I'm facing a different issue. I will try to explain it step by step.
Sequence:
1. pg1 is primary and pg2 is standby
2. Stop postgres service on pg1
3. Pgpool executes failover
4. Now pg2 is my primary. I verified all the configurations.
5. Started postgres service on pg1
6. Trying to recover failed node
postgres at ip-172-31-39-241:/etc/pgpool2/4.0.9$ pcp_recovery_node -h
localhost -p 9898 -n 0
Password:
ERROR: process recovery request failed
DETAIL: primary server cannot be recovered by online recovery.
Log says verify_backend_node_status: decided node 0 is the true primary
This command hangs
psql -U postgres -h localhost -p 9999 --pset pager=off -c "show pool_nodes"
I would like to know how pgpool is considering node 0 which is pg1 as true
primary ?
Then what did I do ?
1. Cleaned everything on pg1 and manually configured it as standby.
2. Started postgres service
3. Log says
2020-08-31 13:40:00: pid 2459: DEBUG: do_query: extended:0 query:"SELECT
pg_is_in_recovery()"
2020-08-31 13:40:00: pid 2459: DEBUG: verify_backend_node_status: there's
no standby node
2020-08-31 13:40:00: pid 2459: DEBUG: node status[0]: 0
2020-08-31 13:40:00: pid 2459: DEBUG: node status[1]: 1
This command executes now
postgres at ip-172-31-39-241:/etc/pgpool2/4.0.9$ psql -U postgres -h localhost
-p 9999 --pset pager=off -c "show pool_nodes"
node_id | hostname | port | status | lb_weight | role | select_cnt |
load_balance_node | replication_delay | last_status_change
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
0 | pg1 | 5432 | down | 0.500000 | standby | 0 |
false | 0 | 2020-08-31 13:35:44
1 | pg2 | 5432 | up | 0.500000 | primary | 0 |
true | 0 | 2020-08-31 13:35:44
(2 rows)
I checked between pg1 and pg2 and see that streaming replication is working.
Then I used pcp_attach_node to attach standby
postgres at ip-172-31-39-241:/etc/pgpool2/4.0.9$ pcp_attach_node -h localhost
-p 9898 -n 0
Password:
pcp_attach_node -- Command Successful
postgres at ip-172-31-39-241:/etc/pgpool2/4.0.9$ psql -U postgres -h localhost
-p 9999 --pset pager=off -c "show pool_nodes"
node_id | hostname | port | status | lb_weight | role | select_cnt |
load_balance_node | replication_delay | last_status_change
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
0 | pg1 | 5432 | up | 0.500000 | standby | 0 |
true | 0 | 2020-08-31 14:09:31
1 | pg2 | 5432 | up | 0.500000 | primary | 0 |
false | 0 | 2020-08-31 14:02:35
(2 rows)
Then I tried to perform a test operation and it executed successfully.
postgres at ip-172-31-39-241:/etc/pgpool2/4.0.9$ psql -U postgres -h localhost
-p 9999 --pset pager=off -c "create database covid"
CREATE DATABASE
Am I missing something in this failover process ? Please let me know if you
need any additional details.
On Tue, Sep 1, 2020 at 3:03 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> Praveen,
>
> Have you read this message? I wonder if you have fixed the issue or not.
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
> From: Tatsuo Ishii <ishii at sraoss.co.jp>
> Subject: [pgpool-general: 7208] Re: Query regarding failover and recovery
> Date: Thu, 20 Aug 2020 08:51:04 +0900 (JST)
> Message-ID: <20200820.085104.891242161358675858.t-ishii at sraoss.co.jp>
>
> > The contents of pcp.conf looks incorrect.
> >
> >> postgres:md53175bce1d3201d16594cebf9d7eb3f9d
> >
> > The hashed password must not start with "md5".
> >
> > To create proper pcp password, please follow the instruction in the
> manual:
> > https://www.pgpool.net/docs/40/en/html/configuring-pcp-conf.html
> >
> >> Hello,
> >>
> >> Thanks for the clarification. I'm trying to execute and getting below
> >> error. I'm attaching configs for your reference. Can you please help ?
> >>
> >> postgres at pgp1:/etc/pgpool2/4.0.9$ psql -U postgres -h localhost -p 9999
> >> --pset pager=off -c "show pool_nodes"
> >> node_id | hostname | port | status | lb_weight | role | select_cnt |
> >> load_balance_node | replication_delay | last_status_change
> >>
> ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
> >> 0 | pg1 | 5432 | down | 0.500000 | standby | 0 |
> >> false | 0 | 2020-08-19 09:02:46
> >> 1 | pg2 | 5432 | up | 0.500000 | primary | 0 |
> >> true | 0 | 2020-08-19 09:02:46
> >> (2 rows)
> >>
> >> postgres at pgp1:/etc/pgpool2/4.0.9$
> >> postgres at pgp1:/etc/pgpool2/4.0.9$
> >> postgres at pgp1:/etc/pgpool2/4.0.9$
> >> postgres at pgp1:/etc/pgpool2/4.0.9$ pcp_recovery_node -h localhost -p
> 9898 -n
> >> 0
> >> Password:
> >> FATAL: authentication failed for user "postgres"
> >> DETAIL: username and/or password does not match
> >>
> >> postgres at pgp1:/etc/pgpool2/4.0.9$
> >>
> >>
> >> On Wed, Aug 19, 2020 at 10:22 AM Tatsuo Ishii <ishii at sraoss.co.jp>
> wrote:
> >>
> >>> > I have 3 servers with two postgres (9.6) and one pgpool (4.0.9).
> Postgres
> >>> > is configured with streaming replication.
> >>> > When I manually stop postgres service on primary node, failover has
> >>> > happened successfully.
> >>> > Now I started postgres service on old primary node which is expected
> to
> >>> be
> >>> > converted as slave, pgpool is not triggering
> recovery_1st_stage_command =
> >>> > 'recovery_1st_stage.sh'
> >>> > May I know what could be the reason ?
> >>>
> >>> That is an expected behavior. The node previously brought down is left
> >>> as "down" by pgoool. This is intentional. You need to issue
> >>> pcp_recovery_node against the node (previous primary node in your
> >>> case) to make it online again.
> >>>
> >>> When a node is brought down, there might be a reason: for example
> >>> needed to repair the hardware. So in general it's not safe to
> >>> automatically restart the previously down node.
> >>>
> >>> Best regards,
> >>> --
> >>> Tatsuo Ishii
> >>> SRA OSS, Inc. Japan
> >>> English: http://www.sraoss.co.jp/index_en.php
> >>> Japanese:http://www.sraoss.co.jp
> >>>
> >>
> >>
> >> --
> >>
> >>
> >> *Regards,*
> >>
> >>
> >> *K S Praveen KumarM: +91-9986855625 *
> > _______________________________________________
> > pgpool-general mailing list
> > pgpool-general at pgpool.net
> > http://www.pgpool.net/mailman/listinfo/pgpool-general
>
--
*Regards,*
*K S Praveen KumarM: +91-9986855625 *
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20200901/375ce2ee/attachment.htm>
More information about the pgpool-general
mailing list