[pgpool-general: 7244] Re: Query regarding failover and recovery

Tue Sep 1 12:10:53 JST 2020

Hi Tatsuo,

Thanks for your email. Yes, I have read the message and fixed the pcp
authentication issue.

Now, I'm facing a different issue. I will try to explain it step by step.

Sequence:
1. pg1 is primary and pg2 is standby
2. Stop postgres service on pg1
3. Pgpool executes failover
4. Now pg2 is my primary. I verified all the configurations.
5. Started postgres service on pg1
6. Trying to recover failed node

postgres at ip-172-31-39-241:/etc/pgpool2/4.0.9$ pcp_recovery_node -h
localhost -p 9898 -n 0
Password:
ERROR:  process recovery request failed
DETAIL:  primary server cannot be recovered by online recovery.

Log says verify_backend_node_status: decided node 0 is the true primary

This command hangs
psql -U postgres -h localhost -p 9999 --pset pager=off -c "show pool_nodes"

I would like to know how pgpool is considering node 0 which is pg1 as true
primary ?

Then what did I do ?
1. Cleaned everything on pg1 and manually configured it as standby.
2. Started postgres service
3. Log says
2020-08-31 13:40:00: pid 2459: DEBUG:  do_query: extended:0 query:"SELECT
pg_is_in_recovery()"
2020-08-31 13:40:00: pid 2459: DEBUG:  verify_backend_node_status: there's
no standby node
2020-08-31 13:40:00: pid 2459: DEBUG:  node status[0]: 0
2020-08-31 13:40:00: pid 2459: DEBUG:  node status[1]: 1

This command executes now
postgres at ip-172-31-39-241:/etc/pgpool2/4.0.9$ psql -U postgres -h localhost
-p 9999 --pset pager=off -c "show pool_nodes"
 node_id | hostname | port | status | lb_weight |  role   | select_cnt |
load_balance_node | replication_delay | last_status_change
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
 0       | pg1      | 5432 | down   | 0.500000  | standby | 0          |
false             | 0                 | 2020-08-31 13:35:44
 1       | pg2      | 5432 | up     | 0.500000  | primary | 0          |
true              | 0                 | 2020-08-31 13:35:44
(2 rows)

I checked between pg1 and pg2 and see that streaming replication is working.

Then I used pcp_attach_node to attach standby

postgres at ip-172-31-39-241:/etc/pgpool2/4.0.9$ pcp_attach_node -h localhost
-p 9898 -n 0
Password:
pcp_attach_node -- Command Successful
postgres at ip-172-31-39-241:/etc/pgpool2/4.0.9$ psql -U postgres -h localhost
-p 9999 --pset pager=off -c "show pool_nodes"
 node_id | hostname | port | status | lb_weight |  role   | select_cnt |
load_balance_node | replication_delay | last_status_change
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
 0       | pg1      | 5432 | up     | 0.500000  | standby | 0          |
true              | 0                 | 2020-08-31 14:09:31
 1       | pg2      | 5432 | up     | 0.500000  | primary | 0          |
false             | 0                 | 2020-08-31 14:02:35
(2 rows)

Then I tried to perform a test operation and it executed successfully.
postgres at ip-172-31-39-241:/etc/pgpool2/4.0.9$ psql -U postgres -h localhost
-p 9999 --pset pager=off -c "create database covid"
CREATE DATABASE

Am I missing something in this failover process ? Please let me know if you
need any additional details.

On Tue, Sep 1, 2020 at 3:03 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> Praveen,
>
> Have you read this message? I wonder if you have fixed the issue or not.
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
> From: Tatsuo Ishii <ishii at sraoss.co.jp>
> Subject: [pgpool-general: 7208] Re: Query regarding failover and recovery
> Date: Thu, 20 Aug 2020 08:51:04 +0900 (JST)
> Message-ID: <20200820.085104.891242161358675858.t-ishii at sraoss.co.jp>
>
> > The contents of pcp.conf looks incorrect.
> >
> >> postgres:md53175bce1d3201d16594cebf9d7eb3f9d
> >
> > The hashed password must not start with "md5".
> >
> > To create proper pcp password, please follow the instruction in the
> manual:
> > https://www.pgpool.net/docs/40/en/html/configuring-pcp-conf.html
> >
> >> Hello,
> >>
> >> Thanks for the clarification. I'm trying to execute and getting below
> >> error. I'm attaching configs for your reference. Can you please help ?
> >>
> >> postgres at pgp1:/etc/pgpool2/4.0.9$ psql -U postgres -h localhost -p 9999
> >> --pset pager=off -c "show pool_nodes"
> >>  node_id | hostname | port | status | lb_weight |  role   | select_cnt |
> >> load_balance_node | replication_delay | last_status_change
> >>
> ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
> >>  0       | pg1      | 5432 | down   | 0.500000  | standby | 0          |
> >> false             | 0                 | 2020-08-19 09:02:46
> >>  1       | pg2      | 5432 | up     | 0.500000  | primary | 0          |
> >> true              | 0                 | 2020-08-19 09:02:46
> >> (2 rows)
> >>
> >> postgres at pgp1:/etc/pgpool2/4.0.9$
> >> postgres at pgp1:/etc/pgpool2/4.0.9$
> >> postgres at pgp1:/etc/pgpool2/4.0.9$
> >> postgres at pgp1:/etc/pgpool2/4.0.9$ pcp_recovery_node -h localhost -p
> 9898 -n
> >> 0
> >> Password:
> >> FATAL:  authentication failed for user "postgres"
> >> DETAIL:  username and/or password does not match
> >>
> >> postgres at pgp1:/etc/pgpool2/4.0.9$
> >>
> >>
> >> On Wed, Aug 19, 2020 at 10:22 AM Tatsuo Ishii <ishii at sraoss.co.jp>
> wrote:
> >>
> >>> > I have 3 servers with two postgres (9.6) and one pgpool (4.0.9).
> Postgres
> >>> > is configured with streaming replication.
> >>> > When I manually stop postgres service on primary node, failover has
> >>> > happened successfully.
> >>> > Now I started postgres service on old primary node which is expected
> to
> >>> be
> >>> > converted as slave, pgpool is not triggering
> recovery_1st_stage_command =
> >>> > 'recovery_1st_stage.sh'
> >>> > May I know what could be the reason ?
> >>>
> >>> That is an expected behavior. The node previously brought down is left
> >>> as "down" by pgoool. This is intentional. You need to issue
> >>> pcp_recovery_node against the node (previous primary node in your
> >>> case) to make it online again.
> >>>
> >>> When a node is brought down, there might be a reason: for example
> >>> needed to repair the hardware. So in general it's not safe to
> >>> automatically restart the previously down node.
> >>>
> >>> Best regards,
> >>> --
> >>> Tatsuo Ishii
> >>> SRA OSS, Inc. Japan
> >>> English: http://www.sraoss.co.jp/index_en.php
> >>> Japanese:http://www.sraoss.co.jp
> >>>
> >>
> >>
> >> --
> >>
> >>
> >> *Regards,*
> >>
> >>
> >> *K S Praveen KumarM: +91-9986855625 *
> > _______________________________________________
> > pgpool-general mailing list
> > pgpool-general at pgpool.net
> > http://www.pgpool.net/mailman/listinfo/pgpool-general
>

-- 

*Regards,*

*K S Praveen KumarM: +91-9986855625 *
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20200901/375ce2ee/attachment.htm>