[pgpool-general: 7257] Re: Query regarding failover and recovery

Tatsuo Ishii ishii at sraoss.co.jp
Wed Sep 2 10:47:17 JST 2020


Hi Praveen,

> Hi Tatsuo,
> 
> I have tried it and now it is not giving message "node 0 is the true
> primary"
> 
> Please excuse me if I'm asking out of pgpool questions. Let us say we have
> 3 postgres servers and let us call it pg1,pg2,pg3.
> 
> Sequence:
> 1. pg1 is master and pg2/pg3 are hot standbys
> 2. pg1 has failed and pg2 is promoted as master
> 3. Now pg3 has to be reconfigured to stream from pg2 and not from pg1.
> 4. Does pgpool have an option to inform pg3 about this failover event ? If
> yes, please send me the docs link.

Yes. Search for "follow_master_command".
https://www.pgpool.net/docs/latest/en/html/runtime-config-failover.html#GUC-FAILOVER-COMMAND

Concrete example script can found here:
https://git.postgresql.org/gitweb/?p=pgpool2.git;a=blob_plain;f=src/sample/scripts/follow_master.sh.sample;hb=refs/heads/V4_1_STABLE

> Also, I'm using custom shell scripts to perform failover and recovery
> operations. Do you recommend any other tools for the same ?

You may want to look into this. There are some production ready scripts.
https://www.pgpool.net/docs/latest/en/html/example-cluster.html

> On Tue, Sep 1, 2020 at 12:17 PM Praveen Kumar K S <praveenssit at gmail.com>
> wrote:
> 
>> Hi Tatsuo,
>>
>> Thanks for the detailed explanation. I will try out your recommendations
>> and see the behavior.
>>
>> On Tue, Sep 1, 2020 at 12:08 PM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>>
>>> > Hi Tatsuo,
>>> >
>>> > Thanks for the detailed explanation. But I'm not expecting this. It is
>>> not
>>> > in my hands if pg1 reboots for some reason. I was assuming that pg2
>>> should
>>> > continue to be master and pgpool should send the requests to pg2 until
>>> > another failover occurs.
>>>
>>> Of course. Just rebooting pg1 does not affect to pgpool. Pgpool-II
>>> will keep on assuming pg1 as down.
>>>
>>> > And at any point of time, we should not have 2
>>> > masters. How can we avoid this situation? If we have 3 postgres nodes,
>>> will
>>> > it solve the problem?
>>>
>>> But someone could make a mistake anytime: for example, hit pg_ctl
>>> promote command. What can pgpool do for such a brain dead
>>> administrator? To avoid the confusion by such that mistake, I would
>>> recommend:
>>>
>>> 1) Do not use -D option of pgpool. Without the option, pgpool remeber
>>> what was the status of each backend. Without -D option,
>>>
>>> >> pg1: down, standby
>>> >> pg2: up, primary
>>> >>
>>> >> So far, so good. BUT if you restart whole system, you will find that:
>>> >>
>>> >> pg1: up, primary
>>> >> pg2: up, standby
>>>
>>> will become:
>>>
>>> pg1: down, standby
>>> pg2: up, primary
>>>
>>> So far, so good. BUT if you restart whole system, you will find that:
>>>
>>> pg1: down, standby
>>> pg2: up, primary
>>>
>>> 2) Enable detach_false_primary.
>>>
>>> > On Tue, Sep 1, 2020 at 11:36 AM Tatsuo Ishii <ishii at sraoss.co.jp>
>>> wrote:
>>> >
>>> >> > Hi Tatsuo,
>>> >> >
>>> >> > Thanks for letting me know the process. Just curious to know what
>>> will
>>> >> > happen if pg1 reboots and comes back and in the meanwhile pg2 got
>>> >> promoted
>>> >> > as master. What is the expectation in this situation ?
>>> >>
>>> >> When pg1 shudown, failover process will mark pg1 as down, and pg2 gets
>>> >> promoted to primary. So you have:
>>> >>
>>> >> pg1: down, standby
>>> >> pg2: up, primary
>>> >>
>>> >> So far, so good. BUT if you restart whole system, you will find that:
>>> >>
>>> >> pg1: up, primary
>>> >> pg2: up, standby
>>> >>
>>> >> Becuase both pg1 and pg2 are primary and there's no other info to
>>> >> decide which one is the true primary. In this case pgpool decides that
>>> >> the first node, which is not standby, to be primary.
>>> >>
>>> >> Actually p2 is not standby, and you will get into trouble because pg2
>>> >> is not a standby synched with pg1. (you will see huge
>>> >> relplication_delay).
>>> >>
>>> >> If you turn on detach_false_primary, pgpool will find that pg2 is not
>>> >> a standby and make it down.
>>> >>
>>> >> > On Tue, Sep 1, 2020 at 10:32 AM Tatsuo Ishii <ishii at sraoss.co.jp>
>>> wrote:
>>> >> >
>>> >> >> > Hi Tatsuo,
>>> >> >> >
>>> >> >> > Thanks for your email. Yes, I have read the message and fixed the
>>> pcp
>>> >> >> > authentication issue.
>>> >> >> >
>>> >> >> > Now, I'm facing a different issue. I will try to explain it step
>>> by
>>> >> step.
>>> >> >> >
>>> >> >> > Sequence:
>>> >> >> > 1. pg1 is primary and pg2 is standby
>>> >> >> > 2. Stop postgres service on pg1
>>> >> >> > 3. Pgpool executes failover
>>> >> >> > 4. Now pg2 is my primary. I verified all the configurations.
>>> >> >> > 5. Started postgres service on pg1
>>> >> >> > 6. Trying to recover failed node
>>> >> >> >
>>> >> >> > postgres at ip-172-31-39-241:/etc/pgpool2/4.0.9$ pcp_recovery_node
>>> -h
>>> >> >> > localhost -p 9898 -n 0
>>> >> >> > Password:
>>> >> >> > ERROR:  process recovery request failed
>>> >> >> > DETAIL:  primary server cannot be recovered by online recovery.
>>> >> >>
>>> >> >> You shoud have not started pg1 postgres at this
>>> >> >> point. pcp_recovery_node should not be executed on running node. See
>>> >> >> the manual:
>>> >> >>
>>> >> >> Note: The recovery target PostgreSQL server must not be running for
>>> >> >> performing the online recovery. If the target PostgreSQL server has
>>> >> >> already started, you must shut it down before starting the online
>>> >> >> recovery.
>>> >> >>
>>> >> >> If you have not started pg1 then pcp_recovert_node would succeed.
>>> >> >>
>>> >> >> > Log says verify_backend_node_status: decided node 0 is the true
>>> >> primary
>>> >> >> >
>>> >> >> > This command hangs
>>> >> >> > psql -U postgres -h localhost -p 9999 --pset pager=off -c "show
>>> >> >> pool_nodes"
>>> >> >> >
>>> >> >> > I would like to know how pgpool is considering node 0 which is
>>> pg1 as
>>> >> >> true
>>> >> >> > primary ?
>>> >> >>
>>> >> >> Because pgpool thinks the first primary node found is the true
>>> primary
>>> >> >> node if there is
>>> >> >>
>>> >> >> > Then what did I do ?
>>> >> >> > 1. Cleaned everything on pg1 and manually configured it as
>>> standby.
>>> >> >> > 2. Started postgres service
>>> >> >> > 3. Log says
>>> >> >> > 2020-08-31 13:40:00: pid 2459: DEBUG:  do_query: extended:0
>>> >> query:"SELECT
>>> >> >> > pg_is_in_recovery()"
>>> >> >> > 2020-08-31 13:40:00: pid 2459: DEBUG:  verify_backend_node_status:
>>> >> >> there's
>>> >> >> > no standby node
>>> >> >> > 2020-08-31 13:40:00: pid 2459: DEBUG:  node status[0]: 0
>>> >> >> > 2020-08-31 13:40:00: pid 2459: DEBUG:  node status[1]: 1
>>> >> >> >
>>> >> >> > This command executes now
>>> >> >> > postgres at ip-172-31-39-241:/etc/pgpool2/4.0.9$ psql -U postgres -h
>>> >> >> localhost
>>> >> >> > -p 9999 --pset pager=off -c "show pool_nodes"
>>> >> >> >  node_id | hostname | port | status | lb_weight |  role   |
>>> >> select_cnt |
>>> >> >> > load_balance_node | replication_delay | last_status_change
>>> >> >> >
>>> >> >>
>>> >>
>>> ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
>>> >> >> >  0       | pg1      | 5432 | down   | 0.500000  | standby | 0
>>> >>   |
>>> >> >> > false             | 0                 | 2020-08-31 13:35:44
>>> >> >> >  1       | pg2      | 5432 | up     | 0.500000  | primary | 0
>>> >>   |
>>> >> >> > true              | 0                 | 2020-08-31 13:35:44
>>> >> >> > (2 rows)
>>> >> >> >
>>> >> >> > I checked between pg1 and pg2 and see that streaming replication
>>> is
>>> >> >> working.
>>> >> >> >
>>> >> >> > Then I used pcp_attach_node to attach standby
>>> >> >> >
>>> >> >> > postgres at ip-172-31-39-241:/etc/pgpool2/4.0.9$ pcp_attach_node -h
>>> >> >> localhost
>>> >> >> > -p 9898 -n 0
>>> >> >> > Password:
>>> >> >> > pcp_attach_node -- Command Successful
>>> >> >> > postgres at ip-172-31-39-241:/etc/pgpool2/4.0.9$ psql -U postgres -h
>>> >> >> localhost
>>> >> >> > -p 9999 --pset pager=off -c "show pool_nodes"
>>> >> >> >  node_id | hostname | port | status | lb_weight |  role   |
>>> >> select_cnt |
>>> >> >> > load_balance_node | replication_delay | last_status_change
>>> >> >> >
>>> >> >>
>>> >>
>>> ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
>>> >> >> >  0       | pg1      | 5432 | up     | 0.500000  | standby | 0
>>> >>   |
>>> >> >> > true              | 0                 | 2020-08-31 14:09:31
>>> >> >> >  1       | pg2      | 5432 | up     | 0.500000  | primary | 0
>>> >>   |
>>> >> >> > false             | 0                 | 2020-08-31 14:02:35
>>> >> >> > (2 rows)
>>> >> >> >
>>> >> >> > Then I tried to perform a test operation and it executed
>>> successfully.
>>> >> >> > postgres at ip-172-31-39-241:/etc/pgpool2/4.0.9$ psql -U postgres -h
>>> >> >> localhost
>>> >> >> > -p 9999 --pset pager=off -c "create database covid"
>>> >> >> > CREATE DATABASE
>>> >> >> >
>>> >> >> > Am I missing something in this failover process ? Please let me
>>> know
>>> >> if
>>> >> >> you
>>> >> >> > need any additional details.
>>> >> >> >
>>> >> >> > On Tue, Sep 1, 2020 at 3:03 AM Tatsuo Ishii <ishii at sraoss.co.jp>
>>> >> wrote:
>>> >> >> >
>>> >> >> >> Praveen,
>>> >> >> >>
>>> >> >> >> Have you read this message? I wonder if you have fixed the issue
>>> or
>>> >> not.
>>> >> >> >>
>>> >> >> >> Best regards,
>>> >> >> >> --
>>> >> >> >> Tatsuo Ishii
>>> >> >> >> SRA OSS, Inc. Japan
>>> >> >> >> English: http://www.sraoss.co.jp/index_en.php
>>> >> >> >> Japanese:http://www.sraoss.co.jp
>>> >> >> >>
>>> >> >> >> From: Tatsuo Ishii <ishii at sraoss.co.jp>
>>> >> >> >> Subject: [pgpool-general: 7208] Re: Query regarding failover and
>>> >> >> recovery
>>> >> >> >> Date: Thu, 20 Aug 2020 08:51:04 +0900 (JST)
>>> >> >> >> Message-ID: <
>>> 20200820.085104.891242161358675858.t-ishii at sraoss.co.jp
>>> >> >
>>> >> >> >>
>>> >> >> >> > The contents of pcp.conf looks incorrect.
>>> >> >> >> >
>>> >> >> >> >> postgres:md53175bce1d3201d16594cebf9d7eb3f9d
>>> >> >> >> >
>>> >> >> >> > The hashed password must not start with "md5".
>>> >> >> >> >
>>> >> >> >> > To create proper pcp password, please follow the instruction
>>> in the
>>> >> >> >> manual:
>>> >> >> >> >
>>> https://www.pgpool.net/docs/40/en/html/configuring-pcp-conf.html
>>> >> >> >> >
>>> >> >> >> >> Hello,
>>> >> >> >> >>
>>> >> >> >> >> Thanks for the clarification. I'm trying to execute and
>>> getting
>>> >> below
>>> >> >> >> >> error. I'm attaching configs for your reference. Can you
>>> please
>>> >> help
>>> >> >> ?
>>> >> >> >> >>
>>> >> >> >> >> postgres at pgp1:/etc/pgpool2/4.0.9$ psql -U postgres -h
>>> localhost
>>> >> -p
>>> >> >> 9999
>>> >> >> >> >> --pset pager=off -c "show pool_nodes"
>>> >> >> >> >>  node_id | hostname | port | status | lb_weight |  role   |
>>> >> >> select_cnt |
>>> >> >> >> >> load_balance_node | replication_delay | last_status_change
>>> >> >> >> >>
>>> >> >> >>
>>> >> >>
>>> >>
>>> ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
>>> >> >> >> >>  0       | pg1      | 5432 | down   | 0.500000  | standby | 0
>>> >> >>   |
>>> >> >> >> >> false             | 0                 | 2020-08-19 09:02:46
>>> >> >> >> >>  1       | pg2      | 5432 | up     | 0.500000  | primary | 0
>>> >> >>   |
>>> >> >> >> >> true              | 0                 | 2020-08-19 09:02:46
>>> >> >> >> >> (2 rows)
>>> >> >> >> >>
>>> >> >> >> >> postgres at pgp1:/etc/pgpool2/4.0.9$
>>> >> >> >> >> postgres at pgp1:/etc/pgpool2/4.0.9$
>>> >> >> >> >> postgres at pgp1:/etc/pgpool2/4.0.9$
>>> >> >> >> >> postgres at pgp1:/etc/pgpool2/4.0.9$ pcp_recovery_node -h
>>> localhost
>>> >> -p
>>> >> >> >> 9898 -n
>>> >> >> >> >> 0
>>> >> >> >> >> Password:
>>> >> >> >> >> FATAL:  authentication failed for user "postgres"
>>> >> >> >> >> DETAIL:  username and/or password does not match
>>> >> >> >> >>
>>> >> >> >> >> postgres at pgp1:/etc/pgpool2/4.0.9$
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >> On Wed, Aug 19, 2020 at 10:22 AM Tatsuo Ishii <
>>> ishii at sraoss.co.jp
>>> >> >
>>> >> >> >> wrote:
>>> >> >> >> >>
>>> >> >> >> >>> > I have 3 servers with two postgres (9.6) and one pgpool
>>> >> (4.0.9).
>>> >> >> >> Postgres
>>> >> >> >> >>> > is configured with streaming replication.
>>> >> >> >> >>> > When I manually stop postgres service on primary node,
>>> failover
>>> >> >> has
>>> >> >> >> >>> > happened successfully.
>>> >> >> >> >>> > Now I started postgres service on old primary node which is
>>> >> >> expected
>>> >> >> >> to
>>> >> >> >> >>> be
>>> >> >> >> >>> > converted as slave, pgpool is not triggering
>>> >> >> >> recovery_1st_stage_command =
>>> >> >> >> >>> > 'recovery_1st_stage.sh'
>>> >> >> >> >>> > May I know what could be the reason ?
>>> >> >> >> >>>
>>> >> >> >> >>> That is an expected behavior. The node previously brought
>>> down is
>>> >> >> left
>>> >> >> >> >>> as "down" by pgoool. This is intentional. You need to issue
>>> >> >> >> >>> pcp_recovery_node against the node (previous primary node in
>>> your
>>> >> >> >> >>> case) to make it online again.
>>> >> >> >> >>>
>>> >> >> >> >>> When a node is brought down, there might be a reason: for
>>> example
>>> >> >> >> >>> needed to repair the hardware. So in general it's not safe to
>>> >> >> >> >>> automatically restart the previously down node.
>>> >> >> >> >>>
>>> >> >> >> >>> Best regards,
>>> >> >> >> >>> --
>>> >> >> >> >>> Tatsuo Ishii
>>> >> >> >> >>> SRA OSS, Inc. Japan
>>> >> >> >> >>> English: http://www.sraoss.co.jp/index_en.php
>>> >> >> >> >>> Japanese:http://www.sraoss.co.jp
>>> >> >> >> >>>
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >> --
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >> *Regards,*
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >> *K S Praveen KumarM: +91-9986855625 *
>>> >> >> >> > _______________________________________________
>>> >> >> >> > pgpool-general mailing list
>>> >> >> >> > pgpool-general at pgpool.net
>>> >> >> >> > http://www.pgpool.net/mailman/listinfo/pgpool-general
>>> >> >> >>
>>> >> >> >
>>> >> >> >
>>> >> >> > --
>>> >> >> >
>>> >> >> >
>>> >> >> > *Regards,*
>>> >> >> >
>>> >> >> >
>>> >> >> > *K S Praveen KumarM: +91-9986855625 *
>>> >> >>
>>> >> >
>>> >> >
>>> >> > --
>>> >> >
>>> >> >
>>> >> > *Regards,*
>>> >> >
>>> >> >
>>> >> > *K S Praveen KumarM: +91-9986855625 *
>>> >> _______________________________________________
>>> >> pgpool-general mailing list
>>> >> pgpool-general at pgpool.net
>>> >> http://www.pgpool.net/mailman/listinfo/pgpool-general
>>> >>
>>> >
>>> >
>>> > --
>>> >
>>> >
>>> > *Regards,*
>>> >
>>> >
>>> > *K S Praveen KumarM: +91-9986855625 *
>>>
>>
>>
>> --
>>
>>
>> *Regards,*
>>
>>
>> *K S Praveen KumarM: +91-9986855625 *
>>
> 
> 
> -- 
> 
> 
> *Regards,*
> 
> 
> *K S Praveen KumarM: +91-9986855625 *


More information about the pgpool-general mailing list