<div dir="ltr">Hi Tatsuo,<div><br></div><div>I have tried it, and it no longer gives the message "node 0 is the true primary".</div><div><br></div><div>Please excuse me if my questions go beyond pgpool. Let us say we have 3 postgres servers, and let us call them pg1, pg2 and pg3.</div><div><br></div><div>Sequence:</div><div>1. pg1 is master and pg2/pg3 are hot standbys</div><div>2. pg1 has failed and pg2 is promoted as master</div><div>3. Now pg3 has to be reconfigured to stream from pg2 and not from pg1.</div><div>4. Does pgpool have an option to inform pg3 about this failover event? If yes, please send me the docs link.</div><div><br></div><div>Also, I'm using custom shell scripts to perform failover and recovery operations. Do you recommend any other tools for this?</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Sep 1, 2020 at 12:17 PM Praveen Kumar K S <<a href="mailto:praveenssit@gmail.com">praveenssit@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi Tatsuo,<div><br></div><div>Thanks for the detailed explanation. I will try out your recommendations and see the behavior. </div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Sep 1, 2020 at 12:08 PM Tatsuo Ishii <<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">> Hi Tatsuo,<br>

> <br>

> Thanks for the detailed explanation. But I'm not expecting this. It is not<br>

> in my hands if pg1 reboots for some reason. I was assuming that pg2 should<br>

> continue to be master and pgpool should send the requests to pg2 until<br>

> another failover occurs.<br>

<br>

Of course. Just rebooting pg1 does not affect pgpool. Pgpool-II<br>

will keep treating pg1 as down.<br>

<br>

> And at any point of time, we should not have 2<br>

> masters. How can we avoid this situation? If we have 3 postgres nodes, will<br>

> it solve the problem?<br>

<br>

But someone could make a mistake at any time: for example, by running the pg_ctl<br>

promote command. What can pgpool do for such a brain-dead<br>

administrator? To avoid the confusion caused by such a mistake, I would<br>

recommend:<br>

<br>

1) Do not use the -D option of pgpool. Without the option, pgpool remembers<br>

the status of each backend. Without the -D option,<br>

<br>

>> pg1: down, standby<br>

>> pg2: up, primary<br>

>><br>

>> So far, so good. BUT if you restart whole system, you will find that:<br>

>><br>

>> pg1: up, primary<br>

>> pg2: up, standby<br>

<br>

will become:<br>

<br>

pg1: down, standby<br>

pg2: up, primary<br>

<br>

So far, so good. BUT if you restart whole system, you will find that:<br>

<br>

pg1: down, standby<br>

pg2: up, primary<br>

<br>

2) Enable detach_false_primary.<br>
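
<br>

A minimal sketch of recommendation 2, assuming a pgpool.conf that is already set up for streaming replication mode. Only the parameter name comes from this thread; the value and the comment are illustrative:<br>

<br>

detach_false_primary = on    # default is off<br>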

<br>

> On Tue, Sep 1, 2020 at 11:36 AM Tatsuo Ishii <<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a>> wrote:<br>

> <br>

>> > Hi Tatsuo,<br>

>> ><br>

>> > Thanks for letting me know the process. Just curious to know what will<br>

>> > happen if pg1 reboots and comes back and in the meanwhile pg2 got promoted<br>

>> > as master. What is the expectation in this situation?<br>

>><br>

>> When pg1 shuts down, the failover process will mark pg1 as down, and pg2 gets<br>

>> promoted to primary. So you have:<br>

>><br>

>> pg1: down, standby<br>

>> pg2: up, primary<br>

>><br>

>> So far, so good. BUT if you restart whole system, you will find that:<br>

>><br>

>> pg1: up, primary<br>

>> pg2: up, standby<br>

>><br>

>> Because both pg1 and pg2 are primary and there's no other info to<br>

>> decide which one is the true primary. In this case pgpool decides that<br>

>> the first node that is not a standby is the primary.<br>

>><br>

>> Actually pg2 is not a standby, and you will get into trouble because pg2<br>

>> is not a standby synced with pg1 (you will see a huge<br>

>> replication_delay).<br>

>><br>

>> If you turn on detach_false_primary, pgpool will find that pg2 is not<br>

>> a standby and make it down.<br>

>><br>

>> > On Tue, Sep 1, 2020 at 10:32 AM Tatsuo Ishii <<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a>> wrote:<br>

>> ><br>

>> >> > Hi Tatsuo,<br>

>> >> ><br>

>> >> > Thanks for your email. Yes, I have read the message and fixed the pcp<br>

>> >> > authentication issue.<br>

>> >> ><br>

>> >> > Now, I'm facing a different issue. I will try to explain it step by step.<br>

>> >> ><br>

>> >> > Sequence:<br>

>> >> > 1. pg1 is primary and pg2 is standby<br>

>> >> > 2. Stop postgres service on pg1<br>

>> >> > 3. Pgpool executes failover<br>

>> >> > 4. Now pg2 is my primary. I verified all the configurations.<br>

>> >> > 5. Started postgres service on pg1<br>

>> >> > 6. Trying to recover failed node<br>

>> >> ><br>

>> >> > postgres@ip-172-31-39-241:/etc/pgpool2/4.0.9$ pcp_recovery_node -h localhost -p 9898 -n 0<br>

>> >> > Password:<br>

>> >> > ERROR:  process recovery request failed<br>

>> >> > DETAIL:  primary server cannot be recovered by online recovery.<br>

>> >><br>

>> >> You should not have started postgres on pg1 at this<br>

>> >> point. pcp_recovery_node should not be executed on a running node. See<br>

>> >> the manual:<br>

>> >><br>

>> >> Note: The recovery target PostgreSQL server must not be running for<br>

>> >> performing the online recovery. If the target PostgreSQL server has<br>

>> >> already started, you must shut it down before starting the online<br>

>> >> recovery.<br>

>> >><br>

>> >> If you had not started pg1, pcp_recovery_node would have succeeded.<br>

>> >><br>

>> >> > Log says verify_backend_node_status: decided node 0 is the true primary<br>

>> >> ><br>

>> >> > This command hangs<br>

>> >> > psql -U postgres -h localhost -p 9999 --pset pager=off -c "show pool_nodes"<br>

>> >> ><br>

>> >> > I would like to know how pgpool is considering node 0 which is pg1 as true<br>

>> >> > primary?<br>

>> >><br>

>> >> Because pgpool thinks the first primary node found is the true primary<br>

>> >> node if there is<br>

>> >><br>

>> >> > Then what did I do ?<br>

>> >> > 1. Cleaned everything on pg1 and manually configured it as standby.<br>

>> >> > 2. Started postgres service<br>

>> >> > 3. Log says<br>

>> >> > 2020-08-31 13:40:00: pid 2459: DEBUG:  do_query: extended:0 query:"SELECT pg_is_in_recovery()"<br>

>> >> > 2020-08-31 13:40:00: pid 2459: DEBUG:  verify_backend_node_status: there's no standby node<br>

>> >> > 2020-08-31 13:40:00: pid 2459: DEBUG:  node status[0]: 0<br>

>> >> > 2020-08-31 13:40:00: pid 2459: DEBUG:  node status[1]: 1<br>

>> >> ><br>

>> >> > This command executes now<br>

>> >> > postgres@ip-172-31-39-241:/etc/pgpool2/4.0.9$ psql -U postgres -h localhost -p 9999 --pset pager=off -c "show pool_nodes"<br>

>> >> >  node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | last_status_change<br>

>> >> > ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------<br>

>> >> >  0       | pg1      | 5432 | down   | 0.500000  | standby | 0          | false             | 0                 | 2020-08-31 13:35:44<br>

>> >> >  1       | pg2      | 5432 | up     | 0.500000  | primary | 0          | true              | 0                 | 2020-08-31 13:35:44<br>

>> >> > (2 rows)<br>

>> >> ><br>

>> >> > I checked between pg1 and pg2 and see that streaming replication is working.<br>

>> >> ><br>

>> >> > Then I used pcp_attach_node to attach standby<br>

>> >> ><br>

>> >> > postgres@ip-172-31-39-241:/etc/pgpool2/4.0.9$ pcp_attach_node -h localhost -p 9898 -n 0<br>

>> >> > Password:<br>

>> >> > pcp_attach_node -- Command Successful<br>

>> >> > postgres@ip-172-31-39-241:/etc/pgpool2/4.0.9$ psql -U postgres -h localhost -p 9999 --pset pager=off -c "show pool_nodes"<br>

>> >> >  node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | last_status_change<br>

>> >> > ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------<br>

>> >> >  0       | pg1      | 5432 | up     | 0.500000  | standby | 0          | true              | 0                 | 2020-08-31 14:09:31<br>

>> >> >  1       | pg2      | 5432 | up     | 0.500000  | primary | 0          | false             | 0                 | 2020-08-31 14:02:35<br>

>> >> > (2 rows)<br>

>> >> ><br>

>> >> > Then I tried to perform a test operation and it executed successfully.<br>

>> >> > postgres@ip-172-31-39-241:/etc/pgpool2/4.0.9$ psql -U postgres -h localhost -p 9999 --pset pager=off -c "create database covid"<br>

>> >> > CREATE DATABASE<br>

>> >> ><br>

>> >> > Am I missing something in this failover process? Please let me know if you<br>

>> >> > need any additional details.<br>

>> >> ><br>

>> >> > On Tue, Sep 1, 2020 at 3:03 AM Tatsuo Ishii <<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a>><br>

>> wrote:<br>

>> >> ><br>

>> >> >> Praveen,<br>

>> >> >><br>

>> >> >> Have you read this message? I wonder if you have fixed the issue or not.<br>

>> >> >><br>

>> >> >> Best regards,<br>

>> >> >> --<br>

>> >> >> Tatsuo Ishii<br>

>> >> >> SRA OSS, Inc. Japan<br>

>> >> >> English: <a href="http://www.sraoss.co.jp/index_en.php" rel="noreferrer" target="_blank">http://www.sraoss.co.jp/index_en.php</a><br>

>> >> >> Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer" target="_blank">http://www.sraoss.co.jp</a><br>

>> >> >><br>

>> >> >> From: Tatsuo Ishii <<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a>><br>

>> >> >> Subject: [pgpool-general: 7208] Re: Query regarding failover and<br>

>> >> recovery<br>

>> >> >> Date: Thu, 20 Aug 2020 08:51:04 +0900 (JST)<br>

>> >> >> Message-ID: <<a href="mailto:20200820.085104.891242161358675858.t-ishii@sraoss.co.jp" target="_blank">20200820.085104.891242161358675858.t-ishii@sraoss.co.jp</a><br>

>> ><br>

>> >> >><br>

>> >> >> > The contents of pcp.conf looks incorrect.<br>

>> >> >> ><br>

>> >> >> >> postgres:md53175bce1d3201d16594cebf9d7eb3f9d<br>

>> >> >> ><br>

>> >> >> > The hashed password must not start with "md5".<br>

>> >> >> ><br>

>> >> >> > To create a proper pcp password, please follow the instructions in the manual:<br>

>> >> >> > <a href="https://www.pgpool.net/docs/40/en/html/configuring-pcp-conf.html" rel="noreferrer" target="_blank">https://www.pgpool.net/docs/40/en/html/configuring-pcp-conf.html</a><br>

>> >> >> ><br>

>> >> >> >> Hello,<br>

>> >> >> >><br>

>> >> >> >> Thanks for the clarification. I'm trying to execute and getting the error<br>

>> >> >> >> below. I'm attaching configs for your reference. Can you please help?<br>

>> >> >> >><br>

>> >> >> >> postgres@pgp1:/etc/pgpool2/4.0.9$ psql -U postgres -h localhost -p 9999 --pset pager=off -c "show pool_nodes"<br>

>> >> >> >>  node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | last_status_change<br>

>> >> >> >> ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------<br>

>> >> >> >>  0       | pg1      | 5432 | down   | 0.500000  | standby | 0          | false             | 0                 | 2020-08-19 09:02:46<br>

>> >> >> >>  1       | pg2      | 5432 | up     | 0.500000  | primary | 0          | true              | 0                 | 2020-08-19 09:02:46<br>

>> >> >> >> (2 rows)<br>

>> >> >> >><br>

>> >> >> >> postgres@pgp1:/etc/pgpool2/4.0.9$<br>

>> >> >> >> postgres@pgp1:/etc/pgpool2/4.0.9$<br>

>> >> >> >> postgres@pgp1:/etc/pgpool2/4.0.9$<br>

>> >> >> >> postgres@pgp1:/etc/pgpool2/4.0.9$ pcp_recovery_node -h localhost -p 9898 -n 0<br>

>> >> >> >> Password:<br>

>> >> >> >> FATAL:  authentication failed for user "postgres"<br>

>> >> >> >> DETAIL:  username and/or password does not match<br>

>> >> >> >><br>

>> >> >> >> postgres@pgp1:/etc/pgpool2/4.0.9$<br>

>> >> >> >><br>

>> >> >> >><br>

>> >> >> >> On Wed, Aug 19, 2020 at 10:22 AM Tatsuo Ishii <<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a><br>

>> ><br>

>> >> >> wrote:<br>

>> >> >> >><br>

>> >> >> >>> > I have 3 servers with two postgres (9.6) and one pgpool (4.0.9). Postgres<br>

>> >> >> >>> > is configured with streaming replication.<br>

>> >> >> >>> > When I manually stop postgres service on primary node, failover has<br>

>> >> >> >>> > happened successfully.<br>

>> >> >> >>> > Now I started postgres service on old primary node which is expected to be<br>

>> >> >> >>> > converted as slave, pgpool is not triggering recovery_1st_stage_command =<br>

>> >> >> >>> > 'recovery_1st_stage.sh'<br>

>> >> >> >>> > May I know what could be the reason?<br>

>> >> >> >>><br>

>> >> >> >>> That is expected behavior. The node previously brought down is left<br>

>> >> >> >>> as "down" by pgpool. This is intentional. You need to issue<br>

>> >> >> >>> pcp_recovery_node against the node (previous primary node in your<br>

>> >> >> >>> case) to make it online again.<br>

>> >> >> >>><br>

>> >> >> >>> When a node is brought down, there might be a reason: for example, it<br>

>> >> >> >>> needed a hardware repair. So in general it's not safe to<br>

>> >> >> >>> automatically restart the previously down node.<br>

>> >> >> >>><br>

>> >> >> >>> Best regards,<br>

>> >> >> >>> --<br>

>> >> >> >>> Tatsuo Ishii<br>

>> >> >> >>> SRA OSS, Inc. Japan<br>

>> >> >> >>> English: <a href="http://www.sraoss.co.jp/index_en.php" rel="noreferrer" target="_blank">http://www.sraoss.co.jp/index_en.php</a><br>

>> >> >> >>> Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer" target="_blank">http://www.sraoss.co.jp</a><br>

>> >> >> >>><br>

>> >> >> >><br>

>> >> >> >><br>

>> >> >> >> --<br>

>> >> >> >><br>

>> >> >> >><br>

>> >> >> >> *Regards,*<br>

>> >> >> >><br>

>> >> >> >><br>

>> >> >> >> *K S Praveen KumarM: +91-9986855625 *<br>

>> >> >> > _______________________________________________<br>

>> >> >> > pgpool-general mailing list<br>

>> >> >> > <a href="mailto:pgpool-general@pgpool.net" target="_blank">pgpool-general@pgpool.net</a><br>

>> >> >> > <a href="http://www.pgpool.net/mailman/listinfo/pgpool-general" rel="noreferrer" target="_blank">http://www.pgpool.net/mailman/listinfo/pgpool-general</a><br>

>> >> >><br>

>> >> ><br>

>> >> ><br>

>> >><br>

>> ><br>

>> ><br>

>><br>

> <br>

> <br>

</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr"><font style="font-family:"courier new",monospace" size="1"><b style="color:rgb(102,102,102)">Regards,<br><br></b></font><div style="color:rgb(102,102,102)"><font size="1"><b><font face="'comic sans ms', sans-serif"><font style="font-family:"courier new",monospace" size="1">K S Praveen Kumar<br>M: +91-9986855625 </font><br></font></b></font></div></div>

</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><font style="font-family:"courier new",monospace" size="1"><b style="color:rgb(102,102,102)">Regards,<br><br></b></font><div style="color:rgb(102,102,102)"><font size="1"><b><font face="'comic sans ms', sans-serif"><font style="font-family:"courier new",monospace" size="1">K S Praveen Kumar<br>M: +91-9986855625 </font><br></font></b></font></div></div>