<div dir="ltr"><div>Sorry, I don't think I understand. Could you please be more specific about this '' solution?</div><div>Just for reference, here are our settings:</div><div>follow_primary_command: <a href="https://github.com/stormatics/pg_cirrus/blob/589cc51a12339a13b2a64dc25d7d7e0952e3a6a2/3-node-cluster/ansible/playbooks/setup-pgpool.yml#L45" target="_blank">https://github.com/stormatics/pg_cirrus/blob/589cc51a12339a13b2a64dc25d7d7e0952e3a6a2/3-node-cluster/ansible/playbooks/setup-pgpool.yml#L45</a></div><div>follow_primary_script file: <a href="https://github.com/stormatics/pg_cirrus/blob/main/3-node-cluster/pgpool-internals/follow_primary.sh" target="_blank">https://github.com/stormatics/pg_cirrus/blob/main/3-node-cluster/pgpool-internals/follow_primary.sh</a></div><span class="gmail-im" style="color:rgb(80,0,80)"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>> NOTE, I attempted to resolve the situation by restarting pgpool2 using the<br>> -d switch. After the restart, everything seemed to work fine, and standby2<br>> node was correctly marked as standby again. Is restart of pgpool really<br>> necessary?<br><br>No. 
You just should have waited a little bit longer so that the follow<br>primary command completed the job: recovering node 2.</blockquote></span><div>I waited almost an hour after the failover, but I am still getting the log above saying that there is no standby node.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Aug 19, 2023 at 5:28 PM Dev BotX <<a href="mailto:devbotx5@gmail.com">devbotx5@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, 19 Aug 2023 at 08:03, Tatsuo Ishii <<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">> Hi Tatsuo<br>

> I've observed some logs following an auto failover that I'd like to discuss<br>

> <br>

> 2023-08-19 00:44:07.277: main pid 62145: LOG: find_primary_node: primary<br>

> node is 1<br>

> 2023-08-19 00:44:07.277: main pid 62145: LOG: find_primary_node: standby<br>

> node is 2<br>

> 2023-08-19 00:44:07.278: main pid 62145: LOG: starting follow degeneration.<br>

> shutdown host 172.16.14.165(5432)<br>

> 2023-08-19 00:44:07.279: main pid 62145: LOG: starting follow degeneration.<br>

> shutdown host 172.16.14.163(5432)<br>

> 2023-08-19 00:44:07.279: main pid 62145: LOG: failover: 2 follow backends<br>

> have been degenerated<br>

> 2023-08-19 00:44:07.280: main pid 62145: LOG: failover: set new primary<br>

> node: 1<br>

> <br>

> do_query: extended:0 query:"SELECT pg_is_in_recovery()"<br>

> 2023-08-19 00:44:08.305: sr_check_worker pid 62243: DEBUG:<br>

> verify_backend_node_status: there's no standby node<br>

> 2023-08-19 00:44:08.305: sr_check_worker pid 62243: DEBUG: node status[0]: 0<br>

> 2023-08-19 00:44:08.305: sr_check_worker pid 62243: DEBUG: node status[1]: 1<br>

> 2023-08-19 00:44:08.305: sr_check_worker pid 62243: DEBUG: node status[2]: 0<br>

> <br>

> <br>

> In the above logs, it's evident that the initial primary node at IP address<br>

> 172.16.14.165 (which is currently down) and standby2 node at 172.16.14.163<br>

> were involved in the auto failover process.<br>

> My concern is why did pgpool initiate the shutdown of standby2 node as part<br>

> of the auto failover process?<br>

<br>

Because in general standby servers cannot connect to new primary<br>

without obtaining a copy of the new primary database.<br>

<br>

> and how can we prevent such a degeneration<br>

> from happening to standby2 in our case?<br>

<br>

You can set follow_primary_command to '' to prevent standbys from being<br>

killed by pgpool.<br></blockquote><div>Sorry, I don't think I understand. Could you please be more specific about this '' solution?</div><div>Just for reference, here are our settings:</div><div>follow_primary_command: <a href="https://github.com/stormatics/pg_cirrus/blob/589cc51a12339a13b2a64dc25d7d7e0952e3a6a2/3-node-cluster/ansible/playbooks/setup-pgpool.yml#L45" target="_blank">https://github.com/stormatics/pg_cirrus/blob/589cc51a12339a13b2a64dc25d7d7e0952e3a6a2/3-node-cluster/ansible/playbooks/setup-pgpool.yml#L45</a></div><div>follow_primary_script file: <a href="https://github.com/stormatics/pg_cirrus/blob/main/3-node-cluster/pgpool-internals/follow_primary.sh" target="_blank">https://github.com/stormatics/pg_cirrus/blob/main/3-node-cluster/pgpool-internals/follow_primary.sh</a></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

> NOTE, I attempted to resolve the situation by restarting pgpool2 using the<br>

> -d switch. After the restart, everything seemed to work fine, and standby2<br>

> node was correctly marked as standby again. Is restart of pgpool really<br>

> necessary?<br>

<br>

No. You just should have waited a little bit longer so that the follow<br>

primary command completed the job: recovering node 2. </blockquote><div>I waited almost an hour after the failover, but I am still getting the log above saying that there is no standby node.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Please remember<br>

that the follow primary command keeps on running after the failover<br>

completed.<br>

<br>

Best regards,<br>

--<br>

Tatsuo Ishii<br>

SRA OSS LLC<br>

English: <a href="http://www.sraoss.co.jp/index_en/" rel="noreferrer" target="_blank">http://www.sraoss.co.jp/index_en/</a><br>

Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer" target="_blank">http://www.sraoss.co.jp</a><br>

</blockquote></div></div>

</blockquote></div>
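<div><br></div>
<div>For reference, the '' suggestion quoted above can be sketched as a pgpool.conf fragment. This is only an illustration, assuming the parameter is set directly in pgpool.conf rather than through the Ansible playbook linked earlier; the comments restate the explanation given in the thread.</div>
<pre>
# pgpool.conf (sketch)
# An empty string means no follow primary command is run after a failover,
# so pgpool does not shut down the remaining standby nodes.
follow_primary_command = ''
</pre>
<div>As noted in the quoted reply, a standby generally cannot connect to the new primary without obtaining a copy of the new primary database, so with this setting that step would probably have to be handled outside pgpool.</div>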