[pgpool-hackers: 4377] Re: Load balancing after failover not working in 3-node HA PostgreSQL cluster

Salman Ahmed salman.ahmed at stormatics.tech
Sat Aug 19 05:13:41 JST 2023


Hi Tatsuo,
I've observed some logs following an auto failover that I'd like to discuss.

2023-08-19 00:44:07.277: main pid 62145: LOG: find_primary_node: primary
node is 1
2023-08-19 00:44:07.277: main pid 62145: LOG: find_primary_node: standby
node is 2
2023-08-19 00:44:07.278: main pid 62145: LOG: starting follow degeneration.
shutdown host 172.16.14.165(5432)
2023-08-19 00:44:07.279: main pid 62145: LOG: starting follow degeneration.
shutdown host 172.16.14.163(5432)
2023-08-19 00:44:07.279: main pid 62145: LOG: failover: 2 follow backends
have been degenerated
2023-08-19 00:44:07.280: main pid 62145: LOG: failover: set new primary
node: 1

do_query: extended:0 query:"SELECT pg_is_in_recovery()"
2023-08-19 00:44:08.305: sr_check_worker pid 62243: DEBUG:
verify_backend_node_status: there's no standby node
2023-08-19 00:44:08.305: sr_check_worker pid 62243: DEBUG: node status[0]: 0
2023-08-19 00:44:08.305: sr_check_worker pid 62243: DEBUG: node status[1]: 1
2023-08-19 00:44:08.305: sr_check_worker pid 62243: DEBUG: node status[2]: 0
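For reference, the same per-node view can be checked at any time from a client session connected to pgpool (a minimal sketch; localhost, port 9999 and the postgres user are assumptions about a default setup, not taken from this cluster):

psql -h localhost -p 9999 -U postgres -c 'SHOW POOL_NODES;'

The status and role columns should show node 1 as the primary and the other two nodes as standbys once they are attached again.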


In the above logs, it's evident that the initial primary node at IP address
172.16.14.165 (which is currently down) and the standby2 node at 172.16.14.163
were involved in the auto failover process.
My concern is: why did pgpool initiate the shutdown of the standby2 node as
part of the auto failover process, and how can we prevent such a degeneration
from happening to standby2 in our case?
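For context, the "starting follow degeneration. shutdown host ..." lines are what pgpool logs when follow_primary_command is configured: after electing the new primary it intentionally detaches every other backend and runs the follow script against each one, and that script is expected to re-point the standby at the new primary and then re-attach it. Below is a minimal sketch of that arrangement, assuming a hypothetical /etc/pgpool2/follow_primary.sh plus the default PCP port 9898 and a placeholder pcpadmin user (none of which are taken from this cluster's configuration):

# pgpool.conf (sketch)
follow_primary_command = '/etc/pgpool2/follow_primary.sh %d %h %p %D %m %H %M %P %r %R'

#!/bin/bash
# /etc/pgpool2/follow_primary.sh (sketch)
set -e
NODE_ID="$1"           # %d: id of the backend that was just detached
NEW_PRIMARY_HOST="$6"  # %H: hostname of the new primary
# ... resynchronize this standby with the new primary ($NEW_PRIMARY_HOST) here
#     (pg_rewind, pg_basebackup, or your replication manager of choice) ...
# finally re-attach the node so pgpool marks it as a standby again
pcp_attach_node -h localhost -p 9898 -U pcpadmin -w -n "$NODE_ID"

If follow_primary_command is left empty, pgpool skips the follow degeneration entirely; the trade-off is that the remaining standby is then not re-pointed at the new primary automatically.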

NOTE: I attempted to resolve the situation by restarting pgpool2 using the
-d switch. After the restart, everything seemed to work fine, and the standby2
node was correctly marked as standby again. Is a restart of pgpool really
necessary?
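On the restart question: re-attaching the detached node through the PCP interface is a commonly used alternative to restarting pgpool. A hedged sketch only (the PCP port 9898 and pcpadmin user are assumptions, and node 2 is taken to be standby2 as in the logs above):

pcp_attach_node -h localhost -p 9898 -U pcpadmin -w -n 2
psql -h localhost -p 9999 -U postgres -c 'SHOW POOL_NODES;'   # confirm status/role afterwards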

Thanks

> On Wed, 16 Aug 2023 at 18:41, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>
>> You need to set backend_weight0 to a non-zero value, e.g.
>>
>> backend_weight0 = 0.5
>>
>> Note that you don't need to change backend_weight[0-2] before/after
>> failover.
>>
>> > Just make my last message clearer,
>> >
>> > When I first setup the cluster these were the properties I was using
>> >
>> > backend_weight0 = 0
>> >
>> > backend_weight1 = 0.5
>> >
>> > backend_weight2 = 0.5
>> >
>> > load_balance_mode = on
>> >
>> > database_redirect_preference_list = '.*:standby'
>> >
>> > Load balancing after failover did not work. After failover, I manually
>> > set the backend_weight1 = 0 and backend_weight2 = 1, and load balancing
>> > started to work.
>> >
>> >
>> >
>> > On Wed, Aug 16, 2023 at 5:29 PM Salman Ahmed <salman.ahmed at stormatics.tech> wrote:
>> >
>> >> I did as you suggested and set the database_redirect_preference_list
>> >> value in pgpool.conf to
>> >>
>> >> database_redirect_preference_list = '.*:standby'
>> >>
>> >> but load balancing did not work after failover, however when I manually
>> >> set the backend_weight1 = 0 and backend_weight2 = 1 after failover, the
>> >> load balancing started to work.
>> >>
>> >> On Wed, Aug 16, 2023 at 1:02 PM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> >>
>> >>> > Dear pgpool-hackers,
>> >>> >
>> >>> > We’re currently working on pg_cirrus - a simple tool to set up
>> >>> > 3-node HA PostgreSQL clusters using pgpool. We’re facing an issue,
>> >>> > i.e. load balancing after failover does not work.
>> >>> >
>> >>> > When pg_cirrus sets up a 3-node HA cluster within the same subnet,
>> >>> > load balancing works seamlessly as expected, i.e. write operations
>> >>> > go to the primary node and read operations go to the standby nodes,
>> >>> > but after the primary node goes down, both read and write queries
>> >>> > land on the same newly promoted primary node.
>> >>> >
>> >>> > We’re using the following load balancing properties in pgpool.conf:
>> >>> >
>> >>> > backend_weight0 = 0
>> >>> >
>> >>> > backend_weight1 = 0.5
>> >>> >
>> >>> > backend_weight2 = 0.5
>> >>> >
>> >>> > load_balance_mode = on
>> >>> >
>> >>> > Is this the expected behavior, or are we doing something wrong?
>> >>> > Kindly guide us.
>> >>>
>> >>> Assuming the initial primary is node 0 and the new primary is node 1,
>> >>> it seems pgpool behaves as expected. After failover, node 0 becomes the
>> >>> new standby with weight 0, which means no read query is routed to node
>> >>> 0. In the meantime, node 1 becomes the new primary with weight 0.5,
>> >>> which means some read queries are routed to node 1, the new primary.
>> >>>
>> >>> If you want to keep read queries from being sent to the primary
>> >>> node, you can use database_redirect_preference_list. For example,
>> >>>
>> >>> database_redirect_preference_list = '.*:standby'
>> >>>
>> >>> will route read queries to one of the standby servers (which standby is
>> >>> chosen depends on the weight parameters), i.e. no read query is
>> >>> routed to the primary. This behavior is kept regardless of which node
>> >>> is assigned as the primary. See the manual for more details.
>> >>>
>> >>> Best regards,
>> >>> --
>> >>> Tatsuo Ishii
>> >>> SRA OSS LLC
>> >>> English: http://www.sraoss.co.jp/index_en/
>> >>> Japanese: http://www.sraoss.co.jp
>> >>>
>> >>
>>
>