[pgpool-general: 7931] Re: multi-datacenter postgres cluster with pgpool shows unexpected latency

Sat Dec 11 22:37:39 JST 2021

> Hi all, I know this is an edge case (and likely not supported) but I have a setup with a postgres streaming cluster running across 2 datacenters with the primary in on datacenter and standby in the other. Postgres is managing the streaming replication. I insert pgpool between the application and the postgres nodes and set the pgpool weight for the remote datacenter node to 0, while setting the weight of the local (local to the application and pgpool) postgres node to 1. This setup works great in that all local application 'selects' get sent to the local datacenter node and all writes go to the 'primary' node in the remote datacenter. And this setup solves the issue I am trying to address.
> 
> However, I'm seeing a 2+ second latency in the application request response times under this setup, even though the node in the remote site is set to a weight of 0 and all 'selects' go to the database node in the local datacenter. As a test, I've set the weight of both nodes to a weight of  1 and see selects going to both node and the 2+ second latency stays the same, which is something I would expect since some ‘selects’ are being sent to the remote datacenter. If I shutdown the remote datacenter node the latency goes away. But if I detach the remote node from pgpool it does not.
> 
> I've also brought the 2 nodes (primary and standby) into the same local datacenter and the latency doesn't show so it seems that the latency seen with the nodes in different datacenters has to do with pgpool communications across the WAN.
> 
> So now my question... Under the 2 datacenter model, with the remote node set to weight 0 and local node set to weight 1, why would there be a latency in response times through pgpool if none of the select statements are being sent to the remote datacenter?
> 
> Also, I have confirmed from query logs that no select requests are hitting the node in the remote datacenter by the app. Only the sr and health_check requests are seen by that node coming from the pgpool node. Below is my config. I've tried many variations of the config to try to eliminate the added latency but nothing seems to work. Any help would be greatly appreciated.
[snip]

My guess is the latency is caused by Pgpool-II's internal SELECTs
issued to primary in the datacenter. Pgpool-II needs to know meta
information (system catalog information) of tables specified in the
user's query. (you can see the internal queries by enabling debug log).

By setting relcache_query_target you can change the behavior. If it's
set to 'load_balance_node', then the internal SELECTs will not be sent
to primary and the latency will go away.

Please note that there's a limitation in this feature. If user's
SELECT is within an explicit transaction (i.e. starts with BEGIN
etc.), then the BEGIN will be sent to both primary and standby thus you
will see the latency.

Best reagards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp