[pgpool-general: 7929] multi-datacenter postgres cluster with pgpool shows unexpected latency

Sat Dec 11 04:50:48 JST 2021

Hi all, I know this is an edge case (and likely not supported), but I have a setup with a postgres streaming cluster running across 2 datacenters, with the primary in one datacenter and the standby in the other. Postgres is managing the streaming replication. I insert pgpool between the application and the postgres nodes and set the pgpool weight for the remote datacenter node to 0, while setting the weight of the local (local to the application and pgpool) postgres node to 1. This setup works great in that all local application 'selects' get sent to the local datacenter node and all writes go to the 'primary' node in the remote datacenter. And this setup solves the issue I am trying to address.
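
As a rough illustration of the weighting described above, the sketch below normalizes the backend weights into the share of load-balanced 'selects' each node should receive. It assumes the weights act as simple ratios, and the function name `select_share` is hypothetical; it is not pgpool code.

```python
def select_share(weights):
    """Normalize backend weights into the fraction of SELECTs per node.

    Assumption: each weight is a plain ratio, so a node's share is
    weight / sum(weights). This only illustrates the arithmetic.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one backend needs a positive weight")
    return [w / total for w in weights]


# Index 0 is the local node, index 1 is the remote node.
local_only = select_share([1, 0])   # weights used in this setup
both_nodes = select_share([1, 1])   # weights used in the comparison test
```

With weights 1 and 0 the remote node's share is zero, which matches the query logs; with 1 and 1 each node gets half.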

However, I'm seeing a 2+ second latency in the application request response times under this setup, even though the node in the remote site is set to a weight of 0 and all 'selects' go to the database node in the local datacenter. As a test, I've set the weight of both nodes to 1 and see selects going to both nodes, and the 2+ second latency stays the same, which is something I would expect since some ‘selects’ are being sent to the remote datacenter. If I shut down the remote datacenter node, the latency goes away. But if I detach the remote node from pgpool, it does not.

I've also brought the 2 nodes (primary and standby) into the same local datacenter and the latency doesn't show, so it seems that the latency seen with the nodes in different datacenters has to do with pgpool communications across the WAN.
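
One way to put a number on the WAN cost, independent of pgpool, would be to time a plain TCP connect from the pgpool host to each backend. The following is a minimal sketch using only the Python standard library; the function name `connect_time_ms` is hypothetical, and it measures only the TCP handshake, not SSL negotiation or postgres authentication.

```python
import socket
import time


def connect_time_ms(host, port, timeout=5.0):
    """Return the milliseconds taken to open one TCP connection.

    Only the TCP handshake is timed. Anything layered on top of it
    (SSL negotiation, authentication) costs additional round trips.
    """
    start = time.perf_counter()
    # create_connection raises OSError if the host is unreachable
    # or the timeout expires.
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0
```

Calling `connect_time_ms('local-dc.dbnode.com', 5432)` and `connect_time_ms('remote-dc.dbnode.com', 5432)` from the pgpool host would give a baseline round-trip figure for each datacenter to compare against the 2+ seconds seen by the application.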

So now my question... Under the 2 datacenter model, with the remote node set to weight 0 and local node set to weight 1, why would there be a latency in response times through pgpool if none of the select statements are being sent to the remote datacenter?

Also, I have confirmed from the query logs that no select requests from the app are hitting the node in the remote datacenter. That node sees only the sr and health_check requests coming from the pgpool node. Below is my config. I've tried many variations of the config to try to eliminate the added latency, but nothing seems to work. Any help would be greatly appreciated.

pgpool-II version 4.1.4 (karasukiboshi) running on RHEL7

My pgpool.conf
```
listen_addresses='*'
port=5342
socket_dir = '/var/run/pgpool-II-13'
pid_file_name = '/var/run/pgpool-II-13/pgpool.pid'
pcp_listen_addresses = '*'
pcp_port = 9898
pcp_socket_dir = '/var/run/pgpool-II-13'
pool_passwd = ''

backend_hostname0 = 'local-dc.dbnode.com'
backend_port0 = 5432
backend_weight0 = 1

backend_hostname1 = 'remote-dc.dbnode.com'
backend_port1 = 5432
backend_weight1 = 0

load_balance_mode = on

master_slave_mode = on
master_slave_sub_mode = 'stream'
replication_mode = off

sr_check_period = 10
sr_check_user = 'pgpool_monitor'
sr_check_password = 'xxxxxx'
sr_check_database = 'postgres'
delay_threshold = 10240

allow_clear_text_frontend_auth = on
ssl = 'on'
ssl_cert = '/etc/pki/tls/certs/internal.pem'
ssl_key = '/etc/pki/tls/private/internal.key'
ssl_ciphers = 'HIGH:MEDIUM:+3DES:!aNULL'
```
