[pgpool-general: 9004] Re: Massive performance and latency issues when remote replica is attached to cluster.

Tatsuo Ishii ishii at sraoss.co.jp
Thu Jan 25 15:34:27 JST 2024


> Brian,
> 
> I have looked into the log and found some queries were sent to remote node (node id 2).
> This is pretty strange because you said:
>> Load-balancing is turned off.
> 
> In this case every queries should be sent to primary node only (node
> id 1).
> 
> Are you sure that you set load_balance_node to off?
                            ~~~~~~~~~~~~~~~~
I meant "load_balance_mode".

> Best reagards,
> --
> Tatsuo Ishii
> SRA OSS LLC
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp
> 
>> Tatsuo,
>> 
>> As requested, I re-ran pgbench with the log_per_node_statement option
>> turned on.
>> I included just a small excerpt, but everything I saw looked to be the same
>> pattern.
>> 
>> $ pgbench -h 10.1.1.1 -p1234 -U pgbench -c 75 -j 12 -t 500 -P 15 -r
>> pgbench1 -M extended
>> starting vacuum...end.
>> progress: 15.0 s, 1044.2 tps, lat 69.690 ms stddev 7.344
>> progress: 30.0 s, 1060.1 tps, lat 69.940 ms stddev 7.635
>> transaction type: <builtin: TPC-B (sort of)>
>> scaling factor: 100
>> query mode: extended
>> number of clients: 75
>> number of threads: 12
>> number of transactions per client: 500
>> number of transactions actually processed: 37500/37500
>> latency average = 69.725 ms
>> latency stddev = 7.464 ms
>> tps = 1003.125524 (including connections establishing)
>> tps = 1004.224113 (excluding connections establishing)
>> statement latencies in milliseconds:
>>          0.001  \set aid random(1, 100000 * :scale)
>>          0.001  \set bid random(1, 1 * :scale)
>>          0.000  \set tid random(1, 10 * :scale)
>>          0.000  \set delta random(-5000, 5000)
>>         11.958  BEGIN;
>>          8.933  UPDATE pgbench_accounts SET abalance = abalance + :delta
>> WHERE aid = :aid;
>>          8.921  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
>>          9.230  UPDATE pgbench_tellers SET tbalance = tbalance + :delta
>> WHERE tid = :tid;
>>         10.102  UPDATE pgbench_branches SET bbalance = bbalance + :delta
>> WHERE bid = :bid;
>>          8.907  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
>> VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
>>         11.671  END;
>> 
>> 
>> 
>> 
>> -----------------------
>> 
>> Looped through the logfile to grab the connections coming from pgbench and
>> then find those:
>> 
>> for i in `cat pgpool-Tue.log | grep "connecting\ host" | grep
>> "10\.1\.1\.150"|awk '{print $4}'|sed "s/://"`;do cat pgpool-Tue.log |grep
>> $i;done|less
>> 
>> A snippet from there:
>> NOTE: node_id 0 and 1 are the "local" db_nodes
>> node_id 2 is the "remote" db_node
>> node_id 1 is the current primary.
>> 
>> 
>> 
>> 2024-01-23 10:10:03: pid 4507: LOG:  new connection received
>> 2024-01-23 10:10:03: pid 4507: DETAIL:  connecting host=10.1.1.150
>> port=52802
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  child.c:1860
>> 2024-01-23 10:10:03: pid 4507: LOG:  statement: select count(*) from
>> pgbench_branches
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:224
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: SELECT pg_catalog.version()
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: SELECT count(*) FROM pg_catalog.pg_proc AS p,
>> pg_catalog.pg_namespace AS n WHERE p.proname = 'count' AND n.oid =
>> p.pronamespace AND n.nspname ~ '.*' AND p.provolatile = 'v'
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: select count(*) from pgbench_branches
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  statement: vacuum pgbench_branches
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:224
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: vacuum pgbench_branches
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  statement: vacuum pgbench_tellers
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:224
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: vacuum pgbench_tellers
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  statement: truncate pgbench_history
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:224
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: truncate pgbench_history
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  statement:  DISCARD ALL
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:224
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 0 backend pid: 19865
>> statement:  DISCARD ALL
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement:  DISCARD ALL
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 2 backend pid: 17091
>> statement:  DISCARD ALL
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  frontend disconnection: session time:
>> 0:00:00.215 user=pgbench database=pgbench1 host=10.1.1.150 port=52802
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  child.c:2076
>> 2024-01-23 10:10:03: pid 4507: LOG:  new connection received
>> 2024-01-23 10:10:03: pid 4507: DETAIL:  connecting host=10.1.1.150
>> port=52828
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  child.c:1860
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 0 backend pid: 19865
>> statement: Parse: BEGIN;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: Parse: BEGIN;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 2 backend pid: 17091
>> statement: Parse: BEGIN;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 0 backend pid: 19865
>> statement: Bind: BEGIN;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: Bind: BEGIN;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 2 backend pid: 17091
>> statement: Bind: BEGIN;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 0 backend pid: 19865
>> statement: D message
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: D message
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 2 backend pid: 17091
>> statement: D message
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  statement: BEGIN;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:914
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 0 backend pid: 19865
>> statement: Execute: BEGIN;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: Execute: BEGIN;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 2 backend pid: 17091
>> statement: Execute: BEGIN;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: Parse: UPDATE pgbench_accounts SET abalance = abalance + $1
>> WHERE aid = $2;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: Bind: UPDATE pgbench_accounts SET abalance = abalance + $1 WHERE
>> aid = $2;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: D message
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  statement: UPDATE pgbench_accounts SET
>> abalance = abalance + $1 WHERE aid = $2;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:914
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: Execute: UPDATE pgbench_accounts SET abalance = abalance + $1
>> WHERE aid = $2;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: Parse: SELECT abalance FROM pgbench_accounts WHERE aid = $1;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: Bind: SELECT abalance FROM pgbench_accounts WHERE aid = $1;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: D message
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  statement: SELECT abalance FROM
>> pgbench_accounts WHERE aid = $1;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:914
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: Execute: SELECT abalance FROM pgbench_accounts WHERE aid = $1;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: Parse: UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE
>> tid = $2;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: Bind: UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE
>> tid = $2;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: D message
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  statement: UPDATE pgbench_tellers SET
>> tbalance = tbalance + $1 WHERE tid = $2;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:914
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: Execute: UPDATE pgbench_tellers SET tbalance = tbalance + $1
>> WHERE tid = $2;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: Parse: UPDATE pgbench_branches SET bbalance = bbalance + $1
>> WHERE bid = $2;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: Bind: UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE
>> bid = $2;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: D message
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  statement: UPDATE pgbench_branches SET
>> bbalance = bbalance + $1 WHERE bid = $2;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:914
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: Execute: UPDATE pgbench_branches SET bbalance = bbalance + $1
>> WHERE bid = $2;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: Parse: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
>> VALUES ($1, $2, $3, $4, CURRENT_TIMESTAMP);
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: Bind: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
>> VALUES ($1, $2, $3, $4, CURRENT_TIMESTAMP);
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: D message
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  statement: INSERT INTO pgbench_history
>> (tid, bid, aid, delta, mtime) VALUES ($1, $2, $3, $4, CURRENT_TIMESTAMP);
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:914
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: Execute: INSERT INTO pgbench_history (tid, bid, aid, delta,
>> mtime) VALUES ($1, $2, $3, $4, CURRENT_TIMESTAMP);
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 0 backend pid: 19865
>> statement: Parse: END;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: Parse: END;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 2 backend pid: 17091
>> statement: Parse: END;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 0 backend pid: 19865
>> statement: Bind: END;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: Bind: END;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 2 backend pid: 17091
>> statement: Bind: END;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 0 backend pid: 19865
>> statement: D message
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: D message
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 2 backend pid: 17091
>> statement: D message
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  statement: END;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:914
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 0 backend pid: 19865
>> statement: Execute: END;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 1 backend pid: 6857
>> statement: Execute: END;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 2024-01-23 10:10:03: pid 4507: LOG:  DB node id: 2 backend pid: 17091
>> statement: Execute: END;
>> 2024-01-23 10:10:03: pid 4507: LOCATION:  pool_proto_modules.c:3596
>> 
>> --
>> Brian Ismay
>> jismay at cenic.org
>> 
>> On Sat, Jan 20, 2024 at 10:39 PM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> 
>>> > Any recommendations would be greatly appreciated. So far, after making
>>> > multiple tuning and settings changes there have been no meaningful
>>> changes.
>>> >
>>> > What we are highly suspicious of is that the latency numbers seem to go
>>> > from around .2 ms to around 8 ms. This is very close to the
>>> round-trip-time
>>> > between our datacenters.
>>>
>>> > statement latencies in milliseconds:
>>> >          0.003  \set aid random(1, 100000 * :scale)
>>> >          0.001  \set bid random(1, 1 * :scale)
>>> >          0.001  \set tid random(1, 10 * :scale)
>>> >          0.001  \set delta random(-5000, 5000)
>>> >          8.844  BEGIN;
>>> >          9.695  UPDATE pgbench_accounts SET abalance = abalance + :delta
>>> > WHERE aid = :aid;
>>> >          8.878  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
>>> >        685.873  UPDATE pgbench_tellers SET tbalance = tbalance + :delta
>>> > WHERE tid = :tid;
>>> >         92.049  UPDATE pgbench_branches SET bbalance = bbalance + :delta
>>> > WHERE bid = :bid;
>>> >          8.826  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
>>> > VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
>>> >          8.847  END;
>>>
>>> If BEGIN and END were sent to the remote standby node, then the
>>> latency number makes sense. But theoretically it should not happen
>>> because you turn off load balance mode. In this case everything should
>>> be sent to primary except the connection establishing. But even if
>>> load balance mode is off, pgpool establish connections to both primary
>>> and standbys. However the default option of pgbench is establishing
>>> connection only once at the beginning of session, meaning the
>>> connection overhead should only happen only once in the session.  Thus
>>> I can not think of plausible reason for the latency numbers.
>>>
>>> Anyway, can you turn on log_per_node_statement and see everything is
>>> sent to standby? I want to confirm that first.
>>>
>>> Best reagards,
>>> --
>>> Tatsuo Ishii
>>> SRA OSS LLC
>>> English: http://www.sraoss.co.jp/index_en/
>>> Japanese:http://www.sraoss.co.jp
>>>
> _______________________________________________
> pgpool-general mailing list
> pgpool-general at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-general


More information about the pgpool-general mailing list