[pgpool-hackers: 3917] Re: Proposal: If replication delay exceeds delay_threshold, elect a new load balance node with less delay

KAWAMOTO Masaya kawamoto at sraoss.co.jp
Wed Jun 9 17:25:08 JST 2021


Hi Ishii-san,

I modified my patch.

Please see the test results below.
I tested on a 5-node cluster.
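
For reference, the relevant settings correspond to something like the following pgpool.conf excerpt (illustrative, not the exact test configuration; the delay_threshold value is the one from the earlier discussion, and the weights match the lb_weight column in the output below):

```
# illustrative pgpool.conf excerpt
delay_threshold = 10000000        # WAL lag in bytes before a standby is avoided
prefer_lower_delay_standby = on   # re-pick among the lowest-delay standbys
backend_weight0 = 0.2             # likewise for backend_weight1 .. backend_weight4
```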

First, in the case where only node 3 is delayed, node 0 is the primary
and nodes 1, 2, and 4 are the lowest-delay standbys. The result is that pgpool
sent 17% of the queries to the primary and 27-30% to each of the
lowest-delay standbys. This is close to the hoped-for result of 20%
to the primary and 30% to each standby.

Second, in the case where nodes 3 and 4 are delayed, node 0 is the
primary and nodes 1 and 2 are the lowest-delay standbys. The result is
19% to the primary and 39-40% to each of the lowest-delay standbys. I
think this is a very good result.
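
As I understand the behaviour, it can be sketched roughly as follows (a simplified illustration in Python, not the actual pgpool-II code; the data structures and function name are made up for this example):

```python
import random

def choose_load_balance_node(nodes, delay_threshold):
    """Sketch of the selection behaviour with prefer_lower_delay_standby on.

    nodes: list of dicts with keys "id", "role", "weight", "delay".
    """
    # First pick a node by backend_weight as usual.
    picked = random.choices(nodes, weights=[n["weight"] for n in nodes])[0]
    # If the picked node is a standby whose delay exceeds the threshold,
    # re-pick among the standbys with the lowest delay, again weighted
    # by backend_weight.  The primary keeps its original share.
    if picked["role"] == "standby" and picked["delay"] > delay_threshold:
        standbys = [n for n in nodes if n["role"] == "standby"]
        lowest = min(n["delay"] for n in standbys)
        if lowest > delay_threshold:
            # All standbys are delayed; fall back to the primary.
            return next(n for n in nodes if n["role"] == "primary")["id"]
        best = [n for n in standbys if n["delay"] == lowest]
        picked = random.choices(best, weights=[n["weight"] for n in best])[0]
    return picked["id"]
```

With five equal weights of 0.2 and only node 3 delayed, this gives the primary its usual 20% share while node 3's share is redistributed among the other standbys, which matches the distribution observed above.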

What do you think?

========
-bash-4.2$ psql -p 11000 -c "show pool_nodes"
 node_id | hostname | port  | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | /tmp     | 11002 | up     | up        | 0.200000  | primary | primary | 0          | true              | 0                 |                   |                        | 2021-06-09 07:23:37
 1       | /tmp     | 11003 | up     | up        | 0.200000  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2021-06-09 07:23:37
 2       | /tmp     | 11004 | up     | up        | 0.200000  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2021-06-09 07:23:37
 3       | /tmp     | 11005 | up     | up        | 0.200000  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2021-06-09 07:23:37
 4       | /tmp     | 11006 | up     | up        | 0.200000  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2021-06-09 07:23:37
(5 rows)

-bash-4.2$ psql -p 11005 -c "select pg_wal_replay_pause()"
 pg_wal_replay_pause
---------------------

(1 row)

-bash-4.2$ pgbench -p 11000 -i test

-bash-4.2$ pgbench -p 11000 -n -S -t 400 test

-bash-4.2$ psql -p 11000 -c "show pool_nodes"
 node_id | hostname | port  | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | /tmp     | 11002 | up     | up        | 0.200000  | primary | primary | 69         | false             | 0                 |                   |                        | 2021-06-09 07:05:16
 1       | /tmp     | 11003 | up     | up        | 0.200000  | standby | standby | 106        | false             | 0                 | streaming         | async                  | 2021-06-09 07:05:16
 2       | /tmp     | 11004 | up     | up        | 0.200000  | standby | standby | 108        | false             | 0                 | streaming         | async                  | 2021-06-09 07:05:16
 3       | /tmp     | 11005 | up     | up        | 0.200000  | standby | standby | 0          | false             | 13158872          | streaming         | async                  | 2021-06-09 07:05:16
 4       | /tmp     | 11006 | up     | up        | 0.200000  | standby | standby | 119        | true              | 0                 | streaming         | async                  | 2021-06-09 07:05:16
(5 rows)

-bash-4.2$ psql -p 11006 -c "select pg_wal_replay_pause()"
 pg_wal_replay_pause
---------------------

(1 row)

-bash-4.2$ pgbench -p 11000 -i test

-bash-4.2$ pgbench -p 11000 -n -S -t 400 test

-bash-4.2$ psql -p 11000 -c "show pool_nodes"
 node_id | hostname | port  | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | /tmp     | 11002 | up     | up        | 0.200000  | primary | primary | 69         | true              | 0                 |                   |                        | 2021-06-09 07:05:16
 1       | /tmp     | 11003 | up     | up        | 0.200000  | standby | standby | 106        | false             | 0                 | streaming         | async                  | 2021-06-09 07:05:16
 2       | /tmp     | 11004 | up     | up        | 0.200000  | standby | standby | 108        | false             | 0                 | streaming         | async                  | 2021-06-09 07:05:16
 3       | /tmp     | 11005 | up     | up        | 0.200000  | standby | standby | 0          | false             | 26195408          | streaming         | async                  | 2021-06-09 07:05:16
 4       | /tmp     | 11006 | up     | up        | 0.200000  | standby | standby | 119        | false             | 13036536          | streaming         | async                  | 2021-06-09 07:05:16
(5 rows)

-bash-4.2$ pgbench -p 11000 -n -S -t 300 test

-bash-4.2$ psql -p 11000 -c "show pool_nodes"
 node_id | hostname | port  | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | /tmp     | 11002 | up     | up        | 0.200000  | primary | primary | 127        | false             | 0                 |                   |                        | 2021-06-09 07:05:16
 1       | /tmp     | 11003 | up     | up        | 0.200000  | standby | standby | 225        | true              | 0                 | streaming         | async                  | 2021-06-09 07:05:16
 2       | /tmp     | 11004 | up     | up        | 0.200000  | standby | standby | 233        | false             | 0                 | streaming         | async                  | 2021-06-09 07:05:16
 3       | /tmp     | 11005 | up     | up        | 0.200000  | standby | standby | 0          | false             | 26268544          | streaming         | async                  | 2021-06-09 07:05:16
 4       | /tmp     | 11006 | up     | up        | 0.200000  | standby | standby | 119        | false             | 13109672          | streaming         | async                  | 2021-06-09 07:05:16
(5 rows)
========

Best regards.

On Thu, 3 Jun 2021 15:32:19 +0900
KAWAMOTO Masaya <kawamoto at sraoss.co.jp> wrote:

> On Thu, 03 Jun 2021 15:06:14 +0900 (JST)
> Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> 
> > >> Good news is, with prefer_lower_delay_standby, SELECT is not sent to
> > >> standby node 1 because its replication delay 13188800 exceeds
> > >> delay_threshold 10000000. However, select_cnt of primary and standby
> > >> node 2 looks strange since lb_weight of both nodes are
> > >> identical. Because pgbench issues 100 SELECTs, select_cnt of both
> > >> nodes should be close to 50 and 50, no?
> > > 
> > > This is the expected result.
> > > prefer_lower_delay_standby takes effect when the selected node is a standby
> > > node. In your example, all nodes have the same weight, so with it set to on,
> > > queries were sent to node 2 whenever node 1 was selected.
> > > The other standby nodes take over the processing of the delayed node.
> > 
> > Ok.
> > 
> > So this time I did the same test on a 4 node cluster.
> > 
> > t-ishii$ pgbench -p 11000 -n -S -t 100 test
> > transaction type: <builtin: select only>
> > scaling factor: 1
> > query mode: simple
> > number of clients: 1
> > number of threads: 1
> > number of transactions per client: 100
> > number of transactions actually processed: 100/100
> > latency average = 0.300 ms
> > tps = 3336.064903 (including connections establishing)
> > tps = 3827.364870 (excluding connections establishing)
> > t-ishii$ psql -p 11000 -c "show pool_nodes" test
> >  node_id | hostname | port  | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change  
> > ---------+----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
> >  0       | /tmp     | 11002 | up     | up        | 0.250000  | primary | primary | 33         | false             | 0                 |                   |                        | 2021-06-03 14:54:46
> >  1       | /tmp     | 11003 | up     | up        | 0.250000  | standby | standby | 0          | false             | 13188720          | streaming         | async                  | 2021-06-03 14:54:46
> >  2       | /tmp     | 11004 | up     | up        | 0.250000  | standby | standby | 45         | false             | 0                 | streaming         | async                  | 2021-06-03 14:54:46
> >  3       | /tmp     | 11005 | up     | up        | 0.250000  | standby | standby | 24         | true              | 0                 | streaming         | async                  | 2021-06-03 14:54:46
> > (4 rows)
> > 
> > I was expecting almost the same count of SELECTs to be sent to nodes 2
> > and 3. But in reality, about twice as many SELECTs were sent to
> > node 2. Shouldn't the same number of SELECTs be sent to nodes 2 and
> > 3, since the replication delay of both nodes is equal (0)? This
> > would also be better from a performance point of view.
> 
> OK.
> I will try to modify the algorithm to select the load-balancing node based on
> backend_weight when there are multiple nodes with the lowest delay.
> 
> 
> > 
> > Best regards,
> > --
> > Tatsuo Ishii
> > SRA OSS, Inc. Japan
> > English: http://www.sraoss.co.jp/index_en.php
> > Japanese:http://www.sraoss.co.jp
> 
> -- 
> KAWAMOTO Masaya <kawamoto at sraoss.co.jp>
> SRA OSS, Inc. Japan
> _______________________________________________
> pgpool-hackers mailing list
> pgpool-hackers at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-hackers

-- 
KAWAMOTO Masaya <kawamoto at sraoss.co.jp>
SRA OSS, Inc. Japan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: select_lower_delay_load_balance_node.patch_r6
Type: application/octet-stream
Size: 21670 bytes
Desc: not available
URL:        <http://www.pgpool.net/pipermail/pgpool-hackers/attachments/20210609/216c07c3/attachment-0001.obj>


More information about the pgpool-hackers mailing list