View Issue Details

ID: 0000618
Project: Pgpool-II
Category: [All Projects] General
View Status: public
Last Update: 2020-06-21 18:41
Reporter: greenenvy
Assigned To: t-ishii
Priority: normal
Severity: minor
Reproducibility: N/A
Status: assigned
Resolution: open
Product Version: 4.1.2
Target Version:
Fixed in Version:
Summary: 0000618: Load balancer not behaving as expected with Hibernate and Hikari
Description: I'm not sure if this is a bug or a problem with my configuration in general, but load balancing does not seem to be working as expected.

I'm using a Spring Boot app that connects to Pgpool-II, which should load balance queries across 1 of 3 nodes; failover is managed with repmgr. Streaming replication is used, with both replicas synchronous via the 'remote_apply' option. When I run the whole setup with docker-compose (so I'm using a Docker image for Pgpool-II), everything seems fine. When I use psql from the Pgpool node to issue queries, inserts go to the primary and selects are spread evenly across all nodes, and everything seems to be working fine.
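For reference, a synchronous setup like the one described above is typically configured on the primary along these lines. This is an illustrative sketch only; the standby names and the ANY clause are assumptions, not taken from the attached config:

```ini
# postgresql.conf on the primary (illustrative, hypothetical standby names)
synchronous_commit = remote_apply                  # wait until standbys have applied WAL
synchronous_standby_names = 'ANY 2 (pg_1, pg_2)'   # both replicas synchronous
```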

When I send requests from the Spring Boot app, there seems to be a problem. It has its own connection pool in the shape of HikariCP, and whatever I set as max connections, the problem I'm facing is that the first few SELECTs are load balanced across all nodes, and then every SELECT goes to only one of them, which is not always the primary (as displayed in the show pool_nodes output; sometimes they all go to pg_2).

 node_id | hostname | port | status | lb_weight | role    | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
 0 | pg_0 | 5432 | up | 0.333333 | primary | 255 | true  | 0 |           |      | 2020-06-15 10:08:34
 1 | pg_1 | 5432 | up | 0.333333 | standby |   3 | false | 0 | streaming | sync | 2020-06-15 10:08:34
 2 | pg_2 | 5432 | up | 0.333333 | standby |   1 | false | 0 | streaming | sync | 2020-06-15 10:08:34

I have tried using other connection pools, like Tomcat and c3p0, and the outcome was the same. The only difference I observed with HikariCP or the Tomcat pool is that one extra SELECT is counted in the table even when Hibernate shows that only one SELECT was issued.

Logs for a select request:

2020-06-15 10:15:59: pid 55: LOG: DB node id: 2 backend pid: 164 statement: Execute: <--------- Seems like this is counted as well as the real select query
2020-06-15 10:15:59: pid 55: LOG: DB node id: 0 backend pid: 195 statement: Parse: SET SESSION CHARACTERISTICS AS TRANSACTION READ ONLY
2020-06-15 10:15:59: pid 55: LOG: DB node id: 1 backend pid: 163 statement: Parse: SET SESSION CHARACTERISTICS AS TRANSACTION READ ONLY
2020-06-15 10:15:59: pid 55: LOG: DB node id: 2 backend pid: 164 statement: Parse: SET SESSION CHARACTERISTICS AS TRANSACTION READ ONLY
2020-06-15 10:15:59: pid 55: LOG: DB node id: 0 backend pid: 195 statement: B message
2020-06-15 10:15:59: pid 55: LOG: DB node id: 1 backend pid: 163 statement: B message
2020-06-15 10:15:59: pid 55: LOG: DB node id: 2 backend pid: 164 statement: B message
2020-06-15 10:15:59: pid 55: LOG: DB node id: 0 backend pid: 195 statement: Execute: SET SESSION CHARACTERISTICS AS TRANSACTION READ ONLY
2020-06-15 10:15:59: pid 55: LOG: DB node id: 1 backend pid: 163 statement: Execute: SET SESSION CHARACTERISTICS AS TRANSACTION READ ONLY
2020-06-15 10:15:59: pid 55: LOG: DB node id: 2 backend pid: 164 statement: Execute: SET SESSION CHARACTERISTICS AS TRANSACTION READ ONLY
2020-06-15 10:15:59: pid 55: LOG: DB node id: 0 backend pid: 195 statement: B message
2020-06-15 10:15:59: pid 55: LOG: DB node id: 1 backend pid: 163 statement: B message
2020-06-15 10:15:59: pid 55: LOG: DB node id: 2 backend pid: 164 statement: B message
2020-06-15 10:15:59: pid 55: LOG: DB node id: 0 backend pid: 195 statement: Execute: BEGIN
2020-06-15 10:15:59: pid 55: LOG: DB node id: 1 backend pid: 163 statement: Execute: BEGIN
2020-06-15 10:15:59: pid 55: LOG: DB node id: 2 backend pid: 164 statement: Execute: BEGIN
2020-06-15 10:15:59: pid 55: LOG: DB node id: 0 backend pid: 195 statement: B message
2020-06-15 10:15:59: pid 55: LOG: DB node id: 0 backend pid: 195 statement: Execute: select trans0_.id as id1_0_, trans0_.message as message2_0_ from trans trans0_
2020-06-15 10:15:59: pid 55: LOG: DB node id: 0 backend pid: 195 statement: B message
2020-06-15 10:15:59: pid 55: LOG: DB node id: 1 backend pid: 163 statement: B message
2020-06-15 10:15:59: pid 55: LOG: DB node id: 2 backend pid: 164 statement: B message
2020-06-15 10:15:59: pid 55: LOG: DB node id: 0 backend pid: 195 statement: Execute: COMMIT
2020-06-15 10:15:59: pid 55: LOG: DB node id: 1 backend pid: 163 statement: Execute: COMMIT
2020-06-15 10:15:59: pid 55: LOG: DB node id: 2 backend pid: 164 statement: Execute: COMMIT
2020-06-15 10:15:59: pid 55: LOG: DB node id: 0 backend pid: 195 statement: Parse: SET SESSION CHARACTERISTICS AS TRANSACTION READ WRITE
2020-06-15 10:15:59: pid 55: LOG: DB node id: 0 backend pid: 195 statement: B message
2020-06-15 10:15:59: pid 55: LOG: DB node id: 0 backend pid: 195 statement: Execute: SET SESSION CHARACTERISTICS AS TRANSACTION READ WRITE

When using c3p0 I did not see that strange extra request, but everything still went to one node after the first few requests.

The statement_load_balance option is set to 'on'.

I've also tried completely disabling the connection pool on the Spring Boot side, but then I face the problem that a new connection is opened on every request, which leads to higher latency, BUT the load balancing then works as expected.

2020-06-15 11:01:21: pid 43: LOG: new connection received
2020-06-15 11:01:21: pid 43: DETAIL: connecting host=172.19.0.6 port=55054
2020-06-15 11:01:21: pid 43: LOG: DB node id: 0 backend pid: 198 statement: Parse: SET extra_float_digits = 3
2020-06-15 11:01:21: pid 43: LOG: DB node id: 1 backend pid: 166 statement: Parse: SET extra_float_digits = 3
2020-06-15 11:01:21: pid 43: LOG: DB node id: 2 backend pid: 166 statement: Parse: SET extra_float_digits = 3
2020-06-15 11:01:21: pid 43: LOG: DB node id: 0 backend pid: 198 statement: B message
2020-06-15 11:01:21: pid 43: LOG: DB node id: 1 backend pid: 166 statement: B message
2020-06-15 11:01:21: pid 43: LOG: DB node id: 2 backend pid: 166 statement: B message
2020-06-15 11:01:21: pid 43: LOG: DB node id: 0 backend pid: 198 statement: Execute: SET extra_float_digits = 3
2020-06-15 11:01:21: pid 43: LOG: DB node id: 1 backend pid: 166 statement: Execute: SET extra_float_digits = 3
2020-06-15 11:01:21: pid 43: LOG: DB node id: 2 backend pid: 166 statement: Execute: SET extra_float_digits = 3
2020-06-15 11:01:21: pid 43: LOG: DB node id: 0 backend pid: 198 statement: Parse: SET application_name = 'PostgreSQL JDBC Driver'
2020-06-15 11:01:21: pid 43: LOG: DB node id: 1 backend pid: 166 statement: Parse: SET application_name = 'PostgreSQL JDBC Driver'
2020-06-15 11:01:21: pid 43: LOG: DB node id: 2 backend pid: 166 statement: Parse: SET application_name = 'PostgreSQL JDBC Driver'
2020-06-15 11:01:21: pid 43: LOG: DB node id: 0 backend pid: 198 statement: B message
2020-06-15 11:01:21: pid 43: LOG: DB node id: 1 backend pid: 166 statement: B message
2020-06-15 11:01:21: pid 43: LOG: DB node id: 2 backend pid: 166 statement: B message
2020-06-15 11:01:21: pid 43: LOG: DB node id: 0 backend pid: 198 statement: Execute: SET application_name = 'PostgreSQL JDBC Driver'
2020-06-15 11:01:21: pid 43: LOG: DB node id: 1 backend pid: 166 statement: Execute: SET application_name = 'PostgreSQL JDBC Driver'
2020-06-15 11:01:21: pid 43: LOG: DB node id: 2 backend pid: 166 statement: Execute: SET application_name = 'PostgreSQL JDBC Driver'
2020-06-15 11:01:21: pid 43: LOG: DB node id: 0 backend pid: 198 statement: Parse: SET SESSION CHARACTERISTICS AS TRANSACTION READ ONLY
2020-06-15 11:01:21: pid 43: LOG: DB node id: 1 backend pid: 166 statement: Parse: SET SESSION CHARACTERISTICS AS TRANSACTION READ ONLY
2020-06-15 11:01:21: pid 43: LOG: DB node id: 2 backend pid: 166 statement: Parse: SET SESSION CHARACTERISTICS AS TRANSACTION READ ONLY
2020-06-15 11:01:21: pid 43: LOG: DB node id: 0 backend pid: 198 statement: B message
2020-06-15 11:01:21: pid 43: LOG: DB node id: 1 backend pid: 166 statement: B message
2020-06-15 11:01:21: pid 43: LOG: DB node id: 2 backend pid: 166 statement: B message
2020-06-15 11:01:21: pid 43: LOG: DB node id: 0 backend pid: 198 statement: Execute: SET SESSION CHARACTERISTICS AS TRANSACTION READ ONLY
2020-06-15 11:01:21: pid 43: LOG: DB node id: 1 backend pid: 166 statement: Execute: SET SESSION CHARACTERISTICS AS TRANSACTION READ ONLY
2020-06-15 11:01:21: pid 43: LOG: DB node id: 2 backend pid: 166 statement: Execute: SET SESSION CHARACTERISTICS AS TRANSACTION READ ONLY
2020-06-15 11:01:21: pid 43: LOG: DB node id: 0 backend pid: 198 statement: Parse: BEGIN
2020-06-15 11:01:21: pid 43: LOG: DB node id: 1 backend pid: 166 statement: Parse: BEGIN
2020-06-15 11:01:21: pid 43: LOG: DB node id: 2 backend pid: 166 statement: Parse: BEGIN
2020-06-15 11:01:21: pid 43: LOG: DB node id: 0 backend pid: 198 statement: B message
2020-06-15 11:01:21: pid 43: LOG: DB node id: 1 backend pid: 166 statement: B message
2020-06-15 11:01:21: pid 43: LOG: DB node id: 2 backend pid: 166 statement: B message
2020-06-15 11:01:21: pid 43: LOG: DB node id: 0 backend pid: 198 statement: Execute: BEGIN
2020-06-15 11:01:21: pid 43: LOG: DB node id: 1 backend pid: 166 statement: Execute: BEGIN
2020-06-15 11:01:21: pid 43: LOG: DB node id: 2 backend pid: 166 statement: Execute: BEGIN
2020-06-15 11:01:21: pid 43: LOG: DB node id: 0 backend pid: 198 statement: SELECT count(*) FROM pg_class AS c, pg_namespace AS n WHERE c.oid = pg_catalog.to_regclass('"trans"') AND c.relnamespace = n.oid AND n.nspname = 'pg_catalog'
2020-06-15 11:01:21: pid 43: LOG: DB node id: 0 backend pid: 198 statement: SELECT count(*) FROM pg_catalog.pg_class AS c, pg_namespace AS n WHERE c.relname = 'trans' AND c.relnamespace = n.oid AND n.nspname ~ '^pg_temp_'
2020-06-15 11:01:21: pid 43: LOG: DB node id: 0 backend pid: 198 statement: SELECT count(*) FROM pg_catalog.pg_class AS c WHERE c.oid = pg_catalog.to_regclass('"trans"') AND c.relpersistence = 'u'
2020-06-15 11:01:21: pid 43: LOG: DB node id: 0 backend pid: 198 statement: Parse: select trans0_.id as id1_0_, trans0_.message as message2_0_ from trans trans0_
2020-06-15 11:01:21: pid 43: LOG: DB node id: 0 backend pid: 198 statement: B message
2020-06-15 11:01:21: pid 43: LOG: DB node id: 0 backend pid: 198 statement: D message
2020-06-15 11:01:21: pid 43: LOG: DB node id: 0 backend pid: 198 statement: Execute: select trans0_.id as id1_0_, trans0_.message as message2_0_ from trans trans0_
2020-06-15 11:01:21: pid 43: LOG: DB node id: 0 backend pid: 198 statement: Parse: COMMIT
2020-06-15 11:01:21: pid 43: LOG: DB node id: 1 backend pid: 166 statement: Parse: COMMIT
2020-06-15 11:01:21: pid 43: LOG: DB node id: 2 backend pid: 166 statement: Parse: COMMIT
2020-06-15 11:01:21: pid 43: LOG: DB node id: 0 backend pid: 198 statement: B message
2020-06-15 11:01:21: pid 43: LOG: DB node id: 1 backend pid: 166 statement: B message
2020-06-15 11:01:21: pid 43: LOG: DB node id: 2 backend pid: 166 statement: B message
2020-06-15 11:01:21: pid 43: LOG: DB node id: 0 backend pid: 198 statement: Execute: COMMIT
2020-06-15 11:01:21: pid 43: LOG: DB node id: 1 backend pid: 166 statement: Execute: COMMIT
2020-06-15 11:01:21: pid 43: LOG: DB node id: 2 backend pid: 166 statement: Execute: COMMIT
2020-06-15 11:01:21: pid 43: LOG: DB node id: 0 backend pid: 198 statement: Parse: SET SESSION CHARACTERISTICS AS TRANSACTION READ WRITE
2020-06-15 11:01:21: pid 43: LOG: DB node id: 0 backend pid: 198 statement: B message
2020-06-15 11:01:21: pid 43: LOG: DB node id: 0 backend pid: 198 statement: Execute: SET SESSION CHARACTERISTICS AS TRANSACTION READ WRITE
2020-06-15 11:01:21: pid 43: LOG: DB node id: 0 backend pid: 198 statement: DISCARD ALL
2020-06-15 11:01:21: pid 43: LOG: DB node id: 1 backend pid: 166 statement: DISCARD ALL
2020-06-15 11:01:21: pid 43: LOG: DB node id: 2 backend pid: 166 statement: DISCARD ALL

I'm really not sure where the problem with the application-side connection pools lies. I feel like I'm missing something really obvious, but I don't know what. Maybe this is expected behavior after all, I don't know.

I have attached the pgpool.conf file i'm using.

I would really appreciate any help. Thanks.
Tags: No tags attached.

Activities

greenenvy

2020-06-15 20:06

reporter  

pgpool.conf (44,106 bytes)

greenenvy

2020-06-17 08:13

reporter   ~0003397

I went through the PostgreSQL JDBC driver documentation and also ran into a Stack Overflow post (https://stackoverflow.com/questions/40639322/issue-with-pgpool-3-5-4-and-jdbc) stating that Pgpool has some trouble with the extended query mode, which is the driver's default. That was stated for version 3.5.4, but I gave it a try even though I'm using version 4.1.2, and when I tested with simple query mode, load balancing worked well.

So can someone explain why this could be a problem in general? What could the issue with the extended query mode be?

t-ishii

2020-06-17 09:20

developer   ~0003398

Pgpool-II keeps using the same load balance node (the PostgreSQL backend that SELECT queries are sent to) as long as the client stays connected to Pgpool-II. Since client-side connection pooling keeps the connection to Pgpool-II open indefinitely, Pgpool-II won't change the load balance node. Fortunately, since Pgpool-II 4.1, "statement_level_load_balance" is supported.

https://www.pgpool.net/docs/latest/en/html/runtime-config-load-balancing.html

By enabling this, Pgpool-II will choose a different load balance node for each SELECT in the same session, rather than keeping the same load balance node until the session ends.
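The distinction can be sketched with a toy model. This is a hypothetical illustration, not Pgpool-II's actual implementation; round-robin stands in for pgpool's weighted random node selection:

```python
import itertools

NODES = ["pg_0", "pg_1", "pg_2"]

def session_level(num_selects: int, session_node: str) -> list[str]:
    """Node chosen once at session start; every SELECT reuses it."""
    return [session_node] * num_selects

def statement_level(num_selects: int) -> list[str]:
    """A node chosen per SELECT (round-robin stand-in for pgpool's
    weighted random pick)."""
    rr = itertools.cycle(NODES)
    return [next(rr) for _ in range(num_selects)]

# With a client-side pool (e.g. HikariCP) the session never ends, so
# session-level balancing pins all SELECTs to whichever node was picked:
print(session_level(5, "pg_2"))   # ['pg_2', 'pg_2', 'pg_2', 'pg_2', 'pg_2']
# Statement-level balancing spreads them across nodes:
print(statement_level(5))         # ['pg_0', 'pg_1', 'pg_2', 'pg_0', 'pg_1']
```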

greenenvy

2020-06-17 09:34

reporter   ~0003399

Thank you for the response!

I'm not sure I follow. I stated in the description that I switched the statement_level_load_balance option to on (I wrote it down wrong in the description, but in the config file it's exactly as in the docs you linked). So I would expect it to work as you described, yet I only got a few initial balanced SELECTs, and then they were all sent to one node. The behavior works as expected when I switch the PostgreSQL JDBC driver property 'preferQueryMode' to 'simple', but with the default 'extended' query mode it hits one node only. https://www.postgresql.org/docs/10/protocol-flow.html#id-1.10.5.7.4 (52.2.2 Simple Query, 52.2.3 Extended Query)
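For anyone hitting the same thing, the workaround described above boils down to one driver property. With Spring Boot's standard datasource configuration it would look roughly like this; the host, port, and database name in the URL are illustrative, not taken from this report:

```properties
# application.properties (illustrative URL)
spring.datasource.url=jdbc:postgresql://pgpool:5432/sample?preferQueryMode=simple
```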

t-ishii

2020-06-17 09:44

developer   ~0003400

Oh, OK. I missed that. If the replication delay is not too large (in your case 10000000 bytes), setting statement_level_load_balance should work. Let me check.

greenenvy

2020-06-18 06:35

reporter   ~0003401

I've made a docker-compose file with which you should be able to reproduce the problem (or maybe not) that I'm observing.

Docker version: 19.03.8
Docker-Compose version: 1.25.4

If you run the compose file, the query mode is set through the environment variable PREFER_QUERY_MODE (line 78) to extended, which is also the default value of that property in the connection string. You need to wait about 15-16 seconds for everything to spin up, because I added some sleep time so everything starts up without problems.

Add some simple data a few times with: curl -X POST http://localhost:8080/TestWrite
Then spam consecutive read calls that return the whole table: curl -X GET http://localhost:8080/TestRead

If you spam it about 30 times, around 25 SELECTs will go to only one node; only the first few will be load balanced. You can track that through the logs.

If you stop the whole thing, remove the containers, change PREFER_QUERY_MODE in the compose file to simple (without quotes), and try the steps above again, everything works fine.

If you want to attach to the pgpool container: docker exec -it pgpool /bin/sh
Then, if needed, you can execute: psql -h localhost -p 5432 -U sample
(password: secretpass)
and inspect everything you need.

docker-compose.yml (1,492 bytes)

t-ishii

2020-06-18 15:22

developer   ~0003402

Thank you for the info. I also succeeded in reproducing the problem. It seems to happen when a SELECT is executed in an explicit transaction using the extended query protocol. One of the developers, Peng Bo, is kindly looking into this.

t-ishii

2020-06-21 07:08

developer   ~0003403

I was wrong. Peng Bo confirmed that statement_level_load_balance works as expected in a test using pgbench and pgpool_setup.
I also confirmed that (probably I made a mistake earlier).
cat s.sql
BEGIN;
\set aid random(1, 100000)
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
END;

t-ishii$ pgbench -n -p 11000 -f s.sql -M extended -c 1 -t 1000 test
transaction type: s.sql
scaling factor: 1
query mode: extended
number of clients: 1
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 1000/1000
latency average = 0.636 ms
tps = 1571.189742 (including connections establishing)
tps = 1593.497416 (excluding connections establishing)
t-ishii$ psql -p 11000 -c "show pool_nodes" test
 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+-------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0 | /tmp | 11002 | up | 0.333333 | primary | 671 | false | 0 | | | 2020-06-21 06:59:46
 1 | /tmp | 11003 | up | 0.333333 | standby | 658 | false | 0 | streaming | async | 2020-06-21 06:59:46
 2 | /tmp | 11004 | up | 0.333333 | standby | 671 | true | 0 | streaming | async | 2020-06-21 06:59:46
(3 rows)

So there must be an additional "factor" in your case. Let me study more...

greenenvy

2020-06-21 18:41

reporter   ~0003406

Load balancing works totally fine, as it should, in simple query mode, and it worked at the start when I tested through psql on the Pgpool node. I don't know what's up with the extended mode and the JDBC driver.

The table is as simple as it gets: an auto-generated id and a string column where a UUID is written, and that's it. Since it replicates to the standbys, it is neither an unlogged table nor a temp table.

The difference between your case and mine is that I used synchronous replication with remote_apply, but I think that should not matter. Hibernate uses read_committed as its default isolation level; I think that also should not matter, but it may be worth mentioning as additional information. You can run the compose file I posted above and should be able to reproduce it.

If you did manage to reproduce it, then something must be going on.

How can I help further with this?

Also, a question not related to this: I did not understand why an empty BEGIN/COMMIT is executed on the standbys when I'm writing to the primary?

Issue History

Date Modified Username Field Change
2020-06-15 20:06 greenenvy New Issue
2020-06-15 20:06 greenenvy File Added: pgpool.conf
2020-06-17 08:13 greenenvy Note Added: 0003397
2020-06-17 09:06 t-ishii Assigned To => t-ishii
2020-06-17 09:06 t-ishii Status new => assigned
2020-06-17 09:20 t-ishii Note Added: 0003398
2020-06-17 09:21 t-ishii Status assigned => feedback
2020-06-17 09:21 t-ishii Description Updated View Revisions
2020-06-17 09:34 greenenvy Note Added: 0003399
2020-06-17 09:34 greenenvy Status feedback => assigned
2020-06-17 09:44 t-ishii Note Added: 0003400
2020-06-18 06:35 greenenvy File Added: docker-compose.yml
2020-06-18 06:35 greenenvy Note Added: 0003401
2020-06-18 15:22 t-ishii Note Added: 0003402
2020-06-21 07:08 t-ishii Note Added: 0003403
2020-06-21 18:41 greenenvy Note Added: 0003406