View Issue Details

IDProjectCategoryView StatusLast Update
0000435Pgpool-II[All Projects] Generalpublic2018-10-24 19:39
Reporterjgj1018Assigned Tot-ishii 
PriorityurgentSeverityblockReproducibilityalways
Status resolvedResolutionopen 
PlatformvmOSCentOSOS Versionrelease 7.5.1804
Product Version3.7.5 
Target Version3.7.6Fixed in Version4.0.0 
Summary0000435: Unexpected movement on load balancing (master slave mode)
DescriptionFirst, I'm not for sure that It's a bug of pgpool,
but just I want you to give us any advice to investigate the root cause of this situation.
because It seems related with master slave mode and load balancing function of pgpool.
We are currently using pgpool(3.0.3) & PostgreSQL (9.0.18, 1 master & 1 slave)
and faced a unexpected trouble on upgrading the version of pgpool(to 3.7.5) and postgreSQL(to 9.6.10, 1 master & 1 slave).

I attached the log and current setting of pgpool.conf.

It looks similar with a old (but seems fixed) bug.
http://lists.pgfoundry.org/pipermail/pgpool-hackers/2010-May/000311.html

And This doesn't happen in Local test env (pgpool(to 3.7.5) and postgreSQL(9.6.9, 1 master & 1 slave). It's very mysterious.
Could you give me any advice on this situation?
TagsNo tags attached.

Activities

jgj1018

2018-10-12 11:40

reporter  

pgpool.log.20181012.txt (278,996 bytes)
pgpool.conf (35,731 bytes)

jgj1018

2018-10-12 11:46

reporter   ~0002195

FYI, It also worked fine with the combination of our old pgpool(3.0.3) and new postgreSQL(9.6.9).
That's why we thought it maybe is a bug of pgpool.
Let me add our old pgpool.conf(3.0.3)

pgpool_old.conf (10,677 bytes)

jgj1018

2018-10-12 15:39

reporter   ~0002198

Let me add further information.
It seems a query prepared in Master but Select is sent to Slave. but the select have to be sent to Master.
if both are sent to Master, it's working. but not working in case that select is sent to Slave.
The thing is that same code is totally fine on pgpool(3.0.3) & PostgreSQL (9.0.18, 1 master & 1 slave)

t-ishii

2018-10-17 10:13

developer   ~0002204

After some investigations, I realized that it is likely a bug of 3.7.

At line 2404 of pgpool.log.20181012.txt, a SELECT was executed using prepared statement "pdo_pgsql_stmt_02398690" on DB node 0.
At line 2461 deallocate of "pdo_pgsql_stmt_02398690" was requested by frontend and it completed on node 0.
However for some reason I don't know, Pgpool-II did not remove the info on "pdo_pgsql_stmt_02398690". So Pgpool-II thought any subsequent query using the statement was sent to node 0.
Then next parse request came and it was sent to node 1 because the SELECT can be load balanced (line 2634)
Corresponding bind request came (line 2841). Pgpool-II decided to send the bind message to node 0 since the info regarding "pdo_pgsql_stmt_02398690" in Pgpool-II still pointed to node 0
Later on, deallocate statement came (line 9332). It was also sent to node 0, and the error raised because there's no "pdo_pgsql_stmt_02398690" on node 0.

I will investigate the problem.

jgj1018

2018-10-17 14:07

reporter   ~0002207

First, Thank you for the investigation and sorry for the mistake in the title.
if you can, Could you change the title of this ticket to "unexpected movement on load balancing (master slave mode)" ?

And Let me share further information.
with the setting as follows, sometimes it works, and sometimes it fails. I'll attached the detailed debug log. I hope this information can be helpful.

replication_mode = off
replicate_select = off
load_balance_mode = on
ignore_leading_white_space = on
white_function_list = ''
black_function_list = 'nextval,setval'
database_redirect_preference_list = ''
app_name_redirect_preference_list = ''
allow_sql_comments = off

pgpool.log.20181017.success (216,782 bytes)
pgpool.log.20181017.fail (204,096 bytes)

jgj1018

2018-10-17 14:36

reporter   ~0002208

FYI,
The result of "show pool_nodes";
 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+----------------------------------+------+--------+-----------+---------+------------+-------------------+-------------------
 0 | master | 5432 | up | 0.300000 | primary | 190 | false | 0
 1 | slave | 5432 | up | 0.700000 | standby | 85 | true | 0
(2 rows)

t-ishii

2018-10-17 17:03

developer   ~0002209

I have changed the title as you requested.

t-ishii

2018-10-17 17:51

developer   ~0002210

Last edited: 2018-10-17 18:04

View 2 revisions

It appears that this is a known bug with 3.7.5 and was recently fixed:
https://git.postgresql.org/gitweb/?p=pgpool2.git;a=blobdiff;f=src/protocol/CommandComplete.c;h=a16a05424c377a5d3a06ef3b4e866dfbff7c8d0f;hp=7322d86f738bdce2b9984df4ec5556eefb58f8ef;hb=e22933a1a514fcecf3874f79ff7444e5809a1533;hpb=209ef0ee7b74be40a11a69202cc833627ab911a5

Please try the 1 line diff or 3.7 stable head.

Also we are planning to make minor releases around October 30. We will release 3.7.6 which includes the fix.

jgj1018

2018-10-18 10:28

reporter   ~0002211

Thank you for the investigation!
It becomes more clear. but the thing is that we installed pgpool with "yum" command.
I tried to adapt the diff on binary level, but failed (There were too many diff with just one line change.)
And I also tried to use "yum update" with the change "/etc/yum.repos.d/pgpool-II-release-37.repo" if possible, but seems the srpm doesn't exist.
Do you have any ideas to update the one line in this situation?

t-ishii

2018-10-18 11:19

developer   ~0002212

You can get SRPMS from here:
http://www.pgpool.net/yum/srpms/3.7/redhat/rhel-7-x86_64/

pengbo

2018-10-18 11:53

developer   ~0002213

To install SRPMS please use "rpm -ivh ..." instead of "yum".

For example:

  rpm -ivh http://www.pgpool.net/yum/srpms/3.7/redhat/rhel-7-x86_64/pgpool-II-pg10-3.7.5-1pgdg.rhel7.src.rpm

jgj1018

2018-10-18 17:32

reporter   ~0002214

Thank you for the information. and Thank you for the helpful example.

Due to some restriction in the company, I first downgrade the current one(3.7.5) to 3.6.12 and tried but didn't work again.
The reason why I downgrade is that I found out that diff line(latest version) looks working the same as "3.6.12".

3.7.6 : if (session_context->query_context != NULL && (!SL_MODE || (SL_MODE && !pool_is_doing_extended_query_message())))
3.6.12 : if (session_context->query_context != NULL && (!STREAM || (SL_MODE && !pool_is_doing_extended_query_message())))

But it didn't work again. Exactly like the situation above, randomly sometimes it worked, and sometimes it failed.
The setting I used for 3.6.12 is basically the same as follows.

replication_mode = off
replicate_select = off
load_balance_mode = on
ignore_leading_white_space = on
white_function_list = ''
black_function_list = 'nextval,setval'
database_redirect_preference_list = ''
app_name_redirect_preference_list = ''
allow_sql_comments = off

Do you think I still need to give it a try with the "3.7 stable head" or updating one line diff?
If so, This is what I will try. Could you check if the process is okay?

① rpm -ivh http://www.pgpool.net/yum/srpms/3.7/redhat/rhel-7-x86_64/pgpool-II-pg96-3.7.5-1pgdg.rhel7.src.rpm
② cp rpmbuild/SOURCES/pgpool-II-3.7.5.tar.gz ~/pgpool-II-3.7.5.tar.gz
③ tar vxf ~/pgpool-II-3.7.5.tar.gz
④ vim ~/pgpool-II-3.7.5/src/protocol/CommandComplete.c
⑤ (update one line)
⑥ tar cvzf ~/pgpool-II-3.7.5.tar.gz ~/pgpool-II-3.7.5/
⑦ cp ~/pgpool-II-3.7.5.tar.gz rpmbuild/SOURCES/pgpool-II-3.7.5.tar.gz
⑧ yum install pam-devel openssl-devel libmemcached-devel gcc jade libxslt docbook-dtds docbook-style-xsl docbook-style-dsssl docbook docbook-dsssl docbook-sgml docbook-xsl openjade
⑨ rpmbuild -ba /root/rpmbuild/SPECS/pgpool.spec --define="pgpool_version 3.7.5" --define="pg_version 96" --define="pghome /usr/pgsql-9.6" --define="dist .rhel7"
⑩ (check the results inside /root/rpmbuild/RPMS/x86_64 and start install)

I'm deeply sorry for bothering you. Thank you again.

jgj1018

2018-10-18 20:09

reporter   ~0002216

Let me add one more information. It might be helpful for you to understand our architecture.
The old pgpool(3.0.3) is installed in the OS server(CentOS 6.x) where application source exists. (we use cakephp3.x as our framework)
and new pgpool(3.7.5) is installed in a different OS server(CentOS 7.x) which means separated from application servers.

jgj1018

2018-10-24 11:26

reporter   ~0002226

FYI, This problem is resolved by upgrading to 4.0.0.

t-ishii

2018-10-24 12:40

developer   ~0002227

Great! Can we mark this issue as "resolved"?

jgj1018

2018-10-24 19:34

reporter   ~0002229

yes, you can. Thank you for helping me again.

Issue History

Date Modified Username Field Change
2018-10-12 11:40 jgj1018 New Issue
2018-10-12 11:40 jgj1018 File Added: pgpool.conf
2018-10-12 11:40 jgj1018 File Added: pgpool.log.20181012.txt
2018-10-12 11:46 jgj1018 File Added: pgpool_old.conf
2018-10-12 11:46 jgj1018 Note Added: 0002195
2018-10-12 15:39 jgj1018 Note Added: 0002198
2018-10-17 09:45 t-ishii Assigned To => t-ishii
2018-10-17 09:45 t-ishii Status new => assigned
2018-10-17 09:45 t-ishii Description Updated View Revisions
2018-10-17 10:13 t-ishii Note Added: 0002204
2018-10-17 14:07 jgj1018 File Added: pgpool.log.20181017.fail
2018-10-17 14:07 jgj1018 File Added: pgpool.log.20181017.success
2018-10-17 14:07 jgj1018 Note Added: 0002207
2018-10-17 14:36 jgj1018 Note Added: 0002208
2018-10-17 17:02 t-ishii Summary Unexpected => Unexpected movement on load balancing (master slave mode)
2018-10-17 17:03 t-ishii Note Added: 0002209
2018-10-17 17:51 t-ishii Note Added: 0002210
2018-10-17 17:52 t-ishii Status assigned => feedback
2018-10-17 17:52 t-ishii Target Version => 3.7.6
2018-10-17 18:04 t-ishii Note Edited: 0002210 View Revisions
2018-10-18 10:28 jgj1018 Note Added: 0002211
2018-10-18 10:28 jgj1018 Status feedback => assigned
2018-10-18 11:19 t-ishii Note Added: 0002212
2018-10-18 11:25 t-ishii Status assigned => feedback
2018-10-18 11:53 pengbo Note Added: 0002213
2018-10-18 17:32 jgj1018 Note Added: 0002214
2018-10-18 17:32 jgj1018 Status feedback => assigned
2018-10-18 20:09 jgj1018 Note Added: 0002216
2018-10-24 11:26 jgj1018 Note Added: 0002226
2018-10-24 12:40 t-ishii Note Added: 0002227
2018-10-24 19:34 jgj1018 Note Added: 0002229
2018-10-24 19:39 t-ishii Status assigned => resolved
2018-10-24 19:39 t-ishii Fixed in Version => 4.0.0