View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000435 | Pgpool-II | General | public | 2018-10-12 11:40 | 2018-10-24 19:39 |
| Reporter | jgj1018 | Assigned To | t-ishii | ||
| Priority | urgent | Severity | block | Reproducibility | always |
| Status | resolved | Resolution | open | ||
| Platform | vm | OS | CentOS | OS Version | release 7.5.1804 |
| Product Version | 3.7.5 | ||||
| Target Version | 3.7.6 | Fixed in Version | 4.0.0 | ||
| Summary | 0000435: Unexpected movement on load balancing (master slave mode) | ||||
| Description | First, I'm not for sure that It's a bug of pgpool, but just I want you to give us any advice to investigate the root cause of this situation. because It seems related with master slave mode and load balancing function of pgpool. We are currently using pgpool(3.0.3) & PostgreSQL (9.0.18, 1 master & 1 slave) and faced a unexpected trouble on upgrading the version of pgpool(to 3.7.5) and postgreSQL(to 9.6.10, 1 master & 1 slave). I attached the log and current setting of pgpool.conf. It looks similar with a old (but seems fixed) bug. http://lists.pgfoundry.org/pipermail/pgpool-hackers/2010-May/000311.html And This doesn't happen in Local test env (pgpool(to 3.7.5) and postgreSQL(9.6.9, 1 master & 1 slave). It's very mysterious. Could you give me any advice on this situation? | ||||
| Tags | No tags attached. | ||||
|
|
|
|
|
FYI, It also worked fine with the combination of our old pgpool(3.0.3) and new postgreSQL(9.6.9). That's why we thought it maybe is a bug of pgpool. Let me add our old pgpool.conf(3.0.3) |
|
|
Let me add further information. It seems a query prepared in Master but Select is sent to Slave. but the select have to be sent to Master. if both are sent to Master, it's working. but not working in case that select is sent to Slave. The thing is that same code is totally fine on pgpool(3.0.3) & PostgreSQL (9.0.18, 1 master & 1 slave) |
|
|
After some investigations, I realized that it is likely a bug of 3.7. At line 2404 of pgpool.log.20181012.txt, a SELECT was executed using prepared statement "pdo_pgsql_stmt_02398690" on DB node 0. At line 2461 deallocate of "pdo_pgsql_stmt_02398690" was requested by frontend and it completed on node 0. However for some reason I don't know, Pgpool-II did not remove the info on "pdo_pgsql_stmt_02398690". So Pgpool-II thought any subsequent query using the statement was sent to node 0. Then next parse request came and it was sent to node 1 because the SELECT can be load balanced (line 2634) Corresponding bind request came (line 2841). Pgpool-II decided to send the bind message to node 0 since the info regarding "pdo_pgsql_stmt_02398690" in Pgpool-II still pointed to node 0 Later on, deallocate statement came (line 9332). It was also sent to node 0, and the error raised because there's no "pdo_pgsql_stmt_02398690" on node 0. I will investigate the problem. |
|
|
First, Thank you for the investigation and sorry for the mistake in the title. if you can, Could you change the title of this ticket to "unexpected movement on load balancing (master slave mode)" ? And Let me share further information. with the setting as follows, sometimes it works, and sometimes it fails. I'll attached the detailed debug log. I hope this information can be helpful. replication_mode = off replicate_select = off load_balance_mode = on ignore_leading_white_space = on white_function_list = '' black_function_list = 'nextval,setval' database_redirect_preference_list = '' app_name_redirect_preference_list = '' allow_sql_comments = off |
|
|
FYI, The result of "show pool_nodes"; node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay ---------+----------------------------------+------+--------+-----------+---------+------------+-------------------+------------------- 0 | master | 5432 | up | 0.300000 | primary | 190 | false | 0 1 | slave | 5432 | up | 0.700000 | standby | 85 | true | 0 (2 rows) |
|
|
I have changed the title as you requested. |
|
|
It appears that this is a known bug with 3.7.5 and was recently fixed: https://git.postgresql.org/gitweb/?p=pgpool2.git;a=blobdiff;f=src/protocol/CommandComplete.c;h=a16a05424c377a5d3a06ef3b4e866dfbff7c8d0f;hp=7322d86f738bdce2b9984df4ec5556eefb58f8ef;hb=e22933a1a514fcecf3874f79ff7444e5809a1533;hpb=209ef0ee7b74be40a11a69202cc833627ab911a5 Please try the 1 line diff or 3.7 stable head. Also we are planning to make minor releases around October 30. We will release 3.7.6 which includes the fix. |
|
|
Thank you for the investigation! It becomes more clear. but the thing is that we installed pgpool with "yum" command. I tried to adapt the diff on binary level, but failed (There were too many diff with just one line change.) And I also tried to use "yum update" with the change "/etc/yum.repos.d/pgpool-II-release-37.repo" if possible, but seems the srpm doesn't exist. Do you have any ideas to update the one line in this situation? |
|
|
You can get SRPMS from here: http://www.pgpool.net/yum/srpms/3.7/redhat/rhel-7-x86_64/ |
|
|
To install SRPMS please use "rpm -ivh ..." instead of "yum". For example: rpm -ivh http://www.pgpool.net/yum/srpms/3.7/redhat/rhel-7-x86_64/pgpool-II-pg10-3.7.5-1pgdg.rhel7.src.rpm |
|
|
Thank you for the information. and Thank you for the helpful example. Due to some restriction in the company, I first downgrade the current one(3.7.5) to 3.6.12 and tried but didn't work again. The reason why I downgrade is that I found out that diff line(latest version) looks working the same as "3.6.12". 3.7.6 : if (session_context->query_context != NULL && (!SL_MODE || (SL_MODE && !pool_is_doing_extended_query_message()))) 3.6.12 : if (session_context->query_context != NULL && (!STREAM || (SL_MODE && !pool_is_doing_extended_query_message()))) But it didn't work again. Exactly like the situation above, randomly sometimes it worked, and sometimes it failed. The setting I used for 3.6.12 is basically the same as follows. replication_mode = off replicate_select = off load_balance_mode = on ignore_leading_white_space = on white_function_list = '' black_function_list = 'nextval,setval' database_redirect_preference_list = '' app_name_redirect_preference_list = '' allow_sql_comments = off Do you think I still need to give it a try with the "3.7 stable head" or updating one line diff? If so, This is what I will try. Could you check if the process is okay? ① rpm -ivh http://www.pgpool.net/yum/srpms/3.7/redhat/rhel-7-x86_64/pgpool-II-pg96-3.7.5-1pgdg.rhel7.src.rpm ② cp rpmbuild/SOURCES/pgpool-II-3.7.5.tar.gz ~/pgpool-II-3.7.5.tar.gz ③ tar vxf ~/pgpool-II-3.7.5.tar.gz ④ vim ~/pgpool-II-3.7.5/src/protocol/CommandComplete.c ⑤ (update one line) ⑥ tar cvzf ~/pgpool-II-3.7.5.tar.gz ~/pgpool-II-3.7.5/ ⑦ cp ~/pgpool-II-3.7.5.tar.gz rpmbuild/SOURCES/pgpool-II-3.7.5.tar.gz ⑧ yum install pam-devel openssl-devel libmemcached-devel gcc jade libxslt docbook-dtds docbook-style-xsl docbook-style-dsssl docbook docbook-dsssl docbook-sgml docbook-xsl openjade ⑨ rpmbuild -ba /root/rpmbuild/SPECS/pgpool.spec --define="pgpool_version 3.7.5" --define="pg_version 96" --define="pghome /usr/pgsql-9.6" --define="dist .rhel7" ⑩ (check the results inside /root/rpmbuild/RPMS/x86_64 and start install) I'm deeply sorry for bothering you. Thank you again. |
|
|
Let me add one more information. It might be helpful for you to understand our architecture. The old pgpool(3.0.3) is installed in the OS server(CentOS 6.x) where application source exists. (we use cakephp3.x as our framework) and new pgpool(3.7.5) is installed in a different OS server(CentOS 7.x) which means separated from application servers. |
|
|
FYI, This problem is resolved by upgrading to 4.0.0. |
|
|
Great! Can we mark this issue as "resolved"? |
|
|
yes, you can. Thank you for helping me again. |
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2018-10-12 11:40 | jgj1018 | New Issue | |
| 2018-10-12 11:40 | jgj1018 | File Added: pgpool.conf | |
| 2018-10-12 11:40 | jgj1018 | File Added: pgpool.log.20181012.txt | |
| 2018-10-12 11:46 | jgj1018 | File Added: pgpool_old.conf | |
| 2018-10-12 11:46 | jgj1018 | Note Added: 0002195 | |
| 2018-10-12 15:39 | jgj1018 | Note Added: 0002198 | |
| 2018-10-17 09:45 | t-ishii | Assigned To | => t-ishii |
| 2018-10-17 09:45 | t-ishii | Status | new => assigned |
| 2018-10-17 09:45 | t-ishii | Description Updated | |
| 2018-10-17 10:13 | t-ishii | Note Added: 0002204 | |
| 2018-10-17 14:07 | jgj1018 | File Added: pgpool.log.20181017.fail | |
| 2018-10-17 14:07 | jgj1018 | File Added: pgpool.log.20181017.success | |
| 2018-10-17 14:07 | jgj1018 | Note Added: 0002207 | |
| 2018-10-17 14:36 | jgj1018 | Note Added: 0002208 | |
| 2018-10-17 17:02 | t-ishii | Summary | Unexpected => Unexpected movement on load balancing (master slave mode) |
| 2018-10-17 17:03 | t-ishii | Note Added: 0002209 | |
| 2018-10-17 17:51 | t-ishii | Note Added: 0002210 | |
| 2018-10-17 17:52 | t-ishii | Status | assigned => feedback |
| 2018-10-17 17:52 | t-ishii | Target Version | => 3.7.6 |
| 2018-10-17 18:04 | t-ishii | Note Edited: 0002210 | |
| 2018-10-18 10:28 | jgj1018 | Note Added: 0002211 | |
| 2018-10-18 10:28 | jgj1018 | Status | feedback => assigned |
| 2018-10-18 11:19 | t-ishii | Note Added: 0002212 | |
| 2018-10-18 11:25 | t-ishii | Status | assigned => feedback |
| 2018-10-18 11:53 | pengbo | Note Added: 0002213 | |
| 2018-10-18 17:32 | jgj1018 | Note Added: 0002214 | |
| 2018-10-18 17:32 | jgj1018 | Status | feedback => assigned |
| 2018-10-18 20:09 | jgj1018 | Note Added: 0002216 | |
| 2018-10-24 11:26 | jgj1018 | Note Added: 0002226 | |
| 2018-10-24 12:40 | t-ishii | Note Added: 0002227 | |
| 2018-10-24 19:34 | jgj1018 | Note Added: 0002229 | |
| 2018-10-24 19:39 | t-ishii | Status | assigned => resolved |
| 2018-10-24 19:39 | t-ishii | Fixed in Version | => 4.0.0 |