View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000353 | Pgpool-II | Bug | public | 2017-10-17 00:39 | 2017-10-27 15:25 |
| Reporter | eldad | Assigned To | t-ishii | ||
| Priority | urgent | Severity | block | Reproducibility | always |
| Status | resolved | Resolution | open | ||
| Product Version | 3.6.6 | ||||
| Target Version | 3.6.7 | ||||
| Summary | 0000353: Once the slave is attached, the master becomes very slow and eventually hangs | ||||
| Description | We upgraded our QA environment from PostgreSQL 9.3.3 with pgpool 3.4.7 to PostgreSQL 9.6.5 with pgpool 3.6.6. Pgpool was installed from RPM (pgpool-II-pg96-3.6.6-1pgdg.rhel6.x86_64). We have a master-slave setup with streaming replication, and load balance mode is off. Once we attach the slave to pgpool, the master server becomes very slow: sessions stay in the active state much longer than usual, and after a few minutes all the pools (400) are used up and we can't create new sessions anymore. On the slave I see many sessions created even though load balance mode is off. Detaching the slave in this situation does not solve the problem; only when I shut down the database on the slave (which closes all the sessions on it) does the problem go away, within 2 seconds, and the master becomes idle again. This problem prevents us from using pgpool for HA, as we can't work with the slave attached. I'm attaching the conf file. | ||||
| Steps To Reproduce | Master-slave streaming replication with PostgreSQL 9.6.5 and pgpool 3.6.6. The application generates 0000005:0000020 TPS. Start pgpool while the master is up, start the slave and let it sync, then attach the slave. | ||||
| Additional Information | VM with RHEL 6.3 | ||||
| Tags | streaming replication | ||||
This sounds like a network or DNS issue. To confirm it, you could attach strace to a Pgpool-II child process. For instance:

1. Set num_init_children to 1.
2. Start Pgpool-II.
3. Find the Pgpool-II child process ID using ps (something like `ps aux | grep pgpool | grep "pgpool: wait for"`).
4. Attach strace: `strace -tt -p the_pid` or `strace -T -p the_pid`.
5. Find the system call that takes a long time.
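
With `strace -T`, the time spent inside each system call is appended to the line as `<seconds>`, and `-tt` prefixes a wall-clock timestamp (the two flags can be combined). Spotting the slow call by eye in a long trace is tedious, so here is a minimal Python sketch, not part of Pgpool-II, that assumes the usual single-process `strace -tt -T` line layout and lists the slowest calls. The name `slowest_syscalls` is hypothetical. Output from `-tt` alone carries no durations, so such lines are simply skipped.

```python
import re

# One line of `strace -tt -T` output looks like:
#   05:03:03.000200 select(10, [9], NULL, NULL, NULL) = 1 (in [9]) <12.503311>
# -tt adds the leading wall-clock timestamp (optional in this pattern) and -T
# appends the time spent inside the call as <seconds>. Lines without a
# <seconds> suffix, such as signal notices, do not match and are skipped.
STRACE_LINE = re.compile(
    r"^(?:(?P<ts>\d{2}:\d{2}:\d{2}\.\d+)\s+)?(?P<call>\w+)\(.*<(?P<dur>\d+\.\d+)>$"
)


def slowest_syscalls(lines, top=5):
    """Return up to `top` (seconds, syscall, timestamp) tuples, slowest first."""
    found = []
    for line in lines:
        m = STRACE_LINE.match(line.strip())
        if m:
            # The greedy `.*<` stops at the last '<', which is the -T duration.
            found.append((float(m.group("dur")), m.group("call"), m.group("ts") or ""))
    found.sort(key=lambda item: item[0], reverse=True)
    return found[:top]
```

For example, `slowest_syscalls(open("strace.log"))` returns the five longest calls; a blocking `select` or `recvfrom` at the top of the list points at the socket the child was waiting on.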
|
Hi Ishii,

Thank you for your response.

1. These are the same servers I used with PostgreSQL 9.3.3 and pgpool 3.4.7; they are on the same LAN and the server names are in /etc/hosts. The performance changed after the upgrade.
2. Can you briefly explain why pgpool duplicates the sessions on the slave even though load balance mode is off? Is there a way to disable this?

Anyway, I will check the network as you suggested.

Thanks,
Eldad
|
Hi,

Our network team checked the DNS and the network and reported that everything looks fine. I installed another cluster on new machines with the same PostgreSQL and pgpool versions; it didn't help, and we encounter the same performance problems.

Regards,
Eldad
|
BTW, pgpool is running with OPTS=" -n" because of the logging bug.
|
Without the strace log I requested, it's very hard to find the problem. At the very least I need a debug log.
|
I'm attaching the strace output and the pgpool log at debug5.
|
I have looked into the log and found that Pgpool-II received an 'H' (Flush) message from the frontend:

`2017-10-24 05:03:03: pid 20560: stormr_pcoe: storm_admin_exec: DETAIL: received kind 'H'(48) from frontend`

Then it got stuck here:

`2017-10-24 05:03:03: pid 20560: stormr_pcoe: storm_admin_exec: DETAIL: backend:0 of 2 kind = '1'`

This is very similar to https://www.pgpool.net/mantisbt/view.php?id=345. A fix was attached to that bug report (bug345.diff). Alternatively, you could try the git repository, since the fix is already in it: https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=dc858b93055f22a9055acfa98c203325c1c1621d
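
Locating the point where a child "got stuck" in a large debug5 log amounts to finding a long silence from one pid. The following is a minimal Python sketch, not a Pgpool-II tool, assuming the `YYYY-MM-DD HH:MM:SS: pid N: ...` prefix seen in the log lines quoted above (one-second resolution). The names `find_stalls` and `min_gap_s` are hypothetical.

```python
import re
from datetime import datetime

# Assumed prefix, as in the quoted lines: "2017-10-24 05:03:03: pid 20560: ..."
LOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): pid (\d+): (.*)$")


def find_stalls(lines, min_gap_s=5.0):
    """Return (gap_seconds, pid, last_message_before_gap) tuples, longest gap first."""
    last_seen = {}  # pid -> (timestamp, message) of that pid's previous line
    stalls = []
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if not m:
            continue  # skip lines that do not carry the assumed prefix
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        pid = int(m.group(2))
        if pid in last_seen:
            prev_ts, prev_msg = last_seen[pid]
            gap = (ts - prev_ts).total_seconds()
            if gap >= min_gap_s:
                # The line *before* the gap is what the child did last before stalling.
                stalls.append((gap, pid, prev_msg))
        last_seen[pid] = (ts, m.group(3))
    return sorted(stalls, reverse=True)
```

A child that is still stuck when the log ends has no following line, so it produces no gap here; in that case compare its last line's timestamp with the end of the log instead.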
|
Hi,

I did another check: I removed pgpool 3.6.6 and installed pgpool-II-pg96-3.4.13 (same PostgreSQL 9.6.5 database). The problem doesn't seem to exist there, and everything works as expected.

Regards,
Eldad
|
Yes, 3.4 does not have the problem; only 3.5 and later are affected. However, 3.4 and earlier are very slow when used with extended queries (prepared statements).
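
The message kinds in the log quoted earlier come from the extended query flow of the PostgreSQL v3 frontend/backend protocol, which is what prepared statements typically use: the client sends Parse ('P'), Bind ('B'), Execute ('E') and then Sync ('S') or Flush ('H'), and the backend replies with ParseComplete ('1'), BindComplete ('2') and so on. The "(48)" in `kind 'H'(48)` is the hexadecimal code of 'H' (72 in decimal). The Python sketch below builds and decodes message headers to make this concrete; the helper names are illustrative only and not part of Pgpool-II.

```python
from __future__ import annotations

import struct

# Message type bytes from the PostgreSQL v3 protocol (extended query subset).
# Some letters mean different things per direction (e.g. 'D', 'E'), so the
# frontend and backend tables are kept separate.
FRONTEND_KINDS = {"P": "Parse", "B": "Bind", "D": "Describe", "E": "Execute", "H": "Flush", "S": "Sync"}
BACKEND_KINDS = {"1": "ParseComplete", "2": "BindComplete", "C": "CommandComplete", "Z": "ReadyForQuery"}


def flush_message() -> bytes:
    """Build a Flush message: type byte 'H' plus an Int32 length of 4 (the length counts itself; no body)."""
    return b"H" + struct.pack("!I", 4)


def decode_header(msg: bytes, kinds: dict[str, str]) -> tuple[str, int]:
    """Return (message name, length field) from the first 5 bytes of a protocol message."""
    kind = chr(msg[0])
    (length,) = struct.unpack("!I", msg[1:5])  # network byte order Int32
    return kinds.get(kind, f"unknown {kind!r}"), length


def log_kind_code(kind: str) -> str:
    """Two-digit hex code of a kind byte, matching the number in parentheses in the log."""
    return format(ord(kind), "02x")


if __name__ == "__main__":
    print(decode_header(flush_message(), FRONTEND_KINDS))
    print(decode_header(b"1\x00\x00\x00\x04", BACKEND_KINDS))
    print(log_kind_code("H"))
```

Unlike Sync, a Flush only asks the backend to send any pending output and does not end the extended query cycle, so a proxy that forwards Flush must then read the replies already produced (here, ParseComplete '1') without waiting for ReadyForQuery.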
|
Many thanks. I verified that pgpool-II-pg96-3.5.10 has this issue and pgpool-II-pg96-3.6.2 doesn't, so I will stay with 3.6.2 for now rather than going back to the older 3.4 version, given what you said about it. Will it be fixed in the upcoming RPM release?

Thanks,
Eldad
|
Yes, of course. I am going to set the status of this issue to "resolved".

| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2017-10-17 00:39 | eldad | New Issue | |
| 2017-10-17 00:39 | eldad | File Added: pgpool.conf | |
| 2017-10-17 00:39 | eldad | Tag Attached: streaming replication | |
| 2017-10-18 10:52 | t-ishii | Assigned To | => t-ishii |
| 2017-10-18 10:52 | t-ishii | Status | new => feedback |
| 2017-10-18 10:58 | t-ishii | Note Added: 0001763 | |
| 2017-10-19 15:49 | eldad | Note Added: 0001767 | |
| 2017-10-19 15:49 | eldad | Status | feedback => assigned |
| 2017-10-23 20:52 | eldad | Note Added: 0001773 | |
| 2017-10-23 20:55 | eldad | Note Added: 0001774 | |
| 2017-10-23 21:59 | t-ishii | Note Added: 0001775 | |
| 2017-10-23 21:59 | t-ishii | Status | assigned => feedback |
| 2017-10-24 21:11 | eldad | File Added: pgpool.log | |
| 2017-10-24 21:11 | eldad | File Added: strace.log | |
| 2017-10-24 21:11 | eldad | Note Added: 0001777 | |
| 2017-10-24 21:11 | eldad | Status | feedback => assigned |
| 2017-10-24 22:12 | t-ishii | Note Added: 0001779 | |
| 2017-10-24 22:13 | t-ishii | Status | assigned => feedback |
| 2017-10-24 22:19 | eldad | Note Added: 0001780 | |
| 2017-10-24 22:19 | eldad | Status | feedback => assigned |
| 2017-10-24 23:08 | t-ishii | Note Added: 0001781 | |
| 2017-10-25 12:29 | t-ishii | Status | assigned => feedback |
| 2017-10-26 18:55 | eldad | Note Added: 0001788 | |
| 2017-10-26 18:55 | eldad | Status | feedback => assigned |
| 2017-10-27 15:23 | t-ishii | Note Added: 0001789 | |
| 2017-10-27 15:25 | t-ishii | Status | assigned => resolved |
| 2017-10-27 15:25 | t-ishii | Target Version | => 3.6.7 |