View Issue Details

ID: 0000353
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2017-10-27 15:25
Reporter: eldad
Assigned To: t-ishii
Priority: urgent
Severity: block
Reproducibility: always
Status: resolved
Resolution: open
Product Version: 3.6.6
Target Version: 3.6.7
Fixed in Version:
Summary: 0000353: Once the slave is attached, the master becomes very slow and eventually hangs
Description: We upgraded our QA environment from postgres 9.3.3 with pgpool 3.4.7 to postgres 9.6.5 with pgpool 3.6.6.
pgpool was installed from RPM (pgpool-II-pg96-3.6.6-1pgdg.rhel6.x86_64).
We have a master-slave setup with streaming replication.
Load balance mode is off.

Once we attach the slave to pgpool, the master server becomes very slow.
Sessions stay in the active state for a long time, much longer than usual, and after a few minutes
all the pools (400) are in use and we can't create new sessions anymore.
On the slave I see many sessions being created even though load balance mode is off.
Detaching the slave in this situation does not solve the problem. Only when I shut down the DB on the slave (which closes all the sessions on it) does the problem go away within 2 seconds, and the master becomes idle again.

This problem prevents us from using pgpool for HA, as we can't work with the slave attached.
I'm attaching the conf file.

Steps To Reproduce: master-slave streaming replication, postgres 9.6.5 with pgpool 3.6.6.
The application generates ~20 TPS.
Start pgpool while the master is up.
Start the slave and let it sync, then attach the slave.
Additional Information: VM with RHEL 6.3
Tags: streaming replication

Activities

eldad (reporter), 2017-10-17 00:39

pgpool.conf (38,035 bytes)

t-ishii (developer), 2017-10-18 10:58, ~0001763

This sounds like a network or DNS issue. To confirm it, you could attach strace to a Pgpool-II child process. For instance:

1. Set num_init_children to 1.
2. Start Pgpool-II.
3. Find the Pgpool-II child process ID using ps (something like: ps aux | grep pgpool | grep "pgpool: wait for").
4. Attach strace: "strace -tt -p the_pid" or "strace -T -p the_pid".
5. Find the system call that takes a long time.
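The last step, finding the slow system call, can be tedious in a large trace. The following Python sketch assumes the trace was captured with `strace -T` (optionally together with `-tt`) and saved to a file, for example with strace's `-o` option. It ranks system calls by the elapsed time that `-T` appends in angle brackets. The function `slowest_syscalls` is a hypothetical helper, not part of strace or Pgpool-II.

```python
import re

# `strace -T` appends the time spent in each completed syscall as "<seconds>".
_DURATION = re.compile(r"<(\d+\.\d+)>\s*$")
# Syscall name at line start, after an optional `-tt` timestamp like "05:03:03.123456 ".
_SYSCALL = re.compile(r"^(?:\d{2}:\d{2}:\d{2}\.\d+\s+)?(\w+)\(")
# A call that was interrupted and later resumed is logged as "<... name resumed>".
_RESUMED = re.compile(r"<\.\.\. (\w+) resumed>")


def slowest_syscalls(lines, top=10):
    """Return up to `top` (seconds, syscall, line) tuples, slowest first.

    Lines without a duration (e.g. "<unfinished ...>" or signal lines) are skipped.
    """
    results = []
    for line in lines:
        duration = _DURATION.search(line)
        if duration is None:
            continue
        name = _SYSCALL.match(line) or _RESUMED.search(line)
        syscall = name.group(1) if name else "?"
        results.append((float(duration.group(1)), syscall, line.rstrip()))
    # Sort by elapsed seconds, largest first.
    results.sort(key=lambda item: item[0], reverse=True)
    return results[:top]
```

Because the function only iterates over lines, an open file object such as `open("strace.log")` can be passed directly; a long `select` or `read` near the top of the result would point to where the child process is waiting.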

eldad (reporter), 2017-10-19 15:49, ~0001767

Hi Ishii,

Thank you for your response.
1. The servers are the same ones I used with postgres 9.3.3 and pgpool 3.4.7;
they are on the same LAN and the server names are in /etc/hosts.
The performance changed only after the upgrade.
2. Can you briefly explain why pgpool duplicates the sessions on the slave even though our LB mode is off?
Is there a way to disable it?

Anyway, I will check the network as you suggested.

Thanks,
Eldad

eldad (reporter), 2017-10-23 20:52, ~0001773

Hi,

Our network team checked the DNS and network and reported that everything looks fine.
I also installed another cluster on new machines with the same postgres and pgpool versions; it didn't help,
and we encounter the same performance problems.

Regards,
Eldad

eldad (reporter), 2017-10-23 20:55, ~0001774

BTW, pgpool is running with OPTS=" -n" per the logging bug.

t-ishii (developer), 2017-10-23 21:59, ~0001775

Without the strace log I requested, it's very hard to find the problem. At minimum, I need the debug log.

eldad (reporter), 2017-10-24 21:11, ~0001777

I'm attaching the strace output and the pgpool log with debug5 logging.

strace.log (1,107,502 bytes)
pgpool.log (1,238,034 bytes)

t-ishii (developer), 2017-10-24 22:12, ~0001779

I have looked into the log and found that Pgpool-II received an 'H' (Flush) message from the frontend:

2017-10-24 05:03:03: pid 20560: stormr_pcoe: storm_admin_exec: DETAIL: received kind 'H'(48) from frontend

Then it got stuck here:
2017-10-24 05:03:03: pid 20560: stormr_pcoe: storm_admin_exec: DETAIL: backend:0 of 2 kind = '1'

This is very similar to:
https://www.pgpool.net/mantisbt/view.php?id=345

A fix was attached to the bug report (bug345.diff).

Alternatively, you could try the git repository, which already contains the fix:
https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=dc858b93055f22a9055acfa98c203325c1c1621d
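For readers decoding the log lines above: in the PostgreSQL v3 wire protocol, 'H' is the frontend Flush message and '1' is the backend ParseComplete message. The "(48)" appears to be the byte value of 'H' printed in hexadecimal (0x48). The Python sketch below is not Pgpool-II's code; it only illustrates how these messages are framed (a one-byte kind followed by an Int32 length that counts itself) and how a buffer of backend replies can be split by kind. The helper names `flush_message` and `split_backend_messages` are hypothetical.

```python
import struct

# Backend message kind bytes from the PostgreSQL v3 protocol (a small subset).
BACKEND_KINDS = {
    b"1": "ParseComplete",
    b"2": "BindComplete",
    b"C": "CommandComplete",
    b"D": "DataRow",
    b"E": "ErrorResponse",
    b"T": "RowDescription",
    b"Z": "ReadyForQuery",
}


def flush_message() -> bytes:
    """Frontend Flush: kind byte 'H' plus an Int32 length of 4 (the length counts itself)."""
    return b"H" + struct.pack("!i", 4)


def split_backend_messages(buf: bytes):
    """Split a buffer of backend messages into (name, payload) pairs.

    Stops at the first incomplete message and returns what was parsed so far.
    """
    out = []
    i = 0
    while i + 5 <= len(buf):
        kind = buf[i:i + 1]
        (length,) = struct.unpack("!i", buf[i + 1:i + 5])
        end = i + 1 + length
        if end > len(buf):
            break  # partial message: more bytes are still to arrive
        out.append((BACKEND_KINDS.get(kind, kind.decode("latin-1")), buf[i + 5:end]))
        i = end
    return out
```

For example, `format(flush_message()[0], "02x")` gives `"48"`, matching the number shown next to 'H' in the debug log.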

eldad (reporter), 2017-10-24 22:19, ~0001780

Hi,

I did another check: I removed pgpool 3.6.6 and installed pgpool-II-pg96-3.4.13 (same postgres DB 9.6.5).
The problem doesn't seem to exist there, and everything works as expected.

Regards,
Eldad

t-ishii (developer), 2017-10-24 23:08, ~0001781

Yes, 3.4 does not have the problem. Only 3.5 and later are affected. However, 3.4 and earlier are very slow when used with extended queries (prepared statements).
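"Extended queries" here refers to PostgreSQL's extended query protocol: instead of sending a single Query message, the client sends separate Parse, Bind, Execute and Sync messages, which a proxy such as Pgpool-II has to relay individually. The following Python sketch builds that message sequence for one prepared-statement execution. It assumes text-format parameters and results and uses the unnamed statement and portal; all helper names are hypothetical and the code is illustrative only.

```python
import struct


def _message(kind: bytes, body: bytes) -> bytes:
    """Frame a frontend message: 1-byte kind, Int32 length (counts itself), body."""
    return kind + struct.pack("!i", len(body) + 4) + body


def _cstring(text: str) -> bytes:
    """Protocol strings are NUL-terminated."""
    return text.encode("utf-8") + b"\x00"


def parse_msg(statement: str, query: str) -> bytes:
    # Zero parameter type OIDs: let the server infer the parameter types.
    return _message(b"P", _cstring(statement) + _cstring(query) + struct.pack("!h", 0))


def bind_msg(portal: str, statement: str, params: list) -> bytes:
    body = _cstring(portal) + _cstring(statement)
    body += struct.pack("!h", 0)  # no parameter format codes: all parameters are text
    body += struct.pack("!h", len(params))
    for value in params:
        if value is None:
            body += struct.pack("!i", -1)  # length -1 encodes SQL NULL
        else:
            data = value.encode("utf-8")
            body += struct.pack("!i", len(data)) + data
    body += struct.pack("!h", 0)  # no result format codes: all columns come back as text
    return _message(b"B", body)


def execute_msg(portal: str, max_rows: int = 0) -> bytes:
    # max_rows = 0 means "no row limit".
    return _message(b"E", _cstring(portal) + struct.pack("!i", max_rows))


def sync_msg() -> bytes:
    return _message(b"S", b"")


def extended_query(query: str, params: list) -> bytes:
    """Parse/Bind/Execute/Sync using the unnamed statement ("") and unnamed portal ("")."""
    return parse_msg("", query) + bind_msg("", "", params) + execute_msg("") + sync_msg()
```

A simple query would be one message; here one execution of `SELECT $1` already takes four, each of which the proxy must forward and whose responses it must match up.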

eldad (reporter), 2017-10-26 18:55, ~0001788

Many thanks.

I verified that pgpool-II-pg96-3.5.10 has this issue and pgpool-II-pg96-3.6.2 doesn't,
so I will stay with 3.6.2 for now rather than going back to the older 3.4, as you suggested.
Will it be fixed in the upcoming RPM release?

Thanks,
Eldad

t-ishii (developer), 2017-10-27 15:23, ~0001789

Yes, of course. I am going to set the status of this issue to "resolved".

Issue History

Date Modified Username Field Change
2017-10-17 00:39 eldad New Issue
2017-10-17 00:39 eldad File Added: pgpool.conf
2017-10-17 00:39 eldad Tag Attached: streaming replication
2017-10-18 10:52 t-ishii Assigned To => t-ishii
2017-10-18 10:52 t-ishii Status new => feedback
2017-10-18 10:58 t-ishii Note Added: 0001763
2017-10-19 15:49 eldad Note Added: 0001767
2017-10-19 15:49 eldad Status feedback => assigned
2017-10-23 20:52 eldad Note Added: 0001773
2017-10-23 20:55 eldad Note Added: 0001774
2017-10-23 21:59 t-ishii Note Added: 0001775
2017-10-23 21:59 t-ishii Status assigned => feedback
2017-10-24 21:11 eldad File Added: pgpool.log
2017-10-24 21:11 eldad File Added: strace.log
2017-10-24 21:11 eldad Note Added: 0001777
2017-10-24 21:11 eldad Status feedback => assigned
2017-10-24 22:12 t-ishii Note Added: 0001779
2017-10-24 22:13 t-ishii Status assigned => feedback
2017-10-24 22:19 eldad Note Added: 0001780
2017-10-24 22:19 eldad Status feedback => assigned
2017-10-24 23:08 t-ishii Note Added: 0001781
2017-10-25 12:29 t-ishii Status assigned => feedback
2017-10-26 18:55 eldad Note Added: 0001788
2017-10-26 18:55 eldad Status feedback => assigned
2017-10-27 15:23 t-ishii Note Added: 0001789
2017-10-27 15:25 t-ishii Status assigned => resolved
2017-10-27 15:25 t-ishii Target Version => 3.6.7