[pgpool-general: 3395] Re: v3.4.0.(3)? - memory issue and connection hangs
Tatsuo Ishii
ishii at postgresql.org
Sat Jan 3 09:56:00 JST 2015
> Hi,
>
> I'm going to apologize in advance for this lengthy email message. I'm
> hoping there's sufficient valid data presented for some assistance.
>
> I'm on CentOS 7 with the latest updates. I'm using the latest version
> of pgpool [1]: v3.4.0.3
>
> We're seeing two issues: a memory leak, and PGPool seeming to hang
> when there's a /severe/ in-rush of connections.
>
> At a high level, we have the following PGPool settings:
>
> o num_init_children = 95
> o max_pool = 1
> o connection_cache = on
> o replication_mode = off
> o load_balance_mode = on
>
> We have one /slave/ for our /master/
>
> o master_slave_mode = on
> o use_watchdog = on
> o memory_cache_enabled = on
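Collected into pgpool.conf form for readability (values exactly as listed above, nothing added):

```
# pgpool.conf (as described in the report)
num_init_children    = 95
max_pool             = 1
connection_cache     = on
replication_mode     = off
load_balance_mode    = on
master_slave_mode    = on
use_watchdog         = on
memory_cache_enabled = on
```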
>
> Any help would be appreciated. If more data is required, please don't
> hesitate to ask.
>
> Thank you!
>
> ::: Memory Leak :::
>
> The front-end application is using the Quartz scheduler[2] which seems
> to occasionally get into an infinite loop. The memory leak is
> triggered when we see upwards of 300+ UPDATE's per second which
> fail[3].
>
> While I understand there's an application issue which needs to be
> resolved, IMO PGPool shouldn't die because of the issue. :)
>
> I'm enclosing trimmed /sar/ memory data[4] which shows how quickly we
> run out of memory.
I thought we fixed the memory leak issue in this commit (which is
included in the RPM you installed):
http://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=352195f946199e58a6f28474107df2d64bbaab46
There may be another code path which triggers a different pattern of
memory leak. Let me investigate this...
> ::: Memory Leak - Part 2 :::
>
> It appears (I've yet to confirm it) that when we have
> /memory_cache_enabled = on/, we have a minor memory leak. It seems to
> take 48h+ (I've yet to let the server run out of memory) before we
> exhaust all the RAM (nearly 4G) on the server.
OK, this is definitely a different pattern of memory leak. Let me
investigate...
> ::: Hang on in-rush :::
>
> We've been using PGBench to stress test the environment. One test in
> particular is how our environment handles a sudden burst of
> connections.
OK, I will try to reproduce the problem using your test case. In the
meantime, increasing listen_backlog_multiplier might help.
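For example (the value 10 below is only a starting point to experiment with; pgpool sizes its listen(2) backlog as num_init_children * listen_backlog_multiplier):

```
# pgpool.conf -- with num_init_children = 95, a multiplier of 10 requests
# a listen(2) backlog of 950; net.core.somaxconn must be at least that
# large, or the kernel silently caps the backlog.
listen_backlog_multiplier = 10
```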
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
> Using the -C option, PGBench can establish a new connection for each
> transaction. When 'connection_cache = off', PGPool can handle any
> number of iterations of the PGBench call per minute[5] - sh code:
>
> while [ 1 ] ; do
>     [5]       # the pgbench invocation from reference [5]
>     sleep 1m
> done
>
> When 'connection_cache = on', on the third iteration PGBench hangs.
>
> The benchmark can be set up following these steps[6].
>
> While monitoring network connections[7], I had to apply the following
> kernel tunes[8] to get 'connection_cache = off' to work, but they
> didn't help with 'connection_cache = on'.
>
> We'd like to use the connection pooling feature.
>
> ::: References :::
>
> [1] - pgpool-II-pg93-3.4.0-3pgdg.rhel6.x86_64
>
> [2] - http://quartz-scheduler.org
>
> [3] - UPDATE failure
>
> 2015-01-01 22:04:02 - [unknown] (pid 20899): the-user-db: LOG:
> Parse: Error or notice message from backend: : DB node id: 0 backend
> pid: 28304 statement: "UPDATE QRTZ_TRIGGERS SET TRIGGER_STATE = $1
> WHERE SCHED_NAME = 'QuartzScheduler' AND TRIGGER_NAME = $2 AND
> TRIGGER_GROUP = $3 AND TRIGGER_STATE�<88>" message: "invalid byte
> sequence for encoding "UTF8": 0xe8 0x88"
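As an aside, those two bytes are a textbook truncation: 0xe8 opens a three-byte UTF-8 sequence and 0x88 is a valid continuation byte, but the third byte is missing, so the backend rightly rejects the value. The one-liner below (my own illustration, not from the log) reproduces the failure with iconv:

```shell
# 0xe8 starts a 3-byte UTF-8 sequence, 0x88 continues it, and the
# sequence is then cut short -- iconv's UTF-8 validator rejects it.
printf '\xe8\x88' | iconv -f UTF-8 -t UTF-8
# iconv reports an incomplete character and exits non-zero
```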
>
> [4] - trimmed output of /sar -r 60 .../
>
> Note: "%memused" starts increasing at roughly 9:43:27 PM
>
> kbmemfree kbmemused %memused %commit
> 09:20:27 PM 3037436 846340 21.79 23.46
> 09:21:27 PM 3042244 841532 21.67 23.37
> 09:22:27 PM 3041180 842596 21.70 23.39
> 09:23:27 PM 3036952 846824 21.80 23.46
> 09:24:27 PM 3023848 859928 22.14 23.69
> 09:25:27 PM 3023632 860144 22.15 23.69
> 09:26:27 PM 2676032 1207744 31.10 28.91
> 09:27:27 PM 2404916 1478860 38.08 28.91
> 09:28:27 PM 2061288 1822488 46.93 28.93
> 09:29:27 PM 2025796 1857980 47.84 28.95
> 09:30:27 PM 1876968 2006808 51.67 28.96
> 09:31:27 PM 1877060 2006716 51.67 28.96
> 09:32:27 PM 1875104 2008672 51.72 29.02
> 09:33:27 PM 1876220 2007556 51.69 29.01
> 09:34:27 PM 2929684 954092 24.57 25.12
> 09:35:27 PM 2907896 975880 25.13 25.48
> 09:36:27 PM 3001660 882116 22.71 7.99
> 09:37:27 PM 3053604 830172 21.38 24.51
> 09:38:27 PM 2903444 980332 25.24 25.61
> 09:39:27 PM 2881668 1002108 25.80 25.96
> 09:40:27 PM 2854664 1029112 26.50 26.44
> 09:41:27 PM 2847116 1036660 26.69 26.57
> 09:42:27 PM 2840500 1043276 26.86 26.67
> 09:43:27 PM 2830284 1053492 27.13 26.82
> 09:44:27 PM 2581672 1302104 33.53 31.63
> 09:45:27 PM 2297892 1585884 40.83 37.20
> 09:46:27 PM 2019400 1864376 48.00 42.38
> 09:47:27 PM 1717092 2166684 55.79 48.02
> 09:48:27 PM 1430172 2453604 63.18 53.36
> 09:49:27 PM 1158128 2725648 70.18 58.43
> 09:50:27 PM 884424 2999352 77.23 63.56
> 09:51:27 PM 584972 3298804 84.94 69.20
> 09:52:27 PM 309340 3574436 92.04 74.66
> 09:53:27 PM 131388 3752388 96.62 80.59
> 09:54:27 PM 105308 3778468 97.29 86.00
> 09:55:27 PM 112748 3771028 97.10 91.64
> 09:56:27 PM 130988 3752788 96.63 97.19
> 09:57:27 PM 107760 3776016 97.23 102.37
> 09:58:27 PM 103328 3780448 97.34 107.24
> 09:59:27 PM 113372 3770404 97.08 112.71
> 10:00:27 PM 145368 3738408 96.26 117.67
> 10:01:27 PM 112524 3771252 97.10 122.87
> 10:02:27 PM 113772 3770004 97.07 127.92
> 10:03:27 PM 103928 3779848 97.32 132.97
> 10:04:27 PM 140552 3743224 96.38 119.54
> 10:05:27 PM 139540 3744236 96.41 119.54
> 10:06:27 PM 139740 3744036 96.40 119.54
> 10:07:27 PM 139464 3744312 96.41 119.54
> 10:08:27 PM 139592 3744184 96.41 119.54
> 10:09:27 PM 139600 3744176 96.41 119.54
> 10:10:27 PM 133392 3750384 96.57 119.54
> 10:11:27 PM 133488 3750288 96.56 119.54
> 10:12:27 PM 133552 3750224 96.56 119.54
> 10:13:27 PM 133632 3750144 96.56 119.54
> 10:14:27 PM 133624 3750152 96.56 119.54
> 10:15:27 PM 133640 3750136 96.56 119.54
> 10:16:27 PM 133768 3750008 96.56 119.54
> 10:17:27 PM 133800 3749976 96.55 119.54
> 10:18:27 PM 133832 3749944 96.55 119.54
> 10:19:27 PM 133800 3749976 96.55 119.54
> 10:20:27 PM 86816 3796960 97.76 119.54
> 10:21:27 PM 97432 3786344 97.49 119.54
> 10:22:27 PM 103120 3780656 97.34 119.54
> 10:23:27 PM 124552 3759224 96.79 119.54
> 10:24:27 PM 3619768 264008 6.80 12.24
> 10:25:27 PM 3614228 269548 6.94 12.24
>
> [5] pgbench call
>
> #
> # db-cluster: the VIP to a two-node PGPool cluster.
> #
> date ; /usr/pgsql-9.3/bin/pgbench -h db-cluster -U postgres -T 30 -S
> -c 10 -C pgbench ; date
>
> [6] pgbench setup
>
> yum -y install postgresql93-contrib
> createdb -h db-cluster -U postgres pgbench
>
> # scale of 100
> /usr/pgsql-9.3/bin/pgbench -h db-cluster -U postgres -i -s 100
> --foreign-keys --unlogged-tables pgbench
>
> [7] Monitor connections
>
> # Script found on the web
>
> while [ 1 ] ; do
>     netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
>     sleep 1
>     echo '---'
> done
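The same per-state counts can be had from ss, which is what CentOS 7 ships in place of netstat (the tool choice and awk rewrite are mine, not from the original script):

```shell
# Count TCP sockets by state; ss prints the state in column 1,
# with a header line we skip (NR > 1).
ss -tan | awk 'NR > 1 {++s[$1]} END {for (st in s) print st, s[st]}'
```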
>
> [8] kernel tunes on PGPool Server
>
> # Default: 0
> net.ipv4.tcp_tw_reuse = 1
>
> # Default: 32768 61000
> net.ipv4.ip_local_port_range = 1024 65000
>
> # Default: 128
> net.core.somaxconn = 10240
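To keep those values across reboots, they can go in a sysctl drop-in file (the file name below is my own choice), loaded with `sysctl -p /etc/sysctl.d/90-pgpool.conf` or at boot:

```
# /etc/sysctl.d/90-pgpool.conf
net.ipv4.tcp_tw_reuse = 1
net.ipv4.ip_local_port_range = 1024 65000
net.core.somaxconn = 10240
```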
> --
> Pablo Sanchez - Blueoak Database Engineering, Inc
> Ph: 819.459.1926 Blog: http://pablo-blog.blueoakdb.com
> iNum: 883.5100.0990.1054
> _______________________________________________
> pgpool-general mailing list
> pgpool-general at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-general