[pgpool-general: 961] Re: Query Memory Cache goes OOM

Sebastian Kornehl webmaster at sourcebase.org
Wed Sep 5 22:00:31 JST 2012


I rolled back to the default conf and made some changes; the diff between
the new and the old conf is:

108c108
< max_pool = 16
---
 > max_pool = 20
191c191
< connection_cache = on
---
 > connection_cache = off
487c487
< relcache_size = 256
---
 > relcache_size = 512
493c493
< check_temp_table = on
---
 > check_temp_table = off
510c510
< memqcache_memcached_host = 'localhost'
---
 > #memqcache_memcached_host = 'localhost'
514c514
< memqcache_memcached_port = 11211
---
 > #memqcache_memcached_port = 11211
530c530
< memqcache_auto_cache_invalidation = on
---
 > memqcache_auto_cache_invalidation = off

and now it looks stable. I'm not sure which of these changes caused my
problem, but I can live with this configuration. I'll try it today with
some real frontends :)
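
When testing with the real frontends I'll keep watching the cache and the
memory with a small loop like the one below (the port and database name are
just placeholders for my setup):

    # print cache statistics and memory usage once a minute
    while true; do
        psql -p 9999 -c "show pool_cache;" postgres
        free -m
        sleep 60
    done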

Thanks for your help!

Sebastian


On 09/05/2012 02:10 PM, Tatsuo Ishii wrote:
> I have no problem with the default pgpool.conf with memory cache enabled.
> After a 5-minute run of "pgbench -n -S -p 11002 -c 32 -T 300 test":
>
> $ LANG=C free
>               total       used       free     shared    buffers     cached
> Mem:       8077568    6628312    1449256          0     274888    4472472
> -/+ buffers/cache:    1880952    6196616
> Swap:      8835712         24    8835688
>
> Using the included script (found at http://d.hatena.ne.jp/naoya/20080727/1217119867),
> which basically looks into /proc/PID/status:
>
> PID	RSS	SHARED
> 9623	35448	35232 (99%)
>
> That means pgpool process 9623 uses 35448KB of resident memory (RSS), and 35232KB
> out of 35448KB is shared memory.
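>
> The script essentially does something like this (a rough sketch, not the
> script itself; this version sums the Rss and Shared_* lines from
> /proc/PID/smaps):
>
> #!/bin/sh
> # Sum RSS and shared pages for one PID from /proc/PID/smaps (values in kB).
> PID=$1
> RSS=$(awk '/^Rss:/ {sum += $2} END {print sum}' /proc/$PID/smaps)
> SHR=$(awk '/^Shared/ {sum += $2} END {print sum}' /proc/$PID/smaps)
> echo "PID $PID  RSS ${RSS}kB  SHARED ${SHR}kB ($((SHR * 100 / RSS))%)"
>
> Its output corresponds to the PID/RSS/SHARED columns above.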
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>
>> I waited ~30 minutes after the machine became unresponsive to check if
>> it came back - and it did. The pgpool log says:
>>
>> 2012-09-05 12:37:25 LOG: pid 8812: statement: SELECT value FROM data
>> WHERE id =16239;
>> 2012-09-05 13:03:53 ERROR: pid 12575: pool_read: EOF encountered with
>> backend
>> 2012-09-05 13:03:55 ERROR: pid 12575: pool_read_kind: error while
>> reading message kind
>> 2012-09-05 13:04:05 ERROR: pid 13114: SimpleForwardToFrontend:
>> pool_write_and_flush failed
>> 2012-09-05 13:04:05 LOG: pid 13114: do_child: exits with status 1 due
>> to error
>> [.. this comes a few times, then ..]
>> 2012-09-05 13:04:08 LOG: pid 8730: do_child: exits with status 1 due
>> to error
>> 2012-09-05 13:04:09 ERROR: pid 8812: SimpleForwardToFrontend:
>> pool_write_and_flush failed
>> 2012-09-05 13:04:15 LOG: pid 8812: do_child: exits with status 1 due
>> to error
>> 2012-09-05 13:04:47 LOG:   pid 9338: authentication is timeout
>> 2012-09-05 13:04:54 LOG: pid 9691: connection received: host=[...]
>> port=42784
>> 2012-09-05 13:04:54 LOG: pid 960: PCP child 1596 exits with status 9
>> by signal 9
>> 2012-09-05 13:04:56 LOG:   pid 960: fork a new PCP child pid 14474
>> 2012-09-05 13:04:56 LOG: pid 960: worker child 1597 exits with status
>> 9 by signal 9
>> 2012-09-05 13:04:56 LOG:   pid 960: fork a new worker child pid 14475
>> 2012-09-05 13:05:01 LOG:   pid 3927: statement: show pool_cache;
>>
>> syslog says:
>>
>> 2012 Sep 5 13:04:11 pgpool-1 [err]: kernel: Out of memory: Killed
>> process 1596, UID 0, (pgpool).
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: pgpool invoked
>> oom-killer: gfp_mask=0x200d2, order=0, oomkilladj=0
>> 2012 Sep  5 13:04:11 pgpool-1 [warning]: kernel:
>> 2012 Sep  5 13:04:11 pgpool-1 [warning]: kernel: Call Trace:
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: [<ffffffff802c1b64>]
>> out_of_memory+0x8b/0x203
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: [<ffffffff8020fa5d>]
>> __alloc_pages+0x27f/0x308
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: [<ffffffff80211c32>]
>> do_wp_page+0x3e9/0x8e8
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: [<ffffffff802336b6>]
>> read_swap_cache_async+0x42/0xd1
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: [<ffffffff802c6c25>]
>> swapin_readahead+0x4e/0x77
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: [<ffffffff8020996a>]
>> __handle_mm_fault+0xf11/0x144f
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: [<ffffffff80231f76>]
>> release_sock+0x13/0xbe
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: [<ffffffff80207116>]
>> kmem_cache_free+0x84/0xd7
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: [<ffffffff80207116>]
>> kmem_cache_free+0x84/0xd7
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: [<ffffffff80263929>]
>> _spin_lock_irqsave+0x9/0x14
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: [<ffffffff80266d94>]
>> do_page_fault+0xf72/0x131b
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: [<ffffffff8041e973>]
>> sys_accept+0x1b6/0x1d2
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: [<ffffffff80207116>]
>> kmem_cache_free+0x84/0xd7
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: [<ffffffff80263929>]
>> _spin_lock_irqsave+0x9/0x14
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: [<ffffffff802956b6>]
>> recalc_sigpending+0xe/0x25
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: [<ffffffff8025f82b>]
>> error_exit+0x0/0x6e
>> 2012 Sep  5 13:04:11 pgpool-1 [warning]: kernel:
>> 2012 Sep  5 13:04:11 pgpool-1 [info]: kernel: Mem-info:
>> 2012 Sep  5 13:04:11 pgpool-1 [warning]: kernel: DMA per-cpu:
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: cpu 0 hot: high 0,
>> batch 1 used:0
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: cpu 0 cold: high 0,
>> batch 1 used:0
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: cpu 1 hot: high 0,
>> batch 1 used:0
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: cpu 1 cold: high 0,
>> batch 1 used:0
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: cpu 2 hot: high 0,
>> batch 1 used:0
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: cpu 2 cold: high 0,
>> batch 1 used:0
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: cpu 3 hot: high 0,
>> batch 1 used:0
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: cpu 3 cold: high 0,
>> batch 1 used:0
>> 2012 Sep  5 13:04:11 pgpool-1 [warning]: kernel: DMA32 per-cpu:
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: cpu 0 hot: high 186,
>> batch 31 used:74
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: cpu 0 cold: high 62,
>> batch 15 used:16
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: cpu 1 hot: high 186,
>> batch 31 used:46
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: cpu 1 cold: high 62,
>> batch 15 used:42
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: cpu 2 hot: high 186,
>> batch 31 used:10
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: cpu 2 cold: high 62,
>> batch 15 used:46
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: cpu 3 hot: high 186,
>> batch 31 used:29
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: cpu 3 cold: high 62,
>> batch 15 used:52
>> 2012 Sep  5 13:04:11 pgpool-1 [warning]: kernel: Normal per-cpu: empty
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: HighMem per-cpu: empty
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: Free pages: 10124kB
>> (0kB HighMem)
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: Active:3296
>> inactive:947386 dirty:0 writeback:0 unstable:0 free:2531 slab:8265
>> mapped-file:2016 mapped-anon:950927 pagetables:33607
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: DMA free:2040kB
>> min:16kB low:20kB high:24kB active:0kB inactive:0kB present:9076kB
>> pages_scanned:0 all_unreclaimable? yes
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: lowmem_reserve[]: 0
>> 4024 4024 4024
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: DMA32 free:8084kB
>> min:8108kB low:10132kB high:12160kB active:13184kB inactive:3789544kB
>> present:4120800kB pages_scanned:11144945 all_unreclaimable? yes
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: lowmem_reserve[]: 0 0
>> 0 0
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: Normal free:0kB
>> min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB
>> pages_scanned:0 all_unreclaimable? no
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: lowmem_reserve[]: 0 0
>> 0 0
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: HighMem free:0kB
>> min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB
>> pages_scanned:0 all_unreclaimable? no
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: lowmem_reserve[]: 0 0
>> 0 0
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: DMA: 0*4kB 1*8kB
>> 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB
>> 0*4096kB = 2040kB
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: DMA32: 17*4kB 4*8kB
>> 3*16kB 0*32kB 0*64kB 2*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB
>> 1*4096kB = 8084kB
>> 2012 Sep  5 13:04:11 pgpool-1 [warning]: kernel: Normal: empty
>> 2012 Sep  5 13:04:11 pgpool-1 [warning]: kernel: HighMem: empty
>> 2012 Sep  5 13:04:11 pgpool-1 [warning]: kernel: 26511 pagecache pages
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: Swap cache: add
>> 3128627, delete 3120367, find 1948467/2124994, race 35+122
>> 2012 Sep  5 13:04:11 pgpool-1 [warning]: kernel: Free swap  = 0kB
>> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: Total swap = 1048568kB
>> 2012 Sep 5 13:04:11 pgpool-1 [info]: kernel: Free swap: 0kB
>> 2012 Sep  5 13:04:11 pgpool-1 [info]: kernel: 1048576 pages of RAM
>> 2012 Sep  5 13:04:11 pgpool-1 [info]: kernel: 23824 reserved pages
>> 2012 Sep  5 13:04:11 pgpool-1 [info]: kernel: 159440 pages shared
>> 2012 Sep  5 13:04:11 pgpool-1 [info]: kernel: 8394 pages swap cached
>>
>>
>>
>> On 09/05/2012 12:24 PM, Sebastian Kornehl wrote:
>>> It's a fresh setup just for playing around with pgpool; the database
>>> and pgpool each have their own machine with no other services.
>>>
>>> -Sebastian
>>>
>>> On 09/05/2012 12:16 PM, Tatsuo Ishii wrote:
>>>>> I changed this; I'm currently testing with 20 loops. This seems to hold,
>>>>> but it still uses a lot of memory (and swap).
>>>>>
>>>>> Maybe I just misunderstand the config values: is
>>>>> "memqcache_total_size" a per-connection setting or a total
>>>>> value?
>>>> It is a total value, not per connection. Because it sits in shared memory,
>>>> it is shared by all connections.
>>>> Are there any other servers running on the machine besides pgpool?
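>>>>
>>>> If you want to double-check, the query cache should show up as one large
>>>> System V shared memory segment on the pgpool machine, for example:
>>>>
>>>> ipcs -m                      # list shared memory segments and their sizes
>>>> cat /proc/sys/kernel/shmmax  # the segment has to fit under this limit
>>>>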
>>>> -- 
>>>> Tatsuo Ishii
>>>> SRA OSS, Inc. Japan
>>>> English: http://www.sraoss.co.jp/index_en.php
>>>> Japanese: http://www.sraoss.co.jp
>>>>
>>>>> show pool_cache gives now:
>>>>>
>>>>>   num_cache_hits              | 137
>>>>>   num_selects                 | 17411
>>>>>   cache_hit_ratio             | 0.01
>>>>>   num_hash_entries            | 1048576
>>>>>   used_hash_entries           | 13026
>>>>>   num_cache_entries           | 13026
>>>>>   used_cache_enrties_size     | 1512533
>>>>>   free_cache_entries_size     | 260121555
>>>>>   fragment_cache_entries_size | 299816
>>>>>
>>>>> With 20 selects per second (and 15 seconds until memqcache_expire) I
>>>>> would say there should be ~300 entries in the cache and not 13026.
>>>>>
>>>>> With 31 selects per second 'top' looks like this:
>>>>>
>>>>> Mem: 4194304k total, 4180556k used, 13748k free, 1144k buffers
>>>>> Swap: 1048568k total, 1048560k used, 8k free, 160984k cached
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Sebastian
>>>>>
>>>>> On 09/05/2012 10:56 AM, Tatsuo Ishii wrote:
>>>>>> Thanks, but I didn't find any evidence of an actual memory leak.
>>>>>> BTW,
>>>>>>>>> memqcache_max_num_cache = 10
>>>>>> looks way too small.
>>>>>>
>>>>>> Can you set it back to the default and try again?
>>>>>>
>>>>>> memqcache_max_num_cache = 1000000
>>>>>> -- 
>>>>>> Tatsuo Ishii
>>>>>> SRA OSS, Inc. Japan
>>>>>> English: http://www.sraoss.co.jp/index_en.php
>>>>>> Japanese: http://www.sraoss.co.jp
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I attached the log file.
>>>>>>>
>>>>>>> Thanks for your Help!
>>>>>>>
>>>>>>> Sebastian
>>>>>>>
>>>>>>> On 09/05/2012 04:23 AM, Tatsuo Ishii wrote:
>>>>>>>> I couldn't reproduce your problem here.
>>>>>>>> Can you try running with valgrind to detect a memory leak?
>>>>>>>> Just in case, here is how to use valgrind:
>>>>>>>>
>>>>>>>> 1) start pgpool with the following command (full example after this list):
>>>>>>>>        valgrind --trace-children=yes --leak-check=full pgpool [your pgpool
>>>>>>>>        options here...]
>>>>>>>>        Note that you need the -n option of pgpool and should redirect the
>>>>>>>>        pgpool log somewhere.
>>>>>>>>
>>>>>>>> 2) stop pgpool.
>>>>>>>>
>>>>>>>> 3) send the pgpool log.
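>>>>>>>>
>>>>>>>> For example, something like this (the log file path is just an example;
>>>>>>>> add your usual pgpool options):
>>>>>>>>
>>>>>>>>        # -n keeps pgpool in the foreground (no daemonize) so valgrind can
>>>>>>>>        # follow it; pgpool's log and the valgrind report end up in one file.
>>>>>>>>        valgrind --trace-children=yes --leak-check=full \
>>>>>>>>            pgpool -n 2>&1 | tee /tmp/pgpool-valgrind.log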
>>>>>>>> -- 
>>>>>>>> Tatsuo Ishii
>>>>>>>> SRA OSS, Inc. Japan
>>>>>>>> English: http://www.sraoss.co.jp/index_en.php
>>>>>>>> Japanese: http://www.sraoss.co.jp
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I'm testing the new query cache (pgpool-II version 3.2.0) on CentOS 5.
>>>>>>>>>
>>>>>>>>> System Specs:
>>>>>>>>> 4 cores
>>>>>>>>> 4 GB Ram
>>>>>>>>>
>>>>>>>>> shmmax= 2147483648 (=2 GB)
>>>>>>>>>
>>>>>>>>> When I start pgpool, my memory usage looks like this:
>>>>>>>>>
>>>>>>>>> Mem: 4194304k total, 660016k used, 3534288k free, 62116k buffers
>>>>>>>>>
>>>>>>>>> My test scenario is the following:
>>>>>>>>>
>>>>>>>>> # Config:
>>>>>>>>>
>>>>>>>>> memory_cache_enabled = on
>>>>>>>>> memqcache_method = 'shmem'
>>>>>>>>> memqcache_total_size =  262144000 #250MB
>>>>>>>>> memqcache_max_num_cache = 10
>>>>>>>>> memqcache_expire = 15
>>>>>>>>> memqcache_auto_cache_invalidation = off
>>>>>>>>> memqcache_maxcache = 409600 #400kb
>>>>>>>>> memqcache_cache_block_size = 1048576 #1MB
>>>>>>>>> memqcache_oiddir = '/var/log/pgpool/oiddir'
>>>>>>>>>
>>>>>>>>> Table: data
>>>>>>>>> id integer pkey
>>>>>>>>> value character varying(255)
>>>>>>>>> [some more columns]
>>>>>>>>>
>>>>>>>>> SELECT value FROM data WHERE id =${randomNumber};
>>>>>>>>>
>>>>>>>>> So when I start the first loop,
>>>>>>>>> while true; do psql -c "foobar ..." ; sleep 1; done
>>>>>>>>>
>>>>>>>>> everything looks good. When I start 9 more loops (= 10 selects per
>>>>>>>>> second), the used memory increases fast. After ~2 minutes it looks
>>>>>>>>> like this:
>>>>>>>>>
>>>>>>>>> Mem: 4194304k total, 2439012k used, 1755292k free, 62588k buffers
>>>>>>>>>
>>>>>>>>> after 2 more minutes:
>>>>>>>>>
>>>>>>>>> Mem: 4194304k total, 4053796k used, 140508k free, 62752k buffers
>>>>>>>>>
>>>>>>>>> When I add more clients, the machine begins to swap and dies.
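>>>>>>>>>
>>>>>>>>> Each loop is basically the following (the port, database name and id
>>>>>>>>> range are placeholders for my setup; $RANDOM assumes bash):
>>>>>>>>>
>>>>>>>>> # one test client: one random-id SELECT per second through pgpool
>>>>>>>>> while true; do
>>>>>>>>>     psql -p 9999 -c "SELECT value FROM data WHERE id = $((RANDOM % 100000));" testdb
>>>>>>>>>     sleep 1
>>>>>>>>> done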
>>>>>>>>>
>>>>>>>>> show pool_cache gives:
>>>>>>>>>
>>>>>>>>>   num_cache_hits              | 0
>>>>>>>>>   num_selects                 | 1527
>>>>>>>>>   cache_hit_ratio             | 0.00
>>>>>>>>>   num_hash_entries            | 16
>>>>>>>>>   used_hash_entries           | 15
>>>>>>>>>   num_cache_entries           | 15
>>>>>>>>>   used_cache_enrties_size     | 1746
>>>>>>>>>   free_cache_entries_size     | 262142254
>>>>>>>>>   fragment_cache_entries_size | 0
>>>>>>>>> (1 row)
>>>>>>>>>
>>>>>>>>> So my question:
>>>>>>>>>
>>>>>>>>> What am I doing wrong? The plan is to connect more frontends with 300+
>>>>>>>>> statements per second.
>>>>>>>>> When I turn memory_cache_enabled off, the problem does not occur.
>>>>>>>>>
>>>>>>>>> Thanks in advance!
>>>>>>>>>
>>>>>>>>> Sebastian
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> pgpool-general mailing list
>>>>>>>>> pgpool-general at pgpool.net
>>>>>>>>> http://www.pgpool.net/mailman/listinfo/pgpool-general
>>> _______________________________________________
>>> pgpool-general mailing list
>>> pgpool-general at pgpool.net
>>> http://www.pgpool.net/mailman/listinfo/pgpool-general
>> _______________________________________________
>> pgpool-general mailing list
>> pgpool-general at pgpool.net
>> http://www.pgpool.net/mailman/listinfo/pgpool-general


