[pgpool-general: 960] Re: Query Memory Cache goes OOM

Tatsuo Ishii ishii at postgresql.org
Wed Sep 5 21:10:34 JST 2012


I have no problem with default pgpool.conf with memory cache enabled.
 After 5 minutes run of "pgbench -n -S -p 11002 -c 32 -T 300 test"

$ LANG=C free
             total       used       free     shared    buffers     cached
Mem:       8077568    6628312    1449256          0     274888    4472472
-/+ buffers/cache:    1880952    6196616
Swap:      8835712         24    8835688

By using included script (I found from: http://d.hatena.ne.jp/naoya/20080727/1217119867)
which is basically looking into /proc/PID/status:

PID	RSS	SHARED
9623	35448	35232 (99%)

That means pgpool process 9623 uses 35448KB virtual memory and 35234KB
out of 35448KB is shared memory.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> I've waited ~30 minutes after the machine was unresponsive to check if
> it came back - and it came. pgpool log says:
> 
> 2012-09-05 12:37:25 LOG: pid 8812: statement: SELECT value FROM data
> WHERE id =16239;
> 2012-09-05 13:03:53 ERROR: pid 12575: pool_read: EOF encountered with
> backend
> 2012-09-05 13:03:55 ERROR: pid 12575: pool_read_kind: error while
> reading message kind
> 2012-09-05 13:04:05 ERROR: pid 13114: SimpleForwardToFrontend:
> pool_write_and_flush failed
> 2012-09-05 13:04:05 LOG: pid 13114: do_child: exits with status 1 due
> to error
> [.. this comes a few times, then ..]
> 2012-09-05 13:04:08 LOG: pid 8730: do_child: exits with status 1 due
> to error
> 2012-09-05 13:04:09 ERROR: pid 8812: SimpleForwardToFrontend:
> pool_write_and_flush failed
> 2012-09-05 13:04:15 LOG: pid 8812: do_child: exits with status 1 due
> to error
> 2012-09-05 13:04:47 LOG:   pid 9338: authentication is timeout
> 2012-09-05 13:04:54 LOG: pid 9691: connection received: host=[...]
> port=42784
> 2012-09-05 13:04:54 LOG: pid 960: PCP child 1596 exits with status 9
> by signal 9
> 2012-09-05 13:04:56 LOG:   pid 960: fork a new PCP child pid 14474
> 2012-09-05 13:04:56 LOG: pid 960: worker child 1597 exits with status
> 9 by signal 9
> 2012-09-05 13:04:56 LOG:   pid 960: fork a new worker child pid 14475
> 2012-09-05 13:05:01 LOG:   pid 3927: statement: show pool_cache;
> 
> syslog says:
> 
> 2012 Sep 5 13:04:11 pgpool-1 [err]: kernel: Out of memory: Killed
> process 1596, UID 0, (pgpool).
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: pgpool invoked
> oom-killer: gfp_mask=0x200d2, order=0, oomkilladj=0
> 2012 Sep  5 13:04:11 pgpool-1 [warning]: kernel:
> 2012 Sep  5 13:04:11 pgpool-1 [warning]: kernel: Call Trace:
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: [<ffffffff802c1b64>]
> out_of_memory+0x8b/0x203
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: [<ffffffff8020fa5d>]
> __alloc_pages+0x27f/0x308
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: [<ffffffff80211c32>]
> do_wp_page+0x3e9/0x8e8
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: [<ffffffff802336b6>]
> read_swap_cache_async+0x42/0xd1
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: [<ffffffff802c6c25>]
> swapin_readahead+0x4e/0x77
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: [<ffffffff8020996a>]
> __handle_mm_fault+0xf11/0x144f
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: [<ffffffff80231f76>]
> release_sock+0x13/0xbe
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: [<ffffffff80207116>]
> kmem_cache_free+0x84/0xd7
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: [<ffffffff80207116>]
> kmem_cache_free+0x84/0xd7
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: [<ffffffff80263929>]
> _spin_lock_irqsave+0x9/0x14
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: [<ffffffff80266d94>]
> do_page_fault+0xf72/0x131b
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: [<ffffffff8041e973>]
> sys_accept+0x1b6/0x1d2
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: [<ffffffff80207116>]
> kmem_cache_free+0x84/0xd7
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: [<ffffffff80263929>]
> _spin_lock_irqsave+0x9/0x14
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: [<ffffffff802956b6>]
> recalc_sigpending+0xe/0x25
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: [<ffffffff8025f82b>]
> error_exit+0x0/0x6e
> 2012 Sep  5 13:04:11 pgpool-1 [warning]: kernel:
> 2012 Sep  5 13:04:11 pgpool-1 [info]: kernel: Mem-info:
> 2012 Sep  5 13:04:11 pgpool-1 [warning]: kernel: DMA per-cpu:
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: cpu 0 hot: high 0,
> batch 1 used:0
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: cpu 0 cold: high 0,
> batch 1 used:0
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: cpu 1 hot: high 0,
> batch 1 used:0
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: cpu 1 cold: high 0,
> batch 1 used:0
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: cpu 2 hot: high 0,
> batch 1 used:0
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: cpu 2 cold: high 0,
> batch 1 used:0
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: cpu 3 hot: high 0,
> batch 1 used:0
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: cpu 3 cold: high 0,
> batch 1 used:0
> 2012 Sep  5 13:04:11 pgpool-1 [warning]: kernel: DMA32 per-cpu:
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: cpu 0 hot: high 186,
> batch 31 used:74
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: cpu 0 cold: high 62,
> batch 15 used:16
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: cpu 1 hot: high 186,
> batch 31 used:46
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: cpu 1 cold: high 62,
> batch 15 used:42
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: cpu 2 hot: high 186,
> batch 31 used:10
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: cpu 2 cold: high 62,
> batch 15 used:46
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: cpu 3 hot: high 186,
> batch 31 used:29
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: cpu 3 cold: high 62,
> batch 15 used:52
> 2012 Sep  5 13:04:11 pgpool-1 [warning]: kernel: Normal per-cpu: empty
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: HighMem per-cpu: empty
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: Free pages: 10124kB
> (0kB HighMem)
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: Active:3296
> inactive:947386 dirty:0 writeback:0 unstable:0 free:2531 slab:8265
> mapped-file:2016 mapped-anon:950927 pagetables:33607
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: DMA free:2040kB
> min:16kB low:20kB high:24kB active:0kB inactive:0kB present:9076kB
> pages_scanned:0 all_unreclaimable? yes
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: lowmem_reserve[]: 0
> 4024 4024 4024
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: DMA32 free:8084kB
> min:8108kB low:10132kB high:12160kB active:13184kB inactive:3789544kB
> present:4120800kB pages_scanned:11144945 all_unreclaimable? yes
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: lowmem_reserve[]: 0 0
> 0 0
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: Normal free:0kB
> min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB
> pages_scanned:0 all_unreclaimable? no
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: lowmem_reserve[]: 0 0
> 0 0
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: HighMem free:0kB
> min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB
> pages_scanned:0 all_unreclaimable? no
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: lowmem_reserve[]: 0 0
> 0 0
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: DMA: 0*4kB 1*8kB
> 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB
> 0*4096kB = 2040kB
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: DMA32: 17*4kB 4*8kB
> 3*16kB 0*32kB 0*64kB 2*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB
> 1*4096kB = 8084kB
> 2012 Sep  5 13:04:11 pgpool-1 [warning]: kernel: Normal: empty
> 2012 Sep  5 13:04:11 pgpool-1 [warning]: kernel: HighMem: empty
> 2012 Sep  5 13:04:11 pgpool-1 [warning]: kernel: 26511 pagecache pages
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: Swap cache: add
> 3128627, delete 3120367, find 1948467/2124994, race 35+122
> 2012 Sep  5 13:04:11 pgpool-1 [warning]: kernel: Free swap  = 0kB
> 2012 Sep 5 13:04:11 pgpool-1 [warning]: kernel: Total swap = 1048568kB
> 2012 Sep 5 13:04:11 pgpool-1 [info]: kernel: Free swap: 0kB
> 2012 Sep  5 13:04:11 pgpool-1 [info]: kernel: 1048576 pages of RAM
> 2012 Sep  5 13:04:11 pgpool-1 [info]: kernel: 23824 reserved pages
> 2012 Sep  5 13:04:11 pgpool-1 [info]: kernel: 159440 pages shared
> 2012 Sep  5 13:04:11 pgpool-1 [info]: kernel: 8394 pages swap cached
> 
> 
> 
> On 09/05/2012 12:24 PM, Sebastian Kornehl wrote:
>> It's a fresh set up just for playing around with pgpool, the Database
>> and pgpool have their own machines with no other services.
>>
>> -Sebastian
>>
>> On 09/05/2012 12:16 PM, Tatsuo Ishii wrote:
>>>> I changed this, actual I test this with 20 loops. This seems to hold,
>>>> but still uses lot of memory (and swap).
>>>>
>>>> Maybe I just misunderstand the config values, is
>>>> "memqcache_total_size" a per connection configuration or a total
>>>> value?
>>> No. Because it sits on shared memory, it is shared by all connections.
>>> Are there any servers running on the machine other than pgpool?
>>> -- 
>>> Tatsuo Ishii
>>> SRA OSS, Inc. Japan
>>> English: http://www.sraoss.co.jp/index_en.php
>>> Japanese: http://www.sraoss.co.jp
>>>
>>>> show pool_cache gives now:
>>>>
>>>>   num_cache_hits | num_selects | cache_hit_ratio | num_hash_entries |
>>>>   used_hash_entries | num_cache_entries | used_cache_enrties_size |
>>>>   free_cache_entries_size | fragment_cache_entries_size
>>>> ----------------+-------------+-----------------+------------------+-------------------+-------------------+-------------------------+----------------------
>>>> -------------------+-----------------------------
>>>>   137 | 17411 | 0.01 | 1048576 | 13026 | 13026 | 1512533 | 260121555
>>>>     | 299816
>>>>
>>>> With 20 selects per second (and 15 seconds until memqcache_expire) I
>>>> would say there should be ~300 entries in the cache and not 13026.
>>>>
>>>> With 31 selects per second 'top' looks like this:
>>>>
>>>> Mem: 4194304k total, 4180556k used, 13748k free, 1144k buffers
>>>> Swap: 1048568k total, 1048560k used, 8k free, 160984k cached
>>>>
>>>> Thanks,
>>>>
>>>> Sebastian
>>>>
>>>> On 09/05/2012 10:56 AM, Tatsuo Ishii wrote:
>>>>> Thanks, but I didn't find any evidence of actual memory leak.
>>>>> BTW,
>>>>>>>> memqcache_max_num_cache = 10
>>>>> looks way too small.
>>>>>
>>>>> Can you set to default and try again?
>>>>>
>>>>> memqcache_max_num_cache = 1000000
>>>>> -- 
>>>>> Tatsuo Ishii
>>>>> SRA OSS, Inc. Japan
>>>>> English: http://www.sraoss.co.jp/index_en.php
>>>>> Japanese: http://www.sraoss.co.jp
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I attached the log file.
>>>>>>
>>>>>> Thanks for your Help!
>>>>>>
>>>>>> Sebastian
>>>>>>
>>>>>> On 09/05/2012 04:23 AM, Tatsuo Ishii wrote:
>>>>>>> I couldn't reproduce your problem here.
>>>>>>> Can you try to run with valgrind to detect memory leak?
>>>>>>> Just in case how to use valgrind:
>>>>>>>
>>>>>>> 1) start pgpool by following command
>>>>>>>       valgrind --trace-children=yes --leak-check=full pgpool [your pgpool
>>>>>>>       options here...]
>>>>>>>       Note that you need to -n option of pgpool and redirect pgpool log to
>>>>>>>       somewhere.
>>>>>>>
>>>>>>> 2) stop pgpool.
>>>>>>>
>>>>>>> 3) send pgpool log.
>>>>>>> -- 
>>>>>>> Tatsuo Ishii
>>>>>>> SRA OSS, Inc. Japan
>>>>>>> English: http://www.sraoss.co.jp/index_en.php
>>>>>>> Japanese: http://www.sraoss.co.jp
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I'm testing the new query cache (pgpool-II version 3.2.0) on CentOS 5.
>>>>>>>>
>>>>>>>> System Specs:
>>>>>>>> 4 cores
>>>>>>>> 4 GB Ram
>>>>>>>>
>>>>>>>> shmmax= 2147483648 (=2 GB)
>>>>>>>>
>>>>>>>> When I start pgpool, my memory usage looks like this:
>>>>>>>>
>>>>>>>> Mem: 4194304k total, 660016k used, 3534288k free, 62116k buffers
>>>>>>>>
>>>>>>>> My test scenario is the following:
>>>>>>>>
>>>>>>>> # Config:
>>>>>>>>
>>>>>>>> memory_cache_enabled = on
>>>>>>>> memqcache_method = 'shmem'
>>>>>>>> memqcache_total_size =  262144000 #250MB
>>>>>>>> memqcache_max_num_cache = 10
>>>>>>>> memqcache_expire = 15
>>>>>>>> memqcache_auto_cache_invalidation = off
>>>>>>>> memqcache_maxcache = 409600 #400kb
>>>>>>>> memqcache_cache_block_size = 1048576 #1MB
>>>>>>>> memqcache_oiddir = '/var/log/pgpool/oiddir'
>>>>>>>>
>>>>>>>> Table: data
>>>>>>>> id integer pkey
>>>>>>>> value character varying(255)
>>>>>>>> [some more columns]
>>>>>>>>
>>>>>>>> SELECT value FROM data WHERE id =${randomNumber};
>>>>>>>>
>>>>>>>> so when I start the first
>>>>>>>> while true; do psql -c "foobar ..." ; sleep 1; done
>>>>>>>>
>>>>>>>> everything looks good. When I start 9 more loops (= 10 selects per
>>>>>>>> second), the used memory increases fast. After ~2 minutes it looks
>>>>>>>> like this:
>>>>>>>>
>>>>>>>> Mem: 4194304k total, 2439012k used, 1755292k free, 62588k buffers
>>>>>>>>
>>>>>>>> after 2 more minutes:
>>>>>>>>
>>>>>>>> Mem: 4194304k total, 4053796k used, 140508k free, 62752k buffers
>>>>>>>>
>>>>>>>> When I add more clients, the machine begins to swap and dies.
>>>>>>>>
>>>>>>>> show pool_cache gives:
>>>>>>>>
>>>>>>>>     num_cache_hits | num_selects | cache_hit_ratio | num_hash_entries |
>>>>>>>>     used_hash_entries | num_cache_entries | used_cache_enrties_size |
>>>>>>>>     free_cache_entries_size | fragment_cache_entries_size
>>>>>>>> ----------------+-------------+-----------------+------------------+-------------------+-------------------+-------------------------+----------------------
>>>>>>>> -------------------+-----------------------------
>>>>>>>>     0 | 1527 | 0.00 | 16 | 15 | 15 | 1746 | 262142254 | 0
>>>>>>>> (1 row)
>>>>>>>>
>>>>>>>> So my question:
>>>>>>>>
>>>>>>>> What I'm doing wrong? The plan is to connect more Frontends with 300+
>>>>>>>> statements per second.
>>>>>>>> When I turn memory_cache_enabled off the problem does not occur.
>>>>>>>>
>>>>>>>> Thanks in advance!
>>>>>>>>
>>>>>>>> Sebastian
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> pgpool-general mailing list
>>>>>>>> pgpool-general at pgpool.net
>>>>>>>> http://www.pgpool.net/mailman/listinfo/pgpool-general
>>
>> _______________________________________________
>> pgpool-general mailing list
>> pgpool-general at pgpool.net
>> http://www.pgpool.net/mailman/listinfo/pgpool-general
> 
> _______________________________________________
> pgpool-general mailing list
> pgpool-general at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-general
-------------- next part --------------
#!/usr/bin/env perl
use strict;
use warnings;
use Linux::Smaps;

@ARGV or die "usage: %0 [pid ...]";

printf "PID\tRSS\tSHARED\n";

for my $pid (@ARGV) {
    my $map = Linux::Smaps->new($pid);
    unless ($map) {
        warn $!;
        next;
    }

    printf
        "%d\t%d\t%d (%d%%)\n",
        $pid,
        $map->rss,
        $map->shared_dirty + $map->shared_clean,
        int((($map->shared_dirty + $map->shared_clean) / $map->rss) * 100)
}


More information about the pgpool-general mailing list