[pgpool-general: 7728] Re: Out of memory

Tatsuo Ishii ishii at sraoss.co.jp
Wed Sep 29 09:10:22 JST 2021


Ok. Let's assume for now that 4.1.0 introduced the problem (it is still
possible that a later 4.1.x minor release introduced it, but I will set
that possibility aside for now).

4.1.0 includes the following major changes that are particularly relevant to memory:

(1) A shared relation cache allows the relation cache to be reused among
    sessions, reducing internal queries against the PostgreSQL system
    catalogs.

    (This feature uses shared memory even if memory_cache_enabled = off.)
    
(2) A separate SQL parser for DML statements eliminates unnecessary
    parsing effort.

    (This allocates additional process memory for certain queries.)

For (1), we can disable the feature by setting:

enable_shared_relcache = off

Can you try it out?
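
A minimal sketch of what I mean, in case it helps (the config path and the
way you start/stop pgpool are assumptions here, so adjust them to your
installation; I believe a restart rather than a reload is needed, since the
shared memory involved is allocated at pgpool startup):

# in pgpool.conf
enable_shared_relcache = off

# restart pgpool (assumed path; use your service manager if you have one)
pgpool -f /etc/pgpool-II/pgpool.conf -m fast stop
pgpool -f /etc/pgpool-II/pgpool.conf

# then repeat the earlier measurement and compare total RSS (in kB)
ps aux | grep '[p]gpool' | awk '{sum += $6} END {print sum}'

If memory stays flat with the feature disabled, that points at (1); if it
still grows, we should look more closely at (2).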

> I did those tests:
> 4.0.15 runs fine; when I upgraded to 4.1.8, it started to consume a
> lot of memory.
> 
> On Tue, Sep 28, 2021 at 8:30 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> 
>> Ok. 4.2.1 consumes 10 times as much memory as 3.7.3 (perhaps 5 times
>> rather than 10, because in your test 4.2.1 had twice as many connections
>> as 3.7.3).
>>
>> The next thing I need to know is from which version pgpool started to
>> consume a lot of memory. Is it possible to run the same test against
>> 4.0 and 4.1? Knowing the exact version, including the minor version,
>> would be very helpful. I wish I could run the test myself, but I don't
>> know how to do it in my environment.
>>
>> > Sure.
>> >
>> > On 3.7.3:
>> > $ ps aux | grep pgpool | awk '{sum += $6}END{print sum}'
>> > 896752
>> >
>> > On 4.2.1:
>> > $ ps aux | grep pgpool | awk '{sum += $6}END{print sum}'
>> > 9969280
>> >
>> >
>> > On Mon, Sep 27, 2021 at 11:41 PM Tatsuo Ishii <ishii at sraoss.co.jp>
>> wrote:
>> >
>> >> Ok, let's see how much RSS pgpool consumes in total. Can you share the
>> >> result of this command for both pgpool 4.2 and 3.7?
>> >>
>> >> ps aux | grep pgpool | awk '{sum += $6}END{print sum}'
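
By the way, the grep in that pipeline counts itself as well, though that is
only a few hundred kB, so it barely changes the total. If you also want to
see whether the growth is spread over all children or concentrated in a few
processes, a per-process view helps; just a sketch, assuming GNU ps:

ps aux | grep '[p]gpool' | awk '{sum += $6} END {print sum}'   # total RSS without counting the grep
ps -C pgpool -o pid,rss,etime,args --sort=-rss | head -20      # largest pgpool processes first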
>> >>
>> >> > I see.
>> >> >
>> >> > The problem is that the OOM happens when all RAM has already been
>> >> > taken by pgpool (all 16 GB of RAM is dedicated to pgpool), so any
>> >> > ordinary request will die anyway.
>> >> >
>> >> > It happens on versions 4.2.5 and 4.2.1. Right now I'm running the same
>> >> > application on 3.7.3 and it's running great.
>> >> >
>> >> > I don't know if it helps, but just to demonstrate how much RAM is
>> >> > needed to run a few connections, I created another VM for comparison:
>> >> >
>> >> > 62 active connections running on 4.2.1 (same behaviour on 4.2.5):
>> >> > $ vmstat -a -SM
>> >> > procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
>> >> >  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>> >> >  0  0      0   4829    173  10772    0    0     2     1   68   56  1  1 99  0  0
>> >> >
>> >> > 31 active connections running on 3.7.3:
>> >> > $ vmstat -a -SM
>> >> > procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
>> >> >  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>> >> >  0  0      0  15174     73    635    0    0     1     1   76    3  0  0 100  0  0
>> >> >
>> >> > Both are running basically the same application.
>> >> >
>> >> > Judging by the RAM consumption of 3.7.3, it does not seem to be
>> >> > receiving such big requests. Is there a way to track this down?
>> >> >
>> >> >
>> >> >
>> >> > On Mon, Sep 27, 2021 at 9:07 AM Tatsuo Ishii <ishii at sraoss.co.jp>
>> wrote:
>> >> >
>> >> >> At this point I cannot judge whether the problem is caused by a
>> >> >> pgpool bug or by a client resource request that is simply too large.
>> >> >>
>> >> >> A typical memory-allocation bug does not look like this, because a
>> >> >> request of 33947648 bytes (about 32MB) is not in itself insane.
>> >> >>
>> >> >> > Sep 24 09:14:10 pgpool pgpool[12650]: [426-2] 2021-09-24 09:14:10: pid 12650: DETAIL:  Failed on request of size 33947648.
>> >> >>
>> >> >> (Please let us know what version of Pgpool-II you are using, because
>> >> >> that is important information for identifying any known bug.)
>> >> >>
>> >> >> In the meantime, however, a 32MB memory request is not very common in
>> >> >> pgpool. One thing I wonder is whether your application issues SQL that
>> >> >> requires a lot of memory, e.g. very long SQL statements or COPY for a
>> >> >> large data set. These will request large read/write buffers in pgpool.
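
If it helps to check that, one temporary way is to turn on statement logging
in pgpool so you can see what arrives just before a failed allocation. This
is only a sketch and it can generate a lot of log output on a busy system:

log_statement = on            # log every SQL statement pgpool receives
log_per_node_statement = on   # also log the statement sent to each backend node

Turn these back off once you have captured a failing period.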
>> >> >>
>> >> >> > We saw both, but pgpool aborting is way more common:
>> >> >> > Sep 24 09:14:10 pgpool pgpool[12650]: [426-1] 2021-09-24 09:14:10: pid 12650: ERROR:  out of memory
>> >> >> > Sep 24 09:14:10 pgpool pgpool[12650]: [426-2] 2021-09-24 09:14:10: pid 12650: DETAIL:  Failed on request of size 33947648.
>> >> >> > Sep 24 09:14:10 pgpool pgpool[12650]: [426-3] 2021-09-24 09:14:10: pid 12650: LOCATION:  mcxt.c:900
>> >> >> >
>> >> >> > Here are two other errors we saw in the logs, but each occurred only once:
>> >> >> > Sep 24 07:33:14 pgpool pgpool[5874]: [434-1] 2021-09-24 07:33:14: pid 5874: FATAL:  failed to fork a child
>> >> >> > Sep 24 07:33:14 pgpool pgpool[5874]: [434-2] 2021-09-24 07:33:14: pid 5874: DETAIL:  system call fork() failed with reason: Cannot allocate memory
>> >> >> > Sep 24 07:33:14 pgpool pgpool[5874]: [434-3] 2021-09-24 07:33:14: pid 5874: LOCATION:  pgpool_main.c:681
>> >> >> >
>> >> >> > And:
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691518] pgpool invoked
>> >> oom-killer:
>> >> >> > gfp_mask=0x24200ca, order=0, oom_score_adj=0
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691525] CPU: 1 PID: 1194
>> Comm:
>> >> >> > pgpool Not tainted 4.4.276 #1
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691527] Hardware name:
>> VMware,
>> >> >> Inc.
>> >> >> > VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00
>> >> >> > 12/12/2018
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691528]  0000000000000000
>> >> >> > ffff8803281cbae8 ffffffff81c930e7 ffff880420efe2c0
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691530]  ffff880420efe2c0
>> >> >> > ffff8803281cbb50 ffffffff81c8d9da ffff8803281cbb08
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691532]  ffffffff81133f1a
>> >> >> > ffff8803281cbb80 ffffffff81182eb0 ffff8800bba393c0
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691533] Call Trace:
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691540]
>> [<ffffffff81c930e7>]
>> >> >> > dump_stack+0x57/0x6d
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691542]
>> [<ffffffff81c8d9da>]
>> >> >> > dump_header.isra.9+0x54/0x1ae
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691547]
>> [<ffffffff81133f1a>] ?
>> >> >> > __delayacct_freepages_end+0x2a/0x30
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691553]
>> [<ffffffff81182eb0>] ?
>> >> >> > do_try_to_free_pages+0x350/0x3d0
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691556]
>> [<ffffffff811709f9>]
>> >> >> > oom_kill_process+0x209/0x3c0
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691558]
>> [<ffffffff81170eeb>]
>> >> >> > out_of_memory+0x2db/0x2f0
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691561]
>> [<ffffffff81176111>]
>> >> >> > __alloc_pages_nodemask+0xa81/0xae0
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691565]
>> [<ffffffff811ad2cd>]
>> >> >> > __read_swap_cache_async+0xdd/0x130
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691567]
>> [<ffffffff811ad337>]
>> >> >> > read_swap_cache_async+0x17/0x40
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691569]
>> [<ffffffff811ad455>]
>> >> >> > swapin_readahead+0xf5/0x190
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691571]
>> [<ffffffff8119ce3f>]
>> >> >> > handle_mm_fault+0xf3f/0x15e0
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691574]
>> [<ffffffff81c9d4e2>] ?
>> >> >> > __schedule+0x272/0x770
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691576]
>> [<ffffffff8104e241>]
>> >> >> > __do_page_fault+0x161/0x370
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691577]
>> [<ffffffff8104e49c>]
>> >> >> > do_page_fault+0xc/0x10
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691579]
>> [<ffffffff81ca3782>]
>> >> >> > page_fault+0x22/0x30
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691581] Mem-Info:
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691584] active_anon:3564689
>> >> >> > inactive_anon:445592 isolated_anon:0
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691584]  active_file:462
>> >> >> > inactive_file:44 isolated_file:0
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691584]  unevictable:0
>> dirty:2
>> >> >> > writeback:2212 unstable:0
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691584]
>> slab_reclaimable:3433
>> >> >> > slab_unreclaimable:5859
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691584]  mapped:989
>> shmem:2607
>> >> >> > pagetables:16367 bounce:0
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691584]  free:51773
>> >> free_pcp:189
>> >> >> > free_cma:0
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691589] DMA free:15904kB
>> >> min:128kB
>> >> >> > low:160kB high:192kB active_anon:0kB inactive_anon:0kB
>> active_file:0kB
>> >> >> > inactive_file:0kB unevictable:0kB isolated(anon):0kB
>> >> isolated(file):0kB
>> >> >> > present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB
>> >> >> > mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB
>> >> >> > kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB
>> free_pcp:0kB
>> >> >> > local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0
>> >> >> > all_unreclaimable? yes
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691590] lowmem_reserve[]: 0
>> >> 2960
>> >> >> > 15991 15991
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691596] DMA32 free:77080kB
>> >> >> > min:25000kB low:31248kB high:37500kB active_anon:2352676kB
>> >> >> > inactive_anon:590536kB active_file:472kB inactive_file:140kB
>> >> >> > unevictable:0kB isolated(anon):0kB isolated(file):0kB
>> >> present:3129216kB
>> >> >> > managed:3043556kB mlocked:0kB dirty:0kB writeback:4144kB
>> mapped:1140kB
>> >> >> > shmem:3568kB slab_reclaimable:1028kB slab_unreclaimable:3004kB
>> >> >> > kernel_stack:816kB pagetables:10988kB unstable:0kB bounce:0kB
>> >> >> > free_pcp:312kB local_pcp:196kB free_cma:0kB writeback_tmp:0kB
>> >> >> > pages_scanned:4292 all_unreclaimable? yes
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691597] lowmem_reserve[]: 0
>> 0
>> >> >> 13031
>> >> >> > 13031
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691601] Normal free:114108kB
>> >> >> > min:110032kB low:137540kB high:165048kB active_anon:11906080kB
>> >> >> > inactive_anon:1191832kB active_file:1376kB inactive_file:36kB
>> >> >> > unevictable:0kB isolated(anon):0kB isolated(file):0kB
>> >> present:13631488kB
>> >> >> > managed:13343784kB mlocked:0kB dirty:8kB writeback:4704kB
>> >> mapped:2816kB
>> >> >> > shmem:6860kB slab_reclaimable:12704kB slab_unreclaimable:20432kB
>> >> >> > kernel_stack:4848kB pagetables:54480kB unstable:0kB bounce:0kB
>> >> >> > free_pcp:444kB local_pcp:196kB free_cma:0kB writeback_tmp:0kB
>> >> >> > pages_scanned:105664 all_unreclaimable? yes
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691602] lowmem_reserve[]: 0
>> 0
>> >> 0 0
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691603] DMA: 0*4kB 0*8kB
>> 0*16kB
>> >> >> > 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U)
>> >> >> 1*2048kB
>> >> >> > (U) 3*4096kB (M) = 15904kB
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691610] DMA32: 48*4kB (ME)
>> >> 63*8kB
>> >> >> > (ME) 62*16kB (E) 46*32kB (UME) 35*64kB (UME) 26*128kB (UME)
>> 25*256kB
>> >> >> (UME)
>> >> >> > 61*512kB (UME) 28*1024kB (UME) 1*2048kB (M) 0*4096kB = 77080kB
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691616] Normal: 165*4kB
>> (MEH)
>> >> >> > 317*8kB (UMEH) 442*16kB (UMEH) 289*32kB (UMEH) 162*64kB (UMEH)
>> >> 121*128kB
>> >> >> > (UMEH) 74*256kB (UMEH) 33*512kB (UMEH) 24*1024kB (ME) 0*2048kB
>> >> 2*4096kB
>> >> >> (M)
>> >> >> > = 113980kB
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691623] 5552 total pagecache
>> >> pages
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691624] 2355 pages in swap
>> >> cache
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691625] Swap cache stats:
>> add
>> >> >> > 5385308, delete 5382953, find 1159094/1325033
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691626] Free swap  = 0kB
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691626] Total swap =
>> 4194300kB
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691627] 4194174 pages RAM
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691628] 0 pages
>> >> >> HighMem/MovableOnly
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691628] 93363 pages reserved
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691989] Out of memory: Kill
>> >> >> process
>> >> >> > 8975 (pgpool) score 7 or sacrifice child
>> >> >> > Sep 23 17:07:40 pgpool kernel: [157160.691995] Killed process 8975
>> >> >> (pgpool)
>> >> >> > total-vm:337504kB, anon-rss:166824kB, file-rss:1920kB
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > On Mon, Sep 27, 2021 at 1:22 AM Tatsuo Ishii <ishii at sraoss.co.jp>
>> >> wrote:
>> >> >> >
>> >> >> >> Hi,
>> >> >> >>
>> >> >> >> > Hello,
>> >> >> >> >
>> >> >> >> > Our pgpool is consuming A LOT of memory and frequently dies with:
>> >> >> >> > Out of memory error.
>> >> >> >> >
>> >> >> >> > We have 2 backends, 1 master and 1 slave. Here is some of the config:
>> >> >> >> > num_init_children = 150
>> >> >> >> > max_pool = 1
>> >> >> >> > child_life_time = 300
>> >> >> >> > child_max_connections = 1
>> >> >> >> > connection_life_time = 0
>> >> >> >> > client_idle_limit = 0
>> >> >> >> > connection_cache = on
>> >> >> >> > load_balance_mode = on
>> >> >> >> > memory_cache_enabled = off
>> >> >> >> >
>> >> >> >> > RAM: 16Gb
>> >> >> >> >
>> >> >> >> > Does anyone have a clue what's going on?
>> >> >> >> >
>> >> >> >> > Thank you.
>> >> >> >>
>> >> >> >> Is that the OOM killer, or did pgpool itself abort with an out of
>> >> >> >> memory error? If the latter, can you share the pgpool log?
>> >> >> >> --
>> >> >> >> Tatsuo Ishii
>> >> >> >> SRA OSS, Inc. Japan
>> >> >> >> English: http://www.sraoss.co.jp/index_en.php
>> >> >> >> Japanese:http://www.sraoss.co.jp
>> >> >> >>
>> >> >> >
>> >> >> >
>> >> >> > --
>> >> >> > Luiz Fernando Pasqual S. Souza
>> >> >> > mail: luiz at pasquall.com
>> >> >>
>> >> >
>> >> >
>> >> > --
>> >> > Luiz Fernando Pasqual S. Souza
>> >> > mail: luiz at pasquall.com
>> >>
>> >
>> >
>> > --
>> > Luiz Fernando Pasqual S. Souza
>> > mail: luiz at pasquall.com
>>
> 
> 
> -- 
> Luiz Fernando Pasqual S. Souza
> mail: luiz at pasquall.com

