[Pgpool-general] autovacuum stuck on a table for 18+ hours, consuming lots of CPU time

Tatsuo Ishii ishii at sraoss.co.jp
Wed Nov 23 02:16:34 UTC 2011


I think you'd better post to pgsql-general, since your question is
not related to pgpool.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Greetings,
> I'm running PostgreSQL 9.0.4 on a Linux x86_64 cluster with one master
> and two streaming replication slaves.  Since late yesterday, the load
> on the server has been noticeably higher (5.00+) than normal (generally
> under 1.00).  I investigated and found that for the past ~18 hours a
> single autovacuum process has been running without making any obvious
> progress:
> 
> select procpid,query_start,current_query from pg_stat_activity where
> current_query LIKE 'autovacuum%' ;
>  procpid |          query_start          |              current_query
> ---------+-------------------------------+------------------------------------------
>    30188 | 2011-11-21 22:42:26.426315-08 | autovacuum: VACUUM public.nppsmoketests
> (1 row)
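>
> As a cross-check, a query along these lines (assuming the 9.0-era
> pg_stat_activity columns procpid, xact_start and waiting) should show how
> long that backend has been in its transaction and whether it is blocked
> waiting on a lock:
>
> -- sketch: transaction age and lock-wait status of the autovacuum backend
> select procpid, xact_start, waiting
>   from pg_stat_activity
>  where procpid = 30188;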
> 
> select c.oid,c.relname,l.pid,l.mode,l.granted from pg_class c join
> pg_locks l on c.oid=l.relation where l.pid='30188' order by l.pid;
>   oid  |              relname               |  pid  |           mode           | granted
> -------+------------------------------------+-------+--------------------------+---------
>  72112 | nppsmoketests                      | 30188 | ShareUpdateExclusiveLock | t
>  72617 | nppsmoketests_pkey                 | 30188 | RowExclusiveLock         | t
>  72619 | nppsmoketests_bug_idx              | 30188 | RowExclusiveLock         | t
>  72620 | nppsmoketests_active_idx           | 30188 | RowExclusiveLock         | t
>  72621 | nppsmoketests_arch_idx             | 30188 | RowExclusiveLock         | t
>  72622 | nppsmoketests_branch_idx           | 30188 | RowExclusiveLock         | t
>  72623 | nppsmoketests_current_status_idx   | 30188 | RowExclusiveLock         | t
>  72624 | nppsmoketests_build_type_idx       | 30188 | RowExclusiveLock         | t
>  72625 | nppsmoketests_gpu_idx              | 30188 | RowExclusiveLock         | t
>  72626 | nppsmoketests_os_idx               | 30188 | RowExclusiveLock         | t
>  72627 | nppsmoketests_owner_idx            | 30188 | RowExclusiveLock         | t
>  72628 | nppsmoketests_regressioncl_idx     | 30188 | RowExclusiveLock         | t
>  72629 | nppsmoketests_subtest_idx          | 30188 | RowExclusiveLock         | t
>  72630 | nppsmoketests_suiteid_idx          | 30188 | RowExclusiveLock         | t
>  72631 | nppsmoketests_suiteid_testname_idx | 30188 | RowExclusiveLock         | t
>  72632 | nppsmoketests_testcl_idx           | 30188 | RowExclusiveLock         | t
>  72633 | nppsmoketests_testname_idx         | 30188 | RowExclusiveLock         | t
>  80749 | nppsmoketests_osversion_idx        | 30188 | RowExclusiveLock         | t
> (18 rows)
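>
> To get a rough sense of how much work this vacuum has in front of it,
> something like the following (pg_total_relation_size should include the
> 17 indexes listed above) reports the table's on-disk footprint:
>
> -- sketch: total on-disk size of the table plus its indexes
> select pg_size_pretty(pg_total_relation_size('public.nppsmoketests'));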
> 
> When I strace PID 30188, I see output like the following scrolling past
> quickly, but I'm not really sure what it means beyond the 'Timeout'
> results not looking good:
> select(0, NULL, NULL, NULL, {0, 32000}) = 0 (Timeout)
> lseek(95, 753901568, SEEK_SET)          = 753901568
> read(95, "\202\1\0\0\260\315\250\245\1\0\0\0\220\0\360\20\360\37\4\0\0\0\0p\237\0\1\360\236\0\1"..., 8192) = 8192
> lseek(95, 753917952, SEEK_SET)          = 753917952
> read(95, "\202\1\0\0 N\253\245\1\0\0\0\220\0\360\20\360\37\4\0\0\0\0p\237\0\1\360\236\0\1"..., 8192) = 8192
> select(0, NULL, NULL, NULL, {0, 32000}) = 0 (Timeout)
> lseek(95, 768606208, SEEK_SET)          = 768606208
> read(95, "\204\1\0\0h!~\233\1\0\0\0\230\0\360\20\360\37\4\0\0\0\0x\237\360\0\0\237\360\0"..., 8192) = 8192
> lseek(95, 753934336, SEEK_SET)          = 753934336
> read(95, "\202\1\0\0 &\275\245\1\0\0\0\220\0\360\20\360\37\4\0\0\0\0p\237\0\1\360\236\0\1"..., 8192) = 8192
> select(0, NULL, NULL, NULL, {0, 32000}) = 0 (Timeout)
> read(95, "\202\1\0\0\10\33\276\245\1\0\0\0\220\0\360\20\360\37\4\0\0\0\0p\237\0\1\360\236\0\1"..., 8192) = 8192
> lseek(95, 753958912, SEEK_SET)          = 753958912
> read(95, "\202\1\0\0x\317\307\245\1\0\0\0\220\0\360\20\360\37\4\0\0\0\0p\237\0\1\360\236\0\1"..., 8192) = 8192
> select(0, NULL, NULL, NULL, {0, 32000}) = 0 (Timeout)
> lseek(95, 768614400, SEEK_SET)          = 768614400
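>
> One possible reading of those repeated ~32 ms select() sleeps between the
> 8 kB reads is that the worker is being throttled by the cost-based vacuum
> delay rather than truly hung; if so, the relevant settings could be
> inspected with something like:
>
> -- sketch: current cost-based vacuum delay settings
> select name, setting, unit
>   from pg_settings
>  where name in ('vacuum_cost_delay', 'vacuum_cost_limit',
>                 'autovacuum_vacuum_cost_delay', 'autovacuum_vacuum_cost_limit');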
> 
> An old thread suggests that this might be a stuck spinlock:
> http://archives.postgresql.org/pgsql-performance/2009-05/msg00455.php
> 
> I'm using the defaults for all the *vacuum* options in
> postgresql.conf, except for:
> log_autovacuum_min_duration = 2500
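>
> To rule out per-table autovacuum overrides on this table, the storage
> parameters could be checked with something like:
>
> -- sketch: any per-table autovacuum settings would show up in reloptions
> select relname, reloptions
>   from pg_class
>  where relname = 'nppsmoketests';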
> 
> At this point, what can I safely do to either terminate this autovacuum
> process or kick it into making progress again?
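>
> One option that looks safe to me, though I'd appreciate confirmation, is
> asking the worker to stop with pg_cancel_backend(); my understanding is
> that this just abandons the current vacuum and lets a later autovacuum
> cycle pick the table up again:
>
> -- sketch: cancel the autovacuum worker identified above
> select pg_cancel_backend(30188);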
> 
> thanks
> _______________________________________________
> Pgpool-general mailing list
> Pgpool-general at pgfoundry.org
> http://pgfoundry.org/mailman/listinfo/pgpool-general

