[Pgpool-general] INSERT slow on one system not on another

Wed Mar 4 20:04:38 UTC 2009

Hi all,

Well, you know what, always question your assumptions...

I finally tracked it down. (I've been chasing this for over a week now - 
so not just something that popped up today).

There had been a rule added to the DB on the production system (yup - by 
me - I forgot :-( ) to do a replace instead of an insert - which of 
course requires a select first in postgres

Something like this:

     replace_table_A AS
     ON INSERT TO table_A
    WHERE (EXISTS ( SELECT 1
            FROM table_A
           WHERE table_A.abc ~~ new.abc::text AND table_A.def ~~ 
new.def::text AND table_A.my_type ~~ new.my_type::text)) DO INSTEAD 
UPDATE table_A SET pqr = new.pqr, stu = new.stu, deflink = new.deflink, 
my_type = new.my_type, ep_type = new.ep_type
   WHERE table_A.abc ~~ new.abc::text AND table_A.def ~~ new.def::text 
AND table_A.my_type ~~ new.my_type::text

Now, I'll need to remember why that rule got added in the 1st place... 
but ... hey - I think I can live without it for the moment in current 
clients.

Sorry to have bothered everyone, but thank you very much for the help. 
It jarred me in the direction of questioning my assumptions. Got there.

Kind regards

Derek.

Marcelo Martins wrote:
> No sure If i missed it but what pgpool-II version you are using ?
> 
> -
> Marcelo
> 
> On Mar 4, 2009, at 9:15, Derek Jones <scunacc at yahoo.com> wrote:
> 
>> Hi folks,
>>
>> I've been using pgpool II with Postgres 8.2.5 successfully for quite a
>> while in load-balanced mode on a cluster, but I've just noticed a 
>> problem.
>>
>> I have two identical DB setups on two different clusters.
>>
>> 4 nodes + one node for pgpool. Clients connect from other nodes.
>>
>> On one cluster, the INSERTs happen in reasonable time, such that several
>> hundred thousand inserts take place over a period of a few minutes
>> (other processing happening too).
>>
>> On the other, it might take, say, 12 pqr...
>>
>> It seems to get stuck in a LOCK TABLE waiting with an INSERT maxed out
>> on one CPU core. I have tried turned off locking because these tables
>> have no serial data type, and it doesn't make much difference, just Idle
>> waiting instead - it's the INSERT that's taking time, not a locking
>> issue per se.
>>
>> There is a difference in the way the systems (Linux) are configured, and
>> that is the only difference between them but I can't explain the long
>> INSERT times in spite of that.
>>
>> I am the cluster admin - I have a significant UNIX & Linux background as
>> a sysadmin, programmer, system configurator etc. and have set up these
>> clusters, so I have a reasonable idea of what's going on under the hood
>> in terms of locking semantics and actions, disk I/O, caching , etc. and
>> still can't reasonably explain this at the moment.
>>
>> OK. The difference:
>>
>> One cluster (the good one) has each node booting off of a local disk,
>> and the postgres partition (on the postgres backend nodes) mounted from
>> a local disk partition. The Linux kernel rev is: 2.6.22.9
>>
>> df
>>
>> Filesystem            Size  Used Avail Use% Mounted on
>> /dev/sda1             226G   45G  170G  21% /
>> ....
>>
>> df /postgres/
>>
>> Filesystem            Size  Used Avail Use% Mounted on
>> /dev/sda1             226G   45G  170G  21% /
>>
>> The other cluster (the bad one), has each node NFS root booting via PXE,
>> so, the root partition is NFS.
>>
>> df
>> Filesystem            Size  Used Avail Use% Mounted on
>> master2:/tftpboot/192.168.5.209
>>                        26G   23G  2.2G  92% /
>> master2:/usr           26G   23G  2.2G  92% /usr
>> master2:/home          26G   23G  2.2G  92% /home
>> .....
>> /dev/sda3              51G   41G  7.6G  85% /postgres
>>
>>
>> and is running kernel rev: 2.6.12-12 with some customizations for NFS
>> booting, high BW net connections, etc.
>>
>> The disk on the good cluster is SATA. The disk on the bad cluster is 
>> SCSI.
>>
>> Does anyone have any idea what might cause the one to be soooo slow?
>>
>> I am wondering if, because I am mounting /postgres in the 2nd example on
>> top of a mount point that is itself NFS-mounted, that I am actually
>> seeing some weird interaction with NFS happening to slow things down. I
>> don't see that in terms of disk or net activity tho'.
>>
>> On the postgres (bad) cluster nodes of course, lib, usr/lib etc. are all
>> NFS mounted as well as you see above, but unless there's something
>> really kooky happening about shared libs being endlessly pulled (which I
>> don't see) or continual NFS access that way, then I don't see the
>> connection.
>>
>> An strace shows lots of __llseek SEEK_END activity as the thing is 
>> "stuck"
>>
>> I've tried changing things like mount options, turning off atime
>> updates, mounting sync vs. async, turning off WAL updates with fsync in
>> postgres. Etc. Nothing changes.
>>
>> Given the 2 difference system configs I would understand if the NFS
>> setup on the bad one - NFS booting / root FS -  being the issue, but I
>> wonder if there's a pgpool / postgres config tweak I could make that
>> would fix this instead of having to completely reconfigure a cluster to
>> be locally booting (which, being it's a production cluster would be
>> pretty prohibitive).
>>
>> Any help much appreciated. (I have trawled lists and googled and come up
>> with nothing that seems focused on this issue).
>>
>> Kind regards
>>
>> Derek.
>>
>>
>> _______________________________________________
>> Pgpool-general mailing list
>> Pgpool-general at pgfoundry.org
>> http://pgfoundry.org/mailman/listinfo/pgpool-general
>