[Pgpool-general] [SOLVED] pgpool detected difference of the number of inserted, updated or deleted tuples.

Andre Felipe Machado andremachado at techforce.com.br
Wed Dec 22 11:36:04 UTC 2010


Hello,
We have been struggling with errors like the syslog snippet below when heavily
loading the replication-mode pgpool and its backends:

Dec  17 13:43:13 Pgpool01 pgpool: ERROR: pid 22957: pgpool detected difference
of the number of inserted, updated or deleted tuples. Possible last query was:
"DELETE FROM phpgw_preferences WHERE preference_owner='-2' and preference_app <>
'jabberit_messenger'"

We isolated the problem with a simple test script, controlling the "concurrency
jitter" with a floating point argument to the sleep command:

#############################
#!/bin/bash
# AFM 20dec2010 testing concurrency jitter at backends
# each iteration sleeps serially, then launches psql in the background,
# so the concurrent clients start roughly "sleep" seconds apart
time for ((i=1; i <= 5; i++))
do
{
sleep 0.005;
psql -h pgpoolserver -p 5432 -o /tmp/psqloutput.txt -U "postgres" \
     -d "expresso-nuvens" --quiet < nuvem_queries_01_usuarios_transacao.sql &
}
done;
echo "script finished";

###############################

The SQL file contains statements captured from the pgpool log.
If we use a sleep above 0.230 (with our SQL test case and hardware), the problem
does not happen, because the "concurrent" SQL statements are far enough apart on
the timeline.
When we modified the SQL to turn it entirely into one single transaction
(instead of a few groups of transactions), the problem became even easier to see.






Then, after reading the PostgreSQL 8.3 docs, the pgpool docs and the list
archive thread [0], we realized that the default table lock mode taken by
inserts, updates and deletes is not strong enough for pgpool under heavy
concurrent load, given how pgpool replicates statements concurrently and how
the PostgreSQL backends execute them concurrently.

Actually, the problem is not exactly "heavy load" but the concurrency timeline
of the queries. Our test script showed this with **only 2** processes executing
the queries at the **same** time (tested at millisecond granularity).
In our real world application it shows up as load increases, appearing random
above some threshold level.
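
The underlying reason is PostgreSQL's table lock modes. Below is a minimal
sketch (illustrative sessions only; the DELETE is the one from our log) of the
difference between the default lock taken by writers and the explicit lock we
ended up using:

#######
-- Sketch: by default INSERT/UPDATE/DELETE take only a ROW EXCLUSIVE lock on
-- the table, which does not conflict with another session's ROW EXCLUSIVE,
-- so two writing sessions may be interleaved differently on each backend.
begin;
delete from phpgw_preferences
 where preference_owner = '-2'
   and preference_app <> 'jabberit_messenger';
-- only ROW EXCLUSIVE is held here; a concurrent writer is not blocked
commit;

-- SHARE ROW EXCLUSIVE conflicts with ROW EXCLUSIVE and with itself, so taking
-- it explicitly serializes all writers on this table until commit.
begin;
lock table phpgw_preferences in share row exclusive mode;
delete from phpgw_preferences
 where preference_owner = '-2'
   and preference_app <> 'jabberit_messenger';
commit;
#######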






When we transformed the test SQL into one big single transaction with a
different table lock mode on the troubled table, the error no longer appeared
(locking phpgw_access_log was overkill, but as it was also being updated, we
tried it):

########
begin;
lock table phpgw_access_log, phpgw_preferences in share row exclusive mode;

sql statements.....

commit;
discard all;

###############





After these simple tests, we modified our PHP application on the webservers to
include the following snippet wherever the troubled table is modified:

#######
begin;
lock table phpgw_preferences in share row exclusive mode;

......

commit;
##########

The first day of tests with our real application went well: not a single error,
even with the PostgreSQL backends at 100% CPU.
We are aware that this lock mode carries some write performance penalty, so it
should be used only where and when it is actually needed.
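
If needed, the lock actually held can be verified from a second psql session
while the locking transaction is still open (a sketch; pg_locks and pg_class
are the standard system views/catalogs):

#######
-- Sketch: run from another session while the locking transaction is open,
-- to confirm only the intended table holds SHARE ROW EXCLUSIVE.
select c.relname, l.mode, l.granted
  from pg_locks l
  join pg_class c on c.oid = l.relation
 where c.relname = 'phpgw_preferences';
#######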





Is there a better approach?
How should one proceed when it is not possible to modify the client application
(e.g. proprietary code)?

Is it possible for pgpool to rewrite queries to change the table lock mode?
Even for single statements not explicitly inside a begin/commit block (since
each statement is an atomic transaction by itself)?
What would be the consequences of that approach?


We are using an unreleased pgpool 2.3.4 from CVS, packaged in-house for Debian
5.x, with PostgreSQL 8.3.12 from the repository, but this issue should be
relevant to the current CVS head too.


Regards.
Andre Felipe Machado
visit my linux blog
http://www.techforce.com.br



[0] http://lists.pgfoundry.org/pipermail/pgpool-general/2010-November/003066.html


