[Pgpool-general] Replication Flawed?

Wed May 17 02:42:09 UTC 2006

> Hi,
> 
> I have been evaluating PgPool as part of a study of the available open-source HA solutions but believe I have come across a fundamental flaw in its replication features.
> 
> As part of the evaluation I conducted a test of PgPool on my local machine as follows:
> 
> PgPool listening on port 9999
> *Master - PostgreSQL (instance 1) listening on port 5432
> *Secondary - PostgreSQL (instance 2) listening on port 5433
> 
> With the replication features enabled as follows (pgpool.conf):
> 
>  replication_mode = true
>  replication_strict = true 
>  replication_stop_on_mismatch = true
>  replication_timeout = 5000
> 
> Now PgPool provides a two machine replication facility by executing a given query on both machines.  For replication to work under this manner, then all queries (from all connections) on machine1 must be executed in the same order on machine2.  I do not believe PgPool ensures this synchronisation, I shall demonstrate with a simple example:
> 
> Extract of 'test1.sql':
> 
>  insert into test values (10, 'test1');
>  insert into test values (11, 'test1');
>  insert into test values (12, 'test1');
>  insert into test values (13, 'test1');
>  ...
>  insert into test values (10009, 'test1');
> 
> Extract of 'test2.sql':
> 
>  delete from test;
>  delete from test;
>  delete from test;
>  delete from test;
>  ...10,000 times!
> 
> When I run test1.sql through one PgPool connection (port 9999 as above) and test2.sql through another PgPool connection, after both tests have completed there is a clear difference between Master and Secondary nodes on the data integrity (and order) e.g. rows in table test - Master != rows in table test on Secondary.  PgPool itself acknowledges this difference if you run the same test but substituting test2.sql with 'test3.sql':
> 
>  select * from test;
>  select * from test;
>  select * from test;
>  select * from test;
> 
>  where if replication_stop_on_mismatch is set to true, PgPool will fail (very quickly) with:
> 
>  'pool_process_query: kind does not match between backends master(D) secondary(C)'
> 
> Unless my tests are fundamentally flawed, this lack of data integrity makes the replication facilities unuseable.  It is possible I could ensure any application running atop of the DB synch'ed its connections to the DB but this would impose a serious performance penalty on the application (besides which replication should be transparent to the application anyway).
> 
> Please advise

These results are not very surprising. Since pgpool write to master
THEN secondary, there is a timing window between them. If you want to
retrieve(SELECT) exactly same results anytime from those servers while
many transactions are running currently, you need to explictly lock
the target table.

Same thing can be said to DELETE. DELETE needs to retrieve target rows
before deleting.

I don't think this restriction is applied only to pgpool.
--
Tatsuo Ishii
SRA OSS, Inc. Japan