[pgpool-hackers: 3657] Re: Inconsistency in native replication mode

Tatsuo Ishii ishii at sraoss.co.jp
Fri Jun 19 14:06:03 JST 2020


I have committed/pushed this.

https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=22930861c4006d230361be3779a84da3d0d24b66

> Here's a work-in-progress patch to introduce the Pangea algorythm [1].
> I deicded to add new clustering mode called "snapshot isolation mode"
> for this. What I have done are:
> 
> - Implement Pangea algorythm for r/w transactions.
> - Create docs for the new clustering mode.
> 
> Please note that to use the new clustering mode, PostgreSQL's default
> transaction isolation mode must be "REPEATABLE READ" (not READ
> COMMITTED nor SERIALIZABLE). At this point I tend to think that the
> new feature should be marked as "experimental" since I am not
> convinced that the new mode does not have any defect at the when 4.2
> is released.
> 
> BTW, I have added new section "Bibliography" to give a credit to
> Pangea. However it seems the rendering of it is different from what
> PostgreSQL has. Can someone please help?
> 
> "Takeshi Mishima and Hiroshi Nakamura, "Pangea: An Eager Database
> Replication Middleware guaranteeing Snapshot Isolation without
> modification of Database Servers", Proc. VLDB Conference, Aug. 2009."
> 
> I was expecting something like this:
> 
> "[mishima2009] Pangea: An Eager Database Replication Middleware
> guaranteeing Snapshot Isolation without modification of Database
> Servers", "Takeshi Mishima and Hiroshi Nakamura, Proc. VLDB
> Conference, Aug. 2009."
> 
> What I have not done are:
> 
> (1) Check if the transaction is read only and if so, then load balance
>     it.
> (2) regression test
> 
> (3) How to deal with SERIAL/sequence.
> 
> (4) Deal with extended query protocol
> 
> For (1), maybe we could have following rules:
> 
> - If it's a read only SELECT and it's not inside an explicit
>   transaction, we can load balance it.
> 
> - If it's a read only SELECT and it's inside an explicit transaction,
>   we can not load balance it.
> 
> The latter is different from native replication mode.
> 
> For (2) maybe we need new testing framework to give a control the
> timing to give queries to backend. If you have an idea, please let me
> know.
> 
> For (3) I have no idea. Please help.
> 
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>   
>> Hi,
>> 
>> I have studied $subject issue. Suppose we have two sessions S1, S2 and
>> two servers N1 (master), N2 (slave). S1 updates table t1 in an
>> explicit transaction. S2 read t1. Below is a demonstration how we find
>> an inconsistency with native replication mode.  "S1:N1 SQL statement"
>> denotes an SQL statement is executed on node N1 in session S1.  Table
>> t1 has an integer column i. There is only on row in t1 and the initial
>> value is 0. Time is increasing from top to bottom.
>> 
>> S1/N1: BEGIN;
>> S1/N2: BEGIN;
>> S1/N1: UPDATE t1 SET i = i + 1;	-- i = 1
>> S1/N2: UPDATE t1 SET i = i + 1; -- i = 1
>> S1/N1: COMMIT;
>> S2/N1: SELECT i FROM t1; -- i = 1
>> S2/N2: SELECT i FROM t1; -- i = 0
>> S1/N2: COMMIT;
>> 
>> So inconsistent values of i of t1 are observed in session S2.
>> 
>> To avoid the inconsistency, we can control the order of concurrent
>> COMMIT and the first SQL in a transaction, which acquires the
>> snapshot, so that they are never executed concurrently. (This is a
>> proposed algorithm in [1]). Note that this requires transactions run
>> in REPEATABLE READ (or SERIALIZABLE), not default READ COMMITTED.
>> 
>> Suppose we have a mutex lock M. To execute that first SQL in a
>> transaction, Pgpool-II needs to acquire M before it. Also it needs to
>> acquire M before issuing COMMIT. After hiring this we have:
>> 
>> S1/N1: BEGIN;
>> S1/N2: BEGIN;
>> S1/N1: UPDATE t1 SET i = i + 1;	-- i = 1
>> S1/N2: UPDATE t1 SET i = i + 1;	-- i = 1
>> S1: try to acquire M and succeeded
>> S1/N1: COMMIT;
>> S2: try to acquire M and failed
>> S1/N2: COMMIT;
>> S1: release M
>> S2: wake up
>> S2/N1: SELECT i FROM t1; -- i = 1
>> S2/N2: SELECT i FROM t1; -- i = 1
>> 
>> or:
>> 
>> S1/N1: BEGIN;
>> S1/N2: BEGIN;
>> S1/N1: UPDATE t1 SET i = i + 1;	-- i = 1 in session S1
>> S1/N2: UPDATE t1 SET i = i + 1;
>> S1: try to acquire M and failed
>> S2: try to acquire M and succeeded
>> S2/N1: SELECT i FROM t1;	-- i = 0 in session S2
>> S2/N2: SELECT i FROM t1;	-- i = 0 in session S2
>> S2: release M
>> S1: wake up
>> S1/N1: COMMIT;
>> S2/N1: COMMIT;
>> 
>> As you see, we can now avoid the inconsistency.
>> 
>> Also [1] claims that only read only transaction can distribute SELECT
>> (load balance). Currently Pgpool-II load balance read only SELECT
>> whenever possible. i.e. if SELECT is followed by write query, start
>> transaction (if no transaction is started).
>> 
>> [1] http://www.vldb.org/pvldb/vol2/vldb09-694.pdf
>> 
>> Comments and suggestions are welcome.
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese:http://www.sraoss.co.jp
>> _______________________________________________
>> pgpool-hackers mailing list
>> pgpool-hackers at pgpool.net
>> http://www.pgpool.net/mailman/listinfo/pgpool-hackers


More information about the pgpool-hackers mailing list