[pgpool-hackers: 3653] Re: Inconsistency in native replication mode

Thu Jun 18 09:18:26 JST 2020

Here's a work-in-progress patch to introduce the Pangea algorythm [1].
I deicded to add new clustering mode called "snapshot isolation mode"
for this. What I have done are:

- Implement Pangea algorythm for r/w transactions.
- Create docs for the new clustering mode.

Please note that to use the new clustering mode, PostgreSQL's default
transaction isolation mode must be "REPEATABLE READ" (not READ
COMMITTED nor SERIALIZABLE). At this point I tend to think that the
new feature should be marked as "experimental" since I am not
convinced that the new mode does not have any defect at the when 4.2
is released.

BTW, I have added new section "Bibliography" to give a credit to
Pangea. However it seems the rendering of it is different from what
PostgreSQL has. Can someone please help?

"Takeshi Mishima and Hiroshi Nakamura, "Pangea: An Eager Database
Replication Middleware guaranteeing Snapshot Isolation without
modification of Database Servers", Proc. VLDB Conference, Aug. 2009."

I was expecting something like this:

"[mishima2009] Pangea: An Eager Database Replication Middleware
guaranteeing Snapshot Isolation without modification of Database
Servers", "Takeshi Mishima and Hiroshi Nakamura, Proc. VLDB
Conference, Aug. 2009."

What I have not done are:

(1) Check if the transaction is read only and if so, then load balance
    it.
(2) regression test

(3) How to deal with SERIAL/sequence.

(4) Deal with extended query protocol

For (1), maybe we could have following rules:

- If it's a read only SELECT and it's not inside an explicit
  transaction, we can load balance it.

- If it's a read only SELECT and it's inside an explicit transaction,
  we can not load balance it.

The latter is different from native replication mode.

For (2) maybe we need new testing framework to give a control the
timing to give queries to backend. If you have an idea, please let me
know.

For (3) I have no idea. Please help.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

> Hi,
> 
> I have studied $subject issue. Suppose we have two sessions S1, S2 and
> two servers N1 (master), N2 (slave). S1 updates table t1 in an
> explicit transaction. S2 read t1. Below is a demonstration how we find
> an inconsistency with native replication mode.  "S1:N1 SQL statement"
> denotes an SQL statement is executed on node N1 in session S1.  Table
> t1 has an integer column i. There is only on row in t1 and the initial
> value is 0. Time is increasing from top to bottom.
> 
> S1/N1: BEGIN;
> S1/N2: BEGIN;
> S1/N1: UPDATE t1 SET i = i + 1;	-- i = 1
> S1/N2: UPDATE t1 SET i = i + 1; -- i = 1
> S1/N1: COMMIT;
> S2/N1: SELECT i FROM t1; -- i = 1
> S2/N2: SELECT i FROM t1; -- i = 0
> S1/N2: COMMIT;
> 
> So inconsistent values of i of t1 are observed in session S2.
> 
> To avoid the inconsistency, we can control the order of concurrent
> COMMIT and the first SQL in a transaction, which acquires the
> snapshot, so that they are never executed concurrently. (This is a
> proposed algorithm in [1]). Note that this requires transactions run
> in REPEATABLE READ (or SERIALIZABLE), not default READ COMMITTED.
> 
> Suppose we have a mutex lock M. To execute that first SQL in a
> transaction, Pgpool-II needs to acquire M before it. Also it needs to
> acquire M before issuing COMMIT. After hiring this we have:
> 
> S1/N1: BEGIN;
> S1/N2: BEGIN;
> S1/N1: UPDATE t1 SET i = i + 1;	-- i = 1
> S1/N2: UPDATE t1 SET i = i + 1;	-- i = 1
> S1: try to acquire M and succeeded
> S1/N1: COMMIT;
> S2: try to acquire M and failed
> S1/N2: COMMIT;
> S1: release M
> S2: wake up
> S2/N1: SELECT i FROM t1; -- i = 1
> S2/N2: SELECT i FROM t1; -- i = 1
> 
> or:
> 
> S1/N1: BEGIN;
> S1/N2: BEGIN;
> S1/N1: UPDATE t1 SET i = i + 1;	-- i = 1 in session S1
> S1/N2: UPDATE t1 SET i = i + 1;
> S1: try to acquire M and failed
> S2: try to acquire M and succeeded
> S2/N1: SELECT i FROM t1;	-- i = 0 in session S2
> S2/N2: SELECT i FROM t1;	-- i = 0 in session S2
> S2: release M
> S1: wake up
> S1/N1: COMMIT;
> S2/N1: COMMIT;
> 
> As you see, we can now avoid the inconsistency.
> 
> Also [1] claims that only read only transaction can distribute SELECT
> (load balance). Currently Pgpool-II load balance read only SELECT
> whenever possible. i.e. if SELECT is followed by write query, start
> transaction (if no transaction is started).
> 
> [1] http://www.vldb.org/pvldb/vol2/vldb09-694.pdf
> 
> Comments and suggestions are welcome.
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
> _______________________________________________
> pgpool-hackers mailing list
> pgpool-hackers at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-hackers
-------------- next part --------------
A non-text attachment was scrubbed...
Name: snapshot.diff
Type: text/x-patch
Size: 52897 bytes
Desc: not available
URL: <http://www.sraoss.jp/pipermail/pgpool-hackers/attachments/20200618/51571701/attachment-0001.bin>