[Pgpool-general] Zimbra Detected SpamPgPool replication or Slony

Mon Nov 26 08:52:59 UTC 2007

Tatsuo Ishii wrote:
>> Tatsuo Ishii wrote:
>>     
>>>>> Can that situation occurs also with PgPool replication like:
>>>>> PgPool sends update of A in M .
>>>>> First request on A goes on M.
>>>>> Second request on A goes to S.
>>>>> PgPool sends update to S.
>>>>>       
>>>>>           
>>> This could happen too, though the window is very narrow comparing with
>>> Slony-I. As of pgpool-II 2.0, any query is executed as an explicit
>>> transaction. Let's say you have a master and two slaves and you send an
>>> update query to pgpool-II.
>>>
>>>   
>>>       
>> Thanks, it was almost a theoretical question - but if it can happen, it 
>> will happen in Production the day you are not expecting it ;)
>>     
>>> Also note that in pgpool, if you send an update then a select to
>>> retrieve the result in a *same session*, it's guaranteed that the
>>> select returns the result of update since the update and select will
>>> be executed on a same PostgreSQL node even if the load balanace option
>>> is enabled.
>>>   
>>>       
>> I had noticed that.
>>
>> Does the lock mechanism change anything to my question ?
>>
>> In fact I made the following tests with and without the lock:
>>
>> * pgpool distributing the load on 3 machines A,B,C defined in that order 
>> in the list.
>> * I create a table with an id as primary key and a string column.
>> *Through several psql connections via pgpool I update the DB and add 
>> some rows.
>> * Then I add manually in B server a line with id let's say 5. When then 
>> I reconnect the psql client and try to add a row with id 5, it fails as 
>> expected and detecting data inconsistency, it removes A.
>> * When I do not set any lock, I get in fact A with the new row with 
>> id=5, the second one with previous row and C with no row.
>> * When I use the lock A and C remain consistent with no new row while B 
>> has its previous row.
>>
>> My idea is to keep the DB consistency among the three databases (a kind 
>> of two-phase commit). And I just added that row to check the behavior on 
>> the three database when an error might occur on a given DB.
>> It is like with no-lock I can't guarantee the DB consistency and that 
>> with lock I can.
>> Am I right ?
>> And with the lock, can the update be "accessed" on all the tables only 
>> after all the DB have comited their update ?
>>
>> Thanks for your answer and also thank you for PgPool which is a very 
>> nice tool.
>>     
>
> I think you could avoid the window by using a strong lock.
>
> BEGIN;
> LOCK t1;
> UPDATE t1;
> SELECT...
> COMMIT;
>
> With this, any transaction will not see the effect of UPDATE until all
> servers commit the transaction.
>
> Of course this will sacrifice concurrency though.
>
> BTW I don't think 2PC helps you in this regard.
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
>   
Thanks again.
What do you mean when you say 2PC can't help me ?

In fact my general purpose is to study how to improve scalability and 
availability of the DB server as probably many users of pgpool.
Availability could be enforced thanks to pgpool-ha if I am right.
Concerning the scalability we have a database that could grow 
significantly (if the project has success of course) and we want to 
anticipate this kind of concerns.
There are few writes but many reads so the idea to have a replication on 
several DB and dispatch the select among the DB. It seems that the 
solution with pgpool to replicate data among DB is suitable (if we can 
ensure the kind of 2-phase commit among all the updates).
I am also currently studying the way to split some tables among separate 
DB with partitioning but I need to study especially the way it works 
with foreign keys.