[Pgpool-general] pgpool and slony
Christian Storm
christian.storm at gmail.com
Fri Jan 19 02:14:15 UTC 2007
I was wondering if anyone had submitted a patch to pgpool to make
certain SELECT queries 'sticky' to the master when operating in load
balancing mode, i.e., working
in concert with Slony?
In our current setup, our application keeps track of each user's
session. To overcome the data inconsistency issue, it will 'stick'
SELECT queries generated by a user
to the master for a certain amount of time after an updating
statement is made allowing Slony to do its thing. We'd love to black
box this by pushing this logic down further into a PG proxy
so that all our applications (PERL, C++, etc.) don't have to fool
with it.
This way of dealing with replication lag is non-ideal for a number of
reasons but it works > 99.999% of the time for our application and I
would suspect for many others. Problems can emerge from longer than
expected replication lags, e.g., caused by large UPDATEs or DELETEs,
or when data is being shared between users and can show up as
inconsistent during the replication lag time, e.g., only the user
making updates is stuck to the master while the other user is not.
I wanted to bounce this solution off the list....
Right now, if I understand it right, pgpool will 'stick' SELECT
statements to the master if they are in a transaction block.
Otherwise, they go
to the slave. As I'm sure you know, this can cause problems when
SELECT queries are run against the slave and the data hasn't arrived yet
due to the replication lag.
Could pgpool (I or II) be modified to except a token with each
query. In my case, this would be a session id or userid. When
pgpool gets an updating statement it would
store the token and timestamp in the cache. If any SELECTs come in
during the 'sticky' window (now - timestamp < sticky time window)
that are sent to the master. To deal
with unknown replication delays, this could possibly be improved by
having pgpool be Slony aware. In other words, pgpool would know
which transactions have been
replicated and which ones haven't been. So, for instance, if a
particular transaction for user x was taking a long time any SELECTs
tagged as coming from that same user
would be sent to the master. Or, perhaps more simply, pgpool could
be aware of Slony's current replication lag (sl_status or a direct
connection to Slony perhaps?) and
factor that into the 'sticky' time window calculation.
Is there any reason anyone can think of why this wouldn't work or be
a good idea? Has anyone else faced and dealt with this issue in
another way?
Christian
More information about the Pgpool-general
mailing list