[Pgpool-general] Using SELECT + nextval + load balance with pgpool

Richard Jones rj at last.fm
Mon Sep 11 12:33:56 UTC 2006


Hi,
I've been pondering the issue of master/master pgpool replication for a few 
days.

My first concern was queries with side effects. I patched pgpool to do a few 
simple checks, but it's far from foolproof. When detecting if a query has 
sideeffects it's ok to have false positives, so i check for the presence of 
nextval/setval or a "select" without a "from". If these exist, I made 
load_balance_enabled() in pool_process_query.c return 0 to force the query to 
be sent to both backends. This handles sequences and basic "select 
someFunc()" calls. 

I don't really know how to detect if an arbitrary function (or select with a 
trigger) will modify data, maybe you need tighter integration with postgres 
for that (would love to hear if/how this is feasible). 
It would be possible to put that logic into the applications that connect to 
pgpool in the meantime though. (eg: put a space in front of the query if you 
know it modifies data)

Then there's the issue of locking.. it's not feasible (at least for me) to 
serialize queries that modify data, as it more than doubles the time for 
every insert/update query. The only safe way i could imagine of doing this is 
for pgpool to use two-phase-commit for all data-modifying queries. Then you 
can fail atomically if one of the backends could not get  locks or fails for 
some reason.

So my requirements are something like:
1) use TPC over both backends to handle inserts/updates/deletes safely
2) detect queries that modify data, and don't load balance them
3) replace non-deterministic functions with constants in pgpool 

2) is hard, but could be the job of the application instead of pgpool i 
suppose. 1) is easier, but I don't have the requisite C skills to code it.
3) requires the ability to parse the SQL.. eg: looking to replace "random()" 
and "now()" with constants before the sql is sent to backends. 


I am interested to hear how other people are approaching this and what is 
being done.

Regards,
RJ

PS: in case anyone is interested, here is some code i added to 
pool_process_query.c to detect sequences: 
(i'm not a C programmer... )

/*
 * Checks if query needs to be broadcast to all backends
 * @returns non-zero if sending to all backends is required
 */
static int query_alters_data(const char *sql)
{
        // make a copy of sql and lowercase it
        char *lowersql = malloc(strlen(sql)+1);
        if(!lowersql) return 1; 
        lowersql = strcpy(lowersql, sql);
        char *p = NULL;
        for(p=lowersql; *p; p++) *p=tolower(*p);
        int ret = 0;
        if(!ret && strstr(lowersql, "nextval")) ret = 1;
        if(!ret && strstr(lowersql, "setval"))  ret = 1;
        if(!ret && strstr(lowersql, "select ") && !strstr(lowersql, " from "))  
ret = 1;
        free(lowersql);
        return ret;
}

and in load_balance_enabled() i added this:

if(query_alters_data(sql))
{
	pool_debug("query_alters_data: TRUE. send query to all backends.");
        return 0;
}










On Wednesday 06 September 2006 15:13, Tatsuo Ishii wrote:
> > Hello,
> >
> > On Wed, 2006-09-06 at 22:21 +0900, Tatsuo Ishii wrote:
> > > Any idea?
> >
> > What about teaching pgpool to ignore "SELECT nextval()" queries (and
> > also setval, of course) while load balancing?
>
> What if nextval() is in a user defined function? What if SELECT
> nextval() is placed in subselect? What do you do with user defined
> functions having side effects?
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> _______________________________________________
> Pgpool-general mailing list
> Pgpool-general at pgfoundry.org
> http://pgfoundry.org/mailman/listinfo/pgpool-general

-- 
Richard Jones
Last.fm Ltd. | http://www.last.fm/
Office: +44 (0) 207 7780 7087
Mobile: +44 (0) 7740 194 389


More information about the Pgpool-general mailing list