[Pgpool-hackers] Transforming SQL to prepared statements inside pgpool-II

Wed May 18 07:01:13 UTC 2011

A quick update about transforming simple queries to extended protocol 
queries inside pgpool-II.

I managed to do static transformation of a query (select 1 from foo join 
bar on bar.id = foo.id where xyzzy = 2;) inside pgpool. Transforming the 
query inside pgpool isn't that hard, there is just a few extended 
protocol message types which needs to be handled. The performance is as 
expected - you do get rid of planning overhead. Also, the data returned 
by pgpool to the frontend is identical when passing the simple query and 
when transforming to extended protocol query, so there should be no 
problem in that part.

Now, there are two hard parts, as far as I can see.

The first one is parsing the query and matching it against the potential 
prepare targets. For this, I would have to create a parse tree walker 
that can handle any node possible in select statements. And that is 
quite a list. This should be doable, but requires a lot of work.

The second problem is handling prepared statement state across 
connections. For this feature to be useful (at least for my use case) I 
would need to persist the prepared statements for each backend 
connection in the pool. Typical use for me would be a web application 
which makes a lot of short connections. So if the transformation to 
prepared statements is done per frontend connection, there is not much 
to gain. On the other hand, if prepare is done just once for backend 
connection, then the problem is how to handle "deallocate all" and 
"discard all". Trying to bind to non-existing prepared statement will 
result in aborted transaction, so "bind and if fail prepare" is not 
possible.

If the plans are kept for the duration of the backend connection, then a 
frontend using prepared statements could have problems. At connection 
start there are prepared statements from old connections still hanging 
around. And as said, if not kept, there is not much gain from this 
feature. Also, if the frontend manually deallocates all plans (possibly 
inside a plpgsql function), then we are screwed. There is no cheap way 
to check if a given prepared statement exists if I am not mistaken. A 
potential simple solution for this is to document that "deallocate all" 
and "discard all" should not be used when using this feature and using 
other than unnamed prepared statements in frontend connections can lead 
to name clashes.

I am not exactly sure why use of extended query protocol is wanted. The 
other option is to use "prepare _pgpool_plan_x as (query);" and "execute 
_pgpool_plan_x;" using simple query protocol. This would be a bit easier 
to do and allow for more use cases. For example function arguments are 
not allowed in extended query protocol, but "execute 
fooplan(some_func())" is allowed.

Unfortunately, at the moment I do not have too much time to devote to 
this. I will get back to this when I have time. This could be an useful 
feature, and at least I can learn a lot while hacking.

  - Anssi