[Pgpool-general] Parallel Querying

James Farrugia james.farrugia at gmail.com
Fri May 23 10:57:31 UTC 2008


Hi all,

I'm a total Pgpool-II beginner having fiddled around with it only for the
past few days.  I'd like to settle with some of the more seasoned users a
few doubts/queries but before that let me descibe in brief my use case.

We have two database, each hosting having several months' data, each month
contained in a shard/child table, inheriting a master table (postgresql
partitioning)

The data is horizontally partitioned over two nodes, such that even months
are found on Node 1 and odd months are found on Node 2.  However the last 3
months (including the current) are available on both Node 1 and 2; in case
of h/w failure or other problems on any one of the nodes, the last 3 months'
worth of data will be available on the other. (We can't afford full database
replication because of space constraints).

At first I experimented with Plproxy but i found the need to wrap every
query in a function a bit cumbersome.  Finally I came across pgpool-II which
with its support for paralled querying, seemed like an ideal choice.

I installed the latest stable release pgpool-II-2.0.1 on a postgresql 8.2.1
database server.
I set Pgpool.conf to run in parallel mode.  Since my databases are not
replicated, the replication and load balancing features are switched off.

Then I created the pgpool database together with dist_def, replicate_def and
query_cache.  I configured dist_def to cater for one partitioned table
complete with function which resolves the database node according a
timestamp field.

Now for the questions:
1) SELECTs are not selectively routed to a database node according to the
timestamp; they are run on both nodes.  Is this correct?

2) Since some data is replicated on both nodes, in some cases I get
duplicate records.  How that can be avoided?

3) The distinct SQL command results in the query not being rewritten
correctly:

Ex:  select distinct on (id) *  from my_test;
ERROR:  syntax error at or near ")"

The query is represented as:
SELECT DISTINCT ON (pool_t$0.pool_c$0) pool_c$0  AS id,pool_c$1  AS txstamp
FROM dblink('host=theia.go.com.mt dbname=CDR port=9999
user=postgres','SELECT pool_parallel("SELECT  FROM my_test")') AS pool_t$0g
()

in pgpool's log file.

Thank you in advance for your help.

James
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://pgfoundry.org/pipermail/pgpool-general/attachments/20080523/25e94d01/attachment.html 


More information about the Pgpool-general mailing list