[Pgpool-general] Parallel Querying
James Farrugia
james.farrugia at gmail.com
Fri May 23 10:57:31 UTC 2008
Hi all,
I'm a total Pgpool-II beginner having fiddled around with it only for the
past few days. I'd like to settle with some of the more seasoned users a
few doubts/queries but before that let me descibe in brief my use case.
We have two database, each hosting having several months' data, each month
contained in a shard/child table, inheriting a master table (postgresql
partitioning)
The data is horizontally partitioned over two nodes, such that even months
are found on Node 1 and odd months are found on Node 2. However the last 3
months (including the current) are available on both Node 1 and 2; in case
of h/w failure or other problems on any one of the nodes, the last 3 months'
worth of data will be available on the other. (We can't afford full database
replication because of space constraints).
At first I experimented with Plproxy but i found the need to wrap every
query in a function a bit cumbersome. Finally I came across pgpool-II which
with its support for paralled querying, seemed like an ideal choice.
I installed the latest stable release pgpool-II-2.0.1 on a postgresql 8.2.1
database server.
I set Pgpool.conf to run in parallel mode. Since my databases are not
replicated, the replication and load balancing features are switched off.
Then I created the pgpool database together with dist_def, replicate_def and
query_cache. I configured dist_def to cater for one partitioned table
complete with function which resolves the database node according a
timestamp field.
Now for the questions:
1) SELECTs are not selectively routed to a database node according to the
timestamp; they are run on both nodes. Is this correct?
2) Since some data is replicated on both nodes, in some cases I get
duplicate records. How that can be avoided?
3) The distinct SQL command results in the query not being rewritten
correctly:
Ex: select distinct on (id) * from my_test;
ERROR: syntax error at or near ")"
The query is represented as:
SELECT DISTINCT ON (pool_t$0.pool_c$0) pool_c$0 AS id,pool_c$1 AS txstamp
FROM dblink('host=theia.go.com.mt dbname=CDR port=9999
user=postgres','SELECT pool_parallel("SELECT FROM my_test")') AS pool_t$0g
()
in pgpool's log file.
Thank you in advance for your help.
James
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://pgfoundry.org/pipermail/pgpool-general/attachments/20080523/25e94d01/attachment.html
More information about the Pgpool-general
mailing list