[Pgpool-general] SELECT behavior during Parallel Query mode

Thu Nov 11 06:44:33 UTC 2010

Hello,

I have been a pgpool-II user for almost a year and it has helped me a lot,
so thanks for all your work.

So far I have solved every problem that arose with your documentation, this
list's archive and Google searches, but today I realized that I may have
misunderstood some of pgpool-II's functions, so I really need help.

I have been using pgpool-II's Parallel Query mode, and I always thought that
it would split queries according to the distribution definition I created.
Today, while executing some queries on my experimental database cluster, I
realized that it only follows the distribution rules when inserting. Queries
that filter on the distribution column are not executed only on the
corresponding backend, but on every backend (even though the other backends
cannot hold any row that would satisfy the query).
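
To make the difference concrete, here is a minimal Python sketch of the two
behaviors. It is only a toy model, not pgpool-II code: the modulo rule in
dist_func, the three in-memory "backends" and all function names are
assumptions made up for illustration.

```python
from typing import Dict, List, Tuple

NUM_BACKENDS = 3  # hypothetical cluster size

Row = Dict[str, int]

# Each backend is modeled as a plain list of rows.
backends: List[List[Row]] = [[] for _ in range(NUM_BACKENDS)]


def dist_func(key: int) -> int:
    """Hypothetical distribution rule: map a key to a backend id."""
    return key % NUM_BACKENDS


def insert(row: Row) -> int:
    """INSERT follows the distribution rule and touches one backend."""
    target = dist_func(row["id"])
    backends[target].append(row)
    return target


def select_broadcast(key: int) -> Tuple[List[Row], List[int]]:
    """Observed behavior: ask every backend, then merge the results."""
    visited = list(range(NUM_BACKENDS))
    rows = [r for b in visited for r in backends[b] if r["id"] == key]
    return rows, visited


def select_routed(key: int) -> Tuple[List[Row], List[int]]:
    """Expected behavior: ask only the backend the rule points to."""
    target = dist_func(key)
    rows = [r for r in backends[target] if r["id"] == key]
    return rows, [target]


for i in range(6):
    insert({"id": i, "value": i * 10})

rows_b, visited_b = select_broadcast(4)
rows_r, visited_r = select_routed(4)
print(rows_b == rows_r, visited_b, visited_r)
```

Both lookups return the same row, but the broadcast version visits all three
backends while the routed version visits only the one selected by the rule.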

After this surprise, I searched the web for any reference to this and found
an e-mail sent to this list on Mon Sep 18 12:27:41 UTC 2006,
"[Pgpool-general] pgpool II first experiences"
<http://pgfoundry.org/pipermail/pgpool-general/2006-September/000465.html>,
where Mr. Tatsuo Ishii answers the same question regarding pgpool-II release
0.1:

The data partitioning rule is used only for INSERT. Other queries
including SELECT do not use it. So the behavior you are watching is
expected one.

I really don't know why I assumed that pgpool-II would do that (I guess I
wasn't the only one). A note in the official documentation where Parallel
Query mode is described would be really helpful, but this is just a
suggestion. The current description reads:

   - Parallel Query

   Using the parallel query function, data can be divided among the multiple
   servers, so that a query can be executed on all the servers concurrently to
   reduce the overall execution time. Parallel query works the best when
   searching large-scale data.

Well, after getting all of this off my chest, I would be really thankful if
you could help me with some questions:

Does pgpool-II really not support this kind of query awareness of where data
is located according to the distribution definitions, so that the behavior
described above is expected?

Are there any plans for pgpool-II to implement this feature?

Does anyone know of any solution (besides pgpool-II) that provides this
feature?

And finally, are there any pgpool-II design guidelines (like UML class
diagrams) available to give a better understanding of the code, so that I
may try to modify it to support this feature?

Thanks a lot for any help and for this wonderful software.

Diego Pereira
Universidade Federal do Estado do Rio de Janeiro (UNIRIO)
Rio de Janeiro, RJ - Brazil