[Pgpool-general] Replication Mode + Parallel Query Execution

Thu Jul 10 01:52:38 UTC 2008

Anyone? Please? By the way, is there like a FAQ somewhere so that I could
avoid asking basic questions like this. For example, I have another
question about replicate_def. How does it really work and what is it for?
I was able to make data partitioning dist_def work (and later found out
that not all nodes have contain the same data). Is my English
comprehension really that bad or am I just relying too much on the
pgpool-II manual? (Believe me, I tried reading it so many many many times
but no, I'm never going to give up until I have this implemented because I
strongly believe in its potential and we need it for speed on querying
millions of data). Thank you very much for your help. I hope I could get a
reply this time. :)

> Hello!
>
> In our company, we have approximately 144+ million records and are stored
> in multiple databases using MySQL InnoDB engine. We are planning to
> migrate them to PostgreSQL and found out about Pgpool-II. I am currently
> testing the software (pgpool-II 2.1 beta2). I have successfully finished
> setting up Replication Mode (1 Pgpool-II Server and 3 backend nodes).
>
> The MySQL server is a single Sales Report server wherein a lot of queries
> are done everyday specially SELECT queries. INSERTS/UPDATES I believe, are
> done after midnight. We are planning to do the write queries
> (INSERT/UPDATE/etc.) realtime and at the same time, take lots of SELECT
> queries.
>
> I compared the time taken when issuing a simple "SELECT COUNT(*) FROM
> table1" on the MySQL server against the time taken on doing the same
> through the Pgpool-II Server and directly to one of the backends. Here's
> the result:
>
> * Pgpool-II Server with Replication on 3 Backends:	31.34mins
> * MySQL Standalone Server:				20.56mins
> * PostgreSQL Pgpool-II Directly to Backend:		14.28mins
>
> The time to beat is MySQL's 20.56mins. It seems that connecting directly
> to one of the backend servers is faster than going through the Pgpool-II
> server when doing a SELECT query (maybe because of overhead?). I'm not
> sure if I read/comprehended it right from the manual, that Replication
> Mode performs best on multiple simultaneous connections. (Probably a
> server with lots of connections then balances the load?) Another good
> thing about Replication Mode is that, as I understand it, creates a backup
> of the whole database/s to each and everyone of the backend servers so
> when a failure occurs on one the the backends, service continues.(But what
> if the main Pgpool-II Server fails?)
>
> I also found out about Parallel Query Execution which is, I believe, the
> right mode for our target. Our primary goal is to minimize the time taken
> when doing SELECT queries and strip down the MySQL's 20.56min time limit.
> We designed a simple setup to utilize both Replication and Parallel Modes:
>
>                         Pgpool-II Server (Replication Mode)
>                                |
>                         _______|________
>                         |              |
>                     Backend0       Backend1
>                 (Parallel Mode) (Parallel Mode)
>                   ______|______  ______|______
>                   |     |      | |     |      |
>                  B0    B1     B2 B0    B1     B2   <-- Backends for
> Parallel
>
> Question #1: Will this setup work?
> Question #2: Is this the best path to go for minimizing SELECT query
> times?
> Question #3: Any other suggestions please?
>
> Thank you for your time and help.
>
> Best regards,
> Viril L. Calimlim
>
> _______________________________________________
> Pgpool-general mailing list
> Pgpool-general at pgfoundry.org
> http://pgfoundry.org/mailman/listinfo/pgpool-general
>
>