[Pgpool-general] Replication Mode + Parallel Query Execution

Thu Jul 3 02:57:55 UTC 2008

Hello!

In our company, we have approximately 144+ million records and are stored
in multiple databases using MySQL InnoDB engine. We are planning to
migrate them to PostgreSQL and found out about Pgpool-II. I am currently
testing the software (pgpool-II 2.1 beta2). I have successfully finished
setting up Replication Mode (1 Pgpool-II Server and 3 backend nodes).

The MySQL server is a single Sales Report server wherein a lot of queries
are done everyday specially SELECT queries. INSERTS/UPDATES I believe, are
done after midnight. We are planning to do the write queries
(INSERT/UPDATE/etc.) realtime and at the same time, take lots of SELECT
queries.

I compared the time taken when issuing a simple "SELECT COUNT(*) FROM
table1" on the MySQL server against the time taken on doing the same
through the Pgpool-II Server and directly to one of the backends. Here's
the result:

* Pgpool-II Server with Replication on 3 Backends:	31.34mins
* MySQL Standalone Server:				20.56mins
* PostgreSQL Pgpool-II Directly to Backend:		14.28mins

The time to beat is MySQL's 20.56mins. It seems that connecting directly
to one of the backend servers is faster than going through the Pgpool-II
server when doing a SELECT query (maybe because of overhead?). I'm not
sure if I read/comprehended it right from the manual, that Replication
Mode performs best on multiple simultaneous connections. (Probably a
server with lots of connections then balances the load?) Another good
thing about Replication Mode is that, as I understand it, creates a backup
of the whole database/s to each and everyone of the backend servers so
when a failure occurs on one the the backends, service continues.(But what
if the main Pgpool-II Server fails?)

I also found out about Parallel Query Execution which is, I believe, the
right mode for our target. Our primary goal is to minimize the time taken
when doing SELECT queries and strip down the MySQL's 20.56min time limit.
We designed a simple setup to utilize both Replication and Parallel Modes:

                        Pgpool-II Server (Replication Mode)
                               |
                        _______|________
                        |              |
                    Backend0       Backend1
                (Parallel Mode) (Parallel Mode)
                  ______|______  ______|______
                  |     |      | |     |      |
                 B0    B1     B2 B0    B1     B2   <-- Backends for Parallel

Question #1: Will this setup work?
Question #2: Is this the best path to go for minimizing SELECT query times?
Question #3: Any other suggestions please?

Thank you for your time and help.

Best regards,
Viril L. Calimlim