[Pgpool-general] Parallel Query Mode - SELECT, Phppgadmin, Psql]

Tue Jul 15 10:52:03 UTC 2008

Hi.

> >> It's now working in pgpool-II 2.1 RC1. No more problems. And yes I use
> >> PHP
> >> PDO and AdoDB. My previous post has more details. So what's the
> >> replicate_def table and replicate_select for? Thanks again. :)
> >
> > replicate_def is to replicate data, not to distribute.
> >
> > ex) partitioning data
> >   node1: dist1 -> (1, 2, 3)  dist2 -> (4, 5, 6)
> >   node2: dist1 -> (4, 5, 6)  dist2 -> (7, 8, 9)
> >   node3: dist1 -> (7, 8, 9)  dist2 -> (10, 11, 12)
> >
> > When pgpool executes the follwing query, the query is not parallel
> > execution. Because we need to join dist1 table on node2 and dist2
> > table on node1.
> >
> >   SELECT * FROM dist1 INNER JOIN dist2 on dist1.a = dist2.a;
> >
> >
> > ex) partitioning data
> >   node1: dist1 -> (1, 2, 3)  rep1 -> (4, 5, 6, 7, 8, 9, 10, 11, 12)
> >   node2: dist1 -> (4, 5, 6)  rep1 -> (4, 5, 6, 7, 8, 9, 10, 11, 12)
> >   node3: dist1 -> (7, 8, 9)  rep1 -> (4, 5, 6, 7, 8, 9, 10, 11, 12)
> >
> > pgpool can execute the following query concurrently. Because rep1
> > table is the same on all nodes.
> >
> >   SELECT * FROM dist1 INNER JOIN rep1 ON dist1.a = rep1.a;
> >
> > I think small tables are defined in replicate_def.
> 
> If small tables are ideal in replicate_def then, it may not be applicable
> to our situation because here's the structure of the whole thing as
> planned:
> 
> database: loadbal
> schema: public
> tables: salesdetail
>         salesheader
> 
> Tables salesdetail and salesheader both contain transaction date and
> branchcode columns. We decided to partition the data by transaction date
> because, as calculated, it partitions data more evenly as compared to by
> branchcode partitioning. We then extract the MONTH part of the transaction
> date then use it as an argument on the SELECT CASE function. Therefore,
> possible values are 1-12. The 2 tables are very very large (millions of
> records per table because we have like 100+ branches and
> hundreds/thousands of transactions per day). So that is the reason why my
> function looks like this:

Do you want to execute following query ?

SELECT * FROM salesdetail INNER JOIN salesheader using(transaction_date);

ex) partitioning data
   node1: salesdetail -> (1, 2, 3, 4)  salesheader -> (1, 2, 3, 4)
   node2: salesdetail -> (5, 6, 7, 8)  salesheader -> (5, 6, 7, 8)
   node3: salesdetail -> (9, 10, 11, 12)  salesheader -> (9, 10, 11, 12)

If JOIN is concluded by a single node, you can use the VIEW.

Define VIEW to each node.
 CREATE VIEW detail_and_header as SELECT * FROM salesdetail INNER JOIN salesheader using(transaction_date);

Define VIEW as partitioning table to dist_def table.

You can use parallel query execution.

  SELECT * FROM detail_and_header;

> 
> *** Note: $1 = transaction date
> 
> SELECT CASE WHEN EXTRACT(MONTH FROM $1) => 1 AND EXTRACT(MONTH FROM $1) <=
> 4 THEN 0
> 
> *** Months 1-4 goes to node0
> 
> Question: Is it safe to say that it is not advisable to use replicate_def
> because the tables are very large or should I use it to replicate the
> records on all nodes?
> 
> Whenever I enter a value in the transactiondate field, ex. 2008-07-15, it
> goes to node1 alone and no copy on others. Is this the right behavior? Are
> we going to be able to take advantage of parallel query if this is going
> to be the setup? If not then what would possibly be the best partitioning
> scheme to take full advantage of parallel query given the structure/setup
> above? Our goal again is to prove that pgpool-II parallel query can cut
> the SELECT query time shorter as compared to a single MySQL server with
> same amount of data (millions). Ideally, time taken should divided by 3
> because of 3 backends.
> 
> Thank you very much for all your help.
> 
> Regards,
> Viril Calimlim
> 
> >
> > replicate_select is a parameter for replication mode. So you don't
> > need to set it.
> >
> > I'm not good at English. So if you cannot understand, feel free to ask
> > more.
> >
> > Regards,
> > --
> > Yoshiyuki Asaba
> > y-asaba at sraoss.co.jp
> >
> >>
> >> Regards,
> >> Viril Calimlim
> >>
> >> > Hi,
> >> >
> >> > From: calimlimvl at nationalbookstore.com.ph
> >> > Subject: Re: [Pgpool-general] Parallel Query Mode - SELECT,
> >> Phppgadmin,
> >> > Psql
> >> > Date: Thu, 10 Jul 2008 17:25:41 +0800 (PHT)
> >> >
> >> >> This is really serious because we code using PHP and we wouldn't be
> >> able
> >> >> to make use of pgpool-II if we couldn't solve this problem.
> >> >
> >> > Do you use PHP-PDO?
> >> >
> >> >  By the way,
> >> >> where can I get more information regarding other parameters like
> >> >> replicate_select (yes it replicates the SELECT on all nodes but
> >> what's
> >> >> the
> >> >> advantage? did I miss anything on the manual?), replicate_def table,
> >> >> etc.
> >> >> I am more than willing to further test the software for you to
> >> eliminate
> >> >> more bugs. Thank you so much for the speedy response.
> >> >
> >> > Could you try pgpool-II 2.1 RC1?
> >> >
> >> > Regards,
> >> > --
> >> > Yoshiyuki Asaba
> >> > y-asaba at sraoss.co.jp
> >> >
> >> >
> >> >
> >> >>
> >> >> Best regards,
> >> >> Viril Calimlim
> >> >>
> >> >> > Hi,
> >> >> >
> >> >> > Do you set parallel_mode to true?
> >> >> >
> >> >> > Regards,
> >> >> > --
> >> >> > Yoshiyuki Asaba
> >> >> > y-asaba at sraoss.co.jp
> >> >> >
> >> >> >
> >> >> > From: calimlimvl at nationalbookstore.com.ph
> >> >> > Subject: [Pgpool-general] Parallel Query Mode - SELECT, Phppgadmin,
> >> >> Psql
> >> >> > Date: Thu, 10 Jul 2008 15:45:11 +0800 (PHT)
> >> >> >
> >> >> >> Hello!
> >> >> >>
> >> >> >> First of all, thank you for taking the time to reply and excellent
> >> >> tip
> >> >> >> regarding my last question about Parallel Query. However, I've got
> >> a
> >> >> new
> >> >> >> problem.
> >> >> >>
> >> >> >> In Parallel Query Mode, as per the Tutorial guidelines, I created
> >> the
> >> >> >> following:
> >> >> >>
> >> >> >> - Database "test"
> >> >> >> - Schema "testschema"
> >> >> >> - Table "testtable"
> >> >> >> - Fields "id integer", "name varchar(20)"
> >> >> >> - Function pgpool_catalog.dist_def_test with same ranges
> >> (1..100000
> >> >> etc)
> >> >> >> - Inserted a row in dist_def table with key "id"
> >> >> >>
> >> >> >> I have 3 backend servers 0, 1, 2 and 1 pgpool-II dedicated server.
> >> I
> >> >> did
> >> >> >> not insert a row in replicate_def because I still don't understand
> >> >> that
> >> >> >> part (sorry).
> >> >> >>
> >> >> >> My question is that whenever I issue the command:
> >> >> >>
> >> >> >> postgres at pgpool-server:~$ psql -p 9999 -c "INSERT INTO
> >> >> >> testschema.testtable VALUES (100000,'testname')" test
> >> >> >>
> >> >> >> ... I checked all the backends and it send the data to node0 ALONE
> >> >> and
> >> >> >> when I do a SELECT *, it shows only 1 row.
> >> >> >>
> >> >> >> But when I use Phppgadmin or even run a PHP script (using PDO) on
> >> >> port
> >> >> >> 9999 of the pgpool-II server to insert data, I checked all the
> >> >> backends
> >> >> >> and all of them has the copy of the inserted row. When I do SELECT
> >> *,
> >> >> I
> >> >> >> get 3 rows of the same data. Another thing about Phppgadmin, when
> >> I
> >> >> >> click
> >> >> >> on the BROWSE button on testtable, it just keeps on Loading...
> >> >> meaning,
> >> >> >> it
> >> >> >> stuck forever for some reason and I had to kill all pgpool
> >> processes.
> >> >> >>
> >> >> >> tail postgresql log: lots of "LOG:  unexpected EOF on client
> >> >> connection"
> >> >> >> tail pgpool log:
> >> >> >>
> >> >> >> 2008-07-10 15:41:28 DEBUG: pid 6621: read_kind_from_backend: read
> >> >> kind
> >> >> >> from 0 th backend Z NUM_BACKENDS: 3
> >> >> >> 2008-07-10 15:41:28 DEBUG: pid 6621: read_kind_from_backend: read
> >> >> kind
> >> >> >> from 1 th backend Z NUM_BACKENDS: 3
> >> >> >> 2008-07-10 15:41:28 DEBUG: pid 6621: read_kind_from_backend: read
> >> >> kind
> >> >> >> from 2 th backend Z NUM_BACKENDS: 3
> >> >> >> 2008-07-10 15:41:28 DEBUG: pid 6621: pool_process_query: kind from
> >> >> >> backend: Z
> >> >> >> 2008-07-10 15:41:28 DEBUG: pid 6621: pool_read_message_length:
> >> slot:
> >> >> 0
> >> >> >> length: 5
> >> >> >> 2008-07-10 15:41:28 DEBUG: pid 6621: pool_read_message_length:
> >> slot:
> >> >> 1
> >> >> >> length: 5
> >> >> >> 2008-07-10 15:41:28 DEBUG: pid 6621: pool_read_message_length:
> >> slot:
> >> >> 2
> >> >> >> length: 5
> >> >> >> 2008-07-10 15:41:28 DEBUG: pid 6621: ReadyForQuery: message
> >> length: 5
> >> >> >> 2008-07-10 15:41:28 DEBUG: pid 6621: ReadyForQuery: transaction
> >> >> state: I
> >> >> >> 2008-07-10 15:41:28 DEBUG: pid 6621: pool_connection_pool_timer:
> >> set
> >> >> >> close
> >> >> >> time 1215675688
> >> >> >>
> >> >> >> Version: pgpool-II 2.1beta2
> >> >> >> OS: Ubuntu 8.04 Server
> >> >> >> PostgreSQL: 8.3.1
> >> >> >>
> >> >> >> Please help. I really want this to work. Ive been after this for 3
> >> >> >> months
> >> >> >> now.
> >> >> >>
> >> >> >> Thank you very much for your help.
> >> >> >>
> >> >> >> Regards,
> >> >> >> Viril
> >> >> >>
> >> >> >> _______________________________________________
> >> >> >> Pgpool-general mailing list
> >> >> >> Pgpool-general at pgfoundry.org
> >> >> >> http://pgfoundry.org/mailman/listinfo/pgpool-general
> >> >> > _______________________________________________
> >> >> > Pgpool-general mailing list
> >> >> > Pgpool-general at pgfoundry.org
> >> >> > http://pgfoundry.org/mailman/listinfo/pgpool-general
> >> >> >
> >> >> >
> >> >>
> >> >>
> >> >
> >> >
> >>
> >>
> >
> >
> 
> 
> 
> _______________________________________________
> Pgpool-general mailing list
> Pgpool-general at pgfoundry.org
> http://pgfoundry.org/mailman/listinfo/pgpool-general
> 

-- 
SRA OSS, Inc. 日本支社
Yoshiharu Mori <y-mori at sraoss.co.jp>
http://www.sraoss.co.jp/