[Pgpool-general] Order of Query Results

Tue Aug 3 22:20:45 UTC 2010

On Wednesday, July 28, 2010 07:08:42 pm you wrote:
> > Le 28/07/2010 17:16, Sean Brown a écrit :
> > > On Wednesday, July 28, 2010 10:56:27 am you wrote:
> > >> Le 28/07/2010 16:50, Sean Brown a écrit :
> > >>> Does PgPool have an issue with the order of results from a query
> > >>> being returned in a different order?
> > >>> 
> > >>> With the ongoing issue of our cluster falling out, we just had one
> > >>> member fall out again and again, no error in PostgreSQLs log is
> > >>> recorded. PgPool reports the possible last query was:
> > >>> select * from carts where cart_id in (11835,11824,11819)
> > >>> 
> > >>> Specifically the error in PgPools log is:
> > >>> read_kind_from_backend: 1 th kind D does not match with master or
> > >>> majority connection kind ^@
> > >>> 2010-07-28 10:21:23 ERROR: pid 28242: kind mismatch among backends.
> > >>> Possible last query was: "select * from carts where cart_id in
> > >>> (11835,11824,11819)" kind details are: 1[D]
> > >>> 
> > >>> If I run that query on the remaining node and on the one that just
> > >>> fell out, I get the same 3 results, but the orders of the records
> > >>> are different.
> > >> 
> > >> The query you show doesn't ask for a specific order (no "ORDER BY"
> > >> clause), so each backend can send the data in whatever order they
> > >> "prefer".
> > > 
> > > That I understand, what I am wondering is if this will cause PgPool to
> > > believe there is an error in the information returned from the
> > > backends, even if the only difference is the difference in order?
> > > 
> > > If it does, I assume the best way to deal with it is add order by's to
> > > every query passed to pgpool?
> > 
> > That's actually really good questions, and I don't have an answer to
> > them. CC back the list to know if other people have an answer to give us.
> 
> No, pgpool does not care about the order of data returned by SELECT.
> More details...
> 
>  PostgreSQL returns data packet like this:
> T(descriptions about tuple)
> D(actual one raw tuple data)
> D
> D
> 
> 
> C(indicates succeeded in sending data)
> 
> where each single capital letter is "packet kind"(see PostgreSQL docs
> for more details). What pgpool actually does is, checking the packet
> kind, not content of tuple data.  So as long as same number of tuples
> are returned from each backend, pgpool is happy.
> 
> I'm not sure what pgpool version Sean's uses, it seems it's a little
> bit old (from the error message I guess). Also, "majority connection
> kind ^@" looks strange. ^@ = 0x00, which is not valid kind at all.
> It seems something unusual is going on...
> 
> If self cantained test case is provided, I will be able to look into
> this.
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp

Ok I have some more information. The cluster fell apart again yesterday. 
Again, there was nothing out of the ordinary in the postgresql logs leading 
upto the nodes falling out but this was logged in the pgpool log

First node fell out at 10:36am:

2010-08-02 10:36:47 DEBUG: pid 16699: AsciiRow: 24 th field size does not match 
between master(16777216) and 2 th backend(0)
2010-08-02 10:36:47 DEBUG: pid 16699: read_kind_from_backend: read kind from 0 
th backend ^@ NUM_BACKENDS: 3
2010-08-02 10:36:47 DEBUG: pid 16699: read_kind_from_backend: read kind from 1 
th backend ^@ NUM_BACKENDS: 3
2010-08-02 10:36:47 DEBUG: pid 16699: read_kind_from_backend: read kind from 2 
th backend D NUM_BACKENDS: 3
2010-08-02 10:36:47 ERROR: pid 16699: read_kind_from_backend: 2 th kind D does 
not match with master or majority connection kind ^@
2010-08-02 10:36:47 ERROR: pid 16699: kind mismatch among backends. Possible 
last query was: "select * from carts where cart_id in 
(12979,12984,12987,12986,12982,12981)" kind details are: 2[D]
2010-08-02 10:36:47 LOG:   pid 16699: notice_backend_error: 2 fail over 
request from pid 16699
2010-08-02 10:36:47 DEBUG: pid 24933: failover_handler called

Second node fell out at 1:56pm:

2010-08-02 13:56:21 DEBUG: pid 19720: AsciiRow: len: 1 data: t
2010-08-02 13:56:21 DEBUG: pid 19720: read_kind_from_backend: read kind from 0 
th backend ^@ NUM_BACKENDS: 3
2010-08-02 13:56:21 DEBUG: pid 19720: read_kind_from_backend: read kind from 1 
th backend D NUM_BACKENDS: 3
2010-08-02 13:56:21 ERROR: pid 19720: read_kind_from_backend: 1 th kind D does 
not match with master or majority connection kind ^@
2010-08-02 13:56:21 ERROR: pid 19720: kind mismatch among backends. Possible 
last query was: "select * from carts where cart_id in (13019,13018)" kind 
details are: 1[D]
2010-08-02 13:56:21 LOG:   pid 19720: notice_backend_error: 1 fail over 
request from pid 19720
2010-08-02 13:56:21 DEBUG: pid 24933: failover_handler called

I notice two things, first that we died both times on a query of the carts 
table, but this could just be a coincidence. Second, we have the ^@ showing up 
as a kind again, except while two nodes returned ^@ when the first node fell 
out, only one returned ^@ when the second fell out, even though it was one of 
the nodes that returned ^@ that morning.

I have PgPool 2.3.2 and PostgreSQL 8.4.4 on CentOS 5.5. Is this really a 
PgPool issue, or should I walk over to the postgresql mailing lists?