[Pgpool-general] Mismatch among backends

Tatsuo Ishii ishii at sraoss.co.jp
Sun Mar 8 07:35:22 UTC 2009


Hi Marcelo,

I ran a continuous three-day stress test using pgbench before RC1 or
RC2 and found nothing wrong. So the possibilities are:

1) I need a higher load average.

2) I need more complex queries than just SELECT/UPDATE/INSERT.
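
If it's 2), a custom pgbench script that mixes COPY and DELETE, run
through pgpool, might reproduce it. This is only an untested sketch;
the "copytest" table (created beforehand), the CSV path, and the
connection settings are hypothetical:

    $ cat copy_delete.sql
    DELETE FROM copytest;
    COPY copytest FROM '/tmp/copytest.csv' WITH DELIMITER AS '|' CSV;

    # 150 clients through pgpool (default port 9999), 1000 transactions each
    $ pgbench -n -c 150 -t 1000 -f copy_delete.sql -h pgpool-host -p 9999 bench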

Thoughts?
--
Tatsuo Ishii
SRA OSS, Inc. Japan

> Hi James,
> 
> Unfortunately I have also had several issues with backends getting
> kind mismatch errors on 2.2 while under load, and had to go back to
> 2.1.
> 
> I was doing a stress test on our pgpool server, which has two
> backends. One of our devs created a Python script that replays the
> Apache logs; the script runs on 4 boxes, each opening 150 concurrent
> connections to pgpool.
> 
> When the script first starts, everything seems OK. But when the
> number of transactions starts to increase, the load on the PostgreSQL
> backends gets high (around a 20-30 load average) and that's when
> pgpool starts to throw a bunch of mismatch errors and the backends
> fall out of sync. Sometimes that happens within 5 minutes, other
> times within 10-15 minutes.
> 
> So we decided to run the stress test again, but against version 2.1
> with a patch for the DECLARE statements (a CVS version from
> 2008-08-25, if I remember correctly). Everything worked out great on
> version 2.1: we stress-tested pgpool and the backends to the fullest
> with no problems at all. We let the test run for 1 hour and repeated
> it about 3 times. The load average on the PostgreSQL backends also
> reached around 50-70.
> 
> When I get some time, hopefully next week or so, I will start
> running these same tests while advancing the pgpool CVS version past
> revision 112 until I start seeing problems again. Hopefully that will
> help Tatsuo.
> 
> 
> -
> Marcelo
> 
> On Mar 6, 2009, at 2:44, Jaume Sabater <jsabater at gmail.com> wrote:
> 
> > Hi all!
> >
> > Just tried to connect to my pgpool-II 2.2/PostgreSQL 8.3 cluster and
> > saw an error, which I forgot to copy and paste somewhere, that said
> > something like "error in catalog with relid 26243" (I only copied the
> > number). I checked the cluster and, again, there had been a kind
> > mismatch among backends, so the slave node was down and the cluster
> > was working only with the master node.
> >
> > This is what I found in the syslog:
> >
> > Mar  6 08:10:27 pgsql1 pgpool: ERROR: pid 26306:
> > read_kind_from_backend: 1 th kind E does not match with master or
> > majority connection kind C
> > Mar  6 08:10:27 pgsql1 pgpool: ERROR: pid 26306: kind mismatch among
> > backends. Possible last query was: "COPY "TSearcherServices"
> > ("IdSearcherServices", "IdSearcher" ,"IdService", "SearcherNumber" )
> > Mar  6 08:10:27 pgsql1 pgpool: FROM '/opt/pgpool2/ 
> > TSearcherServices.csv'
> > Mar  6 08:10:27 pgsql1 pgpool: WITH DELIMITER AS '|' CSV;" kind
> > details are: 0[C] 1[E]
> > Mar  6 08:10:27 pgsql1 pgpool: LOG:   pid 26306: notice_backend_error:
> > 1 fail over request from pid 26306
> > Mar  6 08:10:27 pgsql1 pgpool: LOG:   pid 5315: starting degeneration.
> > shutdown host pgsql2.freyatest.domain(5432)
> > Mar  6 08:10:27 pgsql1 pgpool: LOG:   pid 5315: execute command:
> > /var/lib/postgresql/8.3/main/pgpool-failover 1 pgsql2.freyatest.domain
> > 5432 /var/lib/postgresql/8.3/main 0 0
> > Mar  6 08:10:27 pgsql1 pgpool[32211]: Executing pgpool-failover as  
> > user postgres
> > Mar  6 08:10:27 pgsql1 pgpool[32212]: Failover of node 1 at hostname
> > pgsql2.freyatest.domain. New master node is 0. Old master node was 0.
> > Mar  6 08:10:27 pgsql1 pgpool: LOG:   pid 5315: failover_handler: set
> > new master node: 0
> > Mar  6 08:10:27 pgsql1 pgpool: LOG:   pid 5315: failover done.
> > shutdown host pgsql2.freyatest.domain(5432)
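> >
> > The pgpool-failover script itself isn't included in this message;
> > judging from its syslog output above, it presumably does something
> > along these lines (just a guess, not the actual script):
> >
> >     #!/bin/sh
> >     # Called by pgpool-II's failover_command with arguments such as
> >     # "%d %h %p %D %m %M": failed node id, hostname, port, database
> >     # cluster path, new master node id, old master node id.
> >     FAILED_NODE=$1; HOST=$2; PORT=$3; PGDATA=$4
> >     NEW_MASTER=$5; OLD_MASTER=$6
> >     logger -i -t pgpool "Executing pgpool-failover as user $(whoami)"
> >     MSG="Failover of node $FAILED_NODE at hostname $HOST."
> >     MSG="$MSG New master node is $NEW_MASTER. Old master node was $OLD_MASTER."
> >     logger -i -t pgpool "$MSG"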
> >
> >
> > These COPY operations have been very frequent during the last three
> > or four months, with developers constantly dumping information here
> > and there. With version 2.1 I never had a mismatch among backends,
> > but now I have had 2 of those this very same week, plus a few more
> > in the previous couple of weeks (we were working with betas or RCs
> > of version 2.2). I can't really pin the issue on version 2.2, but I
> > promise I don't recall it happening with version 2.1. It is true
> > that the number of operations on the PostgreSQL cluster has
> > increased a lot in the last 4 weeks, too.
> >
> > Tatsuo, could you please check it out? Here is the other error that
> > happened this week. Notice the query was different. Logs from the
> > previous week are gone, unfortunately.
> >
> > Mar  5 14:50:26 pgsql1 pgpool: ERROR: pid 20120: pool_read: read
> > failed (Connection reset by peer)
> > Mar  5 14:50:26 pgsql1 pgpool: LOG:   pid 20120:
> > ProcessFrontendResponse: failed to read kind from frontend. frontend
> > abnormally exited
> > Mar  5 14:50:26 pgsql1 pgpool: LOG:   pid 20120:
> > read_kind_from_backend: parameter name: is_superuser value: on
> > Mar  5 14:50:26 pgsql1 pgpool: LOG:   pid 20120:
> > read_kind_from_backend: parameter name: session_authorization value:
> > pgpool2
> > Mar  5 14:50:26 pgsql1 pgpool: LOG:   pid 20120:
> > read_kind_from_backend: parameter name: is_superuser value: on
> > Mar  5 14:50:26 pgsql1 pgpool: LOG:   pid 20120:
> > read_kind_from_backend: parameter name: session_authorization value:
> > pgpool2
> > Mar  5 14:50:57 pgsql1 pgpool: LOG:   pid 19950:
> > read_kind_from_backend: parameter name: is_superuser value: on
> > Mar  5 14:50:57 pgsql1 pgpool: LOG:   pid 19950:
> > read_kind_from_backend: parameter name: session_authorization value:
> > pgpool2
> > Mar  5 14:50:57 pgsql1 pgpool: LOG:   pid 19950:
> > read_kind_from_backend: parameter name: is_superuser value: on
> > Mar  5 14:50:57 pgsql1 pgpool: LOG:   pid 19950:
> > read_kind_from_backend: parameter name: session_authorization value:
> > pgpool2
> > Mar  5 14:51:08 pgsql1 pgpool: ERROR: pid 19538:
> > read_kind_from_backend: 1 th kind E does not match with master or
> > majority connection kind C
> > Mar  5 14:51:08 pgsql1 pgpool: ERROR: pid 19538: kind mismatch among
> > backends. Possible last query was: "delete from "TSearcher"" kind
> > details are: 0[C] 1[E]
> > Mar  5 14:51:08 pgsql1 pgpool: LOG:   pid 19538: notice_backend_error:
> > 1 fail over request from pid 19538
> > Mar  5 14:51:08 pgsql1 pgpool: LOG:   pid 5315: starting degeneration.
> > shutdown host pgsql2.freyatest.domain(5432)
> > Mar  5 14:51:08 pgsql1 pgpool: LOG:   pid 5315: execute command:
> > /var/lib/postgresql/8.3/main/pgpool-failover 1 pgsql2.freyatest.domain
> > 5432 /var/lib/postgresql/8.3/main 0 0
> > Mar  5 14:51:08 pgsql1 pgpool[20704]: Executing pgpool-failover as  
> > user postgres
> > Mar  5 14:51:08 pgsql1 pgpool[20705]: Failover of node 1 at hostname
> > pgsql2.freyatest.domain. New master node is 0. Old master node was 0.
> > Mar  5 14:51:08 pgsql1 pgpool: LOG:   pid 5315: failover_handler: set
> > new master node: 0
> > Mar  5 14:51:08 pgsql1 pgpool: LOG:   pid 5315: failover done.
> > shutdown host pgsql2.freyatest.domain(5432)
> >
> >
> > Anyone else having this problem?
> >
> > -- 
> > Jaume Sabater
> > http://linuxsilo.net/
> >
> > "Ubi sapientas ibi libertas"

