[Pgpool-general] Mismatch among backends

Marcelo Martins pglists at zeroaccess.org
Sat Mar 7 16:01:02 UTC 2009


Hi Jaume,

Unfortunately, I have also had several backend mismatch problems with
2.2 under load, and had to go back to 2.1.

I was running a stress test against our pgpool server, which has two
backends. One of our devs wrote a Python script that replays our Apache
logs; the script runs on 4 boxes that open 150 concurrent connections
to pgpool.
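
For anyone who wants to reproduce this kind of load, here is a minimal
sketch of a concurrent replay harness. It is not our dev's actual
script: the replay() function, the run_query callback, and the default
worker count are assumptions for illustration, and run_query stands in
for whatever sends one statement to pgpool.

```python
from concurrent.futures import ThreadPoolExecutor


def replay(statements, run_query, workers=150):
    """Run every statement through run_query using `workers` threads.

    run_query is a hypothetical callback that sends one statement to
    pgpool and raises an exception on failure.
    Returns (ok_count, errors), where errors is a list of
    (statement, exception) pairs.
    """
    def attempt(stmt):
        try:
            run_query(stmt)
            return None
        except Exception as exc:  # record the failure and keep replaying
            return (stmt, exc)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(attempt, statements))

    errors = [r for r in results if r is not None]
    return len(results) - len(errors), errors
```

For the thread count to correspond to concurrent connections, each
worker thread would need to hold its own connection to pgpool inside
run_query.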

When the script first starts, everything seems OK. But as the number
of transactions increases, the load average on the pg backends climbs
to around 20-30, and that's when pgpool starts to throw a bunch of
mismatch errors and the backends fall out of sync. Sometimes that
happens within 5 minutes, other times after 10-15 minutes.
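
To see how quickly the mismatches show up, the error lines can be
pulled out of syslog. A minimal sketch, assuming each error line carries
the "kind details are: 0[C] 1[E]" text seen in the quoted log below;
the function name parse_kind_details is made up for this example.

```python
import re

# Matches the tail of pgpool's "kind mismatch among backends" message,
# e.g. "kind details are: 0[C] 1[E]".
DETAILS_RE = re.compile(r"kind details are:((?:\s+\d+\[.\])+)")
NODE_RE = re.compile(r"(\d+)\[(.)\]")


def parse_kind_details(line):
    """Return {node_id: kind} for a mismatch log line, or None."""
    match = DETAILS_RE.search(line)
    if match is None:
        return None
    return {int(node): kind for node, kind in NODE_RE.findall(match.group(1))}
```

Counting the lines for which this returns a dict, together with their
timestamps, gives a rough picture of when the backends start to diverge.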

So we decided to run the stress test again, but against version 2.1
with a patch for the DECLARE statements. It was a CVS version from
2008-08-25, if I'm correct. Everything worked great on version 2.1: we
stress-tested pgpool and the backends to the fullest and had no
problems at all. We let the test run for 1 hour and repeated it about
3 times. The load average on the pg backends also reached around 50-70.

When I get some time, hopefully next week or so, I will repeat these
same tests, stepping the pgpool CVS version forward from revision 112
until I start seeing problems again. Hopefully that will help Tatsuo.
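
The plan amounts to a simple forward scan over revisions. A minimal
sketch, where first_failing and the passes callback are hypothetical
names: passes(rev) would build that revision, run the stress test
against it, and report whether the backends stayed in sync.

```python
def first_failing(revisions, passes):
    """Return the first revision for which passes(rev) is false.

    Revisions are tried in the order given. Returns None if every
    revision passes.
    """
    for rev in revisions:
        if not passes(rev):
            return rev
    return None
```

Since the mismatch sometimes takes 10-15 minutes to appear, each run of
passes would have to last long enough to avoid declaring a bad revision
good.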


-
Marcelo

On Mar 6, 2009, at 2:44, Jaume Sabater <jsabater at gmail.com> wrote:

> Hi all!
>
> Just tried to connect to my pgpool-II 2.2/PostgreSQL 8.3 cluster and
> saw an error, which I forgot to copy and paste somewhere, that said
> something like "error in catalog with relid 26243" (I only copied the
> number). I checked the cluster and, again, there had been a kind
> mismatch among backends, so the slave node was down and the cluster
> was working only with the master node.
>
> This is what I found on the syslog:
>
> Mar  6 08:10:27 pgsql1 pgpool: ERROR: pid 26306:
> read_kind_from_backend: 1 th kind E does not match with master or
> majority connection kind C
> Mar  6 08:10:27 pgsql1 pgpool: ERROR: pid 26306: kind mismatch among
> backends. Possible last query was: "COPY "TSearcherServices"
> ("IdSearcherServices", "IdSearcher" ,"IdService", "SearcherNumber" )
> Mar  6 08:10:27 pgsql1 pgpool: FROM '/opt/pgpool2/TSearcherServices.csv'
> Mar  6 08:10:27 pgsql1 pgpool: WITH DELIMITER AS '|' CSV;" kind
> details are: 0[C] 1[E]
> Mar  6 08:10:27 pgsql1 pgpool: LOG:   pid 26306: notice_backend_error:
> 1 fail over request from pid 26306
> Mar  6 08:10:27 pgsql1 pgpool: LOG:   pid 5315: starting degeneration.
> shutdown host pgsql2.freyatest.domain(5432)
> Mar  6 08:10:27 pgsql1 pgpool: LOG:   pid 5315: execute command:
> /var/lib/postgresql/8.3/main/pgpool-failover 1 pgsql2.freyatest.domain
> 5432 /var/lib/postgresql/8.3/main 0 0
> Mar  6 08:10:27 pgsql1 pgpool[32211]: Executing pgpool-failover as  
> user postgres
> Mar  6 08:10:27 pgsql1 pgpool[32212]: Failover of node 1 at hostname
> pgsql2.freyatest.domain. New master node is 0. Old master node was 0.
> Mar  6 08:10:27 pgsql1 pgpool: LOG:   pid 5315: failover_handler: set
> new master node: 0
> Mar  6 08:10:27 pgsql1 pgpool: LOG:   pid 5315: failover done.
> shutdown host pgsql2.freyatest.domain(5432)
>
>
> These COPY operations have been very frequent during the last three or
> four months, with developers constantly dumping information here and
> there. With version 2.1 I never had a mismatch among backends, but now
> I have had 2 of those this very same week, plus a few more the
> previous couple of weeks (we were working with betas or RCs of version
> 2.2). I can't really point at version 2.2 regarding the issue, but I
> promise I don't recall it happening with version 2.1. It is true that
> the number of operations on the PostgreSQL cluster has increased a
> lot in the last 4 weeks, too.
>
> Tatsuo, could you please check it out? Here is the other error
> that happened this week. Notice the query was different. Logs from
> the past week are gone, unfortunately.
>
> Mar  5 14:50:26 pgsql1 pgpool: ERROR: pid 20120: pool_read: read
> failed (Connection reset by peer)
> Mar  5 14:50:26 pgsql1 pgpool: LOG:   pid 20120:
> ProcessFrontendResponse: failed to read kind from frontend. frontend
> abnormally exited
> Mar  5 14:50:26 pgsql1 pgpool: LOG:   pid 20120:
> read_kind_from_backend: parameter name: is_superuser value: on
> Mar  5 14:50:26 pgsql1 pgpool: LOG:   pid 20120:
> read_kind_from_backend: parameter name: session_authorization value:
> pgpool2
> Mar  5 14:50:26 pgsql1 pgpool: LOG:   pid 20120:
> read_kind_from_backend: parameter name: is_superuser value: on
> Mar  5 14:50:26 pgsql1 pgpool: LOG:   pid 20120:
> read_kind_from_backend: parameter name: session_authorization value:
> pgpool2
> Mar  5 14:50:57 pgsql1 pgpool: LOG:   pid 19950:
> read_kind_from_backend: parameter name: is_superuser value: on
> Mar  5 14:50:57 pgsql1 pgpool: LOG:   pid 19950:
> read_kind_from_backend: parameter name: session_authorization value:
> pgpool2
> Mar  5 14:50:57 pgsql1 pgpool: LOG:   pid 19950:
> read_kind_from_backend: parameter name: is_superuser value: on
> Mar  5 14:50:57 pgsql1 pgpool: LOG:   pid 19950:
> read_kind_from_backend: parameter name: session_authorization value:
> pgpool2
> Mar  5 14:51:08 pgsql1 pgpool: ERROR: pid 19538:
> read_kind_from_backend: 1 th kind E does not match with master or
> majority connection kind C
> Mar  5 14:51:08 pgsql1 pgpool: ERROR: pid 19538: kind mismatch among
> backends. Possible last query was: "delete from "TSearcher"" kind
> details are: 0[C] 1[E]
> Mar  5 14:51:08 pgsql1 pgpool: LOG:   pid 19538: notice_backend_error:
> 1 fail over request from pid 19538
> Mar  5 14:51:08 pgsql1 pgpool: LOG:   pid 5315: starting degeneration.
> shutdown host pgsql2.freyatest.domain(5432)
> Mar  5 14:51:08 pgsql1 pgpool: LOG:   pid 5315: execute command:
> /var/lib/postgresql/8.3/main/pgpool-failover 1 pgsql2.freyatest.domain
> 5432 /var/lib/postgresql/8.3/main 0 0
> Mar  5 14:51:08 pgsql1 pgpool[20704]: Executing pgpool-failover as  
> user postgres
> Mar  5 14:51:08 pgsql1 pgpool[20705]: Failover of node 1 at hostname
> pgsql2.freyatest.domain. New master node is 0. Old master node was 0.
> Mar  5 14:51:08 pgsql1 pgpool: LOG:   pid 5315: failover_handler: set
> new master node: 0
> Mar  5 14:51:08 pgsql1 pgpool: LOG:   pid 5315: failover done.
> shutdown host pgsql2.freyatest.domain(5432)
>
>
> Anyone else having this problem?
>
> -- 
> Jaume Sabater
> http://linuxsilo.net/
>
> "Ubi sapientas ibi libertas"

