[pgpool-general: 3655] Re: kind mismatch among backends. Possible last query was: "COMMIT"

Fri Apr 24 12:40:42 JST 2015

Thanks for your explanation.
I agree that it probably is a postgres timeout issue and not a problem with pgpool, per se. But if you can explain a little more on what is happening with pgpool, I will be able to work out a solution for the problem.
The error message says ---

kind mismatch among backends. Possible last query was: "COMMIT" kind details are: 0[C] 1[E: canceling statement due to statement timeout]
I am assuming that means that the operation completed on node 0[C] and got an error on node 1[E]. So here are my questions -1) Node 0 is the primary node, which means that commit should only happen on node 0 ?
2) Or is the commit sent to node 1 (hot standby) also ? I am using explicit transaction begin and commit.3) If the commit is sent to node 1 also, then I am assuming actual updates are not happeing in this transaction but only through replication. So node 1 should not ideally not take too long to commit and finish quickly.4) Is it that the commit to node 1 goes after the node 0 has been committed and node 1 has run out of time.5) This happens only in production and not in test env, so pg_locks will not help because I come to know of the issue after several hours. Is there a way to capture the lock state just before or after the failure.6) One last question, even though node 1 times out, the data has actually been updated. Since node 0 reports success, can we just ignore the error from node 1 until we find a correct solution to the problem.
Thanks for any help you can provide.
Regards,Anupama.

     On Friday, April 24, 2015 4:29 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:

 Sounds like a PostgreSQL issue and irrelevant to pgpool-II.

> My pg-pool setup is as follows -
> I use Pgpool II - 3.3.1
> Two 9.1 backends in load balanced mode. The 2 backends use postgres streaming replication.DB node 0 - primary node, DB node 1 - hot standby
> We are __not__ using pgpool for connection pool or replication.Front end is a web application using JDBC + DBCP connection pool.
> 
> I see the error in the pgpool logs as shown below, for one of the updates in my application.
> Some details on the nature of the transaction -
> There are several records updated across multiple tables, in a single transaction. Each update statement updates only a few records. 2 of these tables have deferred update triggers.
> Total time consumed in completing all the updates could be high (say 1-2 minutes), but each update could take only 1-2 seconds, not more. The default statement timeout is set at 30 seconds.
> I have increased the statement timeout to 3 minutes, just to make sure that the transaction goes to completion.The above error happens only some times and not always. 

Apparently the statement took more than 30 seconds.

> Apr 23 09:54:27 auhhisapp01 pgpool[25760]: DB node id: 1 backend pid: 28735 statement: Parse:  set statement_timeout = 30000

This sets the timeout to 30 seconds, not 3 minutes. The only solution
I can think of is, disabling the statement_timeout or increasing it.

Another possiblity is, the COMMIT on node 1 was waiting for taking a
lock hold by someone else.  You could check the siuation by using
pg_locks view.

> Looks like, node 0 reports completion of transaction, while node 1 reports an statement timeout error.
> 
> What am I doing wrong and how do I go about troubleshooting this ?
> Thanks for your help. 
> 
> ================ Pgpool error logs ===============================================Apr 23 09:54:27 auhhisapp01 pgpool[25760]: DB node id: 0 backend pid: 944 statement: Execute: set search_path = 'nmc'
> Apr 23 09:54:27 auhhisapp01 pgpool[25760]: DB node id: 1 backend pid: 28735 statement: Execute: set search_path = 'nmc'
> Apr 23 09:54:27 auhhisapp01 pgpool[25760]: DB node id: 0 backend pid: 944 statement: Parse:  set statement_timeout = 30000
> Apr 23 09:54:27 auhhisapp01 pgpool[25760]: DB node id: 1 backend pid: 28735 statement: Parse:  set statement_timeout = 30000
> Apr 23 09:54:27 auhhisapp01 rsyslogd-2177: imuxsock begins to drop messages from pid 25760 due to rate-limiting
> Apr 23 09:54:29 auhhisapp01 pgpool[12387]: DB node id: 1 backend pid: 28738 statement: Execute: COMMIT
> Apr 23 09:54:29 auhhisapp01 pgpool[12387]: pool_send_and_wait: Error or notice message from backend: : DB node id: 1 backend pid: 28738 statement: COMMIT message: canceling statement due to statement timeout
> Apr 23 09:54:29 auhhisapp01 pgpool[12387]: read_kind_from_backend: 1 th kind E does not match with master or majority connection kind C
> Apr 23 09:54:29 auhhisapp01 pgpool[12387]: kind mismatch among backends. Possible last query was: "COMMIT" kind details are: 0[C] 1[E: canceling statement due to statement timeout]
> Apr 23 09:54:29 auhhisapp01 pgpool[12387]: do_child: exits with status 1 due to error
> Apr 23 09:54:30 auhhisapp01 pgpool[15802]: DB node id: 0 backend pid: 15013 statement: Parse: set search_path = 'nmc'
> Apr 23 09:54:30 auhhisapp01 pgpool[15802]: DB node id: 1 backend pid: 5694 statement: Parse: set search_path = 'nmc'
> Apr 23 09:54:30 auhhisapp01 pgpool[15802]: DB node id: 0 backend pid: 15013 statement: B message
> =============================================================================================
> 
> 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20150424/42b961fc/attachment.html>