[pgpool-general: 5070] Re: Possible bug in streaming replication/load-balancing mode when standby is up but way behind.

David Sisk -X (dsisk - TEKSYSTEMS INC at Cisco) dsisk at cisco.com
Mon Oct 24 23:44:58 JST 2016


I may have been wrong about replication delay causing the issue.  We have now hit the same error with no replication delay at all.  The errors are intermittent: they occur on some connection attempts and not on others.

I have determined that it is related to load balancing.  With load_balance_mode = on, the errors occur intermittently; with load_balance_mode = off, the errors do NOT occur.  Please try turning load_balance_mode on to reproduce the problem.
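
Because the failures are intermittent, a single connection attempt says little. Below is a rough sketch of how the failure rate could be measured: run the same psql connection many times and count the attempts whose output contains the "kind does not match" text. The host, user, and database are the ones from the original report. The helper names (run_psql_once, count_kind_mismatches), the "SELECT 1" probe query, and the default attempt count are illustrative assumptions, not part of pgpool.

```python
import subprocess
from typing import Callable

# Substring of the error text reported in this thread.
MISMATCH_TEXT = "kind does not match"


def run_psql_once() -> str:
    """Run one trivial query through pgpool and return psql's combined output.

    Connection settings are copied from the original report; adjust as needed.
    -w tells psql never to prompt for a password, so the loop cannot hang.
    """
    proc = subprocess.run(
        ["psql", "-w", "-h", "localhost", "-U", "jiralt01",
         "-c", "SELECT 1", "jiralt01"],
        capture_output=True,
        text=True,
    )
    return proc.stdout + proc.stderr


def count_kind_mismatches(attempt: Callable[[], str], attempts: int = 100) -> int:
    """Call `attempt` repeatedly and count outputs containing the mismatch error."""
    failures = 0
    for _ in range(attempts):
        if MISMATCH_TEXT in attempt():
            failures += 1
    return failures
```

Calling count_kind_mismatches(run_psql_once, 100) once with load_balance_mode = on and once with it off should show whether the error rate really drops to zero in the second case.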

Thank you for the response!

David Sisk
Engineer - Software
dsisk at cisco.com
Cisco Systems, Inc.
7025-6 Kit Creek Road PO Box 14987
RESEARCH TRIANGLE PARK
27709-4987
United States
cisco.com




-----Original Message-----
From: Tatsuo Ishii [mailto:ishii at sraoss.co.jp] 
Sent: Tuesday, October 18, 2016 10:39 PM
To: David Sisk -X (dsisk - TEKSYSTEMS INC at Cisco) <dsisk at cisco.com>
Cc: pgpool-general at pgpool.net
Subject: Re: [pgpool-general: 5057] Possible bug in streaming replication/load-balancing mode when standby is up but way behind.

> In two cases, I've gotten this error from PGPool 3.5.4 (with PostgreSQL 9.5.4) when a hot standby was up and available, but not replicating properly and way behind the primary:
> 
> -bash-4.1$ psql -h localhost -U jiralt01 jiralt01
> psql: ERROR: unable to read message kind
> DETAIL: kind does not match between master(53) slot[1] (45)
> -bash-4.1$
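
The two numbers in the DETAIL line can be turned back into protocol message types, which may help narrow down what each backend actually sent. The sketch below assumes that pgpool prints the message-kind bytes in hexadecimal; that is an assumption about the log format, and the function name decode_kind_mismatch is hypothetical. Under that assumption, 53 is 'S' and 45 is 'E'. In the PostgreSQL frontend/backend protocol, a backend message of type 'S' is ParameterStatus and 'E' is ErrorResponse, so the line would mean that backend slot 1 returned an error while the master sent a normal startup message.

```python
import re
from typing import Tuple

# Assumption: the values in parentheses are hexadecimal message-kind bytes.
KIND_RE = re.compile(
    r"master\(([0-9a-fA-F]+)\) slot\[(\d+)\] ?\(([0-9a-fA-F]+)\)"
)


def decode_kind_mismatch(detail: str) -> Tuple[str, int, str]:
    """Return (master_kind, slot_number, slot_kind) as message-type characters."""
    m = KIND_RE.search(detail)
    if m is None:
        raise ValueError("not a kind-mismatch DETAIL line")
    master_hex, slot, slot_hex = m.groups()
    return chr(int(master_hex, 16)), int(slot), chr(int(slot_hex, 16))
```

The actual ErrorResponse text from the standby is not shown in the psql output, so the PostgreSQL log on that backend is the place to look for the underlying error.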

Do you know how to reproduce the problem?  Simply creating an excessive replication delay does not reproduce the problem here.

> What I believe should happen is that PGPool should detach that standby (delay_threshold = 10000000)...instead it blocks anything from connecting, which does NOT lead to the intended high-availability of this SR+LB configuration.

Sounds like an idea, but it would break existing behavior. It should be proposed as a new feature (e.g. one that requires a new switch).

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

