[pgpool-general: 5230] Re: Failed to check replication time lag - infinitely occurring error

Bhattacharyya, Subhro s.bhattacharyya at sap.com
Thu Jan 12 18:49:37 JST 2017


Hi,

As mentioned, here is the log of node 1:

2017-01-11 11:17:52: pid 28787: LOG:  watchdog node state changed from [STANDBY] to [JOINING]
2017-01-11 11:17:52: pid 28787: LOG:  watchdog node state changed from [JOINING] to [INITIALIZING]
2017-01-11 11:17:52: pid 28787: LOG:  read from socket failed with error :"Connection reset by peer"
2017-01-11 11:17:52: pid 28787: LOG:  read from socket failed, remote end closed the connection
2017-01-11 11:17:53: pid 28787: LOG:  watchdog node state changed from [INITIALIZING] to [STANDING FOR MASTER]
2017-01-11 11:17:53: pid 28787: LOG:  watchdog node state changed from [STANDING FOR MASTER] to [MASTER]
2017-01-11 11:17:53: pid 28787: LOG:  I am announcing my self as master/coordinator watchdog node
2017-01-11 11:17:53: pid 28787: LOG:  I am the cluster leader node
2017-01-11 11:17:53: pid 28787: DETAIL:  our declare coordinator message is accepted by all nodes
2017-01-11 11:17:53: pid 28787: LOG:  I am the cluster leader node. Starting escalation process
2017-01-11 11:17:53: pid 28787: LOG:  escalation process started with PID:30979
2017-01-11 11:17:53: pid 30979: LOG:  watchdog: escalation started
2017-01-11 11:17:53: pid 28787: LOG:  watchdog escalation process with pid: 30979 exit with SUCCESS.
2017-01-11 11:17:56: pid 30978: ERROR:  Failed to check replication time lag
2017-01-11 11:17:56: pid 30978: DETAIL:  Query to node (1) returned no data
2017-01-11 11:17:56: pid 30978: CONTEXT:  while checking replication time lag
2017-01-11 11:17:57: pid 30814: LOG:  statement: show pool_version;

This node has indeed become master.
The other two nodes have come up as standby.

However, the watchdog information shows that the last node is master, not node 1.
How do we stop pgpool-II from thinking that node 1 is a standby, and stop the nodes from querying node 1?

Thanks

From: Muhammad Usama [mailto:m.usama at gmail.com]
Sent: Thursday, January 12, 2017 2:50 PM
To: Bhattacharyya, Subhro <s.bhattacharyya at sap.com>
Cc: pgpool-general at pgpool.net
Subject: Re: [pgpool-general: 5228] Failed to check replication time lag - infinitely occurring error

Pgpool-II sends the query SELECT pg_last_xlog_replay_location() to the standby backend nodes to calculate the replication time lag.
The error "Query to node (1) returned no data" indicates that backend node 1 is not returning any result for this query.
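Conceptually, this lag is measured in bytes of WAL (the unit of pgpool-II's `delay_threshold`) as the distance between the primary's current WAL position and the standby's replayed position. The sketch below shows only that arithmetic. It assumes the PostgreSQL 9.3+ LSN layout (`hi/lo`, both hex), and the function names are hypothetical, not pgpool-II's internal code.

```python
from typing import Optional


def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '0/3000060' to an absolute byte position.

    Assumes the PostgreSQL 9.3+ layout: the part before '/' is the high
    32 bits and the part after it is the low 32 bits, both in hex.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def replication_lag_bytes(primary_lsn: str, standby_replay_lsn: Optional[str]) -> int:
    """Bytes of WAL the standby has not replayed yet (hypothetical helper)."""
    if standby_replay_lsn is None:
        # The situation in this thread: the standby-side query gave no
        # usable value, so no lag can be computed for that node.
        raise ValueError("standby returned no data for its replay location")
    # Clamp at zero in case the two samples were taken at slightly different times.
    return max(0, lsn_to_int(primary_lsn) - lsn_to_int(standby_replay_lsn))
```

For example, a primary at `0/3000060` and a standby that has replayed up to `0/3000000` are 0x60 = 96 bytes apart.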

Usually this happens when Pgpool-II thinks that the node is a standby while it actually is not. So you should check whether backend node 1 is really in standby mode and has not been accidentally promoted to primary.
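The check suggested above amounts to comparing the role pgpool-II assigns to each backend (for example, as listed by `SHOW pool_nodes`) with what the backend itself reports through `SELECT pg_is_in_recovery()`. A minimal sketch, assuming both values have already been collected by hand; the `BackendObservation` type and `find_role_mismatches` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BackendObservation:
    node_id: int
    role_in_pgpool: str          # role pgpool-II believes: "primary" or "standby"
    in_recovery: Optional[bool]  # SELECT pg_is_in_recovery() on the backend; None if unknown


def find_role_mismatches(observations: List[BackendObservation]) -> List[str]:
    """Return one message per backend whose real state disagrees with pgpool-II."""
    problems = []
    for obs in observations:
        if obs.in_recovery is None:
            problems.append(f"node {obs.node_id}: could not determine recovery state")
        elif obs.role_in_pgpool == "standby" and not obs.in_recovery:
            # A "standby" that is not in recovery has most likely been promoted.
            problems.append(
                f"node {obs.node_id}: pgpool-II treats it as standby, "
                "but it is not in recovery (promoted?)"
            )
        elif obs.role_in_pgpool == "primary" and obs.in_recovery:
            problems.append(
                f"node {obs.node_id}: pgpool-II treats it as primary, "
                "but it is still in recovery"
            )
    return problems
```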

Thanks
Regards
Muhammad Usama

On Thu, Jan 12, 2017 at 12:20 PM, Bhattacharyya, Subhro <s.bhattacharyya at sap.com> wrote:

Hi,

We have a two-node PostgreSQL cluster along with three pgpool-II nodes.
We are using pgpool-II version 3.5.2.

The following messages keep appearing endlessly in the logs of all the pgpool-II nodes:

2017-01-12 07:09:55: pid 2534: ERROR:  Failed to check replication time lag
2017-01-12 07:09:55: pid 2534: DETAIL:  Query to node (1) returned no data
2017-01-12 07:09:55: pid 2534: CONTEXT:  while checking replication time lag
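To confirm which backend the lag check keeps failing for, the DETAIL lines can be tallied per backend node id. A minimal sketch, assuming the log line format shown above; `count_no_data_nodes` is a hypothetical helper.

```python
import re
from collections import Counter
from typing import Iterable

# Matches DETAIL lines such as: "DETAIL:  Query to node (1) returned no data"
DETAIL_RE = re.compile(r"DETAIL:\s+Query to node \((\d+)\) returned no data")


def count_no_data_nodes(lines: Iterable[str]) -> Counter:
    """Count 'returned no data' failures per backend node id."""
    counts: Counter = Counter()
    for line in lines:
        match = DETAIL_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```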

The watchdog cluster/node information is as follows:

Watchdog Cluster Information
Total Nodes          : 3
Remote Nodes         : 2
Quorum state         : QUORUM EXIST
Alive Remote Nodes   : 2
VIP up on local node : NO
Master Node Name     : Linux_270f905f-37ef-46a2-9f7b-9b92b96a6ea9_9999
Master Host Name     : 10.11.61.50

Watchdog Node Information
Node Name      : Linux_26acb17c-dadc-49ca-bb14-44820ab6a78a_9999
Host Name      : 10.11.61.52
Delegate IP    : Not_Set
Pgpool port    : 9999
Watchdog port  : 9000
Node priority  : 1
Status         : 7
Status Name    : STANDBY

Node Name      : Linux_2ae9f3fd-31fc-4921-aeb7-8e320cf347c4_9999
Host Name      : 10.11.61.51
Delegate IP    : Not_Set
Pgpool port    : 9999
Watchdog port  : 9000
Node priority  : 1
Status         : 7
Status Name    : STANDBY

Node Name      : Linux_270f905f-37ef-46a2-9f7b-9b92b96a6ea9_9999
Host Name      : 10.11.61.50
Delegate IP    : Not_Set
Pgpool port    : 9999
Watchdog port  : 9000
Node priority  : 1
Status         : 4
Status Name    : MASTER
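To check programmatically which watchdog node holds the MASTER role, a small parser for the output layout shown above could look like this; `parse_watchdog_nodes` and `watchdog_master_hosts` are hypothetical helpers written for this text format only. Note that these roles describe the pgpool-II instances in the watchdog cluster, whereas the node number in "Query to node (1)" refers to a PostgreSQL backend.

```python
from typing import Dict, List


def parse_watchdog_nodes(text: str) -> List[Dict[str, str]]:
    """Split the 'Watchdog Node Information' part into one dict per node."""
    _, _, node_part = text.partition("Watchdog Node Information")
    nodes: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in node_part.splitlines():
        if not line.strip():
            # A blank line ends the current node's block.
            if current:
                nodes.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        nodes.append(current)
    return nodes


def watchdog_master_hosts(text: str) -> List[str]:
    """Host names of all watchdog nodes whose Status Name is MASTER."""
    return [
        node.get("Host Name", "?")
        for node in parse_watchdog_nodes(text)
        if node.get("Status Name") == "MASTER"
    ]
```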

Please let us know how to rectify this issue.


_______________________________________________
pgpool-general mailing list
pgpool-general at pgpool.net
http://www.pgpool.net/mailman/listinfo/pgpool-general


