[pgpool-general: 5231] Re: Failed to check replication time lag - infinitely occurring error

Muhammad Usama m.usama at gmail.com
Thu Jan 12 20:09:19 JST 2017


The watchdog information has nothing to do with the PostgreSQL backends.
The master shown in the watchdog info for a Pgpool-II node is the
Pgpool-II master/coordinator node of the Pgpool-II watchdog cluster,
while the master/primary node discussed above is the primary PostgreSQL
backend node in streaming replication.
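
If it helps to see the two views side by side, here is a minimal check
(assuming default ports, a configured PCP user, and that psql can reach the
Pgpool-II port; the host names and users below are placeholders):

  # Watchdog view: which Pgpool-II node is the watchdog master/coordinator
  pcp_watchdog_info -h <pgpool-host> -p 9898 -U <pcp-user>

  # Backend view: which PostgreSQL backend is primary/standby in streaming replication
  psql -h <pgpool-host> -p 9999 -U postgres -c "show pool_nodes;"

The master reported by the first command and the primary reported by the
second are independent of each other.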


Thanks
Best regards
Muhammad Usama




On Thu, Jan 12, 2017 at 2:49 PM, Bhattacharyya, Subhro <
s.bhattacharyya at sap.com> wrote:

> Hi,
>
>
>
> As pointed out, this is the log of node 1
>
>
>
> 2017-01-11 11:17:52: pid 28787: LOG:  watchdog node state changed from
> [STANDBY] to [JOINING]
>
> 2017-01-11 11:17:52: pid 28787: LOG:  watchdog node state changed from
> [JOINING] to [INITIALIZING]
>
> 2017-01-11 11:17:52: pid 28787: LOG:  read from socket failed with error
> :"Connection reset by peer"
>
> 2017-01-11 11:17:52: pid 28787: LOG:  read from socket failed, remote end
> closed the connection
>
> 2017-01-11 11:17:53: pid 28787: LOG:  watchdog node state changed from
> [INITIALIZING] to [STANDING FOR MASTER]
>
> 2017-01-11 11:17:53: pid 28787: LOG:  watchdog node state changed from
> [STANDING FOR MASTER] to [MASTER]
>
> 2017-01-11 11:17:53: pid 28787: LOG:  I am announcing my self as
> master/coordinator watchdog node
>
> 2017-01-11 11:17:53: pid 28787: LOG:  I am the cluster leader node
>
> 2017-01-11 11:17:53: pid 28787: DETAIL:  our declare coordinator message
> is accepted by all nodes
>
> 2017-01-11 11:17:53: pid 28787: LOG:  I am the cluster leader node.
> Starting escalation process
>
> 2017-01-11 11:17:53: pid 28787: LOG:  escalation process started with
> PID:30979
>
> 2017-01-11 11:17:53: pid 30979: LOG:  watchdog: escalation started
>
> 2017-01-11 11:17:53: pid 28787: LOG:  watchdog escalation process with
> pid: 30979 exit with SUCCESS.
>
> 2017-01-11 11:17:56: pid 30978: ERROR:  Failed to check replication time
> lag
>
> 2017-01-11 11:17:56: pid 30978: DETAIL:  Query to node (1) returned no data
>
> 2017-01-11 11:17:56: pid 30978: CONTEXT:  while checking replication time
> lag
>
> 2017-01-11 11:17:57: pid 30814: LOG:  statement: show pool_version;
>
>
>
> Node 1 has indeed become master according to its own log.
>
> The other two nodes have come up as standby.
>
>
>
> However, the watchdog information shows that the last node is master, not
> node 1.
>
> How do we stop pgpool-II from thinking that node 1 is standby, and stop the
> nodes from querying node 1?
>
>
>
> Thanks
>
>
>
> From: Muhammad Usama [mailto:m.usama at gmail.com]
> Sent: Thursday, January 12, 2017 2:50 PM
> To: Bhattacharyya, Subhro <s.bhattacharyya at sap.com>
> Cc: pgpool-general at pgpool.net
> Subject: Re: [pgpool-general: 5228] Failed to check replication time
> lag - infinitely occurring error
>
>
>
> Pgpool-II sends the query SELECT pg_last_xlog_replay_location() to the
> standby backend nodes to calculate the replication time lag.
>
> The error "Query to node (1) returned no data" indicates that backend
> node 1 is not returning any result for that query.
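>
> For example, the same query can be run by hand against node 1 to see what
> Pgpool-II gets back (a minimal check, assuming direct psql access to the
> backend; host, port and user are placeholders):
>
>   psql -h <node1-host> -p 5432 -U postgres -c "SELECT pg_last_xlog_replay_location();"
>
> On a standby this returns a WAL location; on a server that was started as
> a primary (i.e. not in recovery) it returns NULL, which Pgpool-II reports
> as "returned no data".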
>
>
>
> Usually this happens when Pgpool-II thinks that a node is a standby while
> it actually is not. So you should check backend node 1 to confirm that it
> is in standby mode and has not accidentally been promoted to primary.
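>
> To confirm the current role of node 1 you can also run (same placeholder
> connection details as above):
>
>   psql -h <node1-host> -p 5432 -U postgres -c "SELECT pg_is_in_recovery();"
>
> A standby returns t and a primary returns f; if node 1 returns f here, it
> has been promoted and Pgpool-II's view of it as a standby is stale.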
>
>
>
> Thanks
>
> Regards
>
> Muhammad Usama
>
>
>
> On Thu, Jan 12, 2017 at 12:20 PM, Bhattacharyya, Subhro <
> s.bhattacharyya at sap.com> wrote:
>
>
>
> Hi,
>
>
>
> We have a 2-node PostgreSQL cluster along with 3 pgpool-II nodes.
>
> The version of pgpool-II that we are using is 3.5.2.
>
>
>
> The following messages keep appearing endlessly in the logs of all the
> pgpool-II nodes:
>
>
>
> 2017-01-12 07:09:55: pid 2534: ERROR:  Failed to check replication time
> lag
>
> 2017-01-12 07:09:55: pid 2534: DETAIL:  Query to node (1) returned no
> data
>
> 2017-01-12 07:09:55: pid 2534: CONTEXT:  while checking replication time
> lag
>
>
>
> Following is the watchdog cluster/node information:
>
>
>
> Watchdog Cluster Information
>
> Total Nodes          : 3
>
> Remote Nodes         : 2
>
> Quorum state         : QUORUM EXIST
>
> Alive Remote Nodes   : 2
>
> VIP up on local node : NO
>
> Master Node Name     : Linux_270f905f-37ef-46a2-9f7b-9b92b96a6ea9_9999
>
> Master Host Name     : 10.11.61.50
>
>
>
> Watchdog Node Information
>
> Node Name      : Linux_26acb17c-dadc-49ca-bb14-44820ab6a78a_9999
>
> Host Name      : 10.11.61.52
>
> Delegate IP    : Not_Set
>
> Pgpool port    : 9999
>
> Watchdog port  : 9000
>
> Node priority  : 1
>
> Status         : 7
>
> Status Name    : STANDBY
>
>
>
> Node Name      : Linux_2ae9f3fd-31fc-4921-aeb7-8e320cf347c4_9999
>
> Host Name      : 10.11.61.51
>
> Delegate IP    : Not_Set
>
> Pgpool port    : 9999
>
> Watchdog port  : 9000
>
> Node priority  : 1
>
> Status         : 7
>
> Status Name    : STANDBY
>
>
>
> Node Name      : Linux_270f905f-37ef-46a2-9f7b-9b92b96a6ea9_9999
>
> Host Name      : 10.11.61.50
>
> Delegate IP    : Not_Set
>
> Pgpool port    : 9999
>
> Watchdog port  : 9000
>
> Node priority  : 1
>
> Status         : 4
>
> Status Name    : MASTER
>
>
>
> Please let us know how to rectify this issue.
>
>
>
>
> _______________________________________________
> pgpool-general mailing list
> pgpool-general at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-general
>
>
>