[pgpool-general: 7156] Re: Pgpool starts and stops

Bo Peng pengbo at sraoss.co.jp
Mon Jul 20 15:17:43 JST 2020


Hello,

> 2020-07-17T13:53:58.207428+00:00 lcm-34-199 pgpool[1495]: [19-1] 2020-07-17
> 13:53:58: pid 1495: WARNING:  network IP is removed and system has no IP is
> assigned
> 2020-07-17T13:53:58.207489+00:00 lcm-34-199 pgpool[1495]: [19-2] 2020-07-17
> 13:53:58: pid 1495: DETAIL:  changing the state to in network trouble
> 2020-07-17T13:53:58.207522+00:00 lcm-34-199 pgpool[1495]: [20-1] 2020-07-17
> 13:53:58: pid 1495: LOG:  watchdog node state changed from [STANDBY] to [IN
> NETWORK TROUBLE]
> 2020-07-17T13:53:58.207550+00:00 lcm-34-199 pgpool[1495]: [21-1] 2020-07-17
> 13:53:58: pid 1495: FATAL:  system has lost the network
> 2020-07-17T13:53:58.207580+00:00 lcm-34-199 pgpool[1495]: [22-1] 2020-07-17
> 13:53:58: pid 1495: LOG:  Watchdog is shutting down

I think the state of the network interface changed after the reboot.
Could you check whether the network is up after the reboot, and whether the address of wd_hostname is assigned to one of your network interfaces?
Please make sure that the network is started before pgpool is started.
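
As a rough illustration of the check above, here is a minimal Python sketch that could be run before starting pgpool. It is only a sketch under assumptions: the function names are made up for this example, it handles IPv4 only, and it is not how the watchdog itself detects network loss. It resolves the name you pass in (use the value of wd_hostname from your pgpool.conf) and then tries to bind a socket to the resulting address, which normally succeeds only if that address is assigned to a local interface.

```python
import socket


def is_local_address(ip: str) -> bool:
    """Return True if a UDP socket can be bound to `ip`.

    Binding normally succeeds only when the address is assigned to a
    local interface (assuming the kernel does not allow non-local binds).
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.bind((ip, 0))  # port 0: let the OS pick any free port
        return True
    except OSError:
        return False


def wd_hostname_ready(wd_hostname: str) -> bool:
    """Return True if `wd_hostname` resolves to a locally assigned IPv4 address.

    `wd_hostname` is whatever is configured in pgpool.conf; this helper
    is hypothetical and not part of pgpool.
    """
    try:
        ip = socket.gethostbyname(wd_hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
    return is_local_address(ip)
```

For example, calling wd_hostname_ready("lcm-34-199.dev.lcm.local") on the rebooted node should return True once the interface is up. If it returns False at the time pgpool is launched, the startup order is a likely suspect.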

On Fri, 17 Jul 2020 19:43:17 +0530
Lakshmi Raghavendra <lakshmiym108 at gmail.com> wrote:

> Hi All,
> 
> I am currently using pgpool for a production release and am facing the
> issue below:
> As soon as pgpool starts on one of the nodes after a reboot, it stops and
> fails to join the cluster of 3 nodes. I am using pgpool 4.0.4 with PostgreSQL
> 9.6.18. Any pointers to resolve this as soon as possible would be highly appreciated.
> I am attaching the pgpool.conf.
> Thanks in advance!
> 
> 2020-07-17T08:17:30.160121+00:00 lcm-34-199 pgpool[1192]: [9-1] 2020-07-17
> 08:17:30: pid 1192: LOG:  pgpool-II successfully started. version 4.0.4
> (torokiboshi)
> 2020-07-17T13:53:25.859897+00:00 lcm-34-199 pgpool[1495]: [7-1] 2020-07-17
> 13:53:25: pid 1495: LOG:  watchdog node state changed from [DEAD] to
> [LOADING]
> 2020-07-17T13:53:25.866426+00:00 lcm-34-199 pgpool[1495]: [8-1] 2020-07-17
> 13:53:25: pid 1495: LOG:  new outbound connection to 10.198.34.195:9000
> 2020-07-17T13:53:25.866662+00:00 lcm-34-199 pgpool[1495]: [9-1] 2020-07-17
> 13:53:25: pid 1495: LOG:  new outbound connection to 10.198.34.200:9000
> 2020-07-17T13:53:25.868803+00:00 lcm-34-199 pgpool[1495]: [10-1] 2020-07-17
> 13:53:25: pid 1495: LOG:  setting the remote node
> "lcm-34-195.dev.lcm.local:9999 Linux lcm-34-195.dev.lcm.local" as watchdog
> cluster master
> 2020-07-17T13:53:25.868951+00:00 lcm-34-199 pgpool[1495]: [11-1] 2020-07-17
> 13:53:25: pid 1495: LOG:  watchdog node state changed from [LOADING] to
> [INITIALIZING]
> 2020-07-17T13:53:26.870517+00:00 lcm-34-199 pgpool[1495]: [12-1] 2020-07-17
> 13:53:26: pid 1495: LOG:  watchdog node state changed from [INITIALIZING]
> to [STANDBY]
> 2020-07-17T13:53:26.870803+00:00 lcm-34-199 pgpool[1495]: [13-1] 2020-07-17
> 13:53:26: pid 1495: LOG:  successfully joined the watchdog cluster as
> standby node
> 2020-07-17T13:53:26.870878+00:00 lcm-34-199 pgpool[1495]: [13-2] 2020-07-17
> 13:53:26: pid 1495: DETAIL:  our join coordinator request is accepted by
> cluster leader node "lcm-34-195.dev.lcm.local:9999 Linux
> lcm-34-195.dev.lcm.local"
> 2020-07-17T13:53:26.871160+00:00 lcm-34-199 pgpool[1486]: [3-1] 2020-07-17
> 13:53:26: pid 1486: LOG:  watchdog process is initialized
> 2020-07-17T13:53:26.871422+00:00 lcm-34-199 pgpool[1495]: [14-1] 2020-07-17
> 13:53:26: pid 1495: LOG:  new IPC connection received
> 2020-07-17T13:53:26.871561+00:00 lcm-34-199 pgpool[1486]: [4-1] 2020-07-17
> 13:53:26: pid 1486: LOG:  we have joined the watchdog cluster as STANDBY
> node
> 2020-07-17T13:53:26.871618+00:00 lcm-34-199 pgpool[1486]: [4-2] 2020-07-17
> 13:53:26: pid 1486: DETAIL:  syncing the backend states from the MASTER
> watchdog node
> 2020-07-17T13:53:26.871746+00:00 lcm-34-199 pgpool[1495]: [15-1] 2020-07-17
> 13:53:26: pid 1495: LOG:  new IPC connection received
> 2020-07-17T13:53:26.871802+00:00 lcm-34-199 pgpool[1495]: [16-1] 2020-07-17
> 13:53:26: pid 1495: LOG:  received the get data request from local
> pgpool-II on IPC interface
> 2020-07-17T13:53:26.871873+00:00 lcm-34-199 pgpool[1495]: [17-1] 2020-07-17
> 13:53:26: pid 1495: LOG:  get data request from local pgpool-II node
> received on IPC interface is forwarded to master watchdog node
> "lcm-34-195.dev.lcm.local:9999 Linux lcm-34-195.dev.lcm.local"
> 2020-07-17T13:53:26.871933+00:00 lcm-34-199 pgpool[1495]: [17-2] 2020-07-17
> 13:53:26: pid 1495: DETAIL:  waiting for the reply...
> 2020-07-17T13:53:26.872216+00:00 lcm-34-199 pgpool[1486]: [5-1] 2020-07-17
> 13:53:26: pid 1486: LOG:  master watchdog node
> "lcm-34-195.dev.lcm.local:9999 Linux lcm-34-195.dev.lcm.local" returned
> status for 3 backend nodes
> 2020-07-17T13:53:26.872286+00:00 lcm-34-199 pgpool[1486]: [6-1] 2020-07-17
> 13:53:26: pid 1486: LOG:  backend:1 is set to down status
> 2020-07-17T13:53:26.872335+00:00 lcm-34-199 pgpool[1486]: [6-2] 2020-07-17
> 13:53:26: pid 1486: DETAIL:  backend:1 is DOWN on cluster master
> "lcm-34-195.dev.lcm.local:9999 Linux lcm-34-195.dev.lcm.local"
> 2020-07-17T13:53:26.872394+00:00 lcm-34-199 pgpool[1486]: [7-1] 2020-07-17
> 13:53:26: pid 1486: LOG:  Setting up socket for 0.0.0.0:9999
> 2020-07-17T13:53:26.872443+00:00 lcm-34-199 pgpool[1486]: [8-1] 2020-07-17
> 13:53:26: pid 1486: LOG:  Setting up socket for :::9999
> 2020-07-17T13:53:26.877045+00:00 lcm-34-199 pgpool[1486]: [9-1] 2020-07-17
> 13:53:26: pid 1486: LOG:  pgpool-II successfully started. version 4.0.4
> (torokiboshi)
> 2020-07-17T13:53:26.877119+00:00 lcm-34-199 pgpool[1486]: [10-1] 2020-07-17
> 13:53:26: pid 1486: LOG:  node status[0]: 0
> 2020-07-17T13:53:26.877174+00:00 lcm-34-199 pgpool[1486]: [11-1] 2020-07-17
> 13:53:26: pid 1486: LOG:  node status[1]: 0
> 2020-07-17T13:53:26.877235+00:00 lcm-34-199 pgpool[1486]: [12-1] 2020-07-17
> 13:53:26: pid 1486: LOG:  node status[2]: 0
> 2020-07-17T13:53:26.892342+00:00 lcm-34-199 pgpool[1495]: [18-1] 2020-07-17
> 13:53:26: pid 1495: LOG:  new IPC connection received
> 2020-07-17T13:53:26.893061+00:00 lcm-34-199 pgpool[1822]: [4-1] 2020-07-17
> 13:53:26: pid 1822: LOG:  3 watchdog nodes are configured for lifecheck
> 2020-07-17T13:53:26.893135+00:00 lcm-34-199 pgpool[1822]: [5-1] 2020-07-17
> 13:53:26: pid 1822: LOG:  watchdog nodes ID:0
> Name:"lcm-34-199.dev.lcm.local:9999 Linux lcm-34-199.dev.lcm.local"
> 2020-07-17T13:53:26.893194+00:00 lcm-34-199 pgpool[1822]: [5-2] 2020-07-17
> 13:53:26: pid 1822: DETAIL:  Host:"lcm-34-199.dev.lcm.local" WD Port:9000
> pgpool-II port:9999
> 2020-07-17T13:53:26.893241+00:00 lcm-34-199 pgpool[1822]: [6-1] 2020-07-17
> 13:53:26: pid 1822: LOG:  watchdog nodes ID:1
> Name:"lcm-34-195.dev.lcm.local:9999 Linux lcm-34-195.dev.lcm.local"
> 2020-07-17T13:53:26.893301+00:00 lcm-34-199 pgpool[1822]: [6-2] 2020-07-17
> 13:53:26: pid 1822: DETAIL:  Host:"10.198.34.195" WD Port:9000 pgpool-II
> port:9999
> 2020-07-17T13:53:26.893351+00:00 lcm-34-199 pgpool[1822]: [7-1] 2020-07-17
> 13:53:26: pid 1822: LOG:  watchdog nodes ID:2
> Name:"lcm-34-200.dev.lcm.local:9999 Linux lcm-34-200.dev.lcm.local"
> 2020-07-17T13:53:26.893406+00:00 lcm-34-199 pgpool[1822]: [7-2] 2020-07-17
> 13:53:26: pid 1822: DETAIL:  Host:"10.198.34.200" WD Port:9000 pgpool-II
> port:9999
> 2020-07-17T13:53:27.894801+00:00 lcm-34-199 pgpool[1862]: [8-1] 2020-07-17
> 13:53:27: pid 1862: LOG:  creating socket for sending heartbeat
> 2020-07-17T13:53:27.894937+00:00 lcm-34-199 pgpool[1862]: [8-2] 2020-07-17
> 13:53:27: pid 1862: DETAIL:  set SO_REUSEPORT
> 2020-07-17T13:53:27.896688+00:00 lcm-34-199 pgpool[1860]: [8-1] 2020-07-17
> 13:53:27: pid 1860: LOG:  creating socket for sending heartbeat
> 2020-07-17T13:53:27.896739+00:00 lcm-34-199 pgpool[1860]: [8-2] 2020-07-17
> 13:53:27: pid 1860: DETAIL:  set SO_REUSEPORT
> 2020-07-17T13:53:27.901020+00:00 lcm-34-199 pgpool[1861]: [8-1] 2020-07-17
> 13:53:27: pid 1861: LOG:  creating watchdog heartbeat receive socket.
> 2020-07-17T13:53:27.901079+00:00 lcm-34-199 pgpool[1861]: [8-2] 2020-07-17
> 13:53:27: pid 1861: DETAIL:  set SO_REUSEPORT
> 2020-07-17T13:53:27.901216+00:00 lcm-34-199 pgpool[1859]: [8-1] 2020-07-17
> 13:53:27: pid 1859: LOG:  creating watchdog heartbeat receive socket.
> 2020-07-17T13:53:27.901268+00:00 lcm-34-199 pgpool[1859]: [8-2] 2020-07-17
> 13:53:27: pid 1859: DETAIL:  set SO_REUSEPORT
> 2020-07-17T13:53:58.207428+00:00 lcm-34-199 pgpool[1495]: [19-1] 2020-07-17
> 13:53:58: pid 1495: WARNING:  network IP is removed and system has no IP is
> assigned
> 2020-07-17T13:53:58.207489+00:00 lcm-34-199 pgpool[1495]: [19-2] 2020-07-17
> 13:53:58: pid 1495: DETAIL:  changing the state to in network trouble
> 2020-07-17T13:53:58.207522+00:00 lcm-34-199 pgpool[1495]: [20-1] 2020-07-17
> 13:53:58: pid 1495: LOG:  watchdog node state changed from [STANDBY] to [IN
> NETWORK TROUBLE]
> 2020-07-17T13:53:58.207550+00:00 lcm-34-199 pgpool[1495]: [21-1] 2020-07-17
> 13:53:58: pid 1495: FATAL:  system has lost the network
> 2020-07-17T13:53:58.207580+00:00 lcm-34-199 pgpool[1495]: [22-1] 2020-07-17
> 13:53:58: pid 1495: LOG:  Watchdog is shutting down
> 2020-07-17T13:53:58.208161+00:00 lcm-34-199 pgpool[1486]: [13-1] 2020-07-17
> 13:53:58: pid 1486: LOG:  watchdog child process with pid: 1495 exits with
> status 768
> 2020-07-17T13:53:58.208216+00:00 lcm-34-199 pgpool[1486]: [14-1] 2020-07-17
> 13:53:58: pid 1486: FATAL:  watchdog child process exit with fatal error.
> exiting pgpool-II
> 2020-07-17T13:57:14.337528+00:00 lcm-34-199 pgpool[4917]: [1-1] 2020-07-17
> 13:57:14: pid 4917: LOG:  stop request sent to pgpool. waiting for
> termination...


-- 
Bo Peng <pengbo at sraoss.co.jp>
SRA OSS, Inc. Japan

