[pgpool-general: 6031] Re: pgpool identify nodes bug after stopping all nodes [without detach]

Tatsuo Ishii ishii at sraoss.co.jp
Fri Apr 6 10:17:48 JST 2018


I see some segfault messages in the log.  That might be a known
problem in Pgpool-II 3.7.2 when used with DISALLOW_TO_FAILOVER. Can you
try the latest source code from the Pgpool-II git repository?

Or you can test without the DISALLOW_TO_FAILOVER flag to see if there's
any improvement.
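
For reference, a minimal sketch of what that test configuration might
look like in pgpool.conf. It assumes the three backends are numbered 0
to 2, as in your log; the rest of your settings are unknown to me, so
adjust accordingly and restart Pgpool-II afterwards.

```
# pgpool.conf (test only): allow failover on every backend
# Assumption: backends 0, 1 and 2 are the three nodes from the log.
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_flag2 = 'ALLOW_TO_FAILOVER'
```

Note that with ALLOW_TO_FAILOVER, Pgpool-II may detach a backend when it
detects a failure, so this is best tried on a test setup first.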

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Hi,
> I'm using pgpool version 3.7.2.
> I configured 3 nodes in pgpool (all of them have the DISALLOW_TO_FAILOVER
> flag). I set up replication across those 3 nodes with repmgr. I synced all
> the nodes and started pgpool (but I had forgotten to detach them from the
> pool via pcp when I stopped the cluster).
> After starting the pgpool process many times, it took the pool about 10
> minutes to identify the nodes, and the log showed the same output during
> those 10 minutes:
> [[No Connection]]([No Connection]) - 2018-03-18 18:56:02 - [No Connection]
> [31193]LOG:  find_primary_node: checking backend no 0
> [[No Connection]]([No Connection]) - 2018-03-18 18:56:02 - [No Connection]
> [31193]LOG:  find_primary_node: checking backend no 1
> [[No Connection]]([No Connection]) - 2018-03-18 18:56:02 - [No Connection]
> [31193]LOG:  find_primary_node: checking backend no 2
> 
> Moreover, I couldn't use the pcp commands; I got the following error:
> [postgres@pgpool1 log]$ pcp_node_info -h localhost -U postgres -p 9898 0
> Password:
> ERROR: connection to host "localhost" failed with error "Connection refused"
> 
> Suddenly, after 10 minutes, I saw in the log that the nodes were identified:
> 
> [[No Connection]]([No Connection]) - 2018-03-18 18:56:04 - [No Connection]
> [31193]LOG:  find_primary_node: checking backend no 0
> [[No Connection]]([No Connection]) - 2018-03-18 18:56:04 - [No Connection]
> [31193]LOG:  find_primary_node: checking backend no 1
> [[No Connection]]([No Connection]) - 2018-03-18 18:56:04 - [No Connection]
> [31193]LOG:  find_primary_node: checking backend no 2
> [[No Connection]]([No Connection]) - 2018-03-18 18:56:05 - [No Connection]
> [31193]LOG:  pgpool-II successfully started. version 3.7.2 (amefuriboshi)
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:23 - [No Connection]
> [1132]LOG:  forked new pcp worker, pid=1942 socket=8
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:23 - [No Connection]
> [1132]LOG:  PCP process with pid: 1942 exit with SUCCESS.
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:23 - [No Connection]
> [1132]LOG:  PCP process with pid: 1942 exits with status 0
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:26 - [No Connection]
> [1132]LOG:  forked new pcp worker, pid=1957 socket=8
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:26 - [No Connection]
> [1132]LOG:  PCP process with pid: 1957 exit with SUCCESS.
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:26 - [No Connection]
> [1132]LOG:  PCP process with pid: 1957 exits with status 0
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:29 - [No Connection]
> [1132]LOG:  forked new pcp worker, pid=1971 socket=8
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:29 - [No Connection]
> [1132]LOG:  PCP process with pid: 1971 exit with SUCCESS.
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:29 - [No Connection]
> [1132]LOG:  PCP process with pid: 1971 exits with status 0
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:50 - [No Connection]
> [1132]LOG:  forked new pcp worker, pid=2074 socket=8
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:50 - [No Connection]
> [2074]LOG:  received failback request for node_id: 1 from pid [2074]
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:50 - [No Connection]
> [31193]LOG:  Pgpool-II parent process has received failover request
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:50 - [No Connection]
> [31193]LOG:  starting fail back. reconnect host pgserver2(5432)
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:50 - [No Connection]
> [31193]LOG:  Node 0 is not down (status: 2)
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:50 - [No Connection]
> [1132]LOG:  PCP process with pid: 2074 exit with SUCCESS.
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:50 - [No Connection]
> [1132]LOG:  PCP process with pid: 2074 exits with status 0
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:50 - [No Connection]
> [31193]LOG:  Do not restart children because we are failing back node id 1 host: pgserver2 port: 5432 and we are in streaming replication mode and not all backends were down
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:50 - [No Connection]
> [31193]LOG:  find_primary_node_repeatedly: waiting for finding a primary node
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:50 - [No Connection]
> [31193]LOG:  find_primary_node: checking backend no 0
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:50 - [No Connection]
> [31193]LOG:  find_primary_node: checking backend no 1
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:50 - [No Connection]
> [31193]LOG:  find_primary_node: primary node id is 1
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:50 - [No Connection]
> [31193]LOG:  failover: set new primary node: 1
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:50 - [No Connection]
> [31193]LOG:  failover: set new master node: 0
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:50 - [No Connection]
> [31193]LOG:  failback done. reconnect host pgserver2(5432)
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:50 - [No Connection]
> [1135]LOG:  worker process received restart request
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:51 - [No Connection]
> [1132]LOG:  restart request received in pcp child process
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:51 - [No Connection]
> [31193]LOG:  PCP child 1132 exits with status 0 in failover()
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:51 - [No Connection]
> [31193]LOG:  fork a new PCP child pid 2078 in failover()
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:51 - [No Connection]
> [31193]LOG:  worker child process with pid: 1135 exits with status 256
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:51 - [No Connection]
> [31193]LOG:  fork a new worker child process with pid: 2079
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:53 - [No Connection]
> [31193]WARNING:  child process with pid: 31214 was terminated by segmentation fault
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:53 - [No Connection]
> [31193]LOG:  fork a new child process with pid: 2086
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:53 - [No Connection]
> [2086]LOG:  failback event detected
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:53 - [No Connection]
> [2086]DETAIL:  restarting myself
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:53 - [No Connection]
> [31193]LOG:  child process with pid: 2086 exits with status 256
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:53 - [No Connection]
> [31193]LOG:  fork a new child process with pid: 2087
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:53 - [No Connection]
> [31193]WARNING:  child process with pid: 31215 was terminated by segmentation fault
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:53 - [No Connection]
> [31193]LOG:  fork a new child process with pid: 2090
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:53 - [No Connection]
> [2090]LOG:  failback event detected
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:53 - [No Connection]
> [2090]DETAIL:  restarting myself
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:53 - [No Connection]
> [31193]LOG:  child process with pid: 2090 exits with status 256
> [[No Connection]]([No Connection]) - 2018-03-18 18:57:53 - [No Connection]
> [31193]LOG:  fork a new child process with pid: 2091
> 
> 
> 
> 
> It seems that the pool was stuck, and restarting it didn't resolve it. Is
> this supposed to happen, or is it a bug? Can you explain the reason
> behind it?

