[pgpool-general: 6081] Re: failover_require_consensus does not work.

Vlad G omenvlad at gmail.com
Sat May 12 00:38:00 JST 2018


Hey guys,

I changed:
- health_check_period = 0
+ health_check_period = 5
and now it seems everything is working.  
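
For anyone who finds this thread later, here is a minimal sketch of the
relevant pgpool.conf settings, to be applied on every Pgpool-II node so that
each node can detect a backend failure on its own and cast its vote. Only the
health_check_period change comes from this thread; the other values and the
user name are illustrative assumptions, not the settings actually used in
this cluster.

```
# pgpool.conf on each Pgpool-II node (pgpool-1, pgpool-2, pgpool-3)

# Consensus-based failover stays enabled (on is the default).
failover_require_consensus = on

# Health check must be enabled on every node; 0 disables it.
health_check_period = 5          # seconds between checks (the change from this thread)

# The values below are assumptions for illustration only.
health_check_timeout = 20        # seconds before a check is given up
health_check_user = 'pgpool'     # hypothetical database user for the check
health_check_max_retries = 3     # retries before the backend is reported as failed
health_check_retry_delay = 1     # seconds between retries
```

The alternative mentioned below in the thread is to leave health check
disabled and set failover_require_consensus = off, which gives up the
majority vote.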

I can't thank you enough for your help. 
I really appreciate it. 

Best regards,
Vladyslav

> On May 11, 2018, at 11:30, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> 
> Ok, here is a proposal for addition to the doc.
> 
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
> 
>> Usama,
>> 
>> Do we want to add some notes to the doc regarding this? The behavior
>> described below may not be obvious to users.
>> 
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese:http://www.sraoss.co.jp
>> 
>>> Hi
>>> 
>>> Thanks for the logs and config files.
>>> As per the logs and pgpool.conf files, this is what is happening.
>>> 
>>> You have health check disabled on all Pgpool-II nodes, so the only way to
>>> detect a backend failure is through fail_over_on_backend_error (which
>>> only works when a client connection detects the error). But since the
>>> clients connect only to the master Pgpool-II node, only the master
>>> Pgpool-II node can notice the backend PostgreSQL node failure. Because of
>>> the consensus requirement, it keeps waiting for the other Pgpool-II nodes
>>> to detect the backend failure, which never happens because the other two
>>> Pgpool-II nodes are sitting idle and did not detect the error.
>>> 
>>> So you either need to enable health check on all Pgpool-II nodes (which
>>> is the recommended setting for HA) or disable the consensus requirement
>>> (as you did when failover was working fine).
>>> 
>>> Thanks
>>> Best Regards
>>> Muhammad Usama
>>> 
>>> On Tue, May 8, 2018 at 7:54 PM, Vlad G <omenvlad at gmail.com> wrote:
>>> 
>>>> Hey Guys.
>>>> Thank you for your answer.
>>>> I attached the configuration files of pgpool and logs.
>>>> I hope you can help.
>>>> 
>>>> Best regards,
>>>> Vladyslav
>>>> 
>>>> 
>>>> On May 7, 2018, at 16:05, Muhammad Usama <m.usama at gmail.com> wrote:
>>>> 
>>>> Hi
>>>> 
>>>> From the log snippet you shared, it seems that the failure was never
>>>> detected by the other Pgpool-II nodes. Can you please share the pgpool.conf
>>>> files and log files for all Pgpool-II nodes?
>>>> 
>>>> Thanks
>>>> Best Regards
>>>> Muhammad Usama
>>>> 
>>>> On Thu, May 3, 2018 at 5:20 PM, Vlad G <omenvlad at gmail.com> wrote:
>>>> 
>>>>> Hey Guys,
>>>>> I have a cluster with Pgpool-II-pg96-3.7.3 and postgresql-9.6
>>>>> (3 x pgpool and 3 x postgresql), set up the same way as:
>>>>> http://www.pgpool.net/docs/latest/en/html/example-cluster.html
>>>>> 
>>>>> When the master PostgreSQL node (pgpoolpsql-1) goes down, the master
>>>>> Pgpool-II node (pgpool-1) does not get a second vote from either of the
>>>>> standby Pgpool-II nodes (pgpool-2 and pgpool-3).
>>>>> 
>>>>> If I set:
>>>>> failover_require_consensus = off
>>>>> everything works fine.
>>>>> 
>>>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24237:
>>>>> LOG:  failed to connect to PostgreSQL server on "pgpoolpsql-1:5432",
>>>>> getsockopt() detected error "Connection refused"
>>>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24237:
>>>>> LOG:  received degenerate backend request for node_id: 0 from pid [24237]
>>>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217:
>>>>> LOG:  new IPC connection received
>>>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217:
>>>>> LOG:  watchdog received the failover command from local pgpool-II on IPC
>>>>> interface
>>>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217:
>>>>> LOG:  watchdog is processing the failover command
>>>>> [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
>>>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217:
>>>>> LOG:  failover requires the majority vote, waiting for consensus
>>>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217:
>>>>> DETAIL:  failover request noted
>>>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217:
>>>>> LOG:  failover command [DEGENERATE_BACKEND_REQUEST] request from pgpool-II
>>>>> node "pgpool-1:9999 Linux pgpool-1" is queued, waiting for the confirmation
>>>>> from other nodes
>>>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24237:
>>>>> LOG:  degenerate backend request for node_id: 0 from pid [24237], will be
>>>>> handled by watchdog, which is building consensus for request
>>>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24237:
>>>>> FATAL:  failed to create a backend connection
>>>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24237:
>>>>> DETAIL:  executing failover on backend
>>>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24216:
>>>>> LOG:  child process with pid: 24237 exits with status 256
>>>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24216:
>>>>> LOG:  fork a new child process with pid: 24268
>>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228:
>>>>> LOG:  failed to connect to PostgreSQL server on "pgpoolpsql-1:5432",
>>>>> getsockopt() detected error "Connection refused"
>>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228:
>>>>> LOG:  received degenerate backend request for node_id: 0 from pid [24228]
>>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>>>> LOG:  new IPC connection received
>>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>>>> LOG:  watchdog received the failover command from local pgpool-II on IPC
>>>>> interface
>>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>>>> LOG:  watchdog is processing the failover command
>>>>> [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
>>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>>>> LOG:  Duplicate failover request from "pgpool-1:9999 Linux pgpool-1" node
>>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>>>> DETAIL:  request ignored
>>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>>>> LOG:  failover requires the majority vote, waiting for consensus
>>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>>>> DETAIL:  failover request noted
>>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228:
>>>>> LOG:  degenerate backend request for 1 node(s) from pid [24228], is changed
>>>>> to quarantine node request by watchdog
>>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228:
>>>>> DETAIL:  watchdog is taking time to build consensus
>>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228:
>>>>> FATAL:  failed to create a backend connection
>>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228:
>>>>> DETAIL:  executing failover on backend
>>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24216:
>>>>> LOG:  Pgpool-II parent process has received failover request
>>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>>>> LOG:  new IPC connection received
>>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>>>> LOG:  received the failover indication from Pgpool-II on IPC interface
>>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>>>> LOG:  watchdog is informed of failover end by the main process
>>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24216:
>>>>> LOG:  starting quarantine. shutdown host pgpoolpsql-1(5432)
>>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24216:
>>>>> LOG:  Restart all children
>>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24216:
>>>>> LOG:  failover: set new primary node: -1
>>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24216:
>>>>> LOG:  failover: set new master node: 1
>>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24252:
>>>>> LOG:  worker process received restart request
>>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>>>> LOG:  new IPC connection received
>>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>>>> LOG:  received the failover indication from Pgpool-II on IPC interface
>>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>>>> LOG:  watchdog is informed of failover start by the main process
>>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: quarantine done. shutdown host
>>>>> pgpoolpsql-1(5432)2018-05-03 13:02:46: pid 24216: LOG:  quarantine done.
>>>>> shutdown host pgpoolpsql-1(5432)
>>>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24251:
>>>>> LOG:  restart request received in pcp child process
>>>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216:
>>>>> LOG:  PCP child 24251 exits with status 0 in failover()
>>>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216:
>>>>> LOG:  fork a new PCP child pid 24301 in failover()
>>>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216:
>>>>> LOG:  child process with pid: 24219 exits with status 0
>>>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216:
>>>>> LOG:  child process with pid: 24219 exited with success and will not be
>>>>> restarted
>>>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216:
>>>>> LOG:  child process with pid: 24220 exits with status 0
>>>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216:
>>>>> LOG:  child process with pid: 24220 exited with success and will not be
>>>>> restarted
>>>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216:
>>>>> LOG:  child process with pid: 24221 exits with status 0
>>>>> 
>>>>> Around a month ago it worked fine (I think I tested it on pgpool-3.7.2),
>>>>> but now it does not work. Could you tell me which parameters this depends
>>>>> on, or do you have any other thoughts?
>>>>> 
>>>>> Best regards,
>>>>> Vladyslav
>>>>> 
>>>>> _______________________________________________
>>>>> pgpool-general mailing list
>>>>> pgpool-general at pgpool.net
>>>>> http://www.pgpool.net/mailman/listinfo/pgpool-general
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
> diff --git a/doc/src/sgml/watchdog.sgml b/doc/src/sgml/watchdog.sgml
> index 7e7adc9..041686b 100644
> --- a/doc/src/sgml/watchdog.sgml
> +++ b/doc/src/sgml/watchdog.sgml
> @@ -442,6 +442,16 @@
>         <para>
>           Default is on.
>         </para>
> +
> +	<caution>
> +	  <para>
> +	    For <varname>failover_require_consensus</varname> to
> +	    work, you need to enable health check. For more
> +	    details of health check,
> +	    see <xref linkend="runtime-config-health-check">.
> +	  </para>
> +	</caution>
> +
>         <para>
>         <varname>failover_require_consensus</varname> is not available prior to
>         <productname>Pgpool-II </productname><emphasis>V3.7</emphasis>. and it is only