[pgpool-general: 6084] Re: failover_require_consensus does not work.

Tatsuo Ishii ishii at sraoss.co.jp
Mon May 14 06:14:37 JST 2018


Hi Usama,

Thanks. The patch looks good to me.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

> Hi Ishii-San
> 
> I have tried to rephrase your suggestion for clarity. Please have a look at
> the attached patch and let me know if it looks fine to you.
> 
> 
> Thanks
> Best Regards
> Muhammad Usama
> 
> 
> On Fri, May 11, 2018 at 1:30 PM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> 
>> Ok, here is a proposal for addition to the doc.
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese:http://www.sraoss.co.jp
>>
>> > Usama,
>> >
>> > Do we want to add some notes to the doc regarding this? The behavior
>> > described below may not be obvious to users.
>> >
>> > Best regards,
>> > --
>> > Tatsuo Ishii
>> > SRA OSS, Inc. Japan
>> > English: http://www.sraoss.co.jp/index_en.php
>> > Japanese:http://www.sraoss.co.jp
>> >
>> >> Hi
>> >>
>> >> Thanks for the logs and config files.
>> >> As per the logs and pgpool.conf files, this is what is happening.
>> >>
>> >> You have health check disabled on all Pgpool-II nodes, so the only way to
>> >> detect a backend failure is through fail_over_on_backend_error (which
>> >> only works when a client connection detects the error). But since the
>> >> clients are connecting only to the master Pgpool-II node, only the master
>> >> Pgpool-II node can notice the backend PostgreSQL node failure. Because of
>> >> the consensus requirement, it keeps waiting for the other Pgpool-II nodes
>> >> to detect the backend failure, which never happens because the other two
>> >> Pgpool-II nodes are sitting idle and did not detect the error.
>> >>
>> >> So you either need to enable the health check on all Pgpool-II nodes
>> >> (which is the recommended setting for HA) or disable the consensus
>> >> requirement (as you did when failover was working fine).
>> >>
>> >> Thanks
>> >> Best Regards
>> >> Muhammad Usama
>> >>
>> >> On Tue, May 8, 2018 at 7:54 PM, Vlad G <omenvlad at gmail.com> wrote:
>> >>
>> >>> Hey Guys.
>> >>> Thank you for your answer.
>> >>> I attached the configuration files of pgpool and logs.
>> >>> I hope you can help.
>> >>>
>> >>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> Best regards,
>> >>> Vladyslav
>> >>>
>> >>>
>> >>> On May 7, 2018, at 16:05, Muhammad Usama <m.usama at gmail.com> wrote:
>> >>>
>> >>> Hi
>> >>>
>> >>> From the log snippet you shared, it seems that the failure was never
>> >>> detected by the other Pgpool-II nodes. Can you please share the
>> >>> pgpool.conf files and log files for all Pgpool-II nodes?
>> >>>
>> >>> Thanks
>> >>> Best Regards
>> >>> Muhammad Usama
>> >>>
>> >>> On Thu, May 3, 2018 at 5:20 PM, Vlad G <omenvlad at gmail.com> wrote:
>> >>>
>> >>>> Hey Guys,
>> >>>> I have a cluster with Pgpool-II-pg96-3.7.3 and postgresql-9.6
>> >>>> (3 x pgpool and 3 x postgresql).
>> >>>> It uses the same scheme as:
>> >>>> http://www.pgpool.net/docs/latest/en/html/example-cluster.html
>> >>>>
>> >>>> When the master node of postgresql (pgpoolpsql-1) goes down, the master
>> >>>> node of pgpool (pgpool-1) does not get a second vote from either of the
>> >>>> standby pgpool nodes (pgpool-2 and pgpool-3).
>> >>>>
>> >>>> If I set:
>> >>>> failover_require_consensus = off
>> >>>> Everything works fine.
>> >>>>
>> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid
>> 24237:
>> >>>> LOG:  failed to connect to PostgreSQL server on "pgpoolpsql-1:5432",
>> >>>> getsockopt() detected error "Connection refused"
>> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid
>> 24237:
>> >>>> LOG:  received degenerate backend request for node_id: 0 from pid
>> [24237]
>> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid
>> 24217:
>> >>>> LOG:  new IPC connection received
>> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid
>> 24217:
>> >>>> LOG:  watchdog received the failover command from local pgpool-II on
>> IPC
>> >>>> interface
>> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid
>> 24217:
>> >>>> LOG:  watchdog is processing the failover command
>> >>>> [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC
>> interface
>> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid
>> 24217:
>> >>>> LOG:  failover requires the majority vote, waiting for consensus
>> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid
>> 24217:
>> >>>> DETAIL:  failover request noted
>> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid
>> 24217:
>> >>>> LOG:  failover command [DEGENERATE_BACKEND_REQUEST] request from
>> pgpool-II
>> >>>> node "pgpool-1:9999 Linux pgpool-1" is queued, waiting for the
>> confirmation
>> >>>> from other nodes
>> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid
>> 24237:
>> >>>> LOG:  degenerate backend request for node_id: 0 from pid [24237],
>> will be
>> >>>> handled by watchdog, which is building consensus for request
>> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid
>> 24237:
>> >>>> FATAL:  failed to create a backend connection
>> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid
>> 24237:
>> >>>> DETAIL:  executing failover on backend
>> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid
>> 24216:
>> >>>> LOG:  child process with pid: 24237 exits with status 256
>> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid
>> 24216:
>> >>>> LOG:  fork a new child process with pid: 24268
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid
>> 24228:
>> >>>> LOG:  failed to connect to PostgreSQL server on "pgpoolpsql-1:5432",
>> >>>> getsockopt() detected error "Connection refused"
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid
>> 24228:
>> >>>> LOG:  received degenerate backend request for node_id: 0 from pid
>> [24228]
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid
>> 24217:
>> >>>> LOG:  new IPC connection received
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid
>> 24217:
>> >>>> LOG:  watchdog received the failover command from local pgpool-II on
>> IPC
>> >>>> interface
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid
>> 24217:
>> >>>> LOG:  watchdog is processing the failover command
>> >>>> [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC
>> interface
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid
>> 24217:
>> >>>> LOG:  Duplicate failover request from "pgpool-1:9999 Linux pgpool-1"
>> node
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid
>> 24217:
>> >>>> DETAIL:  request ignored
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid
>> 24217:
>> >>>> LOG:  failover requires the majority vote, waiting for consensus
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid
>> 24217:
>> >>>> DETAIL:  failover request noted
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid
>> 24228:
>> >>>> LOG:  degenerate backend request for 1 node(s) from pid [24228], is
>> changed
>> >>>> to quarantine node request by watchdog
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid
>> 24228:
>> >>>> DETAIL:  watchdog is taking time to build consensus
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid
>> 24228:
>> >>>> FATAL:  failed to create a backend connection
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid
>> 24228:
>> >>>> DETAIL:  executing failover on backend
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid
>> 24216:
>> >>>> LOG:  Pgpool-II parent process has received failover request
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid
>> 24217:
>> >>>> LOG:  new IPC connection received
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid
>> 24217:
>> >>>> LOG:  received the failover indication from Pgpool-II on IPC interface
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid
>> 24217:
>> >>>> LOG:  watchdog is informed of failover end by the main process
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid
>> 24216:
>> >>>> LOG:  starting quarantine. shutdown host pgpoolpsql-1(5432)
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid
>> 24216:
>> >>>> LOG:  Restart all children
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid
>> 24216:
>> >>>> LOG:  failover: set new primary node: -1
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid
>> 24216:
>> >>>> LOG:  failover: set new master node: 1
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid
>> 24252:
>> >>>> LOG:  worker process received restart request
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid
>> 24217:
>> >>>> LOG:  new IPC connection received
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid
>> 24217:
>> >>>> LOG:  received the failover indication from Pgpool-II on IPC interface
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid
>> 24217:
>> >>>> LOG:  watchdog is informed of failover start by the main process
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: quarantine done. shutdown host
>> >>>> pgpoolpsql-1(5432)2018-05-03 13:02:46: pid 24216: LOG:  quarantine
>> done.
>> >>>> shutdown host pgpoolpsql-1(5432)
>> >>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid
>> 24251:
>> >>>> LOG:  restart request received in pcp child process
>> >>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid
>> 24216:
>> >>>> LOG:  PCP child 24251 exits with status 0 in failover()
>> >>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid
>> 24216:
>> >>>> LOG:  fork a new PCP child pid 24301 in failover()
>> >>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid
>> 24216:
>> >>>> LOG:  child process with pid: 24219 exits with status 0
>> >>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid
>> 24216:
>> >>>> LOG:  child process with pid: 24219 exited with success and will not
>> be
>> >>>> restarted
>> >>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid
>> 24216:
>> >>>> LOG:  child process with pid: 24220 exits with status 0
>> >>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid
>> 24216:
>> >>>> LOG:  child process with pid: 24220 exited with success and will not
>> be
>> >>>> restarted
>> >>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid
>> 24216:
>> >>>> LOG:  child process with pid: 24221 exits with status 0
>> >>>>
>> >>>> Around a month ago it worked fine (I believe I tested it on pgpool-3.7.2),
>> >>>> but now it does not work. Could you tell me which parameters this depends
>> >>>> on, or share any other thoughts?
>> >>>>
>> >>>> Best regards,
>> >>>> Vladyslav
>> >>>>
>> >>>> _______________________________________________
>> >>>> pgpool-general mailing list
>> >>>> pgpool-general at pgpool.net
>> >>>> http://www.pgpool.net/mailman/listinfo/pgpool-general
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>>
>> diff --git a/doc/src/sgml/watchdog.sgml b/doc/src/sgml/watchdog.sgml
>> index 7e7adc9..041686b 100644
>> --- a/doc/src/sgml/watchdog.sgml
>> +++ b/doc/src/sgml/watchdog.sgml
>> @@ -442,6 +442,16 @@
>>          <para>
>>            Default is on.
>>          </para>
>> +
>> +       <caution>
>> +         <para>
>> +           To make <varname>failover_require_consensus</varname>
>> +           workable, You need to enable health check. For more
>> +           details of health check,
>> +           see <xref linkend="runtime-config-health-check">.
>> +         </para>
>> +       </caution>
>> +
>>          <para>
>>          <varname>failover_require_consensus</varname> is not available prior to
>>          <productname>Pgpool-II </productname><emphasis>V3.7</emphasis>. and it is only
>>
>>
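
For reference, the recommendation quoted above (enable health check on
every Pgpool-II node so that each watchdog member can detect the backend
failure on its own and cast its vote) might look roughly like the following
pgpool.conf fragment. This is only a sketch: the interval, timeout and retry
values and the 'pgpool' user are assumptions for illustration, not settings
taken from the reporter's configuration.

```
# pgpool.conf, applied on ALL Pgpool-II nodes (pgpool-1, pgpool-2, pgpool-3).
# Values below are illustrative assumptions; tune them for your environment.

health_check_period = 10         # seconds between checks; 0 disables health check
health_check_timeout = 20        # seconds before a single check is abandoned
health_check_user = 'pgpool'     # hypothetical PostgreSQL role used for the check
health_check_password = ''       # password for that role, if one is required
health_check_max_retries = 3     # retries before the backend is reported as failed
health_check_retry_delay = 1     # seconds between retries

failover_require_consensus = on  # failover still requires a majority vote
```

With health_check_period left at 0, only the node that serves client
connections can notice the failure, so the majority vote is never reached.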

