[pgpool-general: 6084] Re: failover_require_consensus does not work.
Tatsuo Ishii
ishii at sraoss.co.jp
Mon May 14 06:14:37 JST 2018
Hi Usama,
Thanks. The patch looks good to me.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
> Hi Ishii-San
>
> I have tried to rephrase your suggestion for clarity. Please have a look at
> the attached patch and see whether it looks fine to you.
>
>
> Thanks
> Best Regards
> Muhammad Usama
>
>
> On Fri, May 11, 2018 at 1:30 PM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>
>> Ok, here is a proposal for addition to the doc.
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>>
>> > Usama,
>> >
>> > Do we want to add some notes to the doc regarding this? The behavior
>> > described below may not be obvious to users.
>> >
>> > Best regards,
>> > --
>> > Tatsuo Ishii
>> >
>> >> Hi
>> >>
>> >> Thanks for the logs and config files.
>> >> As per the logs and pgpool.conf files, this is what is happening.
>> >>
>> >> You have health check disabled on all Pgpool-II nodes, so the only way to
>> >> detect a backend failure is through fail_over_on_backend_error (which
>> >> only works when a client connection detects the error). But since the
>> >> clients connect only to the master Pgpool-II node, only the master
>> >> Pgpool-II node can notice the backend PostgreSQL node failure. Because of
>> >> the consensus requirement, it keeps waiting for the other Pgpool-II nodes
>> >> to detect the backend failure, which never happens because the other two
>> >> Pgpool-II nodes are sitting idle and did not detect the error.
>> >>
>> >> So you either need to enable health check on all Pgpool-II nodes (which
>> >> is the recommended setting for HA) or disable the consensus requirement
>> >> (as you did when failover was working fine).
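
As a rough illustration of the two options described above, a pgpool.conf
fragment might look like the following. This is only a sketch: the numeric
values, the user name, and the empty password are assumptions chosen for
illustration and do not come from the configuration files attached to this
thread. The parameter names are the ones used by Pgpool-II 3.7.

```conf
# pgpool.conf, applied on every Pgpool-II node
# (illustrative values, not taken from the thread)

# Option 1: enable health check so each node can detect the failure itself
health_check_period = 10          # seconds between checks; 0 disables health check
health_check_timeout = 20         # seconds before a check attempt is given up
health_check_user = 'pgpool'      # hypothetical PostgreSQL user for the check
health_check_password = ''        # set according to your authentication setup
health_check_max_retries = 3      # retries before the backend is treated as down
health_check_retry_delay = 1      # seconds between retries
failover_require_consensus = on   # failover still needs a majority vote

# Option 2: keep health check disabled and drop the consensus requirement
# health_check_period = 0
# failover_require_consensus = off
```

With option 1, the idle standby Pgpool-II nodes should notice the failed
backend through their own health checks and can then cast their votes.
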
>> >>
>> >> Thanks
>> >> Best Regards
>> >> Muhammad Usama
>> >>
>> >> On Tue, May 8, 2018 at 7:54 PM, Vlad G <omenvlad at gmail.com> wrote:
>> >>
>> >>> Hey Guys.
>> >>> Thank you for your answer.
>> >>> I attached the configuration files of pgpool and logs.
>> >>> I hope you can help.
>> >>>
>> >>> Best regards,
>> >>> Vladyslav
>> >>>
>> >>>
>> >>> On May 7, 2018, at 16:05, Muhammad Usama <m.usama at gmail.com> wrote:
>> >>>
>> >>> Hi
>> >>>
>> >>> From the log snippet you shared, it seems that the failure was never
>> >>> detected by the other Pgpool-II nodes. Can you please share the
>> >>> pgpool.conf files and log files for all Pgpool-II nodes?
>> >>>
>> >>> Thanks
>> >>> Best Regards
>> >>> Muhammad Usama
>> >>>
>> >>> On Thu, May 3, 2018 at 5:20 PM, Vlad G <omenvlad at gmail.com> wrote:
>> >>>
>> >>>> Hey Guys,
>> >>>> I have a cluster with Pgpool-II-pg96-3.7.3 and postgresql-9.6
>> >>>> (3 x pgpool and 3 x postgresql).
>> >>>> The same scheme as:
>> >>>> http://www.pgpool.net/docs/latest/en/html/example-cluster.html
>> >>>>
>> >>>> When the master node of postgresql (pgpoolpsql-1) goes down, the master
>> >>>> node of pgpool (pgpool-1) does not get a second vote from either of the
>> >>>> standby pgpool nodes (pgpool-2 and pgpool-3).
>> >>>>
>> >>>> If I set:
>> >>>> failover_require_consensus = off
>> >>>> Everything works fine.
>> >>>>
>> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24237: LOG: failed to connect to PostgreSQL server on "pgpoolpsql-1:5432", getsockopt() detected error "Connection refused"
>> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24237: LOG: received degenerate backend request for node_id: 0 from pid [24237]
>> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217: LOG: new IPC connection received
>> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217: LOG: watchdog received the failover command from local pgpool-II on IPC interface
>> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217: LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
>> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217: LOG: failover requires the majority vote, waiting for consensus
>> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217: DETAIL: failover request noted
>> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217: LOG: failover command [DEGENERATE_BACKEND_REQUEST] request from pgpool-II node "pgpool-1:9999 Linux pgpool-1" is queued, waiting for the confirmation from other nodes
>> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24237: LOG: degenerate backend request for node_id: 0 from pid [24237], will be handled by watchdog, which is building consensus for request
>> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24237: FATAL: failed to create a backend connection
>> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24237: DETAIL: executing failover on backend
>> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24216: LOG: child process with pid: 24237 exits with status 256
>> >>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24216: LOG: fork a new child process with pid: 24268
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228: LOG: failed to connect to PostgreSQL server on "pgpoolpsql-1:5432", getsockopt() detected error "Connection refused"
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228: LOG: received degenerate backend request for node_id: 0 from pid [24228]
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217: LOG: new IPC connection received
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217: LOG: watchdog received the failover command from local pgpool-II on IPC interface
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217: LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217: LOG: Duplicate failover request from "pgpool-1:9999 Linux pgpool-1" node
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217: DETAIL: request ignored
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217: LOG: failover requires the majority vote, waiting for consensus
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217: DETAIL: failover request noted
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228: LOG: degenerate backend request for 1 node(s) from pid [24228], is changed to quarantine node request by watchdog
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228: DETAIL: watchdog is taking time to build consensus
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228: FATAL: failed to create a backend connection
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228: DETAIL: executing failover on backend
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24216: LOG: Pgpool-II parent process has received failover request
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217: LOG: new IPC connection received
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217: LOG: received the failover indication from Pgpool-II on IPC interface
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217: LOG: watchdog is informed of failover end by the main process
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24216: LOG: starting quarantine. shutdown host pgpoolpsql-1(5432)
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24216: LOG: Restart all children
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24216: LOG: failover: set new primary node: -1
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24216: LOG: failover: set new master node: 1
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24252: LOG: worker process received restart request
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217: LOG: new IPC connection received
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217: LOG: received the failover indication from Pgpool-II on IPC interface
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217: LOG: watchdog is informed of failover start by the main process
>> >>>> May 03 13:02:46 pgpool-1 pgpool[24216]: quarantine done. shutdown host pgpoolpsql-1(5432)2018-05-03 13:02:46: pid 24216: LOG: quarantine done. shutdown host pgpoolpsql-1(5432)
>> >>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24251: LOG: restart request received in pcp child process
>> >>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216: LOG: PCP child 24251 exits with status 0 in failover()
>> >>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216: LOG: fork a new PCP child pid 24301 in failover()
>> >>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216: LOG: child process with pid: 24219 exits with status 0
>> >>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216: LOG: child process with pid: 24219 exited with success and will not be restarted
>> >>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216: LOG: child process with pid: 24220 exits with status 0
>> >>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216: LOG: child process with pid: 24220 exited with success and will not be restarted
>> >>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216: LOG: child process with pid: 24221 exits with status 0
>> >>>>
>> >>>> Around a month ago it worked fine (it seems I tested it on pgpool-3.7.2),
>> >>>> but now it does not work. Could you tell me which parameters this
>> >>>> depends on, or share any other thoughts?
>> >>>>
>> >>>> Best regards,
>> >>>> Vladyslav
>> >>>>
>> >>>> _______________________________________________
>> >>>> pgpool-general mailing list
>> >>>> pgpool-general at pgpool.net
>> >>>> http://www.pgpool.net/mailman/listinfo/pgpool-general
>> >>>>
>> >>>
>>
>> diff --git a/doc/src/sgml/watchdog.sgml b/doc/src/sgml/watchdog.sgml
>> index 7e7adc9..041686b 100644
>> --- a/doc/src/sgml/watchdog.sgml
>> +++ b/doc/src/sgml/watchdog.sgml
>> @@ -442,6 +442,16 @@
>> <para>
>> Default is on.
>> </para>
>> +
>> + <caution>
>> + <para>
>> + To make <varname>failover_require_consensus</varname>
>> + work, you need to enable health check. For more
>> + details of health check,
>> + see <xref linkend="runtime-config-health-check">.
>> + </para>
>> + </caution>
>> +
>> <para>
>> <varname>failover_require_consensus</varname> is not available prior to
>> <productname>Pgpool-II </productname><emphasis>V3.7</emphasis>. and it is only
>>
>>