[pgpool-general: 6080] Re: failover_require_consensus does not work.

Tatsuo Ishii ishii at sraoss.co.jp
Fri May 11 17:30:50 JST 2018


Ok, here is a proposal for addition to the doc.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

> Usama,
> 
> Do we want to add some notes to the doc regarding this? The behavior
> described below may not be obvious to users.
> 
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
> 
>> Hi
>> 
>> Thanks for the logs and config files.
>> According to the logs and pgpool.conf files, this is what is happening.
>> 
>> Health check is disabled on all Pgpool-II nodes, so the only way to
>> detect a backend failure is through fail_over_on_backend_error (which
>> only works when a client connection detects the error). Since clients
>> connect only to the master Pgpool-II node, only that node can notice
>> the backend PostgreSQL node failure. Because of the consensus
>> requirement, it keeps waiting for the other Pgpool-II nodes to report
>> the backend failure, and that report never arrives because the other
>> two Pgpool-II nodes are sitting idle and did not detect the error.
>> 
>> So you need to either enable health check on all Pgpool-II nodes
>> (which is the recommended setting for HA) or disable the consensus
>> requirement (as you did when failover was working fine).
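
To make the vote counting described above concrete, here is a minimal toy
model in Python. It is only a sketch and not Pgpool-II's actual watchdog
code: the class PgpoolNode and the functions make_cluster and
failover_proceeds are hypothetical names invented for illustration, and
the "strict majority" rule is an assumption based on the "failover
requires the majority vote" log message. Quarantine handling, which is
visible in the logs, is not modelled.

```python
from dataclasses import dataclass


@dataclass
class PgpoolNode:
    """Toy stand-in for one Pgpool-II node (hypothetical, for illustration)."""
    name: str
    health_check_enabled: bool
    serves_clients: bool

    def detects_backend_failure(self) -> bool:
        # A node notices the dead backend either through its own health
        # check or because a client connection through it hit the error.
        return self.health_check_enabled or self.serves_clients


def make_cluster(health_check_enabled: bool) -> list:
    # Three nodes as in the thread; clients connect only to pgpool-1.
    return [
        PgpoolNode("pgpool-1", health_check_enabled, serves_clients=True),
        PgpoolNode("pgpool-2", health_check_enabled, serves_clients=False),
        PgpoolNode("pgpool-3", health_check_enabled, serves_clients=False),
    ]


def failover_proceeds(nodes: list, require_consensus: bool = True) -> bool:
    """Return True if the modelled cluster would fail over the backend."""
    votes = sum(1 for node in nodes if node.detects_backend_failure())
    if not require_consensus:
        # Without the consensus requirement one detection is enough.
        return votes >= 1
    # Assumed rule: a strict majority of nodes must report the failure.
    return votes > len(nodes) // 2


if __name__ == "__main__":
    print(failover_proceeds(make_cluster(False)))   # health check off
    print(failover_proceeds(make_cluster(True)))    # health check on
    print(failover_proceeds(make_cluster(False), require_consensus=False))
```

With health check off, only pgpool-1 votes, so the model never reaches the
two votes it needs. Enabling health check everywhere or dropping the
consensus requirement both let the failover go through, which matches the
two options given above.
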
>> 
>> Thanks
>> Best Regards
>> Muhammad Usama
>> 
>> On Tue, May 8, 2018 at 7:54 PM, Vlad G <omenvlad at gmail.com> wrote:
>> 
>>> Hey Guys.
>>> Thank you for your answer.
>>> I attached the configuration files of pgpool and logs.
>>> I hope you can help.
>>>
>>> Best regards,
>>> Vladyslav
>>>
>>>
>>> On May 7, 2018, at 16:05, Muhammad Usama <m.usama at gmail.com> wrote:
>>>
>>> Hi
>>>
>>> From the log snippet you shared, it seems that the failure was never
>>> detected by the other Pgpool-II nodes. Can you please share the
>>> pgpool.conf files and log files for all Pgpool-II nodes?
>>>
>>> Thanks
>>> Best Regards
>>> Muhammad Usama
>>>
>>> On Thu, May 3, 2018 at 5:20 PM, Vlad G <omenvlad at gmail.com> wrote:
>>>
>>>> Hey Guys,
>>>> I have a cluster with Pgpool-II-pg96-3.7.3 and postgresql-9.6
>>>> (3 x pgpool and 3 x postgresql), using the same scheme as:
>>>> http://www.pgpool.net/docs/latest/en/html/example-cluster.html
>>>>
>>>> When the master PostgreSQL node (pgpoolpsql-1) goes down, the master
>>>> Pgpool-II node (pgpool-1) does not get a second vote from either of the
>>>> standby Pgpool-II nodes (pgpool-2 and pgpool-3).
>>>>
>>>> If I set:
>>>> failover_require_consensus = off
>>>> Everything works fine.
>>>>
>>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24237:
>>>> LOG:  failed to connect to PostgreSQL server on "pgpoolpsql-1:5432",
>>>> getsockopt() detected error "Connection refused"
>>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24237:
>>>> LOG:  received degenerate backend request for node_id: 0 from pid [24237]
>>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217:
>>>> LOG:  new IPC connection received
>>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217:
>>>> LOG:  watchdog received the failover command from local pgpool-II on IPC
>>>> interface
>>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217:
>>>> LOG:  watchdog is processing the failover command
>>>> [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
>>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217:
>>>> LOG:  failover requires the majority vote, waiting for consensus
>>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217:
>>>> DETAIL:  failover request noted
>>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217:
>>>> LOG:  failover command [DEGENERATE_BACKEND_REQUEST] request from pgpool-II
>>>> node "pgpool-1:9999 Linux pgpool-1" is queued, waiting for the confirmation
>>>> from other nodes
>>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24237:
>>>> LOG:  degenerate backend request for node_id: 0 from pid [24237], will be
>>>> handled by watchdog, which is building consensus for request
>>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24237:
>>>> FATAL:  failed to create a backend connection
>>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24237:
>>>> DETAIL:  executing failover on backend
>>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24216:
>>>> LOG:  child process with pid: 24237 exits with status 256
>>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24216:
>>>> LOG:  fork a new child process with pid: 24268
>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228:
>>>> LOG:  failed to connect to PostgreSQL server on "pgpoolpsql-1:5432",
>>>> getsockopt() detected error "Connection refused"
>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228:
>>>> LOG:  received degenerate backend request for node_id: 0 from pid [24228]
>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>>> LOG:  new IPC connection received
>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>>> LOG:  watchdog received the failover command from local pgpool-II on IPC
>>>> interface
>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>>> LOG:  watchdog is processing the failover command
>>>> [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>>> LOG:  Duplicate failover request from "pgpool-1:9999 Linux pgpool-1" node
>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>>> DETAIL:  request ignored
>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>>> LOG:  failover requires the majority vote, waiting for consensus
>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>>> DETAIL:  failover request noted
>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228:
>>>> LOG:  degenerate backend request for 1 node(s) from pid [24228], is changed
>>>> to quarantine node request by watchdog
>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228:
>>>> DETAIL:  watchdog is taking time to build consensus
>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228:
>>>> FATAL:  failed to create a backend connection
>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228:
>>>> DETAIL:  executing failover on backend
>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24216:
>>>> LOG:  Pgpool-II parent process has received failover request
>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>>> LOG:  new IPC connection received
>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>>> LOG:  received the failover indication from Pgpool-II on IPC interface
>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>>> LOG:  watchdog is informed of failover end by the main process
>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24216:
>>>> LOG:  starting quarantine. shutdown host pgpoolpsql-1(5432)
>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24216:
>>>> LOG:  Restart all children
>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24216:
>>>> LOG:  failover: set new primary node: -1
>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24216:
>>>> LOG:  failover: set new master node: 1
>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24252:
>>>> LOG:  worker process received restart request
>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>>> LOG:  new IPC connection received
>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>>> LOG:  received the failover indication from Pgpool-II on IPC interface
>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>>> LOG:  watchdog is informed of failover start by the main process
>>>> May 03 13:02:46 pgpool-1 pgpool[24216]: quarantine done. shutdown host
>>>> pgpoolpsql-1(5432)2018-05-03 13:02:46: pid 24216: LOG:  quarantine done.
>>>> shutdown host pgpoolpsql-1(5432)
>>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24251:
>>>> LOG:  restart request received in pcp child process
>>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216:
>>>> LOG:  PCP child 24251 exits with status 0 in failover()
>>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216:
>>>> LOG:  fork a new PCP child pid 24301 in failover()
>>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216:
>>>> LOG:  child process with pid: 24219 exits with status 0
>>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216:
>>>> LOG:  child process with pid: 24219 exited with success and will not be
>>>> restarted
>>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216:
>>>> LOG:  child process with pid: 24220 exits with status 0
>>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216:
>>>> LOG:  child process with pid: 24220 exited with success and will not be
>>>> restarted
>>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216:
>>>> LOG:  child process with pid: 24221 exits with status 0
>>>>
>>>> Around a month ago it worked fine (I think I tested it on pgpool-3.7.2),
>>>> but now it does not work. Could you tell me which parameters this
>>>> depends on, or share any other thoughts?
>>>>
>>>> Best regards,
>>>> Vladyslav
>>>>
>>>> _______________________________________________
>>>> pgpool-general mailing list
>>>> pgpool-general at pgpool.net
>>>> http://www.pgpool.net/mailman/listinfo/pgpool-general
>>>>
>>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: watchdog.diff
Type: text/x-patch
Size: 687 bytes
Desc: not available
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20180511/1b5b5870/attachment.bin>

