[pgpool-general: 6079] Re: failover_require_consensus does not work.

Tatsuo Ishii ishii at sraoss.co.jp
Fri May 11 07:35:04 JST 2018


Usama,

Do we want to add some notes to the doc regarding this? The behavior
described below may not be obvious to users.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

> Hi
> 
> Thanks for the logs and config files.
> According to the logs and pgpool.conf files, this is what is happening.
> 
> Health check is disabled on all Pgpool-II nodes, so the only way to
> detect a backend failure is through fail_over_on_backend_error (which
> only works when a client connection detects the error). Since the clients
> connect only to the master Pgpool-II node, only that node can notice the
> backend PostgreSQL node failure. Because of the consensus requirement, it
> keeps waiting for the other Pgpool-II nodes to report the failure as well,
> which never happens because the other two Pgpool-II nodes are sitting idle
> and never detected the error.
> So you either need to enable the health check on all Pgpool-II nodes
> (which is the recommended setting for HA) or disable the consensus
> requirement (as you did when failover was working fine).
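> 
> As a rough sketch of the first option, the pgpool.conf on every Pgpool-II
> node could enable health checks along these lines. The values and the
> health_check_user role name are illustrative assumptions, not taken from
> the attached files:
> 
> ```
> # pgpool.conf on each Pgpool-II node (illustrative values, adjust as needed)
> health_check_period = 10         # seconds between checks; 0 disables health check
> health_check_timeout = 20        # seconds before a check attempt is given up
> health_check_user = 'pgpool'     # assumed role name; must exist on the backends
> health_check_max_retries = 3     # retries before the backend is reported as failed
> health_check_retry_delay = 5     # seconds between retries
> failover_require_consensus = on  # keep the majority-vote requirement
> # Second option instead: failover_require_consensus = off
> ```
> 
> With health checks running on every node, the idle standby Pgpool-II nodes
> can detect the backend failure themselves and cast their votes.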
> 
> Thanks
> Best Regards
> Muhammad Usama
> 
> On Tue, May 8, 2018 at 7:54 PM, Vlad G <omenvlad at gmail.com> wrote:
> 
>> Hey Guys.
>> Thank you for your answer.
>> I attached the configuration files of pgpool and logs.
>> I hope you can help.
>>
>> Best regards,
>> Vladyslav
>>
>>
>> On May 7, 2018, at 16:05, Muhammad Usama <m.usama at gmail.com> wrote:
>>
>> Hi
>>
>> From the log snippet you shared, it seems that the failure was never
>> detected by the other Pgpool-II nodes. Can you please share the pgpool.conf
>> files and log files for all Pgpool-II nodes?
>>
>> Thanks
>> Best Regards
>> Muhammad Usama
>>
>> On Thu, May 3, 2018 at 5:20 PM, Vlad G <omenvlad at gmail.com> wrote:
>>
>>> Hey Guys,
>>> I have a cluster with Pgpool-II-pg96-3.7.3 and postgresql-9.6
>>> (3 x pgpool and 3 x postgresql), using the same scheme as:
>>> http://www.pgpool.net/docs/latest/en/html/example-cluster.html
>>>
>>> When the master PostgreSQL node (pgpoolpsql-1) goes down, the master
>>> Pgpool-II node (pgpool-1) does not get a second vote from either of the
>>> standby Pgpool-II nodes (pgpool-2 and pgpool-3).
>>>
>>> If I set:
>>> failover_require_consensus = off
>>> Everything works fine.
>>>
>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24237:
>>> LOG:  failed to connect to PostgreSQL server on "pgpoolpsql-1:5432",
>>> getsockopt() detected error "Connection refused"
>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24237:
>>> LOG:  received degenerate backend request for node_id: 0 from pid [24237]
>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217:
>>> LOG:  new IPC connection received
>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217:
>>> LOG:  watchdog received the failover command from local pgpool-II on IPC
>>> interface
>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217:
>>> LOG:  watchdog is processing the failover command
>>> [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217:
>>> LOG:  failover requires the majority vote, waiting for consensus
>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217:
>>> DETAIL:  failover request noted
>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217:
>>> LOG:  failover command [DEGENERATE_BACKEND_REQUEST] request from pgpool-II
>>> node "pgpool-1:9999 Linux pgpool-1" is queued, waiting for the confirmation
>>> from other nodes
>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24237:
>>> LOG:  degenerate backend request for node_id: 0 from pid [24237], will be
>>> handled by watchdog, which is building consensus for request
>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24237:
>>> FATAL:  failed to create a backend connection
>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24237:
>>> DETAIL:  executing failover on backend
>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24216:
>>> LOG:  child process with pid: 24237 exits with status 256
>>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24216:
>>> LOG:  fork a new child process with pid: 24268
>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228:
>>> LOG:  failed to connect to PostgreSQL server on "pgpoolpsql-1:5432",
>>> getsockopt() detected error "Connection refused"
>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228:
>>> LOG:  received degenerate backend request for node_id: 0 from pid [24228]
>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>> LOG:  new IPC connection received
>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>> LOG:  watchdog received the failover command from local pgpool-II on IPC
>>> interface
>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>> LOG:  watchdog is processing the failover command
>>> [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>> LOG:  Duplicate failover request from "pgpool-1:9999 Linux pgpool-1" node
>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>> DETAIL:  request ignored
>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>> LOG:  failover requires the majority vote, waiting for consensus
>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>> DETAIL:  failover request noted
>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228:
>>> LOG:  degenerate backend request for 1 node(s) from pid [24228], is changed
>>> to quarantine node request by watchdog
>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228:
>>> DETAIL:  watchdog is taking time to build consensus
>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228:
>>> FATAL:  failed to create a backend connection
>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228:
>>> DETAIL:  executing failover on backend
>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24216:
>>> LOG:  Pgpool-II parent process has received failover request
>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>> LOG:  new IPC connection received
>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>> LOG:  received the failover indication from Pgpool-II on IPC interface
>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>> LOG:  watchdog is informed of failover end by the main process
>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24216:
>>> LOG:  starting quarantine. shutdown host pgpoolpsql-1(5432)
>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24216:
>>> LOG:  Restart all children
>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24216:
>>> LOG:  failover: set new primary node: -1
>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24216:
>>> LOG:  failover: set new master node: 1
>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24252:
>>> LOG:  worker process received restart request
>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>> LOG:  new IPC connection received
>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>> LOG:  received the failover indication from Pgpool-II on IPC interface
>>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>>> LOG:  watchdog is informed of failover start by the main process
>>> May 03 13:02:46 pgpool-1 pgpool[24216]: quarantine done. shutdown host
>>> pgpoolpsql-1(5432)2018-05-03 13:02:46: pid 24216: LOG:  quarantine done.
>>> shutdown host pgpoolpsql-1(5432)
>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24251:
>>> LOG:  restart request received in pcp child process
>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216:
>>> LOG:  PCP child 24251 exits with status 0 in failover()
>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216:
>>> LOG:  fork a new PCP child pid 24301 in failover()
>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216:
>>> LOG:  child process with pid: 24219 exits with status 0
>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216:
>>> LOG:  child process with pid: 24219 exited with success and will not be
>>> restarted
>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216:
>>> LOG:  child process with pid: 24220 exits with status 0
>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216:
>>> LOG:  child process with pid: 24220 exited with success and will not be
>>> restarted
>>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216:
>>> LOG:  child process with pid: 24221 exits with status 0
>>>
>>> Around a month ago it worked fine (I believe I tested it on pgpool-3.7.2),
>>> but now it does not work. Could you tell me which parameters this depends
>>> on, or do you have any other thoughts?
>>>
>>> Best regards,
>>> Vladyslav
>>>
>>> _______________________________________________
>>> pgpool-general mailing list
>>> pgpool-general at pgpool.net
>>> http://www.pgpool.net/mailman/listinfo/pgpool-general
>>>

