[pgpool-general: 6078] Re: failover_require_consensus does not work.

Muhammad Usama m.usama at gmail.com
Fri May 11 05:39:03 JST 2018


Hi

Thanks for the logs and config files.
Based on the logs and pgpool.conf files, this is what is happening.

You have the health check disabled on all Pgpool-II nodes, so the only way to
detect a backend failure is through fail_over_on_backend_error (which only
works when a client connection detects the error). Since the clients connect
only to the master Pgpool-II node, only the master Pgpool-II node can notice
the failure of the backend PostgreSQL node. Because of the consensus
requirement, it keeps waiting for the other Pgpool-II nodes to confirm the
backend failure (with three Pgpool-II nodes, a majority means at least two
votes). That confirmation never arrives, because the other two Pgpool-II
nodes are sitting idle and never detect the error.

So you either need to enable the health check on all Pgpool-II nodes (which
is the recommended setting for HA), or disable the consensus requirement (as
you did when failover was working fine).
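For reference, a minimal pgpool.conf sketch of the two options, applied
identically on every Pgpool-II node (pgpool-1, pgpool-2 and pgpool-3). The
health check values are illustrative assumptions rather than tuned
recommendations, and the 'postgres' role name is an assumption: use a role
that actually exists on your backends.

```ini
# Option 1 (recommended for HA): enable the health check on every
# Pgpool-II node, so each node detects the backend failure on its own
# and can vote for the failover.
# Seconds between health checks; 0 disables the health check.
health_check_period = 10
# Seconds to wait when connecting to a backend.
health_check_timeout = 20
# Assumed role name; must exist on the PostgreSQL backends.
health_check_user = 'postgres'
# Set this if the role requires a password.
health_check_password = ''
# Retries before the backend is reported as failed, and the delay
# (in seconds) between them.
health_check_max_retries = 3
health_check_retry_delay = 1

# Option 2: leave the health check off and drop the majority-vote
# requirement instead (the setting in place when failover worked).
#failover_require_consensus = off
```

With option 1, each Pgpool-II node reports the failure independently, so the
master node can collect the second vote it needs instead of waiting
indefinitely.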

Thanks
Best Regards
Muhammad Usama

On Tue, May 8, 2018 at 7:54 PM, Vlad G <omenvlad at gmail.com> wrote:

> Hey Guys.
> Thank you for your answer.
> I attached the configuration files of pgpool and logs.
> I hope you can help.
>
> Best regards,
> Vladyslav
>
>
> On May 7, 2018, at 16:05, Muhammad Usama <m.usama at gmail.com> wrote:
>
> Hi
>
> From the log snippet you shared, it seems that the failure was never
> detected by the other Pgpool-II nodes. Can you please share the pgpool.conf
> files and log files for all Pgpool-II nodes?
>
> Thanks
> Best Regards
> Muhammad Usama
>
> On Thu, May 3, 2018 at 5:20 PM, Vlad G <omenvlad at gmail.com> wrote:
>
>> Hey Guys,
>> I have a cluster with Pgpool-II-pg96-3.7.3 and postgresql-9.6
>> (3 x pgpool and 3 x postgresql), using the same scheme as:
>> http://www.pgpool.net/docs/latest/en/html/example-cluster.html
>>
>> When the master PostgreSQL node (pgpoolpsql-1) goes down, the master
>> Pgpool-II node (pgpool-1) does not get a second vote from either of the
>> standby Pgpool-II nodes (pgpool-2 and pgpool-3).
>>
>> If I set:
>> failover_require_consensus = off
>> Everything works fine.
>>
>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24237:
>> LOG:  failed to connect to PostgreSQL server on "pgpoolpsql-1:5432",
>> getsockopt() detected error "Connection refused"
>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24237:
>> LOG:  received degenerate backend request for node_id: 0 from pid [24237]
>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217:
>> LOG:  new IPC connection received
>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217:
>> LOG:  watchdog received the failover command from local pgpool-II on IPC
>> interface
>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217:
>> LOG:  watchdog is processing the failover command
>> [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217:
>> LOG:  failover requires the majority vote, waiting for consensus
>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217:
>> DETAIL:  failover request noted
>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24217:
>> LOG:  failover command [DEGENERATE_BACKEND_REQUEST] request from pgpool-II
>> node "pgpool-1:9999 Linux pgpool-1" is queued, waiting for the confirmation
>> from other nodes
>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24237:
>> LOG:  degenerate backend request for node_id: 0 from pid [24237], will be
>> handled by watchdog, which is building consensus for request
>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24237:
>> FATAL:  failed to create a backend connection
>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24237:
>> DETAIL:  executing failover on backend
>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24216:
>> LOG:  child process with pid: 24237 exits with status 256
>> May 03 13:02:45 pgpool-1 pgpool[24216]: 2018-05-03 13:02:45: pid 24216:
>> LOG:  fork a new child process with pid: 24268
>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228:
>> LOG:  failed to connect to PostgreSQL server on "pgpoolpsql-1:5432",
>> getsockopt() detected error "Connection refused"
>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228:
>> LOG:  received degenerate backend request for node_id: 0 from pid [24228]
>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>> LOG:  new IPC connection received
>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>> LOG:  watchdog received the failover command from local pgpool-II on IPC
>> interface
>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>> LOG:  watchdog is processing the failover command
>> [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>> LOG:  Duplicate failover request from "pgpool-1:9999 Linux pgpool-1" node
>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>> DETAIL:  request ignored
>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>> LOG:  failover requires the majority vote, waiting for consensus
>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>> DETAIL:  failover request noted
>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228:
>> LOG:  degenerate backend request for 1 node(s) from pid [24228], is changed
>> to quarantine node request by watchdog
>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228:
>> DETAIL:  watchdog is taking time to build consensus
>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228:
>> FATAL:  failed to create a backend connection
>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24228:
>> DETAIL:  executing failover on backend
>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24216:
>> LOG:  Pgpool-II parent process has received failover request
>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>> LOG:  new IPC connection received
>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>> LOG:  received the failover indication from Pgpool-II on IPC interface
>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>> LOG:  watchdog is informed of failover end by the main process
>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24216:
>> LOG:  starting quarantine. shutdown host pgpoolpsql-1(5432)
>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24216:
>> LOG:  Restart all children
>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24216:
>> LOG:  failover: set new primary node: -1
>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24216:
>> LOG:  failover: set new master node: 1
>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24252:
>> LOG:  worker process received restart request
>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>> LOG:  new IPC connection received
>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>> LOG:  received the failover indication from Pgpool-II on IPC interface
>> May 03 13:02:46 pgpool-1 pgpool[24216]: 2018-05-03 13:02:46: pid 24217:
>> LOG:  watchdog is informed of failover start by the main process
>> May 03 13:02:46 pgpool-1 pgpool[24216]: quarantine done. shutdown host
>> pgpoolpsql-1(5432)2018-05-03 13:02:46: pid 24216: LOG:  quarantine done.
>> shutdown host pgpoolpsql-1(5432)
>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24251:
>> LOG:  restart request received in pcp child process
>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216:
>> LOG:  PCP child 24251 exits with status 0 in failover()
>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216:
>> LOG:  fork a new PCP child pid 24301 in failover()
>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216:
>> LOG:  child process with pid: 24219 exits with status 0
>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216:
>> LOG:  child process with pid: 24219 exited with success and will not be
>> restarted
>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216:
>> LOG:  child process with pid: 24220 exits with status 0
>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216:
>> LOG:  child process with pid: 24220 exited with success and will not be
>> restarted
>> May 03 13:02:47 pgpool-1 pgpool[24216]: 2018-05-03 13:02:47: pid 24216:
>> LOG:  child process with pid: 24221 exits with status 0
>>
>> About a month ago it worked fine (I believe I tested it on pgpool-3.7.2),
>> but now it does not. Could you tell me which parameters this depends on,
>> or do you have any other thoughts?
>>
>> Best regards,
>> Vladyslav
>>
>> _______________________________________________
>> pgpool-general mailing list
>> pgpool-general at pgpool.net
>> http://www.pgpool.net/mailman/listinfo/pgpool-general
>>
>