
ID: 0000290
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2017-03-23 09:06
Reporter: gdecicco
Assigned To: Muhammad Usama
Priority: immediate
Severity: major
Reproducibility: always
Status: assigned
Resolution: open
Platform: Linux
OS: Debian x64
OS Version: stretch
Product Version: 3.6.1
Target Version:
Fixed in Version:
Summary: 0000290: [safety violation] master without quorum executes failover procedure
Description: A pgpool node that believes it is the master (but has no quorum) and cannot reach the master database will start the failover procedure, causing a safety violation.

0) This was the original configuration:
master/slave streaming replication

("master" is the real master; "*" marks a node that only believes it is the master)

3 pgpool nodes:
pgpool master (alpha)
pgpool standby1 (bravo)
pgpool standby2 (charlie)

3 databases:
db replica1 (xray)
db replica2 (yankee)
db master (zulu)

Every node is connected to every other node.

1) After a network partition, charlie is disconnected from the other pgpool nodes.

In this situation charlie thinks it is the new master, but it does not have quorum (so it will not escalate).
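Why charlie lacks quorum comes down to simple arithmetic. The sketch below assumes a strict-majority rule: a node holds quorum only if it can see more than half of all configured pgpool nodes, itself included. pgpool-II's exact rule may differ (for example for clusters with an even number of nodes), and the function name has_quorum is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, assuming a strict-majority quorum rule:
# a watchdog node holds quorum only if it sees more than half
# of all configured pgpool nodes, counting itself.

# has_quorum VISIBLE TOTAL -> exit status 0 if quorum is held
has_quorum() {
  local visible=$1 total=$2
  (( visible * 2 > total ))
}

# 3 pgpool nodes in this report: alpha, bravo, charlie.
total=3

# After the partition charlie sees only itself.
if has_quorum 1 "$total"; then echo "charlie: quorum"; else echo "charlie: no quorum"; fi
# prints "charlie: no quorum"

# alpha still sees itself and bravo.
if has_quorum 2 "$total"; then echo "alpha: quorum"; else echo "alpha: no quorum"; fi
# prints "alpha: quorum"
```

With three nodes, one visible node (1 × 2 = 2, not greater than 3) is never enough, while two (4 > 3) are, which is why only the alpha/bravo side can legitimately act.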

2) After a further network partition, charlie is also disconnected from the master database (zulu).
Although it is not a recognized master, charlie starts the failover procedure, promoting yankee to be the new primary database node and setting xray as yankee's replica.

So from charlie's point of view the database state is the following:
node_id | hostname | status | lb_weight | role
0 | xray | up | 0.333333 | standby
1 | yankee | up | 0.333333 | primary
2 | zulu | down | 0.333333 | standby


Meanwhile, the recognized master alpha and bravo can only partially see what is happening; from their perspective the database state is the following:

node_id | hostname | status | lb_weight | role | replication delay
0 | xray | down | 0.333333 | standby | 0
1 | yankee | up | 0.333333 | standby | 448 <--- (replica delay)
2 | zulu | up | 0.333333 | primary | 0

The system has reached an unsafe state. alpha still sees the old primary (zulu) as primary, but it also sees yankee as zulu's replica (with some replication delay, of course).
alpha still holds the VIP, so the application can continue writing to zulu in the meantime.
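The unsafe state becomes visible once the two views are merged: charlie's table and alpha's table name different primaries, so two databases may accept writes. A minimal sketch, assuming the node-status tables are available as text in the pipe-separated layout shown above (copied from this report, not a guaranteed pgpool output format); the helper primaries is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: merge node-status views from different pgpool
# nodes and list every backend that some view calls "primary".
# More than one distinct primary means the views disagree on which
# database accepts writes.

primaries() {
  # Column 2 is the hostname, column 5 the role; strip padding spaces.
  awk -F'|' '{ gsub(/ /, "", $2); gsub(/ /, "", $5); if ($5 == "primary") print $2 }'
}

charlie_view='0 | xray | up | 0.333333 | standby
1 | yankee | up | 0.333333 | primary
2 | zulu | down | 0.333333 | standby'

alpha_view='0 | xray | down | 0.333333 | standby | 0
1 | yankee | up | 0.333333 | standby | 448
2 | zulu | up | 0.333333 | primary | 0'

distinct=$(printf '%s\n%s\n' "$charlie_view" "$alpha_view" | primaries | sort -u)
count=$(printf '%s\n' "$distinct" | grep -c .)

# Unquoted expansion joins the names with spaces.
echo "primaries seen: $(echo $distinct)"   # prints "primaries seen: yankee zulu"
if (( count > 1 )); then echo "unsafe: more than one primary"; fi
```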

This state cannot be resolved automatically without risking data loss:

- If charlie reconnects to the other pgpool nodes (and issue 0000289 http://www.pgpool.net/mantisbt/view.php?id=289 is resolved), it will forget what it has done and follow alpha's lead. No replica of the real primary database will be available.

- If charlie somehow becomes the recognized master (e.g. alpha resigns, disconnects or crashes, or issue 0000289 is not fixed), some data written to zulu will not be on yankee, causing data loss.


The expected behavior is that a master pgpool instance without quorum is not allowed to start failover. Only the recognized master should be able to recover the database.
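The expected rule can be expressed as a guard in front of the failover step. This is a hypothetical sketch, not pgpool-II's actual code or configuration: may_failover and its arguments are made up for illustration, and quorum is assumed to mean a strict majority of the configured pgpool nodes.

```shell
#!/usr/bin/env bash
# Hypothetical guard: a pgpool node may run failover (promote a standby)
# only if it considers itself master AND currently holds quorum.
# Believing to be master is not enough, as charlie shows.

# may_failover ROLE VISIBLE TOTAL -> exit status 0 if failover is allowed
may_failover() {
  local role=$1 visible=$2 total=$3
  [[ $role == master ]] && (( visible * 2 > total ))
}

# charlie believes it is master but sees only 1 of 3 pgpool nodes.
if may_failover master 1 3; then echo "charlie: failover"; else echo "charlie: refuse"; fi
# prints "charlie: refuse"

# alpha is the recognized master and sees 2 of 3 pgpool nodes.
if may_failover master 2 3; then echo "alpha: failover"; else echo "alpha: refuse"; fi
# prints "alpha: failover"
```

Checking quorum at the moment of failover, rather than only when the node escalates, is what would stop charlie from promoting yankee.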


Steps To Reproduce:
step 1)
on charlie:
iptables -A INPUT -s alpha -j DROP
iptables -A INPUT -s bravo -j DROP

on alpha:
iptables -A INPUT -s charlie -j DROP

on bravo:
iptables -A INPUT -s charlie -j DROP


step 2)
on zulu:
iptables -A INPUT -s charlie -j DROP
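To apply and later revert the partition consistently between test runs, the rules can be generated per host. A dry-run sketch: partition_rules is a hypothetical helper that only prints the commands. Passing -A appends the DROP rules as in the steps; passing -D deletes the same rules afterwards (iptables deletes a rule given its exact specification).

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print (do not execute) the iptables
# rules from the reproduction steps for one host.
# ACTION is -A (append) or -D (delete); host names are from this report.

partition_rules() {
  local action=$1 host=$2 peers peer
  case $host in
    charlie)     peers="alpha bravo" ;;  # step 1: isolate charlie from the other pgpool nodes
    alpha|bravo) peers="charlie" ;;      # step 1: the other side of the partition
    zulu)        peers="charlie" ;;      # step 2: cut charlie off from the primary database
    *)           return 1 ;;             # host not involved in the scenario
  esac
  for peer in $peers; do
    echo "iptables $action INPUT -s $peer -j DROP"
  done
}

partition_rules -A charlie
partition_rules -D charlie
```

Piping the output to `sh` as root on the corresponding host (e.g. `partition_rules -A zulu | sh` on zulu) would actually apply the rules.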
Tags: pgpool, watchdog

Activities

There are no notes attached to this issue.

Issue History

Date Modified | Username | Field | Change
2017-02-26 02:56 | gdecicco | New Issue |
2017-02-26 02:59 | gdecicco | Tag Attached | pgpool
2017-02-26 02:59 | gdecicco | Tag Attached | watchdog
2017-03-23 09:06 | t-ishii | Assigned To | => Muhammad Usama
2017-03-23 09:06 | t-ishii | Status | new => assigned