View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000290 | Pgpool-II | Bug | public | 2017-02-26 02:56 | 2017-03-23 09:06 |
| Reporter | gdecicco | Assigned To | Muhammad Usama | ||
| Priority | immediate | Severity | major | Reproducibility | always |
| Status | assigned | Resolution | open | ||
| Platform | Linux | OS | Debian x64 | OS Version | stretch |
| Product Version | 3.6.1 | ||||
| Summary | 0000290: [safety violation] master without quorum execute failover procedure | ||||
| Description | A pgpool node that believes it is the master but does not hold the quorum will start the failover procedure when it cannot reach the primary database, which is a safety violation. 0) Original configuration: master/slave streaming replication ("master" is the real master, "*" is a node that believes it is the master). 3 pgpool nodes: pgpool master (alpha), pgpool standby1 (bravo), pgpool standby2 (charlie). 3 databases: db replica1 (xray), db replica2 (yankee), db master (zulu). Every node is connected to every other node. 1) After a network partition, charlie is disconnected from the other pgpool nodes. In this situation charlie believes it is the new master, but it does not have the quorum (so it will not escalate). 2) After a further network partition, charlie is also disconnected from the master database (zulu). Although it is not a recognized master, charlie starts the failover procedure, promoting yankee to primary database node and setting xray as yankee's replica. From charlie's point of view the database state is the following: 0 \| xray \| up \| 0.333333 \| standby; 1 \| yankee \| up \| 0.333333 \| primary; 2 \| zulu \| down \| 0.333333 \| standby. In the meantime, the recognized master alpha and bravo can see only part of what is happening. From their perspective the database state is the following: 0 \| xray \| down \| 0.333333 \| standby \| 0; 1 \| yankee \| up \| 0.333333 \| standby \| 448 <--- (replica delay); 2 \| zulu \| up \| 0.333333 \| primary \| 0. The system has reached an unsafe state. alpha still sees the old primary as primary, but it also sees yankee as zulu's replica (with some delay, of course). alpha still holds the VIP, so the application can continue to write to zulu in the meantime. This state cannot be resolved automatically without risking data loss. First, if charlie reconnects to the other pgpool nodes (and issue 0000289, http://www.pgpool.net/mantisbt/view.php?id=289, has been resolved), it will forget what it has done and follow alpha's lead, leaving the real primary database with no available replica. Second, if charlie somehow becomes the recognized master (e.g. alpha resigns, disconnects, or crashes, or issue 0000289 is not fixed), some data in zulu will not be on yankee, causing data loss. The expected behavior would be to prevent a master pgpool instance without quorum from starting failover. Only the recognized master should be able to recover the database. | ||||
| Steps To Reproduce | Step 1) On charlie: `iptables -A INPUT -s alpha -j DROP` and `iptables -A INPUT -s bravo -j DROP`. On alpha: `iptables -A INPUT -s charlie -j DROP`. On bravo: `iptables -A INPUT -s charlie -j DROP`. Step 2) On zulu: `iptables -A INPUT -s charlie -j DROP`. | ||||
| Tags | pgpool, watchdog | ||||
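
The expected behavior in the description (only a pgpool node that holds the quorum may start failover) can be sketched as a strict-majority check. This is a minimal illustration in Python, not Pgpool-II's watchdog code. The function names `has_quorum` and `may_start_failover` and their parameters are hypothetical, and the sketch assumes that quorum means more than half of the configured pgpool nodes, counting the local node.

```python
def has_quorum(reachable_peers: int, cluster_size: int) -> bool:
    """Return True if this node plus the peers it can reach form a strict majority.

    reachable_peers: number of OTHER pgpool nodes this node can still talk to.
    cluster_size: total number of configured pgpool nodes, including this one.
    """
    if cluster_size < 1 or not 0 <= reachable_peers < cluster_size:
        raise ValueError("invalid cluster description")
    # +1 counts the local node itself.
    return 2 * (reachable_peers + 1) > cluster_size


def may_start_failover(believes_master: bool, reachable_peers: int, cluster_size: int) -> bool:
    """Hypothetical guard: allow failover only for a master that also holds the quorum."""
    return believes_master and has_quorum(reachable_peers, cluster_size)


# Scenario from this report, with 3 pgpool nodes (alpha, bravo, charlie):
# charlie is isolated and believes it is master, but reaches no peers.
print(may_start_failover(True, reachable_peers=0, cluster_size=3))  # False
# alpha is the recognized master and still reaches bravo.
print(may_start_failover(True, reachable_peers=1, cluster_size=3))  # True
```

Under this guard, charlie would mark zulu as unreachable but would not promote yankee, so alpha's view of the cluster would remain the only one acted upon.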