View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000135 | Pgpool-II | Bug | public | 2015-05-20 22:02 | 2015-12-16 20:38 |
| Reporter | januszb | Assigned To | Muhammad Usama | ||
| Priority | normal | Severity | major | Reproducibility | sometimes |
| Status | closed | Resolution | fixed | ||
| Platform | linux | OS | centos | OS Version | 7 |
| Summary | 0000135: Delegate IP is not brought up on Standby when Active gets disconnected | ||||
| Description | [root@ib-wawa-189 ~]# pgpool --version pgpool-II version 3.4.2 (tataraboshi) When the Active watchdog node is disconnected (cable unplugged), the Standby does not react, so the service becomes unavailable. Sometimes the Standby reacts correctly and brings up the delegate IP, but often it does not: in my experience it reacts correctly in only about one of four attempts. | ||||
| Steps To Reproduce | Unplug the network cable from the Active node and watch the Standby. | ||||
| Additional Information | After the Active is disconnected, the Standby enters an infinite loop, logging the following every few seconds: May 20 13:22:30 localhost pgpool: 2015-05-20 13:22:30: pid 1229: DEBUG: watchdog heartbeat: send heartbeat signal to 192.168.10.189:9694 May 20 13:22:32 localhost pgpool[902]: [5430-1] 2015-05-20 13:22:32: pid 902: LOG: failed to create watchdog sending socket May 20 13:22:32 localhost pgpool[902]: [5430-2] 2015-05-20 13:22:32: pid 902: DETAIL: connect() reports failure "No route to host" May 20 13:22:32 localhost pgpool[902]: [5430-3] 2015-05-20 13:22:32: pid 902: HINT: You can safely ignore this while starting up. May 20 13:22:32 localhost pgpool[902]: [5431-1] 2015-05-20 13:22:32: pid 902: LOG: watchdog sending packet for nodes May 20 13:22:32 localhost pgpool[902]: [5431-2] 2015-05-20 13:22:32: pid 902: DETAIL: packet for "192.168.10.189:9000" is canceled May 20 13:22:32 localhost pgpool: 2015-05-20 13:22:32: pid 902: LOG: failed to create watchdog sending socket May 20 13:22:32 localhost pgpool: 2015-05-20 13:22:32: pid 902: DETAIL: connect() reports failure "No route to host" May 20 13:22:32 localhost pgpool: 2015-05-20 13:22:32: pid 902: HINT: You can safely ignore this while starting up. May 20 13:22:32 localhost pgpool: 2015-05-20 13:22:32: pid 902: LOG: watchdog sending packet for nodes May 20 13:22:32 localhost pgpool: 2015-05-20 13:22:32: pid 902: DETAIL: packet for "192.168.10.189:9000" is canceled | ||||
| Tags | No tags attached. | ||||
|
|
|
|
|
The attached log in the "messages" file shows the situation when the Standby does not bring up the delegate IP. At May 20 16:31:49 it notices that the Active (.189) is not reachable, but no action is taken to make the .188 watchdog active. |
|
|
Interesting: in the described scenario, the Standby node is supposed to (1) promote PG and (2) bring up the delegate IP. It does neither. However, as soon as I plug the Active back into the network a few minutes later, the Standby performs the promotion! |
|
|
I believe this problem is an installer bug. In installer2-pg92-3.4.0, ./lib/pgpool.sh does the following. For host 0: _writePgpoolParam heartbeat_destination0 "'${PGPOOL_HOST_ARR[0]}'" For host 1: _writePgpoolParam heartbeat_destination0 "'${PGPOOL_HOST_ARR[0]}'" Both hosts therefore get the same heartbeat destination. |
|
|
Hi

In heartbeat mode, the watchdog monitors the health of the other watchdog nodes by sending periodic UDP packets. Since UDP is a connectionless protocol, the heartbeat can only detect a node failure by noticing the absence of heartbeat signals from that node. However, lifecheck only starts to monitor for missing heartbeat signals from a watchdog node after receiving at least one heartbeat message from it. Before it receives the first heartbeat signal, it considers that watchdog node as not started yet and hence remains silent when no heartbeat arrives from it.

Apparently, in the cases where lifecheck fails to detect the cable unplug on the remote node, the cable was unplugged during pgpool-II startup, before the node sent its first heartbeat. The node was therefore never registered as an alive node, and consequently the other watchdog node never reacts.

If you can also provide the logs of both the standby and the active watchdog from when this happens, that would be more helpful in analyzing the cause of the problem.

As a side note, we are also in the process of overhauling the watchdog and lifecheck process in pgpool-II 3.5, which is currently in beta, and hopefully that will provide a better experience and more features. |
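The lifecheck behaviour described in this note can be modelled roughly as follows. This is only an illustrative sketch for a single peer, not pgpool-II source code: the names `DEADTIME`, `last_seen`, `record_heartbeat` and `peer_status` are hypothetical, and the 30-second timeout is an assumed value.

```shell
# Hypothetical model of the described lifecheck logic (not pgpool-II code).
DEADTIME=30     # assumed timeout in seconds, illustrative value only
last_seen=""    # empty means no heartbeat from the peer has been received yet

# Remember the time (in seconds) at which the latest heartbeat arrived.
record_heartbeat() {
    last_seen=$1
}

# Report the peer state at time $1.
# A peer that was never heard from is treated as "not-started", so its
# silence never triggers a failover; only a peer seen at least once
# can be declared "dead".
peer_status() {
    now=$1
    if [ -z "$last_seen" ]; then
        echo "not-started"
    elif [ $(( now - last_seen )) -gt "$DEADTIME" ]; then
        echo "dead"
    else
        echo "alive"
    fi
}
```

With this model, a peer that never sent a heartbeat stays in "not-started" indefinitely, which would also be the outcome if the heartbeats were being sent to the wrong address.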
|
|
Hi! We resolved this problem a long time ago. It appeared after installing pgpool using installer2-pg92-3.4.0. In ./lib/pgpool.sh the installer had, for host 0: _writePgpoolParam heartbeat_destination0 "'${PGPOOL_HOST_ARR[0]}'" while I believe it should have, for host 0: _writePgpoolParam heartbeat_destination0 "'${PGPOOL_HOST_ARR[1]}'" After we wrote the config files ourselves, it works smoothly. Thanks! |
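The intended assignment for a two-node setup can be sketched like this. It is a minimal illustration, not the installer's code: `HOST0` and `HOST1` stand in for `PGPOOL_HOST_ARR[0]` and `PGPOOL_HOST_ARR[1]`, their order is an assumption, and `write_param` is a hypothetical stand-in for `_writePgpoolParam`, whose real body is not shown in this report.

```shell
# Addresses taken from the logs in this report; which one is index 0 is assumed.
HOST0="192.168.10.188"
HOST1="192.168.10.189"

# Hypothetical stand-in for _writePgpoolParam: prints a pgpool.conf style line.
write_param() {
    printf '%s = %s\n' "$1" "$2"
}

# In a two-node setup, each host must send heartbeats to the OTHER host.
peer_host() {
    case "$1" in
        0) echo "$HOST1" ;;
        1) echo "$HOST0" ;;
        *) return 1 ;;
    esac
}

# Emit the heartbeat destination setting for the host with the given index.
heartbeat_conf_for() {
    peer=$(peer_host "$1") || return 1
    write_param heartbeat_destination0 "'$peer'"
}

heartbeat_conf_for 0   # heartbeat_destination0 = '192.168.10.189'
heartbeat_conf_for 1   # heartbeat_destination0 = '192.168.10.188'
```

Pointing both hosts at the same address, as the installer did, means one node sends heartbeats to itself and its peer never receives any.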
|
|
Glad to hear your problem was solved. Many thanks for updating the status. |
|
|
The problem was fixed by the configuration changes |
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2015-05-20 22:02 | januszb | New Issue | |
| 2015-05-20 23:41 | januszb | File Added: messages | |
| 2015-05-20 23:46 | januszb | Note Added: 0000535 | |
| 2015-05-21 00:50 | januszb | Note Added: 0000536 | |
| 2015-05-21 21:40 | januszb | Note Added: 0000538 | |
| 2015-08-04 10:19 | t-ishii | Assigned To | => Muhammad Usama |
| 2015-08-04 10:19 | t-ishii | Status | new => assigned |
| 2015-12-16 16:43 | Muhammad Usama | Note Added: 0000615 | |
| 2015-12-16 18:19 | januszb | Note Added: 0000616 | |
| 2015-12-16 20:36 | Muhammad Usama | Note Added: 0000617 | |
| 2015-12-16 20:38 | Muhammad Usama | Note Added: 0000618 | |
| 2015-12-16 20:38 | Muhammad Usama | Status | assigned => closed |
| 2015-12-16 20:38 | Muhammad Usama | Resolution | open => fixed |