View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000133 | Pgpool-II | Bug | public | 2015-04-22 15:52 | 2016-03-25 02:43 |
| Reporter | mb | Assigned To | Muhammad Usama | ||
| Priority | normal | Severity | minor | Reproducibility | random |
| Status | feedback | Resolution | open | ||
| Summary | 0000133: failover.sh gets empty parameters; master and slave cannot determine a new primary node | ||||
| Description | pgpool-II is configured in master-slave mode with streaming replication. When I shut down the master database, the failover.sh script should be executed and the slave database should be promoted to master. However, the master executes failover.sh with empty parameters: `Apr 21 14:58:05 lnx-dbpgxcdev2 pgpool: + FALLING_NODE=0` `Apr 21 14:58:05 lnx-dbpgxcdev2 pgpool: + OLDPRIMARY_NODE=0` `Apr 21 14:58:05 lnx-dbpgxcdev2 pgpool: + NEW_PRIMARY=` `Apr 21 14:58:05 lnx-dbpgxcdev2 pgpool: + PGDATA=` `Apr 21 14:58:05 lnx-dbpgxcdev2 pgpool: + '[' 0 = 0 ']'` `Apr 21 14:58:05 lnx-dbpgxcdev2 pgpool: + '[' '' '!=' '' ']'` `Apr 21 14:58:05 lnx-dbpgxcdev2 pgpool: + exit 0` The slave executes failover.sh with the following parameters: `Apr 21 14:57:50 lnx-dbpgxcdev1 pgpool: + FALLING_NODE=1` `Apr 21 14:57:50 lnx-dbpgxcdev1 pgpool: + OLDPRIMARY_NODE=1` `Apr 21 14:57:50 lnx-dbpgxcdev1 pgpool: + NEW_PRIMARY=10.1.0.2` `Apr 21 14:57:50 lnx-dbpgxcdev1 pgpool: + PGDATA=/var/lib/postgresql/9.3/main` `Apr 21 14:57:50 lnx-dbpgxcdev1 pgpool: + '[' 1 = 1 ']'` `Apr 21 14:57:50 lnx-dbpgxcdev1 pgpool: + '[' 10.1.0.2 '!=' '' ']'` `Apr 21 14:57:50 lnx-dbpgxcdev1 pgpool: + '[' 10.1.0.2 = 10.1.0.2 ']'` `Apr 21 14:57:50 lnx-dbpgxcdev1 pgpool: + /usr/lib/postgresql/9.3/bin/pg_ctl -D /var/lib/postgresql/9.3/main promote` `Apr 21 14:57:50 lnx-dbpgxcdev1 pgpool: Server wird befördert` `Apr 21 14:57:50 lnx-dbpgxcdev1 pgpool: + exit 0` ("Server wird befördert" is the German-locale pg_ctl message for "server promoting".) Neither master nor slave can determine a new primary node. The master sets the new primary node to -1: `Apr 21 14:58:16 lnx-dbpgxcdev2 pgpool[22338]: [3540-1] 2015-04-21 14:58:16: pid 22338: LOG: failover: set new primary node: -1` `Apr 21 14:58:16 lnx-dbpgxcdev2 pgpool: 2015-04-21 14:58:16: pid 22338: LOG: failover: set new primary node: -1` The slave first sets the new primary node to 0: `Apr 21 14:57:55 lnx-dbpgxcdev1 pgpool[5669]: [30-1] 2015-04-21 14:57:55: pid 5669: LOG: failover: set new primary node: 0` `Apr 21 14:57:55 lnx-dbpgxcdev1 pgpool[5669]: [31-1] 2015-04-21 14:57:55: pid 5669: LOG: failover: set new master node: 0` and about 14 seconds later to -1: `Apr 21 14:58:09 lnx-dbpgxcdev1 pgpool[5669]: [127-1] 2015-04-21 14:58:09: pid 5669: LOG: failover: set new primary node: -1` `Apr 21 14:58:09 lnx-dbpgxcdev1 pgpool: 2015-04-21 14:58:09: pid 5669: LOG: failover: set new primary node: -1` The database is no longer reachable via the virtual IP. | ||||
| Steps To Reproduce | 1. Start PostgreSQL on server1. 2. Start PostgreSQL on server2. 3. Start pgpool-II on server1. 4. Start pgpool-II on server2. 5. Stop PostgreSQL on server1. Result: failover.sh is executed on the master with empty parameters and on the slave with the correct parameters. | ||||
| Additional Information | pgpool-II version: 3.4.2-0; PostgreSQL version: 9.3 | ||||
| Tags | No tags attached. | ||||

Note 0000530 (Muhammad Usama, 2015-04-25 01:29): The problem is with your pgpool-II setup on node 10.1.0.2. pgpool-II cannot connect to PostgreSQL because of an "Authentication failure"; see the related log entries at the start of the "syslog_10.1.0.2" file. Because pgpool-II cannot communicate with any configured backend, no master node is selected. The failover script is therefore called with empty values for the "master database host name" (%H) and "master database directory" (%R) parameters, since there is no master node.
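The explanation above (empty %H and %R whenever no master node can be selected) suggests a failover.sh that detects that case explicitly. The script below is a hypothetical sketch reconstructed from the `sh -x` trace in the description; it is not the reporter's actual script. Several details are assumptions. The mapping of parameters 1 and 2 to %d (failed node ID) and %P (old primary node ID) is inferred from the traced variable names FALLING_NODE and OLDPRIMARY_NODE. THIS_HOST, PG_CTL, decide_action and main are illustrative names. The traced script ends with a silent `exit 0` in the failing case, whereas this sketch reports it.

```shell
#!/bin/sh
# Hypothetical failover.sh, reconstructed from the trace in this report.
# Assumed pgpool.conf entry (%d and %P are inferred, not confirmed):
#   failover_command = '/path/to/failover.sh %d %P %H %R'

# Address of the PostgreSQL server on this host (value seen in the trace).
THIS_HOST=${THIS_HOST:-10.1.0.2}
PG_CTL=${PG_CTL:-/usr/lib/postgresql/9.3/bin/pg_ctl}

# decide_action FAILED_NODE OLD_PRIMARY [NEW_MASTER_HOST]
# Prints one of: no-op, no-new-master, promote, not-me.
decide_action() {
    if [ "$1" != "$2" ]; then
        # A standby failed; the primary is unchanged.
        echo "no-op"
    elif [ -z "$3" ]; then
        # Empty %H: pgpool-II found no backend to select as new master.
        echo "no-new-master"
    elif [ "$3" = "$THIS_HOST" ]; then
        echo "promote"
    else
        # Another host was selected; nothing to do on this one.
        echo "not-me"
    fi
}

main() {
    case $(decide_action "$1" "$2" "$3") in
        promote)
            "$PG_CTL" -D "$4" promote ;;
        no-new-master)
            # Report the case the traced script ended with a silent "exit 0".
            echo "failover.sh: old primary $2 failed but no new master was selected (empty %H/%R)" >&2 ;;
    esac
}

# Empty %H/%R may vanish when the command line is split by the shell,
# so require only the first two arguments.
if [ "$#" -ge 2 ]; then
    main "$@"
fi
```

The guard checks for at least two arguments rather than exactly four. If pgpool-II runs the expanded command string through a shell, an unquoted empty %H or %R produces no argument at all. In that case the script would see only two arguments, which matches the empty NEW_PRIMARY and PGDATA in the master's trace.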

Note 0000534 (mb, 2015-05-04 16:31): I have checked the configuration on node 10.1.0.2, and the authentication parameters are correct. However, I discovered another problem that produces the empty parameters for the failover.sh script. If the pgpool-II debug libraries are installed on nodes 10.1.0.2 and 10.1.0.3, failover.sh is executed correctly with the right parameters. If these libraries are not installed, failover.sh is executed on node 10.1.0.2 with empty parameters and pgpool-II produces a core dump on node 10.1.0.2. The backtrace of the core dump is in the attached files.

Note 0000539 (Muhammad Usama, 2015-06-11 23:00): Hi,

First of all, sorry for the delayed response. There are two separate problems. The first is the segmentation fault. The stack trace you shared suggests that it occurs in the lifecheck process, so it should be unrelated to your other problem of empty parameters in the failover command. I am looking into the crash, which I am still unable to reproduce on my local setup.

My analysis of the original problem follows. You get empty parameters in the failover script only because pgpool-II cannot select a new master node after the old primary node fails. The failover command in your pgpool-II configuration passes 4 parameters to failover.sh. Parameters 3 and 4 (%H and %R) receive the hostname and the database cluster path of the new master node, respectively. These parameters are always empty when pgpool-II finds no valid backend node to select as the new master. The most probable reason failover.sh gets empty values for %H and %R is therefore that pgpool-II cannot connect to the backend PostgreSQL server. The log files you shared do show many errors indicating that pgpool-II is failing to connect to the PostgreSQL backend node.

For example, the syslog_10.1.0.3 log file shared above contains many backend connection failure messages:

```text
Apr 21 14:58:00 lnx-dbpgxcdev2 pgpool[22397]: [6551-1] 2015-04-21 14:58:00: pid 22397: LOG: failed to connect to PostgreSQL server on "10.1.0.3:5432",
Apr 21 14:58:00 lnx-dbpgxcdev2 pgpool: 2015-04-21 14:58:00: pid 22397: LOG: failed to connect to PostgreSQL server on "10.1.0.3:5432",
Apr 21 14:58:00 lnx-dbpgxcdev2 pgpool[22397]: [6552-1] 2015-04-21 14:58:00: pid 22397: ERROR: failed to make persistent db connection
Apr 21 14:58:00 lnx-dbpgxcdev2 pgpool[22397]: [6552-2] 2015-04-21 14:58:00: pid 22397: DETAIL: connection to host:"10.1.0.3:5432" failed
Apr 21 14:58:00 lnx-dbpgxcdev2 pgpool: 2015-04-21 14:58:00: pid 22397: ERROR: failed to make persistent db connection
Apr 21 14:58:00 lnx-dbpgxcdev2 pgpool: 2015-04-21 14:58:00: pid 22397: DETAIL: connection to host:"10.1.0.3:5432" failed
```

Could you please find out why these connection failure messages appear in the pgpool-II log files? The PostgreSQL server log file should show what is causing the failures. Second, please check the output of the "show pool_nodes;" command before triggering the failover, to confirm that pgpool-II is successfully connected to all PostgreSQL backend nodes.

Thanks and best regards
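The "show pool_nodes;" check suggested above can be scripted so that it runs before every failover test. The function below is a minimal sketch under stated assumptions; check_pool_nodes is a hypothetical name. It assumes that the first four columns of the output are node_id, hostname, port and status. It also assumes that status is numeric in pgpool-II 3.4/3.5 (3 meaning the node is down) and a word such as "down" in later releases, so both forms are accepted. It reads psql's unaligned output, where fields are separated by `|`.

```shell
#!/bin/sh
# Hypothetical pre-failover check on `show pool_nodes;` output.
# Assumed columns: node_id|hostname|port|status|...; status 3 or "down"
# is treated as a backend pgpool-II is not connected to.

check_pool_nodes() {
    down=0
    while IFS='|' read -r node_id host port status rest; do
        # Skip empty lines.
        [ -n "$node_id" ] || continue
        case $status in
            3|down)
                echo "node $node_id ($host:$port) is down"
                down=$((down + 1)) ;;
        esac
    done
    # Succeed only if no listed backend is down.
    [ "$down" -eq 0 ]
}
```

A possible invocation, with the pgpool-II host left as a placeholder and 9999 as pgpool-II's default port: `psql -h <pgpool-host> -p 9999 -At -c 'show pool_nodes;' | check_pool_nodes`. Here `-A` selects unaligned output and `-t` suppresses headers and footers. A non-zero exit status means at least one backend was already down before the failover was triggered.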

Note 0000719 (QM, 2016-03-25 02:43): Hi, some feedback: the same issue occurs with pgpool-II 3.5 and PostgreSQL 9.5. My configuration is based on http://www.pgpool.net/pgpool-web/contrib_docs/watchdog_master_slave/en.html. When the pgpool-II debug libraries are installed on all my nodes, failover.sh is executed correctly with the right parameters. Without the debug libraries, if I shut down PostgreSQL on the master, failover.sh is called with empty parameters.

| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2015-04-22 15:52 | mb | New Issue | |
| 2015-04-22 15:52 | mb | File Added: configuration_files_and_logs.zip | |
| 2015-04-22 16:27 | t-ishii | Assigned To | => Muhammad Usama |
| 2015-04-22 16:27 | t-ishii | Status | new => assigned |
| 2015-04-25 01:29 | Muhammad Usama | Note Added: 0000530 | |
| 2015-05-04 16:24 | mb | File Added: backtrace.txt | |
| 2015-05-04 16:31 | mb | Note Added: 0000534 | |
| 2015-06-11 23:00 | Muhammad Usama | Note Added: 0000539 | |
| 2015-06-29 18:22 | Muhammad Usama | Status | assigned => feedback |
| 2016-03-25 02:43 | QM | Note Added: 0000719 | |