View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000469 | Pgpool-II | General | public | 2019-02-23 07:05 | 2019-03-21 19:51 |
| Reporter | demisx | Assigned To | t-ishii | ||
| Priority | high | Severity | major | Reproducibility | always |
| Status | resolved | Resolution | open | ||
| Platform | Linux | OS | Ubuntu | OS Version | 18.04 LTS |
| Product Version | 4.0.2 | ||||
| Summary | 0000469: LOG ERROR: primary 0 owns only 0 standbys out of 1 | ||||
**Description**

I have pgpool-II running on an app server managing two database nodes (0, the primary, and 1, the standby). The `psql -c 'show pool_nodes'` command shows both nodes as up:

```
$ psql -c 'show pool_nodes'
 node_id | hostname  | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | last_status_change
---------+-----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
 0       | 10.1.1.20 | 5432 | up     | 0.500000  | primary | 0          | true              | 0                 | 2019-02-22 21:37:55
 1       | 10.1.3.20 | 5432 | up     | 0.500000  | standby | 0          | false             | 0                 | 2019-02-22 21:44:49
```

However, after testing the online recovery and failback processes today, I started seeing this error written to the log every 5 seconds:

```
2019-02-22 22:04:04: pid 21998: LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
```

I can't figure out what may be wrong. Any idea where I should look?

**Tags:** error, logs
**Note 0002398 (demisx, 2019-02-23 07:20)**
Here is the full log. The error is actually written every 10 seconds, and I believe the interval comes from the `sr_check_period = 10` setting. PostgreSQL streaming replication appears to work properly: when I create and populate a test table on the primary, I see the same changes appear on the standby.

```
$ sudo -u postgres pgpool -n
---
2019-02-22 22:15:47: pid 22204: LOG: Setting up socket for 0.0.0.0:5432
2019-02-22 22:15:47: pid 22204: LOG: Setting up socket for :::5432
2019-02-22 22:15:47: pid 22204: LOG: find_primary_node_repeatedly: waiting for finding a primary node
2019-02-22 22:15:47: pid 22204: LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2019-02-22 22:15:47: pid 22204: LOG: find_primary_node: primary node is 0
2019-02-22 22:15:47: pid 22204: LOG: find_primary_node: standby node is 1
2019-02-22 22:15:47: pid 22204: LOG: pgpool-II successfully started. version 4.0.2 (torokiboshi)
2019-02-22 22:15:47: pid 22204: LOG: node status[0]: 1
2019-02-22 22:15:47: pid 22204: LOG: node status[1]: 2
2019-02-22 22:15:47: pid 22238: LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2019-02-22 22:15:57: pid 22238: LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2019-02-22 22:16:07: pid 22238: LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2019-02-22 22:16:17: pid 22238: LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2019-02-22 22:16:27: pid 22238: LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2019-02-22 22:16:37: pid 22238: LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2019-02-22 22:16:47: pid 22238: LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
```
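The cadence and the credentials of this check are both controlled by the `sr_check_*` parameters in `pgpool.conf`. As an illustrative sketch (the 10-second interval and the `pgpool` role name are taken from this report; the other values are assumptions, not from the thread), the relevant fragment looks like:

```
# pgpool.conf: streaming replication check settings (illustrative values)
sr_check_period = 10           # seconds between checks; matches the 10 s log cadence above
sr_check_user = 'pgpool'       # role pgpool connects as to run the check query
sr_check_password = ''         # empty here; assumed to be read from pool_passwd
sr_check_database = 'postgres' # database the check connects to
```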
**Note 0002399 (demisx, 2019-02-24 00:37)**
I've narrowed it down a little more. Although my backend node is running streaming replication, the following query returns NULL:

```
2019-02-23 15:32:32: pid 17383: DEBUG: verify_backend_node_status: pg_stat_wal_receiver status for standby 1 is NULL
2019-02-23 15:32:32: pid 17383: LOCATION: pgpool_main.c:3271
```

If I query the same standby node directly with psql, I do get the status back:

```
$ psql -x -h 10.1.3.20 -c "SELECT status, conninfo FROM pg_stat_wal_receiver";
-[ RECORD 1 ]-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
status   | streaming
conninfo | user=repl passfile=/var/lib/postgresql/.pgpass dbname=replication host=10.1.1.20 port=5432 fallback_application_name=walreceiver sslmode=prefer sslcompression=0 krbsrvname=postgres target_session_attrs=any
```

Any idea what I may be missing here?
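pgpool's health check runs a query along these lines against each standby, connecting as `sr_check_user`. Running it by hand as that user is a quick way to reproduce the NULL from the debug log (a sketch, assuming the check role is named `pgpool` as in this report):

```sql
-- Run on the standby, connected as the sr_check_user (here: pgpool).
-- Without sufficient privileges, the privileged columns of
-- pg_stat_wal_receiver come back NULL, which pgpool interprets as
-- "this standby is not streaming from the primary".
SELECT status, conninfo FROM pg_stat_wal_receiver;
```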
**Note 0002400 (demisx, 2019-02-24 03:34)**
OK, I was finally able to fix this. The problem was that the `sr_check_user = 'pgpool'` role did not have permission to query the `pg_stat_wal_receiver` view. It became apparent after passing the pgpool username to the SELECT:

```
$ psql -x -h [standby.db.node.ip] -U pgpool -d postgres -c "SELECT status, conninfo FROM pg_stat_wal_receiver";
```

The issue can be fixed by making the pgpool user a member of the `pg_monitor` role:

```
GRANT pg_monitor TO pgpool;
```

This step seems to be missing from the http://www.pgpool.net/docs/latest/en/html/example-cluster.html docs; it would be nice to include it. Also, please consider moving your code to GitHub or GitLab, so that we can all help contribute such fixes with pull requests. This issue can be closed.
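After granting the role, the fix can be verified by re-running the check query as the `sr_check_user`; the output below is a sketch based on the streaming status shown earlier in this thread (host and role names taken from this report):

```
$ psql -x -h 10.1.3.20 -U pgpool -d postgres -c "SELECT status FROM pg_stat_wal_receiver"
-[ RECORD 1 ]----
status | streaming
```

Once this returns a non-NULL status, the `verify_backend_node_status` messages should stop appearing in the pgpool log.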
**Note 0002403 (t-ishii, 2019-02-24 21:07)**
Thanks for the report. Unfortunately, the solution is only available in PostgreSQL 10 or later (9.6 does not have the `pg_monitor` role). Your point is valid anyway; I will update the docs.

> consider moving your code to GitHub or GitLab. This way we can all help to contribute such fixes with pull requests.

Sorry, we have no plans to do that. You can still send us patches using our git repository (as in PostgreSQL development).
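For PostgreSQL 9.6, where `pg_monitor` does not exist, one possible workaround (an assumption on my part, not something stated in this thread) is to give the check role superuser rights, since 9.6 only exposes the `pg_stat_wal_receiver` columns to superusers:

```sql
-- Possible workaround on PostgreSQL 9.6 (no pg_monitor role):
-- let the sr_check_user read pg_stat_wal_receiver by making it a superuser.
-- This grants far broader access than pg_monitor would, so weigh the tradeoff.
ALTER ROLE pgpool SUPERUSER;
```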
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2019-02-23 07:05 | demisx | New Issue | |
| 2019-02-23 07:05 | demisx | Tag Attached: error | |
| 2019-02-23 07:05 | demisx | Tag Attached: logs | |
| 2019-02-23 07:20 | demisx | Note Added: 0002398 | |
| 2019-02-24 00:37 | demisx | Note Added: 0002399 | |
| 2019-02-24 03:34 | demisx | Note Added: 0002400 | |
| 2019-02-24 21:03 | t-ishii | Assigned To | => t-ishii |
| 2019-02-24 21:03 | t-ishii | Status | new => assigned |
| 2019-02-24 21:07 | t-ishii | Note Added: 0002403 | |
| 2019-03-21 19:51 | t-ishii | Status | assigned => resolved |