View Issue Details

ID:               0000469
Project:          Pgpool-II
Category:         [All Projects] General
View Status:      public
Last Update:      2019-03-21 19:51
Reporter:         demisx
Assigned To:      t-ishii
Priority:         high
Severity:         major
Reproducibility:  always
Status:           resolved
Resolution:       open
Platform:         Linux
OS:               Ubuntu
OS Version:       18.04 LTS
Product Version:  4.0.2
Target Version:
Fixed in Version:
Summary: 0000469: LOG ERROR: primary 0 owns only 0 standbys out of 1
Description: I have pgpool-II running on an app server, managing two database nodes (0 - primary, 1 - standby). The psql -c 'show pool_nodes' command shows both nodes as up:

```
$ psql -c 'show pool_nodes'
 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | last_status_change
---------+-----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
 0 | 10.1.1.20 | 5432 | up | 0.500000 | primary | 0 | true | 0 | 2019-02-22 21:37:55
 1 | 10.1.3.20 | 5432 | up | 0.500000 | standby | 0 | false | 0 | 2019-02-22 21:44:49
```

However, after testing the online recovery and failback processes today, I've started seeing this error written to the log every 5 seconds:

```
2019-02-22 22:04:04: pid 21998: LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
```

I can't figure out what may be wrong. Any idea where I can look?
Tags: error, logs

Activities

demisx

2019-02-23 07:20

reporter   ~0002398

Here is the full log. The error is actually written every 10 seconds, and I believe it comes from the sr_check_period = 10 setting. PostgreSQL streaming replication itself appears to work properly: when I create and populate a test table on the primary, I can see the same happening on the standby.

```
$ sudo -u postgres pgpool -n
---
2019-02-22 22:15:47: pid 22204: LOG: Setting up socket for 0.0.0.0:5432
2019-02-22 22:15:47: pid 22204: LOG: Setting up socket for :::5432
2019-02-22 22:15:47: pid 22204: LOG: find_primary_node_repeatedly: waiting for finding a primary node
2019-02-22 22:15:47: pid 22204: LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2019-02-22 22:15:47: pid 22204: LOG: find_primary_node: primary node is 0
2019-02-22 22:15:47: pid 22204: LOG: find_primary_node: standby node is 1
2019-02-22 22:15:47: pid 22204: LOG: pgpool-II successfully started. version 4.0.2 (torokiboshi)
2019-02-22 22:15:47: pid 22204: LOG: node status[0]: 1
2019-02-22 22:15:47: pid 22204: LOG: node status[1]: 2
2019-02-22 22:15:47: pid 22238: LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2019-02-22 22:15:57: pid 22238: LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2019-02-22 22:16:07: pid 22238: LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2019-02-22 22:16:17: pid 22238: LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2019-02-22 22:16:27: pid 22238: LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2019-02-22 22:16:37: pid 22238: LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2019-02-22 22:16:47: pid 22238: LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
```
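For reference, the streaming replication check settings in my pgpool.conf are along these lines (excerpt; sr_check_period and sr_check_user are the values mentioned in this thread, the remaining values are illustrative placeholders):

```
# pgpool.conf (excerpt) - streaming replication check
sr_check_period = 10             # interval, in seconds, between SR checks
sr_check_user = 'pgpool'         # role pgpool uses to query pg_stat_wal_receiver on the backends
sr_check_password = '...'        # placeholder
sr_check_database = 'postgres'   # placeholder (pgpool's default)
```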

demisx

2019-02-24 00:37

reporter   ~0002399

I've narrowed it down a little more. It appears that, even though my standby node is streaming from the primary, pgpool's check of pg_stat_wal_receiver returns NULL, as this debug line shows:

```
2019-02-23 15:32:32: pid 17383: DEBUG: verify_backend_node_status: pg_stat_wal_receiver status for standby 1 is NULL
2019-02-23 15:32:32: pid 17383: LOCATION: pgpool_main.c:3271
```

However, when I query the same standby node myself via psql, I get the status back:

```
$ psql -x -h 10.1.3.20 -c "SELECT status, conninfo FROM pg_stat_wal_receiver";
-[ RECORD 1 ]-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
status | streaming
conninfo | user=repl passfile=/var/lib/postgresql/.pgpass dbname=replication host=10.1.1.20 port=5432 fallback_application_name=walreceiver sslmode=prefer sslcompression=0 krbsrvname=postgres target_session_attrs=any
```

Any idea what I may be missing here?

demisx

2019-02-24 03:34

reporter   ~0002400

OK, I was finally able to fix this. The problem was that the sr_check_user = 'pgpool' role did not have permission to query the `pg_stat_wal_receiver` view. It became apparent after passing the pgpool username to the SELECT:

```
$ psql -x -h [standby.db.node.ip] -U pgpool -d postgres -c "SELECT status, conninfo FROM pg_stat_wal_receiver";
```
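Run as the pgpool user, that query came back with the row present but the columns NULL, roughly like this (illustrative output, not an exact capture from my terminal):

```
-[ RECORD 1 ]
status   |
conninfo |
```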

The issue could be fixed by making the pgpool user a member of the `pg_monitor` role:

```
GRANT pg_monitor to pgpool;
```
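After running the GRANT (as a superuser on each backend node), re-running the same check as the pgpool user returns the status again and the log message stops; for example:

```
$ psql -x -h [standby.db.node.ip] -U pgpool -d postgres \
    -c "SELECT status, conninfo FROM pg_stat_wal_receiver"
```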

This step seems to be missing from the http://www.pgpool.net/docs/latest/en/html/example-cluster.html docs; it would be nice to include it. Also, please consider moving your code to GitHub or GitLab, so we can all help contribute such fixes via pull requests.

This issue can be closed.

t-ishii

2019-02-24 21:07

developer   ~0002403

Thanks for the report. Unfortunately, that solution is only available in PostgreSQL 10 or later (9.6 does not have the pg_monitor role). Anyway, your point is valid; I will update the docs.

> consider moving your code to GitHub or GitLab. This way we can all help to contribute such fixes with pull requests.
Sorry, we have no plan to do so. You can still send us patches against our git repository (as in PostgreSQL development).
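(For completeness, a possible workaround on 9.6, not discussed in this thread: since pg_monitor is unavailable there, the streaming replication check could instead be run as a superuser role, for example by making the sr_check_user a superuser. This widens that account's privileges considerably, so treat it only as a sketch.)

```
-- Hypothetical workaround for PostgreSQL 9.6 (no pg_monitor role):
-- let the role named in sr_check_user read pg_stat_wal_receiver by giving it
-- superuser rights. Security trade-off: this is a very broad privilege.
ALTER ROLE pgpool SUPERUSER;
```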

Issue History

Date Modified Username Field Change
2019-02-23 07:05 demisx New Issue
2019-02-23 07:05 demisx Tag Attached: error
2019-02-23 07:05 demisx Tag Attached: logs
2019-02-23 07:20 demisx Note Added: 0002398
2019-02-24 00:37 demisx Note Added: 0002399
2019-02-24 03:34 demisx Note Added: 0002400
2019-02-24 21:03 t-ishii Assigned To => t-ishii
2019-02-24 21:03 t-ishii Status new => assigned
2019-02-24 21:07 t-ishii Note Added: 0002403
2019-03-21 19:51 t-ishii Status assigned => resolved