[pgpool-general: 6506] New pgpool-II-95 4.0.4 install in front of a 3-node repmgr95 postgresql-9.5 cluster - not finding all the nodes

Sat Apr 13 00:04:22 JST 2019

I feel like this should be working, since so much of it is, but the main
function of the software seems to be failing me.

my repmgr95 says this:

 ID | Name    | Role    | Status    | Upstream | Location | Connection string
----+---------+---------+-----------+----------+----------+----------------------------------------------------------
 1  | r01sv05 | standby |   running | r01sv04  | default  | host=r01sv05 user=repmgr dbname=repmgr connect_timeout=2
 2  | r01sv04 | primary | * running |          | default  | host=r01sv04 user=repmgr dbname=repmgr connect_timeout=2
 3  | r01sv03 | standby |   running | r01sv04  | default  | host=r01sv03 user=repmgr dbname=repmgr connect_timeout=2

(Actually r01sv05 is now the primary; that output is an old capture.)

r01sv02 is the pgpool server btw, and they are all on the same subnet.

my pgpool says this:

-bash-4.2$ psql -U pgpool --dbname=pgpool --host r01sv02 -c "show pool_nodes"
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | last_status_change
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
 0       | r01sv03  | 5432 | up     | 1.000000  | standby | 0          | true              | 0                 | 2019-04-11 19:48:43
(1 row)

pgpool keeps logging this:

Apr 12 14:03:03 r01sv02.change.me pgpool[14630]: [259-1] 2019-04-12 14:03:03: pid 14630: LOG:  find_primary_node: standby node is 0
Apr 12 14:03:03 r01sv02.change.me pgpool[14630]: [259-2] 2019-04-12 14:03:03: pid 14630: LOCATION:  pgpool_main.c:3438
Apr 12 14:03:04 r01sv02.change.me pgpool[14630]: [260-1] 2019-04-12 14:03:04: pid 14630: LOG:  find_primary_node: standby node is 0
Apr 12 14:03:04 r01sv02.change.me pgpool[14630]: [260-2] 2019-04-12 14:03:04: pid 14630: LOCATION:  pgpool_main.c:3438
Apr 12 14:03:05 r01sv02.change.me pgpool[14630]: [261-1] 2019-04-12 14:03:05: pid 14630: LOG:  find_primary_node: standby node is 0

and occasionally the find_primary_node_repeatedly line.

Quick summary of my setup:
3 postgresql-9.5 db nodes, one is primary, the other two are standby, in a
streaming replication cluster built and managed with repmgr95.  This is
working fine.

1 pgpool 4.0.4 server that has the same version of postgresql-9.5 and
postgres user setup as the other 3.
- pgpool is running as postgres

what does work:
- the postgres user has ssh access to and from any of the four servers. I can
  remotely run repmgr from the pgpool server as the postgres user with no
  problem
- psql can reach all the databases (a simple \list or \dt, say) on port 5432
  from any of the four nodes to any of the others, including from the pgpool
  server
- I can use either the postgres user or the pgpool user with psql
- DNS is working too. I changed the config file from hostnames to IPs in case
  it made a difference, but it did not.

I've even run these commands by hand, and they return the right answers:

-bash-4.2$ psql -U pgpool --dbname=pgpool --host r01sv02 -c "SELECT pg_is_in_recovery();"
 pg_is_in_recovery
-------------------
 t
(1 row)

-bash-4.2$ psql -U pgpool --dbname=pgpool --host r01sv03 -c "SELECT pg_is_in_recovery();"
 pg_is_in_recovery
-------------------
 t
(1 row)

-bash-4.2$ psql -U pgpool --dbname=pgpool --host r01sv04 -c "SELECT pg_is_in_recovery();"
 pg_is_in_recovery
-------------------
 t
(1 row)

-bash-4.2$ psql -U pgpool --dbname=pgpool --host r01sv05 -c "SELECT pg_is_in_recovery();"
 pg_is_in_recovery
-------------------
 f
(1 row)
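
The manual checks above can be rolled into one loop. This is only a sketch:
the function name check_recovery is made up for illustration, and it assumes
the same pgpool user and database as the commands above, with passwords
supplied by .pgpass.

```shell
# Hypothetical helper: print each host with its pg_is_in_recovery() result.
# -A (unaligned) and -t (tuples only) make psql print just "t" or "f".
check_recovery() {
    for host in "$@"; do
        state=$(psql -U pgpool --dbname=pgpool --host "$host" -At \
                     -c "SELECT pg_is_in_recovery();")
        printf '%s %s\n' "$host" "$state"
    done
}

# Usage: check_recovery r01sv03 r01sv04 r01sv05
```

In a healthy streaming replication cluster, exactly one backend should
report f (the primary) and the rest t.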

For some reason pgpool finds only one of the three nodes, a standby, and it
has the right role for it.

The pgpool database I created, I created on my primary.  I had thought that
when pgpool started up it might put some stuff in that database, but I
haven't seen anything appear, in case that is the problem.  I found notes on
creating the database and user, but nothing on actually putting anything in
it by hand.  Anyway, I was just looking at that in case it matters.

Main question -- where are the other two nodes?
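
One way to narrow that down is to list the backend entries that are actually
active in pgpool.conf, since show pool_nodes only reports backends defined
there. A rough sketch; list_backends is a made-up name, and the config path
is an assumption that depends on the install.

```shell
# Hypothetical helper: show uncommented backend_* settings in a pgpool.conf.
# $1 is the path to pgpool.conf (location varies by install).
list_backends() {
    grep -E '^[[:space:]]*backend_(hostname|port|weight|flag)[0-9]+[[:space:]]*=' "$1"
}

# Usage: list_backends /path/to/pgpool.conf
```

With three backends I would expect hostname entries numbered 0, 1 and 2. If
only the 0 entries show up, that would match the single row from
show pool_nodes.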

Also, I've noticed that each time I start pgpool, it logs those messages
(above) until the step count reaches 300, then finally says "successfully
started".  Only at that point do the pcp_* commands work; before then it has
not yet created the pcp socket.  I don't know whether that is normal or
expected.  It seemed odd to me for basic commands to take 5 minutes to even
become available.

The other thing is that although it comes up for a while, pgpool seems to
stop itself after about 10 minutes.  The log just says that pgpool was told
to stop (but I didn't do it).

I've attached a sanitized version of my pgpool.conf file

In case it helps, here also is the sanitized contents of the .pgpass and
.pcppass files in the postgres home dir of all four of my servers and the
pool_passwd, in case you see a problem with these (they are 600 owned by
postgres).

-bash-4.2$ cat .pgpass
r01sv02:5432:*:pgpool:sanitized
r01sv05:5432:*:postgres:pgpool:sanitized
r01sv04:5432:*:postgres:pgpool:sanitized
r01sv03:5432:*:postgres:pgpool:sanitized
r01sv05:5432:replication:repmgr:pgpool:sanitized
r01sv04:5432:replication:repmgr:pgpool:sanitized
r01sv03:5432:replication:repmgr:pgpool:sanitized
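
As a sanity check on the file above: libpq documents the .pgpass line format
as hostname:port:database:username:password, five colon-separated fields.
The sketch below flags lines with a different field count. check_pgpass is a
made-up name, and it does not handle backslash-escaped colons, so treat it
as a rough check only.

```shell
# Hypothetical helper: report .pgpass lines that do not have exactly 5 fields.
# Skips blank lines and comments; does not understand "\:" escapes.
check_pgpass() {
    awk -F: '
        /^[ \t]*(#|$)/ { next }
        NF != 5 { printf "line %d: %d fields\n", NR, NF }
    ' "$1"
}

# Usage: check_pgpass ~/.pgpass
```

Run against the file as pasted above, I would expect it to flag the postgres
and repmgr lines, which show six fields (possibly an artifact of sanitizing).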

-bash-4.2$ cat .pcppass
*:*:pgpool:pgpool:sanitized
*:*:postgres:pgpool:sanitized

pcp.conf
pgpool:sanitized
nrpe:sanitized
postgres:sanitized

pool_passwd
pgpool:sanitized
nrpe:sanitized
postgres:sanitized

-bash-4.2$ cat pool_hba.conf
# pgpool Client Authentication Configuration File

# "local" is for Unix domain socket connections only
local   all         all                               trust
# IPv4 local connections:
host    all         all         127.0.0.1/32          trust
host    all         all         ::1/128               trust
host    all         all         192.x.y.0/24             md5

Thanks,
Rob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pgpool-sanitized.conf
Type: application/octet-stream
Size: 42043 bytes
Desc: not available
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20190412/7763e5b6/attachment-0001.obj>