[pgpool-general: 6509] Re: New pgpool-II-95 4.0.4 install in front of a 3-node repmgr95 postgresql-9.5 cluster - not finding all the nodes

Sat Apr 13 05:00:43 JST 2019

Strange, I don't see any reason for that in the config you sent.
But it looks like it always stops at :03 and :33 past the hour, so twice per hour. Isn't there a cron job that stops it every 30 minutes?
Maybe you can increase the log verbosity?
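To make the half-hour pattern easier to see, here is a minimal sketch that pulls the minute-of-the-hour out of every "stop request" line in a log. The function name stop_request_minutes is made up for illustration, and the awk pattern assumes the log line layout quoted in this thread.

```shell
# Read pgpool log lines on stdin and print the minute (MM) of each
# "stop request sent to pgpool" entry, one per line.
# Assumes lines contain a timestamp of the form "HH:MM:SS: pid NNNN:".
stop_request_minutes() {
  awk '/stop request sent to pgpool/ {
    # Locate the pgpool timestamp that is directly followed by ": pid"
    if (match($0, /[0-9][0-9]:[0-9][0-9]:[0-9][0-9]: pid/))
      print substr($0, RSTART + 3, 2)   # skip "HH:" and keep "MM"
  }'
}
```

Fed the three stop lines quoted below, it prints 03, 33 and 03. If the minutes keep repeating like that, a scheduled job is a likely suspect; `crontab -l -u postgres` (as root) and the files under /etc/cron.d are reasonable places to look.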
Pierre 

    On Friday, April 12, 2019, 9:10:50 PM GMT+2, Rob Reinhardt <rreinhardt at eitccorp.com> wrote:  

Thank you Pierre.  That fixed that problem.
I have another one though.  It will not stay running.  Whether it is just sitting there doing nothing or I am just monitoring the node status every 10 seconds, it claims to have received a stop request and shuts itself down.
I'm on Red Hat 7.5 if that helps.
----deleted the stuff in between start and automatic? stops
Apr 12 17:37:31  pgpool[6291]: [6-1] 2019-04-12 17:37:31: pid 6291: LOG:  pgpool-II successfully started. version 4.0.4 (torokiboshi)
Apr 12 18:03:10  pgpool[9426]: [1-1] 2019-04-12 18:03:10: pid 9426: LOG:  stop request sent to pgpool. waiting for termination...
~26 minutes it stopped itself?
Apr 12 18:10:42  pgpool[10288]: [6-1] 2019-04-12 18:10:42: pid 10288: LOG:  pgpool-II successfully started. version 4.0.4 (torokiboshi)
Apr 12 18:33:10  pgpool[13333]: [1-1] 2019-04-12 18:33:10: pid 13333: LOG:  stop request sent to pgpool. waiting for termination...
~23 minutes it stopped itself?
@trying this time without the 10second watch monitor to see if that changes anything...
Apr 12 18:42:34  pgpool[14349]: [6-1] 2019-04-12 18:42:34: pid 14349: LOG:  pgpool-II successfully started. version 4.0.4 (torokiboshi)
Apr 12 19:03:10 pgpool[16505]: [1-1] 2019-04-12 19:03:10: pid 16505: LOG:  stop request sent to pgpool. waiting for termination...
~21 mins?
I feel like I'm in the movie Independence Day and there is a countdown to something even less nice.

On Fri, Apr 12, 2019 at 1:09 PM Pierre Timmermans <ptim007 at yahoo.com> wrote:

Hello,
In your config you have
backend_hostname0 = '192.x.y.a'

3 times. Each backend needs its own index: it should be backend_hostname0, backend_hostname1 and backend_hostname2, once each (the same for backend_port0, etc.)
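A quick way to spot this kind of mistake is to list every backend_* key that appears more than once in the file. The following is only a sketch: the function name duplicate_backend_keys is hypothetical, and it assumes one uncommented "key = value" setting per line.

```shell
# Print every backend_* parameter name (e.g. backend_hostname0) that is
# set more than once in the given pgpool.conf. Commented-out lines are ignored.
duplicate_backend_keys() {
  grep -E '^[[:space:]]*backend_[a-z_]+[0-9]+[[:space:]]*=' "$1" \
    | sed -E 's/^[[:space:]]*([a-z_]+[0-9]+).*/\1/' \
    | sort \
    | uniq -d    # keep only names that occur at least twice
}
```

Called as `duplicate_backend_keys pgpool.conf` (path to be adjusted), it prints nothing when every backend has its own index. Any name it does print means several definitions are competing for the same node number, which would be consistent with "show pool_nodes" listing a single node.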

Pierre 

    On Friday, April 12, 2019, 5:05:05 PM GMT+2, Rob Reinhardt <rreinhardt at eitccorp.com> wrote:  

 I "feel like" it should be working since so much of it is working, except the main function of the s/w seems to be failing me.
my repmgr95 says this:
 ID | Name    | Role    | Status    | Upstream | Location | Connection string
----+---------+---------+-----------+----------+----------+----------------------------------------------------------
 1  | r01sv05 | standby |   running | r01sv04  | default  | host=r01sv05 user=repmgr dbname=repmgr connect_timeout=2
 2  | r01sv04 | primary | * running |          | default  | host=r01sv04 user=repmgr dbname=repmgr connect_timeout=2
 3  | r01sv03 | standby |   running | r01sv04  | default  | host=r01sv03 user=repmgr dbname=repmgr connect_timeout=2
(actually 05 is now the primary, that is an old shot)
r01sv02 is the pgpool server btw, and they are all on the same subnet.
my pgpool says this:
-bash-4.2$ psql -U pgpool --dbname=pgpool --host r01sv02 -c "show pool_nodes"
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | last_status_change
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
 0       | r01sv03  | 5432 | up     | 1.000000  | standby | 0          | true              | 0                 | 2019-04-11 19:48:43
(1 row)
pgpool keeps logging this:
Apr 12 14:03:03 r01sv02.change.me pgpool[14630]: [259-1] 2019-04-12 14:03:03: pid 14630: LOG:  find_primary_node: standby node is 0
Apr 12 14:03:03 r01sv02.change.me pgpool[14630]: [259-2] 2019-04-12 14:03:03: pid 14630: LOCATION:  pgpool_main.c:3438
Apr 12 14:03:04 r01sv02.change.me pgpool[14630]: [260-1] 2019-04-12 14:03:04: pid 14630: LOG:  find_primary_node: standby node is 0
Apr 12 14:03:04 r01sv02.change.me pgpool[14630]: [260-2] 2019-04-12 14:03:04: pid 14630: LOCATION:  pgpool_main.c:3438
Apr 12 14:03:05 r01sv02.change.me pgpool[14630]: [261-1] 2019-04-12 14:03:05: pid 14630: LOG:  find_primary_node: standby node is 0
and occasionally the find_primary_node_repeatedly line
Quick summary of my setup:
- 3 postgresql-9.5 db nodes, one is primary, the other two are standby, in a streaming replication cluster built and managed with repmgr95.  This is working fine.
- 1 pgpool 4.0.4 server that has the same version of postgresql-9.5 and postgres user setup as the other 3.
- pgpool is running as postgres
what does work:
- the postgres user has ssh access to/from any of the four servers. I can remotely run repmgr from the pgpool server as the postgres user with no problem
- psql can access all the db's, say with a simple \list or \dt or whatever, on port 5432 from any of the four nodes to any of the four nodes, even from the pgpool server
- I can use the postgres user or the pgpool user with psql
- dns is working too; I changed from hostnames to IPs in the config file in case it made a difference, but it did not.
I've even run these commands by hand and they get the right answers:
-bash-4.2$ psql -U pgpool --dbname=pgpool --host r01sv02 -c "SELECT pg_is_in_recovery();"
 pg_is_in_recovery
-------------------
 t
(1 row)
-bash-4.2$ psql -U pgpool --dbname=pgpool --host r01sv03 -c "SELECT pg_is_in_recovery();"
 pg_is_in_recovery
-------------------
 t
(1 row)
-bash-4.2$ psql -U pgpool --dbname=pgpool --host r01sv04 -c "SELECT pg_is_in_recovery();"
 pg_is_in_recovery
-------------------
 t
(1 row)
-bash-4.2$ psql -U pgpool --dbname=pgpool --host r01sv05 -c "SELECT pg_is_in_recovery();"
 pg_is_in_recovery
-------------------
 f
(1 row)
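The same check can be wrapped in a small loop. This is only a sketch under the assumptions of this thread (pgpool user, pgpool database, default port, passwords in .pgpass); both function names are made up for illustration.

```shell
# Map the single-letter answer of pg_is_in_recovery() to a role name.
role_from_recovery_flag() {
  case "$1" in
    t) echo standby ;;
    f) echo primary ;;
    *) echo unknown ;;   # empty or unexpected answer, e.g. connection failure
  esac
}

# Query each host given as an argument and print "host role".
# -A -t make psql print just the bare value (t or f) without headers.
report_backend_roles() {
  for host in "$@"; do
    flag=$(psql -U pgpool --dbname=pgpool --host "$host" -At \
             -c "SELECT pg_is_in_recovery();" 2>/dev/null)
    echo "$host $(role_from_recovery_flag "$flag")"
  done
}
```

For example, `report_backend_roles r01sv03 r01sv04 r01sv05` asks the three database nodes directly. The query against r01sv02 presumably goes through pgpool itself, so its answer reflects whichever backend pgpool picked rather than a fourth database.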
pgpool for some reason finds only one of the three nodes, a standby node, and it has the right role for it.
The pgpool database I created, I created on my primary.  I had thought that when pgpool started up it might put some stuff in that database, but I haven't seen anything, in case that is the problem.  I found notes on creating said database and user, but have seen nothing on actually putting anything in it by hand. Anyway, I was just looking at that in case it is something.
Main question -- where are the other two nodes?
Also, I've noted that each time I start pgpool, it throws those errors (above) until the steps reach 300, then it finally says "successfully started" and at that point the pcp_* commands will work; before then it has not yet created the pcp socket.  Don't know if that is normal/expected or not.  It seemed odd to me for basic commands to take 5 minutes to even be available.
The other thing is that while it will come up for a while, pgpool seems to be stopping itself after about 10 minutes or so.  The log just says that pgpool was told to stop (but I didn't do it).
I've attached a sanitized version of my pgpool.conf file
In case it helps, here also is the sanitized contents of the .pgpass and .pcppass files in the postgres home dir of all four of my servers and the pool_passwd, in case you see a problem with these (they are 600 owned by postgres).
-bash-4.2$ cat .pgpass
r01sv02:5432:*:pgpool:sanitized
r01sv05:5432:*:postgres:pgpool:sanitized
r01sv04:5432:*:postgres:pgpool:sanitized
r01sv03:5432:*:postgres:pgpool:sanitized
r01sv05:5432:replication:repmgr:pgpool:sanitized
r01sv04:5432:replication:repmgr:pgpool:sanitized
r01sv03:5432:replication:repmgr:pgpool:sanitized
-bash-4.2$ cat .pcppass
*:*:pgpool:pgpool:sanitized
*:*:postgres:pgpool:sanitized
pcp.conf
pgpool:sanitized
nrpe:sanitized
postgres:sanitized
pool_passwd
pgpool:sanitized
nrpe:sanitized
postgres:sanitized

-bash-4.2$ cat pool_hba.conf
# pgpool Client Authentication Configuration File
# "local" is for Unix domain socket connections only
local   all         all                               trust
# IPv4 local connections:
host    all         all         127.0.0.1/32          trust
host    all         all         ::1/128               trust
host    all         all         192.x.y.0/24             md5
Thanks,
Rob

_______________________________________________
pgpool-general mailing list
pgpool-general at pgpool.net
http://www.pgpool.net/mailman/listinfo/pgpool-general

-- 
Rob Reinhardt
DevOps Engineer
Enlighten IT Consulting (EITC), a MacAulay-Brown, Inc. company
