View Issue Details

ID: 0000510
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2019-05-17 09:54
Reporter: ankur
Assigned To:
Priority: high
Severity: major
Reproducibility: always
Status: closed
Resolution: open
Platform: linux
OS: rhel
OS Version: 7.6
Product Version: 4.0.4
Summary: 0000510: pcp_recovery_node fails: node id 1 is not valid
Description: I am trying to bring up a PostgreSQL 9.5.14 cluster with Pgpool-II 4.0.4.
It consists of only 2 machines, both RHEL 7.6; node1 is a physical machine while node2 is virtual.
I have successfully installed pgpool and I am able to run it as a service.
The VIP is being acquired successfully by pgpool (or at least so it appears to me).

There are 2 problems I am facing, and I believe the 2nd is somehow related to the 1st.

Problem #1. I got confirmation from the Linux sysadmin that the physical node1 uses a different network interface naming scheme than the 2nd (virtual) node. On node1 I have the following network interfaces:
bond0
bond0.431
bond0.431:0
bond0.849
eno1 to eno5
eno6d1
ens1f0
ens1f1
lo

while on node2 I have just 3 network interfaces:
lo
ens192
ens224

When I start pgpool.service, I can see the following error in its logs:
...
 Cannot find device "ens192"
...
whereas if_up_cmd in pgpool.conf is: if_up_cmd = 'ip_w addr add $_IP_$/32 dev ens192 label ens192:0'
I am aware that there is no network interface ens192 on this machine to which the additional ens192:0 label could be attached, but I have no clue how to configure if_up_cmd in this case, i.e. which of the existing interfaces I should choose.
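For reference, the relevant watchdog VIP settings in pgpool.conf look roughly like the sketch below; the device name passed to ip and arping must be an interface that actually exists on the local node (the placeholder <local-iface> and the exact command forms are assumptions, not taken from the reporter's config):

    # pgpool.conf watchdog VIP settings (sketch; adjust device name per node)
    delegate_IP = '193.64.42.47'
    if_up_cmd   = 'ip addr add $_IP_$/32 dev <local-iface> label <local-iface>:0'
    if_down_cmd = 'ip addr del $_IP_$/32 dev <local-iface>'
    arping_cmd  = 'arping -U $_IP_$ -w 1 -I <local-iface>'

Since pgpool.conf is maintained per node, each node can name its own interface here; the two nodes do not have to use the same device name.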

Problem #2. I have found in another pgpool case that the above is probably not the problem at all. Also, I can see the following in my pgpool logs:
...
successfully acquired the delegate IP:"193.64.42.47"
...

The real issue here is that whenever I try to execute: pcp_recovery_node -h 193.64.42.47 -p 9898 -U postgres -n 1
it fails right after accepting the password with:
    ERROR: process recovery request failed
    DETAIL: node id 1 is not valid

Now, in the pgpool logs there is no line indicating that node 1 is set; I mean the following is nowhere in the logs (this example comes from another cluster I recently set up successfully):
    node status[0]: 1
    node status[1]: 0

Could someone please help me?
Here is the complete log of the pgpool service startup, which includes the 2 log lines discussed above (I have removed the datetime prefix from the log lines):

systemd[1]: Started Pgpool-II.
pgpool[16298]: 2019-05-08 22:21:24: pid 16298: LOG: waiting for watchdog to initialize
pgpool[16298]: 2019-05-08 22:21:24: pid 16299: LOG: setting the local watchdog node name to "193.64.42.48:5433 Linux sma-db1t.i
pgpool[16298]: 2019-05-08 22:21:24: pid 16299: LOG: watchdog cluster is configured with 1 remote nodes
pgpool[16298]: 2019-05-08 22:21:24: pid 16299: LOG: watchdog remote node:0 on 193.64.42.49:9000
pgpool[16298]: 2019-05-08 22:21:24: pid 16299: LOG: interface monitoring is disabled in watchdog
pgpool[16298]: 2019-05-08 22:21:24: pid 16299: LOG: watchdog node state changed from [DEAD] to [LOADING]
pgpool[16298]: 2019-05-08 22:21:29: pid 16299: LOG: watchdog node state changed from [LOADING] to [JOINING]
pgpool[16298]: 2019-05-08 22:21:33: pid 16299: LOG: watchdog node state changed from [JOINING] to [INITIALIZING]
pgpool[16298]: 2019-05-08 22:21:34: pid 16299: LOG: I am the only alive node in the watchdog cluster
pgpool[16298]: 2019-05-08 22:21:34: pid 16299: HINT: skipping stand for coordinator state
pgpool[16298]: 2019-05-08 22:21:34: pid 16299: LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
pgpool[16298]: 2019-05-08 22:21:34: pid 16299: LOG: I am announcing my self as master/coordinator watchdog node
pgpool[16298]: 2019-05-08 22:21:38: pid 16299: LOG: I am the cluster leader node
pgpool[16298]: 2019-05-08 22:21:38: pid 16299: DETAIL: our declare coordinator message is accepted by all nodes
pgpool[16298]: 2019-05-08 22:21:38: pid 16299: LOG: setting the local node "193.64.42.48:5433 Linux sma-db1t.itidm.elisa.fi" as
pgpool[16298]: 2019-05-08 22:21:38: pid 16299: LOG: I am the cluster leader node. Starting escalation process
pgpool[16298]: 2019-05-08 22:21:38: pid 16298: LOG: watchdog process is initialized
pgpool[16298]: 2019-05-08 22:21:38: pid 16299: LOG: escalation process started with PID:16324
pgpool[16298]: 2019-05-08 22:21:38: pid 16324: LOG: watchdog: escalation started
pgpool[16298]: 2019-05-08 22:21:38: pid 16299: LOG: new IPC connection received
pgpool[16298]: 2019-05-08 22:21:38: pid 16299: LOG: new IPC connection received
pgpool[16298]: 2019-05-08 22:21:38: pid 16325: LOG: 2 watchdog nodes are configured for lifecheck
pgpool[16298]: 2019-05-08 22:21:38: pid 16325: LOG: watchdog nodes ID:0 Name:"193.64.42.48:5433 Linux sma-db1t.itidm.elisa.fi"
pgpool[16298]: 2019-05-08 22:21:38: pid 16325: DETAIL: Host:"193.64.42.48" WD Port:9000 pgpool-II port:5433
pgpool[16298]: 2019-05-08 22:21:38: pid 16325: LOG: watchdog nodes ID:1 Name:"Not_Set"
pgpool[16298]: 2019-05-08 22:21:38: pid 16325: DETAIL: Host:"193.64.42.49" WD Port:9000 pgpool-II port:5433
pgpool[16298]: 2019-05-08 22:21:38: pid 16298: LOG: Setting up socket for 0.0.0.0:5433
pgpool[16298]: 2019-05-08 22:21:38: pid 16298: LOG: Setting up socket for :::5433
pgpool[16298]: 2019-05-08 22:21:38: pid 16298: LOG: perhaps failed to create INET domain socket
pgpool[16298]: 2019-05-08 22:21:38: pid 16298: DETAIL: socket(::) failed: "Address family not supported by protocol"
pgpool[16298]: Cannot find device "ens192"
pgpool[16298]: arping: Suitable device could not be determined. Please, use option -I.
pgpool[16298]: Usage: arping [-fqbDUAV] [-c count] [-w timeout] [-I device] [-s source] destination
pgpool[16298]: -f : quit on first reply
pgpool[16298]: -q : be quiet
pgpool[16298]: -b : keep broadcasting, don't go unicast
pgpool[16298]: -D : duplicate address detection mode
pgpool[16298]: -U : Unsolicited ARP mode, update your neighbours
pgpool[16298]: -A : ARP answer mode, update your neighbours
pgpool[16298]: -V : print version and exit
pgpool[16298]: -c count : how many packets to send
pgpool[16298]: -w timeout : how long to wait for a reply
pgpool[16298]: -I device : which ethernet device to use
pgpool[16298]: -s source : source ip address
pgpool[16298]: destination : ask for what ip address
pgpool[16298]: 2019-05-08 22:21:38: pid 16298: LOG: find_primary_node_repeatedly: waiting for finding a primary node
pgpool[16298]: 2019-05-08 22:21:38: pid 16298: LOG: find_primary_node: primary node is 0
pgpool[16298]: 2019-05-08 22:21:38: pid 16298: LOG: pgpool-II successfully started. version 4.0.4 (torokiboshi)
pgpool[16298]: 2019-05-08 22:21:38: pid 16298: LOG: node status[0]: 1
pgpool[16298]: 2019-05-08 22:21:39: pid 16329: LOG: set SO_REUSEPORT option to the socket
pgpool[16298]: 2019-05-08 22:21:39: pid 16329: LOG: creating watchdog heartbeat receive socket.
pgpool[16298]: 2019-05-08 22:21:39: pid 16329: DETAIL: set SO_REUSEPORT
pgpool[16298]: 2019-05-08 22:21:39: pid 16330: LOG: set SO_REUSEPORT option to the socket
pgpool[16298]: 2019-05-08 22:21:39: pid 16330: LOG: creating socket for sending heartbeat
pgpool[16298]: 2019-05-08 22:21:39: pid 16330: DETAIL: set SO_REUSEPORT
pgpool[16298]: 2019-05-08 22:21:40: pid 16324: LOG: successfully acquired the delegate IP:"193.64.42.47"
pgpool[16298]: 2019-05-08 22:21:40: pid 16324: DETAIL: 'if_up_cmd' returned with success
pgpool[16298]: 2019-05-08 22:21:40: pid 16299: LOG: watchdog escalation process with pid: 16324 exit with SUCCESS.

Steps To Reproduce:
1. set up pgpool.service
2. start postgres
3. start pgpool.service
4. try bringing up node1 with the pcp command:
pcp_recovery_node -h 193.64.42.47 -p 9898 -U postgres -n 1
Tags: master slave, pcp commands

Activities

ankur

2019-05-10 18:47

reporter   ~0002594

Hi Team,
You may close the issue. I found out the problems.
1. For the network interfaces: the interface I was trying to use was ens192, which indeed does not exist on this RHEL 7.6 machine. I used the name bond0.431:1 for the VIP binding and the error went away.
In any case, the VIP was being acquired by pgpool both before and after this fix.
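For illustration, the fix on node1 amounts to pointing the VIP commands at a device that actually exists there; a sketch of what the corrected lines might look like (the exact device/label split is an assumption based on the bond0.431:1 name above, and ip_w is kept as in the reporter's original config):

    # node1 pgpool.conf: use the existing bond0.431 device for the VIP label
    if_up_cmd   = 'ip_w addr add $_IP_$/32 dev bond0.431 label bond0.431:1'
    if_down_cmd = 'ip_w addr del $_IP_$/32 dev bond0.431'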

2. The configuration for backend_hostname1 was commented out, and that was what caused the error:
    ERROR: process recovery request failed
    DETAIL: node id 1 is not valid
I uncommented the needed lines and it worked.
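For illustration, the node id passed to pcp_recovery_node -n 1 is only valid once backend 1 is defined (uncommented) in pgpool.conf; a sketch, where the concrete values are assumptions for this two-node setup:

    # pgpool.conf: backend 1 must be uncommented for "-n 1" to be a valid node id
    backend_hostname1 = '193.64.42.49'                    # assumed: node2's address
    backend_port1 = 5432
    backend_weight1 = 1
    backend_data_directory1 = '/var/lib/pgsql/9.5/data'   # assumed data directory
    backend_flag1 = 'ALLOW_TO_FAILOVER'

pgpool has to be restarted (or reloaded, depending on the parameter) after such a change for the new backend to be registered.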

Issue History

Date Modified Username Field Change
2019-05-09 04:44 ankur New Issue
2019-05-09 04:44 ankur Tag Attached: master slave
2019-05-09 04:44 ankur Tag Attached: pcp commands
2019-05-10 18:47 ankur Note Added: 0002594
2019-05-17 09:54 administrator Status new => closed