View Issue Details

ID: 0000510
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2019-05-17 09:54
Reporter: ankur
Assigned To:
Priority: high
Severity: major
Reproducibility: always
Status: closed
Resolution: open
Platform: linux
OS: rhel
OS Version: 7.6
Product Version: 4.0.4
Summary: 0000510: pcp_recovery_node fails: node id 1 is not valid
Description: I am trying to bring up a PostgreSQL 9.5.14 cluster with Pgpool-II 4.0.4.
It consists of only 2 machines, both RHEL 7.6; node1 is a physical machine while node2 is virtual.
I have successfully installed pgpool and I am able to run it as a service.
The VIP is being acquired successfully by pgpool (or at least so it appears to me).

There are 2 problems I am facing, and I believe the 2nd is somehow related to the 1st.

Problem #1. I got confirmation from the Linux sysadmin that the physical node1 uses a different network interface naming scheme than the 2nd (virtual) node. On node1 I have the following network interfaces:
bond0
bond0.431
bond0.431:0
bond0.849
eno1 to eno5
eno6d1
ens1f0
ens1f1
lo

while on node2 I have just 3 network interfaces:
lo
ens192
ens224

When I start pgpool.service, I can see the following error in its logs:
...
 Cannot find device "ens192"
...
whereas if_up_cmd in pgpool.conf is: if_up_cmd = 'ip_w addr add $_IP_$/32 dev ens192 label ens192:0'
I am aware that there is no network interface ens192 on this machine to which the additional ens192:0 label could be attached, but I have no clue how to configure if_up_cmd in this case, i.e. which of the existing interfaces I should choose.
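For reference, the relevant watchdog VIP settings in pgpool.conf look roughly like the sketch below; the device name passed to ip and arping must be an interface that actually exists on the local node (the placeholder <local-iface> and the exact command forms are assumptions, not taken from the reporter's config):

    # pgpool.conf watchdog VIP settings (sketch; adjust device name per node)
    delegate_IP = '193.64.42.47'
    if_up_cmd   = 'ip addr add $_IP_$/32 dev <local-iface> label <local-iface>:0'
    if_down_cmd = 'ip addr del $_IP_$/32 dev <local-iface>'
    arping_cmd  = 'arping -U $_IP_$ -w 1 -I <local-iface>'

Since pgpool.conf is maintained per node, each node can name its own interface here; the two nodes do not have to use the same device name.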

Problem #2. I have found in another pgpool case that the above is probably not the problem at all. Also, I can see the following in my pgpool logs:
...
successfully acquired the delegate IP:"193.64.42.47"
...

The real issue here is that whenever I try to execute: pcp_recovery_node -h 193.64.42.47 -p 9898 -U postgres -n 1
it fails right after accepting the password with:
    ERROR: process recovery request failed
    DETAIL: node id 1 is not valid

Now, in the pgpool logs there is no line indicating that node 1 is set; I mean the following is nowhere in the logs (this example comes from another cluster I recently set up successfully):
    node status[0]: 1
    node status[1]: 0

Could someone please help me?
Here is the complete log of the pgpool service startup, which includes the 2 log lines discussed above (I have removed the datetime prefix from the log lines):

systemd[1]: Started Pgpool-II.
pgpool[16298]: 2019-05-08 22:21:24: pid 16298: LOG: waiting for watchdog to initialize
pgpool[16298]: 2019-05-08 22:21:24: pid 16299: LOG: setting the local watchdog node name to "193.64.42.48:5433 Linux sma-db1t.i
pgpool[16298]: 2019-05-08 22:21:24: pid 16299: LOG: watchdog cluster is configured with 1 remote nodes
pgpool[16298]: 2019-05-08 22:21:24: pid 16299: LOG: watchdog remote node:0 on 193.64.42.49:9000
pgpool[16298]: 2019-05-08 22:21:24: pid 16299: LOG: interface monitoring is disabled in watchdog
pgpool[16298]: 2019-05-08 22:21:24: pid 16299: LOG: watchdog node state changed from [DEAD] to [LOADING]
pgpool[16298]: 2019-05-08 22:21:29: pid 16299: LOG: watchdog node state changed from [LOADING] to [JOINING]
pgpool[16298]: 2019-05-08 22:21:33: pid 16299: LOG: watchdog node state changed from [JOINING] to [INITIALIZING]
pgpool[16298]: 2019-05-08 22:21:34: pid 16299: LOG: I am the only alive node in the watchdog cluster
pgpool[16298]: 2019-05-08 22:21:34: pid 16299: HINT: skipping stand for coordinator state
pgpool[16298]: 2019-05-08 22:21:34: pid 16299: LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
pgpool[16298]: 2019-05-08 22:21:34: pid 16299: LOG: I am announcing my self as master/coordinator watchdog node
pgpool[16298]: 2019-05-08 22:21:38: pid 16299: LOG: I am the cluster leader node
pgpool[16298]: 2019-05-08 22:21:38: pid 16299: DETAIL: our declare coordinator message is accepted by all nodes
pgpool[16298]: 2019-05-08 22:21:38: pid 16299: LOG: setting the local node "193.64.42.48:5433 Linux sma-db1t.itidm.elisa.fi" as
pgpool[16298]: 2019-05-08 22:21:38: pid 16299: LOG: I am the cluster leader node. Starting escalation process
pgpool[16298]: 2019-05-08 22:21:38: pid 16298: LOG: watchdog process is initialized
pgpool[16298]: 2019-05-08 22:21:38: pid 16299: LOG: escalation process started with PID:16324
pgpool[16298]: 2019-05-08 22:21:38: pid 16324: LOG: watchdog: escalation started
pgpool[16298]: 2019-05-08 22:21:38: pid 16299: LOG: new IPC connection received
pgpool[16298]: 2019-05-08 22:21:38: pid 16299: LOG: new IPC connection received
pgpool[16298]: 2019-05-08 22:21:38: pid 16325: LOG: 2 watchdog nodes are configured for lifecheck
pgpool[16298]: 2019-05-08 22:21:38: pid 16325: LOG: watchdog nodes ID:0 Name:"193.64.42.48:5433 Linux sma-db1t.itidm.elisa.fi"
pgpool[16298]: 2019-05-08 22:21:38: pid 16325: DETAIL: Host:"193.64.42.48" WD Port:9000 pgpool-II port:5433
pgpool[16298]: 2019-05-08 22:21:38: pid 16325: LOG: watchdog nodes ID:1 Name:"Not_Set"
pgpool[16298]: 2019-05-08 22:21:38: pid 16325: DETAIL: Host:"193.64.42.49" WD Port:9000 pgpool-II port:5433
pgpool[16298]: 2019-05-08 22:21:38: pid 16298: LOG: Setting up socket for 0.0.0.0:5433
pgpool[16298]: 2019-05-08 22:21:38: pid 16298: LOG: Setting up socket for :::5433
pgpool[16298]: 2019-05-08 22:21:38: pid 16298: LOG: perhaps failed to create INET domain socket
pgpool[16298]: 2019-05-08 22:21:38: pid 16298: DETAIL: socket(::) failed: "Address family not supported by protocol"
pgpool[16298]: Cannot find device "ens192"
pgpool[16298]: arping: Suitable device could not be determined. Please, use option -I.
pgpool[16298]: Usage: arping [-fqbDUAV] [-c count] [-w timeout] [-I device] [-s source] destination
pgpool[16298]: -f : quit on first reply
pgpool[16298]: -q : be quiet
pgpool[16298]: -b : keep broadcasting, don't go unicast
pgpool[16298]: -D : duplicate address detection mode
pgpool[16298]: -U : Unsolicited ARP mode, update your neighbours
pgpool[16298]: -A : ARP answer mode, update your neighbours
pgpool[16298]: -V : print version and exit
pgpool[16298]: -c count : how many packets to send
pgpool[16298]: -w timeout : how long to wait for a reply
pgpool[16298]: -I device : which ethernet device to use
pgpool[16298]: -s source : source ip address
pgpool[16298]: destination : ask for what ip address
pgpool[16298]: 2019-05-08 22:21:38: pid 16298: LOG: find_primary_node_repeatedly: waiting for finding a primary node
pgpool[16298]: 2019-05-08 22:21:38: pid 16298: LOG: find_primary_node: primary node is 0
pgpool[16298]: 2019-05-08 22:21:38: pid 16298: LOG: pgpool-II successfully started. version 4.0.4 (torokiboshi)
pgpool[16298]: 2019-05-08 22:21:38: pid 16298: LOG: node status[0]: 1
pgpool[16298]: 2019-05-08 22:21:39: pid 16329: LOG: set SO_REUSEPORT option to the socket
pgpool[16298]: 2019-05-08 22:21:39: pid 16329: LOG: creating watchdog heartbeat receive socket.
pgpool[16298]: 2019-05-08 22:21:39: pid 16329: DETAIL: set SO_REUSEPORT
pgpool[16298]: 2019-05-08 22:21:39: pid 16330: LOG: set SO_REUSEPORT option to the socket
pgpool[16298]: 2019-05-08 22:21:39: pid 16330: LOG: creating socket for sending heartbeat
pgpool[16298]: 2019-05-08 22:21:39: pid 16330: DETAIL: set SO_REUSEPORT
pgpool[16298]: 2019-05-08 22:21:40: pid 16324: LOG: successfully acquired the delegate IP:"193.64.42.47"
pgpool[16298]: 2019-05-08 22:21:40: pid 16324: DETAIL: 'if_up_cmd' returned with success
pgpool[16298]: 2019-05-08 22:21:40: pid 16299: LOG: watchdog escalation process with pid: 16324 exit with SUCCESS.

Steps To Reproduce:
1. set up pgpool.service
2. start postgres
3. start pgpool.service
4. try bringing up node1 with the pcp command:
pcp_recovery_node -h 193.64.42.47 -p 9898 -U postgres -n 1
Tags: master slave, pcp commands

Activities

ankur

2019-05-10 18:47

reporter   ~0002594

Hi Team,
You may close the issue. I found out the problems.
1. For the network interfaces: the interface I was trying to use was ens192, which indeed does not exist on this RHEL 7.6 machine. I used the name bond0.431:1 for the VIP binding and the error went away.
In any case, the VIP was being acquired by pgpool both before and after this fix.
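For illustration, the fix on node1 amounts to pointing the VIP commands at a device that actually exists there; a sketch of what the corrected lines might look like (the exact device/label split is an assumption based on the bond0.431:1 name above, and ip_w is kept as in the reporter's original config):

    # node1 pgpool.conf: use the existing bond0.431 device for the VIP label
    if_up_cmd   = 'ip_w addr add $_IP_$/32 dev bond0.431 label bond0.431:1'
    if_down_cmd = 'ip_w addr del $_IP_$/32 dev bond0.431'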

2. The configuration for backend_hostname1 was commented out, and that was what caused the error:
    ERROR: process recovery request failed
    DETAIL: node id 1 is not valid
I uncommented the needed lines and it worked.
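For illustration, the node id passed to pcp_recovery_node -n 1 is only valid once backend 1 is defined (uncommented) in pgpool.conf; a sketch, where the concrete values are assumptions for this two-node setup:

    # pgpool.conf: backend 1 must be uncommented for "-n 1" to be a valid node id
    backend_hostname1 = '193.64.42.49'                    # assumed: node2's address
    backend_port1 = 5432
    backend_weight1 = 1
    backend_data_directory1 = '/var/lib/pgsql/9.5/data'   # assumed data directory
    backend_flag1 = 'ALLOW_TO_FAILOVER'

pgpool has to be restarted (or reloaded, depending on the parameter) after such a change for the new backend to be registered.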

Issue History

Date Modified Username Field Change
2019-05-09 04:44 ankur New Issue
2019-05-09 04:44 ankur Tag Attached: master slave
2019-05-09 04:44 ankur Tag Attached: pcp commands
2019-05-10 18:47 ankur Note Added: 0002594
2019-05-17 09:54 administrator Status new => closed