View Issue Details

ID: 0000476
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2019-04-05 08:58
Reporter: siva
Assigned To: hoshiai
Priority: urgent
Severity: major
Reproducibility: have not tried
Status: closed
Resolution: no change required
Product Version: 3.7.7
Summary: 0000476: Pgpool master and slave do not properly sync the backend DB status; both show an incorrect backend status.
Description:

Hi,

Good Morning.

We are using Pgpool-II 3.7.7 for load balancing in a master/slave setup, with repmgr 4.2 for replication and failover. There are four backend DB servers in total (1 master and 3 slaves).

Regarding the status of the backend DB servers:
repmgr reports everything as expected, but the pgpool master and the pgpool slave each report a different status:

repmgr status on master:

Status on Database Master Node:
-bash-4.2$ /usr/pgsql-9.5/bin/repmgr -f /var/lib/pgsql/repmgr/repmgr.conf cluster show
WARNING: master_response_timeout/9: unknown name/value pair provided; ignoring
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
 ID | Name | Role | Status | Upstream | Location | Connection string
----+--------------+---------+-----------+--------------+----------+----------------------------------------------
 1 | Master_DB_IP | primary | * running | | default | host= Master_DB_IP user=repmgr dbname=repmgr
 2 | Slave_DB_IP1 | standby | running | Master_DB_IP | default | host= Slave_DB_IP1 user=repmgr dbname=repmgr
 3 | Slave_DB_IP2 | standby | running | Master_DB_IP | default | host= Slave_DB_IP2 user=repmgr dbname=repmgr
 4 | Slave_DB_IP3 | standby | running | Master_DB_IP | default | host= Slave_DB_IP3 user=repmgr dbname=repmgr

on Hostname1 (pgpool master Node status):
[root@Hostname1 ~]# systemctl status pgpool
Mar 19 21:40:14 Hostname1 pgpool[5357]: 2019-03-19 21:40:14: pid 5357: LOG: perhaps failed to create INET domain socket
Mar 19 21:40:14 Hostname1 pgpool[5357]: 2019-03-19 21:40:14: pid 5357: DETAIL: socket(::) failed: "Address family not supported by protocol"
Mar 19 21:40:14 Hostname1 pgpool[5357]: 2019-03-19 21:40:14: pid 5357: LOG: pgpool-II successfully started. version 3.7.7 (amefuriboshi)
Mar 19 21:40:15 Hostname1 pgpool[5357]: 2019-03-19 21:40:15: pid 5382: FATAL: failed to create watchdog heartbeat receive socket
Mar 19 21:40:15 Hostname1 pgpool[5357]: 2019-03-19 21:40:15: pid 5382: DETAIL: setsockopt(SO_BINDTODEVICE) failed with reason: "No such device"
Mar 19 21:40:15 Hostname1 pgpool[5357]: 2019-03-19 21:40:15: pid 5383: FATAL: failed to create watchdog heartbeat sender socket
Mar 19 21:40:15 Hostname1 pgpool[5357]: 2019-03-19 21:40:15: pid 5383: DETAIL: setsockopt(SO_BINDTODEVICE) failed with reason: "No such device"
Mar 19 21:40:18 Hostname1 pgpool[5357]: 2019-03-19 21:40:18: pid 5377: LOG: successfully acquired the delegate IP:"VIP Address"
Mar 19 21:40:18 Hostname1 pgpool[5357]: 2019-03-19 21:40:18: pid 5377: DETAIL: 'if_up_cmd' returned with success
Mar 19 21:40:18 Hostname1 pgpool[5357]: 2019-03-19 21:40:18: pid 5358: LOG: watchdog escalation process with pid: 5377 exit with SUCCESS.

[root@Hostname1 ~]# su - postgres
-bash-4.2$ psql -U pgpool --host Hostname1 --dbname postgres -c "show pool_nodes"
 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+--------------+------+--------+-----------+--------+------------+-------------------+-------------------
 0 | backend_host0_IP | backend_port | down | 0.250000 | slave | 0 | false | 0
 1 | backend_host1_IP | backend_port | down | 0.250000 | slave | 0 | false | 0
 2 | backend_host2_IP | backend_port | up | 0.250000 | master | 1 | true | 0
 3 | backend_host3_IP | backend_port | up | 0.250000 | slave | 0 | false | 0
(4 rows)

on Hostname2 (pgpool slave Node status):
[root@Hostname2 ~]# systemctl status pgpool
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4349: WARNING: checking setuid bit of arping command
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4349: DETAIL: arping[/usr/sbin/arping] doesn't have setuid bit
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4349: LOG: reading status file: 0 th backend is set to down status
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4349: LOG: waiting for watchdog to initialize
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4350: LOG: setting the local watchdog node name to "pgpool_slave_host_IP:pool_port Linux Hostname2"
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4350: LOG: watchdog cluster is configured with 1 remote nodes
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4350: LOG: watchdog remote node:0 on pgpool_master_host_IP:watchdog_port
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4350: LOG: interface monitoring is disabled in watchdog
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4350: LOG: watchdog node state changed from [DEAD] to [LOADING]
Mar 19 21:42:17 Hostname2 pgpool[4349]: 2019-03-19 21:42:17: pid 4350: LOG: watchdog node state changed from [LOADING] to [JOINING]

[root@Hostname2 ~]# su - postgres
Last login: Tue Mar 19 21:34:19 IST 2019 on pts/0
-bash-4.2$ psql -U pgpool --host Hostname2 --dbname postgres -c "show pool_nodes"
 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+--------------+------+--------+-----------+--------+------------+-------------------+-------------------
 0 | backend_host0_IP | backend_port | down | 0.250000 | slave | 0 | false | 0
 1 | backend_host1_IP | backend_port | up | 0.250000 | master | 0 | true | 0
 2 | backend_host2_IP | backend_port | up | 0.250000 | slave | 0 | false | 0
 3 | backend_host3_IP | backend_port | up | 0.250000 | slave | 0 | false | 0
(4 rows)

Could you please help resolve this status mismatch between the pgpool master, the pgpool slave, and the databases? If this issue has already been addressed, please point us to the relevant bug fix.

Thank you very much for your help.
Additional Information:

on Hostname1 (pgpool master Node status):

[root@Hostname1 ~]# systemctl status pgpool
Mar 19 21:40:14 Hostname1 pgpool[5357]: 2019-03-19 21:40:14: pid 5357: LOG: perhaps failed to create INET domain socket
Mar 19 21:40:14 Hostname1 pgpool[5357]: 2019-03-19 21:40:14: pid 5357: DETAIL: socket(::) failed: "Address family not supported by protocol"
Mar 19 21:40:14 Hostname1 pgpool[5357]: 2019-03-19 21:40:14: pid 5357: LOG: pgpool-II successfully started. version 3.7.7 (amefuriboshi)
Mar 19 21:40:15 Hostname1 pgpool[5357]: 2019-03-19 21:40:15: pid 5382: FATAL: failed to create watchdog heartbeat receive socket
Mar 19 21:40:15 Hostname1 pgpool[5357]: 2019-03-19 21:40:15: pid 5382: DETAIL: setsockopt(SO_BINDTODEVICE) failed with reason: "No such device"
Mar 19 21:40:15 Hostname1 pgpool[5357]: 2019-03-19 21:40:15: pid 5383: FATAL: failed to create watchdog heartbeat sender socket
Mar 19 21:40:15 Hostname1 pgpool[5357]: 2019-03-19 21:40:15: pid 5383: DETAIL: setsockopt(SO_BINDTODEVICE) failed with reason: "No such device"
Mar 19 21:40:18 Hostname1 pgpool[5357]: 2019-03-19 21:40:18: pid 5377: LOG: successfully acquired the delegate IP:"VIP Address"
Mar 19 21:40:18 Hostname1 pgpool[5357]: 2019-03-19 21:40:18: pid 5377: DETAIL: 'if_up_cmd' returned with success
Mar 19 21:40:18 Hostname1 pgpool[5357]: 2019-03-19 21:40:18: pid 5358: LOG: watchdog escalation process with pid: 5377 exit with SUCCESS.

Please help us address the above errors:
  (pid 5357: DETAIL: socket(::) failed: "Address family not supported by protocol")
  (pid 5382: FATAL: failed to create watchdog heartbeat receive socket)
  (pid 5383: FATAL: failed to create watchdog heartbeat sender socket)
  (pid 5382: DETAIL: setsockopt(SO_BINDTODEVICE) failed with reason: "No such device")
Tags: No tags attached.

Activities

hoshiai

2019-03-22 16:17

developer   ~0002450

I think the pgpool-to-pgpool and/or pgpool-to-PostgreSQL connections may be partially failing
because of incorrect settings.

> (pid 5357: DETAIL: socket(::) failed: "Address family not supported by protocol")
This is probably not a problem.

> (pid 5382: FATAL: failed to create watchdog heartbeat receive socket),
> (pid 5383: FATAL: failed to create watchdog heartbeat sender socket),
> (pid 5382: DETAIL: setsockopt(SO_BINDTODEVICE) failed with reason: "No such device")

These mean that Pgpool-II failed to create and configure the sockets used for the heartbeat between the pgpool nodes.
Are the heartbeat_* parameters in pgpool.conf set correctly?
For example, if the heartbeat_device parameter is specified incorrectly, this can easily happen.

Could you show pgpool.conf and pgpool's logfile?
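
For reference, here is a minimal sketch of the heartbeat-related part of pgpool.conf (the host name and device name are placeholders, not your actual values):

  wd_lifecheck_method = 'heartbeat'
  heartbeat_destination0 = 'other_pgpool_host'
  heartbeat_destination_port0 = 9694
  heartbeat_device0 = 'eth0'    # must name a network interface that exists on this server, or '' for no device binding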

siva

2019-03-24 13:29

reporter   ~0002451

Hi Hoshiai,

Thanks for your reply.

Could you please clarify a few things?

1. Status Mismatch between postgres server and master pool server
[root@Hostname1 ~]# su - postgres
-bash-4.2$ psql -U pgpool --host Hostname1 --dbname postgres -c "show pool_nodes"
 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+--------------+------+--------+-----------+--------+------------+-------------------+-------------------
 0 | backend_host0_IP | backend_port | down | 0.250000 | slave | 0 | false | 0
 1 | backend_host1_IP | backend_port | down | 0.250000 | slave | 0 | false | 0
 2 | backend_host2_IP | backend_port | up | 0.250000 | master | 1 | true | 0
 3 | backend_host3_IP | backend_port | up | 0.250000 | slave | 0 | false | 0
(4 rows)

Please let me know why it is showing the wrong status. What is the fix to get the status reported correctly between PostgreSQL and pgpool?

2. Mismatch status between pool to pool:
[root@Hostname2 ~]# su - postgres
Last login: Tue Mar 19 21:34:19 IST 2019 on pts/0
-bash-4.2$ psql -U pgpool --host Hostname2 --dbname postgres -c "show pool_nodes"
 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+--------------+------+--------+-----------+--------+------------+-------------------+-------------------
 0 | backend_host0_IP | backend_port | down | 0.250000 | slave | 0 | false | 0
 1 | backend_host1_IP | backend_port | up | 0.250000 | master | 0 | true | 0
 2 | backend_host2_IP | backend_port | up | 0.250000 | slave | 0 | false | 0
 3 | backend_host3_IP | backend_port | up | 0.250000 | slave | 0 | false | 0
(4 rows)

Even between the two pgpool nodes there is no sync; please help us understand how to overcome these issues.

3. setuid issue:
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4349: WARNING: checking setuid bit of arping command
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4349: DETAIL: arping[/usr/sbin/arping] doesn't have setuid bit

How do we fix the above setuid issue?

4. No device found
> (pid 5382: FATAL: failed to create watchdog heartbeat receive socket),
> (pid 5383: FATAL: failed to create watchdog heartbeat sender socket),
> (pid 5382: DETAIL: setsockopt(SO_BINDTODEVICE) failed with reason: "No such device")


Please find the requested details below.

Log:

[root@Hostname1 ~]# systemctl status pgpool
Mar 19 21:40:14 Hostname1 pgpool[5357]: 2019-03-19 21:40:14: pid 5357: LOG: perhaps failed to create INET domain socket
Mar 19 21:40:14 Hostname1 pgpool[5357]: 2019-03-19 21:40:14: pid 5357: DETAIL: socket(::) failed: "Address family not supported by protocol"
Mar 19 21:40:14 Hostname1 pgpool[5357]: 2019-03-19 21:40:14: pid 5357: LOG: pgpool-II successfully started. version 3.7.7 (amefuriboshi)
Mar 19 21:40:15 Hostname1 pgpool[5357]: 2019-03-19 21:40:15: pid 5382: FATAL: failed to create watchdog heartbeat receive socket
Mar 19 21:40:15 Hostname1 pgpool[5357]: 2019-03-19 21:40:15: pid 5382: DETAIL: setsockopt(SO_BINDTODEVICE) failed with reason: "No such device"
Mar 19 21:40:15 Hostname1 pgpool[5357]: 2019-03-19 21:40:15: pid 5383: FATAL: failed to create watchdog heartbeat sender socket
Mar 19 21:40:15 Hostname1 pgpool[5357]: 2019-03-19 21:40:15: pid 5383: DETAIL: setsockopt(SO_BINDTODEVICE) failed with reason: "No such device"
Mar 19 21:40:18 Hostname1 pgpool[5357]: 2019-03-19 21:40:18: pid 5377: LOG: successfully acquired the delegate IP:"VIP Address"
Mar 19 21:40:18 Hostname1 pgpool[5357]: 2019-03-19 21:40:18: pid 5377: DETAIL: 'if_up_cmd' returned with success
Mar 19 21:40:18 Hostname1 pgpool[5357]: 2019-03-19 21:40:18: pid 5358: LOG: watchdog escalation process with pid: 5377 exit with SUCCESS.

Thanks for your help.

Regards,
Siva.
Master_Pool_config_file.txt (40,853 bytes)   

hoshiai

2019-03-25 15:15

developer   ~0002453

> 1. Status Mismatch between postgres server and master pool server

Once Pgpool-II has judged a node to be down, it will not check that node again.
So you need to reattach the node to Pgpool-II manually.

For example, you can use the 'pcp_attach_node' command,
or
shut down Pgpool-II, delete the 'pgpool_status' file in the directory specified by the 'logdir' parameter, and then restart Pgpool-II.
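
for example (the host, PCP port, user, and node id below are placeholders; node id 0 would reattach backend node 0):

  pcp_attach_node -h pgpool_host -p 9898 -U pcp_user -n 0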


> 2. Mismatch status between pool to pool:

The lack of sync may be because the connection between the pgpool nodes is failing.

Please check each pgpool server's watchdog status with the pcp_watchdog_info command.

for example:
  pcp_watchdog_info -v -h IP_ADDRESS -p PCP_PORT -U USERNAME

* pcp_watchdog_info command
http://www.pgpool.net/docs/pgpool-II-3.7.7/en/html/pcp-watchdog-info.html


> 3. setuid issue:

if_up_cmd, if_down_cmd, and arping_cmd require root privileges.
You have to either set the setuid bit on those commands (with 'chmod u+s') or allow passwordless sudo, so that Pgpool-II can execute them.

http://www.pgpool.net/docs/pgpool-II-3.7.7/ja/html/tutorial-watchdog-intro.html#TUTORIAL-WATCHDOG-AUTOMATIC-VIP
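
for example, a sketch of the setuid approach (the paths are typical locations and may differ on your system):

  chmod u+s /usr/sbin/ip        # used by if_up_cmd / if_down_cmd
  chmod u+s /usr/sbin/arping    # used by arping_cmd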

> 4. No device found

Probably the heartbeat settings are wrong.
For example, does the 'eth0' device specified in your conf actually exist on the server?
You can check it using the 'ip a' command.
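
for example (assuming 'eth0' is the device name set in your conf):

  ip a show eth0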

siva

2019-03-25 15:55

reporter   ~0002454

> 1. Status Mismatch between postgres server and master pool server

> Once Pgpool-II has judged a node to be down, it will not check that node again.
> So you need to reattach the node to Pgpool-II manually.
>
> For example, you can use the 'pcp_attach_node' command,
> or
> shut down Pgpool-II, delete the 'pgpool_status' file in the directory specified by the 'logdir' parameter, and then restart Pgpool-II.

-----------
> shut down Pgpool-II, delete the 'pgpool_status' file in the directory specified by the 'logdir' parameter, and then restart Pgpool-II.
==> We have done this; after restarting, the status is correct, but after about 6 hours it goes wrong again, while repmgr shows the correct status throughout.




> 2. Mismatch status between pool to pool:

> The lack of sync may be because the connection between the pgpool nodes is failing.
>
> Please check each pgpool server's watchdog status with the pcp_watchdog_info command.
>
> for example:
>   pcp_watchdog_info -v -h IP_ADDRESS -p PCP_PORT -U USERNAME
>
> * pcp_watchdog_info command
> http://www.pgpool.net/docs/pgpool-II-3.7.7/en/html/pcp-watchdog-info.html


> 3. setuid issue:

> if_up_cmd, if_down_cmd, and arping_cmd require root privileges.
> You have to either set the setuid bit on those commands (with 'chmod u+s') or allow passwordless sudo, so that Pgpool-II can execute them.
>
> http://www.pgpool.net/docs/pgpool-II-3.7.7/ja/html/tutorial-watchdog-intro.html#TUTORIAL-WATCHDOG-AUTOMATIC-VIP

============
All starting and stopping is done as the root user only, but we still see the above warning.

> 4. No device found

> Probably the heartbeat settings are wrong.
> For example, does the 'eth0' device specified in your conf actually exist on the server?
> You can check it using the 'ip a' command.
=========

I will check this. Can it cause any issues, such as the status mismatch between the pools, or between the DB servers and pgpool?

=======
The status mismatch recurs every day. Could you please help us find a permanent fix for this issue?

Thanks a lot for your help.

hoshiai

2019-03-25 16:56

developer   ~0002455

> All starting and stopping is done as the root user only, but we still see the above warning.
I see, so these warning messages are not a problem.

Could you share pgpool.log (in '/var/log/pgpool') and postgresql.log from each server?
I cannot pinpoint the cause from only part of the log, and I want to confirm the result of the pcp_watchdog_info command.

siva

2019-03-25 23:51

reporter   ~0002459

Thanks for your help, hoshiai.
I will update you on the status.

siva

2019-03-27 23:44

reporter   ~0002469

Hi Hoshiai,

Please find the watchdog info for the pgpool master and slave servers:

[root@N1PPRL-UFGA0159 pgpool-II]# pcp_watchdog_info -h 10.222.164.81 -p 9898 -U pgpool -v
Password:
Watchdog Cluster Information
Total Nodes : 2
Remote Nodes : 1
Quorum state : QUORUM IS ON THE EDGE
Alive Remote Nodes : 0
VIP up on local node : YES
Master Node Name : 10.222.164.81:5432 Linux N1PPRL-UFGA0159
Master Host Name : 10.222.164.81

Watchdog Node Information
Node Name : 10.222.164.81:5432 Linux N1PPRL-UFGA0159
Host Name : 10.222.164.81
Delegate IP : 10.222.167.250
Pgpool port : 5432
Watchdog port : 9000
Node priority : 2
Status : 4
Status Name : MASTER

Node Name : Not_Set
Host Name : 10.222.16.82
Delegate IP : Not_Set
Pgpool port : 5432
Watchdog port : 9000
Node priority : 0
Status : 0
Status Name : DEAD

===

siva

2019-03-28 00:24

reporter   ~0002471

Hi Hoshiai,

Please find all the configuration and log files attached for reference.

Thanks for your help.

hoshiai

2019-03-28 14:14

developer   ~0002474

Hi siva,

Thank you for sharing the log files.

Probably, when communication between Pgpool-II and PostgreSQL fails, Pgpool-II degenerates the backend PostgreSQL node, because fail_over_on_backend_error = on.
How about enabling the health check instead, and setting fail_over_on_backend_error = off?

And the heartbeat also fails because of "No such device". Does the 'eth0' device carry the '10.222.16.xx' network?
I think the heartbeat will probably succeed if you change the 'heartbeat_device0' parameter from 'eth0' to '' (empty string), so that the socket is not bound to a specific device.
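
For example, a health-check sketch for pgpool.conf (these values are only illustrative; tune them for your environment):

  fail_over_on_backend_error = off
  health_check_period = 10         # probe each backend every 10 seconds
  health_check_timeout = 20
  health_check_user = 'pgpool'
  health_check_max_retries = 3     # retry before degenerating a node
  health_check_retry_delay = 1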

hoshiai

2019-03-28 14:29

developer   ~0002476

> Q) Our finding here is that we are not issuing any failover command, yet the log still shows a failover operation, while nothing happened on the backend. Did some sort of promotion or degeneration happen at the pool level without affecting the backend DB node?

A) Yes. These log lines mean a degeneration happened at the Pgpool-II level. PostgreSQL itself did not actually fail over.

siva

2019-03-28 19:04

reporter   ~0002477

Hi hoshiai,

Thanks for your help; we will check and get back to you.

siva

2019-03-29 00:55

reporter   ~0002480

Hi hoshiai,

Thanks for your help.

> And the heartbeat also fails because of "No such device". Does the 'eth0' device carry the '10.222.16.xx' network?
> I think the heartbeat will probably succeed if you change the 'heartbeat_device0' parameter from 'eth0' to '' (empty string).

Change: I have made the change as you suggested ('heartbeat_device0 = ''). Now both the master and slave pool servers are acting as "masters".

As per the log both are sending heartbeat signals and not getting reply from both the servers so assuming as masters.

Below is the log from the slave pool:
LOG: watchdog node state changed from [DEAD] to [LOADING]
LOG: watchdog node state changed from [LOADING] to [JOINING]
LOG: watchdog node state changed from [JOINING] to [INITIALIZING]
LOG: I am the only alive node in the watchdog cluster
HINT: skipping stand for coordinator state
LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
LOG: I am announcing my self as master/coordinator watchdog node

Below are some of our findings:

We cannot find the watchdog heartbeat port 9694 listening, and connections on 9694 are not accepted, even though both the master and slave pool are up and running.

[root@s2n pgpool-II]# netstat -nlt
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:9000 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:9898 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:5432 0.0.0.0:* LISTEN

[root@INHUSZ1-V1625126 ~]# netstat -nlt
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:9000 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:9898 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:5433 0.0.0.0:* LISTEN

[root@s2n pgpool-II]# hostname -I
172.16.251.25
[root@s2n pgpool-II]# telnet 172.16.251.26 5432
Trying 172.16.251.26...
Connected to 172.16.251.26.
Escape character is '^]'.
^CConnection closed by foreign host.
[root@s2n pgpool-II]# telnet 172.16.251.26 9000
Trying 172.16.251.26...
Connected to 172.16.251.26.
Escape character is '^]'.
^CConnection closed by foreign host.
[root@s2n pgpool-II]# telnet 172.16.251.26 9898
Trying 172.16.251.26...
Connected to 172.16.251.26.
Escape character is '^]'.
^CConnection closed by foreign host.
[root@s2n pgpool-II]# telnet 172.16.251.26 9694
Trying 172.16.251.26...
telnet: connect to address 172.16.251.26: Connection refused

On Slave:

[root@INHUSZ1-V1625126 ~]# telnet 172.16.251.25 9694
Trying 172.16.251.25...
telnet: connect to address 172.16.251.25: Connection refused
[root@INHUSZ1-V1625126 ~]# telnet 172.16.251.25 9898
Trying 172.16.251.25...
Connected to 172.16.251.25.
Escape character is '^]'.

Are there any other config parameters we should tweak to get the servers in sync, apart from the following?

other_pgpool_hostname0
other_pgpool_port0
other_wd_port0
heartbeat_destination0
heartbeat_destination_port0 = 9694
heartbeat_device0 = ''

siva

2019-03-29 13:31

reporter   ~0002481

Hi hoshiai,

We have found a mismatch between the heartbeat sender and receiver ports; is this expected, or is there an issue with our setup? The heartbeat sender's port keeps changing after restarting the pool.

On Master:
[root@INHUSZ1-V1625126 ~]# netstat -nltup | grep pgpool
tcp 0 0 0.0.0.0:9000 0.0.0.0:* LISTEN 25579/pgpool: watch
tcp 0 0 0.0.0.0:9898 0.0.0.0:* LISTEN 25578/pgpool
tcp 0 0 0.0.0.0:5433 0.0.0.0:* LISTEN 25578/pgpool
tcp6 0 0 :::5433 :::* LISTEN 25578/pgpool
udp 0 0 0.0.0.0:9694 0.0.0.0:* 25587/pgpool: heart
udp 0 0 0.0.0.0:59432 0.0.0.0:* 25589/pgpool: heart

[root@INHUSZ1-V1625126 ~]# ps -eaf| grep 59432
root 31903 24284 0 13:38 pts/2 00:00:00 grep --color=auto 59432
[root@INHUSZ1-V1625126 ~]# ps -eaf| grep 25589
root 25589 25581 0 00:12 ? 00:00:01 pgpool: heartbeat sender
root 31920 24284 0 13:38 pts/2 00:00:00 grep --color=auto 25589
[root@INHUSZ1-V1625126 ~]# ps -eaf| grep 25581
root 25581 25578 0 00:12 ? 00:00:35 pgpool: lifecheck
root 25587 25581 0 00:12 ? 00:00:00 pgpool: heartbeat receiver
root 25589 25581 0 00:12 ? 00:00:01 pgpool: heartbeat sender
root 32018 25581 0 13:39 ? 00:00:00 ping -q -c3 172.16.251.26
root 32019 25581 0 13:39 ? 00:00:00 ping -q -c3 172.16.251.27
root 32022 24284 0 13:39 pts/2 00:00:00 grep --color=auto 25581

● pgpool.service - Pgpool-II
   Loaded: loaded (/usr/lib/systemd/system/pgpool.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2019-03-29 00:12:51 IST; 13h ago
  Process: 25468 ExecStop=/usr/bin/pgpool -f /etc/pgpool-II/pgpool.conf $STOP_OPTS stop (code=exited, status=0/SUCCESS)
 Main PID: 25578 (pgpool)
    Tasks: 42
   CGroup: /system.slice/pgpool.service
           +-25578 /usr/bin/pgpool -f /etc/pgpool-II/pgpool.conf -n
           +-25579 pgpool: watchdog
           +-25581 pgpool: lifecheck
           +-25582 pgpool: wait for connection request
           +-25583 pgpool: wait for connection request
           +-25584 pgpool: wait for connection request
           +-25585 pgpool: wait for connection request
           +-25586 pgpool: wait for connection request
           +-25587 pgpool: heartbeat receiver
           +-25588 pgpool: wait for connection request
           +-25589 pgpool: heartbeat sender
           +-25590 pgpool: wait for connection request
           +-25591 pgpool: wait for connection request
           +-25592 pgpool: wait for connection request
           +-25593 pgpool: wait for connection request
           +-25594 pgpool: wait for connection request
           +-25595 pgpool: wait for connection request
           +-25596 pgpool: wait for connection request
           +-25597 pgpool: wait for connection request
           +-25598 pgpool: wait for connection request
           +-25599 pgpool: wait for connection request
           +-25600 pgpool: wait for connection request
           +-25601 pgpool: wait for connection request
           +-25602 pgpool: wait for connection request
           +-25603 pgpool: wait for connection request
           +-25604 pgpool: wait for connection request
           +-25605 pgpool: wait for connection request
           +-25606 pgpool: wait for connection request
           +-25607 pgpool: wait for connection request
           +-25608 pgpool: wait for connection request
           +-25609 pgpool: wait for connection request
           +-25610 pgpool: wait for connection request
           +-25611 pgpool: wait for connection request
           +-25612 pgpool: wait for connection request
           +-25613 pgpool: wait for connection request
           +-25614 pgpool: wait for connection request
           +-25615 pgpool: wait for connection request
           +-25617 pgpool: PCP: wait for connection request
           +-25618 pgpool: worker process
           +-25619 pgpool: health check process(0)
           +-25620 pgpool: health check process(1)
           
---

On Slave :

[root@s2n pgpool-II]# netstat -nltup | grep pgpool
tcp 0 0 0.0.0.0:9000 0.0.0.0:* LISTEN 4003/pgpool: watchd
tcp 0 0 0.0.0.0:9898 0.0.0.0:* LISTEN 4002/pgpool
tcp 0 0 0.0.0.0:5432 0.0.0.0:* LISTEN 4002/pgpool
tcp6 0 0 :::5432 :::* LISTEN 4002/pgpool
udp 0 0 0.0.0.0:9694 0.0.0.0:* 4013/pgpool: heartb
udp 0 0 0.0.0.0:36764 0.0.0.0:* 4015/pgpool: heartb
[root@s2n pgpool-II]# systemctl status pgpool.service
● pgpool.service - Pgpool-II
   Loaded: loaded (/usr/lib/systemd/system/pgpool.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2019-03-29 00:12:47 IST; 14h ago
  Process: 3861 ExecStop=/usr/bin/pgpool -f /etc/pgpool-II/pgpool.conf $STOP_OPTS stop (code=exited, status=0/SUCCESS)
 Main PID: 4002 (pgpool)
   CGroup: /system.slice/pgpool.service
           +-4002 /usr/bin/pgpool -f /etc/pgpool-II/pgpool.conf -n
           +-4003 pgpool: watchdog
           +-4009 pgpool: lifecheck
           +-4010 pgpool: wait for connection request
           +-4011 pgpool: wait for connection request
           +-4012 pgpool: wait for connection request
           +-4013 pgpool: heartbeat receiver
           +-4014 pgpool: wait for connection request
           +-4015 pgpool: heartbeat sender
           +-4016 pgpool: wait for connection request

Could you please comment on this? Can it cause an issue?

Thanks for your help.

hoshiai

2019-03-29 16:59

developer   ~0002482

> Change: I have made the change as you suggested ('heartbeat_device0 = ''). Now both the master and slave pool servers are acting as "masters".
Thank you for trying. I understand that changing "heartbeat_device0" did not resolve that problem.
I am interested in the netstat output, where the heartbeat sender/receiver processes do appear.
Does that mean the "No such device" messages no longer appear?

> udp 0 0 0.0.0.0:9694 0.0.0.0:* 25587/pgpool: heartbeat receiver
> udp 0 0 0.0.0.0:59432 0.0.0.0:* 25589/pgpool: heartbeat sender
This is not a problem.
The heartbeat sender process sends its packets to the other servers' port 9694, so its local socket does not need to be bound to port 9694.
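
If you want to observe the heartbeat traffic itself, you can watch UDP packets on the destination port, for example:

  tcpdump -n -i any udp port 9694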

siva

2019-03-30 23:17

reporter   ~0002486

Hi hoshiai,

Regarding the missing sync between the master and slave pgpool:

I have checked watchdog socket connectivity and cannot see an established connection between the master and slave pool. Could this be the cause of the missing connectivity between master and slave? Also, with 'tcpdump -i ens192 port 9000' we do not see any traffic on either the master or the slave.

On Slave:
[root@s2n pgpool-II]# netstat -nltupa | grep pgpool
tcp 0 0 0.0.0.0:9000 0.0.0.0:* LISTEN 22101/pgpool: watch
tcp 0 0 0.0.0.0:9898 0.0.0.0:* LISTEN 22100/pgpool
tcp 0 0 0.0.0.0:5432 0.0.0.0:* LISTEN 22100/pgpool
tcp6 0 0 :::5432 :::* LISTEN 22100/pgpool
udp 0 0 0.0.0.0:56384 0.0.0.0:* 22124/pgpool: heart
udp 0 0 0.0.0.0:9694 0.0.0.0:* 22123/pgpool: heart

On Master :
[root@INHUSZ1-V1625126 ~]# netstat -nltupa | grep pgpool
tcp 0 0 0.0.0.0:9000 0.0.0.0:* LISTEN 30772/pgpool: watch
tcp 0 0 0.0.0.0:9898 0.0.0.0:* LISTEN 30771/pgpool
tcp 0 0 0.0.0.0:5433 0.0.0.0:* LISTEN 30771/pgpool
tcp6 0 0 :::5433 :::* LISTEN 30771/pgpool
udp 0 0 0.0.0.0:9694 0.0.0.0:* 30795/pgpool: heart
udp 0 0 0.0.0.0:40139 0.0.0.0:* 30796/pgpool: heart

Is the above issue what causes 'EVENT = TIMEOUT' while initializing the watchdog on the slave pool, instead of an outbound connection to the master pool server? Or is some other issue causing 'EVENT = TIMEOUT' rather than an outbound connection to the master?
What are the possible causes of this issue?

2019-03-28 18:55:52: pid 24816: DEBUG: STATE MACHINE INVOKED WITH EVENT = STATE CHANGED Current State = LOADING
2019-03-28 18:55:57: pid 24816: DEBUG: STATE MACHINE INVOKED WITH EVENT = TIMEOUT Current State = LOADING

Please share your comments on this.
Thanks a lot for your ongoing support :)

siva

2019-04-02 14:19

reporter   ~0002500

Hi Hoshiai,

Good Morning,

Please share your suggestions on the above issue.

Thank you.

hoshiai

2019-04-03 17:29

developer   ~0002510

Hi siva,
Sorry for the late reply. I am currently checking the source code and Pgpool-II's behavior.

Could you confirm that the servers can communicate with each other?
In particular, test that the slave Pgpool server can reach the primary Pgpool server
on the watchdog and heartbeat ports (other_wd_port0, heartbeat_destination_port0), for example using the 'nmap' command.

If the 'nmap' command is available, you can check with the following:

nmap {primary_server_address} -sUT -p 'T:{other_wd_port0},{heartbeat_destination_port0},U:{other_wd_port0},{heartbeat_destination_port0}'
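
For example, with the watchdog port 9000 and heartbeat port 9694 from your configuration, checking the primary pgpool (172.16.251.26 in your earlier output) from the slave would be something like:

  nmap 172.16.251.26 -sUT -p 'T:9000,9694,U:9000,9694'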

siva

2019-04-03 18:54

reporter   ~0002511

Thanks hoshiai,

I will check and share the output.

siva

2019-04-04 19:21

reporter   ~0002521

Hi hoshiai,

Thanks for your "nmap {primary_server_address} -sUT -p 'T:{other_wd_port0},{hearbeat_destination_port0}, U:{other_wd_port0},{hearbeat_destination_port0}'".
Port has not enabled , now we can able to identify the issue and now both master and slave are in sync .
Issue has been resolved .
Thanks a lot for your support, much appreciate for your support.

Please close this issue now.

hoshiai

2019-04-05 08:58

developer   ~0002523

I am glad that your problem is resolved.

OK, I have closed this issue.

Issue History

Date Modified Username Field Change
2019-03-20 14:31 siva New Issue
2019-03-20 14:40 pengbo Assigned To => hoshiai
2019-03-20 14:40 pengbo Status new => assigned
2019-03-22 16:17 hoshiai Note Added: 0002450
2019-03-24 13:29 siva File Added: Master_Pool_config_file.txt
2019-03-24 13:29 siva Note Added: 0002451
2019-03-25 15:15 hoshiai Note Added: 0002453
2019-03-25 15:55 siva Note Added: 0002454
2019-03-25 16:56 hoshiai Note Added: 0002455
2019-03-25 23:51 siva Note Added: 0002459
2019-03-27 23:44 siva Note Added: 0002469
2019-03-28 00:24 siva File Added: Pool Issues debug Info.zip
2019-03-28 00:24 siva Note Added: 0002471
2019-03-28 14:14 hoshiai Note Added: 0002474
2019-03-28 14:29 hoshiai Note Added: 0002476
2019-03-28 19:04 siva Note Added: 0002477
2019-03-29 00:55 siva Note Added: 0002480
2019-03-29 13:31 siva Note Added: 0002481
2019-03-29 16:59 hoshiai Note Added: 0002482
2019-03-30 23:17 siva Note Added: 0002486
2019-04-02 14:19 siva Note Added: 0002500
2019-04-03 17:29 hoshiai Note Added: 0002510
2019-04-03 18:54 siva Note Added: 0002511
2019-04-04 19:21 siva Note Added: 0002521
2019-04-05 08:58 hoshiai Status assigned => closed
2019-04-05 08:58 hoshiai Resolution open => no change required
2019-04-05 08:58 hoshiai Note Added: 0002523