View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000476 | Pgpool-II | Bug | public | 2019-03-20 14:31 | 2019-04-05 08:58 |
| Reporter | siva | Assigned To | hoshiai | ||
| Priority | urgent | Severity | major | Reproducibility | have not tried |
| Status | closed | Resolution | no change required | ||
| Product Version | 3.7.7 | ||||
| Summary | 0000476: Pgpool master and slave do not sync the backend DB status properly; both pgpool master and slave report an incorrect backend status. | ||||
**Description**

Hi, good morning.

We are using pgpool 3.7.7 for load balancing, with a master/slave setup, and repmgr 4.2 for replication and failover. There are 4 backend DB servers in total (1 master and 3 slaves).

Regarding the status of the backend DB servers: according to repmgr everything is as expected, but checking the status from the pgpool master and the pgpool slave gives different results.

repmgr status on the database master node:

```
-bash-4.2$ /usr/pgsql-9.5/bin/repmgr -f /var/lib/pgsql/repmgr/repmgr.conf cluster show
WARNING: master_response_timeout/9: unknown name/value pair provided; ignoring
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
 ID | Name         | Role    | Status    | Upstream     | Location | Connection string
----+--------------+---------+-----------+--------------+----------+----------------------------------------------
 1  | Master_DB_IP | primary | * running |              | default  | host=Master_DB_IP user=repmgr dbname=repmgr
 2  | Slave_DB_IP1 | standby |   running | Master_DB_IP | default  | host=Slave_DB_IP1 user=repmgr dbname=repmgr
 3  | Slave_DB_IP2 | standby |   running | Master_DB_IP | default  | host=Slave_DB_IP2 user=repmgr dbname=repmgr
 4  | Slave_DB_IP3 | standby |   running | Master_DB_IP | default  | host=Slave_DB_IP3 user=repmgr dbname=repmgr
```

On Hostname1 (pgpool master node):

```
[root@Hostname1 ~]# systemctl status pgpool
Mar 19 21:40:14 Hostname1 pgpool[5357]: 2019-03-19 21:40:14: pid 5357: LOG: perhaps failed to create INET domain socket
Mar 19 21:40:14 Hostname1 pgpool[5357]: 2019-03-19 21:40:14: pid 5357: DETAIL: socket(::) failed: "Address family not supported by protocol"
Mar 19 21:40:14 Hostname1 pgpool[5357]: 2019-03-19 21:40:14: pid 5357: LOG: pgpool-II successfully started. version 3.7.7 (amefuriboshi)
Mar 19 21:40:15 Hostname1 pgpool[5357]: 2019-03-19 21:40:15: pid 5382: FATAL: failed to create watchdog heartbeat receive socket
Mar 19 21:40:15 Hostname1 pgpool[5357]: 2019-03-19 21:40:15: pid 5382: DETAIL: setsockopt(SO_BINDTODEVICE) failed with reason: "No such device"
Mar 19 21:40:15 Hostname1 pgpool[5357]: 2019-03-19 21:40:15: pid 5383: FATAL: failed to create watchdog heartbeat sender socket
Mar 19 21:40:15 Hostname1 pgpool[5357]: 2019-03-19 21:40:15: pid 5383: DETAIL: setsockopt(SO_BINDTODEVICE) failed with reason: "No such device"
Mar 19 21:40:18 Hostname1 pgpool[5357]: 2019-03-19 21:40:18: pid 5377: LOG: successfully acquired the delegate IP:"VIP Address"
Mar 19 21:40:18 Hostname1 pgpool[5357]: 2019-03-19 21:40:18: pid 5377: DETAIL: 'if_up_cmd' returned with success
Mar 19 21:40:18 Hostname1 pgpool[5357]: 2019-03-19 21:40:18: pid 5358: LOG: watchdog escalation process with pid: 5377 exit with SUCCESS.

[root@Hostname1 ~]# su - postgres
-bash-4.2$ psql -U pgpool --host Hostname1 --dbname postgres -c "show pool_nodes"
 node_id | hostname         | port         | status | lb_weight | role   | select_cnt | load_balance_node | replication_delay
---------+------------------+--------------+--------+-----------+--------+------------+-------------------+-------------------
 0       | backend_host0_IP | backend_port | down   | 0.250000  | slave  | 0          | false             | 0
 1       | backend_host1_IP | backend_port | down   | 0.250000  | slave  | 0          | false             | 0
 2       | backend_host2_IP | backend_port | up     | 0.250000  | master | 1          | true              | 0
 3       | backend_host3_IP | backend_port | up     | 0.250000  | slave  | 0          | false             | 0
(4 rows)
```

On Hostname2 (pgpool slave node):

```
[root@Hostname2 ~]# systemctl status pgpool
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4349: WARNING: checking setuid bit of arping command
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4349: DETAIL: arping[/usr/sbin/arping] doesn't have setuid bit
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4349: LOG: reading status file: 0 th backend is set to down status
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4349: LOG: waiting for watchdog to initialize
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4350: LOG: setting the local watchdog node name to "pgpool_slave_host_IP:pool_port Linux Hostname2"
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4350: LOG: watchdog cluster is configured with 1 remote nodes
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4350: LOG: watchdog remote node:0 on pgpool_master_host_IP:watchdog_port
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4350: LOG: interface monitoring is disabled in watchdog
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4350: LOG: watchdog node state changed from [DEAD] to [LOADING]
Mar 19 21:42:17 Hostname2 pgpool[4349]: 2019-03-19 21:42:17: pid 4350: LOG: watchdog node state changed from [LOADING] to [JOINING]

[root@Hostname2 ~]# su - postgres
Last login: Tue Mar 19 21:34:19 IST 2019 on pts/0
-bash-4.2$ psql -U pgpool --host Hostname2 --dbname postgres -c "show pool_nodes"
 node_id | hostname         | port         | status | lb_weight | role   | select_cnt | load_balance_node | replication_delay
---------+------------------+--------------+--------+-----------+--------+------------+-------------------+-------------------
 0       | backend_host0_IP | backend_port | down   | 0.250000  | slave  | 0          | false             | 0
 1       | backend_host1_IP | backend_port | up     | 0.250000  | master | 0          | true              | 0
 2       | backend_host2_IP | backend_port | up     | 0.250000  | slave  | 0          | false             | 0
 3       | backend_host3_IP | backend_port | up     | 0.250000  | slave  | 0          | false             | 0
(4 rows)
```

Could you please help resolve this status mismatch between the pgpool master, the pgpool slave, and the databases? Please also let me know if there is already a bug fix that addresses this issue.

Thank you very much for your help.
**Additional Information**

(The same `systemctl status pgpool` output from Hostname1 as in the description, repeated.)

Please help us address the following errors:

- pid 5357: DETAIL: socket(::) failed: "Address family not supported by protocol"
- pid 5382: FATAL: failed to create watchdog heartbeat receive socket
- pid 5383: FATAL: failed to create watchdog heartbeat sender socket
- pid 5382: DETAIL: setsockopt(SO_BINDTODEVICE) failed with reason: "No such device"
| Tags | No tags attached. | ||||
|
|
I think part of the pgpool-to-pgpool or pgpool-to-PostgreSQL communication may be failing because of incorrect settings.

> (pid 5357: DETAIL: socket(::) failed: "Address family not supported by protocol")

This is probably not a problem.

> (pid 5382: FATAL: failed to create watchdog heartbeat receive socket)
> (pid 5383: FATAL: failed to create watchdog heartbeat sender socket)
> (pid 5382: DETAIL: setsockopt(SO_BINDTODEVICE) failed with reason: "No such device")

These mean pgpool failed to create and configure the sockets used for the heartbeat between the pgpool nodes. Have you set the heartbeat_* parameters in pgpool.conf correctly? For example, if the heartbeat_device parameter is specified incorrectly, this can easily happen. Could you share pgpool.conf and pgpool's log file?
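For reference, a minimal sketch of the watchdog heartbeat section of pgpool.conf as it looks in the 3.7.x series; the host name and interface below are placeholders, not values from this report:

```
# Sketch of pgpool.conf heartbeat settings (placeholder values).
wd_lifecheck_method = 'heartbeat'
wd_heartbeat_port = 9694                       # local UDP port the receiver binds
heartbeat_destination0 = 'other_pgpool_host'   # the peer pgpool node
heartbeat_destination_port0 = 9694
heartbeat_device0 = ''                         # NIC to bind via SO_BINDTODEVICE;
                                               # must exist on the host, or be left empty
```

A non-existent `heartbeat_device0` is exactly what produces the `setsockopt(SO_BINDTODEVICE) failed ... "No such device"` message seen in the logs above.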
|
|
Hi Hoshiai, thanks for your reply. Could you please clarify a few things?

1. Status mismatch between the postgres servers and the master pool server:

```
[root@Hostname1 ~]# su - postgres
-bash-4.2$ psql -U pgpool --host Hostname1 --dbname postgres -c "show pool_nodes"
 node_id | hostname         | port         | status | lb_weight | role   | select_cnt | load_balance_node | replication_delay
---------+------------------+--------------+--------+-----------+--------+------------+-------------------+-------------------
 0       | backend_host0_IP | backend_port | down   | 0.250000  | slave  | 0          | false             | 0
 1       | backend_host1_IP | backend_port | down   | 0.250000  | slave  | 0          | false             | 0
 2       | backend_host2_IP | backend_port | up     | 0.250000  | master | 1          | true              | 0
 3       | backend_host3_IP | backend_port | up     | 0.250000  | slave  | 0          | false             | 0
(4 rows)
```

Please let me know why it shows the wrong status. What is the fix to get the status reported consistently between postgres and pgpool?

2. Status mismatch between pool and pool:

```
[root@Hostname2 ~]# su - postgres
Last login: Tue Mar 19 21:34:19 IST 2019 on pts/0
-bash-4.2$ psql -U pgpool --host Hostname2 --dbname postgres -c "show pool_nodes"
 node_id | hostname         | port         | status | lb_weight | role   | select_cnt | load_balance_node | replication_delay
---------+------------------+--------------+--------+-----------+--------+------------+-------------------+-------------------
 0       | backend_host0_IP | backend_port | down   | 0.250000  | slave  | 0          | false             | 0
 1       | backend_host1_IP | backend_port | up     | 0.250000  | master | 0          | true              | 0
 2       | backend_host2_IP | backend_port | up     | 0.250000  | slave  | 0          | false             | 0
 3       | backend_host3_IP | backend_port | up     | 0.250000  | slave  | 0          | false             | 0
(4 rows)
```

Even between the two pool servers there is no sync; please help us understand how to overcome this.

3. setuid issue:

```
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4349: WARNING: checking setuid bit of arping command
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4349: DETAIL: arping[/usr/sbin/arping] doesn't have setuid bit
```

How do we fix this setuid issue?

4. No device found:

> (pid 5382: FATAL: failed to create watchdog heartbeat receive socket)
> (pid 5383: FATAL: failed to create watchdog heartbeat sender socket)
> (pid 5382: DETAIL: setsockopt(SO_BINDTODEVICE) failed with reason: "No such device")

Please find the requested details. Log: (the same `systemctl status pgpool` output from Hostname1 as in the description.)

Thanks for your help.

Regards,
Siva.
|
|
> 1. Status mismatch between postgres server and master pool server

Once Pgpool-II has judged a node to be down, it will not check that node again. You need to reattach the node to Pgpool-II manually. For example, you can use the 'pcp_attach_node' command, or shut down Pgpool-II, delete the 'pgpool_status' file in the directory specified by the 'logdir' parameter, and then restart Pgpool-II.

> 2. Mismatch status between pool to pool:

The lack of sync may be caused by a connection failure between the pgpool nodes. Please check each pgpool server's status with the pcp_watchdog_info command, for example:

pcp_watchdog_info -v -h IP_ADDRESS -p PCP_PORT -U USERNAME

* pcp_watchdog_info command:
http://www.pgpool.net/docs/pgpool-II-3.7.7/en/html/pcp-watchdog-info.html

> 3. setuid issue:

if_up_cmd, if_down_cmd and arping_cmd need administrator privileges. You have to either set the setuid bit on those commands (using 'chmod u+s') or allow passwordless sudo, so that Pgpool can execute them.
http://www.pgpool.net/docs/pgpool-II-3.7.7/ja/html/tutorial-watchdog-intro.html#TUTORIAL-WATCHDOG-AUTOMATIC-VIP

> 4. No device found

The heartbeat settings are probably wrong. For example, does the 'eth0' device specified in your conf actually exist on the server? You can check with the 'ip a' command.
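To find which node ids need reattaching, the status column of `show pool_nodes` can be parsed; here is a small sketch, where the sample rows stand in for real `psql -c "show pool_nodes"` output (the host names are illustrative, not from this cluster):

```shell
# List node ids whose status column reads "down" in "show pool_nodes" output.
# Sample rows stand in for: psql -h PGPOOL_HOST -U pgpool -c "show pool_nodes"
pool_nodes=' 0 | host0 | 5432 | down | 0.250000 | slave  | 0 | false | 0
 1 | host1 | 5432 | down | 0.250000 | slave  | 0 | false | 0
 2 | host2 | 5432 | up   | 0.250000 | master | 1 | true  | 0
 3 | host3 | 5432 | up   | 0.250000 | slave  | 0 | false | 0'

# Field 4 (split on "|") is the status column; strip spaces from the node id.
down_nodes=$(echo "$pool_nodes" | awk -F'|' '$4 ~ /down/ {gsub(/ /, "", $1); print $1}')
echo "$down_nodes"
```

Each listed id could then be reattached with `pcp_attach_node -h PGPOOL_HOST -p PCP_PORT -U USERNAME -n NODE_ID` (not run here; requires a live pgpool).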
|
|
> 1. Status mismatch between postgres server and master pool server:
> Shut down Pgpool-II, delete the 'pgpool_status' file in the directory specified by the 'logdir' parameter, and then restart Pgpool-II.

==> We have done this. After restarting, the status is correct, but after about 6 hours we again get a wrong status. The repmgr status remains correct throughout.

> 2. Status mismatch between pool and pool:
> Please check each pgpool's status with the pcp_watchdog_info command.

> 3. setuid issue:
> if_up_cmd, if_down_cmd and arping_cmd need administrator privileges. Set the setuid bit ('chmod u+s') or allow passwordless sudo.

==> All starting and stopping is done as the root user only, but we still get the warning above.

> 4. No device found:
> The heartbeat settings are probably wrong. Does the 'eth0' device specified in your conf exist on the server? Check with 'ip a'.

==> I will check this. Can it cause issues such as the status mismatch between the pools, or between the DB servers and pgpool?

The status mismatch reoccurs every day. Could you please help us find a permanent fix for this issue? Thanks a lot for your help.
|
|
> All starting and stopping done by using root user only. But still getting above issue

I see, so these warning messages are not a problem. Could you share pgpool.log (in '/var/log/pgpool') and postgresql.log from each server? I could not identify the cause from only part of the log. I would also like to see the result of the pcp_watchdog_info command.
|
|
Thanks for your help, hoshiai. I will update the status.
|
|
Hi Hoshiai, please find the watchdog info of the pgpool master and slave servers (the second run of the command produced the same output):

```
[root@N1PPRL-UFGA0159 pgpool-II]# pcp_watchdog_info -h 10.222.164.81 -p 9898 -U pgpool -v
Password:
Watchdog Cluster Information
Total Nodes          : 2
Remote Nodes         : 1
Quorum state         : QUORUM IS ON THE EDGE
Alive Remote Nodes   : 0
VIP up on local node : YES
Master Node Name     : 10.222.164.81:5432 Linux N1PPRL-UFGA0159
Master Host Name     : 10.222.164.81

Watchdog Node Information
Node Name     : 10.222.164.81:5432 Linux N1PPRL-UFGA0159
Host Name     : 10.222.164.81
Delegate IP   : 10.222.167.250
Pgpool port   : 5432
Watchdog port : 9000
Node priority : 2
Status        : 4
Status Name   : MASTER

Node Name     : Not_Set
Host Name     : 10.222.16.82
Delegate IP   : Not_Set
Pgpool port   : 5432
Watchdog port : 9000
Node priority : 0
Status        : 0
Status Name   : DEAD
```
|
|
Hi Hoshiai, please find all the configuration and log files attached for reference. Thanks for your help.
|
|
Hi siva, thank you for sharing the log file.

Probably, when communication between Pgpool-II and PostgreSQL fails, Pgpool-II degenerates the backend PostgreSQL node, because fail_over_on_backend_error = on. How about enabling the health check instead, with fail_over_on_backend_error = off?

The heartbeat also failed because of "No such device". Does the 'eth0' interface carry the '10.222.16.xx' network? I think the heartbeat will probably succeed if you change the 'heartbeat_device0' parameter from 'eth0' to '' (an empty string).
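A sketch of the settings being suggested, as a pgpool.conf fragment; the numeric values are illustrative placeholders, not values from this cluster:

```
# Sketch: rely on periodic health checks rather than degenerating a backend
# on the first connection error (values are illustrative).
fail_over_on_backend_error = off
health_check_period = 10          # seconds between health checks
health_check_timeout = 20
health_check_user = 'pgpool'
health_check_password = ''
health_check_max_retries = 3
health_check_retry_delay = 1
```

With this arrangement a transient connection error no longer detaches the node; a backend is only degenerated after the health check itself fails its retries.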
|
|
> Q) Our finding is that we are not running any failover command, yet the log still shows a failover operation while nothing happened on the backend. Did some sort of promotion or degeneration happen at the pool level without affecting the backend DB node?

A) Yes. These log entries mean a degeneration happened at the Pgpool level. PostgreSQL itself did not actually fail over.
|
|
Hi hoshiai, thanks for your help; I will check and get back to you.
|
|
Hi hoshiai, thanks for your help.

> And the heartbeat also failed because of "No such device". Does the 'eth0' contain network about '10.222.16.xx'? I think heartbeat will probably be success, when 'heartbeat_device0' parameter change from 'eth0' to '' (empty characters).

Change: I have made the change as you suggested (heartbeat_device0 = ''). Now both the master and slave pool servers are acting as "masters". According to the logs, both are sending heartbeat signals, getting no reply from the other server, and therefore each assumes it is the master. Below is the log from the slave pool:

```
LOG: watchdog node state changed from [DEAD] to [LOADING]
LOG: watchdog node state changed from [LOADING] to [JOINING]
LOG: watchdog node state changed from [JOINING] to [INITIALIZING]
LOG: I am the only alive node in the watchdog cluster
HINT: skipping stand for coordinator state
LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
LOG: I am announcing my self as master/coordinator watchdog node
```

Some further findings: we are not able to find the watchdog heartbeat port 9694, and cannot accept connections on 9694, even though both the master and slave pool are up and running.

```
[root@s2n pgpool-II]# netstat -nlt
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:9000            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:9898            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:5432            0.0.0.0:*               LISTEN

[root@INHUSZ1-V1625126 ~]# netstat -nlt
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:9000            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:9898            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:5433            0.0.0.0:*               LISTEN

[root@s2n pgpool-II]# hostname -I
172.16.251.25

[root@s2n pgpool-II]# telnet 172.16.251.26 5432
Trying 172.16.251.26...
Connected to 172.16.251.26.
Escape character is '^]'.
^CConnection closed by foreign host.

[root@s2n pgpool-II]# telnet 172.16.251.26 9000
Trying 172.16.251.26...
Connected to 172.16.251.26.
Escape character is '^]'.
^CConnection closed by foreign host.

[root@s2n pgpool-II]# telnet 172.16.251.26 9898
Trying 172.16.251.26...
Connected to 172.16.251.26.
Escape character is '^]'.
^CConnection closed by foreign host.

[root@s2n pgpool-II]# telnet 172.16.251.26 9694
Trying 172.16.251.26...
telnet: connect to address 172.16.251.26: Connection refused
```

On the slave:

```
[root@INHUSZ1-V1625126 ~]# telnet 172.16.251.25 9694
Trying 172.16.251.25...
telnet: connect to address 172.16.251.25: Connection refused

[root@INHUSZ1-V1625126 ~]# telnet 172.16.251.25 9898
Trying 172.16.251.25...
Connected to 172.16.251.25.
Escape character is '^]'.
```

Are there any other config parameters to tweak to get the servers in sync, apart from:

- other_pgpool_hostname0
- other_pgpool_port0
- other_wd_port0
- heartbeat_destination0
- heartbeat_destination_port0 = 9694
- heartbeat_device0 = ''
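One detail worth noting about the netstat output above: the heartbeat socket is UDP, and `netstat -nlt` lists TCP listeners only, so port 9694 will never appear there even when the heartbeat processes are healthy (it shows up under `netstat -nlu` or `-nltu`). A small illustration using sample netstat lines (the addresses are placeholders):

```shell
# netstat -nlt shows TCP listeners only; the UDP heartbeat socket (9694)
# only appears once UDP sockets are included (netstat -nlu / -nltu).
tcp_line='tcp        0      0 0.0.0.0:9000     0.0.0.0:*    LISTEN'
udp_line='udp        0      0 0.0.0.0:9694     0.0.0.0:*'

# Count how often :9694 appears in each view.
tcp_hits=$(echo "$tcp_line" | awk '/:9694/ {n++} END {print n+0}')
udp_hits=$(echo "$udp_line" | awk '/:9694/ {n++} END {print n+0}')
echo "TCP view: $tcp_hits, UDP view: $udp_hits"
```

Likewise, `telnet` speaks TCP, so a "Connection refused" on 9694 is expected for a UDP-only port and does not by itself prove the heartbeat is broken.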
|
|
Hi hoshiai, we have found a port mismatch between the heartbeat sender and receiver; is this expected, or is there an issue with my setup? The heartbeat sender's port keeps changing after restarting the pool.

On master:

```
[root@INHUSZ1-V1625126 ~]# netstat -nltup | grep pgpool
tcp        0      0 0.0.0.0:9000    0.0.0.0:*   LISTEN   25579/pgpool: watch
tcp        0      0 0.0.0.0:9898    0.0.0.0:*   LISTEN   25578/pgpool
tcp        0      0 0.0.0.0:5433    0.0.0.0:*   LISTEN   25578/pgpool
tcp6       0      0 :::5433         :::*        LISTEN   25578/pgpool
udp        0      0 0.0.0.0:9694    0.0.0.0:*            25587/pgpool: heart
udp        0      0 0.0.0.0:59432   0.0.0.0:*            25589/pgpool: heart

[root@INHUSZ1-V1625126 ~]# ps -eaf | grep 59432
root 31903 24284 0 13:38 pts/2 00:00:00 grep --color=auto 59432

[root@INHUSZ1-V1625126 ~]# ps -eaf | grep 25589
root 25589 25581 0 00:12 ? 00:00:01 pgpool: heartbeat sender
root 31920 24284 0 13:38 pts/2 00:00:00 grep --color=auto 25589

[root@INHUSZ1-V1625126 ~]# ps -eaf | grep 25581
root 25581 25578 0 00:12 ? 00:00:35 pgpool: lifecheck
root 25587 25581 0 00:12 ? 00:00:00 pgpool: heartbeat receiver
root 25589 25581 0 00:12 ? 00:00:01 pgpool: heartbeat sender
root 32018 25581 0 13:39 ? 00:00:00 ping -q -c3 172.16.251.26
root 32019 25581 0 13:39 ? 00:00:00 ping -q -c3 172.16.251.27
root 32022 24284 0 13:39 pts/2 00:00:00 grep --color=auto 25581

● pgpool.service - Pgpool-II
   Loaded: loaded (/usr/lib/systemd/system/pgpool.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2019-03-29 00:12:51 IST; 13h ago
  Process: 25468 ExecStop=/usr/bin/pgpool -f /etc/pgpool-II/pgpool.conf $STOP_OPTS stop (code=exited, status=0/SUCCESS)
 Main PID: 25578 (pgpool)
    Tasks: 42
   CGroup: /system.slice/pgpool.service
           ├─25578 /usr/bin/pgpool -f /etc/pgpool-II/pgpool.conf -n
           ├─25579 pgpool: watchdog
           ├─25581 pgpool: lifecheck
           ├─25582 pgpool: wait for connection request
           ├─25583 pgpool: wait for connection request
           ├─25584 pgpool: wait for connection request
           ├─25585 pgpool: wait for connection request
           ├─25586 pgpool: wait for connection request
           ├─25587 pgpool: heartbeat receiver
           ├─25588 pgpool: wait for connection request
           ├─25589 pgpool: heartbeat sender
           ├─25590 pgpool: wait for connection request
           ├─25591 pgpool: wait for connection request
           ├─25592 pgpool: wait for connection request
           ├─25593 pgpool: wait for connection request
           ├─25594 pgpool: wait for connection request
           ├─25595 pgpool: wait for connection request
           ├─25596 pgpool: wait for connection request
           ├─25597 pgpool: wait for connection request
           ├─25598 pgpool: wait for connection request
           ├─25599 pgpool: wait for connection request
           ├─25600 pgpool: wait for connection request
           ├─25601 pgpool: wait for connection request
           ├─25602 pgpool: wait for connection request
           ├─25603 pgpool: wait for connection request
           ├─25604 pgpool: wait for connection request
           ├─25605 pgpool: wait for connection request
           ├─25606 pgpool: wait for connection request
           ├─25607 pgpool: wait for connection request
           ├─25608 pgpool: wait for connection request
           ├─25609 pgpool: wait for connection request
           ├─25610 pgpool: wait for connection request
           ├─25611 pgpool: wait for connection request
           ├─25612 pgpool: wait for connection request
           ├─25613 pgpool: wait for connection request
           ├─25614 pgpool: wait for connection request
           ├─25615 pgpool: wait for connection request
           ├─25617 pgpool: PCP: wait for connection request
           ├─25618 pgpool: worker process
           ├─25619 pgpool: health check process(0)
           └─25620 pgpool: health check process(1)
```

On slave:

```
[root@s2n pgpool-II]# netstat -nltup | grep pgpool
tcp        0      0 0.0.0.0:9000    0.0.0.0:*   LISTEN   4003/pgpool: watchd
tcp        0      0 0.0.0.0:9898    0.0.0.0:*   LISTEN   4002/pgpool
tcp        0      0 0.0.0.0:5432    0.0.0.0:*   LISTEN   4002/pgpool
tcp6       0      0 :::5432         :::*        LISTEN   4002/pgpool
udp        0      0 0.0.0.0:9694    0.0.0.0:*            4013/pgpool: heartb
udp        0      0 0.0.0.0:36764   0.0.0.0:*            4015/pgpool: heartb

[root@s2n pgpool-II]# systemctl status pgpool.service
● pgpool.service - Pgpool-II
   Loaded: loaded (/usr/lib/systemd/system/pgpool.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2019-03-29 00:12:47 IST; 14h ago
  Process: 3861 ExecStop=/usr/bin/pgpool -f /etc/pgpool-II/pgpool.conf $STOP_OPTS stop (code=exited, status=0/SUCCESS)
 Main PID: 4002 (pgpool)
   CGroup: /system.slice/pgpool.service
           ├─4002 /usr/bin/pgpool -f /etc/pgpool-II/pgpool.conf -n
           ├─4003 pgpool: watchdog
           ├─4009 pgpool: lifecheck
           ├─4010 pgpool: wait for connection request
           ├─4011 pgpool: wait for connection request
           ├─4012 pgpool: wait for connection request
           ├─4013 pgpool: heartbeat receiver
           ├─4014 pgpool: wait for connection request
           ├─4015 pgpool: heartbeat sender
           └─4016 pgpool: wait for connection request
```

Could you please comment on this, and whether it causes an issue? Thanks for your help.
|
|
> Change: I have made the change as you suggested (heartbeat_device0 = ''). Now both the master and slave pool servers are acting as "masters".

Thank you for trying. I understand that changing "heartbeat_device0" did not resolve the problem. I am interested in the netstat output showing the heartbeat sender/receiver processes. Does it mean that the "No such device" messages no longer appear?

> udp 0 0 0.0.0.0:9694 0.0.0.0:* 25587/pgpool: heartbeat receiver
> udp 0 0 0.0.0.0:59432 0.0.0.0:* 25589/pgpool: heartbeat sender

This is not a problem. The heartbeat sender process sends to port 9694 on the other servers, so its own local socket does not need to be bound to port 9694.
|
|
Hi hoshiai, regarding the missing sync between the master and slave pgpool: I checked the watchdog socket connectivity and cannot see an established connection between the master and slave pool. Could this be what breaks the connectivity between them? Also, with `tcpdump -i ens192 port 9000` I get no traffic on either the master or the slave.

On slave:

```
[root@s2n pgpool-II]# netstat -nltupa | grep pgpool
tcp        0      0 0.0.0.0:9000    0.0.0.0:*   LISTEN   22101/pgpool: watch
tcp        0      0 0.0.0.0:9898    0.0.0.0:*   LISTEN   22100/pgpool
tcp        0      0 0.0.0.0:5432    0.0.0.0:*   LISTEN   22100/pgpool
tcp6       0      0 :::5432         :::*        LISTEN   22100/pgpool
udp        0      0 0.0.0.0:56384   0.0.0.0:*            22124/pgpool: heart
udp        0      0 0.0.0.0:9694    0.0.0.0:*            22123/pgpool: heart
```

On master:

```
[root@INHUSZ1-V1625126 ~]# netstat -nltupa | grep pgpool
tcp        0      0 0.0.0.0:9000    0.0.0.0:*   LISTEN   30772/pgpool: watch
tcp        0      0 0.0.0.0:9898    0.0.0.0:*   LISTEN   30771/pgpool
tcp        0      0 0.0.0.0:5433    0.0.0.0:*   LISTEN   30771/pgpool
tcp6       0      0 :::5433         :::*        LISTEN   30771/pgpool
udp        0      0 0.0.0.0:9694    0.0.0.0:*            30795/pgpool: heart
udp        0      0 0.0.0.0:40139   0.0.0.0:*            30796/pgpool: heart
```

Is this issue what causes EVENT = TIMEOUT while initializing the watchdog on the slave pool, instead of an outbound connection to the master pool server? Or is there some other cause of the TIMEOUT? What are the possible causes?

```
2019-03-28 18:55:52: pid 24816: DEBUG: STATE MACHINE INVOKED WITH EVENT = STATE CHANGED Current State = LOADING
2019-03-28 18:55:57: pid 24816: DEBUG: STATE MACHINE INVOKED WITH EVENT = TIMEOUT Current State = LOADING
```

Please share your comments. Thanks a lot for your ongoing support :)
|
|
Hi Hoshiai, good morning. Please share your suggestions on the above issue. Thank you.
|
|
Hi siva, sorry for the late reply. I am currently checking the source code and Pgpool's behavior.

Could you confirm the communication between the servers? In particular, test that the slave Pgpool server can reach the primary Pgpool server on the watchdog and heartbeat ports (other_wd_port0, heartbeat_destination_port0), using a tool such as 'nmap'. If nmap is available, you can check with the following command:

nmap {primary_server_address} -sUT -p 'T:{other_wd_port0},{heartbeat_destination_port0},U:{other_wd_port0},{heartbeat_destination_port0}'
|
|
Thanks hoshiai, I will check and share the output.
|
|
Hi hoshiai, thanks for the nmap command. The port had not been enabled; with that check we were able to identify the issue, and now both master and slave are in sync. The issue has been resolved. Thanks a lot for your support, much appreciated. Please close this issue now.
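Since the resolution was an unopened port, the following is a sketch of how the watchdog and heartbeat ports might be opened on a firewalld-based host (the hosts in this thread appear to be RHEL/CentOS). The port numbers are the ones used earlier in the thread; this is an illustration, not the exact fix that was applied:

```
# Illustrative only: open pgpool watchdog (TCP) and heartbeat (UDP) ports
# with firewalld. Adjust the ports to match pgpool.conf.
firewall-cmd --permanent --add-port=9000/tcp   # wd_port / other_wd_port0
firewall-cmd --permanent --add-port=9694/udp   # heartbeat_destination_port0
firewall-cmd --permanent --add-port=9898/tcp   # pcp_port
firewall-cmd --reload
```

Note that the heartbeat port must be opened for UDP, not TCP, which is also why a plain telnet test against it reports "Connection refused" even when the port is open.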
|
|
I am glad that your problem is resolved. OK, I have closed this issue.
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2019-03-20 14:31 | siva | New Issue | |
| 2019-03-20 14:40 | pengbo | Assigned To | => hoshiai |
| 2019-03-20 14:40 | pengbo | Status | new => assigned |
| 2019-03-22 16:17 | hoshiai | Note Added: 0002450 | |
| 2019-03-24 13:29 | siva | File Added: Master_Pool_config_file.txt | |
| 2019-03-24 13:29 | siva | Note Added: 0002451 | |
| 2019-03-25 15:15 | hoshiai | Note Added: 0002453 | |
| 2019-03-25 15:55 | siva | Note Added: 0002454 | |
| 2019-03-25 16:56 | hoshiai | Note Added: 0002455 | |
| 2019-03-25 23:51 | siva | Note Added: 0002459 | |
| 2019-03-27 23:44 | siva | Note Added: 0002469 | |
| 2019-03-28 00:24 | siva | File Added: Pool Issues debug Info.zip | |
| 2019-03-28 00:24 | siva | Note Added: 0002471 | |
| 2019-03-28 14:14 | hoshiai | Note Added: 0002474 | |
| 2019-03-28 14:29 | hoshiai | Note Added: 0002476 | |
| 2019-03-28 19:04 | siva | Note Added: 0002477 | |
| 2019-03-29 00:55 | siva | Note Added: 0002480 | |
| 2019-03-29 13:31 | siva | Note Added: 0002481 | |
| 2019-03-29 16:59 | hoshiai | Note Added: 0002482 | |
| 2019-03-30 23:17 | siva | Note Added: 0002486 | |
| 2019-04-02 14:19 | siva | Note Added: 0002500 | |
| 2019-04-03 17:29 | hoshiai | Note Added: 0002510 | |
| 2019-04-03 18:54 | siva | Note Added: 0002511 | |
| 2019-04-04 19:21 | siva | Note Added: 0002521 | |
| 2019-04-05 08:58 | hoshiai | Status | assigned => closed |
| 2019-04-05 08:58 | hoshiai | Resolution | open => no change required |
| 2019-04-05 08:58 | hoshiai | Note Added: 0002523 |