View Issue Details

ID: 0000222
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2016-08-12 11:07
Reporter: uehara
Assigned To: Muhammad Usama
Priority: normal
Severity: major
Reproducibility: sometimes
Status: resolved
Resolution: open
Platform: VirtualBox
OS: CentOS
OS Version: 6.5
Product Version: 3.5.3
Target Version: 3.5.4
Fixed in Version:
Summary: 0000222: Sometimes Failover command isn't executed.
Description:
Pgpool-II 3.5.3 (health check enabled)
PostgreSQL 9.5.2


Outline:
I am testing Pgpool-II's behavior when the LAN that connects the DB (master) node to the Pgpool-II nodes is stopped.

The Pgpool-II cluster has two DB nodes configured in streaming replication mode and three Pgpool-II nodes.

Sometimes Pgpool-II does not execute the failover command when I stop the LAN on the DB (master) server.
This happens when a node other than the COORDINATOR node is the first to detect the DB failure through health checking.

When this happens, Pgpool-II (master) logs the following messages:

------------------
2016-07-25 19:52:18: pid 7487: LOG: watchdog node "Linux_pgpool_01_9999" is requesting to check lock for failover command start
2016-07-25 19:52:18: pid 7487: LOG: check lock for failover command start request is denied to node "Linux_pgpool_01_9999"
2016-07-25 19:52:18: pid 7487: DETAIL: node "Linux_pgpool_02_9999" is holding the lock
------------------
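The messages above describe a lock handshake: node Linux_pgpool_01_9999 asks the coordinator (the watchdog MASTER) to check the failover command lock and is refused because Linux_pgpool_02_9999 already holds it. The toy C model below sketches only that grant/deny behavior, assuming a single holder that is recorded on first acquisition and cleared only by its owner. `FailoverLock`, `lock_acquire`, and `lock_release` are hypothetical names, not Pgpool-II's watchdog API, and the real coordinator logic is more involved.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy model (not Pgpool-II code) of a failover command
 * lock kept by the coordinator, with exactly one owner at a time. */
typedef enum { LOCK_GRANTED, LOCK_DENIED } LockReply;

typedef struct {
    char holder[64]; /* owning node name; empty string when the lock is free */
} FailoverLock;

/* The first node to ask becomes the holder. The holder may ask again
 * and is granted; every other node is denied while the lock is held. */
static LockReply lock_acquire(FailoverLock *lock, const char *node)
{
    assert(lock != NULL && node != NULL);
    if (lock->holder[0] == '\0') {
        snprintf(lock->holder, sizeof lock->holder, "%s", node);
        return LOCK_GRANTED;
    }
    return strcmp(lock->holder, node) == 0 ? LOCK_GRANTED : LOCK_DENIED;
}

/* Only the current holder can release the lock; other callers are ignored. */
static void lock_release(FailoverLock *lock, const char *node)
{
    if (strcmp(lock->holder, node) == 0)
        lock->holder[0] = '\0';
}
```

In this model, once Linux_pgpool_02_9999 acquires the lock, every request from Linux_pgpool_01_9999 returns `LOCK_DENIED` until Linux_pgpool_02_9999 calls `lock_release`.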


I found suspicious code at wd_commands.c:764.

watchdog/wd_commands.c
---------
758 static WDFailoverCMDResults wd_issue_failover_lock_command(WDFailoverCMDTypes cmdType, char* syncReqType)
759 {
760     WDFailoverCMDResults res;
761     int x;
762     for (x=0; x < MAX_SEC_WAIT_FOR_CLUSTER_TRANSATION; x++)
763     {
764         res = wd_send_failover_sync_command(NODE_FAILBACK_CMD,syncReqType);
765         if (res != FAILOVER_RES_TRANSITION)
766             break;
767         sleep(1);
768     }
769     return res;
770 }
---------------


I think line 764 should be changed as follows, so that the caller's cmdType is passed on:

--------
  res = wd_send_failover_sync_command(cmdType, syncReqType);
--------
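To see why the hard-coded argument matters, here is a minimal sketch of the retry loop with the proposed change applied. It uses simplified stand-ins: the enums below are not Pgpool-II's real definitions (`NODE_FAILOVER_CMD`, `NODE_PROMOTE_CMD`, and `FAILOVER_RES_PROCEED` are assumed names), and `mock_send_failover_sync_command` is a hypothetical test double for `wd_send_failover_sync_command()` that records which command type it received. The retry limit and delay are parameters only so the sketch can run without sleeping.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical stand-ins; the real types live in Pgpool-II's watchdog headers. */
typedef enum { NODE_FAILBACK_CMD, NODE_FAILOVER_CMD, NODE_PROMOTE_CMD } WDFailoverCMDTypes;
typedef enum { FAILOVER_RES_PROCEED, FAILOVER_RES_TRANSITION } WDFailoverCMDResults;

/* Test double state: last command type sent, call count, and how many
 * more calls should report that the cluster is still in transition. */
static WDFailoverCMDTypes last_sent_type;
static int send_calls;
static int transitions_left;

static WDFailoverCMDResults
mock_send_failover_sync_command(WDFailoverCMDTypes cmdType, char *syncReqType)
{
    (void) syncReqType;
    last_sent_type = cmdType;
    send_calls++;
    if (transitions_left > 0) {
        transitions_left--;
        return FAILOVER_RES_TRANSITION;
    }
    return FAILOVER_RES_PROCEED;
}

/* Retry loop with the proposed fix: forward the caller's cmdType instead
 * of NODE_FAILBACK_CMD, retrying while the cluster is in transition. */
static WDFailoverCMDResults
issue_failover_lock_command(WDFailoverCMDTypes cmdType, char *syncReqType,
                            int max_tries, unsigned int delay_sec)
{
    WDFailoverCMDResults res = FAILOVER_RES_TRANSITION;
    int x;

    assert(max_tries > 0);
    for (x = 0; x < max_tries; x++)
    {
        res = mock_send_failover_sync_command(cmdType, syncReqType);
        if (res != FAILOVER_RES_TRANSITION)
            break;
        sleep(delay_sec); /* the original code sleeps 1 second here */
    }
    return res;
}
```

With the original line 764, `last_sent_type` would always be `NODE_FAILBACK_CMD` regardless of the `cmdType` argument; with the change, it matches what the caller asked for.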

Please tell me what you think.


regards,
Steps To Reproduce:
First, start PostgreSQL (master and slave).


Second, start the Pgpool-II nodes in the following order:
  pgpool_01, pgpool_02, pgpool_03

Then check the status:
$ pcp_watchdog_info -h localhost -U postgres
3 YES Linux_pgpool_01_9999 192.168.2.3

Linux_pgpool_01_9999 192.168.2.3 9999 9000 4 MASTER
Linux_pgpool_02_9999 192.168.2.4 9999 9000 7 STANDBY
Linux_pgpool_03_9999 192.168.2.5 9999 9000 7 STANDBY

$ psql -h 192.168.3.3 -p 9999 postgres -c "show pool_nodes"
 node_id | hostname | port | status | lb_weight | role | select_cnt
---------+-------------+------+--------+-----------+---------+------------
 0 | 192.168.1.1 | 5432 | 2 | 0.000000 | primary | 0
 1 | 192.168.1.2 | 5432 | 2 | 1.000000 | standby | 0
(2 rows)


Third, stop the network on the PostgreSQL (master) server:
# ip addr del 192.168.1.1/24 dev eth1
Additional Information:
I have attached the Pgpool-II log files and configuration file.

Case where a node other than the COORDINATOR node detected the DB failure first:
  - original code : 01_Before_NG (NW stop : Mon Jul 25 19:51:36 JST 2016)
  - modified code : 02_After_OK (NW stop : time not recorded)

Case where the COORDINATOR node detected the DB failure first:
  - original code : 03_Before_OK (NW stop : Mon Jul 25 19:57:17 JST 2016)

Pgpool-II node01's conf file
  - Pgpool_01_conf

I added some debug logging.
Please ignore the lines containing "LOG: debug-log".
Tags: No tags attached.

Activities

uehara

2016-07-26 09:31

reporter  

Pgpool-II_log.zip (28,363 bytes)

t-ishii

2016-07-26 09:59

developer   ~0000930

Uehara-san, thank you for the report! I have assigned Usama, who is responsible for watchdog.

t-ishii

2016-08-02 11:50

developer   ~0000956

It seems he has committed the fix. Please try it.

https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=af8af96365c6ddf6e5741b2a9b917aa0276c5c1d

uehara

2016-08-12 10:05

reporter   ~0000981

Thank you for the fix.

I confirmed that the error no longer occurs.

t-ishii

2016-08-12 11:07

developer   ~0000983

Thanks for confirmation. Issue resolved.

Issue History

Date Modified Username Field Change
2016-07-26 09:31 uehara New Issue
2016-07-26 09:31 uehara File Added: Pgpool-II_log.zip
2016-07-26 09:58 t-ishii Assigned To => Muhammad Usama
2016-07-26 09:58 t-ishii Status new => assigned
2016-07-26 09:59 t-ishii Note Added: 0000930
2016-07-31 08:47 t-ishii Product Version => 3.5.3
2016-07-31 08:47 t-ishii Target Version => 3.5.4
2016-08-02 11:50 t-ishii Note Added: 0000956
2016-08-02 11:51 t-ishii Status assigned => feedback
2016-08-12 10:05 uehara Note Added: 0000981
2016-08-12 10:05 uehara Status feedback => assigned
2016-08-12 11:07 t-ishii Note Added: 0000983
2016-08-12 11:07 t-ishii Status assigned => resolved