View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000659 | Pgpool-II | Bug | public | 2020-11-14 00:35 | 2020-12-22 19:53 |
| Reporter | tmartincpp | Assigned To | t-ishii | ||
| Priority | normal | Severity | minor | Reproducibility | have not tried |
| Status | feedback | Resolution | open | ||
| Product Version | 4.1.4 | ||||
| Summary | 0000659: Failover issue in replication mode | ||||

Description:

Hello! We use pgpool in replication and load-balancing mode with 2 nodes, and we often have failover issues with SELECT statements. The PostgreSQL version is 11.9. Here is our configuration:

- `listen_backlog_multiplier = 2`
- `serialize_accept = on`
- `enable_pool_hba = on`
- `pool_passwd = 'pool_passwd'`
- `authentication_timeout = 60`
- `ssl = off`
- `num_init_children = 280`
- `max_pool = 3`
- `child_life_time = 0`
- `child_max_connections = 0`
- `connection_life_time = 1`
- `client_idle_limit = 0`
- `connection_cache = on`
- `reset_query_list = 'ABORT; DISCARD ALL'`
- `replication_mode = on`
- `replicate_select = off`
- `insert_lock = on`
- `lobj_lock_table = ''`
- `replication_stop_on_mismatch = on`
- `failover_if_affected_tuples_mismatch = off`
- `load_balance_mode = on`
- `ignore_leading_white_space = on`
- `white_function_list = ''`
- `black_function_list = 'nextval,setval'`
- `database_redirect_preference_list = ''`
- `app_name_redirect_preference_list = ''`
- `allow_sql_comments = off`
- `failover_on_backend_error = on`
- `relcache_expire = 0`
- `relcache_size = 256`
- `check_temp_table = on`
- `check_unlogged_table = off`
- `enable_shared_relcache = on`

pgpool logs:

- `Nov 12 23:53:39 host pgpool-II[30574]: [65630-1] LOG: DB node id: 1 backend pid: 12650 statement: SELECT X FROM Y WHERE true ;`
- `Nov 12 23:53:39 host pgpool-II[30574]: [65631-1] LOG: statement: DISCARD ALL`
- `Nov 12 23:53:39 host pgpool-II[30574]: [65632-1] LOG: DB node id: 0 backend pid: 7992 statement: DISCARD ALL`
- `Nov 12 23:53:39 host pgpool-II[30574]: [65633-1] LOG: DB node id: 1 backend pid: 12650 statement: DISCARD ALL`
- `Nov 13 00:00:12 host pgpool-II[30574]: [78709-1] ERROR: unable to write data to frontend`
- `Nov 13 00:00:12 host pgpool-II[30574]: [78709-2] DETAIL: pool_flush failed`
- `Nov 13 00:00:12 host pgpool-II[30574]: [78715-1] FATAL: failed to read kind from backend`
- `Nov 13 00:00:12 host pgpool-II[30574]: [78715-2] DETAIL: kind mismatch among backends. Possible last query was: " DISCARD ALL" kind details are: 0[C] 1[D]`
- `Nov 13 00:00:12 host pgpool-II[30574]: [78715-3] HINT: check data consistency among db nodes`

node0 logs:

- `2020-11-12 23:53:38.936 CET:10.101.24.11:user@database:[5fadbcf2.1f38]:[7992]: LOG: connection authorized: user=X database=Y`
- `2020-11-12 23:53:39.396 CET:10.101.24.11:user@database:[5fadbcf2.1f38]:[7992]: LOG: duration: 0.087 ms statement: DISCARD ALL`
- `2020-11-12 23:53:40.396 CET:10.101.24.11:user@database:[5fadbcf2.1f38]:[7992]: LOG: disconnection: session time: 0:00:01.461 user=user database=database host=x.x.x.x port=46838`

node1 logs:

- `2020-11-12 23:53:39.018 CET:10.101.24.11:user@database:[5fadbcf2.316a]:[12650]: LOG: duration: 21.532 ms statement: SELECT X FROM Y WHERE true ;`
- `2020-11-12 23:53:39.396 CET:10.101.24.11:user@database:[5fadbcf2.316a]:[12650]: LOG: duration: 0.044 ms statement: DISCARD ALL`
- `2020-11-12 23:53:40.396 CET:10.101.24.11:user@database:[5fadbcf2.316a]:[12650]: LOG: disconnection: session time: 0:00:01.461 user=user database=database host=10.101.24.11 port=32914`

SELECT statements are not replicated in our configuration (`replicate_select = off`), so the DISCARD gets different responses from the two nodes and pgpool degenerates a node. In this case, node1 is the one being queried. There were no WRITE statements. I don't understand why pgpool connects to node0 and issues a discard to this node. It is really strange that the degeneration happens minutes later for the same PID. That implies the connection was working fine in the meantime. I can provide full obfuscated logs if necessary.

Tags: No tags attached.

Note 0003597 (t-ishii, 2020-11-17 12:02):
> There were no WRITE statements.
>
> I don't understand why pgpool connects to node0 and issues a discard to this node.

DISCARD is issued because you have it in reset_query_list. That's perfectly normal.

> It is really strange that the degeneration happens minutes later for the same PID.

Yeah, it's strange. Can you show me how to reliably reproduce the error (failover)?
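A minimal sketch of why DISCARD ALL also reaches node0, assuming that in replication mode a pgpool-II child keeps one connection per backend node and, when the client session ends, runs every statement of `reset_query_list` on each of those connections. This is a hypothetical model, not pgpool-II source code: the helper names `split_reset_query_list` and `reset_plan` are made up for illustration, and skipping `ABORT` outside a transaction is an assumption inferred from the pgpool log in the description, which shows only DISCARD ALL.

```python
from __future__ import annotations


def split_reset_query_list(reset_query_list: str) -> list[str]:
    """Split a reset_query_list value such as 'ABORT; DISCARD ALL'."""
    return [q.strip() for q in reset_query_list.split(";") if q.strip()]


def reset_plan(reset_query_list: str, node_ids: list[int],
               in_transaction: bool = False) -> list[tuple[int, str]]:
    """Hypothetical model (not pgpool-II code) of the session reset phase.

    Returns (node_id, statement) pairs: each reset statement is sent to
    every backend node, not only to the node that served the SELECTs.
    """
    plan = []
    for statement in split_reset_query_list(reset_query_list):
        # Assumption: ABORT is skipped when no transaction is open, which
        # matches the pgpool log in the report showing only DISCARD ALL.
        if statement.upper() == "ABORT" and not in_transaction:
            continue
        for node_id in node_ids:
            plan.append((node_id, statement))
    return plan


# reset_query_list from the reported configuration, two backend nodes.
print(reset_plan("ABORT; DISCARD ALL", [0, 1]))
# [(0, 'DISCARD ALL'), (1, 'DISCARD ALL')]
```

This lines up with the pgpool log in the description, where DISCARD ALL goes to node 0 (backend pid 7992) and node 1 (backend pid 12650) even though only node 1 ran the SELECT.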

Note 0003606 (tmartincpp, 2020-11-21 00:25):
Oh, I thought the DISCARD command was only sent to the nodes that received WRITE queries. So when we have this log:

`Nov 13 00:00:12 host pgpool-II[30574]: [78715-2] DETAIL: kind mismatch among backends. Possible last query was: " DISCARD ALL" kind details are: 0[C] 1[D]`

it's not a problem that the kind details are different for each node?

Otherwise, I'm trying to reproduce the issue, but with no success so far. I suspect an abrupt ("brutal") disconnection is causing this behavior (a program not properly closing its connection).
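The `kind details are: 0[C] 1[D]` part lists, per node id, the message type byte each backend returned. In the PostgreSQL frontend/backend protocol, `C` is CommandComplete and `D` is DataRow. DISCARD ALL by itself only answers with CommandComplete, so node 0 replied as expected while node 1 sent a data row. The small helper below decodes such a DETAIL line; `decode_kind_details` is a hypothetical name and the table covers only a subset of protocol message types, so treat it as an illustration rather than a pgpool-II tool.

```python
from __future__ import annotations

import re

# A subset of backend message type bytes from the PostgreSQL
# frontend/backend protocol (version 3).
BACKEND_MESSAGE_KINDS = {
    "C": "CommandComplete",
    "D": "DataRow",
    "E": "ErrorResponse",
    "I": "EmptyQueryResponse",
    "N": "NoticeResponse",
    "T": "RowDescription",
    "Z": "ReadyForQuery",
}


def decode_kind_details(detail: str) -> dict[int, str]:
    """Map node ids in '... kind details are: 0[C] 1[D]' to message names.

    Hypothetical helper for reading pgpool-II log lines; returns an empty
    dict if the line has no 'kind details are:' part.
    """
    _, _, tail = detail.partition("kind details are:")
    decoded = {}
    for node_id, kind in re.findall(r"(\d+)\[(.)\]", tail):
        decoded[int(node_id)] = BACKEND_MESSAGE_KINDS.get(kind, f"unknown ({kind!r})")
    return decoded


line = ('kind mismatch among backends. Possible last query was: '
        '" DISCARD ALL" kind details are: 0[C] 1[D]')
print(decode_kind_details(line))
# {0: 'CommandComplete', 1: 'DataRow'}
```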

Note 0003679 (t-ishii, 2020-12-22 18:29):
Sorry for the delay.

> it's not a problem that the kind details are different for each node?

Yes, it's a problem.

> Can you show me how to reliably reproduce the error (failover)?

Can you share how to reproduce the error?

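With `replication_stop_on_mismatch = on`, as in the reported configuration, the pgpool-II documentation describes degenerating the node whose reply kind differs from the majority when a query sent to all nodes gets different packet kinds back. A rough sketch of that comparison follows; it is an illustration under that reading, not pgpool-II's implementation, and `find_mismatched_nodes` is a hypothetical name. It also makes one property of this two-node setup visible: a 1-vs-1 split has no majority, and how pgpool-II chooses which node to drop in that case is not modeled here.

```python
from __future__ import annotations

from collections import Counter


def find_mismatched_nodes(kinds: dict[int, str]) -> list[int] | None:
    """Return ids of nodes whose reply kind differs from the majority.

    Illustration only, not pgpool-II code. Returns [] when every node
    agrees and None when there is no strict majority (e.g. 1 vs 1).
    """
    counts = Counter(kinds.values())
    if len(counts) <= 1:
        return []  # all nodes agree (or there are no replies at all)
    (top_kind, top_count), (_, runner_up_count) = counts.most_common(2)
    if top_count == runner_up_count:
        return None  # tie: no majority to compare against
    return sorted(node for node, kind in kinds.items() if kind != top_kind)


print(find_mismatched_nodes({0: "C", 1: "D"}))          # None
print(find_mismatched_nodes({0: "C", 1: "C", 2: "D"}))  # [2]
```

With the kinds from this report (`0[C] 1[D]`), the sketch returns `None`: two nodes alone cannot outvote each other.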
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2020-11-14 00:35 | tmartincpp | New Issue | |
| 2020-11-17 10:36 | t-ishii | Assigned To | => t-ishii |
| 2020-11-17 10:36 | t-ishii | Status | new => assigned |
| 2020-11-17 12:02 | t-ishii | Note Added: 0003597 | |
| 2020-11-17 12:03 | t-ishii | Status | assigned => feedback |
| 2020-11-21 00:25 | tmartincpp | Note Added: 0003606 | |
| 2020-11-21 00:25 | tmartincpp | Status | feedback => assigned |
| 2020-12-22 18:29 | t-ishii | Note Added: 0003679 | |
| 2020-12-22 19:53 | t-ishii | Status | assigned => feedback |