View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000544 | Pgpool-II | Bug | public | 2019-09-06 00:34 | 2019-09-18 07:30 |
| Reporter | Carlos Mendez | Assigned To | t-ishii | ||
| Priority | urgent | Severity | major | Reproducibility | random |
| Status | closed | Resolution | open | ||
| Product Version | 3.7.1 | ||||
| Summary | 0000544: standby_postgres is in quarantine status | ||||
| Description | PROD CONFIGURATION: PostgreSQL 10.4 (primary DB, standby DB), 3 Pgpools (pgpool1, pgpool2, pgpool3). Hi Team, In our PROD environment, during daily monitoring, we sometimes see that our standby node is in "quarantine" status when we check the status of our pool nodes: psql -h watchdog -p 9999 -c"show pool_nodes" postgres node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay ---------+------------------+------+------------+-----------+---------+------------+-------------------+------------------- 0 | primary_postgres | 5432 | up | 0.500000 | primary | 21527743 | true | 0 1 | standby_postgres | 5432 | quarantine | 0.500000 | standby | 2468553 | false | 12693992 (2 rows) 1. We don't have much experience working with Pgpool-II and PostgreSQL. 2. We think there may have been a problem in the sync process between the primary and the standby, which would explain the standby's status. 3. Once we see that our standby DB is in quarantine status, what should we check, and how do we return it to normal status? What are the best practices to apply when we face this status? 
I have searched for information about this status, but it is not clear to me what I should do once I identify it. I checked the node status from a pgpool node and attached the node; after that, the standby shows as up again: [postgres@pgpool_01 ~]$ pcp_node_info 1 Password: standby_postgres 5432 3 0.500000 down standby [postgres@pgpool_01 ~]$ pcp_attach_node 1 Password: pcp_attach_node -- Command Successful node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay ---------+------------------+------+--------+-----------+---------+------------+-------------------+------------------- 0 | primary_postgres | 5432 | up | 0.500000 | primary | 21551718 | true | 0 1 | standby_postgres | 5432 | up | 0.500000 | standby | 2468679 | false | 0 (2 rows) Any information that helps us handle this kind of event would be appreciated. Regards | ||||
| Tags | pgpool 3.7.1, quarantine status, settings, standby | ||||
|
|
To understand "quarantine", see the manual section "5.14.6. Controlling the Failover behavior". In summary, it means that one of the pgpool nodes detected a connection problem with PostgreSQL node 1, but the other pgpool nodes did not detect any error with node 1. In this case you need to inspect the pgpool log and/or other system logs to see whether the connection between that pgpool and node 1 is OK. You said that after executing pcp_attach_node, node 1 came back to "up". This suggests that the failure was temporary (otherwise the node would have gone into the quarantine state again). Note that the upcoming Pgpool-II 4.1 (expected to be released this October) automatically brings back a quarantined node if the failure is temporary. If you are interested, please try Pgpool-II 4.1 beta1, which will be released today. |
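As a rough illustration of spotting the state discussed in this note during routine monitoring, the Python sketch below parses the text output of `show pool_nodes` and lists the nodes whose status is `quarantine`. The helper names `parse_pool_nodes` and `quarantined` are hypothetical, the sample is the output quoted in the report (re-aligned), and the parser assumes psql's default aligned table format.

```python
def parse_pool_nodes(output):
    """Parse psql's aligned-table output of `show pool_nodes` into a list of dicts.

    Assumption: default psql formatting, where the header and data rows
    contain '|' and the separator line uses '+' instead.
    """
    rows = [line for line in output.splitlines() if "|" in line]
    if not rows:
        return []
    header = [col.strip() for col in rows[0].split("|")]
    nodes = []
    for line in rows[1:]:
        values = [col.strip() for col in line.split("|")]
        if len(values) == len(header):  # skip anything that is not a data row
            nodes.append(dict(zip(header, values)))
    return nodes


def quarantined(nodes):
    """Return the nodes whose status column reads 'quarantine' (any case)."""
    return [n for n in nodes if n.get("status", "").lower() == "quarantine"]


# Sample taken from the report above, re-aligned for readability.
SAMPLE = """\
 node_id | hostname         | port | status     | lb_weight | role    | select_cnt | load_balance_node | replication_delay
---------+------------------+------+------------+-----------+---------+------------+-------------------+-------------------
 0       | primary_postgres | 5432 | up         | 0.500000  | primary | 21527743   | true              | 0
 1       | standby_postgres | 5432 | quarantine | 0.500000  | standby | 2468553    | false             | 12693992
(2 rows)
"""

if __name__ == "__main__":
    for node in quarantined(parse_pool_nodes(SAMPLE)):
        print(node["node_id"], node["hostname"], node["status"])
```

In practice the text would come from running the `psql ... -c "show pool_nodes"` command shown in the report; how that output is captured (cron job, monitoring agent, and so on) is left open here.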
|
|
Hi T-ishii, Thanks for your comments. Do you know what kind of errors we should look for in the pgpool logs? Is there a specific error string? We can see some errors, but we are not sure whether they are related to the intermittent failures: 2019-09-06 09:56:13: pid 8931: ERROR: unable to read data from frontend 2019-09-06 09:56:13: pid 8931: DETAIL: EOF encountered with frontend 2019-09-06 09:57:12: pid 26580: LOG: new connection received 2019-09-06 09:57:12: pid 26580: DETAIL: connecting host=10.241.166.76 port=39322 2019-09-06 09:57:12: pid 26580: ERROR: unable to read data from frontend 2019-09-06 09:57:12: pid 26580: DETAIL: EOF encountered with frontend 2019-09-06 09:58:13: pid 26580: LOG: new connection received 2019-09-06 09:58:13: pid 26580: DETAIL: connecting host=10.241.166.76 port=39936 2019-09-06 09:58:13: pid 26580: ERROR: unable to read data from frontend Regards CAR |
|
|
You can look for "quarantine" (case insensitive search is recommended). |
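A minimal Python sketch of that case-insensitive search follows. The function name `find_quarantine_lines` is hypothetical, and the log file path is whatever your installation uses (passed as a command-line argument here). The first two sample lines are copied from the log excerpt in this thread; the third is a made-up placeholder, not actual Pgpool-II log text.

```python
import re
import sys

# Case-insensitive match, as recommended in the note above.
QUARANTINE = re.compile(r"quarantine", re.IGNORECASE)


def find_quarantine_lines(lines):
    """Return every line (without trailing newline) that mentions 'quarantine'."""
    return [line.rstrip("\n") for line in lines if QUARANTINE.search(line)]


SAMPLE_LOG = [
    "2019-09-06 09:56:13: pid 8931: ERROR: unable to read data from frontend\n",
    "2019-09-06 09:56:13: pid 8931: DETAIL: EOF encountered with frontend\n",
    # Placeholder line for illustration only; real pgpool messages will differ.
    "PLACEHOLDER (not real pgpool output): node 1 moved to Quarantine state\n",
]

if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Usage: python find_quarantine.py /path/to/your/pgpool.log
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
            matches = find_quarantine_lines(log_file)
    else:
        matches = find_quarantine_lines(SAMPLE_LOG)
    for match in matches:
        print(match)
```

The timestamps on the matching lines give a time window in which to check the other pgpool nodes' logs and system logs for connection problems with the same backend.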
|
|
Thanks, T-ishii. We will follow your recommendation, and we will also wait for the new Pgpool-II 4.1 release (October). Regards CAR |
|
|
Can we close this issue? |
|
|
Hi T-ishii We can close the issue Regards |
|
|
Thank you for confirming. Issue closed. |
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2019-09-06 00:34 | Carlos Mendez | New Issue | |
| 2019-09-06 00:34 | Carlos Mendez | Tag Attached: pgpool 3.7.1 | |
| 2019-09-06 00:34 | Carlos Mendez | Tag Attached: settings | |
| 2019-09-06 00:34 | Carlos Mendez | Tag Attached: quarantine status | |
| 2019-09-06 00:34 | Carlos Mendez | Tag Attached: standby | |
| 2019-09-06 08:57 | t-ishii | Assigned To | => t-ishii |
| 2019-09-06 08:57 | t-ishii | Status | new => assigned |
| 2019-09-06 09:09 | t-ishii | Note Added: 0002824 | |
| 2019-09-06 09:09 | t-ishii | Status | assigned => feedback |
| 2019-09-07 00:11 | Carlos Mendez | Note Added: 0002826 | |
| 2019-09-07 00:11 | Carlos Mendez | Status | feedback => assigned |
| 2019-09-11 15:01 | t-ishii | Note Added: 0002832 | |
| 2019-09-11 15:01 | t-ishii | Status | assigned => feedback |
| 2019-09-12 00:45 | Carlos Mendez | Note Added: 0002833 | |
| 2019-09-12 00:45 | Carlos Mendez | Status | feedback => assigned |
| 2019-09-17 15:40 | t-ishii | Note Added: 0002856 | |
| 2019-09-17 15:40 | t-ishii | Status | assigned => feedback |
| 2019-09-18 01:58 | Carlos Mendez | Note Added: 0002858 | |
| 2019-09-18 01:58 | Carlos Mendez | Status | feedback => assigned |
| 2019-09-18 07:30 | t-ishii | Note Added: 0002859 | |
| 2019-09-18 07:30 | t-ishii | Status | assigned => closed |