View Issue Details

ID: 0000544
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2019-09-18 07:30
Reporter: Carlos Mendez
Assigned To: t-ishii
Priority: urgent
Severity: major
Reproducibility: random
Status: closed
Resolution: open
Product Version: 3.7.1
Summary: 0000544: standby_postgres is in quarantine status
Description:
PROD CONFIGURATION
 PostgreSQL 10.4

Primary DB
standby DB

3 Pgpools
pgpool1
pgpool2
pgpool3

Hi Team

In our PROD environment, during daily monitoring, we sometimes see that our standby node is in "quarantine" status when we check the status of our pool nodes:

psql -h watchdog -p 9999 -c"show pool_nodes" postgres

 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+------------------+------+------------+-----------+---------+------------+-------------------+-------------------
 0 | primary_postgres | 5432 | up | 0.500000 | primary | 21527743 | true | 0
 1 | standby_postgres | 5432 | quarantine | 0.500000 | standby | 2468553 | false | 12693992
(2 rows)
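
The manual check above could be automated. The following is a minimal sketch, assuming the aligned table text printed by the psql command above is available as a string; the function names `parse_pool_nodes` and `quarantined` are hypothetical helpers, not part of Pgpool-II.

```python
from typing import Dict, List

# Sample text copied from the "show pool_nodes" output in this report.
SAMPLE = """\
 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+------------------+------+------------+-----------+---------+------------+-------------------+-------------------
 0 | primary_postgres | 5432 | up | 0.500000 | primary | 21527743 | true | 0
 1 | standby_postgres | 5432 | quarantine | 0.500000 | standby | 2468553 | false | 12693992
(2 rows)
"""


def parse_pool_nodes(output: str) -> List[Dict[str, str]]:
    """Turn psql's aligned table output into one dict per node."""
    # Only the header and data rows contain "|"; the separator line
    # uses "+" and the "(2 rows)" footer has neither.
    rows = [line for line in output.splitlines() if "|" in line]
    if not rows:
        return []
    header = [col.strip() for col in rows[0].split("|")]
    nodes = []
    for line in rows[1:]:
        values = [val.strip() for val in line.split("|")]
        if len(values) == len(header):
            nodes.append(dict(zip(header, values)))
    return nodes


def quarantined(nodes: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the nodes whose status column reads "quarantine"."""
    return [n for n in nodes if n.get("status", "").lower() == "quarantine"]


if __name__ == "__main__":
    for node in quarantined(parse_pool_nodes(SAMPLE)):
        print(node["node_id"], node["hostname"], node["status"])
```

With the sample above, only node 1 (standby_postgres) is reported. In a monitoring job the text would come from running the psql command rather than from a constant.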


1. We don't have much experience working with Pgpool-II and PostgreSQL databases.
2. We think there may have been an issue in the sync process between prod and standby, and that this is the reason for the standby's status.
3. Once we see that our standby DB is in quarantine status, what should we check, and how do we return it to normal status? What are the best practices to apply when we face this status?

I have searched for information about this status, but it is not clear to me what I should do after identifying it. I checked the node's status from a pgpool node and attached the node; after that, the standby shows as UP again.

[postgres@pgpool_01 ~]$ pcp_node_info 1
Password:
standby_postgres 5432 3 0.500000 down standby


[postgres@pgpool_01 ~]$ pcp_attach_node 1
Password:
pcp_attach_node -- Command Successful

 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+------------------+------+--------+-----------+---------+------------+-------------------+-------------------
 0 | primary_postgres | 5432 | up | 0.500000 | primary | 21551718 | true | 0
 1 | standby_postgres | 5432 | up | 0.500000 | standby | 2468679 | false | 0
(2 rows)

Any information that helps us handle this kind of event better would be appreciated.

Regards







Tags: pgpool 3.7.1, quarantine status, settings, standby

Activities

t-ishii

2019-09-06 09:09

developer   ~0002824

To understand "quarantine", see the manual "5.14.6. Controlling the Failover behavior".
In summary, this means that one of the pgpool nodes detected a connection problem with PostgreSQL node 1, but the other pgpool nodes did not detect any error with node 1.
In this case you need to inspect the pgpool log and/or other system logs to see whether the connection between that pgpool and node 1 is OK.

You said that after executing pcp_attach_node, node 1 came back to "up". This suggests that the failure is a temporary one (otherwise the node would go into the quarantine state again).

Note that the upcoming Pgpool-II 4.1 (expected to be released this October) automatically brings back a quarantined node if the failure is a temporary one. If you are interested, please try Pgpool-II 4.1 beta1, which will be released today.

Carlos Mendez

2019-09-07 00:11

reporter   ~0002826

Hi T-ishii

Thanks for your comments. Do you know what kind of errors we should be looking for in the pgpool logs? Is there a specific error string? We can see some errors, but we are not sure whether they are related to the intermittent failures.

2019-09-06 09:56:13: pid 8931: ERROR: unable to read data from frontend
2019-09-06 09:56:13: pid 8931: DETAIL: EOF encountered with frontend
2019-09-06 09:57:12: pid 26580: LOG: new connection received
2019-09-06 09:57:12: pid 26580: DETAIL: connecting host=10.241.166.76 port=39322
2019-09-06 09:57:12: pid 26580: ERROR: unable to read data from frontend
2019-09-06 09:57:12: pid 26580: DETAIL: EOF encountered with frontend
2019-09-06 09:58:13: pid 26580: LOG: new connection received
2019-09-06 09:58:13: pid 26580: DETAIL: connecting host=10.241.166.76 port=39936
2019-09-06 09:58:13: pid 26580: ERROR: unable to read data from frontend

Regards
CAR

t-ishii

2019-09-11 15:01

developer   ~0002832

You can look for "quarantine" (case insensitive search is recommended).
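
As a minimal sketch of such a search, assuming the pgpool log is a plain text file: the function name `find_quarantine_lines` and the script name in the usage comment are hypothetical, and the log path is supplied by the caller.

```python
import sys
from typing import Iterable, List


def find_quarantine_lines(lines: Iterable[str]) -> List[str]:
    """Return every line that mentions "quarantine", ignoring case."""
    return [line.rstrip("\n") for line in lines if "quarantine" in line.lower()]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_quarantine.py <path to pgpool log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for hit in find_quarantine_lines(log):
            print(hit)
```

Lowercasing each line before the comparison is what makes the match case insensitive, so "Quarantine" and "QUARANTINE" are found as well.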

Carlos Mendez

2019-09-12 00:45

reporter   ~0002833

Thanks T-ishii
We will follow your recommendation and, in the meantime, wait for the new version, Pgpool-II 4.1 (the October release).

Regards
CAR

t-ishii

2019-09-17 15:40

developer   ~0002856

Can we close this issue?

Carlos Mendez

2019-09-18 01:58

reporter   ~0002858

Hi T-ishii

We can close the issue

Regards

t-ishii

2019-09-18 07:30

developer   ~0002859

Thank you for confirming. Issue closed.

Issue History

Date Modified Username Field Change
2019-09-06 00:34 Carlos Mendez New Issue
2019-09-06 00:34 Carlos Mendez Tag Attached: pgpool 3.7.1
2019-09-06 00:34 Carlos Mendez Tag Attached: settings
2019-09-06 00:34 Carlos Mendez Tag Attached: quarantine status
2019-09-06 00:34 Carlos Mendez Tag Attached: standby
2019-09-06 08:57 t-ishii Assigned To => t-ishii
2019-09-06 08:57 t-ishii Status new => assigned
2019-09-06 09:09 t-ishii Note Added: 0002824
2019-09-06 09:09 t-ishii Status assigned => feedback
2019-09-07 00:11 Carlos Mendez Note Added: 0002826
2019-09-07 00:11 Carlos Mendez Status feedback => assigned
2019-09-11 15:01 t-ishii Note Added: 0002832
2019-09-11 15:01 t-ishii Status assigned => feedback
2019-09-12 00:45 Carlos Mendez Note Added: 0002833
2019-09-12 00:45 Carlos Mendez Status feedback => assigned
2019-09-17 15:40 t-ishii Note Added: 0002856
2019-09-17 15:40 t-ishii Status assigned => feedback
2019-09-18 01:58 Carlos Mendez Note Added: 0002858
2019-09-18 01:58 Carlos Mendez Status feedback => assigned
2019-09-18 07:30 t-ishii Note Added: 0002859
2019-09-18 07:30 t-ishii Status assigned => closed