[pgpool-general: 3571] Re: Out of sync with actual db state (master slave via streaming)

Wed Mar 25 14:32:24 JST 2015

> I've stumbled across a number of scenarios where pgpool's view of the
> db nodes is out of sync with the actual state of them:
> 
> 1/ Restart
> 
> If a standby is restarted (or stopped and started) then it will
> usually continue streaming ok, but pgpool thinks it is still down, and
> the health check does not seem to realize it is back and ok:
> 
> => SHOW pool_nodes;
>  node_id | hostname | port | status | lb_weight |  role
> ---------+----------+------+--------+-----------+---------
>  0       | db1      | 5432 | 2      | 0.333333  | primary
>  1       | db2      | 5432 | 3      | 0.333333  | standby
>  2       | db3      | 5432 | 2      | 0.333333  | standby
> (3 rows)
> 
> but directly from the primary:
> 
> =# SELECT client_addr, state, sent_location, replay_location, sync_state
> FROM pg_stat_replication;
>   client_addr | state | sent_location | replay_location | sync_state
> ----------------+-----------+---------------+-----------------+------------
>  192.168.122.72 | streaming | 0/E5F68C98    | 0/E51095E8      | async
>  192.168.122.73 | streaming | 0/E5F68C98    | 0/E581FFF0      | async
> (2 rows)
> 
> Is there some config I'm missing to get the health check to realize
> the standby is back and ok?

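This is expected behavior. Pgpool-II never automatically reattaches a
DB node. Once a node has been marked down (status 3), it stays
detached until you attach it again yourself, for example with
pcp_attach_node.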
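
Since the reattach is manual, it can help to list which nodes pgpool
still marks as down while the primary already reports them as
streaming. Below is a minimal Python sketch of that comparison, using
the two outputs quoted above as sample input. The helper names and the
hostname-to-address mapping (db2 = 192.168.122.72, db3 =
192.168.122.73) are my assumptions for illustration; the thread does
not say which address belongs to which node.

```python
# Sketch only: cross-check SHOW pool_nodes against pg_stat_replication.
# Helper names and HOST_TO_ADDR are hypothetical, not part of pgpool-II.

POOL_NODES = """
 node_id | hostname | port | status | lb_weight |  role
---------+----------+------+--------+-----------+---------
 0       | db1      | 5432 | 2      | 0.333333  | primary
 1       | db2      | 5432 | 3      | 0.333333  | standby
 2       | db3      | 5432 | 2      | 0.333333  | standby
(3 rows)
"""

REPLICATION = """
  client_addr   | state     | sent_location | replay_location | sync_state
----------------+-----------+---------------+-----------------+------------
 192.168.122.72 | streaming | 0/E5F68C98    | 0/E51095E8      | async
 192.168.122.73 | streaming | 0/E5F68C98    | 0/E581FFF0      | async
(2 rows)
"""

# Assumed mapping from pgpool hostname to the standby's client address.
HOST_TO_ADDR = {"db2": "192.168.122.72", "db3": "192.168.122.73"}

DOWN = "3"  # pgpool status code for a detached (down) node


def parse_psql_table(text):
    """Turn psql's aligned table output into a list of row dicts."""
    # Only the header and data rows contain "|"; the separator line
    # uses "+" and the "(N rows)" footer has neither.
    lines = [ln for ln in text.strip().splitlines() if "|" in ln]
    header = [h.strip() for h in lines[0].split("|")]
    return [
        dict(zip(header, (cell.strip() for cell in ln.split("|"))))
        for ln in lines[1:]
    ]


def reattach_candidates(pool_nodes, replication, host_to_addr):
    """Return node ids that pgpool marks down but that are streaming."""
    streaming = {
        row["client_addr"] for row in replication if row["state"] == "streaming"
    }
    return [
        int(node["node_id"])
        for node in pool_nodes
        if node["status"] == DOWN
        and host_to_addr.get(node["hostname"]) in streaming
    ]


candidates = reattach_candidates(
    parse_psql_table(POOL_NODES), parse_psql_table(REPLICATION), HOST_TO_ADDR
)
print(candidates)
```

Each node id it reports is one you would then pass to pcp_attach_node.
The argument syntax of that command differs between Pgpool-II
versions, so check the manual for the release you are running.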

> 2/ Failover/Recovery
> 
> If a failover is initiated (say by stopping postgres on the primary),
> but with pgpool sessions still active then this can happen:
> 
> => SHOW pool_nodes;
>  node_id | hostname | port | status | lb_weight |  role
> ---------+----------+------+--------+-----------+---------
>  0       | db1      | 5432 | 2      | 0.333333  | primary
>  1       | db2      | 5432 | 2      | 0.333333  | standby
>  2       | db3      | 5432 | 1      | 0.333333  | standby
> (3 rows)

Not sure what happened here. I may be able to track down the cause by
looking at the pgpool log.

> but directly from the primary:
> 
> =# SELECT client_addr, state, sent_location, replay_location, sync_state
> FROM pg_stat_replication;
>   client_addr | state | sent_location | replay_location | sync_state
> ----------------+-----------+---------------+-----------------+------------
>  192.168.122.72 | streaming | 0/E5000528    | 0/E5000528      | async
>  192.168.122.73 | streaming | 0/E5000528    | 0/E5000528      | async
> (2 rows)
> 
> Again it looks like the health check has not figured out that the 2nd
> standby has started.

? SHOW pool_nodes says that node 2 is up (status 1).
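
For reference, here is how I read the status column, written as a
small Python sketch. The meanings follow the Pgpool-II documentation
as I understand it; the function names are only illustrative.

```python
# Sketch only: meaning of the "status" column in SHOW pool_nodes.
STATUS_MEANING = {
    0: "initializing (only used during startup)",
    1: "up, no connections pooled yet",
    2: "up, connections pooled",
    3: "down",
}


def describe(status):
    """Return a human-readable description of a status code."""
    return STATUS_MEANING.get(status, "unknown status")


def is_up(status):
    """Statuses 1 and 2 both mean pgpool considers the node attached."""
    return status in (1, 2)


# Node 2 (db3) in the output above has status 1.
print(describe(1), is_up(1))
```

So status 1 versus 2 only tells you whether pgpool has pooled
connections to the node yet. Neither of them means the node is down.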

> I've attached config and the various scripts.
> 
> Cheers
> 
> Mark