[pgpool-general: 8594] Re: PgPool thinks node 0 is in recovery.

Sat Feb 4 14:17:26 JST 2023

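You can stop pgpool, remove the pgpool_status file, and then start
pgpool again so that it recognizes backend 0. pgpool restores the last
known status of each backend from that file at startup, which is why
node 0 stays "down". The file is recreated when pgpool starts up.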

pgpool_status should be located under "logdir" (in your case /tmp).
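
The steps above could be scripted roughly as follows. This is only a
sketch: reset_pgpool_status is a hypothetical helper name, LOGDIR is
assumed to match "logdir" in your pgpool.conf, and PGPOOL is assumed to
be a pgpool binary on your PATH that finds its config in the default
location (add -f or use your service manager if that is not the case).

```shell
# Sketch only: reset_pgpool_status is a hypothetical helper, not a pgpool command.
# LOGDIR must match "logdir" in pgpool.conf (/tmp in this thread).
# PGPOOL is the pgpool binary; both can be overridden from the environment.
LOGDIR="${LOGDIR:-/tmp}"
PGPOOL="${PGPOOL:-pgpool}"

reset_pgpool_status() {
    # 1. Stop pgpool with a fast shutdown; give up if the stop fails so
    #    the status file is never removed under a running pgpool.
    "$PGPOOL" -m fast stop || return 1
    # 2. Remove the saved backend status (no error if it is already gone).
    rm -f "$LOGDIR/pgpool_status"
    # 3. Start pgpool again; it writes a fresh pgpool_status.
    "$PGPOOL"
}
```

After running reset_pgpool_status, "show pool_nodes;" should report
status "up" for node 0 if the backend is reachable. Starting pgpool with
-D (--discard-status) should have the same effect as removing the file
by hand.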

> Correct.  It's as if pgpool doesn't even see backend_hostname0.  (I
> tried commenting out all of the backend_*1 config items, and pgpool
> didn't see *anything*. "psql --port=9999" refused connection.)
> 
> $ psql --port=9999 -c "\x" -c "show pool_nodes;"
> Expanded display is on.
> -[ RECORD 1 ]----------+--------------------
> node_id                | 0
> hostname               | FISPCCPGS405a
> port                   | 5432
> status                 | down <<<<<<<<<<<<<<<<<<<
> pg_status              | up
> lb_weight              | 0.666667
> role                   | primary
> pg_role                | primary
> select_cnt             | 0
> load_balance_node      | false
> replication_delay      | 0
> replication_state      |
> replication_sync_state |
> last_status_change     | 2023-02-03 23:07:59
> -[ RECORD 2 ]----------+--------------------
> node_id                | 1
> hostname               | FISPCCPGS405b
> port                   | 5432
> status                 | up
> pg_status              | up
> lb_weight              | 0.333333
> role                   | standby
> pg_role                | standby
> select_cnt             | 0
> load_balance_node      | true
> replication_delay      | 0
> replication_state      |
> replication_sync_state |
> last_status_change     | 2023-02-03 23:07:59
> 
> 
> On 2/3/23 18:15, Tatsuo Ishii wrote:
>> It seems pgpool thinks backend node 0 is down. To confirm this, can
>> you share the pgpool_status file and the result of show pool_nodes?
>>
>>> Logs attached, with log_statement = 'all'.
>>>
>>> I don't see any attempted connections to the primary server when
>>> pgpool is starting up.
>>>
>>> On 2/3/23 03:25, Tatsuo Ishii wrote:
>>>> Can you share the PostgreSQL log of the primary with log_statement =
>>>> 'all'?  I would like to confirm that queries sent from the sr_check
>>>> worker are reaching the primary. If so, you should see something like:
>>>>
>>>> 1771450 2023-02-03 18:19:05.585 JST LOG: statement: SELECT pg_is_in_recovery()
>>>> 1771463 2023-02-03 18:19:15.597 JST LOG: statement: SELECT pg_current_wal_lsn()
>>>>
>>>> Best regards,
>>>> --
>>>> Tatsuo Ishii
>>>> SRA OSS LLC
>>>> English:http://www.sraoss.co.jp/index_en/
>>>> Japanese:http://www.sraoss.co.jp
>>>>
>>>>> Attached are three log files (pgpool, the primary and replicated
>>>>> servers).
>>>>>
>>>>> The primary is definitely not in replication mode.
>>>>>
>>>>> On 2/1/23 00:04, Tatsuo Ishii wrote:
>>>>>>> There must have been a miscommunication; I thought I attached my
>>>>>>> pgpool.conf and the log file to a previous email, but maybe not.
>>>>>>>
>>>>>>> I fixed the backend_port0 problem last week.
>>>>>> Ok.
>>>>>>
>>>>>>> pgpool is already running with pgpool.conf log_min_messages=debug3. Is
>>>>>>> that sufficient?
>>>>>> Yes.
>>>>>>
>>>>>>> Attached is the error log from when I last started pgpool, and the
>>>>>>> pgpool.conf from that time.
>>>>>> I see some errors with streaming replication check process:
>>>>>>
>>>>>> 2023-01-26 13:31:04.594: sr_check_worker pid 796880: DEBUG: do_query: extended:0 query:"SELECT pg_current_wal_lsn()"
>>>>>> 2023-01-26 13:31:04.594: sr_check_worker pid 796880: CONTEXT: while checking replication time lag
>>>>>> 2023-01-26 13:31:09.594: health_check1 pid 796881: DEBUG: health check: clearing alarm
>>>>>> 2023-01-26 13:31:09.603: health_check1 pid 796881: DEBUG: authenticate kind = 10
>>>>>> 2023-01-26 13:31:09.612: health_check1 pid 796881: DEBUG: SCRAM authentication successful for user:pool_health_check
>>>>>> 2023-01-26 13:31:09.612: health_check1 pid 796881: DEBUG: authenticate backend: key data received
>>>>>> 2023-01-26 13:31:09.612: health_check1 pid 796881: DEBUG: authenticate backend: transaction state: I
>>>>>> 2023-01-26 13:31:09.612: health_check1 pid 796881: DEBUG: health check: clearing alarm
>>>>>> 2023-01-26 13:31:09.612: health_check1 pid 796881: DEBUG: health check: clearing alarm
>>>>>> 2023-01-26 13:31:14.595: sr_check_worker pid 796880: FATAL: Backend throw an error message
>>>>>> 2023-01-26 13:31:14.595: sr_check_worker pid 796880: DETAIL: Exiting current session because of an error from backend
>>>>>> 2023-01-26 13:31:14.595: sr_check_worker pid 796880: HINT: BACKEND Error: "recovery is in progress"
>>>>>> 2023-01-26 13:31:14.595: sr_check_worker pid 796880: CONTEXT: while checking replication time lag
>>>>>>
>>>>>> sr_check_process tried to determine the WAL LSN on backend0 by issuing
>>>>>> "SELECT pg_current_wal_lsn()" to backend0 but failed with:
>>>>>>
>>>>>>> 2023-01-26 13:31:14.595: sr_check_worker pid 796880: HINT: BACKEND Error: "recovery is in progress"
>>>>>> This suggests that backend0 is running as a standby server. I guess
>>>>>> there's something wrong with the settings on backend0.  Maybe
>>>>>> standby.signal exists?  Can you share the PostgreSQL log of backend0
>>>>>> from its startup?
>>>>>>
>>>>>> Best regards,
>>>>>> --
>>>>>> Tatsuo Ishii
>>>>>> SRA OSS LLC
>>>>>> English:http://www.sraoss.co.jp/index_en/
>>>>>> Japanese:http://www.sraoss.co.jp
>>>>> -- 
>>>>> Born in Arizona, moved to Babylonia.