[pgpool-general: 8927] Re: Testing cluster: it doesn't work as expected

Bo Peng pengbo at sraoss.co.jp
Tue Sep 19 10:57:28 JST 2023


Hi,

> What's the difference between status and pg_status? 

status: the backend status as managed by Pgpool-II (persisted in the pgpool_status file)
pg_status: the actual backend status, checked with pg_isready (Pgpool-II 4.3 or later)
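You can compare the two side by side. A quick check (hostnames and the pgpool endpoint below are placeholders for your setup):

```shell
# Ask Pgpool-II for its view of the backends; "status" and "pg_status"
# come back in the same row per node. pgpool-host is a placeholder.
psql -h pgpool-host -p 5432 -U postgres -c "SHOW pool_nodes;"

# pg_isready is the same check pgpool performs for pg_status; running
# it directly against a backend shows the actual server state:
pg_isready -h node1 -p 5432
```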


After the failover, pgpool detached node 0 (node1) and marked its status as "down":

> >show pool_nodes;
>  node_id | hostname | port | status | pg_status | lb_weight | role    | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
> ---------+----------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+--------------------
>  0       | node1    | 5432 | *down* | *down*    | 0.200000  | primary | unknown | 4          | false             | 0                 |                   |                        | 2023-09-18 10:32:44
>  1       | node2    | 5432 | up     | up        | 0.400000  | standby | standby | 4          | true              | 0                 |                   |                        | 2023-09-18 10:30:04
>  2       | node3    | 5432 | up     | up        | 0.400000  | standby | standby | 1          | false             | 0                 |                   |                        | 2023-09-18 10:30:04

After you restart node 0 (node1), you need to run "pcp_attach_node" to re-attach it to pgpool.
A former primary node does not fail back automatically, even with auto_failback = 'on' (auto_failback re-attaches only standby nodes); requiring an explicit attach is the safe way to manage a cluster.

https://www.pgpool.net/docs/latest/en/html/pcp-attach-node.html
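A typical invocation looks like the sketch below (host, PCP port, and user name are placeholders for your environment):

```shell
# Re-attach backend node 0 after PostgreSQL on node1 is running again.
# pgpool-host and pcp-admin are placeholders; 9898 is the default PCP
# port. You will be prompted for the PCP password (or use ~/.pcppass).
pcp_attach_node -h pgpool-host -p 9898 -U pcp-admin -n 0

# Afterwards, "SHOW pool_nodes;" should report status "up" for node 0.
```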

On Mon, 18 Sep 2023 12:47:21 +0200
Tan Mientras <tanimientras at gmail.com> wrote:

> Hi.
> 
> Newbie here trying to test a 3-node pgpool cluster.
> 
> >show pool_nodes;
>  node_id | hostname | port | status | pg_status | lb_weight | role    | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
> ---------+----------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+--------------------
>  0       | node1    | 5432 | *up*   | *up*      | 0.200000  | primary | primary | 3          | false             | 0                 |                   |                        | 2023-09-18 10:30:04
>  1       | node2    | 5432 | up     | up        | 0.400000  | standby | standby | 0          | false             | 0                 |                   |                        | 2023-09-18 10:30:04
>  2       | node3    | 5432 | up     | up        | 0.400000  | standby | standby | 1          | true              | 0                 |                   |                        | 2023-09-18 10:30:04
> 
> 
> *stop node1 postgresql service*
> >service postgresql stop
> >
> >show pool_nodes;
>  node_id | hostname | port | status | pg_status | lb_weight | role    | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
> ---------+----------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+--------------------
>  0       | node1    | 5432 | *down* | *down*    | 0.200000  | primary | unknown | 4          | false             | 0                 |                   |                        | 2023-09-18 10:32:44
>  1       | node2    | 5432 | up     | up        | 0.400000  | standby | standby | 4          | true              | 0                 |                   |                        | 2023-09-18 10:30:04
>  2       | node3    | 5432 | up     | up        | 0.400000  | standby | standby | 1          | false             | 0                 |                   |                        | 2023-09-18 10:30:04
> 
> 
> *start node1 postgresql service*
> >service postgresql start
> >
> >show pool_nodes;
>  node_id | hostname | port | status | pg_status | lb_weight | role    | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
> ---------+----------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+--------------------
>  0       | node1    | 5432 | *down* | *up*      | 0.200000  | primary | primary | 4          | false             | 0                 |                   |                        | 2023-09-18 10:32:44
>  1       | node2    | 5432 | up     | up        | 0.400000  | standby | standby | 5          | true              | 0                 |                   |                        | 2023-09-18 10:30:04
>  2       | node3    | 5432 | up     | up        | 0.400000  | standby | standby | 1          | false             | 0                 |                   |                        | 2023-09-18 10:30:04
> 
> What's the difference between status and pg_status? Why is status still
> down although node1 is already running again? Should I write my own
> failover/failback scripts?
> 
> The same not-failing-back behaviour happens with node2 and node3 if the
> service is stopped and then started.
> 
> 
> *Some logs:*
> pgpool 10:29:51.84
> pgpool 10:29:51.84 Welcome to the Bitnami pgpool container
> pgpool 10:29:51.85 Subscribe to project updates by watching
> https://github.com/bitnami/containers
> pgpool 10:29:51.85 Submit issues and feature requests at
> https://github.com/bitnami/containers/issues
> pgpool 10:29:51.85
> pgpool 10:29:51.86 INFO  ==> ** Starting Pgpool-II setup **
> pgpool 10:29:51.88 INFO  ==> Validating settings in PGPOOL_* env vars...
> pgpool 10:29:51.91 INFO  ==> Initializing Pgpool-II...
> pgpool 10:29:51.91 INFO  ==> Generating pg_hba.conf file...
> pgpool 10:29:51.92 INFO  ==> Generating pgpool.conf file...
> pgpool 10:29:52.16 INFO  ==> Custom configuration '/pgpool.conf' detected!.
> Adding it to the configuration file.
> pgpool 10:29:52.17 INFO  ==> Generating password file for local
> authentication...
> pgpool 10:29:52.18 INFO  ==> Generating password file for pgpool admin
> user...
> pgpool 10:29:52.19 INFO  ==> ** Pgpool-II setup finished! **
> 
> pgpool 10:29:52.21 INFO  ==> ** Starting Pgpool-II **
> 2023-09-18 10:29:52.241: main pid 14: LOG:  Backend status file
> /opt/bitnami/pgpool/logs/pgpool_status does not exist
> 2023-09-18 10:29:52.241: main pid 14: LOG:
>  health_check_stats_shared_memory_size: requested size: 12288
> 2023-09-18 10:29:52.241: main pid 14: LOG:  memory cache initialized
> 2023-09-18 10:29:52.241: main pid 14: DETAIL:  memcache blocks :64
> 2023-09-18 10:29:52.241: main pid 14: LOG:  allocating (134563952) bytes of
> shared memory segment
> 2023-09-18 10:29:52.241: main pid 14: LOG:  allocating shared memory
> segment of size: 134563952
> 2023-09-18 10:29:52.345: main pid 14: LOG:
>  health_check_stats_shared_memory_size: requested size: 12288
> 2023-09-18 10:29:52.345: main pid 14: LOG:
>  health_check_stats_shared_memory_size: requested size: 12288
> 2023-09-18 10:29:52.345: main pid 14: LOG:  memory cache initialized
> 2023-09-18 10:29:52.345: main pid 14: DETAIL:  memcache blocks :64
> 2023-09-18 10:29:52.347: main pid 14: LOG:  pool_discard_oid_maps:
> discarded memqcache oid maps
> 2023-09-18 10:29:52.361: main pid 14: LOG:  unix_socket_directories[0]:
> /opt/bitnami/pgpool/tmp/.s.PGSQL.5432
> 2023-09-18 10:29:52.361: main pid 14: LOG:  listen address[0]: *
> 2023-09-18 10:29:52.363: main pid 14: LOG:  Setting up socket for
> 0.0.0.0:5432
> 2023-09-18 10:29:52.363: main pid 14: LOG:  Setting up socket for :::5432
> 2023-09-18 10:29:52.365: main pid 14: LOG:  perhaps failed to create INET
> domain socket
> 2023-09-18 10:29:52.365: main pid 14: DETAIL:  socket(::) failed: "Address
> family not supported by protocol"
> 2023-09-18 10:29:52.367: main pid 14: LOG:  find_primary_node_repeatedly:
> waiting for finding a primary node
> 2023-09-18 10:29:52.368: main pid 14: LOG:  listen address[0]: localhost
> 2023-09-18 10:29:52.370: main pid 14: LOG:  Setting up socket for
> 127.0.0.1:9898
> 2023-09-18 10:29:52.370: main pid 14: LOG:  Setting up socket for ::1:9898
> 2023-09-18 10:29:52.372: main pid 14: LOG:  perhaps failed to create INET
> domain socket
> 2023-09-18 10:29:52.372: main pid 14: DETAIL:  socket(::1) failed: "Address
> family not supported by protocol"
> 2023-09-18 10:29:52.373: pcp_main pid 163: LOG:  PCP process: 163 started
> 2023-09-18 10:29:52.373: health_check pid 165: LOG:  process started
> 2023-09-18 10:29:52.374: health_check pid 167: LOG:  process started
> 2023-09-18 10:29:52.374: health_check pid 166: LOG:  process started
> 2023-09-18 10:29:52.376: sr_check_worker pid 164: LOG:  process started
> 2023-09-18 10:29:52.381: main pid 14: LOG:  pgpool-II successfully started.
> version 4.4.3 (nurikoboshi)
> 2023-09-18 10:29:52.381: main pid 14: LOG:  node status[0]: 0
> 2023-09-18 10:29:52.381: main pid 14: LOG:  node status[1]: 0
> 2023-09-18 10:29:52.381: main pid 14: LOG:  node status[2]: 0
> 2023-09-18 10:32:44.537: pgAdmin 4 - CONN:1549034 pid 161: LOG:  reading
> and processing packets
> 2023-09-18 10:32:44.537: pgAdmin 4 - CONN:1549034 pid 161: DETAIL:
>  postmaster on DB node 0 was shutdown by administrative command
> 2023-09-18 10:32:44.537: pgAdmin 4 - CONN:1549034 pid 161: LOG:  received
> degenerate backend request for node_id: 0 from pid [161]
> 2023-09-18 10:32:44.537: pgAdmin 4 - CONN:1549034 pid 161: LOG:
>  signal_user1_to_parent_with_reason(0)
> 2023-09-18 10:32:44.537: main pid 14: LOG:  Pgpool-II parent process
> received SIGUSR1
> 2023-09-18 10:32:44.537: main pid 14: LOG:  Pgpool-II parent process has
> received failover request
> 2023-09-18 10:32:44.537: main pid 14: LOG:  === Starting degeneration.
> shutdown host node1(5432) ===
> >>> Failover - that will initialize new primary node search!
> 2023-09-18 10:32:44.551: main pid 14: LOG:  Restart all children
> 2023-09-18 10:32:44.551: main pid 14: LOG:  execute command: echo ">>>
> Failover - that will initialize new primary node search!"
> 2023-09-18 10:32:44.558: main pid 14: LOG:  find_primary_node_repeatedly:
> waiting for finding a primary node
> 2023-09-18 10:32:44.558: main pid 14: LOG:  failover: set new primary node:
> 0
> 2023-09-18 10:32:44.558: main pid 14: LOG:  failover: set new main node: 1
> 2023-09-18 10:32:44.561: sr_check_worker pid 164: LOG:  worker process
> received restart request
> 2023-09-18 10:32:44.561: main pid 14: LOG:  === Failover done. shutdown
> host node1(5432) ===
> 2023-09-18 10:32:45.561: pcp_main pid 163: LOG:  restart request received
> in pcp child process
> 2023-09-18 10:32:45.563: main pid 14: LOG:  PCP child 163 exits with status
> 0 in failover()
> 2023-09-18 10:32:45.564: main pid 14: LOG:  fork a new PCP child pid 179 in
> failover()
> 2023-09-18 10:32:45.564: main pid 14: LOG:  reaper handler
> 2023-09-18 10:32:45.565: pcp_main pid 179: LOG:  PCP process: 179 started
> 2023-09-18 10:32:45.565: main pid 14: LOG:  reaper handler: exiting normally
> 2023-09-18 10:32:45.565: sr_check_worker pid 180: LOG:  process started
> 2023-09-18 10:37:38.471: pgAdmin 4 - CONN:1549034 pid 175: LOG:  reading
> and processing packets
> 2023-09-18 10:37:38.471: pgAdmin 4 - CONN:1549034 pid 175: DETAIL:
>  postmaster on DB node 1 was shutdown by administrative command
> 2023-09-18 10:37:38.471: pgAdmin 4 - CONN:1549034 pid 175: LOG:  received
> degenerate backend request for node_id: 1 from pid [175]
> 2023-09-18 10:37:38.471: pgAdmin 4 - CONN:1549034 pid 175: LOG:
>  signal_user1_to_parent_with_reason(0)
> 2023-09-18 10:37:38.471: main pid 14: LOG:  Pgpool-II parent process
> received SIGUSR1
> 2023-09-18 10:37:38.471: main pid 14: LOG:  Pgpool-II parent process has
> received failover request
> 
> 
> *Configuration (grep -v "#")*
> 
> backend_clustering_mode = 'streaming_replication'
> listen_addresses = '*'
> port = '5432'
> unix_socket_directories = '/opt/bitnami/pgpool/tmp'
> pcp_socket_dir = '/opt/bitnami/pgpool/tmp'
> enable_pool_hba = 'on'
> pool_passwd = 'pool_passwd'
> authentication_timeout = '30'
> allow_clear_text_frontend_auth = 'off'
> num_init_children = '10'
> max_pool = '1'
> log_connections = 'off'
> log_hostname = 'off'
> log_per_node_statement = 'off'
> pid_file_name = '/opt/bitnami/pgpool/tmp/pgpool.pid'
> logdir = '/opt/bitnami/pgpool/logs'
> load_balance_mode = 'on'
> disable_load_balance_on_write = 'transaction'
> statement_level_load_balance = 'on'
> sr_check_period = '30'
> sr_check_user = 'replica'
> sr_check_database = 'postgres'
> health_check_period = '30'
> health_check_timeout = '10'
> health_check_user = 'replica'
> health_check_max_retries = '5'
> health_check_retry_delay = '5'
> connect_timeout = '10000'
> failover_command = 'echo ">>> Failover - that will initialize new primary
> node search!"'
> failover_on_backend_error = 'off'
> search_primary_node_timeout = '0'
> auto_failback = 'on'
> hostname0 = ''
> backend_hostname0 = 'node1'
> backend_port0 = 5432
> backend_weight0 = 2
> backend_data_directory0 = 'primary'
> backend_flag0 = 'ALWAYS_PRIMARY'
> backend_application_name0 = 'node1'
> backend_hostname1 = 'node2'
> backend_port1 = 5432
> backend_weight1 = 4
> backend_data_directory1 = 'replica'
> backend_flag1 = 'ALLOW_TO_FAILOVER'
> backend_application_name1 = 'node2'
> backend_hostname2 = 'node3'
> backend_port2 = 5432
> backend_weight2 = 4
> backend_data_directory2 = 'replica'
> backend_flag2 = 'ALLOW_TO_FAILOVER'
> backend_application_name2 = 'node3'


-- 
Bo Peng <pengbo at sraoss.co.jp>
SRA OSS LLC
TEL: 03-5979-2701 FAX: 03-5979-2702
URL: https://www.sraoss.co.jp/

