View Issue Details
ID: 598                              Category: [Pgpool-II] Bug
Severity: minor                      Reproducibility: always
Date Submitted: 2020-03-20 13:06     Last Update: 2020-04-03 02:49
Reporter: Xavok                      Platform:
Assigned To: t-ishii                 OS:
Priority: normal                     OS Version:
Status: assigned                     Product Version: 4.1.0
Product Build:                       Resolution: open
Projection: none                     ETA: none
Fixed in Version:                    Target Version:
Summary: segmentation fault
Description: Hi
I constantly get an error in the log:
WARNING: PCP process with pid: 23498 was terminated by segmentation fault
Is this normal?
My configuration file and extended log are attached.
I use 2x PostgreSQL 11.6 + 1 Pgpool-II.
Tags: pcp, pgpool
Steps To Reproduce:
Additional Information:
Attached Files: pgpool.log (1,152,940 bytes) 2020-03-20 13:06
https://www.pgpool.net/mantisbt/file_download.php?file_id=772&type=bug
pgpool.conf (40,916 bytes) 2020-03-20 13:06
https://www.pgpool.net/mantisbt/file_download.php?file_id=771&type=bug
gdb.txt (11,863 bytes) 2020-03-21 19:08
https://www.pgpool.net/mantisbt/file_download.php?file_id=773&type=bug
gdb.log (6,999 bytes) 2020-04-03 02:49
https://www.pgpool.net/mantisbt/file_download.php?file_id=777&type=bug
Notes
(0003273)
t-ishii   
2020-03-20 17:47   
(Last edited: 2020-03-20 18:04)
Of course it's not normal. Can you provide a stack trace of the segfaulting process?

(0003274)
Xavok   
2020-03-21 19:08   
Added gdb output, is this what you needed?
(0003286)
t-ishii   
2020-03-31 13:57   
Sorry for the delay. gdb.txt looks like a stack trace of the Pgpool-II parent process. What I wanted was the stack trace of the pgpool child process that segfaults. To get it, you need to let the pgpool child process produce a core dump file and then take a stack trace from the core file. Can you please do that?
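A minimal sketch of one way to do that, assuming pgpool is started from a shell as the postgres user and the binary is /usr/bin/pgpool (paths and the core file location are assumptions; kernel.core_pattern may put the core somewhere else):

    # allow the pgpool processes to write core files, then (re)start pgpool
    ulimit -c unlimited
    pgpool -n -f /etc/pgpool-II/pgpool.conf

    # after the next "terminated by segmentation fault" message, locate the
    # core file and take a backtrace from it
    gdb /usr/bin/pgpool /path/to/core
    (gdb) bt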
(0003297)
Xavok   
2020-04-03 02:49   
In the attachment


View Issue Details
ID: 602                              Category: [Pgpool-II] Bug
Severity: minor                      Reproducibility: always
Date Submitted: 2020-04-02 03:06     Last Update: 2020-04-02 16:57
Reporter: awiller                    Platform:
Assigned To: pengbo                  OS:
Priority: normal                     OS Version:
Status: assigned                     Product Version: 4.1.0
Product Build:                       Resolution: open
Projection: none                     ETA: none
Fixed in Version:                    Target Version:
Summary: sr_health_check authentication fails with any password type other than cleartext password
Description: sr_health_check authentication fails with any password type other than a cleartext password (tested in pgpool.conf).

The documentation for version 4.1.1 states that hashed or encrypted passwords should also work, but that is not the case: https://www.pgpool.net/docs/latest/en/html/runtime-streaming-replication-check.html

Please find the log below:
Apr 01 20:01:34 dbtest-1 pgpool[21090]: 2020-04-01 20:01:34: pid 21176: ERROR: Failed to check replication time lag
Apr 01 20:01:34 dbtest-1 pgpool[21090]: 2020-04-01 20:01:34: pid 21176: DETAIL: No persistent db connection for the node 0
Apr 01 20:01:34 dbtest-1 pgpool[21090]: 2020-04-01 20:01:34: pid 21176: HINT: check sr_check_user and sr_check_password
Apr 01 20:01:34 dbtest-1 pgpool[21090]: 2020-04-01 20:01:34: pid 21176: CONTEXT: while checking replication time lag
Apr 01 20:01:34 dbtest-1 pgpool[21090]: 2020-04-01 20:01:34: pid 21176: DEBUG: pool_flush_it: flush size: 41
Apr 01 20:01:34 dbtest-1 pgpool[21090]: 2020-04-01 20:01:34: pid 21176: DEBUG: pool_read: read 13 bytes from backend 0
Apr 01 20:01:34 dbtest-1 pgpool[21090]: 2020-04-01 20:01:34: pid 21176: DEBUG: authenticate kind = 5
Apr 01 20:01:34 dbtest-1 pgpool[21090]: 2020-04-01 20:01:34: pid 21176: DEBUG: pool_write: to backend: 0 kind:p
Apr 01 20:01:34 dbtest-1 pgpool[21090]: 2020-04-01 20:01:34: pid 21176: DEBUG: pool_flush_it: flush size: 41
Apr 01 20:01:34 dbtest-1 pgpool[21090]: 2020-04-01 20:01:34: pid 21176: DEBUG: pool_read: read 123 bytes from backend 0
Apr 01 20:01:34 dbtest-1 pgpool[21090]: 2020-04-01 20:01:34: pid 21176: ERROR: authentication failed
Apr 01 20:01:34 dbtest-1 pgpool[21090]: 2020-04-01 20:01:34: pid 21176: DETAIL: Passwort-Authentifizierung für Benutzer »postgres« fehlgeschlagen (password authentication failed for user "postgres")
Tags:
Steps To Reproduce:
Additional Information: Related to: https://www.pgpool.net/mantisbt/view.php?id=464
Attached Files:
Notes
(0003295)
awiller   
2020-04-02 03:11   
I tested this with the md5 and plaintext password types, both directly in pgpool.conf and in the pool_passwd file. Only plaintext passwords work in both places; md5 passwords don't work anywhere.
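For reference, a minimal sketch of how an md5 entry is usually registered for the check user (the user name and password are placeholders; whether this matches the tested setup is an assumption):

    # append "postgres:md5<hash>" to pool_passwd for the sr_check/health_check user
    pg_md5 --md5auth --username=postgres secretpassword

    # with an empty password in pgpool.conf, pgpool tries to look the
    # password up in pool_passwd instead
    sr_check_password = ''
    health_check_password = ''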


View Issue Details
ID: 599                              Category: [Pgpool-II] General
Severity: major                      Reproducibility: N/A
Date Submitted: 2020-03-22 21:50     Last Update: 2020-04-02 15:00
Reporter: maiquelet                  Platform: Docker
Assigned To: hoshiai                 OS: Alpine Linux
Priority: normal                     OS Version: 3.11
Status: feedback                     Product Version:
Product Build:                       Resolution: open
Projection: none                     ETA: none
Fixed in Version:                    Target Version:
Summary: Fortinet discards lots of TCP packets with reason: Not a Valid SYN Packet
Description: Hello,

We are running PgPool 4.0.8 inside an Alpine Linux container (Docker). Access to the backend server goes through a firewall. It is a one-PgPool, one-PostgreSQL setup.

The problem is that we keep seeing a lot of discarded TCP packets with the message "Not a Valid SYN Packet". From this link: https://help.stonesoft.com/onlinehelp/StoneGate/SMC/6.4.0/GUID-F62E7E1A-7A2B-4B35-B70A-183A8C11FFB6.html, we get the meaning of the message:

“Not a Valid SYN Packet” is a TCP packet that is not the first packet of a TCP connection (the packet does not have the SYN flag set), but is not part of an existing connection either (there is no connection tracking entry on the Firewall matching this packet)

We thought that this could be caused by keepalive intervals that are longer than what the firewall waits before dropping an idle connection, so we set these settings in the container:
          net.ipv4.tcp_keepalive_time = 60
          net.ipv4.tcp_keepalive_intvl = 5
          net.ipv4.tcp_keepalive_probes = 3

Those net settings were picked from a container (running another technology) that is actually using them. But they don't seem to help with PgPool, as we still get discarded packets for the same reason.
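(A minimal sketch of how the values actually in effect inside the running container could be checked, assuming a Docker setup; the container name is a placeholder:)

    # inside the pgpool container: show the effective keepalive settings
    cat /proc/sys/net/ipv4/tcp_keepalive_time \
        /proc/sys/net/ipv4/tcp_keepalive_intvl \
        /proc/sys/net/ipv4/tcp_keepalive_probes

    # or from the host, assuming sysctl is available in the image
    docker exec <pgpool_container> sysctl net.ipv4.tcp_keepalive_time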

We are using a small set of known user/database/client_application tuples and, as we are still at the preproduction stage, the number of queries is no more than 10 per minute, and they are short, fast queries. You can find our PgPool configuration attached, and I can tell that we are not reaching the "max_pool * num_init_children" limit (we use about half of that). Also, we are using a Python script to check the pg_stat_activity table; sometimes this script takes almost 30 seconds to complete (and sometimes I have waited for minutes), while other times it only takes a second.

The thing is that we ran out of ideas and we don't know where else to look.

If someone has any idea, we would really appreciate it. We are planning on using PgPool for production purposes, so we'll keep trying.

Thank you very much,

Miquel.
Tags: Alpine, firewall, networking, TCP discards
Steps To Reproduce:
Additional Information: ----- PgPool configuration
listen_addresses = '*'
port = 5432
socket_dir = '/tmp'
pcp_listen_addresses = '*'
pcp_port = 9898
pcp_socket_dir = '/tmp'
listen_backlog_multiplier = 2
serialize_accept = off
backend_hostname0 = 'some_database'
backend_port0 = 5432
backend_weight0 = 10
backend_data_directory0 = '/data'
backend_flag0 = DISALLOW_TO_FAILOVER
enable_pool_hba = on
pool_passwd = 'pool_passwd'
authentication_timeout = 60
ssl = off
num_init_children = 40
max_pool = 5
child_life_time = 300
child_max_connections = 600
connection_life_time = 0
client_idle_limit = 0
log_destination = 'stderr'
log_line_prefix = '%t: pid %p: ' # printf-style string to output at beginning of each log line.
log_connections = off
log_hostname = off
log_statement = off
log_per_node_statement = off
log_standby_delay = 'if_over_threshold'
syslog_facility = 'LOCAL0'
syslog_ident = 'pgpool'
pid_file_name = '/var/run/pgpool/pgpool.pid'
logdir = '/tmp'
connection_cache = on
reset_query_list = 'ABORT; DISCARD ALL'
replication_mode = off
replicate_select = off
insert_lock = off
lobj_lock_table = ''
replication_stop_on_mismatch = off
failover_if_affected_tuples_mismatch = off
load_balance_mode = off
ignore_leading_white_space = on
white_function_list = ''
black_function_list = 'currval,lastval,nextval,setval'
database_redirect_preference_list = ''
app_name_redirect_preference_list = ''
allow_sql_comments = off
master_slave_mode = off
master_slave_sub_mode = 'stream'
sr_check_period = 10
sr_check_user = 'some_user'
sr_check_password = 'some_pass'
sr_check_database = 'postgres'
delay_threshold = 10000000
follow_master_command = ''
health_check_period = 0
health_check_timeout = 20
health_check_user = 'nobody'
health_check_password = ''
health_check_database = ''
health_check_max_retries = 0
health_check_retry_delay = 1
connect_timeout = 10000
health_check_period0 = 0
health_check_timeout0 = 20
health_check_user0 = 'nobody'
health_check_password0 = ''
health_check_database0 = ''
health_check_max_retries0 = 0
health_check_retry_delay0 = 1
connect_timeout0 = 10000
failover_command = ''
failback_command = ''
fail_over_on_backend_error = on
search_primary_node_timeout = 300
recovery_user = 'nobody'
recovery_password = ''
recovery_1st_stage_command = ''
recovery_2nd_stage_command = ''
recovery_timeout = 90
client_idle_limit_in_recovery = 0
use_watchdog = off
trusted_servers = ''
ping_path = '/bin'
wd_hostname = ''
wd_port = 9000
wd_priority = 1
wd_authkey = ''
wd_ipc_socket_dir = '/tmp'
delegate_IP = ''
if_cmd_path = '/sbin'
if_up_cmd = 'ip addr add $_IP_$/24 dev eth0 label eth0:0'
if_down_cmd = 'ip addr del $_IP_$/24 dev eth0'
arping_path = '/usr/sbin'
arping_cmd = 'arping -U $_IP_$ -w 1'
clear_memqcache_on_escalation = on
wd_escalation_command = ''
wd_de_escalation_command = ''
failover_when_quorum_exists = on
failover_require_consensus = on
allow_multiple_failover_requests_from_node = off
wd_monitoring_interfaces_list = '' # Comma separated list of interfaces names to monitor.
wd_lifecheck_method = 'heartbeat'
wd_interval = 10
wd_heartbeat_port = 9694
wd_heartbeat_keepalive = 2
wd_heartbeat_deadtime = 30
heartbeat_destination0 = 'host0_ip1'
heartbeat_destination_port0 = 9694
heartbeat_device0 = ''
wd_life_point = 3
wd_lifecheck_query = 'SELECT 1'
wd_lifecheck_dbname = 'template1'
wd_lifecheck_user = 'nobody'
wd_lifecheck_password = ''
relcache_expire = 0
relcache_size = 256
check_temp_table = on
check_unlogged_table = on
memory_cache_enabled = off
memqcache_method = 'shmem'
memqcache_memcached_host = 'localhost'
memqcache_memcached_port = 11211
memqcache_total_size = 67108864
memqcache_max_num_cache = 1000000
memqcache_expire = 0
memqcache_auto_cache_invalidation = on
memqcache_maxcache = 409600
memqcache_cache_block_size = 1048576
memqcache_oiddir = '/var/log/pgpool/oiddir'
white_memqcache_table_list = ''
black_memqcache_table_list = ''
Attached Files: logsSample.txt (393,921 bytes) 2020-03-25 19:38
https://www.pgpool.net/mantisbt/file_download.php?file_id=774&type=bug
Notes
(0003283)
hoshiai   
2020-03-25 16:09   
Pgpool-II doesn't have keepalive parameters, so it ordinarily uses the OS parameters, as you have already done.

Do your Pgpool and PostgreSQL output any error messages? Please share your log file.
(0003284)
maiquelet   
2020-03-25 19:38   
Hello, you will find one log fragment attached. I couldn't get the ones generated at the service's start, as this instance has been running for days. We get a bunch of those firewall errors every minute, so it might be enough. If not, just let me know.

This is the strangest behaviour we have seen so far; it happens every now and then:

2020-03-25 06:27:48: pid 2380: WARNING: write on backend 0 failed with error :"No error information"
2020-03-25 06:27:48: pid 2380: DETAIL: while trying to write data from offset: 0 wlen: 5
2020-03-25 06:27:48: pid 1: DEBUG: reaper handler
2020-03-25 06:27:48: pid 1: LOG: child process with pid: 2380 exits with status 256
2020-03-25 06:27:48: pid 1: LOG: fork a new child process with pid: 2406
2020-03-25 06:27:48: pid 1: DEBUG: reaper handler: exiting normally
2020-03-25 06:27:48: pid 2406: DEBUG: initializing backend status

Thanks again!
(0003285)
maiquelet   
2020-03-27 19:08   
Hello, from now on, my colleague 'juananglobalia' will take my place in troubleshooting the issue.
(0003296)
hoshiai   
2020-04-02 15:00   
Could you get a tcpdump between pgpool and PostgreSQL on this server?
We can confirm whether keepalive is working from this dump. If keepalive is enabled, TCP messages of length 0 will appear.
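A minimal sketch of such a capture, assuming it is run inside the pgpool container and the backend listens on port 5432 (the interface and host names are placeholders):

    # keepalive probes show up as TCP segments with "length 0"
    tcpdump -i eth0 -nn 'tcp port 5432 and host some_database'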


View Issue Details
ID: 601                              Category: [Pgpool-II] Bug
Severity: major                      Reproducibility: random
Date Submitted: 2020-03-31 17:28     Last Update: 2020-04-01 17:19
Reporter: mvineza                    Platform: Container
Assigned To: pengbo                  OS: Centos
Priority: normal                     OS Version: 7.7.1908
Status: assigned                     Product Version: 4.1.0
Product Build:                       Resolution: open
Projection: none                     ETA: none
Fixed in Version:                    Target Version:
Summary: Some SQL queries are missed
Description: I have AWX running inside kubernetes. The components (web and task/scheduler) connect to postgresql. All of them are containers running as kubernetes pods. For 2 months that setup has been running fine without connection issues between the pods.

web (pod) --> postgres (pod)
task/scheduler (pod) --> postgres (pod)

Due to the lack of persistent storage for postgres in the kubernetes cluster, I decided to move postgresql outside of kubernetes, so it is now a traditional virtual machine setup (1 master and 1 slave) instead of a container. Then I deployed pgpool inside kubernetes and removed the postgres pod. The current setup is:

web (pod) --> pgpool (pod) --> postgres master/slave (virtual machine)
task/scheduler (pod) --> pgpool (pod) --> postgres master/slave (virtual machine)

With that setup, I encounter connection issues: some SQL statements are not sent properly to the backend postgres. I notice this when I hit the following error from the task/scheduler pod:

  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/manager.py", line 82, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/query.py", line 408, in get
    self.model._meta.object_name
awx.main.models.jobs.Job.DoesNotExist: Job matching query does not exist.

I looked closely and found out that some SQL queries are not being handled by pgpool. In effect, the queries never reached the backend postgres servers.

I uploaded an image of the missed job in AWX that corresponds to the missed SQL query (missed_awx_job.png). In the image, job 2214 is successful and ran, job 2216 is missed, job 2217 is successful, and so on.

Checking the pgpool logs, I see the job ID 2217 is there.

➜ ~ ☿ kubectl logs dev-pgpool-77949cf588-mx25h | egrep '2020-03-31' | grep dev_execute_command | egrep 'INSERT INTO "main_activitystream"' | egrep '"id": 2217'
2020-03-31 05:04:28: pid 36: LOG: statement: INSERT INTO "main_activitystream" ("actor_id", "operation", "timestamp", "changes", "deleted_actor", "action_node", "object_relationship_type", "object1", "object2", "setting") VALUES (NULL, 'create', '2020-03-31T05:04:28.282003+00:00'::timestamptz, '{"name": "dev_execute_command", "description": "", "job_type": "run", "inventory": "awx-playbooks-dev-1", "project": "awx-playbooks-6", "playbook": "playbooks/execute_command.yaml", "scm_branch": "", "forks": 0, "limit": "", "verbosity": 0, "extra_vars": "{\"command\": \"date\"}", "job_tags": "", "force_handlers": false, "skip_tags": "", "start_at_task": "", "timeout": 0, "use_fact_cache": false, "job_template": "dev_execute_command-18", "allow_simultaneous": false, "instance_group": null, "diff_mode": false, "job_slice_number": 0, "job_slice_count": 1, "id": 2217, "credentials": [], "labels": []}', NULL, 'awx', '', 'job', '', '{}') RETURNING "main_activitystream"."id"
➜ ~ ☿

But job id 2216 is not there.

➜ ~ ☿ kubectl logs dev-pgpool-77949cf588-mx25h | egrep '2020-03-31' | grep dev_execute_command | egrep 'INSERT INTO "main_activitystream"' | egrep '"id": 2216'
➜ ~ ☿

That is a major issue since our AWX application will run several batch jobs regularly and there is a high chance that a job will be missed.

I tried playing around with some pgpool connection settings, like setting "child_max_connections" to "1", but no luck.

Here are the versions of the software involved:

AWX: 7.0.0
Pgpool: 4.1.1
Postgres: 12
Tags:
Steps To Reproduce:
Additional Information:
Attached Files: missed_awx_job.png (294,037 bytes) 2020-03-31 17:28
https://www.pgpool.net/mantisbt/file_download.php?file_id=776&type=bug
Notes
(0003290)
pengbo   
2020-04-01 09:45   
(Last edited: 2020-04-01 10:41)
If you are using postgres master/slave, you should set "master_slave_mode = on" and "replication_mode = off" in pgpool.conf.
Or you can copy the sample config file for streaming replication.

     # cp /etc/pgpool-II/pgpool.conf.sample-stream /etc/pgpool-II/pgpool.conf

Could you share more pgpool log and pgpool.conf?

    # kubectl logs dev-pgpool-77949cf588-mx25h | egrep '2020-03-31'

(0003291)
mvineza   
2020-04-01 13:43   
Here is the pgpool.conf.

# - pgpool Connection Settings -

listen_addresses = '*'
port = '5432'
socket_dir = '/tmp'
reserved_connections = 0

# - pgpool Communication Manager Connection Settings -

pcp_listen_addresses = '*'
pcp_port = 9898
pcp_socket_dir = '/tmp'
listen_backlog_multiplier = 2
serialize_accept = off

# - Backend Connection Settings -

backend_hostname0 = 'awxpostgresql0'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/pgsql/12/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_application_name0 = 'awxpostgresql0'
backend_hostname1 = 'awxpostgresql1'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/pgsql/12/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_application_name1 = 'awxpostgresql1'

# - Authentication -

enable_pool_hba = off
pool_passwd = 'pool_passwd'
authentication_timeout = '30'
allow_clear_text_frontend_auth = off

# - SSL Connections -

ssl = off
ssl_ciphers = 'HIGH:MEDIUM:+3DES:!aNULL'
ssl_prefer_server_ciphers = off
ssl_ecdh_curve = 'prime256v1'
ssl_dh_params_file = ''

# - Concurrent session and pool size -

num_init_children = 32
max_pool = '15'

# - Life time -

child_life_time = 300
child_max_connections = 1
connection_life_time = 0
client_idle_limit = 0

# - Where to log -

log_destination = 'stderr'

# - What to log -

log_line_prefix = '%t: pid %p: '
log_connections = off
log_hostname = off
log_statement = off
log_per_node_statement = off
log_client_messages = off
log_standby_delay = 'if_over_threshold'

# - Syslog specific -

syslog_facility = 'LOCAL0'
syslog_ident = 'pgpool'

# - File Locations -

pid_file_name = '/tmp/pgpool.pid'
logdir = '/var/log'

# - Connection Pooling -

connection_cache = on
reset_query_list = 'ABORT; DISCARD ALL'

# - Replication Mode -

replication_mode = off
replicate_select = off
insert_lock = off
lobj_lock_table = ''

# - Degenerate handling -

replication_stop_on_mismatch = off
failover_if_affected_tuples_mismatch = off

# - Load Balancing Mode -

load_balance_mode = 'on'
ignore_leading_white_space = on
white_function_list = ''
black_function_list = 'nextval,setval'
black_query_pattern_list = ''
database_redirect_preference_list = ''
app_name_redirect_preference_list = ''
allow_sql_comments = off
disable_load_balance_on_write = 'transaction'
statement_level_load_balance = off

# - Master/Slave Mode -

master_slave_mode = on
master_slave_sub_mode = 'stream'

# - Streaming -

sr_check_period = '30'
sr_check_user = 'postgres'
sr_check_password = 'postgres'
sr_check_database = 'postgres'
delay_threshold = 10000000

# - Special commands -

follow_master_command = ''

# - Healthcheck -

health_check_period = '30'
health_check_timeout = '10'
health_check_user = 'postgres'
health_check_password = 'postgres'
health_check_database = ''
health_check_max_retries = '5'
health_check_retry_delay = '5'
connect_timeout = 10000

# - Failover -

failover_command = '/failover.sh %d %h %p %D %m %H %M %P %r %R %N %S'
failback_command = ''
failover_on_backend_error = 'on'
detach_false_primary = off
search_primary_node_timeout = '0'

# - Online Recovery -

recovery_user = 'nobody'
recovery_password = ''
recovery_1st_stage_command = 'recovery_1st_stage'
recovery_2nd_stage_command = ''
recovery_timeout = 90
client_idle_limit_in_recovery = 0
auto_failback = off
auto_failback_interval = 60

# - Watchdog Settings -

use_watchdog = off
trusted_servers = ''
ping_path = '/bin'
wd_hostname = ''
wd_port = 9000
wd_priority = 1
wd_authkey = ''
wd_ipc_socket_dir = '/tmp'
delegate_IP = ''
if_cmd_path = '/sbin'
if_up_cmd = '/usr/bin/sudo /sbin/ip addr add $_IP_$/24 dev eth0 label eth0:0'
if_down_cmd = '/usr/bin/sudo /sbin/ip addr del $_IP_$/24 dev eth0'
arping_path = '/usr/sbin'
arping_cmd = '/usr/bin/sudo /usr/sbin/arping -U $_IP_$ -w 1 -I eth0'
clear_memqcache_on_escalation = on
wd_escalation_command = ''
wd_de_escalation_command = ''
failover_when_quorum_exists = off
failover_require_consensus = off
allow_multiple_failover_requests_from_node = off
enable_consensus_with_half_votes = off
wd_monitoring_interfaces_list = ''
wd_lifecheck_method = 'heartbeat'
wd_interval = 10
wd_heartbeat_port = 9694
wd_heartbeat_keepalive = 2
wd_heartbeat_deadtime = 30
heartbeat_destination0 = 'host0_ip1'
heartbeat_destination_port0 = 9694
heartbeat_device0 = ''
wd_life_point = 3
wd_lifecheck_query = 'SELECT 1'
wd_lifecheck_dbname = 'template1'
wd_lifecheck_user = 'nobody'
wd_lifecheck_password = ''

# - Other Settings -

relcache_expire = 0
relcache_size = 256
check_temp_table = catalog
check_unlogged_table = on
enable_shared_relcache = on
relcache_query_target = master
memory_cache_enabled = off
memqcache_method = 'shmem'
memqcache_memcached_host = 'localhost'
memqcache_memcached_port = 11211
memqcache_total_size = 67108864
memqcache_max_num_cache = 1000000
memqcache_expire = 0
memqcache_auto_cache_invalidation = on
memqcache_maxcache = 409600
memqcache_cache_block_size = 1048576
memqcache_oiddir = '/var/log/pgpool/oiddir'
white_memqcache_table_list = ''
black_memqcache_table_list = ''
(0003292)
mvineza   
2020-04-01 13:46   
The old pgpool logs are no longer there because our kubernetes cluster has limits on log size. So I will generate them again and put them here.
(0003293)
pengbo   
2020-04-01 14:50   
I checked your pgpool.conf; it is fine.

> The old pgpool logs is no longer there because our kubernetes cluster has limits on the size. So I will generate again and put it here.

OK. I will wait for your logs.
(0003294)
mvineza   
2020-04-01 17:19   
There is confidential data in the logs and it's hard to filter the output since there is so much of it. So I think this would work for our investigation:

2020-04-01 05:17:14: pid 11: DETAIL: connecting host=a.b.c.d port=44468
2020-04-01 05:17:14: pid 11: LOG: statement: SET TIME ZONE 'UTC'
2020-04-01 05:17:14: pid 11: LOG: statement: SELECT xyz
2020-04-01 05:17:14: pid 11: LOG: statement: SELECT xyz
2020-04-01 05:17:14: pid 11: LOG: statement: SELECT xyz
....
2020-04-01 05:17:15: pid 11: LOG: statement: SELECT xyz
2020-04-01 05:17:15: pid 11: LOG: statement: SELECT xyz
2020-04-01 05:17:15: pid 11: LOG: statement: SELECT xyz
2020-04-01 05:17:15: pid 38: LOG: statement: SELECT xyz
2020-04-01 05:17:16: pid 11: LOG: statement: INSERT INTO "main_activitystream" ... # --> the job ID will appear here
2020-04-01 05:17:16: pid 38: LOG: statement: SELECT xyz
2020-04-01 05:17:16: pid 11: LOG: statement: SELECT xyz
2020-04-01 05:17:16: pid 11: LOG: statement: SELECT xyz
2020-04-01 05:17:16: pid 38: LOG: statement: SELECT xyz
2020-04-01 05:17:16: pid 11: LOG: statement: SELECT xyz
2020-04-01 05:17:16: pid 38: LOG: statement: SELECT xyz
2020-04-01 05:17:16: pid 11: LOG: statement: DECLARE xyz
2020-04-01 05:17:16: pid 11: LOG: statement: FETCH FORWARD 2000 FROM "_django_curs_140286434461504_333"
2020-04-01 05:17:16: pid 38: LOG: statement: SELECT xyz
2020-04-01 05:17:16: pid 11: LOG: statement: CLOSE "_django_curs_140286434461504_333"

But in general, once a job in awx is launched, you will see something similar to the one above when "log_statement" is set to "on".

With "log_statement = 'off'" and "child_max_connections = 1", the pgpool logs would look like this whenever a job is launched in AWX.

2020-04-01 08:15:02: pid 1: LOG: Backend status file /var/log/pgpool_status does not exist
2020-04-01 08:15:02: pid 1: LOG: memory cache initialized
2020-04-01 08:15:02: pid 1: DETAIL: memcache blocks :64
2020-04-01 08:15:02: pid 1: LOG: pool_discard_oid_maps: discarded memqcache oid maps
2020-04-01 08:15:02: pid 1: LOG: Setting up socket for 0.0.0.0:5432
2020-04-01 08:15:02: pid 1: LOG: Setting up socket for :::5432
2020-04-01 08:15:02: pid 1: LOG: find_primary_node_repeatedly: waiting for finding a primary node
2020-04-01 08:15:02: pid 1: LOG: find_primary_node: primary node is 0
2020-04-01 08:15:02: pid 1: LOG: find_primary_node: standby node is 1
2020-04-01 08:15:02: pid 40: LOG: PCP process: 40 started
2020-04-01 08:15:02: pid 1: LOG: pgpool-II successfully started. version 4.1.1 (karasukiboshi)
2020-04-01 08:15:02: pid 1: LOG: node status[0]: 1
2020-04-01 08:15:02: pid 1: LOG: node status[1]: 2
2020-04-01 08:15:37: pid 11: LOG: pool_reuse_block: blockid: 0
2020-04-01 08:15:37: pid 11: CONTEXT: while searching system catalog, When relcache is missed
2020-04-01 08:15:37: pid 11: LOG: child exiting, 1 connections reached
2020-04-01 08:15:37: pid 1: LOG: child process with pid: 11 exits with status 256
2020-04-01 08:15:37: pid 1: LOG: fork a new child process with pid: 44
2020-04-01 08:15:37: pid 17: LOG: child exiting, 1 connections reached
2020-04-01 08:15:37: pid 1: LOG: child process with pid: 17 exits with status 256
2020-04-01 08:15:37: pid 1: LOG: fork a new child process with pid: 45
2020-04-01 08:15:37: pid 14: LOG: child exiting, 1 connections reached
2020-04-01 08:15:37: pid 1: LOG: child process with pid: 14 exits with status 256
2020-04-01 08:15:37: pid 1: LOG: fork a new child process with pid: 46
2020-04-01 08:15:38: pid 44: LOG: child exiting, 1 connections reached
2020-04-01 08:15:38: pid 1: LOG: child process with pid: 44 exits with status 256
2020-04-01 08:15:38: pid 1: LOG: fork a new child process with pid: 47
2020-04-01 08:15:38: pid 23: LOG: child exiting, 1 connections reached
2020-04-01 08:15:38: pid 1: LOG: child process with pid: 23 exits with status 256
2020-04-01 08:15:38: pid 1: LOG: fork a new child process with pid: 48
2020-04-01 08:15:40: pid 22: LOG: child exiting, 1 connections reached
2020-04-01 08:15:40: pid 1: LOG: child process with pid: 22 exits with status 256
2020-04-01 08:15:40: pid 1: LOG: fork a new child process with pid: 49
2020-04-01 08:15:40: pid 47: LOG: child exiting, 1 connections reached
2020-04-01 08:15:40: pid 1: LOG: child process with pid: 47 exits with status 256
2020-04-01 08:15:40: pid 1: LOG: fork a new child process with pid: 50
2020-04-01 08:15:45: pid 26: LOG: child exiting, 1 connections reached
2020-04-01 08:15:45: pid 1: LOG: child process with pid: 26 exits with status 256
2020-04-01 08:15:45: pid 1: LOG: fork a new child process with pid: 51

But in the case of the missing job, it is entirely absent from the pgpool logs. It seems it never even reached pgpool, because I can't even see the job ID.
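A hedged sketch of extra pgpool.conf logging that could confirm whether the client connection for the missing job ever reaches pgpool at all (these are standard Pgpool-II parameters; whether they are appropriate for this environment is an assumption):

    # log every incoming client connection, every frontend message, and each
    # statement together with the backend node it was sent to
    log_connections = on
    log_client_messages = on
    log_per_node_statement = on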


View Issue Details
ID: 597                              Category: [Pgpool-II] Bug
Severity: major                      Reproducibility: always
Date Submitted: 2020-03-13 02:11     Last Update: 2020-04-01 04:07
Reporter: dcvythoulkas               Platform: PostgreSQL 12
Assigned To:                         OS: Debian
Priority: high                       OS Version: 10.3
Status: new                          Product Version: 4.1.0
Product Build:                       Resolution: open
Projection: none                     ETA: none
Fixed in Version:                    Target Version:
Summary: pgpool2 fails to run the if_down_cmd and arping_cmd
Description: This is a two-node installation using watchdog, with the gateway as the trusted server.

When pgpool2 shuts down on the master node, the secondary successfully gets promoted to master and raises the delegated IP. However, the former master never releases the IP, so both nodes hold the IP. The only choice is to use 'ip addr del DELEGATED_IP/NETMASK dev NIC' to release the IP.
Also, the new master node does not notify the local network that it has acquired the delegated IP, so clients cannot connect.
Tags: virtual ip, watchdog
Steps To Reproduce: - Node1 master & Node2 standby
- Shutdown the pgpool2 process on node1. pgpool2 on node2 promotes to master and raises the delegated IP
- Node1 still has the delegated IP. Use the ip addr del command to remove it
- Node2 has not announced taking the delegated IP and clients cannot connect
Additional Information:
Attached Files: pgpool.conf (44,216 bytes) 2020-03-16 20:22
https://www.pgpool.net/mantisbt/file_download.php?file_id=770&type=bug
pgpool-cn1.log (23,214 bytes) 2020-03-16 20:22
https://www.pgpool.net/mantisbt/file_download.php?file_id=769&type=bug
pgpool-cn2.log (18,328 bytes) 2020-03-16 20:22
https://www.pgpool.net/mantisbt/file_download.php?file_id=768&type=bug
Notes
(0003269)
Muhammad Usama   
2020-03-13 23:31   
Can you please share the pgpool config and log files?
(0003271)
dcvythoulkas   
2020-03-16 20:22   
pgpool.conf is from node ...cn1, node cn2 has an equivalent configuration. The setup is deployed by Ansible.
Initially, cn2 was master and cn1 was standby. cn2 pgpool2 process was shut down and cn1 took over and raised the virtual IP.
However, cn2 never released the virtual IP and the local network was never notified about the change.
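A minimal sketch of a manual check that the watchdog commands can actually run, assuming if_down_cmd/arping_cmd similar to the attached configuration (the IP address, netmask and interface name are placeholders):

    # run as the OS user pgpool runs under, on the old and new master respectively
    ip addr del 192.0.2.10/24 dev eth0      # roughly what if_down_cmd executes
    arping -U 192.0.2.10 -w 1 -I eth0       # roughly what arping_cmd executes

    # if these require root, the commands in pgpool.conf are usually wrapped in
    # sudo, and the pgpool user then needs a matching NOPASSWD sudoers entry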


View Issue Details
ID: 600                              Category: [Pgpool-II] Bug
Severity: minor                      Reproducibility: have not tried
Date Submitted: 2020-03-30 21:07     Last Update: 2020-03-31 23:05
Reporter: sajithts                   Platform:
Assigned To: pengbo                  OS:
Priority: normal                     OS Version:
Status: assigned                     Product Version: 4.1.0
Product Build:                       Resolution: open
Projection: none                     ETA: none
Fixed in Version:                    Target Version:
Summary: Pgpool ii Node status 'waiting'
Description: When I try "show pool_status", some of the node statuses are shown as 'waiting'

I tried "pcp_attach_node -U postgres -h 192.168.29.168 2" from one of the standby. But issue still persists
Tags: pgpool
Steps To Reproduce:
Additional Information:
Attached Files: pgpool status.JPG (77,049 bytes) 2020-03-30 21:07
https://www.pgpool.net/mantisbt/file_download.php?file_id=775&type=bug
Notes
(0003287)
pengbo   
2020-03-31 17:26   
It is normal behaviour.

"waiting" means the PostgreSQL server is up and running but Pgpool-II has not connected to it yet.
"waiting" can safely be treated the same as "up".
Once a client connects to Pgpool-II, Pgpool-II connects to PostgreSQL and the status turns to "up".
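A minimal sketch of how to observe that, assuming pgpool listens on port 9999 and the postgres user/database can be used for the test (host, port, user and database are placeholders):

    # connect to pgpool as an ordinary client; this alone makes the pgpool
    # child process open connections to the backends
    psql -h 192.168.29.168 -p 9999 -U postgres postgres

    # from that session, check the per-node status reported by pgpool
    postgres=# show pool_nodes;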
(0003288)
sajithts   
2020-03-31 22:52   
@pengbo:

I tried "pcp_attach_node -U postgres -h 192.168.29.168 2" from one of the standby. Is there another way to connect clients to Pgpool-II?

Can you please elaborate the steps?

Thanks.
(0003289)
sajithts   
2020-03-31 23:05   
@pengbo:
This is my environment setup:
> 1 Master node
> 2 Standby nodes
> 1 PgPool-II node
> CentOS 7
> EDB Advanced Server Version 10
> EFM Version 3.9
> PgPool-II Version 4.1

I followed the steps below:
     Step 1) Set up streaming replication by editing the postgresql.conf file
     Step 2) Install and configure EFM 3.9 for automatic failover
     Step 3) Install and configure PgPool-II 4.1 for load balancing

Steps 1 & 2 are working fine,
but load balancing is not working.

Thanks.


View Issue Details
ID: 594                              Category: [Pgpool-II] Bug
Severity: major                      Reproducibility: unable to reproduce
Date Submitted: 2020-03-06 17:18     Last Update: 2020-03-25 15:01
Reporter: kanika.kamboj              Platform:
Assigned To: pengbo                  OS: Linux
Priority: urgent                     OS Version: 7.7
Status: assigned                     Product Version: 4.0.7
Product Build:                       Resolution: open
Projection: none                     ETA: none
Fixed in Version:                    Target Version:
Summary: Problem on Connection Pooling
Description: I am trying connection pooling on one server. PostgreSQL max_connections is 450, but I am still getting "64676 pgpool: wait for connection request" (36 times). Can you please help me initially configure pgpool on PostgreSQL 9.6? I have attached pgpool.conf.

Tags: pgpool-II
Steps To Reproduce:
Additional Information:
Attached Files: pgpool.conf (40,730 bytes) 2020-03-06 17:18
https://www.pgpool.net/mantisbt/file_download.php?file_id=759&type=bug
Notes
(0003253)
kanika.kamboj   
2020-03-06 17:27   
Logs are also not being generated. Pgpool version is 4.0.8.
(0003255)
pengbo   
2020-03-10 10:50   
> I am trying connection Pooling on one server. PostgreSQL max_connections are 450. But still getting 64676 pgpool: wait for connection request(36 times).

It is a normal feature.
"pgpool: wait for connection request" means that this pgpool child process can accept a client connection and is waiting for one.
Pgpool-II forks the number of pgpool child processes specified in the "num_init_children" parameter at startup.

See:
https://www.pgpool.net/docs/latest/en/html/runtime-config-connection.html#GUC-NUM-INIT-CHILDREN


> Logs are also not being generated. Pg pool version 4.0.8

Are you using 4.0.7?
There was a bug in 4.0.7 where no syslog output was produced. Please upgrade to Pgpool-II 4.0.8.
(0003258)
kanika.kamboj   
2020-03-11 15:35   
I am already using Pgpool 4.0.8, but logs are still not being generated.

I have attached the output of ps -ef | grep postgres. Please confirm pgpool is configured properly.

root 523 1 0 10:28 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 530 523 0 10:28 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 535 657 41 10:28 ? 00:01:09 postgres: postgres automation_tn_2g [local] DELETE
root 555 1 0 10:28 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 563 555 0 10:28 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 567 657 48 10:28 ? 00:01:19 postgres: postgres automation_tn_2g [local] DELETE
root 627 1 0 10:28 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 638 627 0 10:28 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 639 657 46 10:28 ? 00:01:17 postgres: postgres automation_tn_2g [local] DELETE
postgres 657 1 0 Feb25 ? 00:19:12 /opt/PostgreSQL/9.5/bin/postgres -D /PM_APPLICATION/pmautomation/PM/postgres-9.5-data
postgres 667 657 0 Feb25 ? 00:00:00 postgres: logger process
postgres 678 657 0 Feb25 ? 00:48:36 postgres: checkpointer process
postgres 679 657 0 Feb25 ? 00:01:39 postgres: writer process
postgres 680 657 0 Feb25 ? 00:25:01 postgres: wal writer process
postgres 681 657 0 Feb25 ? 00:37:01 postgres: autovacuum launcher process
postgres 682 657 0 Feb25 ? 00:41:32 postgres: stats collector process
root 696 1 0 10:28 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 705 696 0 10:28 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 708 657 46 10:28 ? 00:01:15 postgres: postgres airtel_automation_tn_2g [local] DELETE
root 1185 1 0 10:28 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 1195 1185 0 10:28 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 1202 657 46 10:28 ? 00:01:10 postgres: postgres automation_tn_2g [local] DELETE
root 1205 1 0 10:17 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 1215 1205 0 10:17 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 1223 657 38 10:17 ? 00:05:28 postgres: postgres automation_tn_2g [local] DELETE
root 1375 1 0 10:28 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 1383 1375 0 10:28 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 1388 657 40 10:28 ? 00:01:00 postgres: postgres automation_tn_2g [local] DELETE
postgres 1404 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
root 1525 1 0 10:29 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 1534 1525 0 10:29 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 1542 657 40 10:29 ? 00:00:57 postgres: postgres automation_tn_2g [local] DELETE
root 1708 1 0 10:29 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 1716 1708 0 10:29 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 1719 657 46 10:29 ? 00:01:03 postgres: postgres automation_tn_2g [local] DELETE
root 1908 1 0 10:29 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 1917 1908 0 10:29 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 1924 657 38 10:29 ? 00:00:50 postgres: postgres automation_tn_2g [local] DELETE
root 2000 1 0 10:29 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 2009 2000 0 10:29 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 2018 657 44 10:29 ? 00:00:56 postgres: postgres automation_tn_2g [local] DELETE
root 2119 1 0 10:29 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 2124 2119 0 10:29 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 2125 657 44 10:29 ? 00:00:55 postgres: postgres automation_tn_2g [local] DELETE
root 2455 1 0 10:29 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 2464 2455 0 10:29 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 2465 657 37 10:29 ? 00:00:42 postgres: postgres automation_tn_2g [local] DELETE
root 2577 1 0 10:29 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 2587 2577 0 10:29 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 2595 657 46 10:29 ? 00:00:50 postgres: postgres automation_tn_2g [local] DELETE
root 2671 1 0 10:29 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 2678 2671 0 10:29 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 2681 657 45 10:29 ? 00:00:48 postgres: postgres automation_tn_2g [local] DELETE
root 2772 1 0 10:29 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 2779 2772 0 10:29 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 2816 657 37 10:29 ? 00:00:38 postgres: postgres automation_tn_2g [local] DELETE
root 2941 1 0 10:29 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 2948 2941 0 10:29 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 2957 657 36 10:29 ? 00:00:36 postgres: postgres automation_tn_2g [local] DELETE
root 3100 1 0 10:29 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 3107 3100 0 10:29 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 3110 657 38 10:29 ? 00:00:36 postgres: postgres automation_tn_2g [local] DELETE
root 3236 1 0 10:29 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 3242 3236 0 10:29 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 3246 657 35 10:29 ? 00:00:33 postgres: postgres automation_tn_2g [local] DELETE
root 3374 1 0 10:29 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 3380 3374 0 10:29 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 3383 657 44 10:29 ? 00:00:39 postgres: postgres automation_tn_2g [local] DELETE
postgres 3575 657 0 10:30 ? 00:00:00 postgres: postgres dc 127.0.0.1(41826) idle
postgres 3576 657 7 10:30 ? 00:00:06 postgres: postgres dc 127.0.0.1(41827) INSERT
postgres 3582 657 3 10:30 ? 00:00:03 postgres: postgres dc 127.0.0.1(41828) idle
postgres 3583 657 3 10:30 ? 00:00:03 postgres: postgres dc 127.0.0.1(41829) idle
postgres 3585 657 0 10:30 ? 00:00:00 postgres: postgres dc 127.0.0.1(41830) idle
postgres 3586 657 0 10:30 ? 00:00:00 postgres: postgres dc 127.0.0.1(41831) idle
postgres 3587 657 1 10:30 ? 00:00:00 postgres: postgres dc 127.0.0.1(41832) idle
postgres 3590 657 0 10:30 ? 00:00:00 postgres: postgres dc 127.0.0.1(41833) idle
root 3600 1 0 10:30 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d db
root 3608 3600 0 10:30 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 3615 657 35 10:30 ? 00:00:29 postgres: postgres automation_tn_2g [local] DELETE
root 3628 1 0 10:30 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 3633 3628 0 10:30 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 3638 657 46 10:30 ? 00:00:36 postgres: postgres automation_tn_2g [local] DELETE
root 3663 1 0 10:30 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 3672 3663 0 10:30 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 3679 657 39 10:30 ? 00:00:29 postgres: postgres automation_tn_2g [local] DELETE
root 3689 1 0 10:30 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 3697 3689 0 10:30 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 3705 657 45 10:30 ? 00:00:32 postgres: postgres automation_tn_2g [local] DELETE
root 3724 1 0 10:30 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 3729 3724 0 10:30 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 3733 657 37 10:30 ? 00:00:25 postgres: postgres automation_tn_2g [local] DELETE
root 3752 1 0 10:30 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 3760 3752 0 10:30 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 3767 657 40 10:30 ? 00:00:25 postgres: postgres airtel_automation_tn_2g [local] DELETE
root 3780 1 0 10:30 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 3788 3780 0 10:30 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 3798 657 35 10:30 ? 00:00:20 postgres: postgres automation_tn_2g [local] DELETE
root 3803 1 0 10:30 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 3811 3803 0 10:30 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 3818 657 37 10:30 ? 00:00:21 postgres: postgres automation_tn_2g [local] DELETE
root 3827 1 0 10:30 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 3833 3827 0 10:30 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 3836 657 35 10:30 ? 00:00:20 postgres: postgres automation_tn_2g [local] DELETE
root 3850 1 0 10:30 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 3855 3850 0 10:30 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 3857 657 32 10:30 ? 00:00:18 postgres: postgres automation_tn_2g [local] DELETE
root 3919 3884 0 10:30 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 3925 3919 0 10:30 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 3934 657 48 10:30 ? 00:00:20 postgres: postgres automation_tn_2g [local] DELETE
root 3948 3884 0 10:30 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 3956 3948 0 10:30 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 3957 657 44 10:30 ? 00:00:16 postgres: postgres automation_tn_2g [local] DELETE
root 3979 3884 0 10:30 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 3984 3979 0 10:30 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 3986 657 33 10:30 ? 00:00:10 postgres: postgres automation_tn_2g [local] DELETE
root 4088 3884 0 10:31 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 4096 4088 0 10:31 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 4103 657 47 10:31 ? 00:00:11 postgres: postgres automation_tn_2g [local] DELETE
root 4157 3884 0 10:31 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 4162 4157 0 10:31 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 4166 657 40 10:31 ? 00:00:07 postgres: postgres automation_tn_2g [local] DELETE
root 4188 3884 0 10:31 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 4194 4188 0 10:31 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 4203 657 31 10:31 ? 00:00:04 postgres: postgres automation_tn_2g [local] DELETE
root 4612 3884 0 10:31 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 4621 4612 0 10:31 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 4625 657 38 10:31 ? 00:00:04 postgres: postgres automation_tn_2g [local] DELETE
root 5607 3884 0 10:31 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 5615 5607 0 10:31 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 5619 657 24 10:31 ? 00:00:00 postgres: postgres automation_tn_2g [local] COPY
root 5658 30149 0 10:31 pts/3 00:00:00 grep postgres
root 7532 1 0 10:19 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 7539 7532 0 10:19 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 7542 657 50 10:19 ? 00:06:11 postgres: postgres automation_tn_2g [local] DELETE
root 12677 1 0 10:20 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 12682 12677 0 10:20 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 12686 657 39 10:20 ? 00:04:12 postgres: postgres automation_tn_2g [local] DELETE
postgres 13053 657 1 08:50 ? 00:01:27 postgres: postgres dc 127.0.0.1(41692) idle
postgres 13570 1 0 Mar05 ? 00:00:03 /usr/bin/pgpool -f /etc/pgpool-II/pgpool.conf -n
postgres 13574 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
postgres 13575 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
postgres 13576 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
postgres 13578 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
postgres 13579 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
postgres 13580 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
postgres 13581 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
postgres 13583 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
postgres 13584 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
postgres 13585 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
postgres 13586 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
postgres 13587 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
postgres 13588 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
postgres 13589 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
postgres 13590 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
postgres 13591 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
postgres 13592 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
postgres 13593 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
postgres 13594 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
postgres 13595 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
postgres 13596 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
postgres 13597 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
postgres 13598 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
postgres 13599 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
postgres 13601 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
postgres 13603 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
postgres 13604 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
postgres 13605 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
postgres 13607 13570 0 Mar05 ? 00:00:00 pgpool: PCP: wait for connection request
postgres 13608 13570 0 Mar05 ? 00:00:00 pgpool: worker process
postgres 13609 13570 0 Mar05 ? 00:00:00 pgpool: health check process(0)
root 17174 1 0 10:22 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 17181 17174 0 10:22 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 17190 657 49 10:22 ? 00:04:23 postgres: postgres automation_tn_2g [local] DELETE
root 20643 1 0 10:24 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 20650 20643 0 10:24 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 20654 657 41 10:24 ? 00:02:56 postgres: postgres automation_tn_2g [local] DELETE
postgres 25145 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
root 25921 1 0 10:26 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 25928 25921 0 10:26 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 25929 657 48 10:26 ? 00:02:36 postgres: postgres automation_tn_2g [local] DELETE
postgres 28659 13570 0 Mar09 ? 00:00:00 pgpool: wait for connection request
root 29581 1 0 10:27 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 29588 29581 0 10:27 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 29594 657 39 10:27 ? 00:01:34 postgres: postgres automation_tn_2g [local] DELETE
root 30416 1 0 10:27 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 30423 30416 0 10:27 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 30426 657 41 10:27 ? 00:01:30 postgres: postgres automation_tn_2g [local] DELETE
root 30521 1 0 10:27 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 30527 30521 0 10:27 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 30532 657 49 10:27 ? 00:01:45 postgres: postgres automation_tn_2g [local] DELETE
root 30632 1 0 10:27 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 30642 30632 0 10:27 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 30643 657 40 10:27 ? 00:01:26 postgres: postgres automation_tn_2g [local] DELETE
root 30769 1 0 10:27 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 30777 30769 0 10:27 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 30780 657 46 10:27 ? 00:01:38 postgres: postgres automation_tn_2g [local] DELETE
root 30989 1 0 10:28 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 30995 30989 0 10:28 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 31000 657 39 10:28 ? 00:01:20 postgres: postgres automation_tn_2g [local] DELETE
root 31221 1 0 10:28 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 31228 31221 0 10:28 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 31232 657 48 10:28 ? 00:01:36 postgres: postgres automation_tn_2g [local] DELETE
root 31404 1 0 10:28 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 31411 31404 0 10:28 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 31416 657 39 10:28 ? 00:01:18 postgres: postgres airtel_automation_tn_2g [local] DELETE
root 31597 1 0 10:28 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 31607 31597 0 10:28 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 31608 657 47 10:28 ? 00:01:30 postgres: postgres automation_tn_2g [local] DELETE
root 31790 1 0 10:28 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 31800 31790 0 10:28 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 31802 657 47 10:28 ? 00:01:29 postgres: postgres automation_tn_2g [local] DELETE
postgres 31871 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
root 31997 1 0 10:28 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 32003 31997 0 10:28 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 32006 657 41 10:28 ? 00:01:16 postgres: postgres automation_tn_2g [local] DELETE
root 32207 1 0 10:28 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 32217 32207 0 10:28 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 32218 657 40 10:28 ? 00:01:12 postgres: postgres airtel_automation_tn_2g [local] DELETE
root 32333 1 0 10:28 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 32341 32333 0 10:28 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 32344 657 47 10:28 ? 00:01:23 postgres: postgres automation_tn_2g [local] DELETE
root 32543 1 0 10:28 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 32552 32543 0 10:28 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 32554 657 41 10:28 ? 00:01:10 postgres: postgres airtel_automation_tn_2g [local] DELETE
root 32680 1 0 10:28 pts/0 00:00:00 /bin/bash /opt/PostgreSQL/9.5/bin/psql -U postgres -d automation_tn_2g
root 32688 32680 0 10:28 pts/0 00:00:00 /opt/PostgreSQL/9.5/bin/psql.bin -U postgres -d automation_tn_2g
postgres 32698 657 41 10:28 ? 00:01:09 postgres: postgres automation_tn_2g [local] DELETE

Can you please help check whether connection pooling is configured and working properly?
(0003259)
pengbo   
2020-03-12 10:53   
> I am already using Pg pool 4.0.8 . But still logs are not generating.
If you start pgpool using systemd, you can check the log by using the command below:

    # journalctl -a | grep pgpool

If you want to redirect standard output and standard error to a log file,
you need to redirect when starting pgpool.

   # pgpool -n -f /etc/pgpool-II/pgpool.conf > /var/log/pgpool.log 2>&1 &

Also you can use "syslog" to output logs.
See example:
https://www.pgpool.net/docs/latest/en/html/example-cluster.html#EXAMPLE-CLUSTER-PGPOOL-CONFIG-LOG

> Please confirm pg pool is configured properly.


-----------
postgres 13574 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
postgres 13575 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
postgres 13576 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
postgres 13578 13570 0 Mar05 ? 00:00:00 pgpool: wait for connection request
-----------

Yes. I think pgpool is started properly.
(0003260)
kanika.kamboj   
2020-03-12 13:38   
Can you please suggest some good ways to test that connection pooling with Pgpool-II is working effectively?
(0003261)
kanika.kamboj   
2020-03-12 13:39   
How can we verify that pooling is working and that connections are being distributed?
(0003270)
pengbo   
2020-03-16 12:11   
Connect to pgpool and exit, then check "pg_stat_activity" to see if the connection is cached.

(1) Connect to pgpool.

    $ psql -h 127.0.0.1 -U test1 test1 -p 9999

(2) Exit

    test1=> \q

(3) Connect to PostgreSQL and execute "select * from pg_stat_activity".
      If the connection to database test1 is displayed, it means the connection pooling works properly.

    $ psql -h 127.0.0.1 -U postgres postgres -p 9999
    postgres=# select * from pg_stat_activity where backend_type = 'client backend' and datname = 'test1';
    ...
    16396 | test1 | 5654 | 10 | test1 | psql | | | -1 | 2020-03-11 10:30:26.283691+09 | | | 2020-03-11 10:30:26.285531+09 | Client | ClientRead | idle | | | | client backend
    (1 row)


You can also use the "show pool_pools" command to check the pool status.

https://www.pgpool.net/docs/latest/en/html/sql-show-pool-pools.html
(0003272)
kanika.kamboj   
2020-03-18 19:18   
The logs are still not being generated.
(0003282)
pengbo   
2020-03-25 15:01   
How did you start pgpool?


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
568 [Pgpool-II] Bug major always 2019-12-16 17:21 2020-03-24 17:21
Reporter: spaskalev Platform: Linux  
Assigned To: hoshiai OS: VMware Photon OS  
Priority: normal OS Version: 3.0  
Status: feedback Product Version: 4.0.7  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version:  
Summary: Reattaching detached primary doesn't bring it up
Description: I'm running postgres with two async standbys, external failover agent (repmgrd) and pgpool as a proxy that has all postgres nodes added and can figure out which node is the primary and send traffic to it.

I'm running pgpool with health check enabled so that failed nodes are automatically detached.

When the primary is detached (either by a failed health check or manually through pcp_detach_node) and then attached back with pcp_attach_node, pgpool continues to show its status as 'down' and will not send traffic to it.
Tags:
Steps To Reproduce: $ seq 0 2 | xargs -n 1 pcp_node_info -w
postgres-0 5432 2 -nan up primary 0 2019-12-16 08:01:14
postgres-1 5432 2 -nan up standby 0 2019-12-16 08:01:14
postgres-2 5432 2 -nan up standby 0 2019-12-16 08:01:14

$ pcp_detach_node -w 0
pcp_detach_node -- Command Successful

$ seq 0 2 | xargs -n 1 pcp_node_info -w
postgres-0 5432 3 -nan down primary 0 2019-12-16 08:02:44
postgres-1 5432 2 -nan up standby 0 2019-12-16 08:01:14
postgres-2 5432 2 -nan up standby 0 2019-12-16 08:01:14

$ pcp_attach_node -w 0
pcp_attach_node -- Command Successful

$ seq 0 2 | xargs -n 1 pcp_node_info -w
postgres-0 5432 3 -nan down primary 0 2019-12-16 08:02:44
postgres-1 5432 2 -nan up standby 0 2019-12-16 08:01:14
postgres-2 5432 2 -nan up standby 0 2019-12-16 08:01:14
Additional Information: Here is my pgpool config

sr_check_user = '...'
sr_check_password = '...'
sr_check_database = '...'
sr_check_period = 1

connect_timeout = 3000
health_check_timeout = 5
health_check_period = 1
health_check_max_retries = 0
health_check_user = '...'
health_check_password = '...'
health_check_database = '...'

search_primary_node_timeout = 0
detach_false_primary = on
failover_on_backend_error = on
failover_command = '/scripts/pgpool_failover.sh %h' # custom script for events

listen_addresses = '*'
port = 5432
socket_dir = '/var/run/postgresql'
listen_backlog_multiplier = 1
serialize_accept = on
pcp_listen_addresses = '' # Disable PCP over TCP
pcp_socket_dir = '/tmp'
enable_pool_hba = on
# Note: this is a file name, not a password
pool_passwd = 'pool_passwd'
allow_clear_text_frontend_auth = off

# - Concurrent session and pool size -
num_init_children = 1200
max_pool = 1

# - Life time -
serialize_accept = on
child_life_time = 0
child_max_connections = 1
connection_life_time = 3900
client_idle_limit = 720

log_destination = 'syslog'
syslog_facility = 'LOCAL0'
syslog_ident = 'pgpool'
pid_file_name = '/var/run/postgresql/pgpool.pid'
logdir = '/var/log/postgresql'

connection_cache = off
load_balancing = off

master_slave_mode = on
master_slave_sub_mode = 'stream'

backend_hostname0 = 'postgres-0'
backend_port0 = 5432
backend_weight0 = 0
backend_data_directory0 = '/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'

backend_hostname1 = 'postgres-1'
backend_port1 = 5432
backend_weight1 = 0
backend_data_directory1 = '/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'

backend_hostname2 = 'postgres-2'
backend_port2 = 5432
backend_weight2 = 0
backend_data_directory2 = '/data'
backend_flag2 = 'ALLOW_TO_FAILOVER'
Attached Files:
Notes
(0003023)
spaskalev   
2019-12-16 22:19   
The issue seems to be caused by the infinite search for a new primary node - when I detach the primary pgpool starts looking for a new primary

2019-12-16 13:06:44: pid 16222: LOG: find_primary_node: standby node is 1
2019-12-16 13:06:44: pid 16222: LOG: find_primary_node: standby node is 2
2019-12-16 13:06:45: pid 16222: LOG: find_primary_node: standby node is 1
2019-12-16 13:06:45: pid 16222: LOG: find_primary_node: standby node is 2
... - logs repeat

I've tried limiting search_primary_node_timeout and re-attaching the existing primary after pgpool has given up on finding a new primary; pgpool then correctly attaches it in an up state.
(0003038)
hoshiai   
2020-01-06 09:34   
> The issue seems to be caused by the infinite search for a new primary node - when I detach the primary pgpool starts looking for a new primary

You're right. In this case pgpool searches for a new primary forever, because pgpool looks for the new primary among the active standby nodes.

> I've tried limiting search_primary_node_timeout and re-attaching the existing primary after pgpool has given up on finding a new primary; pgpool then correctly attaches it in an up state.

Yes, your handling is fine.

(0003039)
spaskalev   
2020-01-06 20:39   
True, but I don't want to limit the primary search interval in case a real failover happens - as then pgpool would have the wrong notion of which node is the primary.

Alternatively, I need a way to trigger the primary search on some interval to detect a failover/switchover without any nodes going down.
(0003046)
hoshiai   
2020-01-08 10:52   
pgpool can execute only one failover process at a time (internally this covers failover, failback, attach node and detach node).
The search for a new primary is part of the failover process, so the next failover process (pcp_attach_node) is not executed until a new primary is found.

Currently, if pgpool detects that the primary node is down, it runs on the premise that another standby node will be promoted to the new primary.
This behavior exists because pgpool is not designed to be used together with an external failover system in SR mode.
(0003052)
spaskalev   
2020-01-08 20:18   
I agree, but I think that it is a valid use case for an external failover. In my setup I use multiple pgpool instances for different clients to provide high availability and load balancing over pgpool itself. The multiple instances of pgpool don't know about each other and don't use the watchdog/virtual ip mechanism. This way multiple pgpool instances running on different machines can be used concurrently.

If one of the pgpool instances temporarily loses connectivity to the Postgres primary, that doesn't mean it should elect a new primary - only that it lost connectivity. Then after a while, say, the primary comes back (I currently re-attach it manually via a cronjob, but I see that this is now available as a feature in pgpool 4.1.0) - and I would expect pgpool to just start proxying traffic to it again.

All of that without triggering failover on the actual postgres node. This way the behavior of my setup is decoupled and I can modify different parts without changing the rest.
(0003054)
hoshiai   
2020-01-09 11:24   
I understand your thinking.
However, I think the watchdog feature would also satisfy this. If you use the watchdog in pgpool, it is not a problem for one pgpool node to lose the primary node temporarily, and multiple pgpool instances running on different machines can still be used concurrently (without using a VIP).
In general, it is a very serious incident if the primary node is detected as down; pgpool can't continue until this problem is resolved.

Currently, we are starting proposals and development for the next pgpool version. If you need a new feature, please suggest it on the ML.

(0003055)
hoshiai   
2020-01-09 11:32   
Also, if the node is only down temporarily, you can avoid this by making the failover condition less sensitive (for example, by increasing health_check_max_retries or health_check_timeout).
(0003275)
spaskalev   
2020-03-22 16:00   
I agree, but we are rather committed to our current architecture; switching to the watchdog would need to be implemented, properly tested and so on, so I have to find a way around this for now without an architecture change.

Currently I have patched the health check processes to skip the check if BACKEND_INFO(*node_id).role != ROLE_PRIMARY.

This change appears to work fine for our case:
- if the primary fails, 'failover_on_backend_error' immediately detects this, as the primary is constantly in use, and pgpool starts to look for a new primary;
- if a standby fails, the health check will disconnect it from the pool.

Let me know what you think of this - I can send a patch where this is behind an option.

Regards,
Stanislav
(0003276)
spaskalev   
2020-03-22 16:01   
My bad, I meant "to do the health check only if BACKEND_INFO(*node_id).role != ROLE_PRIMARY".
(0003277)
hoshiai   
2020-03-23 16:37   
I think that's fine, as long as the external failover agent reliably promotes a new primary from the active standby nodes whenever a failover is triggered by failover_on_backend_error in pgpool.

In other words, it is a problem if the primary isn't switched when failover_on_backend_error is triggered.
(0003278)
hoshiai   
2020-03-23 16:49   
In addition, pgpool can trigger failover at times other than the health check and failover_on_backend_error: pgpool performs failover when PostgreSQL is shut down while a PostgreSQL session is still connected. Please be careful about this case.
(0003279)
spaskalev   
2020-03-23 17:44   
The shutdown case, yes - this is good. I wonder about the 'BACKEND_INFO(*node_id).role != ROLE_PRIMARY' check - after a few hours of testing I got a failure there and pgpool ran a health check for the primary.

I have now changed my patch to

      /*
       * Skip healthcheck for primary nodes
       */
      if ((*node_id) == REAL_PRIMARY_NODE_ID) {
         sleep(30);
         continue;
      }

in the main health check loop. Is this the proper way to get the current primary node id ?

Regards
(0003280)
spaskalev   
2020-03-24 15:24   
This could be a valid feature, applicable to other setups I guess - different, dynamic health check parameters depending on the node's role, so that replicas that aren't loaded can fail over faster, while primaries that are heavily loaded can have more relaxed health check settings. I know I can configure this per node, but it's not dynamic.
(0003281)
hoshiai   
2020-03-24 17:21   
> in the main health check loop. Is this the proper way to get the current primary node id ?

Yes, you're right. REAL_PRIMARY_NODE_ID is better than ROLE_PRIMARY; BACKEND_INFO.role changes status momentarily during failover().
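
For reference, a tiny standalone sketch of the guard discussed in this thread (the node layout, do_health_check() and the hard-coded REAL_PRIMARY_NODE_ID value below are illustrative stand-ins, not pgpool code):

    /* Standalone sketch: skip the health check for whichever node is the
     * current primary; standbys are still probed.  In pgpool the primary is
     * identified via the REAL_PRIMARY_NODE_ID macro; here it is hard-coded. */
    #include <stdio.h>

    #define NUM_NODES            3
    #define REAL_PRIMARY_NODE_ID 0   /* stand-in for pgpool's macro */

    static void do_health_check(int node_id)
    {
        printf("health check on standby node %d\n", node_id);
    }

    int main(void)
    {
        for (int node_id = 0; node_id < NUM_NODES; node_id++)
        {
            /* The primary is constantly in use, so a broken primary is
             * already caught by failover_on_backend_error; skip its check. */
            if (node_id == REAL_PRIMARY_NODE_ID)
                continue;

            do_health_check(node_id);
        }
        return 0;
    }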


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
595 [Pgpool-II] Bug minor N/A 2020-03-12 12:18 2020-03-20 17:49
Reporter: gregn123 Platform:  
Assigned To: t-ishii OS:  
Priority: normal OS Version:  
Status: resolved Product Version: 4.1.0  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version: 4.1.2  
Summary: Incorrect message length validation in do_SCRAM() - backend authentication
Description: There is an incorrect message length validation performed during backend authentication in the do_SCRAM() function (src/auth/pool_auth.c).

               /* Read next packend */
                pool_read(backend, &kind, sizeof(kind));
                pool_read(backend, &len, sizeof(len));
                if (kind != 'R')
                        ereport(ERROR,
                                        (errmsg("backend authentication failed"),
                                         errdetail("backend response with kind \'%c\' when expecting \'R\'", kind)));
                message_length = ntohl(len);
                if (len <= 8)
                        ereport(ERROR,
                                        (errmsg("backend authentication failed"),
                                         errdetail("backend response with no data ")));


The code is currently checking "len <= 8", but len is in network byte order (big-endian).
It is surely meant to be checking "message_length" instead, which is "len" converted to host byte order (see the previous line of code).
Under (Intel) Linux, which is little-endian, the value of "len" will be a large number and thus render the current error condition check ineffective [for example, in one case that I debugged, an example value of len was 134217728 (0x08000000), meaning that message_length was actually 8].
Additionally, it seems the "<=" check should actually be "<", based on the length values that I see when debugging this code.
I've attached a proposed patch (pgpool2_pool_auth_do_scram.patch).
Please review.
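
To make the byte-order problem concrete, here is a small standalone sketch (not the attached patch, which changes pool_auth.c itself) showing why the comparison has to be made on the converted value, and with "<" rather than "<=":

    /* Standalone sketch: the raw wire value "len" is useless for range
     * checks on a little-endian host; the converted message_length is what
     * must be compared against the 8-byte minimum (length + auth kind). */
    #include <stdio.h>
    #include <stdint.h>
    #include <arpa/inet.h>

    int main(void)
    {
        int32_t len = htonl(8);              /* as received: 0x08000000 on Intel */
        int32_t message_length = ntohl(len); /* converted to host byte order: 8 */

        printf("len = %d, message_length = %d\n", len, message_length);

        /* The old check "len <= 8" never fires on little-endian hosts,
         * because len is 134217728 here.  The intended check rejects any
         * response shorter than the 4-byte length + 4-byte auth-kind header. */
        if (message_length < 8)
            printf("backend authentication failed: response with no data\n");
        else
            printf("length check passes\n");
        return 0;
    }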
Tags:
Steps To Reproduce:
Additional Information:
Attached Files: pgpool2_pool_auth_do_scram.patch (572 bytes) 2020-03-12 12:18
https://www.pgpool.net/mantisbt/file_download.php?file_id=766&type=bug
Notes
(0003262)
t-ishii   
2020-03-12 15:35   
Good catch! I will take care of the patch.
(0003263)
t-ishii   
2020-03-12 15:52   
Your patch looks good. Can you please share your name (or email etc.) to be credited for the patch?
(0003266)
gregn123   
2020-03-13 09:24   
Greg Nancarrow (Fujitsu Australia)
(0003267)
t-ishii   
2020-03-13 09:34   
Patch pushed. Thanks!


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
596 [Pgpool-II] Bug minor always 2020-03-12 14:17 2020-03-13 17:27
Reporter: gregn123 Platform:  
Assigned To: t-ishii OS:  
Priority: normal OS Version:  
Status: feedback Product Version: 4.1.0  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version: 4.1.2  
Summary: Problems in watchdog processing json data
Description: In the watchdog source code (src/watchdog/wd_json_data.c), there are some instances of bad handling of values read from json data.
For example:
1) The boolean pool configuration settings "load_balance_mode" and "master_slave_mode" are read using json_get_int_value_for_key(), resulting in 4-bytes being written into their location within the POOL_CONFIG, yet (being bool) they are only 1-byte long. This corrupts the values of the structure members following them.
2) Similarly, when parsing node function json data, "Flags" is read using json_get_int_value_for_key(), resulting in 4-bytes being written into an "unsigned char flags" variable on the stack, overwriting 3-bytes of stack memory following it. On a big-endian system (e.g. Solaris-sparc or Linux for IBM Z), this causes regression test "013.watchdog_failover_require_consensus" to fail, since 0 is written into Flags, rather than the intended value which is in the least significant byte of the int value written.

I have attached a patch to correct these issues (pgpool2_watchdog_json_handling_issues.patch).
Please review.
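
For anyone reading along, here is a standalone sketch of the corruption in point 1) and the straightforward way to avoid it (fake_pool_config and fake_json_get_int are made-up stand-ins, not pgpool code or the attached patch):

    /* Standalone sketch: writing a 4-byte int into a 1-byte bool inside a
     * struct clobbers the members that follow it. */
    #include <stdio.h>
    #include <string.h>
    #include <stdbool.h>

    struct fake_pool_config
    {
        bool load_balance_mode;   /* 1 byte */
        bool master_slave_mode;   /* 1 byte */
        char marker[6];           /* neighbours that must stay intact */
    };

    /* Stand-in for json_get_int_value_for_key(): writes sizeof(int) bytes. */
    static void fake_json_get_int(int value, void *out)
    {
        memcpy(out, &value, sizeof(int));
    }

    int main(void)
    {
        /* Buggy pattern: hand the bool's address to an int-sized writer.
         * The 4-byte store spills into master_slave_mode and marker[]. */
        struct fake_pool_config cfg = { false, true, {0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f} };
        fake_json_get_int(1, &cfg.load_balance_mode);
        printf("buggy: master_slave_mode=%d marker[0]=0x%02x\n",
               (int) cfg.master_slave_mode,
               (unsigned int) (unsigned char) cfg.marker[0]);

        /* Fixed pattern: read into an int first, then assign the bool. */
        struct fake_pool_config cfg2 = { false, true, {0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f} };
        int tmp = 0;
        fake_json_get_int(1, &tmp);
        cfg2.load_balance_mode = (tmp != 0);
        printf("fixed: master_slave_mode=%d marker[0]=0x%02x\n",
               (int) cfg2.master_slave_mode,
               (unsigned int) (unsigned char) cfg2.marker[0]);
        return 0;
    }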

Tags: watchdog
Steps To Reproduce: Run the pgpool2 regression tests on a big-endian system (e.g. Solaris-sparc or Linux on IBM Z), and test 013 will fail.
(To build pgpool2 properly on these systems, I found it necessary to specify -DWORDS_BIGENDIAN in CFLAGS when running the configure script, as it's otherwise not accounting for it. My other fix for SCRAM authentication may also be needed - see issue 0000595).
The corruptions will occur on all systems, but these may or may not cause obvious issues.
Additional Information:
Attached Files: pgpool2_watchdog_json_handling_issues.patch (3,237 bytes) 2020-03-12 14:17
https://www.pgpool.net/mantisbt/file_download.php?file_id=767&type=bug
Notes
(0003264)
t-ishii   
2020-03-12 20:56   
Thank you! Patch looks good. I am going to commit it. (please let me know your name so that I can credit your name in a commit message).
(0003265)
gregn123   
2020-03-13 09:23   
Greg Nancarrow (Fujitsu Australia)
(0003268)
t-ishii   
2020-03-13 10:50   
Patch pushed. Thanks!


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
588 [Pgpool-II] General major always 2020-02-27 18:05 2020-03-11 12:52
Reporter: loay Platform:  
Assigned To: pengbo OS:  
Priority: normal OS Version:  
Status: feedback Product Version: 4.0.2  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version:  
Summary: I can't stop/promote a backend node (master or slave) from pgpoolAdmin
Description: Hello,

I have set up two PostgreSQL version 11 servers on Debian 10 using repmgr.

I have installed pgpool 4.0.2 on a third server to load balance the traffic toward the two backend servers. When I tried to stop a node from pgpoolAdmin I got the attached error.

 
Tags: error, pgpool in load balancing mode
Steps To Reproduce:
Additional Information: Repmgr reports that everything is OK.


Attached Files: Capture.PNG (64,843 bytes) 2020-02-27 18:05
https://www.pgpool.net/mantisbt/file_download.php?file_id=752&type=bug
pool_passwd (82 bytes) 2020-02-27 18:05
https://www.pgpool.net/mantisbt/file_download.php?file_id=751&type=bug
pool_hba.conf (3,458 bytes) 2020-02-27 18:05
https://www.pgpool.net/mantisbt/file_download.php?file_id=750&type=bug
pgpool.conf (38,115 bytes) 2020-02-27 18:05
https://www.pgpool.net/mantisbt/file_download.php?file_id=749&type=bug
pcp.conf (42 bytes) 2020-02-27 18:05
https://www.pgpool.net/mantisbt/file_download.php?file_id=748&type=bug
Notes
(0003237)
pengbo   
2020-03-04 12:57   
Did you install "pgpool-II-pgxx-extensions" and create the pgpool_recovery extension in template1?

$ yum install pgpool-II-pgxx-extensions
$ psql template1 -c "CREATE EXTENSION pgpool_recovery"


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
516 [Pgpool-II] Bug major always 2019-05-22 05:51 2020-03-06 16:16
Reporter: Fowler Platform: Linux  
Assigned To: pengbo OS: Ubuntu  
Priority: high OS Version: Xenial 16.04 LTS  
Status: resolved Product Version: 4.0.4  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version:  
Summary: ping probes fail with long hostnames due to small buffer
Description: pgpool calls "ping -q -c3 <hostname>" and parses the result to ascertain if a host is up or down. Unfortunately it uses a rather small buffer to read the output and the last line of the ping command can get truncated. This means that pgpool assumes the host is down.

The offending function is in src/watchdog/wd_ping.c
On line 177 a buffer called "result" of size WD_MAX_PING_RESULT is allocated on the stack. This is only 256 bytes in size. The output of the ping process is read through the outfd in to this.
On line 190 get_result(result) is called on the possibly truncated output. If it fails to parse, it will fail with the error "average RTT value is not greater than zero" and the probe will fail.

Tags:
Steps To Reproduce: Set your hostname to something long (on my offending machines, it's 52 characters long). Use hostnames in pgpool.conf as the Postgres backends.
It doesn't matter if you use the short version of the hostname; if your search path is set in /etc/resolv.conf, it will be expanded to the FQDN.

Setting debug level high will show the ping data that is being parsed. You'll notice the truncations on the last line.
Additional Information: You can work around this by wrapping ping and setting ping_path in pgpool.conf to point to the wrapper. Here's the wrapper I use. It discards all but the last line, since the parser locates that line simply by skipping the first 4 "/" characters.

#!/bin/bash
# Run the real ping with the original arguments, then pass only the final
# summary line (the one pgpool's parser actually needs) back to pgpool.
output=$(/bin/ping "$@")
echo "$output" | tail -n 1

Ideally the buffer size could simply be increased, or the output file descriptor could be read directly until a certain character limit was reached, removing the need for the fixed-size buffer altogether.
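
For what it's worth, here is a standalone sketch of the second idea - reading the whole ping output through the pipe into a buffer that grows as needed, so a long FQDN can never truncate the summary line (this is illustrative code, not pgpool's wd_ping.c):

    /* Standalone sketch: run ping, read all of its output into a growing
     * buffer, and print the complete text.  256 is only the starting size,
     * mirroring the current WD_MAX_PING_RESULT. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        FILE *fp = popen("ping -q -c 3 127.0.0.1", "r");
        if (fp == NULL)
        {
            perror("popen");
            return 1;
        }

        size_t cap = 256, used = 0;
        char *result = malloc(cap);
        if (result == NULL)
            return 1;

        for (;;)
        {
            if (used + 1 >= cap)          /* running out of room: double it */
            {
                char *bigger = realloc(result, cap * 2);
                if (bigger == NULL)
                    break;
                result = bigger;
                cap *= 2;
            }
            size_t n = fread(result + used, 1, cap - used - 1, fp);
            if (n == 0)
                break;
            used += n;
        }
        result[used] = '\0';
        pclose(fp);

        /* The parser can now safely look for the last line (the min/avg/max
         * RTT summary); nothing gets cut off however long the hostname is. */
        printf("%s", result);
        free(result);
        return 0;
    }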
Attached Files:
Notes
(0003252)
pengbo   
2020-03-06 16:16   
Sorry for the late response.

Bug is fixed:
https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=b5caee4076a529721d8205b66c3ca74bacbcc1dc


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
583 [Pgpool-II] General major always 2020-02-12 22:54 2020-03-05 10:20
Reporter: lyon0619 Platform:  
Assigned To: pengbo OS:  
Priority: high OS Version:  
Status: resolved Product Version: 4.1.0  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version:  
Summary: pcp_recovery_node returns "unable to connect to master node"
Description: Hi Support,
I am new to Pgpool-II and going crazy over PCP connecting to the PG backend nodes.
PG: 12
Pgpool: 4.1

I am getting the errors below and have no clue why (other PCP commands run successfully).
[root@dev-postgresdb-test1 etc]# pcp_recovery_node -h 10.212.164.20 -p 9898 -U postgres -n 2 -d
Password:
DEBUG: recv: tos="m", len=8
DEBUG: recv: tos="r", len=21
DEBUG: send: tos="D", len=6
DEBUG: recv: tos="E", len=95
ERROR: node recovery failed, unable to connect to master node: 0



[root@dev-postgresdb-test1 etc]# pcp_watchdog_info -p 9898 -h 10.212.164.20 -U pgpool
Password:
3 YES dev-postgresdb-test1:9999 Linux dev-postgresdb-test1 dev-postgresdb-test1

dev-postgresdb-test1:9999 Linux dev-postgresdb-test1 dev-postgresdb-test1 9999 9000 4 MASTER
dev-postgresdb-test2:9999 Linux dev-postgresdb-test2 dev-postgresdb-test2 9999 9000 7 STANDBY
dev-postgresdb-test3:9999 Linux dev-postgresdb-test3 dev-postgresdb-test3 9999 9000 7 STANDBY

DEBUG: send: tos="X", len=4

[root@dev-postgresdb-test1 etc]# pcp_node_info -h 10.212.164.20 -p 9898 -U postgres -n 0
Password:
dev-postgresdb-test1 5422 2 0.333333 up primary 0 2020-02-12 21:10:36

I would appreciate it if you could look into it.

Thanks

Lyon
Tags: pcp
Steps To Reproduce:
Additional Information:
Attached Files: pgpool.conf (42,168 bytes) 2020-02-13 16:05
https://www.pgpool.net/mantisbt/file_download.php?file_id=747&type=bug
postgres_dbPrivs.log (1,163 bytes) 2020-02-13 16:05
https://www.pgpool.net/mantisbt/file_download.php?file_id=746&type=bug
Notes
(0003196)
pengbo   
2020-02-13 10:23   
(Last edited: 2020-02-13 10:23)
Did you set "recovery_user" and "recovery_password" in pgpool.conf?
Also "recovery_user" needs superuser privilege.

See example:
https://www.pgpool.net/docs/latest/en/html/example-cluster.html#EXAMPLE-CLUSTER-PGPOOL-CONFIG-ONLINE-RECOVERY

(0003197)
lyon0619   
2020-02-13 16:05   
Hi Pengbo,
Thanks for your reply. I should mention that I am following the procedure below to set up the PG + Pgpool cluster.
https://www.pgpool.net/docs/latest/en/html/example-cluster.html
Linux 7.2
PG12
Pgpool 4.1

Yes, I set recovery_user to postgres, which should have the Superuser role in PostgreSQL.
recovery_password is left blank, as pool_passwd is in place for authentication.

[root@dev-postgresdb-test1 etc]# ls -lht pool_passwd
-rw-r--r-- 1 root root 88 Jan 21 13:38 pool_passwd
[root@dev-postgresdb-test1 etc]# cat pool_passwd
postgres:md5a2b288a972fa230e9e7574715333d841
pgpool:md539ef55ae8fcc55dde7c7d521deefdae5
[root@dev-postgresdb-test1 etc]# pwd
/opt/pgpool410/etc

Attached please find my pgpool.conf and query results for postgres in PostgreSQL12.
(0003198)
lyon0619   
2020-02-13 16:12   
By the way,
My Pgpool installation is owned by root while PostgreSQL is owned by the postgres OS user. Could this be the issue?
Thanks
(0003238)
pengbo   
2020-03-04 13:50   
Sorry for the late response.

It seems that connecting to the backend PostgreSQL failed.
Could you connect to PostgreSQL node 0 using the password included in "pool_passwd"?

# psql -h dev-postgresdb-test1 -p 5422 -U postgres
(0003243)
lyon0619   
2020-03-04 20:29   
Setting a plain-text password in pool_passwd resolved the issue.

Thanks Pengbo
(0003244)
pengbo   
2020-03-05 10:19   
OK. I am going to close this issue.


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
586 [Pgpool-II] General minor have not tried 2020-02-26 17:47 2020-03-03 15:47
Reporter: ducpham Platform: Linux  
Assigned To: t-ishii OS: Centos  
Priority: normal OS Version: 7.1908  
Status: assigned Product Version: 4.1.1  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version:  
Summary: Do not update status of slave host on PGPool
Description: Dear pgPool team,

We are using a standard master-slave setup with pgpool 4.1.1.

Currently, we have a problem as below:

① master (on) + slave (on): status on pgpool => up - up

② master (on) + slave (off): status on pgpool => up - down

③ master (on) + slave (re-on): status on pgpool => up - down

Could you please tell me whether there is a setting in pgpool.conf to update the status of the node on pgpool?

Or is this abnormal behavior in the latest version 4.1.1 of Pgpool?

Thank you very much for your support !
Tags:
Steps To Reproduce:
Additional Information:
Attached Files:
Notes
(0003226)
t-ishii   
2020-02-27 09:02   
> master(on) + slave(re-on): status on pgpool => up-down
> Could you please tell me about that do having an setting on pgpool.conf to update status of node on pgpool ?
> or that is abnormal point on latest version 4.1.1 of PgPool ?
If by "re-on" you mean starting the slave again, then the behavior you are seeing has been normal since Pgpool-II was born, and it is explicitly stated in the doc:
https://www.pgpool.net/docs/latest/en/html/runtime-config-failover.html
"A PostgreSQL node detached by failover or switch over will never return to the previous state (attached state). "

In this case you can use pcp_attach_node to bring the slave back to "up", as explained in the doc.

Or you can configure "auto_failback", which is new in 4.1, so that the node is re-attached without issuing the pcp_attach_node command. See the doc for more details.
(0003228)
ducpham   
2020-02-27 10:57   
Dear Mr. t-ishii,
Thank you very much for your support.
I have been using pcp_attach_node as below
> pcp_recovery_node 60 10.116.226.186 9898 postgres postgres 1
But this is the current status of the nodes when using the "show pool_nodes" command:
 0 | 10.116.226.63  | 5432 | up         | 0.500000 | primary | 4 | true  | 0 |           |       | 2020-02-26 15:29:14
 1 | 10.116.226.155 | 5432 | quarantine | 0.500000 | standby | 0 | false | 0 | streaming | async | 2020-02-26 16:42:30
(2 rows)

=> The status of the slave changed from "down" to "quarantine".

On the other hand, when I use "pcp_node_info" to show the status of the slave node, I get a different result - the status is "down":

-bash-4.2$ pcp_node_info -h localhost -U postgres 1
Password:
10.116.226.155 5432 3 0.500000 down standby 0 streaming async


Is there anything abnormal in this case?
(0003229)
ducpham   
2020-02-27 12:13   
I'm sorry, the pcp_attach_node command that I am using is:
       pcp_attach_node -U postgres -h 10.51.219.14 -p 9898 1
(0003230)
t-ishii   
2020-02-27 13:29   
You need to turn on enable_consensus_with_half_votes because you have only 2 Pgpool-II nodes. We recommend having at least 3 Pgpool-II nodes.
See release note (https://www.pgpool.net/docs/latest/en/html/release-4-1-0.html) for more details.
(0003236)
ducpham   
2020-03-03 15:47   
Thank you very much for the support.

I stopped the pgpool service and deleted the file /tmp/pgpool_status.
After that I restarted the pgpool service and re-ran this command:
> pcp_attach_node -U postgres -h 10.51.219.14 -p 9898 1
And it works OK in my case.


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
577 [Pgpool-II] Bug major always 2020-01-22 23:09 2020-03-02 11:21
Reporter: raj.pandey1982@gmail.com Platform: Linux  
Assigned To: t-ishii OS: centos  
Priority: high OS Version: OS Version x86_6  
Status: feedback Product Version: 4.1.0  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version:  
Summary: Pgpool-II goes unstable after DB failover
Description: I have 2 postgres master-slave nodes.
Each having POSGRESDB/pgpoolI 4.1.0 configured on same node as Master and stand by.
1 virtual IP.


1st scenario - failover of the Pgpool-II node itself is going well:

a - Master-to-slave pgpool failover happens and the slave acquires the VIP as the new master.
b - Old-slave-to-old-master pgpool failover happens and the old master comes back as the new master; the old slave also comes up as the new slave.
c - VIP delegation happens well.
d - The Postgres admin tool is able to reconnect in the same session.
e - The application (frontend) also acquires connections.
f - Connection pooling happens.
g - Read load balancing goes through.
 
2nd scenario - automatic DB failover (which is very important):

a - failover_command worked well when I stopped the master DB (pgpool-poc01.novalocal:5432); the slave DB (pgpool-poc02.novalocal:5432) was opened as the new master.

BUT:

b - the pgpool log shows a great many errors on both the master and slave nodes and never reaches a stable state, so neither pgAdmin nor the application/frontend is able to connect to the newly promoted DB.

Log:-
2020-01-22 12:39:11: pid 7492:DETAIL: postmaster on DB node 0 was shutdown by administrative command
2020-01-22 12:39:11: pid 7492:LOG: received degenerate backend request for node_id: 0 from pid [7492]
2020-01-22 12:39:11: pid 20780:LOG: new IPC connection received
2020-01-22 12:39:11: pid 20780:LOG: watchdog received the failover command from local pgpool-II on IPC interface
2020-01-22 12:39:11: pid 20780:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
2020-01-22 12:39:11: pid 20780:LOG: we have got the consensus to perform the failover
2020-01-22 12:39:11: pid 20780:DETAIL: 1 node(s) voted in the favor
2020-01-22 12:39:11: pid 20160:LOG: reading and processing packets
2020-01-22 12:39:11: pid 20160:DETAIL: postmaster on DB node 0 was shutdown by administrative command
2020-01-22 12:39:11: pid 20160:LOG: received degenerate backend request for node_id: 0 from pid [20160]
2020-01-22 12:39:11: pid 14003:LOG: reading and processing packets
2020-01-22 12:39:11: pid 14003:DETAIL: postmaster on DB node 0 was shutdown by administrative command
2020-01-22 12:39:11: pid 14003:LOG: received degenerate backend request for node_id: 0 from pid [14003]
2020-01-22 12:39:11: pid 20780:LOG: new IPC connection received
2020-01-22 12:39:11: pid 20780:LOG: new IPC connection received
2020-01-22 12:39:11: pid 20780:LOG: watchdog received the failover command from local pgpool-II on IPC interface
2020-01-22 12:39:11: pid 20780:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
2020-01-22 12:39:11: pid 20780:LOG: we have got the consensus to perform the failover
2020-01-22 12:39:11: pid 20780:DETAIL: 1 node(s) voted in the favor
2020-01-22 12:39:11: pid 20778:LOG: Pgpool-II parent process has received failover request
2020-01-22 12:39:11: pid 20780:LOG: watchdog received the failover command from local pgpool-II on IPC interface
2020-01-22 12:39:11: pid 20780:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
2020-01-22 12:39:11: pid 20780:LOG: we have got the consensus to perform the failover
2020-01-22 12:39:11: pid 20780:DETAIL: 1 node(s) voted in the favor
2020-01-22 12:39:11: pid 20780:LOG: new IPC connection received
2020-01-22 12:39:11: pid 20780:LOG: received the failover indication from Pgpool-II on IPC interface
2020-01-22 12:39:11: pid 20780:LOG: watchdog is informed of failover start by the main process
2020-01-22 12:39:11: pid 9422:LOG: reading and processing packets
2020-01-22 12:39:11: pid 9422:DETAIL: postmaster on DB node 0 was shutdown by administrative command
2020-01-22 12:39:11: pid 20778:LOG: starting degeneration. shutdown host pgpool-poc01.novalocal(5432)
2020-01-22 12:39:11: pid 9422:LOG: received degenerate backend request for node_id: 0 from pid [9422]
2020-01-22 12:39:11: pid 17824:LOG: reading and processing packets
2020-01-22 12:39:11: pid 17824:DETAIL: postmaster on DB node 0 was shutdown by administrative command
2020-01-22 12:39:11: pid 17824:LOG: received degenerate backend request for node_id: 0 from pid [17824]
2020-01-22 12:39:11: pid 21172:LOG: reading and processing packets
2020-01-22 12:39:11: pid 21172:DETAIL: postmaster on DB node 0 was shutdown by administrative command
2020-01-22 12:39:11: pid 21172:LOG: received degenerate backend request for node_id: 0 from pid [21172]
2020-01-22 12:39:11: pid 30846:LOG: reading and processing packets
2020-01-22 12:39:11: pid 30846:DETAIL: postmaster on DB node 0 was shutdown by administrative command
2020-01-22 12:39:11: pid 30846:LOG: received degenerate backend request for node_id: 0 from pid [30846]
2020-01-22 12:39:11: pid 13838:LOG: reading and processing packets
2020-01-22 12:39:11: pid 13838:DETAIL: postmaster on DB node 0 was shutdown by administrative command
2020-01-22 12:39:11: pid 13838:LOG: received degenerate backend request for node_id: 0 from pid [13838]
2020-01-22 12:39:11: pid 20780:LOG: new IPC connection received
2020-01-22 12:39:11: pid 20780:LOG: new IPC connection received
2020-01-22 12:39:11: pid 20780:LOG: watchdog received the failover command from local pgpool-II on IPC interface
2020-01-22 12:39:11: pid 20780:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
2020-01-22 12:39:11: pid 20780:LOG: we have got the consensus to perform the failover
2020-01-22 12:39:11: pid 20780:DETAIL: 1 node(s) voted in the favor
2020-01-22 12:39:11: pid 20780:LOG: new IPC connection received
2020-01-22 12:39:11: pid 17408:LOG: reading and processing packets
2020-01-22 12:39:11: pid 17408:DETAIL: postmaster on DB node 0 was shutdown by administrative command
2020-01-22 12:39:11: pid 20780:LOG: watchdog received the failover command from local pgpool-II on IPC interface
2020-01-22 12:39:11: pid 20780:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface

2020-01-22 12:39:11: pid 3924:LOG: received degenerate backend request for node_id: 0 from pid [3924]
2020-01-22 12:39:11: pid 20780:LOG: new IPC connection received
2020-01-22 12:39:11: pid 20780:LOG: new IPC connection received
2020-01-22 12:39:11: pid 20780:LOG: watchdog received the failover command from local pgpool-II on IPC interface
2020-01-22 12:39:11: pid 20780:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
2020-01-22 12:39:11: pid 20780:LOG: we have got the consensus to perform the failover
2020-01-22 12:39:11: pid 20780:DETAIL: 1 node(s) voted in the favor
2020-01-22 12:39:11: pid 21539:LOG: reading and processing packets
2020-01-22 12:39:11: pid 21539:DETAIL: postmaster on DB node 0 was shutdown by administrative command
2020-01-22 12:39:11: pid 21539:LOG: received degenerate backend request for node_id: 0 from pid [21539]
2020-01-22 12:39:11: pid 20780:LOG: new IPC connection received
2020-01-22 12:39:11: pid 20780:LOG: watchdog received the failover command from local pgpool-II on IPC interface
2020-01-22 12:39:11: pid 20780:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
2020-01-22 12:39:11: pid 20780:LOG: we have got the consensus to perform the failover
2020-01-22 12:39:11: pid 20780:DETAIL: 1 node(s) voted in the favor
2020-01-22 12:39:11: pid 20780:LOG: new IPC connection received
2020-01-22 12:39:11: pid 20780:LOG: watchdog received the failover command from local pgpool-II on IPC interface
2020-01-22 12:39:11: pid 20780:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
2020-01-22 12:39:11: pid 20780:LOG: we have got the consensus to perform the failover
2020-01-22 12:39:11: pid 20780:DETAIL: 1 node(s) voted in the favor
2020-01-22 12:39:11: pid 20780:LOG: new IPC connection received
2020-01-22 12:39:11: pid 20780:LOG: watchdog received the failover command from local pgpool-II on IPC interface
2020-01-22 12:39:11: pid 20780:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
2020-01-22 12:39:11: pid 20780:LOG: we have got the consensus to perform the failover
2020-01-22 12:39:11: pid 20780:DETAIL: 1 node(s) voted in the favor
2020-01-22 12:39:11: pid 20780:LOG: new IPC connection received
2020-01-22 12:39:11: pid 10792:LOG: reading and processing packets

2020-01-22 12:39:11: pid 20780:DETAIL: 1 node(s) voted in the favor
2020-01-22 12:39:11: pid 5643:LOG: reading and processing packets
2020-01-22 12:39:11: pid 5643:DETAIL: postmaster on DB node 0 was shutdown by administrative command
2020-01-22 12:39:11: pid 20780:LOG: new IPC connection received
2020-01-22 12:39:11: pid 5643:LOG: received degenerate backend request for node_id: 0 from pid [5643]
2020-01-22 12:39:11: pid 20780:LOG: watchdog received the failover command from local pgpool-II on IPC interface
2020-01-22 12:39:11: pid 20780:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
2020-01-22 12:39:11: pid 20780:LOG: we have got the consensus to perform the failover
2020-01-22 12:39:11: pid 20780:DETAIL: 1 node(s) voted in the favor
2020-01-22 12:39:11: pid 20780:LOG: new IPC connection received
2020-01-22 12:39:11: pid 20780:LOG: watchdog received the failover command from local pgpool-II on IPC interface
2020-01-22 12:39:11: pid 20780:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
2020-01-22 12:39:11: pid 20780:LOG: we have got the consensus to perform the failover
2020-01-22 12:39:11: pid 20780:DETAIL: 1 node(s) voted in the favor
2020-01-22 12:39:11: pid 7442:LOG: reading and processing packets
2020-01-22 12:39:11: pid 7442:DETAIL: postmaster on DB node 0 was shutdown by administrative command
2020-01-22 12:39:11: pid 7442:LOG: received degenerate backend request for node_id: 0 from pid [7442]
2020-01-22 12:39:11: pid 20780:LOG: new IPC connection received
2020-01-22 12:39:11: pid 20780:LOG: watchdog received the failover command from local pgpool-II on IPC interface
2020-01-22 12:39:11: pid 20780:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
2020-01-22 12:39:11: pid 20778:LOG: Restart all children
2020-01-22 12:39:11: pid 20780:LOG: we have got the consensus to perform the failover
2020-01-22 12:39:11: pid 20780:DETAIL: 1 node(s) voted in the favor
2020-01-22 12:39:11: pid 20780:LOG: new IPC connection received
2020-01-22 12:39:11: pid 20780:LOG: watchdog received the failover command from local pgpool-II on IPC interface
2020-01-22 12:39:11: pid 20780:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
2020-01-22 12:39:11: pid 20780:LOG: we have got the consensus to perform the failover
2020-01-22 12:39:11: pid 20780:DETAIL: 1 node(s) voted in the favor
2020-01-22 12:39:11: pid 20778:LOG: Restart all children
2020-01-22 12:39:11: pid 20780:LOG: we have got the consensus to perform the failover
2020-01-22 12:39:11: pid 20780:DETAIL: 1 node(s) voted in the favor
2020-01-22 12:39:11: pid 20780:LOG: new IPC connection received
2020-01-22 12:39:11: pid 20780:LOG: watchdog received the failover command from local pgpool-II on IPC interface
2020-01-22 12:39:11: pid 20780:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
2020-01-22 12:39:11: pid 20780:LOG: we have got the consensus to perform the failover
2020-01-22 12:39:11: pid 20780:DETAIL: 1 node(s) voted in the favor
2020-01-22 12:39:11: pid 20778:LOG: execute command: /usr/share/pgpool/4.1.0/etc/failover.sh 0 0 pgpool-poc02.novalocal reppassword /installer/postgresql-11.5/data/im_the_master


Authorized Uses Only.All activity may be Monitored and Reported
promote - Start
DEBUG: The script will be executed with the following arguments:
DEBUG: --trigger-file=/installer/postgresql-11.5/data/im_the_master
DEBUG: --standby_file=/installer/postgresql-11.5/data/im_slave
DEBUG: --demote-host=
DEBUG: --user=replication
DEBUG: --password=reppassword
DEBUG: --force
INFO: Checking if standby file exists...
INFO: Checking if trigger file exists...
INFO: Deleting recovery.conf file...
INFO: Checking if postgresql.conf file exists...
INFO: postgresql.conf file found. Checking if it is for primary server...
INFO: postgresql.conf file corresponds to primary server file. Nothing to do.
pg_ctl: server is running (PID: 25960)
/usr/local/pgsql11.5/bin/postgres "-D" "/installer/postgresql-11.5/data"
INFO: Restarting postgresql service...
waiting for server to shut down.... done
server stopped
waiting for server to start....2020-01-22 12:39:12 +03 LOG: listening on IPv4 address "0.0.0.0", port 5432
2020-01-22 12:39:12 +03 LOG: listening on IPv6 address "::", port 5432
2020-01-22 12:39:12 +03 LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2020-01-22 12:39:12 +03 LOG: redirecting log output to logging collector process
2020-01-22 12:39:12 +03 HINT: Future log output will appear in directory "/dblogs/logs".
 done
server started
pg_ctl: server is running (PID: 23249)
/usr/local/pgsql11.5/bin/postgres "-D" "/installer/postgresql-11.5/data"
INFO: postgresql already running.
INFO: Ensuring replication role and password...
INFO: Replication role found. Ensuring password...
ALTER ROLE
INFO: Creating primary info file...
promote - Done!
2020-01-22 12:39:15: pid 21791:LOG: failed to connect to PostgreSQL server on "pgpool-poc01.novalocal:5432", getsockopt() detected error "Connection refused"
2020-01-22 12:39:15: pid 21791:ERROR: failed to make persistent db connection
2020-01-22 12:39:15: pid 21791:DETAIL: connection to host:"pgpool-poc01.novalocal:5432" failed
2020-01-22 12:39:16: pid 20780:LOG: watchdog received the failover command from remote pgpool-II node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-01-22 12:39:16: pid 20780:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal
2020-01-22 12:39:16: pid 20780:LOG: we have got the consensus to perform the failover
2020-01-22 12:39:16: pid 20780:DETAIL: 1 node(s) voted in the favor
2020-01-22 12:39:16: pid 20780:LOG: invalid degenerate backend request, node id : 0 status: [3] is not valid for failover
2020-01-22 12:39:20: pid 21791:ERROR: Failed to check replication time lag
2020-01-22 12:39:20: pid 21791:DETAIL: No persistent db connection for the node 0
2020-01-22 12:39:20: pid 21791:HINT: check sr_check_user and sr_check_password
2020-01-22 12:39:20: pid 21791:CONTEXT: while checking replication time lag
2020-01-22 12:39:20: pid 21791:LOG: failed to connect to PostgreSQL server on "pgpool-poc01.novalocal:5432", getsockopt() detected error "Connection refused"
2020-01-22 12:39:20: pid 21791:ERROR: failed to make persistent db connection
2020-01-22 12:39:20: pid 21791:DETAIL: connection to host:"pgpool-poc01.novalocal:5432" failed
2020-01-22 12:39:21: pid 20780:LOG: watchdog received the failover command from remote pgpool-II node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-01-22 12:39:21: pid 20780:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal
2020-01-22 12:39:21: pid 20780:LOG: we have got the consensus to perform the failover
2020-01-22 12:39:21: pid 20780:DETAIL: 1 node(s) voted in the favor
2020-01-22 12:39:21: pid 20780:LOG: invalid degenerate backend request, node id : 0 status: [3] is not valid for failover
2020-01-22 12:39:25: pid 21791:ERROR: Failed to check replication time lag
2020-01-22 12:39:25: pid 21791:DETAIL: No persistent db connection for the node 0
2020-01-22 12:39:25: pid 21791:HINT: check sr_check_user and sr_check_password
2020-01-22 12:39:25: pid 21791:CONTEXT: while checking replication time lag
2020-01-22 12:39:25: pid 21791:LOG: failed to connect to PostgreSQL server on "pgpool-poc01.novalocal:5432", getsockopt() detected error "Connection refused"
2020-01-22 12:39:25: pid 21791:ERROR: failed to make persistent db connection
2020-01-22 12:39:25: pid 21791:DETAIL: connection to host:"pgpool-poc01.novalocal:5432" failed
2020-01-22 12:39:26: pid 20780:LOG: watchdog received the failover command from remote pgpool-II node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-01-22 12:39:26: pid 20780:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal
2020-01-22 12:39:26: pid 20780:LOG: we have got the consensus to perform the failover
2020-01-22 12:39:26: pid 20780:DETAIL: 1 node(s) voted in the favor
2020-01-22 12:39:26: pid 20780:LOG: invalid degenerate backend request, node id : 0 status: [3] is not valid for failover
2020-01-22 12:39:30: pid 21791:ERROR: Failed to check replication time lag
2020-01-22 12:39:30: pid 21791:DETAIL: No persistent db connection for the node 0
2020-01-22 12:39:30: pid 21791:HINT: check sr_check_user and sr_check_password
2020-01-22 12:39:30: pid 21791:CONTEXT: while checking replication time lag
2020-01-22 12:39:30: pid 21791:LOG: failed to connect to PostgreSQL server on "pgpool-poc01.novalocal:5432", getsockopt() detected error "Connection refused"
2020-01-22 12:39:30: pid 21791:ERROR: failed to make persistent db connection
2020-01-22 12:39:30: pid 21791:DETAIL: connection to host:"pgpool-poc01.novalocal:5432" failed
2020-01-22 12:39:31: pid 20780:LOG: watchdog received the failover command from remote pgpool-II node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-01-22 12:39:31: pid 20780:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal
2020-01-22 12:39:31: pid 20780:LOG: we have got the consensus to perform the failover
2020-01-22 12:39:31: pid 20780:DETAIL: 1 node(s) voted in the favor
2020-01-22 12:39:31: pid 20780:LOG: invalid degenerate backend request, node id : 0 status: [3] is not valid for failover
2020-01-22 12:39:35: pid 21791:ERROR: Failed to check replication time lag
2020-01-22 12:39:35: pid 21791:DETAIL: No persistent db connection for the node 0
2020-01-22 12:55:41: pid 21791:HINT: check sr_check_user and sr_check_password
2020-01-22 12:55:41: pid 21791:CONTEXT: while checking replication time lag
2020-01-22 12:55:41: pid 21791:LOG: failed to connect to PostgreSQL server on "pgpool-poc01.novalocal:5432", getsockopt() detected error "Connection refused"
2020-01-22 12:55:41: pid 21791:ERROR: failed to make persistent db connection
2020-01-22 12:55:41: pid 21791:DETAIL: connection to host:"pgpool-poc01.novalocal:5432" failed
2020-01-22 12:55:42: pid 20780:LOG: watchdog received the failover command from remote pgpool-II node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-01-22 12:55:42: pid 20780:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal
2020-01-22 12:55:42: pid 20780:LOG: we have got the consensus to perform the failover
2020-01-22 12:55:42: pid 20780:DETAIL: 1 node(s) voted in the favor
2020-01-22 12:55:42: pid 20780:LOG: invalid degenerate backend request, node id : 0 status: [3] is not valid for failover

Tags:
Steps To Reproduce:
Additional Information:
Attached Files: pgpool.conf (41,787 bytes) 2020-01-31 04:59
https://www.pgpool.net/mantisbt/file_download.php?file_id=729&type=bug
failover.sh (855 bytes) 2020-02-04 15:58
https://www.pgpool.net/mantisbt/file_download.php?file_id=730&type=bug
pg_hba.conf (11,678 bytes) 2020-02-04 15:58
https://www.pgpool.net/mantisbt/file_download.php?file_id=731&type=bug
pgpool-2.conf (41,871 bytes) 2020-02-04 15:58
https://www.pgpool.net/mantisbt/file_download.php?file_id=732&type=bug
postgresql.conf (24,089 bytes) 2020-02-04 15:58
https://www.pgpool.net/mantisbt/file_download.php?file_id=734&type=bug
pool_hba.conf (4,461 bytes) 2020-02-04 15:58
https://www.pgpool.net/mantisbt/file_download.php?file_id=733&type=bug
promote.sh (11,131 bytes) 2020-02-04 15:58
https://www.pgpool.net/mantisbt/file_download.php?file_id=735&type=bug
recovery.conf (352 bytes) 2020-02-04 15:58
https://www.pgpool.net/mantisbt/file_download.php?file_id=736&type=bug
mawidstg01-2020-02-09_000000.log (10,485,834 bytes) 2020-02-09 19:40
https://www.pgpool.net/mantisbt/file_download.php?file_id=737&type=bug
pgpool-3.conf (41,865 bytes) 2020-02-09 19:40
https://www.pgpool.net/mantisbt/file_download.php?file_id=739&type=bug
mawidstg01-2020-02-09_131237.log (128,546 bytes) 2020-02-09 19:40
https://www.pgpool.net/mantisbt/file_download.php?file_id=738&type=bug
pgpool.zip (1,210,974 bytes) 2020-02-09 19:55
https://www.pgpool.net/mantisbt/file_download.php?file_id=740&type=bug
pgpool_10feb1126am.log (340,102 bytes) 2020-02-10 17:34
https://www.pgpool.net/mantisbt/file_download.php?file_id=741&type=bug
pgpool.log_1126amLogWithRecoveryError (1,163,032 bytes) 2020-02-10 20:50
https://www.pgpool.net/mantisbt/file_download.php?file_id=742&type=bug
mawidstg01-2020-02-11_133657.log (351,585 bytes) 2020-02-11 19:50
https://www.pgpool.net/mantisbt/file_download.php?file_id=743&type=bug
postgresdblog-2020-02-12_185142.zip (67,948 bytes) 2020-02-13 02:20
https://www.pgpool.net/mantisbt/file_download.php?file_id=745&type=bug
pgpool_log_12feb.zip (630,397 bytes) 2020-02-13 02:20
https://www.pgpool.net/mantisbt/file_download.php?file_id=744&type=bug
Notes
(0003083)
raj.pandey1982@gmail.com   
2020-01-23 00:23   
Not sure why, even after the failover command executes and the slave is up as the master, the pgpool services are not coming back to normal and keep showing the above error, and hence remote connections are not happening.
(0003084)
raj.pandey1982@gmail.com   
2020-01-26 17:49   
Please suggest something here if possible; it's a little urgent.
(0003085)
raj.pandey1982@gmail.com   
2020-01-26 17:55   
The messages below keep appearing in the log on both nodes:
2020-01-23 12:14:16: pid 2909:LOG: watchdog received the failover command from remote pgpool-II node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-01-23 12:14:16: pid 2909:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal
2020-01-23 12:14:16: pid 2909:LOG: we have got the consensus to perform the failover
2020-01-23 12:14:16: pid 2909:DETAIL: 1 node(s) voted in the favor
2020-01-23 12:14:16: pid 2909:LOG: received degenerate backend request for node_id: 0 from pid [2909]
2020-01-23 12:14:16: pid 2907:LOG: Pgpool-II parent process has received failover request
2020-01-23 12:14:16: pid 2909:LOG: new IPC connection received
2020-01-23 12:14:16: pid 2909:LOG: received the failover indication from Pgpool-II on IPC interface
2020-01-23 12:14:16: pid 2909:LOG: watchdog is informed of failover start by the main process
2020-01-23 12:14:16: pid 2907:LOG: starting degeneration. shutdown host pgpool-poc01.novalocal(5432)
2020-01-23 12:14:16: pid 2907:LOG: Restart all children
2020-01-23 12:14:16: pid 2907:LOG: execute command: /usr/share/pgpool/4.1.0/etc/failover.sh 0 0 pgpool-poc02.novalocal reppassword /installer/postgresql-11.5/data/im_the_master
Authorized Uses Only.All activity may be Monitored and Reported
promote - Start
DEBUG: The script will be executed with the following arguments:
DEBUG: --trigger-file=/installer/postgresql-11.5/data/im_the_master
DEBUG: --standby_file=/installer/postgresql-11.5/data/im_slave
DEBUG: --demote-host=
DEBUG: --user=replication
DEBUG: --password=reppassword
DEBUG: --force
INFO: Checking if standby file exists...
INFO: Checking if trigger file exists...
INFO: Deleting recovery.conf file...
INFO: Checking if postgresql.conf file exists...
INFO: postgresql.conf file found. Checking if it is for primary server...
INFO: postgresql.conf file corresponds to primary server file. Nothing to do.
pg_ctl: server is running (PID: 30170)
/usr/local/pgsql11.5/bin/postgres "-D" "/installer/postgresql-11.5/data"
INFO: Restarting postgresql service...
waiting for server to shut down.... done
server stopped
waiting for server to start....2020-01-23 12:14:17 +03 LOG: listening on IPv4 address "0.0.0.0", port 5432
2020-01-23 12:14:17 +03 LOG: listening on IPv6 address "::", port 5432
2020-01-23 12:14:17 +03 LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2020-01-23 12:14:17 +03 LOG: redirecting log output to logging collector process
2020-01-23 12:14:17 +03 HINT: Future log output will appear in directory "/dblogs/logs".
 done
server started
pg_ctl: server is running (PID: 31964)
/usr/local/pgsql11.5/bin/postgres "-D" "/installer/postgresql-11.5/data"
INFO: postgresql already running.
INFO: Ensuring replication role and password...
INFO: Replication role found. Ensuring password...
ALTER ROLE
INFO: Creating primary info file...
promote - Done!
2020-01-23 12:14:18: pid 3928:LOG: failed to connect to PostgreSQL server on "pgpool-poc01.novalocal:5432", getsockopt() detected error "Connection refused"
2020-01-23 12:14:18: pid 3928:ERROR: failed to make persistent db connection
2020-01-23 12:14:18: pid 3928:DETAIL: connection to host:"pgpool-poc01.novalocal:5432" failed
2020-01-23 12:14:21: pid 2909:LOG: watchdog received the failover command from remote pgpool-II node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-01-23 12:14:21: pid 2909:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal
2020-01-23 12:14:21: pid 2909:LOG: we have got the consensus to perform the failover
2020-01-23 12:14:21: pid 2909:DETAIL: 1 node(s) voted in the favor
2020-01-23 12:14:21: pid 2909:LOG: invalid degenerate backend request, node id : 0 status: [3] is not valid for failover
2020-01-23 12:14:23: pid 3928:ERROR: Failed to check replication time lag
2020-01-23 12:14:23: pid 3928:DETAIL: No persistent db connection for the node 0
2020-01-23 12:14:23: pid 3928:HINT: check sr_check_user and sr_check_password
2020-01-23 12:14:23: pid 3928:CONTEXT: while checking replication time lag
2020-01-23 12:14:23: pid 3928:LOG: failed to connect to PostgreSQL server on "pgpool-poc01.novalocal:5432", getsockopt() detected error "Connection refused"
2020-01-23 12:14:23: pid 3928:ERROR: failed to make persistent db connection
2020-01-23 12:14:23: pid 3928:DETAIL: connection to host:"pgpool-poc01.novalocal:5432" failed
2020-01-23 12:14:26: pid 2909:LOG: watchdog received the failover command from remote pgpool-II node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-01-23 12:14:26: pid 2909:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal
2020-01-23 12:14:26: pid 2909:LOG: we have got the consensus to perform the failover
2020-01-23 12:14:26: pid 2909:DETAIL: 1 node(s) voted in the favor
2020-01-23 12:14:26: pid 2909:LOG: invalid degenerate backend request, node id : 0 status: [3] is not valid for failover
2020-01-23 12:14:28: pid 3928:ERROR: Failed to check replication time lag
2020-01-23 12:14:28: pid 3928:DETAIL: No persistent db connection for the node 0
2020-01-23 12:14:28: pid 3928:HINT: check sr_check_user and sr_check_password
2020-01-23 12:14:28: pid 3928:CONTEXT: while checking replication time lag
2020-01-23 12:14:28: pid 3928:LOG: failed to connect to PostgreSQL server on "pgpool-poc01.novalocal:5432", getsockopt() detected error "Connection refused"
2020-01-23 12:14:28: pid 3928:ERROR: failed to make persistent db connection
2020-01-23 12:14:28: pid 3928:DETAIL: connection to host:"pgpool-poc01.novalocal:5432" failed
(0003086)
raj.pandey1982@gmail.com   
2020-01-26 18:20   
What I simply want here is that when I take the master DB on node 1 down, the pgpool instances running on nodes 1 and 2 should go in sync with the slave DB on node 2, which has now become the master. That is it.
But it looks like pgpool is still searching for the old master DB on node 1, which is down, and is not syncing with node 2, which became the master after the failover script ran successfully.
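For reference, the repeated "invalid degenerate backend request, node id : 0 status: [3] is not valid for failover" lines mean pgpool already regards backend 0 as down, so no further failover is triggered, and the connection errors for node 0 will keep appearing for as long as that PostgreSQL instance stays unreachable. A minimal sketch for checking the node status and for reattaching node 0 once it has been rebuilt as a standby of the new master, assuming pcp_port = 9898 from the attached pgpool.conf and that the postgres account is also registered as a PCP user (that account name is an assumption):

psql -h pgpool-poc01.novalocal -p 5433 -U postgres -c "SHOW pool_nodes;"
pcp_attach_node -h pgpool-poc01.novalocal -p 9898 -U postgres -n 0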
(0003087)
raj.pandey1982@gmail.com   
2020-01-27 17:41   
Hello team, any update on this?
(0003088)
raj.pandey1982@gmail.com   
2020-01-27 17:45   
I don't understand why, once the promotion is done, pgpool still keeps searching for the old node and does not allow stable remote connections:
Log:
promote - Done!
2020-01-23 12:14:18: pid 3928:LOG: failed to connect to PostgreSQL server on "pgpool-poc01.novalocal:5432", getsockopt() detected error "Connection refused"
2020-01-23 12:14:18: pid 3928:ERROR: failed to make persistent db connection
2020-01-23 12:14:18: pid 3928:DETAIL: connection to host:"pgpool-poc01.novalocal:5432" failed
2020-01-23 12:14:21: pid 2909:LOG: watchdog received the failover command from remote pgpool-II node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-01-23 12:14:21: pid 2909:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal
2020-01-23 12:14:21: pid 2909:LOG: we have got the consensus to perform the failover
2020-01-23 12:14:21: pid 2909:DETAIL: 1 node(s) voted in the favor
2020-01-23 12:14:21: pid 2909:LOG: invalid degenerate backend request, node id : 0 status: [3] is not valid for failover
2020-01-23 12:14:23: pid 3928:ERROR: Failed to check replication time lag
2020-01-23 12:14:23: pid 3928:DETAIL: No persistent db connection for the node 0
2020-01-23 12:14:23: pid 3928:HINT: check sr_check_user and sr_check_password
2020-01-23 12:14:23: pid 3928:CONTEXT: while checking replication time lag
2020-01-23 12:14:23: pid 3928:LOG: failed to connect to PostgreSQL server on "pgpool-poc01.novalocal:5432", getsockopt() detected error "Connection refused"
2020-01-23 12:14:23: pid 3928:ERROR: failed to make persistent db connection
2020-01-23 12:14:23: pid 3928:DETAIL: connection to host:"pgpool-poc01.novalocal:5432" failed
2020-01-23 12:14:26: pid 2909:LOG: watchdog received the failover command from remote pgpool-II node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-01-23 12:14:26: pid 2909:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal
2020-01-23 12:14:26: pid 2909:LOG: we have got the consensus to perform the failover
2020-01-23 12:14:26: pid 2909:DETAIL: 1 node(s) voted in the favor
2020-01-23 12:14:26: pid 2909:LOG: invalid degenerate backend request, node id : 0 status: [3] is not valid for failover
2020-01-23 12:14:28: pid 3928:ERROR: Failed to check replication time lag
2020-01-23 12:14:28: pid 3928:DETAIL: No persistent db connection for the node 0
2020-01-23 12:14:28: pid 3928:HINT: check sr_check_user and sr_check_password
2020-01-23 12:14:28: pid 3928:CONTEXT: while checking replication time lag
2020-01-23 12:14:28: pid 3928:LOG: failed to connect to PostgreSQL server on "pgpool-poc01.novalocal:5432", getsockopt() detected error "Connection refused"
2020-01-23 12:14:28: pid 3928:ERROR: failed to make persistent db connection
2020-01-23 12:14:28: pid 3928:DETAIL: connection to host:"pgpool-poc01.novalocal:5432" failed
(0003089)
raj.pandey1982@gmail.com   
2020-01-27 19:21   
search_primary_node_timeout = 10 also does not work in my case to avoid the error below repeating again and again, since that is a streaming replication feature while mine is master/slave replication:

"2020-01-23 12:14:18: pid 3928:LOG: failed to connect to PostgreSQL server on "pgpool-poc01.novalocal:5432", getsockopt() detected error "Connection refused""
(0003090)
raj.pandey1982@gmail.com   
2020-01-27 19:25   
Is there any way the parameter "search_primary_node_timeout = 10" can be made to work in master/slave mode too, so that pgpool stops searching for the old, down backend master node 1 and starts accepting remote connections?
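For what it is worth, the attached pgpool.conf has master_slave_mode = on with master_slave_sub_mode = 'stream', which is pgpool's streaming replication mode, so search_primary_node_timeout should already apply here; it only bounds how long pgpool searches for a new primary after a failover and does not stop the periodic checks against a detached backend. The relevant lines as they appear later in the attached file:

master_slave_mode = on
master_slave_sub_mode = 'stream'
search_primary_node_timeout = 10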
(0003094)
t-ishii   
2020-01-28 18:22   
I need to check your configuration. Can you share pgpool.conf?
(0003095)
raj.pandey1982@gmail.com   
2020-01-28 19:01   
Master Conf:-
[root@pgpool-poc01 postgresql]# cat /usr/share/pgpool/4.1.0/etc/pgpool.conf
# ----------------------------
# pgPool-II configuration file
# ----------------------------
#
# This file consists of lines of the form:
#
# name = value
#
# Whitespace may be used. Comments are introduced with "#" anywhere on a line.
# The complete list of parameter names and allowed values can be found in the
# pgPool-II documentation.
#
# This file is read on server startup and when the server receives a SIGHUP
# signal. If you edit the file on a running system, you have to SIGHUP the
# server for the changes to take effect, or use "pgpool reload". Some
# parameters, which are marked below, require a server shutdown and restart to
# take effect.
#


#------------------------------------------------------------------------------
# CONNECTIONS
#------------------------------------------------------------------------------

# - pgpool Connection Settings -

listen_addresses = '*'
                                   # Host name or IP address to listen on:
                                   # '*' for all, '' for no TCP/IP connections
                                   # (change requires restart)
port = 5433
                                   # Port number
                                   # (change requires restart)
socket_dir = '/var/run/postgresql'
                                   # Unix domain socket path
                                   # The Debian package defaults to
                                   # /var/run/postgresql
                                   # (change requires restart)
listen_backlog_multiplier = 2
                                   # Set the backlog parameter of listen(2) to
                                   # num_init_children * listen_backlog_multiplier.
                                   # (change requires restart)
serialize_accept = off
                                   # whether to serialize accept() call to avoid thundering herd problem
                                   # (change requires restart)

# - pgpool Communication Manager Connection Settings -

pcp_listen_addresses = '*'
                                   # Host name or IP address for pcp process to listen on:
                                   # '*' for all, '' for no TCP/IP connections
                                   # (change requires restart)
pcp_port = 9898
                                   # Port number for pcp
                                   # (change requires restart)
pcp_socket_dir = '/var/run/postgresql'
                                   # Unix domain socket path for pcp
                                   # The Debian package defaults to
                                   # /var/run/postgresql
                                   # (change requires restart)

# - Backend Connection Settings -

                                   # Host name or IP address to connect to for backend 0
                                   # Port number for backend 0
                                   # Weight for backend 0 (only in load balancing mode)
                                   # Data directory for backend 0
                                   # Controls various backend behavior
                                   # ALLOW_TO_FAILOVER, DISALLOW_TO_FAILOVER
                                   # or ALWAYS_MASTER

# - Authentication -

enable_pool_hba = on
                                   # Use pool_hba.conf for client authentication
pool_passwd = 'pool_passwd'
                                   # File name of pool_passwd for md5 authentication.
                                   # "" disables pool_passwd.
                                   # (change requires restart)
authentication_timeout = 60
                                   # Delay in seconds to complete client authentication
                                   # 0 means no timeout.

allow_clear_text_frontend_auth = off
                                   # Allow Pgpool-II to use clear text password authentication
                                   # with clients, when pool_passwd does not
                                   # contain the user password
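# Illustrative sketch, not part of the attached file: with enable_pool_hba = on
# and md5 authentication, pool_passwd entries are normally maintained with the
# pg_md5 tool shipped with pgpool, for example (prompts for the password):
#   pg_md5 -p -m -u postgres
# The user name above is only an example; every account that connects through
# pgpool needs a matching pool_passwd entry.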


# - SSL Connections -

ssl = off
                                   # Enable SSL support
                                   # (change requires restart)
#ssl_key = './server.key'
                                   # Path to the SSL private key file
                                   # (change requires restart)
#ssl_cert = './server.cert'
                                   # Path to the SSL public certificate file
                                   # (change requires restart)
#ssl_ca_cert = ''
                                   # Path to a single PEM format file
                                   # containing CA root certificate(s)
                                   # (change requires restart)
#ssl_ca_cert_dir = ''
                                   # Directory containing CA root certificate(s)
                                   # (change requires restart)

ssl_ciphers = 'HIGH:MEDIUM:+3DES:!aNULL'
                                   # Allowed SSL ciphers
                                   # (change requires restart)
ssl_prefer_server_ciphers = off
                                   # Use server's SSL cipher preferences,
                                   # rather than the client's
                                   # (change requires restart)
#------------------------------------------------------------------------------
# POOLS
#------------------------------------------------------------------------------

# - Concurrent session and pool size -

num_init_children = 1000
                                   # Number of concurrent sessions allowed
                                   # (change requires restart)
max_pool = 10
#max_pool = 4
                                   # Number of connection pool caches per connection
                                   # (change requires restart)

# - Life time -

child_life_time = 300
                                   # Pool exits after being idle for this many seconds
child_max_connections = 0
                                   # Pool exits after receiving that many connections
                                   # 0 means no exit
connection_life_time = 0
                                   # Connection to backend closes after being idle for this many seconds
                                   # 0 means no close
#client_idle_limit = 0
                                   # Client is disconnected after being idle for that many seconds
                                   # (even inside an explicit transactions!)
                                   # 0 means no disconnection

reserved_connections = 1
#------------------------------------------------------------------------------
# LOGS
#------------------------------------------------------------------------------

# - Where to log -

log_destination = 'stderr'
                                   # Where to log
                                   # Valid values are combinations of stderr,
                                   # and syslog. Default to stderr.

# - What to log -

log_line_prefix = '%t: pid %p:'

log_connections = off
                                   # Log connections
log_hostname = off
                                   # Hostname will be shown in ps status
                                   # and in logs if connections are logged
log_statement = off
                                   # Log all statements
log_per_node_statement = off
                                   # Log all statements
                                   # with node and backend information
log_client_messages = off
                                   # Log any client messages
log_standby_delay = 'none'
                                   # Log standby delay
                                   # Valid values are combinations of always,
                                   # if_over_threshold, none

# - Syslog specific -

syslog_facility = 'LOCAL0'
                                   # Syslog local facility. Default to LOCAL0
syslog_ident = 'pgpool'
                                   # Syslog program identification string
                                   # Default to 'pgpool'

# - Debug -

#log_error_verbosity = default # terse, default, or verbose messages

#client_min_messages = notice # values in order of decreasing detail:
                                        # debug5
                                        # debug4
                                        # debug3
                                        # debug2
                                        # debug1
                                        # log
                                        # notice
                                        # warning
                                        # error

#log_min_messages = warning # values in order of decreasing detail:
                                        # debug5
                                        # debug4
                                        # debug3
                                        # debug2
                                        # debug1
                                        # info
                                        # notice
                                        # warning
                                        # error
                                        # log
                                        # fatal
                                        # panic

#------------------------------------------------------------------------------
# FILE LOCATIONS
#------------------------------------------------------------------------------

pid_file_name = '/var/run/postgresql/pgpool.pid'
                                   # PID file name
                                   # Can be specified as relative to the
                                   # location of pgpool.conf file or
                                   # as an absolute path
                                   # (change requires restart)
logdir = '/var/log/pgpool'
                                   # Directory of pgPool status file
                                   # (change requires restart)


#------------------------------------------------------------------------------
# CONNECTION POOLING
#------------------------------------------------------------------------------

connection_cache = on
                                   # Activate connection pools
                                   # (change requires restart)

                                   # Semicolon separated list of queries
                                   # to be issued at the end of a session
                                   # The default is for 8.3 and later
reset_query_list = 'ABORT; DISCARD ALL'
                                   # The following one is for 8.2 and before
#reset_query_list = 'ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT'


#------------------------------------------------------------------------------
# REPLICATION MODE
#------------------------------------------------------------------------------

replication_mode = off
                                   # Activate replication mode
                                   # (change requires restart)
replicate_select = off
                                   # Replicate SELECT statements
                                   # when in replication mode
                                   # replicate_select is higher priority than
                                   # load_balance_mode.

insert_lock = on
                                   # Automatically locks a dummy row or a table
                                   # with INSERT statements to keep SERIAL data
                                   # consistency
                                   # Without SERIAL, no lock will be issued
lobj_lock_table = ''
                                   # When rewriting lo_creat command in
                                   # replication mode, specify table name to
                                   # lock

# - Degenerate handling -

replication_stop_on_mismatch = off
                                   # On disagreement with the packet kind
                                   # sent from backend, degenerate the node
                                   # which is most likely "minority"
                                   # If off, just force to exit this session

failover_if_affected_tuples_mismatch = off
                                   # On disagreement with the number of affected
                                   # tuples in UPDATE/DELETE queries, then
                                   # degenerate the node which is most likely
                                   # "minority".
                                   # If off, just abort the transaction to
                                   # keep the consistency


#------------------------------------------------------------------------------
# LOAD BALANCING MODE
#------------------------------------------------------------------------------

load_balance_mode = on
                                   # Activate load balancing mode
                                   # (change requires restart)
ignore_leading_white_space = on
                                   # Ignore leading white spaces of each query
white_function_list = ''
                                   # Comma separated list of function names
                                   # that don't write to database
                                   # Regexp are accepted
black_function_list = 'Get_Appt_code,walkin_appointment_token_no,findAppointmentFacilityTypeIdNew,findAppointmentReferUrgencyType,walkin_appointment_token_no,walkin_appointment_token_no'
black_function_list = 'currval,lastval,nextval,setval'
                                   # Comma separated list of function names
                                   # that write to database
                                   # Regexp are accepted

black_query_pattern_list = ''
                                   # Semicolon separated list of query patterns
                                   # that should be sent to primary node
                                   # Regexp are accepted
                                   # valid for streaming replication mode only.

database_redirect_preference_list = ''
                                   # comma separated list of pairs of database and node id.
                                   # example: postgres:primary,mydb[0-4]:1,mydb[5-9]:2'
                                   # valid for streaming replication mode only.
app_name_redirect_preference_list = ''
                                   # comma separated list of pairs of app name and node id.
                                   # example: 'psql:primary,myapp[0-4]:1,myapp[5-9]:standby'
                                   # valid for streaming replication mode only.
allow_sql_comments = off
                                   # if on, ignore SQL comments when judging if load balance or
                                   # query cache is possible.
                                   # If off, SQL comments effectively prevent the judgment
                                   # (pre 3.4 behavior).

disable_load_balance_on_write = 'transaction'
                                   # Load balance behavior when write query is issued
                                   # in an explicit transaction.
                                   # Note that any query not in an explicit transaction
                                   # is not affected by the parameter.
                                   # 'transaction' (the default): if a write query is issued,
                                   # subsequent read queries will not be load balanced
                                   # until the transaction ends.
                                   # 'trans_transaction': if a write query is issued,
                                   # subsequent read queries in an explicit transaction
                                   # will not be load balanced until the session ends.
                                   # 'always': if a write query is issued, read queries will
                                   # not be load balanced until the session ends.

#------------------------------------------------------------------------------
# MASTER/SLAVE MODE
#------------------------------------------------------------------------------

master_slave_mode = on
                                   # Activate master/slave mode
                                   # (change requires restart)
master_slave_sub_mode = 'stream'
                                   # Master/slave sub mode
                                   # Valid values are combinations stream, slony
                                   # or logical. Default is stream.
                                   # (change requires restart)

# - Streaming -

sr_check_period = 5 #0
                                   # Streaming replication check period
                                   # Disabled (0) by default
sr_check_user = 'replication'
                                   # Streaming replication check user
                                   # This is necessary even if you disable
                                   # streaming replication delay check with
                                   # sr_check_period = 0

sr_check_password = 'reppassword'
                                   # Password for streaming replication check user.
                                   # Leaving it empty will make Pgpool-II to first look for the
                                   # Password in pool_passwd file before using the empty password
sr_check_database = 'mawidstg01'
#sr_check_database = 'postgres'
                                   # Database name for streaming replication check
delay_threshold = 0
                                   # Threshold before not dispatching query to standby node
                                   # Unit is in bytes
                                   # Disabled (0) by default
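# Illustrative sketch, not part of the attached file: the credentials above can
# be verified directly against each backend, which is what the log hint
# "check sr_check_user and sr_check_password" points at, e.g.
#   psql -h pgpool-poc02.novalocal -p 5432 -U replication -d mawidstg01 -c "SELECT pg_is_in_recovery();"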

# - Special commands -

follow_master_command = ''
                                   # Executes this command after master failover
                                   # Special values:
                                   # %d = node id
                                   # %h = host name
                                   # %p = port number
                                   # %D = database cluster path
                                   # %m = new master node id
                                   # %H = hostname of the new master node
                                   # %M = old master node id
                                   # %P = old primary node id
                                   # %r = new master port number
                                   # %R = new master database cluster path
                                   # %% = '%' character

#------------------------------------------------------------------------------
# HEALTH CHECK GLOBAL PARAMETERS
#------------------------------------------------------------------------------

health_check_period = 5
                                   # Health check period
                                   # Disabled (0) by default
health_check_timeout = 10
#health_check_timeout = 0
                                   # Health check timeout
                                   # 0 means no timeout
health_check_user = 'postgres'
                                   # Health check user
health_check_password = 'TEXTpostgrestg'
                                   # Password for health check user
                                   # Leaving it empty will make Pgpool-II to first look for the
                                   # Password in pool_passwd file before using the empty password

health_check_database = 'mawidstg01'
                                   # Database name for health check. If '', tries 'postgres' first, then 'template1'

health_check_max_retries = 0
                                   # Maximum number of times to retry a failed health check before giving up.
health_check_retry_delay = 1
                                   # Amount of time to wait (in seconds) between retries.
connect_timeout = 10000
                                   # Timeout value in milliseconds before giving up to connect to backend.
                                   # Default is 10000 ms (10 second). Flaky network user may want to increase
                                   # the value. 0 means no timeout.
                                   # Note that this value is not only used for health check,
                                   # but also for ordinary connection to backend.

#------------------------------------------------------------------------------
# HEALTH CHECK PER NODE PARAMETERS (OPTIONAL)
#------------------------------------------------------------------------------
#health_check_period0 = 0
#health_check_timeout0 = 20
#health_check_user0 = 'nobody'
#health_check_password0 = ''
#health_check_database0 = ''
#health_check_max_retries0 = 0
#health_check_retry_delay0 = 1
#connect_timeout0 = 10000

#------------------------------------------------------------------------------
# FAILOVER AND FAILBACK
#------------------------------------------------------------------------------
failover_command = '/usr/share/pgpool/4.1.0/etc/failover.sh %d %P %H reppassword /installer/postgresql-11.5/data/im_the_master'
#failover_command = '/etc/pgpool-II/failover.sh %d %P %H reppassword /installer/postgresql-11.5/data/im_the_master'
                                   # Executes this command at failover
                                   # Special values:
                                   # %d = node id
                                   # %h = host name
                                   # %p = port number
                                   # %D = database cluster path
                                   # %m = new master node id
                                   # %H = hostname of the new master node
                                   # %M = old master node id
                                   # %P = old primary node id
                                   # %r = new master port number
                                   # %R = new master database cluster path
                                   # %% = '%' character
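# For illustration: with the two backends defined at the end of this file, the
# failure of node 0 expands the command above to what the attached log shows:
#   /usr/share/pgpool/4.1.0/etc/failover.sh 0 0 pgpool-poc02.novalocal reppassword /installer/postgresql-11.5/data/im_the_master
# i.e. %d = 0 (failed node id), %P = 0 (old primary id), %H = pgpool-poc02.novalocal (new master host).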
failback_command = ''
                                   # Executes this command at failback.
                                   # Special values:
                                   # %d = node id
                                   # %h = host name
                                   # %p = port number
                                   # %D = database cluster path
                                   # %m = new master node id
                                   # %H = hostname of the new master node
                                   # %M = old master node id
                                   # %P = old primary node id
                                   # %r = new master port number
                                   # %R = new master database cluster path
                                   # %% = '%' character

failover_on_backend_error = on
                                   # Initiates failover when reading/writing to the
                                   # backend communication socket fails
                                   # If set to off, pgpool will report an
                                   # error and disconnect the session.

detach_false_primary = off
                                   # Detach false primary if on. Only
                                   # valid in streaming replication
                                   # mode and with PostgreSQL 9.6 or
                                   # after.
search_primary_node_timeout = 10
#search_primary_node_timeout = 300
                                   # Timeout in seconds to search for the
                                   # primary node when a failover occurs.
                                   # 0 means no timeout, keep searching
                                   # for a primary node forever.

#------------------------------------------------------------------------------
# ONLINE RECOVERY
#------------------------------------------------------------------------------

recovery_user = 'postgres'
                                   # Online recovery user
recovery_password = 'postgrestg'
                                   # Online recovery password
                                   # Leaving it empty will make Pgpool-II to first look for the
                                   # Password in pool_passwd file before using the empty password

recovery_1st_stage_command = 'recovery_1st_stage.sh'
                                   # Executes a command in first stage
recovery_2nd_stage_command = ''
                                   # Executes a command in second stage
recovery_timeout = 90
                                   # Timeout in seconds to wait for the
                                   # recovering node's postmaster to start up
                                   # 0 means no wait
client_idle_limit_in_recovery = 0
                                   # Client is disconnected after being idle
                                   # for that many seconds in the second stage
                                   # of online recovery
                                   # 0 means no disconnection
                                   # -1 means immediate disconnection
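# Illustrative sketch, not part of the attached file: with
# recovery_1st_stage_command set above, a detached node can be rebuilt and
# reattached through PCP, e.g.
#   pcp_recovery_node -h pgpool-poc01.novalocal -p 9898 -U postgres -n 0
# assuming recovery_1st_stage.sh is installed in the primary's data directory
# and the postgres account is registered as a PCP user.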


#------------------------------------------------------------------------------
# WATCHDOG
#------------------------------------------------------------------------------

# - Enabling -

use_watchdog = on
                                    # Activates watchdog
                                    # (change requires restart)

# -Connection to up stream servers -
trusted_servers = 'mohvcasdb01.novalocal,mohcasdevdb.novalocal'
                                    # trusted server list which are used
                                    # to confirm network connection
                                    # (hostA,hostB,hostC,...)
                                    # (change requires restart)
ping_path = '/bin'
                                    # ping command path
                                    # (change requires restart)

# - Watchdog communication Settings -

wd_hostname = 'pgpool-poc01.novalocal'
                                    # Host name or IP address of this watchdog
                                    # (change requires restart)
wd_port = 9000
                                    # port number for watchdog service
                                    # (change requires restart)
wd_priority = 1
                                    # priority of this watchdog in leader election
                                    # (change requires restart)

wd_authkey = ''
                                    # Authentication key for watchdog communication
                                    # (change requires restart)

wd_ipc_socket_dir = '/var/run/postgresql'
                                    # Unix domain socket path for watchdog IPC socket
                                    # The Debian package defaults to
                                    # /var/run/postgresql
                                    # (change requires restart)


# - Virtual IP control Setting -

delegate_IP = '10.70.184.29'
                                    # delegate IP address
                                    # If this is empty, virtual IP never bring up.
                                    # (change requires restart)
if_cmd_path = '/sbin'
                                    # path to the directory where if_up/down_cmd exists
                                    # (change requires restart)
if_up_cmd = 'ip addr add $_IP_$/24 dev eth0 label eth0:0'
                                    # startup delegate IP command
                                    # (change requires restart)
if_down_cmd = 'ip addr del $_IP_$/24 dev eth0'
                                    # shutdown delegate IP command
                                    # (change requires restart)
arping_path = '/usr/sbin'
                                    # arping command path
                                    # (change requires restart)
arping_cmd = 'arping -U $_IP_$ -w 1 -I eth0'
                                    # arping command
                                    # (change requires restart)
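# For illustration: with delegate_IP = '10.70.184.29' above, escalation on the
# active pgpool node runs roughly
#   ip addr add 10.70.184.29/24 dev eth0 label eth0:0
#   arping -U 10.70.184.29 -w 1 -I eth0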

# - Behavior on escalation Setting -

clear_memqcache_on_escalation = on
                                    # Clear all the query cache on shared memory
                                    # when standby pgpool escalate to active pgpool
                                    # (= virtual IP holder).
                                    # This should be off if client connects to pgpool
                                    # not using virtual IP.
                                    # (change requires restart)
wd_escalation_command = ''
                                    # Executes this command at escalation on new active pgpool.
                                    # (change requires restart)
wd_de_escalation_command = ''
                                    # Executes this command when master pgpool resigns from being master.
                                    # (change requires restart)

# - Watchdog consensus settings for failover -

failover_when_quorum_exists = on
                                    # Only perform backend node failover
                                    # when the watchdog cluster holds the quorum
                                    # (change requires restart)

failover_require_consensus = on
                                    # Perform failover when majority of Pgpool-II nodes
                                    # agrees on the backend node status change
                                    # (change requires restart)

allow_multiple_failover_requests_from_node = off
                                    # A Pgpool-II node can cast multiple votes
                                    # for building the consensus on failover
                                    # (change requires restart)

# - Lifecheck Setting -

# -- common --

wd_monitoring_interfaces_list = ''
                                    # if any interface from the list is active the watchdog will
                                    # consider the network is fine
                                    # 'any' to enable monitoring on all interfaces except loopback
                                    # '' to disable monitoring
                                    # (change requires restart)


wd_lifecheck_method = 'heartbeat'
                                    # Method of watchdog lifecheck ('heartbeat' or 'query' or 'external')
                                    # (change requires restart)
wd_interval = 3
                                    # lifecheck interval (sec) > 0
                                    # (change requires restart)

# -- heartbeat mode --

wd_heartbeat_port = 9694
                                    # Port number for receiving heartbeat signal
                                    # (change requires restart)
wd_heartbeat_keepalive = 2
                                    # Interval time of sending heartbeat signal (sec)
                                    # (change requires restart)
wd_heartbeat_deadtime = 30
                                    # Deadtime interval for heartbeat signal (sec)
                                    # (change requires restart)
                                    # Host name or IP address of destination 0
                                    # for sending heartbeat signal.
                                    # (change requires restart)
                                    # Port number of destination 0 for sending
                                    # heartbeat signal. Usually this is the
                                    # same as wd_heartbeat_port.
                                    # (change requires restart)
                                    # Name of NIC device (such like 'eth0')
                                    # used for sending/receiving heartbeat
                                    # signal to/from destination 0.
                                    # This works only when this is not empty
                                    # and pgpool has root privilege.
                                    # (change requires restart)

#heartbeat_destination1 = 'host0_ip2'
#heartbeat_destination_port1 = 9694
#heartbeat_device1 = ''

# -- query mode --

wd_life_point = 3
                                    # lifecheck retry times
                                    # (change requires restart)
wd_lifecheck_query = 'SELECT 1'
                                    # lifecheck query to pgpool from watchdog
                                    # (change requires restart)
wd_lifecheck_dbname = 'template1'
                                    # Database name connected for lifecheck
                                    # (change requires restart)
wd_lifecheck_user = 'nobody'
                                    # watchdog user monitoring pgpools in lifecheck
                                    # (change requires restart)
wd_lifecheck_password = ''
                                    # Password for watchdog user in lifecheck
                                    # Leaving it empty will make Pgpool-II to first look for the
                                    # Password in pool_passwd file before using the empty password
                                    # (change requires restart)

# - Other pgpool Connection Settings -

                                    # Host name or IP address to connect to for other pgpool 0
                                    # (change requires restart)
                                    # Port number for other pgpool 0
                                    # (change requires restart)
                                    # Port number for other watchdog 0
                                    # (change requires restart)
#other_pgpool_hostname1 = 'host1'
#other_pgpool_port1 = 5432
#other_wd_port1 = 9000


#------------------------------------------------------------------------------
# OTHERS
#------------------------------------------------------------------------------
relcache_expire = 0
                                   # Life time of relation cache in seconds.
                                   # 0 means no cache expiration(the default).
                                   # The relation cache is used for cache the
                                   # query result against PostgreSQL system
                                   # catalog to obtain various information
                                   # including table structures or if it's a
                                   # temporary table or not. The cache is
                                   # maintained in a pgpool child local memory
                                   # and being kept as long as it survives.
                                   # If someone modify the table by using
                                   # ALTER TABLE or some such, the relcache is
                                   # not consistent anymore.
                                   # For this purpose, cache_expiration
                                   # controls the life time of the cache.

relcache_size = 256
                                   # Number of relation cache
                                   # entry. If you see frequently:
                                   # "pool_search_relcache: cache replacement happend"
                                   # in the pgpool log, you might want to increate this number.

check_temp_table = on
                                   # If on, enable temporary table check in SELECT statements.
                                   # This initiates queries against system catalog of primary/master
                                   # thus increases load of master.
                                   # If you are absolutely sure that your system never uses temporary tables
                                   # and you want to save access to primary/master, you could turn this off.
                                   # Default is on.

check_unlogged_table = on
                                   # If on, enable unlogged table check in SELECT statements.
                                   # This initiates queries against system catalog of primary/master
                                   # thus increases load of master.
                                   # If you are absolutely sure that your system never uses unlogged tables
                                   # and you want to save access to primary/master, you could turn this off.
                                   # Default is on.

#------------------------------------------------------------------------------
# IN MEMORY QUERY MEMORY CACHE
#------------------------------------------------------------------------------
memory_cache_enabled = off
                                   # If on, use the memory cache functionality, off by default
                                   # (change requires restart)
memqcache_method = 'shmem'
                                   # Cache storage method. either 'shmem'(shared memory) or
                                   # 'memcached'. 'shmem' by default
                                   # (change requires restart)
memqcache_memcached_host = 'localhost'
                                   # Memcached host name or IP address. Mandatory if
                                   # memqcache_method = 'memcached'.
                                   # Defaults to localhost.
                                   # (change requires restart)
memqcache_memcached_port = 11211
                                   # Memcached port number. Mandatory if memqcache_method = 'memcached'.
                                   # Defaults to 11211.
                                   # (change requires restart)
memqcache_total_size = 67108864
                                   # Total memory size in bytes for storing memory cache.
                                   # Mandatory if memqcache_method = 'shmem'.
                                   # Defaults to 64MB.
                                   # (change requires restart)
memqcache_max_num_cache = 1000000
                                   # Total number of cache entries. Mandatory
                                   # if memqcache_method = 'shmem'.
                                   # Each cache entry consumes 48 bytes on shared memory.
                                   # Defaults to 1,000,000(45.8MB).
                                   # (change requires restart)
memqcache_expire = 0
                                   # Memory cache entry life time specified in seconds.
                                   # 0 means infinite life time. 0 by default.
                                   # (change requires restart)
memqcache_auto_cache_invalidation = on
                                   # If on, invalidation of query cache is triggered by corresponding
                                   # DDL/DML/DCL(and memqcache_expire). If off, it is only triggered
                                   # by memqcache_expire. on by default.
                                   # (change requires restart)
memqcache_maxcache = 409600
                                   # Maximum SELECT result size in bytes.
                                   # Must be smaller than memqcache_cache_block_size. Defaults to 400KB.
                                   # (change requires restart)
memqcache_cache_block_size = 1048576
                                   # Cache block size in bytes. Mandatory if memqcache_method = 'shmem'.
                                   # Defaults to 1MB.
                                   # (change requires restart)
memqcache_oiddir = '/var/log/pgpool/oiddir'
                                   # Temporary work directory to record table oids
                                   # (change requires restart)
white_memqcache_table_list = ''
                                   # Comma separated list of table names to memcache
                                   # that don't write to database
                                   # Regexp are accepted
black_memqcache_table_list = ''
                                   # Comma separated list of table names not to memcache
                                   # that don't write to database
                                   # Regexp are accepted
backend_hostname0 = 'pgpool-poc01.novalocal'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/installer/postgresql-11.5/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'

backend_hostname1 = 'pgpool-poc02.novalocal'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/installer/postgresql-11.5/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'

heartbeat_destination0 = 'pgpool-poc02.novalocal'
heartbeat_destination_port0 = 9694

other_pgpool_hostname0 = 'pgpool-poc02.novalocal'
other_pgpool_port0 = 5433
other_wd_port0 = 9000

heartbeat_device0 = 'eth0'
## Added by Raj --> parameters available from pgpool-II 4.1 onwards

statement_level_load_balance = on

enable_consensus_with_half_votes = on
(0003096)
raj.pandey1982@gmail.com   
2020-01-28 19:04   
Slave Conf:-
[root@pgpool-poc02 bin]# cat /usr/share/pgpool/4.1.0/etc/pgpool.conf
# ----------------------------
# pgPool-II configuration file
# ----------------------------
#
# This file consists of lines of the form:
#
# name = value
#
# Whitespace may be used. Comments are introduced with "#" anywhere on a line.
# The complete list of parameter names and allowed values can be found in the
# pgPool-II documentation.
#
# This file is read on server startup and when the server receives a SIGHUP
# signal. If you edit the file on a running system, you have to SIGHUP the
# server for the changes to take effect, or use "pgpool reload". Some
# parameters, which are marked below, require a server shutdown and restart to
# take effect.
#


#------------------------------------------------------------------------------
# CONNECTIONS
#------------------------------------------------------------------------------

# - pgpool Connection Settings -

listen_addresses = '*'
                                   # Host name or IP address to listen on:
                                   # '*' for all, '' for no TCP/IP connections
                                   # (change requires restart)
port = 5433
                                   # Port number
                                   # (change requires restart)
socket_dir = '/var/run/postgresql'
                                   # Unix domain socket path
                                   # The Debian package defaults to
                                   # /var/run/postgresql
                                   # (change requires restart)
listen_backlog_multiplier = 2
                                   # Set the backlog parameter of listen(2) to
                                   # num_init_children * listen_backlog_multiplier.
                                   # (change requires restart)
serialize_accept = off
                                   # whether to serialize accept() call to avoid thundering herd problem
                                   # (change requires restart)

# - pgpool Communication Manager Connection Settings -

pcp_listen_addresses = '*'
                                   # Host name or IP address for pcp process to listen on:
                                   # '*' for all, '' for no TCP/IP connections
                                   # (change requires restart)
pcp_port = 9898
                                   # Port number for pcp
                                   # (change requires restart)
pcp_socket_dir = '/var/run/postgresql'
                                   # Unix domain socket path for pcp
                                   # The Debian package defaults to
                                   # /var/run/postgresql
                                   # (change requires restart)

# - Backend Connection Settings -

                                   # Host name or IP address to connect to for backend 0
                                   # Port number for backend 0
                                   # Weight for backend 0 (only in load balancing mode)
                                   # Data directory for backend 0
                                   # Controls various backend behavior
                                   # ALLOW_TO_FAILOVER, DISALLOW_TO_FAILOVER
                                   # or ALWAYS_MASTER

# - Authentication -

enable_pool_hba = on
                                   # Use pool_hba.conf for client authentication
pool_passwd = 'pool_passwd'
                                   # File name of pool_passwd for md5 authentication.
                                   # "" disables pool_passwd.
                                   # (change requires restart)
authentication_timeout = 60
                                   # Delay in seconds to complete client authentication
                                   # 0 means no timeout.

allow_clear_text_frontend_auth = off
                                   # Allow Pgpool-II to use clear text password authentication
                                   # with clients, when pool_passwd does not
                                   # contain the user password


# - SSL Connections -

ssl = off
                                   # Enable SSL support
                                   # (change requires restart)
#ssl_key = './server.key'
                                   # Path to the SSL private key file
                                   # (change requires restart)
#ssl_cert = './server.cert'
                                   # Path to the SSL public certificate file
                                   # (change requires restart)
#ssl_ca_cert = ''
                                   # Path to a single PEM format file
                                   # containing CA root certificate(s)
                                   # (change requires restart)
#ssl_ca_cert_dir = ''
                                   # Directory containing CA root certificate(s)
                                   # (change requires restart)

ssl_ciphers = 'HIGH:MEDIUM:+3DES:!aNULL'
                                   # Allowed SSL ciphers
                                   # (change requires restart)
ssl_prefer_server_ciphers = off
                                   # Use server's SSL cipher preferences,
                                   # rather than the client's
                                   # (change requires restart)
#------------------------------------------------------------------------------
# POOLS
#------------------------------------------------------------------------------

# - Concurrent session and pool size -

num_init_children = 1000
                                   # Number of concurrent sessions allowed
                                   # (change requires restart)
max_pool = 10
#max_pool = 4
                                   # Number of connection pool caches per connection
                                   # (change requires restart)

# - Life time -

child_life_time = 300
                                   # Pool exits after being idle for this many seconds
child_max_connections = 0
                                   # Pool exits after receiving that many connections
                                   # 0 means no exit
connection_life_time = 0
                                   # Connection to backend closes after being idle for this many seconds
                                   # 0 means no close
client_idle_limit = 0
                                   # Client is disconnected after being idle for that many seconds
                                   # (even inside an explicit transactions!)
                                   # 0 means no disconnection

reserved_connections = 1
#------------------------------------------------------------------------------
# LOGS
#------------------------------------------------------------------------------

# - Where to log -

log_destination = 'stderr'
                                   # Where to log
                                   # Valid values are combinations of stderr,
                                   # and syslog. Default to stderr.

# - What to log -

log_line_prefix = '%t: pid %p:'

log_connections = off
                                   # Log connections
log_hostname = off
                                   # Hostname will be shown in ps status
                                   # and in logs if connections are logged
log_statement = off
                                   # Log all statements
log_per_node_statement = off
                                   # Log all statements
                                   # with node and backend information
log_client_messages = off
                                   # Log any client messages
log_standby_delay = 'none'
                                   # Log standby delay
                                   # Valid values are combinations of always,
                                   # if_over_threshold, none

# - Syslog specific -

syslog_facility = 'LOCAL0'
                                   # Syslog local facility. Default to LOCAL0
syslog_ident = 'pgpool'
                                   # Syslog program identification string
                                   # Default to 'pgpool'

# - Debug -

#log_error_verbosity = default # terse, default, or verbose messages

#client_min_messages = notice # values in order of decreasing detail:
                                        # debug5
                                        # debug4
                                        # debug3
                                        # debug2
                                        # debug1
                                        # log
                                        # notice
                                        # warning
                                        # error

#log_min_messages = warning # values in order of decreasing detail:
                                        # debug5
                                        # debug4
                                        # debug3
                                        # debug2
                                        # debug1
                                        # info
                                        # notice
                                        # warning
                                        # error
                                        # log
                                        # fatal
                                        # panic

#------------------------------------------------------------------------------
# FILE LOCATIONS
#------------------------------------------------------------------------------

pid_file_name = '/var/run/postgresql/pgpool.pid'
                                   # PID file name
                                   # Can be specified as relative to the
                                   # location of pgpool.conf file or
                                   # as an absolute path
                                   # (change requires restart)
logdir = '/var/log/pgpool'
                                   # Directory of pgPool status file
                                   # (change requires restart)


#------------------------------------------------------------------------------
# CONNECTION POOLING
#------------------------------------------------------------------------------

connection_cache = on
                                   # Activate connection pools
                                   # (change requires restart)

                                   # Semicolon separated list of queries
                                   # to be issued at the end of a session
                                   # The default is for 8.3 and later
reset_query_list = 'ABORT; DISCARD ALL'
                                   # The following one is for 8.2 and before
#reset_query_list = 'ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT'


#------------------------------------------------------------------------------
# REPLICATION MODE
#------------------------------------------------------------------------------

replication_mode = off
                                   # Activate replication mode
                                   # (change requires restart)
replicate_select = off
                                   # Replicate SELECT statements
                                   # when in replication mode
                                   # replicate_select is higher priority than
                                   # load_balance_mode.

insert_lock = on
                                   # Automatically locks a dummy row or a table
                                   # with INSERT statements to keep SERIAL data
                                   # consistency
                                   # Without SERIAL, no lock will be issued
lobj_lock_table = ''
                                   # When rewriting lo_creat command in
                                   # replication mode, specify table name to
                                   # lock

# - Degenerate handling -

replication_stop_on_mismatch = off
                                   # On disagreement with the packet kind
                                   # sent from backend, degenerate the node
                                   # which is most likely the "minority".
                                   # If off, just force this session to exit

failover_if_affected_tuples_mismatch = off
                                   # On disagreement with the number of affected
                                   # tuples in UPDATE/DELETE queries, then
                                   # degenerate the node which is most likely
                                   # "minority".
                                   # If off, just abort the transaction to
                                   # keep the consistency


#------------------------------------------------------------------------------
# LOAD BALANCING MODE
#------------------------------------------------------------------------------

load_balance_mode = on
                                   # Activate load balancing mode
                                   # (change requires restart)
ignore_leading_white_space = on
                                   # Ignore leading white spaces of each query
white_function_list = ''
                                   # Comma separated list of function names
                                   # that don't write to database
                                   # Regexp are accepted
black_function_list = 'Get_Appt_code,walkin_appointment_token_no,findAppointmentFacilityTypeIdNew,findAppointmentReferUrgencyType,walkin_appointment_token_no,walkin_appointment_token_no'
black_function_list = 'currval,lastval,nextval,setval'
                                   # Comma separated list of function names
                                   # that write to database
                                   # Regexp are accepted

black_query_pattern_list = ''
                                   # Semicolon separated list of query patterns
                                   # that should be sent to primary node
                                   # Regexp are accepted
                                   # valid for streaming replication mode only.

database_redirect_preference_list = ''
                                   # comma separated list of pairs of database and node id.
                                   # example: postgres:primary,mydb[0-4]:1,mydb[5-9]:2'
                                   # valid for streaming replication mode only.
app_name_redirect_preference_list = ''
                                   # comma separated list of pairs of app name and node id.
                                   # example: 'psql:primary,myapp[0-4]:1,myapp[5-9]:standby'
                                   # valid for streaming replication mode only.
allow_sql_comments = off
                                   # if on, ignore SQL comments when judging if load balance or
                                   # query cache is possible.
                                   # If off, SQL comments effectively prevent the judgment
                                   # (pre 3.4 behavior).

disable_load_balance_on_write = 'transaction'
                                   # Load balance behavior when write query is issued
                                   # in an explicit transaction.
                                   # Note that any query not in an explicit transaction
                                   # is not affected by the parameter.
                                   # 'transaction' (the default): if a write query is issued,
                                   # subsequent read queries will not be load balanced
                                   # until the transaction ends.
                                   # 'trans_transaction': if a write query is issued,
                                   # subsequent read queries in an explicit transaction
                                   # will not be load balanced until the session ends.
                                   # 'always': if a write query is issued, read queries will
                                   # not be load balanced until the session ends.

#------------------------------------------------------------------------------
# MASTER/SLAVE MODE
#------------------------------------------------------------------------------

master_slave_mode = on
                                   # Activate master/slave mode
                                   # (change requires restart)
master_slave_sub_mode = 'stream'
                                   # Master/slave sub mode
                                   # Valid values are stream, slony
                                   # or logical. Default is stream.
                                   # (change requires restart)

# - Streaming -

sr_check_period = 5 #0
                                   # Streaming replication check period
                                   # Disabled (0) by default
sr_check_user = 'replication'
                                   # Streaming replication check user
                                   # This is necessary even if you disable
                                   # streaming replication delay check with
                                   # sr_check_period = 0

sr_check_password = 'reppassword'
                                   # Password for streaming replication check user.
                                   # Leaving it empty will make Pgpool-II first look for the
                                   # password in the pool_passwd file before using the empty password
sr_check_database = 'mawidstg01'
#sr_check_database = 'postgres'
                                   # Database name for streaming replication check
delay_threshold = 0
                                   # Threshold before not dispatching query to standby node
                                   # Unit is in bytes
                                   # Disabled (0) by default

# - Special commands -

follow_master_command = ''
                                   # Executes this command after master failover
                                   # Special values:
                                   # %d = node id
                                   # %h = host name
                                   # %p = port number
                                   # %D = database cluster path
                                   # %m = new master node id
                                   # %H = hostname of the new master node
                                   # %M = old master node id
                                   # %P = old primary node id
                                   # %r = new master port number
                                   # %R = new master database cluster path
                                   # %% = '%' character

#------------------------------------------------------------------------------
# HEALTH CHECK GLOBAL PARAMETERS
#------------------------------------------------------------------------------

health_check_period = 5
                                   # Health check period
                                   # Disabled (0) by default
health_check_timeout = 10
#health_check_timeout = 0
                                   # Health check timeout
                                   # 0 means no timeout
health_check_user = 'postgres'
                                   # Health check user
health_check_password = 'TEXTpostgrestg'
                                   # Password for health check user
                                   # Leaving it empty will make Pgpool-II first look for the
                                   # password in the pool_passwd file before using the empty password

health_check_database = ''
                                   # Database name for health check. If '', tries 'postgres' first, then 'template1'

health_check_max_retries = 0
                                   # Maximum number of times to retry a failed health check before giving up.
health_check_retry_delay = 1
                                   # Amount of time to wait (in seconds) between retries.
connect_timeout = 10000
                                   # Timeout value in milliseconds before giving up on connecting to a backend.
                                   # Default is 10000 ms (10 seconds). Users on flaky networks may want to increase
                                   # the value. 0 means no timeout.
                                   # Note that this value is not only used for health checks,
                                   # but also for ordinary connections to the backend.

#------------------------------------------------------------------------------
# HEALTH CHECK PER NODE PARAMETERS (OPTIONAL)
#------------------------------------------------------------------------------
#health_check_period0 = 0
#health_check_timeout0 = 20
#health_check_user0 = 'nobody'
#health_check_password0 = ''
#health_check_database0 = ''
#health_check_max_retries0 = 0
#health_check_retry_delay0 = 1
#connect_timeout0 = 10000

#------------------------------------------------------------------------------
# FAILOVER AND FAILBACK
#------------------------------------------------------------------------------
failover_command = '/etc/pgpool/4.1.0/failover.sh %d %P %H reppassword /installer/postgresql-11.5/data/im_the_master'
#failover_command = '/etc/pgpool-II/failover.sh %d %P %H reppassword /installer/postgresql-11.5/data/im_the_master'
                                   # Executes this command at failover
                                   # Special values:
                                   # %d = node id
                                   # %h = host name
                                   # %p = port number
                                   # %D = database cluster path
                                   # %m = new master node id
                                   # %H = hostname of the new master node
                                   # %M = old master node id
                                   # %P = old primary node id
                                   # %r = new master port number
                                   # %R = new master database cluster path
                                   # %% = '%' character
failback_command = ''
                                   # Executes this command at failback.
                                   # Special values:
                                   # %d = node id
                                   # %h = host name
                                   # %p = port number
                                   # %D = database cluster path
                                   # %m = new master node id
                                   # %H = hostname of the new master node
                                   # %M = old master node id
                                   # %P = old primary node id
                                   # %r = new master port number
                                   # %R = new master database cluster path
                                   # %% = '%' character

failover_on_backend_error = on
                                   # Initiates failover when reading/writing to the
                                   # backend communication socket fails
                                   # If set to off, pgpool will report an
                                   # error and disconnect the session.

detach_false_primary = off
                                   # Detach false primary if on. Only
                                   # valid in streaming replicaton
                                   # mode and with PostgreSQL 9.6 or
                                   # after.
search_primary_node_timeout = 10
#search_primary_node_timeout = 300
                                   # Timeout in seconds to search for the
                                   # primary node when a failover occurs.
                                   # 0 means no timeout, keep searching
                                   # for a primary node forever.

#------------------------------------------------------------------------------
# ONLINE RECOVERY
#------------------------------------------------------------------------------

recovery_user = 'postgres'
                                   # Online recovery user
recovery_password = 'postgrestg'
                                   # Online recovery password
                                   # Leaving it empty will make Pgpool-II first look for the
                                   # password in the pool_passwd file before using the empty password

recovery_1st_stage_command = 'recovery_1st_stage.sh'
                                   # Executes a command in first stage
recovery_2nd_stage_command = ''
                                   # Executes a command in second stage
recovery_timeout = 90
                                   # Timeout in seconds to wait for the
                                   # recovering node's postmaster to start up
                                   # 0 means no wait
client_idle_limit_in_recovery = 0
                                   # Client is disconnected after being idle
                                   # for that many seconds in the second stage
                                   # of online recovery
                                   # 0 means no disconnection
                                   # -1 means immediate disconnection


#------------------------------------------------------------------------------
# WATCHDOG
#------------------------------------------------------------------------------

# - Enabling -

use_watchdog = on
                                    # Activates watchdog
                                    # (change requires restart)

# - Connection to upstream servers -
trusted_servers = 'mohvcasdb01.novalocal,mohcasdevdb.novalocal'
                                    # trusted server list which are used
                                    # to confirm network connection
                                    # (hostA,hostB,hostC,...)
                                    # (change requires restart)
ping_path = '/bin'
                                    # ping command path
                                    # (change requires restart)

# - Watchdog communication Settings -

wd_hostname = 'pgpool-poc02.novalocal'
                                    # Host name or IP address of this watchdog
                                    # (change requires restart)
wd_port = 9000
                                    # port number for watchdog service
                                    # (change requires restart)
wd_priority = 1
                                    # priority of this watchdog in leader election
                                    # (change requires restart)

wd_authkey = ''
                                    # Authentication key for watchdog communication
                                    # (change requires restart)

wd_ipc_socket_dir = '/var/run/postgresql'
                                    # Unix domain socket path for watchdog IPC socket
                                    # The Debian package defaults to
                                    # /var/run/postgresql
                                    # (change requires restart)


# - Virtual IP control Setting -

delegate_IP = '10.70.184.29'
                                    # delegate IP address
                                    # If this is empty, the virtual IP is never brought up.
                                    # (change requires restart)
if_cmd_path = '/sbin'
                                    # path to the directory where if_up/down_cmd exists
                                    # (change requires restart)
if_up_cmd = 'ip addr add $_IP_$/24 dev eth0 label eth0:0'
                                    # startup delegate IP command
                                    # (change requires restart)
if_down_cmd = 'ip addr del $_IP_$/24 dev eth0'
                                    # shutdown delegate IP command
                                    # (change requires restart)
arping_path = '/usr/sbin'
                                    # arping command path
                                    # (change requires restart)
arping_cmd = 'arping -U $_IP_$ -w 1 -I eth0'
                                    # arping command
                                    # (change requires restart)

# - Behavior on escalation Setting -

clear_memqcache_on_escalation = on
                                    # Clear all the query cache in shared memory
                                    # when a standby pgpool escalates to the active pgpool
                                    # (= virtual IP holder).
                                    # This should be off if clients connect to pgpool
                                    # without using the virtual IP.
                                    # (change requires restart)
wd_escalation_command = ''
                                    # Executes this command at escalation on new active pgpool.
                                    # (change requires restart)
wd_de_escalation_command = ''
                                    # Executes this command when master pgpool resigns from being master.
                                    # (change requires restart)

# - Watchdog consensus settings for failover -

failover_when_quorum_exists = on
                                    # Only perform backend node failover
                                    # when the watchdog cluster holds the quorum
                                    # (change requires restart)

failover_require_consensus = on
                                    # Perform failover when majority of Pgpool-II nodes
                                    # agrees on the backend node status change
                                    # (change requires restart)

allow_multiple_failover_requests_from_node = off
                                    # A Pgpool-II node can cast multiple votes
                                    # for building the consensus on failover
                                    # (change requires restart)

# - Lifecheck Setting -

# -- common --

wd_monitoring_interfaces_list = ''
                                    # if any interface from the list is active, the watchdog will
                                    # consider the network to be fine
                                    # 'any' to enable monitoring on all interfaces except loopback
                                    # '' to disable monitoring
                                    # (change requires restart)


wd_lifecheck_method = 'heartbeat'
                                    # Method of watchdog lifecheck ('heartbeat' or 'query' or 'external')
                                    # (change requires restart)
wd_interval = 3
                                    # lifecheck interval (sec) > 0
                                    # (change requires restart)

# -- heartbeat mode --

wd_heartbeat_port = 9694
                                    # Port number for receiving heartbeat signal
                                    # (change requires restart)
wd_heartbeat_keepalive = 2
                                    # Interval time of sending heartbeat signal (sec)
                                    # (change requires restart)
wd_heartbeat_deadtime = 30
                                    # Deadtime interval for heartbeat signal (sec)
                                    # (change requires restart)
                                    # Host name or IP address of destination 0
                                    # for sending heartbeat signal.
                                    # (change requires restart)
                                    # Port number of destination 0 for sending
                                    # heartbeat signal. Usually this is the
                                    # same as wd_heartbeat_port.
                                    # (change requires restart)
                                    # Name of NIC device (such as 'eth0')
                                    # used for sending/receiving heartbeat
                                    # signal to/from destination 0.
                                    # This works only when this is not empty
                                    # and pgpool has root privilege.
                                    # (change requires restart)

#heartbeat_destination1 = 'host0_ip2'
#heartbeat_destination_port1 = 9694
#heartbeat_device1 = ''

# -- query mode --

wd_life_point = 3
                                    # lifecheck retry times
                                    # (change requires restart)
wd_lifecheck_query = 'SELECT 1'
                                    # lifecheck query to pgpool from watchdog
                                    # (change requires restart)
wd_lifecheck_dbname = 'template1'
                                    # Database name connected for lifecheck
                                    # (change requires restart)
wd_lifecheck_user = 'nobody'
                                    # watchdog user monitoring pgpools in lifecheck
                                    # (change requires restart)
wd_lifecheck_password = ''
                                    # Password for watchdog user in lifecheck
                                    # Leaving it empty will make Pgpool-II first look for the
                                    # password in the pool_passwd file before using the empty password
                                    # (change requires restart)

# - Other pgpool Connection Settings -

                                    # Host name or IP address to connect to for other pgpool 0
                                    # (change requires restart)
                                    # Port number for other pgpool 0
                                    # (change requires restart)
                                    # Port number for other watchdog 0
                                    # (change requires restart)
#other_pgpool_hostname1 = 'host1'
#other_pgpool_port1 = 5432
#other_wd_port1 = 9000


#------------------------------------------------------------------------------
# OTHERS
#------------------------------------------------------------------------------
relcache_expire = 0
                                   # Life time of relation cache in seconds.
                                   # 0 means no cache expiration (the default).
                                   # The relation cache is used to cache query
                                   # results against the PostgreSQL system
                                   # catalog, to obtain various information
                                   # such as table structures and whether a
                                   # table is temporary. The cache is kept in
                                   # each pgpool child's local memory and
                                   # survives as long as the child does.
                                   # If someone modifies a table with
                                   # ALTER TABLE or the like, the relcache
                                   # becomes inconsistent. For this purpose,
                                   # relcache_expire controls the life time
                                   # of the cache.

relcache_size = 256
                                   # Number of relation cache
                                   # entries. If you frequently see
                                   # "pool_search_relcache: cache replacement happend"
                                   # in the pgpool log, you might want to increase this number.

check_temp_table = on
                                   # If on, enable temporary table check in SELECT statements.
                                   # This initiates queries against the system catalog of the primary/master,
                                   # thus increasing the load on the master.
                                   # If you are absolutely sure that your system never uses temporary tables
                                   # and you want to save access to primary/master, you could turn this off.
                                   # Default is on.

check_unlogged_table = on
                                   # If on, enable unlogged table check in SELECT statements.
                                   # This initiates queries against the system catalog of the primary/master,
                                   # thus increasing the load on the master.
                                   # If you are absolutely sure that your system never uses unlogged tables
                                   # and you want to save access to primary/master, you could turn this off.
                                   # Default is on.

#------------------------------------------------------------------------------
# IN MEMORY QUERY MEMORY CACHE
#------------------------------------------------------------------------------
memory_cache_enabled = off
                                   # If on, use the memory cache functionality, off by default
                                   # (change requires restart)
memqcache_method = 'shmem'
                                   # Cache storage method: either 'shmem' (shared memory) or
                                   # 'memcached'. 'shmem' by default
                                   # (change requires restart)
memqcache_memcached_host = 'localhost'
                                   # Memcached host name or IP address. Mandatory if
                                   # memqcache_method = 'memcached'.
                                   # Defaults to localhost.
                                   # (change requires restart)
memqcache_memcached_port = 11211
                                   # Memcached port number. Mandatory if memqcache_method = 'memcached'.
                                   # Defaults to 11211.
                                   # (change requires restart)
memqcache_total_size = 67108864
                                   # Total memory size in bytes for storing memory cache.
                                   # Mandatory if memqcache_method = 'shmem'.
                                   # Defaults to 64MB.
                                   # (change requires restart)
memqcache_max_num_cache = 1000000
                                   # Total number of cache entries. Mandatory
                                   # if memqcache_method = 'shmem'.
                                   # Each cache entry consumes 48 bytes on shared memory.
                                   # Defaults to 1,000,000(45.8MB).
                                   # (change requires restart)
memqcache_expire = 0
                                   # Memory cache entry life time specified in seconds.
                                   # 0 means infinite life time. 0 by default.
                                   # (change requires restart)
memqcache_auto_cache_invalidation = on
                                   # If on, invalidation of query cache is triggered by corresponding
                                   # DDL/DML/DCL(and memqcache_expire). If off, it is only triggered
                                   # by memqcache_expire. on by default.
                                   # (change requires restart)
memqcache_maxcache = 409600
                                   # Maximum SELECT result size in bytes.
                                   # Must be smaller than memqcache_cache_block_size. Defaults to 400KB.
                                   # (change requires restart)
memqcache_cache_block_size = 1048576
                                   # Cache block size in bytes. Mandatory if memqcache_method = 'shmem'.
                                   # Defaults to 1MB.
                                   # (change requires restart)
memqcache_oiddir = '/var/log/pgpool/oiddir'
                                   # Temporary work directory to record table oids
                                   # (change requires restart)
white_memqcache_table_list = ''
                                   # Comma separated list of table names to memcache
                                   # that don't write to database
                                   # Regexp are accepted
black_memqcache_table_list = ''
                                   # Comma separated list of table names not to memcache
                                   # that don't write to database
                                   # Regexp are accepted
backend_hostname0 = 'pgpool-poc01.novalocal'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/installer/postgresql-11.5/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'

backend_hostname1 = 'pgpool-poc02.novalocal'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/installer/postgresql-11.5/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'

heartbeat_destination0 = 'pgpool-poc01.novalocal'
heartbeat_destination_port0 = 9694

other_pgpool_hostname0 = 'pgpool-poc01.novalocal'
other_pgpool_port0 = 5433
other_wd_port0 = 9000

heartbeat_device0 = 'eth0'
## Added by Raj --> parameters available from pgpool-II 4.1 onwards

statement_level_load_balance = on

enable_consensus_with_half_votes = on

[root@pgpool-poc02 bin]#
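
For context, the failover_command in the configuration above hands five arguments to failover.sh: %d (id of the failed node), %P (id of the old primary), %H (hostname of the new master candidate), a replication password and a trigger-file path. The attached script is not reproduced here; the following is only a minimal sketch, assuming trigger-file based promotion and passwordless ssh as the postgres user (both assumptions, not the behaviour of the actual attached script), of how such a script typically consumes those arguments:

#!/bin/bash
# Minimal failover script sketch (not the attached failover.sh).
# Argument order matches the failover_command in the configuration above.
FAILED_NODE_ID="$1"   # %d  - id of the backend node that went down
OLD_PRIMARY_ID="$2"   # %P  - id of the old primary node
NEW_MASTER_HOST="$3"  # %H  - hostname of the new master candidate
REPL_PASSWORD="$4"    # literal password passed from pgpool.conf (unused in this sketch)
TRIGGER_FILE="$5"     # literal trigger-file path passed from pgpool.conf

# Promote only when the failed node was the primary; a failed standby
# needs no promotion.
if [ "$FAILED_NODE_ID" = "$OLD_PRIMARY_ID" ]; then
    ssh postgres@"$NEW_MASTER_HOST" "touch '$TRIGGER_FILE'"
fi
exit 0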
(0003097)
raj.pandey1982@gmail.com   
2020-01-29 06:47   
One observation here:

Whenever the above issue occurs after a DB failover, if I restart the pgpool services on both nodes, remote connections start working again (verified by connecting from pgAdmin) and the new master DB on node 2 becomes reachable via the VIP/port, while the pgpool log on both nodes keeps printing messages like the following:-
 
2020-01-29 00:34:55: pid 30207:LOG: new IPC connection received
2020-01-29 00:34:56: pid 31230:LOG: get_query_result: no rows returned
2020-01-29 00:34:56: pid 31230:DETAIL: node id (1)
2020-01-29 00:34:56: pid 31230:CONTEXT: while checking replication time lag
2020-01-29 00:34:56: pid 31230:LOG: get_query_result falied: status: -1
2020-01-29 00:34:56: pid 31230:CONTEXT: while checking replication time lag
2020-01-29 00:35:01: pid 31230:LOG: get_query_result: no rows returned
2020-01-29 00:35:01: pid 31230:DETAIL: node id (1)
2020-01-29 00:35:01: pid 31230:CONTEXT: while checking replication time lag
2020-01-29 00:35:01: pid 31230:LOG: get_query_result falied: status: -1
2020-01-29 00:35:01: pid 31230:CONTEXT: while checking replication time lag

---> So something is not right here: the downed old master (node 1) is not being detached, so pgpool keeps trying to connect to that down node and does not let the newly elected master DB accept connections via the pgpool VIP/port. If I restart the services, the detachment does happen, though with the "get_query_result falied: status: -1" error above, and remote connections then work again.
(0003099)
t-ishii   
2020-01-29 11:49   
(Last edited: 2020-01-29 12:02)
When Pgpool-II goes into the "unstable" state, can you show the output of "pcp_watchdog_info -h pgpool-poc02.novalocal -v"? This should give important information for studying the issue.

In the meantime I noticed:
2020-01-22 12:39:16: pid 20780:LOG: invalid degenerate backend request, node id : 0 status: [3] is not valid for failover

Status = 3 means that node 0 (pgpool-poc1) of PostgreSQL went into "invalid" status. The invalid status can only be set if detach_false_primary = on. However, in your pgpool.conf it is set to off, so I am confused...
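
One quick way to check which value the running Pgpool-II actually has (as opposed to what the file on disk says) is the PGPOOL SHOW command, issued through the pgpool port; the host and port below are taken from this thread and the configuration file path is only a placeholder, so adjust both as needed:

# Ask the running Pgpool-II for its effective setting
psql -h pgpool-poc01.novalocal -p 5433 -U postgres -c "PGPOOL SHOW detach_false_primary;"

# Compare with what is on disk on each node (adjust the path to your pgpool.conf)
grep -n 'detach_false_primary' /path/to/pgpool.conf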

(0003100)
t-ishii   
2020-01-29 14:11   
> search_primary_node_timeout=10 also not working in my case to avoid below error again and again as this is streaming replication feature while mine is Master-Slave replication:-

What do you mean by "Master-Slave replication"? Can you please elaborate?
(0003101)
raj.pandey1982@gmail.com   
2020-01-29 18:27   
========================pcp_watchdog_info command on 1st/2nd node before failover

pcp_watchdog_info at node 1:-

[root@pgpool-poc01 etc]# pcp_watchdog_info -h pgpool-poc01.novalocal -U postgres -p 9898 -v -w
Watchdog Cluster Information
Total Nodes : 2
Remote Nodes : 1
Quorum state : QUORUM EXIST
Alive Remote Nodes : 1
VIP up on local node : YES
Master Node Name : pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal
Master Host Name : pgpool-poc01.novalocal

Watchdog Node Information
Node Name : pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal
Host Name : pgpool-poc01.novalocal
Delegate IP : 10.70.184.29
Pgpool port : 5433
Watchdog port : 9000
Node priority : 1
Status : 4
Status Name : MASTER

Node Name : pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal
Host Name : pgpool-poc02.novalocal
Delegate IP : 10.70.184.29
Pgpool port : 5433
Watchdog port : 9000
Node priority : 1
Status : 7
Status Name : STANDBY

[root@pgpool-poc01 etc]# pcp_watchdog_info -h pgpool-poc02.novalocal -U postgres -p 9898 -v -w
Watchdog Cluster Information
Total Nodes : 2
Remote Nodes : 1
Quorum state : QUORUM EXIST
Alive Remote Nodes : 1
VIP up on local node : NO
Master Node Name : pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal
Master Host Name : pgpool-poc01.novalocal

Watchdog Node Information
Node Name : pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal
Host Name : pgpool-poc02.novalocal
Delegate IP : 10.70.184.29
Pgpool port : 5433
Watchdog port : 9000
Node priority : 1
Status : 7
Status Name : STANDBY

Node Name : pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal
Host Name : pgpool-poc01.novalocal
Delegate IP : 10.70.184.29
Pgpool port : 5433
Watchdog port : 9000
Node priority : 1
Status : 4
Status Name : MASTER

[root@pgpool-poc01 etc]#



pcp_watchdog_info at node 2:-

[root@pgpool-poc02 replscripts]# pcp_watchdog_info -h pgpool-poc01.novalocal -U postgres -p 9898 -v -w
Watchdog Cluster Information
Total Nodes : 2
Remote Nodes : 1
Quorum state : QUORUM EXIST
Alive Remote Nodes : 1
VIP up on local node : YES
Master Node Name : pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal
Master Host Name : pgpool-poc01.novalocal

Watchdog Node Information
Node Name : pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal
Host Name : pgpool-poc01.novalocal
Delegate IP : 10.70.184.29
Pgpool port : 5433
Watchdog port : 9000
Node priority : 1
Status : 4
Status Name : MASTER

Node Name : pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal
Host Name : pgpool-poc02.novalocal
Delegate IP : 10.70.184.29
Pgpool port : 5433
Watchdog port : 9000
Node priority : 1
Status : 7
Status Name : STANDBY

[root@pgpool-poc02 replscripts]# pcp_watchdog_info -h pgpool-poc02.novalocal -U postgres -p 9898 -v -w
Watchdog Cluster Information
Total Nodes : 2
Remote Nodes : 1
Quorum state : QUORUM EXIST
Alive Remote Nodes : 1
VIP up on local node : NO
Master Node Name : pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal
Master Host Name : pgpool-poc01.novalocal

Watchdog Node Information
Node Name : pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal
Host Name : pgpool-poc02.novalocal
Delegate IP : 10.70.184.29
Pgpool port : 5433
Watchdog port : 9000
Node priority : 1
Status : 7
Status Name : STANDBY

Node Name : pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal
Host Name : pgpool-poc01.novalocal
Delegate IP : 10.70.184.29
Pgpool port : 5433
Watchdog port : 9000
Node priority : 1
Status : 4
Status Name : MASTER

[root@pgpool-poc02 replscripts]#
(0003102)
raj.pandey1982@gmail.com   
2020-01-29 18:29   
========================pcp_watchdog_info command on 1st/2nd node After failover


pcp_watchdog_info at node 1:-

[root@pgpool-poc01 etc]# pcp_watchdog_info -h pgpool-poc01.novalocal -U postgres -p 9898 -v -w
Watchdog Cluster Information
Total Nodes : 2
Remote Nodes : 1
Quorum state : QUORUM EXIST
Alive Remote Nodes : 1
VIP up on local node : YES
Master Node Name : pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal
Master Host Name : pgpool-poc01.novalocal

Watchdog Node Information
Node Name : pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal
Host Name : pgpool-poc01.novalocal
Delegate IP : 10.70.184.29
Pgpool port : 5433
Watchdog port : 9000
Node priority : 1
Status : 4
Status Name : MASTER

Node Name : pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal
Host Name : pgpool-poc02.novalocal
Delegate IP : 10.70.184.29
Pgpool port : 5433
Watchdog port : 9000
Node priority : 1
Status : 7
Status Name : STANDBY

[root@pgpool-poc01 etc]# pcp_watchdog_info -h pgpool-poc02.novalocal -U postgres -p 9898 -v -w
Watchdog Cluster Information
Total Nodes : 2
Remote Nodes : 1
Quorum state : QUORUM EXIST
Alive Remote Nodes : 1
VIP up on local node : NO
Master Node Name : pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal
Master Host Name : pgpool-poc01.novalocal

Watchdog Node Information
Node Name : pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal
Host Name : pgpool-poc02.novalocal
Delegate IP : 10.70.184.29
Pgpool port : 5433
Watchdog port : 9000
Node priority : 1
Status : 7
Status Name : STANDBY

Node Name : pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal
Host Name : pgpool-poc01.novalocal
Delegate IP : 10.70.184.29
Pgpool port : 5433
Watchdog port : 9000
Node priority : 1
Status : 4
Status Name : MASTER



pcp_watchdog_info at node 2:-

[root@pgpool-poc02 replscripts]# pcp_watchdog_info -h pgpool-poc01.novalocal -U postgres -p 9898 -v -w
Watchdog Cluster Information
Total Nodes : 2
Remote Nodes : 1
Quorum state : QUORUM EXIST
Alive Remote Nodes : 1
VIP up on local node : YES
Master Node Name : pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal
Master Host Name : pgpool-poc01.novalocal

Watchdog Node Information
Node Name : pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal
Host Name : pgpool-poc01.novalocal
Delegate IP : 10.70.184.29
Pgpool port : 5433
Watchdog port : 9000
Node priority : 1
Status : 4
Status Name : MASTER

Node Name : pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal
Host Name : pgpool-poc02.novalocal
Delegate IP : 10.70.184.29
Pgpool port : 5433
Watchdog port : 9000
Node priority : 1
Status : 7
Status Name : STANDBY

[root@pgpool-poc02 replscripts]# pcp_watchdog_info -h pgpool-poc02.novalocal -U postgres -p 9898 -v -w
Watchdog Cluster Information
Total Nodes : 2
Remote Nodes : 1
Quorum state : QUORUM EXIST
Alive Remote Nodes : 1
VIP up on local node : NO
Master Node Name : pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal
Master Host Name : pgpool-poc01.novalocal

Watchdog Node Information
Node Name : pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal
Host Name : pgpool-poc02.novalocal
Delegate IP : 10.70.184.29
Pgpool port : 5433
Watchdog port : 9000
Node priority : 1
Status : 7
Status Name : STANDBY

Node Name : pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal
Host Name : pgpool-poc01.novalocal
Delegate IP : 10.70.184.29
Pgpool port : 5433
Watchdog port : 9000
Node priority : 1
Status : 4
Status Name : MASTER

[root@pgpool-poc02 replscripts]#
(0003103)
raj.pandey1982@gmail.com   
2020-01-29 18:33   
>What do you mean by "Master-Slave replication"? Can you please elaborate?
I mean I am using master-slave WAL replication with replication slots.
#------------------------------------------------------------------------------
# REPLICATION MODE
#------------------------------------------------------------------------------

replication_mode = off
                                   # Activate replication mode
                                   # (change requires restart)
replicate_select = off
(0003104)
t-ishii   
2020-01-29 20:55   
> i Mean Master Slave WAL replication i am using and SLOT replication.
That's actually called "streaming replication" in the PostgreSQL world.
(0003105)
raj.pandey1982@gmail.com   
2020-01-29 22:31   
Any luck? We are running out of time. I hope you understand the scenario under which the issue is occurring.
>2020-01-23 12:14:21: pid 2909:LOG: invalid degenerate backend request, node id : 0 status: [3] is not valid for failover
I also think the above is the first thing to worry about: why is it coming up at all, and why would any error appear after a successful DB failover and promotion?
(0003107)
t-ishii   
2020-01-29 23:29   
But from the code's point of view it should only happen when detach_false_primary = on, as I said. If it was on, I could form a theory. Have you ever turned it on?
(0003108)
raj.pandey1982@gmail.com   
2020-01-30 04:48   
I had not done this before, but I just tried once with detach_false_primary = on and restarted the master and standby pgpool services; remote connections then stopped working and the log gives the error below:-
2020-01-29 22:27:05: pid 24481:LOG: Backend status file /var/log/pgpool/pgpool_status discarded
2020-01-29 22:27:05: pid 24481:LOG: memory cache initialized
2020-01-29 22:27:05: pid 24481:DETAIL: memcache blocks :64
2020-01-29 22:27:05: pid 24481:LOG: pool_discard_oid_maps: discarded memqcache oid maps
2020-01-29 22:27:05: pid 24481:LOG: waiting for watchdog to initialize
2020-01-29 22:27:05: pid 24483:LOG: setting the local watchdog node name to "pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal"
2020-01-29 22:27:05: pid 24483:LOG: watchdog cluster is configured with 1 remote nodes
2020-01-29 22:27:05: pid 24483:LOG: watchdog remote node:0 on pgpool-poc02.novalocal:9000
2020-01-29 22:27:05: pid 24483:LOG: interface monitoring is disabled in watchdog
2020-01-29 22:27:05: pid 24483:LOG: watchdog node state changed from [DEAD] to [LOADING]
2020-01-29 22:27:10: pid 24483:LOG: watchdog node state changed from [LOADING] to [JOINING]
2020-01-29 22:27:14: pid 24483:LOG: watchdog node state changed from [JOINING] to [INITIALIZING]
2020-01-29 22:27:15: pid 24483:LOG: I am the only alive node in the watchdog cluster
2020-01-29 22:27:15: pid 24483:HINT: skipping stand for coordinator state
2020-01-29 22:27:15: pid 24483:LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
2020-01-29 22:27:15: pid 24483:LOG: I am announcing my self as master/coordinator watchdog node
2020-01-29 22:27:19: pid 24483:LOG: I am the cluster leader node
2020-01-29 22:27:19: pid 24483:DETAIL: our declare coordinator message is accepted by all nodes
2020-01-29 22:27:19: pid 24483:LOG: setting the local node "pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal" as watchdog cluster master
2020-01-29 22:27:19: pid 24483:LOG: I am the cluster leader node. Starting escalation process
2020-01-29 22:27:19: pid 24481:LOG: watchdog process is initialized
2020-01-29 22:27:19: pid 24481:DETAIL: watchdog messaging data version: 1.1
2020-01-29 22:27:19: pid 24483:LOG: escalation process started with PID:24485
2020-01-29 22:27:19: pid 24485:LOG: watchdog: escalation started
2020-01-29 22:27:19: pid 24483:LOG: new IPC connection received
2020-01-29 22:27:19: pid 24481:LOG: Setting up socket for 0.0.0.0:5433
2020-01-29 22:27:19: pid 24481:LOG: Setting up socket for :::5433
2020-01-29 22:27:19: pid 24483:LOG: new IPC connection received
2020-01-29 22:27:19: pid 24486:LOG: 2 watchdog nodes are configured for lifecheck
2020-01-29 22:27:19: pid 24486:LOG: watchdog nodes ID:0 Name:"pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal"
2020-01-29 22:27:19: pid 24486:DETAIL: Host:"pgpool-poc01.novalocal" WD Port:9000 pgpool-II port:5433
2020-01-29 22:27:19: pid 24486:LOG: watchdog nodes ID:1 Name:"Not_Set"
2020-01-29 22:27:19: pid 24486:DETAIL: Host:"pgpool-poc02.novalocal" WD Port:9000 pgpool-II port:5433
2020-01-29 22:27:19: pid 24486:LOG: watchdog lifecheck trusted server "mohvcasdb01.novalocal" added for the availability check
2020-01-29 22:27:19: pid 24486:LOG: watchdog lifecheck trusted server "mohcasdevdb.novalocal" added for the availability check
2020-01-29 22:27:20: pid 24481:LOG: find_primary_node_repeatedly: waiting for finding a primary node
2020-01-29 22:27:20: pid 24481:LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2020-01-29 22:27:20: pid 24481:LOG: find_primary_node: primary node is 0
2020-01-29 22:27:20: pid 24481:LOG: find_primary_node: standby node is 1
2020-01-29 22:27:20: pid 25492:LOG: PCP process: 25492 started
2020-01-29 22:27:20: pid 24481:LOG: pgpool-II successfully started. version 4.1.0 (karasukiboshi)
2020-01-29 22:27:20: pid 24481:LOG: node status[0]: 1
2020-01-29 22:27:20: pid 24481:LOG: node status[1]: 2
2020-01-29 22:27:20: pid 25493:LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2020-01-29 22:27:20: pid 24493:LOG: createing watchdog heartbeat receive socket.
2020-01-29 22:27:20: pid 24493:DETAIL: bind receive socket to device: "eth0"
2020-01-29 22:27:20: pid 24493:LOG: set SO_REUSEPORT option to the socket
2020-01-29 22:27:20: pid 24495:LOG: creating socket for sending heartbeat
2020-01-29 22:27:20: pid 24495:DETAIL: bind send socket to device: eth0
2020-01-29 22:27:20: pid 24493:LOG: creating watchdog heartbeat receive socket.
2020-01-29 22:27:20: pid 24493:DETAIL: set SO_REUSEPORT
2020-01-29 22:27:20: pid 24495:LOG: set SO_REUSEPORT option to the socket
2020-01-29 22:27:20: pid 24495:LOG: creating socket for sending heartbeat
2020-01-29 22:27:20: pid 24495:DETAIL: set SO_REUSEPORT
2020-01-29 22:27:23: pid 24485:LOG: successfully acquired the delegate IP:"10.70.184.29"
2020-01-29 22:27:23: pid 24485:DETAIL: 'if_up_cmd' returned with success
2020-01-29 22:27:23: pid 24483:LOG: watchdog escalation process with pid: 24485 exit with SUCCESS.
2020-01-29 22:27:25: pid 25493:LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2020-01-29 22:27:30: pid 25493:LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2020-01-29 22:27:35: pid 25493:LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2020-01-29 22:27:40: pid 25493:LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2020-01-29 22:27:45: pid 25493:LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2020-01-29 22:27:50: pid 25493:LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2020-01-29 22:27:55: pid 25493:LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2020-01-29 22:28:00: pid 25493:LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2020-01-29 22:28:03: pid 24483:LOG: new watchdog node connection is received from "10.70.184.28:57574"
2020-01-29 22:28:03: pid 24483:LOG: new node joined the cluster hostname:"pgpool-poc02.novalocal" port:9000 pgpool_port:5433
2020-01-29 22:28:03: pid 24483:DETAIL: Pgpool-II version:"4.1.0" watchdog messaging version: 1.1
2020-01-29 22:28:03: pid 24483:LOG: new outbound connection to pgpool-poc02.novalocal:9000
2020-01-29 22:28:05: pid 25493:LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2020-01-29 22:28:09: pid 24483:LOG: adding watchdog node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal" to the standby list
2020-01-29 22:28:09: pid 24481:LOG: Pgpool-II parent process received watchdog quorum change signal from watchdog
2020-01-29 22:28:09: pid 24483:LOG: new IPC connection received
2020-01-29 22:28:09: pid 24481:LOG: watchdog cluster now holds the quorum
2020-01-29 22:28:09: pid 24481:DETAIL: updating the state of quarantine backend nodes
2020-01-29 22:28:09: pid 24483:LOG: new IPC connection received
2020-01-29 22:28:10: pid 25493:LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2020-01-29 22:28:15: pid 25493:LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2020-01-29 22:28:20: pid 25493:LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2020-01-29 22:28:25: pid 25493:LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2020-01-29 22:28:30: pid 25493:LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2020-01-29 22:28:35: pid 25493:LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2020-01-29 22:28:40: pid 25493:LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2020-01-29 22:28:45: pid 25493:LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2020-01-29 22:28:50: pid 25493:LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2020-01-29 22:28:55: pid 25493:LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2020-01-29 22:29:00: pid 25493:LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
(0003109)
t-ishii   
2020-01-30 10:38   
> 2020-01-29 22:28:10: pid 25493:LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
It seems PostgreSQL 0 and 1 are alive and PostgreSQL 0 is the primary, but PostgreSQL 1 is not properly connected to the primary as a streaming replication standby. You can confirm this by issuing "show pool_nodes" or by sending the query "select * from pg_stat_replication" to the PostgreSQL primary.

Going back to the original problem:
> 2020-01-22 12:39:16: pid 20780:LOG: invalid degenerate backend request, node id : 0 status: [3] is not valid for failover
I believe detach_false_primary = on at that time. Anyway, setting detach_false_primary = off should solve the problem.
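
For illustration, those two checks could be run as follows; the hostnames, database name and ports are taken from the pgpool.conf quoted earlier in this report, so treat them as assumptions for this environment:

# Through Pgpool-II (it listens on port 5433 here; the delegate IP works as well)
psql -h pgpool-poc01.novalocal -p 5433 -U postgres -d mawidstg01 -c "show pool_nodes;"

# Directly on the PostgreSQL primary (backend port 5432)
psql -h pgpool-poc01.novalocal -p 5432 -U postgres -c "select * from pg_stat_replication;"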
(0003110)
raj.pandey1982@gmail.com   
2020-01-30 15:10   
> You can confirm this by issuing "show pool_nodes" or send query "select * from pg_stat_replication" to PostgreSQL primary.
show pool_nodes:-

mawidstg01=# show pool_nodes;
 node_id | hostname               | port | status | lb_weight | role    | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+------------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | pgpool-poc01.novalocal | 5432 | up     | 0.500000  | primary | 0          | false             | 0                 |                   |                        | 2020-01-30 09:04:22
 1       | pgpool-poc02.novalocal | 5432 | up     | 0.500000  | standby | 0          | true              | 0                 |                   |                        | 2020-01-30 09:04:22
(2 rows)


select * from pg_stat_replication:
 pid              | 24424
 usesysid         | 98470
 usename          | replication
 application_name | walreceiver
 client_addr      | 10.70.184.28
 client_hostname  |
 client_port      | 43510
 backend_start    | 2020-01-29 22:23:27.035277+03
 backend_xmin     | 8915528
 state            | streaming
 sent_lsn         | 109/B301F4B8
 write_lsn        | 109/B301F4B8
 flush_lsn        | 109/B301F4B8
 replay_lsn       | 109/B301F4B8
 write_lag        |
 flush_lag        |
 replay_lag       |
 sync_priority    | 0
 sync_state       | async


>I believe that detach_false_primary= on at that time. Anyway, set detach_false_primary= off should solve the problem.
> 2020-01-29 22:28:10: pid 25493:LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
>> I produced the above error after setting detach_false_primary = on and restarting the pgpool services (the master and slave were both up at that time), just to show you; otherwise detach_false_primary is off by default.
(0003111)
t-ishii   
2020-01-30 15:32   
In show pool_nodes, the "replication_state" column is empty. This means Pgpool-II failed to collect the streaming replication state. Check the sr_check* parameters. Since "select * from pg_stat_replication" seems to work (Pgpool-II actually sends the same query), streaming replication itself is working. Probably the role, permission, or password is not appropriate.
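
A sketch of how the sr_check_* credentials could be verified by hand and registered in pool_passwd, assuming the user and database names from the quoted pgpool.conf are still in use and using a placeholder key-file path (pg_md5 is the helper shipped with Pgpool-II for md5 entries; pg_enc is its AES variant in 4.0 and later):

# Can the sr_check user itself run the replication-state query on the primary?
psql -h pgpool-poc01.novalocal -p 5432 -U replication -d mawidstg01 \
     -c "select state, sync_state from pg_stat_replication;"

# Register the sr_check password in pool_passwd as an md5 entry ...
pg_md5 --md5auth --username=replication 'reppassword'

# ... or as an AES-encrypted entry, with the key file in place:
pg_enc -m -k /path/to/.pgpoolkey -u replication -p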
(0003112)
raj.pandey1982@gmail.com   
2020-01-30 18:27   
For sr_check* I changed the user from 'replication' to 'postgres' and the password accordingly, then did the failover again, but I still get the same error:-

I don't understand why, when I shut the master DB down and the log reports a degenerate backend request, pgpool still keeps trying to connect to the node 1 master DB even after failover and promotion:-
(FYI: show pool_nodes still shows "replication_state" as blank, yet when I run select * from pg_stat_replication from the pgAdmin client, all field values appear.)

2020-01-30 12:21:06: pid 14552:DETAIL: postmaster on DB node 0 was shutdown by administrative command
2020-01-30 12:21:06: pid 14552:LOG: received degenerate backend request for node_id: 0 from pid [14552]
2020-01-30 12:21:06: pid 15546:LOG: reading and processing packets
2020-01-30 12:21:06: pid 15546:DETAIL: postmaster on DB node 0 was shutdown by administrative command
2020-01-30 12:21:06: pid 15546:LOG: received degenerate backend request for node_id: 0 from pid [15546]
2020-01-30 12:21:06: pid 14527:LOG: new IPC connection received
2020-01-30 12:21:06: pid 14527:LOG: new IPC connection received
2020-01-30 12:21:06: pid 14527:LOG: watchdog received the failover command from local pgpool-II on IPC interface
2020-01-30 12:21:06: pid 14527:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
2020-01-30 12:21:06: pid 14527:LOG: we have got the consensus to perform the failover
2020-01-30 12:21:06: pid 14527:DETAIL: 1 node(s) voted in the favor
2020-01-30 12:21:06: pid 14527:LOG: watchdog received the failover command from local pgpool-II on IPC interface
2020-01-30 12:21:06: pid 14527:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
2020-01-30 12:21:06: pid 14527:LOG: we have got the consensus to perform the failover
2020-01-30 12:21:06: pid 14527:DETAIL: 1 node(s) voted in the favor
2020-01-30 12:21:06: pid 14525:LOG: Pgpool-II parent process has received failover request
2020-01-30 12:21:06: pid 14527:LOG: new IPC connection received
2020-01-30 12:21:06: pid 14527:LOG: received the failover indication from Pgpool-II on IPC interface
2020-01-30 12:21:06: pid 14527:LOG: watchdog is informed of failover start by the main process
2020-01-30 12:21:06: pid 14525:LOG: starting degeneration. shutdown host pgpool-poc01.novalocal(5432)
2020-01-30 12:21:06: pid 15522:LOG: reading and processing packets
2020-01-30 12:21:06: pid 15522:DETAIL: postmaster on DB node 0 was shutdown by administrative command
2020-01-30 12:21:06: pid 15522:LOG: received degenerate backend request for node_id: 0 from pid [15522]
2020-01-30 12:21:06: pid 14527:LOG: new IPC connection received
2020-01-30 12:21:06: pid 14527:LOG: watchdog received the failover command from local pgpool-II on IPC interface
2020-01-30 12:21:06: pid 14527:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
2020-01-30 12:21:06: pid 14527:LOG: we have got the consensus to perform the failover
2020-01-30 12:21:06: pid 14527:DETAIL: 1 node(s) voted in the favor
2020-01-30 12:21:06: pid 14525:LOG: Restart all children
2020-01-30 12:21:06: pid 14525:LOG: execute command: /usr/share/pgpool/4.1.0/etc/failover.sh 0 0 pgpool-poc02.novalocal reppassword /installer/postgresql-11.5/data/im_the_master
Authorized Uses Only.All activity may be Monitored and Reported
promote - Start
DEBUG: The script will be executed with the following arguments:
DEBUG: --trigger-file=/installer/postgresql-11.5/data/im_the_master
DEBUG: --standby_file=/installer/postgresql-11.5/data/im_slave
DEBUG: --demote-host=
DEBUG: --user=replication
DEBUG: --password=reppassword
DEBUG: --force
INFO: Checking if standby file exists...
INFO: Checking if trigger file exists...
INFO: Deleting recovery.conf file...
INFO: Checking if postgresql.conf file exists...
INFO: postgresql.conf file found. Checking if it is for primary server...
INFO: postgresql.conf file corresponds to primary server file. Nothing to do.
pg_ctl: server is running (PID: 2689)
/usr/local/pgsql11.5/bin/postgres "-D" "/installer/postgresql-11.5/data"
INFO: Restarting postgresql service...
waiting for server to shut down....2020-01-30 12:21:06: pid 15554:LOG: failed to connect to PostgreSQL server on "pgpool-poc02.novalocal:5432", getsockopt() detected error "Connection refused"
2020-01-30 12:21:06: pid 15554:ERROR: failed to make persistent db connection
2020-01-30 12:21:06: pid 15554:DETAIL: connection to host:"pgpool-poc02.novalocal:5432" failed
2020-01-30 12:21:06: pid 15554:LOG: health check failed on node 1 (timeout:0)
2020-01-30 12:21:06: pid 15554:LOG: received degenerate backend request for node_id: 1 from pid [15554]
2020-01-30 12:21:06: pid 14527:LOG: new IPC connection received
2020-01-30 12:21:06: pid 14527:LOG: watchdog received the failover command from local pgpool-II on IPC interface
2020-01-30 12:21:06: pid 14527:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
2020-01-30 12:21:06: pid 14527:LOG: we have got the consensus to perform the failover
2020-01-30 12:21:06: pid 14527:DETAIL: 1 node(s) voted in the favor
 done
server stopped
waiting for server to start....2020-01-30 12:21:06 +03 LOG: listening on IPv4 address "0.0.0.0", port 5432
2020-01-30 12:21:06 +03 LOG: listening on IPv6 address "::", port 5432
2020-01-30 12:21:06 +03 LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2020-01-30 12:21:06 +03 LOG: redirecting log output to logging collector process
2020-01-30 12:21:06 +03 HINT: Future log output will appear in directory "/dblogs/logs".
 done
server started
pg_ctl: server is running (PID: 4709)
/usr/local/pgsql11.5/bin/postgres "-D" "/installer/postgresql-11.5/data"
INFO: postgresql already running.
INFO: Ensuring replication role and password...
INFO: Replication role found. Ensuring password...
ALTER ROLE
INFO: Creating primary info file...
promote - Done!
2020-01-30 12:21:08: pid 15551:LOG: failed to connect to PostgreSQL server on "pgpool-poc01.novalocal:5432", getsockopt() detected error "Connection refused"
2020-01-30 12:21:08: pid 15551:ERROR: failed to make persistent db connection
2020-01-30 12:21:08: pid 15551:DETAIL: connection to host:"pgpool-poc01.novalocal:5432" failed
2020-01-30 12:21:09: pid 14527:LOG: watchdog received the failover command from remote pgpool-II node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-01-30 12:21:09: pid 14527:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal
2020-01-30 12:21:09: pid 14527:LOG: we have got the consensus to perform the failover
2020-01-30 12:21:09: pid 14527:DETAIL: 1 node(s) voted in the favor
2020-01-30 12:21:09: pid 14527:LOG: invalid degenerate backend request, node id : 0 status: [3] is not valid for failover
2020-01-30 12:21:13: pid 15551:ERROR: Failed to check replication time lag
2020-01-30 12:21:13: pid 15551:DETAIL: No persistent db connection for the node 0
2020-01-30 12:21:13: pid 15551:HINT: check sr_check_user and sr_check_password
2020-01-30 12:21:13: pid 15551:CONTEXT: while checking replication time lag
2020-01-30 12:21:13: pid 15551:LOG: failed to connect to PostgreSQL server on "pgpool-poc01.novalocal:5432", getsockopt() detected error "Connection refused"
2020-01-30 12:21:13: pid 15551:ERROR: failed to make persistent db connection
2020-01-30 12:21:13: pid 15551:DETAIL: connection to host:"pgpool-poc01.novalocal:5432" failed
2020-01-30 12:21:14: pid 14527:LOG: watchdog received the failover command from remote pgpool-II node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-01-30 12:21:14: pid 14527:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal
2020-01-30 12:21:14: pid 14527:LOG: we have got the consensus to perform the failover
2020-01-30 12:21:14: pid 14527:DETAIL: 1 node(s) voted in the favor
2020-01-30 12:21:14: pid 14527:LOG: invalid degenerate backend request, node id : 0 status: [3] is not valid for failover
(0003113)
raj.pandey1982@gmail.com   
2020-01-30 19:15   
I was looking into https://www.pgpool.net/mantisbt/view.php?id=421, where almost the same error appears in the log; you suggested search_primary_node_timeout = 10 and it worked for that reporter, but it is not working in my case.
(0003114)
raj.pandey1982@gmail.com   
2020-01-31 04:59   
Problem Statement: I am not able to write/read to the newly promoted master node through the pgpool VIP. I need to restart the pgpool services; only then am I able to read/write to the newly elected master.

Scenario 1 : Manual promotion of standby node as Master :-
(a) stopped Master DB manually
(b) manually ran the failover.sh script to promote the standby node as the new Master node.
(c) read/write is happening through the pgpool VIP without restarting the pgpool service.

Scenario 2: Automated promotion through pgpool (using failover_command) of the standby node as Master:-
(a) stopped Master DB manually
(b) got the consensus to perform the failover, and the failover.sh script promoting the standby node as the new Master node completed successfully.
(c) Read/write is NOT HAPPENING through the pgpool VIP without restarting the pgpool service. If I restart the pgpool service, then read/write works properly.

In both scenarios the configuration files and parameter files were the same, except that in the 1st scenario the failover_command parameter was commented out for the manual promotion.

I need your help so that in the automated failover scenario I don't need to restart the pgpool services in order to read/write to the newly elected Master DB using the pgpool VIP.

Attaching the latest pgpool.conf.
(0003115)
raj.pandey1982@gmail.com   
2020-02-02 18:23   
Might I get some update here, please?
(0003116)
raj.pandey1982@gmail.com   
2020-02-02 22:03   
replication_state | replication_sync_state are showing blank even though I have set the below:-

backend_application_name0 = 'pgpool-poc01.novalocal'
backend_application_name1 = 'pgpool-poc02.novalocal'
--------------------------------------------------------------------------------------------------------------------
Also At Node 2 Stand by server : node status show 0 in log

2020-02-02 15:51:57: pid 32249:LOG: node status[0]: 0
2020-02-02 15:51:57: pid 32249:LOG: node status[1]: 0

--------------------------------------------------------------------------------------------------------------------------
while At Node 2 Stand by server : node status show 1/2 in log:-
2020-02-02 15:48:24: pid 31835:LOG: node status[0]: 1
2020-02-02 15:48:24: pid 31835:LOG: node status[1]: 2

Is this related to pgpool not responding properly after failover?
(0003120)
t-ishii   
2020-02-04 13:48   
> Also At Node 2 Stand by server : node status show 0 in log
>
> 2020-02-02 15:51:57: pid 32249:LOG: node status[0]: 0
> 2020-02-02 15:51:57: pid 32249:LOG: node status[1]: 0
>
> --------------------------------------------------------------------------------------------------------------------------
> while At Node 2 Stand by server : node status show 1/2 in log:-
> 2020-02-02 15:48:24: pid 31835:LOG: node status[0]: 1
> 2020-02-02 15:48:24: pid 31835:LOG: node status[1]: 2

So sometimes Node 2 shows status 0, 0 and sometimes 1, 2?
(0003121)
t-ishii   
2020-02-04 13:52   
I think you haven't told me the version of PostgreSQL. Can you share it?
(0003122)
raj.pandey1982@gmail.com   
2020-02-04 14:30   
> So sometimes Node 2 shows status 0, 0 and sometimes 1, 2?
The PGPOOL Master node always shows node0=1 and node1=2; the PGPOOL StandBy node always shows node0=0 and node1=0.

> I think you haven't told me the version of PostgreSQL. Can you share it?
PostgreSQL 11.5.

One more finding that may help: I get the same kind of error with pgpool version 4.0 during failover.
(0003123)
t-ishii   
2020-02-04 14:49   
Can you attach the latest pgpool.conf, postgresql.conf and recovery.conf here? Cut & paste into the comments is hard to read. Also, you seem to have made modifications to them.
(0003124)
raj.pandey1982@gmail.com   
2020-02-04 15:58   
Please find the attached file as per request.
(0003125)
t-ishii   
2020-02-04 16:34   
I see a problem in recovery.conf:
primary_conninfo = 'user=replication password=reppassword host=10.70.184.27 port=5432 sslmode=disable sslcompression=0 target_session_attrs=any'

You need to add the "application_name" parameter, something like:
application_name=pgpool-poc01.novalocal
(this is an example for pgpool-poc01.novalocal)
(0003126)
raj.pandey1982@gmail.com   
2020-02-04 16:50   
> You need to add "application_name" parameter something like:>

Is the below correct now? I just added application_name=<hostname>.

primary_conninfo = 'user=replication password=reppassword host=10.70.184.27 application_name=pgpool-poc01.novalocal port=5432 sslmode=disable sslcompression=0 target_session_attrs=any'

But I use the above only once, while making the slave DB through PG_backup; and during DB failover/promotion this recovery.conf file gets deleted too, so I am not sure how this file could be the cause of the issue.
(0003127)
t-ishii   
2020-02-04 17:13   
> Is below correct now:?
Looks good to me.

> But I use the above only once, while making the slave DB through PG_backup; and during DB failover/promotion this recovery.conf file gets deleted too, so I am not sure how this file could be the cause of the issue.

No. recovery.conf is used by the standby PostgreSQL to sync with the primary PostgreSQL server even after failover. So you need to keep recovery.conf after making a new standby PostgreSQL. This work should be done by recovery_1st_stage_command (in your case recovery_1st_stage.sh). If you are not sure how to do that, you can find an example at:
src/sample/scripts/recovery_1st_stage.sample
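For reference, a minimal sketch (not the sample script itself, and assuming the hostnames, password and paths already used in this thread) of the part of a recovery_1st_stage script that recreates recovery.conf on the new standby with application_name set:

#!/bin/bash
# Illustrative fragment only -- the base backup and argument handling that a real
# recovery_1st_stage script performs are omitted here.
PRIMARY_HOST=pgpool-poc01.novalocal          # assumed current primary
STANDBY_HOST=pgpool-poc02.novalocal          # assumed node being rebuilt as standby
PGDATA=/installer/postgresql-11.5/data

# Write recovery.conf on the new standby so that it streams from the primary
# and reports itself with the application_name Pgpool-II expects.
ssh postgres@${STANDBY_HOST} "cat > ${PGDATA}/recovery.conf" <<EOF
standby_mode = 'on'
primary_conninfo = 'user=replication password=reppassword host=${PRIMARY_HOST} port=5432 application_name=${STANDBY_HOST}'
trigger_file = '/installer/tmp/im_the_master'
restore_command = 'cp /installer/archivedir/%f "%p"'
EOF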
(0003128)
raj.pandey1982@gmail.com   
2020-02-04 17:46   
> No. recovery.conf is used by the standby PostgreSQL to sync with the primary PostgreSQL server even after failover.
I am not deleting recovery.conf while/after preparing the slave DB. Only when I shut down the master DB does the failover script delete recovery.conf, in order to promote the slave as Master; and master/slave sync works properly.

Now I just created the slave DB again (as I always need to do once the slave is opened as master after the old master's failover), after adding the application_name=pgpool-poc01.novalocal parameter in the recovery.conf file.
Both Master/Slave are in sync now:-

[postgres@pgpool-poc02 logs]$ tail -200f mawidstg01-2020-02-04_113709.log
2020-02-04 11:37:09 +03 LOG: database system was interrupted; last known up at 2020-02-04 11:00:25 +03
2020-02-04 11:37:09 +03 LOG: entering standby mode
cp: cannot stat ‘/installer/archivedir/0000000100000109000000CD’: Not a directory
2020-02-04 11:37:09 +03 LOG: redo starts at 109/CD000028
2020-02-04 11:37:09 +03 LOG: consistent recovery state reached at 109/CD000130
2020-02-04 11:37:09 +03 LOG: database system is ready to accept read only connections
cp: cannot stat ‘/installer/archivedir/0000000100000109000000CE’: Not a directory
2020-02-04 11:37:09 +03 LOG: started streaming WAL from primary at 109/CE000000 on timeline 1

Now I started the PGPOOL services on both nodes and checked pool_nodes:

[postgres@pgpool-poc01 logs]$ /usr/local/pgsql11.5/bin/psql -h 10.70.184.29 -p 5433 -U postgres -d mawidstg01
psql (11.5)
Type "help" for help.

mawidstg01=# show pool_nodes;
 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+------------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0 | pgpool-poc01.novalocal | 5432 | up | 0.500000 | primary | 210 | true | 0 | | | 2020-02-04 11:38:29
 1 | pgpool-poc02.novalocal | 5432 | up | 0.500000 | standby | 0 | false | 0 | | | 2020-02-04 11:38:29
(2 rows)

Again replication_state | replication_sync_state looks blank.
(0003129)
raj.pandey1982@gmail.com   
2020-02-04 17:47   
[postgres@pgpool-poc02 data]$ cat recovery.conf
standby_mode = 'on'
primary_conninfo = 'user=replication password=reppassword host=10.70.184.27 application_name=pgpool-poc01.novalocal port=5432 sslmode=disable sslcompression=0 target_session_attrs=any'
primary_conninfo = 'host=10.70.184.27 port=5432 user=replication password=reppassword'
trigger_file = '/installer/tmp/im_the_master'
restore_command = 'cp /installer/archivedir/%f "%p"'

[postgres@pgpool-poc02 data]$
(0003130)
raj.pandey1982@gmail.com   
2020-02-04 18:03   
I want to clarify one more thing:- In our setup, the aim is to do a failover from Master to Slave, and once the Slave is promoted as Master, the application/remote/client should connect to it.

If the above is successful, then I can even create a new slave DB at any time from the newly promoted master, either by a manual restore using rsync or PG_Backup, or by automation through PGPOOL.

But right now what I want to achieve is that "when the Master DB goes down, the Slave should be able to accept connections as the newly promoted Master". The script is doing its work and promoting the slave as master, but pgpool is not leaving the down node alone, keeps bugging it, and is not accepting remote connections.
(0003131)
raj.pandey1982@gmail.com   
2020-02-04 18:37   
After DB failover, if I restart the PGPOOL services manually on Master and Standby, then remote connections to the newly promoted Master start working (but that is manual work, not automatic), with the below messages in the pgpool log:-

2020-02-04 12:32:46: pid 31014:CONTEXT: while checking replication time lag
2020-02-04 12:32:46: pid 31014:LOG: get_query_result falied: status: -1
2020-02-04 12:32:46: pid 31014:CONTEXT: while checking replication time lag
2020-02-04 12:32:51: pid 31014:LOG: get_query_result: no rows returned
2020-02-04 12:32:51: pid 31014:DETAIL: node id (1)
2020-02-04 12:32:51: pid 31014:CONTEXT: while checking replication time lag
2020-02-04 12:32:51: pid 31014:LOG: get_query_result falied: status: -1
2020-02-04 12:32:51: pid 31014:CONTEXT: while checking replication time lag
2020-02-04 12:32:56: pid 31014:LOG: get_query_result: no rows returned
2020-02-04 12:32:56: pid 31014:DETAIL: node id (1)
2020-02-04 12:32:56: pid 31014:CONTEXT: while checking replication time lag
2020-02-04 12:32:56: pid 31014:LOG: get_query_result falied: status: -1
2020-02-04 12:32:56: pid 31014:CONTEXT: while checking replication time lag
2020-02-04 12:33:01: pid 31014:LOG: get_query_result: no rows returned
2020-02-04 12:33:01: pid 31014:DETAIL: node id (1)
2020-02-04 12:33:01: pid 31014:CONTEXT: while checking replication time lag
2020-02-04 12:33:01: pid 31014:LOG: get_query_result falied: status: -1
2020-02-04 12:33:01: pid 31014:CONTEXT: while checking replication time lag
(0003137)
raj.pandey1982@gmail.com   
2020-02-05 16:44   
Hello Team, any findings here to resolve this with some fixes?
(0003138)
t-ishii   
2020-02-05 17:45   
> Again replication_state | replication_sync_state looks blank.
There must be something wrong with your settings.

Can you try the following command on pgpool-poc01.novalocal?

PGPASSWORD=postgrestg psql -p 5432 -h pgpool-poc01.novalocal -d postgres -U postgres -w
(0003139)
raj.pandey1982@gmail.com   
2020-02-05 17:47   
Do I need to execute it as-is, or put it in the pgpool.conf file?
(0003140)
raj.pandey1982@gmail.com   
2020-02-05 17:51   
The below are working:

[postgres@pgpool-poc01 data]$ /usr/local/pgsql11.5/bin/psql -h 10.70.184.28 -p 5432 -U postgres -d mawidstg01
psql (11.5)
Type "help" for help.

mawidstg01=# \q
[postgres@pgpool-poc01 data]$ /usr/local/pgsql11.5/bin/psql -h pgpool-poc01.novalocal -p 5432 -U postgres -d mawidstg01
psql (11.5)
Type "help" for help.

mawidstg01=# \q
[postgres@pgpool-poc01 data]$ /usr/local/pgsql11.5/bin/psql -h pgpool-poc01.novalocal -p 5432 -U postgres -d postgres
psql (11.5)
Type "help" for help.

postgres=#
(0003141)
raj.pandey1982@gmail.com   
2020-02-05 17:52   
postgres=# \q
[postgres@pgpool-poc01 data]$ /usr/local/pgsql11.5/bin/psql -h pgpool-poc01.novalocal -p 5432 -U postgres -d postgres PGPASSWORD=postgrestg -w
psql: warning: extra command-line argument "PGPASSWORD=postgrestg" ignored
psql (11.5)
Type "help" for help.

postgres=# \q
[postgres@pgpool-poc01 data]$ /usr/local/pgsql11.5/bin/psql -h pgpool-poc01.novalocal -p 5432 -U postgres -d postgres PGPASSWORD=postgrestg
psql: warning: extra command-line argument "PGPASSWORD=postgrestg" ignored
psql (11.5)
Type "help" for help.

postgres=#
(0003142)
raj.pandey1982@gmail.com   
2020-02-05 21:54   
Hi, I ran this as you asked, and it worked with both the DB and pgpool ports, but still the status is blank:-
[postgres@pgpool-poc01 postgresql-11.5]$ PGPASSWORD=postgrestg /usr/local/pgsql11.5/bin/psql -h pgpool-poc01.novalocal -p 5432 -U postgres -d postgres -w
psql (11.5)
Type "help" for help.

postgres=# select * from pg_catalog.pg_stat_replication;
  pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state
-------+----------+-------------+------------------+--------------+-----------------+-------------+-------------------------------+--------------+-----------+--------------+--------------+--------------+--------------+-----------+-----------+------------+---------------+------------
 23100 | 98470 | replication | walreceiver | 10.70.184.28 | | 46798 | 2020-02-04 17:14:34.695439+03 | 8915781 | streaming | 109/D413DF50 | 109/D413DF50 | 109/D413DF50 | 109/D413DF50 | | | | 0 | async
(1 row)

postgres=# \q
[postgres@pgpool-poc01 postgresql-11.5]$ PGPASSWORD=postgrestg /usr/local/pgsql11.5/bin/psql -h pgpool-poc01.novalocal -p 5433 -U postgres -d postgres -w
psql (11.5)
Type "help" for help.

postgres=# show pool_nodes;
 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+------------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0 | pgpool-poc01.novalocal | 5432 | up | 0.500000 | primary | 0 | true | 0 | | | 2020-02-05 15:34:17
 1 | pgpool-poc02.novalocal | 5432 | up | 0.500000 | standby | 0 | false | 0 | | | 2020-02-05 15:34:17
(2 rows)

postgres=#
(0003143)
t-ishii   
2020-02-06 00:08   
(Last edited: 2020-02-06 00:15)
Oops. I forgot to add the SQL to be executed by the psql command:
PGPASSWORD=postgrestg psql -p 5432 -h pgpool-poc01.novalocal -d postgres -U postgres -w -c "SELECT application_name, state, sync_state FROM pg_stat_replication"

This is effectively the same SQL, using the database, user and password specified in pgpool.conf, that pgpool executes to collect the information used by the show pool_nodes command.

(0003144)
raj.pandey1982@gmail.com   
2020-02-06 01:59   
postgres@pgpool-poc01 postgresql-11.5]$ PGPASSWORD=postgrestg /usr/local/pgsql11.5/bin/psql -h pgpool-poc01.novalocal -p 5432 -U postgres -d postgres -w -c "SELECT application_name, state, sync_state FROM pg_stat_replication"
 application_name | state | sync_state
------------------+-----------+------------
 walreceiver | streaming | async
(1 row)

[postgres@pgpool-poc01 postgresql-11.5]$
(0003145)
raj.pandey1982@gmail.com   
2020-02-06 02:02   
Finally your query gives the result as expected. So what is wrong with my pgpool settings?
(0003146)
t-ishii   
2020-02-06 08:13   
application_name should be pgpool-poc02.novalocal rather than walreceiver. walreceiver is the default application name for the wal receiver process. This means that the application name is not properly set in recovery.conf on pgpool-poc02.novalocal. Have you set the application_name parameter to pgpool-poc02.novalocal in the recovery.conf on pgpool-poc02.novalocal? I mean:
application_name=pgpool-poc02.novalocal
in the primary_conninfo line.
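For example, on pgpool-poc02.novalocal the line would look something like this (a sketch reusing the host, user and password values already shown in this thread):

primary_conninfo = 'user=replication password=reppassword host=10.70.184.27 port=5432 application_name=pgpool-poc02.novalocal'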
(0003147)
raj.pandey1982@gmail.com   
2020-02-06 16:33   
Hi Friend,
Thanks. I made the changes and this worked. Both field values are now showing up. Now, what is the next thing to check? I tried the failover after this but hit the same issue:-
[postgres@pgpool-poc02 data]$ cat /tmp/slaveconfbkp/recovery.conf
standby_mode = 'on'
primary_conninfo = 'user=replication password=reppassword host=10.70.184.27 application_name=pgpool-poc02.novalocal port=5432 sslmode=disable sslcompression=0 target_session_attrs=any'
trigger_file = '/installer/tmp/im_the_master'
restore_command = 'cp /installer/archivedir/%f "%p"'


This worked
[postgres@pgpool-poc01 postgresql-11.5]$ PGPASSWORD=postgrestg /usr/local/pgsql11.5/bin/psql -h pgpool-poc01.novalocal -p 5433 -U postgres -d postgres -w -c "SELECT application_name, state, sync_state FROM pg_stat_replication"
    application_name | state | sync_state
------------------------+-----------+------------
 pgpool-poc02.novalocal | streaming | async
(1 row)

[postgres@pgpool-poc01 postgresql-11.5]$ /usr/local/pgsql11.5/bin/psql -h 10.70.184.29 -p 5433 -U postgres -d mawidstg01
psql (11.5)
Type "help" for help.

mawidstg01=# show pool_nodes;
 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+------------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0 | pgpool-poc01.novalocal | 5432 | up | 0.500000 | primary | 3 | false | 0 | | | 2020-02-06 10:16:31
 1 | pgpool-poc02.novalocal | 5432 | up | 0.500000 | standby | 0 | true | 0 | streaming | async | 2020-02-06 10:16:31
(2 rows)
(0003148)
raj.pandey1982@gmail.com   
2020-02-06 16:35   
After replication_state | replication_sync_state started displaying, I again did the DB failover but got the same error again :-( ... Please suggest:-

2020-02-06 10:16:09: pid 18247:LOG: successfully acquired the delegate IP:"10.70.184.29"
2020-02-06 10:16:09: pid 18247:DETAIL: 'if_up_cmd' returned with success
2020-02-06 10:16:09: pid 18243:LOG: watchdog escalation process with pid: 18247 exit with SUCCESS.
2020-02-06 10:16:17: pid 18243:LOG: new watchdog node connection is received from "10.70.184.28:21705"
2020-02-06 10:16:17: pid 18243:LOG: new node joined the cluster hostname:"pgpool-poc02.novalocal" port:9000 pgpool_port:5433
2020-02-06 10:16:17: pid 18243:DETAIL: Pgpool-II version:"4.1.0" watchdog messaging version: 1.1
2020-02-06 10:16:17: pid 18243:LOG: new outbound connection to pgpool-poc02.novalocal:9000
2020-02-06 10:16:23: pid 18243:LOG: adding watchdog node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal" to the standby list
2020-02-06 10:16:23: pid 18241:LOG: Pgpool-II parent process received watchdog quorum change signal from watchdog
2020-02-06 10:16:23: pid 18243:LOG: new IPC connection received
2020-02-06 10:16:23: pid 18241:LOG: watchdog cluster now holds the quorum
2020-02-06 10:16:23: pid 18241:DETAIL: updating the state of quarantine backend nodes
2020-02-06 10:16:23: pid 18243:LOG: new IPC connection received
2020-02-06 10:16:31: pid 19253:LOG: pool_reuse_block: blockid: 0
2020-02-06 10:16:31: pid 19253:CONTEXT: while searching system catalog, When relcache is missed
2020-02-06 10:16:32: pid 19255:LOG: forked new pcp worker, pid=19282 socket=7
2020-02-06 10:16:32: pid 19255:LOG: PCP process with pid: 19282 exit with SUCCESS.
2020-02-06 10:16:32: pid 19255:LOG: PCP process with pid: 19282 exits with status 0
2020-02-06 10:16:32: pid 19255:LOG: forked new pcp worker, pid=19285 socket=7
2020-02-06 10:16:32: pid 19255:LOG: PCP process with pid: 19285 exit with SUCCESS.
2020-02-06 10:16:32: pid 19255:LOG: PCP process with pid: 19285 exits with status 0
2020-02-06 10:16:32: pid 19255:LOG: forked new pcp worker, pid=19288 socket=7
2020-02-06 10:16:32: pid 19255:LOG: PCP process with pid: 19288 exit with SUCCESS.
2020-02-06 10:16:32: pid 19255:LOG: PCP process with pid: 19288 exits with status 0
2020-02-06 10:20:05: pid 19255:LOG: forked new pcp worker, pid=19545 socket=7
2020-02-06 10:20:05: pid 19255:LOG: PCP process with pid: 19545 exit with SUCCESS.
2020-02-06 10:20:05: pid 19255:LOG: PCP process with pid: 19545 exits with status 0
2020-02-06 10:20:05: pid 19255:LOG: forked new pcp worker, pid=19548 socket=7
2020-02-06 10:20:05: pid 19255:LOG: PCP process with pid: 19548 exit with SUCCESS.
2020-02-06 10:20:05: pid 19255:LOG: PCP process with pid: 19548 exits with status 0
2020-02-06 10:20:05: pid 19255:LOG: forked new pcp worker, pid=19551 socket=7
2020-02-06 10:20:05: pid 19255:LOG: PCP process with pid: 19551 exit with SUCCESS.
2020-02-06 10:20:05: pid 19255:LOG: PCP process with pid: 19551 exits with status 0
2020-02-06 10:21:31: pid 18241:LOG: child process with pid: 19253 exits with status 256
2020-02-06 10:21:31: pid 18241:LOG: fork a new child process with pid: 19646
2020-02-06 10:21:32: pid 18241:LOG: child process with pid: 19232 exits with status 256
2020-02-06 10:21:32: pid 18241:LOG: fork a new child process with pid: 19647
2020-02-06 10:21:58: pid 18241:LOG: child process with pid: 19235 exits with status 256
2020-02-06 10:21:58: pid 18241:LOG: fork a new child process with pid: 19679
2020-02-06 10:23:37: pid 19256:LOG: received degenerate backend request for node_id: 0 from pid [19256]
2020-02-06 10:23:37: pid 18243:LOG: new IPC connection received
2020-02-06 10:23:37: pid 18243:LOG: watchdog received the failover command from local pgpool-II on IPC interface
2020-02-06 10:23:37: pid 18243:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
2020-02-06 10:23:37: pid 18243:LOG: we have got the consensus to perform the failover
2020-02-06 10:23:37: pid 18243:DETAIL: 1 node(s) voted in the favor
2020-02-06 10:23:37: pid 19256:ERROR: unable to read data from DB node 0
2020-02-06 10:23:37: pid 19256:DETAIL: socket read failed with an error "Success"
2020-02-06 10:23:37: pid 18241:LOG: Pgpool-II parent process has received failover request
2020-02-06 10:23:37: pid 18243:LOG: new IPC connection received
2020-02-06 10:23:37: pid 18243:LOG: received the failover indication from Pgpool-II on IPC interface
2020-02-06 10:23:37: pid 18243:LOG: watchdog is informed of failover start by the main process
2020-02-06 10:23:37: pid 18241:LOG: starting degeneration. shutdown host pgpool-poc01.novalocal(5432)
2020-02-06 10:23:37: pid 18241:LOG: Restart all children
2020-02-06 10:23:37: pid 18241:LOG: execute command: /usr/share/pgpool/4.1.0/etc/failover.sh 0 0 pgpool-poc02.novalocal postgrestg /installer/postgresql-11.5/data/im_the_master
Authorized Uses Only.All activity may be Monitored and Reported
promote - Start
DEBUG: The script will be executed with the following arguments:
DEBUG: --trigger-file=/installer/postgresql-11.5/data/im_the_master
DEBUG: --standby_file=/installer/postgresql-11.5/data/im_slave
DEBUG: --demote-host=
DEBUG: --user=postgres
DEBUG: --password=postgrestg
DEBUG: --force
INFO: Checking if standby file exists...
INFO: Checking if trigger file exists...
INFO: Deleting recovery.conf file...
INFO: Checking if postgresql.conf file exists...
INFO: postgresql.conf file found. Checking if it is for primary server...
INFO: postgresql.conf file corresponds to primary server file. Nothing to do.
pg_ctl: server is running (PID: 28770)
/usr/local/pgsql11.5/bin/postgres "-D" "/installer/postgresql-11.5/data"
INFO: Restarting postgresql service...
waiting for server to shut down.... done
server stopped
waiting for server to start....2020-02-06 10:23:37 +03 LOG: listening on IPv4 address "0.0.0.0", port 5432
2020-02-06 10:23:37 +03 LOG: listening on IPv6 address "::", port 5432
2020-02-06 10:23:37 +03 LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2020-02-06 10:23:37 +03 LOG: redirecting log output to logging collector process
2020-02-06 10:23:37 +03 HINT: Future log output will appear in directory "/dblogs/logs".
 done
server started
pg_ctl: server is running (PID: 30372)
/usr/local/pgsql11.5/bin/postgres "-D" "/installer/postgresql-11.5/data"
INFO: postgresql already running.
INFO: Ensuring replication role and password...
INFO: Replication role found. Ensuring password...
ALTER ROLE
INFO: Creating primary info file...
promote - Done!
2020-02-06 10:23:42: pid 19256:ERROR: Failed to check replication time lag
2020-02-06 10:23:42: pid 19256:DETAIL: No persistent db connection for the node 0
2020-02-06 10:23:42: pid 19256:HINT: check sr_check_user and sr_check_password
2020-02-06 10:23:42: pid 19256:CONTEXT: while checking replication time lag
2020-02-06 10:23:42: pid 19256:LOG: failed to connect to PostgreSQL server on "pgpool-poc01.novalocal:5432", getsockopt() detected error "Connection refused"
2020-02-06 10:23:42: pid 19256:ERROR: failed to make persistent db connection
2020-02-06 10:23:42: pid 19256:DETAIL: connection to host:"pgpool-poc01.novalocal:5432" failed
2020-02-06 10:23:42: pid 18243:LOG: watchdog received the failover command from remote pgpool-II node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-02-06 10:23:42: pid 18243:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal
2020-02-06 10:23:42: pid 18243:LOG: we have got the consensus to perform the failover
2020-02-06 10:23:42: pid 18243:DETAIL: 1 node(s) voted in the favor
2020-02-06 10:23:42: pid 18243:LOG: invalid degenerate backend request, node id : 0 status: [3] is not valid for failover
2020-02-06 10:23:47: pid 19256:ERROR: Failed to check replication time lag
2020-02-06 10:23:47: pid 19256:DETAIL: No persistent db connection for the node 0
2020-02-06 10:23:47: pid 19256:HINT: check sr_check_user and sr_check_password
2020-02-06 10:23:47: pid 19256:CONTEXT: while checking replication time lag
2020-02-06 10:23:47: pid 19256:LOG: failed to connect to PostgreSQL server on "pgpool-poc01.novalocal:5432", getsockopt() detected error "Connection refused"
2020-02-06 10:23:47: pid 19256:ERROR: failed to make persistent db connection
2020-02-06 10:23:47: pid 19256:DETAIL: connection to host:"pgpool-poc01.novalocal:5432" failed
2020-02-06 10:23:50: pid 18243:LOG: watchdog received the failover command from remote pgpool-II node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-02-06 10:23:50: pid 18243:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal
2020-02-06 10:23:50: pid 18243:LOG: we have got the consensus to perform the failover
2020-02-06 10:23:50: pid 18243:DETAIL: 1 node(s) voted in the favor
2020-02-06 10:23:50: pid 18243:LOG: invalid degenerate backend request, node id : 0 status: [3] is not valid for failover
2020-02-06 10:23:52: pid 19256:ERROR: Failed to check replication time lag
2020-02-06 10:23:52: pid 19256:DETAIL: No persistent db connection for the node 0
2020-02-06 10:23:52: pid 19256:HINT: check sr_check_user and sr_check_password
2020-02-06 10:23:52: pid 19256:CONTEXT: while checking replication time lag
2020-02-06 10:23:52: pid 19256:LOG: failed to connect to PostgreSQL server on "pgpool-poc01.novalocal:5432", getsockopt() detected error "Connection refused"
2020-02-06 10:23:52: pid 19256:ERROR: failed to make persistent db connection
2020-02-06 10:23:52: pid 19256:DETAIL: connection to host:"pgpool-poc01.novalocal:5432" failed
2020-02-06 10:23:57: pid 19256:ERROR: Failed to check replication time lag
2020-02-06 10:23:57: pid 19256:DETAIL: No persistent db connection for the node 0
2020-02-06 10:23:57: pid 19256:HINT: check sr_check_user and sr_check_password
2020-02-06 10:23:57: pid 19256:CONTEXT: while checking replication time lag
2020-02-06 10:23:57: pid 19256:LOG: failed to connect to PostgreSQL server on "pgpool-poc01.novalocal:5432", getsockopt() detected error "Connection refused"
2020-02-06 10:23:57: pid 19256:ERROR: failed to make persistent db connection
2020-02-06 10:23:57: pid 19256:DETAIL: connection to host:"pgpool-poc01.novalocal:5432" failed
2020-02-06 10:23:58: pid 18243:LOG: watchdog received the failover command from remote pgpool-II node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-02-06 10:23:58: pid 18243:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal
2020-02-06 10:23:58: pid 18243:LOG: we have got the consensus to perform the failover
2020-02-06 10:23:58: pid 18243:DETAIL: 1 node(s) voted in the favor
2020-02-06 10:23:58: pid 18243:LOG: invalid degenerate backend request, node id : 0 status: [3] is not valid for failover
2020-02-06 10:24:02: pid 19256:ERROR: Failed to check replication time lag
2020-02-06 10:24:02: pid 19256:DETAIL: No persistent db connection for the node 0
2020-02-06 10:24:02: pid 19256:HINT: check sr_check_user and sr_check_password
2020-02-06 10:24:02: pid 19256:CONTEXT: while checking replication time lag
2020-02-06 10:24:02: pid 19256:LOG: failed to connect to PostgreSQL server on "pgpool-poc01.novalocal:5432", getsockopt() detected error "Connection refused"
2020-02-06 10:24:02: pid 19256:ERROR: failed to make persistent db connection
2020-02-06 10:24:02: pid 19256:DETAIL: connection to host:"pgpool-poc01.novalocal:5432" failed
2020-02-06 10:24:06: pid 18243:LOG: watchdog received the failover command from remote pgpool-II node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-02-06 10:24:06: pid 18243:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal
2020-02-06 10:24:06: pid 18243:LOG: we have got the consensus to perform the failover
2020-02-06 10:24:06: pid 18243:DETAIL: 1 node(s) voted in the favor
2020-02-06 10:24:06: pid 18243:LOG: invalid degenerate backend request, node id : 0 status: [3] is not valid for failover
2020-02-06 10:24:07: pid 19256:ERROR: Failed to check replication time lag
2020-02-06 10:24:07: pid 19256:DETAIL: No persistent db connection for the node 0
(0003149)
raj.pandey1982@gmail.com   
2020-02-06 17:20   
If, after the above issue, I restart the Primary and Standby pgpool services, all goes well (remote connections start happening) with the below log messages (but I expect this to happen the first time, when I initially bring the master DB down at node 1):-

2020-02-06 11:11:33: pid 20295:LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
2020-02-06 11:11:33: pid 20295:LOG: I am announcing my self as master/coordinator watchdog node
2020-02-06 11:11:37: pid 20295:LOG: I am the cluster leader node
2020-02-06 11:11:37: pid 20295:DETAIL: our declare coordinator message is accepted by all nodes
2020-02-06 11:11:37: pid 20295:LOG: setting the local node "pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal" as watchdog cluster master
2020-02-06 11:11:37: pid 20295:LOG: I am the cluster leader node. Starting escalation process
2020-02-06 11:11:37: pid 20293:LOG: watchdog process is initialized
2020-02-06 11:11:37: pid 20293:DETAIL: watchdog messaging data version: 1.1
2020-02-06 11:11:37: pid 20295:LOG: escalation process started with PID:20296
2020-02-06 11:11:37: pid 20296:LOG: watchdog: escalation started
2020-02-06 11:11:37: pid 20295:LOG: new IPC connection received
2020-02-06 11:11:37: pid 20295:LOG: new IPC connection received
2020-02-06 11:11:37: pid 20297:LOG: 2 watchdog nodes are configured for lifecheck
2020-02-06 11:11:37: pid 20297:LOG: watchdog nodes ID:0 Name:"pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal"
2020-02-06 11:11:37: pid 20297:DETAIL: Host:"pgpool-poc01.novalocal" WD Port:9000 pgpool-II port:5433
2020-02-06 11:11:37: pid 20297:LOG: watchdog nodes ID:1 Name:"Not_Set"
2020-02-06 11:11:37: pid 20297:DETAIL: Host:"pgpool-poc02.novalocal" WD Port:9000 pgpool-II port:5433
2020-02-06 11:11:37: pid 20293:LOG: Setting up socket for 0.0.0.0:5433
2020-02-06 11:11:37: pid 20293:LOG: Setting up socket for :::5433
2020-02-06 11:11:37: pid 20297:LOG: watchdog lifecheck trusted server "mohvcasdb01.novalocal" added for the availability check
2020-02-06 11:11:37: pid 20297:LOG: watchdog lifecheck trusted server "mohcasdevdb.novalocal" added for the availability check
RTNETLINK answers: File exists
2020-02-06 11:11:37: pid 20296:LOG: failed to acquire the delegate IP address
2020-02-06 11:11:37: pid 20296:DETAIL: 'if_up_cmd' failed
2020-02-06 11:11:37: pid 20296:WARNING: watchdog escalation failed to acquire delegate IP
2020-02-06 11:11:37: pid 20295:LOG: watchdog escalation process with pid: 20296 exit with SUCCESS.
2020-02-06 11:11:38: pid 20293:LOG: find_primary_node_repeatedly: waiting for finding a primary node
2020-02-06 11:11:38: pid 20293:LOG: failed to connect to PostgreSQL server on "pgpool-poc01.novalocal:5432", getsockopt() detected error "Connection refused"
2020-02-06 11:11:38: pid 20293:ERROR: failed to make persistent db connection
2020-02-06 11:11:38: pid 20293:DETAIL: connection to host:"pgpool-poc01.novalocal:5432" failed
2020-02-06 11:11:38: pid 20293:LOG: find_primary_node: make_persistent_db_connection_noerror failed on node 0
2020-02-06 11:11:38: pid 20293:LOG: find_primary_node: primary node is 1
2020-02-06 11:11:38: pid 21305:LOG: PCP process: 21305 started
2020-02-06 11:11:38: pid 21306:LOG: failed to connect to PostgreSQL server on "pgpool-poc01.novalocal:5432", getsockopt() detected error "Connection refused"
2020-02-06 11:11:38: pid 21306:ERROR: failed to make persistent db connection
2020-02-06 11:11:38: pid 21306:DETAIL: connection to host:"pgpool-poc01.novalocal:5432" failed
2020-02-06 11:11:38: pid 21307:LOG: failed to connect to PostgreSQL server on "pgpool-poc01.novalocal:5432", getsockopt() detected error "Connection refused"
2020-02-06 11:11:38: pid 21307:ERROR: failed to make persistent db connection
2020-02-06 11:11:38: pid 21307:DETAIL: connection to host:"pgpool-poc01.novalocal:5432" failed
2020-02-06 11:11:38: pid 21307:LOG: health check retrying on DB node: 0 (round:1)
2020-02-06 11:11:38: pid 20293:LOG: pgpool-II successfully started. version 4.1.0 (karasukiboshi)
2020-02-06 11:11:38: pid 20293:LOG: node status[0]: 0
2020-02-06 11:11:38: pid 20293:LOG: node status[1]: 1
2020-02-06 11:11:38: pid 20299:LOG: createing watchdog heartbeat receive socket.
2020-02-06 11:11:38: pid 20299:DETAIL: bind receive socket to device: "eth0"
2020-02-06 11:11:38: pid 20299:LOG: set SO_REUSEPORT option to the socket
2020-02-06 11:11:38: pid 20299:LOG: creating watchdog heartbeat receive socket.
2020-02-06 11:11:38: pid 20299:DETAIL: set SO_REUSEPORT
2020-02-06 11:11:38: pid 20302:LOG: creating socket for sending heartbeat
2020-02-06 11:11:38: pid 20302:DETAIL: bind send socket to device: eth0
2020-02-06 11:11:38: pid 20302:LOG: set SO_REUSEPORT option to the socket
2020-02-06 11:11:38: pid 20302:LOG: creating socket for sending heartbeat
2020-02-06 11:11:38: pid 20302:DETAIL: set SO_REUSEPORT
2020-02-06 11:11:39: pid 21307:LOG: failed to connect to PostgreSQL server on "pgpool-poc01.novalocal:5432", getsockopt() detected error "Connection refused"
2020-02-06 11:11:39: pid 21307:ERROR: failed to make persistent db connection
2020-02-06 11:11:39: pid 21307:DETAIL: connection to host:"pgpool-poc01.novalocal:5432" failed
2020-02-06 11:11:39: pid 21307:LOG: health check retrying on DB node: 0 (round:2)
2020-02-06 11:11:40: pid 21307:LOG: failed to connect to PostgreSQL server on "pgpool-poc01.novalocal:5432", getsockopt() detected error "Connection refused"
2020-02-06 11:11:40: pid 21307:ERROR: failed to make persistent db connection
2020-02-06 11:11:40: pid 21307:DETAIL: connection to host:"pgpool-poc01.novalocal:5432" failed
2020-02-06 11:11:40: pid 21307:LOG: health check retrying on DB node: 0 (round:3)
2020-02-06 11:11:41: pid 21307:LOG: failed to connect to PostgreSQL server on "pgpool-poc01.novalocal:5432", getsockopt() detected error "Connection refused"
2020-02-06 11:11:41: pid 21307:ERROR: failed to make persistent db connection
2020-02-06 11:11:41: pid 21307:DETAIL: connection to host:"pgpool-poc01.novalocal:5432" failed
2020-02-06 11:11:41: pid 21307:LOG: health check failed on node 0 (timeout:0)
2020-02-06 11:11:41: pid 21307:LOG: received degenerate backend request for node_id: 0 from pid [21307]
2020-02-06 11:11:41: pid 20295:LOG: new IPC connection received
2020-02-06 11:11:41: pid 20295:LOG: watchdog received the failover command from local pgpool-II on IPC interface
2020-02-06 11:11:41: pid 20295:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
2020-02-06 11:11:41: pid 20295:LOG: we have got the consensus to perform the failover
2020-02-06 11:11:41: pid 20295:DETAIL: 1 node(s) voted in the favor
2020-02-06 11:11:41: pid 20293:LOG: Pgpool-II parent process has received failover request
2020-02-06 11:11:41: pid 20295:LOG: new IPC connection received
2020-02-06 11:11:41: pid 20295:LOG: received the failover indication from Pgpool-II on IPC interface
2020-02-06 11:11:41: pid 20295:LOG: watchdog is informed of failover start by the main process
2020-02-06 11:11:41: pid 20293:LOG: starting degeneration. shutdown host pgpool-poc01.novalocal(5432)
2020-02-06 11:11:41: pid 20293:LOG: Do not restart children because we are switching over node id 0 host: pgpool-poc01.novalocal port: 5432 and we are in streaming replication mode
2020-02-06 11:11:41: pid 20293:LOG: execute command: /usr/share/pgpool/4.1.0/etc/failover.sh 0 1 pgpool-poc02.novalocal postgrestg /installer/postgresql-11.5/data/im_the_master
2020-02-06 11:11:41: pid 20293:LOG: failover: set new primary node: 1
2020-02-06 11:11:41: pid 20293:LOG: failover: set new master node: 1
2020-02-06 11:11:41: pid 21306:ERROR: Failed to check replication time lag
2020-02-06 11:11:41: pid 21306:DETAIL: No persistent db connection for the node 0
2020-02-06 11:11:41: pid 21306:HINT: check sr_check_user and sr_check_password
2020-02-06 11:11:41: pid 21306:CONTEXT: while checking replication time lag
2020-02-06 11:11:41: pid 21306:LOG: worker process received restart request
2020-02-06 11:11:41: pid 20295:LOG: new IPC connection received
2020-02-06 11:11:41: pid 20295:LOG: received the failover indication from Pgpool-II on IPC interface
2020-02-06 11:11:41: pid 20295:LOG: watchdog is informed of failover end by the main process
failover done. shutdown host pgpool-poc01.novalocal(5432)2020-02-06 11:11:41: pid 20293:LOG: failover done. shutdown host pgpool-poc01.novalocal(5432)
2020-02-06 11:11:42: pid 21305:LOG: restart request received in pcp child process
2020-02-06 11:11:42: pid 20293:LOG: PCP child 21305 exits with status 0 in failover()
2020-02-06 11:11:42: pid 20293:LOG: fork a new PCP child pid 21313 in failover()
2020-02-06 11:11:42: pid 20293:LOG: worker child process with pid: 21306 exits with status 256
2020-02-06 11:11:42: pid 21313:LOG: PCP process: 21313 started
2020-02-06 11:11:42: pid 20293:LOG: fork a new worker child process with pid: 21314
2020-02-06 11:11:42: pid 21314:LOG: get_query_result: no rows returned
2020-02-06 11:11:42: pid 21314:DETAIL: node id (1)
2020-02-06 11:11:42: pid 21314:CONTEXT: while checking replication time lag
2020-02-06 11:11:42: pid 21314:CONTEXT: while checking replication time lag
2020-02-06 11:11:42: pid 21314:LOG: get_query_result falied: status: -1
2020-02-06 11:11:42: pid 21314:CONTEXT: while checking replication time lag
2020-02-06 11:11:47: pid 21314:LOG: get_query_result: no rows returned
2020-02-06 11:11:47: pid 21314:DETAIL: node id (1)
2020-02-06 11:11:47: pid 21314:CONTEXT: while checking replication time lag
2020-02-06 11:11:47: pid 21314:LOG: get_query_result falied: status: -1
2020-02-06 11:11:47: pid 21314:CONTEXT: while checking replication time lag
2020-02-06 11:11:52: pid 21314:LOG: get_query_result: no rows returned
2020-02-06 11:11:52: pid 21314:DETAIL: node id (1)
2020-02-06 11:11:52: pid 21314:CONTEXT: while checking replication time lag
2020-02-06 11:11:52: pid 21314:LOG: get_query_result falied: status: -1
2020-02-06 11:11:52: pid 21314:CONTEXT: while checking replication time lag
2020-02-06 11:11:57: pid 21314:LOG: get_query_result: no rows returned
2020-02-06 11:11:57: pid 21314:DETAIL: node id (1)
2020-02-06 11:11:57: pid 21314:CONTEXT: while checking replication time lag
2020-02-06 11:11:57: pid 21314:LOG: get_query_result falied: status: -1
2020-02-06 11:11:57: pid 21314:CONTEXT: while checking replication time lag
2020-02-06 11:12:02: pid 21314:LOG: get_query_result: no rows returned
2020-02-06 11:12:02: pid 21314:DETAIL: node id (1)
2020-02-06 11:12:02: pid 21314:CONTEXT: while checking replication time lag
2020-02-06 11:12:02: pid 21314:LOG: get_query_result falied: status: -1
2020-02-06 11:12:02: pid 21314:CONTEXT: while checking replication time lag
2020-02-06 11:12:07: pid 21314:LOG: get_query_result: no rows returned
2020-02-06 11:12:07: pid 21314:DETAIL: node id (1)
2020-02-06 11:12:07: pid 21314:CONTEXT: while checking replication time lag
2020-02-06 11:12:07: pid 21314:LOG: get_query_result falied: status: -1
2020-02-06 11:12:07: pid 21314:CONTEXT: while checking replication time lag
2020-02-06 11:12:12: pid 21314:LOG: get_query_result: no rows returned
2020-02-06 11:12:12: pid 21314:DETAIL: node id (1)
2020-02-06 11:12:12: pid 21314:CONTEXT: while checking replication time lag
2020-02-06 11:12:12: pid 21314:LOG: get_query_result falied: status: -1
2020-02-06 11:12:12: pid 21314:CONTEXT: while checking replication time lag
2020-02-06 11:12:17: pid 21314:LOG: get_query_result: no rows returned
2020-02-06 11:12:17: pid 21314:DETAIL: node id (1)
2020-02-06 11:12:17: pid 21314:CONTEXT: while checking replication time lag
2020-02-06 11:12:17: pid 21314:LOG: get_query_result falied: status: -1
2020-02-06 11:12:17: pid 21314:CONTEXT: while checking replication time lag
2020-02-06 11:12:22: pid 21314:LOG: get_query_result: no rows returned
2020-02-06 11:12:22: pid 21314:DETAIL: node id (1)
2020-02-06 11:12:22: pid 21314:CONTEXT: while checking replication time lag
(0003150)
raj.pandey1982@gmail.com   
2020-02-06 17:35   
Sadly, after promotion pgpool should no longer worry about node 1 but should work fine with node 2. But that is not happening here.
(0003151)
raj.pandey1982@gmail.com   
2020-02-06 17:56   
See the difference here in the log messages that follow "1 node(s) voted in the favor": first after the DB failover alone, then after the DB failover followed by a pgpool service restart.
============================================================================================================================
Log after DB failover:-
2020-02-06 10:23:50: pid 18243:DETAIL: 1 node(s) voted in the favor
2020-02-06 10:23:50: pid 18243:LOG: invalid degenerate backend request, node id : 0 status: [3] is not valid for failover
2020-02-06 10:23:47: pid 19256:ERROR: Failed to check replication time lag

Log after restarting the pgpool services after DB failover:-

2020-02-06 11:11:41: pid 20295:DETAIL: 1 node(s) voted in the favor
2020-02-06 11:11:41: pid 20293:LOG: Pgpool-II parent process has received failover request
2020-02-06 11:11:41: pid 20295:LOG: new IPC connection received
2020-02-06 11:11:41: pid 20295:LOG: received the failover indication from Pgpool-II on IPC interface
2020-02-06 11:11:41: pid 20295:LOG: watchdog is informed of failover start by the main process
2020-02-06 11:11:41: pid 20293:LOG: starting degeneration. shutdown host pgpool-poc01.novalocal(5432)
2020-02-06 11:11:41: pid 20293:LOG: Do not restart children because we are switching over node id 0 host: pgpool-poc01.novalocal port: 5432 and we are in streaming replication mode


So this is the message where the difference starts: "invalid degenerate backend request, node id : 0 status: [3] is not valid for failover".

Something is not being picked up properly right after the DB failover, which is picked up fine after a service restart following the DB failover.
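For what it's worth, in Pgpool-II's internal numbering backend status 3 appears to be the "down" state, so the later degenerate requests are rejected because node 0 is already detached. One way to inspect and, if needed, reset that state is the PCP interface; a minimal sketch (the PCP port 9898 and PCP user 'pgpool' are assumptions, adjust to your pcp.conf):

# What does pgpool currently record for node 0?
pcp_node_info -h pgpool-poc01.novalocal -p 9898 -U pgpool -v 0

# After the old primary has been rebuilt as a standby, re-attach it:
pcp_attach_node -h pgpool-poc01.novalocal -p 9898 -U pgpool 0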
(0003152)
t-ishii   
2020-02-07 07:50   
> After replication_state | replication_sync_state started displaying, I again did the DB failover but got the same error again :-( ... Please suggest:-

What exactly did you do? Did you shut down pgpool-poc02.novalocal? Or just shut down the postmaster on pgpool-poc02.novalocal?
(0003153)
raj.pandey1982@gmail.com   
2020-02-07 18:00   
The Master DB and Primary PGPOOL are configured on node 1 (pgpool-poc01.novalocal).
The Slave DB and Standby PGPOOL are configured on node 2 (pgpool-poc02.novalocal).
1st: I added recovery.conf with application_name=pgpool-poc02.novalocal.
2nd: I stopped the Standby PGPOOL.
3rd: I stopped the Master PGPOOL.
4th: I stopped the Slave DB and started it again.
5th: I started PGPOOL on both Primary & Standby.
6th: I checked that replication_state | replication_sync_state are showing via the 'show pool_nodes' command.
7th: I did the failover by shutting down the Master Database at node 1.
8th: I then checked the PGPOOL log on both Master and Standby.
9th: Both logs showed the SLAVE DB at node 2 promoted as Master, and then the same old error as below started, not allowing remote connections:-

2020-02-06 10:23:42: pid 19256:ERROR: Failed to check replication time lag
2020-02-06 10:23:42: pid 19256:DETAIL: No persistent db connection for the node 0
2020-02-06 10:23:42: pid 19256:HINT: check sr_check_user and sr_check_password
2020-02-06 10:23:42: pid 19256:CONTEXT: while checking replication time lag
2020-02-06 10:23:42: pid 19256:LOG: failed to connect to PostgreSQL server on "pgpool-poc01.novalocal:5432", getsockopt() detected error "Connection refused"
2020-02-06 10:23:42: pid 19256:ERROR: failed to make persistent db connection
2020-02-06 10:23:42: pid 19256:DETAIL: connection to host:"pgpool-poc01.novalocal:5432" failed
2020-02-06 10:23:42: pid 18243:LOG: watchdog received the failover command from remote pgpool-II node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-02-06 10:23:42: pid 18243:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal
2020-02-06 10:23:42: pid 18243:LOG: we have got the consensus to perform the failover
2020-02-06 10:23:42: pid 18243:DETAIL: 1 node(s) voted in the favor
2020-02-06 10:23:42: pid 18243:LOG: invalid degenerate backend request, node id : 0 status: [3] is not valid for failover
2020-02-06 10:23:47: pid 19256:ERROR: Failed to check replication time lag
2020-02-06 10:23:47: pid 19256:DETAIL: No persistent db connection for the node 0
2020-02-06 10:23:47: pid 19256:HINT: check sr_check_user and sr_check_password
2020-02-06 10:23:47: pid 19256:CONTEXT: while checking replication time lag
2020-02-06 10:23:47: pid 19256:LOG: failed to connect to PostgreSQL server on "pgpool-poc01.novalocal:5432", getsockopt() detected error "Connection refused"
2020-02-06 10:23:47: pid 19256:ERROR: failed to make persistent db connection
2020-02-06 10:23:47: pid 19256:DETAIL: connection to host:"pgpool-poc01.novalocal:5432" failed

10th: I stopped the pgpool services on Standby.
11th: I stopped the pgpool services on Master.
12th: I started the pgpool services on Master.
13th: I started the pgpool services on Slave.
14th: Now remote connections (checked with PostgreSQL Admin) are happening, with the below log messages:-

020-02-06 11:11:52: pid 21314:CONTEXT: while checking replication time lag
2020-02-06 11:11:52: pid 21314:LOG: get_query_result falied: status: -1
2020-02-06 11:11:52: pid 21314:CONTEXT: while checking replication time lag
2020-02-06 11:11:57: pid 21314:LOG: get_query_result: no rows returned
2020-02-06 11:11:57: pid 21314:DETAIL: node id (1)
2020-02-06 11:11:57: pid 21314:CONTEXT: while checking replication time lag
2020-02-06 11:11:57: pid 21314:LOG: get_query_result falied: status: -1
2020-02-06 11:11:57: pid 21314:CONTEXT: while checking replication time lag
2020-02-06 11:12:02: pid 21314:LOG: get_query_result: no rows returned
2020-02-06 11:12:02: pid 21314:DETAIL: node id (1)
2020-02-06 11:12:02: pid 21314:CONTEXT: while checking replication time lag
2020-02-06 11:12:02: pid 21314:LOG: get_query_result falied: status: -1
2020-02-06 11:12:02: pid 21314:CONTEXT: while checking replication time lag
2020-02-06 11:12:07: pid 21314:LOG: get_query_result: no rows returned
2020-02-06 11:12:07: pid 21314:DETAIL: node id (1)

Now please suggest what else I am missing here. Time is running out for me; I will have to give up the PGPOOL implementation if the issue is not resolved in a week. After one week I have to complete UAT with pgpool, and only then can I go live if successful.
(0003154)
t-ishii   
2020-02-08 19:30   
You seem to be doing too much at one time. I suggest you test the system step by step. If you find a problem, you'd better not proceed to the next step until you fix it.

> Master DB and Primary PGPOOL is configured on node1 (pgpool-poc01novalocal)
> Slave DB and Standby PGOOL is configured on node 2 (pgpool-poc02novalocal).

Probably the first test is stopping the postmaster on node 2. Pgpool-II should trigger failover, and the "show pool_nodes" command should show that node 2 is down.
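For reference, that first test boils down to something like this (a sketch reusing the paths, VIP and ports already shown in this thread):

# on pgpool-poc02.novalocal: stop the standby postmaster
/usr/local/pgsql11.5/bin/pg_ctl -D /installer/postgresql-11.5/data stop

# through the pgpool VIP: node 1 should now be reported as down
/usr/local/pgsql11.5/bin/psql -h 10.70.184.29 -p 5433 -U postgres -d mawidstg01 -c 'show pool_nodes;'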
(0003155)
raj.pandey1982@gmail.com   
2020-02-08 23:58   
Sorry for going fast with multiple steps.
Now I did as you asked, and as per your expectation the results appeared as below:-

>Probably the first test is stopping postmaster on node 2. Pgpool-II should trigger failover and "show pool_nodes" command should show that node 2 is down.

Before Postgres DB shutdown at node 2:-

[postgres@pgpool-poc01 logs]$ /usr/local/pgsql11.5/bin/psql -h 10.70.184.29 -p 5433 -U postgres -d mawidstg01
psql (11.5)
Type "help" for help.

mawidstg01=# show pool_nodes;
 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+------------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0 | pgpool-poc01.novalocal | 5432 | up | 0.500000 | primary | 1913 | true | 0 | | | 2020-02-08 17:38:56
 1 | pgpool-poc02.novalocal | 5432 | up | 0.500000 | standby | 0 | false | 0 | streaming | async | 2020-02-08 17:38:56
(2 rows)


[postgres@pgpool-poc02 logs]$ /usr/local/pgsql11.5/bin/psql -h 10.70.184.29 -p 5433 -U postgres -d mawidstg01
psql (11.5)
Type "help" for help.

mawidstg01=# show pool_nodes;
 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+------------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0 | pgpool-poc01.novalocal | 5432 | up | 0.500000 | primary | 0 | true | 0 | | | 2020-02-08 17:47:13
 1 | pgpool-poc02.novalocal | 5432 | up | 0.500000 | standby | 0 | false | 0 | streaming | async | 2020-02-08 17:47:13
(2 rows)



After Postgres DB shutdown at node 2:-

[postgres@pgpool-poc02 logs]$ /usr/local/pgsql11.5/bin/pg_ctl -D /installer/postgresql-11.5/data stop
waiting for server to shut down.... done
server stopped

[postgres@pgpool-poc02 logs]$ /usr/local/pgsql11.5/bin/psql -h 10.70.184.29 -p 5433 -U postgres -d mawidstg01
psql (11.5)
Type "help" for help.

mawidstg01=# show pool_nodes;
 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_stat
e | last_status_change
---------+------------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+----------------------
--+---------------------
 0 | pgpool-poc01.novalocal | 5432 | up | 0.500000 | primary | 0 | true | 0 | |
  | 2020-02-08 17:48:56
 1 | pgpool-poc02.novalocal | 5432 | down | 0.500000 | standby | 0 | false | 0 | |
  | 2020-02-08 17:48:52
(2 rows)



mawidstg01=# show pool_nodes;

 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_stat
e | last_status_change
---------+------------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+----------------------
--+---------------------
 0 | pgpool-poc01.novalocal | 5432 | up | 0.500000 | primary | 2553 | true | 0 | |
  | 2020-02-08 17:38:56
 1 | pgpool-poc02.novalocal | 5432 | down | 0.500000 | standby | 0 | false | 0 | |
  | 2020-02-08 17:48:52
(2 rows)



Pgpool-II log during the above activity:

2020-02-08 17:48:52: pid 15644:LOG: reading and processing packets
2020-02-08 17:48:52: pid 15644:DETAIL: postmaster on DB node 1 was shutdown by administrative command
2020-02-08 17:48:52: pid 15644:LOG: received degenerate backend request for node_id: 1 from pid [15644]
2020-02-08 17:48:52: pid 14643:LOG: new IPC connection received
2020-02-08 17:48:52: pid 14643:LOG: watchdog received the failover command from local pgpool-II on IPC interface
2020-02-08 17:48:52: pid 15029:LOG: reading and processing packets
2020-02-08 17:48:52: pid 15029:DETAIL: postmaster on DB node 1 was shutdown by administrative command
2020-02-08 17:48:52: pid 14643:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
2020-02-08 17:48:52: pid 14643:LOG: we have got the consensus to perform the failover
2020-02-08 17:48:52: pid 14643:DETAIL: 1 node(s) voted in the favor
2020-02-08 17:48:52: pid 14641:LOG: Pgpool-II parent process has received failover request
2020-02-08 17:48:52: pid 14643:LOG: new IPC connection received
2020-02-08 17:48:52: pid 14643:LOG: received the failover indication from Pgpool-II on IPC interface
2020-02-08 17:48:52: pid 14643:LOG: watchdog is informed of failover start by the main process
2020-02-08 17:48:52: pid 14641:LOG: starting degeneration. shutdown host pgpool-poc02.novalocal(5432)
2020-02-08 17:48:52: pid 14641:LOG: Do not restart children because we are switching over node id 1 host: pgpool-poc02.novalocal port: 5432 and we are in streaming replication mode
2020-02-08 17:48:52: pid 14641:LOG: child pid 15644 needs to restart because pool 0 uses backend 1
2020-02-08 17:48:52: pid 14641:LOG: execute command: /usr/share/pgpool/4.1.0/etc/failover.sh 1 0 pgpool-poc01.novalocal postgrestg /installer/postgresql-11.5/data/im_the_master
2020-02-08 17:48:52: pid 14641:LOG: failover: set new primary node: 0
2020-02-08 17:48:52: pid 14641:LOG: failover: set new master node: 0
2020-02-08 17:48:52: pid 14641:LOG: child pid 15644 needs to restart because pool 0 uses backend 1
2020-02-08 17:48:52: pid 14643:LOG: new IPC connection received
2020-02-08 17:48:52: pid 14643:LOG: received the failover indication from Pgpool-II on IPC interface
2020-02-08 17:48:52: pid 14643:LOG: watchdog is informed of failover end by the main process
2020-02-08 17:48:52: pid 15663:LOG: worker process received restart request
failover done. shutdown host pgpool-poc02.novalocal(5432)2020-02-08 17:48:52: pid 14641:LOG: failover done. shutdown host pgpool-poc02.novalocal(5432)
2020-02-08 17:48:53: pid 15662:LOG: restart request received in pcp child process
2020-02-08 17:48:53: pid 14641:LOG: PCP child 15662 exits with status 0 in failover()
2020-02-08 17:48:53: pid 14641:LOG: fork a new PCP child pid 16406 in failover()
2020-02-08 17:48:53: pid 14641:LOG: child process with pid: 15644 exits with status 256
2020-02-08 17:48:53: pid 14641:LOG: child process with pid: 15644 exited with success and will not be restarted
2020-02-08 17:48:53: pid 16406:LOG: PCP process: 16406 started
2020-02-08 17:48:53: pid 14641:LOG: worker child process with pid: 15663 exits with status 256
2020-02-08 17:48:53: pid 14641:LOG: fork a new worker child process with pid: 16407
2020-02-08 17:48:53: pid 16407:LOG: get_query_result: no rows returned
2020-02-08 17:48:53: pid 16407:DETAIL: node id (0)
2020-02-08 17:48:53: pid 16407:CONTEXT: while checking replication time lag
2020-02-08 17:48:53: pid 16407:LOG: get_query_result falied: status: -1
2020-02-08 17:48:53: pid 16407:CONTEXT: while checking replication time lag
2020-02-08 17:48:58: pid 16407:LOG: get_query_result: no rows returned
2020-02-08 17:48:58: pid 16407:DETAIL: node id (0)
2020-02-08 17:48:58: pid 16407:CONTEXT: while checking replication time lag
2020-02-08 17:48:58: pid 16407:LOG: get_query_result falied: status: -1
2020-02-08 17:48:58: pid 16407:CONTEXT: while checking replication time lag
2020-02-08 17:49:03: pid 16407:LOG: get_query_result: no rows returned
2020-02-08 17:49:03: pid 16407:DETAIL: node id (0)
2020-02-08 17:49:03: pid 16407:CONTEXT: while checking replication time lag
2020-02-08 17:49:03: pid 16407:LOG: get_query_result falied: status: -1
2020-02-08 17:49:03: pid 16407:CONTEXT: while checking replication time lag
2020-02-08 17:49:08: pid 16407:LOG: get_query_result: no rows returned
2020-02-08 17:49:08: pid 16407:DETAIL: node id (0)
2020-02-08 17:49:08: pid 16407:CONTEXT: while checking replication time lag
2020-02-08 17:49:08: pid 16407:LOG: get_query_result falied: status: -1
(0003156)
t-ishii   
2020-02-09 15:24   
Looks good to me. In the log:
2020-02-08 17:49:08: pid 16407:LOG: get_query_result: no rows returned
2020-02-08 17:49:08: pid 16407:DETAIL: node id (0)
2020-02-08 17:49:08: pid 16407:CONTEXT: while checking replication time lag
2020-02-08 17:49:08: pid 16407:LOG: get_query_result falied: status: -1

This says that query "SELECT pg_stat_replication..." returns no row. This is normal because there's no standby node.
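For reference, a minimal sketch (not from this thread) of how to check the same thing manually on the primary; the psql path and port follow the values used elsewhere in this thread, and the query matches the one visible in the PostgreSQL log later on:

# run on the primary (pgpool-poc01.novalocal); zero rows means no standby is streaming from it
/usr/local/pgsql11.5/bin/psql -p 5432 -U postgres -c "SELECT application_name, state, sync_state FROM pg_stat_replication;"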

The next test is whether you can recover the standby node by using pcp_recovery_node. After issuing a pcp_recovery_node command, something like:

pcp_recovery_node -h pgpool-poc01.novalocal 1

you should see that pgpool-poc02.novalocal comes back online.
(0003157)
raj.pandey1982@gmail.com   
2020-02-09 15:34   
Sure, I will do that and share the result.
But in the meantime, one thing I had tested yesterday after sending you the update: I started the slave DB and it came into sync, then I attached the node through pcp_attach_node, and that went fine.
(0003158)
t-ishii   
2020-02-09 15:44   
Yes, it should work too as long as the standby node is healthy after restarting. pcp_recovery_node can, for example, recover a standby node which is out of sync with the primary node. So testing pcp_recovery_node is important.
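For reference, a minimal sketch (not from this thread) of a full pcp_recovery_node invocation for node 1 in this environment. Host, port and user follow the commands shown later in the thread; the password value is hypothetical. The -w option (never prompt for a password) only works if a matching ~/.pcppass entry exists:

# PCP password file, format hostname:port:username:password, must be mode 0600
cat > ~/.pcppass <<'EOF'
pgpool-poc01.novalocal:9898:postgres:your_pcp_password
EOF
chmod 600 ~/.pcppass

# recover backend node 1 (the standby) through the PCP interface
pcp_recovery_node -h pgpool-poc01.novalocal -p 9898 -U postgres -n 1 -w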
(0003159)
raj.pandey1982@gmail.com   
2020-02-09 16:37   
[root@pgpool-poc01 replscripts]# pcp_recovery_node -h pgpool-poc02.novalocal -p 9898 -n 1 -U postgres -w
ERROR: executing recovery, execution of command failed at "1st stage"
DETAIL: command:"recovery_1st_stage.sh"

[root@pgpool-poc01 replscripts]#
(0003160)
t-ishii   
2020-02-09 19:31   
You'd better check the PostgreSQL log of pgpool-poc01.novalocal and pgpool.log. Can you share them?
(0003161)
raj.pandey1982@gmail.com   
2020-02-09 19:53   
Also attaching the pgpool log.
(0003162)
t-ishii   
2020-02-10 07:50   
Which one is the pgpool-poc01.novalocal log when you executed pcp_recovery_node?
(0003164)
raj.pandey1982@gmail.com   
2020-02-10 15:09   
The zip file pgpool.zip is the pgpool log of pgpool-poc01.novalocal.
(0003165)
t-ishii   
2020-02-10 17:14   
When did you execute pcp_recovery_node?
(0003166)
raj.pandey1982@gmail.com   
2020-02-10 17:32   
For your convenience I removed the old log and did it again today at 11:26 AM.
Step 1:
[postgres@pgpool-poc02 logs]$ /usr/local/pgsql11.5/bin/pg_ctl -D /installer/postgresql-11.5/data stop
waiting for server to shut down.... done
server stopped
[postgres@pgpool-poc02 logs]$

Step 2
[root@pgpool-poc01 pgpool]# pcp_recovery_node -h pgpool-poc02.novalocal -p 9898 -n 1 -U postgres -w
ERROR: executing recovery, execution of command failed at "1st stage"
DETAIL: command:"recovery_1st_stage.sh"


Step 3: Checked the pgpool log on pgpool-poc01.novalocal:

2020-02-10 11:26:24: pid 11668:DETAIL: postmaster on DB node 1 was shutdown by administrative command
2020-02-10 11:26:24: pid 11668:LOG: received degenerate backend request for node_id: 1 from pid [11668]
2020-02-10 11:26:24: pid 11029:LOG: new IPC connection received
2020-02-10 11:26:24: pid 11029:LOG: watchdog received the failover command from local pgpool-II on IPC interface
2020-02-10 11:26:24: pid 11029:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
2020-02-10 11:26:24: pid 11029:LOG: we have got the consensus to perform the failover
2020-02-10 11:26:24: pid 11029:DETAIL: 1 node(s) voted in the favor
2020-02-10 11:26:24: pid 11027:LOG: Pgpool-II parent process has received failover request
2020-02-10 11:26:24: pid 11029:LOG: new IPC connection received
2020-02-10 11:26:24: pid 11029:LOG: received the failover indication from Pgpool-II on IPC interface
2020-02-10 11:26:24: pid 11029:LOG: watchdog is informed of failover start by the main process
2020-02-10 11:26:24: pid 11726:LOG: reading and processing packets
2020-02-10 11:26:24: pid 11726:DETAIL: postmaster on DB node 1 was shutdown by administrative command
2020-02-10 11:26:24: pid 11726:LOG: received degenerate backend request for node_id: 1 from pid [11726]
2020-02-10 11:26:24: pid 11027:LOG: starting degeneration. shutdown host pgpool-poc02.novalocal(5432)
2020-02-10 11:26:24: pid 11029:LOG: new IPC connection received
2020-02-10 11:26:24: pid 11029:LOG: watchdog received the failover command from local pgpool-II on IPC interface
2020-02-10 11:26:24: pid 11029:LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
2020-02-10 11:26:24: pid 11029:LOG: we have got the consensus to perform the failover
2020-02-10 11:26:24: pid 11029:DETAIL: 1 node(s) voted in the favor
2020-02-10 11:26:24: pid 11027:LOG: Do not restart children because we are switching over node id 1 host: pgpool-poc02.novalocal port: 5432 and we are in streaming replication mode
2020-02-10 11:26:24: pid 11027:LOG: child pid 11668 needs to restart because pool 0 uses backend 1
2020-02-10 11:26:24: pid 11027:LOG: child pid 11726 needs to restart because pool 0 uses backend 1
2020-02-10 11:26:24: pid 11027:LOG: execute command: /usr/share/pgpool/4.1.0/etc/failover.sh 1 0 pgpool-poc01.novalocal postgrestg /installer/postgresql-11.5/data/im_the_master
2020-02-10 11:26:24: pid 11027:LOG: failover: set new primary node: 0
2020-02-10 11:26:24: pid 11027:LOG: failover: set new master node: 0
2020-02-10 11:26:24: pid 11027:LOG: child pid 11668 needs to restart because pool 0 uses backend 1
2020-02-10 11:26:24: pid 11027:LOG: child pid 11726 needs to restart because pool 0 uses backend 1
2020-02-10 11:26:24: pid 11029:LOG: new IPC connection received
2020-02-10 11:26:24: pid 11029:LOG: received the failover indication from Pgpool-II on IPC interface
2020-02-10 11:26:24: pid 11029:LOG: watchdog is informed of failover end by the main process
failover done. shutdown host pgpool-poc02.novalocal(5432)2020-02-10 11:26:24: pid 11027:LOG: failover done. shutdown host pgpool-poc02.novalocal(5432)
2020-02-10 11:26:24: pid 12056:LOG: worker process received restart request
2020-02-10 11:26:24: pid 11029:LOG: new IPC connection received
2020-02-10 11:26:24: pid 11029:LOG: received the failover indication from Pgpool-II on IPC interface
2020-02-10 11:26:24: pid 11029:LOG: watchdog is informed of failover start by the main process
2020-02-10 11:26:24: pid 11027:LOG: failover: no backends are degenerated
2020-02-10 11:26:25: pid 12055:LOG: restart request received in pcp child process
2020-02-10 11:26:25: pid 11027:LOG: PCP child 12055 exits with status 0 in failover()
2020-02-10 11:26:25: pid 11027:LOG: fork a new PCP child pid 12172 in failover()
2020-02-10 11:26:25: pid 11027:LOG: child process with pid: 11668 exits with status 256
2020-02-10 11:26:25: pid 11027:LOG: child process with pid: 11668 exited with success and will not be restarted
2020-02-10 11:26:25: pid 11027:LOG: child process with pid: 11726 exits with status 256
2020-02-10 11:26:25: pid 11027:LOG: child process with pid: 11726 exited with success and will not be restarted
2020-02-10 11:26:25: pid 11027:LOG: worker child process with pid: 12056 exits with status 256
2020-02-10 11:26:25: pid 12172:LOG: PCP process: 12172 started
2020-02-10 11:26:25: pid 11027:LOG: fork a new worker child process with pid: 12173
2020-02-10 11:26:25: pid 12173:LOG: get_query_result: no rows returned
2020-02-10 11:26:25: pid 12173:DETAIL: node id (0)
2020-02-10 11:26:25: pid 12173:CONTEXT: while checking replication time lag
2020-02-10 11:26:25: pid 12173:LOG: get_query_result falied: status: -1
2020-02-10 11:26:25: pid 12173:CONTEXT: while checking replication time lag
2020-02-10 11:26:30: pid 12173:LOG: get_query_result: no rows returned
2020-02-10 11:26:30: pid 12173:DETAIL: node id (0)
2020-02-10 11:26:30: pid 12173:CONTEXT: while checking replication time lag
2020-02-10 11:26:30: pid 12173:LOG: get_query_result falied: status: -1
2020-02-10 11:26:30: pid 12173:CONTEXT: while checking replication time lag
2020-02-10 11:26:35: pid 12173:LOG: get_query_result: no rows returned
2020-02-10 11:26:35: pid 12173:DETAIL: node id (0)
2020-02-10 11:26:35: pid 12173:CONTEXT: while checking replication time lag
2020-02-10 11:26:35: pid 12173:LOG: get_query_result falied: status: -1
2020-02-10 11:26:35: pid 12173:CONTEXT: while checking replication time lag
2020-02-10 11:26:40: pid 12173:LOG: get_query_result: no rows returned
2020-02-10 11:26:40: pid 12173:DETAIL: node id (0)
2020-02-10 11:26:40: pid 12173:CONTEXT: while checking replication time lag
2020-02-10 11:26:40: pid 12173:LOG: get_query_result falied: status: -1
2020-02-10 11:26:40: pid 12173:CONTEXT: while checking replication time lag
2020-02-10 11:26:45: pid 12173:LOG: get_query_result: no rows returned
2020-02-10 11:26:45: pid 12173:DETAIL: node id (0)
2020-02-10 11:26:45: pid 12173:CONTEXT: while checking replication time lag
2020-02-10 11:26:45: pid 12173:LOG: get_query_result falied: status: -1
2020-02-10 11:26:45: pid 12173:CONTEXT: while checking replication time lag
2020-02-10 11:26:50: pid 12173:LOG: get_query_result: no rows returned
2020-02-10 11:26:50: pid 12173:DETAIL: node id (0)
2020-02-10 11:26:50: pid 12173:CONTEXT: while checking replication time lag
2020-02-10 11:26:50: pid 12173:LOG: get_query_result falied: status: -1
2020-02-10 11:26:50: pid 12173:CONTEXT: while checking replication time lag
2020-02-10 11:26:55: pid 12173:LOG: get_query_result: no rows returned

Step 4: attaching the full log file as well.
(0003167)
raj.pandey1982@gmail.com   
2020-02-10 17:44   
Hello friend, I got this in the log while executing the recovery command:

[root@pgpool-poc01 pgpool]# pcp_recovery_node -h pgpool-poc01.novalocal -p 9898 -n 1 -U postgres -w
ERROR: executing recovery, execution of command failed at "1st stage"
DETAIL: command:"recovery_1st_stage.sh"


2020-02-10 11:42:13: pid 14200:LOG: starting recovering node 1
2020-02-10 11:42:13: pid 14200:LOG: executing recovery
2020-02-10 11:42:13: pid 14200:DETAIL: starting recovery command: "SELECT pgpool_recovery('recovery_1st_stage.sh', 'pgpool-poc02.novalocal', '/installer/postgresql-11.5/data', '5432', 1, '5432')"
2020-02-10 11:42:13: pid 14200:LOG: executing recovery
2020-02-10 11:42:13: pid 14200:DETAIL: disabling statement_timeout
2020-02-10 11:42:13: pid 14200:ERROR: executing recovery, execution of command failed at "1st stage"
2020-02-10 11:42:13: pid 14200:DETAIL: command:"recovery_1st_stage.sh"
2020-02-10 11:42:13: pid 12172:LOG: PCP process with pid: 14200 exit with SUCCESS.
2020-02-10 11:42:13: pid 12172:LOG: PCP process with pid: 14200 exits with status 0
2020-02-10 11:42:17: pid 12173:LOG: get_query_result: no rows returned
2020-02-10 11:42:17: pid 12173:DETAIL: node id (0)
2020-02-10 11:42:17: pid 12173:CONTEXT: while checking replication time lag
2020-02-10 11:42:17: pid 12173:LOG: get_query_result falied: status: -1
2020-02-10 11:42:17: pid 12173:CONTEXT: while checking replication time lag
2020-02-10 11:42:22: pid 12173:LOG: get_query_result: no rows returned
2020-02-10 11:42:22: pid 12173:DETAIL: node id (0)
2020-02-10 11:42:22: pid 12173:CONTEXT: while checking replication time lag
2020-02-10 11:42:22: pid 12173:LOG: get_query_result falied: status: -1
2020-02-10 11:42:22: pid 12173:CONTEXT: while checking replication time lag
2020-02-10 11:42:27: pid 12173:LOG: get_query_result: no rows returned
2020-02-10 11:42:27: pid 12173:DETAIL: node id (0)
2020-02-10 11:42:27: pid 12173:CONTEXT: while checking replication time lag
2020-02-10 11:42:27: pid 12173:LOG: get_query_result falied: status: -1
2020-02-10 11:42:27: pid 12173:CONTEXT: while checking replication time lag
2020-02-10 11:42:32: pid 12173:LOG: get_query_result: no rows returned
2020-02-10 11:42:32: pid 12173:DETAIL: node id (0)
2020-02-10 11:42:32: pid 12173:CONTEXT: while checking replication time lag
2020-02-10 11:42:32: pid 12173:LOG: get_query_result falied: status: -1
2020-02-10 11:42:32: pid 12173:CONTEXT: while checking replication time lag
2020-02-10 11:42:37: pid 12173:LOG: get_query_result: no rows returned
2020-02-10 11:42:37: pid 12173:DETAIL: node id (0)
2020-02-10 11:42:37: pid 12173:CONTEXT: while checking replication time lag
2020-02-10 11:42:37: pid 12173:LOG: get_query_result falied: status: -1
2020-02-10 11:42:37: pid 12173:CONTEXT: while checking replication time lag
2020-02-10 11:42:42: pid 12173:LOG: get_query_result: no rows returned
2020-02-10 11:42:42: pid 12173:DETAIL: node id (0)
2020-02-10 11:42:42: pid 12173:CONTEXT: while checking replication time lag
2020-02-10 11:42:42: pid 12173:LOG: get_query_result falied: status: -1
2020-02-10 11:42:42: pid 12173:CONTEXT: while checking replication time lag
(0003168)
t-ishii   
2020-02-10 17:48   
Can you share PostgreSQL log at the same time on pgpool-poc01.novalocal?
(0003170)
raj.pandey1982@gmail.com   
2020-02-11 16:02   
Hi friend, any findings here?
(0003171)
t-ishii   
2020-02-11 16:28   
Sorry but what I wanted was PostgreSQL log, not pgpool log.
(0003172)
raj.pandey1982@gmail.com   
2020-02-11 19:50   
I did this again to reproduce the issue. Here is the master DB node's PostgreSQL log output (attaching the full log as well):
2020-02-11 13:45:20 +03 LOG: statement: SELECT pg_current_wal_lsn()
2020-02-11 13:45:20 +03 LOG: statement: SELECT application_name, state, sync_state FROM pg_stat_replication
2020-02-11 13:45:20 +03 LOG: statement: SELECT pg_is_in_recovery()
2020-02-11 13:45:21 +03 LOG: statement: SELECT
           (SELECT count(*) FROM pg_stat_activity) AS "Total",
           (SELECT count(*) FROM pg_stat_activity WHERE state = 'active') AS "Active",
           (SELECT count(*) FROM pg_stat_activity WHERE state = 'idle') AS "Idle"
2020-02-11 13:45:21 +03 LOG: statement: SET statement_timeout To 0
2020-02-11 13:45:21 +03 LOG: statement: SELECT pgpool_recovery('recovery_1st_stage.sh', 'pgpool-poc02.novalocal', '/installer/postgresql-11.5/data', '5432', 1, '5432')
2020-02-11 13:45:21 +03 ERROR: function pgpool_recovery(unknown, unknown, unknown, unknown, integer, unknown) does not exist at character 8
2020-02-11 13:45:21 +03 HINT: No function matches the given name and argument types. You might need to add explicit type casts.
2020-02-11 13:45:21 +03 STATEMENT: SELECT pgpool_recovery('recovery_1st_stage.sh', 'pgpool-poc02.novalocal', '/installer/postgresql-11.5/data', '5432', 1, '5432')
2020-02-11 13:45:21 +03 LOG: statement: SELECT pg_current_wal_lsn()
2020-02-11 13:45:21 +03 LOG: statement: SELECT application_name, state, sync_state FROM pg_stat_replication
2020-02-11 13:45:21 +03 LOG: statement: SELECT pg_is_in_recovery()
2020-02-11 13:45:22 +03 LOG: statement: SELECT
           (SELECT count(*) FROM pg_stat_activity) AS "Total",
           (SELECT count(*) FROM pg_stat_activity WHERE state = 'active') AS "Active",
           (SELECT count(*) FROM pg_stat_activity WHERE state = 'idle') AS "Idle"
2020-02-11 13:45:23 +03 LOG: statement: SELECT
           (SELECT count(*) FROM pg_stat_activity) AS "Total",
           (SELECT count(*) FROM pg_stat_activity WHERE state = 'active') AS "Active",
           (SELECT count(*) FROM pg_stat_activity WHERE state = 'idle') AS "Idle"
2020-02-11 13:45:24 +03 LOG: statement: SELECT
           (SELECT count(*) FROM pg_stat_activity) AS "Total",
           (SELECT count(*) FROM pg_stat_activity WHERE state = 'active') AS "Active",
           (SELECT count(*) FROM pg_stat_activity WHERE state = 'idle') AS "Idle"
2020-02-11 13:45:25 +03 LOG: statement: SELECT
           (SELECT count(*) FROM pg_stat_activity) AS "Total",
           (SELECT count(*) FROM pg_stat_activity WHERE state = 'active') AS "Active",
(0003173)
t-ishii   
2020-02-11 20:05   
You need to install the pgpool_recovery extension, which comes with Pgpool-II, on all PostgreSQL servers.
(0003174)
raj.pandey1982@gmail.com   
2020-02-11 20:52   
It's already there; I just checked:

select * from pg_available_extensions;
'plpgsql','1.0','1.0','PL/pgSQL procedural language'
'pgcrypto','1.3','1.3','cryptographic functions'
'pg_stat_statements','1.6','1.4','track execution statistics of all SQL statements executed'
'pg_buffercache','1.3','1.2','examine the shared buffer cache'
'pgpool_adm','1.2',,'Administrative functions for pgPool'
'pgpool_recovery','1.3',,'recovery functions for pgpool-II for V4.1 or later'
'pgpool_regclass','1.0',,'replacement for regclass'
(0003175)
raj.pandey1982@gmail.com   
2020-02-11 20:56   
Also, the recovery 1st stage script that I am using:

[postgres@pgpool-poc01 data]$ cat recovery_1st_stage.sh
#! /bin/sh
psql=/usr/local/pgsql11.5/bin/psql
DATADIR_BASE=/installer/postgresql-11.5/data
PGSUPERUSER=postgres
master_db_cluster=$1
recovery_node_host_name=$2
DEST_CLUSTER=$3
PORT=$4
recovery_node=$5
pg_rewind_failed="true"
log=/installer/replscripts/recovery.log
echo >> $log
date >> $log
if [ $pg_rewind_failed = "true" ];then
$psql -p $PORT -c "SELECT pg_start_backup('Streaming Replication', true)" postgres
echo "source: $master_db_cluster dest: $DEST_CLUSTER" >> $log
rsync -C -a -c --delete --exclude postgresql.conf --exclude postmaster.pid \
--exclude postmaster.opts --exclude pg_log \
--exclude recovery.conf --exclude recovery.done \
--exclude pg_xlog \
$master_db_cluster/ $DEST_CLUSTER/
rm -fr $DEST_CLUSTER/pg_xlog
mkdir $DEST_CLUSTER/pg_xlog
chmod 700 $DEST_CLUSTER/pg_xlog
rm $DEST_CLUSTER/recovery.done
fi
cat > $DEST_CLUSTER/recovery.conf <<REOF
standby_mode = 'on'
primary_conninfo = 'port=$PORT user=$PGSUPERUSER'
recovery_target_timeline='latest'
restore_command = 'cp /installer/archivedir/%f "%p" 2> /dev/null'
REOF
if [ $pg_rewind_failed = "true" ];then
$psql -p $PORT -c "SELECT pg_stop_backup()" postgres
fi
if [ $pg_rewind_failed = "false" ];then
cp /tmp/slaveconfbkp/postgresql.conf $DEST_CLUSTER/
fi
(0003176)
t-ishii   
2020-02-11 21:59   
The extension needs to be installed into template1 database. Have you done it?
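For reference, a minimal sketch (assuming the PostgreSQL binaries used elsewhere in this thread under /usr/local/pgsql11.5) of installing the extension into template1 on each PostgreSQL server; as I understand it, Pgpool-II connects to template1 to run the online recovery functions, and databases created later inherit whatever template1 contains:

/usr/local/pgsql11.5/bin/psql -U postgres -d template1 -c "CREATE EXTENSION pgpool_recovery;"

# installed_version should now show 1.3 (it stays blank when only the files are present)
/usr/local/pgsql11.5/bin/psql -U postgres -d template1 -c "SELECT * FROM pg_available_extensions WHERE name = 'pgpool_recovery';"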
(0003177)
raj.pandey1982@gmail.com   
2020-02-11 22:09   
Did it now:

root@pgpool-poc01 sql]# sudo -u postgres /usr/local/pgsql11.5/bin/psql -f pgpool-recovery.sql template1
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
[root@pgpool-poc01 sql]#
(0003178)
raj.pandey1982@gmail.com   
2020-02-11 22:12   
Template1 Database output:-
select * from pg_available_extensions;
'plpgsql','1.0','1.0','PL/pgSQL procedural language'
'pgcrypto','1.3',,'cryptographic functions'
'pg_stat_statements','1.6',,'track execution statistics of all SQL statements executed'
'pg_buffercache','1.3',,'examine the shared buffer cache'
'pgpool_adm','1.2',,'Administrative functions for pgPool'
'pgpool_recovery','1.3',,'recovery functions for pgpool-II for V4.1 or later'
'pgpool_regclass','1.0',,'replacement for regclass'
(0003179)
raj.pandey1982@gmail.com   
2020-02-11 22:35   
Tried again: shut down the slave and fired the recovery command:
2020-02-11 16:30:42 +03 LOG: statement: SELECT
           (SELECT sum(blks_read) FROM pg_stat_database WHERE datname = (SELECT datname FROM pg_database WHERE oid = 16416)) AS "Reads",
           (SELECT sum(blks_hit) FROM pg_stat_database WHERE datname = (SELECT datname FROM pg_database WHERE oid = 16416)) AS "Hits"
2020-02-11 16:30:42 +03 LOG: statement: SET statement_timeout To 0
2020-02-11 16:30:42 +03 LOG: statement: SELECT pgpool_recovery('recovery_1st_stage.sh', 'pgpool-poc02.novalocal', '/installer/postgresql-11.5/data', '5432', 1, '5432')
2020-02-11 16:30:42 +03 ERROR: function pgpool_recovery(unknown, unknown, unknown, unknown, integer, unknown) does not exist at character 8
2020-02-11 16:30:42 +03 HINT: No function matches the given name and argument types. You might need to add explicit type casts.
2020-02-11 16:30:42 +03 STATEMENT: SELECT pgpool_recovery('recovery_1st_stage.sh', 'pgpool-poc02.novalocal', '/installer/postgresql-11.5/data', '5432', 1, '5432')
2020-02-11 16:30:42 +03 LOG: statement: SELECT
           (SELECT count(*) FROM pg_stat_activity WHERE datname = (SELECT datname FROM pg_database WHERE oid = 16416)) AS "Total",
           (SELECT count(*) FROM pg_stat_activity WHERE state = 'active' AND datname = (SELECT datname FROM pg_database WHERE oid = 16416)) AS "Acti
(0003180)
t-ishii   
2020-02-11 22:45   
Why don't you use the CREATE EXTENSION command? Using psql and an SQL script like pgpool-recovery.sql is an obsolete way to register the functions.
(0003185)
raj.pandey1982@gmail.com   
2020-02-12 15:45   
It's giving an error:

template1=# CREATE EXTENSION pgpool_recovery;
ERROR: extension "pgpool_recovery" has no installation script nor update path for version "1.3"
(0003186)
raj.pandey1982@gmail.com   
2020-02-12 15:47   
[postgres@pgpool-poc01 logs]$ locate pgpool_recovery
/installer/pgpool-II-4.1.0/src/sql/pgpool-recovery/pgpool_recovery--1.1--1.2.sql
/installer/pgpool-II-4.1.0/src/sql/pgpool-recovery/pgpool_recovery--1.1.sql
/installer/pgpool-II-4.1.0/src/sql/pgpool-recovery/pgpool_recovery--1.2--1.3.sql
/installer/pgpool-II-4.1.0/src/sql/pgpool-recovery/pgpool_recovery--1.2.sql
/installer/pgpool-II-4.1.0/src/sql/pgpool-recovery/pgpool_recovery--1.3.sql
/installer/pgpool-II-4.1.0/src/sql/pgpool-recovery/pgpool_recovery.control
/usr/local/pgsql11.5/share/extension/pgpool_recovery--1.1.sql
/usr/local/pgsql11.5/share/extension/pgpool_recovery.control
[postgres@pgpool-poc01 logs]$
(0003187)
t-ishii   
2020-02-12 16:42   
It's an installation mistake: pgpool_recovery--1.3.sql is not there. Ask for help from whoever did the installation.
(0003188)
raj.pandey1982@gmail.com   
2020-02-12 16:57   
I copied the script from the installer and did it again, but since I had already created the functions it's giving an "already exists" message.
[postgres@pgpool-poc01 lib]$ cp -p /installer/pgpool-II-4.1.0/src/sql/pgpool-recovery/pgpool_recovery--1.3.sql /usr/local/pgsql11.5/share/extension/
[postgres@pgpool-poc01 lib]$ /usr/local/pgsql11.5/bin/psql -h 10.70.184.27 -p 5432 -U postgres -d template1
psql (11.5)
Type "help" for help.

template1=# CREATE EXTENSION pgpool_recovery;
ERROR: function "pgpool_recovery" already exists with same argument types
template1=#
(0003189)
raj.pandey1982@gmail.com   
2020-02-12 19:49   
I tried again after this; same issue:
2020-02-12 13:42:14 +03 LOG: statement: SELECT pg_current_wal_lsn()
2020-02-12 13:42:14 +03 LOG: statement: SELECT application_name, state, sync_state FROM pg_stat_replication
2020-02-12 13:42:14 +03 LOG: statement: SELECT pg_is_in_recovery()
2020-02-12 13:42:15 +03 LOG: statement: SET statement_timeout To 0
2020-02-12 13:42:15 +03 LOG: statement: SELECT pgpool_recovery('recovery_1st_stage.sh', 'pgpool-poc02.novalocal', '/installer/postgresql-11.5/data', '5432', 1, '5432')
2020-02-12 13:42:15 +03 ERROR: function pgpool_recovery(unknown, unknown, unknown, unknown, integer, unknown) does not exist at character 8
2020-02-12 13:42:15 +03 HINT: No function matches the given name and argument types. You might need to add explicit type casts.
2020-02-12 13:42:15 +03 STATEMENT: SELECT pgpool_recovery('recovery_1st_stage.sh', 'pgpool-poc02.novalocal', '/installer/postgresql-11.5/data', '5432', 1, '5432')
2020-02-12 13:42:19 +03 LOG: statement: SELECT pg_current_wal_lsn()
2020-02-12 13:42:19 +03 LOG: statement: SELECT application_name, state, sync_state FROM pg_stat_replication
2020-02-12 13:42:19 +03 LOG: statement: SELECT pg_is_in_recovery()
2020-02-12 13:42:22 +03 LOG: execute <unnamed>: SET extra_float_digits = 3
2020-02-12 13:42:22 +03 LOG: execute <unnamed>: SET application_name = 'his-syncreport-svc'
2020-02-12 13:42:24 +03 LOG: statement: SELECT pg_current_wal_lsn()
2020-02-12 13:42:24 +03 LOG: statement: SELECT application_name, state, sync_state FROM pg_stat_replication
(0003190)
raj.pandey1982@gmail.com   
2020-02-13 02:10   
I changed the failover script from my own to the example script in the Pgpool-II 4.1 documentation and then tried a failover by shutting down the DB on node 1.

Failover happened fine (the slave was promoted to master), but again I could not make connections from the pgAdmin console. I checked with the commands below, and the node info command showed the newly promoted master with status 'waiting' for a long time, and front-end connections were not happening. Then I checked again after 15-20 minutes: the newly promoted master's status was showing as up, but remote connections were still not happening:
 
[root@pgpool-poc01 etc]# pcp_node_info --verbose -h 10.70.184.27 -U postgres 0 -w
Hostname : pgpool-poc01.novalocal
Port : 5432
Status : 3
Weight : 0.500000
Status Name : down
Role : standby
Replication Delay : 0
Replication State :
Replication Sync State :
Last Status Change : 2020-02-12 18:59:49
[root@pgpool-poc01 etc]# pcp_node_info --verbose -h 10.70.184.28 -U postgres 1 -w
Hostname : pgpool-poc02.novalocal
Port : 5432
Status : 1
Weight : 0.500000
Status Name : waiting
Role : primary
Replication Delay : 0
Replication State :
Replication Sync State :
Last Status Change : 1970-01-01 03:00:00
[root@pgpool-poc01 etc]# /usr/local/pgsql11.5/bin/psql -h 10.70.184.29 -p 5433 -U postgres -d mawidstg01
psql (11.5)
Type "help" for help.

mawidstg01=# show pool_nodes
mawidstg01-# show pool_nodes;
ERROR: syntax error at or near "show"
LINE 2: show pool_nodes;
        ^
mawidstg01=# ^C
mawidstg01=# show pool_node;
ERROR: unrecognized configuration parameter "pool_node"
mawidstg01=# \q
[root@pgpool-poc01 etc]# /usr/local/pgsql11.5/bin/psql -h 10.70.184.29 -p 5433 -U postgres -d mawidstg01
psql (11.5)
Type "help" for help.

mawidstg01=# show pool_nodes;
 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_stat
e | last_status_change
---------+------------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+----------------------
--+---------------------
 0 | pgpool-poc01.novalocal | 5432 | down | 0.500000 | standby | 236 | false | 0 | |
  | 2020-02-12 18:59:49
 1 | pgpool-poc02.novalocal | 5432 | up | 0.500000 | primary | 0 | true | 0 | |
  | 2020-02-12 18:59:49
(2 rows)

The Pgpool-II log at that time showed:
2020-02-12 19:04:31: pid 19354:LOG: get_query_result: no rows returned
2020-02-12 19:04:31: pid 19354:DETAIL: node id (1)
2020-02-12 19:04:31: pid 19354:CONTEXT: while checking replication time lag
2020-02-12 19:04:31: pid 19354:LOG: get_query_result falied: status: -1
2020-02-12 19:04:31: pid 19354:CONTEXT: while checking replication time lag
(0003191)
raj.pandey1982@gmail.com   
2020-02-13 02:15   
2020-02-12 19:00:05: pid 19354:LOG: get_query_result falied: status: -1
(0003192)
raj.pandey1982@gmail.com   
2020-02-13 02:20   
2020-02-12 18:59:50: pid 17238:LOG: child process with pid: 18254 exited with success and will not be restarted
2020-02-12 18:59:50: pid 17238:LOG: child process with pid: 18255 exits with status 256
2020-02-12 18:59:50: pid 17238:LOG: child process with pid: 18255 exited with success and will not be restarted
2020-02-12 18:59:50: pid 17238:LOG: child process with pid: 18256 exits with status 256
2020-02-12 18:59:50: pid 17238:LOG: child process with pid: 18256 exited with success and will not be restarted
2020-02-12 18:59:50: pid 17238:LOG: worker child process with pid: 18259 exits with status 256
2020-02-12 18:59:50: pid 17238:LOG: fork a new worker child process with pid: 19354
2020-02-12 18:59:50: pid 17238:LOG: child process with pid: 18353 exits with status 256
2020-02-12 18:59:50: pid 17238:LOG: fork a new child process with pid: 19355
2020-02-12 18:59:50: pid 19354:LOG: get_query_result: no rows returned
2020-02-12 18:59:50: pid 19354:DETAIL: node id (1)
2020-02-12 18:59:50: pid 19354:CONTEXT: while checking replication time lag
2020-02-12 18:59:50: pid 19354:LOG: get_query_result falied: status: -1
2020-02-12 18:59:50: pid 19354:CONTEXT: while checking replication time lag
2020-02-12 18:59:55: pid 19354:LOG: get_query_result: no rows returned
2020-02-12 18:59:55: pid 19354:DETAIL: node id (1)
2020-02-12 18:59:55: pid 19354:CONTEXT: while checking replication time lag
2020-02-12 18:59:55: pid 19354:LOG: get_query_result falied: status: -1
2020-02-12 18:59:55: pid 19354:CONTEXT: while checking replication time lag
2020-02-12 19:00:00: pid 19354:LOG: get_query_result: no rows returned
2020-02-12 19:00:00: pid 19354:DETAIL: node id (1)
2020-02-12 19:00:00: pid 19354:CONTEXT: while checking replication time lag
2020-02-12 19:00:00: pid 19354:LOG: get_query_result falied: status: -1
2020-02-12 19:00:00: pid 19354:CONTEXT: while checking replication time lag
2020-02-12 19:00:05: pid 19354:LOG: get_query_result: no rows returned
2020-02-12 19:00:05: pid 19354:DETAIL: node id (1)
2020-02-12 19:00:05: pid 19354:CONTEXT: while checking replication time lag
2020-02-12 19:00:05: pid 19354:LOG: get_query_result falied: status: -1
(0003193)
raj.pandey1982@gmail.com   
2020-02-13 02:23   
Please help here. I am sure I am very close to a resolution. There must be something that explains this 'waiting' status of the newly promoted master node; if, after failover, this 'waiting' status changed immediately to 'up', then remote connections would start working.
(0003194)
raj.pandey1982@gmail.com   
2020-02-13 02:29   
I just restarted the pgpool services on both nodes and now VIP connections are working. But why should a restart be required? It should not be; pgpool should take care of this by itself.
(0003195)
t-ishii   
2020-02-13 07:37   
> I copied the script from the installer and did it again, but since I had already created the functions it's giving an "already exists" message.
You cannot do it that way. You should run "make install" under src/sql/pgpool-recovery/.
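For reference, a minimal sketch (assuming the Pgpool-II source tree and PostgreSQL install paths that appear elsewhere in this thread; adjust to your layout) of installing the extension files with make. The PATH export assumes the build picks up pg_config from PATH; adjust if your build was configured differently:

# make sure the pg_config of the PostgreSQL installation actually in use is found first
export PATH=/usr/local/pgsql11.5/bin:$PATH
cd /installer/pgpool-II-4.1.0/src/sql/pgpool-recovery
make
make install    # installs pgpool_recovery.control and the pgpool_recovery--*.sql scripts into the extension directory

Afterwards, CREATE EXTENSION pgpool_recovery; can be run in template1 on that server.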
(0003199)
raj.pandey1982@gmail.com   
2020-02-16 14:14   
Could you please check posts 0003190-0003193? There I can see that, after starting the instance, the primary/master stays in the 'waiting' state and pgAdmin is not able to make connections. I restart pgpool two or three times and then try to connect with pgAdmin, and it works. After that, when I perform a DB failover, it looks like it again goes into the 'waiting' state.

(Note: I don't think there is any issue with the pgpool_recovery function now, since it is already available; I used binaries, so it was not required to install it separately using the make command.)
(0003200)
t-ishii   
2020-02-16 14:40   
> (Note: I don't think there is any issue with the pgpool_recovery function now, since it is already available; I used binaries, so it was not required to install it separately using the make command.)

No. A PostgreSQL extension consists not only of binaries (.so files) but also of control files, SQL scripts and so on. If you don't have enough knowledge of extensions, you should use the make command to install them. Again, I suggest you test step by step. With random tries you will not get any closer to the goal.
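For reference, a minimal sketch (not from this thread) of checking that every piece of the extension landed under the PostgreSQL installation actually in use; the paths come from pg_config, and the expected file names match the listings posted earlier in this thread:

PGBIN=/usr/local/pgsql11.5/bin
ls -l "$("$PGBIN"/pg_config --sharedir)"/extension/pgpool_recovery*    # expect pgpool_recovery.control and pgpool_recovery--1.3.sql
ls -l "$("$PGBIN"/pg_config --pkglibdir)"/pgpool-recovery.so           # expect the shared library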
(0003201)
raj.pandey1982@gmail.com   
2020-02-16 15:14   
All the pgpool-recovery libraries and control files are available:

/usr/local/pgsql11.5/lib/pgpool-recovery.c
/usr/local/pgsql11.5/lib/pgpool-recovery.so
[root@pgpool-poc01 bin]#
/usr/local/pgsql11.5/share/extension/pgpool_recovery--1.1.sql
/usr/local/pgsql11.5/share/extension/pgpool_recovery--1.3.sql
/usr/local/pgsql11.5/share/extension/pgpool_recovery.control
(0003202)
raj.pandey1982@gmail.com   
2020-02-16 19:15   
It's again saying the same thing. What is this error: "You might need to add explicit type casts."?
2020-02-16 13:07:25 +03 LOG: statement: SET statement_timeout To 0
2020-02-16 13:07:25 +03 LOG: statement: SELECT pgpool_recovery('recovery_1st_stage.sh', 'pgpool-poc02.novalocal', '/installer/postgresql-11.5/data', '5432', 1, '5432')
2020-02-16 13:07:25 +03 ERROR: function pgpool_recovery(unknown, unknown, unknown, unknown, integer, unknown) does not exist at character 8
2020-02-16 13:07:25 +03 HINT: No function matches the given name and argument types. You might need to add explicit type casts.
2020-02-16 13:07:25 +03 STATEMENT: SELECT pgpool_recovery('recovery_1st_stage.sh', 'pgpool-poc02.novalocal', '/installer/postgresql-11.5/data', '5432', 1, '5432')
(0003203)
t-ishii   
2020-02-16 20:06   
I don't know why you keep ignoring my suggestion: make install under src/sql.
What does the following command show?

psql template1
select * from pg_available_extensions;

If you do not see the following line, then pgpool_recovery is not installed correctly.

pgpool_recovery | 1.3 | 1.3 | recovery functions for pgpool-II for V4.1 or later
(0003204)
raj.pandey1982@gmail.com   
2020-02-17 01:07   
Sorry Sir,
Here is the output :
[postgres@pgpool-poc01 logs]$ /usr/local/pgsql11.5/bin/psql template1
psql (11.5)
Type "help" for help.

template1=# select * from pg_available_extensions;
        name | default_version | installed_version | comment
--------------------+-----------------+-------------------+-----------------------------------------------------------
 plpgsql | 1.0 | 1.0 | PL/pgSQL procedural language
 pgcrypto | 1.3 | | cryptographic functions
 pg_stat_statements | 1.6 | | track execution statistics of all SQL statements executed
 pg_buffercache | 1.3 | | examine the shared buffer cache
 pgpool_adm | 1.2 | | Administrative functions for pgPool
 pgpool_recovery | 1.3 | | recovery functions for pgpool-II for V4.1 or later
 pgpool_regclass | 1.0 | | replacement for regclass
(7 rows)

template1=#
(0003205)
t-ishii   
2020-02-17 12:49   
"installed_version" is empty. It is apparent that you did not execute CREATE EXTENSION on template1.
(0003206)
raj.pandey1982@gmail.com   
2020-02-17 15:55   
[postgres@pgpool-poc01 ~]$ sudo -u postgres /usr/local/pgsql11.5/bin/psql template1

template1=# CREATE EXTENSION pgpool_recovery;
ERROR: function "pgpool_recovery" already exists with same argument types
template1=#
(0003207)
t-ishii   
2020-02-17 16:25   
That's because you installed the function without using CREATE EXTENSION command. You need to remove pgpool_recovery function(s) manually from template1 database first.
(0003208)
raj.pandey1982@gmail.com   
2020-02-17 16:28   
should i use drop command? drop extension something?
(0003209)
raj.pandey1982@gmail.com   
2020-02-17 16:30   
[postgres@pgpool-poc01 sql]$ /usr/local/pgsql11.5/bin/psql template1
psql (11.5)
Type "help" for help.

template1=# DROP EXTENSION pgpool_recovery;
ERROR: extension "pgpool_recovery" does not exist
template1=# select * from pg_available_extensions;
        name | default_version | installed_version | comment
--------------------+-----------------+-------------------+-----------------------------------------------------------
 plpgsql | 1.0 | 1.0 | PL/pgSQL procedural language
 pgcrypto | 1.3 | | cryptographic functions
 pg_stat_statements | 1.6 | | track execution statistics of all SQL statements executed
 pg_buffercache | 1.3 | | examine the shared buffer cache
 pgpool_adm | 1.2 | | Administrative functions for pgPool
 pgpool_recovery | 1.3 | | recovery functions for pgpool-II for V4.1 or later
 pgpool_regclass | 1.0 | | replacement for regclass
(7 rows)

template1=#
(0003210)
raj.pandey1982@gmail.com   
2020-02-17 16:46   
Is it about this hint? "HINT: No function matches the given name and argument types. You might need to add explicit type casts."

ERROR: function pgpool_recovery(unknown, unknown, unknown, unknown, integer, unknown) does not exist
LINE 1: SELECT pgpool_recovery('recovery_1st_stage.sh', 'pgpool-poc...
                ^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
SQL state: 42883
Character: 9
(0003211)
t-ishii   
2020-02-17 16:53   
> DROP EXTENSION pgpool_recovery;
No, you mistakenly installed pgpool_recovery without using CREATE EXTENSION. So you need to remove them by using DROP FUNCTION command.
(0003212)
raj.pandey1982@gmail.com   
2020-02-17 17:03   
template1=# drop FUNCTION pgpool_recovery;
DROP FUNCTION
template1=# CREATE EXTENSION pgpool_recovery;
ERROR: function "pgpool_remote_start" already exists with same argument types
template1=#
(0003214)
t-ishii   
2020-02-18 09:11   
You need to DROP all functions defined in pgpool_recovery. Check pgpool_recovery--1.3.sql.
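For reference, a minimal psql sketch (not from this thread) of cleaning up the manually created functions in template1 so that CREATE EXTENSION can recreate them. The argument lists below are illustrative only; take the exact signatures from the \df output (or from pgpool_recovery--1.3.sql):

\c template1
\df pgpool_*
-- drop every function listed above, for example:
DROP FUNCTION pgpool_recovery(text, text, text, text, integer, text);
DROP FUNCTION pgpool_remote_start(text, text);
-- ...repeat for pgpool_pgctl / pgpool_switch_xlog if present, then:
CREATE EXTENSION pgpool_recovery;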
(0003218)
raj.pandey1982@gmail.com   
2020-02-19 17:31   
[postgres@pgpool-poc01 ~]$ /usr/local/pgsql11.5/bin/psql template1
psql (11.5)
Type "help" for help.

template1=# drop FUNCTION pgpool_recovery;
DROP FUNCTION
template1=# CREATE EXTENSION pgpool_recovery;
ERROR: function "pgpool_remote_start" already exists with same argument types
template1=#
You need to DROP all functions defined in pgpool_recovery. Check pgpool_recovery--1.3.sql.
template1=# drop FUNCTION pgpool_remote_start ;
DROP FUNCTION
template1=# CREATE EXTENSION pgpool_recovery;
CREATE EXTENSION
template1=# select * from pg_available_extensions;
        name | default_version | installed_version | comment
--------------------+-----------------+-------------------+-----------------------------------------------------------
 plpgsql | 1.0 | 1.0 | PL/pgSQL procedural language
 pgcrypto | 1.3 | | cryptographic functions
 pg_stat_statements | 1.6 | | track execution statistics of all SQL statements executed
 pg_buffercache | 1.3 | | examine the shared buffer cache
 pgpool_adm | 1.2 | | Administrative functions for pgPool
 pgpool_recovery | 1.3 | 1.3 | recovery functions for pgpool-II for V4.1 or later
 pgpool_regclass | 1.0 | | replacement for regclass
(7 rows)

Now installed_version is showing for pgpool_recovery which was previously blank.
(0003219)
t-ishii   
2020-02-20 08:46   
(Last edited: 2020-02-20 08:49)
OK. Now it is time to try the following test:

Shut down the postmaster on pgpool-poc02.novalocal. You should see it go to 'down' status using "show pool_nodes".
>Next test is if you can recover the standby node by using pcp_recovery_node. After issuing pcp_recovery_node command something like:
>
> pcp_recovery_node -h pgpool-poc01.novalocal 1
>
> you should see that pgpool-poc02.novalocal comes back online.

(0003220)
raj.pandey1982@gmail.com   
2020-02-23 18:00   
When I shut down the slave and execute pcp_recovery_node, the recovery_1st_stage.sh script below will execute, which deletes the slave's data files and restores them using pg_basebackup.
But I have one doubt here: why do we need to clean up all of the slave's data? Shouldn't the script just try to sync and fill the gap?
For example: if the slave is down for only 5 minutes, why do I have to clean up all the data and restore from scratch?

> recovery_1st_stage.sh that I would be using here:
#!/bin/bash
# This script is executed by "recovery_1st_stage" to recover a Standby node.

set -o xtrace
exec > >(logger -i -p local1.info) 2>&1

PRIMARY_NODE_PGDATA="$1"
DEST_NODE_HOST="$2"
DEST_NODE_PGDATA="$3"
PRIMARY_NODE_PORT="$4"
DEST_NODE_ID="$5"
DEST_NODE_PORT="$6"

PRIMARY_NODE_HOST=$(hostname)
PGHOME=/usr/pgsql-11
ARCHIVEDIR=/var/lib/pgsql/archivedir
REPLUSER=repl
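# NOTE: PGHOME, ARCHIVEDIR and REPLUSER above are the values from the
# documentation example; they will likely need to be adapted to this
# environment (for example PGHOME=/usr/local/pgsql11.5 and the replication
# user actually configured on these servers), otherwise the pg_basebackup
# step below will fail.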

logger -i -p local1.info recovery_1st_stage: start: pg_basebackup for Standby node $DEST_NODE_ID

## Test passwordless SSH
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${DEST_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ls /tmp > /dev/null

if [ $? -ne 0 ]; then
    logger -i -p local1.info recovery_1st_stage: passwordless SSH to postgres@${DEST_NODE_HOST} failed. Please set up passwordless SSH.
    exit 1
fi

## Get PostgreSQL major version
PGVERSION=`${PGHOME}/bin/initdb -V | awk '{print $3}' | sed 's/\..*//' | sed 's/\([0-9]*\)[a-zA-Z].*/\1/'`
if [ $PGVERSION -ge 12 ]; then
    RECOVERYCONF=${DEST_NODE_PGDATA}/myrecovery.conf
else
    RECOVERYCONF=${DEST_NODE_PGDATA}/recovery.conf
fi

## Create replication slot "${DEST_NODE_HOST}"
${PGHOME}/bin/psql -p ${PRIMARY_NODE_PORT} << EOQ
SELECT pg_create_physical_replication_slot('${DEST_NODE_HOST}');
EOQ

## Execute pg_basebackup to recover the Standby node
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@$DEST_NODE_HOST -i ~/.ssh/id_rsa_pgpool "

    set -o errexit

    rm -rf $DEST_NODE_PGDATA
    rm -rf $ARCHIVEDIR/*

    ${PGHOME}/bin/pg_basebackup -h $PRIMARY_NODE_HOST -U $REPLUSER -p $PRIMARY_NODE_PORT -D $DEST_NODE_PGDATA -X stream

    if [ ${PGVERSION} -ge 12 ]; then
        sed -i -e \"\\\$ainclude_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'\" \
               -e \"/^include_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'/d\" ${DEST_NODE_PGDATA}/postgresql.conf
    fi

    cat > ${RECOVERYCONF} << EOT
primary_conninfo = 'host=${PRIMARY_NODE_HOST} port=${PRIMARY_NODE_PORT} user=${REPLUSER} application_name=${DEST_NODE_HOST} passfile=''/var/lib/pgsql/.pgpass'''
recovery_target_timeline = 'latest'
restore_command = 'scp ${PRIMARY_NODE_HOST}:${ARCHIVEDIR}/%f %p'
primary_slot_name = '${DEST_NODE_HOST}'
EOT

    if [ ${PGVERSION} -ge 12 ]; then
            touch ${DEST_NODE_PGDATA}/standby.signal
    else
            echo \"standby_mode = 'on'\" >> ${RECOVERYCONF}
    fi

    sed -i \"s/#*port = .*/port = ${DEST_NODE_PORT}/\" ${DEST_NODE_PGDATA}/postgresql.conf
"

if [ $? -ne 0 ]; then

    ${PGHOME}/bin/psql -p ${PRIMARY_NODE_PORT} << EOQ
SELECT pg_drop_replication_slot('${DEST_NODE_HOST}');
EOQ

    logger -i -p local1.error recovery_1st_stage: end: pg_basebackup failed. online recovery failed
    exit 1
fi

logger -i -p local1.info recovery_1st_stage: end: recovery_1st_stage complete
exit 0
(0003221)
raj.pandey1982@gmail.com   
2020-02-24 16:13   
Hello friend, may I have an update on this?


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
590 [Pgpool-II] General major always 2020-02-28 19:33 2020-03-02 11:20
Reporter: yangjuexi Platform: linux  
Assigned To: t-ishii OS: redhat  
Priority: urgent OS Version: 7  
Status: resolved Product Version: 4.1.0  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version:  
Summary: SonarQube connection to Pgpool-II via JDBC driver fails
Description: Hi guys, I have some problems that I can't fix and I want to get some help here.

Here are environment information:

I have a postgresql-10.11 cluster with 3 nodes and a pgpool-ii cluster with 3 nodes,and ,they are on the same servers.

pg version 10.11
pgpool-ii version 4.1.0

cluster ip :
node 0 : 10.194.15.134
node 1 : 10.194.15.135
node 2 : 10.194.15.136
vip : 10.194.15.137
sonarqube ip : 10.194.13.230

Here my problems:
The cluster seems to work correctly. If I use SonarQube to connect to the primary PostgreSQL node directly, it works well, but if I use SonarQube to connect to Pgpool-II via the VIP, it seems connections are built but no SQL executes successfully. The same connection properties work when I test with Navicat. Is anything wrong with my conf file? I have no idea what to do about this problem.



Here are part of the log (i will attach the log file and pgpool.conf below):
2020-02-28 17:20:15: pid 9711: LOG: new connection received
2020-02-28 17:20:15: pid 9711: DETAIL: connecting host=10.194.13.230 port=46802
2020-02-28 17:20:15: pid 9711: DEBUG: reading startup packet
2020-02-28 17:20:15: pid 9711: DETAIL: Protocol Major: 3 Minor: 0 database: sonar user: postgres
2020-02-28 17:20:15: pid 9711: LOG: md5 authentication successful with frontend
2020-02-28 17:20:15: pid 9711: DEBUG: creating new connection to backend
2020-02-28 17:20:15: pid 9711: DETAIL: connecting 0 backend
2020-02-28 17:20:15: pid 9711: DEBUG: creating new connection to backend
2020-02-28 17:20:15: pid 9711: DETAIL: connecting 1 backend
2020-02-28 17:20:15: pid 9711: DEBUG: creating new connection to backend
2020-02-28 17:20:15: pid 9711: DETAIL: connecting 2 backend
2020-02-28 17:20:15: pid 9711: DEBUG: authentication backend
2020-02-28 17:20:15: pid 9711: DETAIL: auth kind:0
2020-02-28 17:20:15: pid 9711: DEBUG: process parameter status
2020-02-28 17:20:15: pid 9711: DETAIL: backend:0 name:"application_name" value:""
2020-02-28 17:20:15: pid 9711: DEBUG: process parameter status
2020-02-28 17:20:15: pid 9711: DETAIL: backend:1 name:"application_name" value:""
2020-02-28 17:20:15: pid 9711: DEBUG: process parameter status
2020-02-28 17:20:15: pid 9711: DETAIL: backend:2 name:"application_name" value:""
2020-02-28 17:20:15: pid 9711: LOG: DB node id: 0 backend pid: 11371 statement: SELECT version()
2020-02-28 17:20:15: pid 9711: DEBUG: memcache encode key
2020-02-28 17:20:15: pid 9711: DETAIL: username: "postgres" database_name: "sonar"
2020-02-28 17:20:15: pid 9711: CONTEXT: while searching system catalog, When relcache is missed
2020-02-28 17:20:15: pid 9711: DEBUG: memcache encode key
2020-02-28 17:20:15: pid 9711: DETAIL: query: "SELECT version()"
2020-02-28 17:20:15: pid 9711: CONTEXT: while searching system catalog, When relcache is missed
2020-02-28 17:20:15: pid 9711: DEBUG: memcache encode key
2020-02-28 17:20:15: pid 9711: DETAIL: `postgresSELECT version()sonar' -> `803f3947f7b49c23e56c4a0c4520e990'
2020-02-28 17:20:15: pid 9711: CONTEXT: while searching system catalog, When relcache is missed
2020-02-28 17:20:15: pid 9711: DEBUG: fetching from cache storage
2020-02-28 17:20:15: pid 9711: DETAIL: search key "803f3947f7b49c23e56c4a0c4520e990"
2020-02-28 17:20:15: pid 9711: CONTEXT: while searching system catalog, When relcache is missed
2020-02-28 17:20:15: pid 9711: DEBUG: fetching from cache storage
2020-02-28 17:20:15: pid 9711: DETAIL: cache not found on shared memory
2020-02-28 17:20:15: pid 9711: CONTEXT: while searching system catalog, When relcache is missed
2020-02-28 17:20:15: pid 9711: DEBUG: not hit local relation cache and query cache
2020-02-28 17:20:15: pid 9711: DETAIL: query:SELECT version()
2020-02-28 17:20:15: pid 9711: CONTEXT: while searching system catalog, When relcache is missed
2020-02-28 17:20:15: pid 9711: DEBUG: do_query: extended:1 query:"SELECT version()"
2020-02-28 17:20:15: pid 9711: CONTEXT: while searching system catalog, When relcache is missed
2020-02-28 17:20:15: pid 9711: DEBUG: commiting relation cache to cache storage
2020-02-28 17:20:15: pid 9711: DETAIL: Query="SELECT version()"
2020-02-28 17:20:15: pid 9711: CONTEXT: while searching system catalog, When relcache is missed
2020-02-28 17:20:15: pid 9711: DEBUG: memcache encode key


Tags: pgpool, pgpool-II
Steps To Reproduce:
Additional Information:
Attached Files: pgpool.log (1,473,563 bytes) 2020-02-28 19:33
https://www.pgpool.net/mantisbt/file_download.php?file_id=754&type=bug
pgpool.conf (44,024 bytes) 2020-02-28 19:33
https://www.pgpool.net/mantisbt/file_download.php?file_id=753&type=bug
Notes
(0003233)
t-ishii   
2020-02-29 19:31   
It seems you have turned on both streaming replication mode and native replication mode:
replication_mode = on
master_slave_mode = on
master_slave_sub_mode = 'stream'

I guess what you want to use is the streaming replication mode. If so, you should turn off replication_mode, i.e. replication_mode = off.
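For reference, a minimal pgpool.conf sketch of the streaming replication mode settings, using only the parameters quoted above:

# streaming replication mode: PostgreSQL does the replication, Pgpool-II only routes queries
replication_mode      = off
master_slave_mode     = on
master_slave_sub_mode = 'stream'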
(0003234)
yangjuexi   
2020-03-02 11:01   
It works! Thank you for your help!
(0003235)
t-ishii   
2020-03-02 11:20   
You are welcome. I am going to mark this issue as "resolved".


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
585 [Pgpool-II] Bug major have not tried 2020-02-18 04:07 2020-02-18 17:30
Reporter: tiennguyen Platform:  
Assigned To: hoshiai OS: CentOS  
Priority: high OS Version: 7.7  
Status: feedback Product Version: 4.1.0  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version:  
Summary: Can't connect to virtual ip (VIP) on cluster node
Description: Hello,

I have a problem when trying to connect to PostgreSQL over the Virtual IP (VIP) and the pgpool port.

Command:
`psql -h VIP -p 9999 -U pgpool -d postgres -w -c 'SHOW POOL_NODES;'`

Log output:
```
psql: error: could not connect to server: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
```

Here is the pgpool log after the Virtual IP came up:
```
Feb 18 01:59:23 tiennm-dev pgpool[1264]: [42-1] 2020-02-18 01:59:23: pid 1264: ERROR: backend authentication failed
Feb 18 01:59:23 tiennm-dev pgpool[1264]: [42-2] 2020-02-18 01:59:23: pid 1264: DETAIL: backend response with kind 'G' when expecting 'R'
```

On a VMware machine it works fine, but when I move it to my local system at my company, it has the problem shown above.

Please help me understand the error in the pgpool log and help me fix it.

Thanks very much
Tags:
Steps To Reproduce:
Additional Information:
Attached Files:
Notes
(0003217)
hoshiai   
2020-02-18 17:30   
What PostgreSQL version do you use?

If you can reproduce this problem, set
  log_min_messages = 'debug1'
  log_client_messages = 'on'
in pgpool.conf, and share pgpool.conf and pgpool.log from when the problem happened.


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
584 [Pgpool-II] General minor have not tried 2020-02-15 02:34 2020-02-18 17:13
Reporter: dani Platform:  
Assigned To: hoshiai OS:  
Priority: normal OS Version:  
Status: feedback Product Version: 4.1.0  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version:  
Summary: Watchdog session disconnected when any node disconnected (even the standby)
Description: Hi,
I set up pgpool successfully based on this article (https://www.pgpool.net/docs/41/en/html/example-cluster.html).
Now I'm able to connect to PostgreSQL via the watchdog virtual IP. Also, when running the show pool_nodes command, both nodes are shown with 'up' status.
Note that I'm using streaming mode and have two nodes only: one primary and one standby.
The issue is that failover is not working at all! Once I disconnect any node, even the standby one, the watchdog session is disconnected! Even if I try to connect again to the watchdog VIP, the connection fails until I reconnect the disconnected node.

Any help would be appreciated,
Thank you.
Tags: pgpool-II, standby, streaming replication, virtual ip, watchdog
Steps To Reproduce:
Additional Information:
Attached Files:
Notes
(0003213)
tiennguyen   
2020-02-18 04:12   
This problem has been mentioned before.

You can check more information here:
http://www.sraoss.jp/pipermail/pgpool-general/2019-December/006849.html
https://pgpool.net/mediawiki/index.php/Pgpool-II_4.2_development
(0003216)
hoshiai   
2020-02-18 17:13   
Could you share pgpool.log and pgpool.conf from all pgpool nodes from when this problem happened?


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
396 [Pgpool-II] General major unable to reproduce 2018-05-09 23:21 2020-02-18 17:11
Reporter: navingupta Platform:  
Assigned To: t-ishii OS:  
Priority: immediate OS Version:  
Status: feedback Product Version: 3.7.3  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version:  
Summary: pgpool is getting started but not able to connect using port 9999 using psql application
Description: Hi Team,

I have tried a lot but I am not able to connect to the database using port 9999. Please help.

Log output :-

sudo pgpool -d -n -f /usr/local/etc/pgpool.conf -F /usr/local/etc/pcp.conf 2>&1
2018-05-09 19:44:56: pid 5277: DEBUG: initializing pool configuration
2018-05-09 19:44:56: pid 5277: DETAIL: num_backends: 1 total_weight: 1.000000
2018-05-09 19:44:56: pid 5277: DEBUG: initializing pool configuration
2018-05-09 19:44:56: pid 5277: DETAIL: backend 0 weight: 2147483647.000000 flag: 0000
2018-05-09 19:44:56: pid 5277: DEBUG: pool_coninfo_size: num_init_children (32) * max_pool (4) * MAX_NUM_BACKENDS (128) * sizeof(ConnectionInfo) (136) = 2228224 bytes requested for shared memory
2018-05-09 19:44:56: pid 5277: DEBUG: ProcessInfo: num_init_children (32) * sizeof(ProcessInfo) (32) = 1024 bytes requested for shared memory
2018-05-09 19:44:56: pid 5277: DEBUG: Request info are: sizeof(POOL_REQUEST_INFO) 5264 bytes requested for shared memory
2018-05-09 19:44:56: pid 5277: DEBUG: Recovery management area: sizeof(int) 4 bytes requested for shared memory
2018-05-09 19:44:56: pid 5277: LOG: Setting up socket for 0.0.0.0:9999
2018-05-09 19:44:56: pid 5277: LOG: Setting up socket for :::9999
2018-05-09 19:44:56: pid 5278: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5279: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5280: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5281: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5285: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5282: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5287: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5283: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5286: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5284: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5291: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5297: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5288: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5292: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5289: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5293: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5290: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5294: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5295: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5296: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5298: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5299: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5300: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5303: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5304: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5305: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5301: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5306: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5307: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5308: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5277: DEBUG: find_primary_node: not in streaming replication mode
2018-05-09 19:44:56: pid 5309: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5310: DEBUG: I am PCP child with pid:5310
2018-05-09 19:44:56: pid 5311: DEBUG: I am 5311
2018-05-09 19:44:56: pid 5311: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5312: DEBUG: I am health check process pid:5312 DB node id:0
2018-05-09 19:44:56: pid 5312: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5302: DEBUG: initializing backend status
2018-05-09 19:44:56: pid 5277: LOG: pgpool-II successfully started. version 3.7.3 (amefuriboshi)


Psql connect :-

/home/postgres/bin/pgsql-9.6.1/bin/psql -h localhost -p 9999 -U postgres
Password for user postgres:
psql: ERROR: authentication failed
DETAIL: password authentication failed for user "postgres"


Log after firing above command :-

2018-05-09 19:45:24: pid 5302: DEBUG: I am 5302 accept fd 9
2018-05-09 19:45:24: pid 5302: DEBUG: reading startup packet
2018-05-09 19:45:24: pid 5302: DETAIL: Protocol Major: 1234 Minor: 5679 database: user:
2018-05-09 19:45:24: pid 5302: DEBUG: selecting backend connection
2018-05-09 19:45:24: pid 5302: DETAIL: SSLRequest from client
2018-05-09 19:45:24: pid 5302: DEBUG: SSL is requested but SSL support is not available
2018-05-09 19:45:24: pid 5302: DEBUG: pool_write: to frontend: kind:N po:0
2018-05-09 19:45:24: pid 5302: DEBUG: pool_flush_it: flush size: 1
2018-05-09 19:45:24: pid 5302: DEBUG: reading startup packet
2018-05-09 19:45:24: pid 5302: DETAIL: application_name: psql
2018-05-09 19:45:24: pid 5302: DEBUG: reading startup packet
2018-05-09 19:45:24: pid 5302: DETAIL: Protocol Major: 3 Minor: 0 database: postgres user: postgres
2018-05-09 19:45:24: pid 5302: DEBUG: creating new connection to backend
2018-05-09 19:45:24: pid 5302: DETAIL: connecting 0 backend
2018-05-09 19:45:24: pid 5302: DEBUG: SSL is requested but SSL support is not available
2018-05-09 19:45:24: pid 5302: DEBUG: pool_flush_it: flush size: 84
2018-05-09 19:45:24: pid 5302: DEBUG: pool_read: read 13 bytes from backend 0
2018-05-09 19:45:24: pid 5302: DEBUG: reading message length
2018-05-09 19:45:24: pid 5302: DETAIL: slot: 0 length: 12
2018-05-09 19:45:24: pid 5302: DEBUG: authentication backend
2018-05-09 19:45:24: pid 5302: DETAIL: auth kind:5
2018-05-09 19:45:24: pid 5302: DEBUG: authentication backend
2018-05-09 19:45:24: pid 5302: DETAIL: trying md5 authentication
2018-05-09 19:45:24: pid 5302: DEBUG: performing md5 authentication
2018-05-09 19:45:24: pid 5302: DETAIL: DB node id: 0 salt: fb7cb627
2018-05-09 19:45:24: pid 5302: DEBUG: pool_write: to frontend: kind:R po:0
2018-05-09 19:45:24: pid 5302: DEBUG: pool_write: to frontend: length:4 po:1
2018-05-09 19:45:24: pid 5302: DEBUG: pool_write: to frontend: length:4 po:5
2018-05-09 19:45:24: pid 5302: DEBUG: pool_write: to frontend: length:4 po:9
2018-05-09 19:45:24: pid 5302: DEBUG: pool_flush_it: flush size: 13

Tags:
Steps To Reproduce:
Additional Information:
Attached Files: pgpool.conf (35,852 bytes) 2018-05-11 02:05
https://www.pgpool.net/mantisbt/file_download.php?file_id=452&type=bug
pcp.conf (884 bytes) 2018-05-11 02:05
https://www.pgpool.net/mantisbt/file_download.php?file_id=451&type=bug
pool_hba.conf (3,432 bytes) 2018-05-11 02:05
https://www.pgpool.net/mantisbt/file_download.php?file_id=450&type=bug
pg_hba.conf (4,784 bytes) 2018-05-11 15:25
https://www.pgpool.net/mantisbt/file_download.php?file_id=453&type=bug
Notes
(0002012)
t-ishii   
2018-05-10 07:33   
Sounds like a configuration problem. Please share: pgpool.conf, pool_hba.conf, pg_hba.conf.
(0002013)
navingupta   
2018-05-11 02:05   
As you asked, please find the attached conf file.

Thanks,
Navin.
(0002014)
t-ishii   
2018-05-11 08:03   
I think the password for postgres user in pool_passwd does not match with PostgreSQL's password.
You can verify this by:
$ grep postgres pool_passwd
and
select passwd from pg_shadow where usename = 'postgres';
The output of the select should match the password part of pool_passwd.
(0002015)
navingupta   
2018-05-11 15:00   
Hi ,

I verified the password; it is the same.

postgres@navin-pc:~$ grep postgres /usr/local/etc/pool_passwd
postgres:md53175bce1d3201d16594cebf9d7eb3f9d

navin_test_db=# select passwd from pg_shadow where usename = 'postgres';
               passwd
-------------------------------------
 md53175bce1d3201d16594cebf9d7eb3f9d
(1 row)
(0002016)
t-ishii   
2018-05-11 15:03   
Are you sure that you can directly login to PostgreSQL with the password?
(0002017)
t-ishii   
2018-05-11 15:10   
(Last edited: 2018-05-11 15:10)
Can you provide pg_hba.conf?

(0002018)
navingupta   
2018-05-11 15:11   
Yes, I am able to log in with the password.

/home/postgres/bin/pgsql-9.6.1/bin/psql navin_test_db -U postgres
Password for user postgres:
psql (9.6.1)
Type "help" for help.

navin_test_db=#
(0002019)
navingupta   
2018-05-11 15:16   
Please find the pg_hba.conf file..
(0002020)
navingupta   
2018-05-11 15:20   
Hi,

Please find the pg_hba.conf file.
(0002021)
navingupta   
2018-05-11 15:22   
Are you getting the file? It is not showing on the front-end.
(0002022)
t-ishii   
2018-05-11 15:23   
No.
(0002023)
navingupta   
2018-05-11 15:25   
please find the file.
(0002024)
t-ishii   
2018-05-11 15:28   
Thanks. Also, I need the pgpool log after "2018-05-09 19:45:24: pid 5302: DEBUG: pool_flush_it: flush size: 13".

This part looks quite normal, so I guess something happened after it.
(0002025)
navingupta   
2018-05-11 15:45   
Hi, actually the log is not being written to /var/log/pgpool.log. The following is printed on the terminal when I run the command

sudo pgpool -d -n -f /usr/local/etc/pgpool.conf -F /usr/local/etc/pcp.conf 2>&1
2018-05-11 12:12:43: pid 14883: DEBUG: initializing pool configuration
2018-05-11 12:12:43: pid 14883: DETAIL: num_backends: 1 total_weight: 1.000000
2018-05-11 12:12:43: pid 14883: DEBUG: initializing pool configuration
2018-05-11 12:12:43: pid 14883: DETAIL: backend 0 weight: 2147483647.000000 flag: 0000
2018-05-11 12:12:43: pid 14883: DEBUG: pool_coninfo_size: num_init_children (32) * max_pool (4) * MAX_NUM_BACKENDS (128) * sizeof(ConnectionInfo) (136) = 2228224 bytes requested for shared memory
(0002104)
t-ishii   
2018-07-06 13:50   
Sorry for the long absence. Today I realized that you have only 1 PostgreSQL server. In this case, you should NOT use pool_passwd file. pool_passwd is only consulted when there are 2 or more PostgreSQL servers. So please set "enable_pool_hba = off" in your pgpool.conf and restart Pgpool-II, then try to log in via Pgpool-II.
You can leave pool_passwd as it is. If enable_pool_hba is disabled, it's not consulted anyway.
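For example, the relevant pgpool.conf line would look roughly like this (a sketch only; the backend lines are just for context and should already match your setup):

  enable_pool_hba = off            # do not authenticate clients via pool_hba.conf
  backend_hostname0 = 'localhost'  # the single PostgreSQL backend
  backend_port0 = 5432

followed by a Pgpool-II restart, e.g. "pgpool -m fast stop" and then starting it again.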
(0002612)
t-ishii   
2019-05-21 14:13   
May I close this issue?
(0002823)
Carlos Mendez   
2019-09-06 00:54   
Hi T-Ishii,
Sorry for intruding. Could you please give more details about your last comment?

I don't understand what you mean when you say pool_passwd is only consulted when there are 2 or more PostgreSQL servers.

Regards,
CAR
(0002827)
t-ishii   
2019-09-09 13:47   
Sorry,
> you should NOT use pool_passwd file. pool_passwd
This was not correct. I should have said "you may not need to use the pool_passwd file."
The reason why pool_passwd is needed is explained here (so you will see why pool_passwd is not necessary for a 1-backend configuration):
https://pgpool.net/mediawiki/index.php/FAQ#How_does_pgpool-II_handle_md5_authentication.3F

Back to the original issue. I have tried with 3.7.3 and it works for me. Here are my logs. Do you see any difference from yours?

2019-09-09 13:34:12: pid 16236: DEBUG: reading startup packet
2019-09-09 13:34:12: pid 16236: DETAIL: application_name: psql
2019-09-09 13:34:12: pid 16236: DEBUG: reading startup packet
2019-09-09 13:34:12: pid 16236: DETAIL: Protocol Major: 3 Minor: 0 database: test user: foo
2019-09-09 13:34:12: pid 16236: DEBUG: creating new connection to backend
2019-09-09 13:34:12: pid 16236: DETAIL: connecting 0 backend
2019-09-09 13:34:12: pid 16236: DEBUG: pool_flush_it: flush size: 75
2019-09-09 13:34:12: pid 16236: DEBUG: pool_read: read 13 bytes from backend 0
2019-09-09 13:34:12: pid 16236: DEBUG: reading message length
2019-09-09 13:34:12: pid 16236: DETAIL: slot: 0 length: 12
2019-09-09 13:34:12: pid 16236: DEBUG: authentication backend
2019-09-09 13:34:12: pid 16236: DETAIL: auth kind:5
2019-09-09 13:34:12: pid 16236: DEBUG: authentication backend
2019-09-09 13:34:12: pid 16236: DETAIL: trying md5 authentication
2019-09-09 13:34:12: pid 16236: DEBUG: performing md5 authentication
2019-09-09 13:34:12: pid 16236: DETAIL: DB node id: 0 salt: 315d5a91
2019-09-09 13:34:12: pid 16236: DEBUG: pool_write: to frontend: kind:R po:0
2019-09-09 13:34:12: pid 16236: DEBUG: pool_write: to frontend: length:4 po:1
2019-09-09 13:34:12: pid 16236: DEBUG: pool_write: to frontend: length:4 po:5
2019-09-09 13:34:12: pid 16236: DEBUG: pool_write: to frontend: length:4 po:9
2019-09-09 13:34:12: pid 16236: DEBUG: pool_flush_it: flush size: 13
2019-09-09 13:34:12: pid 16236: DEBUG: pool_write: to backend: 0 kind:p
2019-09-09 13:34:12: pid 16236: DEBUG: pool_flush_it: flush size: 41
2019-09-09 13:34:12: pid 16236: DEBUG: pool_read: read 324 bytes from backend 0
2019-09-09 13:34:12: pid 16236: DEBUG: pool_write: to frontend: kind:R po:0
2019-09-09 13:34:12: pid 16236: DEBUG: pool_write: to frontend: length:4 po:1
2019-09-09 13:34:12: pid 16236: DEBUG: pool_write: to frontend: length:4 po:5
2019-09-09 13:34:12: pid 16236: DEBUG: pool_flush_it: flush size: 9
2019-09-09 13:34:12: pid 16236: DEBUG: pool_write: to frontend: kind:S po:0
2019-09-09 13:34:12: pid 16236: DEBUG: reading message length
2019-09-09 13:34:12: pid 16236: DETAIL: master slot: 0 length: 26
2019-09-09 13:34:12: pid 16236: DEBUG: pool_write: to frontend: length:4 po:1
2019-09-09 13:34:12: pid 16236: DEBUG: process parameter status
2019-09-09 13:34:12: pid 16236: DETAIL: backend:0 name:"application_name" value:"psql"
:
:
(0003215)
t-ishii   
2020-02-18 17:11   
No response for over 1 month. I am going to close this issue if there's no objection.


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
534 [Pgpool-II] Bug major always 2019-08-06 23:25 2020-02-18 15:10
Reporter: eldad Platform: CentOS release 6.10  
Assigned To: t-ishii OS: CentOS Linux  
Priority: high OS Version: 6.10  
Status: resolved Product Version: 4.0.5  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version: 4.0.7  
    Target Version: 4.0.7  
Summary: unable to connect when enable_pool_hba is off and pg_hba on password
Description: Hi,

I'm seeing strange behavior when trying to connect to the DB through pgpool.
In pgpool, pool_hba is disabled and pool_passwd = ''.
In the PostgreSQL pg_hba.conf I have entries for the master and standby with the password METHOD.
.pgpass is configured for both the postgres port and the pgpool port:
*:5432:*:postgres:*************
*:5400:*:postgres:*************


Connecting to PostgreSQL directly works well, but through pgpool it gives an error:

-bash-4.1$ psql -U postgres -p 5432 -h smxashpocpdb01
psql (10.9)
Type "help" for help.

postgres=# \q
-bash-4.1$ psql -U postgres -p 5400 -h smxashpocpdb01
psql: FATAL: clear text password authentication failed
DETAIL: unable to get the password for user: "postgres"

See the related log info in Additional Information.

I have a similar configuration with older PostgreSQL (9.6) and pgpool (pg96-3.6.2) that works well.

Please advise how I can use this with the new versions - pgpool 4.0.5 and PostgreSQL 10.9.

Thanks,
Eldad
Tags: pgpool-II
Steps To Reproduce: in pgpool :
enable_pool_hba = off
pool_passwd =''

in pg_hba.conf:
# "local" is for Unix domain socket connections only
local all all password
# IPv4 local connections:
host all all 127.0.0.1/32 password
host replication all 192.168.35.3/32 trust
host all all 192.168.35.3/32 password
host replication all 192.168.35.4/32 trust
host all all 192.168.35.4/32 password
Additional Information: pgpool log:
2019-08-06 13:57:45: pid 16823: DEBUG: reading message length
2019-08-06 13:57:45: pid 16823: DETAIL: slot: 1 length: 8
2019-08-06 13:57:45: pid 16823: LOCATION: pool_auth.c:2007
2019-08-06 13:57:45: pid 16823: DEBUG: authentication backend
2019-08-06 13:57:45: pid 16823: DETAIL: auth kind:3
2019-08-06 13:57:45: pid 16823: LOCATION: pool_auth.c:408
2019-08-06 13:57:45: pid 16823: DEBUG: authentication backend
2019-08-06 13:57:45: pid 16823: DETAIL: trying clear text password authentication
2019-08-06 13:57:45: pid 16823: LOCATION: pool_auth.c:437
2019-08-06 13:57:45: pid 16823: WARNING: unable to get password, password file descriptor is NULL
2019-08-06 13:57:45: pid 16823: LOCATION: pool_passwd.c:347
2019-08-06 13:57:45: pid 16823: FATAL: clear text password authentication failed
2019-08-06 13:57:45: pid 16823: DETAIL: unable to get the password for user: "postgres"
2019-08-06 13:57:45: pid 16823: LOCATION: pool_auth.c:1018
2019-08-06 13:57:45: pid 16823: DEBUG: pool_write: to frontend: kind:E po:0
2019-08-06 13:57:45: pid 16823: LOCATION: pool_stream.c:461

postgres log:
postgres 2019-08-06 13:57:45 UTC [16842][postgres][192.168.35.4]: [1-1] DEBUG: postgres child[16842]: starting with (
postgres 2019-08-06 13:57:45 UTC [16842][postgres][192.168.35.4]: [2-1] DEBUG: postgres
postgres 2019-08-06 13:57:45 UTC [16842][postgres][192.168.35.4]: [3-1] DEBUG: )
postgres 2019-08-06 13:57:45 UTC [16842][postgres][192.168.35.4]: [4-1] DEBUG: InitPostgres
postgres 2019-08-06 13:57:45 UTC [16842][postgres][192.168.35.4]: [5-1] DEBUG: my backend ID is 3
postgres 2019-08-06 13:57:45 UTC [16842][postgres][192.168.35.4]: [6-1] DEBUG: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0
postgres 2019-08-06 13:57:45 UTC [16842][postgres][192.168.35.4]: [7-1] FATAL: expected password response, got message type 88
postgres 2019-08-06 13:57:45 UTC [16842][postgres][192.168.35.4]: [8-1] DEBUG: shmem_exit(1): 1 before_shmem_exit callbacks to make
postgres 2019-08-06 13:57:45 UTC [16842][postgres][192.168.35.4]: [9-1] DEBUG: shmem_exit(1): 6 on_shmem_exit callbacks to make
postgres 2019-08-06 13:57:45 UTC [16842][postgres][192.168.35.4]: [10-1] DEBUG: proc_exit(1): 3 callbacks to make
postgres 2019-08-06 13:57:45 UTC [16842][postgres][192.168.35.4]: [11-1] DEBUG: exit(1)
postgres 2019-08-06 13:57:45 UTC [16842][postgres][192.168.35.4]: [12-1] DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
postgres 2019-08-06 13:57:45 UTC [16842][postgres][192.168.35.4]: [13-1] DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
postgres 2019-08-06 13:57:45 UTC [16842][postgres][192.168.35.4]: [14-1] DEBUG: proc_exit(-1): 0 callbacks to make
Attached Files: postgresql.conf (23,028 bytes) 2019-08-11 16:18
https://www.pgpool.net/mantisbt/file_download.php?file_id=636&type=bug
pgpool.conf (40,647 bytes) 2019-08-11 16:18
https://www.pgpool.net/mantisbt/file_download.php?file_id=635&type=bug
Notes
(0002755)
t-ishii   
2019-08-07 15:57   
I think you need to set:

allow_clear_text_frontend_auth = on

in your pgpool.conf.
(0002756)
eldad   
2019-08-08 20:15   
Hi,

I had already set allow_clear_text_frontend_auth = on before reporting the problem; it didn't help.
Is there any additional information you need?

Regards,
Eldad
(0002759)
t-ishii   
2019-08-09 10:34   
(Last edited: 2019-08-09 11:24)
I suspect that you actually store passwords in PostgreSQL as md5 hashes (as long as you don't change the password_encryption parameter of PostgreSQL) even though you set the auth method to "password". This confuses Pgpool-II, since it assumes PostgreSQL sends the password in clear text form.

You can work around this by setting "md5" instead of "password" in pg_hba.conf.
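For example, the pg_hba.conf entries from the Steps To Reproduce above would become roughly (a sketch; the addresses are copied from the report):

  host all all 127.0.0.1/32 md5
  host all all 192.168.35.3/32 md5
  host all all 192.168.35.4/32 md5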

(0002768)
eldad   
2019-08-11 16:18   
Hi,

I tried using md5 and got this error:
-bash-4.1$ psql -U postgres -p 5400 -h smxashpocpdb01
psql: ERROR: unable to read message length
DETAIL: message length (8) in slot 1 does not match with slot 0(12)

connecting directly to postgres with port 5432 works well in both cases.
-bash-4.1$ psql -U postgres -p 5432 -h smxashpocpdb01
psql (10.9)
Type "help" for help.

postgres=#


I'm attaching pgpool and postgres conf files
regards,
Eldad
(0002769)
eldad   
2019-08-11 16:32   
The issue was resolved after also setting md5 in the slave's pg_hba.conf, together with allow_clear_text_frontend_auth = on.

It seems there is no way to make this work with "password" in pg_hba.conf; maybe you should check it or update the documentation accordingly.

Thanks for the help,
Eldad
(0002770)
t-ishii   
2019-08-11 17:40   
I think you should have been able to use "password" in pg_hba.conf together with allow_clear_text_frontend_auth = on. I suspect this is a new issue in Pgpool-II. I believe one of our developers, Usama, is working on it.
(0002791)
t-ishii   
2019-08-19 08:48   
(Last edited: 2019-08-19 08:49)
Usama has pushed the fix for this. Will appear in the next minor release.
https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=94428eaeff8d5985940d422bd00442dbd00e437c

(0002820)
t-ishii   
2019-09-05 15:25   
The minor release is already out. I am going to mark this issue as resolved if there's no objection.


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
582 [Pgpool-II] Bug major have not tried 2020-02-09 00:22 2020-02-12 13:50
Reporter: postgann2020 Platform: RHEL-64  
Assigned To: hoshiai OS: Linux  
Priority: high OS Version: 7.2  
Status: feedback Product Version: 3.7.11  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version:  
Summary: 'idle in transaction' connections in pgpool server and not able see the same in postgres servers
Description: Hi Team,

Thanks for your support.

Hi all,
We need support.
I am getting too many 'idle in transaction' connections on the pgpool server, and the related PIDs are missing on the PostgreSQL servers, i.e. the connections are being closed on the PostgreSQL servers (not sure what is happening).

What may be the problem?


Environment Details: Two pgpool servers (3.7.11) with one master and three slave PostgreSQL servers (9.5).
                      Application side: Java with the Struts framework, deployed on Wildfly 9 servers.

1. Cannot reproduce the issue every time.

2. Pgpool.conf params :
    connection_cache = on
                                   # Activate connection pools
                                   # (change requires restart)
                                   # Semicolon separated list of queries
                                   # to be issued at the end of a session
                                   # The default is for 8.3 and later
    reset_query_list = 'ABORT; DISCARD ALL'
                                       # The following one is for 8.2 and before
    #reset_query_list = 'ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT'
    num_init_children = 1000
                                       # Number of concurrent sessions allowed
                                       # (change requires restart)
    max_pool = 1
                                       # Number of connection pool caches per connection
                                       # (change requires restart)
    # - Life time -
    child_life_time = 300
                                       # Pool exits after being idle for this many seconds
    child_max_connections = 0
                                       # Pool exits after receiving that many connections
                                       # 0 means no exit
    connection_life_time = 0
                                       # Connection to backend closes after being idle for this many seconds
                                       # 0 means no close
    client_idle_limit = 0
    connect_timeout = 100000
    client_idle_limit_in_recovery = 0
    
    3. PIDs which got stuck on the pgpool servers and are missing on the PostgreSQL servers

    1). From the pgpool servers:

    (i) In Pgpool :
    root 30659 6239 0 13:17 ? 00:00:00 pgpool: postgres postgres 10.19.61.122 idle in transaction
    root 32688 6239 0 13:29 ? 00:00:00 pgpool: postgres postgres 10.19.61.122 idle in transaction
    
    (ii) show pool_pools;
    30659 | 2020-02-08 13:17:26 | 0 | 0 | postgres | postgres | 2020-02-08 13:47:53 | 3 | 0 | 1 | 378276 | 1
    30659 | 2020-02-08 13:17:26 | 0 | 1 | postgres | postgres | 2020-02-08 13:47:53 | 3 | 0 | 1 | 446378 | 1
    30659 | 2020-02-08 13:17:26 | 0 | 2 | postgres | postgres | 2020-02-08 13:47:53 | 3 | 0 | 1 | 26020 | 1
    30659 | 2020-02-08 13:17:26 | 0 | 3 | postgres | postgres | 2020-02-08 13:47:53 | 3 | 0 | 1 | 346430 | 1

    32688 | 2020-02-08 13:29:14 | 0 | 0 | postgres | postgres | 2020-02-08 13:52:05 | 3 | 0 | 1 | 378814 | 1
    32688 | 2020-02-08 13:29:14 | 0 | 1 | postgres | postgres | 2020-02-08 13:52:05 | 3 | 0 | 1 | 447031 | 1
    32688 | 2020-02-08 13:29:14 | 0 | 2 | postgres | postgres | 2020-02-08 13:52:05 | 3 | 0 | 1 | 26544 | 1
    32688 | 2020-02-08 13:29:14 | 0 | 3 | postgres | postgres | 2020-02-08 13:52:05 | 3 | 0 | 1 | 346873 | 1
    
    (iii) In Postgres :

    ps -ef| grep 378814
    397874 393866 0 16:26 pts/0 00:00:00 grep --color=auto 378814
    ps -ef| grep 378276
    397995 393866 0 16:27 pts/0 00:00:00 grep --color=auto 378276
    
    (iv) Not able to see these PIDs in pg_stat_activity

Please help us find the issues:

1. Why are we not able to see the pgpool "idle in transaction" connections in the backend DBs (using pg_stat_activity and ps -eaf | grep pid)?
2. How can we check what exactly the "idle in transaction" connections are doing on the backends, and how can we see those connections?
3. How can we avoid this kind of issue, and which parameters at the pgpool and DB level will help us release these "idle in transaction" connections?

Your support will be much appreciated.
Thanks for your support.

Regards,
Bingo.
Tags:
Steps To Reproduce:
Additional Information:
Attached Files:
Notes
(0003163)
siva   
2020-02-10 11:26   
Hi Team,

Could someone please respond to this request? This issue is entirely blocking all activities in production.

Thanks.
(0003169)
postgann2020   
2020-02-11 14:54   
Hi hoshiai,

Good Morning,

Just for information.

We are seeing below messages in log:

1. pgpool[12047]: [87052-1] 2020-02-10 19:25:05: pid 12047: FATAL: connection was terminated due to conflict with recovery.
2. pgpool[26352]: [869029-2] 2020-02-10 18:52:26: pid 26352: DETAIL: kind mismatch among backends. Possible last query was: "COMMIT" kind details are: 0[2] 3[E: terminating connection due to conflict with recovery].

Please help us with the below queries as well.
1. Can we ignore "connection was terminated due to conflict with recovery"?
2. Please suggest how to avoid this event.

Thanks & Regards,
postgann.
(0003182)
hoshiai   
2020-02-12 13:21   
(Last edited: 2020-02-12 13:24)
From the above 'Description', my understanding is that pgpool keeps its connection while the corresponding PostgreSQL (node 0) connection may no longer exist.
We can't tell why the backend process doesn't exist based on this information alone.

If possible, reproduce the situation with log_connections and log_disconnections enabled in PostgreSQL, and log_min_messages set to 'debug1' in pgpool.
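Roughly, the suggested settings would be (a sketch; only these parameters are relevant here):

  # postgresql.conf
  log_connections = on
  log_disconnections = on

  # pgpool.conf
  log_min_messages = debug1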

> (iii) In Postgres :
> ps -ef| grep 378814
> 397874 393866 0 16:26 pts/0 00:00:00 grep --color=auto 378814
> ps -ef| grep 378276
> 397995 393866 0 16:27 pts/0 00:00:00 grep --color=auto 378276
Also, was this executed on the node 0 PostgreSQL server?

(0003183)
hoshiai   
2020-02-12 13:35   
Also, could you share pgpool.conf? If necessary, you can mask data (addresses, passwords, etc.).
From the description alone I can't tell some settings, such as pgpool's mode.
(0003184)
hoshiai   
2020-02-12 13:43   
(Last edited: 2020-02-12 13:50)
Sorry for asking so many times... Please share postgresql.conf, postgresql.log and pgpool.log too.



View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
562 [pgpool-HA] General major always 2019-12-05 05:31 2020-02-05 09:39
Reporter: mnieva Platform:  
Assigned To: hoshiai OS:  
Priority: normal OS Version:  
Status: feedback Product Version:  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version:  
Summary: Add to watchdog cluster request is rejected by node 9000
Description: I have 2 db servers with repmgr set up for replication. pgpool is installed in both servers and is configured. I followed the steps in https://www.pgpool.net/pgpool-web/contrib_docs/watchdog_master_slave/en.html#thisis.

My problem is that when pgpool is started on the 2nd host, I am getting this:

2019-12-04 09:43:01: db: [No Connection] pid 23568: user: [No Connection] client [No Connection]: 16 FATAL: Add to watchdog cluster request is rejected by node "pgsql01-lcp-prd.vz.points.com:9000"
2019-12-04 09:43:01: db: [No Connection] pid 23568: user: [No Connection] client [No Connection]: 17 HINT: check the watchdog configurations.
2019-12-04 09:43:01: db: [No Connection] pid 23568: user: [No Connection] client [No Connection]: 18 LOG: Watchdog is shutting down
2019-12-04 09:43:01: db: [No Connection] pid 23566: user: [No Connection] client [No Connection]: 17 DEBUG: reaper handler
2019-12-04 09:43:01: db: [No Connection] pid 23566: user: [No Connection] client [No Connection]: 18 DEBUG: watchdog child process with pid: 23568 exit with FATAL ERROR. pgpool-II will be shutdown
2019-12-04 09:43:01: db: [No Connection] pid 23566: user: [No Connection] client [No Connection]: 19 LOG: watchdog child process with pid: 23568 exits with status 768
2019-12-04 09:43:01: db: [No Connection] pid 23566: user: [No Connection] client [No Connection]: 20 FATAL: watchdog child process exit with fatal error. exiting pgpool-II
2019-12-04 09:43:01: db: [No Connection] pid 23569: user: [No Connection] client [No Connection]: 1 LOG: setting the local watchdog node name to "pgsql02-lcp-prd.vz.points.com:9999 Linux pgsql02-lcp-prd.vz.points.com"
2019-12-04 09:43:01: db: [No Connection] pid 23569: user: [No Connection] client [No Connection]: 2 LOG: watchdog cluster is configured with 1 remote nodes
2019-12-04 09:43:01: db: [No Connection] pid 23569: user: [No Connection] client [No Connection]: 3 LOG: watchdog remote node:0 on pgsql01-lcp-prd.vz.points.com:9000
2019-12-04 09:43:01: db: [No Connection] pid 23569: user: [No Connection] client [No Connection]: 4 LOG: interface monitoring is disabled in watchdog
2019-12-04 09:43:01: db: [No Connection] pid 23569: user: [No Connection] client [No Connection]: 5 INFO: IPC socket path: "/home/postgres/config/9999/run/.s.PGPOOLWD_CMD.9000"
2019-12-04 09:43:01: db: [No Connection] pid 23569: user: [No Connection] client [No Connection]: 6 LOG: watchdog node state changed from [DEAD] to [LOADING]
2019-12-04 09:43:01: db: [No Connection] pid 23569: user: [No Connection] client [No Connection]: 7 DEBUG: STATE MACHINE INVOKED WITH EVENT = STATE CHANGED Current State = LOADING
2019-12-04 09:43:01: db: [No Connection] pid 23569: user: [No Connection] client [No Connection]: 8 LOG: new outbound connection to pgsql01-lcp-prd.vz.points.com:9000
2019-12-04 09:43:01: db: [No Connection] pid 23569: user: [No Connection] client [No Connection]: 9 DEBUG: STATE MACHINE INVOKED WITH EVENT = NEW OUTBOUND_CONNECTION Current State = LOADING
2019-12-04 09:43:01: db: [No Connection] pid 23569: user: [No Connection] client [No Connection]: 10 DEBUG: sending packet, watchdog node:[] command id:[4] type:[ADD NODE] state:[LOADING]
2019-12-04 09:43:01: db: [No Connection] pid 23569: user: [No Connection] client [No Connection]: 11 DEBUG: sending watchdog packet to socket:7, type:[A], command ID:4, data Length:383
2019-12-04 09:43:01: db: [No Connection] pid 23569: user: [No Connection] client [No Connection]: 12 LOG: Watchdog is shutting down


I'm not sure how to resolve this issue or which part of the configuration I got wrong. I hope you can help me with this.
Tags:
Steps To Reproduce: start pgpool on host 1
start pgpool on host2

pgpool on host2 is shutdown
Additional Information: I am using postgres 11.6
repmgr 5.0
pgpool 4.1

I am attaching the pgpool.conf and the pgpool logs
Attached Files: pgpool server 1.zip (18,882 bytes) 2019-12-05 05:31
https://www.pgpool.net/mantisbt/file_download.php?file_id=696&type=bug
pgpool server 2.zip (10,153 bytes) 2019-12-05 05:31
https://www.pgpool.net/mantisbt/file_download.php?file_id=695&type=bug
pgpool.log (363,466 bytes) 2019-12-07 05:37
https://www.pgpool.net/mantisbt/file_download.php?file_id=704&type=bug
Notes
(0003001)
hoshiai   
2019-12-05 13:06   
The config parameter is wrong.
Your config file sets:
  other_pgpool_port0 = 5432
  port = 9999

The port numbers do not match. If pgpool (on host 1) uses port 9999, pgpool (on host 2) should set other_pgpool_port0 to 9999:
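Roughly, on host 2 (a sketch; the hostname is a placeholder, only the port relationship matters):

  port = 9999
  other_pgpool_hostname0 = 'host1'   # pgpool on host 1
  other_pgpool_port0 = 9999          # must equal 'port' in host 1's pgpool.conf
  other_wd_port0 = 9000              # watchdog port of host 1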
(0003012)
mnieva   
2019-12-07 05:37   
Hi, thanks for your help. I have updated other_pgpool_port0 and am now able to start up both pgpools.

However, pgpool is unable to do the failover. I started both pgpools and rebooted the master host, but the DB slave was not promoted to master.
When I do not use watchdog and start only one pgpool, the slave gets promoted and failover happens. But when watchdog is on and both pgpools are started, the failover is not done.

I am attaching the pgpool logs for the slave pgpool. Am I missing anything in the configuration?
(0003017)
hoshiai   
2019-12-10 14:05   
OK, I will check your log later.

This ticket was filed under the wrong project name (pgpool-HA). Actually, 'Pgpool-II' is correct.
So, could you re-create the ticket under the correct project?
(0003132)
hoshiai   
2020-02-05 09:39   
> This ticket was filed under the wrong project name (pgpool-HA). Actually, 'Pgpool-II' is correct.
> So, could you re-create the ticket under the correct project?

Could you create a new issue with the same content under the Pgpool-II project?

> I am attaching the pgpool logs for the slave pgpool. Am I missing anything in the configuration?

The two pgpools have not communicated with each other yet.
I think you should set 'heartbeat_device0' to a value other than empty.


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
559 [Pgpool-II] Bug major sometimes 2019-11-20 19:28 2020-02-03 10:56
Reporter: pdomagala Platform:  
Assigned To: pengbo OS:  
Priority: high OS Version:  
Status: feedback Product Version: 4.1.0  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version:  
Summary: Randomly closed connections
Description: We're running the latest PGPool v4.1.0 in a docker container. Everything works flawlessly, but sometimes (under higher usage/traffic) our app returns

```
PG::UnableToSend: server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.
```

Also we see in PostgreSQL logs something like this:
```
2019-11-19 18:13:19 GMT LOG: could not receive data from client: Connection reset by peer
2019-11-19 18:13:19 GMT LOG: unexpected EOF on client connection with an open transaction
```

PGPool and Postgresql configurations attached.
Tags:
Steps To Reproduce: Those errors are pretty random, we can't reproduce it. But it occurs at least a few times per hour.
Additional Information:
Attached Files: pg.conf (1,583 bytes) 2019-11-20 19:28
https://www.pgpool.net/mantisbt/file_download.php?file_id=690&type=bug
pgpool.conf (40,349 bytes) 2019-11-20 19:28
https://www.pgpool.net/mantisbt/file_download.php?file_id=689&type=bug
pg.log (4,278 bytes) 2019-11-23 00:10
https://www.pgpool.net/mantisbt/file_download.php?file_id=692&type=bug
pgpool.log (15,439 bytes) 2019-11-23 00:10
https://www.pgpool.net/mantisbt/file_download.php?file_id=691&type=bug
Notes
(0002971)
pdomagala   
2019-11-20 19:35   
also we use PostgreSQL 10.9-1.
(0002974)
pengbo   
2019-11-21 09:34   
```
2019-11-19 18:13:19 GMT LOG: could not receive data from client: Connection reset by peer
2019-11-19 18:13:19 GMT LOG: unexpected EOF on client connection with an open transaction
```
The reason is that the connection from the client was dropped.

If your application connects to PostgreSQL directly, does it occur?
(0002975)
pdomagala   
2019-11-21 18:26   
No, we don't have such issues when the app is connected to PostgreSQL directly. Is there any chance this is occurring because we don't have SSL enabled on the frontend (app->pgpool)?

Also, we have set `client_idle_limit = 30`, so PGPool should kill idle connections after 30 seconds, but I see there are a lot of unclosed connections even a couple of minutes after disconnecting the app.
(0002976)
pengbo   
2019-11-22 13:11   
> No, we don't have such issues when app is connected to PostgreSQL directly. Is there any chance that is occuring because we don't have enabled SSL on frontend (app->pgpool)?

I don't think this is the reason.

Could you share pgpool.log including "2019-11-19 18:13:19"?
(0002977)
pdomagala   
2019-11-23 00:10   
I've more recent logs:

APP:
```
2019-11-22T14:51:40.140Z 1 TID-ouq0pxufd WARN: ActiveRecord::StatementInvalid: PG::ConnectionBad: PQconsumeInput() server closed the connection unexpectedly
2019-11-22T14:51:11.062Z 1 TID-ouxkp800l WARN: ActiveRecord::StatementInvalid: PG::ConnectionBad: PQconsumeInput() server closed the connection unexpectedly
```

PGPool and PG logs attached to this comment.
(0002980)
pengbo   
2019-11-27 22:54   
I have checked pgpool.log, but I could not find any errors.
Is the transaction aborted when the connection from the app is disconnected?

Can you confirm the status of the processes using the "ps auxwww" command when this error occurs?
(0002981)
pengbo   
2019-11-27 23:35   
> Also, we have set `client_idle_limit = 30`, so PGPool should kill idle connections after 30 seconds, but I see there are a lot of unclosed connections even a couple of minutes after disconnecting the app.

"client_idle_limit" is the parameter that kills idle connections between the client and pgpool.
If you want to close connections to the backend, you need to configure the "child_life_time" or "connection_life_time" parameter.
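For example (a sketch; the values are illustrative, not recommendations):

  client_idle_limit = 30        # drop a client connection idle for 30 seconds
  child_life_time = 300         # an idle pgpool child process exits after 300 seconds
  connection_life_time = 60     # a cached backend connection is closed after 60 idle seconds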
(0003119)
pengbo   
2020-02-03 10:56   
If you have resolved this issue, may I close this one?


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
566 [Pgpool-II] General minor always 2019-12-11 06:21 2020-02-03 10:54
Reporter: sureshreddy21 Platform: Linux  
Assigned To: pengbo OS: Oracle Linux Server release 8.1  
Priority: low OS Version: 4.18.0-147.0.3  
Status: feedback Product Version: 4.1.0  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version:  
Summary: Pgpool is not failing over completely to the first node when the 2nd and 3rd nodes were down
Description: We have 3 node PGPOOL-II with watchdog and repmgr configured.
3rd node became the primrary during failovers.
after we bring up First node , it is alwys giving below error.

Dec 10 16:19:17 rac01-prod pgpool[27256]: 2019-12-10 16:19:17: pid 27256: ERROR: failed to make persistent db connection
Dec 10 16:19:17 rac01-prod pgpool[27256]: 2019-12-10 16:19:17: pid 27256: DETAIL: connection to host:"rac03-prod:5432" failed
Dec 10 16:19:17 rac01-prod pgpool[27256]: 2019-12-10 16:19:17: pid 27256: LOG: find_primary_node: make_persistent_db_connection_noerror failed on node 2
Dec 10 16:19:20 rac01-prod pgpool[27256]: 2019-12-10 16:19:20: pid 27256: LOG: failed to connect to PostgreSQL server on "rac03-prod:5432", getsockopt() detected error "No route to host"
Tags:
Steps To Reproduce:
Additional Information:
Attached Files: pgpool.conf (4,640 bytes) 2019-12-11 06:21
https://www.pgpool.net/mantisbt/file_download.php?file_id=706&type=bug
Notes
(0003021)
pengbo   
2019-12-12 18:05   
> After we bring up the first node, it always gives the below error.

After you start the First node, did you attach this node to pgpool using "pcp_attach_node" command?

Could you show the result of "show pool_nodes" command?

$ psql -h localhost -p 9999 -U postgres -c "show pool_nodes"
(0003025)
sureshreddy21   
2019-12-18 09:30   
I am not attaching the other standby nodes explicitly; I think that should be taken care of automatically by the failover and follow_master scripts in the /etc/pgpool-II folder and another 2 scripts in $PGDATA, pgpool_remote_start and recovery_1st_stage.

[postgres@rac01-prod ~]$ psql -h localhost -p 9999 -U postgres -c "show pool_nodes"
 node_id |  hostname  | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | rac01-prod | 5432 | up     | 0.500000  | primary | 0          | true              | 0                 |                   |                        | 2019-12-17 19:29:57
 1       | rac02-prod | 5432 | down   | 0.500000  | standby | 0          | false             | 0                 |                   |                        | 2019-12-17 19:26:54
(2 rows)
(0003029)
pengbo   
2019-12-24 14:19   
(Last edited: 2019-12-24 14:19)
If you run "pcp_recovery_node" to recover and start a DOWN node, the DOWN node will be attached to the cluster automatically.
The "pgpool_remote_start" and "recovery_1st_stage" scripts are used by the "pcp_recovery_node" command.

If you recover the DOWN node using the pg_basebackup command, etc., it is necessary to execute "pcp_attach_node"
to manually attach this node to the cluster.
https://www.pgpool.net/docs/latest/en/html/pcp-attach-node.html
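For example (a sketch; the host, PCP port and node id are illustrative):

  # recover and start a DOWN node via online recovery; it is re-attached automatically
  $ pcp_recovery_node -h <pgpool-host> -p 9898 -U postgres -n 1

  # or, after recovering the node manually (e.g. with pg_basebackup), just re-attach it
  $ pcp_attach_node -h <pgpool-host> -p 9898 -U postgres -n 1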

(0003033)
pengbo   
2019-12-25 10:41   
There are 3 nodes set in pgpool.conf.
But in the result of "show pool_nodes", there are only 2 nodes.

Before the failover, was node 3 (rac03-prod) started correctly?
(0003118)
pengbo   
2020-02-03 10:54   
If you have resolved this issue, may I close this one?


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
567 [Pgpool-II] General minor always 2019-12-11 07:41 2020-02-03 10:53
Reporter: sureshreddy21 Platform: Linu  
Assigned To: pengbo OS: Oracle Linux 8.1  
Priority: normal OS Version: OEL 8.1  
Status: feedback Product Version: 4.1.0  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version:  
Summary: Getting the below error while bringing up pgpool, and my VIP is not coming up
Description: Dec 10 17:38:09 rac01-prod pgpool[4657]: 2019-12-10 17:38:09: pid 4779: LOG: failed to create watchdog heartbeat receive socket.
Dec 10 17:38:09 rac01-prod pgpool[4657]: 2019-12-10 17:38:09: pid 4779: DETAIL: setsockopt(SO_BINDTODEVICE) requies root privilege
Dec 10 17:38:09 rac01-prod pgpool[4657]: 2019-12-10 17:38:09: pid 4779: LOG: set SO_REUSEPORT option to the socket
Dec 10 17:38:09 rac01-prod pgpool[4657]: 2019-12-10 17:38:09: pid 4779: LOG: creating watchdog heartbeat receive socket.
Dec 10 17:38:09 rac01-prod pgpool[4657]: 2019-12-10 17:38:09: pid 4779: DETAIL: set SO_REUSEPORT
Tags:
Steps To Reproduce:
Additional Information:
Attached Files: pgpool.conf (4,640 bytes) 2019-12-11 07:41
https://www.pgpool.net/mantisbt/file_download.php?file_id=707&type=bug
Notes
(0003020)
pengbo   
2019-12-12 18:02   
Did you start pgpool using "postgres" user or "root" user?
Could you show the result of "ps -ef | grep pgpool"?
(0003024)
sureshreddy21   
2019-12-18 09:28   
I have started pgpool using a systemd service; of course this service uses the postgres account to bring up the pgpool services.

postgres 11786 1 0 19:26 ? 00:00:00 /usr/bin/pgpool -f /etc/pgpool-II/pgpool.conf -D -n
postgres 11794 11786 0 19:26 ? 00:00:00 pgpool: watchdog
postgres 11797 11786 0 19:26 ? 00:00:00 pgpool: lifecheck
postgres 11798 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11799 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11800 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11801 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11802 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11803 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11804 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11805 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11806 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11807 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11808 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11809 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11810 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11811 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11812 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11813 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11814 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11815 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11816 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11817 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11818 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11819 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11820 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11821 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11822 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11823 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11824 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11825 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11826 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11827 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11828 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11829 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11830 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11831 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11832 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11833 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11834 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11835 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11836 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11837 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11838 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11839 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11840 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11841 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11842 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11843 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11844 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11845 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11846 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11847 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11848 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11849 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11850 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11851 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11852 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11853 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11854 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11855 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11856 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11857 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11858 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11859 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11860 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11861 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11862 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11863 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11864 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11865 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11866 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11867 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11868 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11869 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11870 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11871 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11872 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11873 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11874 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11875 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11876 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11877 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11878 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11879 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11880 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11881 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11882 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11883 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11884 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11885 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11886 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11887 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11888 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11889 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11890 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11891 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11892 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11893 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11894 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11895 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11896 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11897 11786 0 19:26 ? 00:00:00 pgpool: wait for connection request
postgres 11899 11797 0 19:26 ? 00:00:00 pgpool: heartbeat receiver
postgres 11900 11797 0 19:26 ? 00:00:00 pgpool: heartbeat sender
postgres 11901 11797 0 19:26 ? 00:00:00 pgpool: heartbeat receiver
postgres 11902 11797 0 19:26 ? 00:00:00 pgpool: heartbeat sender
postgres 11910 11786 0 19:26 ? 00:00:00 pgpool: worker process
postgres 11911 11786 0 19:26 ? 00:00:00 pgpool: health check process(0)
postgres 11912 11786 0 19:26 ? 00:00:00 pgpool: health check process(1)
postgres 12017 11786 0 19:26 ? 00:00:00 pgpool: PCP: wait for connection request
(0003030)
pengbo   
2019-12-24 14:30   
Are you using RPM packages that are built by the pgpool community?
https://pgpool.net/mediawiki/index.php/Downloads

Could you show the package info?
# rpm -qi pgpool-II-pgxxxxxxxx
(0003117)
pengbo   
2020-02-03 10:53   
If you have resolved this issue, may I close this one?


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
570 [Pgpool-II] Bug minor have not tried 2020-01-02 14:33 2020-02-03 10:52
Reporter: sivapgpool89 Platform: Linux  
Assigned To: OS: Redhat  
Priority: normal OS Version: 7.1  
Status: resolved Product Version: 3.7.11  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version:  
Summary: Log rotation is not working
Description: Hi,

Good Morning.

We are using pgpool-3.7.11 and configured logging as per the documentation.
It seems log rotation is not working.
Please find our configuration below.

 #------------------------------------------------------------------------------
# LOGS
#------------------------------------------------------------------------------
# - Where to log -
log_destination = 'stderr,syslog'
                                   # Where to log
                                   # Valid values are combinations of stderr,
                                   # and syslog. Default to stderr.
>>>>
# - Syslog specific -
syslog_facility = 'LOCAL1'
                                   # Syslog local facility. Default to LOCAL0
syslog_ident = 'pgpool'
                                   # Syslog program identification string
                                   # Default to 'pgpool'
>>>>>
$ cat /etc/rsyslog.conf | grep pgpool
#pgpool log
local1.* /var/log/pgpool/pgpool.log

>>>>
$ ls -l /etc/logrotate.d/syslog
-rw-r--r-- 1 root root 251 Oct 5 21:18 /etc/logrotate.d/syslog

>>>>>
$ cat /etc/logrotate.d/syslog | grep pgpool
/var/log/pgpool/pgpool.log

>>>>>
$ ls -tlrh /var/log/pgpool/pgpool.log
-rwxrwxrwx 1 root root 14G Jan 2 10:39 /var/log/pgpool/pgpool.log

Could you please suggest how to resolve this issue.

Thanks.
Siva.
Tags:
Steps To Reproduce:
Additional Information:
Attached Files:
Notes
(0003034)
nglp   
2020-01-02 23:42   
Maybe I'm wrong, but I think you should rotate using system tools; pgpool will not be able to do it if you use the syslog facility.

For example, you can add /var/log/pgpool/pgpool.log to /etc/logrotate.d/syslog
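A rough sketch of such a logrotate entry (the rotation options are illustrative; on RHEL the syslog daemon must be signaled after rotation, as the stock /etc/logrotate.d/syslog already does):

  /var/log/pgpool/pgpool.log {
      weekly
      rotate 4
      missingok
      notifempty
      sharedscripts
      postrotate
          /bin/kill -HUP `cat /var/run/syslogd.pid 2> /dev/null` 2> /dev/null || true
      endscript
  }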
(0003035)
sivapgpool89   
2020-01-03 14:04   
Thanks, nglp.

I have already added /var/log/pgpool/pgpool.log to /etc/logrotate.d/syslog, but log rotation is still not happening.

Thanks for the update.
(0003036)
nglp   
2020-01-03 17:05   
Hi Siva,

Check your crontab settings, logrotate is executed by cron in redhat systems (check /etc/cron.daily/logrotate)

Also, you can force logrotate manually to try settings, ex: 'logrotate -vf /etc/logrotate.conf' or 'logrotate -v /etc/logrotate.d/syslog'

I recommend you to check man page of logrotate command to see all options available

Best regards, Guille
(0003037)
sivapgpool89   
2020-01-03 22:49   
Thanks for the update, nglp.
(0003066)
administrator   
2020-01-16 09:49   
If you have resolved this issue, may I close this one?


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
578 [Pgpool-II] Bug major always 2020-01-27 22:45 2020-01-28 15:06
Reporter: giminni Platform: Linux  
Assigned To: pengbo OS: Alpine Linux  
Priority: normal OS Version: 3.11.3  
Status: feedback Product Version: 4.1.0  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version:  
Summary: 4.1: Cannot get standby online after a pcp_promote_node, instead getting "... failed to start in 90 second"
Description: Here the log output
...
- PCP: processing promote node
- promoting Node ID 1
- received promote backend request for node_id: 1 from pid [3888]
- starting to select new master node
- starting promotion. promote host /var/run/pgpool(11003)
- starting follow degeneration. shutdown host /var/run/pgpool(11002)
- starting follow degeneration. shutdown host /var/run/pgpool(11004)
- failover: 2 follow backends have been degenerated
- failover: set new primary node: 1
- failover: set new master node: 1
- promotion done. promoted host /var/run/pgpool(11003)
- start triggering follow command.
- execute command: /opt/pgpool/etc/follow_master.sh 0 /var/run/pgpool 11002 /opt/pgpool/data0 1 0 /var/run/pgpool 0 11003 /opt/pgpool/data1
- initializing backend status
- start online recovery
- starting recovering node 0
- starting recovery command: "SELECT pgpool_recovery('basebackup.sh', 'localhost', '/opt/pgpool/data0', '11003', 0, '11002')- executing recovery
- executing recovery, start recovery
- executing recovery, finish recovery
- node recovery, 1st stage is done
- executing remote start
- finish pgpool_remote_start
- checking if postmaster is started
....
Tags: failover, streaming replication
Steps To Reproduce: 1) Execute pgpool_setup -m s -s -n 3 -r -d
2) Change pcppass to "*:11001:postgres:postgres"
3) Copy pcpass to $HOME/.pcppass
4) Chmod 600 $HOME/.pcppass
5) Added "host all all * trust" to pool_hba.conf

5) Execute startall

pgpool port is 11000
pcp port is 11001
0000001 port is 11002
0000002 port is 11003
0000003 port is 11004

6) Execute pcp_promote_node -h 0.0.0.0 -p 11001 -n 1 -w
Additional Information: ARCH: arm64
ORIGIN: docker.io/alpine
CONTAINER: Rootless podman container with podman run -ti --name alps --net=host alpine
DIR: /opt/pgpool
SOCKET: /var/run/pgpool
DB: Postgres V12
MODE: Streaming replication with replication slot
CONFIG: See attachment

*) Due to missing Alpine Linux packages, I added the pgpool extension manually
Attached Files: pgpool.conf.tgz (1,070 bytes) 2020-01-27 22:45
https://www.pgpool.net/mantisbt/file_download.php?file_id=726&type=bug
Notes
(0003093)
pengbo   
2020-01-28 15:05   
It is an expected behaviour.

https://www.pgpool.net/docs/latest/en/html/pcp-promote-node.html

| Please note that this command does not actually promote standby PostgreSQL backend:
| it just changes the internal status of Pgpool-II and trigger failover and users have to promote standby PostgreSQL outside Pgpool-II.

If you want to promote a standby node, stop the primary node first;
Pgpool-II will then promote a standby automatically.
To recover the old primary as a standby, run "pcp_recovery_node".

Here is a setup example:
https://www.pgpool.net/docs/latest/en/html/example-cluster.html
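For example, with the pgpool_setup layout from this report (pcp port 11001, node 0 on port 11002 with data directory /opt/pgpool/data0), the procedure might look like this sketch (ports and paths are assumptions taken from the report; adjust to your environment):

Stop the current primary outside Pgpool-II, so that failover promotes a standby:
$ pg_ctl -D /opt/pgpool/data0 -m fast stop

Once the new primary is up, re-attach the old primary as a standby via online recovery:
$ pcp_recovery_node -h localhost -p 11001 -n 0 -w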


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
580 [Pgpool-II] Bug major always 2020-01-28 02:23 2020-01-28 11:45
Reporter: giminni Platform:  
Assigned To: pengbo OS:  
Priority: normal OS Version:  
Status: feedback Product Version: 4.1.0  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version:  
Summary: 4.1: BUG: Pgpool_recovery extension protocol has two more parameters
Description: Downloading and compiling the source tarball results in being unable to create the pgpool_recovery extension.

Looking into pgpool_recovery.sql I found this:
CREATE OR REPLACE FUNCTION pgpool_recovery(text, text, text, text)

The new protocol has two more parameters, so I added integer and text:
CREATE OR REPLACE FUNCTION pgpool_recovery(text, text, text, text, integer, text)

After creating the extension I was able to execute pcp_recovery_node
Tags:
Steps To Reproduce:
Additional Information:
Attached Files:
Notes
(0003092)
pengbo   
2020-01-28 11:44   
"pgpool_recovery.sql" is the old version, you need to use "pgpool_recovery--1.3.sql" for the latest version.

$ ll src/sql/pgpool-recovery/
total 56
-rw-rw-r-- 1 pengbo pengbo 790 Jan 28 11:20 Makefile
-rw-rw-r-- 1 pengbo pengbo 10460 Jan 28 11:20 pgpool-recovery.c
-rw-rw-r-- 1 pengbo pengbo 533 Jan 28 11:12 pgpool-recovery.sql.in
-rw-rw-r-- 1 pengbo pengbo 407 Jan 28 11:12 pgpool_recovery--1.0--1.1.sql
-rw-rw-r-- 1 pengbo pengbo 791 Jan 28 11:12 pgpool_recovery--1.0.sql
-rw-rw-r-- 1 pengbo pengbo 429 Jan 28 11:20 pgpool_recovery--1.1--1.2.sql
-rw-rw-r-- 1 pengbo pengbo 1002 Jan 28 11:12 pgpool_recovery--1.1.sql
-rw-rw-r-- 1 pengbo pengbo 447 Jan 28 11:20 pgpool_recovery--1.2--1.3.sql
-rw-rw-r-- 1 pengbo pengbo 1243 Jan 28 11:20 pgpool_recovery--1.2.sql
-rw-rw-r-- 1 pengbo pengbo 1508 Jan 28 11:20 pgpool_recovery--1.3.sql
-rw-rw-r-- 1 pengbo pengbo 178 Jan 28 11:20 pgpool_recovery.control
-rw-rw-r-- 1 pengbo pengbo 136 Jan 28 11:12 uninstall_pgpool-recovery.sql

$ cat src/sql/pgpool-recovery/pgpool_recovery.control
# pgpool-recovery extension
comment = 'recovery functions for pgpool-II for V4.1 or later'
default_version = '1.3'
...


To install all of the pgpool_recovery functions, you can also use "CREATE EXTENSION" instead of executing "pgpool_recovery--1.3.sql":

(1) Install pgpool-recovery
# cd src/sql/pgpool-recovery/
# make && make install

(2) Create the extension on the PostgreSQL backend server
# psql
=# CREATE EXTENSION pgpool_recovery;
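To confirm afterwards which version of the extension is installed (a quick check; with the control file above the output should show 1.3):

=# SELECT extname, extversion FROM pg_extension WHERE extname = 'pgpool_recovery';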


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
579 [Pgpool-II] Bug major always 2020-01-28 02:16 2020-01-28 11:29
Reporter: giminni Platform: Linux  
Assigned To: pengbo OS: Alpine Linux  
Priority: normal OS Version: 3.11.3  
Status: feedback Product Version: 4.1.0  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version: 4.1.1  
Summary: 4.1: BUG: Follow_master_command parameter mismatch
Description: In chapter 8.3.5.2. Failover configuration

Looking at the code I found mismatched parameters at the 5th, 6th and 7th positions.

failover_command = '/etc/pgpool-II/failover.sh %d %h %p %D %m %H %M %P %r %R %N %S'
follow_master_command = '/etc/pgpool-II/follow_master.sh %d %h %p %D %m %M %H %P %r %R'

Please change to:
follow_master_command = '/etc/pgpool-II/follow_master.sh %d %h %p %D %m %H %M %P %r %R %N %S'
Tags:
Steps To Reproduce:
Additional Information:
Attached Files:
Notes
(0003091)
pengbo   
2020-01-28 11:29   
Thank you for reporting this issue.
It is not a bug.

In follow_master.sh, the parameters passed by "follow_master_command = '/etc/pgpool-II/follow_master.sh %d %h %p %D %m %M %H %P %r %R'" are correct:
...
OLD_MASTER_NODE_ID="$6"
NEW_MASTER_NODE_HOST="$7"
...

Because it is better to use the same parameter positions as "failover_command",
I have changed the docs and sample scripts.

https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=b5db9b42478c455bc2c6784a5bd70b752c4e99bd
https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=3e92e5bf76ed0b07915afa369a8ea9700162c1c6
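For reference, with "follow_master_command = '/etc/pgpool-II/follow_master.sh %d %h %p %D %m %M %H %P %r %R'" the positional parameters map as in the following sketch (variable names are modeled on the sample script excerpt above and are illustrative only):

FAILED_NODE_ID="$1"            # %d
FAILED_NODE_HOST="$2"          # %h
FAILED_NODE_PORT="$3"          # %p
FAILED_NODE_PGDATA="$4"        # %D
NEW_MASTER_NODE_ID="$5"        # %m
OLD_MASTER_NODE_ID="$6"        # %M
NEW_MASTER_NODE_HOST="$7"      # %H
OLD_PRIMARY_NODE_ID="$8"       # %P
NEW_MASTER_NODE_PORT="$9"      # %r
NEW_MASTER_NODE_PGDATA="${10}" # %R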


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
564 [Pgpool-II] Bug crash always 2019-12-06 22:02 2020-01-20 22:10
Reporter: eduarte Platform: Tomcat  
Assigned To: pengbo OS: Centos 7  
Priority: urgent OS Version: 7  
Status: assigned Product Version: 4.0.7  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version:  
Summary: PGPool generates an error and closes the connection when there is an error in the master
Description: I have pgpool-II configured only for load balancing. When there is an error in an INSERT on backend0 (master), an error is also generated on the replication node (backend1) and then pgpool closes the connection. We are working with Tomcat 9 and PostgreSQL 10, and in Tomcat we use postgresql-42.2.5.jar as the driver.

THIS IS THE LOG IN THE MASTER
-----------------------------------------
< 2019-12-05 17:12:28 -05 192.168.179.7 sucursal aplivirtual > ERROR: llave duplicada viola restricción de unicidad «userrepository_pkey»
< 2019-12-05 17:12:28 -05 192.168.179.7 sucursal aplivirtual > DETALLE: Ya existe la llave (userguid, repid)=(ec002f47-989c-43bd-8c77-546a2a615b94 , 2).
< 2019-12-05 17:12:28 -05 192.168.179.7 sucursal aplivirtual > SENTENCIA: INSERT INTO gam.UserRepository(UserRepRecPwdAns, UserRepCreDate, UserRepCreUser, UserRepUpdDate, UserRepUpdUser, UserGUID, RepId, UserRepMainRoleId, UserRepSecPolId, UserRepQstUserId) VALUES($1, $2, $3, $4, $5, $6, $7, $8, $9, $10)

THIS IS THE LOG IN STANDBY
---------------------------------------------
< 2019-12-05 17:12:03 COT 192.168.179.7 sucursal aplivirtual > ERROR: no hay un savepoint con ese nombre
< 2019-12-05 17:12:03 COT 192.168.179.7 sucursal aplivirtual > SENTENCIA: ROLLBACK TO SAVEPOINT gxupdate

THIS IS PART OF CATALINA.OUT IN TOMCAT
-----------------------------
org.postgresql.util.PSQLException: FATAL: failed to read kind from backend
  Detail: kind mismatch among backends. Possible last query was: "ROLLBACK TO SAVEPOINT gxupdate" kind details are: 0[C] 1[E: no hay un savepoint con ese nombre]
  Hint: check data consistency among db nodes

We are using native PostgreSQL replication, and the nodes are up and running correctly.

Tags: error, master slave, pgpool in load balancing mode
Steps To Reproduce:
Additional Information:
Attached Files: catalina.2019-12-05.log (17,108 bytes) 2019-12-06 22:02
https://www.pgpool.net/mantisbt/file_download.php?file_id=702&type=bug
pgpool.conf (40,652 bytes) 2019-12-06 22:02
https://www.pgpool.net/mantisbt/file_download.php?file_id=701&type=bug
localhost_access_log.2019-12-05.txt (24,700 bytes) 2019-12-06 22:02
https://www.pgpool.net/mantisbt/file_download.php?file_id=700&type=bug
localhost.2019-12-05.log (48,170 bytes) 2019-12-06 23:23
https://www.pgpool.net/mantisbt/file_download.php?file_id=703&type=bug
salida.log (128,534 bytes) 2019-12-09 23:05
https://www.pgpool.net/mantisbt/file_download.php?file_id=705&type=bug
error master-standby.txt (954 bytes) 2020-01-20 22:10
https://www.pgpool.net/mantisbt/file_download.php?file_id=724&type=bug
Notes
(0003011)
eduarte   
2019-12-06 23:23   
File localhost from tomcat
(0003013)
pengbo   
2019-12-09 15:28   
I want to check the query sent by the client.

If possible, could you enable the "log_client_messages" parameter and run that query again, so that the client query is logged to pgpool.log?
-----
log_client_messages = on
-----
(0003015)
eduarte   
2019-12-09 23:05   
Hi pengbo, here I add the output file. We use GeneXus as our development tool, so we can't modify the queries that it generates.
(0003018)
pengbo   
2019-12-11 15:46   
Thank you.

I found the "kind mismatch" error in pgpool.log.

===
dic 09 08:53:58 contenedorwebapp1.crediservir.com pgpool[41460]: 2019-12-09 08:53:58: pid 41488: FATAL: failed to read kind from backend
dic 09 08:53:58 contenedorwebapp1.crediservir.com pgpool[41460]: 2019-12-09 08:53:58: pid 41488: DETAIL: kind mismatch among backends. Possible last query was: "ROLLBACK TO SAVEPOINT gxupdate" kind details are: 0[C] 1[E: no hay un savepoint con ese nombre]
===

Normally SAVEPOINT is sent to both primary and standby.
May I see both the PostgreSQL primary and standby logs?
(0003076)
eduarte   
2020-01-20 22:10   
Hi pengbo, sorry but I didn't see the previous message. Here I add a file with 2 messages, one from the master and another from the standby.

These files are from December 19. At the moment I can't run more tests, because we are in production with our application and the workaround was to stop using pgpool and point directly to the database.

Well, I could, but I would have to configure this setup in a test environment; if you need it, let me know.


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
576 [Pgpool-II] Bug trivial always 2020-01-17 02:50 2020-01-19 11:43
Reporter: dcvythoulkas Platform: Postgresql 12  
Assigned To: t-ishii OS: Debian  
Priority: normal OS Version: 10.2  
Status: resolved Product Version: 4.1.0  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version:  
Summary: getsockopt() detected error "Connection refused"
Description: I installed pgpool-II version 4.1.0 (karasukiboshi) from the http://apt.postgresql.org/pub/repos/apt repo. I also installed PostgreSQL 12.
I configured a 3-node streaming replication cluster with physical replication slots (no archiving) and used modified versions of the failover, follow_master, etc. scripts from the example here: https://www.pgpool.net/docs/latest/en/html/example-cluster.html#EXAMPLE-CLUSTER-PGPOOL-CONFIG.

The setup works: replication, health checks, watchdog, failover, online recovery, load balancing; everything I tested works (pg_rewind not yet, because I am still using the initdb from Debian, so I use the pg_basebackup way).

The only thing I cannot explain is why the following appears in the syslog of all servers. On each server it appears for its own local database:

Jan 16 17:28:42 debian10 pgpool[10477]: 2020-01-16 17:28:42: pid 10523: LOG: failed to connect to PostgreSQL server on "pg2:5433", getsockopt() detected error "Connection refused"
Jan 16 17:28:42 debian10 pgpool[10477]: 2020-01-16 17:28:42: pid 10523: LOCATION: pool_connection_pool.c:680

Both the intervals and the PID mentioned point to the health check as the culprit; however, the health check appears to be working properly and instantly detects any problem that I create manually.

Is this something that I can ignore? Is there something I misconfigured?
I have attached the pool_hba.conf (same for everyone) and every pgpool.conf from each node.
Tags: logs, pgpool health check, settings
Steps To Reproduce:
Additional Information:
Attached Files: pgpool3.conf (44,432 bytes) 2020-01-17 02:50
https://www.pgpool.net/mantisbt/file_download.php?file_id=721&type=bug
pgpool2.conf (44,432 bytes) 2020-01-17 02:50
https://www.pgpool.net/mantisbt/file_download.php?file_id=720&type=bug
pool_hba.conf (3,413 bytes) 2020-01-17 02:50
https://www.pgpool.net/mantisbt/file_download.php?file_id=719&type=bug
pgpool.conf (44,432 bytes) 2020-01-17 02:50
https://www.pgpool.net/mantisbt/file_download.php?file_id=718&type=bug
pg1_syslog (21,055 bytes) 2020-01-17 17:31
https://www.pgpool.net/mantisbt/file_download.php?file_id=723&type=bug
pg1_postgres_log (41,923 bytes) 2020-01-17 17:31
https://www.pgpool.net/mantisbt/file_download.php?file_id=722&type=bug
Notes
(0003067)
t-ishii   
2020-01-17 16:35   
I guess the health check succeeds after some retries, because you said the health check feature seems to be working. I need to confirm this by looking at the health check log in syslog. Can you share the log?
(0003068)
dcvythoulkas   
2020-01-17 17:31   
Thanks for the rapid reply. I've attached extracts from the syslog and the PostgreSQL log covering the same period from one of the nodes. The whole setup is configured from my custom Ansible roles. Equivalent logs exist on the other two nodes. The following is what I think the issue is, taken from the postgres log: the [unknown]@[unknown] connection attempt followed by the proper pgpool@postgres one.

2020-01-17 07:57:11.319 UTC [7980] pg1(55164)[unknown][unknown]@[unknown] LOG: 00000: connection received: host=pg1 port=55164
2020-01-17 07:57:11.319 UTC [7980] pg1(55164)[unknown][unknown]@[unknown] LOCATION: BackendInitialize, postmaster.c:4296
2020-01-17 07:57:11.328 UTC [7980] pg1(55164)[unknown]pgpool@postgres LOG: 00000: connection authorized: user=pgpool database=postgres
2020-01-17 07:57:11.328 UTC [7980] pg1(55164)[unknown]pgpool@postgres LOCATION: PerformAuthentication, postinit.c:303
2020-01-17 07:57:11.370 UTC [7980] pg1(55164)[unknown]pgpool@postgres LOG: 00000: disconnection: session time: 0:00:00.050 user=pgpool database=postgres host=pg1 port=55164
2020-01-17 07:57:11.370 UTC [7980] pg1(55164)[unknown]pgpool@postgres LOCATION: log_disconnections, postgres.c:4666
(0003069)
t-ishii   
2020-01-18 07:12   
I looked into pg1_syslog and found a strange thing. It seems pgpool's health check process behaves as if it successfully connected to the backend despite the getsockopt() error. Moreover, the PostgreSQL backend accepts the connection from the health check process and does the authentication negotiation. I expected a health check retry after the getsockopt() error in this situation, and that is also what the source code suggests.
Is it possible that the source code in your Debian package was modified? I especially suspect this (src/protocol/pool_connection_pool.c line 676):
                    /* Non Solaris case */
                    if (error != 0)
                    {
                        ereport(LOG,
                                (errmsg("failed to connect to PostgreSQL server on \"%s:%d\", getsockopt() detected error \"%s\"", host, port, strerror(error))));
                        return false; <--- maybe replaced with "return true;"?
                    }
(0003070)
dcvythoulkas   
2020-01-18 07:47   
If you check the pg1_postgres_log you will see that PostgreSQL actually accepted a login connection, so the health check did manage to connect successfully.
Now regarding the source: as I mentioned in the initial report, I used binary packages from http://apt.postgresql.org/pub/repos/apt . I did not compile anything from source, nor do I have the knowledge to do so.

Also, if I try stopping the backend, the health check immediately notices and proceeds with quarantine, etc.
If the code had been changed as you suggest, then I think the health check would fail to notice the change.
(0003071)
t-ishii   
2020-01-18 09:16   
Yes, I know you used binary packages. My guess was that the Debian packagers patched the source code and then created the binary package. I am not familiar with Debian packages, but there should be a way to get the actual source code used to build them (like the source packages provided for RPMs). If so, I would be able to confirm my speculation.

> Also, if I try stopping the backend, the health check immediately notices and proceeds with quarantine, etc.
> If the code had been changed as you suggest, then I think the health check would fail to notice the change.

If you stop PostgreSQL using pg_ctl or some other method while you are connected to pgpool, pgpool detects it by receiving a notice message from PostgreSQL; the health check process is not involved here. Also please note that there are multiple code paths that detect a failure to connect to the backend during health check, so it is possible that the health check detects the connection error in different ways.
(0003072)
t-ishii   
2020-01-18 10:34   
(Last edited: 2020-01-18 11:00)
I found this page:
https://sources.debian.org/patches/pgpool2/4.1.0-1/
It seems there's no patch for pool_connection_pool.c. So my theory fails:-<

Another theory is that getaddrinfo() for "pg1" returns multiple network addresses, e.g. an IPv4 and an IPv6 address. In this case the health check first tries the IPv4 address. If that fails with the error you are seeing, it then tries the IPv6 address; if that succeeds, the health check succeeds. This theory explains the situation.

If my theory is correct, you could avoid the error by having PostgreSQL listen on both the IPv4 and IPv6 addresses.
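A quick way to check which addresses getaddrinfo() returns for a backend hostname (assuming glibc's getent is available; "pg1" as in this report):

$ getent ahosts pg1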

(0003073)
dcvythoulkas   
2020-01-19 02:43   
Your theory is correct, but it was not IPv6; it was name resolution via the /etc/hosts file combined with postgresql.conf.

Specifically, the /etc/hosts was like this:
vagrant@pg1:~$ cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 pg1
....
10.10.20.11 pg1
....

And postgresql.conf like this:
....
listen_addresses = 'localhost,10.10.20.11' # what IP address(es) to listen on;
....

So when the health check tried to reach the local node on each machine, it would connect via the hostname (pg1, pg2, etc.). With the hosts file above, the name resolves first to 127.0.1.1, on which PostgreSQL does not listen, so that attempt fails (hence the error); it then tries 10.10.20.11, which succeeds. That also explains why there was no error from the health checks of remote hosts.

I changed the hosts file to the following:
vagrant@pg1:~$ cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 pg1-local
....
10.10.20.11 pg1
....

And the logs are clear.
Thank you for your quick response and support. You can close this issue.
(0003074)
t-ishii   
2020-01-19 08:24   
Thanks for the feedback. I am going to close this issue.


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
574 [Pgpool-II] Bug major always 2020-01-12 18:32 2020-01-13 08:22
Reporter: raj.pandey1982@gmail.com Platform: Linux  
Assigned To: t-ishii OS: centos  
Priority: high OS Version: x86_64 x86_64 x8  
Status: resolved Product Version: 4.1.0  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version:  
Summary: waiting for the quorum to start escalation process
Description: DB: Postgres:11.5
PGPOOL II 4.1.0

I have 2 PostgreSQL master-slave nodes, each also running Pgpool-II 4.1.0 configured as master and standby, and 1 virtual IP.
When pgpool on node 1 or node 2 is brought down, the VIP gets released but is not acquired by the other node.

If node 1 is brought down and the VIP got released, the node 2 log says:

2020-01-12 11:56:52: pid 16567:LOG: I am the cluster leader node but we do not have enough nodes in cluster
2020-01-12 11:56:52: pid 16567:DETAIL: waiting for the quorum to start escalation process

Only when I start pgpool on node 1 again does node 2 acquire the VIP. So the main issue is that the other node does not hold the quorum unless I start
the previously stopped node again as standby.

I tried the old suggestion (#systemctl stop firewalld) but it did not help.

I also have another instance of 4.0.1 with the same kind of 2-node setup, and there the VIP release and acquisition works well.

Below is the full log of node 2:

2020-01-12 11:43:09: pid 16565:LOG: waiting for watchdog to initialize
2020-01-12 11:43:09: pid 16567:LOG: setting the local watchdog node name to "pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal"
2020-01-12 11:43:09: pid 16567:LOG: watchdog cluster is configured with 1 remote nodes
2020-01-12 11:43:09: pid 16567:LOG: watchdog remote node:0 on pgpool-poc02.novalocal:9000
2020-01-12 11:43:09: pid 16567:LOG: interface monitoring is disabled in watchdog
2020-01-12 11:43:09: pid 16567:LOG: watchdog node state changed from [DEAD] to [LOADING]
2020-01-12 11:43:09: pid 16567:LOG: new outbound connection to pgpool-poc02.novalocal:9000
2020-01-12 11:43:09: pid 16567:LOG: setting the remote node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal" as watchdog cluster master
2020-01-12 11:43:09: pid 16567:LOG: watchdog node state changed from [LOADING] to [INITIALIZING]
2020-01-12 11:43:09: pid 16567:LOG: new watchdog node connection is received from "10.70.184.28:25794"
2020-01-12 11:43:09: pid 16567:LOG: new node joined the cluster hostname:"pgpool-poc02.novalocal" port:9000 pgpool_port:5433
2020-01-12 11:43:09: pid 16567:DETAIL: Pgpool-II version:"4.1.0" watchdog messaging version: 1.1
2020-01-12 11:43:10: pid 16567:LOG: watchdog node state changed from [INITIALIZING] to [STANDBY]
2020-01-12 11:43:10: pid 16567:LOG: successfully joined the watchdog cluster as standby node
2020-01-12 11:43:10: pid 16567:DETAIL: our join coordinator request is accepted by cluster leader node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-01-12 11:43:10: pid 16565:LOG: watchdog process is initialized
2020-01-12 11:43:10: pid 16565:DETAIL: watchdog messaging data version: 1.1
2020-01-12 11:43:10: pid 16565:LOG: Pgpool-II parent process received watchdog quorum change signal from watchdog
2020-01-12 11:43:10: pid 16567:LOG: new IPC connection received
2020-01-12 11:43:10: pid 16567:LOG: new IPC connection received
2020-01-12 11:43:10: pid 16565:LOG: watchdog cluster now holds the quorum
2020-01-12 11:43:10: pid 16565:DETAIL: updating the state of quarantine backend nodes
2020-01-12 11:43:10: pid 16567:LOG: new IPC connection received
2020-01-12 11:43:10: pid 16567:LOG: new IPC connection received
2020-01-12 11:43:10: pid 16569:LOG: 2 watchdog nodes are configured for lifecheck
2020-01-12 11:43:10: pid 16569:LOG: watchdog nodes ID:0 Name:"pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal"
2020-01-12 11:43:10: pid 16569:DETAIL: Host:"pgpool-poc01.novalocal" WD Port:9000 pgpool-II port:5433
2020-01-12 11:43:10: pid 16569:LOG: watchdog nodes ID:1 Name:"pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-01-12 11:43:10: pid 16569:DETAIL: Host:"pgpool-poc02.novalocal" WD Port:9000 pgpool-II port:5433
2020-01-12 11:43:10: pid 16565:LOG: we have joined the watchdog cluster as STANDBY node
2020-01-12 11:43:10: pid 16565:DETAIL: syncing the backend states from the MASTER watchdog node
2020-01-12 11:43:10: pid 16567:LOG: new IPC connection received
2020-01-12 11:43:10: pid 16567:LOG: received the get data request from local pgpool-II on IPC interface
2020-01-12 11:43:10: pid 16567:LOG: get data request from local pgpool-II node received on IPC interface is forwarded to master watchdog node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-01-12 11:43:10: pid 16567:DETAIL: waiting for the reply...
2020-01-12 11:43:10: pid 16569:LOG: watchdog lifecheck trusted server "mohvcasdb01.novalocal" added for the availability check
2020-01-12 11:43:10: pid 16569:LOG: watchdog lifecheck trusted server "mohcasdevdb.novalocal" added for the availability check
2020-01-12 11:43:10: pid 16565:LOG: master watchdog node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal" returned status for 2 backend nodes
2020-01-12 11:43:10: pid 16565:LOG: Setting up socket for 0.0.0.0:5433
2020-01-12 11:43:10: pid 16565:LOG: Setting up socket for :::5433
2020-01-12 11:43:10: pid 16604:LOG: PCP process: 16604 started
2020-01-12 11:43:10: pid 16565:LOG: pgpool-II successfully started. version 4.1.0 (karasukiboshi)
2020-01-12 11:43:10: pid 16565:LOG: node status[0]: 0
2020-01-12 11:43:10: pid 16565:LOG: node status[1]: 0
2020-01-12 11:43:11: pid 16570:LOG: createing watchdog heartbeat receive socket.
2020-01-12 11:43:11: pid 16570:DETAIL: bind receive socket to device: "eth0"
2020-01-12 11:43:11: pid 16570:LOG: set SO_REUSEPORT option to the socket
2020-01-12 11:43:11: pid 16570:LOG: creating watchdog heartbeat receive socket.
2020-01-12 11:43:11: pid 16570:DETAIL: set SO_REUSEPORT
2020-01-12 11:43:11: pid 16571:LOG: creating socket for sending heartbeat
2020-01-12 11:43:11: pid 16571:DETAIL: bind send socket to device: eth0
2020-01-12 11:43:11: pid 16571:LOG: set SO_REUSEPORT option to the socket
2020-01-12 11:43:11: pid 16571:LOG: creating socket for sending heartbeat
2020-01-12 11:43:11: pid 16571:DETAIL: set SO_REUSEPORT
2020-01-12 11:55:14: pid 16604:LOG: forked new pcp worker, pid=17365 socket=7
2020-01-12 11:55:14: pid 16567:LOG: new IPC connection received
2020-01-12 11:55:14: pid 16604:LOG: PCP process with pid: 17365 exit with SUCCESS.
2020-01-12 11:55:14: pid 16604:LOG: PCP process with pid: 17365 exits with status 0
2020-01-12 11:56:43: pid 16567:LOG: remote node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal" is shutting down
2020-01-12 11:56:43: pid 16567:LOG: watchdog cluster has lost the coordinator node
2020-01-12 11:56:43: pid 16567:LOG: removing the remote node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal" from watchdog cluster master
2020-01-12 11:56:43: pid 16567:LOG: We have lost the cluster master node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-01-12 11:56:43: pid 16567:LOG: watchdog node state changed from [STANDBY] to [JOINING]
2020-01-12 11:56:47: pid 16567:LOG: watchdog node state changed from [JOINING] to [INITIALIZING]
2020-01-12 11:56:48: pid 16567:LOG: I am the only alive node in the watchdog cluster
2020-01-12 11:56:48: pid 16567:HINT: skipping stand for coordinator state
2020-01-12 11:56:48: pid 16567:LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
2020-01-12 11:56:48: pid 16567:LOG: I am announcing my self as master/coordinator watchdog node
2020-01-12 11:56:52: pid 16567:LOG: I am the cluster leader node
2020-01-12 11:56:52: pid 16567:DETAIL: our declare coordinator message is accepted by all nodes
2020-01-12 11:56:52: pid 16567:LOG: setting the local node "pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal" as watchdog cluster master
2020-01-12 11:56:52: pid 16567:LOG: I am the cluster leader node but we do not have enough nodes in cluster
2020-01-12 11:56:52: pid 16567:DETAIL: waiting for the quorum to start escalation process
2020-01-12 11:56:52: pid 16567:LOG: new IPC connection received
Tags:
Steps To Reproduce:
Additional Information:
Attached Files:
Notes
(0003059)
t-ishii   
2020-01-12 20:09   
You need to turn on enable_consensus_with_half_votes in Pgpool-II 4.1 or later if you have only 2 watchdog nodes and want failover consensus to require only half of the total number of votes (= 1 node). (We recommend an odd number of nodes, 3 or more.)
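For reference, a sketch of the relevant pgpool.conf change on both watchdog nodes (the parameter defaults to off in 4.1):

enable_consensus_with_half_votes = on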
(0003060)
raj.pandey1982@gmail.com   
2020-01-12 20:48   
Turning on the said parameter, enable_consensus_with_half_votes, worked! Two-node cluster pgpool failover is now happening. Thanks a lot for the quick resolution. You rock!
(0003061)
t-ishii   
2020-01-13 08:22   
Glad to hear that. I am going to mark this issue as "resolved".


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
572 [Pgpool-II] Bug major random 2020-01-05 16:37 2020-01-08 11:38
Reporter: zhuguangzhi9527 Platform:  
Assigned To: hoshiai OS:  
Priority: high OS Version:  
Status: feedback Product Version: 3.6.6  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version:  
Summary: ERROR: unable to read message kind. DETAIL: kind does not match between
Description: It seems I hit the same bug as No.000231 with pgpool 3.6.6 & postgres 9.6.5. How can I get the patches for No.000231 to fix the bug?
These are the key messages of the bug:
        ERROR: unable to read message kind
        DETAIL: kind does not match between master(45) slot[1] (52)
Tags: pgpool
Steps To Reproduce: like No.000231
Additional Information: like No.000231
Attached Files:
Notes
(0003047)
hoshiai   
2020-01-08 11:38   
OK, I can't tell from only the above information.

Please share your pgpool.log from when this problem happened, and if possible set the log_min_messages parameter to 'debug5'.
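For reference, the pgpool.conf line in question (a sketch; debug5 is extremely verbose, so it is best reverted after the problem has been captured):

log_min_messages = debug5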


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
548 [Pgpool-II] Bug minor always 2019-09-27 14:14 2019-12-24 09:08
Reporter: harukat Platform:  
Assigned To: hoshiai OS:  
Priority: low OS Version:  
Status: resolved Product Version: 4.1 beta1  
Product Build: Resolution: fixed  
Projection: none      
ETA: none Fixed in Version: 4.1 RC1  
    Target Version: 4.1 RC1  
Summary: syslog_facility couldn't be changed by reload
Description: The parameter "syslog_facility" couldn't be changed by a Pgpool-II reload, even though
the documentation says "this parameter can be changed by reloading the Pgpool-II configurations."
Tags:
Steps To Reproduce:
Additional Information:
Attached Files: pgpool2_41_syslog_reloadable.patch (806 bytes) 2019-09-29 09:50
https://www.pgpool.net/mantisbt/file_download.php?file_id=669&type=bug
bug548_pgpool_4_1.patch (3,242 bytes) 2019-10-01 20:03
https://www.pgpool.net/mantisbt/file_download.php?file_id=670&type=bug
Notes
(0002889)
harukat   
2019-09-27 16:25   
When I set the C preprocessor variable HAVE_SYSLOG for the build as follows,
log messages flowed to the new syslog facility after a pgpool reload.

  $ CFLAGS="-g -DHAVE_SYSLOG=1" ./configure --prefix /home/postgres/tmp/pgpool2_41 \
     --with-openssl --with-memcached=/usr --with-pgsql=/home/postgres/pgsql/11.3

But even with a build with HAVE_SYSLOG, the command "SHOW syslog_facility"
returned the old syslog_facility value after reloading.

test environment:
 CentOS6.x, x86_64, glibc-2.12-1.209.el6_9.2, rsyslog-5.8.10-8.el6
(0002891)
harukat   
2019-09-27 16:43   
At least, I think we need to fix configure.ac like this.

  - AC_CHECK_FUNCS([strlcat, strlcpy])
  + AC_CHECK_FUNCS([strlcat, strlcpy, syslog])
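Note that after changing configure.ac, the configure script has to be regenerated for the new check to take effect; a sketch (the exact autotools invocation may differ depending on the build environment):

  $ autoreconf -i
  $ ./configure ...   (with your usual options)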
(0002894)
harukat   
2019-09-29 09:50   
This is a patch to fix this bug.
(0002897)
hoshiai   
2019-10-01 20:03   
Thank you for your report and patch.
You are right, 'HAVE_SYSLOG' was not defined properly.
I fixed your patch a little: it now changes vsyslog to syslog, because vsyslog is not used.
(0002911)
hoshiai   
2019-10-08 13:17   
I committed this fix.

https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=934d78132cfd2202058f0abd99e97be02244285a
(0003027)
hoshiai   
2019-12-24 09:08   
This has already been fixed, so I am changing the status of the ticket to resolved.


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
536 [Pgpool-II] Bug minor always 2019-08-09 20:11 2019-11-27 16:39
Reporter: gmartinez Platform: Linux  
Assigned To: hoshiai OS: CentOS  
Priority: normal OS Version: 7  
Status: feedback Product Version: 3.7.10  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version:  
Summary: Recovery of standby node in streaming replication mode causes all connections to be dropped
Description: When we run pcp_recovery_node on a standby node while in streaming replication mode, it seems that pgpool now drops all connections to the database, causing errors in the application. We have load balancing disabled and only use pgpool for the failover ability.

Previously, we were on 3.7.7 and this behavior did not occur.
Tags:
Steps To Reproduce: 1) Stop postgresql on standby node
2) wait for pgpool to notice it's down
3) Start postgresql
4) Run pcp_recovery_node on the downed postgresql
5) Note that all children restarted and the frontend application has dropped connections.
Additional Information: Log events:

Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 624658: LOG: node recovery, node: 4 restarted
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 624658: LOG: received failback request for node_id: 4 from pid [624658]
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: LOG: new IPC connection received
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: LOG: watchdog received the failover command from local pgpool-II on IPC interface
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: LOG: watchdog is processing the failover command [FAILBACK_REQUEST] received from local pgpool-II on IPC interface
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: LOG: The failover request does not need quorum to hold
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: DETAIL: proceeding with the failover
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: HINT: REQ_DETAIL_CONFIRMED
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: Pgpool-II parent process has received failover request
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598089: WARNING: failover/failback is in progress
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598089: DETAIL: executing failover or failback on backend
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598089: HINT: In a moment you should be able to reconnect to the database
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597318: WARNING: failover/failback is in progress
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597318: DETAIL: executing failover or failback on backend
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597318: HINT: In a moment you should be able to reconnect to the database
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: LOG: new IPC connection received
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597971: WARNING: failover/failback is in progress
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597971: DETAIL: executing failover or failback on backend
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597971: HINT: In a moment you should be able to reconnect to the database
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597993: WARNING: failover/failback is in progress
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597993: DETAIL: executing failover or failback on backend
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597993: HINT: In a moment you should be able to reconnect to the database
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598089: WARNING: failover/failback is in progress
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598089: DETAIL: executing failover or failback on backend
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598089: HINT: In a moment you should be able to reconnect to the database
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: LOG: received the failover indication from Pgpool-II on IPC interface
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597318: WARNING: failover/failback is in progress
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597318: DETAIL: executing failover or failback on backend
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597318: HINT: In a moment you should be able to reconnect to the database
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: LOG: watchdog is informed of failover start by the main process
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597971: WARNING: failover/failback is in progress
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597971: DETAIL: executing failover or failback on backend
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597971: HINT: In a moment you should be able to reconnect to the database
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597993: WARNING: failover/failback is in progress
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597993: DETAIL: executing failover or failback on backend
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597993: HINT: In a moment you should be able to reconnect to the database
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: starting fail back. reconnect host db005(5432)
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: Node 0 is not down (status: 2)
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: Do not restart children because we are failing back node id 4 host: db005 port: 5432 and we are in streaming replication mode and not all backends were down
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: find_primary_node_repeatedly: waiting for finding a primary node
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: find_primary_node: checking backend no 0
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: find_primary_node: checking backend no 1
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: find_primary_node: checking backend no 2
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598130: WARNING: failover/failback is in progress
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598130: DETAIL: executing failover or failback on backend
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598130: HINT: In a moment you should be able to reconnect to the database
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598130: WARNING: failover/failback is in progress
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598130: DETAIL: executing failover or failback on backend
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598130: HINT: In a moment you should be able to reconnect to the database
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: find_primary_node: checking backend no 3
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: find_primary_node: primary node id is 3
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: failover: set new primary node: 3
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: failover: set new master node: 0
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: LOG: new IPC connection received
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: LOG: received the failover indication from Pgpool-II on IPC interface
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: LOG: watchdog is informed of failover end by the main process
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: failback done. reconnect host db005(5432)
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: start health check process for host db005(5432)
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: start health check process for host db005(5432)
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 624658: LOG: recovery done
Aug 09 10:43:55 db006 cgexec[597088]: 2019-08-09 10:43:55: pid 614305: LOG: failback event detected
Aug 09 10:43:55 db006 cgexec[597088]: 2019-08-09 10:43:55: pid 614305: DETAIL: restarting myself
Aug 09 10:43:55 db006 cgexec[597088]: 2019-08-09 10:43:55: pid 598356: LOG: restart request received in pcp child process
Aug 09 10:43:55 db006 cgexec[597088]: 2019-08-09 10:43:55: pid 597088: LOG: PCP child 598356 exits with status 0 in failover()
Aug 09 10:43:55 db006 cgexec[597088]: 2019-08-09 10:43:55: pid 597088: LOG: fork a new PCP child pid 624833 in failover()
Aug 09 10:43:55 db006 cgexec[597088]: 2019-08-09 10:43:55: pid 617874: LOG: failback event detected
Aug 09 10:43:55 db006 cgexec[597088]: 2019-08-09 10:43:55: pid 617874: DETAIL: restarting myself
Aug 09 10:43:55 db006 cgexec[597088]: 2019-08-09 10:43:55: pid 597137: LOG: failback event detected
(repeat failback many times)
Attached Files: pgpool.conf (8,041 bytes) 2019-08-09 20:11
https://www.pgpool.net/mantisbt/file_download.php?file_id=632&type=bug
Notes
(0002796)
hoshiai   
2019-08-22 14:31   
(Last edited: 2019-08-22 14:33)
When pgpool executes a failover or failback, all pgpool backend child processes are restarted.
But if the target is a standby and the standby is not used for load balancing, pgpool waits for the session to finish before restarting the process.

The following log is shown when the process to be restarted waited until its session finished:
  LOG: failback event detected
  DETAIL: restarting myself

I saw this message in your log, so I don't understand how the problem of your connection being dropped happened.
Could you share the full log from when this problem happens, with log_min_messages set to 'DEBUG5'?

(0002801)
gmartinez   
2019-08-27 07:36   
I will set the logging to DEBUG5 in our lab and try to reproduce there.
(0002803)
hoshiai   
2019-08-27 14:35   
Thank you. I will wait for your test results.
(0002873)
gmartinez   
2019-09-21 08:18   
So I was able to reproduce this behavior on my CentOS 6 lab as well and I noticed that shutting down postgresql on a standby node or re-attaching a downed node would also trigger this behavior. I did see errors in the application: Caused by: org.postgresql.util.PSQLException: This connection has been closed.
and saw the same behavior I mentioned earlier in the pgpool logs.


DEBUG5 logging was extremely verbose (and had query logs!), so I'm not sure how to capture the logs without exposing data while the application is running.
(0002979)
hoshiai   
2019-11-27 16:39   
Sorry for the very late reply.

I investigated your problem.
In conclusion, this behavior (the pgpool session being reset during failover/failback) is intended by the pgpool community. It is a safety measure to avoid causing bigger problems (e.g. segmentation faults) during failover/failback.

This design is mainly for failover, so
in the future I would like to change it so that failback (online recovery) does not cause this behavior.


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
557 [Pgpool-II] Bug minor always 2019-11-14 23:48 2019-11-20 21:47
Reporter: nglp Platform: Linux  
Assigned To: t-ishii OS: Red Hat  
Priority: high OS Version: 7.6  
Status: resolved Product Version: 4.0.7  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version: 4.0.8  
Summary: Syslog settings ignored
Description: Hi,

We've detected that in pgpool 4.0.7 the syslog settings are being ignored/not working. In 4.0.6 it works properly.

We have syslog configured exactly as in https://www.pgpool.net/docs/latest/en/html/example-cluster.html

If we downgrade packages to 4.0.6 everything works as expected

Let me know if you need anything
Tags:
Steps To Reproduce: Following configuration:
log_destination = 'syslog'
syslog_facility = 'LOCAL1'
Additional Information:
Attached Files:
Notes
(0002963)
tmartincpp   
2019-11-15 18:58   
Hi,

we have the same issue with pgpool 3.7.12.

Rolling back to 3.7.11 solves the problem.

Our configuration:

log_destination = 'syslog'
syslog_facility = 'LOCAL2'
syslog_ident = 'pgpool-II'

When reloading, only the following message is displayed:
"Done."
(0002964)
nglp   
2019-11-15 19:08   
Looks like it is related to this fix: https://www.pgpool.net/mantisbt/view.php?id=548

Configuring in stderr mode works as expected
(0002965)
t-ishii   
2019-11-17 10:50   
It appears that there's a packaging issue with 3.7.12 (and the other latest minor releases), i.e. the configure script was not updated when configure.ac (the template file for configure) was updated to solve the issue you referred to. This caused the problem you are seeing with the RPM package.

Developers are planning to release updated RPM packages in early next week.
(0002970)
t-ishii   
2019-11-19 22:31   
Updated RPMs have been just released.

https://pgpool.net/mediawiki/index.php/Main_Page#Pgpool-II_4.1.0.2C_4.0.7.2C_3.7.12.2C_3.6.19.2C_3.5.23_and_3.4.26_RPMs_Update_2_officially_released_.282019.2F11.2F19.29
(0002972)
nglp   
2019-11-20 20:57   
Thank you, it's solved in 4.0.7-2.
(0002973)
t-ishii   
2019-11-20 21:47   
Ok, mark this issue as resolved.


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
555 [Pgpool-II] Bug block always 2019-11-04 01:25 2019-11-05 08:57
Reporter: GeorgBerlin Platform: Virtual Machine / KVM  
Assigned To: OS: CentOS Linux  
Priority: urgent OS Version: 7 (core)  
Status: resolved Product Version: 4.1.0  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version:  
Summary: pgpool does not start without any notice even in debug mode
Description: With the help of Salt I set up a Postgres cluster with two instances. PGPool is used to handle the streaming replication and the failover.
For approximately six days, since the last time the full installation worked, PGPool has not started in newly created setups.

-> pgpool-II version
    pgpool-II-12-4.1.0-1.rhel7.x86_64
    pgpool-II-12-extensions-4.1.0-1.rhel7.x86_64
    pgpool-II-12-debuginfo-4.1.0-1.rhel7.x86_64
    pgpool-II-12-devel-4.1.0-1.rhel7.x86_64

-> PostgreSQL version
    postgresql12-12.0-1PGDG.rhel7.x86_64
    postgresql12-plperl-12.0-1PGDG.rhel7.x86_64
    postgresql12-devel-12.0-1PGDG.rhel7.x86_64
    postgresql12-libs-12.0-1PGDG.rhel7.x86_64
    postgresql12-server-12.0-1PGDG.rhel7.x86_64
    postgresql12-contrib-12.0-1PGDG.rhel7.x86_64
Tags: pgpool-II, streaming replication
Steps To Reproduce: Basic installation of CentOS
Adjusting the Repositories to use the Repos for Postgres and PGPool
Installing Postgres and PGPool

pgpool-configs for master and standby attached

Project is published at
https://github.com/ionos-enterprise/automated-ha-setup
If it is needed I can start the script to create a virtual data center right away and import public keys for access at any time.
Additional Information: This is my first post on this mailing list and I am quite unsure whether I may have overlooked something in my Postgres and PGPool setup.
Nevertheless, when something went wrong in the past I got some kind of log message. This time I did not.
The classification as 'Blocker' and priority 'Immediate' reflects how I am facing the issue.

Hopefully you can help me fixing that.


Many thanks in advance

Georg

Attached Files: pgpool.conf_Standby (43,860 bytes) 2019-11-04 01:25
https://www.pgpool.net/mantisbt/file_download.php?file_id=686&type=bug
pgpool.conf_Master (43,861 bytes) 2019-11-04 01:25
https://www.pgpool.net/mantisbt/file_download.php?file_id=685&type=bug
Notes
(0002953)
GeorgBerlin   
2019-11-04 01:58   
The streaming replication is done by Postgres itself, not by PGPool.
(0002954)
GeorgBerlin   
2019-11-04 02:14   
Still investigating. PGPool is now running.
(0002955)
GeorgBerlin   
2019-11-04 03:03   
For PostgreSQL 10 and PGPool for PG 10 I still observe the aforementioned behavior. For PostgreSQL 12 and PGPool for PG 12 the program starts and waits for connections.
(0002956)
GeorgBerlin   
2019-11-04 05:45   
Sorry for bothering you. I found the culprit with strace; it was an incorrect entry in pgpool.conf.
Please consider this 'bug' report as solved.


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
547 [Pgpool-II] Bug major always 2019-09-12 16:37 2019-10-31 18:41
Reporter: harukat Platform:  
Assigned To: Muhammad Usama OS:  
Priority: normal OS Version:  
Status: assigned Product Version: 3.7.11  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version: 3.7.12  
    Target Version: 3.7.12  
Summary: We need to do arping again after recovering from split-brain.
Description: Pgpool-II should do arping again (or wd_IP_down() and wd_IP_up() again)
in the master/coordinator node after recovering from split-brain.
In the following scenario, Pgpool-II doesn't work well.

scenario:

1. There are watchdog cluster nodes: n1, n2, and n3.
    n3 is master/coordinator. VIP is set on n3.

2. Network trouble occurs.
    n1 and n2 decide that n3 is down.
   n2 become new master/coordinator, VIP is set on n2 with arping.

3. Network recovers.
    n1, n2, and n3 notice split-brain status.
    They decide n3 is best master/coordinator.
    n2 resigns master/coordinator. VIP is released on n2.

4. VIP has been set on n3, but ARP table says n2 is VIP target yet.
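For context, the escalation step in question is driven by the VIP settings in pgpool.conf; a sketch with commonly documented example values (placeholders, not the actual configuration from this report), where the arping that needs to be repeated corresponds to arping_cmd:

delegate_IP = '192.168.0.100'
if_up_cmd = 'ip addr add $_IP_$/24 dev eth0 label eth0:0'
if_down_cmd = 'ip addr del $_IP_$/24 dev eth0'
arping_cmd = 'arping -U $_IP_$ -w 1'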
Tags:
Steps To Reproduce:
Additional Information:
Attached Files: lost_arping_case.log (239,651 bytes) 2019-09-17 11:10
https://www.pgpool.net/mantisbt/file_download.php?file_id=655&type=bug
pgpool2_V3_7_STSBLE_arping_again.patch (2,390 bytes) 2019-09-27 12:19
https://www.pgpool.net/mantisbt/file_download.php?file_id=668&type=bug
watchdog_node_lost_fix.diff (54,091 bytes) 2019-10-02 23:07
https://www.pgpool.net/mantisbt/file_download.php?file_id=672&type=bug
011.watchdog_quorum_failover (8,474 bytes) 2019-10-02 23:28
https://www.pgpool.net/mantisbt/file_download.php?file_id=673&type=bug
pgpool-logs.tar.gz (5,493 bytes) 2019-10-02 23:32
https://www.pgpool.net/mantisbt/file_download.php?file_id=674&type=bug
test_20191007.tgz (113,231 bytes) 2019-10-08 13:31
https://www.pgpool.net/mantisbt/file_download.php?file_id=680&type=bug
make_splitbrain.sh (999 bytes) 2019-10-08 13:31
https://www.pgpool.net/mantisbt/file_download.php?file_id=679&type=bug
Notes
(0002845)
t-ishii   
2019-09-13 15:57   
(Last edited: 2019-09-13 17:07)
According to Usama's explanation in similar case:
https://www.pgpool.net/pipermail/pgpool-general/2019-August/006733.html
After "3. Network recovers."
> n1, n2, and n3 notice split-brain status.
> They decide n3 is best master/coordinator.
> n2 resigns master/coordinator. VIP is released on n2.
n2 should not have resigned as master in the first place. Since the other case upthread at the URL above looks quite similar, I guess there is something wrong in the watchdog code handling the case when the former master node comes back.

So,
> We need to do arping again after recovering from split-brain.
Point is not here.

(0002847)
Muhammad Usama   
2019-09-13 21:57   
(Last edited: 2019-09-13 21:58)
In the above-explained scenario. As per the current design, the watchdog should behave as follows:

1. There are watchdog cluster nodes: n1, n2, and n3.
    n3 is master/coordinator. VIP is set on n3.

2. Network trouble occurs.
        n1 and n2 decide that n3 is down.
        n2 become new master/coordinator, VIP is set on n2 with arping.

2-b. At the same time, n3 would perform de-escalation since it has lost the quorum.
         Even if the n3 stays as master it would remove the VIP because of losing the quorum.

3. Network recovers.
       n1, n2, and n3 notice split-brain status.
       They decide n3 is best master/coordinator.
      n2 resigns master/coordinator. VIP is released on n2.

4. Even in the case of No:4. If n3 is chosen as the best master. The n3 will perform
       the escalation again. The reason being it had already performed the de-escalation when the
      quorum was lost. And when the escalation is performed again after recovery from
      split-brain the arping step would be performed again.

Now, if step No.4 of the original scenario (VIP has been set on n3, but the ARP table still says n2 is the VIP target) is somehow happening, it means the watchdog had not performed the de-escalation when the quorum was lost.

If that is the case can you please provide me the log files when that happened?

(0002855)
harukat   
2019-09-17 11:10   
This is the log for this issue.
IP addresses and host names are modified from our customer's originals.
n3 had been master/coordinator with the VIP for a long time.
When n1 and n2 decided that n3 was lost, n3 didn't log any watchdog errors.
(0002879)
harukat   
2019-09-25 13:03   
Our customer says:
In our case, n3 did not seem to know quorum was lost...for most of that network partition,
n3 could *see* n1 and n2, but n2 and n1 could not *see* n3.
I'm not certain if n3 had a way to know quorum was lost during the network partition?
We do know with certainty that the end result was two production outages within 3 days.
(0002885)
harukat   
2019-09-27 12:19   
This is a simple patch for V3_7_STABLE to do arping after recovering from split-brain.
It passed the regression test.
(0002886)
t-ishii   
2019-09-27 12:32   
The regression suite does not do any VIP/ARP tests. That means passing the regression test does not say anything about whether your patch is OK as far as the VIP/ARP issue is concerned.
(0002887)
t-ishii   
2019-09-27 13:14   
Have you actually tested the patch with the case described above?
(0002896)
harukat   
2019-10-01 18:10   
No, I haven't recreated the reported situation.
(0002898)
t-ishii   
2019-10-02 09:36   
We do not accept untested patches.
(0002899)
Muhammad Usama   
2019-10-02 23:07   
Hi, Harukat,

First of all sorry for the delayed response, and thanks a lot for providing the patch and log files.

The log file contained the messages from all three nodes and it was quite huge, so it took me a while to understand the issue.

Anyhow, after reviewing the log files I realized that there was confusion in the watchdog code on how to deal with life-check failure scenarios, especially for the cases when the life-check reports a node failure while the watchdog core is still able to communicate with the remote nodes, and also for the case when node A's life-check reports node B as lost while B still thinks A is alive and healthy.

So I have reviewed the whole watchdog design around the life-check reports and have made some fixes in that area
in the attached patch

You can try the attached patch and test whether you still find yourself in the same situation as described in the initial bug report.
The patch is generated against the current MASTER branch; I will commit it after a little more testing and then backport it to all supported branches.

Finally, the original idea in your patch and bug report, to do arping after recovering from split-brain, seems reasonable, but your patch needs a little more thought on the design, since executing the wd_arping function from the watchdog main process is not the right thing to do.
Effectively, though, after my patch (attached) you should never end up in a situation where multiple watchdog nodes have performed the escalation, provided you are using an odd number of Pgpool-II nodes.
The arping solution you suggested should only be needed in a situation where the watchdog cluster is configured with an even total number of nodes and a network partition divides the network in such a way that both partitions get exactly half the nodes each.

Thanks
Best regards
Muhammad Usama
(0002900)
t-ishii   
2019-10-02 23:28   
Usama,
After applying the patch, one of the watchdog regression tests failed.
testing 011.watchdog_quorum_failover...failed.

regression log attached.
(0002902)
t-ishii   
2019-10-02 23:32   
Also pgpool.log attached.
(0002903)
Muhammad Usama   
2019-10-03 00:19   
Hi Ishii-San,

I am looking into attached logs

Somehow regression always passes on my machine :-)

testing 011.watchdog_quorum_failover...ok.
testing 012.watchdog_failover_when_quorum_exists...ok.
testing 013.watchdog_failover_require_consensus...ok.
testing 014.watchdog_test_quorum_bypass...ok.
testing 015.watchdog_master_and_backend_fail...ok.
testing 016.node_0_is_not_primary...ok.
testing 017.node_0_is_down...ok.
testing 018.detach_primary...ok.
testing 019.log_client_messages...ok.
testing 020.allow_clear_text_frontend_auth...ok.
testing 021.pool_passwd_auth...ok.
testing 022.pool_passwd_alternative_auth...ok.
testing 023.ssl_connection...ok.
testing 024.cert_auth...ok.
(0002904)
Muhammad Usama   
2019-10-03 00:33   
From the look of it, it seems the PostgreSQL server setup is failing in the test case.

See line 23 in the attached 011.watchdog_quorum_failover:

 23 recovery node 1...ERROR: executing recovery, execution of command failed at "1st stage"
 24 DETAIL: command:"basebackup.sh"

But I can't figure out what could be the reason for that.
(0002905)
t-ishii   
2019-10-03 07:13   
Oops. You are right. I had broken my pgpool set up while migrating to PostgreSQL 12. Sorry for noise.
(0002906)
harukat   
2019-10-03 13:33   
Thanks, Usama.
I'll test the patch in a V4_1_STABLE environment.

It looks like a big change.
Should I not expect a fix for the 3.7.x versions?
(0002907)
Muhammad Usama   
2019-10-03 18:19   
Once you verify the fix I will backport it to all supported branches including 3.7

Thanks
(0002912)
harukat   
2019-10-08 13:31   
(Last edited: 2019-10-08 13:32)
I ran a test with Pgpool 4.1.x nodes in an artificially unstable network by executing the attached script.
The "make_splitbrain.sh" script randomly blocks the watchdog communication while the node is the master node.
In spite of the severe network environment, the patched Pgpool nodes ran well for a long time (over 1 day) in most cases.
They often recovered from split-brain status successfully.

I also got a failed case. Its log is in the attached "test_20191007.tgz" file.
I'm sorry, it may be difficult to read because it includes Japanese locale output.
In that case, I needed to execute arping manually in order to use Pgpool-II via the VIP.
Its log says at the end:
  Oct 7 16:19:00 cp2 [16456]: [2889-1] 2019-10-07 16:19:00: pid 16456: LOG: successfully acquired the delegate IP:"10.10.10.152"
  Oct 7 16:19:00 cp2 [16456]: [2889-2] 2019-10-07 16:19:00: pid 16456: DETAIL: 'if_up_cmd' returned with success
But I could not access Pgpool-II on the cp2 host via the VIP without running arping manually. I couldn't find the cause.
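
For readers without the attachment, a rough sketch of the idea behind such a script follows. This is not the actual make_splitbrain.sh; it assumes iptables is available, the default watchdog port 9000 and heartbeat port 9694 are in use, and that pcp_watchdog_info (PCP port 9898) can be used to tell whether the local node is currently the master. The exact check depends on the pcp_watchdog_info output format of your version.

#!/bin/bash
# Hypothetical sketch: randomly cut watchdog traffic while this node is master.
WD_PORTS="9000 9694"    # wd_port and wd_heartbeat_port (default values)

block()   { for p in $WD_PORTS; do iptables -A INPUT -p tcp --dport "$p" -j DROP; iptables -A INPUT -p udp --dport "$p" -j DROP; done; }
unblock() { for p in $WD_PORTS; do iptables -D INPUT -p tcp --dport "$p" -j DROP; iptables -D INPUT -p udp --dport "$p" -j DROP; done; }

while true; do
    # only disturb the network while this node holds the master/coordinator role
    if pcp_watchdog_info -h localhost -p 9898 -U pgpool -w | grep -q MASTER; then
        block
        sleep $((RANDOM % 20 + 5))      # keep the partition up for 5-25 seconds
        unblock
    fi
    sleep $((RANDOM % 30 + 10))         # random pause before the next cut
done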

(0002918)
Muhammad Usama   
2019-10-11 00:19   
Hi Harukat,

First of all many thanks for doing thorough testing.
I have gone through all the attached logs, and it seems that the watchdog is behaving correctly. Despite some
flooding of log messages, I can't currently see any issue, at least in the logs.

As for the problem you described, that you had to run arping on cp2 after the escalation that happened at 'Oct 7 16:19:00':

I believe it's not because of anything wrong done by Pgpool-II or the watchdog (at least nothing in the logs suggests
anything wrong with the watchdog or pgpool that could have caused it).

If you look at the logs of all three nodes around that time, you can see that only one node, CP2, performed
the escalation and brought up the VIP, and no other node acquired the VIP after that time.
And since the pgpool escalation process does perform the arping and ping after acquiring the VIP, I am guessing some
external factor might be involved, because nothing in the logs points to any situation or issue that could require a manual arping.

I am not sure, but I wonder whether this could be caused by the nature of the test case, since it was triggering very frequent escalations and de-escalations.
Do you think it is possible that a network switch or the VM host played a role in that?
For example, the last de-escalation on CP2 happened at "7 16:18:45" (VIP released by CP2) and the new escalation started
at "7 16:18:56" (VIP acquired again), so there is just a 10-second gap between the release and re-acquisition of the VIP on CP2.
So I am just thinking out loud: what if the ARP tables on the client machine got it wrong because of these frequent updates,
and somehow the client machine never received the new VIP record?
Though I still think that might not be the case, and some other external element caused it.
(0002927)
harukat   
2019-10-15 11:54   
Thank you for confirming my report.
If you see nothing in the Pgpool-II code that could explain the failed case,
then I also think some external element caused it, because the log suggests the code must have performed the arping at the end.
Also, my test script could start dropping packets on the VIP-holder host immediately
after its escalation, without any stable period. That can be considered a kind of double fault, so I think
it's OK that the patched Pgpool-II code doesn't cover such a case.


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
545 [Pgpool-II] Bug major random 2019-09-09 20:25 2019-10-31 18:40
Reporter: nglp Platform:  
Assigned To: Muhammad Usama OS:  
Priority: normal OS Version:  
Status: resolved Product Version: 4.0.4  
Product Build: Resolution: fixed  
Projection: none      
ETA: none Fixed in Version: 4.0.7  
    Target Version: 4.0.7  
Summary: Quorum lost and not recovered
Description: Hi,

We've a 3 node PGPool setup with 3 PostgreSQL master-standby-standby with streaming replication running on vmware vms

PGPool health check method is heartbeat

We had an issue because our backup method (snapshot) hung the master watchdog node (node2 at that moment) for longer than defined in wd_heartbeat_deadtime, and the VIP was released, which is normal behaviour.

The issue was that after that, the quorum was lost and never recovered, and the VIP was not reallocated to either of the other 2 nodes, so the service went down.

The quorum was never recovered and we had to restart the entire pgpool stack, even though checking the watchdog state on pgpool indicated a master node. This has happened in the past with the quorum recovered and the VIP reallocated successfully, so I guess it's random.

Find attached logs and our pgpool configuration

Any question we will be happy to answer

Thanks, Guille
Tags:
Steps To Reproduce: This has happened after more master failovers due to the same hang, but:
* Hang watchdog master node for a bit more than wd_heartbeat_deadtime
* Quorum is lost
Additional Information: node1, node2 and node3 have different wd_priority, in that order, other pgpool settings are exactly the same

node2 was watchdog master at that moment, and after quorum lost
Attached Files: node3.txt (983 bytes) 2019-09-09 20:25
https://www.pgpool.net/mantisbt/file_download.php?file_id=646&type=bug
node1.txt (983 bytes) 2019-09-09 20:25
https://www.pgpool.net/mantisbt/file_download.php?file_id=644&type=bug
node2.txt (5,123 bytes) 2019-09-09 20:25
https://www.pgpool.net/mantisbt/file_download.php?file_id=645&type=bug
pgpool.conf (4,479 bytes) 2019-09-09 20:25
https://www.pgpool.net/mantisbt/file_download.php?file_id=647&type=bug
server2.txt.gz (112,256 bytes) 2019-09-13 21:51
https://www.pgpool.net/mantisbt/file_download.php?file_id=652&type=bug
server1.txt.gz (115,952 bytes) 2019-09-13 21:51
https://www.pgpool.net/mantisbt/file_download.php?file_id=651&type=bug
server3.txt.gz (116,210 bytes) 2019-09-13 21:51
https://www.pgpool.net/mantisbt/file_download.php?file_id=653&type=bug
recover_quorum_fix.diff (8,328 bytes) 2019-09-18 23:18
https://www.pgpool.net/mantisbt/file_download.php?file_id=656&type=bug
regression_log.tar.gz (1,891 bytes) 2019-09-19 20:39
https://www.pgpool.net/mantisbt/file_download.php?file_id=657&type=bug
011-012.tar.gz (6,663 bytes) 2019-09-20 20:58
https://www.pgpool.net/mantisbt/file_download.php?file_id=658&type=bug
reg_logs_usama.zip (29,726 bytes) 2019-09-20 23:39
https://www.pgpool.net/mantisbt/file_download.php?file_id=659&type=bug
server1.txt-2.gz (106,666 bytes) 2019-09-24 20:21
https://www.pgpool.net/mantisbt/file_download.php?file_id=660&type=bug
server2.txt-2.gz (97,357 bytes) 2019-09-24 20:21
https://www.pgpool.net/mantisbt/file_download.php?file_id=661&type=bug
server3.txt-2.gz (97,602 bytes) 2019-09-24 20:21
https://www.pgpool.net/mantisbt/file_download.php?file_id=662&type=bug
pgpool-II-4.0.4.fix.tgz (3,898,639 bytes) 2019-09-25 19:57
https://www.pgpool.net/mantisbt/file_download.php?file_id=663&type=bug
pgpool-II-4.0.4-patched.tar.gz (3,365,708 bytes) 2019-09-25 20:53
https://www.pgpool.net/mantisbt/file_download.php?file_id=664&type=bug
server2.txt-3.gz (106,937 bytes) 2019-09-25 22:11
https://www.pgpool.net/mantisbt/file_download.php?file_id=666&type=bug
server1.txt-3.gz (117,111 bytes) 2019-09-25 22:11
https://www.pgpool.net/mantisbt/file_download.php?file_id=665&type=bug
server3.txt-3.gz (107,761 bytes) 2019-09-25 22:11
https://www.pgpool.net/mantisbt/file_download.php?file_id=667&type=bug
Notes
(0002835)
t-ishii   
2019-09-12 10:36   
I pinged Usama, who is an authority on the watchdog, a few days ago to look into this issue.
You'd probably better wait for a response from him.

In the meantime, I noticed you set wd_heartbeat_deadtime = 5, which means that if the watchdog does not receive a heartbeat signal within 5 seconds, that pgpool will be regarded as dead (which is your situation). As a workaround, you could set it to something longer, say 300 seconds, so that the server being backed up is not regarded as dead.
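
For reference, these settings live in pgpool.conf; a workaround along the lines suggested above might look like this (values are illustrative):

# pgpool.conf
wd_interval = 10                 # seconds between watchdog life checks
wd_heartbeat_deadtime = 300      # declare a node dead only after 300s without heartbeat,
                                 # so a snapshot-induced pause does not get the node declared dead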
(0002836)
nglp   
2019-09-12 16:09   
Hi t-ishii,

Thank you for your response, we will wait for him :)

Yes, once we had identified that timeout, we increased wd_heartbeat_deadtime to 30 (previously 5) and wd_interval to 10 (previously 2) to avoid these problems. Those are the default values in the streaming replication cluster example configuration.

We had configured lower values in the past to allow the fastest failover possible.

Best regards, Guille
(0002837)
t-ishii   
2019-09-12 16:49   
This one:
https://www.pgpool.net/mantisbt/view.php?id=547
seems to be a similar issue?
(0002838)
nglp   
2019-09-12 17:19   
Hi,

No, it's not the same. If you look at the attached logs, the following things happen:
* Heartbeat signals are lost
* Quorum is lost
* Watchdog de-escalation starts
* VIP is removed
* Heartbeat signals come back

But at this point the quorum is not recovered, so the VIP stays down.

The issue in our case is that the quorum is never recovered, despite the heartbeat signals being back.

Best regards
(0002839)
Muhammad Usama   
2019-09-12 18:54   
Hi

The log files you shared do not have much information, and it's difficult to guess what could have gone wrong.
Can you please share the complete pgpool logs for all three nodes?
(0002840)
nglp   
2019-09-12 19:07   
Hi,

I'm sorry to say that's the only output generated by pgpool for this issue.

If you want I can attach the entire logs, but they are filled with non-relevant information.

Let me know

Thanks, Guille
(0002841)
Muhammad Usama   
2019-09-12 19:17   
Are you able to reliably reproduce the issue? If that's the case, then sharing the exact steps would be enough.

Otherwise please share the entire log along with the time when the issue happened, and I will try to see
if some usable information can be extracted from it.

Thanks
(0002842)
nglp   
2019-09-12 19:27   
I'm setting up a new environment to run some tests; the other environments are all production ones.

The way I think you can reproduce it is:
* set low wd_heartbeat_deadtime and wd_interval values (like 2 and 1)
* make a virtual machine snapshot on the master watchdog node
* check what happens

Expect a random result; at other times the VIP has failed over successfully and the quorum has been recovered OK.

I will prepare the logs and upload them (I have to replace some strings, like hostnames and IPs, and remove some queries).

Thanks, Guille
(0002846)
nglp   
2019-09-13 21:51   
Hi,

I'm back with "good" news: I'm able to reproduce the issue in my test environment with the attached pgpool.conf and the following modifications:
* wd_interval = 1
* wd_heartbeat_deadtime = 2
* log_error_verbosity = verbose
* client_min_messages = debug5
* log_min_messages = debug1

Attaching logs: pgpool was stopped, the log cleared, pgpool started, waited 2 minutes, the issue was reproduced, waited 1 minute, and then pgpool was stopped.

Let me know if you need more detail in log_min_messages.

Thanks, Guille
(0002848)
Muhammad Usama   
2019-09-13 23:02   
Hi

Thanks a bundle for the logs. I am looking into these
(0002853)
Muhammad Usama   
2019-09-17 05:05   
Hi

Thanks for providing the logs. I was able to identify the issue using the attached logs.
I am working on a fix.

Thanks
(0002857)
nglp   
2019-09-17 16:37   
Hi,

Thank you so much for your work

If we can help you in any way (reproducing more times, other parameters, etc) just let me know

Best regards, Guille
(0002860)
Muhammad Usama   
2019-09-18 23:18   
Hi Guille,

Can you please try out the attached patch if it fixes the issue.

Thanks
Best Regards
Muhammad Usama
(0002862)
nglp   
2019-09-19 00:17   
Hi,

Should I download the source code of 4.0.4 and compile it with the patch? Or is it for another version? Currently it has been installed using the rpm package from the repository.

Best regards
(0002864)
t-ishii   
2019-09-19 13:26   
(Last edited: 2019-09-19 13:29)
After applying the patch to master branch, I get a few regression test failures:
testing 004.watchdog...ok.
testing 011.watchdog_quorum_failover...failed.
testing 012.watchdog_failover_when_quorum_exists...failed.
testing 013.watchdog_failover_require_consensus...failed.
testing 014.watchdog_test_quorum_bypass...ok.
testing 015.watchdog_master_and_backend_fail...ok.
testing 028.watchdog_enable_consensus_with_half_votes...ok.

013 has already been failing these days, but 011 and 012 are new.

(0002865)
Muhammad Usama   
2019-09-19 18:24   
Somehow I am getting a clean regression run with the patch.

..
testing 006.memqcache...ok.
testing 007.memqcache-memcached...ok.
testing 008.dbredirect...ok.
testing 009.sql_comments...ok.
testing 011.watchdog_quorum_failover...ok.
testing 012.watchdog_failover_when_quorum_exists...ok.
testing 013.watchdog_failover_require_consensus...ok.
testing 014.watchdog_test_quorum_bypass...ok.
testing 015.watchdog_master_and_backend_fail...ok.
testing 016.node_0_is_not_primary...ok.
testing 017.node_0_is_down...ok.
testing 018.detach_primary...ok.
testing 019.log_client_messages...ok.
testing 020.allow_clear_text_frontend_auth...ok.
testing 021.pool_passwd_auth...ok.
testing 022.pool_passwd_alternative_auth...ok.
testing 023.ssl_connection...ok.
testing 024.cert_auth...ok.
testing 025.enable_shared_relcache...ok.
testing 026.temp_table...ok.
testing 027.auto_failback...ok.
testing 028.watchdog_enable_consensus_with_half_votes...ok.
testing 050.bug58...ok.
testing 051.bug60...ok.
..

So I will try to figure out why it could be failing on your system.
I am running the regression on the MASTER branch with PG12.
(0002866)
Muhammad Usama   
2019-09-19 18:26   
Hi Guille,

If you have a development environment set up on your system, then you can pull the code and apply the patch on the master branch
(I believe the same patch should also work with the 4_0 branch) and test the scenario.
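
A rough sketch of that workflow (assuming the pgpool2 repository can be cloned from git.postgresql.org and that the attached recover_quorum_fix.diff applies with patch -p1; adjust paths and configure options to your environment):

git clone https://git.postgresql.org/git/pgpool2.git
cd pgpool2                                      # master branch is checked out by default
patch -p1 < /path/to/recover_quorum_fix.diff    # or: git apply /path/to/recover_quorum_fix.diff
./configure
make && sudo make install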
(0002867)
nglp   
2019-09-19 19:17   
Hi Muhammad,

I've downloaded the source code for version 4.0.4, applied the patch, compiled it on all machines, and started the service with the new binary.

I'm sad to report that the issue still occurs.

My PostgreSQL version is 11.4 in this environment (I saw while compiling that it needs some libraries).

Best regards
(0002868)
t-ishii   
2019-09-19 20:39   
The regression log for 011 and 012 attached.
This is master branch + PostgreSQL 11.4.
(0002869)
Muhammad Usama   
2019-09-20 18:50   
Hi Ishii-San

Is it possible for you to also share the pgpool.log files for both of these test cases? I am still not able to reproduce the failures, either against PG11 or PG12.
(0002870)
t-ishii   
2019-09-20 20:58   
Sure.
(0002871)
Muhammad Usama   
2019-09-20 23:39   
Hi Ishii-San

Many thanks for providing the logs. There is something strange; I am not able to work out what's going wrong with your setup.
However, I noticed that the logs you shared do not contain the health-check debug related messages,

like the one below:
WARNING: check_backend_down_request: failed to open file standby2/log/backend_down_request

So I am wondering: was pgpool-II compiled with health check debug enabled?

I am also attaching the corresponding logs from running the regression on my system, in case you can spot a difference that identifies what could be going wrong.

Thanks
(0002872)
t-ishii   
2019-09-21 08:03   
Oops. You are right. I did not enable health check debug while compiling. After enabling it, 011 and 012 tests succeed.
I am terribly sorry for the misinformation.
(0002874)
Muhammad Usama   
2019-09-24 15:31   
Hi Ishii-San

;-) No worries. Happens to me all the time.

So I was able to test the patch thoroughly using the new wd_cli utility I checked in yesterday, and the patch is behaving as expected.
Do you agree that I should go ahead and commit it?
(0002875)
t-ishii   
2019-09-24 15:39   
Not sure. I think you need to get a log from nglp to find out why his issue is not resolved by the patch.
(0002876)
nglp   
2019-09-24 16:23   
Hi,

Will do, probably will take 1-2 days

Thanks, Guille
(0002877)
Muhammad Usama   
2019-09-24 17:05   
Hi Ishii-San
Sorry, my bad, I somehow overlooked the note from Guille and missed that he had reported the issue as not fixed by the patch.

@Guille I will look into the issue once you share the pgpool log files with the patch applied.
Thanks in advance.

Kind Regards
Muhammad Usama
(0002878)
nglp   
2019-09-24 20:21   
Hi,

Attaching logs: pgpool was stopped, the log cleared, pgpool started, waited 2 minutes, the issue was reproduced, waited 1 minute, and then pgpool was stopped.

Compiled PGPool 4.0.4 + Patch
PostgreSQL 11.4

Let me know

Best regards
(0002880)
Muhammad Usama   
2019-09-25 19:09   
Hi Guille,

Many thanks for providing the logs. Looking at them, I suspect there is some issue with how the patch was applied,
since the source line numbers in the log messages appear at an offset from where they should be.
For example:
Sep 24 07:16:20 server1 pgpool[15336]: [557-1] 2019-09-24 07:16:20: pid 15336: DEBUG: I am the cluster leader node command finished with status:[COMMAND TIMEED OUT] which is success
Sep 24 07:16:20 server1 pgpool[15336]: [557-2] 2019-09-24 07:16:20: pid 15336: DETAIL: The command was sent to 2 nodes and 1 nodes replied to it
Sep 24 07:16:20 server1 pgpool[15336]: [557-3] 2019-09-24 07:16:20: pid 15336: LOCATION: watchdog.c:5548

This line should say "LOCATION: watchdog.c:5517", not "LOCATION: watchdog.c:5548", with the pgpool-II 4.0.4 source code.

The other log messages suggest a similar problem with the code.
So would it be possible for you to send me the code with the patch applied? Or I can send you the built binaries (if you are using Mac or CentOS 7), whichever suits you better.

Thanks in advance for helping in debugging this.

Best Regards
(0002881)
nglp   
2019-09-25 19:57   
Hi,

Sure, find my PGPool 4.0.4 + patch attached.

It's probably my fault in applying it; I did it manually. So if you have the source code with the patch properly applied, I can compile it in my test environment (Red Hat 7).

Also, we're open to updating PGPool to the latest version (some days ago we suffered a segfault in a connection pool process; we are still trying to reproduce it or obtain a core dump to open an issue), so if you apply it over 4.0.6 that will work for us too.

Many thanks to you, Guille
(0002882)
Muhammad Usama   
2019-09-25 20:53   
Hi

Many thanks for the prompt reply. My suspicion was correct: there was an issue with the patching.
I am attaching the patched code for the same pgpool-II 4.0.4. Can you please try it out?
Once you verify the fix I will apply it to all supported branches.

Thanks
(0002883)
nglp   
2019-09-25 22:11   
Hi,

Good news: the issue looks to be solved with that code.

Attaching logs: pgpool was stopped, the log cleared, pgpool started, waited 2 minutes, the reproduction steps were run, waited 1 minute, and then pgpool was stopped.

Also, outside of the attached logs, I repeated the scenario 10-15 times and every time it went OK.

Thank you all for your work

Best regards, Guille
(0002884)
Muhammad Usama   
2019-09-25 23:41   
Hi Guille,

Many thanks for testing and confirmation of the fix. I will commit the patch to all affected branches.

Best Regards
Muhammad Usama
(0002892)
Muhammad Usama   
2019-09-29 06:06   
I have committed the fix to all supported branches

https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=963ac83ce19dbb3d1a423ae5ce582c6866b47106
(0002893)
Muhammad Usama   
2019-09-29 06:07   
Fix committed to all supported branches


https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=963ac83ce19dbb3d1a423ae5ce582c6866b47106


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
546 [Pgpool-II] Bug block always 2019-09-11 22:49 2019-10-31 18:40
Reporter: venkat_rj Platform: Linux  
Assigned To: t-ishii OS: REHL7/CentOS  
Priority: high OS Version: 7.5.1804  
Status: resolved Product Version: 4.0.6  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version: 4.0.7  
    Target Version: 4.0.7  
Summary: Executing DML & DDL combination via PostgreSQL ODBC to pgpool fails with 'WARNING: child process with pid: 24870 was terminated
Description: Executing a DML & DDL combination via PostgreSQL ODBC to pgpool fails with 'WARNING: child process with pid: 24870 was terminated by segmentation fault'

By default we configure pgpool's load_balance_mode to off.

It looks like pgpool is doing load balancing when it sees a SELECT statement even though load_balance_mode is off.


==================================
Version:
==================================
- pgpool-II version 4.0.6
- postgres (PostgreSQL) 11.5
- PostgreSQL psqlodbc-11.01.0000


==================================
pgpool error in the log file:
==================================

2019-09-10 10:22:09: pid 20205: LOG: fork a new child process with pid: 24870
2019-09-10 10:22:18: pid 24870: LOG: pool_send_and_wait: Error or notice message from backend: : DB node id: 0 backend pid: 24885 statement: "CREATE INDEX "USERS_AUTH_ID_INDEX" ON "ENTITIES" ( "AUTH_ID" )" message: "relation "USERS_AUTH_ID_INDEX" already exists"
2019-09-10 10:22:18: pid 20205: WARNING: child process with pid: 24870 was terminated by segmentation fault
2019-09-10 10:22:18: pid 20205: LOG: fork a new child process with pid: 24888
2019-09-10 10:22:18: pid 19153: WARNING: ssl read: EOF detected
2019-09-10 10:22:18: pid 19153: ERROR: unable to read data from frontend
2019-09-10 10:22:18: pid 19153: DETAIL: socket read failed with an error "Success"


==================================
pgpool.conf:
==================================

egrep -i 'load|white|black' pgpool.conf
# Whitespace may be used. Comments are introduced with "#" anywhere on a line.
# server for the changes to take effect, or use "pgpool reload". Some
                                   # Weight for backend 0 (only in load balancing mode)
                                   # load_balance_mode.
# LOAD BALANCING MODE
load_balance_mode = on
                                   # Activate load balancing mode
ignore_leading_white_space = on
                                   # Ignore leading white spaces of each query
white_function_list = ''
black_function_list = 'currval,lastval,nextval,setval'
black_query_pattern_list = ''
                                   # if on, ignore SQL comments when judging if load balance or
disable_load_balance_on_write = 'transaction'
                                   # Load balance behavior when write query is issued
                                   # subsequent read queries will not be load balanced
                                   # will not be load balanced until the session ends.
                                   # not be load balanced until the session ends.
                                   # thus increases load of master.
                                   # thus increases load of master.
white_memqcache_table_list = ''
black_memqcache_table_list = ''
load_balance_mode = off # Added through Consul-template



=========================================================================
DML & DDL Calls via PostgresSQL ODBC driver to pgpool summary:
=========================================================================


    In: ConnectionHandle=0x7fffea45b960, WindowHandle=0, InConnectionString=`tracefile=/tmp/fun.croc;traceflags=319;CONOPTS=(SSLMode=require;pqopt={sslrootcert=/opt/<MaskedPath>/cacerts/trustedcerts.pem});DRIVER=POSTGRES;CATALOG=FB;DATABASE='DA_SYSCAT_default';;UID=dbmsowner;PWD=<Removed>;SERVER=cole-test2-onpremnnnnn.com;PORT=5431`, StringLength1=TKTS_NTS, OutConnectionString=0, BufferLength=0, StringLength2Ptr=0, DriverCompletion=TKTS_DRIVER_NOPROMPT=0
    In: StatementHandle=0x7fffea494640, Sql=`SET SESSION CLIENT_MIN_MESSAGES=WARNING;`, SqlL=40
    In: StatementHandle=0x7fffea494640, Sql=`CREATE SCHEMA IF NOT EXISTS "FS_SYSCAT";`, SqlL=41
    In: StatementHandle=0x7fffea494640, Sql=`SET SCHEMA 'FS_SYSCAT';`, SqlL=23

    In: ConnectionHandle=0x7fffdfd9a8e0, WindowHandle=0, InConnectionString=`tracefile=/tmp/fun.croc;traceflags=319;CONOPTS=(SSLMode=require;pqopt={sslrootcert=/opt/<MaskedPath>/cacerts/trustedcerts.pem});DRIVER=POSTGRES;CATALOG=FB;DATABASE='DA_SYSCAT_default';;UID=dbmsowner;PWD=<Removed>;SERVER=cole-test2-onpremnnnnn.com;PORT=5431`, StringLength1=TKTS_NTS, OutConnectionString=0, BufferLength=0, StringLength2Ptr=0, DriverCompletion=TKTS_DRIVER_NOPROMPT=0
    In: StatementHandle=0x7fffdfd9e0e0, Sql=`SET SESSION CLIENT_MIN_MESSAGES=WARNING;`, SqlL=40
    In: StatementHandle=0x7fffdfd9e0e0, Sql=`SET SCHEMA 'FS_SYSCAT';`, SqlL=23
    In: StatementHandle=0x7fffdfd9e0e0, Sql=`select "VERSION" from "DATA_SERVICES" where "DS_ID"=1`, SqlL=53
    In: StatementHandle=0x7fffdfd9e0e0, Sql=`SELECT * FROM "DATA_SERVICES" WHERE 1=0`, SqlL=39
    In: StatementHandle=0x7fffdfd9e0e0, Sql=`CREATE INDEX "USERS_AUTH_ID_INDEX" ON "ENTITIES" ( "AUTH_ID" )`, SqlL=62
    In: StatementHandle=0x7fffdfd9e0e0, Sql=`CREATE INDEX "USERS_ENTITY_ID_INDEX" ON "ENTITIES" ( "ENTITY_ID" )`, SqlL=66


==================
ODBC log:
==================


# Processing input file `/home/centos/t9.croc`

TKTSAllocHandle:
        In: HandleType = TKTS_HANDLE_ENV=1, InputHandle = TKTS Extension Handle=0x7f04341645c0, OutputHandlePtr = VALID
        Return: TKTS_SUCCESS = 0
        Out: *OutputHandlePtr = henv1=0x7f043417a0a0
TKTSAllocHandle:
        In: HandleType = TKTS_HANDLE_DBC=2, InputHandle = henv1=0x7f043417a0a0, OutputHandlePtr = VALID
        Return: TKTS_SUCCESS = 0
        Out: *OutputHandlePtr = hdbc1=0x7f043417c940
TKTSDriverConnect:
        In: ConnectionHandle = hdbc1=0x7f043417c940, WindowHandle = TKTS_NULL_HANDLE=0,
                InConnectionstring = `tracefile=/tmp/fun.croc;traceflags=319;CONOPTS=(SSLMode=require;pqopt={sslrootcert=/opt/<MaskedPath>/cacerts/trustedcerts.pem});DRIVER=POSTGRES;CATALOG=FB;DATABASE='DA_SYSCAT_default';;UID=dbmsowner;PWD=<Removed>;SERVER=cole-test2-onpremnnnnn.com;PORT=5431`,
                StringLength1 = TKTS_NTS=-3, OutConnectionString = NULL=0, BufferLength = 0, StringLength2Ptr = NULL=0, DriverCompletion = TKTS_DRIVER_NOPROMPT=0
        Return: TKTS_SUCCESS_WITH_INFO = 0x80fff801
        dbc: szSqlState = `01S02`, *pfNativeError = -2130708291, *pcbErrorMsg = 25
                MessageText = `Current catalog set to FB`
TKTSSetConnectAttr:
        In: ConnectionHandle = hdbc1=0x7f043417c940, Attribute = TKTS_ATTR_AUTOCOMMIT=102, ValuePtr = TKTS_AUTOCOMMIT_ON=1, StringLength = TKTS_IS_UINTEGER=-5
        Return: TKTS_SUCCESS = 0
TKTSAllocHandle:
        In: HandleType = TKTS_HANDLE_STMT=3, InputHandle = hdbc1=0x7f043417c940, OutputHandlePtr = VALID
        Return: TKTS_SUCCESS = 0
        Out: *OutputHandlePtr = hstmt1=0x7f0425306c00

TKTSExecDirect:
        In: StatementHandle = hstmt1=0x7f0425306c00, StatementText = `SET SESSION CLIENT_MIN_MESSAGES=WARNING;`, TextLength = 40
        Return: TKTS_SUCCESS = 0


TKTSExecDirect:
        In: StatementHandle = hstmt1=0x7f0425306c00, StatementText = `CREATE SCHEMA IF NOT EXISTS "FS_SYSCAT";`, TextLength = 41
        Return: TKTS_SUCCESS = 0


TKTSExecDirect:
        In: StatementHandle = hstmt1=0x7f0425306c00, StatementText = `SET SCHEMA 'FS_SYSCAT';`, TextLength = 23
        Return: TKTS_SUCCESS = 0

TKTSFreeHandle:
        In: HandleType = TKTS_HANDLE_STMT=3, Handle = hstmt1=0x7f0425306c00
        Return: TKTS_SUCCESS = 0
TKTSAllocHandle:
        In: HandleType = TKTS_HANDLE_DBC=2, InputHandle = henv1=0x7f043417a0a0, OutputHandlePtr = VALID
        Return: TKTS_SUCCESS = 0
        Out: *OutputHandlePtr = hdbc2=0x7f04253070e0
TKTSGetConnectAttr:
        In: ConnectionHandle = hdbc2=0x7f04253070e0, Attribute = <unknown>=20000, ValuePtr = VALID, BufferLength = TKTS_IS_POINTER=-4, StringLengthPtr = VALID
        Return: TKTS_SUCCESS = 0
        Out: *ValuePtr = 0, *StringLengthPtr = 8
TKTSDriverConnect:
        In: ConnectionHandle = hdbc2=0x7f04253070e0, WindowHandle = TKTS_NULL_HANDLE=0,
                InConnectionstring = `tracefile=/tmp/fun.croc;traceflags=319;CONOPTS=(SSLMode=require;pqopt={sslrootcert=/opt/<MaskedPath>/cacerts/trustedcerts.pem});DRIVER=POSTGRES;CATALOG=FB;DATABASE='DA_SYSCAT_default';;UID=dbmsowner;PWD=<Removed>;SERVER=cole-test2-onpremnnnnn.com;PORT=5431`,
                StringLength1 = TKTS_NTS=-3, OutConnectionString = NULL=0, BufferLength = 0, StringLength2Ptr = NULL=0, DriverCompletion = TKTS_DRIVER_NOPROMPT=0
        Return: TKTS_SUCCESS_WITH_INFO = 0x80fff801
        dbc: szSqlState = `01S02`, *pfNativeError = -2130708291, *pcbErrorMsg = 25
                MessageText = `Current catalog set to FB`
TKTSSetConnectAttr:
        In: ConnectionHandle = hdbc2=0x7f04253070e0, Attribute = TKTS_ATTR_AUTOCOMMIT=102, ValuePtr = TKTS_AUTOCOMMIT_ON=1, StringLength = TKTS_IS_UINTEGER=-5
        Return: TKTS_SUCCESS = 0
TKTSAllocHandle:
        In: HandleType = TKTS_HANDLE_STMT=3, InputHandle = hdbc2=0x7f04253070e0, OutputHandlePtr = VALID
        Return: TKTS_SUCCESS = 0
        Out: *OutputHandlePtr = hstmt2=0x7f0425311d00

TKTSExecDirect:
        In: StatementHandle = hstmt2=0x7f0425311d00, StatementText = `SET SESSION CLIENT_MIN_MESSAGES=WARNING;`, TextLength = 40
        Return: TKTS_SUCCESS = 0


TKTSExecDirect:
        In: StatementHandle = hstmt2=0x7f0425311d00, StatementText = `SET SCHEMA 'FS_SYSCAT';`, TextLength = 23
        Return: TKTS_SUCCESS = 0

TKTSFreeHandle:
        In: HandleType = TKTS_HANDLE_STMT=3, Handle = hstmt2=0x7f0425311d00
        Return: TKTS_SUCCESS = 0
TKTSAllocHandle:
        In: HandleType = TKTS_HANDLE_STMT=3, InputHandle = hdbc1=0x7f043417c940, OutputHandlePtr = VALID
        Return: TKTS_SUCCESS = 0
        Out: *OutputHandlePtr = hstmt3=0x7f0425311d00
TKTSFreeStmt:
        In: StatementHandle = hstmt3=0x7f0425311d00, Option = TKTS_CLOSE=0
        Return: TKTS_SUCCESS = 0

TKTSExecDirect:
        In: StatementHandle = hstmt3=0x7f0425311d00, StatementText = `select "VERSION" from "DATA_SERVICES" where "DS_ID"=1`, TextLength = 53
        Return: TKTS_SUCCESS = 0

TKTSBindCol:
        In: StatementHandle = hstmt3=0x7f0425311d00, ColumnNumber = 1, TargetType = TKTS_C_TKCHAR=3, TargetValuePtr = VALID, BufferLength = 1024, StrLen_or_IndPtr = VALID
        Return: TKTS_SUCCESS = 0
TKTSFetch:
        In: StatementHandle = hstmt3=0x7f0425311d00
        Return: TKTS_SUCCESS = 0
TKTSFreeStmt:
        In: StatementHandle = hstmt3=0x7f0425311d00, Option = TKTS_CLOSE=0
        Return: TKTS_SUCCESS = 0
TKTSFreeHandle:
        In: HandleType = TKTS_HANDLE_STMT=3, Handle = hstmt3=0x7f0425311d00
        Return: TKTS_SUCCESS = 0
TKTSAllocHandle:
        In: HandleType = TKTS_HANDLE_STMT=3, InputHandle = hdbc1=0x7f043417c940, OutputHandlePtr = VALID
        Return: TKTS_SUCCESS = 0
        Out: *OutputHandlePtr = hstmt4=0x7f041b49e6a0
TKTSPrepare:
        In: StatementHandle = hstmt4=0x7f041b49e6a0, StatementText = `SELECT * FROM "DATA_SERVICES" WHERE 1=0`, TextLength = 39
        Return: TKTS_SUCCESS = 0
TKTSFreeStmt:
        In: StatementHandle = hstmt4=0x7f041b49e6a0, Option = TKTS_UNPREPARE=101
        Return: TKTS_SUCCESS = 0

TKTSExecDirect:
        In: StatementHandle = hstmt4=0x7f041b49e6a0, StatementText = `CREATE INDEX "USERS_AUTH_ID_INDEX" ON "ENTITIES" ( "AUTH_ID" )`, TextLength = 62
        Return: TKTS_ERROR = 0x80fff802
        stmt: szSqlState = `42P07`, *pfNativeError = 1, *pcbErrorMsg = 92, *ColumnNumber = -2, *RowNumber = 0
                MessageText = `ERROR: ERROR: relation "USERS_AUTH_ID_INDEX" already exists;
Error while executing the query`


TKTSExecDirect:
        In: StatementHandle = hstmt4=0x7f041b49e6a0, StatementText = `CREATE INDEX "USERS_ENTITY_ID_INDEX" ON "ENTITIES" ( "ENTITY_ID" )`, TextLength = 66
        Return: TKTS_ERROR = 0x80fff802
        stmt: szSqlState = `08S01`, *pfNativeError = 35, *pcbErrorMsg = 174, *ColumnNumber = -2, *RowNumber = 0
                MessageText = `ERROR: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
;
The connection has been lost`

TKTSFreeHandle:
        In: HandleType = TKTS_HANDLE_STMT=3, Handle = hstmt4=0x7f041b49e6a0
        Return: TKTS_SUCCESS = 0
TKTSEndTran:
        In: HandleType = TKTS_HANDLE_DBC=2, Handle = hdbc1=0x7f043417c940, CompletionType = TKTS_ROLLBACK=1
        Return: TKTS_SUCCESS = 0


# Done!


==================================
Workaround:
==================================
The reason is that when we pass a hint, i.e. /*NO LOAD BALANCE*/, with the SELECT statement, pgpool does not terminate the child process with a segmentation fault.
But without the hint it does fail.


==================================
SELECT with Hint:
==================================

TKTSPrepare:
        In: StatementHandle = hstmt4=0x7fd050948d00, StatementText = `/*NO LOAD BALANCE*/ SELECT * FROM "DATA`, TextLength = 39


Tags:
Steps To Reproduce:
Additional Information:
Attached Files: pgpool_20190912_155329.log (1,495,334 bytes) 2019-09-13 05:10
https://www.pgpool.net/mantisbt/file_download.php?file_id=649&type=bug
pgpool.conf (43,213 bytes) 2019-09-13 05:10
https://www.pgpool.net/mantisbt/file_download.php?file_id=648&type=bug
gdb_pgpool_pid_attach.log (7,463 bytes) 2019-09-13 05:20
https://www.pgpool.net/mantisbt/file_download.php?file_id=650&type=bug
pgpool_build_full_log.txt (109,488 bytes) 2019-09-17 04:09
https://www.pgpool.net/mantisbt/file_download.php?file_id=654&type=bug
Notes
(0002834)
t-ishii   
2019-09-12 07:30   
Please provide stack trace of the process terminated by segfault.

Also I need pgpool log enabling log_client_messages.

> Looks like pgpool is doing load balance when it sees SELECT statement even though load_balance_mode is off.
Please provide the whole pgpool.conf. At a minimum you need to provide master_slave_mode and master_slave_sub_mode.
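
One way to capture such a stack trace (a sketch only, assuming pgpool was built with debug symbols and core dumps are enabled on the host; the binary path and core file name are examples):

# allow core files in the shell / service unit that starts pgpool
ulimit -c unlimited
# after a child process segfaults, locate the core file (its location depends on
# kernel.core_pattern) and extract a backtrace:
gdb -batch -ex 'bt full' /usr/bin/pgpool /path/to/core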
(0002843)
venkat_rj   
2019-09-13 05:20   
Uploaded pgpool_20190912_155329.log and pgpool.conf.

Unfortunately the pgpool RPM we currently build in house does not have debugging symbols. We are trying to build one.
I will try to generate the stack trace once I install the debug-enabled RPM.

Attaching the gdb_pgpool_pid_attach.log
(0002844)
t-ishii   
2019-09-13 08:50   
Thanks. pgpool_20190912_155329.log is very helpful. I am going to inspect the log file so that I can create a minimal test case using pgproto (pgproto is a small tool included in the Pgpool-II source code to replay messages, provided in a text file, between a client and pgpool [1]).

[1] https://www.pgpool.net/docs/latest/en/html/pgproto.html
(0002849)
t-ishii   
2019-09-16 10:38   
I found the cause of the segfault. The following is a minimal test case for it:

1. Parse (SELECT 1) to statement foo
2. DEALLOCATE foo
3. Error query (for example CREATE INDEX on already existing index)

At step 3, the data created at step 1 has already been removed. However, pgpool forgot to update the state, and after step 3 it tries to remove data that has already been removed.

I also found an occasional hang while processing the above sequence if load balancing is enabled.
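
A rough way to replay this sequence from a shell through pgpool on port 9999 (a simple-query approximation only; the original failure arrived via extended-protocol Parse messages from ODBC, which is what pgproto replays, and the object names below are placeholders):

psql -h localhost -p 9999 -U postgres postgres <<'SQL'
PREPARE foo AS SELECT 1;                  -- step 1: create a named statement
DEALLOCATE foo;                           -- step 2: deallocate it
CREATE INDEX dup_idx ON some_table (id);  -- step 3: any statement that raises an error,
                                          --         e.g. an index that already exists
SQL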

I have committed/pushed fixes for both to all supported branches. For the 4.0 stable branches, see the commits for more details:

https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=a333a454d6881d380e330cbb85aa9ef02d912631
https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=83c1988d3c8bdd0ecbdf6a3d28371febee556483
(0002850)
venkat_rj   
2019-09-17 00:23   
Thank you so much for fixing the issue and I appreciate your valuable time.
(0002851)
venkat_rj   
2019-09-17 00:30   
For reference, I set the following log parameters to generate the pgpool log file.

log_client_messages = on
log_statement = on
log_error_verbosity = verbose # terse, default, or verbose messages
log_min_messages = debug5 # values in order of decreasing detail:
(0002852)
venkat_rj   
2019-09-17 04:09   
With the new code, the build fails with the error below.
Can you please help?

Build log file: pgpool_build_full_log.txt

Build error:
2019-09-16T14:57:57.034308000Z context/pool_query_context.c: In function 'where_to_send_deallocate':
2019-09-16T14:57:57.034383000Z context/pool_query_context.c:1487: error: 'POOL_QUERY_CONTEXT' has no member named 'load_balance_node_id'
2019-09-16T14:57:57.034404000Z context/pool_query_context.c:1487: error: 'POOL_QUERY_CONTEXT' has no member named 'load_balance_node_id'
2019-09-16T14:57:57.048176000Z make[2]: *** [context/pool_query_context.o] Error 1
2019-09-16T14:57:57.048270000Z make[2]: *** Waiting for unfinished jobs....
2019-09-16T14:57:57.378666000Z make[2]: Leaving directory `/build/BUILD/sas-pgpool-II-4.0.6/src'
2019-09-16T14:57:57.379245000Z make[1]: *** [all-recursive] Error 1
2019-09-16T14:57:57.379321000Z make[1]: Leaving directory `/build/BUILD/sas-pgpool-II-4.0.6/src'
2019-09-16T14:57:57.379921000Z make: *** [all-recursive] Error 1
2019-09-16T14:57:57.380335000Z error: Bad exit status from /var/tmp/rpm-tmp.1km5xn (%build)
2019-09-16T14:57:57.380395000Z
2019-09-16T14:57:57.380418000Z
2019-09-16T14:57:57.380432000Z RPM build errors:
2019-09-16T14:57:57.380447000Z Bad exit status from /var/tmp/rpm-tmp.1km5xn (%build)
2019-09-16T14:57:57.410413000Z pralixx.builder:[ERROR] Error occurred during execution of 'build' method
2019-09-16T14:57:57.426527000Z error: exit status 1
2019-09-16T14:57:57.748756134Z job 01DMX97PVEAFQQ5E0HP121WV61 "sas-pgpool-II-40-4.0.6-20190916.1568645534833" failed: job exited with status 1
(0002854)
t-ishii   
2019-09-17 07:41   
Sorry, https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=83c1988d3c8bdd0ecbdf6a3d28371febee556483 was not needed for 4.0 stable or older branches. The commit has been reverted for those older branches.
(0002861)
venkat_rj   
2019-09-18 23:35   
Thank you so much.

We tested our code with the new fix and it worked.
(0002863)
administrator   
2019-09-19 11:50   
Thank you for your report! Issue resolved.


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
542 [Pgpool-II] Bug major sometimes 2019-09-02 16:53 2019-10-29 14:02
Reporter: avi Platform:  
Assigned To: hoshiai OS:  
Priority: high OS Version:  
Status: feedback Product Version: 3.6.9  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version:  
Summary: pgpool service will not come up after failover
Description: We have a 2-node Postgres 9.6.7 cluster with streaming replication. Pgpool is managed by our cluster software as a managed resource and therefore only one instance is running.

We have seen cases where, when the machine that has the Postgres master and pgpool is shut down, Postgres on the other machine will assume the role of master. The cluster also attempts to start pgpool, but the pgpool service fails to start.

In such a case the postgres database can be accessed on port 5432, but pgpool does not come up and we cannot access the database on port 9999. I do not know why, but at some point the cluster manages to bring up pgpool. Attached please find the log messages generated when I ran pgpool in debug mode.

172.18.255.42 is the down server
172.18.255.41 is the active server on which pgpool does not come up

2019-09-01 15:59 is the time the cluster managed to bring up pgpool

Your help is most needed.
Tags:
Steps To Reproduce:
Additional Information:
Attached Files: pgpool_messages.zip (2,020,083 bytes) 2019-09-02 16:53
https://www.pgpool.net/mantisbt/file_download.php?file_id=643&type=bug
pgpool.conf (34,738 bytes) 2019-09-02 16:53
https://www.pgpool.net/mantisbt/file_download.php?file_id=642&type=bug
Notes
(0002890)
hoshiai   
2019-09-27 16:39   
Sorry for the late reply.

Looking at your log file, I am concerned that pgpool was stopped and started many times using the pgpool command.
Why do you run the pgpool stop and start commands so many times?

If you don't run the pgpool stop command, is the problem resolved?
(0002908)
avi   
2019-10-03 18:54   
Thanks for the reply. We have a shell script that checks Postgres and pgpool health every 1 minute. If the script sees that it can successfully access the database on port 5432 but not on port 9999 (and the VIP), it will attempt to restart pgpool.

As you can see, it attempted this many times but kept failing to access the database on port 9999, so it kept on trying. At some point, which I assume is when some timeout expires, pgpool is restarted and everything works OK.
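
A sketch of what such a check might look like (the actual script is not attached; pg_isready, the port numbers, and the restart command are assumptions based on the description above):

#!/bin/bash
# hypothetical health check, run every minute from cron
if pg_isready -h 127.0.0.1 -p 5432 >/dev/null 2>&1 && \
   ! pg_isready -h 127.0.0.1 -p 9999 >/dev/null 2>&1; then
    # PostgreSQL answers directly but pgpool does not: restart pgpool
    systemctl restart pgpool    # or however pgpool is managed on the host
fi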
(0002945)
hoshiai   
2019-10-29 14:02   
I see. The cause is the repeated restarting of pgpool by your script. When the primary PostgreSQL node (172.18.255.42) is down, pgpool needs to promote the standby node, but the promotion never finishes because pgpool is restarted so many times.


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
553 [Pgpool-II] General minor N/A 2019-10-11 10:51 2019-10-29 13:24
Reporter: sebastianwebber Platform:  
Assigned To: OS:  
Priority: normal OS Version:  
Status: resolved Product Version: 3.6.17  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version:  
Summary: transaction mode in pgpool
Description: Hi Everyone,

first of all, please forgive me if this bug tracker is not the proper place to ask questions.

In my current environment, I'm using both pgpool and pgbouncer. Basically, our application connects to pgbouncer, which connects to pgpool, and pgpool parses the queries and routes them between our primary server and the database replicas.

I was wondering whether we can use pgpool in the same "transaction mode"[1] that pgbouncer offers. Is this possible?

By the way, transaction mode means that the pool holds a server connection only for the duration of a transaction; when the transaction ends, the connection is put back in the pool, making it available to other clients.

I saw that the connection_cache and max_pool[2] parameters can enable a "connection cache", but it's not clear whether they work in the same way.
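
For comparison, the two settings being discussed look like this (illustrative snippets; pool_mode belongs to pgbouncer.ini, while connection_cache and max_pool belong to pgpool.conf):

# pgbouncer.ini -- the server connection is released when the transaction ends
pool_mode = transaction

# pgpool.conf -- each child process caches backend connections per user/database pair;
# the connection stays bound to the client session rather than being returned per transaction
connection_cache = on
max_pool = 4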

Thanks for your help!
[1] http://www.pgbouncer.org/config.html#pool_mode
[2] https://www.pgpool.net/docs/latest/en/html/runtime-config-connection.html#GUC-NUM-INIT-CHILDREN
Tags:
Steps To Reproduce: NA
Additional Information:
Attached Files:
Notes
(0002930)
pengbo   
2019-10-17 08:59   
> I was wondering if can we use pgpool in the same "transaction mode"[1] that pgbouncer does. is this possible?

No. Pgpool-II doesn't have "transaction mode".


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
554 [Pgpool-II] Bug minor always 2019-10-16 18:14 2019-10-21 17:06
Reporter: nglp Platform:  
Assigned To: pengbo OS:  
Priority: low OS Version:  
Status: resolved Product Version: 4.0.4  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version:  
Summary: stale socket files after system crash
Description: Hi,

We've detected that pgpool doesn't start after a system crash (e.g. an electrical outage), because stale socket files remain in socket_dir/pcp_socket_dir/wd_ipc_socket_dir.

Socket files are:
.s.PGPOOLWD_CMD.9000
.s.PGSQL.9898
.s.PGSQL.9999

Our environment:
PGPool 4.0.4 with attached config
Redhat 7.6

We've worked around this issue with these commands in the systemd unit file (if anyone hits this issue, it may help in the meantime):
ExecStartPre=/bin/bash -c "{ /usr/bin/test -s /run/pgpool/pgpool.pid && ! /usr/bin/pgrep --pidfile /run/pgpool/pgpool.pid; } || { /usr/bin/test ! -s /run/pgpool/pgpool.pid && ! /usr/bin/pgrep pgpool; } && { /usr/bin/test -S /tmp/.s.PGPOOLWD_CMD.9000 && /bin/rm -f /tmp/.s.PGPOOLWD_CMD.9000; } || /usr/bin/test ! -S /tmp/.s.PGPOOLWD_CMD.9000"
ExecStartPre=/bin/bash -c "{ /usr/bin/test -s /run/pgpool/pgpool.pid && ! /usr/bin/pgrep --pidfile /run/pgpool/pgpool.pid; } || { /usr/bin/test ! -s /run/pgpool/pgpool.pid && ! /usr/bin/pgrep pgpool; } && { /usr/bin/test -S /tmp/.s.PGSQL.9898 && /bin/rm -f /tmp/.s.PGSQL.9898; } || /usr/bin/test ! -S /tmp/.s.PGSQL.9898"
ExecStartPre=/bin/bash -c "{ /usr/bin/test -s /run/pgpool/pgpool.pid && ! /usr/bin/pgrep --pidfile /run/pgpool/pgpool.pid; } || { /usr/bin/test ! -s /run/pgpool/pgpool.pid && ! /usr/bin/pgrep pgpool; } && { /usr/bin/test -S /tmp/.s.PGSQL.9999 && /bin/rm -f /tmp/.s.PGSQL.9999; } || /usr/bin/test ! -S /tmp/.s.PGSQL.9999"

Any question we will be happy to answer

Thanks, Guille
Tags:
Steps To Reproduce: * start pgpool (with start on boot enabled)
* kill machine (ex kill virtual machine, remove power, force panic, etc)
* pgpool doesn't start automatically because of those files, and keeps failing until they are deleted manually
Additional Information:
Attached Files: pgpool.conf (4,479 bytes) 2019-10-16 18:14
https://www.pgpool.net/mantisbt/file_download.php?file_id=683&type=bug
Notes
(0002931)
pengbo   
2019-10-17 10:27   
Yes.
If the socket files are left behind, Pgpool-II startup will fail.
For now, you need to remove the socket files manually before starting Pgpool-II.
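
With the socket paths from this report, the manual cleanup before starting Pgpool-II would be:

rm -f /tmp/.s.PGSQL.9999 /tmp/.s.PGSQL.9898 /tmp/.s.PGPOOLWD_CMD.9000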
(0002932)
nglp   
2019-10-17 15:16   
Hi pengbo,

So it's normal behaviour? Could I ask you why? I mean, what is the reason for that?

We will go ahead with our workaround.

Thanks for your response

Best regards, Guille
(0002934)
pengbo   
2019-10-21 16:49   
> So its a normal behaviour, could i ask you why? i mean, what is the reason for that

Because at startup Pgpool-II doesn't check if the PID or socket files exist.
So, we would like to improve the startup process.

Thank you for your comment.
(0002935)
nglp   
2019-10-21 17:03   
Then, is the check going to be implemented in the code?

If not, please feel free to close this issue.

Many thanks, Guille
(0002936)
pengbo   
2019-10-21 17:06   
> Then, check is going to be implemented in code?

Yes. It will be implemented in the next major release.


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
541 [Pgpool-II] Bug feature always 2019-08-31 08:03 2019-09-06 07:18
Reporter: Carlos Mendez Platform:  
Assigned To: t-ishii OS:  
Priority: high OS Version:  
Status: resolved Product Version: 3.7.1  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version:  
Summary: Unable to start Virtual IP for Pgpool
Description: Environment Details:

pgpool_01 IP: 10.241.166.21
VIP: 10.241.166.24

PRIMARY DB IP: 10.241.166.32
STANDBY DB IP: 10.241.166.33

***********************************************************************************
After starting the pgpool service, the virtual IP is not available.

# /usr/share/pgpool-II/3.7.1/bin/pgpool -D -n > /var/log/pgpool/pgpool.log 2>&1 &

***********************************************************************************
In the log file the last message is:


2019-08-30 12:16:16: pid 32201: WARNING: checking setuid bit of if_up_cmd
2019-08-30 12:16:16: pid 32201: DETAIL: ifup[/usr/sbin/ip] doesn't have setuid bit
2019-08-30 12:16:16: pid 32201: WARNING: checking setuid bit of if_down_cmd
2019-08-30 12:16:16: pid 32201: DETAIL: ifdown[/usr/sbin/ip] doesn't have setuid bit
2019-08-30 12:16:16: pid 32201: WARNING: checking setuid bit of arping command
2019-08-30 12:16:16: pid 32201: DETAIL: arping[/usr/sbin/arping] doesn't have setuid bit
2019-08-30 12:16:16: pid 32201: LOG: Backend status file /var/log/pgpool/pgpool_status discarded
2019-08-30 12:16:16: pid 32201: LOG: waiting for watchdog to initialize
2019-08-30 12:16:16: pid 32202: LOG: setting the local watchdog node name to "10.241.166.21:9999 Linux pgpool_01"
2019-08-30 12:16:16: pid 32202: LOG: watchdog cluster is configured with 2 remote nodes
2019-08-30 12:16:16: pid 32202: LOG: watchdog remote node:0 on pgpool_02:9000
2019-08-30 12:16:16: pid 32202: LOG: watchdog remote node:1 on pgpool_03:9000
2019-08-30 12:16:16: pid 32202: LOG: interface monitoring is disabled in watchdog
2019-08-30 12:16:16: pid 32202: LOG: watchdog node state changed from [DEAD] to [LOADING]
2019-08-30 12:16:21: pid 32202: LOG: watchdog node state changed from [LOADING] to [JOINING]
2019-08-30 12:16:25: pid 32202: LOG: watchdog node state changed from [JOINING] to [INITIALIZING]
2019-08-30 12:16:26: pid 32202: LOG: I am the only alive node in the watchdog cluster
2019-08-30 12:16:26: pid 32202: HINT: skiping stand for coordinator state
2019-08-30 12:16:26: pid 32202: LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
2019-08-30 12:16:26: pid 32202: LOG: I am announcing my self as master/coordinator watchdog node
2019-08-30 12:16:30: pid 32202: LOG: I am the cluster leader node
2019-08-30 12:16:30: pid 32202: DETAIL: our declare coordinator message is accepted by all nodes
2019-08-30 12:16:30: pid 32202: LOG: setting the local node "10.241.166.21:9999 Linux pgpool_01" as watchdog cluster master
2019-08-30 12:16:30: pid 32202: LOG: I am the cluster leader node but we do not have enough nodes in cluster
2019-08-30 12:16:30: pid 32202: DETAIL: waiting for the quorum to start escalation process
2019-08-30 12:16:30: pid 32201: LOG: watchdog process is initialized
2019-08-30 12:16:30: pid 32202: LOG: new IPC connection received
2019-08-30 12:16:30: pid 32202: LOG: new IPC connection received
2019-08-30 12:16:30: pid 32217: LOG: 3 watchdog nodes are configured for lifecheck
2019-08-30 12:16:30: pid 32217: LOG: watchdog nodes ID:0 Name:"10.241.166.21:9999 Linux pgpool_01"
2019-08-30 12:16:30: pid 32217: DETAIL: Host:"10.241.166.21" WD Port:9000 pgpool-II port:9999
2019-08-30 12:16:30: pid 32217: LOG: watchdog nodes ID:1 Name:"Not_Set"
2019-08-30 12:16:30: pid 32217: DETAIL: Host:"pgpool_02" WD Port:9000 pgpool-II port:9999
2019-08-30 12:16:30: pid 32217: LOG: watchdog nodes ID:2 Name:"Not_Set"
2019-08-30 12:16:30: pid 32217: DETAIL: Host:"pgpool_03" WD Port:9000 pgpool-II port:9999
2019-08-30 12:16:30: pid 32201: LOG: Setting up socket for 0.0.0.0:9999
2019-08-30 12:16:30: pid 32201: LOG: Setting up socket for :::9999
2019-08-30 12:16:30: pid 32217: LOG: watchdog lifecheck trusted server "10.241.166.21" added for the availability check
2019-08-30 12:16:30: pid 32217: LOG: watchdog lifecheck trusted server "10.241.166.22" added for the availability check
2019-08-30 12:16:30: pid 32217: LOG: watchdog lifecheck trusted server "10.241.166.23" added for the availability check
2019-08-30 12:16:30: pid 32201: LOG: find_primary_node_repeatedly: waiting for finding a primary node
2019-08-30 12:16:30: pid 32201: LOG: find_primary_node: checking backend no 0
2019-08-30 12:16:30: pid 32201: LOG: find_primary_node: primary node id is 0
2019-08-30 12:16:30: pid 32201: LOG: pgpool-II successfully started. version 3.7.1 (amefuriboshi)
2019-08-30 12:16:31: pid 32218: LOG: set SO_REUSEPORT option to the socket
2019-08-30 12:16:31: pid 32218: LOG: creating watchdog heartbeat receive socket.
2019-08-30 12:16:31: pid 32218: DETAIL: set SO_REUSEPORT
2019-08-30 12:16:31: pid 32219: LOG: set SO_REUSEPORT option to the socket
2019-08-30 12:16:31: pid 32219: LOG: creating socket for sending heartbeat
2019-08-30 12:16:31: pid 32219: DETAIL: set SO_REUSEPORT
2019-08-30 12:16:31: pid 32221: LOG: set SO_REUSEPORT option to the socket
2019-08-30 12:16:31: pid 32221: LOG: creating watchdog heartbeat receive socket.
2019-08-30 12:16:31: pid 32221: DETAIL: set SO_REUSEPORT
2019-08-30 12:16:31: pid 32223: LOG: set SO_REUSEPORT option to the socket
2019-08-30 12:16:31: pid 32223: LOG: creating socket for sending heartbeat
2019-08-30 12:16:31: pid 32223: DETAIL: set SO_REUSEPORT

******************************************************************************************

The following commands for assigning the VIP work without any issues when executed directly:

if_up_cmd = 'ip addr add 10.241.166.24/25 dev eth0 label eth0:0'
if_down_cmd = 'ip addr del 10.241.166.24/25 dev eth0'


********************************************************************************************

According to the log, I think the issue could be related to the SO_REUSEPORT option, but I'm not sure about this.
Please let me know if you have faced this kind of error.
Tags: pgpool 3.7.1, settings
Steps To Reproduce: starting pgpool services

# /usr/share/pgpool-II/3.7.1/bin/pgpool -D -n > /var/log/pgpool/pgpool.log 2>&1 &
Additional Information:
Attached Files:
Notes
(0002807)
t-ishii   
2019-09-02 09:28   
I noticed two things:

1) Pgpool-II 3.7.1 is quite old. I strongly recommend upgrading to the newest release (3.7.11 at this point).

2) Probably the VIP was not set because there was only 1 live watchdog node. I am not sure whether this is the expected behavior of the watchdog, though.
(0002808)
Carlos Mendez   
2019-09-02 12:39   
Hi T-Ishii,
Thanks for your comments.
1. I will check with operations about moving to the newest version.

2. On the second point, according to your comments, is it then necessary to have more pgpools running?

Our configuration includes 3 pgpools, but at this moment we are only trying to start just one.

Tomorrow during my day I will try to start all 3 pgpools and check this. Anyway, if you have any other ideas or comments about the issue, please let me know.


Regards
(0002809)
t-ishii   
2019-09-02 13:07   
> 2. For the second point according to your comments then it's necessary to have more pgpools?
>
> Our configuration include 3 pgpools but at this moment we are only trying to start just one
Yes, if you have 3 pgpools, then you should start at least two of them to let "quorum" exist.
(0002811)
Carlos Mendez   
2019-09-03 01:17   
T-Ishii

According to your last update, after starting the services on the second pgpool the VIP was assigned to the first server. This is something that I did not know; thanks for your support.

In the log for pgpool 1 I was able to see the following messages:


2019-09-02 06:01:40: pid 32202: LOG: new watchdog node connection is received from "10.241.166.22:60088"
2019-09-02 06:01:40: pid 32202: LOG: new node joined the cluster hostname:"pgpool_02" port:9000 pgpool_port:9999
2019-09-02 06:01:41: pid 32202: LOG: adding watchdog node "pgpool_02:9999 Linux pgpool_02" to the standby list
2019-09-02 06:01:41: pid 32202: LOG: quorum found
2019-09-02 06:01:41: pid 32202: DETAIL: starting escalation process
2019-09-02 06:01:41: pid 32202: LOG: escalation process started with PID:6889
2019-09-02 06:01:41: pid 6889: LOG: watchdog: escalation started
2019-09-02 06:01:41: pid 32201: LOG: Pgpool-II parent process received watchdog quorum change signal from watchdog
2019-09-02 06:01:41: pid 32202: LOG: new IPC connection received
2019-09-02 06:01:41: pid 32201: LOG: watchdog cluster now holds the quorum
2019-09-02 06:01:41: pid 32201: DETAIL: updating the state of quarantine backend nodes
2019-09-02 06:01:41: pid 32202: LOG: new IPC connection received
2019-09-02 06:01:45: pid 6889: LOG: successfully acquired the delegate IP:"10.241.166.24"
2019-09-02 06:01:45: pid 6889: DETAIL: 'if_up_cmd' returned with success
2019-09-02 06:01:45: pid 32202: LOG: watchdog escalation process with pid: 6889 exit with SUCCESS.
2019-09-02 06:04:18: pid 32286: LOG: new connection received
2019-09-02 06:04:18: pid 32286: DETAIL: connecting host=10.241.156.104 port=64471
2019-09-02 06:04:23: pid 32308: LOG: new connection received
2019-09-02 06:04:23: pid 32308: DETAIL: connecting host=10.241.156.104 port=64474
2019-09-02 06:11:58: pid 32202: LOG: new watchdog node connection is received from "10.241.166.23:38023"
2019-09-02 06:11:58: pid 32202: LOG: new node joined the cluster hostname:"pgpool_03" port:9000 pgpool_port:9999
2019-09-02 06:11:59: pid 32202: LOG: adding watchdog node "pgpool_03:9999 Linux pgpool_03" to the standby list
2019-09-02 06:13:10: pid 32217: LOG: watchdog: lifecheck started


Regards
CAR
(0002812)
t-ishii   
2019-09-03 09:30   
Thanks for the report. Now I wonder, if you shut down pgpool1 (i.e. pgpool0 becomes the only live node), whether the VIP is released or not.
I know that if you start the whole cluster with only 1 node the VIP is not assigned, but this is another scenario.

Can you please test it out?
(0002814)
Carlos Mendez   
2019-09-04 02:11   
Hi T-ishii

At this moment we were able to start the VIP 10.241.166.24. This is our configuration now:
pgpool_01 10.241.166.21
pgpool_02 10.241.166.22
pgpool_03 10.241.166.23
Primary DB 10.241.166.32
Standby DB 10.241.166.33

If I start PGPOOL01 and some other PGPOOL (PGPOOL2 or PGPOOL3), the VIP is started on PGPOOL1.
But if I stop the services on pgpool1, the VIP is released but is not started on the remaining pgpools (pgpool2 or pgpool3),
and service to the primary DB is affected.

In the logs we can see this:

STARTING SERVICES

PGPOOL1 LOG

2019-09-03 12:04:50: pid 16453: LOG: set SO_REUSEPORT option to the socket
2019-09-03 12:04:50: pid 16453: LOG: creating watchdog heartbeat receive socket.
2019-09-03 12:04:50: pid 16453: DETAIL: set SO_REUSEPORT
2019-09-03 12:04:56: pid 16438: LOG: new watchdog node connection is received from "10.241.166.22:28397"
2019-09-03 12:04:56: pid 16438: LOG: new node joined the cluster hostname:"pgpool_02" port:9000 pgpool_port:9999
2019-09-03 12:04:57: pid 16438: LOG: adding watchdog node "pgpool_02:9999 Linux pgpool_02" to the standby list
2019-09-03 12:04:57: pid 16438: LOG: quorum found
2019-09-03 12:04:57: pid 16438: DETAIL: starting escalation process
2019-09-03 12:04:57: pid 16438: LOG: escalation process started with PID:16568
2019-09-03 12:04:57: pid 16568: LOG: watchdog: escalation started
2019-09-03 12:04:57: pid 16437: LOG: Pgpool-II parent process received watchdog quorum change signal from watchdog
2019-09-03 12:04:57: pid 16438: LOG: new IPC connection received
2019-09-03 12:04:57: pid 16437: LOG: watchdog cluster now holds the quorum
2019-09-03 12:04:57: pid 16437: DETAIL: updating the state of quarantine backend nodes
2019-09-03 12:04:57: pid 16438: LOG: new IPC connection received
2019-09-03 12:05:01: pid 16568: LOG: successfully acquired the delegate IP:"10.241.166.24"
2019-09-03 12:05:01: pid 16568: DETAIL: 'if_up_cmd' returned with success
2019-09-03 12:05:01: pid 16438: LOG: watchdog escalation process with pid: 16568 exit with SUCCESS.
2019-09-03 12:05:56: pid 16438: LOG: new watchdog node connection is received from "10.241.166.23:51383"
2019-09-03 12:05:56: pid 16438: LOG: new node joined the cluster hostname:"pgpool_03" port:9000 pgpool_port:9999
2019-09-03 12:05:57: pid 16438: LOG: adding watchdog node "pgpool_03:9999 Linux pgpool_03" to the standby list
2019-09-03 12:06:29: pid 16452: LOG: watchdog: lifecheck started


*************************************************************************************************************
STOPPING SERVICES PGPOOL1

PGPOOL1 LOG:

2019-09-03 12:07:37: pid 16438: LOG: Watchdog is shutting down
2019-09-03 12:07:37: pid 16720: LOG: watchdog: de-escalation started
2019-09-03 12:07:37: pid 16720: LOG: successfully released the delegate IP:"10.241.166.24"
2019-09-03 12:07:37: pid 16720: DETAIL: 'if_down_cmd' returned with success

PGPOOL 2 LOG

2019-09-03 12:07:37: pid 20356: LOG: remote node "pgpool_01:9999 Linux pgpool_01" is shutting down
2019-09-03 12:07:37: pid 20356: LOG: watchdog cluster has lost the coordinator node
2019-09-03 12:07:37: pid 20356: LOG: unassigning the remote node "pgpool_01:9999 Linux pgpool_01" from watchdog cluster master
2019-09-03 12:07:37: pid 20356: LOG: We have lost the cluster master node "pgpool_01:9999 Linux pgpool_01"
2019-09-03 12:07:37: pid 20356: LOG: watchdog node state changed from [STANDBY] to [JOINING]
2019-09-03 12:07:41: pid 20356: LOG: watchdog node state changed from [JOINING] to [INITIALIZING]
2019-09-03 12:07:42: pid 20356: LOG: I am the only alive node in the watchdog cluster
2019-09-03 12:07:42: pid 20356: HINT: skiping stand for coordinator state
2019-09-03 12:07:42: pid 20356: LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
2019-09-03 12:07:42: pid 20356: LOG: I am announcing my self as master/coordinator watchdog node
2019-09-03 12:07:46: pid 20356: LOG: I am the cluster leader node
2019-09-03 12:07:46: pid 20356: DETAIL: our declare coordinator message is accepted by all nodes
2019-09-03 12:07:46: pid 20356: LOG: setting the local node "pgpool_02:9999 Linux pgpool_02" as watchdog cluster master
2019-09-03 12:07:46: pid 20356: LOG: I am the cluster leader node but we do not have enough nodes in cluster
2019-09-03 12:07:46: pid 20356: DETAIL: waiting for the quorum to start escalation process
2019-09-03 12:07:46: pid 20355: LOG: Pgpool-II parent process received watchdog quorum change signal from watchdog
2019-09-03 12:07:46: pid 20356: LOG: new IPC connection received
2019-09-03 12:07:46: pid 20356: LOG: new IPC connection received


PGPOOL3 LOG

2019-09-03 12:07:37: pid 20870: LOG: remote node "pgpool_01:9999 Linux pgpool_01" is shutting down
2019-09-03 12:07:37: pid 20870: LOG: watchdog cluster has lost the coordinator node
2019-09-03 12:07:37: pid 20870: LOG: unassigning the remote node "pgpool_01:9999 Linux pgpool_01" from watchdog cluster master
2019-09-03 12:07:37: pid 20870: LOG: We have lost the cluster master node "pgpool_01:9999 Linux pgpool_01"
2019-09-03 12:07:37: pid 20870: LOG: watchdog node state changed from [STANDBY] to [JOINING]
2019-09-03 12:07:41: pid 20870: LOG: watchdog node state changed from [JOINING] to [INITIALIZING]
2019-09-03 12:07:42: pid 20870: LOG: I am the only alive node in the watchdog cluster
2019-09-03 12:07:42: pid 20870: HINT: skiping stand for coordinator state
2019-09-03 12:07:42: pid 20870: LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
2019-09-03 12:07:42: pid 20870: LOG: I am announcing my self as master/coordinator watchdog node
2019-09-03 12:07:46: pid 20870: LOG: I am the cluster leader node
2019-09-03 12:07:46: pid 20870: DETAIL: our declare coordinator message is accepted by all nodes
2019-09-03 12:07:46: pid 20870: LOG: setting the local node "pgpool_03:9999 Linux pgpool_03" as watchdog cluster master
2019-09-03 12:07:46: pid 20870: LOG: I am the cluster leader node but we do not have enough nodes in cluster
2019-09-03 12:07:46: pid 20870: DETAIL: waiting for the quorum to start escalation process
2019-09-03 12:07:46: pid 20870: LOG: new IPC connection received



As we can see, the VIP is released by pgpool1 but it is not started on any other pgpool.
Any comments would help me.

Regards
(0002815)
t-ishii   
2019-09-04 07:28   
I think it's expected behavior. If only one node out of 3 is alive, there is no quorum, so pgpool won't bring up the VIP. You need to keep at least 2 nodes alive to bring up the VIP.
(0002816)
t-ishii   
2019-09-04 10:41   
(Last edited: 2019-09-04 11:57)
Wait. If pgpool2 and pgpool3 are alive, the VIP should be brought up.
Can you show the result of the following commands:

pcp_watchdog_info -v -p pcp_port -h 10.241.166.22
pcp_watchdog_info -v -p pcp_port -h 10.241.166.23

where "pcp_port" is your setting in pgpool.conf.

(0002817)
Carlos Mendez   
2019-09-05 01:42   
Hi Buddy
Yes, that was the situation: pgpool1 was down and pgpool2 and pgpool3 were standby; the VIP was released by pgpool1 but it was not able to start on any other pgpool (2 or 3).

Anyway, it looks like we have fixed this situation, and the issue was with a port: since we stopped the firewall on the other pgpool servers (pgpool2 and pgpool3), the VIP is able to start when we stop pgpool1.

#systemctl stop firewalld
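A less drastic alternative (a sketch, not from the original report) is to keep firewalld running and open only the ports Pgpool-II needs on every node. The numbers below assume the ports seen in this setup (9999 for pgpool, 9000 for the watchdog) plus the stock defaults for the pcp port (9898) and the watchdog heartbeat (9694/udp); adjust them to the values in your pgpool.conf:

firewall-cmd --permanent --add-port=9999/tcp   # pgpool port
firewall-cmd --permanent --add-port=9898/tcp   # pcp_port
firewall-cmd --permanent --add-port=9000/tcp   # wd_port (watchdog)
firewall-cmd --permanent --add-port=9694/udp   # watchdog heartbeat (udp)
firewall-cmd --reload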


Thanks
(0002818)
t-ishii   
2019-09-05 09:14   
Thanks for the feedback. Can we close this issue?
(0002822)
Carlos Mendez   
2019-09-05 23:57   
Hi Buddy
Yes, the issue can be closed. Thanks for your support.

Regards


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
539 [Pgpool-II] Bug crash random 2019-08-26 21:38 2019-09-02 15:13
Reporter: RKo_ Platform: Linux  
Assigned To: t-ishii OS: CentOS  
Priority: normal OS Version: 7.6.1810  
Status: feedback Product Version: 4.0.5  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version: 4.0.6  
    Target Version:  
Summary: Random segfault crashes of pgpool
Description: Pgpool-II-96 is run on the master server of a master-slave combo. The Postgres version is 9.6.14. Randomly, pgpool crashes with a segfault but becomes responsive again rather quickly after the service is automatically restarted. On some occasions monitoring happens to run at exactly the same time, but mostly the crashes have happened without anyone noticing them. Eventually these 1-minute breaks in monitoring were noticed, and the logs showed segfault errors from pgpool.

Will attach dmesg errors, /var/log/messages, pgpool configs and core dumps of pgpool.
Tags:
Steps To Reproduce:
Additional Information:
Attached Files: pgpool_coredumps.zip (1,615,654 bytes) 2019-08-26 21:38
https://www.pgpool.net/mantisbt/file_download.php?file_id=641&type=bug
messages.txt (300 bytes) 2019-08-26 21:38
https://www.pgpool.net/mantisbt/file_download.php?file_id=640&type=bug
core-pgpool-11-26-26-123586-1566799999.txt (9,184 bytes) 2019-08-26 21:38
https://www.pgpool.net/mantisbt/file_download.php?file_id=639&type=bug
core-pgpool-11-26-26-92411-1566813128.txt (9,184 bytes) 2019-08-26 21:38
https://www.pgpool.net/mantisbt/file_download.php?file_id=638&type=bug
pgpool.conf (2,408 bytes) 2019-08-26 21:38
https://www.pgpool.net/mantisbt/file_download.php?file_id=637&type=bug
Notes
(0002802)
t-ishii   
2019-08-27 07:51   
(Last edited: 2019-08-28 07:29)
I think this has been already fixed in 4.0.6.
https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=b6d77f470d1506a467776614216e2d727e4d2344

(0002805)
RKo_   
2019-08-28 14:00   
Ok, I shall try 4.0.6. I will close the issue and reopen it if a segfault occurs again.


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
480 [Pgpool-II] General minor have not tried 2019-03-26 17:37 2019-08-26 10:04
Reporter: ben Platform: linux  
Assigned To: pengbo OS: Ubuntu  
Priority: normal OS Version: 16.04.5 LTS  
Status: feedback Product Version: 3.4.3  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version:  
Summary: FATAL: unable to read data from DB node 0 Detail: EOF encountered with backend.
Description: Hi all,

I have two servers running PostgreSQL Master/Slave with slot replication.
I also have a PostgreSQL HA scenario with pgpool-II.

All servers [virtual machines] are running:
Ubuntu 16.04.5 LTS
postgresql-9-5-pgpool2 version 3.4.3-1
pgpool2 version 3.4.3-1
Tomcat7
psql (9.5.14)

Server PgpoolII: Tomcat7 + pgpool_master (primary), IP: x.x.x.x
Server PgpoolII: Tomcat7 + pgpool_slave (standby), IP: x.x.x.x

Server Master: PGsql backend (primary, node 0), IP: x.x.x.x
Server Slave: PGsql backend (standby, node 1), IP: x.x.x.x


Use Case:
What happens if the Master Server [PGsql Backend - (node 0)] is completely turned off?

1. Standby Server is automatically converted to Master Server (using the failover_command of the pgpool process).

Before turning off the server:
root@pgps2:~# su - postgres
postgres@pgps2:~$ psql -h localhost -p 5433 -c "show pool_nodes" postgres
 node_id | hostname | port | status | lb_weight | role
---------+---------------+------+--------+-----------+---------
 0 | x.x.x.x | 5432 | 2 | 0.500000 | primary
 1 | x.x.x.x | 5432 | 2 | 0.500000 | standby
(2 rows)


After turning off the server:
root@pgps2:~# su - postgres
postgres@pgps2:~$ psql -h localhost -p 5433 -c "show pool_nodes" postgres
 node_id | hostname | port | status | lb_weight | role
---------+---------------+------+--------+-----------+---------
 0 | x.x.x.x | 5432 | 3 | 0.500000 | standby
 1 | x.x.x.x | 5432 | 2 | 0.500000 | primary
(2 rows)


2. The Web Application (from Tomcat7 - pgpool_master) no longer has any connection to the database.

From Firefox:

Internal Error
org.hibernate.exception.JDBCConnectionException: could not inspect JDBC autocommit mode
javax.persistence.PersistenceException: org.hibernate.exception.JDBCConnectionException: could not inspect JDBC autocommit mode at


3. The following error is received:

Pgpool LOG:

org.postgresql.util.PSQLException: FATAL: unable to read data from DB node 0 Detail: EOF encountered with backend.


4. The two Tomcat7/Pgpool2 servers show that there is no connection to the database.

postgres@pgps2:~$ psql -h x.x.x.x -p 5433 -c "show pool_nodes" postgres
psql: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
postgres@pgps2:~$



What can be done here so that pgpool does not cut the connection to the database, and so that pgpool2 and Tomcat7 (the web application) still recognize the connection to the database?


Best regards; any help is appreciated in advance.

Ben
Tags:
Steps To Reproduce:
Additional Information:
Attached Files:
Notes
(0002465)
pengbo   
2019-03-27 09:02   
Does it also occur in the latest version of 3.4.x?
(0002491)
ben   
2019-04-01 16:13   
I have tested version pgpool-3.6.1 and it has the same problem.

Log pgpool:
pid 6437: LOG: failed to connect to PostgreSQL server on "x.x.x.x:5432", getsockopt() detected error "connection refused"
pid 6437: ERROR: failed to make persistent db connection

Log tomcat7:
org.postgresql.util.PSQLException: This connection has been closed.

Here is a URL: http://www.sraoss.jp/pipermail/pgpool-general/2018-December/006382.html
Does this URL describe a problem similar to mine?
Is there any idea how to solve this?
(0002492)
pengbo   
2019-04-01 16:40   
If possible, could you try pgpool-II 3.6.16?

Could you provide the debug log of Pgpool-II, or a test program to help us reproduce this issue?
(0002799)
pengbo   
2019-08-26 10:04   
No response for more than one month.
May I close this issue?


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
525 [Pgpool-II] Bug major sometimes 2019-06-25 14:28 2019-08-15 19:15
Reporter: guobo507 Platform: x86_64  
Assigned To: t-ishii OS: CentOS  
Priority: normal OS Version: 7.6.1810 (Core)  
Status: resolved Product Version: 4.0.3  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version: 4.0.6  
    Target Version: 4.0.6  
Summary: Segmentation fault occurs when memory_cache_enable is set to on
Description: Hi there,
When memory_cache_enabled = on, whether using shm or memcached, my Java project always gets the following error:

org.hibernate.exception.JDBCConnectionException: could not execute query
        at org.hibernate.exception.internal.SQLStateConversionDelegate.convert(SQLStateConversionDelegate.java:115)
        at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:42)
        at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:109)
        at org.hibernate.loader.Loader.doList(Loader.java:2614)
        at org.hibernate.loader.Loader.doList(Loader.java:2594)
        at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2423)
        at org.hibernate.loader.Loader.list(Loader.java:2418)
        at org.hibernate.loader.hql.QueryLoader.list(QueryLoader.java:501)
        at org.hibernate.hql.internal.ast.QueryTranslatorImpl.list(QueryTranslatorImpl.java:371)
        at org.hibernate.engine.query.spi.HQLQueryPlan.performList(HQLQueryPlan.java:216)
        at org.hibernate.internal.SessionImpl.list(SessionImpl.java:1326)
        at org.hibernate.internal.QueryImpl.list(QueryImpl.java:87)
        at com.yinhai.core.service.ta3.domain.dao.HibernateDAO.selectFildsList(HibernateDAO.java:439)
        at com.yinhai.modules.codetable.domain.dao.impl.Aa10a1DaoImpl.selectCodeList(Aa10a1DaoImpl.java:60)
        at com.yinhai.modules.codetable.domain.dao.impl.Aa10a1DaoImpl$$FastClassBySpringCGLIB$$6fcdd538.invoke(<generated>)
        at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
        at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:721)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
        at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
        at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:656)
        at com.yinhai.modules.codetable.domain.dao.impl.Aa10a1DaoImpl$$EnhancerBySpringCGLIB$$a9bb5f6d.selectCodeList(<generated>)
        at com.yinhai.modules.codetable.domain.bpo.AppCodeBpoImpl.queryAppCode(AppCodeBpoImpl.java:67)
        at com.yinhai.modules.codetable.domain.bpo.AppCodeBpoImpl$$FastClassBySpringCGLIB$$e718e8c7.invoke(<generated>)
        at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
        at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:721)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
        at org.springframework.aop.aspectj.AspectJAfterAdvice.invoke(AspectJAfterAdvice.java:47)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
        at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
        at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:656)
        at com.yinhai.modules.codetable.domain.bpo.AppCodeBpoImpl$$EnhancerBySpringCGLIB$$f581cb7a.queryAppCode(<generated>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
        at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:99)
        at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:282)
        at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:96)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213)
        at com.sun.proxy.$Proxy77.queryAppCode(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
        at org.springframework.aop.aspectj.AspectJAfterAdvice.invoke(AspectJAfterAdvice.java:47)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
        at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213)
        at com.sun.proxy.$Proxy77.queryAppCode(Unknown Source)
        at com.yinhai.modules.codetable.domain.bpo.AppCodeCacheBpoImpl.refreshAll(AppCodeCacheBpoImpl.java:141)
        at com.yinhai.modules.codetable.domain.bpo.AppCodeCacheBpoImpl$$FastClassBySpringCGLIB$$1b9a2859.invoke(<generated>)
        at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
        at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:721)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
        at org.springframework.aop.aspectj.AspectJAfterAdvice.invoke(AspectJAfterAdvice.java:47)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
        at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
        at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:656)
        at com.yinhai.modules.codetable.domain.bpo.AppCodeCacheBpoImpl$$EnhancerBySpringCGLIB$$2bf86ec2.refreshAll(<generated>)
        at com.yinhai.modules.codetable.domain.bpo.CodeTableInitService.initAppCode(CodeTableInitService.java:38)
        at com.yinhai.modules.codetable.domain.bpo.CodeTableInitService.init(CodeTableInitService.java:27)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeCustomInitMethod(AbstractAutowireCapableBeanFactory.java:1758)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1695)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1624)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:555)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:483)
        at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:306)
        at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)
        at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:302)
        at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:197)
        at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:761)
        at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:866)
        at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:542)
        at org.springframework.web.context.ContextLoader.configureAndRefreshWebApplicationContext(ContextLoader.java:443)
        at org.springframework.web.context.ContextLoader.initWebApplicationContext(ContextLoader.java:325)
        at org.springframework.web.context.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:107)
        at com.yinhai.project.listener.StartupListener.contextInitialized(StartupListener.java:18)
        at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4861)
        at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5322)
        at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:145)
        at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:753)
        at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:729)
        at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:717)
        at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:974)
        at org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1850)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
        at org.postgresql.core.v3.QueryExecutorImpl.fetch(QueryExecutorImpl.java:2377)
        at org.postgresql.jdbc.PgResultSet.next(PgResultSet.java:1856)
        at com.alibaba.druid.filter.FilterChainImpl.resultSet_next(FilterChainImpl.java:654)
        at com.alibaba.druid.filter.FilterAdapter.resultSet_next(FilterAdapter.java:1885)
        at com.alibaba.druid.filter.FilterChainImpl.resultSet_next(FilterChainImpl.java:651)
        at com.alibaba.druid.proxy.jdbc.ResultSetProxyImpl.next(ResultSetProxyImpl.java:882)
        at com.alibaba.druid.pool.DruidPooledResultSet.next(DruidPooledResultSet.java:69)
        at org.hibernate.loader.Loader.processResultSet(Loader.java:968)
        at org.hibernate.loader.Loader.doQuery(Loader.java:930)
        at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:336)
        at org.hibernate.loader.Loader.doList(Loader.java:2611)
        ... 101 common frames omitted
Caused by: java.io.EOFException: null
        at org.postgresql.core.PGStream.receiveChar(PGStream.java:308)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1952)
        at org.postgresql.core.v3.QueryExecutorImpl.fetch(QueryExecutorImpl.java:2372)
        ... 111 common frames omitted
......

The system dmesg output shows the following information when the error occurs:

Jun 20 06:25:24 localhost kernel: pgpool[8971]: segfault at ffffffff00000272 ip 00007f967eae06e7 sp 00007ffce13c4c58 error 6 in libc-2.12.so[7f967ea57000+18a000]
Jun 20 06:25:27 localhost abrt[9163]: Saved core dump of pid 8971 (/usr/pgpool-11/bin/pgpool) to /var/spool/abrt/ccpp-2019-06-20-06:25:24-8971 (1048576000 bytes)
Jun 20 06:25:27 localhost abrtd: Directory 'ccpp-2019-06-20-06:25:24-8971' creation detected
Jun 20 06:25:27 localhost pgpool[8958]: [7-1] 2019-06-20 14:25:27: pid 8958: WARNING: child process with pid: 8971 was terminated by segmentation fault
Jun 20 06:25:27 localhost pgpool[8958]: [8-1] 2019-06-20 14:25:27: pid 8958: LOG: fork a new child process with pid: 9165
Jun 20 06:25:27 localhost pgpool[8974]: [6-1] 2019-06-20 14:25:27: pid 8974: LOG: new connection received
Jun 20 06:25:27 localhost pgpool[8974]: [6-2] 2019-06-20 14:25:27: pid 8974: DETAIL: connecting host=localhost port=59906
Jun 20 06:25:27 localhost pgpool[8974]: [7-1] 2019-06-20 14:25:27: pid 8974: LOG: Parse message from frontend.
Jun 20 06:25:27 localhost pgpool[8974]: [7-2] 2019-06-20 14:25:27: pid 8974: DETAIL: statement: "", query: "SET extra_float_digits = 3"
Jun 20 06:25:27 localhost pgpool[8974]: [8-1] 2019-06-20 14:25:27: pid 8974: LOG: DB node id: 0 backend pid: 9167 statement: Parse: SET extra_float_digits = 3
Jun 20 06:25:27 localhost pgpool[8974]: [9-1] 2019-06-20 14:25:27: pid 8974: LOG: Bind message from frontend.
......

My web page shows an error 500 message.

My system log (messages) is messages.2_with-memory_cache_enabled-on.txt, which also contains the Pgpool-II log.

By the way, when memory_cache_enabled = off, everything works fine. The rpm packages we use are all from the pgdg yum repository.

Can somebody help?


Tags:
Steps To Reproduce:
Additional Information:
Attached Files: catalina.out.2_with-memory_cache_enabled-on (114,907 bytes) 2019-06-21 12:06
https://www.pgpool.net/mantisbt/file_download.php?file_id=603&type=bug
messages.2_with-memory_cache_enabled-on (261,770 bytes) 2019-06-21 12:06
https://www.pgpool.net/mantisbt/file_download.php?file_id=602&type=bug
stack-trace-pgpool.txt (8,307 bytes) 2019-06-25 18:13
https://www.pgpool.net/mantisbt/file_download.php?file_id=605&type=bug
mail-info.txt (61,873 bytes) 2019-06-25 18:13
https://www.pgpool.net/mantisbt/file_download.php?file_id=604&type=bug
java-log.txt (12,499 bytes) 2019-07-02 18:25
https://www.pgpool.net/mantisbt/file_download.php?file_id=608&type=bug
trace.txt (8,351 bytes) 2019-07-02 18:25
https://www.pgpool.net/mantisbt/file_download.php?file_id=607&type=bug
mail.txt (61,841 bytes) 2019-07-02 18:25
https://www.pgpool.net/mantisbt/file_download.php?file_id=606&type=bug
bug525.diff (790 bytes) 2019-07-02 18:51
https://www.pgpool.net/mantisbt/file_download.php?file_id=609&type=bug
Notes
(0002674)
t-ishii   
2019-06-25 14:38   
Can you share Pgpool-II log with log_client_messages = on? Also stack trace of Pgpool-II would be valuable info.
(0002676)
guobo507   
2019-06-25 18:13   
In my pgpool.conf, log_client_messages is already enabled.

When the error occurred, the root user received an email message; I have uploaded that file (mail-info.txt).
(0002681)
t-ishii   
2019-06-26 10:58   
Thanks for the logs. I will look into them.
(0002686)
t-ishii   
2019-07-02 14:56   
Still struggling to find the cause of the segfault. It seems your stack trace's pid (12516) is different from the ones (8971, 8974) that appeared in the pgpool log (messages.2_with-memory_cache_enabled-on). Do you have minimal test data to reproduce the problem?
I was not able to reproduce the segfault.
(0002687)
guobo507   
2019-07-02 16:10   
Thanks for the reply. I will reproduce the segfault and collect the information you need later... thanks again!
(0002688)
guobo507   
2019-07-02 18:25   
Hi, it may be difficult for you to reproduce the segfault, because so far I have noticed that the error only appears in one of our Java projects, and only when we set memory_cache_enabled = on.

I have reproduced the segfault; the stack trace information is in the trace.txt file. The Pgpool-II log is in mail.txt, which is the mail received by the root user when the error occurred. The Apache server log is in the java-log.txt file, which only includes the error info.

Thanks a lot!
(0002689)
t-ishii   
2019-07-02 18:50   
Thanks for the info. It seems pgpool segfaults at the same place as before. Can you please try out the attached patch?
It seems Pgpool-II may access a wrong pointer.
(0002690)
guobo507   
2019-07-02 23:01   
Thanks very much, but can you tell me how to use it? I have never patched pgpool-II...
(0002691)
t-ishii   
2019-07-03 08:49   
(Last edited: 2019-07-03 08:50)
You need to install Pgpool-II from the source code. The instructions for doing this can be found in the manual:
http://www.pgpool.net/docs/latest/en/html/installation.html

Before going further, you can apply the patch like this:
tar xf pgpool-II-4.0.5.tar.gz
cd pgpool-II-4.0.5
patch -b -p1 < /path/to/bug525.diff
./configure ....
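A typical continuation after the configure step would be the following (a sketch only; your configure options and install prefix may differ, and the install step usually needs root privileges):

make
make install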

(0002692)
guobo507   
2019-07-03 08:58   
Thanks, I will try it later...
(0002693)
guobo507   
2019-07-03 12:31   
Thanks for the patch. I have tested it on pgpool-II 4.0.3 and 4.0.5; both work fine.
(0002694)
t-ishii   
2019-07-03 12:53   
Glad to hear that. I am going to commit the patch. Thanks!


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
483 [Pgpool-II] Bug minor have not tried 2019-03-26 21:18 2019-08-15 19:14
Reporter: nagata Platform:  
Assigned To: Muhammad Usama OS:  
Priority: normal OS Version:  
Status: resolved Product Version: 3.6.15  
Product Build: Resolution: fixed  
Projection: none      
ETA: none Fixed in Version: 3.6.18  
    Target Version:  
Summary: online-recovery is blocked after a child process exits abnormally with replication mode and watchdog
Description: Pgpool-II 3.6.15 fixed a failure of online-recovery which occurs when child processes exit abnormally.

 - Fix online recovery failed due to client_idle_limit_in_recovery in certain cases. (bug 431) (Tatsuo Ishii)
(https://www.pgpool.net/mantisbt/view.php?id=431)

However, I got the same problem even in 3.6.15, after a child was terminated by a segfault, although client_idle_limit_in_recovery = -1.

I found this is due to the watchdog. When the watchdog is enabled, wd_start_recovery() is called just after the "2nd stage" starts. In wd_start_recovery(), the recovery request is sent to the other pgpool, and the other pgpool waits for all children to exit.

However, if some child process has exited abnormally in the other pgpool, that pgpool never returns a response because Req_info->conn_counter cannot reach zero. Therefore, the original pgpool waits for the response until the timeout is detected, and the online recovery eventually fails.

I guess a fix similar to the one in 3.6.15 will be needed in process_wd_command_timer_event(), where Req_info->conn_counter should be checked periodically when processing recovery commands received via the watchdog.
Tags:
Steps To Reproduce: The related discussion is in
 [pgpool-hackers: 3211] Deal with recovery failure by an abnormally exiting child proces
See this thread for details.

Additional Information:
Attached Files: bug_483.diff (2,295 bytes) 2019-04-16 00:21
https://www.pgpool.net/mantisbt/file_download.php?file_id=580&type=bug
Notes
(0002556)
Muhammad Usama   
2019-04-16 00:21   
Hi Yugo,

I think your analysis is spot on. I have applied the solution done by Tatsuo Ishii for bug 431 to the watchdog code flow.
Can you please try out the attached patch to see if it solves the issue?

Thanks
(0002598)
harukat   
2019-05-17 19:41   
(Last edited: 2019-05-17 19:54)
Hello Usama,
I did a test of the bug_483.diff patch with Pgpool-II 3.6.17.
It does not seem to work.

(connect via pgpool1 and kill its child process)
$ psql -p 9999 db1
$ ps x | grep pgpool | grep idle
$ kill -9 12345

(recovery request via pgpool2)
$ pcp_recovery_node -p 9797 1
ERROR: node recovery failed, failed to send start recovery packet

This succeeds without the previous step.

pgpool1 (which has a killed pgpool child process) log:
2019-05-17 19:33:51: pid 13391: LOG: watchdog received online recovery request from "localhost:9998 Linux srapc2499"
2019-05-17 19:33:51: pid 13391: LOG: wait_connection_closed: mulformed conn_counter (1) detected. reset it to 0

pgpool2 log:
2019-05-17 19:33:51: pid 13654: WARNING: start recovery command lock failed
2019-05-17 19:33:51: pid 13654: DETAIL: ipc command timeout
2019-05-17 19:33:51: pid 13393: LOG: new IPC connection received
2019-05-17 19:33:51: pid 13393: LOG: read from socket failed, remote end closed the connection
2019-05-17 19:33:51: pid 13393: LOG: online recovery request from local pgpool-II node received on IPC interface is forwarded to master watchdog node "localhost:9998 Linux srapc2499"
2019-05-17 19:33:51: pid 13393: DETAIL: waiting for the reply...
2019-05-17 19:33:51: pid 13654: ERROR: node recovery failed, failed to send start recovery packet
2019-05-17 19:33:51: pid 13608: LOG: PCP process with pid: 13654 exit with SUCCESS.
2019-05-17 19:33:51: pid 13608: LOG: PCP process with pid: 13654 exits with status 0

According to the script logs, recovery_1st_stage was executed,
but recovery_2nd_stage and pgpool_remote_start weren't executed.

(0002642)
harukat   
2019-05-30 14:09   
Hello, Usama and Yugo.
It has been a while. I am reporting more findings.

In the case above, pgpool2's issue_command_to_watchdog() always resulted in a timeout
that exceeds recovery_timeout, following the logic flow below.

In pgpool2, which receives the pcp_recovery_node request:
  wd_start_recovery() - issue_command_to_watchdog() with recovery_timeout

In pgpool1, which has a child process that was killed:
  ensure_conn_counter_validity() after timer_expired (recovery_timeout)

I confirmed that a change like the following, for example, can avoid this failure.

src/watchdog/wd_commands.c: wd_start_recovery(void):
     WDIPCCmdResult *result = issue_command_to_watchdog(WD_IPC_ONLINE_RECOVERY_COMMAND,
-                                                        pool_config->recovery_timeout,
+                                                        pool_config->recovery_timeout + 10,
                                                         func, strlen(func), true);

When I run pcp_recovery_node against pgpool1 with the bug_483.diff patch,
it always succeeds.
(0002668)
Muhammad Usama   
2019-06-19 15:27   
Hi Haruka,

Thanks for the patch. I am looking into this one.
(0002757)
Muhammad Usama   
2019-08-08 23:45   
Hi Haruka,

I can confirm that your suggested changes were required on top of the original patch. I have committed the modified patch to all supported branches.

https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=c402060814ba9737e9572ba07eef1be7664c9679

Thanks


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
515 [Pgpool-II] Bug major always 2019-05-21 12:55 2019-08-15 16:56
Reporter: staslg Platform: x86_64  
Assigned To: t-ishii OS: CentOS Linux release 7.6.1810  
Priority: normal OS Version: 7.6  
Status: feedback Product Version: 4.0.5  
Product Build: Resolution: reopened  
Projection: none      
ETA: none Fixed in Version:  
    Target Version:  
Summary: Pgpool node status waiting
Description: Hi

I have a question about node status:
Very often I see status waiting and I understand that failover do not work

But if I check file status I see that status is up for all nodes:
postgres@db-balanser$ cat /var/log/pgpool-II-10/pgpool_status
up
up

If I make command for pcp_attach waiting node status do not changed

How to fix this bug ?
Tags:
Steps To Reproduce: postgres@db-balanser$ psql -U pgpool -p 9999 --host sso-db-balanser --dbname postgres -c "show pool_nodes"
 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | last_status_change
---------+----------+------+---------+-----------+--------+------------+-------------------+-------------------+---------------------
 0 | db1 | 5432 | up | 0.500000 | master | 0 | true | 0 | 2019-05-21 06:37:52
 1 | db2 | 5432 | waiting | 0.500000 | slave | 0 | false | 0 | 2019-05-21 06:38:05

repmgr status

 ID | Name | Role | Status | Upstream | Location | Priority | Connection string
----+-------+---------+-----------+----------+----------+----------+--------------------------------------------------
 1 | node1 | primary | * running | | default | 100 | host=db1 port=5432 user=repmgr dbname=repmgr
 2 | node2 | standby | running | node1 | default | 100 | host=db2 port=5432 user=repmgr dbname=repmgr




Additional Information:
Attached Files:
Notes
(0002610)
t-ishii   
2019-05-21 13:28   
"waiting" can be safely assumed same as "up". "Waiting" means the PostgreSQL server is up and running but has not been connected from Pgpool-II yet. So once a client connects to Pgpool-II, it connects to PostgreSQL and the status will be turned to "up".
(0002611)
staslg   
2019-05-21 13:50   
Thanks for your answer; I will check again today.
(0002629)
staslg   
2019-05-23 14:03   
Thanks
The ticket can be closed.
(0002639)
staslg   
2019-05-28 13:13   
Hi again
We checked failover again today and it does not work :(

First we checked the status of Node0 and Node1; both were in "up" status.
When we shut down node0, failover started and node1 was promoted; repmgr worked correctly.
But pgpool shows the status "waiting" for node1 and the connection is refused.

Check status pgpool:
psql -U pgpool -p 9999 --host db-bln-stg --dbname postgres -c "show pool_nodes"
psql: FATAL: failed to create a backend connection
DETAIL: executing failover on backend

Check status nodes:
db0 5432 3 0.500000 down master 0 2019-05-28 06:57:54
db1 5432 1 0.500000 waiting slave 0 2019-05-28 06:57:44

Check status from repmgr:
ID | Name | Role | Status | Upstream | Location | Priority | Connection string
----+-------+---------+-----------+----------+----------+----------+----------------------------------------------------------
1 | db1 | primary | - failed | | default | 100 | host=node1-sso-stage port=5432 user=repmgr dbname=repmgr
2 | db2 | primary | * running | | default | 100 | host=node2-sso-stage port=5432 user=repmgr dbname=repmgr

What's the problem?
(0002653)
t-ishii   
2019-06-13 21:44   
Sorry for the delay. I noticed:

1. You have configured Pgpool-II in master/slave mode. That is probably not what you want. You should configure Pgpool-II in streaming replication mode (see the sketch after this list).

2. Both Pgpool-II and repmgr try to manage failover. It's not surprising that Pgpool-II does not work well with failover if someone else tries to manage it.
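Regarding point 1, the sketch below shows one way to check the relevant settings (assuming Pgpool-II 4.0.x parameter names and a default configuration path of /etc/pgpool-II/pgpool.conf; adjust the path to your installation). For streaming replication mode you would expect replication_mode = off, master_slave_mode = on and master_slave_sub_mode = 'stream':

grep -E "^(replication_mode|master_slave_mode|master_slave_sub_mode)" /etc/pgpool-II/pgpool.conf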
(0002785)
t-ishii   
2019-08-15 16:56   
No response from the reporter for over 1 month. The issue will be closed.


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
503 [Pgpool-II] Bug crash always 2019-04-28 11:41 2019-08-15 16:37
Reporter: zfuhao Platform: Openstack/KVM  
Assigned To: t-ishii OS: Red Hat Enterprise Linux Server  
Priority: immediate OS Version: 6.8  
Status: feedback Product Version: 4.0.4  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version:  
Summary: 2019-04-28 09:43:14: pid 26332: WARNING: child process with pid: 26387 was terminated by segmentation fault
Description: Every time we do a stress test, we always get this error when simulating a concurrency of around 200 users.

2019-04-28 09:58:51: pid 28450: WARNING: child process with pid: 28847 was terminated by segmentation fault
2019-04-28 09:58:51: pid 28450: LOG: fork a new child process with pid: 30567

But we have no problem connecting directly to the three backend instances.

Tags:
Steps To Reproduce:
Additional Information:
Attached Files: gdb.log (10,844 bytes) 2019-04-28 11:41
https://www.pgpool.net/mantisbt/file_download.php?file_id=586&type=bug
pgpool.conf (41,592 bytes) 2019-04-28 11:41
https://www.pgpool.net/mantisbt/file_download.php?file_id=585&type=bug
core28847.rar (121,959 bytes) 2019-04-28 12:17
https://www.pgpool.net/mantisbt/file_download.php?file_id=587&type=bug
Notes
(0002577)
t-ishii   
2019-04-28 19:10   
Which version of PostgreSQL are you using?
(0002581)
zfuhao   
2019-05-05 16:43   
Thank you for your reply. The version of our database is 9.5.16.
(0002582)
t-ishii   
2019-05-06 10:00   
(Last edited: 2019-05-06 10:02)
Thank you for the info. From gdb.log,
    if (create_table_stmt->relation->relpersistence == 't')
access to a pointer (create_table_stmt->relation) seems to cause the segfault. Apparently this should not happen, but I failed to see the cause from reading the source code. So I tried to reproduce the error by running the following simple script with pgbench:
$ cat ../create_table.sql
CREATE TEMP TABLE t1(i int);
SELECT * FROM t1;

$ pgbench -p 11000 -n -c 180 -C -T 300 -P 5 -f ../create_table.sql test

So far I see no error.

Can you provide a test case so that I could reproduce the problem?

(0002783)
t-ishii   
2019-08-15 16:37   
No response from the reporter for over 1 month. The issue will be closed.


View Issue Details
ID: Category: Severity: Reproducibility: Date Submitted: Last Update:
467 [Pgpool-II] Bug major always 2019-02-20 14:06 2019-08-14 16:12
Reporter: max3903 Platform: Linux  
Assigned To: pengbo OS: Oracle Linux  
Priority: urgent OS Version: 7  
Status: assigned Product Version: 4.0.2  
Product Build: Resolution: open  
Projection: none      
ETA: none Fixed in Version:  
    Target Version: 4.0.6  
Summary: Encoding issue with bytea field and PostgreSQL 11
Description: I am using 3 servers:

Application Server:
* Oracle Linux 7
* Odoo 11
* Python 3.6.6
* Psycopg2 2.7.5

PGPool Server:
* Oracle Linux 7
* PGPool 4.0.2 from the yum repository

DB Server:
* Oracle Linux 7
* PostgreSQL 11
Tags: pgpool-II
Steps To Reproduce: Odoo is using binary field to upload files:
https://github.com/odoo/odoo/blob/12.0/addons/base_import/models/base_import.py#L116

and psycopg2 store it in a bytea attribute:
https://github.com/odoo/odoo/blob/12.0/odoo/fields.py#L1768

Any uploaded file is not stored correctly in the database when using PGPool. If I connect Odoo directly to PostgreSQL, all my scenarios work.

For example, when importing data from a csv file, the file is encoded and stored in the database.
Then the first lines get displayed for preview and column-to-field matching.
See screenshot attached below.

Another example I have is that the value of a bytea attribute is decoded and I get an "Incorrect padding" error:
  File "/usr/lib64/python3.6/base64.py", line 87, in b64decode
    return binascii.a2b_base64(s)
binascii.Error: Incorrect padding
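A minimal way to narrow this down (a sketch with hypothetical host names, ports and database, not taken from the report) is to round-trip a small bytea value once through Pgpool-II and once directly against the backend, and compare the results:

psql -h pgpool-host -p 9999 -d testdb -c "CREATE TABLE IF NOT EXISTS bytea_probe(id int, data bytea)"
psql -h pgpool-host -p 9999 -d testdb -c "INSERT INTO bytea_probe VALUES (1, decode('deadbeef', 'hex'))"
psql -h pgpool-host -p 9999 -d testdb -c "SELECT encode(data, 'hex') FROM bytea_probe WHERE id = 1"
psql -h db-host -p 5432 -d testdb -c "SELECT encode(data, 'hex') FROM bytea_probe WHERE id = 1"

If the two SELECT results differ, the corruption happens between Pgpool-II and the backend rather than in the application.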