[pgpool-general: 2989] Re: Pgpool cluster with PostgreSQL 9.3 (bug or misconfiguration?)

Alexandru Cardaniuc cardaniuc at gmail.com
Mon Jun 30 14:37:09 JST 2014


Yugo Nagata <nagata at sraoss.co.jp> writes:

> On Sat, 21 Jun 2014 23:05:12 -0700 Alexandru Cardaniuc
> <cardaniuc at gmail.com> wrote:

>> Also to add: for some reason, with this configuration and setup, while
>> testing failover by failing the master database, pgpool correctly
>> identified the master as failed and detached it from the cluster. But
>> all writes sent to the slave failed, since the trigger file was not
>> created and the slave didn't become master (it was still in a read-only
>> state). For some reason, failover_command =
>> '/store/pgdata/failover_stream.sh %d %P %H %R postgresql.trigger' was
>> not executed.
>> 
>> Normally, I expect to see something like that in the logs:
>> 2014-06-03 05:25:33 ERROR: pid 2970: make_persistent_db_connection: connection to 10.0.90.11(5432) failed
>> 2014-06-03 05:25:33 ERROR: pid 2970: health check failed. 0 th host 10.0.90.11 at port 5432 is down
>> 2014-06-03 05:25:33 LOG: pid 2970: set 0 th backend down status
>> 2014-06-03 05:25:33 LOG: pid 2970: wd_start_interlock: start interlocking
>> 2014-06-03 05:25:33 LOG: pid 2970: wd_assume_lock_holder: become a new lock holder
>> 2014-06-03 05:25:37 LOG: pid 17705: connection received: host=10.0.97.1 port=57218
>> 2014-06-03 05:25:37 ERROR: pid 17705: connect_inet_domain_socket: getsockopt() detected error: Connection refused
>> 2014-06-03 05:25:37 ERROR: pid 17705: connection to 10.0.90.11(5432) failed
>> 2014-06-03 05:25:37 ERROR: pid 17705: new_connection: create_cp() failed
>> 2014-06-03 05:25:37 LOG: pid 17705: new_connection: do not failover because fail_over_on_backend_error is off
>> 2014-06-03 05:25:44 ERROR: pid 2970: wd_start_interlock: timed out
>> 2014-06-03 05:25:44 LOG: pid 2970: starting degeneration. shutdown host 10.0.90.11(5432)
>> 2014-06-03 05:25:44 LOG: pid 2970: Restart all children
>> 2014-06-03 05:25:44 LOG: pid 2970: execute command: /store/pgdata/failover_stream.sh 0 0 10.0.90.12 /store/pgdata postgresql.trigger
>> + FAILED_NODE=0
>> + OLD_MASTER=0
>> + NEW_MASTER=10.0.90.12
>> + PGDATA=/store/pgdata
>> + TRIGGER_FILE=postgresql.trigger
>> + '[' 0 = 0 ']'
>> + /usr/bin/ssh -T 10.0.90.12 /bin/touch /store/pgdata/postgresql.trigger
>> + exit 0
>> 2014-06-03 05:25:44 LOG: pid 2970: find_primary_node_repeatedly: waiting for finding a primary node
>> 2014-06-03 05:25:47 LOG: pid 2970: find_primary_node: primary node id is 1
>> 2014-06-03 05:25:47 LOG: pid 2970: starting follow degeneration. shutdown host 10.0.90.11(5432)
>> 2014-06-03 05:25:47 LOG: pid 2970: failover: 1 follow backends have been degenerated
>> 2014-06-03 05:25:47 LOG: pid 26259: start triggering follow command.
>> 2014-06-03 05:25:47 LOG: pid 26259: execute command: echo $(date) host:10.0.90.11, new master id:1, old master id:0 >> /var/log/pgpool/master_failover.log
>> 2014-06-03 05:25:47 LOG: pid 2970: wd_end_interlock: end interlocking
>> 2014-06-03 05:25:48 LOG: pid 2970: failover: set new primary node: 1
>> 2014-06-03 05:25:48 LOG: pid 2970: failover: set new master node: 1
>> 2014-06-03 05:25:48 LOG: pid 2970: failover done. shutdown host 10.0.90.11(5432)


>> But in this case, when I shut down 10.0.90.11 (the master) and expected
>> the failover script to be run, I didn't see it being run at all. See the
>> log snippet below:

> Hmm... it is odd that the failover script wasn't executed. Did you
> shut down 10.0.90.11 by power-off or with pg_ctl stop? And is this the
> log of the pgpool on 10.0.90.12?


Yes, that's odd. I did several failovers (for testing), and that situation
happened only once.

I shut down 10.0.90.11 with either
# service postgresql-9.3 stop
or 
# /usr/pgsql-9.3/bin/pg_ctl stop

The log of the pgpool that I indicated is the log of the primary pgpool
(the one that holds the delegate IP). From my understanding, the standby
pgpool shouldn't be doing anything at that point and hence shouldn't have
that information in its log, correct?

Again, to note: pgpool correctly detached 10.0.90.11 from the cluster. I
could see that because the status for 10.0.90.11 was 3 (using
pcp_node_info), but the failover script was not run. Hence the trigger
file was not created on the slave, and the slave was still in read-only
mode when pgpool tried to write to it, as you can see in the logs below.
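
For reference, this is how I checked the node status (a rough example; it
assumes the pcp port 9898 from the config and a pcp user/password defined
in pcp.conf, which I am not showing here). With pgpool-II 3.3 the pcp
commands take a timeout, host, port, user, password and node id, and
pcp_node_info prints "hostname port status weight", where status 3 means
the node is detached:

  pcp_node_info 10 localhost 9898 <pcp_user> <pcp_password> 0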

Any ideas?


>> 2014-06-05 23:38:57 ERROR: pid 2970: health check failed. 0 th host 10.0.90.11 at port 5432 is down
>> 2014-06-05 23:38:57 LOG: pid 2970: health check retry sleep time: 1 second(s)
>> 2014-06-05 23:38:58 ERROR: pid 2970: s_do_auth: expecting R got E
>> 2014-06-05 23:38:58 ERROR: pid 2970: make_persistent_db_connection: s_do_auth failed
>> 2014-06-05 23:38:58 ERROR: pid 2970: health check failed. 0 th host 10.0.90.11 at port 5432 is down
>> 2014-06-05 23:38:58 LOG: pid 2970: health check retry sleep time: 1 second(s)
>> 2014-06-05 23:38:59 ERROR: pid 2970: s_do_auth: expecting R got E
>> 2014-06-05 23:38:59 ERROR: pid 2970: make_persistent_db_connection: s_do_auth failed
>> 2014-06-05 23:38:59 ERROR: pid 2970: health check failed. 0 th host 10.0.90.11 at port 5432 is down
>> 2014-06-05 23:38:59 LOG: pid 2970: health check retry sleep time: 1 second(s)
>> 2014-06-05 23:39:00 LOG: pid 2982: wd_send_response: WD_STAND_FOR_LOCK_HOLDER received it
>> 2014-06-05 23:39:00 ERROR: pid 2970: s_do_auth: expecting R got E
>> 2014-06-05 23:39:00 ERROR: pid 2970: make_persistent_db_connection: s_do_auth failed
>> 2014-06-05 23:39:00 ERROR: pid 2970: health check failed. 0 th host 10.0.90.11 at port 5432 is down
>> 2014-06-05 23:39:00 LOG: pid 2970: set 0 th backend down status
>> 2014-06-05 23:39:00 LOG: pid 2970: wd_start_interlock: start interlocking
>> 2014-06-05 23:39:01 LOG: pid 2970: starting degeneration. shutdown host 10.0.90.11(5432)
>> 2014-06-05 23:39:01 LOG: pid 2970: Restart all children
>> 2014-06-05 23:39:02 LOG: pid 2970: find_primary_node_repeatedly: waiting for finding a primary node
>> 2014-06-05 23:39:12 LOG: pid 2970: failover: no follow backends are degenerated
>> 2014-06-05 23:39:12 LOG: pid 2970: wd_end_interlock: end interlocking
>> 2014-06-05 23:39:12 LOG: pid 2970: failover: set new primary node: -1
>> 2014-06-05 23:39:12 LOG: pid 2970: failover: set new master node: 1
>> 2014-06-05 23:39:12 LOG: pid 30220: connection received: host=10.0.97.1 port=45122
>> 2014-06-05 23:39:13 LOG: pid 30220: pool_send_and_wait: Error or notice message from backend: : DB node id: 1 backend pid: 23115 statement: UPDATE SCGroupRequestStatus SET Status = $1 WHERE Status = $2 and LastUpdateTime < $3 and Owner is NULL message: cannot execute UPDATE in a read-only transaction
>> 2014-06-05 23:39:13 LOG: pid 30326: do_child: failback event found. restart myself.
>> 2014-06-05 23:39:13 LOG: pid 30456: do_child: failback event found. restart myself.
>> 2014-06-05 23:39:13 LOG: pid 30468: do_child: failback event found. restart myself.
>> 2014-06-05 23:39:13 LOG: pid 2970: failover done. shutdown host 10.0.90.11(5432)
>> 2014-06-05 23:39:13 LOG: pid 29758: worker process received restart request
>> 2014-06-05 23:39:14 LOG: pid 29511: pcp child process received restart request
>> 2014-06-05 23:39:14 LOG: pid 2970: PCP child 29511 exits with status 256 in failover()
>> 2014-06-05 23:39:14 LOG: pid 2970: fork a new PCP child pid 30470 in failover()
>> 2014-06-05 23:39:14 LOG: pid 2970: worker child 29758 exits with status 256
>> 2014-06-05 23:39:14 LOG: pid 2970: fork a new worker child pid 30471
>> 2014-06-05 23:39:25 LOG: pid 30472: connection received: host=10.0.97.1 port=45123
>> 2014-06-05 23:39:59 LOG: pid 30466: connection received: host=10.0.70.1 port=50938
>> 2014-06-05 23:40:02 LOG: pid 30220: pool_send_and_wait: Error or notice message from backend: : DB node id: 1 backend pid: 23115 statement: UPDATE SCGroupRequestStatus SET Status = $1 WHERE Status = $2 and LastUpdateTime < $3 and Owner is NULL message: cannot execute UPDATE in a read-only transaction
>> 2014-06-05 23:41:02 LOG: pid 30220: pool_send_and_wait: Error or notice message from backend: : DB node id: 1 backend pid: 23115 statement: UPDATE SCGroupRequestStatus SET Status = $1 WHERE Status = $2 and LastUpdateTime < $3 and Owner is NULL message: cannot execute UPDATE in a read-only transaction
>> 
>> 
>> As you can see, the node was detached from the cluster correctly, but
>> the failover script was NOT executed. So the trigger file was not
>> created, and the slave was not promoted to master.
>> 
>> Why is that? Please see the configurations in the previous email.
>> 
>> Also, this is the failover script:
>> 
>> #!/bin/bash -x
>> FAILED_NODE=$1    # %d
>> OLD_MASTER=$2     # %P
>> NEW_MASTER=$3     # %H
>> PGDATA=$4         # %R
>> TRIGGER_FILE=$5
>>
>> SSH_OPTIONS='-T -o StrictHostKeyChecking=no'
>>
>> if [ $FAILED_NODE = $OLD_MASTER ]; then
>>     /usr/bin/ssh $SSH_OPTIONS $NEW_MASTER /bin/touch $PGDATA/$TRIGGER_FILE
>> fi
>> exit 0
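
For what it's worth, the script can also be exercised by hand with the same
arguments that pgpool substitutes for %d %P %H %R. The first command below
mirrors the invocation recorded in the 2014-06-03 log above (node 0 failed,
node 0 was the old master, 10.0.90.12 becomes the new master); the second
just checks that the trigger file was created on the new master:

  /store/pgdata/failover_stream.sh 0 0 10.0.90.12 /store/pgdata postgresql.trigger
  /usr/bin/ssh -T 10.0.90.12 ls -l /store/pgdata/postgresql.trigger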
>> 
>> 
>> Sincerely, Alexandru
>> 
>> 
>> 
>> On Sat, Jun 21, 2014 at 10:49 PM, Alexandru Cardaniuc
>> <cardaniuc at gmail.com> wrote:
>> 
>> > Hi All,
>> > Need some validation of the configuration that I did for the
>> > cluster of 2 pgpools with a cluster of 2 PostgreSQL 9.3 databases.
>> > I went through all the documentation, but in some places it's not
>> > completely clear, so I am not sure if I misunderstood some of the
>> > things.
>> > Here's the info.
>> > Pgpool version: 3.3.3 and using watchdog (2 pgpools: 1 primary, 1
>> > standby)
>> > PostgreSQL version 9.3 (cluster of 2 databases with streaming
>> > replication and hot standby configured).
>> > This is my relevant PostgreSQL 9.3 config:
>> > ------------------------------------------
>> > listen_addresses = '*'
>> > port = 5432
>> > max_connections = 550
>> > superuser_reserved_connections = 3
>> > wal_level = hot_standby
>> > checkpoint_segments = 10
>> > checkpoint_completion_target = 0.9
>> > max_wal_senders = 10
>> > wal_keep_segments = 100
>> > hot_standby = on
>> > ------------------------------------------
>> > The documentation states that: "To enable read-only queries on a
>> > standby server, wal_level must be set to hot_standby on the primary,
>> > and hot_standby must be enabled in the standby."
>> > I use the same config above on both the primary database and the
>> > secondary, with the assumption that 'wal_level = hot_standby' is
>> > ignored on the slave and 'hot_standby = on' is ignored on the master.
>> > Is that a correct assumption? That assumption shouldn't create any
>> > issues during pgpool/postgres failover?
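
For completeness: the standby also runs with a recovery.conf that points at
the primary and at the trigger file used by failover_stream.sh. Roughly
like the sketch below; the connection user and other details are
placeholders rather than the exact values from my setup:

  # /store/pgdata/recovery.conf on the standby (sketch)
  standby_mode = 'on'
  primary_conninfo = 'host=10.0.90.11 port=5432 user=replication'
  trigger_file = '/store/pgdata/postgresql.trigger'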
>> >
>> > Also, about the pgpool config. This is the pgpool config for the
>> > 1st pgpool:
>> > --------------------------------------------------------------------
>> > listen_addresses = '*'
>> > port = 9999
>> > socket_dir = '/var/run/pgpool'
>> > pcp_port = 9898
>> > pcp_socket_dir = '/var/run/pgpool'
>> > num_init_children = 250
>> > child_life_time = 300
>> > child_max_connections = 0
>> > client_idle_limit = 0
>> > enable_pool_hba = false
>> > pool_passwd = 'pool_passwd'
>> > authentication_timeout = 60
>> > print_timestamp = true
>> > log_connections = true
>> > log_hostname = false
>> > log_statement = true
>> > log_per_node_statement = false
>> > debug_level = 0
>> > pid_file_name = '/var/run/pgpool/pgpool.pid'
>> > logdir = '/var/run/pgpool'
>> > connection_cache = false
>> > health_check_period = 60
>> > health_check_timeout = 15
>> > health_check_user = 'postgres'
>> > health_check_password = 'password'
>> > health_check_max_retries = 3
>> > health_check_retry_delay = 1
>> > search_primary_node_timeout = 10
>> > failover_command = '/store/pgdata/failover_stream.sh %d %P %H %R postgresql.trigger'
>> > failback_command = 'echo $(date) host:%h, new master id:%m, old master id:%M >> /var/log/pgpool/failback.log'
>> > follow_master_command = 'echo $(date) host:%h, new master id:%m, old master id:%M >> /var/log/pgpool/master_failover.log'
>> > fail_over_on_backend_error = false
>> > ignore_leading_white_space = true
>> > backend_hostname0 = '10.0.90.11'
>> > backend_port0 = 5432
>> > backend_weight0 = 1
>> > backend_data_directory0 = '/store/pgdata'
>> > backend_flag0 = 'ALLOW_TO_FAILOVER'
>> > backend_hostname1 = '10.0.90.12'
>> > backend_port1 = 5432
>> > backend_weight1 = 1
>> > backend_data_directory1 = '/store/pgdata'
>> > backend_flag1 = 'ALLOW_TO_FAILOVER'
>> > ssl = false
>> > max_pool = 1
>> > connection_life_time = 0
>> > reset_query_list = 'ABORT; DISCARD ALL'
>> > replication_mode = false
>> > replication_stop_on_mismatch = true
>> > replicate_select = false
>> > insert_lock = true
>> > recovery_user = 'postgres'
>> > recovery_password = 'password'
>> > recovery_1st_stage_command = 'base_backup.sh'
>> > recovery_2nd_stage_command = ''
>> > recovery_timeout = 120
>> > client_idle_limit_in_recovery = 1
>> > lobj_lock_table = ''
>> > master_slave_mode = true
>> > load_balance_mode = true
>> > master_slave_sub_mode = 'stream'
>> > delay_threshold = 0
>> > sr_check_period = 0
>> > sr_check_user = 'postgres'
>> > sr_check_password = 'password'
>> > log_standby_delay = 'none'
>> > parallel_mode = false
>> > system_db_hostname = 'localhost'
>> > system_db_port = 5432
>> > system_db_dbname = 'pgpool'
>> > system_db_schema = 'pgpool_catalog'
>> > system_db_user = 'pgpool'
>> > system_db_password = ''
>> > pgpool2_hostname = ''
>> > memory_cache_enabled = off
>> > use_watchdog = on
>> > wd_hostname = '10.0.90.11'
>> > wd_port = 9000
>> > delegate_IP = '10.0.90.1'
>> > ifconfig_path = '/store/pgdata/sbin'
>> > if_up_cmd = 'ifconfig eth0:0 inet $_IP_$ netmask 255.255.0.0'
>> > if_down_cmd = 'ifconfig eth0:0 down'
>> > arping_path = '/store/pgdata/sbin'
>> > arping_cmd = 'arping -U $_IP_$ -w 1'
>> > wd_lifecheck_method = 'heartbeat'
>> > wd_interval = 10
>> > wd_heartbeat_port = 9694
>> > wd_heartbeat_keepalive = 2
>> > heartbeat_destination0 = '10.0.90.12'
>> > heartbeat_destination_port0 = 9694
>> > other_pgpool_hostname0 = '10.0.90.12'
>> > other_pgpool_port0 = 9999
>> > other_wd_port0 = 9000
>> > --------------------------------------------------------------------
>> > second pgpool config file:
>> > --------------------------------------------------------------------
>> > listen_addresses = '*'
>> > port = 9999
>> > socket_dir = '/var/run/pgpool'
>> > pcp_port = 9898
>> > pcp_socket_dir = '/var/run/pgpool'
>> > num_init_children = 250
>> > child_life_time = 300
>> > child_max_connections = 0
>> > client_idle_limit = 0
>> > enable_pool_hba = false
>> > pool_passwd = 'pool_passwd'
>> > authentication_timeout = 60
>> > print_timestamp = true
>> > log_connections = true
>> > log_hostname = false
>> > log_statement = true
>> > log_per_node_statement = false
>> > debug_level = 0
>> > pid_file_name = '/var/run/pgpool/pgpool.pid'
>> > logdir = '/var/run/pgpool'
>> > connection_cache = false
>> > health_check_period = 60
>> > health_check_timeout = 15
>> > health_check_user = 'postgres'
>> > health_check_password = 'password'
>> > health_check_max_retries = 3
>> > health_check_retry_delay = 1
>> > search_primary_node_timeout = 10
>> > failover_command = '/store/pgdata/failover_stream.sh %d %P %H %R postgresql.trigger'
>> > failback_command = 'echo $(date) host:%h, new master id:%m, old master id:%M >> /var/log/pgpool/failback.log'
>> > follow_master_command = 'echo $(date) host:%h, new master id:%m, old master id:%M >> /var/log/pgpool/master_failover.log'
>> > fail_over_on_backend_error = false
>> > ignore_leading_white_space = true
>> > backend_hostname0 = '10.0.90.11'
>> > backend_port0 = 5432
>> > backend_weight0 = 1
>> > backend_data_directory0 = '/store/pgdata'
>> > backend_flag0 = 'ALLOW_TO_FAILOVER'
>> > backend_hostname1 = '10.0.90.12'
>> > backend_port1 = 5432
>> > backend_weight1 = 1
>> > backend_data_directory1 = '/store/pgdata'
>> > backend_flag1 = 'ALLOW_TO_FAILOVER'
>> > ssl = false
>> > max_pool = 1
>> > connection_life_time = 0
>> > reset_query_list = 'ABORT; DISCARD ALL'
>> > replication_mode = false
>> > replication_stop_on_mismatch = true
>> > replicate_select = false
>> > insert_lock = true
>> > recovery_user = 'postgres'
>> > recovery_password = 'password'
>> > recovery_1st_stage_command = 'base_backup.sh'
>> > recovery_2nd_stage_command = ''
>> > recovery_timeout = 120
>> > client_idle_limit_in_recovery = 1
>> > lobj_lock_table = ''
>> > master_slave_mode = true
>> > load_balance_mode = true
>> > master_slave_sub_mode = 'stream'
>> > delay_threshold = 0
>> > sr_check_period = 0
>> > sr_check_user = 'postgres'
>> > sr_check_password = 'password'
>> > log_standby_delay = 'none'
>> > parallel_mode = false
>> > system_db_hostname = 'localhost'
>> > system_db_port = 5432
>> > system_db_dbname = 'pgpool'
>> > system_db_schema = 'pgpool_catalog'
>> > system_db_user = 'pgpool'
>> > system_db_password = ''
>> > pgpool2_hostname = ''
>> > memory_cache_enabled = off
>> > use_watchdog = on
>> > wd_hostname = '10.0.90.12'
>> > wd_port = 9000
>> > delegate_IP = '10.0.90.1'
>> > ifconfig_path = '/store/pgdata/sbin'
>> > if_up_cmd = 'ifconfig eth0:0 inet $_IP_$ netmask 255.255.0.0'
>> > if_down_cmd = 'ifconfig eth0:0 down'
>> > arping_path = '/store/pgdata/sbin'
>> > arping_cmd = 'arping -U $_IP_$ -w 1'
>> > wd_lifecheck_method = 'heartbeat'
>> > wd_interval = 10
>> > wd_heartbeat_port = 9694
>> > wd_heartbeat_keepalive = 2
>> > heartbeat_destination0 = '10.0.90.11'
>> > heartbeat_destination_port0 = 9694
>> > other_pgpool_hostname0 = '10.0.90.11'
>> > other_pgpool_port0 = 9999
>> > other_wd_port0 = 9000
>> > --------------------------------------------------------------------
>> > Anything wrong or missing in the configs? I need a cluster of
>> > pgpools with watchdog and healthchecks monitoring a cluster of
>> > postgres databases with streaming replication setup.
>> > Any help in review or suggestions based on your knowledge or
>> > experience will be appreciated.
>> > Sincerely, Alexandru
>> > -- "A journey of a thousand miles begins with a single step." -
>> > Ancient Proverb
>> >
>> 
>> 
>> 
>> -- Sincerely yours, Alexandru Cardaniuc

-- 
"The secret of getting ahead is getting started. The secret of getting
started is breaking your complex overwhelming tasks into small
manageable tasks, and then starting on the first one."  
- Mark Twain (1835-1910)

