[pgpool-general: 1065] Re: pgpool-II, watchdog and segfaults
Yugo Nagata
nagata at sraoss.co.jp
Wed Oct 3 13:57:58 JST 2012
I'm investigating this.
Could you provide a back trace?
If you have a core file, you can get a back trace as follows:
% gdb pgpool core-file
(gdb) bt
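
If no core file is being produced, core dumps are often disabled by the shell's resource limits. A minimal sketch for Linux (the paths are examples, not taken from your setup):

```shell
# Allow core files to be written; run this in the shell
# (or init script) that starts pgpool, then restart pgpool.
ulimit -c unlimited

# Show where the kernel writes core files; a bare pattern such
# as "core" means the crashing process's working directory.
core_pattern=$(cat /proc/sys/kernel/core_pattern)
echo "core pattern: $core_pattern"

# After the next segfault, load the core into gdb and print a
# back trace non-interactively (adjust both paths as needed):
#   gdb /usr/sbin/pgpool ./core -batch -ex bt
```

Posting the full `bt` output to the list should show exactly which call inside libc the child process died in.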
In addition, which version of pgpool-II are you running?
On Tue, 2 Oct 2012 10:12:40 -0400
Greg Swallow <gswallow at exacttarget.com> wrote:
> Hi,
>
> I have installed pgpool-II on four Ubuntu Lucid systems and when I enable watchdog, I get constant segfaults:
>
> Oct 2 13:40:59 db1a kernel: [3671199.301094] pgpool[18352]: segfault at 0 ip 00007f80c71b5052 sp 00007fffe7779618 error 4 in libc-2.11.1.so[7f80c7132000+17a000]
> Oct 2 13:42:39 db1a kernel: [3671299.305608] pgpool[18261]: segfault at 0 ip 00007f80c71b5052 sp 00007fffe77796a8 error 4 in libc-2.11.1.so[7f80c7132000+17a000]
> Oct 2 13:44:19 db1a kernel: [3671399.310278] pgpool[18421]: segfault at 0 ip 00007f80c71b5052 sp 00007fffe7779618 error 4 in libc-2.11.1.so[7f80c7132000+17a000]
>
> I turned debug up to 255 and this is what I see that coincides with a segfault:
>
> Oct 2 13:57:32 db1a pgpool[19174]: I am 19174
> Oct 2 13:57:32 db1a pgpool[19174]: pool_initialize_private_backend_status: initialize backend status
> ...
> Oct 2 13:59:13 db1a pgpool[19174]: I am 19174 accept fd 8
> Oct 2 13:59:13 db1a pgpool[19174]: Protocol Major: 1234 Minor: 5679 database: user:
> Oct 2 13:59:13 db1a pgpool[19174]: SSLRequest from client
> Oct 2 13:59:13 db1a pgpool[19174]: Protocol Major: 3 Minor: 0 database: template1 user: (null)
> Oct 2 13:59:13 db1a pgpool[19154]: reap_handler called
> Oct 2 13:59:13 db1a pgpool[19154]: reap_handler: call wait3
> Oct 2 13:59:13 db1a pgpool[19154]: child 19174 exits with status 11 by signal 11
>
> When I disable the watchdog, this behavior stops. If this is a watchdog bug, I can run keepalived for the virtual IP address instead, and I can help trace the problem if you guide me through it. From the pgpool-II documentation, it seems like the child PID *might* be trying to perform online recovery in a streaming replication scenario, but I haven't configured a recovery user or password. I do not want pgpool-II to automatically fail anything over.
>
> My config:
>
> root at db1a:~# pcp_pool_status 60 localhost 9898 pgpool2 blah
> name : listen_addresses
> value: *
> desc : host name(s) or IP address(es) to listen to
>
> name : port
> value: 5431
> desc : pgpool accepting port number
>
> name : socket_dir
> value: /tmp
> desc : pgpool socket directory
>
> name : pcp_port
> value: 9898
> desc : PCP port # to bind
>
> name : pcp_socket_dir
> value: /var/run/pgpool
> desc : PCP socket directory
>
> name : enable_pool_hba
> value: 1
> desc : if true, use pool_hba.conf for client authentication
>
> name : authentication_timeout
> value: 20
> desc : maximum time in seconds to complete client authentication
>
> name : ssl
> value: 0
> desc : SSL support
>
> name : ssl_key
> value:
> desc : path to the SSL private key file
>
> name : ssl_cert
> value:
> desc : path to the SSL public certificate file
>
> name : ssl_ca_cert
> value:
> desc : path to a single PEM format file
>
> name : ssl_ca_cert_dir
> value:
> desc : directory containing CA root certificate(s)
>
> name : num_init_children
> value: 10
> desc : # of children initially pre-forked
>
> name : max_pool
> value: 40
> desc : max # of connection pool per child
>
> name : child_life_time
> value: 600
> desc : if idle for this seconds, child exits
>
> name : child_max_connections
> value: 0
> desc : if max_connections received, child exits
>
> name : connection_life_time
> value: 0
> desc : if idle for this seconds, connection closes
>
> name : client_idle_limit
> value: 0
> desc : if idle for this seconds, child connection closes
>
> name : log_destination
> value: syslog
> desc : logging destination
>
> name : print_timestamp
> value: 1
> desc : if true print time stamp to each log line
>
> name : log_connections
> value: 1
> desc : if true, print incoming connections to the log
>
> name : log_hostname
> value: 0
> desc : if true, resolve hostname for ps and log print
>
> name : log_statement
> value: 0
> desc : if non 0, logs all SQL statements
>
> name : log_per_node_statement
> value: 0
> desc : if non 0, logs all SQL statements on each node
>
> name : log_standby_delay
> value: if_over_threshold
> desc : how to log standby delay
>
> name : syslog_facility
> value: LOCAL0
> desc : syslog local facility
>
> name : syslog_ident
> value: pgpool
> desc : syslog program ident string
>
> name : debug_level
> value: 255
> desc : debug message level
>
> name : pid_file_name
> value: /var/run/pgpool/pgpool.pid
> desc : path to pid file
>
> name : logdir
> value: /var/log/pgpool
> desc : PgPool status file logging directory
>
> name : connection_cache
> value: 1
> desc : if true, cache connection pool
>
> name : reset_query_list
> value: ABORT; DISCARD ALL
> desc : queries issued at the end of session
>
> name : replication_mode
> value: 0
> desc : non 0 if operating in replication mode
>
> name : replicate_select
> value: 0
> desc : non 0 if SELECT statement is replicated
>
> name : insert_lock
> value: 1
> desc : insert lock
>
> name : lobj_lock_table
> value:
> desc : table name used for large object replication control
>
> name : replication_stop_on_mismatch
> value: 0
> desc : stop replication mode on fatal error
>
> name : failover_if_affected_tuples_mismatch
> value: 0
> desc : failover if affected tuples mismatch
>
> name : load_balance_mode
> value: 1
> desc : non 0 if operating in load balancing mode
>
> name : ignore_leading_white_space
> value: 1
> desc : ignore leading white spaces
>
> name : white_function_list
> value:
> desc : functions that do not write to the database
>
> name : black_function_list
> value: nextval,setval
> desc : functions that write to the database
>
> name : master_slave_mode
> value: 1
> desc : if true, operate in master/slave mode
>
> name : master_slave_sub_mode
> value: stream
> desc : master/slave sub mode
>
> name : sr_check_period
> value: 10
> desc : sr check period
>
> name : sr_check_user
> value: pgquery
> desc : sr check user
>
> name : delay_threshold
> value: 2097152
> desc : standby delay threshold
>
> name : follow_master_command
> value:
> desc : follow master command
>
> name : parallel_mode
> value: 0
> desc : if non 0, run in parallel query mode
>
> name : enable_query_cache
> value: 0
> desc : if non 0, use query cache
>
> name : pgpool2_hostname
> value: db1a
> desc : pgpool2 hostname
>
> name : system_db_hostname
> value: localhost
> desc : system DB hostname
>
> name : system_db_port
> value: 5432
> desc : system DB port number
>
> name : system_db_dbname
> value: pgpool
> desc : system DB name
>
> name : system_db_schema
> value: pgpool_catalog
> desc : system DB schema name
>
> name : system_db_user
> value: pgpool
> desc : user name to access system DB
>
> name : health_check_period
> value: 15
> desc : health check period
>
> name : health_check_timeout
> value: 10
> desc : health check timeout
>
> name : health_check_user
> value: pgquery
> desc : health check user
>
> name : health_check_max_retries
> value: 3
> desc : health check max retries
>
> name : health_check_retry_delay
> value: 1
> desc : health check retry delay
>
> name : failover_command
> value:
> desc : failover command
>
> name : failback_command
> value:
> desc : failback command
>
> name : fail_over_on_backend_error
> value: 1
> desc : fail over on backend error
>
> name : recovery_user
> value:
> desc : online recovery user
>
> name : recovery_1st_stage_command
> value:
> desc : execute a command in first stage.
>
> name : recovery_2nd_stage_command
> value:
> desc : execute a command in second stage.
>
> name : recovery_timeout
> value: 90
> desc : max time in seconds to wait for the recovering node's postmaster
>
> name : client_idle_limit_in_recovery
> value: 0
> desc : if idle for this seconds, child connection closes in recovery 2n
>
> name : relcache_expire
> value: 0
> desc : relation cache expiration time in seconds
>
> name : use_watchdog
> value: 1
> desc : non 0 if watchdog is enabled
>
> name : trusted_servers
> value: 172.26.42.254,db1.stg.cotweet.com,db1a.stg.cotweet.com,db1b.stg.cotweet.com
> desc : upper server list to observe connection
>
> name : delegate_IP
> value: 172.26.42.25
> desc : delegate IP address of master pgpool
>
> name : wd_port
> value: 9000
> desc : watchdog port number
>
> name : wd_interval
> value: 10
> desc : life check interval (second)
>
> name : ping_path
> value: /bin
> desc : path to ping command
>
> name : ifconfig_path
> value: /sbin
> desc : path to ifconfig command
>
> name : if_up_cmd
> value: ifconfig eth0:0 inet $_IP_$ netmask 255.255.255.255
> desc : virtual interface up command with full parameters
>
> name : if_down_cmd
> value: ifconfig eth0:0 down
> desc : virtual interface down command with full parameters
>
> name : arping_path
> value: /usr/bin
> desc : path to arping command
>
> name : arping_cmd
> value: arping -U $_IP_$ -w 1
> desc : send ARP REQUEST to neighbour host
>
> name : wd_life_point
> value: 3
> desc : retry times of life check
>
> name : wd_lifecheck_query
> value: SELECT 1
> desc : lifecheck query to pgpool from watchdog
>
> name : memory_cache_enabled
> value: 0
> desc : If true, use the memory cache functionality, false by default
>
> name : memqcache_method
> value: shmem
> desc : Cache store method. either shmem(shared memory) or Memcached. sh
>
> name : memqcache_memcached_host
> value: localhost
> desc : Memcached host name. Mandatory if memqcache_method=memcached
>
> name : memqcache_memcached_port
> value: 11211
> desc : Memcached port number. Mandatory if memqcache_method=memcached
>
> name : memqcache_total_size
> value: 67108864
> desc : Total memory size in bytes for storing memory cache. Mandatory i
>
> name : memqcache_max_num_cache
> value: 1000000
> desc : Total number of cache entries
>
> name : memqcache_expire
> value: 0
> desc : Memory cache entry life time specified in seconds. 60 by default
>
> name : memqcache_auto_cache_invalidation
> value: 0
> desc : If true, invalidation of query cache is triggered by correspondi
>
> name : memqcache_maxcache
> value: 409600
> desc : Maximum SELECT result size in bytes
>
> name : memqcache_cache_block_size
> value: 1048576
> desc : Cache block size in bytes. 8192 by default
>
> name : memqcache_cache_oiddir
> value: /var/log/pgpool/oiddir
> desc : Temporary work directory to record table oids
>
> name : memqcache_stats_start_time
> value: Thu Jan 1 00:00:00 1970
> desc : Start time of query cache stats
>
> name : memqcache_no_cache_hits
> value: 0
> desc : Number of SELECTs not hitting query cache
>
> name : memqcache_cache_hits
> value: 0
> desc : Number of SELECTs hitting query cache
>
> name : white_memqcache_table_list
> value:
> desc : tables to memqcache
>
> name : black_memqcache_table_list
> value:
> desc : tables not to memqcache
>
> name : backend_hostname0
> value: db1.stg.cotweet.com
> desc : backend #0 hostname
>
> name : backend_port0
> value: 5432
> desc : backend #0 port number
>
> name : backend_weight0
> value: 0.333333
> desc : weight of backend #0
>
> name : backend_data_directory0
> value:
> desc : data directory for backend #0
>
> name : backend_status0
> value: 1
> desc : status of backend #0
>
> name : standby_delay0
> value: 0
> desc : standby delay of backend #0
>
> name : backend_flag0
> value: DISALLOW_TO_FAILOVER
> desc : backend #0 flag
>
> name : backend_hostname1
> value: db1a.stg.cotweet.com
> desc : backend #1 hostname
>
> name : backend_port1
> value: 5432
> desc : backend #1 port number
>
> name : backend_weight1
> value: 0.333333
> desc : weight of backend #1
>
> name : backend_data_directory1
> value:
> desc : data directory for backend #1
>
> name : backend_status1
> value: 1
> desc : status of backend #1
>
> name : standby_delay1
> value: 0
> desc : standby delay of backend #1
>
> name : backend_flag1
> value: DISALLOW_TO_FAILOVER
> desc : backend #1 flag
>
> name : backend_hostname2
> value: db1b.stg.cotweet.com
> desc : backend #2 hostname
>
> name : backend_port2
> value: 5432
> desc : backend #2 port number
>
> name : backend_weight2
> value: 0.333333
> desc : weight of backend #2
>
> name : backend_data_directory2
> value:
> desc : data directory for backend #2
>
> name : backend_status2
> value: 3
> desc : status of backend #2
>
> name : standby_delay2
> value: 0
> desc : standby delay of backend #2
>
> name : backend_flag2
> value: DISALLOW_TO_FAILOVER
> desc : backend #2 flag
>
> name : other_pgpool_hostname1
> value: db1b.stg.cotweet.com
> desc : pgpool #1 hostname
>
> name : other_pgpool_port1
> value: 5431
> desc : pgpool #1 port number
>
> name : other_pgpool_wd_port1
> value: 9000
> desc : pgpool #1 watchdog port number
>
>
--
Yugo Nagata <nagata at sraoss.co.jp>