[pgpool-general: 1065] Re: pgpool-II, watchdog and segfaults

Yugo Nagata nagata at sraoss.co.jp
Wed Oct 3 13:57:58 JST 2012


I'm investigating this.

Could you provide a backtrace?
If you have a core file, you can get a backtrace as follows.

 % gdb pgpool core-file
  (gdb) bt
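
If no core file was written, core dumps are probably disabled for the pgpool
process. A rough sketch for collecting one (pgpool's -n non-daemon flag is
real, but the exact paths and the location of the resulting core file depend
on your setup and on kernel.core_pattern):

 % ulimit -c unlimited        # allow core files in this shell
 % pgpool -n                  # run pgpool in the foreground from the same shell
   ... reproduce the segfault, then look for a file named "core" ...
 % gdb pgpool core
  (gdb) bt full               # backtrace including local variables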

In addition, which version of pgpool-II are you running?
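
By the way, even without a core file, the kernel messages you posted give the
faulting instruction pointer inside libc. Subtracting the library load address
(0x7f80c71b5052 - 0x7f80c7132000 = 0x83052) gives an offset that addr2line can
resolve, assuming libc debug symbols (e.g. the libc6-dbg package) are installed
and the library path matches your system:

 % addr2line -f -e /lib/libc-2.11.1.so 0x83052

This only tells us which libc function crashed, though, so a full backtrace
from the core file is still the most useful information.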

On Tue, 2 Oct 2012 10:12:40 -0400
Greg Swallow <gswallow at exacttarget.com> wrote:

> Hi,
> 
> I have installed pgpool-II on four Ubuntu Lucid systems and when I enable watchdog, I get constant segfaults:
> 
> Oct  2 13:40:59 db1a kernel: [3671199.301094] pgpool[18352]: segfault at 0 ip 00007f80c71b5052 sp 00007fffe7779618 error 4 in libc-2.11.1.so[7f80c7132000+17a000]
> Oct  2 13:42:39 db1a kernel: [3671299.305608] pgpool[18261]: segfault at 0 ip 00007f80c71b5052 sp 00007fffe77796a8 error 4 in libc-2.11.1.so[7f80c7132000+17a000]
> Oct  2 13:44:19 db1a kernel: [3671399.310278] pgpool[18421]: segfault at 0 ip 00007f80c71b5052 sp 00007fffe7779618 error 4 in libc-2.11.1.so[7f80c7132000+17a000]
> 
> I turned debug up to 255 and this is what I see that coincides with a segfault:
> 
> Oct  2 13:57:32 db1a pgpool[19174]: I am 19174
> Oct  2 13:57:32 db1a pgpool[19174]: pool_initialize_private_backend_status: initialize backend status
> ...
> Oct  2 13:59:13 db1a pgpool[19174]: I am 19174 accept fd 8
> Oct  2 13:59:13 db1a pgpool[19174]: Protocol Major: 1234 Minor: 5679 database:  user: 
> Oct  2 13:59:13 db1a pgpool[19174]: SSLRequest from client
> Oct  2 13:59:13 db1a pgpool[19174]: Protocol Major: 3 Minor: 0 database: template1 user: (null)
> Oct  2 13:59:13 db1a pgpool[19154]: reap_handler called
> Oct  2 13:59:13 db1a pgpool[19154]: reap_handler: call wait3
> Oct  2 13:59:13 db1a pgpool[19154]: child 19174 exits with status 11 by signal 11
> 
> When I disable the watchdog, this behavior stops.  If this is a watchdog bug, I can run keepalived for a virtual IP address instead, and I'm happy to help trace it with whatever guidance you can give.  Searching through the pgpool-II documentation, it seems like the child process *might* be trying to perform online recovery in a streaming replication scenario, but I haven't configured a recovery user or password.  I do not want pgpool-II to automatically fail anything over.
> 
> My config:
> 
> root at db1a:~# pcp_pool_status 60 localhost 9898 pgpool2 blah 
> name : listen_addresses
> value: *
> desc : host name(s) or IP address(es) to listen to
> 
> name : port
> value: 5431
> desc : pgpool accepting port number
> 
> name : socket_dir
> value: /tmp
> desc : pgpool socket directory
> 
> name : pcp_port
> value: 9898
> desc : PCP port # to bind
> 
> name : pcp_socket_dir
> value: /var/run/pgpool
> desc : PCP socket directory
> 
> name : enable_pool_hba
> value: 1
> desc : if true, use pool_hba.conf for client authentication
> 
> name : authentication_timeout
> value: 20
> desc : maximum time in seconds to complete client authentication
> 
> name : ssl
> value: 0
> desc : SSL support
> 
> name : ssl_key
> value: 
> desc : path to the SSL private key file
> 
> name : ssl_cert
> value: 
> desc : path to the SSL public certificate file
> 
> name : ssl_ca_cert
> value: 
> desc : path to a single PEM format file
> 
> name : ssl_ca_cert_dir
> value: 
> desc : directory containing CA root certificate(s)
> 
> name : num_init_children
> value: 10
> desc : # of children initially pre-forked
> 
> name : max_pool
> value: 40
> desc : max # of connection pool per child
> 
> name : child_life_time
> value: 600
> desc : if idle for this seconds, child exits
> 
> name : child_max_connections
> value: 0
> desc : if max_connections received, child exits
> 
> name : connection_life_time
> value: 0
> desc : if idle for this seconds, connection closes
> 
> name : client_idle_limit
> value: 0
> desc : if idle for this seconds, child connection closes
> 
> name : log_destination
> value: syslog
> desc : logging destination
> 
> name : print_timestamp
> value: 1
> desc : if true print time stamp to each log line
> 
> name : log_connections
> value: 1
> desc : if true, print incoming connections to the log
> 
> name : log_hostname
> value: 0
> desc : if true, resolve hostname for ps and log print
> 
> name : log_statement
> value: 0
> desc : if non 0, logs all SQL statements
> 
> name : log_per_node_statement
> value: 0
> desc : if non 0, logs all SQL statements on each node
> 
> name : log_standby_delay
> value: if_over_threshold
> desc : how to log standby delay
> 
> name : syslog_facility
> value: LOCAL0
> desc : syslog local facility
> 
> name : syslog_ident
> value: pgpool
> desc : syslog program ident string
> 
> name : debug_level
> value: 255
> desc : debug message level
> 
> name : pid_file_name
> value: /var/run/pgpool/pgpool.pid
> desc : path to pid file
> 
> name : logdir
> value: /var/log/pgpool
> desc : PgPool status file logging directory
> 
> name : connection_cache
> value: 1
> desc : if true, cache connection pool
> 
> name : reset_query_list
> value: ABORT; DISCARD ALL
> desc : queries issued at the end of session
> 
> name : replication_mode
> value: 0
> desc : non 0 if operating in replication mode
> 
> name : replicate_select
> value: 0
> desc : non 0 if SELECT statement is replicated
> 
> name : insert_lock
> value: 1
> desc : insert lock
> 
> name : lobj_lock_table
> value: 
> desc : table name used for large object replication control
> 
> name : replication_stop_on_mismatch
> value: 0
> desc : stop replication mode on fatal error
> 
> name : failover_if_affected_tuples_mismatch
> value: 0
> desc : failover if affected tuples are mismatch
> 
> name : load_balance_mode
> value: 1
> desc : non 0 if operating in load balancing mode
> 
> name : ignore_leading_white_space
> value: 1
> desc : ignore leading white spaces
> 
> name : white_function_list
> value: 
> desc : functions that do not write to the database
> 
> name : black_function_list
> value: nextval,setval
> desc : functions that write to the database
> 
> name : master_slave_mode
> value: 1
> desc : if true, operate in master/slave mode
> 
> name : master_slave_sub_mode
> value: stream
> desc : master/slave sub mode
> 
> name : sr_check_period
> value: 10
> desc : sr check period
> 
> name : sr_check_user
> value: pgquery
> desc : sr check user
> 
> name : delay_threshold
> value: 2097152
> desc : standby delay threshold
> 
> name : follow_master_command
> value: 
> desc : follow master command
> 
> name : parallel_mode
> value: 0
> desc : if non 0, run in parallel query mode
> 
> name : enable_query_cache
> value: 0
> desc : if non 0, use query cache
> 
> name : pgpool2_hostname
> value: db1a
> desc : pgpool2 hostname
> 
> name : system_db_hostname
> value: localhost
> desc : system DB hostname
> 
> name : system_db_port
> value: 5432
> desc : system DB port number
> 
> name : system_db_dbname
> value: pgpool
> desc : system DB name
> 
> name : system_db_schema
> value: pgpool_catalog
> desc : system DB schema name
> 
> name : system_db_user
> value: pgpool
> desc : user name to access system DB
> 
> name : health_check_period
> value: 15
> desc : health check period
> 
> name : health_check_timeout
> value: 10
> desc : health check timeout
> 
> name : health_check_user
> value: pgquery
> desc : health check user
> 
> name : health_check_max_retries
> value: 3
> desc : health check max retries
> 
> name : health_check_retry_delay
> value: 1
> desc : health check retry delay
> 
> name : failover_command
> value: 
> desc : failover command
> 
> name : failback_command
> value: 
> desc : failback command
> 
> name : fail_over_on_backend_error
> value: 1
> desc : fail over on backend error
> 
> name : recovery_user
> value: 
> desc : online recovery user
> 
> name : recovery_1st_stage_command
> value: 
> desc : execute a command in first stage.
> 
> name : recovery_2nd_stage_command
> value: 
> desc : execute a command in second stage.
> 
> name : recovery_timeout
> value: 90
> desc : max time in seconds to wait for the recovering node's postmaster
> 
> name : client_idle_limit_in_recovery
> value: 0
> desc : if idle for this seconds, child connection closes in recovery 2n
> 
> name : relcache_expire
> value: 0
> desc : relation cache expiration time in seconds
> 
> name : use_watchdog
> value: 1
> desc : non 0 if operating in use_watchdog
> 
> name : trusted_servers
> value: 172.26.42.254,db1.stg.cotweet.com,db1a.stg.cotweet.com,db1b.stg.cotweet.com
> desc : upper server list to observe connection
> 
> name : delegate_IP
> value: 172.26.42.25
> desc : delegate IP address of master pgpool
> 
> name : wd_port
> value: 9000
> desc : watchdog port number
> 
> name : wd_interval
> value: 10
> desc : life check interval (second)
> 
> name : ping_path
> value: /bin
> desc : path to ping command
> 
> name : ifconfig_path
> value: /sbin
> desc : path to ifconfig command
> 
> name : if_up_cmd
> value: ifconfig eth0:0 inet $_IP_$ netmask 255.255.255.255
> desc : virtual interface up command with full parameters
> 
> name : if_down_cmd
> value: ifconfig eth0:0 down
> desc : virtual interface down command with full parameters
> 
> name : arping_path
> value: /usr/bin
> desc : path to arping command
> 
> name : arping_cmd
> value: arping -U $_IP_$ -w 1
> desc : send ARP REQUEST to neighbour host
> 
> name : wd_life_point
> value: 3
> desc : retry times of life check
> 
> name : wd_lifecheck_query
> value: SELECT 1
> desc : lifecheck query to pgpool from watchdog
> 
> name : memory_cache_enabled
> value: 0
> desc : If true, use the memory cache functionality, false by default
> 
> name : memqcache_method
> value: shmem
> desc : Cache store method. either shmem(shared memory) or Memcached. sh
> 
> name : memqcache_memcached_host
> value: localhost
> desc : Memcached host name. Mandatory if memqcache_method=memcached
> 
> name : memqcache_memcached_port
> value: 11211
> desc : Memcached port number. Mandatory if memqcache_method=memcached
> 
> name : memqcache_total_size
> value: 67108864
> desc : Total memory size in bytes for storing memory cache. Mandatory i
> 
> name : memqcache_max_num_cache
> value: 1000000
> desc : Total number of cache entries
> 
> name : memqcache_expire
> value: 0
> desc : Memory cache entry life time specified in seconds. 60 by default
> 
> name : memqcache_auto_cache_invalidation
> value: 0
> desc : If true, invalidation of query cache is triggered by correspondi
> 
> name : memqcache_maxcache
> value: 409600
> desc : Maximum SELECT result size in bytes
> 
> name : memqcache_cache_block_size
> value: 1048576
> desc : Cache block size in bytes. 8192 by default
> 
> name : memqcache_cache_oiddir
> value: /var/log/pgpool/oiddir
> desc : Temporary work directory to record table oids
> 
> name : memqcache_stats_start_time
> value: Thu Jan  1 00:00:00 1970
> desc : Start time of query cache stats
> 
> name : memqcache_no_cache_hits
> value: 0
> desc : Number of SELECTs not hitting query cache
> 
> name : memqcache_cache_hits
> value: 0
> desc : Number of SELECTs hitting query cache
> 
> name : white_memqcache_table_list
> value: 
> desc : tables to memqcache
> 
> name : black_memqcache_table_list
> value: 
> desc : tables not to memqcache
> 
> name : backend_hostname0
> value: db1.stg.cotweet.com
> desc : backend #0 hostname
> 
> name : backend_port0
> value: 5432
> desc : backend #0 port number
> 
> name : backend_weight0
> value: 0.333333
> desc : weight of backend #0
> 
> name : backend_data_directory0
> value: 
> desc : data directory for backend #0
> 
> name : backend_status0
> value: 1
> desc : status of backend #0
> 
> name : standby_delay0
> value: 0
> desc : standby delay of backend #0
> 
> name : backend_flag0
> value: DISALLOW_TO_FAILOVER
> desc : backend #0 flag
> 
> name : backend_hostname1
> value: db1a.stg.cotweet.com
> desc : backend #1 hostname
> 
> name : backend_port1
> value: 5432
> desc : backend #1 port number
> 
> name : backend_weight1
> value: 0.333333
> desc : weight of backend #1
> 
> name : backend_data_directory1
> value: 
> desc : data directory for backend #1
> 
> name : backend_status1
> value: 1
> desc : status of backend #1
> 
> name : standby_delay1
> value: 0
> desc : standby delay of backend #1
> 
> name : backend_flag1
> value: DISALLOW_TO_FAILOVER
> desc : backend #1 flag
> 
> name : backend_hostname2
> value: db1b.stg.cotweet.com
> desc : backend #2 hostname
> 
> name : backend_port2
> value: 5432
> desc : backend #2 port number
> 
> name : backend_weight2
> value: 0.333333
> desc : weight of backend #2
> 
> name : backend_data_directory2
> value: 
> desc : data directory for backend #2
> 
> name : backend_status2
> value: 3
> desc : status of backend #2
> 
> name : standby_delay2
> value: 0
> desc : standby delay of backend #2
> 
> name : backend_flag2
> value: DISALLOW_TO_FAILOVER
> desc : backend #2 flag
> 
> name : other_pgpool_hostname1
> value: db1b.stg.cotweet.com
> desc : pgpool #1 hostname
> 
> name : other_pgpool_port1
> value: 5431
> desc : pgpool #1 port number
> 
> name : other_pgpool_wd_port1
> value: 9000
> desc : pgpool #1 watchdog port number
> 
> 


-- 
Yugo Nagata <nagata at sraoss.co.jp>

