[pgpool-general: 1062] pgpool-II, watchdog and segfaults

Greg Swallow gswallow at exacttarget.com
Tue Oct 2 23:12:40 JST 2012


Hi,

I have installed pgpool-II on four Ubuntu Lucid systems, and when I enable the watchdog I get constant segfaults:

Oct  2 13:40:59 db1a kernel: [3671199.301094] pgpool[18352]: segfault at 0 ip 00007f80c71b5052 sp 00007fffe7779618 error 4 in libc-2.11.1.so[7f80c7132000+17a000]
Oct  2 13:42:39 db1a kernel: [3671299.305608] pgpool[18261]: segfault at 0 ip 00007f80c71b5052 sp 00007fffe77796a8 error 4 in libc-2.11.1.so[7f80c7132000+17a000]
Oct  2 13:44:19 db1a kernel: [3671399.310278] pgpool[18421]: segfault at 0 ip 00007f80c71b5052 sp 00007fffe7779618 error 4 in libc-2.11.1.so[7f80c7132000+17a000]

I raised debug_level to 255, and this is what I see around the time of a segfault:

Oct  2 13:57:32 db1a pgpool[19174]: I am 19174
Oct  2 13:57:32 db1a pgpool[19174]: pool_initialize_private_backend_status: initialize backend status
...
Oct  2 13:59:13 db1a pgpool[19174]: I am 19174 accept fd 8
Oct  2 13:59:13 db1a pgpool[19174]: Protocol Major: 1234 Minor: 5679 database:  user: 
Oct  2 13:59:13 db1a pgpool[19174]: SSLRequest from client
Oct  2 13:59:13 db1a pgpool[19174]: Protocol Major: 3 Minor: 0 database: template1 user: (null)
Oct  2 13:59:13 db1a pgpool[19154]: reap_handler called
Oct  2 13:59:13 db1a pgpool[19154]: reap_handler: call wait3
Oct  2 13:59:13 db1a pgpool[19154]: child 19174 exits with status 11 by signal 11

When I disable the watchdog, the segfaults stop. If this is a watchdog bug, I can run keepalived for the virtual IP address instead, and I am happy to help trace the problem if you can guide me through it. From searching the pgpool-II documentation, it seems the child process *might* be trying to perform online recovery in a streaming replication setup, but I haven't configured a recovery user or password. I do not want pgpool-II to fail anything over automatically.

My config:

root at db1a:~# pcp_pool_status 60 localhost 9898 pgpool2 blah 
name : listen_addresses
value: *
desc : host name(s) or IP address(es) to listen to

name : port
value: 5431
desc : pgpool accepting port number

name : socket_dir
value: /tmp
desc : pgpool socket directory

name : pcp_port
value: 9898
desc : PCP port # to bind

name : pcp_socket_dir
value: /var/run/pgpool
desc : PCP socket directory

name : enable_pool_hba
value: 1
desc : if true, use pool_hba.conf for client authentication

name : authentication_timeout
value: 20
desc : maximum time in seconds to complete client authentication

name : ssl
value: 0
desc : SSL support

name : ssl_key
value: 
desc : path to the SSL private key file

name : ssl_cert
value: 
desc : path to the SSL public certificate file

name : ssl_ca_cert
value: 
desc : path to a single PEM format file

name : ssl_ca_cert_dir
value: 
desc : directory containing CA root certificate(s)

name : num_init_children
value: 10
desc : # of children initially pre-forked

name : max_pool
value: 40
desc : max # of connection pool per child

name : child_life_time
value: 600
desc : if idle for this seconds, child exits

name : child_max_connections
value: 0
desc : if max_connections received, chile exits

name : connection_life_time
value: 0
desc : if idle for this seconds, connection closes

name : client_idle_limit
value: 0
desc : if idle for this seconds, child connection closes

name : log_destination
value: syslog
desc : logging destination

name : print_timestamp
value: 1
desc : if true print time stamp to each log line

name : log_connections
value: 1
desc : if true, print incoming connections to the log

name : log_hostname
value: 0
desc : if true, resolve hostname for ps and log print

name : log_statement
value: 0
desc : if non 0, logs all SQL statements

name : log_per_node_statement
value: 0
desc : if non 0, logs all SQL statements on each node

name : log_standby_delay
value: if_over_threshold
desc : how to log standby delay

name : syslog_facility
value: LOCAL0
desc : syslog local faclity

name : syslog_ident
value: pgpool
desc : syslog program ident string

name : debug_level
value: 255
desc : debug message level

name : pid_file_name
value: /var/run/pgpool/pgpool.pid
desc : path to pid file

name : logdir
value: /var/log/pgpool
desc : PgPool status file logging directory

name : connection_cache
value: 1
desc : if true, cache connection pool

name : reset_query_list
value: ABORT; DISCARD ALL
desc : queries issued at the end of session

name : replication_mode
value: 0
desc : non 0 if operating in replication mode

name : replicate_select
value: 0
desc : non 0 if SELECT statement is replicated

name : insert_lock
value: 1
desc : insert lock

name : lobj_lock_table
value: 
desc : table name used for large object replication control

name : replication_stop_on_mismatch
value: 0
desc : stop replication mode on fatal error

name : failover_if_affected_tuples_mismatch
value: 0
desc : failover if affected tuples are mismatch

name : load_balance_mode
value: 1
desc : non 0 if operating in load balancing mode

name : ignore_leading_white_space
value: 1
desc : ignore leading white spaces

name : white_function_list
value: 
desc : functions those do not write to database

name : black_function_list
value: nextval,setval
desc : functions those write to database

name : master_slave_mode
value: 1
desc : if true, operate in master/slave mode

name : master_slave_sub_mode
value: stream
desc : master/slave sub mode

name : sr_check_period
value: 10
desc : sr check period

name : sr_check_user
value: pgquery
desc : sr check user

name : delay_threshold
value: 2097152
desc : standby delay threshold

name : follow_master_command
value: 
desc : follow master command

name : parallel_mode
value: 0
desc : if non 0, run in parallel query mode

name : enable_query_cache
value: 0
desc : if non 0, use query cache

name : pgpool2_hostname
value: db1a
desc : pgpool2 hostname

name : system_db_hostname
value: localhost
desc : system DB hostname

name : system_db_port
value: 5432
desc : system DB port number

name : system_db_dbname
value: pgpool
desc : system DB name

name : system_db_schema
value: pgpool_catalog
desc : system DB schema name

name : system_db_user
value: pgpool
desc : user name to access system DB

name : health_check_period
value: 15
desc : health check period

name : health_check_timeout
value: 10
desc : health check timeout

name : health_check_user
value: pgquery
desc : health check user

name : health_check_max_retries
value: 3
desc : health check max retries

name : health_check_retry_delay
value: 1
desc : health check retry delay

name : failover_command
value: 
desc : failover command

name : failback_command
value: 
desc : failback command

name : fail_over_on_backend_error
value: 1
desc : fail over on backend error

name : recovery_user
value: 
desc : online recovery user

name : recovery_1st_stage_command
value: 
desc : execute a command in first stage.

name : recovery_2nd_stage_command
value: 
desc : execute a command in second stage.

name : recovery_timeout
value: 90
desc : max time in seconds to wait for the recovering node's postmaster

name : client_idle_limit_in_recovery
value: 0
desc : if idle for this seconds, child connection closes in recovery 2n

name : relcache_expire
value: 0
desc : relation cache expiration time in seconds

name : use_watchdog
value: 1
desc : non 0 if operating in use_watchdog

name : trusted_servers
value: 172.26.42.254,db1.stg.cotweet.com,db1a.stg.cotweet.com,db1b.stg.cotweet.com
desc : upper server list to observe connection

name : delegate_IP
value: 172.26.42.25
desc : delegate IP address of master pgpool

name : wd_port
value: 9000
desc : watchdog port number

name : wd_interval
value: 10
desc : life check interval (second)

name : ping_path
value: /bin
desc : path to ping command

name : ifconfig_path
value: /sbin
desc : path to ifconfig command

name : if_up_cmd
value: ifconfig eth0:0 inet $_IP_$ netmask 255.255.255.255
desc : virtual interface up command with full parameters

name : if_down_cmd
value: ifconfig eth0:0 down
desc : virtual interface down command with full parameters

name : arping_path
value: /usr/bin
desc : path to arping command

name : arping_cmd
value: arping -U $_IP_$ -w 1
desc : send ARP REQUESTi to neighbour host

name : wd_life_point
value: 3
desc : retry times of life check

name : wd_lifecheck_query
value: SELECT 1
desc : lifecheck query to pgpool from watchdog

name : memory_cache_enabled
value: 0
desc : If true, use the memory cache functionality, false by default

name : memqcache_method
value: shmem
desc : Cache store method. either shmem(shared memory) or Memcached. sh

name : memqcache_memcached_host
value: localhost
desc : Memcached host name. Mandatory if memqcache_method=memcached

name : memqcache_memcached_port
value: 11211
desc : Memcached port number. Mondatory if memqcache_method=memcached

name : memqcache_total_size
value: 67108864
desc : Total memory size in bytes for storing memory cache. Mandatory i

name : memqcache_max_num_cache
value: 1000000
desc : Total number of cache entries

name : memqcache_expire
value: 0
desc : Memory cache entry life time specified in seconds. 60 by default

name : memqcache_auto_cache_invalidation
value: 0
desc : If true, invalidation of query cache is triggered by correspondi

name : memqcache_maxcache
value: 409600
desc : Maximum SELECT result size in bytes

name : memqcache_cache_block_size
value: 1048576
desc : Cache block size in bytes. 8192 by default

name : memqcache_cache_oiddir
value: /var/log/pgpool/oiddir
desc : Tempory work directory to record table oids

name : memqcache_stats_start_time
value: Thu Jan  1 00:00:00 1970
desc : Start time of query cache stats

name : memqcache_no_cache_hits
value: 0
desc : Number of SELECTs not hitting query cache

name : memqcache_cache_hits
value: 0
desc : Number of SELECTs hitting query cache

name : white_memqcache_table_list
value: 
desc : tables to memqcache

name : black_memqcache_table_list
value: 
desc : tables not to memqcache

name : backend_hostname0
value: db1.stg.cotweet.com
desc : backend #0 hostname

name : backend_port0
value: 5432
desc : backend #0 port number

name : backend_weight0
value: 0.333333
desc : weight of backend #0

name : backend_data_directory0
value: 
desc : data directory for backend #0

name : backend_status0
value: 1
desc : status of backend #0

name : standby_delay0
value: 0
desc : standby delay of backend #0

name : backend_flag0
value: DISALLOW_TO_FAILOVER
desc : backend #0 flag

name : backend_hostname1
value: db1a.stg.cotweet.com
desc : backend #1 hostname

name : backend_port1
value: 5432
desc : backend #1 port number

name : backend_weight1
value: 0.333333
desc : weight of backend #1

name : backend_data_directory1
value: 
desc : data directory for backend #1

name : backend_status1
value: 1
desc : status of backend #1

name : standby_delay1
value: 0
desc : standby delay of backend #1

name : backend_flag1
value: DISALLOW_TO_FAILOVER
desc : backend #1 flag

name : backend_hostname2
value: db1b.stg.cotweet.com
desc : backend #2 hostname

name : backend_port2
value: 5432
desc : backend #2 port number

name : backend_weight2
value: 0.333333
desc : weight of backend #2

name : backend_data_directory2
value: 
desc : data directory for backend #2

name : backend_status2
value: 3
desc : status of backend #2

name : standby_delay2
value: 0
desc : standby delay of backend #2

name : backend_flag2
value: DISALLOW_TO_FAILOVER
desc : backend #2 flag

name : other_pgpool_hostname1
value: db1b.stg.cotweet.com
desc : pgpool #1 hostname

name : other_pgpool_port1
value: 5431
desc : pgpool #1 port number

name : other_pgpool_wd_port1
value: 9000
desc : pgpool #1 watchdog port number

