[Pgpool-general] pcp_child: pcp_read() failed. reason: Success

Tatsuo Ishii ishii at sraoss.co.jp
Sun Nov 8 10:23:33 UTC 2009


I would like to know what state pgpool is in. What does ps show for
the pgpool processes? Even better, can you attach a debugger to one of
the pgpool processes and get a backtrace?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
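
A note on the "reason: Success" text in the subject line. This is a minimal sketch, assuming the message was built from strerror(errno) after a read that returned 0 (EOF). read() does not set errno at EOF, so errno can still be 0, and strerror(0) yields "Success" on glibc. The helper name read_and_describe is hypothetical and is not pgpool's actual code; it only shows EOF being reported separately from a real error.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from pgpool): perform one read() on fd
 * and write a human-readable description of the outcome into msg.
 * Returns msg for convenience.
 */
const char *read_and_describe(int fd, char *msg, size_t msglen)
{
    char data[256];
    ssize_t n = read(fd, data, sizeof data);

    if (n < 0) {
        /* Real error: errno is meaningful only in this branch. */
        int saved_errno = errno;
        snprintf(msg, msglen, "read failed: %s", strerror(saved_errno));
    } else if (n == 0) {
        /* EOF: errno was not set, so do not consult strerror() here. */
        snprintf(msg, msglen, "peer closed the connection (EOF)");
    } else {
        snprintf(msg, msglen, "read %zd bytes", n);
    }
    return msg;
}
```

Called on a descriptor whose peer has gone away, this reports the EOF explicitly instead of an error string derived from a stale or zero errno.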

> I've tried setting the local backend_hostname = ''
> Same problems are occurring.
> Pgpool has actually failed something like 4 separate times today, all but
> one of them using this local socket configuration.
> 
> Any other thoughts?
> 
> thx
> -s
> 
> On Thu, Nov 5, 2009 at 10:52 PM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> 
> > > Actually I was running pgpool on db2 (backend_hostname1) and am now
> > > running it on db3 (backend_hostname2).
> > > I have actually suspected that pgpool might be opting for some sort of
> > > socket connection to the local instance of postgres instead of using the
> > > TCP/IP connection parameters in an effort to speed things up.
> > >
> > > I have done my best to ensure that pgpool has completely separate socket
> > > directories but it wouldn't be hard for pgpool to find a local postgres
> > > socket if it wanted.  If I end up with another outage and this time db3 is
> > > the postgres instance that locks up, I'll be fairly certain that this is the
> > > problem but for the moment I can only speculate.
> > >
> > > I'm assuming you're suggesting I set backend_hostname0 = '' because it is
> > > already weighted to 0.0 anyway?
> >
> > No. I suggested it because I thought you were running pgpool on db1. ''
> > forces pgpool to use a UNIX domain socket. So if you are running pgpool on
> > db3, you could set:
> >
> > backend_hostname2 = ''
> >
> > > I have db1 (backend_hostname0) weighted to 0.0 in an effort to direct all
> > > selects to the two slave hosts (db2 and db3) but still benefit from pgpool
> > > intelligently sending writes to db1.
> > > db1 is the mammoth master host and needs all available i/o to deal with
> > > writes.
> > > My understanding is that this is how "master_slave_mode = true" works.
> > > Writes are always directed to backend_hostname0.
> > >
> > > If I need to reevaluate that thinking, please advise but that has been
> > > working for me for months now.
> > >
> > > thx
> > > -s
> > >
> > > On Thu, Nov 5, 2009 at 9:26 PM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> > >
> > > > Besides the useless error message from pcp_child (it seems someone
> > > > believed that EOF sets an error number in the global errno
> > > > variable; I will fix this anyway), it seems to me that the socket files
> > > > are going dead. I suspect some network stack bug could cause this but I'm
> > > > not sure. One thing you might want to try is changing this:
> > > >
> > > > backend_hostname0 = 'db1.xxx.xxx'
> > > >
> > > > to:
> > > >
> > > > backend_hostname0 = ''
> > > >
> > > > This will make pgpool use a UNIX domain socket for the communication
> > > > channel to PostgreSQL, rather than TCP/IP. It may or may not affect
> > > > the problem you have, since the network code in the kernel will be
> > > > different.
> > > >
> > > > (I assume you are running pgpool on db1.xxx.xxx)
> > > > --
> > > > Tatsuo Ishii
> > > > SRA OSS, Inc. Japan
> > > >
> > > > > Has anyone else run into this:
> > > > >
> > > > > My pgpool instance runs without problems for days on end and then suddenly
> > > > > stops responding to all requests.
> > > > > At the same moment, one of my three backend db hosts becomes completely
> > > > > inaccessible.
> > > > > Pgpool will not respond to shutdown, or even kill and must be kill -9'd
> > > > > Once all pgpool processes are out of the way, the inaccessible postgres
> > > > > server once again becomes responsive.
> > > > > I restart pgpool and everything works properly for a few more days.
> > > > >
> > > > > At the moment the problem occurs, pgpool's log output, which typically
> > > > > consists of just connection logging, turns into a steady stream of this:
> > > > > Nov  5 11:33:18 src at obfuscated pgpool: 2009-11-05 11:33:18 ERROR: pid 12811:
> > > > > pcp_child: pcp_read() failed. reason: Success
> > > > > These errors show up sporadically in my pgpool logs all the time but don't
> > > > > appear to have any adverse effects until the whole thing takes a dive.
> > > > > I would desperately like to know what this error message is trying to tell
> > > > > me.
> > > > >
> > > > > I have not been able to correlate any given query/connection/process to the
> > > > > timing of the outages.
> > > > > Sometimes they happen at peak usage periods, sometimes they happen in the
> > > > > middle of the night.
> > > > >
> > > > > I experienced this problem using pgpool-II v1.3 and have recently upgraded
> > > > > to pgpool-II v2.2.5 but am still seeing the same issue.
> > > > >
> > > > > It may be relevant to point out that I am running pgpool on one of the
> > > > > machines that is also acting as a postgres backend and it is always the
> > > > > postgres instance on the pgpool host that locks up.
> > > > > This morning I moved the pgpool instance onto another one of the postgres
> > > > > backend hosts in an effort to see if the cohabitation of pgpool and postgres
> > > > > is causing problems, or if there is simply an issue with the postgres on that
> > > > > host, or if this is just a coincidence.
> > > > > I likely won't gain anything from this test for a day or more.
> > > > >
> > > > > Also relevant is that I am running mammoth replicator and am only using
> > > > > pgpool for connection load balancing and high availability.
> > > > >
> > > > > Below is my pgpool.conf.
> > > > >
> > > > > Any thoughts appreciated.
> > > > >
> > > > > -steve crandell
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > #
> > > > > # pgpool-II configuration file sample
> > > > > # $Header: /cvsroot/pgpool/pgpool-II/pgpool.conf.sample,v 1.4.2.3 2007/10/12 09:15:02 y-asaba Exp $
> > > > >
> > > > > # Host name or IP address to listen on: '*' for all, '' for no TCP/IP
> > > > > # connections
> > > > > #listen_addresses = 'localhost'
> > > > > listen_addresses = '10.xxx.xxx.xxx'
> > > > >
> > > > > # Port number for pgpool
> > > > > port = 5432
> > > > >
> > > > > # Port number for pgpool communication manager
> > > > > pcp_port = 9898
> > > > >
> > > > > # Unix domain socket path.  (The Debian package defaults to
> > > > > # /var/run/postgresql.)
> > > > > socket_dir = '/usr/local/pgpool'
> > > > >
> > > > > # Unix domain socket path for pgpool communication manager.
> > > > > pcp_socket_dir = '/usr/local/pgpool'
> > > > >
> > > > > # Unix domain socket path for the backend. Debian package defaults to
> > > > > # /var/run/postgresql!
> > > > > backend_socket_dir = '/usr/local/pgpool'
> > > > >
> > > > > # pgpool communication manager timeout. 0 means no timeout, but
> > > > > # strongly not recommended!
> > > > > pcp_timeout = 10
> > > > >
> > > > > # number of pre-forked child process
> > > > > num_init_children = 32
> > > > >
> > > > >
> > > > > # Number of connection pools allowed for a child process
> > > > > max_pool = 4
> > > > >
> > > > >
> > > > > # If idle for this many seconds, child exits.  0 means no timeout.
> > > > > child_life_time = 30
> > > > >
> > > > > # If idle for this many seconds, connection to PostgreSQL closes.
> > > > > # 0 means no timeout.
> > > > > #connection_life_time = 0
> > > > > connection_life_time = 30
> > > > >
> > > > > # If child_max_connections connections were received, child exits.
> > > > > # 0 means no exit.
> > > > > # change
> > > > > child_max_connections = 0
> > > > >
> > > > > # Maximum time in seconds to complete client authentication.
> > > > > # 0 means no timeout.
> > > > > authentication_timeout = 60
> > > > >
> > > > > # Logging directory (more accurately, the directory for the PID file)
> > > > > logdir = '/usr/local/pgpool'
> > > > >
> > > > > # Replication mode
> > > > > replication_mode = false
> > > > >
> > > > > # Set this to true if you want to avoid deadlock situations when
> > > > > # replication is enabled.  There will, however, be a noticeable performance
> > > > > # degradation.  A workaround is to set this to false and insert a /*STRICT*/
> > > > > # comment at the beginning of the SQL command.
> > > > > replication_strict = false
> > > > >
> > > > > # When replication_strict is set to false, there will be a chance for
> > > > > # deadlocks.  Set this to nonzero (in milliseconds) to detect this
> > > > > # situation and resolve the deadlock by aborting current session.
> > > > > replication_timeout = 5000
> > > > >
> > > > > # Load balancing mode, i.e., all SELECTs except in a transaction block
> > > > > # are load balanced.  This is ignored if replication_mode is false.
> > > > > # change
> > > > > load_balance_mode = true
> > > > >
> > > > > # if there's a data mismatch between master and secondary
> > > > > # start degeneration to stop replication mode
> > > > > replication_stop_on_mismatch = false
> > > > >
> > > > > # If true, replicate SELECT statement when load balancing is disabled.
> > > > > # If false, it is only sent to the master node.
> > > > > # change
> > > > > replicate_select = true
> > > > >
> > > > > # Semicolon separated list of queries to be issued at the end of a session
> > > > > reset_query_list = 'ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT'
> > > > >
> > > > > # If true print timestamp on each log line.
> > > > > print_timestamp = true
> > > > >
> > > > > # If true, operate in master/slave mode.
> > > > > # change
> > > > > master_slave_mode = true
> > > > >
> > > > > # If true, cache connection pool.
> > > > > connection_cache = false
> > > > >
> > > > > # Health check timeout.  0 means no timeout.
> > > > > health_check_timeout = 20
> > > > >
> > > > > # Health check period.  0 means no health check.
> > > > > health_check_period = 0
> > > > >
> > > > > # Health check user
> > > > > health_check_user = 'nobody'
> > > > >
> > > > > # If true, automatically lock table with INSERT statements to keep SERIAL
> > > > > # data consistency.  An /*INSERT LOCK*/ comment has the same effect.  A
> > > > > # /*NO INSERT LOCK*/ comment disables the effect.
> > > > > insert_lock = false
> > > > >
> > > > > # If true, ignore leading white spaces of each query while pgpool judges
> > > > > # whether the query is a SELECT so that it can be load balanced.  This
> > > > > # is useful for certain APIs such as DBI/DBD which are known to add an
> > > > > # extra leading white space.
> > > > > ignore_leading_white_space = false
> > > > >
> > > > > # If true, print all statements to the log.  Like the log_statement option
> > > > > # to PostgreSQL, this allows for observing queries without engaging in full
> > > > > # debugging.
> > > > > log_statement = false
> > > > >
> > > > > # If true, incoming connections will be printed to the log.
> > > > > # change
> > > > > log_connections = true
> > > > >
> > > > > # If true, hostname will be shown in ps status. Also shown in
> > > > > # connection log if log_connections = true.
> > > > > # Be warned that this feature will add overhead to look up hostname.
> > > > > log_hostname = false
> > > > >
> > > > > # if non 0, run in parallel query mode
> > > > > parallel_mode = false
> > > > >
> > > > > # if non 0, use query cache
> > > > > enable_query_cache = 0
> > > > >
> > > > > #set pgpool2 hostname
> > > > > pgpool2_hostname = ''
> > > > >
> > > > > # system DB info
> > > > > #system_db_hostname = 'localhost'
> > > > > #system_db_port = 5432
> > > > > #system_db_dbname = 'pgpool'
> > > > > #system_db_schema = 'pgpool_catalog'
> > > > > #system_db_user = 'pgpool'
> > > > > #system_db_password = ''
> > > > >
> > > > > # backend_hostname, backend_port, backend_weight
> > > > > # here are examples
> > > > > backend_hostname0 = 'db1.xxx.xxx'
> > > > > backend_port0 = 5433
> > > > > backend_weight0 = 0.0
> > > > >
> > > > > backend_hostname1 = 'db2.xxx.xxx'
> > > > > backend_port1 = 5433
> > > > > backend_weight1 = 0.4
> > > > >
> > > > > backend_hostname2 = 'db3.xxx.xxx'
> > > > > backend_port2 = 5433
> > > > > backend_weight2 = 0.6
> > > > >
> > > > >
> > > > >
> > > > > # - HBA -
> > > > >
> > > > > # If true, use pool_hba.conf for client authentication. In pgpool-II
> > > > > # 1.1, the default value is false. The default value will be true in
> > > > > # 1.2.
> > > > > enable_pool_hba = false
> > > >
> >

