[Pgpool-general] pcp_child: pcp_read() failed. reason: Success

Steven Crandell steven.crandell at gmail.com
Mon Nov 9 12:57:42 UTC 2009


I've done extensive testing since my last message.
The problem appears to be that I am getting more connections to pgpool than
I have num_init_children.
Per my tests, as soon as pgpool gets connection num_init_children + 1, it
locks up and takes at least one of the backend nodes with it.
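
For anyone who wants to double-check, this is roughly the shape of my
test (a sketch only: the host and credentials are placeholders, and I
happened to use psycopg2 as the client driver):

    import psycopg2  # placeholder driver choice; any libpq client will do

    NUM_INIT_CHILDREN = 32  # matches num_init_children in my pgpool.conf

    # open num_init_children connections and hold them all open
    conns = []
    for i in range(NUM_INIT_CHILDREN):
        conns.append(psycopg2.connect(host='pgpool.example.com',  # placeholder
                                      port=5432, dbname='postgres',
                                      user='nobody'))
        print('connection %d ok' % (i + 1))

    # connection num_init_children + 1: instead of queueing or rejecting
    # the attempt, pgpool stops responding here in my tests
    print('opening connection %d ...' % (NUM_INIT_CHILDREN + 1))
    extra = psycopg2.connect(host='pgpool.example.com', port=5432,
                             dbname='postgres', user='nobody')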

Can anyone else confirm this?
I'd like to make sure it is not just my particular configuration causing the
issue.

In the meantime I have simply increased the num_init_children parameter
significantly, in an effort to stay well ahead of the number of incoming
connections, and this appears to be working.

-s

On Sun, Nov 8, 2009 at 3:23 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> I would like to know what the condition of pgpool is. What does ps
> show for the pgpool processes? Even better, can you attach a debugger
> to one of the pgpool processes and get a backtrace?
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
>
> > I've tried setting the local backend_hostname = ''
> > Same problems are occurring.
> > Pgpool has actually failed something like 4 separate times today, all but
> > one of them using this local socket configuration.
> >
> > Any other thoughts?
> >
> > thx
> > -s
> >
> > On Thu, Nov 5, 2009 at 10:52 PM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >
> > > > Actually I was running pgpool on db2 (backend_hostname1) and am now
> > > > running it on db3 (backend_hostname2).
> > > > I have actually suspected that pgpool might be opting for some sort
> > > > of socket connection to the local instance of postgres instead of
> > > > using the TCP/IP connection parameters in an effort to speed things
> > > > up.
> > > >
> > > > I have done my best to ensure that pgpool has completely separate
> > > > socket directories, but it wouldn't be hard for pgpool to find a
> > > > local postgres socket if it wanted.  If I end up with another outage
> > > > and this time db3 is the postgres instance that locks up, I'll be
> > > > fairly certain that this is the problem, but for the moment I can
> > > > only speculate.
> > > >
> > > > I'm assuming you're suggesting I set backend_hostname0 = '' because
> > > > it is already weighted to 0.0 anyway?
> > >
> > > No. Because I thought you were running pgpool on db1. '' forces
> > > pgpool to use a UNIX domain socket. So if you are running pgpool on
> > > db3, you could set:
> > >
> > > backend_hostname2 = ''
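> > >
> > > To first confirm that the backend actually answers on its UNIX
> > > socket, something like this can be used (a sketch; the socket
> > > directory, dbname and user are guesses for your setup):
> > >
> > >     import psycopg2
> > >
> > >     # when host is a directory path, libpq connects to the UNIX
> > >     # socket <dir>/.s.PGSQL.<port> instead of using TCP/IP
> > >     conn = psycopg2.connect(host='/tmp', port=5433,
> > >                             dbname='postgres', user='nobody')
> > >     print(conn.server_version)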
> > >
> > > > I have db1 (backend_hostname0) weighted to 0.0 in an effort to direct
> > > > all selects to the two slave hosts (db2 and db3) but still benefit
> > > > from pgpool intelligently sending writes to db1.
> > > > db1 is the Mammoth master host and needs all available I/O to deal
> > > > with writes.
> > > > My understanding is that this is how "master_slave_mode = true" works.
> > > > Writes are always directed to backend_hostname0.
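> > > >
> > > > Just to spell out the select split I expect from those weights (as
> > > > I understand it, pgpool only uses the ratios between the weights; a
> > > > trivial sketch):
> > > >
> > > >     # backend_weight0..2 from my pgpool.conf below
> > > >     weights = [0.0, 0.4, 0.6]
> > > >     total = sum(weights)
> > > >     for i, w in enumerate(weights):
> > > >         # prints: backend0: 0%, backend1: 40%, backend2: 60%
> > > >         print('backend%d: %.0f%%' % (i, 100.0 * w / total))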
> > > >
> > > > If I need to reevaluate that thinking, please advise, but that has
> > > > been working for me for months now.
> > > >
> > > > thx
> > > > -s
> > > >
> > > > On Thu, Nov 5, 2009 at 9:26 PM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> > > >
> > > > > Besides the useless error message from pcp_child (it seems someone
> > > > > believed that EOF would set some error number in the global errno
> > > > > variable; I will fix this anyway), for me it seems the socket files
> > > > > are going dead. I suspect some network stack bug could cause this,
> > > > > but I'm not sure. One thing you might want to try is changing this:
> > > > >
> > > > > backend_hostname0 = 'db1.xxx.xxx'
> > > > >
> > > > > to:
> > > > >
> > > > > backend_hostname0 = ''
> > > > >
> > > > > This will make pgpool use a UNIX domain socket for the communication
> > > > > channel to PostgreSQL, rather than TCP/IP. It may or may not affect
> > > > > the problem you have, since the network code in the kernel will be
> > > > > different.
> > > > >
> > > > > (I assume you are running pgpool on db1.xxx.xxx)
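> > > > >
> > > > > As an aside on the "Success" part: read() returns 0 at EOF and
> > > > > does not touch errno, so formatting errno afterwards reports
> > > > > whatever happened to be there before; errno 0 renders as "Success"
> > > > > on Linux. A minimal sketch, in Python only for brevity:
> > > > >
> > > > >     import os
> > > > >
> > > > >     r, w = os.pipe()
> > > > >     os.close(w)             # writer closed: the reader now sees EOF
> > > > >     data = os.read(r, 100)  # EOF: returns empty, errno is not set
> > > > >     print(repr(data))
> > > > >     print(os.strerror(0))   # prints "Success" on Linux
> > > > >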
> > > > > --
> > > > > Tatsuo Ishii
> > > > > SRA OSS, Inc. Japan
> > > > >
> > > > > > Has anyone else run into this:
> > > > > >
> > > > > > My pgpool instance runs without problems for days on end and then
> > > > > > suddenly stops responding to all requests.
> > > > > > At the same moment, one of my three backend db hosts becomes
> > > > > > completely inaccessible.
> > > > > > Pgpool will not respond to shutdown, or even kill, and must be
> > > > > > kill -9'd.
> > > > > > Once all pgpool processes are out of the way, the inaccessible
> > > > > > postgres server once again becomes responsive.
> > > > > > I restart pgpool and everything works properly for a few more days.
> > > > > >
> > > > > > At the moment the problem occurs, pgpool's log output, which
> > > > > > typically consists of just connection logging, turns into a steady
> > > > > > stream of this:
> > > > > >
> > > > > > Nov  5 11:33:18 src at obfuscated pgpool: 2009-11-05 11:33:18
> > > > > > ERROR: pid 12811: pcp_child: pcp_read() failed. reason: Success
> > > > > >
> > > > > > These errors show up sporadically in my pgpool logs all the time
> > > > > > but don't appear to have any adverse effects until the whole thing
> > > > > > takes a dive.
> > > > > > I would desperately like to know what this error message is trying
> > > > > > to tell me.
> > > > > >
> > > > > > I have not been able to correlate any given
> > > > > > query/connection/process to the timing of the outages.
> > > > > > Sometimes they happen at peak usage periods, sometimes they
> > > > > > happen in the middle of the night.
> > > > > >
> > > > > > I experienced this problem using pgpool-II v1.3 and have recently
> > > > > > upgraded to pgpool-II v2.2.5 but am still seeing the same issue.
> > > > > >
> > > > > > It may be relevant to point out that I am running pgpool on one
> > > > > > of the machines that is also acting as a postgres backend, and it
> > > > > > is always the postgres instance on the pgpool host that locks up.
> > > > > > This morning I moved the pgpool instance onto another one of the
> > > > > > postgres backend hosts in an effort to see if the cohabitation of
> > > > > > pgpool and postgres is causing problems, if there is simply an
> > > > > > issue with postgres on that host, or if this is just a coincidence.
> > > > > > I likely won't gain anything from this test for a day or more.
> > > > > >
> > > > > > Also relevant is that I am running Mammoth Replicator and am only
> > > > > > using pgpool for connection load balancing and high availability.
> > > > > >
> > > > > > Below is my pgpool.conf.
> > > > > >
> > > > > > Any thoughts appreciated.
> > > > > >
> > > > > > -steve crandell
> > > > > >
> > > > > >
> > > > > > #
> > > > > > # pgpool-II configuration file sample
> > > > > > # $Header: /cvsroot/pgpool/pgpool-II/pgpool.conf.sample,v 1.4.2.3 2007/10/12 09:15:02 y-asaba Exp $
> > > > > >
> > > > > > # Host name or IP address to listen on: '*' for all, '' for no
> > > > > > # TCP/IP connections
> > > > > > #listen_addresses = 'localhost'
> > > > > > listen_addresses = '10.xxx.xxx.xxx'
> > > > > >
> > > > > > # Port number for pgpool
> > > > > > port = 5432
> > > > > >
> > > > > > # Port number for pgpool communication manager
> > > > > > pcp_port = 9898
> > > > > >
> > > > > > # Unix domain socket path.  (The Debian package defaults to
> > > > > > # /var/run/postgresql.)
> > > > > > socket_dir = '/usr/local/pgpool'
> > > > > >
> > > > > > # Unix domain socket path for pgpool communication manager.
> > > > > > pcp_socket_dir = '/usr/local/pgpool'
> > > > > >
> > > > > > # Unix domain socket path for the backend.  Debian package
> > > > > > # defaults to /var/run/postgresql!
> > > > > > backend_socket_dir = '/usr/local/pgpool'
> > > > > >
> > > > > > # pgpool communication manager timeout. 0 means no timeout, but
> > > > > > # strongly not recommended!
> > > > > > pcp_timeout = 10
> > > > > >
> > > > > > # number of pre-forked child process
> > > > > > num_init_children = 32
> > > > > >
> > > > > >
> > > > > > # Number of connection pools allowed for a child process
> > > > > > max_pool = 4
> > > > > >
> > > > > >
> > > > > > # If idle for this many seconds, child exits.  0 means no timeout.
> > > > > > child_life_time = 30
> > > > > >
> > > > > > # If idle for this many seconds, connection to PostgreSQL closes.
> > > > > > # 0 means no timeout.
> > > > > > #connection_life_time = 0
> > > > > > connection_life_time = 30
> > > > > >
> > > > > > # If child_max_connections connections were received, child exits.
> > > > > > # 0 means no exit.
> > > > > > # change
> > > > > > child_max_connections = 0
> > > > > >
> > > > > > # Maximum time in seconds to complete client authentication.
> > > > > > # 0 means no timeout.
> > > > > > authentication_timeout = 60
> > > > > >
> > > > > > # Logging directory (more accurately, the directory for the PID file)
> > > > > > logdir = '/usr/local/pgpool'
> > > > > >
> > > > > > # Replication mode
> > > > > > replication_mode = false
> > > > > >
> > > > > > # Set this to true if you want to avoid deadlock situations when
> > > > > > # replication is enabled.  There will, however, be a noticeable
> > > > > > # performance degradation.  A workaround is to set this to false
> > > > > > # and insert a /*STRICT*/ comment at the beginning of the SQL
> > > > > > # command.
> > > > > > replication_strict = false
> > > > > >
> > > > > > # When replication_strict is set to false, there will be a chance
> > > > > > # for deadlocks.  Set this to nonzero (in milliseconds) to detect
> > > > > > # this situation and resolve the deadlock by aborting current
> > > > > > # session.
> > > > > > replication_timeout = 5000
> > > > > >
> > > > > > # Load balancing mode, i.e., all SELECTs except in a transaction
> > > > > > # block are load balanced.  This is ignored if replication_mode
> > > > > > # is false.
> > > > > > # change
> > > > > > load_balance_mode = true
> > > > > >
> > > > > > # if there's a data mismatch between master and secondary
> > > > > > # start degeneration to stop replication mode
> > > > > > replication_stop_on_mismatch = false
> > > > > >
> > > > > > # If true, replicate SELECT statement when load balancing is
> > > > > > # disabled.
> > > > > > # If false, it is only sent to the master node.
> > > > > > # change
> > > > > > replicate_select = true
> > > > > >
> > > > > > # Semicolon separated list of queries to be issued at the end of
> > > > > > # a session
> > > > > > reset_query_list = 'ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT'
> > > > > >
> > > > > > # If true print timestamp on each log line.
> > > > > > print_timestamp = true
> > > > > >
> > > > > > # If true, operate in master/slave mode.
> > > > > > # change
> > > > > > master_slave_mode = true
> > > > > >
> > > > > > # If true, cache connection pool.
> > > > > > connection_cache = false
> > > > > >
> > > > > > # Health check timeout.  0 means no timeout.
> > > > > > health_check_timeout = 20
> > > > > >
> > > > > > # Health check period.  0 means no health check.
> > > > > > health_check_period = 0
> > > > > >
> > > > > > # Health check user
> > > > > > health_check_user = 'nobody'
> > > > > >
> > > > > > # If true, automatically lock table with INSERT statements to keep
> > > > > > # SERIAL data consistency.  An /*INSERT LOCK*/ comment has the
> > > > > > # same effect.  A /*NO INSERT LOCK*/ comment disables the effect.
> > > > > > insert_lock = false
> > > > > >
> > > > > > # If true, ignore leading white spaces of each query while pgpool
> > > > > > # judges whether the query is a SELECT so that it can be load
> > > > > > # balanced.  This is useful for certain APIs such as DBI/DBD,
> > > > > > # which are known to add an extra leading white space.
> > > > > > ignore_leading_white_space = false
> > > > > >
> > > > > > # If true, print all statements to the log.  Like the
> > > > > > # log_statement option to PostgreSQL, this allows for observing
> > > > > > # queries without engaging in full debugging.
> > > > > > log_statement = false
> > > > > >
> > > > > > # If true, incoming connections will be printed to the log.
> > > > > > # change
> > > > > > log_connections = true
> > > > > >
> > > > > > # If true, hostname will be shown in ps status. Also shown in
> > > > > > # connection log if log_connections = true.
> > > > > > # Be warned that this feature will add overhead to look up hostname.
> > > > > > log_hostname = false
> > > > > >
> > > > > > # if non 0, run in parallel query mode
> > > > > > parallel_mode = false
> > > > > >
> > > > > > # if non 0, use query cache
> > > > > > enable_query_cache = 0
> > > > > >
> > > > > > #set pgpool2 hostname
> > > > > > pgpool2_hostname = ''
> > > > > >
> > > > > > # system DB info
> > > > > > #system_db_hostname = 'localhost'
> > > > > > #system_db_port = 5432
> > > > > > #system_db_dbname = 'pgpool'
> > > > > > #system_db_schema = 'pgpool_catalog'
> > > > > > #system_db_user = 'pgpool'
> > > > > > #system_db_password = ''
> > > > > >
> > > > > > # backend_hostname, backend_port, backend_weight
> > > > > > # here are examples
> > > > > > backend_hostname0 = 'db1.xxx.xxx'
> > > > > > backend_port0 = 5433
> > > > > > backend_weight0 = 0.0
> > > > > >
> > > > > > backend_hostname1 = 'db2.xxx.xxx'
> > > > > > backend_port1 = 5433
> > > > > > backend_weight1 = 0.4
> > > > > >
> > > > > > backend_hostname2 = 'db3.xxx.xxx'
> > > > > > backend_port2 = 5433
> > > > > > backend_weight2 = 0.6
> > > > > >
> > > > > >
> > > > > >
> > > > > > # - HBA -
> > > > > >
> > > > > > # If true, use pool_hba.conf for client authentication. In
> > > > > > # pgpool-II 1.1, the default value is false. The default value
> > > > > > # will be true in 1.2.
> > > > > > enable_pool_hba = false
> > > > >
> > >
>

