[Pgpool-general] pcp_attach_node problem?

Wed Jan 21 18:32:48 UTC 2009

Thanks for your response, I really appreciate it.

> First, I don't really agree with just attaching a node back into the
> pool in the manner you are doing with the steps shown below. If a
> PostgreSQL backend node goes down, for some reason out of anyone's
> control, you should bring that node back into the pool by using
> online_recovery, that's why that mechanism is in place.
>
> Now there are times that we may need to purposely take one of the
> PostgreSQL backend nodes down (I agree on that), but when that is
> the case one should have some maintenance procedures in place. There
> are several scenarios though, depending on your setup. You may need to
> keep your environment in read/write mode at all times, which means you
> would use the pcp utilities to detach the PG node, do whatever you
> need to do and then use the pcp online recovery to bring that node
> back into the pool (not pcp attach).
> If you happen to be able to have your environment in read-only mode
> then you could use the pcp detach to take the backend node out of the
> pool and then use pcp attach to bring that node back into the pool.

I understand your point and that's what I think too. But my example only
shows unit testing.

My real case is as follows:

I have either a 2-server or a 4-server configuration.

2-server configuration:
The application and the DB run on each server.

4-server configuration:
The application runs on two of the servers.
The DB runs on the other two servers.

Pgpool-II would run only on the servers where the application is running
(a total of 2), but only one application would be active at a time. The
applications would always connect to localhost port 9999.

In any case, when we are installing the applications and DBs, it's
always done one at a time (this is the procedure and can not currently
be changed).

The worst-case scenario for pgpool is at installation time with the
2-server setup:
1. Install first server (App & pgpool and DB)
2. Install second server (App & pgpool and DB)

For changes to take effect, the installation reboots the server (don't
ask me... It's the way it has been, and it would take a lot of time/money
to replace this procedure). So imagine it:
At the end of step 1, the system reboots. When it comes up, only the
first of the two servers is up; the other one does not even have an IP
address set. Pgpool starts and sees that there's no secondary database.
With failover_command I trigger a script that looks for the
availability of the secondary database.
At the end of step 2, after rebooting, the secondary server is up and
running. Its pgpool will successfully connect to both databases since
the first one is already up. However, the script running on the primary
server detects that the secondary database is running (I check for
specific tables in the database, so I know it's up and ready for
application requests). If specific data in the tables is not the same
between the primary and secondary databases for any reason, I will do a
*manual* pcp recovery; otherwise (which is the most likely case at
installation time, since both databases have just been installed and
should have the same data), I do pcp attach.
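
As a rough illustration only (not the actual script), the
attach-or-recover decision described above could be sketched in Python
like this. The function names, the row format and the "wait" state are
invented for the example, and querying each backend is left out; the
rows are assumed to have been fetched already.

```python
import hashlib
from typing import Iterable, Optional, Sequence, Tuple

Row = Tuple[object, ...]


def fingerprint(rows: Iterable[Row]) -> str:
    """Return an order-independent digest of a table's rows (hypothetical helper)."""
    digest = hashlib.sha256()
    # Sort the textual form of each row so row order does not matter.
    for line in sorted(repr(tuple(row)) for row in rows):
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def choose_action(primary_rows: Sequence[Row],
                  secondary_rows: Optional[Sequence[Row]]) -> str:
    """Decide what to do with the secondary backend.

    secondary_rows is None while the secondary database is not reachable
    or its check tables do not exist yet (an assumption of this sketch).
    """
    if secondary_rows is None:
        return "wait"        # keep polling for the secondary database
    if fingerprint(primary_rows) == fingerprint(secondary_rows):
        return "attach"      # same data: a plain pcp attach is enough
    return "recovery"        # data differs: do the manual pcp recovery


if __name__ == "__main__":
    print(choose_action([(1, "a"), (2, "b")], [(2, "b"), (1, "a")]))
```

With identical rows on both sides it returns "attach"; if any row
differs it returns "recovery", which in my case means closing the
application and doing the recovery by hand.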

Why don't I do pcp recovery in all cases? Because pcp recovery requires
that there be no connections from the application during the second stage
of the recovery. With the release that is working for me (2.1) I cannot
disconnect clients at the second stage only (that needs
client_idle_limit_in_recovery, which is in the latest copy of pgpool-II),
so I need to close the application on purpose. Therefore, I need manual
recovery. In this regard, I'm going to re-image my development box and
install a fresh copy of the latest CVS version of pgpool-II, because
something funny like this happened when I went from 2.0.1 to 2.1, so...
No clue. The thing is that I'm running out of time.

In conclusion, it should not behave the way it does when I disconnect a
backend and do pcp attach after that.

> I have downloaded the latest CVS version and tried the following a few
> times and did not see any issues.

I'll push very hard to use it, starting with re-imaging my box.

> On your last step though, you mentioned that you "re-attached the  
> primary" backend but I guess you meant the secondary backend since  
> that was the one you stopped.

Yes, you are right: I meant 'secondary'.

> Marcelo
> PostgreSQL DBA
> Linux/Solaris System Administrator

Thanks, Marcelo

Daniel

> 
> On Jan 20, 2009, at 5:46 PM, Daniel.Crespo at l-3com.com wrote:
> 
> > I think the patch is for debugging purposes, but I'm not sure.
> >
> > The weird thing that happens to me is the following (I just tested it
> > again):
> >
> > 1. The two backends start
> > 2. start pgpool. So both backend statuses are 2.
> > 3.a stop primary backend,
> >    The connection is lost with the message "server closed the
> > connection unexpectedly
> >        This probably means the server terminated abnormally
> >        before or while processing the request.
> > The connection to the server was lost. Attempting reset: Succeeded.",
> > every time I try to re-run the query.
> >    If I re-attach the primary backend, the connection works just fine
> > again.
> > 3.b stop secondary backend.
> >    The connection keeps going (good).
> >    If I re-attach the primary backend, the connection blocks.
> >
> > It's weird
> >
> > Daniel
> >
> >
> >> -----Original Message-----
> >> From: Marcelo Martins [mailto:pglists at zeroaccess.org]
> >> Sent: Tuesday, January 20, 2009 6:03 PM
> >> To: Crespo, Daniel @ SDS
> >> Cc: pgpool-general at pgfoundry.org
> >> Subject: Re: [Pgpool-general] pcp_attach_node problem?
> >>
> >> yeah just saw your new one when sent mine :)
> >>
> >> weird  that it just keeps throwing that error.
> >> I think I have done the PG shutdown and then re-attaching about 15
> >> times now and I only get the "server closed the connection
> >> unexpectedly" once.
> >>
> >> I haven't tried to apply the patch that Tatsuo mentioned on 18th
> >> though to see what difference it makes. might try that today
> >>
> >>
> >> Marcelo
> >> PostgreSQL DBA
> >> Linux/Solaris System Administrator
> >>
> >> On Jan 20, 2009, at 4:52 PM, Daniel.Crespo at l-3com.com wrote:
> >>
> >>> Hi, Marcelo,
> >>>
> >>> I just wrote to the mail list something about exactly this.
> >>>
> >>> What you describe doesn't happen to me... I don't know why...
> >>> After doing failover, when a query is executed it throws back that
> >>> "server closed the connection unexpectedly", and keeps throwing that
> >>> for every try I make. No idea about this.
> >>>
> >>> Thanks for the information!
> >>>
> >>> Daniel
> >>>
> >>>> -----Original Message-----
> >>>> From: Marcelo Martins [mailto:pglists at zeroaccess.org]
> >>>> Sent: Tuesday, January 20, 2009 5:34 PM
> >>>> To: Crespo, Daniel @ SDS
> >>>> Subject: Re: [Pgpool-general] pcp_attach_node problem?
> >>>>
> >>>> Hi Daniel,
> >>>>
> >>>> I have just tested that with pgpool 2.1 and I also have the
> >>>> same issue.
> >>>> When I re-attach node 1 (second node) back, the psql connection that I
> >>>> had opened hangs after executing a second query.
> >>>>
> >>>> ERROR: pid 31003: pool_read2: EOF encountered with backend
> >>>>
> >>>> On the latest CVS version though the hanging issue seems to be fixed.
> >>>> Now when the failover/failback happens though it seems like pgpool's
> >>>> failover_handler process kills the children that pgpool had open with
> >>>> node 1 (second node - at least that is what I can tell from what I
> >>>> see), therefore when a query is executed it throws back that "server
> >>>> closed the connection unexpectedly". When I execute the query a
> >>>> second time then pgpool uses a new child that has a connection opened to
> >>>> node 0: "new_connection: skipping slot 1 because backend_status = 3"
> >>>>
> >>>>
> >>>> Marcelo
> >>>> PostgreSQL DBA
> >>>> Linux/Solaris System Administrator
> >>>>
> >>>> On Jan 13, 2009, at 8:18 AM, Daniel.Crespo at l-3com.com wrote:
> >>>>
> >>>>> Sorry for the delay, I haven't had enough time.
> >>>>>
> >>>>>> 1. Show us the logs. Full logs, but only the relevant parts (got tons
> >>>>>> of things to read every day here). :)
> >>>>>
> >>>>> I'll try it again with full logs to give them to you guys
> >>>>>
> >>>>>> 2. Check whether PostgreSQL is having some problem of some sort before
> >>>>>> blaming it on pgpool-II. Can you run the same queries on both nodes
> >>>>>> and get the same results?
> >>>>>
> >>>>> PostgreSQL is not having any problems. It's not a query problem.
> >>>>> When I install the latest CVS head, what I showed you is what happens.
> >>>>> However, when I uninstall it and install the 2.1 released version, it
> >>>>> doesn't happen anymore. The problem with this 2.1 release is that it
> >>>>> doesn't keep the connection when a node is detached or attached (if I
> >>>>> have an already opened connection and do attach/detach node, it
> >>>>> locks. I must disconnect and reconnect in order to keep doing
> >>>>> queries). Another problem is that I need the newly introduced insert
> >>>>> lock to be applied automatically on tables with serial fields.
> >>>>>
> >>>>>> 3. Check permissions in both pg_hba.conf files.
> >>>>> No problem with this.
> >>>>>
> >>>>>> 4. Have you considered using version 8.3.5 of PostgreSQL and see how
> >>>>>> it goes? Or at least, the last revision of the 8.1 branch.
> >>>>> No. I can not update PostgreSQL. I'm using 8.2.1.
> >>>>>
> >>>>> When I have the logs, I'll post them for sure. Thanks!
> >>>>>
> >>>>> Daniel
> >>>>>
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: pgpool-general-bounces at pgfoundry.org
> >>>>>> [mailto:pgpool-general-bounces at pgfoundry.org] On Behalf Of
> >>>>>> Jaume Sabater
> >>>>>> Sent: Friday, January 09, 2009 2:32 AM
> >>>>>> To: pgpool-general at pgfoundry.org
> >>>>>> Subject: Re: [Pgpool-general] pcp_attach_node problem?
> >>>>>>
> >>>>>> On Thu, Jan 8, 2009 at 10:14 PM, <Daniel.Crespo at l-3com.com> wrote:
> >>>>>>
> >>>>>>>     And issue a SQL Select command on a table, like:
> >>>>>>>         postgres=# select * from pg_stat_activity ;
> >>>>>>>
> >>>>>>> It returns:
> >>>>>>> postgres=# select 1;
> >>>>>>> server closed the connection unexpectedly
> >>>>>>>     This probably means the server terminated abnormally
> >>>>>>>     before or while processing the request.
> >>>>>>> The connection to the server was lost. Attempting reset: Succeeded.
> >>>>>>>
> >>>>>>> postgres=# select 1;
> >>>>>>
> >>>>>> Some ideas:
> >>>>>>
> >>>>>> 1. Show us the logs. Full logs, but only the relevant parts (got tons
> >>>>>> of things to read every day here). :)
> >>>>>> 2. Check whether PostgreSQL is having some problem of some sort before
> >>>>>> blaming it on pgpool-II. Can you run the same queries on both nodes
> >>>>>> and get the same results?
> >>>>>> 3. Check permissions in both pg_hba.conf files.
> >>>>>> 4. Have you considered using version 8.3.5 of PostgreSQL and see how
> >>>>>> it goes? Or at least, the last revision of the 8.1 branch.
> >>>>>>
> >>>>>> -- 
> >>>>>> Jaume Sabater
> >>>>>> http://linuxsilo.net/
> >>>>>>
> >>>>>> "Ubi sapientas ibi libertas"
> >>>>>> _______________________________________________
> >>>>>> Pgpool-general mailing list
> >>>>>> Pgpool-general at pgfoundry.org
> >>>>>> http://pgfoundry.org/mailman/listinfo/pgpool-general
> >>>>>>