[Pgpool-general] pcp_attach_node problem?

Thu Jan 22 00:00:04 UTC 2009

No worries, I guess no more testing for me tonight :)

Just out of curiosity, which Linux distro were/are you having issues
uninstalling the pgpool-II package on?

Marcelo
PostgreSQL DBA
Linux/Solaris System Administrator

On Jan 21, 2009, at 5:04 PM, Daniel.Crespo at l-3com.com wrote:

> Definitely, my box is not good: I tested UNinstalling pgpool-II-2.1
> from the current servers that I'm using, and INstalled the latest CVS
> version (I got it today). Everything worked exactly as Marcelo said.
> Life is good.
>
> I have no idea what is wrong with my box, but exactly the same thing
> happened when I transitioned from 2.0.1 to 2.1: I had to reimage. The
> uninstallation process wouldn't work, and I don't know why. I did this
> same uninstallation process on the servers I'm working on, and
> everything is working.
>
> Thanks a lot, Marcelo, for enlightening me.
>
> Daniel
>
>> -----Original Message-----
>> From: pgpool-general-bounces at pgfoundry.org
>> [mailto:pgpool-general-bounces at pgfoundry.org] On Behalf Of
>> Daniel.Crespo at l-3com.com
>> Sent: Wednesday, January 21, 2009 1:33 PM
>> To: Marcelo Martins
>> Cc: pgpool-general at pgfoundry.org
>> Subject: Re: [Pgpool-general] pcp_attach_node problem?
>>
>>
>> Thanks for your response, I really appreciate it.
>>
>>> First, I don't really agree with attaching a node back into the pool
>>> the way you are doing in the steps shown below. If a PostgreSQL
>>> backend node goes down for some reason out of anyone's control, you
>>> should bring that node back into the pool by using online recovery;
>>> that's why that mechanism is in place.
>>>
>>> Now, there are times when we may need to purposely take one of the
>>> PostgreSQL backend nodes down (I agree on that), but when that is the
>>> case one should have some maintenance procedures in place. There are
>>> several scenarios, though, depending on your setup. You may need to
>>> keep your environment in read/write mode at all times, which means you
>>> would use the pcp utilities to detach the PG node, do whatever you
>>> need to do, and then use pcp online recovery to bring that node back
>>> into the pool (not pcp attach).
>>> If you can put your environment in read-only mode, then you could use
>>> pcp detach to take the backend node out of the pool and then use pcp
>>> attach to bring that node back into the pool.
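The maintenance rule above can be sketched as a small decision function. This is a minimal illustration, assuming the only input is whether the cluster must stay writable while the node is out; the returned strings merely name the pcp utilities (pcp_detach_node, pcp_recovery_node, pcp_attach_node) and are not complete command lines, since their arguments depend on your pgpool-II version and setup.

```python
from typing import List


def maintenance_plan(must_stay_writable: bool) -> List[str]:
    """Ordered steps for taking one PostgreSQL backend out for planned maintenance.

    must_stay_writable: True if clients keep writing through pgpool while
    the node is detached (read/write mode), False if the environment can be
    switched to read-only for the duration.
    """
    steps = ["pcp_detach_node", "do the maintenance work"]
    if must_stay_writable:
        # The remaining node(s) received writes the detached node missed,
        # so it has to be resynchronized through online recovery.
        steps.append("pcp_recovery_node")
    else:
        # Nothing was written meanwhile, so the node's data is still current
        # and a plain attach is enough.
        steps.append("pcp_attach_node")
    return steps


print(maintenance_plan(True))
print(maintenance_plan(False))
```

Keeping the choice as a single boolean mirrors the advice: the deciding factor is whether writes can reach the pool while a node is detached.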
>>
>> I understand your point and that's what I think too. But my example
>> only shows unit testing.
>>
>> My real case is as follows:
>>
>> I have either a 2-server or a 4-server configuration.
>>
>> 2-server configuration:
>> The application and the DB run on each server.
>>
>> 4-server configuration:
>> The application runs on two of the servers.
>> The DB runs on the other two servers.
>>
>> Pgpool-II would run only on the servers where the application is
>> running (a total of 2), but only one application would be active at a
>> time. The applications would always connect to localhost port 9999.
>>
>> In any case, when we are installing the applications and DBs, it's
>> always done one at a time (this is the procedure and cannot currently
>> be changed).
>>
>> The worst-case scenario for pgpool is at installation time with the
>> 2-server setup:
>> 1. Install first server (App & pgpool and DB)
>> 2. Install second server (App & pgpool and DB)
>>
>> For changes to take effect, the installation reboots the server (don't
>> ask me... It's the way it has always been, and replacing this
>> procedure would take a lot of time/money). So imagine it:
>> At the end of step 1, the system reboots. When it comes up, only the
>> first of the two servers is up; the other one does not even have its
>> IP address set. Pgpool starts and sees that there's no secondary
>> database. With failover_command I trigger a script that looks for the
>> availability of the secondary database.
>> At the end of step 2, after rebooting, the secondary server is up and
>> running. Its pgpool will successfully connect to both databases, since
>> the first one is already up. However, the script running on the
>> primary server detects that the secondary database is running (I check
>> for specific tables in the database, so I know it's up and ready to
>> serve application requests). If specific data in the tables is not the
>> same between the primary and secondary databases for any reason, I do a
>> *manual* pcp recovery; otherwise (the most likely case at installation
>> time, since both databases have just been installed and should hold the
>> same data), I do a pcp attach.
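The decision made by the failover_command script can be sketched as follows. This is a minimal illustration, not the actual script described here: it assumes the rows of the "specific tables" have already been fetched from each database (the database access is left out), and the function names and the SHA-256 row fingerprint are hypothetical choices for comparing the data.

```python
import hashlib
from typing import Iterable, Optional, Sequence, Tuple

Row = Tuple[object, ...]


def table_fingerprint(rows: Iterable[Row]) -> str:
    """Order-independent SHA-256 digest of a table's rows (hypothetical helper)."""
    # Hash each row separately and sort the digests, so the result does not
    # depend on the order in which each database returned the rows.
    row_digests = sorted(hashlib.sha256(repr(row).encode("utf-8")).hexdigest() for row in rows)
    return hashlib.sha256("".join(row_digests).encode("utf-8")).hexdigest()


def choose_action(primary_rows: Sequence[Row], secondary_rows: Optional[Sequence[Row]]) -> str:
    """Decide what the polling script should do next (hypothetical names).

    secondary_rows is None while the secondary database (or its check
    tables) is not reachable yet.
    """
    if secondary_rows is None:
        return "wait"  # secondary not ready: keep polling
    if table_fingerprint(primary_rows) == table_fingerprint(secondary_rows):
        return "pcp_attach_node"  # same data: attaching is safe
    return "manual pcp_recovery_node"  # data differs: resync first


print(choose_action([(1, "a"), (2, "b")], [(2, "b"), (1, "a")]))
```

Comparing digests rather than raw rows keeps the check cheap to log and to pass between processes; in a real script the rows would come from the same query run against each backend directly, bypassing pgpool.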
>>
>> Why don't I do pcp recovery in all cases? Because pcp recovery requires
>> that the application have no connections during the second stage of the
>> recovery. With the release that is working for me (2.1), I cannot
>> disconnect clients at the second stage only (the latest copy of
>> pgpool-II can do that with client_idle_limit_in_recovery), so I would
>> need to close the application on purpose. Therefore, I need manual
>> recovery. In this regard, I'm going to re-image my development box and
>> install a fresh copy of the latest CVS version of pgpool-II, because
>> something funny like this happened when I went from 2.0.1 to 2.1, so...
>> No clue. The thing is that I'm running out of time.
>>
>> In conclusion, it should not behave the way it does when I disconnect a
>> backend and do pcp attach after that.
>>
>>> I have downloaded the latest CVS version and tried the following a few
>>> times and did not see any issues.
>>
>> I'll push very hard to use it, starting with re-imaging my box.
>>
>>> On your last step though, you mentioned that you "re-attached the
>>> primary" backend but I guess you meant the secondary backend since
>>> that was the one you stopped.
>>
>> Yes, you are right: I meant 'secondary'.
>>
>>> Marcelo
>>> PostgreSQL DBA
>>> Linux/Solaris System Administrator
>>
>> Thanks, Marcelo
>>
>> Daniel
>>
>>>
>>> On Jan 20, 2009, at 5:46 PM, Daniel.Crespo at l-3com.com wrote:
>>>
>>>> I think the patch is for debugging purposes, but I'm not sure.
>>>>
>>>> The weird thing that happens to me is the following (I just tested it
>>>> again):
>>>>
>>>> 1. The two backends start.
>>>> 2. Start pgpool, so both backend statuses are 2.
>>>> 3.a Stop the primary backend.
>>>>   The connection is lost with the message "server closed the
>>>>   connection unexpectedly
>>>>       This probably means the server terminated abnormally
>>>>       before or while processing the request.
>>>>   The connection to the server was lost. Attempting reset:
>>>>   Succeeded." every time I try to re-run the query.
>>>>   If I re-attach the primary backend, the connection works just fine
>>>>   again.
>>>> 3.b Stop the secondary backend.
>>>>   The connection keeps going (good).
>>>>   If I re-attach the primary backend, the connection blocks.
>>>>
>>>> It's weird.
>>>>
>>>> Daniel
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Marcelo Martins [mailto:pglists at zeroaccess.org]
>>>>> Sent: Tuesday, January 20, 2009 6:03 PM
>>>>> To: Crespo, Daniel @ SDS
>>>>> Cc: pgpool-general at pgfoundry.org
>>>>> Subject: Re: [Pgpool-general] pcp_attach_node problem?
>>>>>
>>>>> Yeah, I just saw your new one when I sent mine :)
>>>>>
>>>>> Weird that it just keeps throwing that error.
>>>>> I think I have done the PG shutdown and then re-attached about 15
>>>>> times now, and I only got "server closed the connection
>>>>> unexpectedly" once.
>>>>>
>>>>> I haven't tried to apply the patch that Tatsuo mentioned on the 18th,
>>>>> though, to see what difference it makes. I might try that today.
>>>>>
>>>>>
>>>>> Marcelo
>>>>> PostgreSQL DBA
>>>>> Linux/Solaris System Administrator
>>>>>
>>>>> On Jan 20, 2009, at 4:52 PM, Daniel.Crespo at l-3com.com wrote:
>>>>>
>>>>>> Hi, Marcelo,
>>>>>>
>>>>>> I just wrote to the mail list something about exactly this.
>>>>>>
>>>>>> What you describe doesn't happen to me... I don't know why...
>>>>>> After doing failover, when a query is executed it throws back
>>>>>> "server closed the connection unexpectedly", and keeps throwing that
>>>>>> on every try I make. No idea about this.
>>>>>>
>>>>>> Thanks for the information!
>>>>>>
>>>>>> Daniel
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Marcelo Martins [mailto:pglists at zeroaccess.org]
>>>>>>> Sent: Tuesday, January 20, 2009 5:34 PM
>>>>>>> To: Crespo, Daniel @ SDS
>>>>>>> Subject: Re: [Pgpool-general] pcp_attach_node problem?
>>>>>>>
>>>>>>> Hi Daniel,
>>>>>>>
>>>>>>> I have just tested that with pgpool 2.1, and I have the same issue.
>>>>>>> When I re-attach node 1 (second node), the psql connection that I
>>>>>>> had opened hangs after executing a second query.
>>>>>>>
>>>>>>> ERROR: pid 31003: pool_read2: EOF encountered with backend
>>>>>>>
>>>>>>> On the latest CVS version, though, the hanging issue seems to be
>>>>>>> fixed. Now, when the failover/failback happens, it seems like the
>>>>>>> pgpool failover_handler process kills the children that pgpool had
>>>>>>> open with node 1 (second node; at least that is what I can tell from
>>>>>>> what I see), so when a query is executed it throws back "server
>>>>>>> closed the connection unexpectedly". When I execute the query a
>>>>>>> second time, pgpool uses a new child that has a connection opened to
>>>>>>> node 0: "new_connection: skipping slot 1 because backend_status = 3".
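For reference, the status digits mentioned in this thread ("both backend statuses are 2", "backend_status = 3") are the ones pcp_node_info reports: 1 and 2 mean the node is up (without and with pooled connections), and 3 means it is down. The sketch below is a simplified Python model of the behaviour observed here, in which a new child skips any slot whose status is 3; it is not pgpool-II's C implementation, and the function name usable_slots is hypothetical.

```python
from typing import Dict, List

# Status digits as documented for pcp_node_info in pgpool-II.
BACKEND_STATUS: Dict[int, str] = {
    0: "initializing (never shown by pcp)",
    1: "up, no connections yet",
    2: "up, connections pooled",
    3: "down",
}

CON_DOWN = 3


def usable_slots(statuses: List[int]) -> List[int]:
    """Backend slots a newly forked child would open connections to.

    Simplified model only: down slots are skipped (mirroring the
    "skipping slot N because backend_status = 3" log line); slots in
    state 1 or 2 are used.
    """
    usable = []
    for slot, status in enumerate(statuses):
        if status == CON_DOWN:
            print(f"new_connection: skipping slot {slot} because backend_status = {status}")
            continue
        if status in (1, 2):
            usable.append(slot)
    return usable


# Node 1 detached: only node 0 is used, as in the log line above.
print(usable_slots([2, 3]))
```

In this model the first query fails only because the old child held a connection to the node that went away; a fresh child consults the status table and never touches the down slot.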
>>>>>>>
>>>>>>>
>>>>>>> Marcelo
>>>>>>> PostgreSQL DBA
>>>>>>> Linux/Solaris System Administrator
>>>>>>>
>>>>>>> On Jan 13, 2009, at 8:18 AM, Daniel.Crespo at l-3com.com wrote:
>>>>>>>
>>>>>>>> Sorry for the delay, I haven't had enough time.
>>>>>>>>
>>>>>>>>> 1. Show us the logs. Full logs, but only the relevant parts (got
>>>>>>>>> tons of things to read every day here). :)
>>>>>>>>
>>>>>>>> I'll try it again with full logs to give them to you guys.
>>>>>>>>
>>>>>>>>> 2. Check whether PostgreSQL is having some problem of some sort
>>>>>>>>> before blaming it on pgpool-II. Can you run the same queries on
>>>>>>>>> both nodes and get the same results?
>>>>>>>>
>>>>>>>> PostgreSQL is not having any problems. It's not a query problem.
>>>>>>>> When I install the latest CVS head, what I showed you is what
>>>>>>>> happens. However, when I uninstall it and install the 2.1 released
>>>>>>>> version, it doesn't happen anymore. The problem with this 2.1 release
>>>>>>>> is that it doesn't keep the connection when a node is detached or
>>>>>>>> attached (if I have an already opened connection and do an
>>>>>>>> attach/detach node, it locks; I must disconnect and reconnect in
>>>>>>>> order to keep doing queries). Another problem is that I need the
>>>>>>>> newly introduced insert lock that is applied automatically to tables
>>>>>>>> with serial fields.
>>>>>>>>
>>>>>>>>> 3. Check permissions in both pg_hba.conf files.
>>>>>>>> No problem with this.
>>>>>>>>
>>>>>>>>> 4. Have you considered using version 8.3.5 of PostgreSQL and seeing
>>>>>>>>> how it goes? Or at least, the last revision of the 8.1 branch.
>>>>>>>> No. I cannot update PostgreSQL. I'm using 8.2.1.
>>>>>>>>
>>>>>>>> When I have the logs, I'll post them for sure. Thanks!
>>>>>>>>
>>>>>>>> Daniel
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: pgpool-general-bounces at pgfoundry.org
>>>>>>>>> [mailto:pgpool-general-bounces at pgfoundry.org] On Behalf Of
>>>>>>>>> Jaume Sabater
>>>>>>>>> Sent: Friday, January 09, 2009 2:32 AM
>>>>>>>>> To: pgpool-general at pgfoundry.org
>>>>>>>>> Subject: Re: [Pgpool-general] pcp_attach_node problem?
>>>>>>>>>
>>>>>>>>> On Thu, Jan 8, 2009 at 10:14 PM, <Daniel.Crespo at l-3com.com> wrote:
>>>>>>>>>
>>>>>>>>>>    And issue a SQL Select command on a table, like:
>>>>>>>>>>        postgres=# select * from pg_stat_activity ;
>>>>>>>>>>
>>>>>>>>>> It returns:
>>>>>>>>>> postgres=# select 1;
>>>>>>>>>> server closed the connection unexpectedly
>>>>>>>>>>    This probably means the server terminated abnormally
>>>>>>>>>>    before or while processing the request.
>>>>>>>>>> The connection to the server was lost. Attempting reset: Succeeded.
>>>>>>>>>>
>>>>>>>>>> postgres=# select 1;
>>>>>>>>>
>>>>>>>>> Some ideas:
>>>>>>>>>
>>>>>>>>> 1. Show us the logs. Full logs, but only the relevant parts (got
>>>>>>>>> tons of things to read every day here). :)
>>>>>>>>> 2. Check whether PostgreSQL is having some problem of some sort
>>>>>>>>> before blaming it on pgpool-II. Can you run the same queries on
>>>>>>>>> both nodes and get the same results?
>>>>>>>>> 3. Check permissions in both pg_hba.conf files.
>>>>>>>>> 4. Have you considered using version 8.3.5 of PostgreSQL and seeing
>>>>>>>>> how it goes? Or at least, the last revision of the 8.1 branch.
>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>>> Jaume Sabater
>>>>>>>>> http://linuxsilo.net/
>>>>>>>>>
>>>>>>>>> "Ubi sapientas ibi libertas"
>>>>>>>>> _______________________________________________
>>>>>>>>> Pgpool-general mailing list
>>>>>>>>> Pgpool-general at pgfoundry.org
>>>>>>>>> http://pgfoundry.org/mailman/listinfo/pgpool-general
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>
>>>
>>