[pgpool-general: 3056] Re: Correct procedure for restarting a pgpool node

Fri Jul 25 12:22:18 JST 2014

Hey,

You are right - reading it back, that was all a bit vague. I will investigate
further; for now I just want to confirm that this behaviour is not expected,
and that a newly started pgpool node should sync its backend node status from
a currently active pgpool node.

I have installed from the EnterpriseDB PPAS suite installer.

My steps are:

   1. Start pgpool1
   2. Start pgpool2
   3. Verify "show pool_nodes" output is correct from both pgpool nodes
   4. Run pcp_detach_node to detach the standby PostgreSQL node
   (pcp_detach_node 1 pgpool1 9898 postgres postgres 0)
   5. Verify "show pool_nodes" output is correct from both pgpool nodes
   6. Stop all pgpool services on pgpool2 (check with ps)
   7. Start pgpool on pgpool2
   8. "show pool_nodes" no longer shows the same output on pgpool1 and pgpool2
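
The comparison in steps 3, 5 and 8 can be scripted. Below is a minimal Python
sketch, assuming the "show pool_nodes" output from each pgpool node has
already been captured as text (for example with psql). The helper names
parse_pool_nodes and diff_views are made up for illustration and are not part
of pgpool; the sample tables are the ones quoted later in this thread.

```python
def parse_pool_nodes(text):
    """Parse psql's aligned `show pool_nodes` output into {node_id: row}."""
    rows = {}
    header = None
    for line in text.splitlines():
        # Drop mail-quote prefixes; lines without "|" (prompt, separator,
        # row count) carry no table cells and are skipped.
        line = line.lstrip("> ").strip()
        if "|" not in line:
            continue
        cells = [cell.strip() for cell in line.split("|")]
        if header is None:
            header = cells  # first table line holds the column names
            continue
        row = dict(zip(header, cells))
        rows[int(row["node_id"])] = row
    return rows


def diff_views(view_a, view_b, fields=("status", "role")):
    """Return (node_id, field, value_a, value_b) for every disagreement."""
    diffs = []
    for node_id in sorted(set(view_a) | set(view_b)):
        a = view_a.get(node_id, {})
        b = view_b.get(node_id, {})
        for field in fields:
            if a.get(field) != b.get(field):
                diffs.append((node_id, field, a.get(field), b.get(field)))
    return diffs


# Output seen on the pgpool node that was NOT restarted.
PGPOOL1 = """
 node_id |  hostname   | port | status | lb_weight |  role
---------+-------------+------+--------+-----------+---------
 0       | 10.10.10.1  | 5432 | 3      | 0.500000  | standby
 1       | 10.10.10.2  | 5432 | 2      | 0.500000  | primary
"""

# Output seen on the pgpool node that was restarted.
PGPOOL2 = """
 node_id |  hostname   | port | status | lb_weight |  role
---------+-------------+------+--------+-----------+---------
 0       | 10.10.10.1  | 5432 | 2      | 0.500000  | standby
 1       | 10.10.10.2  | 5432 | 2      | 0.500000  | primary
"""

if __name__ == "__main__":
    views = parse_pool_nodes(PGPOOL1), parse_pool_nodes(PGPOOL2)
    for node_id, field, a, b in diff_views(*views):
        print(f"node_id {node_id}: {field} is {a} on pgpool1 but {b} on pgpool2")
```

Run against the two tables above, this reports a single disagreement: the
status of node_id 0 (3 on pgpool1, 2 on pgpool2).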

I'll bump up logging and do a bit more digging.

Cheers,

James Sewell,
PostgreSQL Team Lead / Solutions Architect
______________________________________

 Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099

On Fri, Jul 25, 2014 at 12:19 PM, Richard Michael <rmichael at edgeofthenet.org> wrote:

>
>
>
> On Thu, Jul 24, 2014 at 8:36 PM, James Sewell <james.sewell at lisasoft.com>
> wrote:
>
>> Hello all,
>>
>> I have two pgpool nodes which I am using a TCP load balancer to spread
>> between. I am using watchdog to synchronise PostgreSQL node information
>> between the two and an external HA solution (with ALLOW_TO_FAILOVER).
>>
>> If I start both my pgpool nodes up I get the following initial state:
>>
>> postgres=# show pool_nodes;
>>  node_id |  hostname   | port | status | lb_weight |  role
>> ---------+-------------+------+--------+-----------+---------
>>  0       | 10.10.10.1  | 5432 | 2      | 0.500000  | standby
>>  1       | 10.10.10.2  | 5432 | 2      | 0.500000  | primary
>> (2 rows)
>>
>> And then I run the following command:
>>
>>  pcp_detach_node 1 load_balancer 9898 postgres postgres 0
>>
>> Now both pgpool nodes show the following:
>>
>> postgres=# show pool_nodes;
>>  node_id |  hostname   | port | status | lb_weight |  role
>> ---------+-------------+------+--------+-----------+---------
>>  0       | 10.10.10.1  | 5432 | 3      | 0.500000  | standby
>>  1       | 10.10.10.2  | 5432 | 2      | 0.500000  | primary
>> (2 rows)
>>
>> This proves watchdog is working, as the command is sent to one pgpool
>> node through the load_balancer.
>>
>> Now if I restart pgpool node two I would expect it to come back up with
>> the config above, not the initial config - as it would have talked to the
>> watchdog on the other node. This is not the case though.
>>
>> The non restarted node still reports:
>>
>> postgres=# show pool_nodes;
>>  node_id |  hostname   | port | status | lb_weight |  role
>> ---------+-------------+------+--------+-----------+---------
>>  0       | 10.10.10.1  | 5432 | 3      | 0.500000  | standby
>>  1       | 10.10.10.2  | 5432 | 2      | 0.500000  | primary
>> (2 rows)
>>
>> And the restarted node reports:
>>
>> postgres=# show pool_nodes;
>>  node_id |  hostname   | port | status | lb_weight |  role
>> ---------+-------------+------+--------+-----------+---------
>>  0       | 10.10.10.1  | 5432 | 2      | 0.500000  | standby
>>  1       | 10.10.10.2  | 5432 | 2      | 0.500000  | primary
>> (2 rows)
>>
>> Any ideas on how to fix this?
>>
>
> Relevant portions (or tails) of log files (during the
> restarts/detaches/attaches) from both nodes would be helpful. Permissions
> on the status file [on each node] should also be verified.
>
> Which version of pgpool, and from where -- RPMs from pgpool.net, or your
> distro?  Basically -- which SHA1 of the source?  (Note, this is different
> from version "3.3.3" because the RPMs -1, -2, -3 from pgpool.net are built
> from source further along the V3_3_STABLE branch, not simply from tag 3.3.3.)
> Peruse the recent commits on V3_3_STABLE and the bug tracker for several
> issues preventing nodes from communicating during restart.
>
> I've needed to run pcp_attach_node to teach pgpool that the failed node is
> back; it did not detect it in my situation (although I may have
> overlooked some configuration).  I intend to investigate this.
>
> What is "restart" exactly?  Shutdown and use ps to confirm pgpool and all
> related processes (watchdog, etc.) are gone, then start it again?  Or, HUP
> or .. ?
>
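
For the "use ps to confirm" check, a rough Python sketch follows. It assumes
"ps -eo pid,args" style output (PID, then the command line) and that matching
on the substring "pgpool" is enough to catch the parent process, its children
and the watchdog. That matching rule is an assumption for illustration, not
something pgpool documents, and the function names are hypothetical.

```python
import subprocess


def leftover_pgpool_processes(ps_output, needle="pgpool"):
    """Return (pid, command) pairs from ps output whose command mentions needle."""
    found = []
    for line in ps_output.splitlines()[1:]:  # skip the "PID COMMAND" header
        parts = line.split(None, 1)          # -> [pid, full command line]
        if len(parts) == 2 and needle in parts[1]:
            found.append((int(parts[0]), parts[1]))
    return found


def pgpool_is_stopped():
    """Run ps and report whether no pgpool-related process is left."""
    result = subprocess.run(
        ["ps", "-eo", "pid,args"],
        capture_output=True, text=True, check=True,
    )
    return not leftover_pgpool_processes(result.stdout)


if __name__ == "__main__":
    print("pgpool stopped:", pgpool_is_stopped())
```

Only start pgpool again once this reports that nothing is left, so that the
"restart" being described really is a full stop followed by a fresh start.
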
>
> Just two remarks aside, I think it would be easier to follow your
> discussion/explanation if:
>
> 1/  you referred to the machines by node_id, as opposed to ".. I restart
> node two .. ".  I assume you mean: you restarted node_id 0 -- the status 3
> (down) node -- and it came back up knowing that it was back up (status 2,
> and as standby), but the "always up" node_id 1 remains unaware node_id 0
> has returned.  (Explicitly telling node_id 1 that node_id 0 is back with
> pcp_attach_node will probably "solve" this, as I mention above, but that's
> not automated.)
>
>  2/  you indicated the IP (hence node_id) through which you've run
> show pool_nodes, e.g., "psql -h 10.10.10.1 -c 'show pool_nodes;'"
>
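
Both remarks can be made concrete with a small sketch that builds the
commands per pgpool host, so every output can be labelled with the host it
came from. It assumes pcp_attach_node takes the same positional arguments as
the pcp_detach_node call used in this thread (timeout, host, port, user,
password, node id); other pgpool releases may use option flags instead, so
check the installed version. The pgpool port 9999 and the helper names are
also assumptions for illustration.

```python
def pcp_command(action, pcp_host, node_id, timeout=1, port=9898,
                user="postgres", password="postgres"):
    """Build argv for pcp_attach_node / pcp_detach_node (positional form)."""
    if action not in ("attach", "detach"):
        raise ValueError("action must be 'attach' or 'detach'")
    return [f"pcp_{action}_node", str(timeout), pcp_host, str(port),
            user, password, str(node_id)]


def show_pool_nodes_command(pgpool_host, port=9999, user="postgres"):
    """Build a psql argv that asks one specific pgpool host for its view."""
    return ["psql", "-h", pgpool_host, "-p", str(port), "-U", user,
            "-c", "show pool_nodes;"]


if __name__ == "__main__":
    # Re-attach backend node_id 0 via pgpool1, then query each pgpool host.
    print(" ".join(pcp_command("attach", "pgpool1", 0)))
    for host in ("pgpool1", "pgpool2"):
        print(f"[{host}]", " ".join(show_pool_nodes_command(host)))
```

Printing the command alongside the host name keeps it unambiguous which
pgpool node produced which table.
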
> Hope some of this helps.
>
> Regards,
> Richard
>
>
>> ------------------------------
>> The contents of this email are confidential and may be subject to legal
>> or professional privilege and copyright. No representation is made that
>> this email is free of viruses or other defects. If you have received this
>> communication in error, you may not copy or distribute any part of it or
>> otherwise disclose its contents to anyone. Please advise the sender of your
>> incorrect receipt of this correspondence.
