[pgpool-general: 3055] Re: Correct procedure for restarting a pgpool node

Fri Jul 25 11:19:46 JST 2014

On Thu, Jul 24, 2014 at 8:36 PM, James Sewell <james.sewell at lisasoft.com>
wrote:

> Hello all,
>
> I have two pgpool nodes which I am using a TCP load balancer to spread
> between. I am using watchdog to synchronise PostgreSQL node information
> between the two and an external HA solution (with ALLOW_TO_FAILOVER).
>
> If I start both my pgpool nodes up I get the following initial state:
>
> postgres=# show pool_nodes;
>  node_id |  hostname   | port | status | lb_weight |  role
> ---------+-------------+------+--------+-----------+---------
>  0       | 10.10.10.1  | 5432 | 2      | 0.500000  | standby
>  1       | 10.10.10.2  | 5432 | 2      | 0.500000  | primary
> (2 rows)
>
> And then I run the following command:
>
>  pcp_detach_node 1 load_balancer 9898 postgres postgres 0
>
> Now both pgpool nodes show the following:
>
> postgres=# show pool_nodes;
>  node_id |  hostname   | port | status | lb_weight |  role
> ---------+-------------+------+--------+-----------+---------
>  0       | 10.10.10.1  | 5432 | 3      | 0.500000  | standby
>  1       | 10.10.10.2  | 5432 | 2      | 0.500000  | primary
> (2 rows)
>
> This proves watchdog is working, as the command is sent to one pgpool node
> through the load_balancer.
>
> Now if I restart pgpool node two I would expect it to come back up with
> the config above, not the initial config - as it would have talked to the
> watchdog on the other node. This is not the case though.
>
> The non restarted node still reports:
>
> postgres=# show pool_nodes;
>  node_id |  hostname   | port | status | lb_weight |  role
> ---------+-------------+------+--------+-----------+---------
>  0       | 10.10.10.1  | 5432 | 3      | 0.500000  | standby
>  1       | 10.10.10.2  | 5432 | 2      | 0.500000  | primary
> (2 rows)
>
> And the restarted node reports:
>
> postgres=# show pool_nodes;
>  node_id |  hostname   | port | status | lb_weight |  role
> ---------+-------------+------+--------+-----------+---------
>  0       | 10.10.10.1  | 5432 | 2      | 0.500000  | standby
>  1       | 10.10.10.2  | 5432 | 2      | 0.500000  | primary
> (2 rows)
>
> Any ideas on how to fix this?
>

Relevant portions (or tails) of log files (during the
restarts/detaches/attaches) from both nodes would be helpful. Permissions
on the status file [on each node] should also be verified.
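
For the permissions check, something along these lines could be run as the
pgpool user on each node. It is only a sketch: the function name is made up,
and the path is an assumption (pgpool writes pgpool_status under the directory
named by "logdir" in pgpool.conf, so substitute whatever yours is set to).

```shell
# Assumed location; pgpool_status lives under the "logdir" set in pgpool.conf.
STATUS_FILE="${STATUS_FILE:-/var/log/pgpool/pgpool_status}"

# Hypothetical helper: report whether the status file exists and whether the
# current user can both read and write it.
check_status_file() {
    f="$1"
    if [ ! -e "$f" ]; then
        echo "missing: $f"
        return 1
    fi
    if [ -r "$f" ] && [ -w "$f" ]; then
        echo "ok: $f"
    else
        echo "not readable/writable: $f"
        return 1
    fi
}
```

Usage would be: check_status_file "$STATUS_FILE"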

Which version of pgpool are you running, and where did it come from: RPMs
from pgpool.net, or your distro?  Basically, what SHA1 of the source?  (Note
that this is not the same as saying version "3.3.3": the -1, -2, -3 RPMs
from pgpool.net are built from source taken further along the V3_3_STABLE
branch, not simply from the 3.3.3 tag.)  Peruse the recent commits on
V3_3_STABLE and the bug tracker; several issues there involve nodes failing
to communicate during restart.

I've needed to run pcp_attach_node to teach pgpool that the failed node is
back; it did not detect it in my situation (although I may have overlooked
some configuration).  I intend to investigate this.
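
A sketch of that re-attach, assuming the same positional arguments as the
pcp_detach_node call quoted above (timeout, host, port, user, password,
node_id).  The host, port and credentials are simply copied from that call,
and the wrapper function name is made up for illustration.

```shell
# Hypothetical wrapper around pcp_attach_node.
# Argument order mirrors the pcp_detach_node call in the quoted message:
#   timeout  host  port  user  password  node_id
reattach_node() {
    node_id="$1"
    pcp_attach_node 1 load_balancer 9898 postgres postgres "$node_id"
}
```

Usage would be: reattach_node 0, followed by "show pool_nodes;" on each
pgpool to confirm the status changed.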

What exactly do you mean by "restart"?  A shutdown, using ps to confirm that
pgpool and all related processes (watchdog, etc.) are gone, followed by a
fresh start?  Or a HUP, or .. ?
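
To make the distinction concrete, here is roughly what I mean by the two
cases.  This is a sketch only: the function names are made up, and it assumes
pgpool is on the PATH and picks up its default configuration file.

```shell
# Full restart: stop, confirm nothing is left behind, then start again.
full_restart() {
    pgpool -m fast stop
    # The [p] keeps grep from matching its own command line; any output here
    # means pgpool (or watchdog) processes are still alive.
    if ps -ef | grep '[p]gpool'; then
        echo "pgpool processes still running" >&2
        return 1
    fi
    pgpool
}

# Reload: asks the running pgpool to re-read its configuration files;
# the existing processes are not stopped.
reload_only() {
    pgpool reload
}
```

The two behave very differently with respect to node status, so it matters
which one was used.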

Two remarks as an aside: I think your explanation would be easier to follow
if:

1/  you referred to the machines by node_id, as opposed to ".. I restart
node two .. ".  I assume you mean: you restarted node_id 0 -- the status 3
(down) node -- and it came back up knowing that it was back up (status 2,
and as standby), but the "always up" node_id 1 remains unaware node_id 0
has returned.  (Explicitly telling node_id 1 that node_id 0 is back with
pcp_attach_node will probably "solve" this, as I mention above, but that's
not automated.)

2/  you indicated the IP (hence node_id) through which you've run
"show pool_nodes", e.g., "psql -h 10.10.10.1 -c 'show pool_nodes;'"
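
Something like the following would label each result with the pgpool it came
from.  The hostnames pgpool1 and pgpool2 are placeholders (your message does
not give the pgpool addresses), 9999 is pgpool's default listen port, and the
function name is made up.

```shell
# Placeholder pgpool hosts and port; replace with the real ones.
PGPOOL_HOSTS="${PGPOOL_HOSTS:-pgpool1 pgpool2}"
PGPOOL_PORT="${PGPOOL_PORT:-9999}"

# Connect to each pgpool directly (not through the load balancer) and print
# its own view of the backend nodes under a header naming the host.
show_nodes_everywhere() {
    for host in $PGPOOL_HOSTS; do
        echo "== pool_nodes as seen by $host =="
        psql -h "$host" -p "$PGPOOL_PORT" -U postgres -c 'show pool_nodes;'
    done
}
```

Running it before and after each detach, attach and restart makes it obvious
which pgpool holds which view.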

Hope some of this helps.

Regards,
Richard

> James Sewell,
> PostgreSQL Team Lead / Solutions Architect