[pgpool-hackers: 3920] Re: [pgpool-general: 7588] Re: Promote specific backend to primary in streaming replication

Tatsuo Ishii ishii at sraoss.co.jp
Thu Jun 10 20:28:19 JST 2021


Sorry for the duplicate post. I feel I need to discuss this with the
developers on pgpool-hackers.

Previously I posted the PoC patch. Now I have decided to create a
committable patch. The differences from the PoC patch are:

- The user interface is now pcp_promote_node, rather than a hacked
  version of pcp_detach_node.

- I added a new option "-r, --really-promote" to the pcp_promote_node
  command; requiring this flag keeps backward compatibility. If the
  option is specified, pcp_promote_node detaches the current primary
  and kicks off the failover command, with the new main node id set to
  the node id given to pcp_promote_node. The follow primary command
  will run if it is set (see the configuration sketch after this list).

- To pass the new argument to the pcp process, I tweaked a number of
  internal functions used in the pcp system.

- Doc patch added.
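
For context, here is a minimal sketch of the pgpool.conf settings this
relies on. The script paths are placeholders, not part of the patch;
%d, %m and %H are pgpool's standard failover command substitutions:

failover_command = '/path/to/failover.sh %d %m %H'
follow_primary_command = '/path/to/follow_primary.sh %d %m %H'

With -r, as described above, the node passed as %d is the detached
(old) primary, while %m and %H identify the node being promoted.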

Here is a sample session with the modified pcp_promote_node.

$ pcp_promote_node --help
pcp_promote_node - promote a node as new main from pgpool-II
Usage:
pcp_promote_node [OPTION...] [node-id]
Options:
  -U, --username=NAME    username for PCP authentication
  -h, --host=HOSTNAME    pgpool-II host
  -p, --port=PORT        PCP port number
  -w, --no-password      never prompt for password
  -W, --password         force password prompt (should happen automatically)
  -n, --node-id=NODEID   ID of a backend node
  -g, --gracefully       promote gracefully(optional)
  -r, --really-promote   really promote(optional) <-- added
  -d, --debug            enable debug message (optional)
  -v, --verbose          output verbose messages
  -?, --help             print this help

# Start with a 4-node cluster. The primary is node 0.

$ psql -p 11000 -c "show pool_nodes" test
 node_id | hostname | port  | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change  
---------+----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | /tmp     | 11002 | up     | up        | 0.250000  | primary | primary | 0          | false             | 0                 |                   |                        | 2021-06-10 20:17:41
 1       | /tmp     | 11003 | up     | up        | 0.250000  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2021-06-10 20:17:41
 2       | /tmp     | 11004 | up     | up        | 0.250000  | standby | standby | 0          | true              | 0                 | streaming         | async                  | 2021-06-10 20:17:41
 3       | /tmp     | 11005 | up     | up        | 0.250000  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2021-06-10 20:17:41
(4 rows)

# promote node 2.
$ pcp_promote_node -p 11001 -w -r 2
pcp_promote_node -- Command Successful

[a little time has passed]

# See that node 2 has become the new primary. Note that the other
# standbys now follow the new primary because the follow primary
# command is set.

$ psql -p 11000 -c "show pool_nodes" test
 node_id | hostname | port  | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change  
---------+----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | /tmp     | 11002 | up     | up        | 0.250000  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2021-06-10 20:20:31
 1       | /tmp     | 11003 | up     | up        | 0.250000  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2021-06-10 20:20:31
 2       | /tmp     | 11004 | up     | up        | 0.250000  | primary | primary | 0          | true              | 0                 |                   |                        | 2021-06-10 20:18:20
 3       | /tmp     | 11005 | up     | up        | 0.250000  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2021-06-10 20:20:31
(4 rows)

Patch attached.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

> Hi Nathan,
> 
> I have revisited this and am leaning toward this idea: modify an
> existing pcp command so that we can specify which node is to be
> promoted.
> 
> My idea is to reuse the node-detach protocol of the pcp command.
> Currently the protocol specifies the node to be detached. I add a new
> request detail flag bit, "REQ_DETAIL_PROMOTE". If the flag is set,
> the protocol treats the requested node id as the node to be promoted,
> rather than the node to be detached; the node to be detached then
> becomes the current primary node. The node to be promoted is passed
> to a failover script as the existing "new main node" argument
> (normally the live node, excluding the current primary, with the
> smallest node id). Existing failover scripts usually treat the "new
> main node" as the node to be promoted.
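> 
> To illustrate, here is a minimal sketch of a failover script that
> consumes the "new main node" arguments. The path, ssh user and data
> directory are assumptions for illustration, not part of the patch:
> 
> #!/bin/bash
> # failover.sh -- invoked via: failover_command = '/path/to/failover.sh %d %m %H'
> FAILED_NODE_ID="$1"   # %d: node being detached
> NEW_MAIN_ID="$2"      # %m: the "new main node", i.e. the node to promote
> NEW_MAIN_HOST="$3"    # %H: hostname of the new main node
> 
> logger "failover: detached node $FAILED_NODE_ID, promoting node $NEW_MAIN_ID"
> # A typical script promotes the new main node remotely:
> ssh postgres@"$NEW_MAIN_HOST" "pg_ctl -D /path/to/pgdata promote"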
> 
> Attached is the PoC patch for this. After applying the patch, the -g
> option (detach gracefully) of pcp_detach_node sets the
> "REQ_DETAIL_PROMOTE" flag, and the node id argument now represents
> the node id to be promoted. For example:
> 
> # create 4-node cluster.
> $ pgpool_setup -n 4
> 
> # primary node is 0.
> $ psql -p 11000 -c "show pool_nodes" test
>  node_id | hostname | port  | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change  
> ---------+----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
>  0       | /tmp     | 11002 | up     | up        | 0.250000  | primary | primary | 0          | false             | 0                 |                   |                        | 2021-06-07 11:02:25
>  1       | /tmp     | 11003 | up     | up        | 0.250000  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2021-06-07 11:02:25
>  2       | /tmp     | 11004 | up     | up        | 0.250000  | standby | standby | 0          | true              | 0                 | streaming         | async                  | 2021-06-07 11:02:25
>  3       | /tmp     | 11005 | up     | up        | 0.250000  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2021-06-07 11:02:25
> (4 rows)
> 
> # promote node 2.
> $ pcp_detach_node -g -p 11001 -w 2
> $ psql -p 11000 -c "show pool_nodes" test
>  node_id | hostname | port  | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change  
> ---------+----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
>  0       | /tmp     | 11002 | up     | up        | 0.250000  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2021-06-07 11:03:17
>  1       | /tmp     | 11003 | up     | up        | 0.250000  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2021-06-07 11:03:17
>  2       | /tmp     | 11004 | up     | up        | 0.250000  | primary | primary | 0          | true              | 0                 |                   |                        | 2021-06-07 11:02:43
>  3       | /tmp     | 11005 | up     | up        | 0.250000  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2021-06-07 11:03:27
> (4 rows)
> 
> (Note that node 3 may end up in down status after pcp_detach_node
> runs, but that is due to a different issue. See [pgpool-hackers:
> 3915] "ERROR: failed to process PCP request at the moment" for more
> details.)
> 
> One of the merits of this method is that existing failover scripts
> need not be changed. (Of course, we could instead modify the existing
> pcp_promote_node so that it does the job.)
> 
> However, I see difficulties with modifying the existing pcp commands
> (either pcp_detach_node or pcp_promote_node). The protocol used by
> pcp commands is very limited in that the only argument available is
> the node id. So how does pcp_detach_node implement the -g flag?
> pcp_detach_node uses two kinds of pcp protocol message: one for
> invocations without -g and another for invocations with -g. (I think
> we should eventually overhaul the existing pcp protocol to overcome
> these limitations, but that's another story.)
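> 
> As a hypothetical shell sketch of that "one protocol message per
> variant" pattern (the command characters 'x'/'X' are placeholders,
> not pgpool's actual codes):
> 
> # Stands in for writing a single-character command plus the node id
> # (the only payload) over the pcp socket.
> send_pcp_request() { printf 'pcp request %s for node %s\n' "$1" "$2"; }
> 
> detach_node()            { send_pcp_request x "$1"; }   # without -g
> detach_node_gracefully() { send_pcp_request X "$1"; }   # with -g
> 
> Every new option needs its own command character, because nothing
> else fits in the request.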
> 
> Probably we need to invent a new pcp protocol message and use it for
> this purpose.
> 
> Comments and suggestions are welcome.
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
> 
>> Hello!
>> 
>>> On 7/04/2021, at 12:57 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>>> 
>>> Hi,
>>> 
>>>> Thanks for your time on this so far!
>>>> 
>>>> Would it be possible to allow these to be changed at runtime without a config change + reload? Perhaps a more generic pcp command, to allow any “reloadable” configuration item to be changed? I’m not sure how many of these there are, and if any of them require deeper changes. Maybe we need a 3rd “class” of configuration item, to allow it to be changed without changing config.
>>> 
>>> Not sure. PostgreSQL does not have such a 3rd "class".
>> 
>> That’s true, though with ALTER SYSTEM you can modify a special
>> configuration file and then reload the configuration, i.e. you can
>> make changes to the running server without editing configuration
>> files directly.
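>> 
>> For example, in standard PostgreSQL (work_mem is just an arbitrary
>> reloadable parameter):
>> 
>> $ psql -c "ALTER SYSTEM SET work_mem = '64MB'"   # writes postgresql.auto.conf
>> $ psql -c "SELECT pg_reload_conf()"              # applies it without a restart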
>> 
>> Pgpool could somewhat mirror this Postgres functionality: if there
>> were an include directive, then only a reload would be required. If
>> we enabled configuration to be added to such a file with pcp, that
>> would be useful, as it could then be done from a remote system,
>> perhaps?
>> 
>> Just some thoughts.. these seem like bigger changes, and I’m new here :-)
>> 
>>>> We (and I am sure many others) use config management tools to keep our config in sync. In particular, we use Puppet. I have a module which I’m hoping to be able to publish soon, in fact.
>>>> 
>>>> Usually, environments which use config management like this have policies to not modify configuration which is managed.
>>>> 
>>>> Switching to a different backend would mean we have to manually update that managed configuration (on the pgpool primary, presumably) and reload, and then run pcp_detach_node against the primary backend. In the time between updating the configuration, and detaching the node, our automation might run and undo our changes which would mean we may get an outcome we don’t expect.
>>>> An alternative would be to update the configuration in our management system and deploy it - but in many environments making these changes can be quite involved. I don’t think it’s good to require operational changes (i.e. “move the primary function to this backend”) to require configuration updates.
>>>> 
>>>> A work-around to this would be to have an include directive in the configuration and include a local, non-managed file for these sorts of temporary changes, but pgpool doesn’t have this at the moment I believe?
>>> 
>>> No, pgpool does not have that. However, I think it would be a good
>>> idea to have an "include" feature like PostgreSQL's (see below).
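>>> 
>>> For reference, PostgreSQL's postgresql.conf supports the following
>>> directives (the file and directory names are just examples):
>>> 
>>> include 'extra.conf'               # read another file at this point
>>> include_if_exists 'optional.conf'  # same, but skip silently if missing
>>> include_dir 'conf.d'               # read every *.conf file in a directory
>>> 
>>> An equivalent in pgpool.conf would cover the unmanaged local file
>>> use case described above.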
>> 
>> I would certainly use that. My puppet module writes the configuration data twice:
>> 1) In to the main pgpool.conf file.
>> 2) In to one of two files, either a “reloadable” config, or a “restart required” config - so that I can use puppet notifications for changes to those files to trigger either a reload or a restart.
>> It then does some checks to make sure the pgpool.conf is older than the reloadable/restart required config, etc. etc. - it’s quite messy!
>> 
>> An include directive would make that all a lot easier.. and of course would enable this use case.
>> 
>> --
>> Nathan Ward
>> 
>> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: promote_specified_node_v2.diff
Type: text/x-patch
Size: 21266 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-hackers/attachments/20210610/f2dbd943/attachment-0001.bin>

