5.11. Online Recovery

Pgpool-II can synchronize database nodes and attach a node without stopping the service. This feature is called "online recovery". Online recovery can be executed by using pcp_recovery_node command.

For online recovery, the recovery target node must be in detached state. This means the node must be either manually detached by pcp_detach_node or automatically detached by Pgpool-II as a consequence of failover.

If you wish to add a PostgreSQL server node dynamically, reload the pgpool.conf after adding the backend_hostname and its associated parameters. This will register the new server to Pgpool-II as a detached backend node, after that you execute pcp_recovery_node command, the server is add.

Note: The recovery target PostgreSQL server must not be running for performing the online recovery. If the target PostgreSQL server has already started, you must shut it down before starting the online recovery.

Online recovery is performed in two phases. The first phase is called "first stage" and the second phase is called "second stage". Only native replication mode and snapshot isolation mode require the second stage. For other modes including streaming replication mode the second stage is not performed and you don't need to provide a script for the stage in recovery_2nd_stage_command. i.e. you can safely leave it as an empty string.

In the first stage the standby (replica) node is created by using PostgreSQL's pg_basebackup, for example, from a backup of the main (primary) node. Update data while executing the first stage will be logged into the PostgreSQL's transaction log.

In the second stage the target recovery node is started. The transaction log will be replayed and the replica node will be completely synced with the master node.

You need to provide scripts for each stage. Complete sample scripts are provided at /etc/pgpool-II/recovery_1st_stage.sample and /etc/pgpool-II/recovery_2nd_stage.sample. Example installation using those scripts can be found in Section 8.2.7.8.

Connections from clients are not allowed only in the second stage while the data can be updated or retrieved during the first stage.

Pgpool-II performs the follows steps in online recovery:

Note: There is a restriction in the online recovery in native replication mode and snapshot isolation mode. If Pgpool-II itself is installed on multiple hosts without enabling watchdog, online recovery does not work correctly, because Pgpool-II has to stop all the clients during the 2nd stage of online recovery. If there are several Pgpool-II hosts, only one of them will have received the online recovery command and will block the connections from clients.

recovery_user (string)

Specifies the PostgreSQL user name to perform online recovery.

This parameter can be changed by reloading the Pgpool-II configurations.

recovery_password (string)

Specifies the password for the PostgreSQL user name configured in recovery_user to perform online recovery.

If recovery_password is left blank Pgpool-II will first try to get the password for recovery_user from pool_passwd file before using the empty password.

You can also specify AES256-CBC encrypted password in recovery_password field. To specify the AES encrypted password, password string must be prefixed with AES after encrypting (using aes-256-cbc algorithm) and encoding to base64.

To specify the unencrypted clear text password, prefix the password string with TEXT. For example if you want to set mypass as a password, you should specify TEXTmypass in the password field. In the absence of a valid prefix, Pgpool-II will considered the string as a plain text password.

You can also use pg_enc utility to create the correctly formatted AES encrypted password strings.

Note: Pgpool-II will require a valid decryption key at the startup to use the encrypted passwords. see Section 6.4.2 for more details on providing the decryption key to Pgpool-II

This parameter can be changed by reloading the Pgpool-II configurations.

recovery_1st_stage_command (string)

Specifies a command to be run by main (primary) node at the first stage of online recovery. The command file must be placed in the database cluster directory for security reasons. For example, if recovery_1st_stage_command = 'sync-command', then Pgpool-II will look for the command script in $PGDATA directory and will try to execute $PGDATA/sync-command.

recovery_1st_stage_command receives following 7 parameters:

  • Path to the database cluster of the main (primary) node.

  • Hostname of the backend node to be recovered.

  • Path to the database cluster of the node to be recovered.

  • Port number of the main (primary) node (Pgpool-II 3.4 or after).

  • Node number to be recovered (Pgpool-II 4.0 or after)

  • Port number to be recovered (Pgpool-II 4.1 or after)

  • Hostname of the main (primary) node (Pgpool-II 4.3 or after)

    Before the hostname of the main (primary) node was obtained by using hostname command. This is mostly ok since the script runs on the main (primary) node anyway. However in some systems the hostname obtained by hostname command is different from the hostname defined in backend_hostname configuration parameter. This could cause a trouble in detach_false_primary because it checks connectivity between primary and standby node by using host parameter in primary_conninfo parameter, which is generated by recovery_1st_stage_command. Thus it is strongly recommended to use this parameter instead of using hostname command to obtain the hostname of the primary node in recovery_1st_stage_command.

Note: Pgpool-II accept connections and queries while recovery_1st_stage command is executed, so you can retrieve and update data.

Caution

recovery_1st_stage command runs as a SQL command from PostgreSQL's point of view. So recovery_1st_stage command can get prematurely killed by PostgreSQL if the PostgreSQL's statement_time_out is configured with the value that is smaller than the time recovery_1st_stage_command takes for completion.

Typical error in such case is

       rsync used in the command is killed by signal 2 for example.
      

This parameter can be changed by reloading the Pgpool-II configurations.

recovery_2nd_stage_command (string)

Specifies a command to be run by main node at the second stage of online recovery. This command is required only native replication mode, so for other modes don't need to provide a command file. The command file must be placed in the database cluster directory for security reasons. For example, if recovery_2nd_stage_command = 'sync-command', then Pgpool-II will look for the command script in $PGDATA directory and will try to execute $PGDATA/sync-command.

recovery_2nd_stage_command receives following 7 parameters:

  • Path to the database cluster of the main(primary) node.

  • Hostname of the backend node to be recovered.

  • Path to the database cluster of the node to be recovered.

  • Port number of the main (primary) node (Pgpool-II 3.4 or after).

  • Node number to be recovered (Pgpool-II 4.0 or after)

  • Port number to be recovered (Pgpool-II 4.1 or after)

  • Hostname of the main (primary) node (Pgpool-II 4.3 or after)

Note: Pgpool-II does not accept client connections and queries during the execution of recovery_2nd_stage_command command, and waits for the existing clients to close their connections before executing the command. Therefore, the recovery_2nd_stage_command may not execute if the client stays connected for a long time.

Caution

recovery_2nd_stage command runs as a SQL command from PostgreSQL's point of view. Therefore, recovery_2nd_stage command can get prematurely killed by PostgreSQL if the PostgreSQL's statement_time_out is configured with the value that is smaller than the time recovery_2nd_stage_command takes for completion.

This parameter can be changed by reloading the Pgpool-II configurations.

recovery_timeout (integer)

Specifies the timeout in seconds to cancel the online recovery if it does not completes within this time. Since Pgpool-II does not accepts the connections during the second stage of online recovery, this parameter can be used to cancel the online recovery to manage the service down time during the online recovery.

This parameter can be changed by reloading the Pgpool-II configurations.

client_idle_limit_in_recovery (integer)

Specifies the time in seconds to disconnect a client if it remains idle since the last query during the online recovery. client_idle_limit_in_recovery is similar to the client_idle_limit but only takes effect during the second stage of online recovery.

This is useful for preventing the Pgpool-II recovery from being disturbed by the lazy clients or if the TCP/IP connection between the client and Pgpool-II is accidentally down (a cut cable for instance).

Note: client_idle_limit_in_recovery must be smaller than recovery_timeout. Otherwise, recovery_timeout comes first and you will see following error while executing online recovery:

	 ERROR:  node recovery failed, waiting connection closed in the other pgpools timeout
	

If set to -1, all clients get immediately disconnected when the second stage of online recovery starts. The default is 0, which turns off the feature.

This parameter can be changed by reloading the Pgpool-II configurations. You can also use PGPOOL SET command to alter the value of this parameter for a current session.