5.10. Online Recovery

Pgpool-II can synchronize database nodes and attach a node without stopping the service. This feature is called "online recovery". Online recovery can be executed by using pcp_recovery_node command.

For online recovery, the recovery target node must be in detached state. This means the node must be either manually detached by pcp_detach_node or automatically detached by Pgpool-II as a consequence of failover.

If you wish to add a PostgreSQL server node dynamically, reload the pgpool.conf after adding the backend_hostname and its associated parameters. This will register the new server to Pgpool-II as a detached backend node.

Note: The recovery target PostgreSQL server must not be running for performing the online recovery. If the target PostgreSQL server has already started, you must shut it down before starting the online recovery.

Online recovery is performed in two phases. The first phase is called "first stage" and the second phase is called "second stage". You need to provide scripts for each stage. Only replication_mode requires the second stage. For other modes including streaming replication mode the second stage is not performed and you don't need to provide a script for the stage in recovery_2nd_stage_command. i.e. you can safely leave it as an empty string.

Connections from clients are not allowed only in the second stage while the data can be updated or retrieved during the first stage.

Pgpool-II performs the follows steps in online recovery:

Note: There is a restriction in the online recovery in replication_mode. If Pgpool-II itself is installed on multiple hosts, online recovery does not work correctly, because Pgpool-II has to stop all the clients during the 2nd stage of online recovery. If there are several Pgpool-II hosts, only one of them will have received the online recovery command and will block the connections from clients.

recovery_user (string)

Specifies the PostgreSQL user name to perform online recovery.

This parameter can be changed by reloading the Pgpool-II configurations.

recovery_password (string)

Specifies the password for the PostgreSQL user name configured in recovery_user to perform online recovery.

This parameter can be changed by reloading the Pgpool-II configurations.

recovery_1st_stage_command (string)

Specifies a command to be run by master (primary) node at the first stage of online recovery. The command file must be placed in the database cluster directory for security reasons. For example, if recovery_1st_stage_command = 'sync-command', then Pgpool-II will look for the command script in $PGDATA directory and will try to execute $PGDATA/sync-command.

recovery_1st_stage_command receives following 4 parameters:

  • Path to the database cluster of the master (primary) node.

  • Hostname of the backend node to be recovered.

  • Path to the database cluster of the node to be recovered.

  • Port number of the master (primary) node.

Note: Pgpool-II accept connections and queries while recovery_1st_stage command is executed, so you can retrieve and update data.

Caution

recovery_1st_stage command runs as a SQL command from PostgreSQL's point of view. So recovery_1st_stage command can get prematurely killed by PostgreSQL if the PostgreSQL's statement_time_out is configured with the value that is smaller than the time recovery_1st_stage_command takes for completion.

Typical error in such case is

       rsync used in the command is killed by signal 2 for example.
      

This parameter can be changed by reloading the Pgpool-II configurations.

recovery_2nd_stage_command (string)

Specifies a command to be run by master node at the second stage of online recovery. This parameter only necessary for replication_mode. The command file must be placed in the database cluster directory for security reasons. For example, if recovery_2nd_stage_command = 'sync-command', then Pgpool-II will look for the command script in $PGDATA directory and will try to execute $PGDATA/sync-command.

recovery_2nd_stage_command receives following 4 parameters:

  • Path to the database cluster of the master(primary) node.

  • Hostname of the backend node to be recovered.

  • Path to the database cluster of the node to be recovered.

  • Port number of the master (primary) node.

Note: Pgpool-II does not accept client connections and queries during the execution of recovery_2nd_stage_command command, and waits for the existing clients to close their connections before executing the command. Therefore, the recovery_2nd_stage_command may not execute if the client stays connected for a long time.

Caution

recovery_2nd_stage command runs as a SQL command from PostgreSQL's point of view. Therefore, recovery_2nd_stage command can get prematurely killed by PostgreSQL if the PostgreSQL's statement_time_out is configured with the value that is smaller than the time recovery_2nd_stage_command takes for completion.

This parameter can be changed by reloading the Pgpool-II configurations.

recovery_timeout (integer)

Specifies the timeout in seconds to cancel the online recovery if it does not completes within this time. Since Pgpool-II does not accepts the connections during the second stage of online recovery, this parameter can be used to cancel the online recovery to manage the service down time during the online recovery.

This parameter can be changed by reloading the Pgpool-II configurations.

client_idle_limit_in_recovery (integer)

Specifies the time in seconds to disconnect a client if it remains idle since the last query during the online recovery. client_idle_limit_in_recovery is similar to the client_idle_limit but only takes effect during the second stage of online recovery.

This is useful for preventing the Pgpool-II recovery from being disturbed by the lazy clients or if the TCP/IP connection between the client and Pgpool-II is accidentally down (a cut cable for instance).

Note: client_idle_limit_in_recovery must be smaller than recovery_timeout. Otherwise, recovery_timeout comes first and you will see following error while executing online recovery:

	 ERROR:  node recovery failed, waiting connection closed in the other pgpools timeout
	

If set to -1, all clients get immediately disconnected when the second stage of online recovery starts. The default is 0, which turns off the feature.

This parameter can be changed by reloading the Pgpool-II configurations. You can also use PGPOOL SET command to alter the value of this parameter for a current session.