7.3. Pgpool-II + Watchdog Setup Example

This section shows an example of streaming replication configuration using Pgpool-II. In this example, we use 3 Pgpool-II servers to manage PostgreSQL servers to create a robust cluster system and avoid the single point of failure or split brain.

7.3.1. Requirements

We assume that all the Pgpool-II servers and the PostgreSQL servers are in the same subnet.

7.3.2. Cluster System Configuration

We use 2 PostgreSQL servers and 3 Pgpool-II servers with CentOS7. Let these servers be osspc16, osspc17, osspc18, osspc19 and osspc20.

Figure 7-1. Cluster System Configuration

Note: The roles of Active, Standy, Primary, Standby are not fixed and may be changed by further operations.

7.3.3. Installation

In this example, we install Pgpool-II and PostgreSQL by using RPM packages.

Table 7-2. Pgpool-II, PostgreSQL version information and Configuration

ServerVersionHost NamePort$PGDATA Directory
PostgreSQL server (primary)PostgreSQL 9.6.1osspc195432/var/lib/pgsql/9.6/data
PostgreSQL server (standby) PostgreSQL 9.6.1osspc205432/var/lib/pgsql/9.6/data
Pgpool-II serverPgpool-II 3.6.1osspc169999-
Pgpool-II serverPgpool-II 3.6.1osspc179999-
Pgpool-II serverPgpool-II 3.6.1osspc189999-

Install Pgpool-II by using Pgpool-II YUM repository.

     # yum install http://www.pgpool.net/yum/rpms/3.6/redhat/rhel-7-x86_64/pgpool-II-release-3.6-1.noarch.rpm
     # yum install pgpool-II-pg96
     # yum install pgpool-II-pg96-debuginfo
     # yum install pgpool-II-pg96-devel
     # yum install pgpool-II-pg96-extensions
    

Install PostgreSQL by using PostgreSQL YUM repository.

     # yum install https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
     # yum install postgresql96-server
    

7.3.4. Before Starting

Before you start the configuration process, please check the following prerequisites.

7.3.5. Pgpool-II Configuration

7.3.5.1. Common Settings

Here are the common settings on osspc16, osspc17 and osspc18.

When installing Pgpool-II from RPM, all the Pgpool-II configuration files are in /etc/pgpool-II. In this example, we copy the sample configuration file for streaming replication mode.

      # cp /etc/pgpool-II/pgpool.conf.sample-stream /etc/pgpool-II/pgpool.conf
     

To allow Pgpool-II to accept all incoming connections, we set listen_addresses = '*'.

      listen_addresses = '*'
     

Specify replication delay check user and password.

      sr_check_user = 'pgpool'
      sr_check_password = 'pgpool'
     

Enable health check so that pgpool-II performs failover. Also, if the network is unstable, the health check fails even though the backend is running properly, failover or degenerate operation may occur. In order to prevent such incorrect detection of health check, we set health_check_max_retries = 10.

      health_check_period = 5
      health_check_timeout = 20
      health_check_user = 'pgpool'
      health_check_password = 'pgpool'

      health_check_max_retries = 10
     

Specify the backend information with osspc19 and osspc20.

      # - Backend Connection Settings -

      backend_hostname0 = 'osspc19'
      backend_port0 = 5432
      backend_weight0 = 1
      backend_data_directory0 = '/var/lib/pgsql/9.6/data'
      backend_flag0 = 'ALLOW_TO_FAILOVER'

      backend_hostname1 = 'osspc20'
      backend_port1 = 5432
      backend_weight1 = 1
      backend_data_directory1 = '/var/lib/pgsql/9.6/data'
      backend_flag1 = 'ALLOW_TO_FAILOVER'
     

7.3.5.2. Failover configuration

Specify failover_command to execute failover.sh script. The special characters %d %P %H %R in failover_command are replaced with DB node ID of the detached node, Old primary node ID, Hostname of the new master node, Database cluster directory of the new master node.

      failover_command = '/etc/pgpool-II/failover.sh %d %P %H %R'
     

Create /etc/pgpool-II/failover.sh, and set the file permission to 755.

      # vi /etc/pgpool-II/failover.sh
      # chmod 755 /etc/pgpool-II/failover.sh
     

  • /etc/pgpool-II/failover.sh

    #! /bin/sh -x
    # Execute command by failover.
    # special values:  %d = node id
    #                  %h = host name
    #                  %p = port number
    #                  %D = database cluster path
    #                  %m = new master node id
    #                  %H = new master node host name
    #                  %M = old master node id
    #                  %P = old primary node id
    #                  %r = new master port number
    #                  %R = new master database cluster path
    #                  %% = '%' character
    
    falling_node=$1          # %d
    old_primary=$2           # %P
    new_primary=$3           # %H
    pgdata=$4                # %R
    
    pghome=/usr/pgsql-9.6
    log=/var/log/pgpool/failover.log
    
    date >> $log
    echo "failed_node_id=$falling_node new_primary=$new_primary" >> $log
    
    if [ $falling_node = $old_primary ]; then
        if [ $UID -eq 0 ]
        then
           su postgres -c "ssh -T postgres@$new_primary $pghome/bin/pg_ctl promote -D $pgdata"
        else
            ssh -T postgres@$new_primary $pghome/bin/pg_ctl promote -D $pgdata
        fi
        exit 0;
    fi;
    exit 0;
           

7.3.5.3. Pgpool-II Online Recovery Configurations

Next, in order to perform online recovery with Pgpool-II we specify the PostgreSQL user name and online recovery command recovery_1st_stage. Then, we create recovery_1st_stage and pgpool_remote_start in database cluster directory of PostgreSQL primary server, and set the file permissions to 755.

      recovery_user = 'postgres'
      recovery_password = 'postgres'

      recovery_1st_stage_command = 'recovery_1st_stage'
     
      $ vi /var/lib/pgsql/9.6/data/recovery_1st_stage
      $ vi /var/lib/pgsql/9.6/data/pgpool_remote_start
      $ chmod 755 /var/lib/pgsql/9.6/data/recovery_1st_stage
      $ chmod 755 /var/lib/pgsql/9.6/data/pgpool_remote_start
     

  • /var/lib/pgsql/9.6/data/recovery_1st_stage

    #!/bin/bash -x
    # Recovery script for streaming replication.
    
    pgdata=$1
    remote_host=$2
    remote_pgdata=$3
    port=$4
    
    pghome=/usr/pgsql-9.6
    archivedir=/var/lib/pgsql/archivedir
    hostname=$(hostname)
    
    ssh -T postgres@$remote_host "
    rm -rf $remote_pgdata
    $pghome/bin/pg_basebackup -h $hostname -U repl -D $remote_pgdata -x -c fast
    rm -rf $archivedir/*
    
    cd $remote_pgdata
    cp postgresql.conf postgresql.conf.bak
    sed -e 's/#*hot_standby = off/hot_standby = on/' postgresql.conf.bak > postgresql.conf
    rm -f postgresql.conf.bak
    cat > recovery.conf << EOT
    standby_mode = 'on'
    primary_conninfo = 'host="$hostname" port=$port user=repl'
    restore_command = 'scp $hostname:$archivedir/%f %p'
    EOT
    "
           
  • /var/lib/pgsql/9.6/data/pgpool_remote_start

    #! /bin/sh -x
    
    pghome=/usr/pgsql-9.6
    remote_host=$1
    remote_pgdata=$2
    
    # Start recovery target PostgreSQL server
    ssh -T $remote_host $pghome/bin/pg_ctl -w -D $remote_pgdata start > /dev/null 2>&1 < /dev/null &
           

In order to use the online recovery functionality, the functions of pgpool_recovery, pgpool_remote_start, pgpool_switch_xlog are required, so we need install pgpool_recovery on template1 of PostgreSQL server osspc19.

      # su - postgres
      $ psql template1
      =# CREATE EXTENSION pgpool_recovery;
     

7.3.5.4. Client Authentication Configuration

Because in the section Before Starting, we already set PostgreSQL authentication method to md5, it is necessary to set a client authentication by Pgpool-II to connect to backend nodes. When installing from RPM, the Pgpool-II configuration file pool_hba.conf is in /etc/pgpool-II. By default, pool_hba authentication is disabled, and set enable_pool_hba = on to enable it.

      enable_pool_hba = on
     

The format of pool_hba.conf file follows very closely PostgreSQL's pg_hba.conf format. Set pgpool and postgres user's authentication method to md5.

      host    all         pgpool           0.0.0.0/0          md5
      host    all         postgres         0.0.0.0/0          md5
     

To use md5 authentication, we need to register the user name and password in file pool_passwd. Execute command pg_md5 --md5auth --username=<user name> <password> to register user name and MD5-hashed password in file pool_passwd. If pool_passwd doesn't exist yet, it will be created in the same directory as pgpool.conf.

      # pg_md5 --md5auth --username=pgpool <password of pgpool user>
      # pg_md5 --md5auth --username=postgres <password of postgres user>
     

7.3.5.5. Watchdog Configuration

Enable watchdog functionality on osspc16, osspc17, osspc18.

      use_watchdog = on
     

Specify virtual IP address that accepts connections from clients on osspc16, osspc17, osspc18. Ensure that the IP address set to virtual IP isn't used yet.

      delegate_IP = '133.137.174.153'
     

Specify the hostname and port number of each Pgpool-II server.

  • osspc16

    	wd_hostname = 'osspc16'
    	wd_port = 9000
           
  • osspc17

    	wd_hostname = 'osspc17'
    	wd_port = 9000
           
  • osspc18

    	wd_hostname = 'osspc18'
    	wd_port = 9000
           

Specify the hostname, Pgpool-II port number, and watchdog port number of monitored Pgpool-II servers on each Pgpool-II server.

  • osspc16

    	# - Other pgpool Connection Settings -
    
    	other_pgpool_hostname0 = 'osspc17'
    	other_pgpool_port0 = 9999
    	other_wd_port0 = 9000
    
    	other_pgpool_hostname1 = 'osspc18'
    	other_pgpool_port1 = 9999
    	other_wd_port1 = 9000
           
  • osspc17

    	# - Other pgpool Connection Settings -
    
    	other_pgpool_hostname0 = 'osspc16'
    	other_pgpool_port0 = 9999
    	other_wd_port0 = 9000
    
    	other_pgpool_hostname1 = 'osspc18'
    	other_pgpool_port1 = 9999
    	other_wd_port1 = 9000
           
  • osspc18

    	# - Other pgpool Connection Settings -
    
    	other_pgpool_hostname0 = 'osspc16'
    	other_pgpool_port0 = 9999
    	other_wd_port0 = 9000
    
    	other_pgpool_hostname1 = 'osspc17'
    	other_pgpool_port1 = 9999
    	other_wd_port1 = 9000
           

Specify the hostname and port number of destination for sending heartbeat signal on osspc16, osspc17, osspc18.

  • osspc16

    	heartbeat_destination0 = 'osspc17'
    	heartbeat_destination_port0 = 9694
    	heartbeat_device0 = ''
    
    	heartbeat_destination1 = 'osspc18'
    	heartbeat_destination_port1 = 9694
    	heartbeat_device1 = ''
    
           
  • osspc17

    	heartbeat_destination0 = 'osspc16'
    	heartbeat_destination_port0 = 9694
    	heartbeat_device0 = ''
    
    	heartbeat_destination1 = 'osspc18'
    	heartbeat_destination_port1 = 9694
    	heartbeat_device1 = ''
    
           
  • osspc18

    	heartbeat_destination0 = 'osspc16'
    	heartbeat_destination_port0 = 9694
    	heartbeat_device0 = ''
    
    	heartbeat_destination1 = 'osspc17'
    	heartbeat_destination_port1 = 9694
    	heartbeat_device1 = ''
           

7.3.5.6. PCP Command Configuration

Because user authentication is required to use the PCP command, we specify user name and md5 encrypted password in pcp.conf. Here we create the encrypted password for user postgres, and add <username: encrypted password≷ in /etc/pgpool-II/pcp.conf.

      # pg_md5 -p
      Password: (input password)
      (paste the md5 encrypted password to pcp.conf)

      # vi /etc/pgpool-II/pcp.conf
      (add password entry)
      user name:md5 encrypted password
     

The settings of Pgpool-II is completed.

7.3.6. Starting/Stopping Pgpool-II

Next we start Pgpool-II. Before starting Pgpool-II, please start PostgreSQL servers first. Also, when stopping PostgreSQL, it is necessary to stop Pgpool-II first.

7.3.7. Log

Use journalctl command to see Pgpool-II logs.

     # journalctl -a | grep pgpool
    

7.3.8. How to use

Let's start to use Pgpool-II. First, let's start Pgpool-II on osspc16, osspc17, osspc18 by using the following command.

     # systemctl start pgpool.service
    

7.3.8.1. Set up PostgreSQL standby server

First, we should set up PostgreSQL standby server by using Pgpool-II online recovery functionality. Ensure that recovery_1st_stage and pgpool_remote_start scripts used by pcp_recovery_node command are in database cluster directory of PostgreSQL primary server (osspc19).

      # pcp_recovery_node -h 133.137.174.153 -p 9898 -U postgres -n 1
     

After executing pcp_recovery_node command, verify that osspc20 is started as a PostgreSQL standby server.

      # psql -h 133.137.174.153 -p 9999 -U pgpool postgres

      postgres=> show pool_nodes;
      node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
      ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
      0       | osspc19  | 5432 | up     | 0.500000  | primary | 0          | true              | 0
      1       | osspc20  | 5432 | up     | 0.500000  | standby | 0          | false             | 0
     

7.3.8.2. Switching active/standby watchdog

Confirm the watchdog status by using pcp_watchdog_info. The Pgpool-II server which is started first run as MASTER.

      # pcp_watchdog_info -h 133.137.174.153 -p 9898 -U postgres
      Password:
      3 YES osspc16:9999 Linux osspc16 osspc16

      osspc16:9999 Linux osspc16 osspc16 9999 9000 4 MASTER  #The Pgpool-II server started first becames "MASTER".
      osspc17:9999 Linux osspc17 osspc17 9999 9000 7 STANDBY #run as standby
      osspc18:9999 Linux osspc18 osspc18 9999 9000 7 STANDBY #run as standby
     

Stop active server osspc16, then osspc17 or osspc18 will be promoted to active server. To stop osspc16, we can stop Pgpool-II service or shutdown the whole system. Here, we stop Pgpool-II service.

      [root@osspc16 ~]# systemctl stop pgpool.service
      [root@osspc16 ~]# pcp_watchdog_info -h 133.137.174.153 -p 9898 -U postgres
      Password:
      3 YES osspc17:9999 Linux osspc17 osspc17

      osspc17:9999 Linux osspc17 osspc17 9999 9000 4 MASTER     #osspc17 is promoted to MASTER
      osspc16:9999 Linux osspc16 osspc16 9999 9000 10 SHUTDOWN  #osspc16 is stopped
      osspc18:9999 Linux osspc18 osspc18 9999 9000 7 STANDBY    #osspc18 runs as STANDBY
     

Start Pgpool-II (osspc16) which we have stopped again, and verify that osspc16 runs as a standby.

      [root@osspc16 ~]# systemctl start pgpool.service
      [root@osspc16 ~]# pcp_watchdog_info -h 133.137.174.153 -p 9898 -U postgres
      Password:
      3 YES osspc17:9999 Linux osspc17 osspc17

      osspc17:9999 Linux osspc17 osspc17 9999 9000 4 MASTER
      osspc16:9999 Linux osspc16 osspc16 9999 9000 7 STANDBY
      osspc18:9999 Linux osspc18 osspc18 9999 9000 7 STANDBY
     

7.3.8.3. Failover

First, use psql to connect to PostgreSQL via virtual IP, and verify the backend information.

      # psql -h 133.137.174.153 -p 9999 -U pgpool postgres

      postgres=> show pool_nodes;
      node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
      ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
      0       | osspc19  | 5432 | up     | 0.500000  | primary | 0          | true              | 0
      1       | osspc20  | 5432 | up     | 0.500000  | standby | 0          | false             | 0
     

Next, stop primary PostgreSQL server osspc19, and verify automatic failover.

      $ pg_ctl -D /var/lib/pgsql/9.6/data -m immediate stop
     

After stopping PostgreSQL on osspc19, failover occurs and PostgreSQL on osspc20 becomes new primary DB.

      # psql -h 133.137.174.153 -p 9999 -U pgpool postgres

      postgres=> show pool_nodes;
      node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
      ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
      0       | osspc19  | 5432 | down   | 0.500000  | standby | 0          | false             | 0
      1       | osspc20  | 5432 | up     | 0.500000  | primary | 0          | true              | 0
     

7.3.8.4. Online Recovery

Here, we use Pgpool-II online recovery functionality to restore osspc19 (old primary server) as a standby. Before restoring the old primary server, please ensure that recovery_1st_stage and pgpool_remote_start scripts exist in database cluster directory of current primary server osspc20.

      # pcp_recovery_node -h 133.137.174.153 -p 9898 -U postgres -n 0
     

Then verify that osspc19 is started as a standby.

      # psql -h 133.137.174.153 -p 9999 -U pgpool postgres

      postgres=> show pool_nodes;
      node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
      ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
      0       | osspc19  | 5432 | up     | 0.500000  | standby | 0          | false             | 0
      1       | osspc20  | 5432 | up     | 0.500000  | primary | 0          | true              | 0