View Issue Details

ID: 0000174
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2016-03-18 21:43
Reporter: aplee
Assigned To: Muhammad Usama
Priority: high
Severity: block
Reproducibility: always
Status: resolved
Resolution: fixed
Platform: linux
OS: ubuntu
OS Version: 12.04
Product Version:
Target Version:
Fixed in Version:
Summary: 0000174: pgpool child processes exit unexpectedly and don't restart after online recovery of a standby node
Description:
Pgpool version: 3.5.0
mode: master/slave streaming replication mode

We have a two-node cluster: one node is the primary and the other is the standby.
PostgreSQL on the standby was down, so we used pcp_recovery_node to recover it. After the pcp_recovery_node command ran, the standby was recovered. Unexpectedly, however, the pgpool child processes exited after a while and were never restarted.


Steps To Reproduce:
1. Stop PostgreSQL on the standby node.
2. Use pcp_recovery_node to recover the standby node.
Additional Information:
pgpool.conf:
#------------------------------------------------------------------------------
# ONLINE RECOVERY
#------------------------------------------------------------------------------

recovery_user = 'postgres'
                                   # Online recovery user
recovery_password = ''
                                   # Online recovery password
recovery_1st_stage_command = 'basebackup.sh'
                                   # Executes a command in first stage
recovery_2nd_stage_command = ''
                                   # Executes a command in second stage
recovery_timeout = 90
                                   # Timeout in seconds to wait for the
                                   # recovering node's postmaster to start up
                                   # 0 means no wait
client_idle_limit_in_recovery = 0
                                   # Client is disconnected after being idle
                                   # for that many seconds in the second stage
                                   # of online recovery
                                   # 0 means no disconnection
                                   # -1 means immediate disconnection

basebackup.sh:
#! /bin/sh
# Recovery script for streaming replication.
# This script assumes that DB node 0 is primary, and 1 is standby.
#
datadir=$1
desthost=$2
destdir=$3

psql -c "SELECT pg_start_backup('Streaming Replication', true)" postgres

rsync -C -a --delete -e ssh --exclude postgresql.conf --exclude postmaster.pid \
--exclude postmaster.opts --exclude pg_log --exclude pg_xlog \
--exclude recovery.conf $datadir/ $desthost:$destdir/

psql -c "SELECT pg_stop_backup()" postgres

pgpool_remote_start:
#! /bin/sh

if [ $# -ne 2 ]
then
    exit 1
fi

DEST=$1
DESTDIR=$2

ssh -T root@$DEST rm -f /tmp/trigger_file0
ssh -T root@$DEST cp $DESTDIR/recovery.done $DESTDIR/recovery.conf -f

ssh -T root@$DEST service postgresql start
Tags: No tags attached.

Activities

t-ishii

2016-03-18 14:19

developer   ~0000693

What failover-related messages do you see in the pgpool log?

I see the lines below when I grep my log for the keyword "failover".
In particular, if you do not see "failover done.", it is likely that your failover script does not finish.

2016-03-18 14:15:36: pid 32450: LOG: execute command: /home/t-ishii/work/pgpool-II/current/aaa/etc/failover.sh 1 /tmp 11003 /home/t-ishii/work/pgpool-II/current/aaa/data1 0 0 /tmp 0 11002 /home/t-ishii/work/pgpool-II/current/aaa/data0
2016-03-18 14:15:36: pid 32450: LOG: failover: set new primary node: 0
2016-03-18 14:15:36: pid 32450: LOG: failover: set new master node: 0
2016-03-18 14:15:36: pid 32450: LOG: failover done. shutdown host /tmp(11003)
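
The check described above can be sketched as a small shell helper. This is only an illustration: the function name check_failover_done and the log path in the usage line are hypothetical, and it assumes the pgpool log is written to a plain file that contains the "failover done." message in the form shown in the excerpt.

```sh
#!/bin/sh
# Hypothetical helper (not part of pgpool): inspect a pgpool log file and
# report whether a failover ran to completion.
check_failover_done() {
    log_file=$1

    # Print every failover-related line so the sequence can be inspected.
    # "|| true" keeps the function going when nothing matches.
    grep 'failover' "$log_file" || true

    # Succeed only if pgpool logged the completion message.
    if grep -q 'failover done\.' "$log_file"; then
        echo "failover completed"
        return 0
    fi

    echo "no 'failover done.' line found; the failover script may not have finished"
    return 1
}
```

Usage, with an assumed log location: check_failover_done /var/log/pgpool/pgpool.log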

Muhammad Usama

2016-03-18 21:43

developer   ~0000694

This is a bug in the latest version, pgpool-II 3.5. The fix has already been pushed to the repository, so you can pull the latest code if you are building pgpool from source. It will also be part of the next minor release, which is expected at the end of this month.

http://git.postgresql.org/gitweb/?p=pgpool2.git;a=commitdiff;h=e2f822fae9a4956f210bccafd0eade26642c90fa
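
If you build from source, one way to confirm that a checkout already contains the fix is to test whether the commit above is an ancestor of the checked-out HEAD. The following is a minimal sketch, not an official procedure: the function name has_commit and the checkout path in the usage line are hypothetical, and it assumes a git version that supports the -C option.

```sh
#!/bin/sh
# Hypothetical helper: succeed if the given commit is an ancestor of HEAD
# in the repository at repo_dir (that is, the checkout contains the commit).
has_commit() {
    repo_dir=$1
    commit=$2

    # merge-base --is-ancestor exits 0 when commit is an ancestor of HEAD,
    # and non-zero otherwise (including when the commit is unknown).
    git -C "$repo_dir" merge-base --is-ancestor "$commit" HEAD 2>/dev/null
}
```

Usage, with an assumed checkout location: has_commit ~/src/pgpool2 e2f822fae9a4956f210bccafd0eade26642c90fa && echo "fix is present"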

Issue History

Date Modified Username Field Change
2016-03-17 16:26 aplee New Issue
2016-03-18 14:14 t-ishii Assigned To => t-ishii
2016-03-18 14:14 t-ishii Status new => assigned
2016-03-18 14:19 t-ishii Note Added: 0000693
2016-03-18 14:20 t-ishii Status assigned => feedback
2016-03-18 21:43 Muhammad Usama Note Added: 0000694
2016-03-18 21:43 Muhammad Usama Status feedback => resolved
2016-03-18 21:43 Muhammad Usama Resolution open => fixed
2016-03-18 21:43 Muhammad Usama Assigned To t-ishii => Muhammad Usama