[pgpool-general: 6447] Re: follow_master_command executed on node shown as down (one of unrecovered masters from previous failover)

Andre Piwoni apiwoni at webmd.net
Sat Mar 2 03:34:58 JST 2019


I just realized that I already handle the case of a restart that triggered
failover in another way. Namely, before promoting the new node to master, my
failover script forces the old primary to shut down. So even if I restart the
primary and a failover occurs, the script shuts down the restarted old
primary.
Anyway, it doesn't hurt to have that check in the follow_master script as
well, in case rebooting the machine restarts the old primary, etc.
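
Roughly, the relevant part of the failover script looks like this (simplified
sketch; the actual commands show up in the failover.log excerpt quoted further
down):

#!/bin/bash
# failover.sh %d %h %p %D %M %P %m %H %r %R  (simplified sketch)
failed_node_host="$2"
new_master_node_host="$8"
PGDATA=/var/lib/pgsql/10/data
SSH="ssh -o StrictHostKeyChecking=no -i /var/lib/pgsql/.ssh/id_rsa"

# Force the old primary down first, so a primary that merely restarted
# cannot keep running next to the node that is about to be promoted.
$SSH postgres@${failed_node_host} -T "/usr/pgsql-10/bin/pg_ctl -D $PGDATA stop -m fast"

# Promote the new primary.
$SSH postgres@${new_master_node_host} -T "/usr/pgsql-10/bin/pg_ctl -D $PGDATA promote"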

On Fri, Mar 1, 2019 at 9:58 AM Andre Piwoni <apiwoni at webmd.net> wrote:

> I agree. This shouldn't be so complicated.
>
> Since I'm using sed in the follow_master script to repoint the slave by
> updating recovery.conf, if that command fails I simply don't restart and
> re-attach the node. Kill two birds with one stone :-)
>
> Here's what I'm testing now:
> ssh -o StrictHostKeyChecking=no -i /var/lib/pgsql/.ssh/id_rsa postgres@${detached_node_host} \
>   -T "sed -i 's/host=.*sslmode=/host=${new_master_node_host} port=5432 sslmode=/g' /var/lib/pgsql/10/data/recovery.conf" >> $LOGFILE
> repoint_status=$?
>
> if [ ${repoint_status} -eq 0 ]; then
>
>       # restart
>
>       # reattach
>
> else
>
>      # WARNING: this could be a restarted master, so there is no recovery.conf
>
>      # CONSIDERATION: should I shut it down, since I don't want to have two masters running even though Pgpool load balances only one?
>
> fi
>
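> For the happy path, the restart/reattach steps would look roughly like this
> (the pcp host, port and user below are placeholders for my environment):
>
> if [ ${repoint_status} -eq 0 ]; then
>     # restart the repointed standby so it starts streaming from the new primary
>     ssh -o StrictHostKeyChecking=no -i /var/lib/pgsql/.ssh/id_rsa postgres@${detached_node_host} \
>       -T "/usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data restart -m fast -w" >> $LOGFILE
>     # re-attach the node to Pgpool once it is back up
>     pcp_attach_node -w -h localhost -p 9898 -U pgpool -n ${detached_node_id} >> $LOGFILE
> fi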
>
> On Fri, Mar 1, 2019 at 9:44 AM Pierre Timmermans <ptim007 at yahoo.com>
> wrote:
>
>> Thank you, it makes sense indeed. I also like to have a relatively long
>> "grace" delay via the health check interval, so that if the primary
>> restarts quickly enough there is no failover.
>>
>> For the case where there is a degenerated master, I have added this code
>> in the follow_master script; it seems to work fine in my tests:
>>
>> ssh_options="ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no"
>> in_reco=$( $ssh_options postgres@${HOSTNAME} 'psql -t -c "select pg_is_in_recovery();"' | head -1 | awk '{print $1}' )
>> if [ "a${in_reco}" != "at" ] ; then
>>   echo "Node $HOSTNAME is not in recovery, probably a degenerated master, skip it" | tee -a $LOGFILE
>>   exit 0
>> fi
>>
>> In the end I believe that pgpool's algorithm for choosing a primary node
>> (always the node with the lowest id) is the root cause of the problem:
>> pgpool should select the most adequate node (the node that is in recovery
>> and has the lowest replication gap). Unfortunately I cannot code in C,
>> otherwise I would contribute.
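>>
>> Just to illustrate the idea (this is not pgpool code): an external script
>> could rank the candidate nodes by how much WAL they have replayed and only
>> consider nodes that are still in recovery. Hostnames and passwordless psql
>> access as the postgres user are assumptions here:
>>
>> for h in pg-hdp-node1.kitchen.local pg-hdp-node2.kitchen.local pg-hdp-node3.kitchen.local; do
>>   lsn=$( psql -h $h -p 5432 -U postgres -At -c \
>>     "select case when pg_is_in_recovery() then pg_wal_lsn_diff(pg_last_wal_replay_lsn(), '0/0') end;" )
>>   [ -n "$lsn" ] && echo "$lsn $h"    # nodes that are down or not in recovery are skipped
>> done | sort -rn | head -1 | awk '{print $2}'   # standby with the smallest gap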
>>
>> Pierre
>>
>>
>> On Friday, March 1, 2019, 5:07:06 PM GMT+1, Andre Piwoni <
>> apiwoni at webmd.net> wrote:
>>
>>
>> FYI,
>>
>> One of the things I have done to minimize the impact of restarting the
>> primary is using health checks, where health_check_max_retries x
>> health_check_retry_delay allows enough time for the primary to be restarted
>> without triggering a failover, which may take more time than the restart
>> itself. This is with fail_over_on_backend_error disabled.
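>>
>> For example, something along these lines in pgpool.conf gives the primary
>> about a minute to come back before a failover is triggered (the values are
>> only illustrative):
>>
>> health_check_period = 10
>> health_check_max_retries = 6
>> health_check_retry_delay = 10
>> fail_over_on_backend_error = off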
>>
>> Andre
>>
>> On Fri, Mar 1, 2019 at 7:58 AM Andre Piwoni <apiwoni at webmd.net> wrote:
>>
>> Hi Pierre,
>>
>> Hmmm? I have not covered the case you described, which is a restart of the
>> primary on node 0, the resulting failover, and a subsequent restart of the
>> new primary on node 1, which results in follow_master being called on node
>> 0. In my case I was shutting down node 0, which resulted in follow_master
>> being called on it after the second failover, since I was not checking
>> whether node 0 was running. In your case node 0 is running, since it has
>> been restarted.
>>
>> Here's part of my script that I have to improve given your case:
>>
>> ssh -o StrictHostKeyChecking=no -i /var/lib/pgsql/.ssh/id_rsa postgres@${detached_node_host} \
>>   -T "/usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data status" | grep "is running"
>> running_status=$?
>>
>> if [ ${running_status} -eq 0 ]; then
>>     # TODO: check if recovery.conf exists or pg_is_in_recovery() on ${detached_node_host} and exit if this is not a slave node
>>     # repoint to new master ${new_master_node_host}
>>     # restart ${detached_node_host}
>>     # reattach the restarted node with pcp_attach_node
>> else
>>     # do nothing, since this could be an old slave or primary that needs to be recovered, a node in maintenance mode, etc.
>> fi
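>>
>> The TODO check could be something as simple as this sketch (same ssh key
>> and paths as above):
>>
>> in_recovery=$( ssh -o StrictHostKeyChecking=no -i /var/lib/pgsql/.ssh/id_rsa postgres@${detached_node_host} \
>>   -T 'psql -At -c "select pg_is_in_recovery();"' )
>> if [ "${in_recovery}" != "t" ]; then
>>     # running but not in recovery: a restarted old primary, so leave it alone (or shut it down)
>>     exit 0
>> fi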
>>
>>
>>
>> On Fri, Mar 1, 2019 at 3:28 AM Pierre Timmermans <ptim007 at yahoo.com>
>> wrote:
>>
>> Hi
>>
>> Same issue for me, but I am not sure how to fix it. Andre, can you tell me
>> exactly how you check?
>>
>> I cannot add a test using pcp_node_info to check that the status is up,
>> because then follow_master would never do anything. Indeed, in my case,
>> when follow_master is executed the status of the target node is always
>> down, so my script does the standby follow command and then a
>> pcp_attach_node.
>>
>> To solve the issue, for now I have added a check that the command select
>> pg_is_in_recovery(); returns "t" on the node; if it returns "f" then I can
>> assume it is a degenerated master and I don't execute the follow_master
>> command.
>>
>>
>>
>> So my use case is this:
>>
>> 1. node 0 is primary, node 1 and node 2 are standbys
>> 2. node 0 is restarted, node 1 becomes primary and node 2 follows the new
>> primary (thanks to follow_master). In follow_master on node 2 I have to do
>> a pcp_attach_node afterwards because the status of the node is down
>> 3. in the meantime node 0 has rebooted, the db is started on node 0 but
>> it is down in pgpool and its role is standby (it is a degenerated master)
>> 4. node 1 is restarted, pgpool executes failover on node 2 and
>> follow_master on node 0 => the follow_master on node 0 breaks everything
>> because after that node 0 becomes a primary again
>>
>> Thanks and regards
>>
>> Pierre
>>
>>
>> On Monday, February 25, 2019, 5:35:11 PM GMT+1, Andre Piwoni <
>> apiwoni at webmd.net> wrote:
>>
>>
>> I have already put that check in place.
>>
>> Thank you for confirming.
>>
>> On Sat, Feb 23, 2019 at 11:56 PM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>>
>> Sorry, I was wrong. A follow_master_command will be executed against
>> the down node as well. So you need to check in the follow_master_command
>> whether the target PostgreSQL node is running. If it's not, you can skip
>> the node.
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese:http://www.sraoss.co.jp
>>
>> > I have added a pg_ctl status check to ensure no action is taken when the
>> > node is down, but I'll check version 3.7.8.
>> >
>> > Here's the Pgpool log from the time node2 is shut down to the time node1
>> > (the already dead old primary) received the follow master command.
>> > Sorry for the double date logging. I'm also including the self-explanatory
>> > failover.log that my failover and follow_master scripts generated.
>> >
>> > Arguments passed to scripts for your reference.
>> > failover.sh %d %h %p %D %M %P %m %H %r %R
>> > follow_master.sh %d %h %p %D %M %P %m %H %r %R
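>> >
>> > In pgpool.conf these are wired up roughly as (paths as in the log below):
>> > failover_command = '/etc/pgpool-II/failover.sh %d %h %p %D %M %P %m %H %r %R'
>> > follow_master_command = '/etc/pgpool-II/follow_master.sh %d %h %p %D %M %P %m %H %r %R'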
>> >
>> > Pool status before shutdown of node 2:
>> > postgres=> show pool_nodes;
>> >  node_id |          hostname          | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
>> > ---------+----------------------------+------+--------+-----------+---------+------------+-------------------+-------------------
>> >  0       | pg-hdp-node1.kitchen.local | 5432 | down   | 0.333333  | standby | 0          | false             | 0
>> >  1       | pg-hdp-node2.kitchen.local | 5432 | up     | 0.333333  | primary | 0          | false             | 0
>> >  2       | pg-hdp-node3.kitchen.local | 5432 | up     | 0.333333  | standby | 0          | true              | 0
>> > (3 rows)
>> >
>> > Pgpool log
>> > Feb 22 10:43:27 pg-hdp-node3 pgpool[12437]: [126-1] 2019-02-22 10:43:27:
>> > pid 12437: LOG:  failed to connect to PostgreSQL server on
>> > "pg-hdp-node2.kitchen.local:5432", getsockopt() detected error
>> "Connection
>> > refused"
>> > Feb 22 10:43:27 pg-hdp-node3 pgpool[12437]: [127-1] 2019-02-22 10:43:27:
>> > pid 12437: ERROR:  failed to make persistent db connection
>> > Feb 22 10:43:27 pg-hdp-node3 pgpool[12437]: [127-2] 2019-02-22 10:43:27:
>> > pid 12437: DETAIL:  connection to host:"pg-hdp-node2.kitchen.local:5432"
>> > failed
>> > Feb 22 10:43:37 pg-hdp-node3 pgpool[12437]: [128-1] 2019-02-22 10:43:37:
>> > pid 12437: ERROR:  Failed to check replication time lag
>> > Feb 22 10:43:37 pg-hdp-node3 pgpool[12437]: [128-2] 2019-02-22 10:43:37:
>> > pid 12437: DETAIL:  No persistent db connection for the node 1
>> > Feb 22 10:43:37 pg-hdp-node3 pgpool[12437]: [128-3] 2019-02-22 10:43:37:
>> > pid 12437: HINT:  check sr_check_user and sr_check_password
>> > Feb 22 10:43:37 pg-hdp-node3 pgpool[12437]: [128-4] 2019-02-22 10:43:37:
>> > pid 12437: CONTEXT:  while checking replication time lag
>> > Feb 22 10:43:37 pg-hdp-node3 pgpool[12437]: [129-1] 2019-02-22 10:43:37:
>> > pid 12437: LOG:  failed to connect to PostgreSQL server on
>> > "pg-hdp-node2.kitchen.local:5432", getsockopt() detected error
>> "Connection
>> > refused"
>> > Feb 22 10:43:37 pg-hdp-node3 pgpool[12437]: [130-1] 2019-02-22 10:43:37:
>> > pid 12437: ERROR:  failed to make persistent db connection
>> > Feb 22 10:43:37 pg-hdp-node3 pgpool[12437]: [130-2] 2019-02-22 10:43:37:
>> > pid 12437: DETAIL:  connection to host:"pg-hdp-node2.kitchen.local:5432"
>> > failed
>> > Feb 22 10:43:45 pg-hdp-node3 pgpool[7786]: [6-1] 2019-02-22 10:43:45:
>> pid
>> > 7786: LOG:  failed to connect to PostgreSQL server on
>> > "pg-hdp-node2.kitchen.local:5432", getsockopt() detected error
>> "Connection
>> > refused"
>> > Feb 22 10:43:45 pg-hdp-node3 pgpool[7786]: [7-1] 2019-02-22 10:43:45:
>> pid
>> > 7786: ERROR:  failed to make persistent db connection
>> > Feb 22 10:43:45 pg-hdp-node3 pgpool[7786]: [7-2] 2019-02-22 10:43:45:
>> pid
>> > 7786: DETAIL:  connection to host:"pg-hdp-node2.kitchen.local:5432"
>> failed
>> > Feb 22 10:43:45 pg-hdp-node3 pgpool[7786]: [8-1] 2019-02-22 10:43:45:
>> pid
>> > 7786: LOG:  health check retrying on DB node: 1 (round:1)
>> > Feb 22 10:43:47 pg-hdp-node3 pgpool[12437]: [131-1] 2019-02-22 10:43:47:
>> > pid 12437: ERROR:  Failed to check replication time lag
>> > Feb 22 10:43:47 pg-hdp-node3 pgpool[12437]: [131-2] 2019-02-22 10:43:47:
>> > pid 12437: DETAIL:  No persistent db connection for the node 1
>> > Feb 22 10:43:47 pg-hdp-node3 pgpool[12437]: [131-3] 2019-02-22 10:43:47:
>> > pid 12437: HINT:  check sr_check_user and sr_check_password
>> > Feb 22 10:43:47 pg-hdp-node3 pgpool[12437]: [131-4] 2019-02-22 10:43:47:
>> > pid 12437: CONTEXT:  while checking replication time lag
>> > Feb 22 10:43:47 pg-hdp-node3 pgpool[12437]: [132-1] 2019-02-22 10:43:47:
>> > pid 12437: LOG:  failed to connect to PostgreSQL server on
>> > "pg-hdp-node2.kitchen.local:5432", getsockopt() detected error
>> "Connection
>> > refused"
>> > Feb 22 10:43:47 pg-hdp-node3 pgpool[12437]: [133-1] 2019-02-22 10:43:47:
>> > pid 12437: ERROR:  failed to make persistent db connection
>> > Feb 22 10:43:47 pg-hdp-node3 pgpool[12437]: [133-2] 2019-02-22 10:43:47:
>> > pid 12437: DETAIL:  connection to host:"pg-hdp-node2.kitchen.local:5432"
>> > failed
>> > Feb 22 10:43:48 pg-hdp-node3 pgpool[7786]: [9-1] 2019-02-22 10:43:48:
>> pid
>> > 7786: LOG:  failed to connect to PostgreSQL server on
>> > "pg-hdp-node2.kitchen.local:5432", getsockopt() detected error
>> "Connection
>> > refused"
>> > Feb 22 10:43:48 pg-hdp-node3 pgpool[7786]: [10-1] 2019-02-22 10:43:48:
>> pid
>> > 7786: ERROR:  failed to make persistent db connection
>> > Feb 22 10:43:48 pg-hdp-node3 pgpool[7786]: [10-2] 2019-02-22 10:43:48:
>> pid
>> > 7786: DETAIL:  connection to host:"pg-hdp-node2.kitchen.local:5432"
>> failed
>> > Feb 22 10:43:48 pg-hdp-node3 pgpool[7786]: [11-1] 2019-02-22 10:43:48:
>> pid
>> > 7786: LOG:  health check retrying on DB node: 1 (round:2)
>> > Feb 22 10:43:51 pg-hdp-node3 pgpool[7786]: [12-1] 2019-02-22 10:43:51:
>> pid
>> > 7786: LOG:  failed to connect to PostgreSQL server on
>> > "pg-hdp-node2.kitchen.local:5432", getsockopt() detected error
>> "Connection
>> > refused"
>> > Feb 22 10:43:51 pg-hdp-node3 pgpool[7786]: [13-1] 2019-02-22 10:43:51:
>> pid
>> > 7786: ERROR:  failed to make persistent db connection
>> > Feb 22 10:43:51 pg-hdp-node3 pgpool[7786]: [13-2] 2019-02-22 10:43:51:
>> pid
>> > 7786: DETAIL:  connection to host:"pg-hdp-node2.kitchen.local:5432"
>> failed
>> > Feb 22 10:43:51 pg-hdp-node3 pgpool[7786]: [14-1] 2019-02-22 10:43:51:
>> pid
>> > 7786: LOG:  health check retrying on DB node: 1 (round:3)
>> > Feb 22 10:43:54 pg-hdp-node3 pgpool[7786]: [15-1] 2019-02-22 10:43:54:
>> pid
>> > 7786: LOG:  failed to connect to PostgreSQL server on
>> > "pg-hdp-node2.kitchen.local:5432", getsockopt() detected error
>> "Connection
>> > refused"
>> > Feb 22 10:43:54 pg-hdp-node3 pgpool[7786]: [16-1] 2019-02-22 10:43:54:
>> pid
>> > 7786: ERROR:  failed to make persistent db connection
>> > Feb 22 10:43:54 pg-hdp-node3 pgpool[7786]: [16-2] 2019-02-22 10:43:54:
>> pid
>> > 7786: DETAIL:  connection to host:"pg-hdp-node2.kitchen.local:5432"
>> failed
>> > Feb 22 10:43:54 pg-hdp-node3 pgpool[7786]: [17-1] 2019-02-22 10:43:54:
>> pid
>> > 7786: LOG:  health check failed on node 1 (timeout:0)
>> > Feb 22 10:43:54 pg-hdp-node3 pgpool[7786]: [18-1] 2019-02-22 10:43:54:
>> pid
>> > 7786: LOG:  received degenerate backend request for node_id: 1 from pid
>> > [7786]
>> > Feb 22 10:43:54 pg-hdp-node3 pgpool[7746]: [253-1] 2019-02-22 10:43:54:
>> pid
>> > 7746: LOG:  Pgpool-II parent process has received failover request
>> > Feb 22 10:43:54 pg-hdp-node3 pgpool[7746]: [254-1] 2019-02-22 10:43:54:
>> pid
>> > 7746: LOG:  starting degeneration. shutdown host
>> > pg-hdp-node2.kitchen.local(5432)
>> > Feb 22 10:43:54 pg-hdp-node3 pgpool[7746]: [255-1] 2019-02-22 10:43:54:
>> pid
>> > 7746: LOG:  Restart all children
>> > Feb 22 10:43:54 pg-hdp-node3 pgpool[7746]: [256-1] 2019-02-22 10:43:54:
>> pid
>> > 7746: LOG:  execute command: /etc/pgpool-II/failover.sh 1
>> > pg-hdp-node2.kitchen.local 5432 /var/lib/pgsql/10/data 1 1 2
>> > pg-hdp-node3.kitchen.local 5432 /var/lib/pgsql/10/data
>> > Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [257-1] 2019-02-22 10:43:55:
>> pid
>> > 7746: LOG:  find_primary_node_repeatedly: waiting for finding a primary
>> node
>> > Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [258-1] 2019-02-22 10:43:55:
>> pid
>> > 7746: LOG:  find_primary_node: checking backend no 0
>> > Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [259-1] 2019-02-22 10:43:55:
>> pid
>> > 7746: LOG:  find_primary_node: checking backend no 1
>> > Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [260-1] 2019-02-22 10:43:55:
>> pid
>> > 7746: LOG:  find_primary_node: checking backend no 2
>> > Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [261-1] 2019-02-22 10:43:55:
>> pid
>> > 7746: LOG:  find_primary_node: primary node id is 2
>> > Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [262-1] 2019-02-22 10:43:55:
>> pid
>> > 7746: LOG:  starting follow degeneration. shutdown host
>> > pg-hdp-node1.kitchen.local(5432)
>> > Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [263-1] 2019-02-22 10:43:55:
>> pid
>> > 7746: LOG:  starting follow degeneration. shutdown host
>> > pg-hdp-node2.kitchen.local(5432)
>> > Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [264-1] 2019-02-22 10:43:55:
>> pid
>> > 7746: LOG:  failover: 2 follow backends have been degenerated
>> > Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [265-1] 2019-02-22 10:43:55:
>> pid
>> > 7746: LOG:  failover: set new primary node: 2
>> > Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [266-1] 2019-02-22 10:43:55:
>> pid
>> > 7746: LOG:  failover: set new master node: 2
>> > Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [267-1] 2019-02-22 10:43:55:
>> pid
>> > 7746: LOG:  failover done. shutdown host
>> pg-hdp-node2.kitchen.local(5432)
>> > Feb 22 10:43:55 pg-hdp-node3 pgpool[12437]: [134-1] 2019-02-22 10:43:55:
>> > pid 12437: ERROR:  Failed to check replication time lag
>> > Feb 22 10:43:55 pg-hdp-node3 pgpool[12437]: [134-2] 2019-02-22 10:43:55:
>> > pid 12437: DETAIL:  No persistent db connection for the node 1
>> > Feb 22 10:43:55 pg-hdp-node3 pgpool[12437]: [134-3] 2019-02-22 10:43:55:
>> > pid 12437: HINT:  check sr_check_user and sr_check_password
>> > Feb 22 10:43:55 pg-hdp-node3 pgpool[12437]: [134-4] 2019-02-22 10:43:55:
>> > pid 12437: CONTEXT:  while checking replication time lag
>> > Feb 22 10:43:55 pg-hdp-node3 pgpool[12437]: [135-1] 2019-02-22 10:43:55:
>> > pid 12437: LOG:  worker process received restart request
>> > Feb 22 10:43:55 pg-hdp-node3 pgpool[12774]: [267-1] 2019-02-22 10:43:55:
>> > pid 12774: LOG:  failback event detected
>> > Feb 22 10:43:55 pg-hdp-node3 pgpool[12774]: [267-2] 2019-02-22 10:43:55:
>> > pid 12774: DETAIL:  restarting myself
>> > Feb 22 10:43:55 pg-hdp-node3 pgpool[12742]: [265-1] 2019-02-22 10:43:55:
>> > pid 12742: LOG:  start triggering follow command.
>> > Feb 22 10:43:55 pg-hdp-node3 pgpool[12742]: [266-1] 2019-02-22 10:43:55:
>> > pid 12742: LOG:  execute command: /etc/pgpool-II/follow_master.sh 0
>> > pg-hdp-node1.kitchen.local 5432 /var/lib/pgsql/10/data 1 1 2
>> > pg-hdp-node3.kitchen.local 5432 /var/lib/pgsql/10/data
>> > Feb 22 10:43:55 pg-hdp-node3 pgpool[12742]: [267-1] 2019-02-22 10:43:55:
>> > pid 12742: LOG:  execute command: /etc/pgpool-II/follow_master.sh 1
>> > pg-hdp-node2.kitchen.local 5432 /var/lib/pgsql/10/data 1 1 2
>> > pg-hdp-node3.kitchen.local 5432 /var/lib/pgsql/10/data
>> > Feb 22 10:43:56 pg-hdp-node3 pgpool[12436]: [60-1] 2019-02-22 10:43:56:
>> pid
>> > 12436: LOG:  restart request received in pcp child process
>> > Feb 22 10:43:56 pg-hdp-node3 pgpool[7746]: [268-1] 2019-02-22 10:43:56:
>> pid
>> > 7746: LOG:  PCP child 12436 exits with status 0 in failover()
>> >
>> > Pgpool self-explanatory failover.log
>> >
>> > 2019-02-22 10:43:54.893 PST Executing failover script ...
>> > 2019-02-22 10:43:54.895 PST Script arguments:
>> > failed_node_id           1
>> > failed_node_host         pg-hdp-node2.kitchen.local
>> > failed_node_port         5432
>> > failed_node_pgdata       /var/lib/pgsql/10/data
>> > old_primary_node_id      1
>> > old_master_node_id       1
>> > new_master_node_id       2
>> > new_master_node_host     pg-hdp-node3.kitchen.local
>> > new_master_node_port     5432
>> > new_master_node_pgdata   /var/lib/pgsql/10/data
>> > 2019-02-22 10:43:54.897 PST Primary node running on
>> > pg-hdp-node2.kitchen.local host is unresponsive or have died
>> > 2019-02-22 10:43:54.898 PST Attempting to stop primary node running on
>> > pg-hdp-node2.kitchen.local host before promoting slave as the new
>> primary
>> > 2019-02-22 10:43:54.899 PST ssh -o StrictHostKeyChecking=no -i
>> > /var/lib/pgsql/.ssh/id_rsa postgres at pg-hdp-node2.kitchen.local -T
>> > /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data stop -m fast
>> > 2019-02-22 10:43:55.151 PST Promoting pg-hdp-node3.kitchen.local host as
>> > the new primary
>> > 2019-02-22 10:43:55.153 PST ssh -o StrictHostKeyChecking=no -i
>> > /var/lib/pgsql/.ssh/id_rsa postgres at pg-hdp-node3.kitchen.local -T
>> > /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data promote
>> > waiting for server to promote.... done
>> > server promoted
>> > 2019-02-22 10:43:55.532 PST Completed executing failover
>> >
>> > 2019-02-22 10:43:55.564 PST Executing follow master script ...
>> > 2019-02-22 10:43:55.566 PST Script arguments
>> > detached_node_id         0
>> > detached_node_host       pg-hdp-node1.kitchen.local
>> > detached_node_port       5432
>> > detached_node_pgdata     /var/lib/pgsql/10/data
>> > old_primary_node_id      1
>> > old_master_node_id       1
>> > new_master_node_id       2
>> > new_master_node_host     pg-hdp-node3.kitchen.local
>> > new_master_node_port     5432
>> > new_master_node_pgdata   /var/lib/pgsql/10/data
>> > 2019-02-22 10:43:55.567 PST Checking if server is running on
>> > pg-hdp-node1.kitchen.local host
>> > 2019-02-22 10:43:55.569 PST ssh -o StrictHostKeyChecking=no -i
>> > /var/lib/pgsql/.ssh/id_rsa postgres at pg-hdp-node1.kitchen.local -T
>> > /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data status
>> >
>> >
>> > pg_ctl: no server running
>> > 2019-02-22 10:43:55.823 PST Node on pg-hdp-node1.kitchen.local host is
>> not
>> > running. It could be old slave or primary that needs to be recovered.
>> > 2019-02-22 10:43:55.824 PST Completed executing follow master script
>> >
>> > 2019-02-22 10:43:55.829 PST Executing follow master script ...
>> > 2019-02-22 10:43:55.830 PST Script arguments
>> > detached_node_id         1
>> > detached_node_host       pg-hdp-node2.kitchen.local
>> > detached_node_port       5432
>> > detached_node_pgdata     /var/lib/pgsql/10/data
>> > old_primary_node_id      1
>> > old_master_node_id       1
>> > new_master_node_id       2
>> > new_master_node_host     pg-hdp-node3.kitchen.local
>> > new_master_node_port     5432
>> > new_master_node_pgdata   /var/lib/pgsql/10/data
>> > 2019-02-22 10:43:55.831 PST Detached node on pg-hdp-node2.kitchen.local
>> > host is the the old primary node
>> > 2019-02-22 10:43:55.833 PST Slave can be created from old primary node
>> by
>> > deleting PG_DATA directory under /var/lib/pgsql/10/data on
>> > pg-hdp-node2.kitchen.local host and re-running Chef client
>> > 2019-02-22 10:43:55.834 PST Slave can be recovered from old primary
>> node by
>> > running /usr/pgsql-10/bin/pg_rewind -D /var/lib/pgsql/10/data
>> > --source-server="port=5432 host=pg-hdp-node3.kitchen.local" command on
>> > pg-hdp-node2.kitchen.local host as postgres user
>> > 2019-02-22 10:43:55.835 PST After successful pg_rewind run cp
>> > /var/lib/pgsql/10/data/recovery.done
>> /var/lib/pgsql/10/data/recovery.conf,
>> > ensure host connection string points to pg-hdp-node3.kitchen.local,
>> start
>> > PostgreSQL and attach it to pgpool
>> > 2019-02-22 10:43:55.836 PST Completed executing follow master script
>> >
>> > On Thu, Feb 21, 2019 at 4:47 PM Tatsuo Ishii <ishii at sraoss.co.jp>
>> wrote:
>> >
>> >> > Is this correct behavior?
>> >> >
>> >> > In a 3-node setup, node1 (primary) is shut down, failover is executed,
>> >> > node2 becomes the new primary and node3 follows the new primary on
>> >> > node2. Now node2 (the new primary) is shut down, failover is executed
>> >> > and node3 becomes the new primary, but follow_master_command is
>> >> > executed on node1 even though it is reported as down.
>> >>
>> >> No. follow master command should not be executed on an already-down
>> >> node (in this case node1).
>> >>
>> >> > It happens that my script repoints node1 and restarts it, which breaks
>> >> > everything because node1 was never recovered after being shut down.
>> >> >
>> >> > I'm on PgPool 3.7.4.
>> >>
>> >> Can you share the log from when node2 was shut down to when node1 was
>> >> recovered by your follow master command?
>> >>
>> >> In the meantime, 3.7.4 is not the latest release. Can you try with the
>> >> latest one (3.7.8)?
>> >>
>> >> Best regards,
>> >> --
>> >> Tatsuo Ishii
>> >> SRA OSS, Inc. Japan
>> >> English: http://www.sraoss.co.jp/index_en.php
>> >> Japanese:http://www.sraoss.co.jp
>> >>
>> >
>> >
>> > --
>> >
>> > *Andre Piwoni*
>>
>> _______________________________________________
>> pgpool-general mailing list
>> pgpool-general at pgpool.net
>> http://www.pgpool.net/mailman/listinfo/pgpool-general
>>
>>
>>
>> --
>>
>>
>
> --
>
> *Andre Piwoni*
>
> Sr. Software Developer, BI/Database
>
> *Web*MD Health Services
>
> Mobile: 801.541.4722
>
> www.webmdhealthservices.com
>


-- 

*Andre Piwoni*

Sr. Software Developer, BI/Database

*Web*MD Health Services

Mobile: 801.541.4722

www.webmdhealthservices.com