[pgpool-general: 6449] Re: follow_master_command executed on node shown as down (one of unrecovered masters from previous failover)

Tatsuo Ishii ishii at sraoss.co.jp
Wed Mar 6 06:47:46 JST 2019


I have updated follow master command description in the Pgpool-II
document to clarify what it actually does (will appear in next
Pgpool-II 4.0.4 release).

In the meantime I have uploaded the HTML-compiled version to my GitHub
page. Please take a look and give comments if you like.

http://localhost/~t-ishii/pgpool-II/html/runtime-config-failover.html

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

From: Pierre Timmermans <ptim007 at yahoo.com>
Subject: Re: [pgpool-general: 6435] Re: follow_master_command executed on node shown as down (one of unrecovered masters from previous failover)
Date: Fri, 1 Mar 2019 21:54:17 +0000 (UTC)
Message-ID: <654470371.7835532.1551477257916 at mail.yahoo.com>

> It is probably a good idea to force the old primary to shut down, but it is not always possible: if, for example, the primary node itself gets shut down, the failover script will not be able to ssh into it and kill the old primary. If the old server then comes back online, there is a degenerated master. I have a cron job that checks for a degenerated master (and for a detached standby) and reinstates it if possible, but I am sure there is always a risk of edge cases...
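> For illustration, a minimal sketch of such a cron check (assumptions: passwordless ssh for the postgres user, pcp access via ~/.pcppass, and pcp_node_info field positions as in Pgpool-II 3.7/4.0; hostnames and ports are placeholders, not taken from this thread):
> 
> #!/bin/bash
> # Hypothetical cron job: reattach standbys that pgpool reports as "down"
> # but whose PostgreSQL instance is actually running and in recovery.
> PCP_OPTS="-h localhost -p 9898 -U pgpool -w"
> for node_id in 0 1 2; do
>     info=$(pcp_node_info $PCP_OPTS -n $node_id)
>     host=$(echo "$info" | awk '{print $1}')
>     status=$(echo "$info" | awk '{print $3}')   # 3 = down (field position may vary by version)
>     [ "$status" != "3" ] && continue
>     in_reco=$(ssh -o StrictHostKeyChecking=no postgres@$host \
>               'psql -Atc "select pg_is_in_recovery();"')
>     if [ "$in_reco" = "t" ]; then
>         # A healthy standby that pgpool has lost track of: reinstate it.
>         pcp_attach_node $PCP_OPTS -n $node_id
>     fi
> done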
> Pierre 
> 
>     On Friday, March 1, 2019, 7:35:12 PM GMT+1, Andre Piwoni <apiwoni at webmd.net> wrote:  
>  
>  I just realized that I already handle the case of a restart that triggers failover in another way: before promoting the new node to master in the failover script, I force the old primary to be shut down. So even if I restart the primary and a failover occurs, the script shuts down the restarted old primary. Anyway, it doesn't hurt to have that check in the follow_master script as well, in case rebooting the machine restarts the old primary, etc.
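> The forced shutdown amounts to something like the following in failover.sh (a sketch based on the command that appears in the failover.log further down; the ssh key path and pg_ctl location match that log):
> 
> # Stop the old primary, if it is still reachable, before promoting the new one.
> # If the host is down the ssh simply fails, which is fine: the node is gone anyway.
> ssh -o StrictHostKeyChecking=no -i /var/lib/pgsql/.ssh/id_rsa \
>     postgres@${failed_node_host} -T \
>     "/usr/pgsql-10/bin/pg_ctl -D ${failed_node_pgdata} stop -m fast" || true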
> On Fri, Mar 1, 2019 at 9:58 AM Andre Piwoni <apiwoni at webmd.net> wrote:
> 
> I agree. This shouldn't be so complicated.
> Since I'm using sed in the follow_master script to repoint the slave by updating recovery.conf, if that command fails I simply don't restart and reattach the node. Kill two birds with one stone :-)
> Here's what I'm testing now:
> 
> ssh -o StrictHostKeyChecking=no -i /var/lib/pgsql/.ssh/id_rsa postgres@${detached_node_host} -T \
>     "sed -i 's/host=.*sslmode=/host=${new_master_node_host} port=5432 sslmode=/g' /var/lib/pgsql/10/data/recovery.conf" >> $LOGFILE
> repoint_status=$?
> if [ ${repoint_status} -eq 0 ]; then
>     # restart
>     # reattach
>     :
> else
>     # WARNING: this could be a restarted master, so there is no recovery.conf
>     # CONSIDERATION: Should I shut it down, since I don't want two masters running
>     #                even though Pgpool load balances only one???
>     :
> fi
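> A possible way to fill in the restart/reattach branch (untested sketch; it reuses the follow_master script arguments and assumes pcp access via ~/.pcppass on the pgpool host):
> 
> if [ ${repoint_status} -eq 0 ]; then
>     # Restart the repointed standby so it starts streaming from the new primary ...
>     ssh -o StrictHostKeyChecking=no -i /var/lib/pgsql/.ssh/id_rsa postgres@${detached_node_host} -T \
>         "/usr/pgsql-10/bin/pg_ctl -D ${detached_node_pgdata} restart -m fast -w"
>     # ... then tell pgpool the node is usable again.
>     pcp_attach_node -h localhost -p 9898 -U pgpool -w -n ${detached_node_id}
> else
>     # recovery.conf was not there to edit: most likely a degenerated old primary.
>     # Leave it detached for manual recovery (pg_rewind or a fresh base backup).
>     exit 0
> fi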
> On Fri, Mar 1, 2019 at 9:44 AM Pierre Timmermans <ptim007 at yahoo.com> wrote:
> 
> Thank you, that makes sense indeed. I also like to have a relatively long "grace" delay via the health check settings, so that if the primary restarts quickly enough there is no failover.
> For the case where there is a degenerated master, I have added this code in the follow_master script; it seems to work fine in my tests:
> 
> ssh_options="ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no"
> 
> in_reco=$( $ssh_options postgres@${HOSTNAME} 'psql -t -c "select pg_is_in_recovery();"' | head -1 | awk '{print $1}' )
> if [ "${in_reco}" = "f" ] ; then
>   # pg_is_in_recovery() returned 'f': the node is not a standby, so it is
>   # probably a degenerated old master and must not follow the new primary.
>   echo "Node $HOSTNAME is not in recovery, probably a degenerated master, skip it" | tee -a $LOGFILE
>   exit 0
> fi
> 
> In the end I believe that pgpool's algorithm for choosing a new primary node (always the node with the lowest id) is the root cause of the problem: pgpool should select the most adequate node (the node that is in recovery and with the lowest replication lag). Unfortunately I cannot code in "C", otherwise I would contribute.
> Pierre 
> 
>     On Friday, March 1, 2019, 5:07:06 PM GMT+1, Andre Piwoni <apiwoni at webmd.net> wrote:  
>  
>  FYI,
>  One of the things I have done to minimize the impact of restarting the primary is tuning the health check so that max_retries x retry_delay_interval allows enough time for the primary to be restarted without triggering a failover, which may take more time than the restart itself. This is with fail_over_on_backend_error disabled.
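> For example, with settings along these lines (illustrative values only, not taken from this thread), the worst-case grace window before failover is roughly health_check_max_retries x health_check_retry_delay = 4 x 10 = 40 seconds, which a quick primary restart can fit inside:
> 
> # pgpool.conf (example values)
> health_check_period        = 10
> health_check_max_retries   = 4
> health_check_retry_delay   = 10
> fail_over_on_backend_error = off    # don't fail over on a mere connection error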
> Andre
> On Fri, Mar 1, 2019 at 7:58 AM Andre Piwoni <apiwoni at webmd.net> wrote:
> 
> Hi Pierre,
>  Hmmm. I have not covered the case you described, which is a restart of the primary on node 0, the resulting failover, and a subsequent restart of the new primary on node 1, which results in follow_master being called on node 0. In my case I was shutting down node 0, which resulted in follow_master being called on it after the second failover, since I was not checking whether node 0 was running. In your case, node 0 is running since it has been restarted.
> Here's part of my script that I have to improve given your case:
> ssh -o StrictHostKeyChecking=no -i /var/lib/pgsql/.ssh/id_rsa postgres@${detached_node_host} -T \
>     "/usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data status" | grep "is running"
> running_status=$?
> if [ ${running_status} -eq 0 ]; then
>     # TODO: check that recovery.conf exists or pg_is_in_recovery() returns 't' on
>     #       ${detached_node_host}, and exit if this is not a slave node
>     # repoint to new master ${new_master_node_host}
>     # restart ${detached_node_host}
>     # reattach the restarted node with pcp_attach_node
>     :
> else
>     # do nothing: this could be an old slave or a primary that needs to be recovered,
>     # a node in maintenance mode, etc.
>     :
> fi
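> One way to fill in that TODO is the same pg_is_in_recovery() test discussed elsewhere in this thread (untested sketch, reusing the same ssh options and argument names):
> 
> in_recovery=$(ssh -o StrictHostKeyChecking=no -i /var/lib/pgsql/.ssh/id_rsa postgres@${detached_node_host} -T \
>     'psql -Atc "select pg_is_in_recovery();"')
> if [ "${in_recovery}" != "t" ]; then
>     # Not a standby (or unreachable): probably a degenerated old primary, so leave it alone.
>     echo "Node ${detached_node_host} is not in recovery, skipping follow_master" >> $LOGFILE
>     exit 0
> fi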
> 
> 
> On Fri, Mar 1, 2019 at 3:28 AM Pierre Timmermans <ptim007 at yahoo.com> wrote:
> 
> Hi
> Same issue for me, but I am not sure how to fix it. Andre, can you tell exactly how you check?
> I cannot add a test using pcp_node_info to check that the status is up, because then follow_master never does anything: in my case, when follow_master is executed the status of the target node is always down, so my script does the standby follow command and then a pcp_attach_node.
> To work around the issue for now, I added a check that the command select pg_is_in_recovery(); returns "t" on the node; if it returns "f" I can assume it is a degenerated master and I don't execute the follow_master command.
> 
> 
> So my use case is this
> 
> 1. node 0 is primary, node 1 and node 2 are standbys
> 2. node 0 is restarted; node 1 becomes primary and node 2 follows the new primary (thanks to follow_master). In the follow_master of node 2 I have to do a pcp_attach_node afterwards because the status of the node is down
> 3. in the meantime node 0 has rebooted; the db is started on node 0, but it is down in pgpool and its role is standby (it is a degenerated master)
> 4. node 1 is restarted; pgpool executes failover on node 2 and follow_master on node 0 => the follow_master on node 0 breaks everything, because after that node 0 becomes a primary again
> Thanks and regards
> Pierre 
> 
>     On Monday, February 25, 2019, 5:35:11 PM GMT+1, Andre Piwoni <apiwoni at webmd.net> wrote:  
>  
>  I have already put that check in place.
> Thank you for confirming.
> On Sat, Feb 23, 2019 at 11:56 PM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> 
> Sorry, I was wrong. A follow_master_command will be executed against
> the down node as well. So you need to check whether target PostgreSQL
> node is running in the follow_master_command. If it's not, you can skip
> the node.
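> 
> A minimal version of that check at the top of a follow_master script could look like this (sketch only; the ssh key and pg_ctl paths are taken from the examples elsewhere in this thread, and ${detached_node_host} / ${detached_node_pgdata} come from the %h / %D arguments):
> 
> ssh -o StrictHostKeyChecking=no -i /var/lib/pgsql/.ssh/id_rsa postgres@${detached_node_host} -T \
>     "/usr/pgsql-10/bin/pg_ctl -D ${detached_node_pgdata} status" | grep -q "is running"
> if [ $? -ne 0 ]; then
>     # PostgreSQL is not running on the target node: skip it instead of repointing it.
>     exit 0
> fi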
> 
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
> 
>> I have added a pg_ctl status check to ensure no action is taken when the node
>> is down, but I'll also try version 3.7.8.
>> 
>> Here's the Pgpool log from the time node2 is shut down to the time node1 (the
>> already-dead old primary) received the follow master command.
>> Sorry for the double date logging. I'm also including the self-explanatory
>> failover.log that my failover and follow_master scripts generated.
>> 
>> Arguments passed to scripts for your reference.
>> failover.sh %d %h %p %D %M %P %m %H %r %R
>> follow_master.sh %d %h %p %D %M %P %m %H %r %R
>> 
>> Pool status before shutdown of node 2:
>> postgres=> show pool_nodes;
>>  node_id |          hostname          | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
>> ---------+----------------------------+------+--------+-----------+---------+------------+-------------------+-------------------
>>  0       | pg-hdp-node1.kitchen.local | 5432 | down   | 0.333333  | standby | 0          | false             | 0
>>  1       | pg-hdp-node2.kitchen.local | 5432 | up     | 0.333333  | primary | 0          | false             | 0
>>  2       | pg-hdp-node3.kitchen.local | 5432 | up     | 0.333333  | standby | 0          | true              | 0
>> (3 rows)
>> 
>> Pgpool log
>> Feb 22 10:43:27 pg-hdp-node3 pgpool[12437]: [126-1] 2019-02-22 10:43:27:
>> pid 12437: LOG:  failed to connect to PostgreSQL server on
>> "pg-hdp-node2.kitchen.local:5432", getsockopt() detected error "Connection
>> refused"
>> Feb 22 10:43:27 pg-hdp-node3 pgpool[12437]: [127-1] 2019-02-22 10:43:27:
>> pid 12437: ERROR:  failed to make persistent db connection
>> Feb 22 10:43:27 pg-hdp-node3 pgpool[12437]: [127-2] 2019-02-22 10:43:27:
>> pid 12437: DETAIL:  connection to host:"pg-hdp-node2.kitchen.local:5432"
>> failed
>> Feb 22 10:43:37 pg-hdp-node3 pgpool[12437]: [128-1] 2019-02-22 10:43:37:
>> pid 12437: ERROR:  Failed to check replication time lag
>> Feb 22 10:43:37 pg-hdp-node3 pgpool[12437]: [128-2] 2019-02-22 10:43:37:
>> pid 12437: DETAIL:  No persistent db connection for the node 1
>> Feb 22 10:43:37 pg-hdp-node3 pgpool[12437]: [128-3] 2019-02-22 10:43:37:
>> pid 12437: HINT:  check sr_check_user and sr_check_password
>> Feb 22 10:43:37 pg-hdp-node3 pgpool[12437]: [128-4] 2019-02-22 10:43:37:
>> pid 12437: CONTEXT:  while checking replication time lag
>> Feb 22 10:43:37 pg-hdp-node3 pgpool[12437]: [129-1] 2019-02-22 10:43:37:
>> pid 12437: LOG:  failed to connect to PostgreSQL server on
>> "pg-hdp-node2.kitchen.local:5432", getsockopt() detected error "Connection
>> refused"
>> Feb 22 10:43:37 pg-hdp-node3 pgpool[12437]: [130-1] 2019-02-22 10:43:37:
>> pid 12437: ERROR:  failed to make persistent db connection
>> Feb 22 10:43:37 pg-hdp-node3 pgpool[12437]: [130-2] 2019-02-22 10:43:37:
>> pid 12437: DETAIL:  connection to host:"pg-hdp-node2.kitchen.local:5432"
>> failed
>> Feb 22 10:43:45 pg-hdp-node3 pgpool[7786]: [6-1] 2019-02-22 10:43:45: pid
>> 7786: LOG:  failed to connect to PostgreSQL server on
>> "pg-hdp-node2.kitchen.local:5432", getsockopt() detected error "Connection
>> refused"
>> Feb 22 10:43:45 pg-hdp-node3 pgpool[7786]: [7-1] 2019-02-22 10:43:45: pid
>> 7786: ERROR:  failed to make persistent db connection
>> Feb 22 10:43:45 pg-hdp-node3 pgpool[7786]: [7-2] 2019-02-22 10:43:45: pid
>> 7786: DETAIL:  connection to host:"pg-hdp-node2.kitchen.local:5432" failed
>> Feb 22 10:43:45 pg-hdp-node3 pgpool[7786]: [8-1] 2019-02-22 10:43:45: pid
>> 7786: LOG:  health check retrying on DB node: 1 (round:1)
>> Feb 22 10:43:47 pg-hdp-node3 pgpool[12437]: [131-1] 2019-02-22 10:43:47:
>> pid 12437: ERROR:  Failed to check replication time lag
>> Feb 22 10:43:47 pg-hdp-node3 pgpool[12437]: [131-2] 2019-02-22 10:43:47:
>> pid 12437: DETAIL:  No persistent db connection for the node 1
>> Feb 22 10:43:47 pg-hdp-node3 pgpool[12437]: [131-3] 2019-02-22 10:43:47:
>> pid 12437: HINT:  check sr_check_user and sr_check_password
>> Feb 22 10:43:47 pg-hdp-node3 pgpool[12437]: [131-4] 2019-02-22 10:43:47:
>> pid 12437: CONTEXT:  while checking replication time lag
>> Feb 22 10:43:47 pg-hdp-node3 pgpool[12437]: [132-1] 2019-02-22 10:43:47:
>> pid 12437: LOG:  failed to connect to PostgreSQL server on
>> "pg-hdp-node2.kitchen.local:5432", getsockopt() detected error "Connection
>> refused"
>> Feb 22 10:43:47 pg-hdp-node3 pgpool[12437]: [133-1] 2019-02-22 10:43:47:
>> pid 12437: ERROR:  failed to make persistent db connection
>> Feb 22 10:43:47 pg-hdp-node3 pgpool[12437]: [133-2] 2019-02-22 10:43:47:
>> pid 12437: DETAIL:  connection to host:"pg-hdp-node2.kitchen.local:5432"
>> failed
>> Feb 22 10:43:48 pg-hdp-node3 pgpool[7786]: [9-1] 2019-02-22 10:43:48: pid
>> 7786: LOG:  failed to connect to PostgreSQL server on
>> "pg-hdp-node2.kitchen.local:5432", getsockopt() detected error "Connection
>> refused"
>> Feb 22 10:43:48 pg-hdp-node3 pgpool[7786]: [10-1] 2019-02-22 10:43:48: pid
>> 7786: ERROR:  failed to make persistent db connection
>> Feb 22 10:43:48 pg-hdp-node3 pgpool[7786]: [10-2] 2019-02-22 10:43:48: pid
>> 7786: DETAIL:  connection to host:"pg-hdp-node2.kitchen.local:5432" failed
>> Feb 22 10:43:48 pg-hdp-node3 pgpool[7786]: [11-1] 2019-02-22 10:43:48: pid
>> 7786: LOG:  health check retrying on DB node: 1 (round:2)
>> Feb 22 10:43:51 pg-hdp-node3 pgpool[7786]: [12-1] 2019-02-22 10:43:51: pid
>> 7786: LOG:  failed to connect to PostgreSQL server on
>> "pg-hdp-node2.kitchen.local:5432", getsockopt() detected error "Connection
>> refused"
>> Feb 22 10:43:51 pg-hdp-node3 pgpool[7786]: [13-1] 2019-02-22 10:43:51: pid
>> 7786: ERROR:  failed to make persistent db connection
>> Feb 22 10:43:51 pg-hdp-node3 pgpool[7786]: [13-2] 2019-02-22 10:43:51: pid
>> 7786: DETAIL:  connection to host:"pg-hdp-node2.kitchen.local:5432" failed
>> Feb 22 10:43:51 pg-hdp-node3 pgpool[7786]: [14-1] 2019-02-22 10:43:51: pid
>> 7786: LOG:  health check retrying on DB node: 1 (round:3)
>> Feb 22 10:43:54 pg-hdp-node3 pgpool[7786]: [15-1] 2019-02-22 10:43:54: pid
>> 7786: LOG:  failed to connect to PostgreSQL server on
>> "pg-hdp-node2.kitchen.local:5432", getsockopt() detected error "Connection
>> refused"
>> Feb 22 10:43:54 pg-hdp-node3 pgpool[7786]: [16-1] 2019-02-22 10:43:54: pid
>> 7786: ERROR:  failed to make persistent db connection
>> Feb 22 10:43:54 pg-hdp-node3 pgpool[7786]: [16-2] 2019-02-22 10:43:54: pid
>> 7786: DETAIL:  connection to host:"pg-hdp-node2.kitchen.local:5432" failed
>> Feb 22 10:43:54 pg-hdp-node3 pgpool[7786]: [17-1] 2019-02-22 10:43:54: pid
>> 7786: LOG:  health check failed on node 1 (timeout:0)
>> Feb 22 10:43:54 pg-hdp-node3 pgpool[7786]: [18-1] 2019-02-22 10:43:54: pid
>> 7786: LOG:  received degenerate backend request for node_id: 1 from pid
>> [7786]
>> Feb 22 10:43:54 pg-hdp-node3 pgpool[7746]: [253-1] 2019-02-22 10:43:54: pid
>> 7746: LOG:  Pgpool-II parent process has received failover request
>> Feb 22 10:43:54 pg-hdp-node3 pgpool[7746]: [254-1] 2019-02-22 10:43:54: pid
>> 7746: LOG:  starting degeneration. shutdown host
>> pg-hdp-node2.kitchen.local(5432)
>> Feb 22 10:43:54 pg-hdp-node3 pgpool[7746]: [255-1] 2019-02-22 10:43:54: pid
>> 7746: LOG:  Restart all children
>> Feb 22 10:43:54 pg-hdp-node3 pgpool[7746]: [256-1] 2019-02-22 10:43:54: pid
>> 7746: LOG:  execute command: /etc/pgpool-II/failover.sh 1
>> pg-hdp-node2.kitchen.local 5432 /var/lib/pgsql/10/data 1 1 2
>> pg-hdp-node3.kitchen.local 5432 /var/lib/pgsql/10/data
>> Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [257-1] 2019-02-22 10:43:55: pid
>> 7746: LOG:  find_primary_node_repeatedly: waiting for finding a primary node
>> Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [258-1] 2019-02-22 10:43:55: pid
>> 7746: LOG:  find_primary_node: checking backend no 0
>> Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [259-1] 2019-02-22 10:43:55: pid
>> 7746: LOG:  find_primary_node: checking backend no 1
>> Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [260-1] 2019-02-22 10:43:55: pid
>> 7746: LOG:  find_primary_node: checking backend no 2
>> Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [261-1] 2019-02-22 10:43:55: pid
>> 7746: LOG:  find_primary_node: primary node id is 2
>> Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [262-1] 2019-02-22 10:43:55: pid
>> 7746: LOG:  starting follow degeneration. shutdown host
>> pg-hdp-node1.kitchen.local(5432)
>> Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [263-1] 2019-02-22 10:43:55: pid
>> 7746: LOG:  starting follow degeneration. shutdown host
>> pg-hdp-node2.kitchen.local(5432)
>> Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [264-1] 2019-02-22 10:43:55: pid
>> 7746: LOG:  failover: 2 follow backends have been degenerated
>> Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [265-1] 2019-02-22 10:43:55: pid
>> 7746: LOG:  failover: set new primary node: 2
>> Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [266-1] 2019-02-22 10:43:55: pid
>> 7746: LOG:  failover: set new master node: 2
>> Feb 22 10:43:55 pg-hdp-node3 pgpool[7746]: [267-1] 2019-02-22 10:43:55: pid
>> 7746: LOG:  failover done. shutdown host pg-hdp-node2.kitchen.local(5432)
>> Feb 22 10:43:55 pg-hdp-node3 pgpool[12437]: [134-1] 2019-02-22 10:43:55:
>> pid 12437: ERROR:  Failed to check replication time lag
>> Feb 22 10:43:55 pg-hdp-node3 pgpool[12437]: [134-2] 2019-02-22 10:43:55:
>> pid 12437: DETAIL:  No persistent db connection for the node 1
>> Feb 22 10:43:55 pg-hdp-node3 pgpool[12437]: [134-3] 2019-02-22 10:43:55:
>> pid 12437: HINT:  check sr_check_user and sr_check_password
>> Feb 22 10:43:55 pg-hdp-node3 pgpool[12437]: [134-4] 2019-02-22 10:43:55:
>> pid 12437: CONTEXT:  while checking replication time lag
>> Feb 22 10:43:55 pg-hdp-node3 pgpool[12437]: [135-1] 2019-02-22 10:43:55:
>> pid 12437: LOG:  worker process received restart request
>> Feb 22 10:43:55 pg-hdp-node3 pgpool[12774]: [267-1] 2019-02-22 10:43:55:
>> pid 12774: LOG:  failback event detected
>> Feb 22 10:43:55 pg-hdp-node3 pgpool[12774]: [267-2] 2019-02-22 10:43:55:
>> pid 12774: DETAIL:  restarting myself
>> Feb 22 10:43:55 pg-hdp-node3 pgpool[12742]: [265-1] 2019-02-22 10:43:55:
>> pid 12742: LOG:  start triggering follow command.
>> Feb 22 10:43:55 pg-hdp-node3 pgpool[12742]: [266-1] 2019-02-22 10:43:55:
>> pid 12742: LOG:  execute command: /etc/pgpool-II/follow_master.sh 0
>> pg-hdp-node1.kitchen.local 5432 /var/lib/pgsql/10/data 1 1 2
>> pg-hdp-node3.kitchen.local 5432 /var/lib/pgsql/10/data
>> Feb 22 10:43:55 pg-hdp-node3 pgpool[12742]: [267-1] 2019-02-22 10:43:55:
>> pid 12742: LOG:  execute command: /etc/pgpool-II/follow_master.sh 1
>> pg-hdp-node2.kitchen.local 5432 /var/lib/pgsql/10/data 1 1 2
>> pg-hdp-node3.kitchen.local 5432 /var/lib/pgsql/10/data
>> Feb 22 10:43:56 pg-hdp-node3 pgpool[12436]: [60-1] 2019-02-22 10:43:56: pid
>> 12436: LOG:  restart request received in pcp child process
>> Feb 22 10:43:56 pg-hdp-node3 pgpool[7746]: [268-1] 2019-02-22 10:43:56: pid
>> 7746: LOG:  PCP child 12436 exits with status 0 in failover()
>> 
>> Pgpool self-explanatory failover.log
>> 
>> 2019-02-22 10:43:54.893 PST Executing failover script ...
>> 2019-02-22 10:43:54.895 PST Script arguments:
>> failed_node_id           1
>> failed_node_host         pg-hdp-node2.kitchen.local
>> failed_node_port         5432
>> failed_node_pgdata       /var/lib/pgsql/10/data
>> old_primary_node_id      1
>> old_master_node_id       1
>> new_master_node_id       2
>> new_master_node_host     pg-hdp-node3.kitchen.local
>> new_master_node_port     5432
>> new_master_node_pgdata   /var/lib/pgsql/10/data
>> 2019-02-22 10:43:54.897 PST Primary node running on
>> pg-hdp-node2.kitchen.local host is unresponsive or have died
>> 2019-02-22 10:43:54.898 PST Attempting to stop primary node running on
>> pg-hdp-node2.kitchen.local host before promoting slave as the new primary
>> 2019-02-22 10:43:54.899 PST ssh -o StrictHostKeyChecking=no -i
>> /var/lib/pgsql/.ssh/id_rsa postgres at pg-hdp-node2.kitchen.local -T
>> /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data stop -m fast
>> 2019-02-22 10:43:55.151 PST Promoting pg-hdp-node3.kitchen.local host as
>> the new primary
>> 2019-02-22 10:43:55.153 PST ssh -o StrictHostKeyChecking=no -i
>> /var/lib/pgsql/.ssh/id_rsa postgres at pg-hdp-node3.kitchen.local -T
>> /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data promote
>> waiting for server to promote.... done
>> server promoted
>> 2019-02-22 10:43:55.532 PST Completed executing failover
>> 
>> 2019-02-22 10:43:55.564 PST Executing follow master script ...
>> 2019-02-22 10:43:55.566 PST Script arguments
>> detached_node_id         0
>> detached_node_host       pg-hdp-node1.kitchen.local
>> detached_node_port       5432
>> detached_node_pgdata     /var/lib/pgsql/10/data
>> old_primary_node_id      1
>> old_master_node_id       1
>> new_master_node_id       2
>> new_master_node_host     pg-hdp-node3.kitchen.local
>> new_master_node_port     5432
>> new_master_node_pgdata   /var/lib/pgsql/10/data
>> 2019-02-22 10:43:55.567 PST Checking if server is running on
>> pg-hdp-node1.kitchen.local host
>> 2019-02-22 10:43:55.569 PST ssh -o StrictHostKeyChecking=no -i
>> /var/lib/pgsql/.ssh/id_rsa postgres at pg-hdp-node1.kitchen.local -T
>> /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data status
>> 
>> 
>> pg_ctl: no server running
>> 2019-02-22 10:43:55.823 PST Node on pg-hdp-node1.kitchen.local host is not
>> running. It could be old slave or primary that needs to be recovered.
>> 2019-02-22 10:43:55.824 PST Completed executing follow master script
>> 
>> 2019-02-22 10:43:55.829 PST Executing follow master script ...
>> 2019-02-22 10:43:55.830 PST Script arguments
>> detached_node_id         1
>> detached_node_host       pg-hdp-node2.kitchen.local
>> detached_node_port       5432
>> detached_node_pgdata     /var/lib/pgsql/10/data
>> old_primary_node_id      1
>> old_master_node_id       1
>> new_master_node_id       2
>> new_master_node_host     pg-hdp-node3.kitchen.local
>> new_master_node_port     5432
>> new_master_node_pgdata   /var/lib/pgsql/10/data
>> 2019-02-22 10:43:55.831 PST Detached node on pg-hdp-node2.kitchen.local
>> host is the the old primary node
>> 2019-02-22 10:43:55.833 PST Slave can be created from old primary node by
>> deleting PG_DATA directory under /var/lib/pgsql/10/data on
>> pg-hdp-node2.kitchen.local host and re-running Chef client
>> 2019-02-22 10:43:55.834 PST Slave can be recovered from old primary node by
>> running /usr/pgsql-10/bin/pg_rewind -D /var/lib/pgsql/10/data
>> --source-server="port=5432 host=pg-hdp-node3.kitchen.local" command on
>> pg-hdp-node2.kitchen.local host as postgres user
>> 2019-02-22 10:43:55.835 PST After successful pg_rewind run cp
>> /var/lib/pgsql/10/data/recovery.done /var/lib/pgsql/10/data/recovery.conf,
>> ensure host connection string points to pg-hdp-node3.kitchen.local, start
>> PostgreSQL and attach it to pgpool
>> 2019-02-22 10:43:55.836 PST Completed executing follow master script
>> 
>> On Thu, Feb 21, 2019 at 4:47 PM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> 
>>> > Is this correct behavior?
>>> >
>>> > In a 3-node setup, node1 (primary) is shut down, failover is executed, and
>>> > node2 becomes the new primary, and node3 follows the new primary on node2.
>>> > Now node2 (the new primary) is shut down, failover is executed, and node3
>>> > becomes the new primary, but follow_master_command is executed on node1 even
>>> > though it is reported as down.
>>>
>>> No. follow master command should not be executed on an already-down
>>> node (in this case node1).
>>>
>>> > It happens that my script then repoints node1 and restarts it, which breaks
>>> > everything, because node1 was never recovered after being shut down.
>>> >
>>> > I'm on PgPool 3.7.4.
>>>
>>> Can you share the log from when node2 was shut down to when node1 was
>>> recovered by your follow master command?
>>>
>>> In the meantime, 3.7.4 is not the latest release. Can you try with the
>>> latest one (3.7.8)?
>>>
>>> Best regards,
>>> --
>>> Tatsuo Ishii
>>> SRA OSS, Inc. Japan
>>> English: http://www.sraoss.co.jp/index_en.php
>>> Japanese:http://www.sraoss.co.jp
>>>
>> 
>> 
>> -- 
>> 
>> *Andre Piwoni*
> 
> 
> -- 
> 
> Andre Piwoni
> 
> Sr. Software Developer,BI/Database
> 
> WebMD Health Services
> 
> Mobile: 801.541.4722
> 
> www.webmdhealthservices.com
> 

