[pgpool-general-jp: 1549] Re: failover時に渡されるold primary node idの値について

2018年 6月 8日 (金) 19:38:04 JST

彭さま

ご回答ありがとうございました。

確認済の現象だったわけですね。
お手数を取らせてしまい、申し訳ございませんでした。
今後はもっと調べてから質問させていただきます。

3.7.4がリリースされましたら、入れ替えるようにいたします。

以上、よろしくお願いします。

2018年6月6日(水) 18:31 Bo Peng <pengbo ＠ sraoss.co.jp>:
>
> 彭です。
>
> ご報告ありがとうございます。
>
> > 7)では%Pは"1"が指定されるものと考えておりましたが、
> > そもそもの認識の間違いなのでしょうか？
> > それとも、設定もしくは環境の問題なのでしょうか？
>
> ご報告頂いている事象は認識済の不具合です。
> 修正はすでにコミット済で、6/12(火) に 3.7.4 として
> リリースされる予定です。
>
>   https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commitdiff;h=92cbc38ea4679af7ca5e6a7e4ca0d667995c0ac7
>
> 以上、よろしくお願いします。
>
> On Wed, 6 Jun 2018 14:11:29 +0900
> Katsuya Hata <hata ＠ cellant.jp> wrote:
>
> > はじめまして。
> > 畑と申します。よろしくお願い致します。
> >
> > failover_commandにて指定する引数"%P"に渡される値について
> > 質問をさせてください。
> >
> > node:0がstandby、node:1がprimaryの状態でnode:1の
> > PostgreSQLを停止させると、"%P"に"0"が指定されています。
> >
> > 以下の手順でfailoverを発生させた場合に上記のようになりました。
> >
> > 1) node:0がprimary(up)、node:1がstandby(up)でPostgreSQLを停止
> > 2) failoverスクリプトがnode:1のPostgreSQLをpromote
> >    →%Pには"0"が指定
> >    →node:0がstandby(down)、node:1がprimary(up)の状態を確認
> > 3) node:0でpg_basebackupを実行しnode:1と同期
> >    →同期後、recovery.confを作成
> >    →postgresql.confのhot_standby_feedbackをonに変更
> > 4) node:0のPostgreSQLを起動
> > 5) pcp_attach_nodeコマンドでnode:0をpgpool-IIの管理ノードに追加
> >    →node:0がstandby(up)、node:1がprimary(up)の状態を確認
> > 6) node:1のPostgreSQLを停止
> > 7) failoverスクリプトで取得した%Pの値が"0"
> >    →%dの値は"1"が指定されておりました
> >
> > 7)では%Pは"1"が指定されるものと考えておりましたが、
> > そもそもの認識の間違いなのでしょうか？
> > それとも、設定もしくは環境の問題なのでしょうか？
> >
> > ご教示いただけますと幸いです。
> >
> > OS、PostgreSQL、pgpool-IIの構成
> > --------------
> > OS : CentOS 7.4
> > PostgreSQL : 10.3
> > pgpool-II : 3.7.3
> > --------------
> >
> > サーバ構成
> > --------------
> > ・node:0、node:1は別筐体のサーバ
> >　　 node:0 → 10.252.0.212
> >　　 node:1 → 10.252.0.213
> > ・PostgreSQLは
> >　　 node:0 → primary
> >　　 node:1 → standby
> >　 となるストリーミングレプリケーション構成でセットアップ
> > ・pgpool-IIはwatchdogで両ノードで稼働（node:0がmaster）
> > --------------
> >
> > pgpool.confのfailover_commandの設定
> > --------------
> > failover_command = '/bin/sh /home/pgpool/replication/failover.sh %d %P %H %R'
> > --------------
> >
> > PostgreSQLの停止コマンド
> > --------------
> > pg_ctl stop -D <cluster path> -m immediate
> > --------------
> >
> > failover時のpgpoolのログ
> > ●node:0がprimary、node:1がstandbyの場合
> > --------------
> > 2018-06-06 13:07:18: pid 23755: LOG:  new IPC connection received
> > 2018-06-06 13:07:18: pid 23755: LOG:  watchdog received the failover
> > command from local pgpool-II on IPC interface
> > 2018-06-06 13:07:18: pid 23755: LOG:  watchdog is processing the
> > failover command [DEGENERATE_BACKEND_REQUEST] received from local
> > pgpool-II on IPC interface
> > 2018-06-06 13:07:18: pid 23755: LOG:  we have got the consensus to
> > perform the failover
> > 2018-06-06 13:07:18: pid 23755: DETAIL:  1 node(s) voted in the favor
> > 2018-06-06 13:07:18: pid 23752: LOG:  Pgpool-II parent process has
> > received failover request
> > 2018-06-06 13:07:18: pid 23755: LOG:  new IPC connection received
> > 2018-06-06 13:07:18: pid 23755: LOG:  received the failover indication
> > from Pgpool-II on IPC interface
> > 2018-06-06 13:07:18: pid 23755: LOG:  watchdog is informed of failover
> > end by the main process
> > 2018-06-06 13:07:18: pid 23752: LOG:  starting degeneration. shutdown
> > host 10.252.0.212(5432)
> > 2018-06-06 13:07:18: pid 23755: LOG:  watchdog received the failover
> > command from remote pgpool-II node "10.252.0.213:9999 Linux
> > tk9-n5z-568-18-r330.directorz.internal"
> > 2018-06-06 13:07:18: pid 23755: LOG:  watchdog is processing the
> > failover command [DEGENERATE_BACKEND_REQUEST] received from
> > 10.252.0.213:9999 Linux tk9-n5z-568-18-r330.directorz.internal
> > 2018-06-06 13:07:18: pid 23755: LOG:  we have got the consensus to
> > perform the failover
> > 2018-06-06 13:07:18: pid 23755: DETAIL:  1 node(s) voted in the favor
> > 2018-06-06 13:07:18: pid 23755: LOG:  invalid degenerate backend
> > request, node id : 0 status: [3] is not valid for failover
> > 2018-06-06 13:07:18: pid 23752: LOG:  Restart all children
> > 2018-06-06 13:07:18: pid 23752: LOG:  execute command: /bin/sh
> > /home/pgpool/replication/failover.sh 0 0 10.252.0.213 /database/pgdata
> > 2018/06/06 13:07:18 : start failover script(failover.sh)
> >
> > Failed node : 0
> > Currenr master node : 0
> > New master server : 10.252.0.213
> > New master cluster path : /database/pgdata
> >
> >
> > 2018/06/06 13:07:18 : node:0 status
> > 2018-06-06 13:07:18: pid 24351: LOG:  forked new pcp worker, pid=30038 socket=7
> > Hostname   : 10.252.0.212
> > Port       : 5432
> > Status     : 3
> > Weight     : 0.500000
> > Status Name: down
> > Role       : primary
> > 2018-06-06 13:07:18: pid 24351: LOG:  PCP process with pid: 30038 exit
> > with SUCCESS.
> > 2018-06-06 13:07:18: pid 24351: LOG:  PCP process with pid: 30038
> > exits with status 0
> > 2018-06-06 13:07:18: pid 24351: LOG:  forked new pcp worker, pid=30040 socket=7
> > 2018-06-06 13:07:18: pid 24351: LOG:  PCP process with pid: 30040 exit
> > with SUCCESS.
> > 2018-06-06 13:07:18: pid 24351: LOG:  PCP process with pid: 30040
> > exits with status 0
> > 2018/06/06 13:07:18 : node:1 status
> > 2018-06-06 13:07:18: pid 24351: LOG:  forked new pcp worker, pid=30054 socket=7
> > Hostname   : 10.252.0.213
> > Port       : 5432
> > Status     : 2
> > Weight     : 0.500000
> > Status Name: up
> > Role       : standby
> > 2018-06-06 13:07:18: pid 24351: LOG:  PCP process with pid: 30054 exit
> > with SUCCESS.
> > 2018-06-06 13:07:18: pid 24351: LOG:  PCP process with pid: 30054
> > exits with status 0
> > 2018-06-06 13:07:18: pid 24351: LOG:  forked new pcp worker, pid=30056 socket=7
> > 2018-06-06 13:07:18: pid 24351: LOG:  PCP process with pid: 30056 exit
> > with SUCCESS.
> > 2018-06-06 13:07:18: pid 24351: LOG:  PCP process with pid: 30056
> > exits with status 0
> > 2018/06/06 13:07:18 : node:0 is down
> > 2018/06/06 13:07:18 : promote command : /usr/bin/ssh -i
> > /home/pgpool/.ssh/replication.pk -T postgres ＠ 10.252.0.213
> > /usr/local/pgsql/bin/pg_ctl promote -D /database/pgdata
> > 2018/06/06 13:07:20 : 10.252.0.213 has been promoted from slave to temp master
> > 2018-06-06 13:07:20: pid 23752: LOG:  find_primary_node_repeatedly:
> > waiting for finding a primary node
> > 2018-06-06 13:07:20: pid 23752: LOG:  find_primary_node: checking backend no 0
> > 2018-06-06 13:07:20: pid 23752: LOG:  find_primary_node: checking backend no 1
> > 2018-06-06 13:07:20: pid 23752: LOG:  find_primary_node: primary node id is 1
> > 2018-06-06 13:07:20: pid 23752: LOG:  failover: set new primary node: 1
> > 2018-06-06 13:07:20: pid 23752: LOG:  failover: set new master node: 1
> > --------------
> >
> > ●node:0がstandby、node:1がprimaryの場合
> > --------------
> > 2018-06-06 13:19:43: pid 23755: LOG:  new IPC connection received
> > 2018-06-06 13:19:43: pid 23755: LOG:  received the failover indication
> > from Pgpool-II on IPC interface
> > 2018-06-06 13:19:43: pid 23755: LOG:  watchdog is informed of failover
> > end by the main process
> > 2018-06-06 13:19:43: pid 23752: LOG:  starting degeneration. shutdown
> > host 10.252.0.213(5432)
> > 2018-06-06 13:19:43: pid 23752: LOG:  Do not restart children because
> > we are switching over node id 1 host: 10.252.0.213 port: 432 and we
> > are in streaming replication mode
> > 2018-06-06 13:19:43: pid 23752: LOG:  execute command: /bin/sh
> > /home/pgpool/replication/failover.sh 1 0 10.252.0.212 /database/gdata
> > 2018/06/06 13:19:43 : start failover script(failover.sh)
> >
> > Failed node : 1
> > Currenr master node : 0
> > New master server : 10.252.0.212
> > New master cluster path : /database/pgdata
> >
> >
> > 2018/06/06 13:19:43 : node:0 status
> > 2018-06-06 13:19:43: pid 31382: LOG:  forked new pcp worker, pid=31634 socket=7
> > Hostname   : 10.252.0.212
> > Port       : 5432
> > Status     : 2
> > Weight     : 0.500000
> > Status Name: up
> > Role       : standby
> > 2018-06-06 13:19:43: pid 31382: LOG:  PCP process with pid: 31634 exit
> > with SUCCESS.
> > 2018-06-06 13:19:43: pid 31382: LOG:  PCP process with pid: 31634
> > exits with status 0
> > 2018-06-06 13:19:43: pid 31382: LOG:  forked new pcp worker, pid=31636 socket=7
> > 2018-06-06 13:19:43: pid 31382: LOG:  PCP process with pid: 31636 exit
> > with SUCCESS.
> > 2018-06-06 13:19:43: pid 31382: LOG:  PCP process with pid: 31636
> > exits with status 0
> > 2018/06/06 13:19:43 : node:1 status
> > 2018-06-06 13:19:43: pid 31382: LOG:  forked new pcp worker, pid=31650 socket=7
> > Hostname   : 10.252.0.213
> > Port       : 5432
> > Status     : 3
> > Weight     : 0.500000
> > Status Name: down
> > Role       : primary
> > 2018-06-06 13:19:43: pid 31382: LOG:  PCP process with pid: 31650 exit
> > with SUCCESS.
> > 2018-06-06 13:19:43: pid 31382: LOG:  PCP process with pid: 31650
> > exits with status 0
> > 2018-06-06 13:19:43: pid 31382: LOG:  forked new pcp worker, pid=31652 socket=7
> > 2018-06-06 13:19:43: pid 31382: LOG:  PCP process with pid: 31652 exit
> > with SUCCESS.
> > 2018-06-06 13:19:43: pid 31382: LOG:  PCP process with pid: 31652
> > exits with status 0
> > 2018/06/06 13:19:43 : node:1 is down
> > 2018/06/06 13:19:43 : promote command : /usr/bin/ssh -i
> > /home/pgpool/.ssh/replication.pk -T postgres ＠ 10.252.0.212
> > /usr/local/pgql/bin/pg_ctl promote -D /database/pgdata
> > 2018-06-06 13:19:43: pid 23846: LOG:  failed to connect to PostgreSQL
> > server on "10.252.0.213:5432", getsockopt() detected erro "Connection
> > refused"
> > 2018-06-06 13:19:43: pid 23846: ERROR:  failed to make persistent db connection
> > 2018-06-06 13:19:43: pid 23846: DETAIL:  connection to
> > host:"10.252.0.213:5432" failed
> > 2018-06-06 13:19:43: pid 23846: LOG:  health check failed on node 1 (timeout:0)
> > 2018-06-06 13:19:43: pid 23846: LOG:  received degenerate backend
> > request for node_id: 1 from pid [23846]
> > 2018-06-06 13:19:43: pid 23755: LOG:  new IPC connection received
> > 2018-06-06 13:19:43: pid 23755: LOG:  watchdog received the failover
> > command from local pgpool-II on IPC interface
> > 2018-06-06 13:19:43: pid 23755: LOG:  watchdog is processing the
> > failover command [DEGENERATE_BACKEND_REQUEST] received from loal
> > pgpool-II on IPC interface
> > 2018-06-06 13:19:43: pid 23755: LOG:  we have got the consensus to
> > perform the failover
> > 2018-06-06 13:19:43: pid 23755: DETAIL:  1 node(s) voted in the favor
> > 2018-06-06 13:19:44: pid 31383: LOG:  failed to connect to PostgreSQL
> > server on "10.252.0.213:5432", getsockopt() detected erro "Connection
> > refused"
> > 2018-06-06 13:19:44: pid 31383: ERROR:  failed to make persistent db connection
> > 2018-06-06 13:19:44: pid 31383: DETAIL:  connection to
> > host:"10.252.0.213:5432" failed
> > 2018/06/06 13:19:45 : 10.252.0.212 has been promoted from slave to temp master
> > 2018-06-06 13:19:45: pid 23752: LOG:  find_primary_node_repeatedly:
> > waiting for finding a primary node
> > 2018-06-06 13:19:45: pid 23752: LOG:  find_primary_node: checking backend no 0
> > 2018-06-06 13:19:45: pid 23752: LOG:  find_primary_node: primary node id is 0
> > 2018-06-06 13:19:45: pid 23752: LOG:  failover: set new primary node: 0
> > 2018-06-06 13:19:45: pid 23752: LOG:  failover: set new master node: 0
> > --------------
> >
> > ※failover.sh内でpcp_node_infoコマンドで各ノードの状態をログに出力しております。
> > ※failover.shでは引数の値ではなく、node:0、node:1の状態をpcp_node_infoで取得して
> >　 promoteが必要かを判断するようにしております。
> >
> > 以上、よろしくお願いします。
> >
> > --
> > Katsuay HATA <hata ＠ cellant.jp>
> > CELLANT Corp.
> > _______________________________________________
> > pgpool-general-jp mailing list
> > pgpool-general-jp ＠ sraoss.jp
> > http://www.sraoss.jp/mailman/listinfo/pgpool-general-jp
> >
>
>
> --
> Bo Peng <pengbo ＠ sraoss.co.jp>
> SRA OSS, Inc. Japan
>


-- 
 -----------------------------------------------------------------------
 ┏━┓□┏┓                ＣＥＬＬ▲ＮＴ                 ┏┓□┏━┓
 ┃┏╋┓┗┛         The Cell Added New Technology         ┗┛┏╋┓┃
 ┗╋┛┃                                                       ┃┗╋┛
 □┗━┛                 http://www.cellant.jp                 ┗━┛□
                                                                    ┏┓
 □┏━┓     株式会社セラン         〒101-0023                     ┗┛
 ┏╋┓┃     取締役                 千代田区神田松永町17 アサヒKビル
 ┃┗╋┛┏┓ 畑　勝也               TEL. 03-5289-9577
 ┗━┛□┗┛ hata ＠ cellant.jp        FAX. 03-5289-9578
 -----------------------------------------------------------------------