[pgpool-general-jp: 1199] About pgpool_setup

Takanori Urata takanori.urata @ thot.co.jp
Mon Oct 14 15:13:49 JST 2013


To: pgpool-general-jp @ sraoss.jp

I downloaded the pgpool-II source code (pgpool-II-3.3.1.tar.gz) and am building
an environment with it, but I am stuck: even when I trigger a primary node
failure (pg_ctl stop -m immediate), follow_master.sh is never executed.

Environment being built: SR/HS (streaming replication / hot standby) with PostgreSQL 9.1 and pgpool-II 3.3.1
Server 1: PostgreSQL 9.1 (primary)
Server 2: PostgreSQL 9.1 (standby), pgpool-II

PostgreSQL configuration files
Server 1: postgresql.conf
listen_addresses = '*'
port = 49152
wal_level = hot_standby
max_wal_senders = 2
wal_keep_segments = 8
synchronous_standby_names = 'DB02'
Server 2: postgresql.conf
listen_addresses = '*'
port = 49153
wal_level = hot_standby
hot_standby = on
Note: pg_hba.conf on both servers is configured to allow all requests.

pgpool-II configuration file
Server 2: pgpool.conf
listen_addresses = '*'
port = 5432
backend_hostname0 = 'server1'
backend_port0 = 49152
backend_hostname1 = 'server2'
backend_port1 = 49153
master_slave_mode = on
master_slave_sub_mode = 'stream'
follow_master_command = '/usr/local/pgpool-II/etc/follow_master.sh %d %h %p %D %m %M %H %P %r %R'
sr_check_user = 'postgres'
health_check_period = 10
health_check_user = 'postgres'
failover_command = '/usr/local/pgpool-II/etc/failover.sh %d %h %p %D %m %M %H %P %r %R'
recovery_1st_stage_command = '/home/postgres/basebackup.sh'
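
For reference, the ten placeholders in failover_command and follow_master_command reach the script as positional parameters. The following is only a sketch of that mapping, assuming the placeholder meanings given in the pgpool-II manual; it is not the script generated by pgpool_setup, and the function names describe_failover_args and failed_node_was_primary are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Label the ten arguments substituted for "%d %h %p %D %m %M %H %P %r %R".
describe_failover_args() {
    if [ "$#" -ne 10 ]; then
        echo "expected 10 arguments, got $#" >&2
        return 1
    fi
    echo "failed_node_id=$1"          # %d
    echo "failed_host=$2"             # %h
    echo "failed_port=$3"             # %p
    echo "failed_pgdata=$4"           # %D
    echo "new_master_node_id=$5"      # %m
    echo "old_master_node_id=$6"      # %M
    echo "new_master_host=$7"         # %H
    echo "old_primary_node_id=$8"     # %P
    echo "new_master_port=$9"         # %r
    echo "new_master_pgdata=${10}"    # %R
}

# Succeed only when the failed node (%d) was the old primary (%P),
# which is the case where a standby would have to be promoted.
failed_node_was_primary() {
    [ "$1" = "$8" ]
}
```

Called with the values that appear in the "execute command" log line of this post (failed node 0, new master node 1, old primary node 0), describe_failover_args labels each value and failed_node_was_primary succeeds, because %d and %P are both 0.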

With this configuration, running psql -p 5432 -c "show pool_nodes" gives the following result.
 node_id |    hostname     | port  | status | lb_weight |  role
---------+-----------------+-------+--------+-----------+---------
 0       | server1         | 49152 | 2      | 0.500000  | primary
 1       | server2         | 49153 | 2      | 0.500000  | standby
(2 rows)

Triggering a primary node failure in this state produces the following.

2013-10-14 15:07:50 LOG:   pid 9143: Backend status file /var/log/pgpool-II/pgpool_status discarded
2013-10-14 15:07:50 LOG:   pid 9143: pgpool-II successfully started. version 3.3.1 (tokakiboshi)
2013-10-14 15:07:50 LOG:   pid 9143: find_primary_node: primary node id is 0
2013-10-14 15:08:40 ERROR: pid 9143: connect_inet_domain_socket: getsockopt() detected error: Connection refused
2013-10-14 15:08:40 ERROR: pid 9143: make_persistent_db_connection: connection to server1(49152) failed
2013-10-14 15:08:40 ERROR: pid 9143: connect_inet_domain_socket: getsockopt() detected error: Connection refused
2013-10-14 15:08:40 ERROR: pid 9143: make_persistent_db_connection: connection to server1(49152) failed
2013-10-14 15:08:40 ERROR: pid 9143: health check failed. 0 th host server1 at port 49152 is down
2013-10-14 15:08:40 LOG:   pid 9143: set 0 th backend down status
2013-10-14 15:08:40 LOG:   pid 9143: starting degeneration. shutdown host server1(49152)
2013-10-14 15:08:40 LOG:   pid 9143: Restart all children
2013-10-14 15:08:40 LOG:   pid 9143: execute command: /usr/local/pgpool-II/etc/failover.sh 0 server1 49152 /opt/PostgreSQL/9.1/data 1 0 server2 0 49153 /opt/PostgreSQL/9.1/data

real    0m0.000s
user    0m0.000s
sys     0m0.001s
2013-10-14 15:08:40 ERROR: pid 9178: connect_inet_domain_socket: getsockopt() detected error: Connection refused
2013-10-14 15:08:40 ERROR: pid 9178: make_persistent_db_connection: connection to server2(49152) failed
2013-10-14 15:08:40 ERROR: pid 9178: check_replication_time_lag: could not connect to DB node 0, check sr_check_user and sr_check_password
pg_ctl: cannot be run as root
Please log in (using, e.g., "su") as the (unprivileged) user that will
own the server process.
2013-10-14 15:08:40 LOG:   pid 9143: find_primary_node_repeatedly: waiting for finding a primary node
2013-10-14 15:08:50 ERROR: pid 9178: connect_inet_domain_socket: getsockopt() detected error: Connection refused
2013-10-14 15:08:50 ERROR: pid 9178: make_persistent_db_connection: connection to server1(49152) failed
2013-10-14 15:08:50 ERROR: pid 9178: check_replication_time_lag: could not connect to DB node 0, check sr_check_user and sr_check_password
2013-10-14 15:08:50 LOG:   pid 9143: failover: no follow backends are degenerated
2013-10-14 15:08:50 LOG:   pid 9143: failover: set new primary node: -1
2013-10-14 15:08:50 LOG:   pid 9143: failover: set new master node: 1
2013-10-14 15:08:50 LOG:   pid 9178: worker process received restart request
2013-10-14 15:08:50 LOG:   pid 9143: failover done. shutdown host server1(49152)
2013-10-14 15:08:51 LOG:   pid 9177: pcp child process received restart request
2013-10-14 15:08:51 LOG:   pid 9143: PCP child 9177 exits with status 256 in failover()
2013-10-14 15:08:51 LOG:   pid 9143: fork a new PCP child pid 9313 in failover()
2013-10-14 15:08:51 LOG:   pid 9143: worker child 9178 exits with status 256
2013-10-14 15:08:51 LOG:   pid 9143: fork a new worker child pid 9314
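
In a log of this length, the line that records the outcome of the failover ("failover: set new primary node: ...") is easy to miss. As a rough sketch, it can be pulled out with sed. The function name new_primary_from_log is hypothetical, and reading -1 as "no new primary was found" is my assumption rather than something the log itself states.

```shell
#!/bin/sh
# Hypothetical helper: read a pgpool-II log on stdin and print the node id
# from the last "failover: set new primary node: N" line, if there is one.
new_primary_from_log() {
    sed -n 's/.*failover: set new primary node: \(-\{0,1\}[0-9][0-9]*\).*/\1/p' | tail -n 1
}
```

Fed the log above, this prints -1, while the neighbouring "set new master node" line reports 1.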

After the failover, "show pool_nodes" reports:

 node_id |    hostname     | port  | status | lb_weight |  role
---------+-----------------+-------+--------+-----------+---------
 0       | server1         | 49152 | 3      | 0.500000  | standby
 1       | server2         | 49153 | 2      | 0.500000  | standby
(2 rows)
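
The two listings differ in whether any node is still reported as primary. A small filter over the "show pool_nodes" output makes that check scriptable. This is a sketch that assumes the six-column layout shown in this post and that status 1 or 2 means the node is up (3 means down); the function name primary_nodes is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read "show pool_nodes" output on stdin and print the
# node_id of every node whose role is "primary" and whose status is 1 or 2.
primary_nodes() {
    awk -F'|' '
        NF >= 6 {
            # Trim the padding psql puts around each column.
            for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
            if ($6 == "primary" && ($4 == "1" || $4 == "2")) print $1
        }'
}
```

Used as psql -p 5432 -c "show pool_nodes" | primary_nodes, it prints 0 for the listing taken before the failure and nothing for the listing taken after it.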

The failover.sh and follow_master.sh scripts I am using are the ones from "pgpool_setup", which is bundled with the source code.

Incidentally, an environment built with "pgpool_setup" itself behaves the same way.