[pgpool-general: 6659] Re: Cluster with 3 nodes

Giorgi Khurtsilava khurtsilava.g at gmail.com
Wed Jul 31 21:54:13 JST 2019


Yes, it was a problem with passwordless ssh. Thanks for the help!
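
For anyone hitting the same issue: the fix was setting up passwordless ssh
from the host that runs the pgpool failover scripts to the postgres user on
every backend. A minimal sketch, run as the OS user that executes the
scripts (root here, judging by the trace below) and assuming default key
paths:

 # generate a key without a passphrase (skip if one already exists)
 ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
 # install it for the postgres user on each backend
 ssh-copy-id postgres@master
 ssh-copy-id postgres@slave
 ssh-copy-id postgres@reserve
 # verify: this must complete without a password prompt
 ssh -o StrictHostKeyChecking=no postgres@master true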

Mon, 29 Jul 2019 at 17:56, Гиа Хурцилава <khurtsilava.g at gmail.com>:

> Sorry, here is the pgpool.conf from the master node.
>
> So I deleted >/dev/null from the script, and here is the result:
>
>  + FAILED_NODE_ID=0
>  + FAILED_NODE_HOST=master
>  + FAILED_NODE_PORT=5432
>  + FAILED_NODE_PGDATA=/var/lib/pgsql/11/data
>  + NEW_MASTER_NODE_ID=1
>  + OLD_MASTER_NODE_ID=0
>  + NEW_MASTER_NODE_HOST=slave
>  + OLD_PRIMARY_NODE_ID=0
>  + NEW_MASTER_NODE_PORT=5432
>  + NEW_MASTER_NODE_PGDATA=/var/lib/pgsql/11/data
>  + PGHOME=/usr/pgsql-11
>  + ARCHIVEDIR=/var/lib/pgsql/archivedir
>  + REPL_USER=repl
>  + PCP_USER=pgpool
>  + PGPOOL_PATH=/usr/bin
>  + PCP_PORT=9898
>  + logger -i -p local1.info follow_master.sh: start: pg_basebackup for 0
>  + ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null
> postgres at master /usr/pgsql-11/bin/pg_ctl -w -D /var/lib/pgsql/11/data
> status
>  Warning: Permanently added 'master,192.168.56.110' (ECDSA) to the list of
> known hosts.
>  Permission denied, please try again.
>  Permission denied, please try again.
>  Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
>  + [[ 255 -eq 0 ]]
>  + logger -i -p local1.info follow_master.sh: failed_nod_id=0 is not
> running. skipping follow master command.
> follow_master.sh: failed_nod_id=0 is not running. skipping follow master
> command.
>  + exit 0
>  [192-1] 2019-07-29 13:55:02: pid 2504: LOG:  execute command:
> /etc/pgpool-II/follow_master.sh 2 reserve 5432 /var/lib/pgsql/11/data 1 0
> slave 0 5432 /var/lib/pgsql/11/data
>  follow_master.sh: start: pg_basebackup for 2
>  + FAILED_NODE_ID=2
>  + FAILED_NODE_HOST=reserve
>  + FAILED_NODE_PORT=5432
>  + FAILED_NODE_PGDATA=/var/lib/pgsql/11/data
>  + NEW_MASTER_NODE_ID=1
>  + OLD_MASTER_NODE_ID=0
>  + NEW_MASTER_NODE_HOST=slave
>  + OLD_PRIMARY_NODE_ID=0
>  + NEW_MASTER_NODE_PORT=5432
>  + NEW_MASTER_NODE_PGDATA=/var/lib/pgsql/11/data
>  + PGHOME=/usr/pgsql-11
>  + ARCHIVEDIR=/var/lib/pgsql/archivedir
>  + REPL_USER=repl
>  + PCP_USER=pgpool
>  + PGPOOL_PATH=/usr/bin
>  + PCP_PORT=9898
>  + logger -i -p local1.info follow_master.sh: start: pg_basebackup for 2
>  + ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null
> postgres at reserve /usr/pgsql-11/bin/pg_ctl -w -D /var/lib/pgsql/11/data
> status
>  Warning: Permanently added 'reserve,192.168.56.112' (ECDSA) to the list
> of known hosts.
>  Permission denied, please try again.
>  Permission denied, please try again.
>  Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
>  + [[ 255 -eq 0 ]]
>  + logger -i -p local1.info follow_master.sh: failed_nod_id=2 is not
> running. skipping follow master command.
>  slave root[2550]: follow_master.sh: failed_nod_id=2 is not running.
> skipping follow master command.
>  + exit 0
>
> I'm starting to think that there is some problem with the ssh connection,
> but I'm not sure.
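>
> A quick way to confirm is to run the command from the trace by hand, as
> the same OS user that pgpool runs the script as (root here, per the log);
> it must finish without a password prompt:
>
>  ssh -T -o StrictHostKeyChecking=no postgres@master \
>      /usr/pgsql-11/bin/pg_ctl -w -D /var/lib/pgsql/11/data status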
>
> Sun, 28 Jul 2019 at 03:58, Tatsuo Ishii <ishii at sraoss.co.jp>:
>
>> I noticed the following in the log files:
>>
>> /home/t-ishii/slave log.txt:Jul 25 22:30:53 reserve root[2011]:
>> follow_master.sh: failed_nod_id=1 is not running. skipping follow master
>> command.
>> /home/t-ishii/slave log.txt:Jul 25 22:30:53 reserve root[2019]:
>> follow_master.sh: failed_nod_id=2 is not running. skipping follow master
>> command.
>>
>> I don't know which nodes are 1 and 2 (because you didn't share
>> pgpool.conf), but I don't think it was normal for two nodes to be
>> skipped by the follow master command, because you have only 3 nodes
>> and just one of the 3 is already down.
>>
>> I suspect the following code in follow_master.sh did not succeed:
>>
>> ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
>>     postgres@${FAILED_NODE_HOST} \
>>     ${PGHOME}/bin/pg_ctl -w -D ${FAILED_NODE_PGDATA} status >/dev/null 2>&1
>>
>> You may want to remove ">/dev/null" to see what is going on there.
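>>
>> Also note that pg_ctl reporting the node down and ssh itself failing
>> both return a non-zero status; ssh exits with 255 when the connection or
>> authentication fails. An untested sketch of telling the two apart:
>>
>> ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
>>     postgres@${FAILED_NODE_HOST} \
>>     ${PGHOME}/bin/pg_ctl -w -D ${FAILED_NODE_PGDATA} status
>> rc=$?
>> if [ $rc -eq 255 ]; then
>>     # ssh itself failed (key/auth problem), not pg_ctl
>>     logger -i -p local1.info "follow_master.sh: ssh to ${FAILED_NODE_HOST} failed"
>>     exit 1
>> elif [ $rc -ne 0 ]; then
>>     # pg_ctl says the node is not running
>>     logger -i -p local1.info "follow_master.sh: node is not running. skipping follow master command."
>>     exit 0
>> fi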
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese:http://www.sraoss.co.jp
>>
>> > "slave" -primary
>> > "master" and "reserve"- standby
>> > After I shut down "slave", "master" became primary, but "reserve" got
>> > status down. Configs are same from the documentation (changed just
>> > hostnames and ip's). Failover config is the same also
>> >
>> > Fri, 26 Jul 2019 at 12:54, Tatsuo Ishii <ishii at sraoss.co.jp>:
>> >
>> >> Hi,
>> >>
>> >> Yes, please provide log and config files.
>> >>
>> >> My intuition is that there's something wrong with the follow master
>> >> command script or related settings (especially ssh), because the
>> >> script shuts down the standby server to resync it with the new primary
>> >> database server.
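>> >>
>> >> In outline, the sample script does roughly this per node (a simplified
>> >> sketch, not the full script):
>> >>
>> >> # stop the standby, then let pgpool resync it from the new primary
>> >> ssh postgres@${FAILED_NODE_HOST} \
>> >>     ${PGHOME}/bin/pg_ctl -w -m fast -D ${FAILED_NODE_PGDATA} stop
>> >> pcp_recovery_node -h localhost -p ${PCP_PORT} -U ${PCP_USER} -n ${FAILED_NODE_ID}
>> >>
>> >> so if the ssh step fails, the node is never resynced and stays down.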
>> >>
>> >> Best regards,
>> >> --
>> >> Tatsuo Ishii
>> >> SRA OSS, Inc. Japan
>> >> English: http://www.sraoss.co.jp/index_en.php
>> >> Japanese:http://www.sraoss.co.jp
>> >>
>> >> > Гиа Хурцилава <khurtsilava.g at gmail.com>
>> >> > Thu, 25 Jul, 13:56 (21 hours ago)
>> >> > to: pgpool-general
>> >> >
>> >> > Hi there.
>> >> >
>> >> > I’ve got 3 machines with pgpool-4.0.5 and postgresql-11. I configured
>> >> > pgpool following the official documentation (
>> >> > http://www.pgpool.net/docs/latest/en/html/example-cluster.html), and
>> >> > everything works fine except one thing. When I shut down the master
>> >> > node, one of the slaves is correctly promoted, but the other one goes
>> >> > down along with the master. Just like this:
>> >> >
>> >> >  node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | last_status_change
>> >> > ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
>> >> >  0       | master   | 5432 | down   | 0.333333  | standby | 0          | false             | 0                 | 2019-07-25 13:49:22
>> >> >  1       | slave    | 5432 | up     | 0.333333  | primary | 0          | true              | 0                 | 2019-07-25 13:49:22
>> >> >  2       | reserve  | 5432 | down   | 0.333333  | standby | 0          | false             | 0                 | 2019-07-25 13:49:22
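>> >> >
>> >> > (That is the output of "show pool_nodes", queried through pgpool
>> >> > with something like this; 9999 is the assumed pgpool port:
>> >> >
>> >> >  psql -h <pgpool-host> -p 9999 -c 'show pool_nodes')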
>> >> >
>> >> >
>> >> >
>> >> > What could be the reason for this behavior? How can I fix it?
>> >> >
>> >> > If you need logs or config files, let me know. Thanks.
>> >>
>>
>

