[pgpool-general: 6577] Follow-master problem

Domen Šetar domen.setar at izum.si
Fri May 31 17:33:15 JST 2019


Hi,

I'm testing pgpoll installation with three servers according to documentation on the page:  http://www.pgpool.net/docs/latest/en/html/example-cluster.html
I'm using Centos 7 with PostgreSQL 11.3 and pgpool 4.0.5.
I have three servers: pgsql1 (primary), pgsql2 (standby), pgsql3 (standby).

I have following problem: When primary (pgsql1) goes down, first standby (pgsql2) is normaly promoted as primary, but second standby (pgsql3) doesn't follow new primary, it goes to status down instead (postgresql is actualy running normaly on pgsql3):

-bash-4.2$ psql -h 192.168.56.150 -p 9999 -U pgpool postgres -c "show pool_nodes"
Password for user pgpool:
node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | last_status_change
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
0       | pgsql1   | 5432 | down   | 0.333333  | standby | 0          | false             | 0                 | 2019-05-31 09:51:23
1       | pgsql2   | 5432 | up     | 0.333333  | primary | 0          | true              | 0                 | 2019-05-31 09:51:23
2       | pgsql3   | 5432 | down   | 0.333333  | standby | 0          | false             | 0                 | 2019-05-31 09:51:23

I found this lines in the pgpool  log file:

May 31 09:51:23 pgsql3 pgpool[4524]: [207-1] 2019-05-31 09:51:23: pid 4524: LOG:  find_primary_node: primary node is 1
May 31 09:51:23 pgsql3 pgpool[4524]: [208-1] 2019-05-31 09:51:23: pid 4524: LOG:  find_primary_node: standby node is 2
May 31 09:51:23 pgsql3 pgpool[4524]: [209-1] 2019-05-31 09:51:23: pid 4524: LOG:  starting follow degeneration. shutdown host pgsql1(5432)
May 31 09:51:23 pgsql3 pgpool[4524]: [210-1] 2019-05-31 09:51:23: pid 4524: LOG:  starting follow degeneration. shutdown host pgsql3(5432)

When the follow-master comes, the script checks if postgresql on server pgsql3 is running, and because it doesn't scripte exits:

May 31 09:51:23 pgsql3 pgpool[5034]: [214-1] 2019-05-31 09:51:23: pid 5034: LOG:  execute command: /etc/pgpool-II/follow_master.sh 2 pgsql3 5432 /var/lib/pgsql/11/data 1 0 pgsql2 0 5432 /var/lib/pgsql/11/data
May 31 09:51:23 pgsql3 root[5077]: follow_master.sh: start: pg_basebackup for 2
May 31 09:51:23 pgsql3 root[5078]: + FAILED_NODE_ID=2
May 31 09:51:23 pgsql3 root[5078]: + FAILED_NODE_HOST=pgsql3
May 31 09:51:23 pgsql3 root[5078]: + FAILED_NODE_PORT=5432
May 31 09:51:23 pgsql3 root[5078]: + FAILED_NODE_PGDATA=/var/lib/pgsql/11/data
May 31 09:51:23 pgsql3 root[5078]: + NEW_MASTER_NODE_ID=1
May 31 09:51:23 pgsql3 root[5078]: + OLD_MASTER_NODE_ID=0
May 31 09:51:23 pgsql3 root[5078]: + NEW_MASTER_NODE_HOST=pgsql2
May 31 09:51:23 pgsql3 root[5078]: + OLD_PRIMARY_NODE_ID=0
May 31 09:51:23 pgsql3 root[5078]: + NEW_MASTER_NODE_PORT=5432
May 31 09:51:23 pgsql3 root[5078]: + NEW_MASTER_NODE_PGDATA=/var/lib/pgsql/11/data
May 31 09:51:23 pgsql3 root[5078]: + PGHOME=/usr/pgsql-11
May 31 09:51:23 pgsql3 root[5078]: + ARCHIVEDIR=/var/lib/pgsql/archivedir
May 31 09:51:23 pgsql3 root[5078]: + REPL_USER=repl
May 31 09:51:23 pgsql3 root[5078]: + PCP_USER=pgpool
May 31 09:51:23 pgsql3 root[5078]: + PGPOOL_PATH=/usr/bin
May 31 09:51:23 pgsql3 root[5078]: + PCP_PORT=9898
May 31 09:51:23 pgsql3 root[5078]: + logger -i -p local1.info follow_master.sh: start: pg_basebackup for 2
May 31 09:51:23 pgsql3 root[5078]: + ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres at pgsql3 /usr/pgsql-11/bin/pg_ctl -w -D /var/lib/pgsql/11/data status
May 31 09:51:23 pgsql3 root[5078]: + [[ 255 -eq 0 ]]
May 31 09:51:23 pgsql3 root[5078]: + logger -i -p local1.info follow_master.sh: failed_nod_id=2 is not running. skipping follow master command.
May 31 09:51:23 pgsql3 root[5082]: follow_master.sh: failed_nod_id=2 is not running. skipping follow master command.
May 31 09:51:23 pgsql3 root[5078]: + exit 0

Postgres log file on pgsql3:

2019-05-31 09:48:11.142 CEST [4764] LOG:  started streaming WAL from primary at 0/F000000 on timeline 1
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2019-05-31 09:51:18.023 CEST [4764] FATAL:  could not receive data from WAL stream: server closed the connection unexpectedly
                This probably means the server terminated abnormally
                before or while processing the request.
Host key verification failed.
2019-05-31 09:51:18.156 CEST [4747] LOG:  invalid record length at 0/F000060: wanted 24, got 0
2019-05-31 09:51:18.160 CEST [5012] FATAL:  could not connect to the primary server: could not connect to server: Connection refused
                Is the server running on host "pgsql1" (192.168.56.111) and accepting
                TCP/IP connections on port 5432?
Host key verification failed.

What is wrong? Can you please help me out of this situation.

Best regards!
[izum]

Domen Šetar
Computer Systems Support
IZUM - Institute of Information Science | Prešernova ulica 17 | 2000 Maribor | Slovenia
T: +386 2 25 20 339 | M: +386 41 676 342 | www.izum.si<http://www.izum.si/> | domen.setar at izum.si<mailto:domen.setar at izum.si>



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20190531/ca0e3548/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.png
Type: image/png
Size: 1062 bytes
Desc: image002.png
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20190531/ca0e3548/attachment.png>


More information about the pgpool-general mailing list