[pgpool-general: 7465] Re: Expected behaviour after pcp_detach_node ?

Anssi Kanninen anssi at iki.fi
Mon Mar 29 17:26:00 JST 2021


Thanks for answering! I found out I had a major typo in follow_master.sh, so that problem is solved. Sorry for the false problem report.

But I also found a typo in the follow_master.sh.sample distributed with Pgpool:

     # If start Standby failed, drop replication slot "${FAILED_NODE_HOST}"
     else
         ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool \
         ${PGHOME}/bin/psql -p ${NEW_MASTER_NODE_PORT} -c "SELECT pg_drop_replication_slot('${FAILED_NODE_HOST}')"

         logger -i -p local1.error follow_master.sh: end: follow master command failed
         exit 1
     fi

The SSH command above won't work unless the quotes around the actual SQL 
command are escaped with '\'. There is a working example later in the 
script. :-)
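
For reference, this is roughly what the block looks like with the escaping fixed; a minimal sketch that follows the same pattern as the working pg_drop_replication_slot call inside the node loop of the sample (the remote command is wrapped in double quotes and the quotes around the SQL are backslash-escaped):

     # If start Standby failed, drop replication slot "${FAILED_NODE_HOST}"
     else
         ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool "
             ${PGHOME}/bin/psql -p ${NEW_MASTER_NODE_PORT} -c \"SELECT pg_drop_replication_slot('${FAILED_NODE_HOST}')\"
         "

         logger -i -p local1.error follow_master.sh: end: follow master command failed
         exit 1
     fi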

Cheers,

Anssi Kanninen




$ diff follow_master.sh.sample follow_master.sh
5c5
< set -o xtrace
---
> #set -o xtrace
34,35c34,35
< PGHOME=/usr/pgsql-11
< ARCHIVEDIR=/var/lib/pgsql/archivedir
---
> PGHOME=/usr/pgsql-12
> ARCHIVEDIR=/backup/pg-archivedir
40a41,42
> REPL_SLOT_NAME=${FAILED_NODE_HOST//[-.]/_}
>
70c72
<     # Create replication slot "${FAILED_NODE_HOST}"
---
>     # Create replication slot "${REPL_SLOT_NAME}"
72c74
<         ${PGHOME}/bin/psql -p ${NEW_MASTER_NODE_PORT} -c \"SELECT pg_create_physical_replication_slot('${FAILED_NODE_HOST}');\"
---
>         ${PGHOME}/bin/psql -p ${NEW_MASTER_NODE_PORT} -c \"SELECT pg_create_physical_replication_slot('${REPL_SLOT_NAME}');\"
85c87
< primary_slot_name = '${FAILED_NODE_HOST}'
---
> primary_slot_name = '${REPL_SLOT_NAME}'
119c121
< primary_slot_name = '${FAILED_NODE_HOST}'
---
> primary_slot_name = '${REPL_SLOT_NAME}'
131,134c133
<             ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool "
<                 ${PGHOME}/bin/psql -p ${NEW_MASTER_NODE_PORT} -c \"SELECT pg_drop_replication_slot('${FAILED_NODE_HOST}')\"
<             "
<
---
>             ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null po
155c154
<     # If start Standby failed, drop replication slot "${FAILED_NODE_HOST}"
---
>     # If start Standby failed, drop replication slot "${REPL_SLOT_NAME}"
159c158
<         ${PGHOME}/bin/psql -p ${NEW_MASTER_NODE_PORT} -c "SELECT pg_drop_replication_slot('${FAILED_NODE_HOST}')"
---
>         ${PGHOME}/bin/psql -p ${NEW_MASTER_NODE_PORT} -c "SELECT pg_drop_replication_slot('${REPL_SLOT_NAME}')"
166c165
<     logger -i -p local1.info follow_master.sh: failed_nod_id=${FAILED_NODE_ID} is not running. skipping follow master command
---
>     logger -i -p local1.info follow_master.sh: failed_node_id=${FAILED_NODE_ID} is not running. skipping follow master command
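
For context on the REPL_SLOT_NAME change above: replication slot names in PostgreSQL may only contain lower case letters, numbers and the underscore character, so a host name like 10.0.0.1 cannot be used as a slot name directly. The parameter expansion just maps dots and dashes to underscores; a small illustrative sketch, assuming the host name is an IP address as in my cluster:

     FAILED_NODE_HOST=10.0.0.1                    # host name of the detached node
     REPL_SLOT_NAME=${FAILED_NODE_HOST//[-.]/_}   # becomes 10_0_0_1
     echo "${REPL_SLOT_NAME}"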





On Mon, 29 Mar 2021, Tatsuo Ishii wrote:

> Hi,
>
>> Hi!
>>
>> What is the expected behaviour when detaching a node?
>> I have a three-node test cluster with pgpool-II 4.1.3
>> and PostgreSQL 12 like below.
>>
>>
>> id | hostname | status | role | repl_delay | repl_state | repl_sync_st | last_status_change
>>  0 | 10.0.0.1 | up | primary | 0 | | | 2021-03-25 12:18:27
>>  1 | 10.0.0.2 | up | standby | 0 | streaming | async | 2021-03-25 12:18:56
>>  2 | 10.0.0.3 | up | standby | 0 | streaming | async | 2021-03-25 12:18:56
>>
>> *** Then I run "pcp_detach_node 0". The second node was promoted as
>> *** expected:
>>
>> id | hostname | status | role | repl_delay | repl_state | repl_sync_st | last_status_change
>>  0 | 10.0.0.1 | down | standby | 0 | | | 2021-03-25 12:25:27
>>  1 | 10.0.0.2 | up | primary | 0 | | | 2021-03-25 12:25:27
>>  2 | 10.0.0.3 | down | standby | 0 | | | 2021-03-25 12:25:27
>>
>> *** But then, the first one is back as primary again?
>>
>> id | hostname | status | role | repl_delay | repl_state | repl_sync_st | last_status_change
>>  0 | 10.0.0.1 | up | primary | 0 | | | 2021-03-25 12:25:36
>>  1 | 10.0.0.2 | up | standby | 0 | | | 2021-03-25 12:25:36
>>  2 | 10.0.0.3 | down | standby | 0 | | | 2021-03-25 12:25:27
>
> This behavior is probably caused by the follow_master_command. Can you
> share follow_master.sh?
>
>> *** After that, the first node returns as a standby and the second node is primary:
>>
>> id | hostname | status | role | repl_delay | repl_state | repl_sync_st | last_status_change
>>  0 | 10.0.0.1 | up | standby | 0 | | | 2021-03-25 12:25:45
>>  1 | 10.0.0.2 | up | primary | 0 | | | 2021-03-25 12:25:45
>>  2 | 10.0.0.3 | up | standby | 0 | | | 2021-03-25 12:25:45
>>
>> *** In the end, this is the final state.
>>
>> id | hostname | status | role | repl_delay | repl_state | repl_sync_st | last_status_change
>>  0 | 10.0.0.1 | up | standby | 33463880 | | | 2021-03-25 12:25:45
>>  1 | 10.0.0.2 | up | primary | 0 | | | 2021-03-25 12:25:45
>>  2 | 10.0.0.3 | up | standby | 0 | streaming | async | 2021-03-25 12:25:45
>>
>> So finally, the second node was promoted as expected.
>>
>> But why is the first node "up" again even though it's not replicating?
>>
>> See pgpool.conf attached.
>>
>> Best regards,
>>   Anssi Kanninen
>>
>>
>>
>>
>> --
>> anssi at iki.fi
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>

-- 
anssi at iki.fi
-------------- next part --------------
A non-text attachment was scrubbed...
Name: follow_master.sh
Type: application/octet-stream
Size: 6410 bytes
Desc: 
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20210329/0ddb9aeb/attachment.obj>

