[pgpool-general: 3036] Re: Last node in pgpool chain failover

Long On on.long.on at gmail.com
Thu Jul 17 12:41:44 JST 2014


Thanks for looking into this, Yugo. Sorry for the long message.

Start with node01 as primary and node02 as standby.

-- pgpool.log --
2014-07-17 02:05:53 LOG:   pid 18498: wd_chk_setuid all commands have
setuid bit
2014-07-17 02:05:53 LOG:   pid 18498: watchdog might call network commands
which using setuid bit.
2014-07-17 02:05:53 LOG:   pid 18498: wd_create_send_socket: connect()
reports failure (Connection refused). You can safely ignore this while
starting up.
2014-07-17 02:05:53 LOG:   pid 18498: send_packet_4_nodes: packet for
node02:9000 is canceled
2014-07-17 02:05:56 LOG:   pid 18498: wd_escalation: escalating to master
pgpool
2014-07-17 02:05:58 LOG:   pid 18498: wd_escalation: escalated to master
pgpool successfully
2014-07-17 02:05:58 LOG:   pid 18498: wd_init: start watchdog
2014-07-17 02:05:58 LOG:   pid 18498: pgpool-II successfully started.
version 3.3.2 (tokakiboshi)
2014-07-17 02:05:59 LOG:   pid 18507: wd_create_hb_recv_socket: set
SO_REUSEPORT
2014-07-17 02:05:59 LOG:   pid 18508: wd_create_hb_send_socket: set
SO_REUSEPORT
2014-07-17 02:05:59 LOG:   pid 18509: wd_create_hb_recv_socket: set
SO_REUSEPORT
2014-07-17 02:05:59 LOG:   pid 18510: wd_create_hb_send_socket: set
SO_REUSEPORT

...

Stop postgres on node01 to trigger failover to node02

2014-07-17 02:27:10 LOG:   pid 18599: connection closed. retry to create
new connection pool.
2014-07-17 02:27:10 ERROR: pid 18599: connect_inet_domain_socket:
getsockopt() detected error: Connection refused
2014-07-17 02:27:10 ERROR: pid 18599: connection to node01(5432) failed
2014-07-17 02:27:10 ERROR: pid 18599: new_connection: create_cp() failed
2014-07-17 02:27:10 LOG:   pid 18599: degenerate_backend_set: 0 fail over
request from pid 18599
2014-07-17 02:27:10 LOG:   pid 18498: wd_start_interlock: start interlocking
2014-07-17 02:27:10 LOG:   pid 18498: wd_assume_lock_holder: become a new
lock holder
2014-07-17 02:27:11 LOG:   pid 18498: starting degeneration. shutdown host
node01(5432)
2014-07-17 02:27:11 LOG:   pid 18498: Restart all children
2014-07-17 02:27:11 LOG:   pid 18498: execute command: /usr/bin/sudo -u
postgres /var/lib/postgresql/failover_cmd.sh node01 node02
2014-07-17 02:27:11 LOG:   pid 18498: wd_end_interlock: end interlocking
2014-07-17 02:27:12 LOG:   pid 18498: failover: set new primary node: -1
2014-07-17 02:27:12 LOG:   pid 18498: failover: set new master node: 1
2014-07-17 02:27:12 LOG:   pid 18498: failover done. shutdown host
node01(5432)
2014-07-17 02:27:12 LOG:   pid 18546: worker process received restart
request
2014-07-17 02:27:13 LOG:   pid 18545: pcp child process received restart
request
2014-07-17 02:27:13 LOG:   pid 18498: PCP child 18545 exits with status 256
in failover()
2014-07-17 02:27:13 LOG:   pid 18498: fork a new PCP child pid 18745 in
failover()
2014-07-17 02:27:13 LOG:   pid 18498: worker child 18546 exits with status
256
2014-07-17 02:27:13 LOG:   pid 18498: fork a new worker child pid 18746

...

Failover to node02 is successful.
node01 then takes a backup of node02's PostgreSQL and starts replicating.
pgpool still thinks node01 is shut down. If pcp_attach_node were run at
this point, pgpool would make node01's PostgreSQL the primary, even though
it is currently a standby of node02. So pcp_attach_node is not run.

Stop postgres on node02 to trigger failover to node01

2014-07-17 02:37:12 LOG:   pid 18742: connection closed. retry to create
new connection pool.
2014-07-17 02:37:12 ERROR: pid 18742: connect_inet_domain_socket:
getsockopt() detected error: Connection refused
2014-07-17 02:37:12 ERROR: pid 18742: connection to node02(5432) failed
2014-07-17 02:37:12 ERROR: pid 18742: new_connection: create_cp() failed
2014-07-17 02:37:12 LOG:   pid 18742: degenerate_backend_set: 1 fail over
request from pid 18742
2014-07-17 02:37:12 LOG:   pid 18498: wd_start_interlock: start interlocking
2014-07-17 02:37:12 LOG:   pid 18498: wd_assume_lock_holder: become a new
lock holder
2014-07-17 02:37:13 LOG:   pid 18498: starting degeneration. shutdown host
node02(5432)
2014-07-17 02:37:13 ERROR: pid 18498: failover_handler: no valid DB node
found
2014-07-17 02:37:13 LOG:   pid 18498: Restart all children
2014-07-17 02:37:13 LOG:   pid 18498: execute command: /usr/bin/sudo -u
postgres /var/lib/postgresql/failover_cmd.sh node02

Tailing the log shows that logging stopped here, so the failover did not
complete. Note that the failover command is incomplete: it should be
... failover_cmd.sh node02 node01
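
For illustration, here is a minimal sketch of the argument check a failover
script could make before acting, so that an invocation with no new-master
host (like the one above) is detected instead of being handled as a normal
failover. It assumes the script receives the failed host first and the new
master host second, as in the 02:27:11 log line. The function name
plan_failover and the action labels are hypothetical, and Python is used only
for the sketch; the real failover_cmd.sh is not shown in this message.

```python
def plan_failover(argv):
    """Decide what a failover script should do with its arguments.

    argv mirrors the arguments seen in the log: failed host first,
    new master host second. The second may be missing or empty when
    pgpool has no surviving node to name.
    Returns a tuple (action, failed_host, new_master).
    """
    failed = argv[0] if len(argv) > 0 else ""
    new_master = argv[1] if len(argv) > 1 else ""

    if not failed:
        # Nothing usable was passed at all.
        return ("error", failed, new_master)
    if not new_master:
        # No new master was named: do not promote anything.
        return ("skip", failed, new_master)
    return ("promote", failed, new_master)
```

With the two invocations from the log, plan_failover(["node01", "node02"])
yields a "promote" action, while plan_failover(["node02"]) yields "skip".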

Listing the pgpool processes shows all of them with the <defunct> tag. This
makes sense: pgpool doesn't know node01 is now a standby, so it thinks there
are no "good" nodes left in the cluster, and it died.

The remaining log entries were written when I exited my shell session,
which closed the open jobs.

2014-07-17 02:45:42 LOG:   pid 18498: wd_end_interlock: end interlocking
2014-07-17 02:45:42 LOG:   pid 18498: failover: set new primary node: -1
2014-07-17 02:45:43 LOG:   pid 18498: failover done. shutdown host
node02(5432)
2014-07-17 02:45:44 LOG:   pid 18498: PCP child 18745 exits with status 0
in failover()
2014-07-17 02:45:44 LOG:   pid 18498: fork a new PCP child pid 18977 in
failover()
2014-07-17 02:45:44 LOG:   pid 18498: received smart shutdown request
2014-07-17 02:45:45 LOG:   pid 18511: wd_IP_down: ifconfig down succeeded
-- /pgpool.log --
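
To see at a glance what pgpool decided during each failover, the "set new
primary/master node" entries can be pulled out of the log. This is a rough
sketch, assuming the wrapped lines above have been re-joined so that each
entry sits on one line. The function name failover_decisions is hypothetical.

```python
import re

# Matches entries such as:
# 2014-07-17 02:27:12 LOG:   pid 18498: failover: set new primary node: -1
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+):\s+pid (?P<pid>\d+): (?P<msg>.*)$"
)
NODE_RE = re.compile(r"failover: set new (primary|master) node: (-?\d+)")


def failover_decisions(lines):
    """Return (timestamp, role, node_id) for each 'set new ... node' entry."""
    decisions = []
    for line in lines:
        entry = LINE_RE.match(line.strip())
        if entry is None:
            continue  # not a log entry (blank line, "...", commentary)
        node = NODE_RE.search(entry.group("msg"))
        if node is not None:
            decisions.append(
                (entry.group("ts"), node.group(1), int(node.group(2)))
            )
    return decisions
```

On the first failover above this would report primary node -1 and master
node 1 at 02:27:12.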


I think the real question is how to re-attach node01 as standby so pgpool
will know about it.

On a related note, Muhammad Usama had pointed out in another thread that
pgpool looks for specific conditions to determine if a node is primary. I
think satisfying these conditions may help.
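
One such condition, as far as I understand it for streaming replication, is
the result of SELECT pg_is_in_recovery() on each backend: false on a primary,
true on a standby. The sketch below is an assumption about how node01 could
be checked by hand before attaching it, not pgpool's own code. The function
names, the postgres user and the postgres database are placeholders.

```python
import subprocess


def role_from_recovery_flag(flag):
    """Map the text output of SELECT pg_is_in_recovery() to a role name."""
    value = flag.strip().lower()
    if value in ("t", "true"):
        return "standby"
    if value in ("f", "false"):
        return "primary"
    return "unknown"


def check_role(host, port=5432, user="postgres", dbname="postgres"):
    """Ask one backend whether it is in recovery, using psql.

    -A and -t make psql print only the bare value (t or f).
    """
    cmd = [
        "psql", "-h", host, "-p", str(port), "-U", user,
        "-A", "-t", "-c", "SELECT pg_is_in_recovery()", dbname,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
    if result.returncode != 0:
        return "unreachable"
    return role_from_recovery_flag(result.stdout)
```

For example, check_role("node01") should return "standby" once node01 is
replicating from node02.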

Long
-------------- next part --------------
A non-text attachment was scrubbed...
Name: node01_pgpool.conf
Type: application/octet-stream
Size: 31279 bytes
Desc: not available
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20140716/5b0d5180/attachment-0001.obj>

