[Pgpool-general] Strange "cross" degeneration in a cluster with pgpool and openswan

Tatsuo Ishii ishii at sraoss.co.jp
Tue Oct 18 05:13:43 GMT 2005


Can you show me pgpool/PostgreSQL version, and pgpool's configuration
file contents? Especially I'm interested in which server is the master
for pgool?
--
SRA OSS, Inc. Japan
Tatsuo Ishii

> A very strange "cross" degeneration occured last week in our production
> cluster, wich uses pgpool and openswan (IPSec). The cluster consists of two
> machines (srv1 and srv2), each of which has an appliction server, a
> postgreSQL backend and a pgpool. On both servers, pgpool replicates
> transactions in both backends.
> Happened that srv1 entered degeneration mode because srv2 was not reachable,
> but srv2 did not enter degeneration mode. After some load, srv2 discovered a
> mismatch between backends and entered degeneration mode shutting down srv1
> (the slave backend). As a result, srv1 pointed to srv1 and srv2 to srv2,
> breaking the synchronization.
> I'm still unsure about the real cause that made srv1 unable to contact
> (temporarily) srv2. But maybe openswan related, because the log showed the
> following error:
> 
> Oct 12 19:48:25 srv2 kernel: pmtu discovery on SA ESP/e5fb8ff6/c0a85015
> 
> ¿Has anyone seen something like this?
> ¿Is there any method to avoid, or at least detect "cross degeneration" in
> this type of environment?
> 
> 
> 
> For further information, these are the relevant logs:
> 
> srv1
> ======
> 
> Oct 12 19:19:29 srv1 pluto[1346]: "srv2-srv1" #1177: received Delete
> SA(0xa98b908a) payload: deleting IPSEC State #1167
> Oct 12 19:19:29 srv1 pluto[1346]: "srv2-srv1" #1177: received and ignored
> informational message
> Oct 12 19:20:27 srv1 pluto[1346]: "srv2-srv1" #1179: responding to Main Mode
> Oct 12 19:20:27 srv1 pluto[1346]: "srv2-srv1" #1179: transition from state
> (null) to state STATE_MAIN_R1
> Oct 12 19:20:27 srv1 pluto[1346]: "srv2-srv1" #1179: transition from state
> STATE_MAIN_R1 to state STATE_MAIN_R2
> Oct 12 19:20:27 srv1 pluto[1346]: "srv2-srv1" #1179: Peer ID is
> ID_IPV4_ADDR: '192.168.80.20'
> Oct 12 19:20:27 srv1 pluto[1346]: "srv2-srv1" #1179: I did not send a
> certificate because I do not have one.
> Oct 12 19:20:27 srv1 pluto[1346]: "srv2-srv1" #1179: transition from state
> STATE_MAIN_R2 to state STATE_MAIN_R3
> Oct 12 19:20:27 srv1 pluto[1346]: "srv2-srv1" #1179: sent MR3, ISAKMP SA
> established
> Oct 12 19:23:01 srv1 CRON[31441]: (pam_unix) session opened for user mail by
> (uid=0)
> Oct 12 19:23:01 srv1 CRON[31441]: (pam_unix) session closed for user mail
> Oct 12 19:24:52 srv1 pgpool: 2005-10-12 19:24:52 ERROR: pid 10278:
> pool_read: read failed (Connection reset by peer)
> Oct 12 19:24:52 srv1 pgpool: 2005-10-12 19:24:52 ERROR: pid 10278:
> ProcessFrontendResponse: failed to read kind from frontend. fronend
> abnormally exited
> Oct 12 19:24:52 srv1 pgpool: 2005-10-12 19:24:52 ERROR: pid 7543: pool_read:
> read failed (Connection reset by peer)
> Oct 12 19:24:52 srv1 pgpool: 2005-10-12 19:24:52 ERROR: pid 7543:
> ProcessFrontendResponse: failed to read kind from frontend. fronend
> abnormally exited
> Oct 12 19:24:52 srv1 pgpool: 2005-10-12 19:24:52 ERROR: pid 32275:
> pool_read: read failed (Connection reset by peer)
> Oct 12 19:24:52 srv1 pgpool: 2005-10-12 19:24:52 ERROR: pid 32275:
> ProcessFrontendResponse: failed to read kind from frontend. fronend
> abnormally exited
> 
> ....
> 	Lots of same messages
> ....
> 
> Oct 12 19:24:52 srv1 pgpool: 2005-10-12 19:24:52 ERROR: pid 25087:
> pool_read: read failed (Connection reset by peer)
> Oct 12 19:24:52 srv1 pgpool: 2005-10-12 19:24:52 ERROR: pid 25087:
> ProcessFrontendResponse: failed to read kind from frontend. fronend
> abnormally exited
> Oct 12 19:28:32 srv1 pgpool: 2005-10-12 19:28:32 LOG:   pid 17352: starting
> health checking
> Oct 12 19:29:02 srv1 pgpool: 2005-10-12 19:29:02 LOG:   pid 17352: starting
> health checking
> Oct 12 19:29:32 srv1 pgpool: 2005-10-12 19:29:32 LOG:   pid 17352: starting
> health checking
> Oct 12 19:30:02 srv1 pgpool: 2005-10-12 19:30:02 LOG:   pid 17352: starting
> health checking
> Oct 12 19:30:28 srv1 pluto[1346]: "srv2-srv1" #1178: next payload type of
> ISAKMP Hash Payload has an unknown value: 219
> Oct 12 19:30:28 srv1 pluto[1346]: "srv2-srv1" #1178: malformed payload in
> packet
> Oct 12 19:30:28 srv1 pluto[1346]: "srv2-srv1" #1178: sending encrypted
> notification PAYLOAD_MALFORMED to 192.168.80.20:500
> Oct 12 19:30:32 srv1 pgpool: 2005-10-12 19:30:32 LOG:   pid 17352: starting
> health checking
> Oct 12 19:31:02 srv1 pgpool: 2005-10-12 19:31:02 LOG:   pid 17352: starting
> health checking
> Oct 12 19:31:32 srv1 pgpool: 2005-10-12 19:31:32 LOG:   pid 17352: starting
> health checking
> Oct 12 19:32:02 srv1 pgpool: 2005-10-12 19:32:02 LOG:   pid 17352: starting
> health checking
> Oct 12 19:32:32 srv1 pgpool: 2005-10-12 19:32:32 LOG:   pid 17352: starting
> health checking
> Oct 12 19:33:02 srv1 pgpool: 2005-10-12 19:33:02 LOG:   pid 17352: starting
> health checking
> Oct 12 19:33:32 srv1 pgpool: 2005-10-12 19:33:32 LOG:   pid 17352: starting
> health checking
> Oct 12 19:34:02 srv1 pgpool: 2005-10-12 19:34:02 LOG:   pid 17352: starting
> health checking
> Oct 12 19:34:32 srv1 pgpool: 2005-10-12 19:34:32 LOG:   pid 17352: starting
> health checking
> Oct 12 19:35:02 srv1 pgpool: 2005-10-12 19:35:02 LOG:   pid 17352: starting
> health checking
> Oct 12 19:35:32 srv1 pgpool: 2005-10-12 19:35:32 LOG:   pid 17352: starting
> health checking
> Oct 12 19:36:02 srv1 pgpool: 2005-10-12 19:36:02 LOG:   pid 17352: starting
> health checking
> Oct 12 19:36:21 srv1 pgpool: 2005-10-12 19:36:21 ERROR: pid 23909:
> pool_read: read failed (Connection reset by peer)
> Oct 12 19:36:21 srv1 pgpool: 2005-10-12 19:36:21 ERROR: pid 23909:
> ProcessFrontendResponse: failed to read kind from frontend. fronend
> abnormally exited
> Oct 12 19:36:21 srv1 pgpool: 2005-10-12 19:36:21 ERROR: pid 23909:
> pool_flush: fflush failed (Connection timed out)
> Oct 12 19:36:21 srv1 pgpool: 2005-10-12 19:36:21 LOG:   pid 23909:
> notice_backend_error: master: 0 fail over request from pid 23909
> Oct 12 19:36:21 srv1 pgpool: 2005-10-12 19:36:21 LOG:   pid 17352: starting
> degeneration. shutdown secondary host srv2(5432)
> 
> 
> 
> srv2
> ====
> 
> Oct 12 19:18:06 srv2 pluto[1558]: "srv2-srv1" #895: ignoring Delete SA
> payload: PROTO_IPSEC_ESP SA(0xe661038c) not
> found (maybe expired)
> Oct 12 19:18:06 srv2 pluto[1558]: "srv2-srv1" #895: received and ignored
> informational message
> Oct 12 19:19:04 srv2 pluto[1558]: "srv2-srv1" #897: initiating Main Mode to
> replace #895
> Oct 12 19:19:04 srv2 pluto[1558]: "srv2-srv1" #897: transition from state
> STATE_MAIN_I1 to state STATE_MAIN_I2
> Oct 12 19:19:04 srv2 pluto[1558]: "srv2-srv1" #897: I did not send a
> certificate because I do not have one.
> Oct 12 19:19:04 srv2 pluto[1558]: "srv2-srv1" #897: transition from state
> STATE_MAIN_I2 to state STATE_MAIN_I3
> Oct 12 19:19:04 srv2 pluto[1558]: "srv2-srv1" #897: Peer ID is ID_IPV4_ADDR:
> '192.168.80.21'
> Oct 12 19:19:04 srv2 pluto[1558]: "srv2-srv1" #897: transition from state
> STATE_MAIN_I3 to state STATE_MAIN_I4
> Oct 12 19:19:04 srv2 pluto[1558]: "srv2-srv1" #897: ISAKMP SA established
> Oct 12 19:22:45 srv2 sudo:  egutter : TTY=pts/1 ; PWD=/home/egutter ;
> USER=root ; COMMAND=/usr/bin/less /var/log/jboss/DT
> .log
> Oct 12 19:23:01 srv2 CRON[1921]: (pam_unix) session opened for user mail by
> (uid=0)
> Oct 12 19:23:01 srv2 CRON[1921]: (pam_unix) session closed for user mail
> Oct 12 19:29:05 srv2 pluto[1558]: "srv2-srv1" #895: received Delete SA
> payload: deleting ISAKMP State #895
> Oct 12 19:29:05 srv2 pluto[1558]: packet from 192.168.80.21:500: received
> and ignored informational message
> Oct 12 19:29:05 srv2 pluto[1558]: "srv2-srv1" #896: ignoring informational
> payload, type PAYLOAD_MALFORMED
> Oct 12 19:29:05 srv2 pluto[1558]: "srv2-srv1" #896: received and ignored
> informational message
> Oct 12 19:19:47 srv2 pgpool: 2005-10-12 19:19:47 LOG:   pid 1602: starting
> health checking
> Oct 12 19:19:52 srv2 kernel: pmtu discovery on SA ESP/e5fb8ff6/c0a85015
> Oct 12 19:20:05 srv2 last message repeated 7 times
> Oct 12 19:20:17 srv2 pgpool: 2005-10-12 19:20:17 LOG:   pid 1602: starting
> health checking
> Oct 12 19:20:18 srv2 kernel: pmtu discovery on SA ESP/e5fb8ff6/c0a85015
> ....
> 	Lots of same messages
> ....
> Oct 12 19:48:17 srv2 pgpool: 2005-10-12 19:48:17 LOG:   pid 1602: starting
> health checking
> Oct 12 19:48:25 srv2 kernel: pmtu discovery on SA ESP/e5fb8ff6/c0a85015
> 
> 
> 
> _______________________________________________
> Pgpool-general mailing list
> Pgpool-general at pgfoundry.org
> http://pgfoundry.org/mailman/listinfo/pgpool-general
> 


More information about the Pgpool-general mailing list