[Pgpool-general] Strange "cross" degeneration in a cluster with pgpool and openswan

Pablo Milano pm-pgpool at datatransfer.com.ar
Tue Oct 18 14:43:32 GMT 2005


Yes, of course.

These are the versions of the debian packages:

ii  pgpool         2.5.2-1        connection pool server for PostgreSQL
ii  postgresql     7.4.7-6sarge1  object-relational SQL database management
sy 

Both configuration files are the same (as follows). Always master backend is
"srv1"


# pgpool.conf
# Note: The real server names were changed to "srv1" and "srv2"
#
listen_addresses = '*'
port = 5433
socket_dir = '/var/run/postgresql'
backend_host_name = 'srv1'
backend_port = 5432
backend_socket_dir = '/var/run/postgresql'
secondary_backend_host_name = 'srv2'
secondary_backend_port = 5432
num_init_children = 32
max_pool = 4
child_life_time = 300
connection_life_time = 0
logdir = '/var/run/postgresql'
replication_mode = true
replication_strict = true
replication_timeout = 1000
load_balance_mode = true
weight_master = 0.5
weight_secondary = 0.5
replication_stop_on_mismatch = true
reset_query_list = 'ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT'
print_timestamp = true
master_slave_mode = false
connection_cache = true
health_check_timeout = 20
health_check_period = 30
health_check_user = 'monitor'

We think the health check uses blocking primitives, so it delayed some
minutes in detecting the fail, because all the processes were blocked
waiting for the kernel.

Regards.


> -----Original Message-----
> From: Tatsuo Ishii [mailto:ishii at sraoss.co.jp] 
> Sent: Martes, 18 de Octubre de 2005 02:14 a.m.
> To: Pablo G. Milano
> Cc: pgpool-general at pgfoundry.org
> Subject: Re: [Pgpool-general] Strange "cross" degeneration in 
> a cluster with pgpool and openswan
> 
> Can you show me pgpool/PostgreSQL version, and pgpool's configuration
> file contents? Especially I'm interested in which server is the master
> for pgool?
> --
> SRA OSS, Inc. Japan
> Tatsuo Ishii
> 
> > A very strange "cross" degeneration occured last week in 
> our production
> > cluster, wich uses pgpool and openswan (IPSec). The cluster 
> consists of two
> > machines (srv1 and srv2), each of which has an appliction server, a
> > postgreSQL backend and a pgpool. On both servers, pgpool replicates
> > transactions in both backends.
> > Happened that srv1 entered degeneration mode because srv2 
> was not reachable,
> > but srv2 did not enter degeneration mode. After some load, 
> srv2 discovered a
> > mismatch between backends and entered degeneration mode 
> shutting down srv1
> > (the slave backend). As a result, srv1 pointed to srv1 and 
> srv2 to srv2,
> > breaking the synchronization.
> > I'm still unsure about the real cause that made srv1 unable 
> to contact
> > (temporarily) srv2. But maybe openswan related, because the 
> log showed the
> > following error:
> > 
> > Oct 12 19:48:25 srv2 kernel: pmtu discovery on SA 
> ESP/e5fb8ff6/c0a85015
> > 
> > ¿Has anyone seen something like this?
> > ¿Is there any method to avoid, or at least detect "cross 
> degeneration" in
> > this type of environment?
> > 
> > 
> > 
> > For further information, these are the relevant logs:
> > 
> > srv1
> > ======
> > 
> > Oct 12 19:19:29 srv1 pluto[1346]: "srv2-srv1" #1177: received Delete
> > SA(0xa98b908a) payload: deleting IPSEC State #1167
> > Oct 12 19:19:29 srv1 pluto[1346]: "srv2-srv1" #1177: 
> received and ignored
> > informational message
> > Oct 12 19:20:27 srv1 pluto[1346]: "srv2-srv1" #1179: 
> responding to Main Mode
> > Oct 12 19:20:27 srv1 pluto[1346]: "srv2-srv1" #1179: 
> transition from state
> > (null) to state STATE_MAIN_R1
> > Oct 12 19:20:27 srv1 pluto[1346]: "srv2-srv1" #1179: 
> transition from state
> > STATE_MAIN_R1 to state STATE_MAIN_R2
> > Oct 12 19:20:27 srv1 pluto[1346]: "srv2-srv1" #1179: Peer ID is
> > ID_IPV4_ADDR: '192.168.80.20'
> > Oct 12 19:20:27 srv1 pluto[1346]: "srv2-srv1" #1179: I did 
> not send a
> > certificate because I do not have one.
> > Oct 12 19:20:27 srv1 pluto[1346]: "srv2-srv1" #1179: 
> transition from state
> > STATE_MAIN_R2 to state STATE_MAIN_R3
> > Oct 12 19:20:27 srv1 pluto[1346]: "srv2-srv1" #1179: sent 
> MR3, ISAKMP SA
> > established
> > Oct 12 19:23:01 srv1 CRON[31441]: (pam_unix) session opened 
> for user mail by
> > (uid=0)
> > Oct 12 19:23:01 srv1 CRON[31441]: (pam_unix) session closed 
> for user mail
> > Oct 12 19:24:52 srv1 pgpool: 2005-10-12 19:24:52 ERROR: pid 10278:
> > pool_read: read failed (Connection reset by peer)
> > Oct 12 19:24:52 srv1 pgpool: 2005-10-12 19:24:52 ERROR: pid 10278:
> > ProcessFrontendResponse: failed to read kind from frontend. fronend
> > abnormally exited
> > Oct 12 19:24:52 srv1 pgpool: 2005-10-12 19:24:52 ERROR: pid 
> 7543: pool_read:
> > read failed (Connection reset by peer)
> > Oct 12 19:24:52 srv1 pgpool: 2005-10-12 19:24:52 ERROR: pid 7543:
> > ProcessFrontendResponse: failed to read kind from frontend. fronend
> > abnormally exited
> > Oct 12 19:24:52 srv1 pgpool: 2005-10-12 19:24:52 ERROR: pid 32275:
> > pool_read: read failed (Connection reset by peer)
> > Oct 12 19:24:52 srv1 pgpool: 2005-10-12 19:24:52 ERROR: pid 32275:
> > ProcessFrontendResponse: failed to read kind from frontend. fronend
> > abnormally exited
> > 
> > ....
> > 	Lots of same messages
> > ....
> > 
> > Oct 12 19:24:52 srv1 pgpool: 2005-10-12 19:24:52 ERROR: pid 25087:
> > pool_read: read failed (Connection reset by peer)
> > Oct 12 19:24:52 srv1 pgpool: 2005-10-12 19:24:52 ERROR: pid 25087:
> > ProcessFrontendResponse: failed to read kind from frontend. fronend
> > abnormally exited
> > Oct 12 19:28:32 srv1 pgpool: 2005-10-12 19:28:32 LOG:   pid 
> 17352: starting
> > health checking
> > Oct 12 19:29:02 srv1 pgpool: 2005-10-12 19:29:02 LOG:   pid 
> 17352: starting
> > health checking
> > Oct 12 19:29:32 srv1 pgpool: 2005-10-12 19:29:32 LOG:   pid 
> 17352: starting
> > health checking
> > Oct 12 19:30:02 srv1 pgpool: 2005-10-12 19:30:02 LOG:   pid 
> 17352: starting
> > health checking
> > Oct 12 19:30:28 srv1 pluto[1346]: "srv2-srv1" #1178: next 
> payload type of
> > ISAKMP Hash Payload has an unknown value: 219
> > Oct 12 19:30:28 srv1 pluto[1346]: "srv2-srv1" #1178: 
> malformed payload in
> > packet
> > Oct 12 19:30:28 srv1 pluto[1346]: "srv2-srv1" #1178: 
> sending encrypted
> > notification PAYLOAD_MALFORMED to 192.168.80.20:500
> > Oct 12 19:30:32 srv1 pgpool: 2005-10-12 19:30:32 LOG:   pid 
> 17352: starting
> > health checking
> > Oct 12 19:31:02 srv1 pgpool: 2005-10-12 19:31:02 LOG:   pid 
> 17352: starting
> > health checking
> > Oct 12 19:31:32 srv1 pgpool: 2005-10-12 19:31:32 LOG:   pid 
> 17352: starting
> > health checking
> > Oct 12 19:32:02 srv1 pgpool: 2005-10-12 19:32:02 LOG:   pid 
> 17352: starting
> > health checking
> > Oct 12 19:32:32 srv1 pgpool: 2005-10-12 19:32:32 LOG:   pid 
> 17352: starting
> > health checking
> > Oct 12 19:33:02 srv1 pgpool: 2005-10-12 19:33:02 LOG:   pid 
> 17352: starting
> > health checking
> > Oct 12 19:33:32 srv1 pgpool: 2005-10-12 19:33:32 LOG:   pid 
> 17352: starting
> > health checking
> > Oct 12 19:34:02 srv1 pgpool: 2005-10-12 19:34:02 LOG:   pid 
> 17352: starting
> > health checking
> > Oct 12 19:34:32 srv1 pgpool: 2005-10-12 19:34:32 LOG:   pid 
> 17352: starting
> > health checking
> > Oct 12 19:35:02 srv1 pgpool: 2005-10-12 19:35:02 LOG:   pid 
> 17352: starting
> > health checking
> > Oct 12 19:35:32 srv1 pgpool: 2005-10-12 19:35:32 LOG:   pid 
> 17352: starting
> > health checking
> > Oct 12 19:36:02 srv1 pgpool: 2005-10-12 19:36:02 LOG:   pid 
> 17352: starting
> > health checking
> > Oct 12 19:36:21 srv1 pgpool: 2005-10-12 19:36:21 ERROR: pid 23909:
> > pool_read: read failed (Connection reset by peer)
> > Oct 12 19:36:21 srv1 pgpool: 2005-10-12 19:36:21 ERROR: pid 23909:
> > ProcessFrontendResponse: failed to read kind from frontend. fronend
> > abnormally exited
> > Oct 12 19:36:21 srv1 pgpool: 2005-10-12 19:36:21 ERROR: pid 23909:
> > pool_flush: fflush failed (Connection timed out)
> > Oct 12 19:36:21 srv1 pgpool: 2005-10-12 19:36:21 LOG:   pid 23909:
> > notice_backend_error: master: 0 fail over request from pid 23909
> > Oct 12 19:36:21 srv1 pgpool: 2005-10-12 19:36:21 LOG:   pid 
> 17352: starting
> > degeneration. shutdown secondary host srv2(5432)
> > 
> > 
> > 
> > srv2
> > ====
> > 
> > Oct 12 19:18:06 srv2 pluto[1558]: "srv2-srv1" #895: 
> ignoring Delete SA
> > payload: PROTO_IPSEC_ESP SA(0xe661038c) not
> > found (maybe expired)
> > Oct 12 19:18:06 srv2 pluto[1558]: "srv2-srv1" #895: 
> received and ignored
> > informational message
> > Oct 12 19:19:04 srv2 pluto[1558]: "srv2-srv1" #897: 
> initiating Main Mode to
> > replace #895
> > Oct 12 19:19:04 srv2 pluto[1558]: "srv2-srv1" #897: 
> transition from state
> > STATE_MAIN_I1 to state STATE_MAIN_I2
> > Oct 12 19:19:04 srv2 pluto[1558]: "srv2-srv1" #897: I did not send a
> > certificate because I do not have one.
> > Oct 12 19:19:04 srv2 pluto[1558]: "srv2-srv1" #897: 
> transition from state
> > STATE_MAIN_I2 to state STATE_MAIN_I3
> > Oct 12 19:19:04 srv2 pluto[1558]: "srv2-srv1" #897: Peer ID 
> is ID_IPV4_ADDR:
> > '192.168.80.21'
> > Oct 12 19:19:04 srv2 pluto[1558]: "srv2-srv1" #897: 
> transition from state
> > STATE_MAIN_I3 to state STATE_MAIN_I4
> > Oct 12 19:19:04 srv2 pluto[1558]: "srv2-srv1" #897: ISAKMP 
> SA established
> > Oct 12 19:22:45 srv2 sudo:  egutter : TTY=pts/1 ; 
> PWD=/home/egutter ;
> > USER=root ; COMMAND=/usr/bin/less /var/log/jboss/DT
> > .log
> > Oct 12 19:23:01 srv2 CRON[1921]: (pam_unix) session opened 
> for user mail by
> > (uid=0)
> > Oct 12 19:23:01 srv2 CRON[1921]: (pam_unix) session closed 
> for user mail
> > Oct 12 19:29:05 srv2 pluto[1558]: "srv2-srv1" #895: 
> received Delete SA
> > payload: deleting ISAKMP State #895
> > Oct 12 19:29:05 srv2 pluto[1558]: packet from 
> 192.168.80.21:500: received
> > and ignored informational message
> > Oct 12 19:29:05 srv2 pluto[1558]: "srv2-srv1" #896: 
> ignoring informational
> > payload, type PAYLOAD_MALFORMED
> > Oct 12 19:29:05 srv2 pluto[1558]: "srv2-srv1" #896: 
> received and ignored
> > informational message
> > Oct 12 19:19:47 srv2 pgpool: 2005-10-12 19:19:47 LOG:   pid 
> 1602: starting
> > health checking
> > Oct 12 19:19:52 srv2 kernel: pmtu discovery on SA 
> ESP/e5fb8ff6/c0a85015
> > Oct 12 19:20:05 srv2 last message repeated 7 times
> > Oct 12 19:20:17 srv2 pgpool: 2005-10-12 19:20:17 LOG:   pid 
> 1602: starting
> > health checking
> > Oct 12 19:20:18 srv2 kernel: pmtu discovery on SA 
> ESP/e5fb8ff6/c0a85015
> > ....
> > 	Lots of same messages
> > ....
> > Oct 12 19:48:17 srv2 pgpool: 2005-10-12 19:48:17 LOG:   pid 
> 1602: starting
> > health checking
> > Oct 12 19:48:25 srv2 kernel: pmtu discovery on SA 
> ESP/e5fb8ff6/c0a85015
> > 
> > 
> > 
> > _______________________________________________
> > Pgpool-general mailing list
> > Pgpool-general at pgfoundry.org
> > http://pgfoundry.org/mailman/listinfo/pgpool-general
> > 
> 



More information about the Pgpool-general mailing list