[pgpool-general: 930] Connections failures
Juan Jose Perez
jperez at oteara.com
Thu Aug 30 09:57:07 JST 2012
Hi all,
We have pgpool 3.1.2, with postgres 9.1 in streaming replication.
Pgpool is load balancing. There are two hosts both with postgres and
pgpool. They are in high availability with corosync.
It has been working fine but recently it has started to fail. First
disconnecting one backend and then the other, and fortunately, corosync
detects this situation and restarts pgpool. Some client connections are
lost during restarting but from there on, it continues normal operation.
And it looks like the cause is some network failure, even between
the pgpool and the postgres in the same host. This is the part of the
log where the failure happens.
What does it mean "Cannot assign requested address" in this
context? Any idea what could be the cause of failure and how can we
avoid it?
Many thanks.
2012-08-29 20:37:48 ERROR: pid 19289: connect_inet_domain_socket:
connect() failed: Cannot assign requested address
2012-08-29 20:37:48 ERROR: pid 19289: connection to host0(5432) failed
2012-08-29 20:37:48 ERROR: pid 19289: new_connection: create_cp() failed
2012-08-29 20:37:48 LOG: pid 19289: degenerate_backend_set: 0 fail
over request from pid 19289
2012-08-29 20:37:48 ERROR: pid 19300: connect_inet_domain_socket:
connect() failed: Cannot assign requested address
2012-08-29 20:37:48 LOG: pid 19307: DB node id: 0 backend pid: 32408
statement: B message
2012-08-29 20:37:48 LOG: pid 19286: DB node id: 1 backend pid: 23636
statement: B message
2012-08-29 20:37:48 ERROR: pid 19300: connection to host0(5432) failed
2012-08-29 20:37:48 LOG: pid 19274: starting degeneration. shutdown
host host0(5432)
2012-08-29 20:37:48 ERROR: pid 19300: new_connection: create_cp() failed
2012-08-29 20:37:48 LOG: pid 19274: Restart all children
2012-08-29 20:37:48 LOG: pid 19307: DB node id: 1 backend pid: 23637
statement: B message
2012-08-29 20:37:48 LOG: pid 19274: execute command: /etc/failover.sh
0 "host0" 5432 /var/lib/postgresql/9.1/main/ 1 0 "host1" 1
2012-08-29 20:37:48 LOG: pid 19274: find_primary_node_repeatedly:
waiting for finding a primary node
2012-08-29 20:37:48 LOG: pid 19274: find_primary_node: primary node id
is 1
2012-08-29 20:37:48 LOG: pid 19274: failover: set new primary node: 1
2012-08-29 20:37:48 LOG: pid 19274: failover: set new master node: 1
###
### It seems to start receiving normal connections from clients again
###
2012-08-29 20:37:48 LOG: pid 32414: connection received:
host=client30011 port=58746
2012-08-29 20:37:48 LOG: pid 32415: connection received:
host=client30011 port=58747
2012-08-29 20:37:48 LOG: pid 32416: connection received:
host=client30011 port=58748
2012-08-29 20:37:48 LOG: pid 32417: connection received:
host=client30011 port=58750
2012-08-29 20:37:48 LOG: pid 32418: connection received:
host=client30011 port=58752
2012-08-29 20:37:48 LOG: pid 32419: connection received:
host=client30121 port=55205
2012-08-29 20:37:48 LOG: pid 32420: connection received:
host=client30023 port=42473
2012-08-29 20:37:48 LOG: pid 32414: DB node id: 1 backend pid: 23641
statement: Parse: SET extra_float_digits = 3
2012-08-29 20:37:48 LOG: pid 32414: DB node id: 1 backend pid: 23641
statement: B message
2012-08-29 20:37:48 LOG: pid 32414: DB node id: 1 backend pid: 23641
statement: Execute: SET extra_float_digits = 3
2012-08-29 20:37:48 LOG: pid 32414: DB node id: 1 backend pid: 23641
statement: SELECT count(*) FROM pg_catalog.pg_proc AS p WHERE p.proname
= 'pgpool_regclass'
2012-08-29 20:37:48 LOG: pid 32416: DB node id: 1 backend pid: 23642
statement: Parse: SET extra_float_digits = 3
2012-08-29 20:37:48 LOG: pid 32417: DB node id: 1 backend pid: 23643
statement: Parse: SET extra_float_digits = 3
2012-08-29 20:37:48 LOG: pid 32419: DB node id: 1 backend pid: 23644
statement: Parse: SET extra_float_digits = 3
2012-08-29 20:37:48 LOG: pid 32416: DB node id: 1 backend pid: 23642
statement: B message
2012-08-29 20:37:48 LOG: pid 32417: DB node id: 1 backend pid: 23643
statement: B message
2012-08-29 20:37:48 LOG: pid 32416: DB node id: 1 backend pid: 23642
statement: Execute: SET extra_float_digits = 3
2012-08-29 20:37:48 LOG: pid 32419: DB node id: 1 backend pid: 23644
statement: B message
2012-08-29 20:37:48 LOG: pid 32418: DB node id: 1 backend pid: 23645
statement: Parse: SET extra_float_digits = 3
2012-08-29 20:37:48 LOG: pid 32439: connection received:
host=client30021 port=48696
2012-08-29 20:37:48 LOG: pid 32417: DB node id: 1 backend pid: 23643
statement: Execute: SET extra_float_digits = 3
2012-08-29 20:37:48 LOG: pid 32419: DB node id: 1 backend pid: 23644
statement: Execute: SET extra_float_digits = 3
2012-08-29 20:37:48 LOG: pid 32418: DB node id: 1 backend pid: 23645
statement: B message
2012-08-29 20:37:48 LOG: pid 19274: failover done. shutdown host
host0(5432)
2012-08-29 20:37:48 LOG: pid 32418: DB node id: 1 backend pid: 23645
statement: Execute: SET extra_float_digits = 3
2012-08-29 20:37:48 LOG: pid 32416: DB node id: 1 backend pid: 23642
statement: SELECT count(*) FROM pg_catalog.pg_proc AS p WHERE p.proname
= 'pgpool_regclass'
2012-08-29 20:37:48 LOG: pid 32417: DB node id: 1 backend pid: 23643
statement: SELECT count(*) FROM pg_catalog.pg_proc AS p WHERE p.proname
= 'pgpool_regclass'
2012-08-29 20:37:48 LOG: pid 32419: DB node id: 1 backend pid: 23644
statement: SELECT count(*) FROM pg_catalog.pg_proc AS p WHERE p.proname
= 'pgpool_regclass'
2012-08-29 20:37:48 LOG: pid 32418: DB node id: 1 backend pid: 23645
statement: Parse: UPDATE "Documents2" SET "Lemas" = '', "SentimentAuto"
= 0, "Franja" = 67878000, "Processed" = true WHERE "Id" = 5694304
2012-08-29 20:37:48 LOG: pid 32415: DB node id: 1 backend pid: 23646
statement: Parse: SET extra_float_digits = 3
2012-08-29 20:37:48 LOG: pid 32415: DB node id: 1 backend pid: 23646
statement: B message
2012-08-29 20:37:48 LOG: pid 32415: DB node id: 1 backend pid: 23646
statement: Execute: SET extra_float_digits = 3
2012-08-29 20:37:48 LOG: pid 32418: DB node id: 1 backend pid: 23645
statement: B message
2012-08-29 20:37:48 LOG: pid 32418: DB node id: 1 backend pid: 23645
statement: D message
2012-08-29 20:37:48 LOG: pid 32415: DB node id: 1 backend pid: 23646
statement: SELECT count(*) FROM pg_catalog.pg_proc AS p WHERE p.proname
= 'pgpool_regclass'
2012-08-29 20:37:48 LOG: pid 32418: DB node id: 1 backend pid: 23645
statement: Execute: UPDATE "Documents2" SET "Lemas" = '',
"SentimentAuto" = 0, "Franja" = 67878000, "Processed" = true WHERE "Id"
= 5694304
2012-08-29 20:37:48 LOG: pid 32420: DB node id: 1 backend pid: 23647
statement: Parse: SET extra_float_digits = 3
2012-08-29 20:37:48 LOG: pid 32420: DB node id: 1 backend pid: 23647
statement: B message
2012-08-29 20:37:48 LOG: pid 32420: DB node id: 1 backend pid: 23647
statement: Execute: SET extra_float_digits = 3
2012-08-29 20:37:48 LOG: pid 32420: DB node id: 1 backend pid: 23647
statement: SELECT count(*) FROM pg_catalog.pg_proc AS p WHERE p.proname
= 'pgpool_regclass'
2012-08-29 20:37:48 LOG: pid 32439: DB node id: 1 backend pid: 23648
statement: Parse: SET extra_float_digits = 3
2012-08-29 20:37:48 LOG: pid 32439: DB node id: 1 backend pid: 23648
statement: B message
2012-08-29 20:37:48 LOG: pid 32439: DB node id: 1 backend pid: 23648
statement: Execute: SET extra_float_digits = 3
2012-08-29 20:37:48 LOG: pid 32439: DB node id: 1 backend pid: 23648
statement: SELECT count(*) FROM pg_catalog.pg_proc AS p WHERE p.proname
= 'pgpool_regclass'
2012-08-29 20:37:48 LOG: pid 32437: connection received:
host=client30021 port=48698
2012-08-29 20:37:48 LOG: pid 32444: connection received:
host=client30011 port=58757
2012-08-29 20:37:48 LOG: pid 32437: DB node id: 1 backend pid: 23650
statement: Parse: SET extra_float_digits = 3
2012-08-29 20:37:48 LOG: pid 32437: DB node id: 1 backend pid: 23650
statement: B message
2012-08-29 20:37:48 LOG: pid 32437: DB node id: 1 backend pid: 23650
statement: Execute: SET extra_float_digits = 3
2012-08-29 20:37:48 LOG: pid 32437: DB node id: 1 backend pid: 23650
statement: SELECT count(*) FROM pg_catalog.pg_proc AS p WHERE p.proname
= 'pgpool_regclass'
###
### But in the same second, connections to the second postgres failed too
###
2012-08-29 20:37:48 ERROR: pid 32444: connect_inet_domain_socket:
connect() failed: Cannot assign requested address
2012-08-29 20:37:48 ERROR: pid 32444: connection to host1(5432) failed
2012-08-29 20:37:48 ERROR: pid 32444: new_connection: create_cp() failed
2012-08-29 20:37:48 LOG: pid 32444: degenerate_backend_set: 1 fail
over request from pid 32444
###
### These last lines with more connections from clients and initial
statements repeat about twenty times. And then...
###
2012-08-29 20:37:49 LOG: pid 19274: starting degeneration. shutdown
host host1(5432)
2012-08-29 20:37:49 ERROR: pid 19274: failover_handler: no valid DB node
found
2012-08-29 20:37:49 LOG: pid 19274: Restart all children
2012-08-29 20:37:49 LOG: pid 19308: pcp child process received restart
request
2012-08-29 20:37:49 LOG: pid 19274: execute command: /etc/failover.sh
1 "host1" 5432 /var/lib/postgresql/9.1/main/ -1 1 "" 1
2012-08-29 20:37:49 LOG: pid 19274: find_primary_node_repeatedly:
waiting for finding a primary node
2012-08-29 20:38:50 LOG: pid 19274: received immediate shutdown request
2012-08-29 20:38:51 ERROR: pid 458: pool_init_pool_passwd: couldn't open
/etc/pool_passwd. reason: Permission denied
2012-08-29 20:38:51 LOG: pid 458: read_status_file: 0 th backend is
set to down status
2012-08-29 20:38:51 LOG: pid 458: read_status_file: 1 th backend is
set to down status
2012-08-29 20:38:51 LOG: pid 465: connection received: host=client000
port=35578
2012-08-29 20:38:51 LOG: pid 470: connection received: host=client000
port=35582
2012-08-29 20:38:51 LOG: pid 466: connection received: host=client000
port=35579
2012-08-29 20:38:51 LOG: pid 467: connection received:
host=client30021 port=49154
2012-08-29 20:38:51 LOG: pid 464: connection received: host=client000
port=35577
2012-08-29 20:38:51 LOG: pid 469: connection received: host=client000
port=35581
2012-08-29 20:38:51 LOG: pid 468: connection received: host=client000
port=35580
2012-08-29 20:38:51 LOG: pid 458: pgpool-II successfully started.
version 3.1.2 (hatsuiboshi)
The same happened two more times in the next 30 minutes, but now it can
take up to 20 hours before a new failure happens.
--
Juanjo Pérez
www.oteara.com
More information about the pgpool-general
mailing list