[pgpool-general: 1517] Why does Pgpool die?

Herouth Maoz herouth at unicell.co.il
Wed Mar 20 22:43:24 JST 2013


Hi. First time on this mailing list.

I'm testing pgpool for a high-availability solution which involves two PostgreSQL servers in streaming replication, and two pgpool servers with watchdog monitoring each other. It all seemed to work well until I killed one of the PostgreSQL servers to test whether the setup would fail over. Then the pgpool processes died, although the failover itself succeeded.

Eventually I left one of the pgpools stopped, turned watchdog off on the other one, and tried to run it standalone with the existing setup (original master node still not up, original slave node now master).

However, I can't get pgpool to recognize the state of the servers, and the main pgpool process dies (leaving child processes that are still alive). The strange thing is that I can connect to pcp (probably to the child processes) and it reports a status of 1 or 2 (that is, up) for both servers, although only one of them is actually up, and although I can't connect using psql through pgpool (I can connect directly).

Error log says:

Mar 20 15:14:16 pool01 pgpool: 2013-03-20 15:14:16 LOG:   pid 4603: find_primary_node: primary node id is 0
Mar 20 15:14:16 pool01 pgpool: 2013-03-20 15:14:16 ERROR: pid 4603: pool_flush_it: write failed to backend (0). reason: Connection refused offset: 0 wlen: 41
Mar 20 15:14:16 pool01 pgpool: 2013-03-20 15:14:16 LOG:   pid 4603: degenerate_backend_set: 0 fail over request from pid 4603
Mar 20 15:24:39 pool01 pgpool: 2013-03-20 15:24:39 LOG:   pid 4639: connection received: host=[local]
Mar 20 15:24:39 pool01 pgpool: 2013-03-20 15:24:39 ERROR: pid 4639: pool_flush_it: write failed to backend (1). reason: Connection refused offset: 0 wlen: 76
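
To make it easier to see which process logged what, the lines above can be grouped by pid. This is only a minimal sketch, assuming the syslog layout shown in this log; the regular expression and the name group_by_pid are illustrative and not part of pgpool.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the log lines quoted above:
# "<syslog prefix> pgpool: <date> <time> <LEVEL>: pid <N>: <message>"
LINE_RE = re.compile(
    r"pgpool: (?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+):\s+pid (?P<pid>\d+): (?P<msg>.*)$"
)

# The five log lines from this message, verbatim.
SAMPLE = [
    "Mar 20 15:14:16 pool01 pgpool: 2013-03-20 15:14:16 LOG:   pid 4603: find_primary_node: primary node id is 0",
    "Mar 20 15:14:16 pool01 pgpool: 2013-03-20 15:14:16 ERROR: pid 4603: pool_flush_it: write failed to backend (0). reason: Connection refused offset: 0 wlen: 41",
    "Mar 20 15:14:16 pool01 pgpool: 2013-03-20 15:14:16 LOG:   pid 4603: degenerate_backend_set: 0 fail over request from pid 4603",
    "Mar 20 15:24:39 pool01 pgpool: 2013-03-20 15:24:39 LOG:   pid 4639: connection received: host=[local]",
    "Mar 20 15:24:39 pool01 pgpool: 2013-03-20 15:24:39 ERROR: pid 4639: pool_flush_it: write failed to backend (1). reason: Connection refused offset: 0 wlen: 76",
]


def group_by_pid(lines):
    """Return {pid: [(timestamp, level, message), ...]} for lines that match."""
    grouped = defaultdict(list)
    for line in lines:
        m = LINE_RE.search(line)
        if m:  # silently skip anything that does not look like a pgpool line
            grouped[int(m.group("pid"))].append(
                (m.group("ts"), m.group("level"), m.group("msg"))
            )
    return dict(grouped)


if __name__ == "__main__":
    for pid, entries in group_by_pid(SAMPLE).items():
        print(f"pid {pid}:")
        for ts, level, msg in entries:
            print(f"  {ts} {level:5} {msg}")
```

Grouped this way, the write failure to backend 0 and the failover request both come from pid 4603, while the later failure on backend 1 comes from pid 4639.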

I'm using pgpool2 as installed from the Debian packages (we prefer not to compile and install vanilla packages), and since this is pgpool 3.2, that means the experimental packages.

Can anybody shed any light on why it neither detects the state of the servers properly nor allows me to connect, and why the pgpool process dies?


Summary:
servers:
pstgr01: original slave, currently master.
pstgr02: original master, currently stopped.

pgpool servers:
pool01: currently set to work standalone without watchdog. 
pool02: currently stopped

psql -h pstgr01 works properly
psql -h pool01 doesn't work. Exit status 2

Error log on pool01: see above.
Processes on pool01: 
UID        PID  PPID  C STIME TTY          TIME CMD
postgres  4604     1  0 15:14 pts/0    00:00:00 logger -t pgpool -p local0.info
postgres  4612     1  0 15:14 pts/0    00:00:00 pgpool: wait for connection request
postgres  4613     1  0 15:14 pts/0    00:00:00 pgpool: wait for connection request
postgres  4614     1  0 15:14 pts/0    00:00:00 pgpool: wait for connection request
.
.
.
postgres  4638     1  0 15:14 pts/0    00:00:00 pgpool: wait for connection request
postgres  4644     1  0 15:14 pts/0    00:00:00 pgpool: PCP: wait for connection request

Note that all the pgpool processes have lost their parent (formerly process 4603) and now belong to init (PID 1).
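
The same check can be scripted. This is a rough sketch, assuming the column layout of the process listing above (UID, PID, PPID, C, STIME, TTY, TIME, CMD); the function name orphaned_pgpool is made up for illustration.

```python
# Process listing from this message; the "." rows stand for elided entries.
SAMPLE = """\
UID        PID  PPID  C STIME TTY          TIME CMD
postgres  4604     1  0 15:14 pts/0    00:00:00 logger -t pgpool -p local0.info
postgres  4612     1  0 15:14 pts/0    00:00:00 pgpool: wait for connection request
postgres  4613     1  0 15:14 pts/0    00:00:00 pgpool: wait for connection request
postgres  4614     1  0 15:14 pts/0    00:00:00 pgpool: wait for connection request
.
.
.
postgres  4638     1  0 15:14 pts/0    00:00:00 pgpool: wait for connection request
postgres  4644     1  0 15:14 pts/0    00:00:00 pgpool: PCP: wait for connection request
"""


def orphaned_pgpool(ps_output):
    """Return PIDs of pgpool processes whose parent is init (PPID 1)."""
    orphans = []
    for line in ps_output.splitlines():
        # Split into at most 8 fields so the command keeps its spaces.
        fields = line.split(None, 7)
        if len(fields) < 8 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue  # header row, elided rows, or anything malformed
        pid, ppid, cmd = int(fields[1]), int(fields[2]), fields[7]
        # Match on the process title so the logger helper is not counted.
        if ppid == 1 and cmd.startswith("pgpool:"):
            orphans.append(pid)
    return orphans


if __name__ == "__main__":
    print(orphaned_pgpool(SAMPLE))
```

On a live machine the listing could be fed in from the ps command instead of the pasted sample, provided it prints the same columns.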

Thank you,
Herouth

