[pgpool-general: 2047] Re: Suggestion about two datacenters connected through WAN

Mistina Michal Michal.Mistina at virte.sk
Sun Aug 18 20:12:21 JST 2013


Hi Tatsuo.

> What kind of problem do you have with PostgreSQL streaming replication?

I don't know if this is the right forum, but maybe you can help me. The issue does not directly concern pgpool, but I would like to use pgpool once I solve it. I have already written to pgsql-general, but nobody has answered yet.

In summary, the main issue arose after I set up streaming replication: I am unable to stop the postgresql service cleanly on the master. After issuing /etc/init.d/postgresql-9.2 stop, postmaster.pid remains on the filesystem and, moreover, it is corrupted. I am unable to delete it with the rm command.

It looks like this:
[root@tstcaps01 ~]# ll /var/lib/pgsql/9.2/data/
ls: cannot access /var/lib/pgsql/9.2/data/postmaster.pid: No such file or directory
total 56
drwx------ 7 postgres postgres    62 Jun 26 17:13 base
drwx------ 2 postgres postgres  4096 Aug 18 00:25 global
drwx------ 2 postgres postgres    17 Jun 26 09:54 pg_clog
-rw------- 1 postgres postgres  5127 Aug 17 16:24 pg_hba.conf
-rw------- 1 postgres postgres  1636 Jun 26 09:54 pg_ident.conf
drwx------ 2 postgres postgres  4096 Jul  2 00:00 pg_log
drwx------ 4 postgres postgres    34 Jun 26 09:53 pg_multixact
drwx------ 2 postgres postgres    17 Aug 18 00:23 pg_notify
drwx------ 2 postgres postgres     6 Jun 26 09:53 pg_serial
drwx------ 2 postgres postgres     6 Jun 26 09:53 pg_snapshots
drwx------ 2 postgres postgres     6 Aug 18 00:25 pg_stat_tmp
drwx------ 2 postgres postgres    17 Jun 26 09:54 pg_subtrans
drwx------ 2 postgres postgres     6 Jun 26 09:53 pg_tblspc
drwx------ 2 postgres postgres     6 Jun 26 09:53 pg_twophase
-rw------- 1 postgres postgres     4 Jun 26 09:53 PG_VERSION
drwx------ 3 postgres postgres  4096 Aug 18 00:25 pg_xlog
-rw------- 1 postgres postgres 19884 Aug 17 22:54 postgresql.conf
-rw------- 1 postgres postgres    71 Aug 18 00:23 postmaster.opts
?????????? ? ?        ?            ?            ? postmaster.pid
-rw-r--r-- 1 postgres postgres   491 Aug 17 16:33 recovery.done

Have you been in this kind of curious situation before? Did you solve it somehow?

I will try to explain the whole situation and how I got into it.

The redundant environment is shown in the following "graphic" representation (http://www.asciiflow.com/#4899844131549967831):

      +------------------------------------------+
      |                   WAN                    |
+-----+------+------------+                +-----v------+------------+
| pgpool     |            |                | pgpool     |            |
+------------+------------+                +------------+------------+
| pgsql      | pgsql      |                | pgsql      | pgsql      |
+------------+------------+                +------------+------------+
| drbd-pri   | drbd-sec   |                | drbd-pri   | drbd-sec   |
+------------+------------+                +------------+------------+
|        pacemaker        |                |        pacemaker        |
+-------------------------+                +-------------------------+
|        corosync         |                |        corosync         |
+------------+------------+                +------------+------------+
| node1      | node2      |                | node1      | node2      |
+------------+------------+                +------------+------------+
            TC1                                        TC2

At any given moment only one postgresql instance is active in each technical center. Pgpool is currently not managed by pacemaker, because I wanted to test it on its own first. Once it works, I will put it under pacemaker's control using the pgpool-ha resource agent.

Before streaming replication was established from TC1 to TC2, migrating the resources managed by pacemaker from node1 to node2 within TC1 worked.
After I established streaming replication and tried to move the resources (including pgsql) from node1 to node2, the migration of the postgres resource failed, and I ended up with the aforementioned corrupted postmaster.pid file on the filesystem of node1. Pacemaker did actually kill the postgres process, but I think it somehow checks whether postmaster.pid still exists. If pacemaker finds that postmaster.pid is still there, the resource ends up in FAILED status.
Now I am stuck with this postmaster.pid file and cannot continue debugging. I cannot start the postgres server because, even if I start it, there are two identical postmaster.pid files. These are not clean conditions for testing and investigating.
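
A minimal sketch of the kind of PID-file check suspected above, for illustration only. It is not the actual code of the pacemaker pgsql resource agent; the function name pidfile_status and its four result strings are hypothetical. It relies only on the fact that the first line of postmaster.pid holds the postmaster's PID.

```shell
# Hypothetical helper (not part of pacemaker or PostgreSQL): classify the
# state of a postmaster.pid file as missing, garbled, stale or running.
pidfile_status() {
    pidfile=$1
    pid=''

    # -e relies on stat(), so a directory entry that ls lists but cannot
    # stat (shown as "??????????") is reported as missing here as well.
    if [ ! -e "$pidfile" ]; then
        echo "missing"
        return 0
    fi

    # Read only the first line, which should contain the postmaster's PID.
    # A failed read leaves pid empty, which is treated as garbled below.
    { read -r pid < "$pidfile"; } 2>/dev/null || :

    # Anything other than a plain decimal number counts as garbled.
    case $pid in
        ''|*[!0-9]*)
            echo "garbled"
            return 0
            ;;
    esac

    # Signal 0 delivers nothing; it only tests whether the process can be
    # signalled. It also fails without permission to signal the process,
    # so run this as root or as the postgres user.
    if kill -0 "$pid" 2>/dev/null; then
        echo "running"
    else
        echo "stale"
    fi
}
```

Called as pidfile_status /var/lib/pgsql/9.2/data/postmaster.pid, it prints one of the four states. A "stale" result would match a situation where the postgres process has been killed but the PID file was left behind.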

I would be grateful for any help in getting to the bottom of this issue. The day would be nicer then :-)

Best regards,
Michal Mistina



