[pgpool-general: 2048] Re: Suggestion about two datacenters connected through WAN

Tatsuo Ishii ishii at postgresql.org
Mon Aug 19 07:27:45 JST 2013


I'm not familiar with Pacemaker at all, so this is just a guess. Do
you sync the PostgreSQL database cluster using DRBD? If so, I think you
should not do that. PostgreSQL modifies the database through the
mounted file system, and in the meantime DRBD modifies the file system
as well, which would lead to corruption.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Hi Tatsuo.
> 
>> What kind of problem do you have with PostgreSQL streaming replication?
> 
> I don't know if this is the right forum, but maybe you can help me. The issue does not directly concern pgpool, but I'd like to use pgpool once I solve it. I already wrote to pgsql-general, but nobody has answered yet.
> 
> In summary, the main issue arose after I set up streaming replication: I am unable to stop the postgresql service correctly on the master. After issuing /etc/init.d/postgresql-9.2 stop, postmaster.pid remains on the filesystem and, moreover, it is corrupted. I am unable to delete it with the rm command.
> 
> It looks like this:
> [root@tstcaps01 ~]# ll /var/lib/pgsql/9.2/data/
> ls: cannot access /var/lib/pgsql/9.2/data/postmaster.pid: No such file or directory
> total 56
> drwx------ 7 postgres postgres    62 Jun 26 17:13 base
> drwx------ 2 postgres postgres  4096 Aug 18 00:25 global
> drwx------ 2 postgres postgres    17 Jun 26 09:54 pg_clog
> -rw------- 1 postgres postgres  5127 Aug 17 16:24 pg_hba.conf
> -rw------- 1 postgres postgres  1636 Jun 26 09:54 pg_ident.conf
> drwx------ 2 postgres postgres  4096 Jul  2 00:00 pg_log
> drwx------ 4 postgres postgres    34 Jun 26 09:53 pg_multixact
> drwx------ 2 postgres postgres    17 Aug 18 00:23 pg_notify
> drwx------ 2 postgres postgres     6 Jun 26 09:53 pg_serial
> drwx------ 2 postgres postgres     6 Jun 26 09:53 pg_snapshots
> drwx------ 2 postgres postgres     6 Aug 18 00:25 pg_stat_tmp
> drwx------ 2 postgres postgres    17 Jun 26 09:54 pg_subtrans
> drwx------ 2 postgres postgres     6 Jun 26 09:53 pg_tblspc
> drwx------ 2 postgres postgres     6 Jun 26 09:53 pg_twophase
> -rw------- 1 postgres postgres     4 Jun 26 09:53 PG_VERSION
> drwx------ 3 postgres postgres  4096 Aug 18 00:25 pg_xlog
> -rw------- 1 postgres postgres 19884 Aug 17 22:54 postgresql.conf
> -rw------- 1 postgres postgres    71 Aug 18 00:23 postmaster.opts
> ?????????? ? ?        ?            ?            ? postmaster.pid
> -rw-r--r-- 1 postgres postgres   491 Aug 17 16:33 recovery.done
> 
> Have you been in this kind of curious situation before? Did you solve it somehow?
> 
> I will try to explain the whole situation and how I got into it.
> 
> The redundant environment is shown in the following "graphic" representation (http://www.asciiflow.com/#4899844131549967831):
> 
>       +-------------------------------------+
>       |                 WAN                 |
> +-----+------+------------+           +-----v------+------------+
> |pgpool      |            |           |pgpool      |            |
> +------------+------------+           +------------+------------+
> |pgsql       |pgsql       |           |pgsql       |pgsql       |
> +------------+------------+           +------------+------------+
> |drbd-pri    |drbd-sec    |           |drbd-pri    |drbd-sec    |
> +------------+------------+           +------------+------------+
> |        pacemaker        |           |        pacemaker        |
> +-------------------------+           +-------------------------+
> |        corosync         |           |        corosync         |
> +------------+------------+           +------------+------------+
> |node1       |node2       |           |node1       |node2       |
> +------------+------------+           +------------+------------+
>             TC1                                   TC2
> 
> At any given moment only one postgresql instance is active in each technical center. Pgpool is currently not managed by pacemaker, because I wanted to test it first. Once it works I will have pacemaker manage it using the pgpool-ha resource agent.
> 
> Before streaming replication was established from TC1 to TC2, migrating the resources managed by pacemaker from node1 to node2 within TC1 was successful.
> After I established streaming replication and tried to move the resources (including pgsql) from node1 to node2, migration of the postgres resource failed, and I ended up with the aforementioned corrupted postmaster.pid file on the filesystem of node1. Pacemaker did actually kill the postgres process, but I think it somehow checks whether postmaster.pid still exists. If pacemaker finds that postmaster.pid is still there, it ends up with FAILED status.
> Now I am stuck with this postmaster.pid file and cannot continue with debugging. I cannot start the postgres server because, even if I start it, there are then two identical postmaster.pid files. These are not clean conditions for testing and investigating.
> 
> I would be grateful if I could get to the bottom of this issue. The day would be nicer then :-)
> 
> Best regards,
> Michal Mistina
> 
> 
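
The postmaster.pid state described in the quoted message (listed by ls, but
shown as "??????????" and not removable) can be told apart from an ordinary
stale PID file with a small check. What follows is only a minimal sketch, not
what pacemaker or the PostgreSQL init script actually does: the function name
pidfile_status and its return labels are made up for illustration, it assumes
a POSIX system, and it relies only on the fact that the first line of
postmaster.pid holds the postmaster's PID.

```python
import os


def pidfile_status(data_dir, name="postmaster.pid"):
    """Classify the PID file in data_dir (hypothetical helper, POSIX only).

    Returns "absent", "broken-entry", "unreadable", "stale" or "running".
    """
    path = os.path.join(data_dir, name)
    listed = name in os.listdir(data_dir)
    try:
        os.lstat(path)
    except OSError:
        # The name is returned by the directory listing but cannot be
        # stat()ed: this is the case that ls -l prints as "??????????".
        return "broken-entry" if listed else "absent"

    try:
        with open(path) as fh:
            pid = int(fh.readline().strip())  # first line is the PID
    except (OSError, ValueError):
        return "unreadable"
    if pid <= 0:
        return "unreadable"

    try:
        os.kill(pid, 0)  # signal 0 sends nothing, it only probes the PID
    except ProcessLookupError:
        return "stale"  # file exists, but no such process
    except PermissionError:
        return "running"  # process exists but belongs to another user
    return "running"
```

For the data directory shown in the quoted listing, the call would be
pidfile_status("/var/lib/pgsql/9.2/data"). A "broken-entry" result points at
the file system or the layer beneath it rather than at PostgreSQL itself,
whereas "stale" is the ordinary leftover-PID-file case.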

