[pgpool-general: 3617] Re: PGPool sending r/w queries to wrong DB node
Yugo Nagata
nagata at sraoss.co.jp
Mon Apr 13 09:40:11 JST 2015
On Thu, 09 Apr 2015 14:23:07 -0400
Pablo Sanchez <pablo at blueoakdb.com> wrote:
> [ Comments below, in-line ]
>
> On 04/09/2015 01:13 AM, Yugo Nagata wrote:
> > Hi,
>
> Hi Yugo,
>
> With your insight I believe I may have pieced together what happened.
> I think we may have a bug (or two?) with pgpool but I'm not sure.
> Please see my questions below ("q:")
>
> As always, thank you for your time!
>
> ::: Details :::
>
> pgpool information
> ------------------
> Our failover isn't implemented; consequently, we have the following
> pgpool.conf parameters set:
>
> o failover_command = ''
I think this is why the standby PostgreSQL was not promoted to
primary. pgpool-II itself does not perform the promotion; a
failover script is needed for that.
> o failback_command = ''
> o fail_over_on_backend_error = on
> o search_primary_node_timeout = 10
>
> When I start pgpool, I use --discard-status.
>
> Reconstruction
> --------------
> Here's what I believe happened, in order of execution with questions
> on whether we have a bug or not:
>
> o Because "fail_over_on_backend_error = on", when we encountered the
> error [1], pgpool degenerated
>
> q: Should we have degenerated because of the failed DELETE
> statement? Is this a bug?
>
> I remember terminating the DELETE statement on PG.
The failure of the DELETE statement itself doesn't cause the degeneration.
I think the degeneration was caused by terminating the statement. How and
why did you terminate it?
>
> o pgpool attempts to seek a primary node[2]
>
> o I believe after "search_primary_node_timeout", pgpool mistakenly
> selects the Slave DB as the primary[3]
>
> q: Why did pgpool select the Slave DB? Is this a bug?
As I said above, this is because no failover script is specified.
A script to promote the standby to primary is needed. Please refer
to the documentation:
http://www.pgpool.net/docs/latest/pgpool-en.html#failover_in_stream_mode
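As a rough illustration of what such a script might look like, here is a
minimal sketch in Python. It is not taken from your setup or from the
pgpool-II distribution. It assumes pgpool.conf passes the placeholders
%d (failed node id), %P (old primary node id) and %H (new master host),
for example:
failover_command = '/path/to/failover.py %d %P %H /tmp/trigger_file'
The script path, the trigger file path, the "postgres" ssh user and the
function names are all hypothetical. It also assumes the standby's
recovery.conf sets trigger_file to the same path and that passwordless
ssh works from the pgpool host.

```python
#!/usr/bin/env python3
"""Hypothetical failover script sketch, meant to be run via failover_command."""
import subprocess
import sys


def build_promote_command(failed_node_id, old_primary_id, new_master_host, trigger_file):
    """Return the ssh command that promotes the standby, or None.

    Promotion is only needed when the node that failed was the primary.
    """
    if failed_node_id != old_primary_id:
        return None
    # Assumption: creating this file makes the standby leave recovery,
    # because its recovery.conf names the same path in trigger_file.
    return ["ssh", "-T", "postgres@" + new_master_host, "touch", trigger_file]


def main(argv):
    """argv is expected to be: script name, %d, %P, %H, trigger file path."""
    if len(argv) != 5:
        print("usage: failover.py %d %P %H trigger_file", file=sys.stderr)
        return 2
    failed_node_id = int(argv[1])
    old_primary_id = int(argv[2])
    command = build_promote_command(failed_node_id, old_primary_id, argv[3], argv[4])
    if command is None:
        # A standby failed; nothing to promote.
        return 0
    # Exit status of ssh becomes the exit status of the script.
    return subprocess.call(command)


# When installed as the actual script, end the file with:
#     sys.exit(main(sys.argv))
```

With this sketch, a failure of node 0 while node 0 is the primary would
run "touch /tmp/trigger_file" on the new master host, and a failure of a
standby would do nothing.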
>
> The DB Cluster has not been restarted since the middle of last
> month.
>
> As a point of reference, to fix the situation, all I did was
> restart pgpool.
>
> References
> ----------
> [1] - degenerate
>
> (pid 14902): mydb: LOG: pool_send_and_wait: Error or notice message
> from backend: : DB node id: 0 backend pid: 31461 statement: "delete from
> t_case_file_tag_association where case_file_id=$1" message: "terminating
> connection due to administrator command"
> (pid 15214): mydb: LOG: pool_send_and_wait: Error or notice message
> from backend: : DB node id: 0 backend pid: 578 statement: "delete from
> t_case_file_tag_association where case_file_id=$1" message: "terminating
> connection due to administrator command"
> (pid 14902): mydb: ERROR: unable to forward message to frontend
> (pid 14902): mydb: DETAIL: FATAL error occured on backend
> (pid 15214): mydb: ERROR: unable to forward message to frontend
> (pid 15214): mydb: DETAIL: FATAL error occured on backend
> (pid 14902): mydb: LOG: received degenerate backend request for
> node_id: 0 from pid [14902]
>
> [2] - seeking a primary node
>
> LOG: find_primary_node_repeatedly: waiting for finding a primary node
> LOG: find_primary_node: checking backend no 0
> LOG: find_primary_node: checking backend no 1
>
> [3] - pgpool selects the Slave DB as the Primary
>
> LOG: failover: set new primary node: -1
> LOG: failover: set new master node: 1
>
> --
> Pablo Sanchez - Blueoak Database Engineering, Inc
> Ph: 819.459.1926 Blog: http://pablo-blog.blueoakdb.com
> iNum: 883.5100.0990.1054
--
Yugo Nagata <nagata at sraoss.co.jp>