[pgpool-general: 3618] Re: PGPool sending r/w queries to wrong DB node

Yugo Nagata nagata at sraoss.co.jp
Mon Apr 13 10:06:04 JST 2015


Hi pablo,

On Mon, 13 Apr 2015 09:40:11 +0900
Yugo Nagata <nagata at sraoss.co.jp> wrote:

> On Thu, 09 Apr 2015 14:23:07 -0400
> Pablo Sanchez <pablo at blueoakdb.com> wrote:
> 
> > [ Comments below, in-line ]
> > 
> > On 04/09/2015 01:13 AM, Yugo Nagata wrote:
> > > Hi,
> > 
> > Hi Yugo,
> > 
> > With your insight I believe I may have pieced together what happened.
> > I think we may have a bug (or two?) with pgpool but I'm not sure.
> > Please see my questions below ("q:").
> > 
> > As always, thank you for your time!
> > 
> > ::: Details :::
> > 
> > pgpool information
> > ------------------
> > Our failover isn't implemented so consequently, we have the following
> > pgpool.conf parameters set:
> > 
> >     o failover_command = ''
> 
> I think this is the reason why the standby PostgreSQL isn't
> promoted to primary. pgpool-II itself does not perform this
> promotion; a failover script is needed for it.
> 
> >     o failback_command = ''
> >     o fail_over_on_backend_error = on
> >     o search_primary_node_timeout = 10
> > 
> > When I start pgpool, I use --discard-status.
> > 
> > Reconstruction
> > --------------
> > Here's what I believe happened, in order of execution with questions
> > on whether we have a bug or not:
> > 
> > o Because "fail_over_on_backend_error = on", when we encountered the
> >    error [1], pgpool degenerated
> > 
> >    q:  Should we have degenerated because of the failed DELETE
> >        statement?  Is this a bug?
> > 
> >        I remember terminating the DELETE statement on PG.
> 
> The DELETE statement failure itself doesn't cause the degeneration.
> I think that it was caused by terminating the statement. How and why
> did you terminate the statement?
> 
> > 
> > o pgpool attempts to seek a primary node[2]
> > 
> > o I believe after "search_primary_node_timeout", pgpool mistakenly
> >    selects the Slave DB as the primary[3]
> > 
> >    q:  Why did pgpool select the Slave DB?  Is this a bug?
> 
> As said above, this is because a failover script isn't specified.
> A script to promote the standby to primary is needed. Please refer
> to the document:
> http://www.pgpool.net/docs/latest/pgpool-en.html#failover_in_stream_mode

It is better to use the failover.sh from the watchdog tutorial [1] and
configure pgpool.conf as:
 failover_command = '/path/to/script/failover.sh %d %P %H %R'

[1] http://www.pgpool.net/pgpool-web/contrib_docs/watchdog_master_slave_3.3/en.html#config_master_slave

The example in the document I referred to previously assumes that
the node ID of the standby is 1, so it doesn't work well in other
cases. The failover.sh in the watchdog tutorial works whichever
node the standby is.
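
For illustration, here is a minimal sketch of a script that takes the
four arguments above. It is not the tutorial's script verbatim. The
function names, the passwordless ssh login as the postgres user, and
the trigger file name "trigger" are assumptions for this example; the
trigger file must match trigger_file in the standby's recovery.conf.

```shell
#!/bin/sh
# failover.sh (sketch). Called by pgpool-II through:
#   failover_command = '/path/to/script/failover.sh %d %P %H %R'
#   %d = node ID of the detached node
#   %P = node ID of the old primary
#   %H = hostname of the new master node
#   %R = database cluster path of the new master node

# Print "promote" if the detached node was the primary, else "skip".
# Comparing %d with %P avoids hard-coding any node ID.
failover_action() {
    failed_node_id=$1
    old_primary_id=$2
    if [ "$failed_node_id" = "$old_primary_id" ]; then
        echo promote
    else
        echo skip
    fi
}

# Hypothetical promotion step: create the trigger file on the new
# master. Assumes passwordless ssh as postgres and a trigger file
# named "trigger" inside the database cluster directory.
promote_standby() {
    new_master_host=$1
    new_master_pgdata=$2
    ssh -T "postgres@$new_master_host" touch "$new_master_pgdata/trigger"
}

# Act only when invoked with the four arguments pgpool-II passes.
if [ "$#" -eq 4 ]; then
    if [ "$(failover_action "$1" "$2")" = promote ]; then
        promote_standby "$3" "$4"
    fi
fi
```

With this shape, a standby going down (%d differs from %P) does
nothing, and only the loss of the primary triggers a promotion.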

> 
> > 
> >        The DB Cluster has not been restarted since the middle of last
> >        month.
> > 
> >        As a point of reference, to fix the situation, all I did was
> >        restart pgpool.
> > 
> > References
> > ----------
> > [1] - degenerate
> > 
> > (pid 14902): mydb: LOG:  pool_send_and_wait: Error or notice message 
> > from backend: : DB node id: 0 backend pid: 31461 statement: "delete from 
> > t_case_file_tag_association where case_file_id=$1" message: "terminating 
> > connection due to administrator command"
> > (pid 15214): mydb: LOG:  pool_send_and_wait: Error or notice message 
> > from backend: : DB node id: 0 backend pid: 578 statement: "delete from 
> > t_case_file_tag_association where case_file_id=$1" message: "terminating 
> > connection due to administrator command"
> > (pid 14902): mydb: ERROR:  unable to forward message to frontend
> > (pid 14902): mydb: DETAIL:  FATAL error occured on backend
> > (pid 15214): mydb: ERROR:  unable to forward message to frontend
> > (pid 15214): mydb: DETAIL:  FATAL error occured on backend
> > (pid 14902): mydb: LOG:  received degenerate backend request for 
> > node_id: 0 from pid [14902]
> > 
> > [2] - seeking a primary node
> > 
> > LOG:  find_primary_node_repeatedly: waiting for finding a primary node
> > LOG:  find_primary_node: checking backend no 0
> > LOG:  find_primary_node: checking backend no 1
> > 
> > [3] - pgpool selects the Slave DB as the Primary
> > 
> > LOG:  failover: set new primary node: -1
> > LOG:  failover: set new master node: 1
> > 
> > --
> > Pablo Sanchez - Blueoak Database Engineering, Inc
> > Ph:    819.459.1926         Blog:  http://pablo-blog.blueoakdb.com
> > iNum:  883.5100.0990.1054
> 
> 
> -- 
> Yugo Nagata <nagata at sraoss.co.jp>


-- 
Yugo Nagata <nagata at sraoss.co.jp>

