[pgpool-general: 3626] Re: PGPool sending r/w queries to wrong DB node

Yugo Nagata nagata at sraoss.co.jp
Wed Apr 15 20:01:46 JST 2015


Hi,

On Mon, 13 Apr 2015 17:10:45 -0400
Pablo Sanchez <pablo at blueoakdb.com> wrote:

> [ Comments below, in-line ]
> 
> On 04/12/2015 08:40 PM, Yugo Nagata wrote:
> > On Thu, 09 Apr 2015 14:23:07 -0400
> > Pablo Sanchez <pablo at blueoakdb.com> wrote:
> >
> >> On 04/09/2015 01:13 AM, Yugo Nagata wrote:
> >>> Hi,
> >>
> >> Hi Yugo,
> >>
> >> With your insight I believe I may have pieced together what happened.
> >> I think we may have a bug (or two?) with pgpool but I'm not sure.
> >> Please see my questions below ("q:")
> >>
> >> As always, thank you for your time!
> >>
> >> ::: Details :::
> >>
> >> pgpool information
> >> ------------------
> >> Our failover isn't implemented so consequently, we have the following
> >> pgpool.conf parameters set:
> >>
> >>      o failover_command = ''
> >
> > I think this is the reason why the standby PostgreSQL doesn't
> > promote to primary. pgpool-II itself does nothing about
> > this promotion and a failover script is needed.
> 
> Hi Yugo,
> 
> The Primary /never/ crashed.  Neither did the Slave.
> 
> In other words, there was no need to do a promotion.

OK. The question now is why the failover occurred.
Either way, failover_command should be specified
in preparation for the primary going down.
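
As a rough sketch of what such a script might decide (an illustration,
not an official example): pgpool-II substitutes placeholders such as
%d (failed node id), %P (old primary node id), %H (new master host name)
and %R (new master database cluster path) into failover_command. The
script path, the function name, the "postgres" ssh user and the use of
ssh with "pg_ctl promote" are assumptions made for this example.

```python
def build_promote_command(failed_node_id, old_primary_id,
                          new_master_host, new_master_pgdata):
    """Return the command that would promote the standby, or None.

    Assumed pgpool.conf entry (the path is hypothetical):
        failover_command = '/path/to/failover.py %d %P %H %R'
    """
    # Promote only when the node that went down was the primary;
    # a failed standby needs no promotion.
    if failed_node_id != old_primary_id:
        return None
    # Hypothetical promotion step: run "pg_ctl promote" on the new
    # master over ssh as the "postgres" user (an assumption).
    return ["ssh", "postgres@" + new_master_host,
            "pg_ctl", "promote", "-D", new_master_pgdata]
```

The function only builds the command; actually running it, logging and
error handling are left out on purpose.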


> 
> >> Reconstruction
> >> --------------
> >> Here's what I believe happened, in order of execution with questions
> >> on whether we have a bug or not:
> >>
> >> o Because "fail_over_on_backend_error = on", when we encountered the
> >>     error [1], pgpool degenerated
> >>
> >>     q:  Should we have degenerated because of the failed DELETE
> >>         statement?  Is this a bug?
> >>
> >>         I remember terminating the DELETE statement on PG.
> >
> > The DELETE statement failure itself doesn't cause the degeneration.
> > I think that this is caused by terminating the statement. How and
> > why did you terminate the statement?
> 
> I terminated the DELETE because it had been running for +30 minutes.
> The users of the system informed me this DELETE should complete in
> under a second.

How did you terminate the DELETE? If you terminated it with
pg_terminate_backend(), that is what caused the degeneration.
I confirmed this by issuing "SELECT pg_sleep(100)" through
pgpool-II.

> 
> At the time of termination, there was very little other activity on
> the DB.  However I cannot say definitively that terminating the DELETE
> caused the degeneration but the log below [1] suggests it.
> 
> Notice the pid of the DELETE is "14902"
> 
>     (pid 14902): mydb: LOG:  pool_send_and_wait: Error or notice
>     message from backend: : DB node id: 0 backend pid: 31461
>     statement: "delete from t_case_file_tag_association where
>     case_file_id=$1" message: "terminating
> 
> Now, notice this entry where it's saying the degenerate backend
> request for ... pid "14902":
> 
>     (pid 14902): mydb: LOG:  received degenerate backend request for
>     node_id: 0 from pid [14902]
> 
> Isn't the log saying the degeneration was due to pid "14902"?  Which
> is the "DELETE"?

"14902" is a pgpool process that was connected to PostgreSQL. That process
requested the degeneration, but only because it received the error
"terminating connection due to administrator command" from the backend.

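To make the two pids in that log line easier to tell apart, here is a
small sketch that pulls them out of a pool_send_and_wait line. The
regular expression and the function name are assumptions based only on
the log format quoted in [1], not on any pgpool-II tool.

```python
import re

# Matches lines such as:
#   (pid 14902): mydb: LOG:  pool_send_and_wait: ... DB node id: 0 backend pid: 31461 ...
LOG_RE = re.compile(
    r"\(pid (?P<pgpool_pid>\d+)\):.*?DB node id: (?P<node_id>\d+) "
    r"backend pid: (?P<backend_pid>\d+)"
)

def parse_backend_error(line):
    """Return the pgpool child pid, node id and PostgreSQL backend pid, or None."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    return {name: int(value) for name, value in m.groupdict().items()}

line = ('(pid 14902): mydb: LOG:  pool_send_and_wait: Error or notice message '
        'from backend: : DB node id: 0 backend pid: 31461 statement: "delete from '
        't_case_file_tag_association where case_file_id=$1" message: "terminating '
        'connection due to administrator command"')
info = parse_backend_error(line)
# info["pgpool_pid"] is the pgpool child process (14902);
# info["backend_pid"] is the PostgreSQL backend that ran the DELETE (31461).
```
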
> 
> >>
> >> References
> >> ----------
> >> [1] - degenerate
> >>
> >> (pid 14902): mydb: LOG:  pool_send_and_wait: Error or notice message
> >> from backend: : DB node id: 0 backend pid: 31461 statement: "delete from
> >> t_case_file_tag_association where case_file_id=$1" message: "terminating
> >> connection due to administrator command"
> >> (pid 15214): mydb: LOG:  pool_send_and_wait: Error or notice message
> >> from backend: : DB node id: 0 backend pid: 578 statement: "delete from
> >> t_case_file_tag_association where case_file_id=$1" message: "terminating
> >> connection due to administrator command"
> >> (pid 14902): mydb: ERROR:  unable to forward message to frontend
> >> (pid 14902): mydb: DETAIL:  FATAL error occured on backend
> >> (pid 15214): mydb: ERROR:  unable to forward message to frontend
> >> (pid 15214): mydb: DETAIL:  FATAL error occured on backend
> >> (pid 14902): mydb: LOG:  received degenerate backend request for
> >> node_id: 0 from pid [14902]
> >>
> >> [2] - seeking a primary node
> >>
> >> LOG:  find_primary_node_repeatedly: waiting for finding a primary node
> >> LOG:  find_primary_node: checking backend no 0
> >> LOG:  find_primary_node: checking backend no 1
> >>
> >> [3] - pgpool selects the Slave DB as the Primary
> >>
> >> LOG:  failover: set new primary node: -1
> >> LOG:  failover: set new master node: 1
> >>
> >> --
> >> Pablo Sanchez - Blueoak Database Engineering, Inc
> >> Ph:    819.459.1926         Blog:  http://pablo-blog.blueoakdb.com
> >> iNum:  883.5100.0990.1054
> >
> >
> 
> 
> 
> --
> Pablo Sanchez - Blueoak Database Engineering, Inc
> Ph:    819.459.1926         Blog:  http://pablo-blog.blueoakdb.com
> iNum:  883.5100.0990.1054
> 


-- 
Yugo Nagata <nagata at sraoss.co.jp>

