[pgpool-hackers: 3322] Re: Some performance improvements for Pgpool-II

Tatsuo Ishii ishii at sraoss.co.jp
Thu May 2 19:38:31 JST 2019


Hi Usama,

> Hi Ishii-San
> 
> Thanks for the feedback. I have been exploring options to generate two
> parsers from one gram.y file, but unfortunately I can't figure out any
> clean and easy way to do that. The problem is that bison does not support
> preprocessors out of the box, and I am not sure how we could achieve that
> with scripts alone.
> 
> However, after analysing the end result of using one gram.y file to
> generate two parsers, I think it would be harder to maintain, and
> especially to merge, the gram.y file in that case. Adding the code to the
> existing gram.y file would increase the number of differences between our
> gram.y and PostgreSQL's, which effectively means a bigger merging effort
> at every PostgreSQL release.
> 
> And if we go down the path of having two gram files, as in the original
> patch I shared, then despite the downside of having another parser file
> to maintain, the approach has its own benefits as well: for example, it
> would be easier to track down errors and problems.
> 
> Also, the idea of having a minimal parser for master-slave mode is to
> speed up queries by quickly deciding whether we need to parse the
> complete query or can get by with minimal information. My motivation for
> adding the minimal parser was that, in a perfect scenario, we should
> fully parse only the read statements and short-circuit the parsing of
> almost all write queries. Although the patch does that short-circuiting
> only for INSERT statements, it would be easier to make future
> enhancements to the minimal parser if we go with the two-gram-files
> approach.
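As a rough illustration of the short-circuit idea (a purely hypothetical sketch with invented rule and token names, not the actual grammar from the patch), a minimal parser could reduce an INSERT as soon as the target relation is known and swallow the rest of the statement as an opaque token stream:

```yacc
/* Hypothetical sketch -- not the actual gram_minimal.y from the patch. */
InsertStmt:
        INSERT INTO qualified_name opt_rest_of_statement
            {
                InsertStmt *n = makeNode(InsertStmt);
                n->relation = $3;   /* relation name: all pgpool needs   */
                $$ = (Node *) n;    /* to route the query to the primary */
            }
        ;

/* Consume everything up to the statement terminator without building
 * value-list parse nodes -- this is where the time is saved on large
 * INSERT statements. */
opt_rest_of_statement:
        /* EMPTY */
        | opt_rest_of_statement ANY_TOKEN
        ;
```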
> 
> Another thing: the minimal parser is based on the philosophy that only
> queries that can be load-balanced must be parsed completely; the rest of
> the statements need to be routed to the primary anyway and can be treated
> as write statements by default, no matter what the actual statement is.
> So effectively we might not need to merge the minimal parser with
> PostgreSQL's parser in most cases, and the minimal parser would not add
> too much maintenance overhead.

But the SET statement needs special treatment and requires the full
parser. See is_set_transaction_serializable (in
src/context/pool_query_context.c) for an example. The same can be said
for LOCK, SAVEPOINT, BEGIN (START TRANSACTION), and PREPARE.

Also, for DML, at least the target table info needs to be analyzed,
because the in-memory query cache needs that info.

> What are your thought and suggestions on this?

If we go for the two-gram.y approach, we probably want to minimize the
diff needed to create the minimal parser from the full parser, in order
to save the work of creating and maintaining the minimal parser, no?
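One conceivable way to keep that diff small (a hypothetical convention, not anything in the patch: since bison has no preprocessor, a small sed/awk step in the Makefile would have to select one side of each marked block when emitting gram_minimal.y from gram.y) would be guard comments in a single source file:

```yacc
/* Hypothetical marker convention; a build script strips one branch
 * of each block to generate gram_minimal.y from gram.y. */
InsertStmt:
/* IF FULL_PARSER */
        opt_with_clause INSERT INTO insert_target insert_rest
            { /* full PostgreSQL actions, value lists included */ }
/* ELSE MINIMAL_PARSER */
        INSERT INTO qualified_name opt_rest_of_statement
            { /* record only the statement type and relation name */ }
/* ENDIF */
        ;
```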

> Thanks
> Best Regards
> Muhammad Usama
> 
> 
> On Fri, Mar 8, 2019 at 7:40 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> 
>> Hi Usama,
>>
>> Thank you for the new patch. The essential idea of having two parsers
>> seems good. However, I have a concern about the maintenance cost of two
>> distinct parsers. It seems the difference between the two parsers is
>> subtle. So why don't you keep just one original parser file (gram.y)
>> and generate the other automatically using a script? Or you could have
>> one gram.y file with some ifdefs to differentiate the two parsers. What
>> do you think?
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese:http://www.sraoss.co.jp
>>
>> > Hi Ishii-San
>> >
>> > Thanks for looking into the patch. Your observation is correct: the
>> > patch I sent would break the timestamp-overwriting functionality.
>> >
>> > So here is another go at this. See the new attached patch, which uses
>> > two parsers:
>> > 1- src/parser/gram.y, the standard parser brought in from PG code
>> > 2- src/parser/gram_minimal.y, a modified parser that short-circuits
>> > INSERT and UPDATE statements
>> >
>> > The idea here is that when replication mode is enabled we use the
>> > standard parser and parse everything, since we need the information
>> > for timestamps and other things. But when we are operating in
>> > master-slave mode we don't need any extra information for WRITE
>> > queries, since we will always be sending them to the primary anyway.
>> > So the newly added minimal parser short-circuits INSERT and UPDATE
>> > queries (and can also be extended to short-circuit all the write
>> > queries, like CREATE and ALTER, wherever we can) and tries to enhance
>> > the write-query performance of Pgpool-II.
>> >
>> > The parser selection is made in the raw_parser() function and is
>> > controlled by the "bool use_minimal" argument. Currently, the patch
>> > always uses the minimal parser whenever replication mode is disabled
>> > in pgpool.conf.
>> >
>> > This is a very radical change in the parsing function of Pgpool-II,
>> > but it can provide us with a platform to minimise the parsing
>> > overhead of Pgpool-II in master-slave mode.
>> >
>> > Your thoughts and comments?
>> >
>> > Thanks
>> > Best Regards
>> > Muhammad Usama
>> >
>> > On Wed, Feb 27, 2019 at 3:58 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> >
>> >> Hi Usama,
>> >>
>> >> I think this patch breaks replication mode because it throws away
>> >> information needed for time stamp rewriting. What do you think?
>> >>
>> >> Best regards,
>> >> --
>> >> Tatsuo Ishii
>> >> SRA OSS, Inc. Japan
>> >> English: http://www.sraoss.co.jp/index_en.php
>> >> Japanese:http://www.sraoss.co.jp
>> >>
>> >> > Hi Ishii San
>> >> >
>> >> > Can you have a look at the attached patch, which tries to extract
>> >> > some performance in the areas of query parsing and query analysis
>> >> > for routing decisions? Most of the performance gains from the
>> >> > changes in the patch can be observed with large-data INSERT
>> >> > statements.
>> >> >
>> >> > The patch contains the following changes
>> >> > ==========
>> >> > 1-- The idea here is that Pgpool-II needs only very little
>> >> > information about a query, especially for INSERT queries, to
>> >> > decide where it needs to send it. For example, for INSERT queries
>> >> > we only need the type of query and the relation name. But the
>> >> > parser we use in Pgpool-II is taken from PostgreSQL source, which
>> >> > parses the complete query including the value lists (which are not
>> >> > required by Pgpool). Parsing the values part seems very harmless
>> >> > for small statements, but in the case of INSERTs with many column
>> >> > values and large data in each value item, it becomes significant.
>> >> > So this patch adds a smaller bison grammar rule to short-circuit
>> >> > INSERT statement parsing once it has enough of the information
>> >> > required by Pgpool-II.
>> >> >
>> >> > 2-- The patch also re-arranges some of the if statements in the
>> >> > pool_where_to_send() function and tries to make sure the
>> >> > pattern_compare() and pool_has_function_call() calls are only made
>> >> > when they are absolutely necessary.
>> >> >
>> >> > 3-- Another thing this patch does is try to save raw_parser()
>> >> > calls for unrecognised queries. Instead of invoking the parser on
>> >> > the "dummy read" and "dummy write" queries when there is a syntax
>> >> > error in the original query, the patch adds functions that return
>> >> > pre-built parse trees for these dummy queries.
>> >> >
>> >> > 4-- The strlen() call is removed from the scanner_init() function;
>> >> > the length is passed in as an argument instead. The reason is that
>> >> > in most cases we already have the query length before invoking the
>> >> > parser, so why waste CPU cycles on it? Again, this becomes
>> >> > significant for large query strings.
>> >> >
>> >> > Finally, the patch tries to remove unnecessary calls to
>> >> > pool_is_likely_select().
>> >> >
>> >> > As mentioned above, the areas of improvement in this patch are
>> >> > mostly around write queries. For testing purposes I used an INSERT
>> >> > query with large binary insert data, and I am getting very large
>> >> > performance gains with this patch.
>> >> >
>> >> > *Current Master Branch*
>> >> > ================
>> >> > usama=# \i sample.sql
>> >> >  id
>> >> > -----
>> >> >  104
>> >> > (1 row)
>> >> >
>> >> > INSERT 0 1
>> >> > Time: *2059.807* ms (00:02.060)
>> >> >
>> >> > *WITH PATCH*
>> >> > ===============
>> >> > usama=# \i sample.sql
>> >> >  id
>> >> > -----
>> >> >  102
>> >> > (1 row)
>> >> >
>> >> > INSERT 0 1
>> >> > Time: *314.237* ms
>> >> >
>> >> >
>> >> > Performance gain* 655.50 %*
>> >> >
>> >> >
>> >> > Comments and suggestions?
>> >> >
>> >> > Please let me know if you also want the test data I used for the
>> >> > INSERT test.
>> >> >
>> >> > Thanks
>> >> > Best regards
>> >> > Muhammad Usama
>> >>
>>

