[pgpool-hackers: 4257] Re: segmentation fault error

稲垣毅 / INAGAKI,TSUYOSHI tsuyoshi.inagaki.ej at hitachi.com
Tue Dec 27 19:16:12 JST 2022


Hi, Ishii-san.

Thank you for the quick reply.
Once a trap fires, I would read the result as follows:
(a) If it goes down at the trap placed in Execute()
    This confirms that the memory was destroyed somewhere between the completion of Bind() and the call to Execute().
    -> Narrow it down further by setting traps along the path that runs from the completion of Bind() to the Execute() call.

(b) If it goes down at the trap placed in Bind()
    This confirms that the memory was destroyed somewhere in Bind().
    -> Narrow it down by setting further traps at suspected locations in Bind().

I said that I would set traps and narrow it down, but since I do not know the processing flow, it is not clear to me where to set them.
I am thinking of embedding this check in the calling function; what do you think?
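
The narrowing-down in (a) and (b) can be sketched as labelled checkpoints: the corruption must lie between the last checkpoint that passed and the first one that failed. This is only a minimal illustration under stated assumptions. DemoNode, the tag bounds, demo_checkpoint() and DEMO_TRAP are hypothetical names and not part of pgpool. In the thread the trap is the pool_has_function_call() call itself, which crashes when the pointer is bad; here an explicit range check stands in for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a parse tree node; this is NOT pgpool's Node. */
typedef struct DemoNode {
    int tag;                    /* assumed to stay within a known range */
} DemoNode;

enum { DEMO_TAG_MIN = 1, DEMO_TAG_MAX = 1000 };   /* illustrative bounds */

/* Label of the most recent checkpoint that passed. */
static const char *demo_last_good = "(none)";

/*
 * Returns 1 if the node looks intact and 0 otherwise. On failure it
 * reports the interval in which the damage must have happened.
 */
static int demo_checkpoint(const DemoNode *node, const char *label)
{
    if (node == NULL || node->tag < DEMO_TAG_MIN || node->tag > DEMO_TAG_MAX) {
        fprintf(stderr, "node damaged between \"%s\" and \"%s\"\n",
                demo_last_good, label);
        return 0;
    }
    demo_last_good = label;
    return 1;
}

/*
 * Hard trap: abort at the first bad checkpoint so that a core file (if
 * enabled) shows the call stack. Note that the check is compiled out
 * when NDEBUG is defined.
 */
#define DEMO_TRAP(node, label) assert(demo_checkpoint((node), (label)))
```

With checkpoints at the end of one function and the start of the next, adding more labels inside whichever interval is reported shrinks the suspect region step by step.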

Note that we are currently building and testing a reproduction environment, but we have not yet been able to reproduce the problem.

Regards.

Inagaki Tsuyoshi
<tsuyoshi.inagaki.ej at hitachi.com>


-----Original Message-----
From: Tatsuo Ishii <ishii at sraoss.co.jp>
Sent: Tuesday, December 27, 2022 3:11 PM
To: 稲垣毅 / INAGAKI,TSUYOSHI <tsuyoshi.inagaki.ej at hitachi.com>
Cc: pgpool-hackers at pgpool.net
Subject: [!]Re: [pgpool-hackers: 4229] segmentation fault error

Hi, Inagaki-san,

> Hi, Ishii-san
> 
> Thanks for the reply.
> The processing I put in to activate the trap is as follows.
> 
>>	query_context->parse_tree = 0 ;  <--- add
>> 	pool_has_function_call(query_context->parse_tree); <--- add
>> 
>> 	if (rewrite_msg)
>> 		pfree(rewrite_msg);
>> 	return POOL_CONTINUE;
> 
> Since the original phenomenon was also a segmentation fault, my understanding is that this trap will cause a segmentation fault when the memory corruption is reproduced.
> You said that passing 0 does not prove anything, but then how can the trap prove anything?

From the original coredump of the segfault, it is apparent that query_context->parse_tree was smashed. But the question is when. That's why I asked you to insert
pool_has_function_call(query_context->parse_tree) in several places.
If you do not see segfaults, you probably need to focus on how to reproduce the problem.
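
To illustrate why a hand-set 0 and a genuinely smashed node are different observations, here is a small sketch that reports them as separate verdicts. It is not pgpool code: DemoNode, the tag bounds and demo_classify() are hypothetical, and it assumes that damage shows up as an out-of-range tag, which may not hold for the real parse tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a parse tree node; this is NOT pgpool's Node. */
typedef struct DemoNode {
    int tag;
} DemoNode;

enum { DEMO_TAG_MIN = 1, DEMO_TAG_MAX = 1000 };   /* illustrative bounds */

typedef enum {
    DEMO_INTACT,          /* pointer set and contents plausible */
    DEMO_NULL_POINTER,    /* what assigning 0 by hand produces */
    DEMO_BAD_CONTENTS     /* pointer set, but the contents changed */
} DemoVerdict;

/* Classify a node without crashing, so the two failure kinds stay distinct. */
static DemoVerdict demo_classify(const DemoNode *node)
{
    if (node == NULL)
        return DEMO_NULL_POINTER;
    if (node->tag < DEMO_TAG_MIN || node->tag > DEMO_TAG_MAX)
        return DEMO_BAD_CONTENTS;
    return DEMO_INTACT;
}

/* Small self-check of the two easy cases. */
static void demo_self_check(void)
{
    DemoNode good = { 42 };
    assert(demo_classify(&good) == DEMO_INTACT);
    assert(demo_classify(NULL) == DEMO_NULL_POINTER);
}
```

A check like this only shows that the trap mechanism works when the pointer is forced to 0; it says nothing about when real corruption happens, which still requires reproducing the problem.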

> Regards,
> 
> Inagaki Tsuyoshi
> <tsuyoshi.inagaki.ej at hitachi.com>
> 
> 
> -----Original Message-----
> From: Tatsuo Ishii <ishii at sraoss.co.jp>
> Sent: Wednesday, December 21, 2022 1:58 PM
> To: 稲垣毅 / INAGAKI,TSUYOSHI <tsuyoshi.inagaki.ej at hitachi.com>
> Cc: pgpool-hackers at pgpool.net
> Subject: [!]Re: [pgpool-hackers: 4229] segmentation fault error
> 
> Hi Inagaki-san,
> 
> Sorry for the late reply.
> 
>> Hi, Ishii-san
>> 
>> Please allow me to check that my understanding is correct.
>> I created a custom version by embedding the processing you suggested.
>> Currently, the problem cannot be reproduced, so I set the node to 0 just
>> before the added call to check it, and it caused a segmentation fault.
> 
> You mean you modified the source code of Bind() to something like this?
> 
>> 	pool_has_function_call(0); <--- add
>> 
>> 	if (rewrite_msg)
>> 		pfree(rewrite_msg);
>> 	return POOL_CONTINUE;
> 
> If so, of course pool_has_function_call() segfaults, because it dereferences address 0x00.
> But this does not prove anything. Can you please elaborate?
> 
>> The stack is as follows.
>> #0 pool_has_function_call (node=0x0) at utils/pool_select_walker.c:67
>> #1 0x000000000043aa6e in Bind (frontend=frontend@entry=0x1177128, backend=backend@entry=0x7f3d27e1a690, len=<optimized out>, contents=<optimized out>,
>>      contents@entry=0x117d558 "") at protocol/pool_proto_modules.c:1682
>> #2 0x00000000004400cf in ProcessFrontendResponse (frontend=frontend@entry=0x1177128, backend=backend@entry=0x7f3d27e1a690) at protocol/pool_proto_modules.c:2750
>> #3 0x0000000000433086 in pool_process_query (frontend=0x1177128, backend=0x7f3d27e1a690, reset_request=reset_request@entry=0) at protocol/pool_process_query.c:263
>> #4 0x000000000042c549 in do_child (fds=fds@entry=0x11b3310) at protocol/child.c:449
>> #5 0x00000000004060e5 in fork_a_child (fds=0x11b3310, id=30) at main/pgpool_main.c:682
>> #6 0x000000000040cc99 in PgpoolMain (discard_status=discard_status@entry=1 '\001', clear_memcache_oidmaps=clear_memcache_oidmaps@entry=0 '\000')
>>      at main/pgpool_main.c:410
>> #7 0x0000000000404247 in main (argc=<optimized out>, argv=<optimized out>) at main/main.c:365
>> 
>> With the check embedded this time, the process will crash immediately after
>> the memory is destroyed, and my understanding is that the call stack will then
>> identify Bind() as the place where the memory was destroyed.
>> In the previous message, you said that the memory was destroyed between
>> Parse() and Execute(), and that Execute() does not update the node.
>> The checks are embedded at the end of Bind() and at the beginning of
>> Execute(), so if it crashes in Bind(), my understanding is that the memory
>> was destroyed somewhere in Bind().
>> If it goes down in Execute(), my understanding is that the memory was destroyed
>> somewhere between the completion of Bind() and the call to Execute().
>> Once the problem can be reproduced with the custom version: if it goes
>> down in Bind(), I would embed pool_has_function_call() calls throughout
>> Bind() to identify where the memory is corrupted, and if it goes
>> down in Execute(), I would embed pool_has_function_call() calls throughout
>> the path from the completion of Bind() to the start of Execute(). Is that correct?
>> 
>> Regards,
>> 
>> Inagaki Tsuyoshi
>> tsuyoshi.inagaki.ej at hitachi.com
>> 
>> 
>> -----Original Message-----
>> From: Tatsuo Ishii <ishii at sraoss.co.jp>
>> Sent: Tuesday, December 6, 2022 2:38 PM
>> To: 稲垣毅 / INAGAKI,TSUYOSHI <tsuyoshi.inagaki.ej at hitachi.com>
>> Cc: pgpool-hackers at pgpool.net
>> Subject: [!]Re: [pgpool-hackers: 4229] segmentation fault error
>> 
>>>> Maybe you could insert "pool_has_function_call(node)" somewhere to see where the "node" memory was first smashed.
>>> 
>>> My understanding was that by adding pool_has_function_call(node) between Parse() and Execute() I would be able to catch the moment the memory is broken.
>>> However, since I do not know the execution path in full detail, could you tell me specifically where to add it?
>> 
>> For starters, in Execute() here:
>> 
>> 	query_context = bind_msg->query_context;
>> 	node = bind_msg->query_context->parse_tree;
>> 	query = bind_msg->query_context->original_query;
>> 
>> 	pool_has_function_call(node); <--- add
>> 
>> For Bind() here (at the very end of Bind()):
>> 
>> 	pool_has_function_call(query_context->parse_tree); <--- add
>> 
>> 	if (rewrite_msg)
>> 		pfree(rewrite_msg);
>> 	return POOL_CONTINUE;
>> }
>> 	
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS LLC
>> English: http://www.sraoss.co.jp/index_en/
>> Japanese:http://www.sraoss.co.jp
>> 

