View Revisions: Issue #443

Summary 0000443: Segmentation fault occurs when a certain Bind message is sent in native replication mode.
Revision 2018-11-05 17:31 by nagata
Description One of our clients reported that a segmentation fault occurs with a specific query containing CURRENT_DATE in native replication mode.

By analyzing the core file, I found that the crash occurs in bind_rewrite_timestamp().

#0 0x0000003ab50899d4 in memcpy () at ../sysdeps/x86_64/memcpy.S:444
#1 0x000000000044588a in bind_rewrite_timestamp (backend=<value optimized out>, message=0xaa0398, orig_msg=<value optimized out>, len=<value optimized out>)
    at /usr/include/bits/string3.h:52
#2 0x00000000004364f9 in Bind (frontend=0xa9a258, backend=0xa7f350, len=70, contents=0xaa0688 "") at protocol/pool_proto_modules.c:1343
#3 0x0000000000436ce7 in ProcessFrontendResponse (frontend=0xa9a258, backend=0xa7f350) at protocol/pool_proto_modules.c:2396
...

We can reproduce this with the following query and pgproto script:

 'P' "P1" "DELETE FROM test2 WHERE d = CURRENT_DATE" 0
 'D' 'S' "P1"
 'B' "" "P1" 1 1 0 0
 'E' "" 0
 'C' 'S' ""
 'S'
 'Y'
 'X'
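To make the byte layout concrete, the Bind line of the script can be encoded by hand according to the PostgreSQL extended-query protocol. This is only an illustrative sketch: the helper names (`put_int16`, `put_int32`, `put_cstring`, `build_repro_bind`) are hypothetical and are not part of pgproto or Pgpool-II. It shows that the message carries a parameter format-code count of 1 and a parameter count of 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append a big-endian Int16, as the PostgreSQL wire protocol requires. */
static size_t put_int16(uint8_t *buf, size_t off, int16_t v)
{
    uint16_t u = (uint16_t) v;
    buf[off] = (uint8_t) (u >> 8);
    buf[off + 1] = (uint8_t) (u & 0xff);
    return off + 2;
}

/* Append a big-endian Int32. */
static size_t put_int32(uint8_t *buf, size_t off, int32_t v)
{
    uint32_t u = (uint32_t) v;
    buf[off] = (uint8_t) (u >> 24);
    buf[off + 1] = (uint8_t) (u >> 16);
    buf[off + 2] = (uint8_t) (u >> 8);
    buf[off + 3] = (uint8_t) u;
    return off + 4;
}

/* Append a NUL-terminated string, including the terminator. */
static size_t put_cstring(uint8_t *buf, size_t off, const char *s)
{
    size_t n = strlen(s) + 1;
    memcpy(buf + off, s, n);
    return off + n;
}

/*
 * HYPOTHETICAL helper: encode the script's Bind line,
 * 'B' "" "P1" 1 1 0 0, into buf (at least 64 bytes).
 * Returns the total bytes written, including the 'B' type byte.
 */
size_t build_repro_bind(uint8_t *buf)
{
    size_t off = 0;
    buf[off++] = 'B';                  /* message type */
    size_t len_pos = off;              /* Int32 length, patched below */
    off += 4;
    off = put_cstring(buf, off, "");   /* unnamed portal */
    off = put_cstring(buf, off, "P1"); /* prepared statement name */
    off = put_int16(buf, off, 1);      /* number of parameter format codes */
    off = put_int16(buf, off, 1);      /* the single format code: 1 = binary */
    off = put_int16(buf, off, 0);      /* number of parameter values */
    off = put_int16(buf, off, 0);      /* number of result-column format codes */
    /* The length field counts itself but not the type byte. */
    put_int32(buf, len_pos, (int32_t) (off - len_pos));
    return off;
}
```

With a 64-byte buffer, `build_repro_bind()` should return 17: the type byte followed by a length field of 16.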

The problem is the Bind ('B') message. It specifies one parameter format code, which means that this single format code (here 1, i.e. binary) applies to all parameters. Although the original query has no parameters, the protocol allows this message. However, it causes bind_rewrite_timestamp() to call memcpy() with a negative length converted to size_t (that is, an enormous copy size), because the function does not expect the number of parameter format codes to exceed the actual number of parameters in the original query.
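The failure mode can be illustrated with a minimal sketch, assuming (hypothetically) that a copy length is computed as the difference between the parameter count and the format-code count; this is not the actual code of bind_rewrite_timestamp(), and the attached patch may take a different approach. With this Bind message the difference is -1, and passing it as memcpy()'s size_t argument turns it into SIZE_MAX. The guard follows the protocol rule that a Bind may carry zero format codes, exactly one (shared by all parameters, even when there are none), or one per parameter. The names `naive_gap`, `as_memcpy_size`, `format_count_is_valid`, and `safe_gap` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * HYPOTHETICAL: a length computed as a signed difference that silently
 * assumes num_format_codes <= num_params. Not Pgpool-II's actual code.
 */
int naive_gap(int num_params, int num_format_codes)
{
    return num_params - num_format_codes;   /* -1 for the Bind in this report */
}

/* The value memcpy() would see if that int were passed as its size_t argument. */
size_t as_memcpy_size(int n)
{
    return (size_t) n;   /* wraps modulo SIZE_MAX + 1, so -1 becomes SIZE_MAX */
}

/*
 * Protocol rule for Bind: zero format codes (all text), exactly one
 * (applied to every parameter, even when there are none), or one per parameter.
 */
int format_count_is_valid(int num_format_codes, int num_params)
{
    return num_format_codes == 0 || num_format_codes == 1 ||
           num_format_codes == num_params;
}

/* HYPOTHETICAL guarded counterpart of naive_gap(): never goes negative. */
size_t safe_gap(int num_params, int num_format_codes)
{
    return num_params > num_format_codes
           ? (size_t) (num_params - num_format_codes)
           : 0;
}
```

Clamping at zero, or rejecting an invalid count up front, keeps the length non-negative before it ever reaches memcpy(); note that a count of one with zero parameters is valid and must not be rejected.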

I have attached a patch that handles this case properly. It also adds some comments and debug code.
Revision 2018-11-05 17:29 by nagata
Description One of our clients reported that a segmentation fault occurs with a specific query containing CURRENT_DATE in native replication mode.

By analyzing the core file, I found that the crash occurs in bind_rewrite_timestamp().

#0 0x0000003ab50899d4 in memcpy () at ../sysdeps/x86_64/memcpy.S:444
#1 0x000000000044588a in bind_rewrite_timestamp (backend=<value optimized out>, message=0xaa0398, orig_msg=<value optimized out>, len=<value optimized out>)
    at /usr/include/bits/string3.h:52
#2 0x00000000004364f9 in Bind (frontend=0xa9a258, backend=0xa7f350, len=70, contents=0xaa0688 "") at protocol/pool_proto_modules.c:1343
#3 0x0000000000436ce7 in ProcessFrontendResponse (frontend=0xa9a258, backend=0xa7f350) at protocol/pool_proto_modules.c:2396
...

We can reproduce this with the following query and pgproto script:

 'P' "P1" "DELETE FROM test2 WHERE CAST(d As Date) = CURRENT_DATE" 0
 'D' 'S' "P1"
 'B' "" "P1" 1 1 0 0
 'E' "" 0
 'C' 'S' ""
 'S'
 'Y'
 'X'

The problem is the Bind ('B') message. It specifies one parameter format code, which means that this single format code (here 1, i.e. binary) applies to all parameters. Although the original query has no parameters, the protocol allows this message. However, it causes bind_rewrite_timestamp() to call memcpy() with a negative length converted to size_t (that is, an enormous copy size), because the function does not expect the number of parameter format codes to exceed the actual number of parameters in the original query.

I have attached a patch that handles this case properly. It also adds some comments and debug code.