Re: [U-Boot] [PATCH v2 07/13] x86: Fix signed shift overflow in MSR_IA32_APICBASE_BASE

16 Sep 2018


      Hi Bin,
Apologize for the delay. I came back from vacation a few days ago.
On Tue, Sep 04, 2018 at 12:00:14PM +0800, Bin Meng wrote:
...
Hi Eugeniu,
On Sat, Sep 1, 2018 at 6:59 PM Eugeniu Rosca roscaeugeniu@gmail.com wrote:
[..]
...
...
Just wanted to let you know that coreboot folks are going through
similar discussions in [1]. Also, experimenting with various gcc
versions and flags in my spare time, I collected some evidence [2]
showing that the behavior of GCC UBSAN (-fsanitize=undefined &
friends) may differ a lot depending on the gcc version and below
flags (none used by U-Boot, but some used in Linux kernel):
 -fwrapv
 -fstrict-overflow
 -fno-strict-overflow
Checking how -fno-strict-overflow and -fwrapv compare to each other
(since they seem to accomplish similar goals according to many sources),
I've used the sample app from [3] to see how gcc handles signed integer
wraparound depending on gcc version, flags, optimization level and
on whether UBSAN is enabled or not. The variance/inconsistency of the
results [4] is very high in my opinion.
One clear conclusion of [4] is that questions like why gcc UBSAN
complains in U-Boot but not in the Kernel require knowing at least the
parameters  tracked in [4] (and maybe more).
[1] https://mail.coreboot.org/pipermail/coreboot/2018-February/086146.html
[2] UBSAN behavior (printing 1 << 31) is highly dependent on gcc version and flags
[..]
...
Thank you very much for all these details and links. I learned a lot
from this thread. It looks to me that coreboot folks are not in favor
of this UBSAN thing. I personally have no preference, but I suspect
there are a lot more in the U-Boot tree that have such warnings.
I don't think they question the usefulness of UBSAN as a whole. Rather,
they seem to be annoyed by one particular (frequently reproduced) UBSAN
error, specifically what gcc man pages refer to as "left-shifting 1 into
the sign bit". Note that shifting *past* the sign bit is a different
(presumably more serious) subclass of signed integer shift overflow.
Both the coreboot discussion and the fixes from my series are limited
to "left-shifting into (not past) the sign bit" behavior.
Both the C11 [6] standard (to which U-Boot "adhered" via commit [5]) and
the later C18 [7] define the left shifts as follows (§6.5.7p4):
------8<------
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits
are filled with zeros. If E1 has an unsigned type, the value of the result
is E1 × 2^E2, reduced modulo one more than the maximum value representable
in the result type. If E1 has a signed type and nonnegative value, and
E1 × 2^E2 is representable in the result type, then that is the resulting
value; otherwise, the behavior is undefined.
------8<------
With respect to the type of the result, both C11/C18 standards state
in §6.5.7p3:
------8<------
The type of the result is that of the promoted left operand.
------8<------
My understanding of the above is that, purely from C11/C18 standard
perspective, (1 << 31) is undefined behavior since the result can't be
represented in the type of the left operand (signed int).
In spite of this, things are not as simple as we would like them to be
and the DR463 entry of the "Defect Report Summary for C11" [8] tackles
exactly the "Left-shifting into the sign bit" by saying that it should
be harmonized with C++-14 [9]. The latter suffered a change in
"§5.8.2 Shift operators" chapter due to DR1457 ("Undefined behavior in
left-shift") [10], a defect report raised back in 2012. As a result,
C++-14 now considers the "left-shifting into the sign bit" as defined
behavior:
------8<------
...if E1 has a signed type and non-negative value, and E1 ⨯ 2^E2 is
representable in the **corresponding unsigned type of the result type**,
then that value, **converted to the result type**, is the resulting
value; otherwise, the behavior is undefined.
------8<------
To emphasize that the things are far from being settled, I will just
reference the UB-related talk [11] of Chandler Carruth at CppCon 2016,
which (amongst other topics) touches the left shifting of signed
integers and, to my understanding, conveys the message that there are
holes in the mental model of shifting signed integers to the left and
the solution to overcome those is still unclear.
...
Given
the behavior is quite dependent on GCC versions and flags, do kernel
folks have any final solution on this?
I still didn't fully answer one of your first questions, more exactly
why UBSAN complains about (1 << 31) in U-Boot, but not in Linux kernel.
It turns out there is another gcc option heavily affecting the UBSAN
behavior and it is (somewhat expectedly) the language standard
"-std=" [12]. As already mentioned, U-Boot recently switched to C11
via commit [5], while Linux kernel still sticks to ANSI/C89 C
via commit [13].
Just for the record, the definition of left shift operator provided
by C89/C90 [14] is indeed different compared to C11/C18 and it sounds
like:
------8<------
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated
bits are filled with zeros. If E1 has an unsigned type, the value of
the result is E1 multiplied by the quantity, 2 raised to the power E2,
reduced modulo ULONG_MAX+1 if E1 has type unsigned long, UINT_MAX+1
otherwise. (The constants ULONG_MAX and UINT_MAX are defined in the
header <limits.h> .)
------8<------
I am not sure if this makes left-shifting into the sign bit legitimate,
but the reality is that none of the x86_64 gcc versions I tried (this
includes 4.9.4, 5.5.0, 6.4.0, 7.3.0, 8.1.0) reported issues about
(1 << 31) when turning UBSAN on with -std=gnu89. This is the main
reason why Linux kernel UBSAN won't complain about (1 << 31).
On the basis of above experimental results, switching U-Boot build
system back to C89 would both keep it aligned to Linux kernel and
would consistently silent all the "left-shift 1 into sign bit" UBSAN
errors (of which there are potentially hundreds, according to the stats
presented in [15]). At the same time, this feels like a step back, so
I actually expect people to NAK this change. It still remains the
simplest I can think of (hence my favorite). It is also grounded
on my belief that the changes in the Kconfig/Kbuild should come top-down
to U-Boot from Linux kernel (where the actual development takes place).
...
Or do we have to ask GCC folks to fix these inconsistencies?
While trying to create an account in GCC Bugzilla, I got the message:
"User account creation has been restricted" :(
Please, feel free to CC any member of GCC community.
Any feedback/comments which direction to take would be appreciated.
[5] fa893990e9b5 ("Makefile: Ensure we build with -std=gnu11")
[6] C11 draft: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
[7] C18 draft: http://www.open-std.org/jtc1/sc22/wg14/www/abq/c17_updated_proposed_fdis.pdf
[8] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2148.htm#dr_463
[9] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf
[10] http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1457
[11] https://youtu.be/yG1OZ69H_-o?t=1668
[12] https://gcc.gnu.org/onlinedocs/gcc/Standards.html
[13] Linux v3.18-rc5 commit 51b97e354ba9 ("kernel: use the gnu89 standard explicitly")
[14] https://port70.net/~nsz/c/c89/c89-draft.html#3.3.7
[15] http://patchwork.ozlabs.org/cover/962307/
Best regards,
Eugeniu.