
Hi Bin,
Apologize for the delay. I came back from vacation a few days ago.
On Tue, Sep 04, 2018 at 12:00:14PM +0800, Bin Meng wrote:
Hi Eugeniu,
On Sat, Sep 1, 2018 at 6:59 PM Eugeniu Rosca roscaeugeniu@gmail.com wrote:
[..]
Just wanted to let you know that coreboot folks are going through similar discussions in [1]. Also, experimenting with various gcc versions and flags in my spare time, I collected some evidence [2] showing that the behavior of GCC UBSAN (-fsanitize=undefined & friends) may differ a lot depending on the gcc version and below flags (none used by U-Boot, but some used in Linux kernel): -fwrapv -fstrict-overflow -fno-strict-overflow
Checking how -fno-strict-overflow and -fwrapv compare to each other (since they seem to accomplish similar goals according to many sources), I've used the sample app from [3] to see how gcc handles signed integer wraparound depending on gcc version, flags, optimization level and on whether UBSAN is enabled or not. The variance/inconsistency of the results [4] is very high in my opinion.
One clear conclusion of [4] is that questions like why gcc UBSAN complains in U-Boot but not in the Kernel require knowing at least the parameters tracked in [4] (and maybe more).
[1] https://mail.coreboot.org/pipermail/coreboot/2018-February/086146.html [2] UBSAN behavior (printing 1 << 31) is highly dependent on gcc version and flags
[..]
Thank you very much for all these details and links. I learned a lot from this thread. It looks to me that coreboot folks are not in favor of this UBSAN thing. I personally have no preference, but I suspect there are a lot more in the U-Boot tree that have such warnings.
I don't think they question the usefulness of UBSAN as a whole. Rather, they seem to be annoyed by one particular (frequently reproduced) UBSAN error, specifically what gcc man pages refer to as "left-shifting 1 into the sign bit". Note that shifting *past* the sign bit is a different (presumably more serious) subclass of signed integer shift overflow. Both the coreboot discussion and the fixes from my series are limited to "left-shifting into (not past) the sign bit" behavior.
Both the C11 [6] standard (to which U-Boot "adhered" via commit [5]) and the later C18 [7] define the left shifts as follows (§6.5.7p4):
------8<------ The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined. ------8<------
With respect to the type of the result, both C11/C18 standards state in §6.5.7p3:
------8<------ The type of the result is that of the promoted left operand. ------8<------
My understanding of the above is that, purely from C11/C18 standard perspective, (1 << 31) is undefined behavior since the result can't be represented in the type of the left operand (signed int).
In spite of this, things are not as simple as we would like them to be and the DR463 entry of the "Defect Report Summary for C11" [8] tackles exactly the "Left-shifting into the sign bit" by saying that it should be harmonized with C++-14 [9]. The latter suffered a change in "§5.8.2 Shift operators" chapter due to DR1457 ("Undefined behavior in left-shift") [10], a defect report raised back in 2012. As a result, C++-14 now considers the "left-shifting into the sign bit" as defined behavior:
------8<------ ...if E1 has a signed type and non-negative value, and E1 ⨯ 2^E2 is representable in the **corresponding unsigned type of the result type**, then that value, **converted to the result type**, is the resulting value; otherwise, the behavior is undefined. ------8<------
To emphasize that the things are far from being settled, I will just reference the UB-related talk [11] of Chandler Carruth at CppCon 2016, which (amongst other topics) touches the left shifting of signed integers and, to my understanding, conveys the message that there are holes in the mental model of shifting signed integers to the left and the solution to overcome those is still unclear.
Given the behavior is quite dependent on GCC versions and flags, do kernel folks have any final solution on this?
I still didn't fully answer one of your first questions, more exactly why UBSAN complains about (1 << 31) in U-Boot, but not in Linux kernel. It turns out there is another gcc option heavily affecting the UBSAN behavior and it is (somewhat expectedly) the language standard "-std=" [12]. As already mentioned, U-Boot recently switched to C11 via commit [5], while Linux kernel still sticks to ANSI/C89 C via commit [13].
Just for the record, the definition of left shift operator provided by C89/C90 [14] is indeed different compared to C11/C18 and it sounds like:
------8<------ The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 multiplied by the quantity, 2 raised to the power E2, reduced modulo ULONG_MAX+1 if E1 has type unsigned long, UINT_MAX+1 otherwise. (The constants ULONG_MAX and UINT_MAX are defined in the header <limits.h> .) ------8<------
I am not sure if this makes left-shifting into the sign bit legitimate, but the reality is that none of the x86_64 gcc versions I tried (this includes 4.9.4, 5.5.0, 6.4.0, 7.3.0, 8.1.0) reported issues about (1 << 31) when turning UBSAN on with -std=gnu89. This is the main reason why Linux kernel UBSAN won't complain about (1 << 31).
On the basis of above experimental results, switching U-Boot build system back to C89 would both keep it aligned to Linux kernel and would consistently silent all the "left-shift 1 into sign bit" UBSAN errors (of which there are potentially hundreds, according to the stats presented in [15]). At the same time, this feels like a step back, so I actually expect people to NAK this change. It still remains the simplest I can think of (hence my favorite). It is also grounded on my belief that the changes in the Kconfig/Kbuild should come top-down to U-Boot from Linux kernel (where the actual development takes place).
Or do we have to ask GCC folks to fix these inconsistencies?
While trying to create an account in GCC Bugzilla, I got the message: "User account creation has been restricted" :(
Please, feel free to CC any member of GCC community. Any feedback/comments which direction to take would be appreciated.
[5] fa893990e9b5 ("Makefile: Ensure we build with -std=gnu11") [6] C11 draft: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf [7] C18 draft: http://www.open-std.org/jtc1/sc22/wg14/www/abq/c17_updated_proposed_fdis.pdf [8] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2148.htm#dr_463 [9] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf [10] http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1457 [11] https://youtu.be/yG1OZ69H_-o?t=1668 [12] https://gcc.gnu.org/onlinedocs/gcc/Standards.html [13] Linux v3.18-rc5 commit 51b97e354ba9 ("kernel: use the gnu89 standard explicitly") [14] https://port70.net/~nsz/c/c89/c89-draft.html#3.3.7 [15] http://patchwork.ozlabs.org/cover/962307/
Best regards, Eugeniu.