[PATCH] lib: fix buggy strcmp and strncmp

There are two problems with both strcmp and strncmp:
(1) The C standard is clear that the contents should be compared as "unsigned char":
The sign of a nonzero value returned by the comparison functions memcmp, strcmp, and strncmp is determined by the sign of the difference between the values of the first pair of characters (both interpreted as unsigned char) that differ in the objects being compared.
(2) The difference between two char (or unsigned char) values can range from -255 to +255; so that's (due to integer promotion) the range of values we could get in the *cs-*ct expressions, but when that is then shoe-horned into an 8-bit quantity the sign may of course change.
The impact is somewhat limited by the way these functions are used in practice:
- Most of the time, one is only interested in equality (or for strncmp, "starts with"), and the existing functions do correctly return 0 if and only if the strings are equal [for strncmp, up to the given bound].
- Also most of the time, the strings being compared only consist of ASCII characters, i.e. have values in the range [0, 127], and in that case it doesn't matter if they are interpreted as signed or unsigned char, and the possible difference range is bounded to [-127, 127] which does fit the signed char.
For size, one could implement strcmp() in terms of strncmp() - just make it "return strncmp(a, b, (size_t)-1);". However, performance of strcmp() does matter somewhat, since it is used all over when parsing and matching DT nodes and properties, so let's find some other place to save those ~30 bytes.
Signed-off-by: Rasmus Villemoes rasmus.villemoes@prevas.dk ---
Please double- and triple-check before applying. I've tested these against my libc's strcmp() and strncmp() for thousands of random strings, but I may very well have messed up when copy-pasting back to lib/string.c.
lib/string.c | 27 +++++++++++++++++---------- 1 file changed, 17 insertions(+), 10 deletions(-)
diff --git a/lib/string.c b/lib/string.c index 78bd65c413..ecea755f40 100644 --- a/lib/string.c +++ b/lib/string.c @@ -206,16 +206,20 @@ size_t strlcat(char *dest, const char *src, size_t size) * @cs: One string * @ct: Another string */ -int strcmp(const char * cs,const char * ct) +int strcmp(const char *cs, const char *ct) { - register signed char __res; + int ret;
while (1) { - if ((__res = *cs - *ct++) != 0 || !*cs++) + unsigned char a = *cs++; + unsigned char b = *ct++; + + ret = a - b; + if (ret || !b) break; }
- return __res; + return ret; } #endif
@@ -226,17 +230,20 @@ int strcmp(const char * cs,const char * ct) * @ct: Another string * @count: The maximum number of bytes to compare */ -int strncmp(const char * cs,const char * ct,size_t count) +int strncmp(const char *cs, const char *ct, size_t count) { - register signed char __res = 0; + int ret = 0; + + while (count--) { + unsigned char a = *cs++; + unsigned char b = *ct++;
- while (count) { - if ((__res = *cs - *ct++) != 0 || !*cs++) + ret = a - b; + if (ret || !b) break; - count--; }
- return __res; + return ret; } #endif

On Wed, Oct 05, 2022 at 11:09:25AM +0200, Rasmus Villemoes wrote:
There are two problems with both strcmp and strncmp:
(1) The C standard is clear that the contents should be compared as "unsigned char":
The sign of a nonzero value returned by the comparison functions memcmp, strcmp, and strncmp is determined by the sign of the difference between the values of the first pair of characters (both interpreted as unsigned char) that differ in the objects being compared.
(2) The difference between two char (or unsigned char) values can range from -255 to +255; so that's (due to integer promotion) the range of values we could get in the *cs-*ct expressions, but when that is then shoe-horned into an 8-bit quantity the sign may of course change.
The impact is somewhat limited by the way these functions are used in practice:
Most of the time, one is only interested in equality (or for strncmp, "starts with"), and the existing functions do correctly return 0 if and only if the strings are equal [for strncmp, up to the given bound].
Also most of the time, the strings being compared only consist of ASCII characters, i.e. have values in the range [0, 127], and in that case it doesn't matter if they are interpreted as signed or unsigned char, and the possible difference range is bounded to [-127, 127] which does fit the signed char.
For size, one could implement strcmp() in terms of strncmp() - just make it "return strncmp(a, b, (size_t)-1);". However, performance of strcmp() does matter somewhat, since it is used all over when parsing and matching DT nodes and properties, so let's find some other place to save those ~30 bytes.
Signed-off-by: Rasmus Villemoes rasmus.villemoes@prevas.dk
Applied to u-boot/master, thanks!
participants (2)
-
Rasmus Villemoes
-
Tom Rini