[U-Boot] [PATCH] crc32: Correct endianness of crc32 result

When crc32 is handled by the hash library, it requires the data to be in big-endian format, since it reads it byte-wise. Thus at present the 'crc32' command reports incorrect data. For example, previously we might see:
Peach # crc32 40000000 100 CRC32 for 40000000 ... 400000ff ==> 0d968558
but instead with the hash library we see:
Peach # crc32 40000000 100 CRC32 for 40000000 ... 400000ff ==> 5885960d
Correct this.
Signed-off-by: Simon Glass sjg@chromium.org Reviewed-by: Vadim Bendebury vbendeb@google.com --- lib/crc32.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/lib/crc32.c b/lib/crc32.c index 76205da..94720bf 100644 --- a/lib/crc32.c +++ b/lib/crc32.c @@ -10,6 +10,7 @@
#ifndef USE_HOSTCC #include <common.h> +#include <asm/unaligned.h> #endif #include <compiler.h> #include <u-boot/crc.h> @@ -256,5 +257,10 @@ void crc32_wd_buf(const unsigned char *input, unsigned int ilen, uint32_t crc;
crc = crc32_wd(0, input, ilen, chunk_sz); +#ifdef USE_HOSTCC + crc = htobe32(crc); memcpy(output, &crc, sizeof(crc)); +#else + put_unaligned_be32(crc, output); +#endif }

On Fri, Apr 05, 2013 at 04:11:10PM -0700, Simon Glass wrote:
When crc32 is handled by the hash library, it requires the data to be in big-endian format, since it reads it byte-wise. Thus at present the 'crc32' command reports incorrect data. For example, previously we might see:
Peach # crc32 40000000 100 CRC32 for 40000000 ... 400000ff ==> 0d968558
but instead with the hash library we see:
Peach # crc32 40000000 100 CRC32 for 40000000 ... 400000ff ==> 5885960d
Correct this.
Signed-off-by: Simon Glass sjg@chromium.org Reviewed-by: Vadim Bendebury vbendeb@google.com
lib/crc32.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/lib/crc32.c b/lib/crc32.c index 76205da..94720bf 100644 --- a/lib/crc32.c +++ b/lib/crc32.c @@ -10,6 +10,7 @@
#ifndef USE_HOSTCC #include <common.h> +#include <asm/unaligned.h> #endif #include <compiler.h> #include <u-boot/crc.h> @@ -256,5 +257,10 @@ void crc32_wd_buf(const unsigned char *input, unsigned int ilen, uint32_t crc;
crc = crc32_wd(0, input, ilen, chunk_sz); +#ifdef USE_HOSTCC
- crc = htobe32(crc); memcpy(output, &crc, sizeof(crc));
+#else
- put_unaligned_be32(crc, output);
+#endif } -- 1.8.1.3
Tested on Tegra114 Dalmore, verified crc32 comes out as expected now.
Reviewed-by: Allen Martin amartin@nvidia.com Tested-by: Allen Martin amartin@nvidia.com

Dear Simon Glass,
In message 1365203470-9099-1-git-send-email-sjg@chromium.org you wrote:
When crc32 is handled by the hash library, it requires the data to be in big-endian format, since it reads it byte-wise. Thus at present the 'crc32' command reports incorrect data. For example, previously we might see:
+#ifdef USE_HOSTCC
- crc = htobe32(crc); memcpy(output, &crc, sizeof(crc));
+#else
- put_unaligned_be32(crc, output);
+#endif
Why is this depending on USE_HOSTCC, and not on the endianess?
And why do we need the #ifdef? Can we not always use htobe32() and put_unaligned_be32() ?
Best regards,
Wolfgang Denk

Hi Wolfgang,
On Sat, Apr 6, 2013 at 12:04 AM, Wolfgang Denk wd@denx.de wrote:
Dear Simon Glass,
In message 1365203470-9099-1-git-send-email-sjg@chromium.org you wrote:
When crc32 is handled by the hash library, it requires the data to be in big-endian format, since it reads it byte-wise. Thus at present the 'crc32' command reports incorrect data. For example, previously we might see:
+#ifdef USE_HOSTCC
crc = htobe32(crc); memcpy(output, &crc, sizeof(crc));
+#else
put_unaligned_be32(crc, output);
+#endif
Why is this depending on USE_HOSTCC, and not on the endianess?
We always want big-endian in this case, since the bytes have to date been written that way by the crc32 command.
And why do we need the #ifdef? Can we not always use htobe32() and put_unaligned_be32() ?
Well I don't think put_unaligned_be32 is available to user space, which is the environment that the tools are built under. It is available in the kernel, but that's not our environment.
I'm not happy with this solution and would be pleased to find a better way, but I'm not sure what it is.
But this patch does fix a real bug which we should sort out before the release, one way or another.
Best regards,
Wolfgang Denk
Regards, Simon

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 04/16/2013 05:57 PM, Simon Glass wrote:
Hi Wolfgang,
On Sat, Apr 6, 2013 at 12:04 AM, Wolfgang Denk wd@denx.de wrote:
Dear Simon Glass,
In message 1365203470-9099-1-git-send-email-sjg@chromium.org you wrote:
When crc32 is handled by the hash library, it requires the data to be in big-endian format, since it reads it byte-wise. Thus at present the 'crc32' command reports incorrect data. For example, previously we might see:
+#ifdef USE_HOSTCC + crc = htobe32(crc); memcpy(output, &crc, sizeof(crc)); +#else + put_unaligned_be32(crc, output); +#endif
Why is this depending on USE_HOSTCC, and not on the endianess?
We always want big-endian in this case, since the bytes have to date been written that way by the crc32 command.
In other words, this code is executed on both the host and the target, and there's not a uniform endian sanitizer function.
- -- Tom

Hi Tom,
On Tue, Apr 16, 2013 at 4:00 PM, Tom Rini trini@ti.com wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 04/16/2013 05:57 PM, Simon Glass wrote:
Hi Wolfgang,
On Sat, Apr 6, 2013 at 12:04 AM, Wolfgang Denk wd@denx.de wrote:
Dear Simon Glass,
In message 1365203470-9099-1-git-send-email-sjg@chromium.org you wrote:
When crc32 is handled by the hash library, it requires the data to be in big-endian format, since it reads it byte-wise. Thus at present the 'crc32' command reports incorrect data. For example, previously we might see:
+#ifdef USE_HOSTCC + crc = htobe32(crc); memcpy(output, &crc, sizeof(crc)); +#else + put_unaligned_be32(crc, output); +#endif
Why is this depending on USE_HOSTCC, and not on the endianess?
We always want big-endian in this case, since the bytes have to date been written that way by the crc32 command.
In other words, this code is executed on both the host and the target, and there's not a uniform endian sanitizer function.
Well to be honest, I don't think the code is needed on the host, and I could put the #ifdef around the whole function. I just thought that might be a bit short-sighted.
Regards, Simon

Dear Simon Glass,
In message CAPnjgZ2jRVpQ56_EpVKUMH8E1L2LJ0HgzNCs-GrN9bXfOSXz+Q@mail.gmail.com you wrote:
+#ifdef USE_HOSTCC
crc = htobe32(crc); memcpy(output, &crc, sizeof(crc));
+#else
put_unaligned_be32(crc, output);
+#endif
Why is this depending on USE_HOSTCC, and not on the endianess?
We always want big-endian in this case, since the bytes have to date been written that way by the crc32 command.
Let me rephrase. Why do we need an #ifdef here, and why depends this on USE_HOSTCC?
And why do we need the #ifdef? Can we not always use htobe32() and put_unaligned_be32() ?
Well I don't think put_unaligned_be32 is available to user space, which is the environment that the tools are built under. It is available in the kernel, but that's not our environment.
It appears put_unaligned_be32() is a widely unknown, pretty exotic function that so far has been used in ony two source files:
drivers/usb/gadget/f_mass_storage.c lib/tpm.c
The implementation (in "include/linux/unaligned/generic.h") is ugly and pretty expensive in terms of run time and memory footprint.
I would like to avoid it's usage all together.
I'm not happy with this solution and would be pleased to find a better way, but I'm not sure what it is.
Does the htobe32() + memcpy() approach not work everywhere?
But this patch does fix a real bug which we should sort out before the release, one way or another.
Agreed.
Best regards,
Wolfgang Denk

Hi Wolfgang,
On Tue, Apr 16, 2013 at 10:40 PM, Wolfgang Denk wd@denx.de wrote:
Dear Simon Glass,
In message CAPnjgZ2jRVpQ56_EpVKUMH8E1L2LJ0HgzNCs-GrN9bXfOSXz+Q@mail.gmail.com you wrote:
+#ifdef USE_HOSTCC
crc = htobe32(crc); memcpy(output, &crc, sizeof(crc));
+#else
put_unaligned_be32(crc, output);
+#endif
Why is this depending on USE_HOSTCC, and not on the endianess?
We always want big-endian in this case, since the bytes have to date been written that way by the crc32 command.
Let me rephrase. Why do we need an #ifdef here, and why depends this on USE_HOSTCC?
And why do we need the #ifdef? Can we not always use htobe32() and put_unaligned_be32() ?
Well I don't think put_unaligned_be32 is available to user space, which is the environment that the tools are built under. It is available in the kernel, but that's not our environment.
It appears put_unaligned_be32() is a widely unknown, pretty exotic function that so far has been used in ony two source files:
drivers/usb/gadget/f_mass_storage.c lib/tpm.c
The implementation (in "include/linux/unaligned/generic.h") is ugly and pretty expensive in terms of run time and memory footprint.
I would like to avoid it's usage all together.
It ends up producing this on ARMv7:
43e2c1d8: e1a03820 lsr r3, r0, #16 43e2c1dc: e5c40003 strb r0, [r4, #3] 43e2c1e0: e5c43001 strb r3, [r4, #1] 43e2c1e4: e1a02423 lsr r2, r3, #8 43e2c1e8: e7e73450 ubfx r3, r0, #8, #8 43e2c1ec: e5c42000 strb r2, [r4] 43e2c1f0: e5c43002 strb r3, [r4, #2]
In fact with unaligned support we could optimise this.
I'm not happy with this solution and would be pleased to find a better way, but I'm not sure what it is.
Does the htobe32() + memcpy() approach not work everywhere?
But htobe32() isn't available in U-Boot.
I tried using:
#ifdef USE_HOSTCC crc = htobe32(crc); #else crc = cpu_to_be32(crc); #endif memcpy(output, &crc, sizeof(crc));
This is one instruction (4 bytes, 16%) smaller, but I suspect quite a lot slower due to the overhead of a very small memcpy().
43e2c1d8: e28d1008 add r1, sp, #8 43e2c1dc: e3a02004 mov r2, #4 43e2c1e0: e6bf0f30 rev r0, r0 43e2c1e4: e5210004 str r0, [r1, #-4]! 43e2c1e8: e1a00004 mov r0, r4 43e2c1ec: eb001af7 bl 43e32dd0 <memcpy>
But this patch does fix a real bug which we should sort out before the release, one way or another.
Agreed.
Let me know what you prefer and I will update the patch.
Regards, Simon

Hi Simon -- and sorry for the dupe.
On Wed, 17 Apr 2013 11:28:07 -0700, Simon Glass sjg@chromium.org wrote:
I tried using:
#ifdef USE_HOSTCC crc = htobe32(crc); #else crc = cpu_to_be32(crc); #endif memcpy(output, &crc, sizeof(crc));
This is one instruction (4 bytes, 16%) smaller, but I suspect quite a lot slower due to the overhead of a very small memcpy().
43e2c1d8: e28d1008 add r1, sp, #8 43e2c1dc: e3a02004 mov r2, #4 43e2c1e0: e6bf0f30 rev r0, r0 43e2c1e4: e5210004 str r0, [r1, #-4]! 43e2c1e8: e1a00004 mov r0, r4 43e2c1ec: eb001af7 bl 43e32dd0 <memcpy>
How about replacing the memcpy with an explicit put_unaligned(), similar to what was done in
http://www.mail-archive.com/u-boot@lists.denx.de/msg109555.html
with get_unaligned()? The code will be longer than above, but shorter than the above plus the memcpy(), and faster too -- actually, I'm surprised that the compiler does not unroll the memcpy() on its own, considering the size argument is a constant.
Regards, Simon
Amicalement,

Hi Albert,
On Wed, Apr 17, 2013 at 12:23 PM, Albert ARIBAUD albert.u.boot@aribaud.net wrote:
Hi Simon -- and sorry for the dupe.
On Wed, 17 Apr 2013 11:28:07 -0700, Simon Glass sjg@chromium.org wrote:
I tried using:
#ifdef USE_HOSTCC crc = htobe32(crc); #else crc = cpu_to_be32(crc); #endif memcpy(output, &crc, sizeof(crc));
This is one instruction (4 bytes, 16%) smaller, but I suspect quite a lot slower due to the overhead of a very small memcpy().
43e2c1d8: e28d1008 add r1, sp, #8 43e2c1dc: e3a02004 mov r2, #4 43e2c1e0: e6bf0f30 rev r0, r0 43e2c1e4: e5210004 str r0, [r1, #-4]! 43e2c1e8: e1a00004 mov r0, r4 43e2c1ec: eb001af7 bl 43e32dd0 <memcpy>
How about replacing the memcpy with an explicit put_unaligned(), similar to what was done in
http://www.mail-archive.com/u-boot@lists.denx.de/msg109555.html
with get_unaligned()? The code will be longer than above, but shorter than the above plus the memcpy(), and faster too -- actually, I'm surprised that the compiler does not unroll the memcpy() on its own, considering the size argument is a constant.
Do you mean like this?
#ifdef USE_HOSTCC crc = htobe32(crc); memcpy(output, &crc, sizeof(crc)); #else crc = cpu_to_be32(crc); put_unaligned(crc, (uint32_t *)output); #endif
This produces the same code as my original patch. If this is acceptable then I will do that, although it doesn't really seem any better.
Regards, Simon
[and sorry for my dup]

Hi Simon,
On Wed, 17 Apr 2013 13:59:48 -0700, Simon Glass sjg@chromium.org wrote:
Hi Albert,
On Wed, Apr 17, 2013 at 12:23 PM, Albert ARIBAUD albert.u.boot@aribaud.net wrote:
Hi Simon -- and sorry for the dupe.
On Wed, 17 Apr 2013 11:28:07 -0700, Simon Glass sjg@chromium.org wrote:
I tried using:
#ifdef USE_HOSTCC crc = htobe32(crc); #else crc = cpu_to_be32(crc); #endif memcpy(output, &crc, sizeof(crc));
This is one instruction (4 bytes, 16%) smaller, but I suspect quite a lot slower due to the overhead of a very small memcpy().
43e2c1d8: e28d1008 add r1, sp, #8 43e2c1dc: e3a02004 mov r2, #4 43e2c1e0: e6bf0f30 rev r0, r0 43e2c1e4: e5210004 str r0, [r1, #-4]! 43e2c1e8: e1a00004 mov r0, r4 43e2c1ec: eb001af7 bl 43e32dd0 <memcpy>
How about replacing the memcpy with an explicit put_unaligned(), similar to what was done in
http://www.mail-archive.com/u-boot@lists.denx.de/msg109555.html
with get_unaligned()? The code will be longer than above, but shorter than the above plus the memcpy(), and faster too -- actually, I'm surprised that the compiler does not unroll the memcpy() on its own, considering the size argument is a constant.
Do you mean like this?
#ifdef USE_HOSTCC crc = htobe32(crc); memcpy(output, &crc, sizeof(crc)); #else crc = cpu_to_be32(crc); put_unaligned(crc, (uint32_t *)output); #endif
This produces the same code as my original patch. If this is acceptable then I will do that, although it doesn't really seem any better.
Wolfgang may not like it any more than put_unaligned_be32() as it builds upon it (and the disk patch could have used the be32 version as well). Personally, I think we should allow and use these...
... and work on optimizing their implementation for ARM; we should be able to reduce such put()s and get()s to a few instructions. And to avoid any misunderstanding, yes, I volunteer for the optimizing work. :)
Regards, Simon
[and sorry for my dup]
Actually I'm the culprit.
Amicalement,

Dear Albert ARIBAUD,
In message 20130418082027.4b5ea191@lilith you wrote:
#ifdef USE_HOSTCC crc = htobe32(crc); memcpy(output, &crc, sizeof(crc)); #else crc = cpu_to_be32(crc); put_unaligned(crc, (uint32_t *)output); #endif
This produces the same code as my original patch. If this is acceptable then I will do that, although it doesn't really seem any better.
Wolfgang may not like it any more than put_unaligned_be32() as it builds upon it (and the disk patch could have used the be32 version as
Indeed.
well). Personally, I think we should allow and use these...
... and work on optimizing their implementation for ARM; we should be able to reduce such put()s and get()s to a few instructions. And to
OK - and what about the other architectures that suffer from the same issues?
avoid any misunderstanding, yes, I volunteer for the optimizing work. :)
I really dislike introducing such custom functions when we have standard functions available that implement the same purposes.
Checking Linux code (as U-Boot is not representative here):
-> find * -name '*.c' | wc -l 18362 -> find * -name '*.c' | xargs fgrep -l put_unaligned | wc -l 136
i. e. just 0.75% of the source files actually use any of the "put_unaligned*()" variants - it is a highly exotic function.
htobe32() is even worse - just a single source file in the whole Lnux tree uses it (arch/um/drivers/cow_user.c).
Can we not stick to standard functions? Instead of htobe32() we should use htonl() which is available both in U-Boot and in the host environment.
Best regards,
Wolfgang Denk

Hi Wolfgang,
On Thu, 18 Apr 2013 12:36:00 +0200, Wolfgang Denk wd@denx.de wrote:
Dear Albert ARIBAUD,
In message 20130418082027.4b5ea191@lilith you wrote:
#ifdef USE_HOSTCC crc = htobe32(crc); memcpy(output, &crc, sizeof(crc)); #else crc = cpu_to_be32(crc); put_unaligned(crc, (uint32_t *)output); #endif
This produces the same code as my original patch. If this is acceptable then I will do that, although it doesn't really seem any better.
Wolfgang may not like it any more than put_unaligned_be32() as it builds upon it (and the disk patch could have used the be32 version as
Indeed.
well). Personally, I think we should allow and use these...
... and work on optimizing their implementation for ARM; we should be able to reduce such put()s and get()s to a few instructions. And to
OK - and what about the other architectures that suffer from the same issues?
They should/could provide their own optimized versions, obviously.
avoid any misunderstanding, yes, I volunteer for the optimizing work. :)
I really dislike introducing such custom functions when we have standard functions available that implement the same purposes.
I understand the point; this is a classical opposition of generic vs optimized.
Checking Linux code (as U-Boot is not representative here):
-> find * -name '*.c' | wc -l 18362 -> find * -name '*.c' | xargs fgrep -l put_unaligned | wc -l 136
i. e. just 0.75% of the source files actually use any of the "put_unaligned*()" variants - it is a highly exotic function.
htobe32() is even worse - just a single source file in the whole Lnux tree uses it (arch/um/drivers/cow_user.c).
Can we not stick to standard functions? Instead of htobe32() we should use htonl() which is available both in U-Boot and in the host environment.
Note that there are two problems here: endianness conversion and unaligned storage. We can indeed use htoln() instead of htobe32(), but that only affects/solved endianness issues, not alignment ones, for which there is no solution that is efficient across all ARM targets.
Note too that if the hash infrastructure mandated that the output buffer be 4-byte-aligned, i.e. a u32* or similar, we would not even have to worry about unalignment; such a constraint on the output buffer makes all the more sense if we consider the fact that current hash sizes are 4 (crc32), 20 (SHA1) and 32 (SHA256), all multiples of 4, and future hashes will most certainly not be less aligned.
So how about changing the element type of output in the definition of hash_func_ws, adapting the corresponding implementations sha1_csum_wd, sha256_csum_wd and crc32_wd_buf, and adapting the output argument of the sole call to hash_func_ws, that is, the local "u8 output[HASH_MAX_DIGEST_SIZE];" in hash.c? Then we'be done with alignment.
Note finally that we *need* unaligned access macros/inline functions/whatever, if only for exceptions laid out in doc/README.arm-unaligned-accesses. If we avoid calling them too often, we can live with generic.
Best regards,
Wolfgang Denk
Amicalement,

Dear Albert,
In message 20130418131810.38c916b8@lilith you wrote:
OK - and what about the other architectures that suffer from the same issues?
They should/could provide their own optimized versions, obviously.
Well, if this was a generally needed or even useful service, that might make sense. But it is highly exotic stuff...
I really dislike introducing such custom functions when we have standard functions available that implement the same purposes.
I understand the point; this is a classical opposition of generic vs optimized.
Not really. It is more opposition against premature optimization at the cost of non-standard code.
Is there any justification that we really need hightly optimized versus generic code here? Is this used anywhere in a place where performance is critical?
Can we not stick to standard functions? Instead of htobe32() we should use htonl() which is available both in U-Boot and in the host environment.
Note that there are two problems here: endianness conversion and unaligned storage. We can indeed use htoln() instead of htobe32(), but that only affects/solved endianness issues, not alignment ones, for which there is no solution that is efficient across all ARM targets.
Correct. But the htobe32() approach was cobining this with the generic memcpy(), too, so we would do the same with htoln().
So how about changing the element type of output in the definition of hash_func_ws, adapting the corresponding implementations sha1_csum_wd, sha256_csum_wd and crc32_wd_buf, and adapting the output argument of the sole call to hash_func_ws, that is, the local "u8 output[HASH_MAX_DIGEST_SIZE];" in hash.c? Then we'be done with alignment.
OK, but that would be a totally different approach (which appears to be a good one, here).
Note finally that we *need* unaligned access macros/inline functions/whatever, if only for exceptions laid out in doc/README.arm-unaligned-accesses. If we avoid calling them too often, we can live with generic.
Agreed; the focus should always be on avoidance, though.
Best regards,
Wolfgang Denk

Hi Wolfgang,
On Thu, 18 Apr 2013 16:39:09 +0200, Wolfgang Denk wd@denx.de wrote:
So how about changing the element type of output in the definition of hash_func_ws, adapting the corresponding implementations sha1_csum_wd, sha256_csum_wd and crc32_wd_buf, and adapting the output argument of the sole call to hash_func_ws, that is, the local "u8 output[HASH_MAX_DIGEST_SIZE];" in hash.c? Then we'be done with alignment.
OK, but that would be a totally different approach (which appears to be a good one, here).
Indeed; I would suggest splitting this change in two independent ones:
- fix the endianness issue: change the endianness of crc through the use of htonl() but leave the existing memcpy() in place as it is, even though it is not speed-optimized. That's what Simon's patch does if the HOSTCC part is ignored;
- fix the unalignment issue by changing parameter 'output' of function type 'hash_func_ws' from u8* to u32* and adjusting the rest of the code accordingly -- which would lead to replacing the crc32 final memcpy() with a single indirect assignment.
These two changes could be submitted either separately, or as a series.
Amicalement,

On Thu, Apr 18, 2013 at 06:39:29PM +0200, Albert ARIBAUD wrote:
Hi Wolfgang,
On Thu, 18 Apr 2013 16:39:09 +0200, Wolfgang Denk wd@denx.de wrote:
So how about changing the element type of output in the definition of hash_func_ws, adapting the corresponding implementations sha1_csum_wd, sha256_csum_wd and crc32_wd_buf, and adapting the output argument of the sole call to hash_func_ws, that is, the local "u8 output[HASH_MAX_DIGEST_SIZE];" in hash.c? Then we'be done with alignment.
OK, but that would be a totally different approach (which appears to be a good one, here).
Indeed; I would suggest splitting this change in two independent ones:
fix the endianness issue: change the endianness of crc through the use of htonl() but leave the existing memcpy() in place as it is, even though it is not speed-optimized. That's what Simon's patch does if the HOSTCC part is ignored;
fix the unalignment issue by changing parameter 'output' of function type 'hash_func_ws' from u8* to u32* and adjusting the rest of the code accordingly -- which would lead to replacing the crc32 final memcpy() with a single indirect assignment.
These two changes could be submitted either separately, or as a series.
Now so that I'm clear, if we don't do anything about the unaligned issue today, it's just slow, not an unaligned access that leads to abort, right? So part one today for release, part two next week after release.

Hi Tom,
On Thu, 18 Apr 2013 12:58:54 -0400, Tom Rini trini@ti.com wrote:
On Thu, Apr 18, 2013 at 06:39:29PM +0200, Albert ARIBAUD wrote:
Hi Wolfgang,
On Thu, 18 Apr 2013 16:39:09 +0200, Wolfgang Denk wd@denx.de wrote:
So how about changing the element type of output in the definition of hash_func_ws, adapting the corresponding implementations sha1_csum_wd, sha256_csum_wd and crc32_wd_buf, and adapting the output argument of the sole call to hash_func_ws, that is, the local "u8 output[HASH_MAX_DIGEST_SIZE];" in hash.c? Then we'be done with alignment.
OK, but that would be a totally different approach (which appears to be a good one, here).
Indeed; I would suggest splitting this change in two independent ones:
fix the endianness issue: change the endianness of crc through the use of htonl() but leave the existing memcpy() in place as it is, even though it is not speed-optimized. That's what Simon's patch does if the HOSTCC part is ignored;
fix the unalignment issue by changing parameter 'output' of function type 'hash_func_ws' from u8* to u32* and adjusting the rest of the code accordingly -- which would lead to replacing the crc32 final memcpy() with a single indirect assignment.
These two changes could be submitted either separately, or as a series.
Now so that I'm clear, if we don't do anything about the unaligned issue today, it's just slow, not an unaligned access that leads to abort, right? So part one today for release, part two next week after release.
Yes, the current code is just slower than it could be, but works, and so would it with Simon's patch as long as it keeps the memcpy().
Amicalement,

Hi,
On Thu, Apr 18, 2013 at 11:43 AM, Albert ARIBAUD albert.u.boot@aribaud.net wrote:
Hi Tom,
On Thu, 18 Apr 2013 12:58:54 -0400, Tom Rini trini@ti.com wrote:
On Thu, Apr 18, 2013 at 06:39:29PM +0200, Albert ARIBAUD wrote:
Hi Wolfgang,
On Thu, 18 Apr 2013 16:39:09 +0200, Wolfgang Denk wd@denx.de wrote:
So how about changing the element type of output in the definition of hash_func_ws, adapting the corresponding implementations sha1_csum_wd, sha256_csum_wd and crc32_wd_buf, and adapting the output argument of the sole call to hash_func_ws, that is, the local "u8 output[HASH_MAX_DIGEST_SIZE];" in hash.c? Then we'be done with alignment.
OK, but that would be a totally different approach (which appears to be a good one, here).
Indeed; I would suggest splitting this change in two independent ones:
fix the endianness issue: change the endianness of crc through the use of htonl() but leave the existing memcpy() in place as it is, even though it is not speed-optimized. That's what Simon's patch does if the HOSTCC part is ignored;
fix the unalignment issue by changing parameter 'output' of function type 'hash_func_ws' from u8* to u32* and adjusting the rest of the code accordingly -- which would lead to replacing the crc32 final memcpy() with a single indirect assignment.
These two changes could be submitted either separately, or as a series.
Now so that I'm clear, if we don't do anything about the unaligned issue today, it's just slow, not an unaligned access that leads to abort, right? So part one today for release, part two next week after release.
Yes, the current code is just slower than it could be, but works, and so would it with Simon's patch as long as it keeps the memcpy().
Yes the alignment issue is manufactured IMO by the char * used for the function signature and I did not feel comfortable declaring that it must be word aligned. To be it seems that the type should be naturally aligned. What do people think about that?
Anyway, that's why I ended up with this patch, which I felt was small enough to be accepted as a bug fix for the release.
Albert's email exactly mirrors my thinking on this - and yes I would be much more comfortable not having to create part 2 for the release.
Please let me know what I should do with this patch for now.
Regards, Simon

On Thu, Apr 18, 2013 at 12:06:47PM -0700, Simon Glass wrote:
Hi,
On Thu, Apr 18, 2013 at 11:43 AM, Albert ARIBAUD albert.u.boot@aribaud.net wrote:
Hi Tom,
On Thu, 18 Apr 2013 12:58:54 -0400, Tom Rini trini@ti.com wrote:
On Thu, Apr 18, 2013 at 06:39:29PM +0200, Albert ARIBAUD wrote:
Hi Wolfgang,
On Thu, 18 Apr 2013 16:39:09 +0200, Wolfgang Denk wd@denx.de wrote:
So how about changing the element type of output in the definition of hash_func_ws, adapting the corresponding implementations sha1_csum_wd, sha256_csum_wd and crc32_wd_buf, and adapting the output argument of the sole call to hash_func_ws, that is, the local "u8 output[HASH_MAX_DIGEST_SIZE];" in hash.c? Then we'be done with alignment.
OK, but that would be a totally different approach (which appears to be a good one, here).
Indeed; I would suggest splitting this change in two independent ones:
fix the endianness issue: change the endianness of crc through the use of htonl() but leave the existing memcpy() in place as it is, even though it is not speed-optimized. That's what Simon's patch does if the HOSTCC part is ignored;
fix the unalignment issue by changing parameter 'output' of function type 'hash_func_ws' from u8* to u32* and adjusting the rest of the code accordingly -- which would lead to replacing the crc32 final memcpy() with a single indirect assignment.
These two changes could be submitted either separately, or as a series.
Now so that I'm clear, if we don't do anything about the unaligned issue today, it's just slow, not an unaligned access that leads to abort, right? So part one today for release, part two next week after release.
Yes, the current code is just slower than it could be, but works, and so would it with Simon's patch as long as it keeps the memcpy().
Yes the alignment issue is manufactured IMO by the char * used for the function signature and I did not feel comfortable declaring that it must be word aligned. To be it seems that the type should be naturally aligned. What do people think about that?
Anyway, that's why I ended up with this patch, which I felt was small enough to be accepted as a bug fix for the release.
Albert's email exactly mirrors my thinking on this - and yes I would be much more comfortable not having to create part 2 for the release.
Please let me know what I should do with this patch for now.
Lets follow the outline Albert made above, two patches, #2 depends on #1 and #1 is now, #2 is next week.
participants (5)
-
Albert ARIBAUD
-
Allen Martin
-
Simon Glass
-
Tom Rini
-
Wolfgang Denk