[U-Boot] [PATCH] crc32: Correct endianness of crc32 result

older
[U-Boot] [RFC] command/cache: Add...

Simon Glass

6 Apr 2013 6 Apr '13

1:11 a.m.

When crc32 is handled by the hash library, it requires the data to be in big-endian format, since it reads it byte-wise. Thus at present the 'crc32' command reports incorrect data. For example, previously we might see:

Peach # crc32 40000000 100 CRC32 for 40000000 ... 400000ff ==> 0d968558

but instead with the hash library we see:

Peach # crc32 40000000 100 CRC32 for 40000000 ... 400000ff ==> 5885960d

Correct this.

Signed-off-by: Simon Glass sjg@chromium.org Reviewed-by: Vadim Bendebury vbendeb@google.com --- lib/crc32.c | 6 ++++++ 1 file changed, 6 insertions(+)

diff --git a/lib/crc32.c b/lib/crc32.c index 76205da..94720bf 100644 --- a/lib/crc32.c +++ b/lib/crc32.c @@ -10,6 +10,7 @@

#ifndef USE_HOSTCC #include <common.h> +#include <asm/unaligned.h> #endif #include <compiler.h> #include <u-boot/crc.h> @@ -256,5 +257,10 @@ void crc32_wd_buf(const unsigned char *input, unsigned int ilen, uint32_t crc;

crc = crc32_wd(0, input, ilen, chunk_sz); +#ifdef USE_HOSTCC + crc = htobe32(crc); memcpy(output, &crc, sizeof(crc)); +#else + put_unaligned_be32(crc, output); +#endif }

-- 1.8.1.3

Show replies by date

Allen Martin

6 Apr 6 Apr

1:32 a.m.

On Fri, Apr 05, 2013 at 04:11:10PM -0700, Simon Glass wrote:

...

When crc32 is handled by the hash library, it requires the data to be in big-endian format, since it reads it byte-wise. Thus at present the 'crc32' command reports incorrect data. For example, previously we might see:

Peach # crc32 40000000 100 CRC32 for 40000000 ... 400000ff ==> 0d968558

but instead with the hash library we see:

Peach # crc32 40000000 100 CRC32 for 40000000 ... 400000ff ==> 5885960d

Correct this.

Signed-off-by: Simon Glass sjg@chromium.org Reviewed-by: Vadim Bendebury vbendeb@google.com

lib/crc32.c | 6 ++++++ 1 file changed, 6 insertions(+)

diff --git a/lib/crc32.c b/lib/crc32.c index 76205da..94720bf 100644 --- a/lib/crc32.c +++ b/lib/crc32.c @@ -10,6 +10,7 @@

#ifndef USE_HOSTCC #include <common.h> +#include <asm/unaligned.h> #endif #include <compiler.h> #include <u-boot/crc.h> @@ -256,5 +257,10 @@ void crc32_wd_buf(const unsigned char *input, unsigned int ilen, uint32_t crc;

crc = crc32_wd(0, input, ilen, chunk_sz); +#ifdef USE_HOSTCC

crc = htobe32(crc); memcpy(output, &crc, sizeof(crc));

+#else

put_unaligned_be32(crc, output);

+#endif } -- 1.8.1.3

Tested on Tegra114 Dalmore, verified crc32 comes out as expected now.

Reviewed-by: Allen Martin amartin@nvidia.com Tested-by: Allen Martin amartin@nvidia.com

-- nvpublic

Wolfgang Denk

9:04 a.m.

Dear Simon Glass,

In message 1365203470-9099-1-git-send-email-sjg@chromium.org you wrote:

...

When crc32 is handled by the hash library, it requires the data to be in big-endian format, since it reads it byte-wise. Thus at present the 'crc32' command reports incorrect data. For example, previously we might see:

...

+#ifdef USE_HOSTCC

crc = htobe32(crc); memcpy(output, &crc, sizeof(crc));

+#else

put_unaligned_be32(crc, output);

+#endif

Why is this depending on USE_HOSTCC, and not on the endianess?

And why do we need the #ifdef? Can we not always use htobe32() and put_unaligned_be32() ?

Best regards,

Wolfgang Denk

-- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de "UNIX was not designed to stop you from doing stupid things, because that would also stop you from doing clever things." - Doug Gwyn

Simon Glass

16 Apr 16 Apr

11:57 p.m.

Hi Wolfgang,

On Sat, Apr 6, 2013 at 12:04 AM, Wolfgang Denk wd@denx.de wrote:

...

Dear Simon Glass,

In message 1365203470-9099-1-git-send-email-sjg@chromium.org you wrote:

...
When crc32 is handled by the hash library, it requires the data to be in big-endian format, since it reads it byte-wise. Thus at present the 'crc32' command reports incorrect data. For example, previously we might see:

...
+#ifdef USE_HOSTCC
crc = htobe32(crc);
memcpy(output, &crc, sizeof(crc));
+#else
put_unaligned_be32(crc, output);
+#endif
Why is this depending on USE_HOSTCC, and not on the endianess?

We always want big-endian in this case, since the bytes have to date been written that way by the crc32 command.

...

And why do we need the #ifdef? Can we not always use htobe32() and put_unaligned_be32() ?

Well I don't think put_unaligned_be32 is available to user space, which is the environment that the tools are built under. It is available in the kernel, but that's not our environment.

I'm not happy with this solution and would be pleased to find a better way, but I'm not sure what it is.

But this patch does fix a real bug which we should sort out before the release, one way or another.

...

Best regards,

Wolfgang Denk

Regards, Simon

Tom Rini

17 Apr 17 Apr

1 a.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

On 04/16/2013 05:57 PM, Simon Glass wrote:

...

Hi Wolfgang,

On Sat, Apr 6, 2013 at 12:04 AM, Wolfgang Denk wd@denx.de wrote:

...
Dear Simon Glass,

In message 1365203470-9099-1-git-send-email-sjg@chromium.org you wrote:

...
When crc32 is handled by the hash library, it requires the data to be in big-endian format, since it reads it byte-wise. Thus at present the 'crc32' command reports incorrect data. For example, previously we might see:

...
+#ifdef USE_HOSTCC + crc = htobe32(crc); memcpy(output, &crc, sizeof(crc)); +#else + put_unaligned_be32(crc, output); +#endif

Why is this depending on USE_HOSTCC, and not on the endianess?

We always want big-endian in this case, since the bytes have to date been written that way by the crc32 command.

In other words, this code is executed on both the host and the target, and there's not a uniform endian sanitizer function.

- -- Tom

...PGP SIGNATURE...

-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBAgAGBQJRbdghAAoJENk4IS6UOR1WzzsP/j+JhIIkp9EK4YwDQ7H6F06V i60/Y/EaVg32HAqMjo66y2GCcu7ak9fobZo3wHnAjrFkaSJWq4s+eDNOtJiHbr/O ZwQbDJ/t5Yf7vuZ1OhBTII+pYHXYcDW+30QZpBP3+ydY8Zkg7tsAcQ5rH7KdHWuD 83k7DLjv9obn85eeVikNkfB7ONNFVRug07Obcd+jbXdQamc/VWxWZyvUwDiKGCYH eqmss/hQ7o343FWKqsVNSxd3/tF7z3PNevWm83xJxlc5xbtyJ/8kad6qkkILtOGd G5OOXnL7BpSqL2mxt5ruW+cwqOnp74SvoQXuM6lNZzeAmlirAginYbnDSvjQamIy DK1BQAjodXGU7nZxffw4vKZzbGzkgRl2KkuGvQfpMJzoJRyWvrbDzIVOeF/bXXQS UEvASOFAdqKpMPNJnIm16GUtH6/OyEWK+8HInFse7K19ycM4M/TpM2dhmVZrHG0S Q+Xq7ZI6pEq0SjMIfhuCwYJS6lqbtlQ5eOk+KoTYXOWneOzOgDGKO+El53wDwKQ5 0icIUwNKn4ZKMg7HLE25Docx3Ez6OVBD2Aelz0wlc5FMWlmLrHe9oYaufDN36bBH D5XrLeBDjj89mTOHl4V0U3tZ/1iLLFqxo9RyNG6lPkLOQD62vLPxjyPYqY8Q6q85 kPdToR/o/YfFk3EsiYpD =0O7d -----END PGP SIGNATURE-----

Simon Glass

1:16 a.m.

Hi Tom,

On Tue, Apr 16, 2013 at 4:00 PM, Tom Rini trini@ti.com wrote:

...

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

On 04/16/2013 05:57 PM, Simon Glass wrote:

...
Hi Wolfgang,

On Sat, Apr 6, 2013 at 12:04 AM, Wolfgang Denk wd@denx.de wrote:

...
Dear Simon Glass,

In message 1365203470-9099-1-git-send-email-sjg@chromium.org you wrote:

...
When crc32 is handled by the hash library, it requires the data to be in big-endian format, since it reads it byte-wise. Thus at present the 'crc32' command reports incorrect data. For example, previously we might see:

...
+#ifdef USE_HOSTCC + crc = htobe32(crc); memcpy(output, &crc, sizeof(crc)); +#else + put_unaligned_be32(crc, output); +#endif

Why is this depending on USE_HOSTCC, and not on the endianess?

We always want big-endian in this case, since the bytes have to date been written that way by the crc32 command.

In other words, this code is executed on both the host and the target, and there's not a uniform endian sanitizer function.

Well to be honest, I don't think the code is needed on the host, and I could put the #ifdef around the whole function. I just thought that might be a bit short-sighted.

Regards, Simon

Wolfgang Denk

7:40 a.m.

Dear Simon Glass,

In message CAPnjgZ2jRVpQ56_EpVKUMH8E1L2LJ0HgzNCs-GrN9bXfOSXz+Q@mail.gmail.com you wrote:

...

...
...
+#ifdef USE_HOSTCC
crc = htobe32(crc);
memcpy(output, &crc, sizeof(crc));
+#else
put_unaligned_be32(crc, output);
+#endif
Why is this depending on USE_HOSTCC, and not on the endianess?
We always want big-endian in this case, since the bytes have to date been written that way by the crc32 command.

Let me rephrase. Why do we need an #ifdef here, and why depends this on USE_HOSTCC?

...

...
And why do we need the #ifdef? Can we not always use htobe32() and put_unaligned_be32() ?

Well I don't think put_unaligned_be32 is available to user space, which is the environment that the tools are built under. It is available in the kernel, but that's not our environment.

It appears put_unaligned_be32() is a widely unknown, pretty exotic function that so far has been used in ony two source files:

drivers/usb/gadget/f_mass_storage.c lib/tpm.c

The implementation (in "include/linux/unaligned/generic.h") is ugly and pretty expensive in terms of run time and memory footprint.

I would like to avoid it's usage all together.

...

I'm not happy with this solution and would be pleased to find a better way, but I'm not sure what it is.

Does the htobe32() + memcpy() approach not work everywhere?

...

But this patch does fix a real bug which we should sort out before the release, one way or another.

Agreed.

Best regards,

Wolfgang Denk

-- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de "I find this a nice feature but it is not according to the documen- tation. Or is it a BUG?" "Let's call it an accidental feature. :-)" - Larry Wall in 6909@jpl-devvax.JPL.NASA.GOV

Simon Glass

8:28 p.m.

Hi Wolfgang,

On Tue, Apr 16, 2013 at 10:40 PM, Wolfgang Denk wd@denx.de wrote:

...

Dear Simon Glass,

In message CAPnjgZ2jRVpQ56_EpVKUMH8E1L2LJ0HgzNCs-GrN9bXfOSXz+Q@mail.gmail.com you wrote:

...
...
...
+#ifdef USE_HOSTCC
crc = htobe32(crc);
memcpy(output, &crc, sizeof(crc));
+#else
put_unaligned_be32(crc, output);
+#endif
Why is this depending on USE_HOSTCC, and not on the endianess?
We always want big-endian in this case, since the bytes have to date been written that way by the crc32 command.
Let me rephrase. Why do we need an #ifdef here, and why depends this on USE_HOSTCC?

...
...
And why do we need the #ifdef? Can we not always use htobe32() and put_unaligned_be32() ?

Well I don't think put_unaligned_be32 is available to user space, which is the environment that the tools are built under. It is available in the kernel, but that's not our environment.

It appears put_unaligned_be32() is a widely unknown, pretty exotic function that so far has been used in ony two source files:
    drivers/usb/gadget/f_mass_storage.c
    lib/tpm.c
The implementation (in "include/linux/unaligned/generic.h") is ugly and pretty expensive in terms of run time and memory footprint.

I would like to avoid it's usage all together.

It ends up producing this on ARMv7:

43e2c1d8: e1a03820 lsr r3, r0, #16 43e2c1dc: e5c40003 strb r0, [r4, #3] 43e2c1e0: e5c43001 strb r3, [r4, #1] 43e2c1e4: e1a02423 lsr r2, r3, #8 43e2c1e8: e7e73450 ubfx r3, r0, #8, #8 43e2c1ec: e5c42000 strb r2, [r4] 43e2c1f0: e5c43002 strb r3, [r4, #2]

In fact with unaligned support we could optimise this.

...

...
I'm not happy with this solution and would be pleased to find a better way, but I'm not sure what it is.

Does the htobe32() + memcpy() approach not work everywhere?

But htobe32() isn't available in U-Boot.

I tried using:

#ifdef USE_HOSTCC crc = htobe32(crc); #else crc = cpu_to_be32(crc); #endif memcpy(output, &crc, sizeof(crc));

This is one instruction (4 bytes, 16%) smaller, but I suspect quite a lot slower due to the overhead of a very small memcpy().

43e2c1d8: e28d1008 add r1, sp, #8 43e2c1dc: e3a02004 mov r2, #4 43e2c1e0: e6bf0f30 rev r0, r0 43e2c1e4: e5210004 str r0, [r1, #-4]! 43e2c1e8: e1a00004 mov r0, r4 43e2c1ec: eb001af7 bl 43e32dd0 <memcpy>

...

...
But this patch does fix a real bug which we should sort out before the release, one way or another.

Agreed.

Let me know what you prefer and I will update the patch.

Regards, Simon

Albert ARIBAUD

9:23 p.m.

Hi Simon -- and sorry for the dupe.

On Wed, 17 Apr 2013 11:28:07 -0700, Simon Glass sjg@chromium.org wrote:

...

I tried using:

#ifdef USE_HOSTCC crc = htobe32(crc); #else crc = cpu_to_be32(crc); #endif memcpy(output, &crc, sizeof(crc));

This is one instruction (4 bytes, 16%) smaller, but I suspect quite a lot slower due to the overhead of a very small memcpy().

43e2c1d8: e28d1008 add r1, sp, #8 43e2c1dc: e3a02004 mov r2, #4 43e2c1e0: e6bf0f30 rev r0, r0 43e2c1e4: e5210004 str r0, [r1, #-4]! 43e2c1e8: e1a00004 mov r0, r4 43e2c1ec: eb001af7 bl 43e32dd0 <memcpy>

How about replacing the memcpy with an explicit put_unaligned(), similar to what was done in

http://www.mail-archive.com/u-boot@lists.denx.de/msg109555.html

with get_unaligned()? The code will be longer than above, but shorter than the above plus the memcpy(), and faster too -- actually, I'm surprised that the compiler does not unroll the memcpy() on its own, considering the size argument is a constant.

...

Regards, Simon

Amicalement,

-- Albert.

Simon Glass

10:59 p.m.

Hi Albert,

On Wed, Apr 17, 2013 at 12:23 PM, Albert ARIBAUD albert.u.boot@aribaud.net wrote:

...

Hi Simon -- and sorry for the dupe.

On Wed, 17 Apr 2013 11:28:07 -0700, Simon Glass sjg@chromium.org wrote:

...
I tried using:

#ifdef USE_HOSTCC crc = htobe32(crc); #else crc = cpu_to_be32(crc); #endif memcpy(output, &crc, sizeof(crc));

This is one instruction (4 bytes, 16%) smaller, but I suspect quite a lot slower due to the overhead of a very small memcpy().

43e2c1d8: e28d1008 add r1, sp, #8 43e2c1dc: e3a02004 mov r2, #4 43e2c1e0: e6bf0f30 rev r0, r0 43e2c1e4: e5210004 str r0, [r1, #-4]! 43e2c1e8: e1a00004 mov r0, r4 43e2c1ec: eb001af7 bl 43e32dd0 <memcpy>

How about replacing the memcpy with an explicit put_unaligned(), similar to what was done in

http://www.mail-archive.com/u-boot@lists.denx.de/msg109555.html

with get_unaligned()? The code will be longer than above, but shorter than the above plus the memcpy(), and faster too -- actually, I'm surprised that the compiler does not unroll the memcpy() on its own, considering the size argument is a constant.

Do you mean like this?

#ifdef USE_HOSTCC crc = htobe32(crc); memcpy(output, &crc, sizeof(crc)); #else crc = cpu_to_be32(crc); put_unaligned(crc, (uint32_t *)output); #endif

This produces the same code as my original patch. If this is acceptable then I will do that, although it doesn't really seem any better.

Regards, Simon

[and sorry for my dup]

Albert ARIBAUD

18 Apr 18 Apr

8:20 a.m.

Hi Simon,

On Wed, 17 Apr 2013 13:59:48 -0700, Simon Glass sjg@chromium.org wrote:

...

Hi Albert,

On Wed, Apr 17, 2013 at 12:23 PM, Albert ARIBAUD albert.u.boot@aribaud.net wrote:

...
Hi Simon -- and sorry for the dupe.

On Wed, 17 Apr 2013 11:28:07 -0700, Simon Glass sjg@chromium.org wrote:

...
I tried using:

#ifdef USE_HOSTCC crc = htobe32(crc); #else crc = cpu_to_be32(crc); #endif memcpy(output, &crc, sizeof(crc));

This is one instruction (4 bytes, 16%) smaller, but I suspect quite a lot slower due to the overhead of a very small memcpy().

43e2c1d8: e28d1008 add r1, sp, #8 43e2c1dc: e3a02004 mov r2, #4 43e2c1e0: e6bf0f30 rev r0, r0 43e2c1e4: e5210004 str r0, [r1, #-4]! 43e2c1e8: e1a00004 mov r0, r4 43e2c1ec: eb001af7 bl 43e32dd0 <memcpy>

How about replacing the memcpy with an explicit put_unaligned(), similar to what was done in

http://www.mail-archive.com/u-boot@lists.denx.de/msg109555.html

with get_unaligned()? The code will be longer than above, but shorter than the above plus the memcpy(), and faster too -- actually, I'm surprised that the compiler does not unroll the memcpy() on its own, considering the size argument is a constant.

Do you mean like this?

#ifdef USE_HOSTCC crc = htobe32(crc); memcpy(output, &crc, sizeof(crc)); #else crc = cpu_to_be32(crc); put_unaligned(crc, (uint32_t *)output); #endif

This produces the same code as my original patch. If this is acceptable then I will do that, although it doesn't really seem any better.

Wolfgang may not like it any more than put_unaligned_be32() as it builds upon it (and the disk patch could have used the be32 version as well). Personally, I think we should allow and use these...

... and work on optimizing their implementation for ARM; we should be able to reduce such put()s and get()s to a few instructions. And to avoid any misunderstanding, yes, I volunteer for the optimizing work. :)

...

Regards, Simon

[and sorry for my dup]

Actually I'm the culprit.

Amicalement,

-- Albert.

Wolfgang Denk

12:36 p.m.

Dear Albert ARIBAUD,

In message 20130418082027.4b5ea191@lilith you wrote:

...

...
#ifdef USE_HOSTCC crc = htobe32(crc); memcpy(output, &crc, sizeof(crc)); #else crc = cpu_to_be32(crc); put_unaligned(crc, (uint32_t *)output); #endif

This produces the same code as my original patch. If this is acceptable then I will do that, although it doesn't really seem any better.

Wolfgang may not like it any more than put_unaligned_be32() as it builds upon it (and the disk patch could have used the be32 version as

Indeed.

...

well). Personally, I think we should allow and use these...

... and work on optimizing their implementation for ARM; we should be able to reduce such put()s and get()s to a few instructions. And to

OK - and what about the other architectures that suffer from the same issues?

...

avoid any misunderstanding, yes, I volunteer for the optimizing work. :)

I really dislike introducing such custom functions when we have standard functions available that implement the same purposes.

Checking Linux code (as U-Boot is not representative here):

-> find * -name '*.c' | wc -l 18362 -> find * -name '*.c' | xargs fgrep -l put_unaligned | wc -l 136

i. e. just 0.75% of the source files actually use any of the "put_unaligned*()" variants - it is a highly exotic function.

htobe32() is even worse - just a single source file in the whole Lnux tree uses it (arch/um/drivers/cow_user.c).

Can we not stick to standard functions? Instead of htobe32() we should use htonl() which is available both in U-Boot and in the host environment.

Best regards,

Wolfgang Denk

-- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de While money can't buy happiness, it certainly lets you choose your own form of misery.

Albert ARIBAUD

1:18 p.m.

Hi Wolfgang,

On Thu, 18 Apr 2013 12:36:00 +0200, Wolfgang Denk wd@denx.de wrote:

...

Dear Albert ARIBAUD,

In message 20130418082027.4b5ea191@lilith you wrote:

...
...
#ifdef USE_HOSTCC crc = htobe32(crc); memcpy(output, &crc, sizeof(crc)); #else crc = cpu_to_be32(crc); put_unaligned(crc, (uint32_t *)output); #endif

This produces the same code as my original patch. If this is acceptable then I will do that, although it doesn't really seem any better.

Wolfgang may not like it any more than put_unaligned_be32() as it builds upon it (and the disk patch could have used the be32 version as

Indeed.

...
well). Personally, I think we should allow and use these...

... and work on optimizing their implementation for ARM; we should be able to reduce such put()s and get()s to a few instructions. And to

OK - and what about the other architectures that suffer from the same issues?

They should/could provide their own optimized versions, obviously.

...

...
avoid any misunderstanding, yes, I volunteer for the optimizing work. :)

I really dislike introducing such custom functions when we have standard functions available that implement the same purposes.

I understand the point; this is a classical opposition of generic vs optimized.

...

Checking Linux code (as U-Boot is not representative here):

-> find * -name '*.c' | wc -l 18362 -> find * -name '*.c' | xargs fgrep -l put_unaligned | wc -l 136

i. e. just 0.75% of the source files actually use any of the "put_unaligned*()" variants - it is a highly exotic function.

htobe32() is even worse - just a single source file in the whole Lnux tree uses it (arch/um/drivers/cow_user.c).

Can we not stick to standard functions? Instead of htobe32() we should use htonl() which is available both in U-Boot and in the host environment.

Note that there are two problems here: endianness conversion and unaligned storage. We can indeed use htoln() instead of htobe32(), but that only affects/solved endianness issues, not alignment ones, for which there is no solution that is efficient across all ARM targets.

Note too that if the hash infrastructure mandated that the output buffer be 4-byte-aligned, i.e. a u32* or similar, we would not even have to worry about unalignment; such a constraint on the output buffer makes all the more sense if we consider the fact that current hash sizes are 4 (crc32), 20 (SHA1) and 32 (SHA256), all multiples of 4, and future hashes will most certainly not be less aligned.

So how about changing the element type of output in the definition of hash_func_ws, adapting the corresponding implementations sha1_csum_wd, sha256_csum_wd and crc32_wd_buf, and adapting the output argument of the sole call to hash_func_ws, that is, the local "u8 output[HASH_MAX_DIGEST_SIZE];" in hash.c? Then we'be done with alignment.

Note finally that we *need* unaligned access macros/inline functions/whatever, if only for exceptions laid out in doc/README.arm-unaligned-accesses. If we avoid calling them too often, we can live with generic.

...

Best regards,

Wolfgang Denk

Amicalement,

-- Albert.

Wolfgang Denk

4:39 p.m.

Dear Albert,

In message 20130418131810.38c916b8@lilith you wrote:

...

...
OK - and what about the other architectures that suffer from the same issues?

They should/could provide their own optimized versions, obviously.

Well, if this was a generally needed or even useful service, that might make sense. But it is highly exotic stuff...

...

...
I really dislike introducing such custom functions when we have standard functions available that implement the same purposes.

I understand the point; this is a classical opposition of generic vs optimized.

Not really. It is more opposition against premature optimization at the cost of non-standard code.

Is there any justification that we really need hightly optimized versus generic code here? Is this used anywhere in a place where performance is critical?

...

...
Can we not stick to standard functions? Instead of htobe32() we should use htonl() which is available both in U-Boot and in the host environment.

Note that there are two problems here: endianness conversion and unaligned storage. We can indeed use htoln() instead of htobe32(), but that only affects/solved endianness issues, not alignment ones, for which there is no solution that is efficient across all ARM targets.

Correct. But the htobe32() approach was cobining this with the generic memcpy(), too, so we would do the same with htoln().

...

So how about changing the element type of output in the definition of hash_func_ws, adapting the corresponding implementations sha1_csum_wd, sha256_csum_wd and crc32_wd_buf, and adapting the output argument of the sole call to hash_func_ws, that is, the local "u8 output[HASH_MAX_DIGEST_SIZE];" in hash.c? Then we'be done with alignment.

OK, but that would be a totally different approach (which appears to be a good one, here).

...

Note finally that we *need* unaligned access macros/inline functions/whatever, if only for exceptions laid out in doc/README.arm-unaligned-accesses. If we avoid calling them too often, we can live with generic.

Agreed; the focus should always be on avoidance, though.

Best regards,

Wolfgang Denk

-- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de |/ ____ |/ |/ ____ |/ @~/ ,. ~@ @~/ ,. ~@ /_( __/ )_\ /_( __/ )_\ __U_/ __U_/

Albert ARIBAUD

6:39 p.m.

Hi Wolfgang,

On Thu, 18 Apr 2013 16:39:09 +0200, Wolfgang Denk wd@denx.de wrote:

...

...
So how about changing the element type of output in the definition of hash_func_ws, adapting the corresponding implementations sha1_csum_wd, sha256_csum_wd and crc32_wd_buf, and adapting the output argument of the sole call to hash_func_ws, that is, the local "u8 output[HASH_MAX_DIGEST_SIZE];" in hash.c? Then we'be done with alignment.

OK, but that would be a totally different approach (which appears to be a good one, here).

Indeed; I would suggest splitting this change in two independent ones:

- fix the endianness issue: change the endianness of crc through the use of htonl() but leave the existing memcpy() in place as it is, even though it is not speed-optimized. That's what Simon's patch does if the HOSTCC part is ignored;

- fix the unalignment issue by changing parameter 'output' of function type 'hash_func_ws' from u8* to u32* and adjusting the rest of the code accordingly -- which would lead to replacing the crc32 final memcpy() with a single indirect assignment.

These two changes could be submitted either separately, or as a series.

Amicalement,

-- Albert.

Tom Rini

6:58 p.m.

On Thu, Apr 18, 2013 at 06:39:29PM +0200, Albert ARIBAUD wrote:

...

Hi Wolfgang,

On Thu, 18 Apr 2013 16:39:09 +0200, Wolfgang Denk wd@denx.de wrote:

...
...
So how about changing the element type of output in the definition of hash_func_ws, adapting the corresponding implementations sha1_csum_wd, sha256_csum_wd and crc32_wd_buf, and adapting the output argument of the sole call to hash_func_ws, that is, the local "u8 output[HASH_MAX_DIGEST_SIZE];" in hash.c? Then we'be done with alignment.

OK, but that would be a totally different approach (which appears to be a good one, here).

Indeed; I would suggest splitting this change in two independent ones:

fix the endianness issue: change the endianness of crc through the use of htonl() but leave the existing memcpy() in place as it is, even though it is not speed-optimized. That's what Simon's patch does if the HOSTCC part is ignored;

fix the unalignment issue by changing parameter 'output' of function type 'hash_func_ws' from u8* to u32* and adjusting the rest of the code accordingly -- which would lead to replacing the crc32 final memcpy() with a single indirect assignment.

These two changes could be submitted either separately, or as a series.

Now so that I'm clear, if we don't do anything about the unaligned issue today, it's just slow, not an unaligned access that leads to abort, right? So part one today for release, part two next week after release.

-- Tom

Albert ARIBAUD

8:43 p.m.

Hi Tom,

On Thu, 18 Apr 2013 12:58:54 -0400, Tom Rini trini@ti.com wrote:

...

On Thu, Apr 18, 2013 at 06:39:29PM +0200, Albert ARIBAUD wrote:

...
Hi Wolfgang,

On Thu, 18 Apr 2013 16:39:09 +0200, Wolfgang Denk wd@denx.de wrote:

...
...
So how about changing the element type of output in the definition of hash_func_ws, adapting the corresponding implementations sha1_csum_wd, sha256_csum_wd and crc32_wd_buf, and adapting the output argument of the sole call to hash_func_ws, that is, the local "u8 output[HASH_MAX_DIGEST_SIZE];" in hash.c? Then we'be done with alignment.

OK, but that would be a totally different approach (which appears to be a good one, here).

Indeed; I would suggest splitting this change in two independent ones:

fix the endianness issue: change the endianness of crc through the use of htonl() but leave the existing memcpy() in place as it is, even though it is not speed-optimized. That's what Simon's patch does if the HOSTCC part is ignored;

fix the unalignment issue by changing parameter 'output' of function type 'hash_func_ws' from u8* to u32* and adjusting the rest of the code accordingly -- which would lead to replacing the crc32 final memcpy() with a single indirect assignment.

These two changes could be submitted either separately, or as a series.

Now so that I'm clear, if we don't do anything about the unaligned issue today, it's just slow, not an unaligned access that leads to abort, right? So part one today for release, part two next week after release.

Yes, the current code is just slower than it could be, but works, and so would it with Simon's patch as long as it keeps the memcpy().

Amicalement,

-- Albert.

Simon Glass

9:06 p.m.

Hi,

On Thu, Apr 18, 2013 at 11:43 AM, Albert ARIBAUD albert.u.boot@aribaud.net wrote:

...

Hi Tom,

On Thu, 18 Apr 2013 12:58:54 -0400, Tom Rini trini@ti.com wrote:

...
On Thu, Apr 18, 2013 at 06:39:29PM +0200, Albert ARIBAUD wrote:

...
Hi Wolfgang,

On Thu, 18 Apr 2013 16:39:09 +0200, Wolfgang Denk wd@denx.de wrote:

...
...
So how about changing the element type of output in the definition of hash_func_ws, adapting the corresponding implementations sha1_csum_wd, sha256_csum_wd and crc32_wd_buf, and adapting the output argument of the sole call to hash_func_ws, that is, the local "u8 output[HASH_MAX_DIGEST_SIZE];" in hash.c? Then we'be done with alignment.

OK, but that would be a totally different approach (which appears to be a good one, here).

Indeed; I would suggest splitting this change in two independent ones:

fix the endianness issue: change the endianness of crc through the use of htonl() but leave the existing memcpy() in place as it is, even though it is not speed-optimized. That's what Simon's patch does if the HOSTCC part is ignored;

fix the unalignment issue by changing parameter 'output' of function type 'hash_func_ws' from u8* to u32* and adjusting the rest of the code accordingly -- which would lead to replacing the crc32 final memcpy() with a single indirect assignment.

These two changes could be submitted either separately, or as a series.

Now so that I'm clear, if we don't do anything about the unaligned issue today, it's just slow, not an unaligned access that leads to abort, right? So part one today for release, part two next week after release.

Yes, the current code is just slower than it could be, but works, and so would it with Simon's patch as long as it keeps the memcpy().

Yes the alignment issue is manufactured IMO by the char * used for the function signature and I did not feel comfortable declaring that it must be word aligned. To be it seems that the type should be naturally aligned. What do people think about that?

Anyway, that's why I ended up with this patch, which I felt was small enough to be accepted as a bug fix for the release.

Albert's email exactly mirrors my thinking on this - and yes I would be much more comfortable not having to create part 2 for the release.

Please let me know what I should do with this patch for now.

Regards, Simon

Tom Rini

9:18 p.m.

On Thu, Apr 18, 2013 at 12:06:47PM -0700, Simon Glass wrote:

...

Hi,

On Thu, Apr 18, 2013 at 11:43 AM, Albert ARIBAUD albert.u.boot@aribaud.net wrote:

...
Hi Tom,

On Thu, 18 Apr 2013 12:58:54 -0400, Tom Rini trini@ti.com wrote:

...
On Thu, Apr 18, 2013 at 06:39:29PM +0200, Albert ARIBAUD wrote:

...
Hi Wolfgang,

On Thu, 18 Apr 2013 16:39:09 +0200, Wolfgang Denk wd@denx.de wrote:

...
...
So how about changing the element type of output in the definition of hash_func_ws, adapting the corresponding implementations sha1_csum_wd, sha256_csum_wd and crc32_wd_buf, and adapting the output argument of the sole call to hash_func_ws, that is, the local "u8 output[HASH_MAX_DIGEST_SIZE];" in hash.c? Then we'be done with alignment.

OK, but that would be a totally different approach (which appears to be a good one, here).

Indeed; I would suggest splitting this change in two independent ones:

fix the endianness issue: change the endianness of crc through the use of htonl() but leave the existing memcpy() in place as it is, even though it is not speed-optimized. That's what Simon's patch does if the HOSTCC part is ignored;

fix the unalignment issue by changing parameter 'output' of function type 'hash_func_ws' from u8* to u32* and adjusting the rest of the code accordingly -- which would lead to replacing the crc32 final memcpy() with a single indirect assignment.

These two changes could be submitted either separately, or as a series.

Now so that I'm clear, if we don't do anything about the unaligned issue today, it's just slow, not an unaligned access that leads to abort, right? So part one today for release, part two next week after release.

Yes, the current code is just slower than it could be, but works, and so would it with Simon's patch as long as it keeps the memcpy().

Yes the alignment issue is manufactured IMO by the char * used for the function signature and I did not feel comfortable declaring that it must be word aligned. To be it seems that the type should be naturally aligned. What do people think about that?

Anyway, that's why I ended up with this patch, which I felt was small enough to be accepted as a bug fix for the release.

Albert's email exactly mirrors my thinking on this - and yes I would be much more comfortable not having to create part 2 for the release.

Please let me know what I should do with this patch for now.

Lets follow the outline Albert made above, two patches, #2 depends on #1 and #1 is now, #2 is next week.

-- Tom

4410

Age (days ago)

4423

Last active (days ago)

List overview

Download

18 comments

5 participants

tags (0)

participants (5)

Albert ARIBAUD
Allen Martin
Simon Glass
Tom Rini
Wolfgang Denk