
Hi Simon,
On Wed, 17 Apr 2013 13:59:48 -0700, Simon Glass sjg@chromium.org wrote:
Hi Albert,
On Wed, Apr 17, 2013 at 12:23 PM, Albert ARIBAUD albert.u.boot@aribaud.net wrote:
Hi Simon -- and sorry for the dupe.
On Wed, 17 Apr 2013 11:28:07 -0700, Simon Glass sjg@chromium.org wrote:
I tried using:
#ifdef USE_HOSTCC crc = htobe32(crc); #else crc = cpu_to_be32(crc); #endif memcpy(output, &crc, sizeof(crc));
This is one instruction (4 bytes, 16%) smaller, but I suspect quite a lot slower due to the overhead of a very small memcpy().
43e2c1d8: e28d1008 add r1, sp, #8 43e2c1dc: e3a02004 mov r2, #4 43e2c1e0: e6bf0f30 rev r0, r0 43e2c1e4: e5210004 str r0, [r1, #-4]! 43e2c1e8: e1a00004 mov r0, r4 43e2c1ec: eb001af7 bl 43e32dd0 <memcpy>
How about replacing the memcpy with an explicit put_unaligned(), similar to what was done in
http://www.mail-archive.com/u-boot@lists.denx.de/msg109555.html
with get_unaligned()? The code will be longer than above, but shorter than the above plus the memcpy(), and faster too -- actually, I'm surprised that the compiler does not unroll the memcpy() on its own, considering the size argument is a constant.
Do you mean like this?
#ifdef USE_HOSTCC crc = htobe32(crc); memcpy(output, &crc, sizeof(crc)); #else crc = cpu_to_be32(crc); put_unaligned(crc, (uint32_t *)output); #endif
This produces the same code as my original patch. If this is acceptable then I will do that, although it doesn't really seem any better.
Wolfgang may not like it any more than put_unaligned_be32() as it builds upon it (and the disk patch could have used the be32 version as well). Personally, I think we should allow and use these...
... and work on optimizing their implementation for ARM; we should be able to reduce such put()s and get()s to a few instructions. And to avoid any misunderstanding, yes, I volunteer for the optimizing work. :)
Regards, Simon
[and sorry for my dup]
Actually I'm the culprit.
Amicalement,