[U-Boot] [PATCH V3 1/3] lib_generic memcpy: copy one word at a time if possible

From: Alessandro Rubini rubini@unipv.it
If source and destination are aligned, this copies ulong values until possible, trailing part is copied by byte. Thanks for the details to Wolfgang Denk, Mike Frysinger, Peter Tyser, Chris Moore.
Signed-off-by: Alessandro Rubini rubini@unipv.it Acked-by: Andrea Gallo andrea.gallo@stericsson.com --- lib_generic/string.c | 19 +++++++++++++++---- 1 files changed, 15 insertions(+), 4 deletions(-)
diff --git a/lib_generic/string.c b/lib_generic/string.c index 181eda6..9bee79b 100644 --- a/lib_generic/string.c +++ b/lib_generic/string.c @@ -446,12 +446,23 @@ char * bcopy(const char * src, char * dest, int count) * You should not use this function to access IO space, use memcpy_toio() * or memcpy_fromio() instead. */ -void * memcpy(void * dest,const void *src,size_t count) +void * memcpy(void *dest, const void *src, size_t count) { - char *tmp = (char *) dest, *s = (char *) src; - + unsigned long *dl = (unsigned long *)dest, *sl = (unsigned long *)src; + char *d8, *s8; + + /* while all data is aligned (common case), copy a word at a time */ + if ( (((ulong)dest | (ulong)src | count) & (sizeof(*dl) - 1)) == 0) { + while (count >= sizeof(*dl)) { + *dl++ = *sl++; + count -= sizeof(*dl); + } + } + /* copy the reset one byte at a time */ + d8 = (char *)dl; + s8 = (char *)sl; while (count--) - *tmp++ = *s++; + *d8++ = *s8++;
return dest; }

On Friday 09 October 2009 05:12:20 Alessandro Rubini wrote:
- /* while all data is aligned (common case), copy a word at a time */
- if ( (((ulong)dest | (ulong)src | count) & (sizeof(*dl) - 1)) == 0) {
i think you want to drop the count from the list, otherwise we dont consume the leading groups of 4 bytes if count isnt a multiple of 4.
otherwise, it's pretty crazy how generic yet optimized the resulting C should be :). acked-by-me. -mike

i think you want to drop the count from the list, otherwise we dont consume the leading groups of 4 bytes if count isnt a multiple of 4.
Yes, same for memset. See Wolfgang it was not 10% more? These micro optimizations are hairy, as you need to measure them to make sure they work.
Ok, V4 tomorrow. /alessandro

Alessandro Rubini a écrit :
From: Alessandro Rubini rubini@unipv.it
If source and destination are aligned, this copies ulong values until possible, trailing part is copied by byte. Thanks for the details to Wolfgang Denk, Mike Frysinger, Peter Tyser, Chris Moore.
Signed-off-by: Alessandro Rubini rubini@unipv.it Acked-by: Andrea Gallo andrea.gallo@stericsson.com
lib_generic/string.c | 19 +++++++++++++++---- 1 files changed, 15 insertions(+), 4 deletions(-)
diff --git a/lib_generic/string.c b/lib_generic/string.c index 181eda6..9bee79b 100644 --- a/lib_generic/string.c +++ b/lib_generic/string.c @@ -446,12 +446,23 @@ char * bcopy(const char * src, char * dest, int count)
- You should not use this function to access IO space, use memcpy_toio()
- or memcpy_fromio() instead.
*/ -void * memcpy(void * dest,const void *src,size_t count) +void * memcpy(void *dest, const void *src, size_t count) {
- char *tmp = (char *) dest, *s = (char *) src;
- unsigned long *dl = (unsigned long *)dest, *sl = (unsigned long *)src;
Nitpick: Are you sure the casts are necessary here ? I am not sure about U-Boot but Linux kernel maintainers frown on unnecessary casts.
- char *d8, *s8;
- /* while all data is aligned (common case), copy a word at a time */
- if ( (((ulong)dest | (ulong)src | count) & (sizeof(*dl) - 1)) == 0) {
+ if ( (((ulong)dest | (ulong)src) & (sizeof(*dl) - 1)) == 0) {
The "or" should not include count: the remaining count % sizeof(unsigned long) bytes are copied below.
while (count >= sizeof(*dl)) {
*dl++ = *sl++;
count -= sizeof(*dl);
}
- }
- /* copy the reset one byte at a time */
Nitpick: s/reset/rest/
- d8 = (char *)dl;
- s8 = (char *)sl; while (count--)
*tmp++ = *s++;
*d8++ = *s8++;
return dest;
}
Cheers, Chris

Hello Chris
- unsigned long *dl = (unsigned long *)dest, *sl = (unsigned long *)src;
Nitpick: Are you sure the casts are necessary here ?
Without the one on src it complains because of "const". So I write both for symetry.
- if ( (((ulong)dest | (ulong)src) & (sizeof(*dl) - 1)) == 0) {
The "or" should not include count: the remaining count % sizeof(unsigned long) bytes are copied below.
Yes, that's why I'm sending V4 today. Actually, I booted V3 but didn't measure it, so this bug went unnoticed. But I won't measure it today, either...
Ok for spaces around operators (even if the whole of string.c is strangely spaced, but that's historical).
thanks /alessandro

Hi Alessandro,
Alessandro Rubini a écrit :
- unsigned long *dl = (unsigned long *)dest, *sl = (unsigned long *)src;
Nitpick: Are you sure the casts are necessary here ?
Without the one on src it complains because of "const". So I write both for symetry.
Yes, of course, you and the compiler are absolutely right. Silly me, I completely overlooked the const :( To me casting away const is a cardinal sin (it rather defeats the object) and I find that the ease with which it can be done in C is frightening. So I feel obliged to (hopefully) correct my submission:
void *memcpy(void *dest, const void *src, size_t count) { char *d8; const char *s8; unsigned long *dl = dest; const unsigned long *sl = src;
/* while all data is aligned (common case), copy multiple bytes at a time */ if ( (((int)(long)dest | (int)(long)src) & (sizeof(*dl) - 1)) == 0) { while (count >= sizeof(*dl)) { *dl++ = *sl++; count -= sizeof(*dl); } }
d8 = (char *)dl; s8 = (const char *)sl;
/* copy any remaining data byte by byte */ while (count--) *d8++ = *s8++;
return dest; }
But this is still not even compile tested :( I never learn :(
Cheers, Chris
participants (4)
-
Alessandro Rubini
-
Alessandro Rubini
-
Chris Moore
-
Mike Frysinger