Re: [U-Boot] [PATCH V2 1/3] memcpy: copy one word at a time if possible

Dear Alessandro Rubini,
In message <45d5e3a574bf4844f46f50b2c88054a5b28f973b.1255000877.git.rubini@ unipv.it> you wrote:
From: Alessandro Rubini rubini@unipv.it
Signed-off-by: Alessandro Rubini rubini@unipv.it Acked-by: Andrea Gallo andrea.gallo@stericsson.com
lib_generic/string.c | 17 +++++++++++++---- 1 files changed, 13 insertions(+), 4 deletions(-)
diff --git a/lib_generic/string.c b/lib_generic/string.c index 181eda6..9911941 100644 --- a/lib_generic/string.c +++ b/lib_generic/string.c @@ -446,12 +446,21 @@ char * bcopy(const char * src, char * dest, int count)
- You should not use this function to access IO space, use memcpy_toio()
- or memcpy_fromio() instead.
*/ -void * memcpy(void * dest,const void *src,size_t count) +void * memcpy(void *dest, const void *src, size_t count) {
- char *tmp = (char *) dest, *s = (char *) src;
char *d8 = (char *)dest, *s8 = (char *)src;
unsigned long *dl = (unsigned long *)dest, *sl = (unsigned long *)src;
/* if all data is aligned (common case), copy a word at a time */
if ( (((int)dest | (int)src | count) & (sizeof(long) - 1)) == 0) {
count /= sizeof(unsigned long);
while (count--)
*dl++ = *sl++;
return dest;
}
/* else, use 1-byte copy */ while (count--)
*tmp++ = *s++;
*d8++ = *s8++;
return dest;
I think we should change this if-else into a plain if, something like that:
void * memcpy(void *dest, const void *src, size_t count) { char *tmp = (char *) dest, *s = (char *) src; char *d8 = (char *)dest, *s8 = (char *)src; unsigned long *dl = (unsigned long *)dest, *sl = (unsigned long *)src;
/* while all data is aligned (common case), copy a word at a time */ if ( (((int)dest | (int)src | count) & (sizeof(long) - 1)) == 0) { while (count) { *dl++ = *sl++; count -= sizeof(unsigned long); } } while (count--) *d8++ = *s8++;
return dest; }
This way we can have both - the "long" copy of a potential aligne dfirst part, and the byte copy of any trailing (or unaligned) part.
Best regards,
Wolfgang Denk

Wolfgang Denk a écrit :
I think we should change this if-else into a plain if, something like that:
void * memcpy(void *dest, const void *src, size_t count) { char *tmp = (char *) dest, *s = (char *) src; char *d8 = (char *)dest, *s8 = (char *)src; unsigned long *dl = (unsigned long *)dest, *sl = (unsigned long *)src;
/* while all data is aligned (common case), copy a word at a time */ if ( (((int)dest | (int)src | count) & (sizeof(long) - 1)) == 0) { while (count) { *dl++ = *sl++; count -= sizeof(unsigned long); } } while (count--) *d8++ = *s8++;
return dest; }
This way we can have both - the "long" copy of a potential aligne dfirst part, and the byte copy of any trailing (or unaligned) part.
I agree wholeheartedly with the idea but shouldn't it be more like this (untested) code :
void * memcpy(void *dest, const void *src, size_t count)
{ char *d8, *s8; unsigned long *dl = dest, *sl = src;
/* while all data is aligned (common case), copy multiple bytes at a time */ if ( (((int)(long)dest | (int)(long)src) & (sizeof(*dl) - 1)) == 0) { while (count >= sizeof(*dl)) { *dl++ = *sl++; count -= sizeof(*dl); } }
d8 = (char *)dl; s8 = (char *)sl;
/* copy any remaining data byte by byte */ while (count--) *d8++ = *s8++;
return dest; }
Remarks : 1) My curious (int) (long) pointer casts are intended to avoid compiler warnings while avoiding unnecessary calculations in long. On some architectures long calculations are less efficient than int ones. In fact I wonder whether, on such architectures, it might not also be better to perform the copy with int size chunks. 2) Personally I prefer sizeof(*dl) to sizeof(unsigned long) as there is less risk of error if the type of the chunks is changed. 3) In C (but not in C++) I think the casts from void * to unsigned long * are unnecessary.
But as I said all this is completely untested :(
Cheers, Chris

Chris Moore wrote:
I agree wholeheartedly with the idea but shouldn't it be more like this (untested) code :
void * memcpy(void *dest, const void *src, size_t count)
{ char *d8, *s8; unsigned long *dl = dest, *sl = src;
In here, would it be overkill to add byte copying until data is aligned, and then fall into the aligned copy code.
In that case, you'd still gain a speed increase if you're starting at an unaligned address ?
/* while all data is aligned (common case), copy multiple bytes at a time */ if ( (((int)(long)dest | (int)(long)src) & (sizeof(*dl) - 1)) == 0) { while (count >= sizeof(*dl)) { *dl++ = *sl++; count -= sizeof(*dl); } }
d8 = (char *)dl; s8 = (char *)sl;
/* copy any remaining data byte by byte */ while (count--) *d8++ = *s8++;
return dest; }
Regards Mark

On Friday 09 October 2009 06:11:16 Mark Jackson wrote:
Chris Moore wrote:
I agree wholeheartedly with the idea but shouldn't it be more like this (untested) code :
void * memcpy(void *dest, const void *src, size_t count)
{ char *d8, *s8; unsigned long *dl = dest, *sl = src;
In here, would it be overkill to add byte copying until data is aligned, and then fall into the aligned copy code.
both addresses have to be unaligned the same ...
if ((ulong)dl & (sizeof(*dl) - 1) == (ulong)sl & (sizeof(*sl) - 1))
In that case, you'd still gain a speed increase if you're starting at an unaligned address ?
now it's a question of how often does this come up and is it worth the code size increase ? -mike

Mike Frysinger a écrit :
On Friday 09 October 2009 06:11:16 Mark Jackson wrote:
Chris Moore wrote:
I agree wholeheartedly with the idea but shouldn't it be more like this (untested) code :
void * memcpy(void *dest, const void *src, size_t count)
{ char *d8, *s8; unsigned long *dl = dest, *sl = src;
In here, would it be overkill to add byte copying until data is aligned, and then fall into the aligned copy code.
both addresses have to be unaligned the same ...
if ((ulong)dl & (sizeof(*dl) - 1) == (ulong)sl & (sizeof(*sl) - 1))
In that case, you'd still gain a speed increase if you're starting at an unaligned address ?
now it's a question of how often does this come up and is it worth the code size increase ?
Like Mike I'm not sure if it is worth it for memcpy. However it may be for memset where only the destination has to be aligned.
Chris

Dear Chris Moore,
In message 4ACEBF19.4010902@free.fr you wrote:
I agree wholeheartedly with the idea but shouldn't it be more like this (untested) code :
Thanks for catching this.
Of course you are right - my code was untested as well, as usual ;-)
Best regards,
Wolfgang Denk
participants (4)
-
Chris Moore
-
Mark Jackson
-
Mike Frysinger
-
Wolfgang Denk