Re: [U-Boot] [PATCH 1/3] memcpy: use 32-bit copies if possible

On Wednesday 07 October 2009 04:44:26 Alessandro Rubini wrote:
--- a/lib_generic/string.c +++ b/lib_generic/string.c @@ -449,7 +449,16 @@ char * bcopy(const char * src, char * dest, int count) void * memcpy(void * dest,const void *src,size_t count) { char *tmp = (char *) dest, *s = (char *) src;
u32 *d32 = (u32 *)dest, *s32 = (u32 *) src;
/* if both are aligned, use 32-bit copy */
if ( (((int)dest & 3) | ((int)src & 3) | (count & 3)) == 0 ) {
while 64bit isnt in today, might as well avoid unclean code from the start when possible. in other words, used "unsigned int" rather than "u32" and cast to "unsigned long" rather than "int".
count /= 4;
count >>= 2 ? although gcc probably optimizes this properly. -mike

while 64bit isnt in today, might as well avoid unclean code from the start when possible. in other words, used "unsigned int" rather than "u32" and cast to "unsigned long" rather than "int".
You are right. Will do.
count /= 4;
count >>= 2 ? although gcc probably optimizes this properly.
gcc-4.0 for arm already turns it into a shift. I'll use "sizeof(unsigned long)" in version 2.
Thanks /alessandro

Dear Mike Frysinger,
In message 200910070452.02225.vapier@gentoo.org you wrote:
--- a/lib_generic/string.c +++ b/lib_generic/string.c @@ -449,7 +449,16 @@ char * bcopy(const char * src, char * dest, int count) void * memcpy(void * dest,const void *src,size_t count) { char *tmp = (char *) dest, *s = (char *) src;
u32 *d32 = (u32 *)dest, *s32 = (u32 *) src;
/* if both are aligned, use 32-bit copy */
if ( (((int)dest & 3) | ((int)src & 3) | (count & 3)) == 0 ) {
while 64bit isnt in today, might as well avoid unclean code from the start when possible. in other words, used "unsigned int" rather than "u32" and cast to "unsigned long" rather than "int".
count /= 4;
count >>= 2 ? although gcc probably optimizes this properly.
Neither one nor the other.
This should become
count /= sizof(int);
then.
Best regards,
Wolfgang Denk

I was making my v2, and I found a problem wrt:
while 64bit isnt in today, might as well avoid unclean code from the start when possible. in other words, used "unsigned int" rather than "u32" and cast to "unsigned long" rather than "int".
Since int is 32 also on 64bit systems, I used unsigned long. For memcpy all is well, for memset I have this problem:
void * memset(void * s,int c,size_t count) { char *xs = (char *) s; unsigned long *sl = (unsigned long *) s; unsigned long cl;
/* do it one word at a time (32 bits or 64 bits) if possible */ if ( ((count | (int)s) & (sizeof(long) - 1)) == 0) { count /= sizeof(long); cl = (c & 0xff) | ((c & 0xff) << 8); cl |= cl << 16; if (sizeof(long) > 4) cl |= cl << 32; while (count--) *sl++ = cl; return s; } /* else, fill 8 bits at a time */ while (count--) *xs++ = c;
return s; }
string.c:416: warning: left shift count >= width of type
(obviously there is no such shift in the generated code, since the condition is false at compile time).
I think I'll stick to u32 for memset, unless I get better suggestions.
/alessandro

Dear Alessandro Rubini,
In message 20091008074114.GA30203@mail.gnudd.com you wrote:
Since int is 32 also on 64bit systems, I used unsigned long.
Note that this is not guaranteed, though. It could be 64 bit as well.
/* do it one word at a time (32 bits or 64 bits) if possible */ if ( ((count | (int)s) & (sizeof(long) - 1)) == 0) { count /= sizeof(long); cl = (c & 0xff) | ((c & 0xff) << 8); cl |= cl << 16; if (sizeof(long) > 4) cl |= cl << 32;
How about:
cl = 0; for (i=0; i<sizeof(long); ++i) { cl <<= 8; cl |= c & 0xff; }
GCC optimization will do the rest...
Best regards,
Wolfgang Denk

Dear Alessandro Rubini,
In message 20091008074114.GA30203@mail.gnudd.com you wrote:
Since int is 32 also on 64bit systems, I used unsigned long.
Note that this is not guaranteed, though. It could be 64 bit as well.
/* do it one word at a time (32 bits or 64 bits) if possible */ if ( ((count | (int)s) & (sizeof(long) - 1)) == 0) { count /= sizeof(long); cl = (c & 0xff) | ((c & 0xff) << 8); cl |= cl << 16; if (sizeof(long) > 4) cl |= cl << 32;
How about:
cl = 0; for (i=0; i<sizeof(long); ++i) { cl <<= 8; cl |= c & 0xff; }
GCC optimization will do the rest...
If you want gcc to optimise well, make it easy to do so. Changing the for loop into: for (i=sizeof(long); i; --i) makes it easier for gcc and more likely to result in optimal code.
Similarly for copy ops for(--top,--fromp, i = len/4; i; --i) *++top = *++fromp; /* pre incr can be one op on some archs */
Jocke

Dear Joakim Tjernlund,
In message OF26FEA2CB.36E102EE-ONC1257649.0031301D-C1257649.0031EDAF@transmode.se you wrote:
How about:
cl = 0; for (i=0; i<sizeof(long); ++i) { cl <<= 8; cl |= c & 0xff; }
GCC optimization will do the rest...
If you want gcc to optimise well, make it easy to do so. Changing the for loop into: for (i=sizeof(long); i; --i) makes it easier for gcc and more likely to result in optimal code.
Did you actually _check_ the code? (I did).
It does not matter. The generated code is identical.
What matters might be compiler options - "-Os" generates a small loop, and "-O3" and higher will unroll the loop, which is way more code.
Best regards,
Wolfgang Denk

Wolfgang Denk wd@denx.de wrote on 08/10/2009 14:49:15:
Dear Joakim Tjernlund,
In message <OF26FEA2CB.36E102EE-ONC1257649.0031301D-C1257649. 0031EDAF@transmode.se> you wrote:
How about:
cl = 0; for (i=0; i<sizeof(long); ++i) { cl <<= 8; cl |= c & 0xff; }
GCC optimization will do the rest...
If you want gcc to optimise well, make it easy to do so. Changing the for loop into: for (i=sizeof(long); i; --i) makes it easier for gcc and more likely to result in optimal code.
Did you actually _check_ the code? (I did).
No, but I have done this in the past and found that gcc does stupid things sometimes, especially loops, and it depends on arch and gcc version.
It does not matter. The generated code is identical.
So my question is: Did you check all gcc versions and arches?
Jocke

Dear Joakim Tjernlund,
In message OF15EDFD3F.769F0248-ONC1257649.0049C8AE-C1257649.004A1776@transmode.se you wrote:
So my question is: Did you check all gcc versions and arches?
Of course not :-)
Best regards,
Wolfgang Denk

On 10/08/09 03:41, Alessandro Rubini wrote:
For memcpy all is well, for memset I have this problem:
if (sizeof(long) > 4) cl |= cl << 32;
string.c:416: warning: left shift count >= width of type
(obviously there is no such shift in the generated code, since the condition is false at compile time).
I think I'll stick to u32 for memset, unless I get better suggestions.
What about if (sizeof(long) > 4) cl |= (unsigned long long)cl << 32; ?
Eric
participants (5)
-
Alessandro Rubini
-
Eric Lammerts
-
Joakim Tjernlund
-
Mike Frysinger
-
Wolfgang Denk