[U-Boot] [PATCH V2 0/3] make memcpy and memset faster

I've added 32-bit lcd support to the Nomadik (not submitted yet), and I found scrolling to be very slow, as the screen is big.
Instead of activating the "#if 0" stanza for the 32-bit scroll in lcd.c, I'd rather have a faster memcpy/memset globally. So this patch set adds ulong-wide memcpy and memset, then removes the "#if 0" part in the scroll function. For me, scrolling is 4 times faster on a 32-bit system.
V2: I incorporated most of the comments, but I didn't change the for loops to help the compiler optimize them, since nowadays gcc already compiles the loops its own way irrespective of what I write.
Similarly, I'm not interested in "4 bytes at a time, then 1 at a time" as it's quite a corner case. If such optimizations are really useful, then we'd better have hand-crafted assembly for each arch, possibly lifted from glibc.
Alessandro Rubini (3):
  memcpy: copy one word at a time if possible
  memset: fill one word at a time if possible
  lcd: remove '#if 0' 32-bit scroll, now memcpy does it

 common/lcd.c         |   21 ---------------------
 lib_generic/string.c |   34 +++++++++++++++++++++++++++++-----
 2 files changed, 29 insertions(+), 26 deletions(-)

[U-Boot] [PATCH V2 1/3] memcpy: copy one word at a time if possible

From: Alessandro Rubini <rubini@unipv.it>

Signed-off-by: Alessandro Rubini <rubini@unipv.it>
Acked-by: Andrea Gallo <andrea.gallo@stericsson.com>
---
 lib_generic/string.c |   17 +++++++++++++----
 1 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/lib_generic/string.c b/lib_generic/string.c
index 181eda6..9911941 100644
--- a/lib_generic/string.c
+++ b/lib_generic/string.c
@@ -446,12 +446,21 @@ char * bcopy(const char * src, char * dest, int count)
  * You should not use this function to access IO space, use memcpy_toio()
  * or memcpy_fromio() instead.
  */
-void * memcpy(void * dest,const void *src,size_t count)
+void * memcpy(void *dest, const void *src, size_t count)
 {
-	char *tmp = (char *) dest, *s = (char *) src;
+	char *d8 = (char *)dest, *s8 = (char *)src;
+	unsigned long *dl = (unsigned long *)dest, *sl = (unsigned long *)src;
 
+	/* if all data is aligned (common case), copy a word at a time */
+	if ( (((int)dest | (int)src | count) & (sizeof(long) - 1)) == 0) {
+		count /= sizeof(unsigned long);
+		while (count--)
+			*dl++ = *sl++;
+		return dest;
+	}
+	/* else, use 1-byte copy */
 	while (count--)
-		*tmp++ = *s++;
+		*d8++ = *s8++;
 
 	return dest;
 }
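For readers puzzling over the alignment test above: OR-ing the two addresses and the byte count means any low bit set in any of the three survives into the mask, so the result is zero only when all of them are word-aligned. Below is a minimal standalone sketch of that check, not part of the patch; the helper name and the use of uintptr_t instead of the patch's (int) casts are purely illustrative.

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper mirroring the patch's test: returns 1 when dest,
 * src and count are all multiples of the word size, i.e. the
 * word-at-a-time path is safe to take. */
static int all_word_aligned(const void *dest, const void *src, size_t count)
{
	return (((uintptr_t)dest | (uintptr_t)src | count)
		& (sizeof(long) - 1)) == 0;
}

int main(void)
{
	static long a[4], b[4];

	printf("%d\n", all_word_aligned(a, b, 16));              /* 1: all aligned */
	printf("%d\n", all_word_aligned(a, b, 15));              /* 0: odd count */
	printf("%d\n", all_word_aligned((char *)a + 1, b, 16));  /* 0: misaligned dest */
	return 0;
}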

[U-Boot] [PATCH V2 2/3] memset: fill one word at a time if possible

From: Alessandro Rubini <rubini@unipv.it>

Signed-off-by: Alessandro Rubini <rubini@unipv.it>
Acked-by: Andrea Gallo <andrea.gallo@stericsson.com>
---
 lib_generic/string.c |   17 ++++++++++++++++-
 1 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/lib_generic/string.c b/lib_generic/string.c
index 9911941..5f7aff9 100644
--- a/lib_generic/string.c
+++ b/lib_generic/string.c
@@ -404,7 +404,22 @@ char *strswab(const char *s)
 void * memset(void * s,int c,size_t count)
 {
 	char *xs = (char *) s;
-
+	unsigned long *sl = (unsigned long *) s;
+	unsigned long cl = 0;
+	int i;
+
+	/* do it one word at a time (32 bits or 64 bits) if possible */
+	if ( ((count | (int)s) & (sizeof(long) - 1)) == 0) {
+		count /= sizeof(long);
+		for (i=0; i<sizeof(long); ++i) {
+			cl <<= 8;
+			cl |= c & 0xff;
+		}
+		while (count--)
+			*sl++ = cl;
+		return s;
+	}
+	/* else, fill 8 bits at a time */
 	while (count--)
 		*xs++ = c;
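A side note on the pattern-building loop above: it shifts the accumulator left by one byte and ORs in the fill byte once per byte of a long, so e.g. c = 0xab becomes 0xabababab with 32-bit longs. A standalone sketch, not taken from the patch (the function name is made up):

#include <stdio.h>

/* Replicates the fill byte across every byte of an unsigned long,
 * the same way the patch builds 'cl' before the word-wide store loop. */
static unsigned long fill_pattern(int c)
{
	unsigned long cl = 0;
	unsigned int i;

	for (i = 0; i < sizeof(long); ++i) {
		cl <<= 8;
		cl |= c & 0xff;
	}
	return cl;
}

int main(void)
{
	/* Prints abababab with 32-bit longs, abababababababab with 64-bit. */
	printf("%lx\n", fill_pattern(0xab));
	return 0;
}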

[U-Boot] [PATCH V2 3/3] lcd: remove '#if 0' 32-bit scroll, now memcpy does it

From: Alessandro Rubini <rubini@unipv.it>

Signed-off-by: Alessandro Rubini <rubini@unipv.it>
Acked-by: Andrea Gallo <andrea.gallo@stericsson.com>
---
 common/lcd.c |   21 ---------------------
 1 files changed, 0 insertions(+), 21 deletions(-)

diff --git a/common/lcd.c b/common/lcd.c
index dc8fea6..4e31618 100644
--- a/common/lcd.c
+++ b/common/lcd.c
@@ -99,32 +99,11 @@ static int lcd_getfgcolor (void);
 
 static void console_scrollup (void)
 {
-#if 1
 	/* Copy up rows ignoring the first one */
 	memcpy (CONSOLE_ROW_FIRST, CONSOLE_ROW_SECOND, CONSOLE_SCROLL_SIZE);
 
 	/* Clear the last one */
 	memset (CONSOLE_ROW_LAST, COLOR_MASK(lcd_color_bg), CONSOLE_ROW_SIZE);
-#else
-	/*
-	 * Poor attempt to optimize speed by moving "long"s.
-	 * But the code is ugly, and not a bit faster :-(
-	 */
-	ulong *t = (ulong *)CONSOLE_ROW_FIRST;
-	ulong *s = (ulong *)CONSOLE_ROW_SECOND;
-	ulong l = CONSOLE_SCROLL_SIZE / sizeof(ulong);
-	uchar c = lcd_color_bg & 0xFF;
-	ulong val= (c<<24) | (c<<16) | (c<<8) | c;
-
-	while (l--)
-		*t++ = *s++;
-
-	t = (ulong *)CONSOLE_ROW_LAST;
-	l = CONSOLE_ROW_SIZE / sizeof(ulong);
-
-	while (l-- > 0)
-		*t++ = val;
-#endif
 }
 
 /*----------------------------------------------------------------------*/

Dear Alessandro Rubini,
In message <cover.1255000877.git.rubini@unipv.it> you wrote:

> Similarly, I'm not interested in "4 bytes at a time, then 1 at a time"
> as it's quite a corner case. If such optimizations are really useful,
> then we'd better have hand-crafted assembly for each arch, possibly
> lifted from glibc.
I disagree here, especially as the change is actually trivial to implement and probably even results in smaller code size.
Best regards,
Wolfgang Denk

On Thursday 08 October 2009 07:29:51 Alessandro Rubini wrote:
> Similarly, I'm not interested in "4 bytes at a time, then 1 at a time"
> as it's quite a corner case. If such optimizations are really useful,
> then we'd better have hand-crafted assembly for each arch, possibly
> lifted from glibc.
Why? It's trivial to implement with little code impact: have your code run while the len is larger than 4 (sizeof, whatever), then fall through to the loop that runs while the len is larger than 0, instead of returning immediately.
-mike
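For what it's worth, here is a rough sketch of the shape Mike describes, applied to memcpy: copy whole words while at least one word remains, then fall through to the byte loop for the tail instead of returning early. This is an untested illustration of the suggestion, not code from the patch series, and the function name is made up.

#include <stddef.h>

/* Sketch of the fall-through structure: when both pointers are
 * word-aligned, copy word-sized chunks while a full word remains,
 * then let the trailing bytes drop into the ordinary byte loop. */
void *memcpy_sketch(void *dest, const void *src, size_t count)
{
	char *d8 = dest;
	const char *s8 = src;

	if ((((unsigned long)dest | (unsigned long)src)
	     & (sizeof(long) - 1)) == 0) {
		unsigned long *dl = (unsigned long *)d8;
		const unsigned long *sl = (const unsigned long *)s8;

		while (count >= sizeof(long)) {
			*dl++ = *sl++;
			count -= sizeof(long);
		}
		d8 = (char *)dl;
		s8 = (const char *)sl;
	}

	/* Fall through: copy any remaining bytes one at a time. */
	while (count--)
		*d8++ = *s8++;

	return dest;
}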