[U-Boot] [PATCH] arm: Add armv6 and armv7 optimized swab functions

From: Rob Herring rob.herring@calxeda.com
swab functions are heavily used by FDT code, so enable optimized assembly code for ARMv6 and later.
Signed-off-by: Rob Herring rob.herring@calxeda.com
---
 arch/arm/include/asm/byteorder.h |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)
diff --git a/arch/arm/include/asm/byteorder.h b/arch/arm/include/asm/byteorder.h
index c3489f1..9df5844 100644
--- a/arch/arm/include/asm/byteorder.h
+++ b/arch/arm/include/asm/byteorder.h
@@ -23,6 +23,22 @@
 # define __SWAB_64_THRU_32__
 #endif
 
+#if defined(__ARM_ARCH_7A__) || defined(__ARM_ARCH_6__)
+static inline __u16 __attribute__((const)) ___arch_swab16(__u16 x)
+{
+	__asm__ ("rev16 %0, %1" : "=r" (x) : "r" (x));
+	return x;
+}
+#define __arch_swab16 ___arch_swab16
+
+static inline __u32 __attribute__((const)) ___arch_swab32(__u32 x)
+{
+	__asm__ ("rev %0, %1" : "=r" (x) : "r" (x));
+	return x;
+}
+#define __arch_swab32 ___arch_swab32
+#endif
+
 #ifdef __ARMEB__
 #include <linux/byteorder/big_endian.h>
 #else
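For readers without an ARM toolchain at hand, the following is a minimal portable sketch of what the two instructions in the patch compute. It is not part of the patch: the names `ref_swab32`, `ref_swab16`, `model_rev16` and `swab_selfcheck` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse all four bytes of a 32-bit word; this is the result "rev" produces. */
static inline uint32_t ref_swab32(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) |
	       ((x & 0x0000ff00u) <<  8) |
	       ((x & 0x00ff0000u) >>  8) |
	       ((x & 0xff000000u) >> 24);
}

/* Swap the two bytes of a 16-bit value. */
static inline uint16_t ref_swab16(uint16_t x)
{
	return (uint16_t)((x << 8) | (x >> 8));
}

/*
 * Model of "rev16" acting on a full 32-bit register: it swaps the bytes
 * inside EACH halfword, so the upper halfword is transformed as well.
 */
static inline uint32_t model_rev16(uint32_t x)
{
	return ((x & 0x00ff00ffu) << 8) | ((x & 0xff00ff00u) >> 8);
}

/* Small self-check that could be called from a host-side test program. */
static void swab_selfcheck(void)
{
	assert(ref_swab32(0x12345678u) == 0x78563412u);
	assert(ref_swab16(0x1234u) == 0x3412u);
	/* With a clear upper halfword, rev16 agrees with a 16-bit swap. */
	assert(model_rev16(0x00001234u) == 0x00003412u);
}
```

Comparing the inline-assembly versions against reference functions like these on the target is one way to check that the optimized path returns the same values as the generic one.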

Dear Rob Herring,
In message 1292425994-24331-1-git-send-email-robherring2@gmail.com you wrote:
> From: Rob Herring rob.herring@calxeda.com
>
> swab functions are heavily used by FDT code, so enable optimized assembly code for ARMv6 and later.
>
> Signed-off-by: Rob Herring rob.herring@calxeda.com
> ---
>  arch/arm/include/asm/byteorder.h |   16 ++++++++++++++++
>  1 files changed, 16 insertions(+), 0 deletions(-)
Do you have any numbers showing whether this change gives a measurable improvement?
Best regards,
Wolfgang Denk

Wolfgang,
On 12/17/2010 02:21 PM, Wolfgang Denk wrote:
> Dear Rob Herring,
> In message 1292425994-24331-1-git-send-email-robherring2@gmail.com you wrote:
>> From: Rob Herring rob.herring@calxeda.com
>>
>> swab functions are heavily used by FDT code, so enable optimized assembly code for ARMv6 and later.
>>
>> Signed-off-by: Rob Herring rob.herring@calxeda.com
>> ---
>>  arch/arm/include/asm/byteorder.h |   16 ++++++++++++++++
>>  1 files changed, 16 insertions(+), 0 deletions(-)
> Do you have any numbers showing whether this change gives a measurable improvement?
I have an instruction trace capture and see repeated calls to swab32 by the fdt code. It's obvious low-hanging fruit. Booting with a device tree takes noticeably longer than booting without one, but I don't have any formal measurements.
Rob
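Since no formal measurements exist in this thread, here is a rough host-side sketch of how a swab32-heavy loop could be timed. It uses the standard C `clock()` rather than any U-Boot timer API, and the names `generic_swab32`, `swab32_sum` and `time_swab32` are hypothetical. It only shows the shape of such a measurement; real numbers would have to come from the target board.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Generic shift-and-mask byte swap, standing in for the non-optimized path. */
static inline uint32_t generic_swab32(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) |
	       ((x & 0x0000ff00u) <<  8) |
	       ((x & 0x00ff0000u) >>  8) |
	       ((x & 0xff000000u) >> 24);
}

/*
 * Sum the byte-swapped words of a buffer, loosely imitating code that
 * reads many big-endian cells. The salt is XORed into each word so that
 * repeated passes are not trivially identical, and the sum keeps the
 * compiler from discarding the swaps.
 */
static uint32_t swab32_sum(const uint32_t *buf, size_t n, uint32_t salt)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += generic_swab32(buf[i] ^ salt);
	return sum;
}

/* Return the CPU seconds spent on 'reps' passes; the result goes to *sink. */
static double time_swab32(const uint32_t *buf, size_t n, unsigned reps,
			  uint32_t *sink)
{
	clock_t start = clock();
	uint32_t acc = 0;
	unsigned r;

	for (r = 0; r < reps; r++)
		acc += swab32_sum(buf, n, r);
	*sink = acc;
	return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Swapping `generic_swab32` for the inline-assembly version on an ARMv6 or ARMv7 build and comparing the two timings would give the kind of number asked for above.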

Rob Herring robherring2@gmail.com writes:
> From: Rob Herring rob.herring@calxeda.com
>
> swab functions are heavily used by FDT code, so enable optimized assembly code for ARMv6 and later.
>
> Signed-off-by: Rob Herring rob.herring@calxeda.com
> ---
>  arch/arm/include/asm/byteorder.h |   16 ++++++++++++++++
>  1 files changed, 16 insertions(+), 0 deletions(-)
>
> diff --git a/arch/arm/include/asm/byteorder.h b/arch/arm/include/asm/byteorder.h
> index c3489f1..9df5844 100644
> --- a/arch/arm/include/asm/byteorder.h
> +++ b/arch/arm/include/asm/byteorder.h
> @@ -23,6 +23,22 @@
>  # define __SWAB_64_THRU_32__
>  #endif
>
> +#if defined(__ARM_ARCH_7A__) || defined(__ARM_ARCH_6__)
> +static inline __u16 __attribute__((const)) ___arch_swab16(__u16 x)
> +{
> +	__asm__ ("rev16 %0, %1" : "=r" (x) : "r" (x));
> +	return x;
> +}
Pay close attention to what gcc does with this as it is prone to add unnecessary masking of the low halfword. If the callers are well-behaved (argument having top halfword clear), making the parameter and return types here plain unsigned (or u32) gives better code.
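To make the two signatures being compared concrete, here is a minimal sketch. The names `rev16_reg`, `swab16_narrow` and `swab16_wide` are hypothetical, and the non-ARM branch is only a C model of the instruction so the example also builds on a host. Whether a given gcc version actually emits extra masking for the narrow variant has to be checked in the generated code.

```c
#include <stdint.h>

/* rev16 on ARMv6/ARMv7-A builds; elsewhere a C model of the same operation. */
static inline uint32_t rev16_reg(uint32_t x)
{
#if defined(__ARM_ARCH_7A__) || defined(__ARM_ARCH_6__)
	__asm__ ("rev16 %0, %1" : "=r" (x) : "r" (x));
	return x;
#else
	return ((x & 0x00ff00ffu) << 8) | ((x & 0xff00ff00u) >> 8);
#endif
}

/*
 * Variant A: 16-bit parameter and return type, as in the patch.
 * The conversion back to 16 bits is where the compiler may insert
 * an explicit mask or extend of the low halfword.
 */
static inline uint16_t swab16_narrow(uint16_t x)
{
	return (uint16_t)rev16_reg(x);
}

/*
 * Variant B: plain unsigned types, as suggested in the reply above.
 * Nothing is truncated, so the result is only a correct 16-bit swap
 * when the caller passes a value whose top halfword is clear.
 */
static inline unsigned swab16_wide(unsigned x)
{
	return rev16_reg(x);
}
```

The trade-off is visible in the second variant: a caller that passes a value with bits set above bit 15 gets those bits back byte-swapped in the upper halfword, which is why the suggestion is conditional on well-behaved callers.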

On 12/17/2010 03:27 PM, Måns Rullgård wrote:
> Rob Herring robherring2@gmail.com writes:
>> From: Rob Herring rob.herring@calxeda.com
>>
>> swab functions are heavily used by FDT code, so enable optimized assembly code for ARMv6 and later.
>>
>> Signed-off-by: Rob Herring rob.herring@calxeda.com
>> ---
>>  arch/arm/include/asm/byteorder.h |   16 ++++++++++++++++
>>  1 files changed, 16 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/byteorder.h b/arch/arm/include/asm/byteorder.h
>> index c3489f1..9df5844 100644
>> --- a/arch/arm/include/asm/byteorder.h
>> +++ b/arch/arm/include/asm/byteorder.h
>> @@ -23,6 +23,22 @@
>>  # define __SWAB_64_THRU_32__
>>  #endif
>>
>> +#if defined(__ARM_ARCH_7A__) || defined(__ARM_ARCH_6__)
>> +static inline __u16 __attribute__((const)) ___arch_swab16(__u16 x)
>> +{
>> +	__asm__ ("rev16 %0, %1" : "=r" (x) : "r" (x));
>> +	return x;
>> +}
> Pay close attention to what gcc does with this as it is prone to add unnecessary masking of the low halfword. If the callers are well-behaved (argument having top halfword clear), making the parameter and return types here plain unsigned (or u32) gives better code.
This is straight from the Linux code, and there are only a few users of swab16 (none in my build).
Rob

Rob Herring robherring2@gmail.com writes:
> On 12/17/2010 03:27 PM, Måns Rullgård wrote:
>> Rob Herring robherring2@gmail.com writes:
>>> From: Rob Herring rob.herring@calxeda.com
>>>
>>> swab functions are heavily used by FDT code, so enable optimized assembly code for ARMv6 and later.
>>>
>>> Signed-off-by: Rob Herring rob.herring@calxeda.com
>>> ---
>>>  arch/arm/include/asm/byteorder.h |   16 ++++++++++++++++
>>>  1 files changed, 16 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/arch/arm/include/asm/byteorder.h b/arch/arm/include/asm/byteorder.h
>>> index c3489f1..9df5844 100644
>>> --- a/arch/arm/include/asm/byteorder.h
>>> +++ b/arch/arm/include/asm/byteorder.h
>>> @@ -23,6 +23,22 @@
>>>  # define __SWAB_64_THRU_32__
>>>  #endif
>>>
>>> +#if defined(__ARM_ARCH_7A__) || defined(__ARM_ARCH_6__)
>>> +static inline __u16 __attribute__((const)) ___arch_swab16(__u16 x)
>>> +{
>>> +	__asm__ ("rev16 %0, %1" : "=r" (x) : "r" (x));
>>> +	return x;
>>> +}
>> Pay close attention to what gcc does with this as it is prone to add unnecessary masking of the low halfword. If the callers are well-behaved (argument having top halfword clear), making the parameter and return types here plain unsigned (or u32) gives better code.
> This is straight from the Linux code, and there are only a few users of swab16 (none in my build).
Look at the generated code if you don't believe me.

Dear Rob Herring,
In message 4D0CEB67.2040502@gmail.com you wrote:
> This is straight from the Linux code, and there are only a few users of swab16 (none in my build).
Given that we have no idea whether this code gives any measurable performance improvement, and that it appears to be dangerous as well, I am inclined not to include it as is.
Thanks.
Wolfgang Denk
participants (3)
- Måns Rullgård
- Rob Herring
- Wolfgang Denk