[U-Boot] [PATCH 1/2] ARM: Default to using optimized memset and memcpy routines

We have long had available optimized versions of the memset and memcpy functions that are borrowed from the Linux kernel. We should use these in normal conditions as the speed wins in many workflows outweigh the relatively minor size increase. However, we have a number of places where we're simply too close to size limits in SPL and must be able to make the size vs performance trade-off in those cases.

Cc: Philippe Reynes tremyfr@yahoo.fr
Cc: Eric Jarrige eric.jarrige@armadeus.org
Cc: Heiko Schocher hs@denx.de
Cc: Magnus Lilja lilja.magnus@gmail.com
Cc: Lokesh Vutla lokeshvutla@ti.com
Cc: Chander Kashyap k.chander@samsung.com
Cc: Akshay Saraswat akshay.s@samsung.com
Cc: Simon Glass sjg@chromium.org
Cc: Stefan Roese sr@denx.de
Signed-off-by: Tom Rini trini@konsulko.com
---
 arch/arm/Kconfig                | 22 ++++++++++++++++++++--
 arch/arm/lib/Makefile           |  4 ++--
 common/init/board_init.c        |  2 +-
 configs/apf27_defconfig         |  1 +
 configs/axm_defconfig           |  2 ++
 configs/corvus_defconfig        |  2 ++
 configs/mx31pdk_defconfig       |  2 ++
 configs/omap4_sdp4430_defconfig |  2 ++
 configs/smartweb_defconfig      |  2 ++
 configs/smdk5250_defconfig      |  2 ++
 configs/snow_defconfig          |  2 ++
 configs/spring_defconfig        |  2 ++
 configs/taurus_defconfig        |  2 ++
 configs/x600_defconfig          |  2 ++
 examples/api/Makefile           |  5 ++++-
 15 files changed, 48 insertions(+), 6 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 0ed36cded486..5762940b7043 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -128,7 +128,16 @@ config ENABLE_ARM_SOC_BOOT0_HOOK

 config USE_ARCH_MEMCPY
 	bool "Use an assembly optimized implementation of memcpy"
-	default y if CPU_V7
+	default y
+	depends on !ARM64
+	help
+	  Enable the generation of an optimized version of memcpy.
+	  Such implementation may be faster under some conditions
+	  but may increase the binary size.
+
+config SPL_USE_ARCH_MEMCPY
+	bool "Use an assembly optimized implementation of memcpy"
+	default y if USE_ARCH_MEMCPY
 	depends on !ARM64
 	help
 	  Enable the generation of an optimized version of memcpy.
@@ -137,7 +146,16 @@ config USE_ARCH_MEMCPY

 config USE_ARCH_MEMSET
 	bool "Use an assembly optimized implementation of memset"
-	default y if CPU_V7
+	default y
+	depends on !ARM64
+	help
+	  Enable the generation of an optimized version of memset.
+	  Such implementation may be faster under some conditions
+	  but may increase the binary size.
+
+config SPL_USE_ARCH_MEMSET
+	bool "Use an assembly optimized implementation of memset"
+	default y if USE_ARCH_MEMSET
 	depends on !ARM64
 	help
 	  Enable the generation of an optimized version of memset.
diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index 024139da25fa..166fa9e3dad0 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -30,12 +30,12 @@ obj-$(CONFIG_CMD_BOOTI) += bootm.o
 obj-$(CONFIG_CMD_BOOTM) += bootm.o
 obj-$(CONFIG_CMD_BOOTZ) += bootm.o zimage.o
 obj-$(CONFIG_SYS_L2_PL310) += cache-pl310.o
-obj-$(CONFIG_USE_ARCH_MEMSET) += memset.o
-obj-$(CONFIG_USE_ARCH_MEMCPY) += memcpy.o
 else
 obj-$(CONFIG_SPL_FRAMEWORK) += spl.o
 obj-$(CONFIG_SPL_FRAMEWORK) += zimage.o
 endif
+obj-$(CONFIG_$(SPL_)USE_ARCH_MEMSET) += memset.o
+obj-$(CONFIG_$(SPL_)USE_ARCH_MEMCPY) += memcpy.o
 obj-$(CONFIG_SEMIHOSTING) += semihosting.o

 obj-y += sections.o
diff --git a/common/init/board_init.c b/common/init/board_init.c
index ef01a9aeaad9..193d8180a9c5 100644
--- a/common/init/board_init.c
+++ b/common/init/board_init.c
@@ -17,7 +17,7 @@ DECLARE_GLOBAL_DATA_PTR;
  */
 #if !defined(CONFIG_SPL_BUILD) || \
 	(defined(CONFIG_SPL_LIBGENERIC_SUPPORT) && \
-	!defined(CONFIG_USE_ARCH_MEMSET))
+	!defined(CONFIG_SPL_USE_ARCH_MEMCPY))
 #define _USE_MEMCPY
 #endif

diff --git a/configs/apf27_defconfig b/configs/apf27_defconfig
index 2da500aec643..64040aa32102 100644
--- a/configs/apf27_defconfig
+++ b/configs/apf27_defconfig
@@ -1,4 +1,5 @@
 CONFIG_ARM=y
+# CONFIG_SPL_USE_ARCH_MEMCPY is not set
 CONFIG_TARGET_APF27=y
 CONFIG_SPL_NAND_SUPPORT=y
 CONFIG_SPL_SERIAL_SUPPORT=y
diff --git a/configs/axm_defconfig b/configs/axm_defconfig
index db988c8be7c2..539e77659288 100644
--- a/configs/axm_defconfig
+++ b/configs/axm_defconfig
@@ -1,4 +1,6 @@
 CONFIG_ARM=y
+# CONFIG_SPL_USE_ARCH_MEMCPY is not set
+# CONFIG_SPL_USE_ARCH_MEMSET is not set
 CONFIG_ARCH_AT91=y
 CONFIG_TARGET_TAURUS=y
 CONFIG_SPL_GPIO_SUPPORT=y
diff --git a/configs/corvus_defconfig b/configs/corvus_defconfig
index e33d3719b9df..fc10399844c1 100644
--- a/configs/corvus_defconfig
+++ b/configs/corvus_defconfig
@@ -1,4 +1,6 @@
 CONFIG_ARM=y
+# CONFIG_SPL_USE_ARCH_MEMCPY is not set
+# CONFIG_SPL_USE_ARCH_MEMSET is not set
 CONFIG_ARCH_AT91=y
 CONFIG_TARGET_CORVUS=y
 CONFIG_SPL_GPIO_SUPPORT=y
diff --git a/configs/mx31pdk_defconfig b/configs/mx31pdk_defconfig
index bb1f121f3f4e..59084d7b65d4 100644
--- a/configs/mx31pdk_defconfig
+++ b/configs/mx31pdk_defconfig
@@ -1,4 +1,6 @@
 CONFIG_ARM=y
+# CONFIG_SPL_USE_ARCH_MEMCPY is not set
+# CONFIG_SPL_USE_ARCH_MEMSET is not set
 CONFIG_TARGET_MX31PDK=y
 CONFIG_SPL_LIBGENERIC_SUPPORT=y
 CONFIG_SPL_NAND_SUPPORT=y
diff --git a/configs/omap4_sdp4430_defconfig b/configs/omap4_sdp4430_defconfig
index f3a8b0c8d227..862f3f01a397 100644
--- a/configs/omap4_sdp4430_defconfig
+++ b/configs/omap4_sdp4430_defconfig
@@ -1,4 +1,6 @@
 CONFIG_ARM=y
+# CONFIG_SPL_USE_ARCH_MEMCPY is not set
+# CONFIG_SPL_USE_ARCH_MEMSET is not set
 CONFIG_OMAP44XX=y
 # CONFIG_SPL_I2C_SUPPORT is not set
 # CONFIG_SPL_NAND_SUPPORT is not set
diff --git a/configs/smartweb_defconfig b/configs/smartweb_defconfig
index eab598e41fae..a0d31666e8d8 100644
--- a/configs/smartweb_defconfig
+++ b/configs/smartweb_defconfig
@@ -1,4 +1,6 @@
 CONFIG_ARM=y
+# CONFIG_SPL_USE_ARCH_MEMCPY is not set
+# CONFIG_SPL_USE_ARCH_MEMSET is not set
 CONFIG_ARCH_AT91=y
 CONFIG_TARGET_SMARTWEB=y
 CONFIG_SPL_GPIO_SUPPORT=y
diff --git a/configs/smdk5250_defconfig b/configs/smdk5250_defconfig
index 4ef41437a29d..ebcce9120468 100644
--- a/configs/smdk5250_defconfig
+++ b/configs/smdk5250_defconfig
@@ -1,4 +1,6 @@
 CONFIG_ARM=y
+# CONFIG_SPL_USE_ARCH_MEMCPY is not set
+# CONFIG_SPL_USE_ARCH_MEMSET is not set
 CONFIG_ARCH_EXYNOS=y
 CONFIG_ARCH_EXYNOS5=y
 CONFIG_TARGET_SMDK5250=y
diff --git a/configs/snow_defconfig b/configs/snow_defconfig
index ef3cfa15191b..2e06a2621c45 100644
--- a/configs/snow_defconfig
+++ b/configs/snow_defconfig
@@ -1,4 +1,6 @@
 CONFIG_ARM=y
+# CONFIG_SPL_USE_ARCH_MEMCPY is not set
+# CONFIG_SPL_USE_ARCH_MEMSET is not set
 CONFIG_ARCH_EXYNOS=y
 CONFIG_ARCH_EXYNOS5=y
 CONFIG_TARGET_SNOW=y
diff --git a/configs/spring_defconfig b/configs/spring_defconfig
index 3bd86442767f..0227d57cdb53 100644
--- a/configs/spring_defconfig
+++ b/configs/spring_defconfig
@@ -1,4 +1,6 @@
 CONFIG_ARM=y
+# CONFIG_SPL_USE_ARCH_MEMCPY is not set
+# CONFIG_SPL_USE_ARCH_MEMSET is not set
 CONFIG_ARCH_EXYNOS=y
 CONFIG_ARCH_EXYNOS5=y
 CONFIG_TARGET_SPRING=y
diff --git a/configs/taurus_defconfig b/configs/taurus_defconfig
index 793de2909f68..0f6841b21f2e 100644
--- a/configs/taurus_defconfig
+++ b/configs/taurus_defconfig
@@ -1,4 +1,6 @@
 CONFIG_ARM=y
+# CONFIG_SPL_USE_ARCH_MEMCPY is not set
+# CONFIG_SPL_USE_ARCH_MEMSET is not set
 CONFIG_ARCH_AT91=y
 CONFIG_TARGET_TAURUS=y
 CONFIG_SPL_GPIO_SUPPORT=y
diff --git a/configs/x600_defconfig b/configs/x600_defconfig
index 4b47fc68f658..28e268e19b08 100644
--- a/configs/x600_defconfig
+++ b/configs/x600_defconfig
@@ -1,4 +1,6 @@
 CONFIG_ARM=y
+# CONFIG_SPL_USE_ARCH_MEMCPY is not set
+# CONFIG_SPL_USE_ARCH_MEMSET is not set
 CONFIG_TARGET_X600=y
 CONFIG_SPL_LIBCOMMON_SUPPORT=y
 CONFIG_SPL_LIBGENERIC_SUPPORT=y
diff --git a/examples/api/Makefile b/examples/api/Makefile
index 6cffee74652f..dab6398bab82 100644
--- a/examples/api/Makefile
+++ b/examples/api/Makefile
@@ -35,6 +35,9 @@ EXT_COBJ-y += lib/string.o
 EXT_COBJ-y += lib/time.o
 EXT_COBJ-y += lib/vsprintf.o
 EXT_SOBJ-$(CONFIG_PPC) += arch/powerpc/lib/ppcstring.o
+ifeq ($(ARCH),arm)
+EXT_SOBJ-$(CONFIG_USE_ARCH_MEMSET) += arch/arm/lib/memset.o
+endif

 # Create a list of object files to be compiled
 OBJS := $(OBJ-y) $(notdir $(EXT_COBJ-y) $(EXT_SOBJ-y))
@@ -60,5 +63,5 @@ $(addprefix $(obj)/,$(notdir $(EXT_COBJ-y))): $(obj)/%.o: lib/%.c FORCE
 	$(call if_changed_rule,cc_o_c)

 # Rule to build architecture-specific library assembly files
-$(addprefix $(obj)/,$(notdir $(EXT_SOBJ-y))): $(obj)/%.o: arch/powerpc/lib/%.S FORCE
+$(addprefix $(obj)/,$(notdir $(EXT_SOBJ-y))): $(obj)/%.o: arch/$(ARCH)/lib/%.S FORCE
 	$(call if_changed_dep,as_o_S)

We can make the code read more easily here by simply using memset() always as when we don't have an optimized version of the function we will still have a version of this function around anyhow.

Cc: Simon Glass sjg@chromium.org
Signed-off-by: Tom Rini trini@konsulko.com
---
 common/init/board_init.c | 18 ------------------
 1 file changed, 18 deletions(-)

diff --git a/common/init/board_init.c b/common/init/board_init.c
index 193d8180a9c5..bf4255b4aeba 100644
--- a/common/init/board_init.c
+++ b/common/init/board_init.c
@@ -11,16 +11,6 @@

 DECLARE_GLOBAL_DATA_PTR;

-/*
- * It isn't trivial to figure out whether memcpy() exists. The arch-specific
- * memcpy() is not normally available in SPL due to code size.
- */
-#if !defined(CONFIG_SPL_BUILD) || \
-	(defined(CONFIG_SPL_LIBGENERIC_SUPPORT) && \
-	!defined(CONFIG_SPL_USE_ARCH_MEMCPY))
-#define _USE_MEMCPY
-#endif
-
 /* Unfortunately x86 or ARM can't compile this code as gd cannot be assigned */
 #if !defined(CONFIG_X86) && !defined(CONFIG_ARM)
 __weak void arch_setup_gd(struct global_data *gd_ptr)
@@ -110,9 +100,6 @@ ulong board_init_f_alloc_reserve(ulong top)
 void board_init_f_init_reserve(ulong base)
 {
 	struct global_data *gd_ptr;
-#ifndef _USE_MEMCPY
-	int *ptr;
-#endif

 	/*
 	 * clear GD entirely and set it up.
@@ -121,12 +108,7 @@ void board_init_f_init_reserve(ulong base)

 	gd_ptr = (struct global_data *)base;
 	/* zero the area */
-#ifdef _USE_MEMCPY
 	memset(gd_ptr, '\0', sizeof(*gd));
-#else
-	for (ptr = (int *)gd_ptr; ptr < (int *)(gd_ptr + 1); )
-		*ptr++ = 0;
-#endif
 	/* set GD unless architecture did it already */
 #if !defined(CONFIG_ARM)
 	arch_setup_gd(gd_ptr);
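
As a rough illustration of why dropping the open-coded loop should be safe, here is a minimal, self-contained C sketch meant for a host compiler. The names struct fake_gd, zero_by_words(), zero_by_memset() and zero_methods_agree() are hypothetical stand-ins and not U-Boot code; the real struct global_data has a different layout. The sketch only checks that the int-by-int loop removed above and a plain memset() leave the same all-zero bytes behind.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct global_data; int-only so that the
 * int-sized stores below cover it exactly, with no padding. */
struct fake_gd {
	int flags;
	int baudrate;
	int have_console;
	int env_valid;
};

/* The open-coded variant the patch removes: clear one int at a time. */
void zero_by_words(struct fake_gd *gd_ptr)
{
	int *ptr;

	for (ptr = (int *)gd_ptr; ptr < (int *)(gd_ptr + 1); )
		*ptr++ = 0;
}

/* The variant the patch keeps: a single memset() call. */
void zero_by_memset(struct fake_gd *gd_ptr)
{
	memset(gd_ptr, '\0', sizeof(*gd_ptr));
}

/* Returns 1 when both variants leave identical, all-zero bytes behind. */
int zero_methods_agree(void)
{
	struct fake_gd a, b;
	const unsigned char *bytes = (const unsigned char *)&a;
	size_t i;

	/* The word loop only covers the struct if its size is a multiple of int. */
	assert(sizeof(struct fake_gd) % sizeof(int) == 0);

	/* Start from a non-zero pattern so the clearing is observable. */
	memset(&a, 0xa5, sizeof(a));
	memset(&b, 0xa5, sizeof(b));

	zero_by_words(&a);
	zero_by_memset(&b);

	for (i = 0; i < sizeof(a); i++)
		if (bytes[i] != 0)
			return 0;

	return memcmp(&a, &b, sizeof(a)) == 0;
}
```

Calling zero_methods_agree() from a small test harness is enough to exercise both paths. This says nothing about code size or speed on a given board, which is the trade-off the SPL_USE_ARCH_* options in patch 1/2 are meant to expose.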

On 12 January 2017 at 11:16, Tom Rini trini@konsulko.com wrote:
We can make the code read more easily here by simply using memset() always as when we don't have an optimized version of the function we will still have a version of this function around anyhow.
Cc: Simon Glass sjg@chromium.org
Signed-off-by: Tom Rini trini@konsulko.com

common/init/board_init.c | 18 ------------------
1 file changed, 18 deletions(-)
I recall this didn't work before, but I'm pleased it now does.
Reviewed-by: Simon Glass sjg@chromium.org

On Thu, Jan 19, 2017 at 06:57:51AM -0700, Simon Glass wrote:
On 12 January 2017 at 11:16, Tom Rini trini@konsulko.com wrote:
We can make the code read more easily here by simply using memset() always as when we don't have an optimized version of the function we will still have a version of this function around anyhow.
Cc: Simon Glass sjg@chromium.org
Signed-off-by: Tom Rini trini@konsulko.com

common/init/board_init.c | 18 ------------------
1 file changed, 18 deletions(-)
I recall this didn't work before, but I'm pleased it now does.
Do you recall where, if it was a runtime rather than build time failure? There was a case or two of build-time failure I had to address.

Hi Tom,
On 19 January 2017 at 07:25, Tom Rini trini@konsulko.com wrote:
On Thu, Jan 19, 2017 at 06:57:51AM -0700, Simon Glass wrote:
On 12 January 2017 at 11:16, Tom Rini trini@konsulko.com wrote:
We can make the code read more easily here by simply using memset() always as when we don't have an optimized version of the function we will still have a version of this function around anyhow.
Cc: Simon Glass sjg@chromium.org
Signed-off-by: Tom Rini trini@konsulko.com

common/init/board_init.c | 18 ------------------
1 file changed, 18 deletions(-)
I recall this didn't work before, but I'm pleased it now does.
Do you recall where, if it was a runtime rather than build time failure? There was a case or two of build-time failure I had to address.
Yes it was just a build failure, so we should be fine.
Regards, Simon

On Thu, Jan 12, 2017 at 01:16:03PM -0500, Tom Rini wrote:
We can make the code read more easily here by simply using memset() always as when we don't have an optimized version of the function we will still have a version of this function around anyhow.
Cc: Simon Glass sjg@chromium.org
Signed-off-by: Tom Rini trini@konsulko.com
Reviewed-by: Simon Glass sjg@chromium.org
Applied to u-boot/master, thanks!

On 12.01.2017 19:16, Tom Rini wrote:
We have long had available optimized versions of the memset and memcpy functions that are borrowed from the Linux kernel. We should use these in normal conditions as the speed wins in many workflows outweigh the relatively minor size increase. However, we have a number of places where we're simply too close to size limits in SPL and must be able to make the size vs performance trade-off in those cases.
Cc: Philippe Reynes tremyfr@yahoo.fr
Cc: Eric Jarrige eric.jarrige@armadeus.org
Cc: Heiko Schocher hs@denx.de
Cc: Magnus Lilja lilja.magnus@gmail.com
Cc: Lokesh Vutla lokeshvutla@ti.com
Cc: Chander Kashyap k.chander@samsung.com
Cc: Akshay Saraswat akshay.s@samsung.com
Cc: Simon Glass sjg@chromium.org
Cc: Stefan Roese sr@denx.de
Signed-off-by: Tom Rini trini@konsulko.com

arch/arm/Kconfig                | 22 ++++++++++++++++++++--
arch/arm/lib/Makefile           |  4 ++--
common/init/board_init.c        |  2 +-
configs/apf27_defconfig         |  1 +
configs/axm_defconfig           |  2 ++
configs/corvus_defconfig        |  2 ++
configs/mx31pdk_defconfig       |  2 ++
configs/omap4_sdp4430_defconfig |  2 ++
configs/smartweb_defconfig      |  2 ++
configs/smdk5250_defconfig      |  2 ++
configs/snow_defconfig          |  2 ++
configs/spring_defconfig        |  2 ++
configs/taurus_defconfig        |  2 ++
configs/x600_defconfig          |  2 ++
examples/api/Makefile           |  5 ++++-
15 files changed, 48 insertions(+), 6 deletions(-)
For the x600 part:
Acked-by: Stefan Roese sr@denx.de
Thanks, Stefan

On 12 January 2017 at 11:16, Tom Rini trini@konsulko.com wrote:
We have long had available optimized versions of the memset and memcpy functions that are borrowed from the Linux kernel. We should use these in normal conditions as the speed wins in many workflows outweigh the relatively minor size increase. However, we have a number of places where we're simply too close to size limits in SPL and must be able to make the size vs performance trade-off in those cases.
Cc: Philippe Reynes tremyfr@yahoo.fr
Cc: Eric Jarrige eric.jarrige@armadeus.org
Cc: Heiko Schocher hs@denx.de
Cc: Magnus Lilja lilja.magnus@gmail.com
Cc: Lokesh Vutla lokeshvutla@ti.com
Cc: Chander Kashyap k.chander@samsung.com
Cc: Akshay Saraswat akshay.s@samsung.com
Cc: Simon Glass sjg@chromium.org
Cc: Stefan Roese sr@denx.de
Signed-off-by: Tom Rini trini@konsulko.com

arch/arm/Kconfig                | 22 ++++++++++++++++++++--
arch/arm/lib/Makefile           |  4 ++--
common/init/board_init.c        |  2 +-
configs/apf27_defconfig         |  1 +
configs/axm_defconfig           |  2 ++
configs/corvus_defconfig        |  2 ++
configs/mx31pdk_defconfig       |  2 ++
configs/omap4_sdp4430_defconfig |  2 ++
configs/smartweb_defconfig      |  2 ++
configs/smdk5250_defconfig      |  2 ++
configs/snow_defconfig          |  2 ++
configs/spring_defconfig        |  2 ++
configs/taurus_defconfig        |  2 ++
configs/x600_defconfig          |  2 ++
examples/api/Makefile           |  5 ++++-
15 files changed, 48 insertions(+), 6 deletions(-)
Reviewed-by: Simon Glass sjg@chromium.org

On Thu, Jan 12, 2017 at 01:16:02PM -0500, Tom Rini wrote:
We have long had available optimized versions of the memset and memcpy functions that are borrowed from the Linux kernel. We should use these in normal conditions as the speed wins in many workflows outweigh the relatively minor size increase. However, we have a number of places where we're simply too close to size limits in SPL and must be able to make the size vs performance trade-off in those cases.
Cc: Philippe Reynes tremyfr@yahoo.fr
Cc: Eric Jarrige eric.jarrige@armadeus.org
Cc: Heiko Schocher hs@denx.de
Cc: Magnus Lilja lilja.magnus@gmail.com
Cc: Lokesh Vutla lokeshvutla@ti.com
Cc: Chander Kashyap k.chander@samsung.com
Cc: Akshay Saraswat akshay.s@samsung.com
Cc: Simon Glass sjg@chromium.org
Cc: Stefan Roese sr@denx.de
Signed-off-by: Tom Rini trini@konsulko.com
Acked-by: Stefan Roese sr@denx.de
Reviewed-by: Simon Glass sjg@chromium.org
Applied to u-boot/master, thanks!