
On an NXP LX2160-based platform, it has been noticed that the currently implemented memset/memcpy functions for aarch64 are suboptimal. In particular, the memset() used to clear the NXP MC firmware memory is very expensive (time-wise).
This patchset adds the optimized functions, ported from this repository: https://github.com/ARM-software/optimized-routines
As the optimized memset function makes use of the dc instruction, which requires the caches to be enabled, an additional check is added, and a simple memset version is used when the caches are disabled.
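To illustrate the control flow of that check, here is a minimal C sketch. It is not the patch's code: per the patch title the real check lives in memset-arm64.S, and the names caches_enabled(), dcache_enabled_stub, memset_simple(), memset_optimized() and memset_dispatch() are hypothetical. The cache state is a plain flag so the sketch runs on a host, and host libc memset() stands in for the imported assembly routine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the real cache-enable check (done in assembly in the patch). */
static bool dcache_enabled_stub = false;

static bool caches_enabled(void)
{
	return dcache_enabled_stub;
}

/* Simple fallback: plain byte stores only, no dc cache-maintenance instructions. */
static void *memset_simple(void *s, int c, size_t n)
{
	unsigned char *p = s;

	while (n--)
		*p++ = (unsigned char)c;
	return s;
}

/* Placeholder for the imported optimized routine (which uses the dc instruction). */
static void *memset_optimized(void *s, int c, size_t n)
{
	return memset(s, c, n);
}

/* Pick the implementation depending on whether the caches are enabled. */
static void *memset_dispatch(void *s, int c, size_t n)
{
	if (n == 0)
		return s;	/* zero-size early exit */
	assert(s != NULL);
	if (!caches_enabled())
		return memset_simple(s, c, n);
	return memset_optimized(s, c, n);
}
```

The point of the split is that the fast path may only be taken once the caches are on; with caches off, plain stores are always safe, just slower.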
Please note that checkpatch.pl complains about some issues in this imported file: arch/arm/lib/asmdefs.h. Since it is imported, I explicitly did not make any changes to it, to make potential future syncing easier.
Here are some numbers showing the speed improvements:

Current original version:
-------------------------
memset() 32 Bytes, 16M times:  time: 0.446 seconds
memset() 16MiB, 256 times:     time: 1.076 seconds
memcpy() 512MiB:               time: 0.224 seconds

New optimized version:
----------------------
memset() 32 Bytes, 16M times:  time: 0.287 seconds
memset() 16MiB, 256 times:     time: 0.292 seconds
memcpy() 512MiB:               time: 0.222 seconds
Summary: The optimized memcpy performs nearly identically to the original one, but the optimized memset is much faster for both small and big sizes: by a factor of ~1.6 for small sizes and ~3.7 for big sizes.
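The speedup factors in the summary are simply old runtime divided by new runtime. As a small sanity check, this C sketch (struct bench and speedup() are illustrative names) encodes the measured LX2160ARDB numbers and computes the ratios: about 1.55 for small memsets, about 3.68 for 16MiB memsets, and about 1.01 for memcpy.

```c
#include <assert.h>

struct bench {
	const char *name;
	double old_s;	/* current original version, seconds */
	double new_s;	/* new optimized version, seconds */
};

/* Measurements from the LX2160ARDB numbers listed above. */
static const struct bench results[] = {
	{ "memset() 32 Bytes, 16M times", 0.446, 0.287 },
	{ "memset() 16MiB, 256 times",    1.076, 0.292 },
	{ "memcpy() 512MiB",              0.224, 0.222 },
};

/* Speedup factor = old runtime / new runtime. */
static double speedup(const struct bench *b)
{
	assert(b->old_s > 0.0 && b->new_s > 0.0);
	return b->old_s / b->new_s;
}
```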
Note: These measurements were done on the NXP LX2160ARDB board.
Thanks, Stefan
Changes in v5:
- memmove is now auto-selected (or deselected) with the memcpy Kconfig selection, as its entry point is the same as memcpy's for ARM64

Changes in v4:
- Use macros instead of register names, following the optimized code
- Add zero size check

Changes in v3:
- Add memmove alias, as this function also handles memmove (overlapping buffers) in an optimized way
- Add memmove as well

Changes in v2:
- Add file names and locations and git commit ID from imported files to the commit message
- New patch
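Sharing one entry point between memcpy and memmove only works because the imported copy routine is overlap-safe. The following byte-wise C sketch (copy_overlap_safe() is a hypothetical name, and this is not the imported assembly) shows the underlying idea: choose the copy direction so that every source byte is read before it can be overwritten. A routine with this property satisfies both the memcpy and the memmove contract.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Overlap-safe copy: valid as both memcpy and memmove. */
static void *copy_overlap_safe(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	if (n == 0 || d == s)
		return dst;	/* nothing to do */
	assert(dst != NULL && src != NULL);

	if ((uintptr_t)d - (uintptr_t)s >= n) {
		/* dst is below src (unsigned wrap-around), or no overlap: copy forward */
		while (n--)
			*d++ = *s++;
	} else {
		/* dst overlaps the tail of src: copy backward from the end */
		d += n;
		s += n;
		while (n--)
			*--d = *--s;
	}
	return dst;
}
```

The real routine works on large blocks rather than single bytes, but the direction choice is what makes the memmove alias correct.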
Stefan Roese (3):
  arm64: arch/arm/lib: Add optimized memset/memcpy/memmove functions
  arm64: memset-arm64: Use simple memset when cache is disabled
  arm64: Kconfig: Enable usage of optimized memset/memcpy/memmove

 arch/arm/Kconfig              |  37 +++++-
 arch/arm/include/asm/string.h |   4 +
 arch/arm/lib/Makefile         |   5 +
 arch/arm/lib/asmdefs.h        |  98 ++++++++++++++
 arch/arm/lib/memcpy-arm64.S   | 242 ++++++++++++++++++++++++++++++++++
 arch/arm/lib/memset-arm64.S   | 148 +++++++++++++++++++++
 6 files changed, 528 insertions(+), 6 deletions(-)
 create mode 100644 arch/arm/lib/asmdefs.h
 create mode 100644 arch/arm/lib/memcpy-arm64.S
 create mode 100644 arch/arm/lib/memset-arm64.S