[U-Boot] [PATCH v3 0/2] Fix thumb interwork in assembly code

While trying to compile u-boot in thumb due space constraints on a mxs platform it was observed that there are some thumb-interwork issues in the handwritten assembly files.
Since the first patch only applies to ARM926EJS and no board on that platform has thumb enabled for now,it was probably never observed. Strictly speaking it is not thumb specific as it can be triggered in ARM mode too. But as it was discovered by enabling thumb and no mxs board is currently using any relevant ARM code in lowlevel_init, we put the series under the thumb subject.
The second one applies to the ARM specific assembly memcpy implementation that is not enabled on any of the boards enabling thumb.
grep -l CONFIG_SYS_THUMB_BUILD=y configs/* | \ xargs grep -c CONFIG_USE_ARCH_MEMCPY configs/apalis_imx6_nospl_com_defconfig:0 configs/apalis_imx6_nospl_it_defconfig:0 configs/bk4r1_defconfig:0 configs/colibri_imx6_nospl_defconfig:0 configs/colibri_imx7_defconfig:0 configs/colibri_vf_defconfig:0 configs/highbank_defconfig:0 configs/openrd_base_defconfig:0 configs/openrd_client_defconfig:0 configs/openrd_ultimate_defconfig:0 configs/pcm052_defconfig:0 configs/tbs2910_defconfig:0 configs/x600_defconfig:0
With CONFIG_USE_ARCH_MEMCPY on our mxs platform the speedup for memcopy was about 100%.
Changes in v3: - reword commit message as it isn't thumb specific - use r10 instead of sl alias as we don't use it as stack limit pointer - revert return to a pc mov as it is a unnecessary change for this patch
Changes in v2: - use bl instead of blx to call lowlevel_init - remove mxs tag as it apply to all arm926ejs platforms - added memcpy patch to the series
Klaus Goger (2): arm: preserve lr correctly in arm926ejs startup code arm: Make arch specific memcpy thumb-safe.
arch/arm/cpu/arm926ejs/start.S | 4 ++-- arch/arm/lib/memcpy.S | 8 +++++--- 2 files changed, 7 insertions(+), 5 deletions(-)

When building the mxs platform in thumb mode gcc generates code using the intra procedure call scratch register (ip/r12) for the calling the lowlevel_init function. This modifies the lr in flush_dcache which causes u-boot proper to end in an endless loop.
40002334: e1a0c00e mov ip, lr 40002338: eb00df4c bl 4003a070 <__lowlevel_init_from_arm> 4000233c: e1a0e00c mov lr, ip 40002340: e1a0f00e mov pc, lr [...] 4003a070 <__lowlevel_init_from_arm>: 4003a070: e59fc004 ldr ip, [pc, #4] ; 4003a07c <__lowlevel_init_from_arm+0xc> 4003a074: e08fc00c add ip, pc, ip 4003a078: e12fff1c bx ip 4003a07c: fffc86cd .word 0xfffc86cd
Instead of using the the ip/r12 register we use sl/r10 to preserve the link register.
According to "Procedure Call Standard for the ARM Architecture" by ARM subroutines have to preserve the contents of register r4-r8, r10, r11 and SP. So using r12 to store the link register will fail if the called function is using r12 and not preserving it as allowed. This can happen in ARM and thumb mode but will definitely be triggered by building it in thumb. Using r10 should be safe in any case as we are startup code and not called by any C-function and no stack is set up.
Signed-off-by: Klaus Goger klaus.goger@theobroma-systems.com Signed-off-by: Christoph Muellner christoph.muellner@theobroma-systems.com
---
Changes in v3: - reword commit message as it isn't thumb specific - use r10 instead of sl alias as we don't use it as stack limit pointer - revert return to a pc mov as it is a unnecessary change for this patch
Changes in v2: - use bl instead of blx to call lowlevel_init - remove mxs tag as it apply to all arm926ejs platforms
arch/arm/cpu/arm926ejs/start.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm/cpu/arm926ejs/start.S b/arch/arm/cpu/arm926ejs/start.S index 959d1ed86d..a625b39787 100644 --- a/arch/arm/cpu/arm926ejs/start.S +++ b/arch/arm/cpu/arm926ejs/start.S @@ -105,9 +105,9 @@ flush_dcache: /* * Go setup Memory and board specific bits prior to relocation. */ - mov ip, lr /* perserve link reg across call */ + mov r10, lr /* perserve link reg across call */ bl lowlevel_init /* go setup pll,mux,memory */ - mov lr, ip /* restore link */ + mov lr, r10 /* restore link */ #endif mov pc, lr /* back to my caller */ #endif /* CONFIG_SKIP_LOWLEVEL_INIT */

On 26 Apr 2018, at 20:18, Klaus Goger klaus.goger@theobroma-systems.com wrote:
When building the mxs platform in thumb mode gcc generates code using the intra procedure call scratch register (ip/r12) for the calling the lowlevel_init function. This modifies the lr in flush_dcache which causes u-boot proper to end in an endless loop.
40002334: e1a0c00e mov ip, lr 40002338: eb00df4c bl 4003a070 <__lowlevel_init_from_arm> 4000233c: e1a0e00c mov lr, ip 40002340: e1a0f00e mov pc, lr [...] 4003a070 <__lowlevel_init_from_arm>: 4003a070: e59fc004 ldr ip, [pc, #4] ; 4003a07c <__lowlevel_init_from_arm+0xc> 4003a074: e08fc00c add ip, pc, ip 4003a078: e12fff1c bx ip 4003a07c: fffc86cd .word 0xfffc86cd
Instead of using the the ip/r12 register we use sl/r10 to preserve the link register.
According to "Procedure Call Standard for the ARM Architecture" by ARM subroutines have to preserve the contents of register r4-r8, r10, r11 and SP. So using r12 to store the link register will fail if the called function is using r12 and not preserving it as allowed.
Maybe you could rephrase this one, as the “not preserving it as allowed” made me read this twice until I realised that it is equivalent to “does not preserve (and is not required by the PCS to preserve)”?
This can happen in ARM and thumb mode but will definitely be triggered by building it in thumb. Using r10 should be safe in any case as we are startup code and not called by any C-function and no stack is set up.
Signed-off-by: Klaus Goger klaus.goger@theobroma-systems.com Signed-off-by: Christoph Muellner christoph.muellner@theobroma-systems.com
Reviewed-by: Philipp Tomsich philipp.tomsich@theobroma-systems.com
Changes in v3:
- reword commit message as it isn't thumb specific
- use r10 instead of sl alias as we don't use it as stack limit pointer
- revert return to a pc mov as it is a unnecessary change for this patch
Changes in v2:
- use bl instead of blx to call lowlevel_init
- remove mxs tag as it apply to all arm926ejs platforms
arch/arm/cpu/arm926ejs/start.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm/cpu/arm926ejs/start.S b/arch/arm/cpu/arm926ejs/start.S index 959d1ed86d..a625b39787 100644 --- a/arch/arm/cpu/arm926ejs/start.S +++ b/arch/arm/cpu/arm926ejs/start.S @@ -105,9 +105,9 @@ flush_dcache: /* * Go setup Memory and board specific bits prior to relocation. */
- mov ip, lr /* perserve link reg across call */
- mov r10, lr /* perserve link reg across call */ bl lowlevel_init /* go setup pll,mux,memory */
- mov lr, ip /* restore link */
- mov lr, r10 /* restore link */
#endif mov pc, lr /* back to my caller */
#endif /* CONFIG_SKIP_LOWLEVEL_INIT */
2.11.0
U-Boot mailing list U-Boot@lists.denx.de https://lists.denx.de/listinfo/u-boot

On 04/26/2018 09:33 PM, Dr. Philipp Tomsich wrote:
On 26 Apr 2018, at 20:18, Klaus Goger klaus.goger@theobroma-systems.com wrote:
When building the mxs platform in thumb mode gcc generates code using the intra procedure call scratch register (ip/r12) for the calling the lowlevel_init function. This modifies the lr in flush_dcache which causes u-boot proper to end in an endless loop.
40002334: e1a0c00e mov ip, lr 40002338: eb00df4c bl 4003a070 <__lowlevel_init_from_arm> 4000233c: e1a0e00c mov lr, ip 40002340: e1a0f00e mov pc, lr [...] 4003a070 <__lowlevel_init_from_arm>: 4003a070: e59fc004 ldr ip, [pc, #4] ; 4003a07c <__lowlevel_init_from_arm+0xc> 4003a074: e08fc00c add ip, pc, ip 4003a078: e12fff1c bx ip 4003a07c: fffc86cd .word 0xfffc86cd
Instead of using the the ip/r12 register we use sl/r10 to preserve the link register.
According to "Procedure Call Standard for the ARM Architecture" by ARM subroutines have to preserve the contents of register r4-r8, r10, r11 and SP. So using r12 to store the link register will fail if the called function is using r12 and not preserving it as allowed.
Maybe you could rephrase this one, as the “not preserving it as allowed” made me read this twice until I realised that it is equivalent to “does not preserve (and is not required by the PCS to preserve)”?
This can happen in ARM and thumb mode but will definitely be triggered by building it in thumb. Using r10 should be safe in any case as we are startup code and not called by any C-function and no stack is set up.
Signed-off-by: Klaus Goger klaus.goger@theobroma-systems.com Signed-off-by: Christoph Muellner christoph.muellner@theobroma-systems.com
Reviewed-by: Philipp Tomsich philipp.tomsich@theobroma-systems.com
I'd prefer to see a review from someone not from the same company :)
Changes in v3:
- reword commit message as it isn't thumb specific
- use r10 instead of sl alias as we don't use it as stack limit pointer
- revert return to a pc mov as it is a unnecessary change for this patch
Changes in v2:
- use bl instead of blx to call lowlevel_init
- remove mxs tag as it apply to all arm926ejs platforms
arch/arm/cpu/arm926ejs/start.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm/cpu/arm926ejs/start.S b/arch/arm/cpu/arm926ejs/start.S index 959d1ed86d..a625b39787 100644 --- a/arch/arm/cpu/arm926ejs/start.S +++ b/arch/arm/cpu/arm926ejs/start.S @@ -105,9 +105,9 @@ flush_dcache: /* * Go setup Memory and board specific bits prior to relocation. */
- mov ip, lr /* perserve link reg across call */
- mov r10, lr /* perserve link reg across call */ bl lowlevel_init /* go setup pll,mux,memory */
- mov lr, ip /* restore link */
- mov lr, r10 /* restore link */
#endif mov pc, lr /* back to my caller */
#endif /* CONFIG_SKIP_LOWLEVEL_INIT */
2.11.0
U-Boot mailing list U-Boot@lists.denx.de https://lists.denx.de/listinfo/u-boot

The current arch implementation of memcpy cannot be called from thumb code, because it does not use bx instructions on return. This patch addresses that. Note, that this patch does not touch the hot loop of memcpy, so performance is not affected.
Tested on MXS (arm926ejs) with and without thumb-mode enabled.
Signed-off-by: Klaus Goger klaus.goger@theobroma-systems.com Signed-off-by: Christoph Muellner christoph.muellner@theobroma-systems.com
---
Changes in v3: None Changes in v2: - added memcpy patch to the series
arch/arm/lib/memcpy.S | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/arm/lib/memcpy.S b/arch/arm/lib/memcpy.S index 588b3f8971..9e9a193c2a 100644 --- a/arch/arm/lib/memcpy.S +++ b/arch/arm/lib/memcpy.S @@ -62,7 +62,7 @@ #endif ENTRY(memcpy) cmp r0, r1 - moveq pc, lr + bxeq lr
enter r4, lr
@@ -150,7 +150,8 @@ ENTRY(memcpy) str1b r0, r4, cs, abort=21f str1b r0, ip, cs, abort=21f
- exit r4, pc + exit r4, lr + bx lr
9: rsb ip, ip, #4 cmp ip, #2 @@ -259,7 +260,8 @@ ENTRY(memcpy) .endm
.macro copy_abort_end - ldmfd sp!, {r4, pc} + ldmfd sp!, {r4, lr} + bx lr .endm
ENDPROC(memcpy)

On Thu, Apr 26, 2018 at 08:18:10PM +0200, Klaus Goger wrote:
The current arch implementation of memcpy cannot be called from thumb code, because it does not use bx instructions on return. This patch addresses that. Note, that this patch does not touch the hot loop of memcpy, so performance is not affected.
Tested on MXS (arm926ejs) with and without thumb-mode enabled.
Signed-off-by: Klaus Goger klaus.goger@theobroma-systems.com Signed-off-by: Christoph Muellner christoph.muellner@theobroma-systems.com
Applied to u-boot/master, thanks!
participants (4)
-
Dr. Philipp Tomsich
-
Klaus Goger
-
Marek Vasut
-
Tom Rini