[U-Boot] [PATCH] ARM1136: Align stack to 8 bytes

The ARM ABI requires that the stack be aligned to 8 bytes as it is noted in Procedure Call Standard for the ARM Architecture: http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042d/index.html
Unaligned SP also causes the problem with variable-length arrays allocation when VLA address becomes less than stack pointer during aligning of this address, so the further 'push' into the stack overwrites first 4 bytes of VLA.
Signed-off-by: Vitaly Kuzmichev vkuzmichev@mvista.com Signed-off-by: George G. Davis gdavis@mvista.com --- arch/arm/cpu/arm1136/start.S | 1 + 1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/arch/arm/cpu/arm1136/start.S b/arch/arm/cpu/arm1136/start.S index 957f438..d0c5717 100644 --- a/arch/arm/cpu/arm1136/start.S +++ b/arch/arm/cpu/arm1136/start.S @@ -185,6 +185,7 @@ stack_setup: #endif sub sp, r0, #12 /* leave 3 words for abort-stack */ #endif /* CONFIG_PRELOADER */ + bic sp, sp, #7 /* 8-byte alignment */
clear_bss: ldr r0, _bss_start /* find start of bss segment */

Hi Vitaly,
the exact same problem applies to ARM1176. Maybe you could update your patch and add the same line to /arch/arm/cpu/arm1176/start.S. I am currently not working on an top-of-tree U-Boot, so it would be not so easy for me to create a separate patch for ARM1176.
Regards, Martin
u-boot-bounces@lists.denx.de wrote on Tuesday, June 15, 2010 12:52 PM:
The ARM ABI requires that the stack be aligned to 8 bytes as it is noted in Procedure Call Standard for the ARM Architecture: http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042d/index.html
Unaligned SP also causes the problem with variable-length arrays allocation when VLA address becomes less than stack pointer during aligning of this address, so the further 'push' into the stack overwrites first 4 bytes of VLA.
Signed-off-by: Vitaly Kuzmichev vkuzmichev@mvista.com Signed-off-by: George G. Davis gdavis@mvista.com
arch/arm/cpu/arm1136/start.S | 1 + 1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/arch/arm/cpu/arm1136/start.S b/arch/arm/cpu/arm1136/start.S index 957f438..d0c5717 100644 --- a/arch/arm/cpu/arm1136/start.S +++ b/arch/arm/cpu/arm1136/start.S @@ -185,6 +185,7 @@ stack_setup: #endif sub sp, r0, #12 /* leave 3 words for abort-stack */ #endif /* CONFIG_PRELOADER */
- bic sp, sp, #7 /* 8-byte alignment */
clear_bss: ldr r0, _bss_start /* find start of bss segment */

Ok, but probably other ARM starters should be fixed too? I have no possibility to check all ARM CPUs, but I believe this patch won't break them.
On 06/15/2010 03:16 PM, Martin Krause wrote:
Hi Vitaly,
the exact same problem applies to ARM1176. Maybe you could update your patch and add the same line to /arch/arm/cpu/arm1176/start.S. I am currently not working on an top-of-tree U-Boot, so it would be not so easy for me to create a separate patch for ARM1176.
Regards, Martin
u-boot-bounces@lists.denx.de wrote on Tuesday, June 15, 2010 12:52 PM:
The ARM ABI requires that the stack be aligned to 8 bytes as it is noted in Procedure Call Standard for the ARM Architecture: http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042d/index.html
Unaligned SP also causes the problem with variable-length arrays allocation when VLA address becomes less than stack pointer during aligning of this address, so the further 'push' into the stack overwrites first 4 bytes of VLA.
Signed-off-by: Vitaly Kuzmichev vkuzmichev@mvista.com Signed-off-by: George G. Davis gdavis@mvista.com
arch/arm/cpu/arm1136/start.S | 1 + 1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/arch/arm/cpu/arm1136/start.S b/arch/arm/cpu/arm1136/start.S index 957f438..d0c5717 100644 --- a/arch/arm/cpu/arm1136/start.S +++ b/arch/arm/cpu/arm1136/start.S @@ -185,6 +185,7 @@ stack_setup: #endif sub sp, r0, #12 /* leave 3 words for abort-stack */ #endif /* CONFIG_PRELOADER */
- bic sp, sp, #7 /* 8-byte alignment */
clear_bss: ldr r0, _bss_start /* find start of bss segment */

The ARM ABI requires that the stack be aligned to 8 bytes as it is noted in Procedure Call Standard for the ARM Architecture: http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042d/index.html
Unaligned SP also causes the problem with variable-length arrays allocation when VLA address becomes less than stack pointer during aligning of this address, so the next 'push' in the stack overwrites first 4 bytes of VLA.
Signed-off-by: Vitaly Kuzmichev vkuzmichev@mvista.com --- arch/arm/cpu/arm1136/start.S | 1 + arch/arm/cpu/arm1176/start.S | 1 + arch/arm/cpu/arm720t/start.S | 1 + arch/arm/cpu/arm920t/start.S | 1 + arch/arm/cpu/arm925t/start.S | 1 + arch/arm/cpu/arm926ejs/start.S | 2 +- arch/arm/cpu/arm946es/start.S | 1 + arch/arm/cpu/arm_cortexa8/start.S | 2 +- arch/arm/cpu/arm_intcm/start.S | 1 + arch/arm/cpu/ixp/start.S | 1 + arch/arm/cpu/lh7a40x/start.S | 1 + arch/arm/cpu/pxa/start.S | 1 + arch/arm/cpu/s3c44b0/start.S | 1 + arch/arm/cpu/sa1100/start.S | 1 + 14 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/arch/arm/cpu/arm1136/start.S b/arch/arm/cpu/arm1136/start.S index 957f438..f0cb3cc 100644 --- a/arch/arm/cpu/arm1136/start.S +++ b/arch/arm/cpu/arm1136/start.S @@ -185,6 +185,7 @@ stack_setup: #endif sub sp, r0, #12 /* leave 3 words for abort-stack */ #endif /* CONFIG_PRELOADER */ + bic sp, sp, #7 /* 8-byte alignment for ABI compliance */
clear_bss: ldr r0, _bss_start /* find start of bss segment */ diff --git a/arch/arm/cpu/arm1176/start.S b/arch/arm/cpu/arm1176/start.S index e2b6c9b..2162a6b 100644 --- a/arch/arm/cpu/arm1176/start.S +++ b/arch/arm/cpu/arm1176/start.S @@ -249,6 +249,7 @@ stack_setup: sub r0, r0, #CONFIG_SYS_MALLOC_LEN /* malloc area */ sub r0, r0, #CONFIG_SYS_GBL_DATA_SIZE /* bdinfo */ sub sp, r0, #12 /* leave 3 words for abort-stack */ + bic sp, sp, #7 /* 8-byte alignment for ABI compliance */
clear_bss: ldr r0, _bss_start /* find start of bss segment */ diff --git a/arch/arm/cpu/arm720t/start.S b/arch/arm/cpu/arm720t/start.S index 022b873..d6f2c16 100644 --- a/arch/arm/cpu/arm720t/start.S +++ b/arch/arm/cpu/arm720t/start.S @@ -172,6 +172,7 @@ stack_setup: sub r0, r0, #(CONFIG_STACKSIZE_IRQ+CONFIG_STACKSIZE_FIQ) #endif sub sp, r0, #12 /* leave 3 words for abort-stack */ + bic sp, sp, #7 /* 8-byte alignment for ABI compliance */
clear_bss: ldr r0, _bss_start /* find start of bss segment */ diff --git a/arch/arm/cpu/arm920t/start.S b/arch/arm/cpu/arm920t/start.S index 779f192..e532f55 100644 --- a/arch/arm/cpu/arm920t/start.S +++ b/arch/arm/cpu/arm920t/start.S @@ -204,6 +204,7 @@ stack_setup: sub r0, r0, #(CONFIG_STACKSIZE_IRQ+CONFIG_STACKSIZE_FIQ) #endif sub sp, r0, #12 /* leave 3 words for abort-stack */ + bic sp, sp, #7 /* 8-byte alignment for ABI compliance */
clear_bss: ldr r0, _bss_start /* find start of bss segment */ diff --git a/arch/arm/cpu/arm925t/start.S b/arch/arm/cpu/arm925t/start.S index 567e804..346615e 100644 --- a/arch/arm/cpu/arm925t/start.S +++ b/arch/arm/cpu/arm925t/start.S @@ -196,6 +196,7 @@ stack_setup: sub r0, r0, #(CONFIG_STACKSIZE_IRQ+CONFIG_STACKSIZE_FIQ) #endif sub sp, r0, #12 /* leave 3 words for abort-stack */ + bic sp, sp, #7 /* 8-byte alignment for ABI compliance */
clear_bss: ldr r0, _bss_start /* find start of bss segment */ diff --git a/arch/arm/cpu/arm926ejs/start.S b/arch/arm/cpu/arm926ejs/start.S index 3b81151..cf40ce1 100644 --- a/arch/arm/cpu/arm926ejs/start.S +++ b/arch/arm/cpu/arm926ejs/start.S @@ -196,7 +196,7 @@ stack_setup: #endif #endif /* CONFIG_PRELOADER */ sub sp, r0, #12 /* leave 3 words for abort-stack */ - bic sp, r0, #7 /* 8-byte align stack for ABI compliance */ + bic sp, sp, #7 /* 8-byte alignment for ABI compliance */
clear_bss: ldr r0, _bss_start /* find start of bss segment */ diff --git a/arch/arm/cpu/arm946es/start.S b/arch/arm/cpu/arm946es/start.S index 627e3cb..8844d44 100644 --- a/arch/arm/cpu/arm946es/start.S +++ b/arch/arm/cpu/arm946es/start.S @@ -163,6 +163,7 @@ stack_setup: sub r0, r0, #(CONFIG_STACKSIZE_IRQ+CONFIG_STACKSIZE_FIQ) #endif sub sp, r0, #12 /* leave 3 words for abort-stack */ + bic sp, sp, #7 /* 8-byte alignment for ABI compliance */
clear_bss: ldr r0, _bss_start /* find start of bss segment */ diff --git a/arch/arm/cpu/arm_cortexa8/start.S b/arch/arm/cpu/arm_cortexa8/start.S index 29dae2f..1e0a150 100644 --- a/arch/arm/cpu/arm_cortexa8/start.S +++ b/arch/arm/cpu/arm_cortexa8/start.S @@ -164,7 +164,7 @@ stack_setup: sub r0, r0, #(CONFIG_STACKSIZE_IRQ + CONFIG_STACKSIZE_FIQ) #endif sub sp, r0, #12 @ leave 3 words for abort-stack - and sp, sp, #~7 @ 8 byte alinged for (ldr/str)d + bic sp, sp, #7 @ 8-byte alignment for ABI compliance
/* Clear BSS (if any). Is below tx (watch load addr - need space) */ clear_bss: diff --git a/arch/arm/cpu/arm_intcm/start.S b/arch/arm/cpu/arm_intcm/start.S index bb1f003..328bae0 100644 --- a/arch/arm/cpu/arm_intcm/start.S +++ b/arch/arm/cpu/arm_intcm/start.S @@ -161,6 +161,7 @@ stack_setup: sub r0, r0, #(CONFIG_STACKSIZE_IRQ+CONFIG_STACKSIZE_FIQ) #endif sub sp, r0, #12 /* leave 3 words for abort-stack */ + bic sp, sp, #7 /* 8-byte alignment for ABI compliance */
clear_bss: ldr r0, _bss_start /* find start of bss segment */ diff --git a/arch/arm/cpu/ixp/start.S b/arch/arm/cpu/ixp/start.S index 5ebce53..6efe333 100644 --- a/arch/arm/cpu/ixp/start.S +++ b/arch/arm/cpu/ixp/start.S @@ -289,6 +289,7 @@ stack_setup: sub r0, r0, #(CONFIG_STACKSIZE_IRQ+CONFIG_STACKSIZE_FIQ) #endif sub sp, r0, #12 /* leave 3 words for abort-stack */ + bic sp, sp, #7 /* 8-byte alignment for ABI compliance */
clear_bss: ldr r0, _bss_start /* find start of bss segment */ diff --git a/arch/arm/cpu/lh7a40x/start.S b/arch/arm/cpu/lh7a40x/start.S index a1321b1..14a1fbe 100644 --- a/arch/arm/cpu/lh7a40x/start.S +++ b/arch/arm/cpu/lh7a40x/start.S @@ -178,6 +178,7 @@ stack_setup: sub r0, r0, #(CONFIG_STACKSIZE_IRQ+CONFIG_STACKSIZE_FIQ) #endif sub sp, r0, #12 /* leave 3 words for abort-stack */ + bic sp, sp, #7 /* 8-byte alignment for ABI compliance */
clear_bss: ldr r0, _bss_start /* find start of bss segment */ diff --git a/arch/arm/cpu/pxa/start.S b/arch/arm/cpu/pxa/start.S index 63ab0c5..e07c8c2 100644 --- a/arch/arm/cpu/pxa/start.S +++ b/arch/arm/cpu/pxa/start.S @@ -141,6 +141,7 @@ stack_setup: sub r0, r0, #(CONFIG_STACKSIZE_IRQ+CONFIG_STACKSIZE_FIQ) #endif /* CONFIG_USE_IRQ */ sub sp, r0, #12 /* leave 3 words for abort-stack */ + bic sp, sp, #7 /* 8-byte alignment for ABI compliance */
clear_bss: ldr r0, _bss_start /* find start of bss segment */ diff --git a/arch/arm/cpu/s3c44b0/start.S b/arch/arm/cpu/s3c44b0/start.S index f5a3d3a..0063063 100644 --- a/arch/arm/cpu/s3c44b0/start.S +++ b/arch/arm/cpu/s3c44b0/start.S @@ -163,6 +163,7 @@ stack_setup: sub r0, r0, #(CONFIG_STACKSIZE_IRQ+CONFIG_STACKSIZE_FIQ) #endif sub sp, r0, #12 /* leave 3 words for abort-stack */ + bic sp, sp, #7 /* 8-byte alignment for ABI compliance */
ldr pc, _start_armboot
diff --git a/arch/arm/cpu/sa1100/start.S b/arch/arm/cpu/sa1100/start.S index 278c500..deb4745 100644 --- a/arch/arm/cpu/sa1100/start.S +++ b/arch/arm/cpu/sa1100/start.S @@ -153,6 +153,7 @@ stack_setup: sub r0, r0, #(CONFIG_STACKSIZE_IRQ+CONFIG_STACKSIZE_FIQ) #endif sub sp, r0, #12 /* leave 3 words for abort-stack */ + bic sp, sp, #7 /* 8-byte alignment for ABI compliance */
clear_bss: ldr r0, _bss_start /* find start of bss segment */

I would like to add some explanations:
This is the issue gone from GCC behavior on VLA allocation. I did a simple test with VLA, and the following snippet from its ASM listing may clarify the root cause of issue:
VLA allocation start. R1 is initialized by the length of VLA.
80080030: e281300f add r3, r1, #15 ; 0xf 80080034: e2033f7e and r3, r3, #504 ; 0x1f8
Align VLA size.
80080038: e1a0500d mov r5, sp
Save SP to recover it when VLA becomes needless.
8008003c: e04dd003 sub sp, sp, r3
Allocate R3 bytes on stack.
80080040: e1a0300d mov r3, sp
Store VLA address in R3.
80080044: e1a0c1a3 lsr ip, r3, #3 80080048: e1a0218c lsl r2, ip, #3
Here VLA address is aligned by 8 bytes.
If SP is either 0xYYYYYYY4 or 0xZZZZZZZC, r2 will lose significant digit and will become 0xYYYYYYY0/0xZZZZZZZ8 (VLA=SP-4). It will less than SP, so the next 'push' (alias to STMDB) will decrement SP by 4 and will store register at the top of the stack, so this will overwrite first 4 bytes of VLA.
On 06/15/2010 10:18 PM, Vitaly Kuzmichev wrote:
The ARM ABI requires that the stack be aligned to 8 bytes as it is noted in Procedure Call Standard for the ARM Architecture: http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042d/index.html
Unaligned SP also causes the problem with variable-length arrays allocation when VLA address becomes less than stack pointer during aligning of this address, so the next 'push' in the stack overwrites first 4 bytes of VLA.
Signed-off-by: Vitaly Kuzmichev vkuzmichev@mvista.com

Dear Vitaly Kuzmichev,
In message 1276625891-22206-1-git-send-email-vkuzmichev@mvista.com you wrote:
The ARM ABI requires that the stack be aligned to 8 bytes as it is noted in Procedure Call Standard for the ARM Architecture: http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042d/index.html
Unaligned SP also causes the problem with variable-length arrays allocation when VLA address becomes less than stack pointer during aligning of this address, so the next 'push' in the stack overwrites first 4 bytes of VLA.
Signed-off-by: Vitaly Kuzmichev vkuzmichev@mvista.com
arch/arm/cpu/arm1136/start.S | 1 + arch/arm/cpu/arm1176/start.S | 1 + arch/arm/cpu/arm720t/start.S | 1 + arch/arm/cpu/arm920t/start.S | 1 + arch/arm/cpu/arm925t/start.S | 1 + arch/arm/cpu/arm926ejs/start.S | 2 +- arch/arm/cpu/arm946es/start.S | 1 + arch/arm/cpu/arm_cortexa8/start.S | 2 +- arch/arm/cpu/arm_intcm/start.S | 1 + arch/arm/cpu/ixp/start.S | 1 + arch/arm/cpu/lh7a40x/start.S | 1 + arch/arm/cpu/pxa/start.S | 1 + arch/arm/cpu/s3c44b0/start.S | 1 + arch/arm/cpu/sa1100/start.S | 1 + 14 files changed, 14 insertions(+), 2 deletions(-)
This is a pretty intrusive patch as it affects all ARM architectures, but the change has been discussed here several times before, and the actual commit looks OK to me, too.
Tested on tx25(mx25), imx27lite(mx27), qong(mx31) and trab(s3c2400) Tested-by: Wolfgang Denk wd@denx.de
Applied to master (i. e. it will be included in -rc3, this release).
ARM custodians: please help testing this!
Best regards,
Wolfgang Denk

On 23 June 2010 05:20, Wolfgang Denk wd@denx.de wrote:
Dear Vitaly Kuzmichev,
In message 1276625891-22206-1-git-send-email-vkuzmichev@mvista.com you wrote:
The ARM ABI requires that the stack be aligned to 8 bytes as it is noted in Procedure Call Standard for the ARM Architecture: http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042d/index.html
Unaligned SP also causes the problem with variable-length arrays allocation when VLA address becomes less than stack pointer during aligning of this address, so the next 'push' in the stack overwrites first 4 bytes of VLA.
Signed-off-by: Vitaly Kuzmichev vkuzmichev@mvista.com
arch/arm/cpu/arm1136/start.S | 1 + arch/arm/cpu/arm1176/start.S | 1 + arch/arm/cpu/arm720t/start.S | 1 + arch/arm/cpu/arm920t/start.S | 1 + arch/arm/cpu/arm925t/start.S | 1 + arch/arm/cpu/arm926ejs/start.S | 2 +- arch/arm/cpu/arm946es/start.S | 1 + arch/arm/cpu/arm_cortexa8/start.S | 2 +- arch/arm/cpu/arm_intcm/start.S | 1 + arch/arm/cpu/ixp/start.S | 1 + arch/arm/cpu/lh7a40x/start.S | 1 + arch/arm/cpu/pxa/start.S | 1 + arch/arm/cpu/s3c44b0/start.S | 1 + arch/arm/cpu/sa1100/start.S | 1 + 14 files changed, 14 insertions(+), 2 deletions(-)
This is a pretty intrusive patch as it affects all ARM architectures, but the change has been discussed here several times before, and the actual commit looks OK to me, too.
Tested on tx25(mx25), imx27lite(mx27), qong(mx31) and trab(s3c2400) Tested-by: Wolfgang Denk wd@denx.de
Applied to master (i. e. it will be included in -rc3, this release).
ARM custodians: please help testing this!
Tested on goni (s5pc110). It works fine.
Thanks Minkyu Kang
participants (4)
-
Martin Krause
-
Minkyu Kang
-
Vitaly Kuzmichev
-
Wolfgang Denk