[U-Boot] [PATCH 0/2] armv8: reduce exception handling code size

These two patches try to address the issue that the default ARMv8 exception vectors take up quite some code space, but don't provide much benefit apart from a crash dump. Since the overhead might not be justified for some very size-restricted SPLs, we try to reduce the code size:
- Patch 1/2 stuffs the shared register save/restore code into the gaps between the exception entries, which have to follow an architectural 128 byte alignment requirement. This reduces the code size by about 250 bytes, while still having the full functionality.
- Patch 2/2 goes much further by introducing a Kconfig symbol that allows dropping the exception vector table from the SPL entirely, at the expense of losing the crash dump feature. On the Allwinner A64 SPL this saves about 3KB of code, which sounds quite worthwhile with our chronically tight SPL builds.
Dropping the vectors is off by default (the new symbol defaults to y), but the symbol can be deselected manually in menuconfig to fix SPL builds that are too big.
Cheers, Andre.
Andre Przywara (2):
  armv8: Reduce exception handling code
  armv8: make SPL exception vectors optional
 arch/arm/cpu/armv8/Kconfig      |  11 ++++
 arch/arm/cpu/armv8/Makefile     |   4 ++
 arch/arm/cpu/armv8/exceptions.S | 132 +++++++++++++++++++++++++---------------
 arch/arm/cpu/armv8/start.S      |  19 ++++--
 4 files changed, 112 insertions(+), 54 deletions(-)

The arm64 exception handling code is quite big, mostly due to architectural alignment requirements. Each exception entry spans 32 instructions, which sounds generous, but is too small to fit all of the save/branch/restore code in there. So at the moment we use only four instructions, branching into shared save and restore routines. To not leave the space for those remaining 28 instructions wasted, let's split the save and restore routines and stuff them into the gaps. This saves about 250 bytes of code, which is helpful for those tight SPLs.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 arch/arm/cpu/armv8/exceptions.S | 132 +++++++++++++++++++++++++---------------
 1 file changed, 82 insertions(+), 50 deletions(-)
diff --git a/arch/arm/cpu/armv8/exceptions.S b/arch/arm/cpu/armv8/exceptions.S
index 1a78a5d1dc..a15af72e02 100644
--- a/arch/arm/cpu/armv8/exceptions.S
+++ b/arch/arm/cpu/armv8/exceptions.S
@@ -11,7 +11,26 @@
 #include <linux/linkage.h>
 
 /*
- * Exception vectors.
+ * AArch64 exception vectors:
+ * We have four types of exceptions:
+ * - synchronous: traps, data aborts, undefined instructions, ...
+ * - IRQ: group 1 (normal) interrupts
+ * - FIQ: group 0 or secure interrupts
+ * - SError: fatal system errors
+ * There are entries for all four of those for different contexts:
+ * - from same exception level, when using the SP_EL0 stack pointer
+ * - from same exception level, when using the SP_ELx stack pointer
+ * - from lower exception level, when this is AArch64
+ * - from lower exception level, when this is AArch32
+ * Each of those 16 entries have space for 32 instructions, each entry must
+ * be 128 byte aligned, the whole table must be 2K aligned.
+ * The 32 instructions are not enough to save and restore all registers and
+ * to branch to the actual handler, so we split this up:
+ * Each entry saves the LR, branches to the save routine, then to the actual
+ * handler, then to the restore routine. The save and restore routines are
+ * each split in half and stuffed in the unused gap between the entries.
+ * Also as we do not run anything in a lower exception level, we just provide
+ * the first 8 entries for exceptions from the same EL.
  */
 	.align	11
 	.globl	vectors
@@ -22,52 +41,9 @@ vectors:
 	bl	do_bad_sync
 	b	exception_exit
 
-	.align	7		/* Current EL IRQ Thread */
-	stp	x29, x30, [sp, #-16]!
-	bl	_exception_entry
-	bl	do_bad_irq
-	b	exception_exit
-
-	.align	7		/* Current EL FIQ Thread */
-	stp	x29, x30, [sp, #-16]!
-	bl	_exception_entry
-	bl	do_bad_fiq
-	b	exception_exit
-
-	.align	7		/* Current EL Error Thread */
-	stp	x29, x30, [sp, #-16]!
-	bl	_exception_entry
-	bl	do_bad_error
-	b	exception_exit
-
-	.align	7		/* Current EL Synchronous Handler */
-	stp	x29, x30, [sp, #-16]!
-	bl	_exception_entry
-	bl	do_sync
-	b	exception_exit
-
-	.align	7		/* Current EL IRQ Handler */
-	stp	x29, x30, [sp, #-16]!
-	bl	_exception_entry
-	bl	do_irq
-	b	exception_exit
-
-	.align	7		/* Current EL FIQ Handler */
-	stp	x29, x30, [sp, #-16]!
-	bl	_exception_entry
-	bl	do_fiq
-	b	exception_exit
-
-	.align	7		/* Current EL Error Handler */
-	stp	x29, x30, [sp, #-16]!
-	bl	_exception_entry
-	bl	do_error
-	b	exception_exit
-
 /*
- * Enter Exception.
- * This will save the processor state that is ELR/X0~X30
- * to the stack frame.
+ * Save (most of) the GP registers to the stack frame.
+ * This is the first part of the shared routine called into from all entries.
  */
 _exception_entry:
 	stp	x27, x28, [sp, #-16]!
@@ -84,7 +60,19 @@ _exception_entry:
 	stp	x5, x6, [sp, #-16]!
 	stp	x3, x4, [sp, #-16]!
 	stp	x1, x2, [sp, #-16]!
+	b	_save_el_regs			/* jump to the second part */
 
+	.align	7		/* Current EL IRQ Thread */
+	stp	x29, x30, [sp, #-16]!
+	bl	_exception_entry
+	bl	do_bad_irq
+	b	exception_exit
+
+/*
+ * Save exception specific context: ESR and ELR, for all exception levels.
+ * This is the second part of the shared routine called into from all entries.
+ */
+_save_el_regs:
 	/* Could be running at EL3/EL2/EL1 */
 	switch_el x11, 3f, 2f, 1f
 3:	mrs	x1, esr_el3
@@ -100,16 +88,36 @@ _exception_entry:
 	mov	x0, sp
 	ret
 
-
+	.align	7		/* Current EL FIQ Thread */
+	stp	x29, x30, [sp, #-16]!
+	bl	_exception_entry
+	bl	do_bad_fiq
+				/* falling through to _exception_exit */
+/*
+ * Restore the exception return address, for all exception levels.
+ * This is the first part of the shared routine called into from all entries.
+ */
 exception_exit:
 	ldp	x2, x0, [sp],#16
 	switch_el x11, 3f, 2f, 1f
 3:	msr	elr_el3, x2
-	b	0f
+	b	_restore_regs
 2:	msr	elr_el2, x2
-	b	0f
+	b	_restore_regs
 1:	msr	elr_el1, x2
-0:
+	b	_restore_regs		/* jump to the second part */
+
+	.align	7		/* Current EL Error Thread */
+	stp	x29, x30, [sp, #-16]!
+	bl	_exception_entry
+	bl	do_bad_error
+	b	exception_exit
+
+/*
+ * Restore the general purpose registers from the exception stack, then return.
+ * This is the second part of the shared routine called into from all entries.
+ */
+_restore_regs:
 	ldp	x1, x2, [sp],#16
 	ldp	x3, x4, [sp],#16
 	ldp	x5, x6, [sp],#16
@@ -126,3 +134,27 @@ exception_exit:
 	ldp	x27, x28, [sp],#16
 	ldp	x29, x30, [sp],#16
 	eret
+
+	.align	7		/* Current EL (SP_ELx) Synchronous Handler */
+	stp	x29, x30, [sp, #-16]!
+	bl	_exception_entry
+	bl	do_sync
+	b	exception_exit
+
+	.align	7		/* Current EL (SP_ELx) IRQ Handler */
+	stp	x29, x30, [sp, #-16]!
+	bl	_exception_entry
+	bl	do_irq
+	b	exception_exit
+
+	.align	7		/* Current EL (SP_ELx) FIQ Handler */
+	stp	x29, x30, [sp, #-16]!
+	bl	_exception_entry
+	bl	do_fiq
+	b	exception_exit
+
+	.align	7		/* Current EL (SP_ELx) Error Handler */
+	stp	x29, x30, [sp, #-16]!
+	bl	_exception_entry
+	bl	do_error
+	b	exception_exit
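For readers who want to check the numbers in the commit message, the layout arithmetic can be sketched in C as below. This is only an illustration, not code from U-Boot: all names (VEC_*, vec_*) are made up for this sketch, and the only inputs taken from the patch are the 128 byte entry size, the four-instruction stub and the 16 (or 8) entries.

```c
#include <assert.h>
#include <stddef.h>

/* All identifiers here are hypothetical helpers, not U-Boot symbols. */
enum {
	VEC_INSN_BYTES    = 4,   /* every AArch64 instruction is 4 bytes */
	VEC_ENTRY_BYTES   = 128, /* architectural size and alignment of one entry */
	VEC_TABLE_ENTRIES = 16,  /* 4 exception types x 4 contexts */
	VEC_STUB_INSNS    = 4    /* stp, bl, bl, b as used by each entry */
};

/* Number of instruction slots available in one vector entry. */
unsigned int vec_entry_slots(void)
{
	return VEC_ENTRY_BYTES / VEC_INSN_BYTES;
}

/* Slots left unused in an entry after 'used' instructions. */
unsigned int vec_gap_slots(unsigned int used)
{
	assert(used <= vec_entry_slots());
	return vec_entry_slots() - used;
}

/* Byte offset of entry number 'idx' from the (2K aligned) table base. */
size_t vec_entry_offset(unsigned int idx)
{
	assert(idx < VEC_TABLE_ENTRIES);
	return (size_t)idx * VEC_ENTRY_BYTES;
}

/* Bytes occupied by a table that provides only the first 'entries' entries. */
size_t vec_table_bytes(unsigned int entries)
{
	assert(entries <= VEC_TABLE_ENTRIES);
	return (size_t)entries * VEC_ENTRY_BYTES;
}
```

With these helpers, vec_gap_slots(VEC_STUB_INSNS) gives the 28 spare instruction slots per entry mentioned above, and vec_table_bytes(8) gives the size of a table that stops after the eight same-EL entries.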

On Wed, Jul 25, 2018 at 12:57:00AM +0100, Andre Przywara wrote:
> The arm64 exception handling code is quite big, mostly due to architectural alignment requirements. Each exception entry spans 32 instructions, which sounds generous, but is too small to fit all of the save/branch/restore code in there. So at the moment we use only four instructions, branching into shared save and restore routines. To not leave the space for those remaining 28 instructions wasted, let's split the save and restore routines and stuff them into the gaps. This saves about 250 bytes of code, which is helpful for those tight SPLs.
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Applied to u-boot/master, thanks!

Even though the exception vector table is a fundamental part of the ARM architecture, U-Boot mostly does not make real use of it, except when crash dumping. But having it in takes up quite some space, partly due to the architectural alignment requirement of 2KB. Since we don't take special care of that, the compiler adds a more or less random amount of padding space, which increases the image size quite a bit, especially for the SPL.
On a typical Allwinner build this is around 1.5KB of padding, plus 1KB for the vector table (mostly padding space again), then some extra code to do the actual handling. This amounts to almost 10% of the maximum image size, which is quite a lot for a pure debugging feature.
Add a Kconfig symbol to allow the exception vector table to be left out of the build for the SPL. For now this is "default y" for everyone, but specific defconfigs, platforms or .config files can opt out here at will, to mitigate the code size pressure we see for some SPLs.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 arch/arm/cpu/armv8/Kconfig  | 11 +++++++++++
 arch/arm/cpu/armv8/Makefile |  4 ++++
 arch/arm/cpu/armv8/start.S  | 19 +++++++++++++++----
 3 files changed, 30 insertions(+), 4 deletions(-)
diff --git a/arch/arm/cpu/armv8/Kconfig b/arch/arm/cpu/armv8/Kconfig
index 22d2f29548..bd1e759c37 100644
--- a/arch/arm/cpu/armv8/Kconfig
+++ b/arch/arm/cpu/armv8/Kconfig
@@ -1,5 +1,16 @@
 if ARM64
 
+config ARMV8_SPL_EXCEPTION_VECTORS
+	bool "Install crash dump exception vectors"
+	depends on SPL
+	default y
+	help
+	  The default exception vector table is only used for the crash
+	  dump, but still takes quite a lot of space in the image size.
+
+	  Say N here if you are running out of code space in the image
+	  and want to save some space at the cost of less debugging info.
+
 config ARMV8_MULTIENTRY
 	bool "Enable multiple CPUs to enter into U-Boot"
 
diff --git a/arch/arm/cpu/armv8/Makefile b/arch/arm/cpu/armv8/Makefile
index d1d4ffecfd..52c8daa049 100644
--- a/arch/arm/cpu/armv8/Makefile
+++ b/arch/arm/cpu/armv8/Makefile
@@ -10,7 +10,11 @@ ifndef CONFIG_$(SPL_TPL_)TIMER
 obj-$(CONFIG_SYS_ARCH_TIMER) += generic_timer.o
 endif
 obj-y	+= cache_v8.o
+ifdef CONFIG_SPL_BUILD
+obj-$(CONFIG_ARMV8_SPL_EXCEPTION_VECTORS) += exceptions.o
+else
 obj-y	+= exceptions.o
+endif
 obj-y	+= cache.o
 obj-y	+= tlb.o
 obj-y	+= transition.o
diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S
index d4db4d044f..12a78ee38b 100644
--- a/arch/arm/cpu/armv8/start.S
+++ b/arch/arm/cpu/armv8/start.S
@@ -86,14 +86,23 @@ pie_fixup_done:
 
 #ifdef CONFIG_SYS_RESET_SCTRL
 	bl reset_sctrl
+#endif
+
+#if defined(CONFIG_ARMV8_SPL_EXCEPTION_VECTORS) || !defined(CONFIG_SPL_BUILD)
+.macro	set_vbar, regname, reg
+	msr	\regname, \reg
+.endm
+	adr	x0, vectors
+#else
+.macro	set_vbar, regname, reg
+.endm
 #endif
 	/*
 	 * Could be EL3/EL2/EL1, Initial State:
 	 * Little Endian, MMU Disabled, i/dCache Disabled
 	 */
-	adr	x0, vectors
 	switch_el x1, 3f, 2f, 1f
-3:	msr	vbar_el3, x0
+3:	set_vbar vbar_el3, x0
 	mrs	x0, scr_el3
 	orr	x0, x0, #0xf			/* SCR_EL3.NS|IRQ|FIQ|EA */
 	msr	scr_el3, x0
@@ -103,11 +112,11 @@ pie_fixup_done:
 	msr	cntfrq_el0, x0			/* Initialize CNTFRQ */
 #endif
 	b	0f
-2:	msr	vbar_el2, x0
+2:	set_vbar	vbar_el2, x0
 	mov	x0, #0x33ff
 	msr	cptr_el2, x0			/* Enable FP/SIMD */
 	b	0f
-1:	msr	vbar_el1, x0
+1:	set_vbar	vbar_el1, x0
 	mov	x0, #3 << 20
 	msr	cpacr_el1, x0			/* Enable FP/SIMD */
 0:
@@ -345,6 +354,7 @@ ENDPROC(smp_kick_all_cpus)
 /*-----------------------------------------------------------------------*/
 
 ENTRY(c_runtime_cpu_setup)
+#if defined(CONFIG_ARMV8_SPL_EXCEPTION_VECTORS) || !defined(CONFIG_SPL_BUILD)
 	/* Relocate vBAR */
 	adr	x0, vectors
 	switch_el x1, 3f, 2f, 1f
@@ -354,6 +364,7 @@ ENTRY(c_runtime_cpu_setup)
 	b	0f
 1:	msr	vbar_el1, x0
 0:
+#endif
 
 	ret
 ENDPROC(c_runtime_cpu_setup)
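As a rough illustration of where the "more or less random" padding in the commit message comes from, here is a small C sketch of the alignment arithmetic. The helper names are hypothetical, and the offsets used with them are example values, not measurements from an actual SPL build.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not taken from U-Boot. */

/* Round 'addr' up to the next multiple of 'align' (must be a power of two). */
uint64_t sketch_align_up(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Padding bytes needed in front of an object that must start on an
 * 'align' boundary when the preceding code ends at offset 'addr'.
 */
uint64_t sketch_align_padding(uint64_t addr, uint64_t align)
{
	return sketch_align_up(addr, align) - addr;
}
```

For a 2KB aligned table the padding can be anything from 0 to 2047 bytes, depending only on where the preceding code happens to end. For example, code ending at the (assumed) offset 0x200 would need sketch_align_padding(0x200, 2048) = 1536 bytes of padding, which is in the same range as the 1.5KB figure quoted above.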

On Wed, Jul 25, 2018 at 12:57:01AM +0100, Andre Przywara wrote:
> Even though the exception vector table is a fundamental part of the ARM architecture, U-Boot mostly does not make real use of it, except when crash dumping. But having it in takes up quite some space, partly due to the architectural alignment requirement of 2KB. Since we don't take special care of that, the compiler adds a more or less random amount of padding space, which increases the image size quite a bit, especially for the SPL.
> On a typical Allwinner build this is around 1.5KB of padding, plus 1KB for the vector table (mostly padding space again), then some extra code to do the actual handling. This amounts to almost 10% of the maximum image size, which is quite a lot for a pure debugging feature.
> Add a Kconfig symbol to allow the exception vector table to be left out of the build for the SPL. For now this is "default y" for everyone, but specific defconfigs, platforms or .config files can opt out here at will, to mitigate the code size pressure we see for some SPLs.
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>  arch/arm/cpu/armv8/Kconfig  | 11 +++++++++++
>  arch/arm/cpu/armv8/Makefile |  4 ++++
>  arch/arm/cpu/armv8/start.S  | 19 +++++++++++++++----
>  3 files changed, 30 insertions(+), 4 deletions(-)
> diff --git a/arch/arm/cpu/armv8/Kconfig b/arch/arm/cpu/armv8/Kconfig
> index 22d2f29548..bd1e759c37 100644
> --- a/arch/arm/cpu/armv8/Kconfig
> +++ b/arch/arm/cpu/armv8/Kconfig
> @@ -1,5 +1,16 @@
>  if ARM64
>  
> +config ARMV8_SPL_EXCEPTION_VECTORS
> +	bool "Install crash dump exception vectors"
> +	depends on SPL
> +	default y
> +	help
> +	  The default exception vector table is only used for the crash
> +	  dump, but still takes quite a lot of space in the image size.
> +
> +	  Say N here if you are running out of code space in the image
> +	  and want to save some space at the cost of less debugging info.
> +
>  config ARMV8_MULTIENTRY
>  	bool "Enable multiple CPUs to enter into U-Boot"
>  
> diff --git a/arch/arm/cpu/armv8/Makefile b/arch/arm/cpu/armv8/Makefile
> index d1d4ffecfd..52c8daa049 100644
> --- a/arch/arm/cpu/armv8/Makefile
> +++ b/arch/arm/cpu/armv8/Makefile
> @@ -10,7 +10,11 @@ ifndef CONFIG_$(SPL_TPL_)TIMER
>  obj-$(CONFIG_SYS_ARCH_TIMER) += generic_timer.o
>  endif
>  obj-y	+= cache_v8.o
> +ifdef CONFIG_SPL_BUILD
> +obj-$(CONFIG_ARMV8_SPL_EXCEPTION_VECTORS) += exceptions.o
> +else
>  obj-y	+= exceptions.o
> +endif
I _think_ we can make the code parts of this read better (and make use of CONFIG_IS_ENABLED()) if we add CONFIG_ARMV8_EXCEPTION_VECTORS and don't allow it to be unselected (i.e. no text on the bool line).
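To make the suggestion a bit more concrete, here is a simplified C sketch of how a phase-aware "is this option enabled" check can be built from the preprocessor. It is not U-Boot's actual CONFIG_IS_ENABLED() implementation: the sk_/SK_ names are hypothetical, the defined-to-1 detection follows the placeholder trick known from the Linux kernel's IS_ENABLED(), and the SPL-side symbol name CONFIG_SPL_ARMV8_EXCEPTION_VECTORS is an assumption about how a phase-prefix scheme would name it, not something from this patch.

```c
#include <assert.h>

/* Simulated .config for this sketch only (assumed values). */
#define CONFIG_ARMV8_EXCEPTION_VECTORS 1	/* U-Boot proper: always on */
/* CONFIG_SPL_ARMV8_EXCEPTION_VECTORS is left undefined: SPL opted out. */

/* Evaluates to 1 if x is a macro defined to 1, and to 0 otherwise. */
#define SK_PLACEHOLDER_1 0,
#define sk_second(first, second, ...) second
#define sk_defined3(arg1_or_junk) sk_second(arg1_or_junk 1, 0, 0)
#define sk_defined2(val) sk_defined3(SK_PLACEHOLDER_##val)
#define sk_defined(x) sk_defined2(x)

/* Build CONFIG_<pfx><opt>; pfx is empty for U-Boot proper, SPL_ for SPL. */
#define sk_symbol(pfx, opt) CONFIG_##pfx##opt
#define SK_ENABLED_IN(pfx, opt) sk_defined(sk_symbol(pfx, opt))

/*
 * Returns 1 if the exception vectors would be built for the given
 * phase (0 = U-Boot proper, 1 = SPL) under the simulated .config above.
 */
int sketch_vectors_built(int spl_phase)
{
	assert(spl_phase == 0 || spl_phase == 1);
	return spl_phase ? SK_ENABLED_IN(SPL_, ARMV8_EXCEPTION_VECTORS)
			 : SK_ENABLED_IN(, ARMV8_EXCEPTION_VECTORS);
}
```

The point of such a scheme is that the code can ask a single question per build phase instead of spelling out "defined(...) || !defined(CONFIG_SPL_BUILD)" at every use.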

On Wed, Jul 25, 2018 at 12:57:01AM +0100, Andre Przywara wrote:
> Even though the exception vector table is a fundamental part of the ARM architecture, U-Boot mostly does not make real use of it, except when crash dumping. But having it in takes up quite some space, partly due to the architectural alignment requirement of 2KB. Since we don't take special care of that, the compiler adds a more or less random amount of padding space, which increases the image size quite a bit, especially for the SPL.
> On a typical Allwinner build this is around 1.5KB of padding, plus 1KB for the vector table (mostly padding space again), then some extra code to do the actual handling. This amounts to almost 10% of the maximum image size, which is quite a lot for a pure debugging feature.
> Add a Kconfig symbol to allow the exception vector table to be left out of the build for the SPL. For now this is "default y" for everyone, but specific defconfigs, platforms or .config files can opt out here at will, to mitigate the code size pressure we see for some SPLs.
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Applied to u-boot/master, thanks!