[PATCH v2 0/3] arm64: Large PIE fixes

From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com
This fixes some builds issues with large (> 1MB) PIE U-boot setups. We also document the 4K aligned load address requirement and add an early run-time check for it.
Thanks, Edgar
ChangeLog: v1 -> v2: * Fix adr of _start in arch/arm/lib/crt0_64.S * Add early check and bail out into a WFI loop when not 4K aligned * Document the 4K alignement requirement in Kconfig
Edgar E. Iglesias (3): arm64: Mention 4K aligned load addresses in the PIE Kconfig help arm64: Bail out PIE builds early if load address is not 4K aligned arm64: Add support for larger PIE U-boot
arch/arm/Kconfig | 3 +++ arch/arm/cpu/armv8/start.S | 23 +++++++++++++++++++++-- arch/arm/lib/crt0_64.S | 3 ++- 3 files changed, 26 insertions(+), 3 deletions(-)

From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com
Mention the requirement of 4K aligned load addresses in the help section for the POSITION_INDEPENDENT option.
Suggested-by: Michal Simek michal.simek@xilinx.com Signed-off-by: Edgar E. Iglesias edgar.iglesias@xilinx.com --- arch/arm/Kconfig | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index f30c2639ec..c144c08612 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -21,6 +21,9 @@ config POSITION_INDEPENDENT information that is embedded in the binary to support U-Boot relocating itself to the top-of-RAM later during execution.
+ When this option is enabled, U-Boot needs to be loaded at a + 4K aligned address. + config INIT_SP_RELATIVE bool "Specify the early stack pointer relative to the .bss section" help

On 9/4/20 3:07 AM, Edgar E. Iglesias wrote:
From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com
Mention the requirement of 4K aligned load addresses in the help section for the POSITION_INDEPENDENT option.
Suggested-by: Michal Simek michal.simek@xilinx.com Signed-off-by: Edgar E. Iglesias edgar.iglesias@xilinx.com
arch/arm/Kconfig | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index f30c2639ec..c144c08612 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -21,6 +21,9 @@ config POSITION_INDEPENDENT information that is embedded in the binary to support U-Boot relocating itself to the top-of-RAM later during execution.
When this option is enabled, U-Boot needs to be loaded at a
4K aligned address.
I don't believe this restriction should be documented as part of POSITION_INDEPENDENT; the restriction always exists at least for 64-bit ARM, since arch/arm/lib/relocate_64.S relocate_code uses the same assembly sequence that imposes this restriction, and IIUC that code is unconditionally used.

On 04/09/2020 19:42, Stephen Warren wrote:
On 9/4/20 3:07 AM, Edgar E. Iglesias wrote:
From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com
Mention the requirement of 4K aligned load addresses in the help section for the POSITION_INDEPENDENT option.
Suggested-by: Michal Simek michal.simek@xilinx.com Signed-off-by: Edgar E. Iglesias edgar.iglesias@xilinx.com
arch/arm/Kconfig | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index f30c2639ec..c144c08612 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -21,6 +21,9 @@ config POSITION_INDEPENDENT information that is embedded in the binary to support U-Boot relocating itself to the top-of-RAM later during execution.
When this option is enabled, U-Boot needs to be loaded at a
4K aligned address.
I don't believe this restriction should be documented as part of POSITION_INDEPENDENT; the restriction always exists at least for 64-bit ARM, since arch/arm/lib/relocate_64.S relocate_code uses the same assembly sequence that imposes this restriction, and IIUC that code is unconditionally used.
While this is true, the difference is that without POSITION_INDEPENDENT the alignment is easily determined by the hardcoded load address. So we should actually have a build time check on this.
With POSITION_INDEPENDENT, however, the load address is only known at runtime (somewhat under the user's control, if you like). So a warning or hint here might be useful. But maybe it should be noted as a general restriction in the paragraph above: " ... from almost any address" => "from almost any 4K aligned address"
Cheers, Andre.

On Sun, Sep 06, 2020 at 11:16:17PM +0100, André Przywara wrote:
On 04/09/2020 19:42, Stephen Warren wrote:
On 9/4/20 3:07 AM, Edgar E. Iglesias wrote:
From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com
Mention the requirement of 4K aligned load addresses in the help section for the POSITION_INDEPENDENT option.
Suggested-by: Michal Simek michal.simek@xilinx.com Signed-off-by: Edgar E. Iglesias edgar.iglesias@xilinx.com
arch/arm/Kconfig | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index f30c2639ec..c144c08612 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -21,6 +21,9 @@ config POSITION_INDEPENDENT information that is embedded in the binary to support U-Boot relocating itself to the top-of-RAM later during execution.
When this option is enabled, U-Boot needs to be loaded at a
4K aligned address.
I don't believe this restriction should be documented as part of POSITION_INDEPENDENT; the restriction always exists at least for 64-bit ARM, since arch/arm/lib/relocate_64.S relocate_code uses the same assembly sequence that imposes this restriction, and IIUC that code is unconditionally used.
While this is true, the difference is that without POSITION_INDEPENDENT the alignment is easily determined by the hardcoded load address. So we should actually have a build time check on this.
With POSITION_INDEPENDENT, however, the load address is only known at runtime (somewhat under the user's control, if you like). So a warning or hint here might be useful. But maybe it should be noted as a general restriction in the paragraph above: " ... from almost any address" => "from almost any 4K aligned address"
That sounds good to me.
Thanks, Edgar

From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com
PIE requires a 4K aligned load address. If this is not met, trap the startup sequence in a WFI loop rather than running into obscure failures.
Tested-by: Michal Simek michal.simek@xilinx.com Suggested-by: André Przywara andre.przywara@arm.com Signed-off-by: Edgar E. Iglesias edgar.iglesias@xilinx.com --- arch/arm/cpu/armv8/start.S | 17 +++++++++++++++++ 1 file changed, 17 insertions(+)
diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S index 002698b501..732bc385d4 100644 --- a/arch/arm/cpu/armv8/start.S +++ b/arch/arm/cpu/armv8/start.S @@ -59,6 +59,23 @@ reset: save_boot_params_ret:
#if CONFIG_POSITION_INDEPENDENT + /* Verify that we're 4K aligned. */ + adr x1, _start + ands x1, x1, #0xfff + b.eq 1f +0: + /* + * FATAL, can't continue. + * U-boot needs to be loaded at a 4K aligned address. + * + * We use ADRP and ADD to load some symbol addresses during startup. + * The ADD uses an absolute (non pc-relative) lo12 relocation + * thus requiring 4K alignment. + */ + wfi + b 0b +1: + /* * Fix .rela.dyn relocations. This allows U-Boot to be loaded to and * executed at a different address than it was linked at.

On 9/4/20 3:07 AM, Edgar E. Iglesias wrote:
From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com
PIE requires a 4K aligned load address. If this is not met, trap the startup sequence in a WFI loop rather than running into obscure failures.
diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S #if CONFIG_POSITION_INDEPENDENT
- /* Verify that we're 4K aligned. */
Similar to the comment on the previous patch: I believe the code that implements this check should be outside the #if check, since it's always needed.

On Fri, Sep 04, 2020 at 12:43:57PM -0600, Stephen Warren wrote:
On 9/4/20 3:07 AM, Edgar E. Iglesias wrote:
From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com
PIE requires a 4K aligned load address. If this is not met, trap the startup sequence in a WFI loop rather than running into obscure failures.
diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S #if CONFIG_POSITION_INDEPENDENT
- /* Verify that we're 4K aligned. */
Similar to the comment on the previous patch: I believe the code that implements this check should be outside the #if check, since it's always needed.
But a check for non-PIE would have to be stricter, wouldn't it? I.e the load address needs to exactly match the link-time address.
Perhaps we should add the non-PIE check in a separate patch (if at all)?
Cheers, Edgar

On Mon, Sep 07, 2020 at 11:52:35AM +0200, Edgar E. Iglesias wrote:
On Fri, Sep 04, 2020 at 12:43:57PM -0600, Stephen Warren wrote:
On 9/4/20 3:07 AM, Edgar E. Iglesias wrote:
From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com
PIE requires a 4K aligned load address. If this is not met, trap the startup sequence in a WFI loop rather than running into obscure failures.
diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S #if CONFIG_POSITION_INDEPENDENT
- /* Verify that we're 4K aligned. */
Similar to the comment on the previous patch: I believe the code that implements this check should be outside the #if check, since it's always needed.
But a check for non-PIE would have to be stricter, wouldn't it? I.e the load address needs to exactly match the link-time address.
Perhaps we should add the non-PIE check in a separate patch (if at all)?
If we can catch a bad configuration at link time in the non-PIE case (as said in another part of this thread I believe) then we should, yes, thanks!

On Mon, Sep 07, 2020 at 08:57:39AM -0400, Tom Rini wrote:
On Mon, Sep 07, 2020 at 11:52:35AM +0200, Edgar E. Iglesias wrote:
On Fri, Sep 04, 2020 at 12:43:57PM -0600, Stephen Warren wrote:
On 9/4/20 3:07 AM, Edgar E. Iglesias wrote:
From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com
PIE requires a 4K aligned load address. If this is not met, trap the startup sequence in a WFI loop rather than running into obscure failures.
diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S #if CONFIG_POSITION_INDEPENDENT
- /* Verify that we're 4K aligned. */
Similar to the comment on the previous patch: I believe the code that implements this check should be outside the #if check, since it's always needed.
But a check for non-PIE would have to be stricter, wouldn't it? I.e the load address needs to exactly match the link-time address.
Perhaps we should add the non-PIE check in a separate patch (if at all)?
If we can catch a bad configuration at link time in the non-PIE case (as said in another part of this thread I believe) then we should, yes, thanks!
The non-PIE configuration is expected to be loaded at a specific address. The actual load address cannot be checked at link-time (since it's up to the user at run-time) but given the assumption of a specific load-address, 4K alignment can be enforced at link-time.
It really comes down to adding reasonable run-time checks for errors that users may reasonably struggle with.
For PIE, checking for 4K aligment is reasonable because it's an easy enough misstake to make (since you've got a binary that's was supposed to handle relocation).
For non-PIE, checking for the exact address at run-time is a little bit more border-line IMO but I guess also somewhat reasonable.
There are still plenty of cases we can't catch though (loaded at odd addreses, non-RAM address-ranges etc etc).
Cheers, Edgar

On 9/7/20 3:52 AM, Edgar E. Iglesias wrote:
On Fri, Sep 04, 2020 at 12:43:57PM -0600, Stephen Warren wrote:
On 9/4/20 3:07 AM, Edgar E. Iglesias wrote:
From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com
PIE requires a 4K aligned load address. If this is not met, trap the startup sequence in a WFI loop rather than running into obscure failures.
diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S #if CONFIG_POSITION_INDEPENDENT
- /* Verify that we're 4K aligned. */
Similar to the comment on the previous patch: I believe the code that implements this check should be outside the #if check, since it's always needed.
But a check for non-PIE would have to be stricter, wouldn't it? I.e the load address needs to exactly match the link-time address.
Oh yes, I guess that is true.

From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com
Linking a U-boot larger than 1MB fails with PIE enabled: u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end' defined in .bss_start section in u-boot.
This extends the supported range by using adrp & add to load symbols early while starting up.
Signed-off-by: Edgar E. Iglesias edgar.iglesias@xilinx.com --- arch/arm/cpu/armv8/start.S | 6 ++++-- arch/arm/lib/crt0_64.S | 3 ++- 2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S index 732bc385d4..335783014c 100644 --- a/arch/arm/cpu/armv8/start.S +++ b/arch/arm/cpu/armv8/start.S @@ -84,8 +84,10 @@ pie_fixup: adr x0, _start /* x0 <- Runtime value of _start */ ldr x1, _TEXT_BASE /* x1 <- Linked value of _start */ sub x9, x0, x1 /* x9 <- Run-vs-link offset */ - adr x2, __rel_dyn_start /* x2 <- Runtime &__rel_dyn_start */ - adr x3, __rel_dyn_end /* x3 <- Runtime &__rel_dyn_end */ + adrp x2, __rel_dyn_start /* x2 <- Runtime &__rel_dyn_start */ + add x2, x2, #:lo12:__rel_dyn_start + adrp x3, __rel_dyn_end /* x3 <- Runtime &__rel_dyn_end */ + add x3, x3, #:lo12:__rel_dyn_end pie_fix_loop: ldp x0, x1, [x2], #16 /* (x0, x1) <- (Link location, fixup) */ ldr x4, [x2], #8 /* x4 <- addend */ diff --git a/arch/arm/lib/crt0_64.S b/arch/arm/lib/crt0_64.S index 04afa518ac..f28071a238 100644 --- a/arch/arm/lib/crt0_64.S +++ b/arch/arm/lib/crt0_64.S @@ -102,7 +102,8 @@ ENTRY(_main) adr lr, relocation_return #if CONFIG_POSITION_INDEPENDENT /* Add in link-vs-runtime offset */ - adr x0, _start /* x0 <- Runtime value of _start */ + adrp x0, _start /* x0 <- Runtime value of _start */ + add x0, x0, #:lo12:_start ldr x9, _TEXT_BASE /* x9 <- Linked value of _start */ sub x9, x9, x0 /* x9 <- Run-vs-link offset */ add lr, lr, x9

On 04. 09. 20 11:07, Edgar E. Iglesias wrote:
From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com
Linking a U-boot larger than 1MB fails with PIE enabled: u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end' defined in .bss_start section in u-boot.
This extends the supported range by using adrp & add to load symbols early while starting up.
Signed-off-by: Edgar E. Iglesias edgar.iglesias@xilinx.com
arch/arm/cpu/armv8/start.S | 6 ++++-- arch/arm/lib/crt0_64.S | 3 ++- 2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S index 732bc385d4..335783014c 100644 --- a/arch/arm/cpu/armv8/start.S +++ b/arch/arm/cpu/armv8/start.S @@ -84,8 +84,10 @@ pie_fixup: adr x0, _start /* x0 <- Runtime value of _start */ ldr x1, _TEXT_BASE /* x1 <- Linked value of _start */ sub x9, x0, x1 /* x9 <- Run-vs-link offset */
- adr x2, __rel_dyn_start /* x2 <- Runtime &__rel_dyn_start */
- adr x3, __rel_dyn_end /* x3 <- Runtime &__rel_dyn_end */
- adrp x2, __rel_dyn_start /* x2 <- Runtime &__rel_dyn_start */
- add x2, x2, #:lo12:__rel_dyn_start
- adrp x3, __rel_dyn_end /* x3 <- Runtime &__rel_dyn_end */
- add x3, x3, #:lo12:__rel_dyn_end
pie_fix_loop: ldp x0, x1, [x2], #16 /* (x0, x1) <- (Link location, fixup) */ ldr x4, [x2], #8 /* x4 <- addend */ diff --git a/arch/arm/lib/crt0_64.S b/arch/arm/lib/crt0_64.S index 04afa518ac..f28071a238 100644 --- a/arch/arm/lib/crt0_64.S +++ b/arch/arm/lib/crt0_64.S @@ -102,7 +102,8 @@ ENTRY(_main) adr lr, relocation_return #if CONFIG_POSITION_INDEPENDENT /* Add in link-vs-runtime offset */
- adr x0, _start /* x0 <- Runtime value of _start */
- adrp x0, _start /* x0 <- Runtime value of _start */
- add x0, x0, #:lo12:_start ldr x9, _TEXT_BASE /* x9 <- Linked value of _start */ sub x9, x9, x0 /* x9 <- Run-vs-link offset */ add lr, lr, x9
I have tested 21MB and 51MB u-boot and there is need to also patch __bss_start.
--- a/arch/arm/lib/crt0_64.S +++ b/arch/arm/lib/crt0_64.S @@ -73,7 +73,12 @@ ENTRY(_main) #elif defined(CONFIG_SPL_BUILD) && defined(CONFIG_SPL_STACK) ldr x0, =(CONFIG_SPL_STACK) #elif defined(CONFIG_INIT_SP_RELATIVE) +#if CONFIG_POSITION_INDEPENDENT + adrp x0, __bss_start /* x0 <- Runtime &__bss_start */ + add x0, x0, #:lo12:__bss_start +#else adr x0, __bss_start +#endif add x0, x0, #CONFIG_SYS_INIT_SP_BSS_OFFSET #else ldr x0, =(CONFIG_SYS_INIT_SP_ADDR)
Thanks, Michal

On Fri, Sep 04, 2020 at 03:04:02PM +0200, Michal Simek wrote:
On 04. 09. 20 11:07, Edgar E. Iglesias wrote:
From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com
Linking a U-boot larger than 1MB fails with PIE enabled: u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end' defined in .bss_start section in u-boot.
This extends the supported range by using adrp & add to load symbols early while starting up.
Signed-off-by: Edgar E. Iglesias edgar.iglesias@xilinx.com
arch/arm/cpu/armv8/start.S | 6 ++++-- arch/arm/lib/crt0_64.S | 3 ++- 2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S index 732bc385d4..335783014c 100644 --- a/arch/arm/cpu/armv8/start.S +++ b/arch/arm/cpu/armv8/start.S @@ -84,8 +84,10 @@ pie_fixup: adr x0, _start /* x0 <- Runtime value of _start */ ldr x1, _TEXT_BASE /* x1 <- Linked value of _start */ sub x9, x0, x1 /* x9 <- Run-vs-link offset */
- adr x2, __rel_dyn_start /* x2 <- Runtime &__rel_dyn_start */
- adr x3, __rel_dyn_end /* x3 <- Runtime &__rel_dyn_end */
- adrp x2, __rel_dyn_start /* x2 <- Runtime &__rel_dyn_start */
- add x2, x2, #:lo12:__rel_dyn_start
- adrp x3, __rel_dyn_end /* x3 <- Runtime &__rel_dyn_end */
- add x3, x3, #:lo12:__rel_dyn_end
pie_fix_loop: ldp x0, x1, [x2], #16 /* (x0, x1) <- (Link location, fixup) */ ldr x4, [x2], #8 /* x4 <- addend */ diff --git a/arch/arm/lib/crt0_64.S b/arch/arm/lib/crt0_64.S index 04afa518ac..f28071a238 100644 --- a/arch/arm/lib/crt0_64.S +++ b/arch/arm/lib/crt0_64.S @@ -102,7 +102,8 @@ ENTRY(_main) adr lr, relocation_return #if CONFIG_POSITION_INDEPENDENT /* Add in link-vs-runtime offset */
- adr x0, _start /* x0 <- Runtime value of _start */
- adrp x0, _start /* x0 <- Runtime value of _start */
- add x0, x0, #:lo12:_start ldr x9, _TEXT_BASE /* x9 <- Linked value of _start */ sub x9, x9, x0 /* x9 <- Run-vs-link offset */ add lr, lr, x9
I have tested 21MB and 51MB u-boot and there is need to also patch __bss_start.
--- a/arch/arm/lib/crt0_64.S +++ b/arch/arm/lib/crt0_64.S @@ -73,7 +73,12 @@ ENTRY(_main) #elif defined(CONFIG_SPL_BUILD) && defined(CONFIG_SPL_STACK) ldr x0, =(CONFIG_SPL_STACK) #elif defined(CONFIG_INIT_SP_RELATIVE) +#if CONFIG_POSITION_INDEPENDENT
adrp x0, __bss_start /* x0 <- Runtime &__bss_start */
add x0, x0, #:lo12:__bss_start
+#else adr x0, __bss_start +#endif add x0, x0, #CONFIG_SYS_INIT_SP_BSS_OFFSET #else ldr x0, =(CONFIG_SYS_INIT_SP_ADDR)
Thanks Michal, I can bundle this into v3.
Cheers, Edgar

On 9/4/20 3:07 AM, Edgar E. Iglesias wrote:
From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com
Linking a U-boot larger than 1MB fails with PIE enabled: u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end' defined in .bss_start section in u-boot.
This extends the supported range by using adrp & add to load symbols early while starting up.
This patch looks conceptually fine to me, but I won't ack this version due to the issue pointed out by Michal.
participants (6)
-
André Przywara
-
Edgar E. Iglesias
-
Edgar E. Iglesias
-
Michal Simek
-
Stephen Warren
-
Tom Rini