[PATCH] arm64: Add support for bigger u-boot when CONFIG_POSITION_INDEPENDENT=y

From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com
When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y compilation error is shown: /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end' defined in .bss_start section in u-boot.
It is caused by adr instruction which permits the calculation of any byte address within +- 1MB of the current PC. Because U-Boot is bigger then 1MB calculation is failing.
The patch is using adrp/add instructions where adrp shifts a signed, 21-bit immediate left by 12 bits (4k page), adds it to the value of the program counter with the bottom 12 bits cleared to zero. Then add instruction provides the lower 12 bits which is offset within 4k page. These two instructions together compose full 32bit offset which should be more then enough to cover the whole u-boot size.
Signed-off-by: Edgar E. Iglesias edgar.iglesias@xilinx.com Signed-off-by: Michal Simek michal.simek@xilinx.com ---
arch/arm/cpu/armv8/start.S | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S index 002698b501c3..52cf1230c205 100644 --- a/arch/arm/cpu/armv8/start.S +++ b/arch/arm/cpu/armv8/start.S @@ -67,8 +67,10 @@ pie_fixup: adr x0, _start /* x0 <- Runtime value of _start */ ldr x1, _TEXT_BASE /* x1 <- Linked value of _start */ sub x9, x0, x1 /* x9 <- Run-vs-link offset */ - adr x2, __rel_dyn_start /* x2 <- Runtime &__rel_dyn_start */ - adr x3, __rel_dyn_end /* x3 <- Runtime &__rel_dyn_end */ + adrp x2, __rel_dyn_start /* x2 <- Runtime &__rel_dyn_start */ + add x2, x2, #:lo12:__rel_dyn_start + adrp x3, __rel_dyn_end /* x3 <- Runtime &__rel_dyn_end */ + add x3, x3, #:lo12:__rel_dyn_end pie_fix_loop: ldp x0, x1, [x2], #16 /* (x0, x1) <- (Link location, fixup) */ ldr x4, [x2], #8 /* x4 <- addend */

On 02/09/2020 12:15, Michal Simek wrote:
Hi,
From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com
When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y compilation error is shown: /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end' defined in .bss_start section in u-boot.
It is caused by adr instruction which permits the calculation of any byte address within +- 1MB of the current PC. Because U-Boot is bigger then 1MB calculation is failing.
The patch is using adrp/add instructions where adrp shifts a signed, 21-bit immediate left by 12 bits (4k page), adds it to the value of the program counter with the bottom 12 bits cleared to zero. Then add instruction provides the lower 12 bits which is offset within 4k page. These two instructions together compose full 32bit offset which should be more then enough to cover the whole u-boot size.
Signed-off-by: Edgar E. Iglesias edgar.iglesias@xilinx.com Signed-off-by: Michal Simek michal.simek@xilinx.com
It's a bit scary that you need more than 1MB, but indeed what you do below is the canonical pattern to get the full range of PC relative addressing (this is used heavily in Trusted Firmware, for instance).
The only thing to keep in mind is that this assumes that the load address of the binary is 4K aligned, so that the low 12 bits of the symbol stay the same. I wonder if we should enforce this somehow? But the load address is not controlled by the build process (the whole purpose of PIE), so that's not doable just in the build system?
Shall we at least document this? I guess typical load address are actually quite well aligned, so it might not be an issue in practice.
Cheers, Andre
arch/arm/cpu/armv8/start.S | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S index 002698b501c3..52cf1230c205 100644 --- a/arch/arm/cpu/armv8/start.S +++ b/arch/arm/cpu/armv8/start.S @@ -67,8 +67,10 @@ pie_fixup: adr x0, _start /* x0 <- Runtime value of _start */ ldr x1, _TEXT_BASE /* x1 <- Linked value of _start */ sub x9, x0, x1 /* x9 <- Run-vs-link offset */
- adr x2, __rel_dyn_start /* x2 <- Runtime &__rel_dyn_start */
- adr x3, __rel_dyn_end /* x3 <- Runtime &__rel_dyn_end */
- adrp x2, __rel_dyn_start /* x2 <- Runtime &__rel_dyn_start */
- add x2, x2, #:lo12:__rel_dyn_start
- adrp x3, __rel_dyn_end /* x3 <- Runtime &__rel_dyn_end */
- add x3, x3, #:lo12:__rel_dyn_end
pie_fix_loop: ldp x0, x1, [x2], #16 /* (x0, x1) <- (Link location, fixup) */ ldr x4, [x2], #8 /* x4 <- addend */

Hi,
On 02. 09. 20 16:43, André Przywara wrote:
On 02/09/2020 12:15, Michal Simek wrote:
Hi,
From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com
When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y compilation error is shown: /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end' defined in .bss_start section in u-boot.
It is caused by adr instruction which permits the calculation of any byte address within +- 1MB of the current PC. Because U-Boot is bigger then 1MB calculation is failing.
The patch is using adrp/add instructions where adrp shifts a signed, 21-bit immediate left by 12 bits (4k page), adds it to the value of the program counter with the bottom 12 bits cleared to zero. Then add instruction provides the lower 12 bits which is offset within 4k page. These two instructions together compose full 32bit offset which should be more then enough to cover the whole u-boot size.
Signed-off-by: Edgar E. Iglesias edgar.iglesias@xilinx.com Signed-off-by: Michal Simek michal.simek@xilinx.com
It's a bit scary that you need more than 1MB, but indeed what you do below is the canonical pattern to get the full range of PC relative addressing (this is used heavily in Trusted Firmware, for instance).
We have generic u-boot where we are enabling all features which you can use and it is up to everybody to disable what they don't need. We are at 1MB limit that's why I spot this issue and trying to solve it.
The only thing to keep in mind is that this assumes that the load address of the binary is 4K aligned, so that the low 12 bits of the symbol stay the same. I wonder if we should enforce this somehow? But the load address is not controlled by the build process (the whole purpose of PIE), so that's not doable just in the build system?
That's more question to you if we can enforce is somehow. :-)
Shall we at least document this? I guess typical load address are actually quite well aligned, so it might not be an issue in practice.
maybe note this in Kconfig help for POSITION_INDEPENDENT? Any other location?
Thanks, Michal

On Wed, Sep 02, 2020 at 03:43:08PM +0100, André Przywara wrote:
On 02/09/2020 12:15, Michal Simek wrote:
Hi,
Hi Andre,
From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com
When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y compilation error is shown: /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end' defined in .bss_start section in u-boot.
It is caused by adr instruction which permits the calculation of any byte address within +- 1MB of the current PC. Because U-Boot is bigger then 1MB calculation is failing.
The patch is using adrp/add instructions where adrp shifts a signed, 21-bit immediate left by 12 bits (4k page), adds it to the value of the program counter with the bottom 12 bits cleared to zero. Then add instruction provides the lower 12 bits which is offset within 4k page. These two instructions together compose full 32bit offset which should be more then enough to cover the whole u-boot size.
Signed-off-by: Edgar E. Iglesias edgar.iglesias@xilinx.com Signed-off-by: Michal Simek michal.simek@xilinx.com
It's a bit scary that you need more than 1MB, but indeed what you do below is the canonical pattern to get the full range of PC relative addressing (this is used heavily in Trusted Firmware, for instance).
The only thing to keep in mind is that this assumes that the load address of the binary is 4K aligned, so that the low 12 bits of the symbol stay the same. I wonder if we should enforce this somehow? But the load address is not controlled by the build process (the whole purpose of PIE), so that's not doable just in the build system?
There shouldn't be any need for 4K alignment. Could you elaborate on why you think there is?
Perhaps the commit message is a little confusing. The toolchain will compute the pc-relative offset from this particular location to the symbol and apply the relocations accordingly.
Best regards, Edgar
Shall we at least document this? I guess typical load address are actually quite well aligned, so it might not be an issue in practice.
Cheers, Andre
arch/arm/cpu/armv8/start.S | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S index 002698b501c3..52cf1230c205 100644 --- a/arch/arm/cpu/armv8/start.S +++ b/arch/arm/cpu/armv8/start.S @@ -67,8 +67,10 @@ pie_fixup: adr x0, _start /* x0 <- Runtime value of _start */ ldr x1, _TEXT_BASE /* x1 <- Linked value of _start */ sub x9, x0, x1 /* x9 <- Run-vs-link offset */
- adr x2, __rel_dyn_start /* x2 <- Runtime &__rel_dyn_start */
- adr x3, __rel_dyn_end /* x3 <- Runtime &__rel_dyn_end */
- adrp x2, __rel_dyn_start /* x2 <- Runtime &__rel_dyn_start */
- add x2, x2, #:lo12:__rel_dyn_start
- adrp x3, __rel_dyn_end /* x3 <- Runtime &__rel_dyn_end */
- add x3, x3, #:lo12:__rel_dyn_end
pie_fix_loop: ldp x0, x1, [x2], #16 /* (x0, x1) <- (Link location, fixup) */ ldr x4, [x2], #8 /* x4 <- addend */

On 02/09/2020 15:53, Edgar E. Iglesias wrote:
On Wed, Sep 02, 2020 at 03:43:08PM +0100, Andr� Przywara wrote:
On 02/09/2020 12:15, Michal Simek wrote:
Hi,
From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com
When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y compilation error is shown: /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end' defined in .bss_start section in u-boot.
It is caused by adr instruction which permits the calculation of any byte address within +- 1MB of the current PC. Because U-Boot is bigger then 1MB calculation is failing.
The patch is using adrp/add instructions where adrp shifts a signed, 21-bit immediate left by 12 bits (4k page), adds it to the value of the program counter with the bottom 12 bits cleared to zero. Then add instruction provides the lower 12 bits which is offset within 4k page. These two instructions together compose full 32bit offset which should be more then enough to cover the whole u-boot size.
Signed-off-by: Edgar E. Iglesias edgar.iglesias@xilinx.com Signed-off-by: Michal Simek michal.simek@xilinx.com
It's a bit scary that you need more than 1MB, but indeed what you do below is the canonical pattern to get the full range of PC relative addressing (this is used heavily in Trusted Firmware, for instance).
The only thing to keep in mind is that this assumes that the load address of the binary is 4K aligned, so that the low 12 bits of the symbol stay the same. I wonder if we should enforce this somehow? But the load address is not controlled by the build process (the whole purpose of PIE), so that's not doable just in the build system?
There shouldn't be any need for 4K alignment. Could you elaborate on why you think there is?
That seems to be slightly tricky, and I tried to get some confirmation, but here goes my reasoning. Maybe you can confirm this:
- adrp takes the relative offset, but only of the upper 20 bits (because that's all we can encode). It clears the lower 12 bits of the register. - the "add" is not PC relative anymore, so it just takes the lower 12 bits of the "absolute" linker symbol. So this assumes that the lower 12 bits of the actual address in memory and the lower 12 bits of the linker's view match. An example: 00024: adrp x0, SYMBOL 00028: add x0, x0, :lo12:SYMBOL
SYMBOL: 42058: ...
The toolchain will generate: adrp x0, #0x42; add x0, x0, #0x058
Now you load the code to 0x8000.0800 (NOT 4K aligned). SYMBOL is now at 0x80042858. The adrp will use the PC (0x8000.0824) & ~0xfff + offs => 0x8004.2000. The add will just add 0x58, so you end up with x0 being 0x80042058, which is not the right address.
Does this make sense?
Perhaps the commit message is a little confusing. The toolchain will compute the pc-relative offset from this particular location to the symbol and apply the relocations accordingly.
Yes, but the PC relative offset applies only to the upper 20 bits, because it's only adrp that has PC relative semantics.
Cheers, Andre
Shall we at least document this? I guess typical load address are actually quite well aligned, so it might not be an issue in practice.
Cheers, Andre
arch/arm/cpu/armv8/start.S | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S index 002698b501c3..52cf1230c205 100644 --- a/arch/arm/cpu/armv8/start.S +++ b/arch/arm/cpu/armv8/start.S @@ -67,8 +67,10 @@ pie_fixup: adr x0, _start /* x0 <- Runtime value of _start */ ldr x1, _TEXT_BASE /* x1 <- Linked value of _start */ sub x9, x0, x1 /* x9 <- Run-vs-link offset */
- adr x2, __rel_dyn_start /* x2 <- Runtime &__rel_dyn_start */
- adr x3, __rel_dyn_end /* x3 <- Runtime &__rel_dyn_end */
- adrp x2, __rel_dyn_start /* x2 <- Runtime &__rel_dyn_start */
- add x2, x2, #:lo12:__rel_dyn_start
- adrp x3, __rel_dyn_end /* x3 <- Runtime &__rel_dyn_end */
- add x3, x3, #:lo12:__rel_dyn_end
pie_fix_loop: ldp x0, x1, [x2], #16 /* (x0, x1) <- (Link location, fixup) */ ldr x4, [x2], #8 /* x4 <- addend */

On Wed, Sep 02, 2020 at 04:18:48PM +0100, André Przywara wrote:
On 02/09/2020 15:53, Edgar E. Iglesias wrote:
On Wed, Sep 02, 2020 at 03:43:08PM +0100, Andr� Przywara wrote:
On 02/09/2020 12:15, Michal Simek wrote:
Hi,
From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com
When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y compilation error is shown: /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end' defined in .bss_start section in u-boot.
It is caused by adr instruction which permits the calculation of any byte address within +- 1MB of the current PC. Because U-Boot is bigger then 1MB calculation is failing.
The patch is using adrp/add instructions where adrp shifts a signed, 21-bit immediate left by 12 bits (4k page), adds it to the value of the program counter with the bottom 12 bits cleared to zero. Then add instruction provides the lower 12 bits which is offset within 4k page. These two instructions together compose full 32bit offset which should be more then enough to cover the whole u-boot size.
Signed-off-by: Edgar E. Iglesias edgar.iglesias@xilinx.com Signed-off-by: Michal Simek michal.simek@xilinx.com
It's a bit scary that you need more than 1MB, but indeed what you do below is the canonical pattern to get the full range of PC relative addressing (this is used heavily in Trusted Firmware, for instance).
The only thing to keep in mind is that this assumes that the load address of the binary is 4K aligned, so that the low 12 bits of the symbol stay the same. I wonder if we should enforce this somehow? But the load address is not controlled by the build process (the whole purpose of PIE), so that's not doable just in the build system?
There shouldn't be any need for 4K alignment. Could you elaborate on why you think there is?
That seems to be slightly tricky, and I tried to get some confirmation, but here goes my reasoning. Maybe you can confirm this:
- adrp takes the relative offset, but only of the upper 20 bits (because
that's all we can encode). It clears the lower 12 bits of the register.
- the "add" is not PC relative anymore, so it just takes the lower 12
bits of the "absolute" linker symbol.
I was under the impression that this would use a PC-relative lower 12bit relocation but you are correct. I dissasembled the result:
40: 91000042 add x2, x2, #0x0 40: R_AARCH64_ADD_ABS_LO12_NC __rel_dyn_start
So this assumes that the lower 12 bits of the actual address in memory and the lower 12 bits of the linker's view match. An example: 00024: adrp x0, SYMBOL 00028: add x0, x0, :lo12:SYMBOL
SYMBOL: 42058: ...
The toolchain will generate: adrp x0, #0x42; add x0, x0, #0x058
Now you load the code to 0x8000.0800 (NOT 4K aligned). SYMBOL is now at 0x80042858. The adrp will use the PC (0x8000.0824) & ~0xfff + offs => 0x8004.2000. The add will just add 0x58, so you end up with x0 being 0x80042058, which is not the right address.
Does this make sense?
Yes, it makes sense.
Perhaps the commit message is a little confusing. The toolchain will compute the pc-relative offset from this particular location to the symbol and apply the relocations accordingly.
Yes, but the PC relative offset applies only to the upper 20 bits, because it's only adrp that has PC relative semantics.
Shall we at least document this? I guess typical load address are actually quite well aligned, so it might not be an issue in practice.
Yes, probably worth documenting and perhaps an early bail-out if it's not the case...
Thanks, Edgar

On 02/09/2020 16:25, Edgar E. Iglesias wrote:
On Wed, Sep 02, 2020 at 04:18:48PM +0100, Andr� Przywara wrote:
On 02/09/2020 15:53, Edgar E. Iglesias wrote:
On Wed, Sep 02, 2020 at 03:43:08PM +0100, Andr� Przywara wrote:
On 02/09/2020 12:15, Michal Simek wrote:
Hi,
From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com
When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y compilation error is shown: /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end' defined in .bss_start section in u-boot.
It is caused by adr instruction which permits the calculation of any byte address within +- 1MB of the current PC. Because U-Boot is bigger then 1MB calculation is failing.
The patch is using adrp/add instructions where adrp shifts a signed, 21-bit immediate left by 12 bits (4k page), adds it to the value of the program counter with the bottom 12 bits cleared to zero. Then add instruction provides the lower 12 bits which is offset within 4k page. These two instructions together compose full 32bit offset which should be more then enough to cover the whole u-boot size.
Signed-off-by: Edgar E. Iglesias edgar.iglesias@xilinx.com Signed-off-by: Michal Simek michal.simek@xilinx.com
It's a bit scary that you need more than 1MB, but indeed what you do below is the canonical pattern to get the full range of PC relative addressing (this is used heavily in Trusted Firmware, for instance).
The only thing to keep in mind is that this assumes that the load address of the binary is 4K aligned, so that the low 12 bits of the symbol stay the same. I wonder if we should enforce this somehow? But the load address is not controlled by the build process (the whole purpose of PIE), so that's not doable just in the build system?
There shouldn't be any need for 4K alignment. Could you elaborate on why you think there is?
That seems to be slightly tricky, and I tried to get some confirmation, but here goes my reasoning. Maybe you can confirm this:
- adrp takes the relative offset, but only of the upper 20 bits (because
that's all we can encode). It clears the lower 12 bits of the register.
- the "add" is not PC relative anymore, so it just takes the lower 12
bits of the "absolute" linker symbol.
I was under the impression that this would use a PC-relative lower 12bit relocation but you are correct. I dissasembled the result:
40: 91000042 add x2, x2, #0x0 40: R_AARCH64_ADD_ABS_LO12_NC __rel_dyn_start
So this assumes that the lower 12 bits of the actual address in memory and the lower 12 bits of the linker's view match. An example: 00024: adrp x0, SYMBOL 00028: add x0, x0, :lo12:SYMBOL
SYMBOL: 42058: ...
The toolchain will generate: adrp x0, #0x42; add x0, x0, #0x058
Now you load the code to 0x8000.0800 (NOT 4K aligned). SYMBOL is now at 0x80042858. The adrp will use the PC (0x8000.0824) & ~0xfff + offs => 0x8004.2000. The add will just add 0x58, so you end up with x0 being 0x80042058, which is not the right address.
Does this make sense?
Yes, it makes sense.
Perhaps the commit message is a little confusing. The toolchain will compute the pc-relative offset from this particular location to the symbol and apply the relocations accordingly.
Yes, but the PC relative offset applies only to the upper 20 bits, because it's only adrp that has PC relative semantics.
Shall we at least document this? I guess typical load address are actually quite well aligned, so it might not be an issue in practice.
Yes, probably worth documenting and perhaps an early bail-out if it's not the case...
Documenting sounds good, Kconfig might be a good place, as Michal suggested.
Bail out: I thought about that, it's very easy to detect at runtime, but what then? This is really early, so you could just enter a WFI loop, and hope for someone to connect the dots? Or can you think of any other way of communicating with the user?
Cheers, Andre

On 02. 09. 20 20:59, André Przywara wrote:
On 02/09/2020 16:25, Edgar E. Iglesias wrote:
On Wed, Sep 02, 2020 at 04:18:48PM +0100, Andr� Przywara wrote:
On 02/09/2020 15:53, Edgar E. Iglesias wrote:
On Wed, Sep 02, 2020 at 03:43:08PM +0100, Andr� Przywara wrote:
On 02/09/2020 12:15, Michal Simek wrote:
Hi,
From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com
When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y compilation error is shown: /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end' defined in .bss_start section in u-boot.
It is caused by adr instruction which permits the calculation of any byte address within +- 1MB of the current PC. Because U-Boot is bigger then 1MB calculation is failing.
The patch is using adrp/add instructions where adrp shifts a signed, 21-bit immediate left by 12 bits (4k page), adds it to the value of the program counter with the bottom 12 bits cleared to zero. Then add instruction provides the lower 12 bits which is offset within 4k page. These two instructions together compose full 32bit offset which should be more then enough to cover the whole u-boot size.
Signed-off-by: Edgar E. Iglesias edgar.iglesias@xilinx.com Signed-off-by: Michal Simek michal.simek@xilinx.com
It's a bit scary that you need more than 1MB, but indeed what you do below is the canonical pattern to get the full range of PC relative addressing (this is used heavily in Trusted Firmware, for instance).
The only thing to keep in mind is that this assumes that the load address of the binary is 4K aligned, so that the low 12 bits of the symbol stay the same. I wonder if we should enforce this somehow? But the load address is not controlled by the build process (the whole purpose of PIE), so that's not doable just in the build system?
There shouldn't be any need for 4K alignment. Could you elaborate on why you think there is?
That seems to be slightly tricky, and I tried to get some confirmation, but here goes my reasoning. Maybe you can confirm this:
- adrp takes the relative offset, but only of the upper 20 bits (because
that's all we can encode). It clears the lower 12 bits of the register.
- the "add" is not PC relative anymore, so it just takes the lower 12
bits of the "absolute" linker symbol.
I was under the impression that this would use a PC-relative lower 12bit relocation but you are correct. I dissasembled the result:
40: 91000042 add x2, x2, #0x0 40: R_AARCH64_ADD_ABS_LO12_NC __rel_dyn_start
So this assumes that the lower 12 bits of the actual address in memory and the lower 12 bits of the linker's view match. An example: 00024: adrp x0, SYMBOL 00028: add x0, x0, :lo12:SYMBOL
SYMBOL: 42058: ...
The toolchain will generate: adrp x0, #0x42; add x0, x0, #0x058
Now you load the code to 0x8000.0800 (NOT 4K aligned). SYMBOL is now at 0x80042858. The adrp will use the PC (0x8000.0824) & ~0xfff + offs => 0x8004.2000. The add will just add 0x58, so you end up with x0 being 0x80042058, which is not the right address.
Does this make sense?
Yes, it makes sense.
Perhaps the commit message is a little confusing. The toolchain will compute the pc-relative offset from this particular location to the symbol and apply the relocations accordingly.
Yes, but the PC relative offset applies only to the upper 20 bits, because it's only adrp that has PC relative semantics.
Shall we at least document this? I guess typical load address are actually quite well aligned, so it might not be an issue in practice.
Yes, probably worth documenting and perhaps an early bail-out if it's not the case...
Documenting sounds good, Kconfig might be a good place, as Michal suggested.
Bail out: I thought about that, it's very easy to detect at runtime, but what then? This is really early, so you could just enter a WFI loop, and hope for someone to connect the dots? Or can you think of any other way of communicating with the user?
yes it is very early. It is the first real task what run after reset. I am fine with detecting it to make sure that we won't have unpredictable behavior later. What detection code do you have in mind? Don't we even have this 4k alignment in place already?
Thanks, Michal

On 03/09/2020 14:41, Michal Simek wrote:
On 02. 09. 20 20:59, André Przywara wrote:
On 02/09/2020 16:25, Edgar E. Iglesias wrote:
On Wed, Sep 02, 2020 at 04:18:48PM +0100, Andr� Przywara wrote:
On 02/09/2020 15:53, Edgar E. Iglesias wrote:
On Wed, Sep 02, 2020 at 03:43:08PM +0100, Andr� Przywara wrote:
On 02/09/2020 12:15, Michal Simek wrote:
Hi,
> From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com > > When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y > compilation error is shown: > /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation > truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end' > defined in .bss_start section in u-boot. > > It is caused by adr instruction which permits the calculation of any byte > address within +- 1MB of the current PC. > Because U-Boot is bigger then 1MB calculation is failing. > > The patch is using adrp/add instructions where adrp shifts a signed, 21-bit > immediate left by 12 bits (4k page), adds it to the value of the program > counter with the bottom 12 bits cleared to zero. Then add instruction > provides the lower 12 bits which is offset within 4k page. > These two instructions together compose full 32bit offset which should be > more then enough to cover the whole u-boot size. > > Signed-off-by: Edgar E. Iglesias edgar.iglesias@xilinx.com > Signed-off-by: Michal Simek michal.simek@xilinx.com
It's a bit scary that you need more than 1MB, but indeed what you do below is the canonical pattern to get the full range of PC relative addressing (this is used heavily in Trusted Firmware, for instance).
The only thing to keep in mind is that this assumes that the load address of the binary is 4K aligned, so that the low 12 bits of the symbol stay the same. I wonder if we should enforce this somehow? But the load address is not controlled by the build process (the whole purpose of PIE), so that's not doable just in the build system?
There shouldn't be any need for 4K alignment. Could you elaborate on why you think there is?
That seems to be slightly tricky, and I tried to get some confirmation, but here goes my reasoning. Maybe you can confirm this:
- adrp takes the relative offset, but only of the upper 20 bits (because
that's all we can encode). It clears the lower 12 bits of the register.
- the "add" is not PC relative anymore, so it just takes the lower 12
bits of the "absolute" linker symbol.
I was under the impression that this would use a PC-relative lower 12bit relocation but you are correct. I dissasembled the result:
40: 91000042 add x2, x2, #0x0 40: R_AARCH64_ADD_ABS_LO12_NC __rel_dyn_start
So this assumes that the lower 12 bits of the actual address in memory and the lower 12 bits of the linker's view match. An example: 00024: adrp x0, SYMBOL 00028: add x0, x0, :lo12:SYMBOL
SYMBOL: 42058: ...
The toolchain will generate: adrp x0, #0x42; add x0, x0, #0x058
Now you load the code to 0x8000.0800 (NOT 4K aligned). SYMBOL is now at 0x80042858. The adrp will use the PC (0x8000.0824) & ~0xfff + offs => 0x8004.2000. The add will just add 0x58, so you end up with x0 being 0x80042058, which is not the right address.
Does this make sense?
Yes, it makes sense.
Perhaps the commit message is a little confusing. The toolchain will compute the pc-relative offset from this particular location to the symbol and apply the relocations accordingly.
Yes, but the PC relative offset applies only to the upper 20 bits, because it's only adrp that has PC relative semantics.
Shall we at least document this? I guess typical load address are actually quite well aligned, so it might not be an issue in practice.
Yes, probably worth documenting and perhaps an early bail-out if it's not the case...
Documenting sounds good, Kconfig might be a good place, as Michal suggested.
Bail out: I thought about that, it's very easy to detect at runtime, but what then? This is really early, so you could just enter a WFI loop, and hope for someone to connect the dots? Or can you think of any other way of communicating with the user?
yes it is very early. It is the first real task what run after reset. I am fine with detecting it to make sure that we won't have unpredictable behavior later. What detection code do you have in mind?
Just "adr"ing the beginning of the image (linker address 0), and checking for all 12 LSBs to be 0. The best I thought of would be a WFI loop if not. That sounds like 4 instructions or so in total to me.
Don't we even have this 4k alignment in place already?
Do you mean in linker scripts? I think what counts here is that the actual *load* address is 4K aligned, which I believe is out of control of U-Boot. I would guess that's up to the user (flash address) or previous boot-stages (BootROM or pre-SPL firmware) to set the actual load address. In the best case it's very platform dependent. But it is definitely variable, otherwise we wouldn't need PIE in the first place.
Cheers, Andre

On Thu, Sep 03, 2020 at 02:52:39PM +0100, André Przywara wrote:
On 03/09/2020 14:41, Michal Simek wrote:
On 02. 09. 20 20:59, André Przywara wrote:
On 02/09/2020 16:25, Edgar E. Iglesias wrote:
On Wed, Sep 02, 2020 at 04:18:48PM +0100, Andr� Przywara wrote:
On 02/09/2020 15:53, Edgar E. Iglesias wrote:
On Wed, Sep 02, 2020 at 03:43:08PM +0100, Andr� Przywara wrote: > On 02/09/2020 12:15, Michal Simek wrote:
Hi,
> >> From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com >> >> When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y >> compilation error is shown: >> /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation >> truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end' >> defined in .bss_start section in u-boot. >> >> It is caused by adr instruction which permits the calculation of any byte >> address within +- 1MB of the current PC. >> Because U-Boot is bigger then 1MB calculation is failing. >> >> The patch is using adrp/add instructions where adrp shifts a signed, 21-bit >> immediate left by 12 bits (4k page), adds it to the value of the program >> counter with the bottom 12 bits cleared to zero. Then add instruction >> provides the lower 12 bits which is offset within 4k page. >> These two instructions together compose full 32bit offset which should be >> more then enough to cover the whole u-boot size. >> >> Signed-off-by: Edgar E. Iglesias edgar.iglesias@xilinx.com >> Signed-off-by: Michal Simek michal.simek@xilinx.com > > It's a bit scary that you need more than 1MB, but indeed what you do > below is the canonical pattern to get the full range of PC relative > addressing (this is used heavily in Trusted Firmware, for instance). > > The only thing to keep in mind is that this assumes that the load > address of the binary is 4K aligned, so that the low 12 bits of the > symbol stay the same. I wonder if we should enforce this somehow? But > the load address is not controlled by the build process (the whole > purpose of PIE), so that's not doable just in the build system?
There shouldn't be any need for 4K alignment. Could you elaborate on why you think there is?
That seems to be slightly tricky, and I tried to get some confirmation, but here goes my reasoning. Maybe you can confirm this:
- adrp takes the relative offset, but only of the upper 20 bits (because
that's all we can encode). It clears the lower 12 bits of the register.
- the "add" is not PC relative anymore, so it just takes the lower 12
bits of the "absolute" linker symbol.
I was under the impression that this would use a PC-relative lower 12bit relocation but you are correct. I dissasembled the result:
40: 91000042 add x2, x2, #0x0 40: R_AARCH64_ADD_ABS_LO12_NC __rel_dyn_start
So this assumes that the lower 12 bits of the actual address in memory and the lower 12 bits of the linker's view match. An example: 00024: adrp x0, SYMBOL 00028: add x0, x0, :lo12:SYMBOL
SYMBOL: 42058: ...
The toolchain will generate: adrp x0, #0x42; add x0, x0, #0x058
Now you load the code to 0x8000.0800 (NOT 4K aligned). SYMBOL is now at 0x80042858. The adrp will use the PC (0x8000.0824) & ~0xfff + offs => 0x8004.2000. The add will just add 0x58, so you end up with x0 being 0x80042058, which is not the right address.
Does this make sense?
Yes, it makes sense.
Perhaps the commit message is a little confusing. The toolchain will compute the pc-relative offset from this particular location to the symbol and apply the relocations accordingly.
Yes, but the PC relative offset applies only to the upper 20 bits, because it's only adrp that has PC relative semantics.
> > Shall we at least document this? I guess typical load address are > actually quite well aligned, so it might not be an issue in practice. >
Yes, probably worth documenting and perhaps an early bail-out if it's not the case...
Documenting sounds good, Kconfig might be a good place, as Michal suggested.
Bail out: I thought about that, it's very easy to detect at runtime, but what then? This is really early, so you could just enter a WFI loop, and hope for someone to connect the dots? Or can you think of any other way of communicating with the user?
yes it is very early. It is the first real task what run after reset. I am fine with detecting it to make sure that we won't have unpredictable behavior later. What detection code do you have in mind?
Just "adr"ing the beginning of the image (linker address 0), and checking for all 12 LSBs to be 0. The best I thought of would be a WFI loop if not. That sounds like 4 instructions or so in total to me.
Yeah, that sounds good me too. With a good comment in the source-code, people would be able to connect the dots.
Don't we even have this 4k alignment in place already?
Do you mean in linker scripts? I think what counts here is that the actual *load* address is 4K aligned, which I believe is out of control of U-Boot. I would guess that's up to the user (flash address) or previous boot-stages (BootROM or pre-SPL firmware) to set the actual load address. In the best case it's very platform dependent. But it is definitely variable, otherwise we wouldn't need PIE in the first place.
Right, it's the run-time load address that matters. I guess we already have limitations that fail silently (i.e a user can't load U-boot at address 1) but the 4K one may be more subtle and possible to catch.
Cheers, Edgar

On 03. 09. 20 15:59, Edgar E. Iglesias wrote:
On Thu, Sep 03, 2020 at 02:52:39PM +0100, André Przywara wrote:
On 03/09/2020 14:41, Michal Simek wrote:
On 02. 09. 20 20:59, André Przywara wrote:
On 02/09/2020 16:25, Edgar E. Iglesias wrote:
On Wed, Sep 02, 2020 at 04:18:48PM +0100, Andr� Przywara wrote:
On 02/09/2020 15:53, Edgar E. Iglesias wrote: > On Wed, Sep 02, 2020 at 03:43:08PM +0100, Andr� Przywara wrote: >> On 02/09/2020 12:15, Michal Simek wrote:
Hi,
>> >>> From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com >>> >>> When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y >>> compilation error is shown: >>> /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation >>> truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end' >>> defined in .bss_start section in u-boot. >>> >>> It is caused by adr instruction which permits the calculation of any byte >>> address within +- 1MB of the current PC. >>> Because U-Boot is bigger then 1MB calculation is failing. >>> >>> The patch is using adrp/add instructions where adrp shifts a signed, 21-bit >>> immediate left by 12 bits (4k page), adds it to the value of the program >>> counter with the bottom 12 bits cleared to zero. Then add instruction >>> provides the lower 12 bits which is offset within 4k page. >>> These two instructions together compose full 32bit offset which should be >>> more then enough to cover the whole u-boot size. >>> >>> Signed-off-by: Edgar E. Iglesias edgar.iglesias@xilinx.com >>> Signed-off-by: Michal Simek michal.simek@xilinx.com >> >> It's a bit scary that you need more than 1MB, but indeed what you do >> below is the canonical pattern to get the full range of PC relative >> addressing (this is used heavily in Trusted Firmware, for instance). >> >> The only thing to keep in mind is that this assumes that the load >> address of the binary is 4K aligned, so that the low 12 bits of the >> symbol stay the same. I wonder if we should enforce this somehow? But >> the load address is not controlled by the build process (the whole >> purpose of PIE), so that's not doable just in the build system? > > There shouldn't be any need for 4K alignment. Could you elaborate on > why you think there is?
That seems to be slightly tricky, and I tried to get some confirmation, but here goes my reasoning. Maybe you can confirm this:
- adrp takes the relative offset, but only of the upper 20 bits (because
that's all we can encode). It clears the lower 12 bits of the register.
- the "add" is not PC relative anymore, so it just takes the lower 12
bits of the "absolute" linker symbol.
I was under the impression that this would use a PC-relative lower 12bit relocation but you are correct. I dissasembled the result:
40: 91000042 add x2, x2, #0x0 40: R_AARCH64_ADD_ABS_LO12_NC __rel_dyn_start
So this assumes that the lower 12 bits of the actual address in memory and the lower 12 bits of the linker's view match. An example: 00024: adrp x0, SYMBOL 00028: add x0, x0, :lo12:SYMBOL
SYMBOL: 42058: ...
The toolchain will generate: adrp x0, #0x42; add x0, x0, #0x058
Now you load the code to 0x8000.0800 (NOT 4K aligned). SYMBOL is now at 0x80042858. The adrp will use the PC (0x8000.0824) & ~0xfff + offs => 0x8004.2000. The add will just add 0x58, so you end up with x0 being 0x80042058, which is not the right address.
Does this make sense?
Yes, it makes sense.
> Perhaps the commit message is a little confusing. The toolchain will > compute the pc-relative offset from this particular location to the > symbol and apply the relocations accordingly.
Yes, but the PC relative offset applies only to the upper 20 bits, because it's only adrp that has PC relative semantics.
>> >> Shall we at least document this? I guess typical load address are >> actually quite well aligned, so it might not be an issue in practice. >>
Yes, probably worth documenting and perhaps an early bail-out if it's not the case...
Documenting sounds good, Kconfig might be a good place, as Michal suggested.
Bail out: I thought about that, it's very easy to detect at runtime, but what then? This is really early, so you could just enter a WFI loop, and hope for someone to connect the dots? Or can you think of any other way of communicating with the user?
yes it is very early. It is the first real task what run after reset. I am fine with detecting it to make sure that we won't have unpredictable behavior later. What detection code do you have in mind?
Just "adr"ing the beginning of the image (linker address 0), and checking for all 12 LSBs to be 0. The best I thought of would be a WFI loop if not. That sounds like 4 instructions or so in total to me.
Yeah, that sounds good me too. With a good comment in the source-code, people would be able to connect the dots.
Don't we even have this 4k alignment in place already?
Do you mean in linker scripts? I think what counts here is that the actual *load* address is 4K aligned, which I believe is out of control of U-Boot. I would guess that's up to the user (flash address) or previous boot-stages (BootROM or pre-SPL firmware) to set the actual load address. In the best case it's very platform dependent. But it is definitely variable, otherwise we wouldn't need PIE in the first place.
Right, it's the run-time load address that matters. I guess we already have limitations that fail silently (i.e a user can't load U-boot at address 1) but the 4K one may be more subtle and possible to catch.
Edgar: Do you want to send v2 with it? Or do you want me to send v2 with Kconfig update and checking alignment? Also as you know you have written origin patch and I have added description based on our discussion. If you want me to send v2 it is good time to make that commit message more accurate. :-)
Thanks, Michal

On 9/2/20 5:15 AM, Michal Simek wrote:
From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com
When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y compilation error is shown: /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end' defined in .bss_start section in u-boot.
It is caused by adr instruction which permits the calculation of any byte address within +- 1MB of the current PC. Because U-Boot is bigger then 1MB calculation is failing.
The patch is using adrp/add instructions where adrp shifts a signed, 21-bit immediate left by 12 bits (4k page), adds it to the value of the program counter with the bottom 12 bits cleared to zero. Then add instruction provides the lower 12 bits which is offset within 4k page. These two instructions together compose full 32bit offset which should be more then enough to cover the whole u-boot size.
diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S
@@ -67,8 +67,10 @@ pie_fixup: adr x0, _start /* x0 <- Runtime value of _start */ ldr x1, _TEXT_BASE /* x1 <- Linked value of _start */ sub x9, x0, x1 /* x9 <- Run-vs-link offset */
- adr x2, __rel_dyn_start /* x2 <- Runtime &__rel_dyn_start */
- adr x3, __rel_dyn_end /* x3 <- Runtime &__rel_dyn_end */
- adrp x2, __rel_dyn_start /* x2 <- Runtime &__rel_dyn_start */
- add x2, x2, #:lo12:__rel_dyn_start
- adrp x3, __rel_dyn_end /* x3 <- Runtime &__rel_dyn_end */
- add x3, x3, #:lo12:__rel_dyn_end
pie_fix_loop: ldp x0, x1, [x2], #16 /* (x0, x1) <- (Link location, fixup) */ ldr x4, [x2], #8 /* x4 <- addend */
There are likely a bunch of other places in the code that need updating too; take a look at commit 49e93875a62f "arm64: support running at addr other than linked to" (which introduced the code above) to find other places with similar instruction sequences that almost certainly need updating.

On 02. 09. 20 18:34, Stephen Warren wrote:
On 9/2/20 5:15 AM, Michal Simek wrote:
From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com
When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y compilation error is shown: /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end' defined in .bss_start section in u-boot.
It is caused by adr instruction which permits the calculation of any byte address within +- 1MB of the current PC. Because U-Boot is bigger then 1MB calculation is failing.
The patch is using adrp/add instructions where adrp shifts a signed, 21-bit immediate left by 12 bits (4k page), adds it to the value of the program counter with the bottom 12 bits cleared to zero. Then add instruction provides the lower 12 bits which is offset within 4k page. These two instructions together compose full 32bit offset which should be more then enough to cover the whole u-boot size.
diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S
@@ -67,8 +67,10 @@ pie_fixup: adr x0, _start /* x0 <- Runtime value of _start */ ldr x1, _TEXT_BASE /* x1 <- Linked value of _start */ sub x9, x0, x1 /* x9 <- Run-vs-link offset */
- adr x2, __rel_dyn_start /* x2 <- Runtime &__rel_dyn_start */
- adr x3, __rel_dyn_end /* x3 <- Runtime &__rel_dyn_end */
- adrp x2, __rel_dyn_start /* x2 <- Runtime &__rel_dyn_start */
- add x2, x2, #:lo12:__rel_dyn_start
- adrp x3, __rel_dyn_end /* x3 <- Runtime &__rel_dyn_end */
- add x3, x3, #:lo12:__rel_dyn_end
pie_fix_loop: ldp x0, x1, [x2], #16 /* (x0, x1) <- (Link location, fixup) */ ldr x4, [x2], #8 /* x4 <- addend */
There are likely a bunch of other places in the code that need updating too; take a look at commit 49e93875a62f "arm64: support running at addr other than linked to" (which introduced the code above) to find other places with similar instruction sequences that almost certainly need updating.
I didn't hit any issue to have a need to update them. Definitely worth to check that locations too.
Thanks, Michal

On 03/09/2020 14:35, Michal Simek wrote:
On 02. 09. 20 18:34, Stephen Warren wrote:
On 9/2/20 5:15 AM, Michal Simek wrote:
From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com
When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y compilation error is shown: /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end' defined in .bss_start section in u-boot.
It is caused by adr instruction which permits the calculation of any byte address within +- 1MB of the current PC. Because U-Boot is bigger then 1MB calculation is failing.
The patch is using adrp/add instructions where adrp shifts a signed, 21-bit immediate left by 12 bits (4k page), adds it to the value of the program counter with the bottom 12 bits cleared to zero. Then add instruction provides the lower 12 bits which is offset within 4k page. These two instructions together compose full 32bit offset which should be more then enough to cover the whole u-boot size.
diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S
@@ -67,8 +67,10 @@ pie_fixup: adr x0, _start /* x0 <- Runtime value of _start */ ldr x1, _TEXT_BASE /* x1 <- Linked value of _start */ sub x9, x0, x1 /* x9 <- Run-vs-link offset */
- adr x2, __rel_dyn_start /* x2 <- Runtime &__rel_dyn_start */
- adr x3, __rel_dyn_end /* x3 <- Runtime &__rel_dyn_end */
- adrp x2, __rel_dyn_start /* x2 <- Runtime &__rel_dyn_start */
- add x2, x2, #:lo12:__rel_dyn_start
- adrp x3, __rel_dyn_end /* x3 <- Runtime &__rel_dyn_end */
- add x3, x3, #:lo12:__rel_dyn_end
pie_fix_loop: ldp x0, x1, [x2], #16 /* (x0, x1) <- (Link location, fixup) */ ldr x4, [x2], #8 /* x4 <- addend */
There are likely a bunch of other places in the code that need updating too; take a look at commit 49e93875a62f "arm64: support running at addr other than linked to" (which introduced the code above) to find other places with similar instruction sequences that almost certainly need updating.
I didn't hit any issue to have a need to update them. Definitely worth to check that locations too.
So I thought the same, so I checked at least this file. And while there are other "adr" instructions, they only go to nearby labels, so don't need to be pumped up. But I will try to grep for more cases as well.
Cheers, Andre

On 9/3/20 7:40 AM, André Przywara wrote:
On 03/09/2020 14:35, Michal Simek wrote:
On 02. 09. 20 18:34, Stephen Warren wrote:
On 9/2/20 5:15 AM, Michal Simek wrote:
From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com
When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y compilation error is shown: /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end' defined in .bss_start section in u-boot.
It is caused by adr instruction which permits the calculation of any byte address within +- 1MB of the current PC. Because U-Boot is bigger then 1MB calculation is failing.
The patch is using adrp/add instructions where adrp shifts a signed, 21-bit immediate left by 12 bits (4k page), adds it to the value of the program counter with the bottom 12 bits cleared to zero. Then add instruction provides the lower 12 bits which is offset within 4k page. These two instructions together compose full 32bit offset which should be more then enough to cover the whole u-boot size.
diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S
@@ -67,8 +67,10 @@ pie_fixup: adr x0, _start /* x0 <- Runtime value of _start */ ldr x1, _TEXT_BASE /* x1 <- Linked value of _start */ sub x9, x0, x1 /* x9 <- Run-vs-link offset */
- adr x2, __rel_dyn_start /* x2 <- Runtime &__rel_dyn_start */
- adr x3, __rel_dyn_end /* x3 <- Runtime &__rel_dyn_end */
- adrp x2, __rel_dyn_start /* x2 <- Runtime &__rel_dyn_start */
- add x2, x2, #:lo12:__rel_dyn_start
- adrp x3, __rel_dyn_end /* x3 <- Runtime &__rel_dyn_end */
- add x3, x3, #:lo12:__rel_dyn_end
pie_fix_loop: ldp x0, x1, [x2], #16 /* (x0, x1) <- (Link location, fixup) */ ldr x4, [x2], #8 /* x4 <- addend */
There are likely a bunch of other places in the code that need updating too; take a look at commit 49e93875a62f "arm64: support running at addr other than linked to" (which introduced the code above) to find other places with similar instruction sequences that almost certainly need updating.
I didn't hit any issue to have a need to update them. Definitely worth to check that locations too.
So I thought the same, so I checked at least this file. And while there are other "adr" instructions, they only go to nearby labels, so don't need to be pumped up. But I will try to grep for more cases as well.
So in the patch I linked to, what about the added ard instructions in arch/arm/lib/crt0_64.S and arch/arm/lib/relocate_64.S?
Perhaps that code gets linked more towards the middle of U-Boot than the code you're fixing in start.S, so the max 1M offset just happens to reach all the relevant symbols and relocations that are in your current binary, but if your binary gets a little larger (e.g. goes from 1.05M to 2M say) that code will fail in the same way?

On Thu, Sep 03, 2020 at 09:45:39AM -0600, Stephen Warren wrote:
On 9/3/20 7:40 AM, André Przywara wrote:
On 03/09/2020 14:35, Michal Simek wrote:
On 02. 09. 20 18:34, Stephen Warren wrote:
On 9/2/20 5:15 AM, Michal Simek wrote:
From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com
When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y compilation error is shown: /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end' defined in .bss_start section in u-boot.
It is caused by adr instruction which permits the calculation of any byte address within +- 1MB of the current PC. Because U-Boot is bigger then 1MB calculation is failing.
The patch is using adrp/add instructions where adrp shifts a signed, 21-bit immediate left by 12 bits (4k page), adds it to the value of the program counter with the bottom 12 bits cleared to zero. Then add instruction provides the lower 12 bits which is offset within 4k page. These two instructions together compose full 32bit offset which should be more then enough to cover the whole u-boot size.
diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S
@@ -67,8 +67,10 @@ pie_fixup: adr x0, _start /* x0 <- Runtime value of _start */ ldr x1, _TEXT_BASE /* x1 <- Linked value of _start */ sub x9, x0, x1 /* x9 <- Run-vs-link offset */
- adr x2, __rel_dyn_start /* x2 <- Runtime &__rel_dyn_start */
- adr x3, __rel_dyn_end /* x3 <- Runtime &__rel_dyn_end */
- adrp x2, __rel_dyn_start /* x2 <- Runtime &__rel_dyn_start */
- add x2, x2, #:lo12:__rel_dyn_start
- adrp x3, __rel_dyn_end /* x3 <- Runtime &__rel_dyn_end */
- add x3, x3, #:lo12:__rel_dyn_end
pie_fix_loop: ldp x0, x1, [x2], #16 /* (x0, x1) <- (Link location, fixup) */ ldr x4, [x2], #8 /* x4 <- addend */
There are likely a bunch of other places in the code that need updating too; take a look at commit 49e93875a62f "arm64: support running at addr other than linked to" (which introduced the code above) to find other places with similar instruction sequences that almost certainly need updating.
I didn't hit any issue to have a need to update them. Definitely worth to check that locations too.
So I thought the same, so I checked at least this file. And while there are other "adr" instructions, they only go to nearby labels, so don't need to be pumped up. But I will try to grep for more cases as well.
So in the patch I linked to, what about the added ard instructions in arch/arm/lib/crt0_64.S and arch/arm/lib/relocate_64.S?
Perhaps that code gets linked more towards the middle of U-Boot than the code you're fixing in start.S, so the max 1M offset just happens to reach all the relevant symbols and relocations that are in your current binary, but if your binary gets a little larger (e.g. goes from 1.05M to 2M say) that code will fail in the same way?
Yes, those were apparently already corrected by Ibai: https://github.com/u-boot/u-boot/commit/98ffbb78e12646a1d06236ad6a1893217f25...
Cheers, Edgar

On 9/3/20 1:07 PM, Edgar E. Iglesias wrote:
On Thu, Sep 03, 2020 at 09:45:39AM -0600, Stephen Warren wrote:
On 9/3/20 7:40 AM, André Przywara wrote:
On 03/09/2020 14:35, Michal Simek wrote:
On 02. 09. 20 18:34, Stephen Warren wrote:
On 9/2/20 5:15 AM, Michal Simek wrote:
From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com
When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y compilation error is shown: /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end' defined in .bss_start section in u-boot.
It is caused by adr instruction which permits the calculation of any byte address within +- 1MB of the current PC. Because U-Boot is bigger then 1MB calculation is failing.
The patch is using adrp/add instructions where adrp shifts a signed, 21-bit immediate left by 12 bits (4k page), adds it to the value of the program counter with the bottom 12 bits cleared to zero. Then add instruction provides the lower 12 bits which is offset within 4k page. These two instructions together compose full 32bit offset which should be more then enough to cover the whole u-boot size.
diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S
@@ -67,8 +67,10 @@ pie_fixup: adr x0, _start /* x0 <- Runtime value of _start */ ldr x1, _TEXT_BASE /* x1 <- Linked value of _start */ sub x9, x0, x1 /* x9 <- Run-vs-link offset */
- adr x2, __rel_dyn_start /* x2 <- Runtime &__rel_dyn_start */
- adr x3, __rel_dyn_end /* x3 <- Runtime &__rel_dyn_end */
- adrp x2, __rel_dyn_start /* x2 <- Runtime &__rel_dyn_start */
- add x2, x2, #:lo12:__rel_dyn_start
- adrp x3, __rel_dyn_end /* x3 <- Runtime &__rel_dyn_end */
- add x3, x3, #:lo12:__rel_dyn_end
pie_fix_loop: ldp x0, x1, [x2], #16 /* (x0, x1) <- (Link location, fixup) */ ldr x4, [x2], #8 /* x4 <- addend */
There are likely a bunch of other places in the code that need updating too; take a look at commit 49e93875a62f "arm64: support running at addr other than linked to" (which introduced the code above) to find other places with similar instruction sequences that almost certainly need updating.
I didn't hit any issue to have a need to update them. Definitely worth to check that locations too.
So I thought the same, so I checked at least this file. And while there are other "adr" instructions, they only go to nearby labels, so don't need to be pumped up. But I will try to grep for more cases as well.
So in the patch I linked to, what about the added ard instructions in arch/arm/lib/crt0_64.S and arch/arm/lib/relocate_64.S?
Perhaps that code gets linked more towards the middle of U-Boot than the code you're fixing in start.S, so the max 1M offset just happens to reach all the relevant symbols and relocations that are in your current binary, but if your binary gets a little larger (e.g. goes from 1.05M to 2M say) that code will fail in the same way?
Yes, those were apparently already corrected by Ibai: https://github.com/u-boot/u-boot/commit/98ffbb78e12646a1d06236ad6a1893217f25...
Ah OK.
So I guess this means that in practice, U-Boot already has the limitation that the load (and relocation target?) address must be 4K-aligned, since it uses the same instruction sequence we're discusssing here.

On Thu, Sep 03, 2020 at 01:10:48PM -0600, Stephen Warren wrote:
On 9/3/20 1:07 PM, Edgar E. Iglesias wrote:
On Thu, Sep 03, 2020 at 09:45:39AM -0600, Stephen Warren wrote:
On 9/3/20 7:40 AM, André Przywara wrote:
On 03/09/2020 14:35, Michal Simek wrote:
On 02. 09. 20 18:34, Stephen Warren wrote:
On 9/2/20 5:15 AM, Michal Simek wrote: > From: "Edgar E. Iglesias" edgar.iglesias@xilinx.com > > When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y > compilation error is shown: > /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation > truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end' > defined in .bss_start section in u-boot. > > It is caused by adr instruction which permits the calculation of any byte > address within +- 1MB of the current PC. > Because U-Boot is bigger then 1MB calculation is failing. > > The patch is using adrp/add instructions where adrp shifts a signed, 21-bit > immediate left by 12 bits (4k page), adds it to the value of the program > counter with the bottom 12 bits cleared to zero. Then add instruction > provides the lower 12 bits which is offset within 4k page. > These two instructions together compose full 32bit offset which should be > more then enough to cover the whole u-boot size.
> diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S
> @@ -67,8 +67,10 @@ pie_fixup: > adr x0, _start /* x0 <- Runtime value of _start */ > ldr x1, _TEXT_BASE /* x1 <- Linked value of _start */ > sub x9, x0, x1 /* x9 <- Run-vs-link offset */ > - adr x2, __rel_dyn_start /* x2 <- Runtime &__rel_dyn_start */ > - adr x3, __rel_dyn_end /* x3 <- Runtime &__rel_dyn_end */ > + adrp x2, __rel_dyn_start /* x2 <- Runtime &__rel_dyn_start */ > + add x2, x2, #:lo12:__rel_dyn_start > + adrp x3, __rel_dyn_end /* x3 <- Runtime &__rel_dyn_end */ > + add x3, x3, #:lo12:__rel_dyn_end > pie_fix_loop: > ldp x0, x1, [x2], #16 /* (x0, x1) <- (Link location, fixup) */ > ldr x4, [x2], #8 /* x4 <- addend */
There are likely a bunch of other places in the code that need updating too; take a look at commit 49e93875a62f "arm64: support running at addr other than linked to" (which introduced the code above) to find other places with similar instruction sequences that almost certainly need updating.
I didn't hit any issue to have a need to update them. Definitely worth to check that locations too.
So I thought the same, so I checked at least this file. And while there are other "adr" instructions, they only go to nearby labels, so don't need to be pumped up. But I will try to grep for more cases as well.
So in the patch I linked to, what about the added ard instructions in arch/arm/lib/crt0_64.S and arch/arm/lib/relocate_64.S?
Perhaps that code gets linked more towards the middle of U-Boot than the code you're fixing in start.S, so the max 1M offset just happens to reach all the relevant symbols and relocations that are in your current binary, but if your binary gets a little larger (e.g. goes from 1.05M to 2M say) that code will fail in the same way?
Yes, those were apparently already corrected by Ibai: https://github.com/u-boot/u-boot/commit/98ffbb78e12646a1d06236ad6a1893217f25...
Ah OK.
So I guess this means that in practice, U-Boot already has the limitation that the load (and relocation target?) address must be 4K-aligned, since it uses the same instruction sequence we're discusssing here.
Apparently, yes.
Anyway, we'll submit a patch tomorrow with the early trapping if not aligned.
Cheers, Edgar
participants (5)
-
André Przywara
-
Edgar E. Iglesias
-
Michal Simek
-
Michal Simek
-
Stephen Warren