[U-Boot] [RFC] [PATCH] arm: arm926ejs: use ELF relocations

This patch is *not* a submission for master!
It is a proof of concept of ELF relocations for ARM, hastily done in a day's work, for people on the list to try and comment on. All comments are welcome, as several suggestions were made on the list today that I did not have time to incorporate, such as rewriting the ELF table fixup code in C.
The basic idea of this patch is to replace the -fPIC compile-time option with the -pie link-time option. This removes the GOT but adds the .rel.dyn and .dynsym tables, which together allow fixing up code more completely than with -fPIC and the GOT; for instance, all pointers inside structures are fixed up with -pie, whereas they are not with GOT.
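As an illustration of the fix-up pass that .rel.dyn and .dynsym make possible, here is a minimal C sketch. It is not the code from this patch (which does the work in assembly in start.S). The function name apply_rel_dyn and the parameters image, link_base and delta are hypothetical, and the image is modelled as a plain byte buffer so the sketch can run on a host. The structure layouts and the type numbers 2 (R_ARM_ABS32) and 23 (R_ARM_RELATIVE) are the standard ELF32/ARM ones. Like the patch, the absolute case ignores any addend stored in place, which is a simplification.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ELF32 REL and symbol table entries, laid out as in the ELF specification. */
typedef struct {
	uint32_t r_offset;	/* link-time address of the word to patch */
	uint32_t r_info;	/* (symbol index << 8) | relocation type */
} Elf32_Rel;

typedef struct {
	uint32_t st_name;
	uint32_t st_value;	/* link-time address of the symbol */
	uint32_t st_size;
	uint8_t  st_info;
	uint8_t  st_other;
	uint16_t st_shndx;
} Elf32_Sym;

#define R_ARM_ABS32	2
#define R_ARM_RELATIVE	23

/*
 * Hypothetical helper: patch the copy of the image held in 'image'.
 * 'link_base' is the address the image was linked at and 'delta' is
 * (run-time address - link-time address).
 */
void apply_rel_dyn(uint8_t *image, uint32_t link_base, uint32_t delta,
		   const Elf32_Rel *rel, size_t nrel, const Elf32_Sym *sym)
{
	for (size_t i = 0; i < nrel; i++) {
		uint32_t type = rel[i].r_info & 0xff;
		uint8_t *where = image + (rel[i].r_offset - link_base);
		uint32_t word;

		if (type == R_ARM_RELATIVE) {
			/* relative fix: increase the stored word by delta */
			memcpy(&word, where, sizeof(word));
			word += delta;
		} else if (type == R_ARM_ABS32) {
			/* absolute fix: symbol value plus delta (addend ignored) */
			word = sym[rel[i].r_info >> 8].st_value + delta;
		} else {
			continue;	/* unknown type: leave the word alone */
		}
		memcpy(where, &word, sizeof(word));
	}
}
```

Because every pointer that the linker recorded in .rel.dyn is visited, pointers stored inside initialized structures are adjusted too, which is the case the GOT approach misses.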
Note that references to linker-file-generated symbols were also made relative to _start rather than absolute. This is not needed as such, but it will be useful when optimizing the relocation tables. Actually I should have separated this from the ELF relocation support per se.
The edminiv2.h config file is there for reference only; this is the one I used for tests. Latest numbers are:
With GOT relocs:
   text    data     bss     dec     hex filename        .bin size
 141376    4388   16640  162404   27a64 ./u-boot           145764
With ELF relocs:
   text    data     bss     dec     hex filename        .bin size
 149677    3727   16636  170040   29838 ./u-boot           153408
The size difference is essentially due to .rel.dyn not being optimal. As discussed, an added build step should allow reducing it by half and making ELF sizes roughly similar to GOT ones.
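The message does not say how the extra build step would halve the table. One plausible reading, offered here only as an assumption, is that once every entry is of the relative type, the 4-byte r_info word of each 8-byte entry carries no information and only r_offset needs to be kept. A minimal sketch of such a post-link compaction, with the hypothetical name compact_rel_dyn:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
	uint32_t r_offset;	/* link-time address of the word to patch */
	uint32_t r_info;	/* (symbol index << 8) | relocation type */
} Elf32_Rel;

#define R_ARM_RELATIVE	23

/*
 * Hypothetical build-time step: keep only r_offset of each entry.
 * Returns the number of offsets written to 'out', or -1 if an entry
 * is not R_ARM_RELATIVE and the table cannot be compacted this way.
 */
long compact_rel_dyn(const Elf32_Rel *rel, size_t nrel, uint32_t *out)
{
	for (size_t i = 0; i < nrel; i++) {
		if ((rel[i].r_info & 0xff) != R_ARM_RELATIVE)
			return -1;
		out[i] = rel[i].r_offset;
	}
	return (long)nrel;
}
```

The run-time loop would then step through 4-byte offsets instead of 8-byte entries and apply the relative fix to each.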
Tests and comments not only welcome but also heartily called for.
Amicalement, Albert.
---
 arch/arm/config.mk                |    3 +-
 arch/arm/cpu/arm926ejs/start.S    |  172 +++++++++++++++++++++----------------
 arch/arm/cpu/arm926ejs/u-boot.lds |    9 ++
 arch/arm/include/asm/u-boot-arm.h |   14 ++--
 arch/arm/lib/board.c              |    8 +-
 board/LaCie/edminiv2/config.mk    |    5 +-
 include/configs/edminiv2.h        |    1 +
 7 files changed, 123 insertions(+), 89 deletions(-)
diff --git a/arch/arm/config.mk b/arch/arm/config.mk
index 6923f6d..e9e02da 100644
--- a/arch/arm/config.mk
+++ b/arch/arm/config.mk
@@ -35,7 +35,7 @@ endif

 ifndef CONFIG_SYS_ARM_WITHOUT_RELOC
 # needed for relocation
-PLATFORM_RELFLAGS += -fPIC
+#PLATFORM_RELFLAGS += -fPIC
 endif

 ifdef CONFIG_SYS_ARM_WITHOUT_RELOC
@@ -72,3 +72,4 @@ PLATFORM_LIBS += $(OBJTREE)/arch/arm/lib/eabi_compat.o
 endif
 endif
 LDSCRIPT := $(SRCTREE)/$(CPUDIR)/u-boot.lds
+PLATFORM_LDFLAGS += -pie
\ No newline at end of file
diff --git a/arch/arm/cpu/arm926ejs/start.S b/arch/arm/cpu/arm926ejs/start.S
index 16ee972..5a7ae7e 100644
--- a/arch/arm/cpu/arm926ejs/start.S
+++ b/arch/arm/cpu/arm926ejs/start.S
@@ -10,6 +10,7 @@
  * Copyright (c) 2002 Gary Jennejohn garyj@denx.de
  * Copyright (c) 2003 Richard Woodruff r-woodruff2@ti.com
  * Copyright (c) 2003 Kshitij kshitij@ti.com
+ * Copyright (c) 2010 Albert Aribaud albert.aribaud@free.fr
  *
  * See file CREDITS for list of people who contributed to this
  * project.
@@ -118,22 +119,19 @@ _fiq:
 _TEXT_BASE:
 	.word	TEXT_BASE

-#if defined(CONFIG_SYS_ARM_WITHOUT_RELOC)
-.globl _armboot_start
-_armboot_start:
-	.word _start
-#endif
-
 /*
  * These are defined in the board-specific linker script.
+ * Subtracting _start from them lets the linker put their
+ * relative position in the executable instead of leaving
+ * them null.
  */
-.globl _bss_start
-_bss_start:
-	.word __bss_start
+.globl _bss_start_ofs
+_bss_start_ofs:
+	.word __bss_start - _start

-.globl _bss_end
-_bss_end:
-	.word _end
+.globl _bss_end_ofs
+_bss_end_ofs:
+	.word _end - _start

 #ifdef CONFIG_USE_IRQ
 /* IRQ stack memory (calculated at run-time) */
@@ -153,29 +151,21 @@ FIQ_STACK_START:
 IRQ_STACK_START_IN:
 	.word	0x0badc0de

-.globl _datarel_start
-_datarel_start:
-	.word __datarel_start
-
-.globl _datarelrolocal_start
-_datarelrolocal_start:
-	.word __datarelrolocal_start
-
-.globl _datarellocal_start
-_datarellocal_start:
-	.word __datarellocal_start
+.globl _datarel_start_ofs
+_datarel_start_ofs:
+	.word __datarel_start - _start

-.globl _datarelro_start
-_datarelro_start:
-	.word __datarelro_start
+.globl _datarelrolocal_start_ofs
+_datarelrolocal_start_ofs:
+	.word __datarelrolocal_start - _start

-.globl _got_start
-_got_start:
-	.word __got_start
+.globl _datarellocal_start_ofs
+_datarellocal_start_ofs:
+	.word __datarellocal_start - _start

-.globl _got_end
-_got_end:
-	.word __got_end
+.globl _datarelro_start_ofs
+_datarelro_start_ofs:
+	.word __datarelro_start - _start

 /*
  * the actual reset code
@@ -226,9 +216,8 @@ stack_setup:

 	adr	r0, _start
 	ldr	r2, _TEXT_BASE
-	ldr	r3, _bss_start
-	sub	r2, r3, r2		/* r2 <- size of armboot */
-	add	r2, r0, r2		/* r2 <- source end address */
+	ldr	r3, _bss_start_ofs
+	add	r2, r0, r3		/* r2 <- source end address */
 	cmp	r0, r6
 	beq	clear_bss

@@ -240,36 +229,54 @@ copy_loop:
 	ble	copy_loop

 #ifndef CONFIG_PRELOADER
-	/* fix got entries */
-	ldr	r1, _TEXT_BASE		/* Text base */
-	mov	r0, r7			/* reloc addr */
-	ldr	r2, _got_start		/* addr in Flash */
-	ldr	r3, _got_end		/* addr in Flash */
-	sub	r3, r3, r1
-	add	r3, r3, r0
-	sub	r2, r2, r1
-	add	r2, r2, r0
-
+	/*
+	 * fix .rel.dyn relocations
+	 */
+	ldr	r0, _TEXT_BASE		/* r0 <- Text base */
+	sub	r9, r7, r0		/* r9 <- relocation offset */
+	ldr	r10, _dynsym_start_ofs	/* r10 <- sym table ofs */
+	add	r10, r10, r0		/* r10 <- sym table in FLASH */
+	ldr	r2, _rel_dyn_start_ofs	/* r2 <- rel dyn start ofs */
+	add	r2, r2, r0		/* r2 <- rel dyn start in FLASH */
+	ldr	r3, _rel_dyn_end_ofs	/* r3 <- rel dyn end ofs */
+	add	r3, r3, r0		/* r3 <- rel dyn end in FLASH */
 fixloop:
-	ldr	r4, [r2]
-	sub	r4, r4, r1
-	add	r4, r4, r0
-	str	r4, [r2]
-	add	r2, r2, #4
+	ldr	r0, [r2]	/* r0 <- location to fix up, IN FLASH! */
+	add	r0, r9		/* r0 <- location to fix up in RAM */
+	ldr	r1, [r2, #4]
+	and	r8, r1, #0xff
+	cmp	r8, #23		/* relative fixup? */
+	beq	fixrel
+	cmp	r8, #2		/* absolute fixup? */
+	beq	fixabs
+	/* ignore unknown type of fixup */
+	b	fixnext
+fixabs:
+	/* absolute fix: set location to (offset) symbol value */
+	mov	r1, r1, LSR #4	/* r1 <- symbol index in .dynsym */
+	add	r1, r10, r1	/* r1 <- address of symbol in table */
+	ldr	r1, [r1, #4]	/* r1 <- symbol value */
+	add	r1, r9		/* r1 <- relocated sym addr */
+	b	fixnext
+fixrel:
+	/* relative fix: increase location by offset */
+	ldr	r1, [r0]
+	add	r1, r1, r9
+fixnext:
+	str	r1, [r0]
+	add	r2, r2, #8	/* each rel.dyn entry is 8 bytes */
 	cmp	r2, r3
-	bne	fixloop
+	ble	fixloop
 #endif
 #endif	/* #ifndef CONFIG_SKIP_RELOCATE_UBOOT */

 clear_bss:
 #ifndef CONFIG_PRELOADER
-	ldr	r0, _bss_start
-	ldr	r1, _bss_end
+	ldr	r0, _bss_start_ofs
+	ldr	r1, _bss_end_ofs
 	ldr	r3, _TEXT_BASE		/* Text base */
 	mov	r4, r7			/* reloc addr */
-	sub	r0, r0, r3
 	add	r0, r0, r4
-	sub	r1, r1, r3
 	add	r1, r1, r4
 	mov	r2, #0x00000000		/* clear */

@@ -287,24 +294,33 @@ clbss_l:str	r2, [r0]		/* clear loop... */
  * initialization, now running from RAM.
  */
 #ifdef CONFIG_NAND_SPL
-	ldr	pc, _nand_boot
-
-_nand_boot: .word nand_boot
+	ldr	r0, _nand_boot_ofs
+	adr	r1, _start
+	add	pc, r0, r1
+_nand_boot_ofs
+	: .word nand_boot - _start
 #else
-	ldr	r0, _TEXT_BASE
-	ldr	r2, _board_init_r
-	sub	r2, r2, r0
-	add	r2, r2, r7	/* position from board_init_r in RAM */
+	ldr	r0, _board_init_r_ofs
+	adr	r1, _start
+	add	r0, r0, r1
+	add	lr, r0, r9
 	/* setup parameters for board_init_r */
 	mov	r0, r5		/* gd_t */
 	mov	r1, r7		/* dest_addr */
 	/* jump to it ... */
-	mov	lr, r2
 	mov	pc, lr

-_board_init_r: .word board_init_r
+_board_init_r_ofs:
+	.word board_init_r - _start
 #endif

+_rel_dyn_start_ofs:
+	.word __rel_dyn_start - _start
+_rel_dyn_end_ofs:
+	.word __rel_dyn_end - _start
+_dynsym_start_ofs:
+	.word __dynsym_start - _start
+
 #else /* #if !defined(CONFIG_SYS_ARM_WITHOUT_RELOC) */
 /*
  * the actual reset code
@@ -333,10 +349,8 @@ relocate:	/* relocate U-Boot to RAM */
 	ldr	r1, _TEXT_BASE		/* test if we run from flash or RAM */
 	cmp	r0, r1			/* don't reloc during debug */
 	beq	stack_setup
-	ldr	r2, _armboot_start
-	ldr	r3, _bss_start
-	sub	r2, r3, r2		/* r2 <- size of armboot */
-	add	r2, r0, r2		/* r2 <- source end address */
+	ldr	r3, _bss_start_ofs	/* r3 <- _bss_start - _start */
+	add	r2, r0, r3		/* r2 <- source end address */

 copy_loop:
 	ldmia	r0!, {r3-r10}		/* copy from source address [r0] */
@@ -360,8 +374,11 @@ stack_setup:
 	bic	sp, sp, #7		/* 8-byte alignment for ABI compliance */

 clear_bss:
-	ldr	r0, _bss_start		/* find start of bss segment */
-	ldr	r1, _bss_end		/* stop here */
+	adr	r2, _start
+	ldr	r0, _bss_start_ofs	/* find start of bss segment */
+	add	r0, r0, r2
+	ldr	r1, _bss_end_ofs	/* stop here */
+	add	r1, r1, r2
 	mov	r2, #0x00000000		/* clear */

 #ifndef CONFIG_PRELOADER
@@ -374,13 +391,16 @@ clbss_l:str	r2, [r0]		/* clear loop... */
 	bl red_LED_on
 #endif /* CONFIG_PRELOADER */

-	ldr	pc, _start_armboot
+	ldr	r0, _start_armboot_ofs
+	adr	r1, _start
+	add	r0, r0, r1
+	ldr	pc, r0

-_start_armboot:
+_start_armboot_ofs:
 #ifdef CONFIG_NAND_SPL
-	.word nand_boot
+	.word nand_boot - _start
 #else
-	.word start_armboot
+	.word start_armboot - _start
 #endif /* CONFIG_NAND_SPL */
 #endif /* #if !defined(CONFIG_SYS_ARM_WITHOUT_RELOC) */

@@ -469,7 +489,7 @@ cpu_init_crit:
 	sub	sp, sp, #S_FRAME_SIZE
 	stmia	sp, {r0 - r12}	@ Save user registers (now in svc mode) r0-r12
 #if defined(CONFIG_SYS_ARM_WITHOUT_RELOC)
-	ldr	r2, _armboot_start
+	adr	r2, _start
 	sub	r2, r2, #(CONFIG_STACKSIZE+CONFIG_SYS_MALLOC_LEN)
 	sub	r2, r2, #(CONFIG_SYS_GBL_DATA_SIZE+8) @ set base 2 words into abort stack
 #else
@@ -507,7 +527,7 @@ cpu_init_crit:

 	.macro get_bad_stack
 #if defined(CONFIG_SYS_ARM_WITHOUT_RELOC)
-	ldr	r13, _armboot_start	@ setup our mode stack
+	adr	r13, _start		@ setup our mode stack
 	sub	r13, r13, #(CONFIG_STACKSIZE+CONFIG_SYS_MALLOC_LEN)
 	sub	r13, r13, #(CONFIG_SYS_GBL_DATA_SIZE+8) @ reserved a couple spots in abort stack
 #else
diff --git a/arch/arm/cpu/arm926ejs/u-boot.lds b/arch/arm/cpu/arm926ejs/u-boot.lds
index 02eb8ca..f07a54a 100644
--- a/arch/arm/cpu/arm926ejs/u-boot.lds
+++ b/arch/arm/cpu/arm926ejs/u-boot.lds
@@ -51,6 +51,14 @@ SECTIONS
 		*(.data.rel.ro)
 	}

+	. = ALIGN(4);
+	__rel_dyn_start = .;
+	.rel.dyn : { *(.rel.dyn) }
+	__rel_dyn_end = .;
+
+	__dynsym_start = .;
+	.dynsym : { *(.dynsym) }
+
 	__got_start = .;
 	. = ALIGN(4);
 	.got : { *(.got) }
@@ -65,4 +73,5 @@ SECTIONS
 	__bss_start = .;
 	.bss (NOLOAD) : { *(.bss) . = ALIGN(4); }
 	_end = .;
+
 }
diff --git a/arch/arm/include/asm/u-boot-arm.h b/arch/arm/include/asm/u-boot-arm.h
index faf800a..4ac4f61 100644
--- a/arch/arm/include/asm/u-boot-arm.h
+++ b/arch/arm/include/asm/u-boot-arm.h
@@ -30,18 +30,18 @@
 #define _U_BOOT_ARM_H_	1

 /* for the following variables, see start.S */
-extern ulong _bss_start;	/* code + data end == BSS start */
-extern ulong _bss_end;		/* BSS end */
+extern ulong _bss_start_ofs;	/* BSS start relative to _start */
+extern ulong _bss_end_ofs;	/* BSS end relative to _start */
 extern ulong IRQ_STACK_START;	/* top of IRQ stack */
 extern ulong FIQ_STACK_START;	/* top of FIQ stack */
 #if defined(CONFIG_SYS_ARM_WITHOUT_RELOC)
-extern ulong _armboot_start;	/* code start */
+extern ulong _armboot_start_ofs;	/* code start */
 #else
 extern ulong _TEXT_BASE;	/* code start */
-extern ulong _datarel_start;
-extern ulong _datarelrolocal_start;
-extern ulong _datarellocal_start;
-extern ulong _datarelro_start;
+extern ulong _datarel_start_ofs;
+extern ulong _datarelrolocal_start_ofs;
+extern ulong _datarellocal_start_ofs;
+extern ulong _datarelro_start_ofs;
 extern ulong IRQ_STACK_START_IN;	/* 8 bytes in IRQ stack */
 #endif

diff --git a/arch/arm/lib/board.c b/arch/arm/lib/board.c
index 5f2dfd0..e411d93 100644
--- a/arch/arm/lib/board.c
+++ b/arch/arm/lib/board.c
@@ -147,7 +147,7 @@ static int display_banner (void)
 #else
 	       _armboot_start,
 #endif
-	       _bss_start, _bss_end);
+	       _bss_start_ofs+_TEXT_BASE, _bss_end_ofs+_TEXT_BASE);
 #ifdef CONFIG_MODEM_SUPPORT
 	debug ("Modem Support enabled\n");
 #endif
@@ -517,7 +517,7 @@ void board_init_f (ulong bootflag)

 	memset ((void*)gd, 0, sizeof (gd_t));

-	gd->mon_len = _bss_end - _TEXT_BASE;
+	gd->mon_len = _bss_end_ofs;

 	for (init_fnc_ptr = init_sequence; *init_fnc_ptr; ++init_fnc_ptr) {
 		if ((*init_fnc_ptr)() != 0) {
@@ -679,6 +679,7 @@ static char *failed = "*** failed ***\n";
  *
  ************************************************************************
  */
+
 void board_init_r (gd_t *id, ulong dest_addr)
 {
 	char *s;
@@ -702,7 +703,7 @@ void board_init_r (gd_t *id, ulong dest_addr)

 	gd->flags |= GD_FLG_RELOC;	/* tell others: relocation done */

-	monitor_flash_len = _bss_start - _TEXT_BASE;
+	monitor_flash_len = _bss_start_ofs;
 	debug ("monitor flash len: %08lX\n", monitor_flash_len);
 	board_init();	/* Setup chipselects */

@@ -914,6 +915,7 @@ extern void davinci_eth_set_mac_addr (const u_int8_t *addr);

 	/* NOTREACHED - no way out of command loop except booting */
 }
+
 #endif /* defined(CONFIG_SYS_ARM_WITHOUT_RELOC) */

 void hang (void)
diff --git a/board/LaCie/edminiv2/config.mk b/board/LaCie/edminiv2/config.mk
index bb444b5..f8bdc3a 100644
--- a/board/LaCie/edminiv2/config.mk
+++ b/board/LaCie/edminiv2/config.mk
@@ -24,5 +24,6 @@
 # MA 02110-1301 USA
 #

-# with relocation TEXT_BASE now *must* be in FLASH
-TEXT_BASE = 0xfff90000
+# with relocation TEXT_BASE can be anything, and making it 0
+# makes relative and absolute relocation fixups interchangeable.
+TEXT_BASE = 0
diff --git a/include/configs/edminiv2.h b/include/configs/edminiv2.h
index e6537fc..8bcdfcc 100644
--- a/include/configs/edminiv2.h
+++ b/include/configs/edminiv2.h
@@ -224,6 +224,7 @@
 #define CONFIG_SYS_MAXARGS		16

 /* additions for new relocation code, must be added to all boards */
+#define CONFIG_RELOC_FIXUP_WORKS
 #undef CONFIG_SYS_ARM_WITHOUT_RELOC
 #define CONFIG_SYS_SDRAM_BASE		0
 #define CONFIG_SYS_INIT_SP_ADDR	\

Afterthought: I have not explained how to test this patch.
If your board is based on arm926ejs and does not have its own u-boot.lds, then you can simply apply the patch and modify your board config file as I modified edminiv2.h, basically by adding the config option CONFIG_RELOC_FIXUP_WORKS, which will remove all calls to manual fixup code.
If your board is arm926ejs but has its own start.S or u-boot.lds (or both), then you'll also have to apply to them the changes I made to their arm926 counterparts.
Ditto if your board is ARM but not arm926.
You should then be able to build and run.
Remember: this patch only applies to boards which boot from NOR FLASH! You can test it on other types of boards (NAND-based, etc) for regression testing, but nothing more.
Amicalement,

On Mon, Oct 4, 2010 at 4:09 PM, Albert ARIBAUD albert.aribaud@free.fr wrote:
> ....
> Remember: this patch only applies to boards which boot from NOR FLASH! You can test it on other types of boards (NAND-based, etc) for regression testing, but nothing more.
Dumb question, how is the case of a NAND SPL, or similar loader loading u-boot first to a low RAM address then u-boot relocating itself to top of RAM different than u-boot relocating itself from NOR to top of RAM?
Thanks, John

On Tue, Oct 5, 2010 at 10:57 AM, John Rigby john.rigby@linaro.org wrote:
> On Mon, Oct 4, 2010 at 4:09 PM, Albert ARIBAUD albert.aribaud@free.fr wrote:
>> ....
>> Remember: this patch only applies to boards which boot from NOR FLASH! You can test it on other types of boards (NAND-based, etc) for regression testing, but nothing more.
> Dumb question, how is the case of a NAND SPL, or similar loader loading u-boot first to a low RAM address then u-boot relocating itself to top of RAM different than u-boot relocating itself from NOR to top of RAM?
Can NAND SPL initialise and size memory before loading U-Boot into RAM? If so, could the relocation code be added to NAND SPL so only one copy operation is performed?
Regards,
Graeme

Dear Graeme Russ,
In message AANLkTikqE0_DEqHs-tX3A4XuEXJhM0CYW_+j6izhmktw@mail.gmail.com you wrote:
> Can NAND SPL initialise and size memory before loading U-Boot into RAM?
It has to. You cannot load into and run from uninitialized RAM ;-)
> If so, could the relocation code be added to NAND SPL so only one copy operation is performed?
I'm afraid it cannot, due to size limitations. The NAND loader often has to fit into as little as 2 or 4 KiB...
Best regards,
Wolfgang Denk

On Tue, Oct 5, 2010 at 4:34 PM, Wolfgang Denk wd@denx.de wrote:
> Dear Graeme Russ,
> In message AANLkTikqE0_DEqHs-tX3A4XuEXJhM0CYW_+j6izhmktw@mail.gmail.com you wrote:
>> Can NAND SPL initialise and size memory before loading U-Boot into RAM?
> It has to. You cannot load into and run from uninitialized RAM ;-)
>> If so, could the relocation code be added to NAND SPL so only one copy operation is performed?
> I'm afraid it cannot, due to size limitations. The NAND loader often has to fit into as little as 2 or 4 KiB...
For x86, the actual relocation calculations can probably be done in a few dozen bytes of code. It contains:
- One offset calculation
- A single tight loop
- Two comparisons (probably not needed in the generic case as they are used to filter out x86 specific code outside .text)
- An offset addition
If the only constraint is space then it _may_ be possible in some scenarios (although I do acknowledge that previous trivial changes have caused the size constraint to be violated)
Regards,
Graeme

On Tue, Oct 5, 2010 at 4:40 PM, Graeme Russ graeme.russ@gmail.com wrote:
> On Tue, Oct 5, 2010 at 4:34 PM, Wolfgang Denk wd@denx.de wrote:
>> Dear Graeme Russ,
>> In message AANLkTikqE0_DEqHs-tX3A4XuEXJhM0CYW_+j6izhmktw@mail.gmail.com you wrote:
>>> Can NAND SPL initialise and size memory before loading U-Boot into RAM?
>> It has to. You cannot load into and run from uninitialized RAM ;-)
>>> If so, could the relocation code be added to NAND SPL so only one copy operation is performed?
>> I'm afraid it cannot, due to size limitations. The NAND loader often has to fit into as little as 2 or 4 KiB...
> For x86, the actual relocation calculations can probably be done in a few dozen bytes of code. It contains:
> - One offset calculation
> - A single tight loop
> - Two comparisons (probably not needed in the generic case as they are used to filter out x86 specific code outside .text)
> - An offset addition
> If the only constraint is space then it _may_ be possible in some scenarios (although I do acknowledge that previous trivial changes have caused the size constraint to be violated)
Another alternative is to load into upper memory and have the relocation code detect that U-Boot is already there and skip the copy operation
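The "already there" check could look roughly like the sketch below. It is only an illustration: relocate_copy and its parameters are hypothetical names, and it assumes the loader placed the image at exactly the address the relocation code would pick. The patch itself already has a comparable test in start.S (cmp r0, r6 followed by beq clear_bss).

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the image to its run-time location unless
 * it is already there. Returns 1 if a copy was made, 0 if it was skipped.
 */
int relocate_copy(void *dest, const void *src, size_t len)
{
	if (dest == src)
		return 0;	/* loader already put us at the final address */

	/*
	 * memmove copes with overlapping buffers as plain data; it does
	 * not make it safe to overwrite the code that is executing.
	 */
	memmove(dest, src, len);
	return 1;
}
```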
Regards,
Graeme

On 05/10/2010 07:42, Graeme Russ wrote:
> On Tue, Oct 5, 2010 at 4:40 PM, Graeme Russ graeme.russ@gmail.com wrote:
>> On Tue, Oct 5, 2010 at 4:34 PM, Wolfgang Denk wd@denx.de wrote:
>>> Dear Graeme Russ,
>>> In message AANLkTikqE0_DEqHs-tX3A4XuEXJhM0CYW_+j6izhmktw@mail.gmail.com you wrote:
>>>> Can NAND SPL initialise and size memory before loading U-Boot into RAM?
>>> It has to. You cannot load into and run from uninitialized RAM ;-)
>>>> If so, could the relocation code be added to NAND SPL so only one copy operation is performed?
>>> I'm afraid it cannot, due to size limitations. The NAND loader often has to fit into as little as 2 or 4 KiB...
>> For x86, the actual relocation calculations can probably be done in a few dozen bytes of code. It contains:
>> - One offset calculation
>> - A single tight loop
>> - Two comparisons (probably not needed in the generic case as they are used to filter out x86 specific code outside .text)
>> - An offset addition
>> If the only constraint is space then it _may_ be possible in some scenarios (although I do acknowledge that previous trivial changes have caused the size constraint to be violated)
> Another alternative is to load into upper memory and have the relocation code detect that U-Boot is already there and skip the copy operation
The loader would have to know something about the way u-boot relocates itself, and this may change based on configuration.
For instance, on ARM, if either icache or dcache is configured, u-boot will reserve the upper 64 KB for the TLB and thus relocate 64 KB lower than if neither icache nor dcache is configured. Ditto for VFD, LCD framebuffer, etc. Only after these allocations, and thus below them, is u-boot finally relocated.
An independent loader would thus have to figure all this out in order to know exactly where u-boot expects to relocate; otherwise it may put u-boot at a location which would be almost, but not quite, entirely unlike the intended one -- and that's the worst possible choice, as we now hit the dreaded 'relocate over oneself' issue.
OTOH, the u-boot board.c could possibly be modified so that the final location of the u-boot code depends only on its code size, not on its configuration options. Something like, in descending order: u-boot code, data and bss; TLBs, VFDs, framebuffers, etc; malloc arena; and stack.
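A rough sketch of such a top-down layout follows. It is an illustration only, not existing board.c code: the type layout_t, the function plan_layout, its parameters and the 4 KiB alignment of the code are all assumptions. The point it shows is that the code address is computed first, from the RAM top and the image size alone, so a loader could work it out without knowing which options are enabled.

```c
#include <stdint.h>

/* Hypothetical result of the layout computation (addresses in RAM). */
typedef struct {
	uint32_t code;		/* start of relocated code, data and bss */
	uint32_t extras;	/* start of TLB / VFD / framebuffer area */
	uint32_t malloc_start;	/* start of malloc arena */
	uint32_t stack_top;	/* initial stack pointer, grows down */
} layout_t;

layout_t plan_layout(uint32_t ram_top, uint32_t mon_len,
		     uint32_t extras_len, uint32_t malloc_len)
{
	layout_t l;

	/* code first: depends only on RAM top and image size */
	l.code = (ram_top - mon_len) & ~0xFFFu;	/* 4 KiB alignment assumed */
	/* configuration-dependent areas go below the code */
	l.extras = l.code - extras_len;
	l.malloc_start = l.extras - malloc_len;
	/* 8-byte stack alignment, as start.S does for ABI compliance */
	l.stack_top = l.malloc_start & ~7u;
	return l;
}
```

With this ordering, enabling or disabling the caches changes extras_len and everything below it, but leaves the code address untouched.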
> Regards,
> Graeme
Amicalement,

Dear Albert ARIBAUD,
In message 4CAA50AA.3000608@free.fr you wrote:
> Remember: this patch only applies to boards which boot from NOR FLASH! You can test it on other types of boards (NAND-based, etc) for regression testing, but nothing more.
Assuming the NAND loader does not load U-Boot to its final location at the upper end of RAM, but - say - somewhere in lower memory, the standard relocation process will be running, so I think there should be no real difference between (such) NAND booting systems and NOR booting ones - or am I missing something?
Best regards,
Wolfgang Denk

On 10/4/2010 10:30 PM, Wolfgang Denk wrote:
> Dear Albert ARIBAUD,
> In message 4CAA50AA.3000608@free.fr you wrote:
>> Remember: this patch only applies to boards which boot from NOR FLASH! You can test it on other types of boards (NAND-based, etc) for regression testing, but nothing more.
> Assuming the NAND loader does not load U-Boot to its final location at the upper end of RAM, but - say - somewhere in lower memory, the standard relocation process will be running, so I think there should be no real difference between (such) NAND booting systems and NOR booting ones - or am I missing something?
FWIW I think you are right. If u-boot is linked for the address where the NAND loader put it, everything should work fine. It can size memory, move a copy of u-boot to the top of memory, and branch to the entry point that continues initialization.
Bill Campbell
> Best regards,
> Wolfgang Denk

On 05/10/2010 07:30, Wolfgang Denk wrote:
> Dear Albert ARIBAUD,
> In message 4CAA50AA.3000608@free.fr you wrote:
>> Remember: this patch only applies to boards which boot from NOR FLASH! You can test it on other types of boards (NAND-based, etc) for regression testing, but nothing more.
> Assuming the NAND loader does not load U-Boot to its final location at the upper end of RAM, but - say - somewhere in lower memory, the standard relocation process will be running, so I think there should be no real difference between (such) NAND booting systems and NOR booting ones - or am I missing something?
No, you're not; instead of "but [you can test] nothing more", I should have written "but don't expect much more" -- NAND testers may find it works because yes, relocation does not differ whether from FLASH or from RAM; however I did nothing to make sure NAND booting (and other similar methods) works, whereas I spent quite some time to make NOR boot work.
> Best regards,
> Wolfgang Denk
Amicalement,

On Tue, Oct 5, 2010 at 9:01 AM, Albert Aribaud albert.aribaud@free.fr wrote:
> This patch is *not* a submission for master!
> It is a proof of concept of ELF relocations for ARM, hastily done in a day's work, for people on the list to try and comment on. All comments are welcome, as several suggestions were made on the list today that I did not have time to incorporate, such as rewriting the ELF table fixup code in C.
Yes, this would be nice. I imagine it would look somewhat like the version for x86. It would be nice to have a generic function which will work for all arches.
[snip]
> With GOT relocs:
>    text    data     bss     dec     hex filename        .bin size
>  141376    4388   16640  162404   27a64 ./u-boot           145764
> With ELF relocs:
>    text    data     bss     dec     hex filename        .bin size
>  149677    3727   16636  170040   29838 ./u-boot           153408
Hmm, I'm a bit surprised by the text increase - can you provide a more detailed breakdown of before and after sizes by section?
As I have mentioned before, x86 has an in-RAM increase of only 284 bytes (0.3 %) with an additional 22424 bytes in .rel.dyn
Regards,
Graeme

On 05/10/2010 00:22, Graeme Russ wrote:
> On Tue, Oct 5, 2010 at 9:01 AM, Albert Aribaud albert.aribaud@free.fr wrote:
>> This patch is *not* a submission for master!
>> It is a proof of concept of ELF relocations for ARM, hastily done in a day's work, for people on the list to try and comment on. All comments are welcome, as several suggestions were made on the list today that I did not have time to incorporate, such as rewriting the ELF table fixup code in C.
> Yes, this would be nice. I imagine it would look somewhat like the version for x86. It would be nice to have a generic function which will work for all arches.
> [snip]
>> With GOT relocs:
>>    text    data     bss     dec     hex filename        .bin size
>>  141376    4388   16640  162404   27a64 ./u-boot           145764
>> With ELF relocs:
>>    text    data     bss     dec     hex filename        .bin size
>>  149677    3727   16636  170040   29838 ./u-boot           153408
> Hmm, I'm a bit surprised by the text increase - can you provide a more detailed breakdown of before and after sizes by section?
The output from MAKEALL is curiously calculated... If I look at objdumps of the GOT and ELF binaries, I find that:
- the GOT .text section is 118960 bytes and the ELF .text section only 108112. This is due to the fact that GOT relocation requires additional instructions for GOT indirection whereas ELF relocations work by patching the code.
- the .rodata section is 22416 for GOT, 22698 for ELF, whereas the .data section is 2908 for GOT, 2627 for ELF. Some initialized data apparently moved from non-const to const for some reason, but basically, initialized data remains constant.
- the .bss section remains constant too, 16640 for GOT vs. 16636 for ELF. I'm not going to track what causes the 4 byte difference. :)
Many sections are output in the ELF file which do not appear in the GOT file, such as .interp, .dynamic, .dynstr etc. They probably pollute MAKEALL's figures.
So actually the code (.text+.rodata+.data) is smaller for ELF than for GOT (which is normal as GOT causes adding indirection instructions whereas ELF does not alter the code size) but the .rel.dyn is way bigger than the .got (which is also normal as GOT does not relocate all that ELF does).
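The arithmetic behind this comparison can be written out as a small C sketch using only the figures quoted in this thread. The function names are made up for the illustration. The reading that the "text" column of the size table counts .text plus .rodata plus the other read-only sections is an inference from the numbers, not something stated by the tools' output here.

```c
/* Section sizes in bytes, as quoted in this thread. */
enum {
	GOT_TEXT = 118960, GOT_RODATA = 22416, GOT_DATA = 2908,
	ELF_TEXT = 108112, ELF_RODATA = 22698, ELF_DATA = 2627,
	GOT_REPORTED_TEXT = 141376, ELF_REPORTED_TEXT = 149677
};

/* What the message calls "the code": .text + .rodata + .data. */
unsigned loaded_bytes(unsigned text, unsigned rodata, unsigned data)
{
	return text + rodata + data;
}

/* Part of the reported "text" column not explained by .text + .rodata. */
unsigned other_readonly(unsigned reported_text, unsigned text, unsigned rodata)
{
	return reported_text - (text + rodata);
}
```

With these inputs, loaded_bytes gives 144284 for GOT and 133437 for ELF, so the ELF build carries 10847 fewer bytes of code and data. other_readonly gives 0 for GOT and 18867 for ELF, which would be the relocation tables and the other ELF-only sections being counted as text.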
> As I have mentioned before, x86 has an in-RAM increase of only 284 bytes (0.3 %) with an additional 22424 bytes in .rel.dyn
That's roughly consistent with the numbers I get: about 19 KB of .rel.dyn plus .dynsym, which we will be able to cut by half if we preprocess it.
> Regards,
> Graeme
Amicalement,

On Tue, Oct 5, 2010 at 9:57 AM, Albert ARIBAUD albert.aribaud@free.fr wrote:
> On 05/10/2010 00:22, Graeme Russ wrote:
>> On Tue, Oct 5, 2010 at 9:01 AM, Albert Aribaud albert.aribaud@free.fr wrote:
> The output from MAKEALL is curiously calculated... If I look at objdumps of the GOT and ELF binaries, I find that:
> - the GOT .text section is 118960 bytes and the ELF .text section only 108112. This is due to the fact that GOT relocation requires additional instructions for GOT indirection whereas ELF relocations work by patching the code.
It would be interesting to compare against the baseline non-relocatable version
> - the .rodata section is 22416 for GOT, 22698 for ELF, whereas the .data section is 2908 for GOT, 2627 for ELF. Some initialized data apparently moved from non-const to const for some reason, but basically, initialized data remains constant.
> - the .bss section remains constant too, 16640 for GOT vs. 16636 for ELF. I'm not going to track what causes the 4 byte difference. :)
> Many sections are output in the ELF file which do not appear in the GOT file, such as .interp, .dynamic, .dynstr etc. They probably pollute MAKEALL's figures.
I now discard a few sections:
/DISCARD/ : { *(.dynstr*) }
/DISCARD/ : { *(.dynamic*) }
/DISCARD/ : { *(.plt*) }
/DISCARD/ : { *(.interp*) }
/DISCARD/ : { *(.gnu*) }
Not that it makes a huge difference - most of these are trivially small
> So actually the code (.text+.rodata+.data) is smaller for ELF than for GOT (which is normal as GOT causes adding indirection instructions whereas ELF does not alter the code size) but the .rel.dyn is way bigger than the .got (which is also normal as GOT does not relocate all that ELF does).
>> As I have mentioned before, x86 has an in-RAM increase of only 284 bytes (0.3 %) with an additional 22424 bytes in .rel.dyn
> That's roughly consistent with the numbers I get: about 19 KB of .rel.dyn plus .dynsym, which we will be able to cut by half if we preprocess it.
Which is not copied to RAM, so not as nasty as the .got related increase
I'm also looking at moving the low-level initialisation and relocation code into a separate section (outside .text) so I have even less to relocate to RAM
Then I could even compress the relocatable section, but that is just being silly ;)
Regards,
Graeme

On 05/10/2010 01:21, Graeme Russ wrote:
> On Tue, Oct 5, 2010 at 9:57 AM, Albert ARIBAUD albert.aribaud@free.fr wrote:
>> On 05/10/2010 00:22, Graeme Russ wrote:
>>> On Tue, Oct 5, 2010 at 9:01 AM, Albert Aribaud albert.aribaud@free.fr wrote:
>> The output from MAKEALL is curiously calculated... If I look at objdumps of the GOT and ELF binaries, I find that:
>> - the GOT .text section is 118960 bytes and the ELF .text section only 108112. This is due to the fact that GOT relocation requires additional instructions for GOT indirection whereas ELF relocations work by patching the code.
> It would be interesting to compare against the baseline non-relocatable version
I #defined CONFIG_RELOC_FIXUP_WORKS and removed -pie from the ARM config.mk. This puts the edminiv2 code in the non-reloc build case, and produces identical .text and .data, and almost identical .rodata, as the ELF case.
>> - the .rodata section is 22416 for GOT, 22698 for ELF, whereas the .data section is 2908 for GOT, 2627 for ELF. Some initialized data apparently moved from non-const to const for some reason, but basically, initialized data remains constant.
>> - the .bss section remains constant too, 16640 for GOT vs. 16636 for ELF. I'm not going to track what causes the 4 byte difference. :)
>> Many sections are output in the ELF file which do not appear in the GOT file, such as .interp, .dynamic, .dynstr etc. They probably pollute MAKEALL's figures.
> I now discard a few sections:
> /DISCARD/ : { *(.dynstr*) }
> /DISCARD/ : { *(.dynamic*) }
> /DISCARD/ : { *(.plt*) }
> /DISCARD/ : { *(.interp*) }
> /DISCARD/ : { *(.gnu*) }
> Not that it makes a huge difference - most of these are trivially small
Thanks. I'll add this to the .lds as a measure of clarity.
>> That's roughly consistent with the numbers I get: about 19 KB of .rel.dyn plus .dynsym, which we will be able to cut by half if we preprocess it.
> Which is not copied to RAM, so not as nasty as the .got related increase
True also. Note that we could probably shrink the table to 1/4 of its current size by taking advantage of the fact that the few non-program-base-relative relocations it has can easily be converted to program-base-relative, and that two consecutive relocations are always less than 64 KB away from each other. Of course that moves away from using the ELF structures as-is, and requires additional build steps, but people with small FLASH devices may want it.
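A rough sketch of the 1/4-size idea: each 8-byte entry becomes a 2-byte distance from the previous relocated word. Everything here is hypothetical (the names encode_deltas and apply_deltas, and the choice of starting the running position at the program base), and it assumes the offsets are sorted in ascending order, are already program-base-relative, and that the first one lies within 64 KB of the base.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical build-time step: turn ascending program-base-relative
 * offsets into 16-bit deltas. Returns 0 on success, -1 if the offsets
 * are not ascending or a gap does not fit in 16 bits.
 */
int encode_deltas(const uint32_t *offsets, size_t n, uint16_t *out)
{
	uint32_t prev = 0;

	for (size_t i = 0; i < n; i++) {
		if (offsets[i] < prev || offsets[i] - prev > 0xFFFFu)
			return -1;
		out[i] = (uint16_t)(offsets[i] - prev);
		prev = offsets[i];
	}
	return 0;
}

/* Hypothetical run-time step: add reloc_off to every listed word. */
void apply_deltas(uint8_t *image, const uint16_t *deltas, size_t n,
		  uint32_t reloc_off)
{
	uint32_t pos = 0;

	for (size_t i = 0; i < n; i++) {
		uint32_t word;

		pos += deltas[i];
		memcpy(&word, image + pos, sizeof(word));
		word += reloc_off;
		memcpy(image + pos, &word, sizeof(word));
	}
}
```

Two bytes per relocation instead of eight gives the factor of four, at the price of a table that is no longer a standard ELF structure.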
> I'm also looking at moving the low-level initialisation and relocation code into a separate section (outside .text) so I have even less to relocate to RAM
As Wolfgang pointed out, there might be issues in that all the code that runs in FLASH should be truly PI, which might not be a piece of cake. ARM C code, for instance, tends to generate literals which need to be relocated if you don't run the code where it was linked for.
> Then I could even compress the relocatable section, but that is just being silly ;)
:)
> Regards,
> Graeme
Amicalement,

On 10/4/2010 5:16 PM, Albert ARIBAUD wrote:
On 05/10/2010 01:21, Graeme Russ wrote:
On Tue, Oct 5, 2010 at 9:57 AM, Albert ARIBAUD albert.aribaud@free.fr wrote:
On 05/10/2010 00:22, Graeme Russ wrote:
On Tue, Oct 5, 2010 at 9:01 AM, Albert Aribaud albert.aribaud@free.fr wrote:
The output from MAKEALL is curiously calculated... If I look at objdumps of the GOT and ELF binaries, I find that:
- the GOT .text section is 118960 bytes and the ELF .text section only
- This is due to the fact that GOT relocation requires additional instructions for GOT indirection, whereas ELF relocations work by patching the code.
It would be interesting to compare against the baseline non-relocatable version
I #defined CONFIG_RELOC_FIXUP_WORKS and removed -pie from the ARM config.mk. This puts the edminiv2 code in the non-reloc build case, and produces identical .text and .data, and almost identical .rodata, as the ELF case.
- the .rodata section is 22416 for GOT, 22698 for ELF, whereas the .data section is 2908 for GOT, 2627 for ELF. Some initialized data apparently moved from non-const to const for some reason, but basically, initialized data remains constant.
- the .bss section remains constant too, 16640 for GOT vs. 16636 for ELF.
I'm not going to track what causes the 4 byte difference. :)
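To illustrate the "patching the code" point above, here is a minimal C sketch of a fixup loop. It is not the code from the patch (which does this in start.S); it assumes every entry of interest is an R_ARM_RELATIVE relocation, and the names struct rel32 and apply_relative() are made up for the example. The struct mirrors the Elf32_Rel layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as Elf32_Rel; redefined here so the sketch builds anywhere. */
struct rel32 {
    uint32_t r_offset; /* link-time address of the word to patch */
    uint32_t r_info;   /* low 8 bits hold the relocation type */
};

#define R_ARM_RELATIVE 23u

/*
 * Hypothetical helper: walk a .rel.dyn-style table and add the load offset
 * to every word referenced by an R_ARM_RELATIVE entry. 'image' points to the
 * copy being fixed up. Returns the number of words patched.
 */
static unsigned apply_relative(uint8_t *image, uint32_t link_base,
                               uint32_t new_base,
                               const struct rel32 *rel, size_t count)
{
    uint32_t delta = new_base - link_base;
    unsigned patched = 0;

    for (size_t i = 0; i < count; i++) {
        uint32_t word;
        uint8_t *where;

        /* Other relocation types would need a .dynsym lookup; skipped here. */
        if ((rel[i].r_info & 0xffu) != R_ARM_RELATIVE)
            continue;

        where = image + (rel[i].r_offset - link_base);
        memcpy(&word, where, sizeof(word)); /* avoids unaligned access */
        word += delta;
        memcpy(where, &word, sizeof(word));
        patched++;
    }
    return patched;
}
```

Nothing in the code path itself needs an indirection through a table at run time, which is consistent with the smaller .text figure; the cost is paid once, in the table and in the fixup loop.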
Many sections are output in the ELF file which do not appear in the GOT file, such as .interp, .dynamic, .dynstr etc. They probably pollute MAKEALL's figures.
I now discard a few sections:
/DISCARD/ : { *(.dynstr*) }
/DISCARD/ : { *(.dynamic*) }
/DISCARD/ : { *(.plt*) }
/DISCARD/ : { *(.interp*) }
/DISCARD/ : { *(.gnu*) }
Not that it makes a huge difference - most of these are trivially small
Thanks. I'll add this to the .lds as a measure of clarity.
That's roughly consistent with the numbers I get: about 19 KB of .rel.dyn plus .dynsym, which we will be able to cut by half if we preprocess it.
Which is not copied to RAM, so not as nasty as the .got related increase
True also. Note that we could probably shrink the table to 1/4 of its current size by taking advantage of the fact that the few non-program-base-relative relocations it has can easily be converted to program-base-relative, and that two consecutive relocations are always less than 64 KB away from each other. Of course that moves away from using the ELF structures as-is, and requires additional build steps, but people with small FLASH devices may want it.
Hi All, This may be pushing it in the more general case. ARM has only a few relocation types. Other CPU types have more types, and therefore still may need a type field. You can certainly get 1/2 in all cases, and more if you are willing to get a bit more complex in the preprocessing. That said, I think this is best left to later when all CPUs are in the relocatable state.
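As a rough sketch of the 1/4 idea discussed above: if every entry is program-base-relative (so no type field is needed, which per the caveat above may only hold on ARM) and consecutive relocations are less than 64 KB apart, each 8-byte ELF entry can be stored as a 2-byte gap from the previous one. The functions pack_deltas() and unpack_deltas() are hypothetical names for a build-time step and its run-time counterpart; nothing like them exists in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical build-time step: store sorted, image-relative relocation
 * offsets as 16-bit gaps from the previous entry (the first gap is measured
 * from offset 0). Returns 0 on success, -1 if the input is not sorted in
 * ascending order or if a gap does not fit in 16 bits.
 */
static int pack_deltas(const uint32_t *offsets, size_t count, uint16_t *out)
{
    uint32_t prev = 0;

    for (size_t i = 0; i < count; i++) {
        uint32_t gap;

        if (offsets[i] < prev)
            return -1;
        gap = offsets[i] - prev;
        if (gap > UINT16_MAX)
            return -1;
        out[i] = (uint16_t)gap;
        prev = offsets[i];
    }
    return 0;
}

/* Hypothetical run-time counterpart: rebuild the offsets by summing gaps. */
static void unpack_deltas(const uint16_t *deltas, size_t count,
                          uint32_t *offsets)
{
    uint32_t pos = 0;

    for (size_t i = 0; i < count; i++) {
        pos += deltas[i];
        offsets[i] = pos;
    }
}
```

A format that keeps a type field per entry, as other CPUs might require, would land closer to the 1/2 figure than to 1/4.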
I'm also looking at moving the low-level initialisation and relocation code into a separate section (outside .text) so I have even less to relocate to RAM
As Wolfgang pointed out, there might be issues in that all the code that runs in FLASH should be truly PI, which might not be a piece of cake. ARM C code, for instance, tends to generate literals which need to be relocated if you don't run the code at the address it was linked for.
True, but the code WILL be running at the address it was linked for. It just won't be copied and relocated to the "new" address, as it would never be run again anyway. This goal is along the lines of the "two stage u-boot" that has been/is being considered, where all execute-once code can be concentrated into a segment that is not moved into RAM.
Bill Campbell
Then I could even compress the relocatable section, but that is just being silly ;)
:)
Regards,
Graeme
Amicalement,

On Tue, Oct 5, 2010 at 9:01 AM, Albert Aribaud albert.aribaud@free.fr wrote:
This patch is *not* a submission for master!
It is a proof of concept of ELF relocations for ARM, hastily done in a day's work time for people on the list to try and to comment. All comments are welcome, as several suggestions have been made today on the list that I did not have time to incorporate, such as rewriting the elf table fixup code in C.
The basic idea of this patch is to replace the -fPIC compile-time option with the -pie link-time option. This removes the GOT but adds the .rel.dyn and .dynsym tables, which together allow fixing up code more completely than with -fPIC and the GOT; for instance, all pointers inside structures are fixed up with -pie, whereas they are not with GOT.
So does this make all of Heiko's ARM relocation patches redundant?
Regards,
Graeme

Dear Albert Aribaud,
In message 1286229705-16019-1-git-send-email-albert.aribaud@free.fr you wrote:
This patch is *not* a submission for master!
I'm trying this on arm1136 (i.MX31) at the moment. It seems the patch was not taken against latest mainline, but probably against a previous state in your local tree.
I get:
Configuring for qong board...
arch/arm/lib/libarm.a(board.o): In function `board_init_r':
/home/wd/git/u-boot/work/arch/arm/lib/board.c:913: undefined reference to `_bss_start_ofs'
arch/arm/lib/libarm.a(board.o): In function `board_init_f':
/home/wd/git/u-boot/work/arch/arm/lib/board.c:664: undefined reference to `_bss_end_ofs'
Seems that entries for _bss_start_ofs and _bss_end_ofs are missing in my version of the linker script, and I don't see these in your patch either.
[There are also rejects in board/LaCie/edminiv2/config.mk and include/configs/edminiv2.h, probably due to the same reason.]
Can you please provide an updated patch against mainline? Thanks.
Best regards,
Wolfgang Denk
participants (6): Albert ARIBAUD, Albert Aribaud, Graeme Russ, J. William Campbell, John Rigby, Wolfgang Denk