[U-Boot] Skipping relocation RAM to RAM, esp. on i.MX6?

Hi,
on i.MX6 devices, e.g. ARM2 or SabreLite, the ROM boot loader copies the U-Boot image from the boot device, e.g. the SD card, to main memory. This means that U-Boot is started in RAM.
With this, one might wonder why any RAM -> RAM relocation is done at all, and whether it could be skipped.
Looking into the details shows that board_init_f() in arch/arm/lib/board.c and relocate_code() in arch/arm/cpu/armv7/start.S [1] are involved in this.
In board_init_f() the relocation destination address 'addr' is calculated. This is basically the end of the available RAM (minus some space for various things like TLB tables etc.). On SabreLite this results in 0x4FF8D000.
The boot loader loads U-Boot to
CONFIG_SYS_TEXT_BASE 0x17800000
This results in relocate_code() copying U-Boot from RAM 0x17800000 to RAM 0x4FF8D000.
Setting CONFIG_SYS_TEXT_BASE to the relocation destination address 0x4FF8D000 avoids the (unnecessary?) copy by
    cmp r0, r6
    moveq r9, #0 /* no relocation. relocation offset(r9) = 0 */
    beq clear_bss /* skip relocation */
in relocate_code().
But:
1) The resulting image still runs without the relocation (CONFIG_SYS_TEXT_BASE 0x4FF8D000), but e.g. the U-Boot command line doesn't work properly any more. Most probably this is because the 'beq clear_bss' skips not only the copy but also the whole 'fix .rel.dyn relocations' section.
2) It's hard to set CONFIG_SYS_TEXT_BASE at compile time to the relocation address calculated at runtime in board_init_f(), due to the amount of #ifdef and runtime calculation done there. So finding a generic approach that could easily be defined in the config files to avoid the relocation seems difficult.
I haven't tried it, but for (1) it might help to jump to the 'fix .rel.dyn relocations' section instead of clear_bss, avoiding just the extra copy.
For (2) I don't have an idea how to solve this cleanly.
Have I missed anything? Is there a clean way to skip the extra RAM -> RAM copy in this case? Any ideas or opinions?
Many thanks and best regards
Dirk
[1]
arch/arm/cpu/armv7/start.S:
relocate_code:
    mov r4, r0 /* save addr_sp */
    mov r5, r1 /* save addr of gd */
    mov r6, r2 /* save addr of destination */

    /* Set up the stack */
stack_setup:
    mov sp, r4

    adr r0, _start
    cmp r0, r6
    moveq r9, #0 /* no relocation. relocation offset(r9) = 0 */
    beq clear_bss /* skip relocation */
    mov r1, r6 /* r1 <- scratch for copy_loop */
    ldr r3, _image_copy_end_ofs
    add r2, r0, r3 /* r2 <- source end address */

copy_loop:
    ldmia r0!, {r9-r10} /* copy from source address [r0] */
    stmia r1!, {r9-r10} /* copy to target address [r1] */
    cmp r0, r2 /* until source end address [r2] */
    blo copy_loop

#ifndef CONFIG_SPL_BUILD
    /*
     * fix .rel.dyn relocations
     */
    ldr r0, _TEXT_BASE /* r0 <- Text base */
    sub r9, r6, r0 /* r9 <- relocation offset */
    ldr r10, _dynsym_start_ofs /* r10 <- sym table ofs */
    add r10, r10, r0 /* r10 <- sym table in FLASH */
    ldr r2, _rel_dyn_start_ofs /* r2 <- rel dyn start ofs */
    add r2, r2, r0 /* r2 <- rel dyn start in FLASH */
    ldr r3, _rel_dyn_end_ofs /* r3 <- rel dyn end ofs */
    add r3, r3, r0 /* r3 <- rel dyn end in FLASH */
fixloop:
    ldr r0, [r2] /* r0 <- location to fix up, IN FLASH! */
    add r0, r0, r9 /* r0 <- location to fix up in RAM */
    ldr r1, [r2, #4]
    and r7, r1, #0xff
    cmp r7, #23 /* relative fixup? */
    beq fixrel
    cmp r7, #2 /* absolute fixup? */
    beq fixabs
    /* ignore unknown type of fixup */
    b fixnext
fixabs:
    /* absolute fix: set location to (offset) symbol value */
    mov r1, r1, LSR #4 /* r1 <- symbol index in .dynsym */
    add r1, r10, r1 /* r1 <- address of symbol in table */
    ldr r1, [r1, #4] /* r1 <- symbol value */
    add r1, r1, r9 /* r1 <- relocated sym addr */
    b fixnext
fixrel:
    /* relative fix: increase location by offset */
    ldr r1, [r0]
    add r1, r1, r9
fixnext:
    str r1, [r0]
    add r2, r2, #8 /* each rel.dyn entry is 8 bytes */
    cmp r2, r3
    blo fixloop
    b clear_bss
_rel_dyn_start_ofs:
    .word __rel_dyn_start - _start
_rel_dyn_end_ofs:
    .word __rel_dyn_end - _start
_dynsym_start_ofs:
    .word __dynsym_start - _start
#endif /* #ifndef CONFIG_SPL_BUILD */

clear_bss:
    ldr r0, _bss_start_ofs
    ldr r1, _bss_end_ofs
    mov r4, r6 /* reloc addr */
    add r0, r0, r4
    add r1, r1, r4
    mov r2, #0x00000000 /* clear

On 03/02/2012 08:25, Dirk Behme wrote:
Hi,
Hi Dirk,
on i.MX6 devices, e.g. ARM2 or SabreLite, the ROM boot loader copies the U-Boot image from the boot device, e.g. the SD card, to the main memory. This does mean that U-Boot is started in RAM.
The same happens on MX5 and on several other SOCs, such as TIs.
With this, one might wonder why any relocation RAM -> RAM is done anyway and if this could be skipped?
There were very long threads on the ML when it was discussed if and how to introduce relocation for ARM processors. U-Boot for PowerPC has always supported relocation.
Relocation has advantages beyond making U-Boot run from RAM. The main advantage I can see is that with relocation we can find out at runtime how much RAM is installed, and then move U-Boot to the end of RAM, leaving the whole memory free for the rest.
By "the rest" I also mean loading the kernel, and avoiding a conflict with the running bootloader when the kernel grows or when other images (ramdisks, FPGA, some other blobs) are loaded. This has happened more often than we can imagine ;-)
Another point is that, by getting the RAM size at runtime, we can use the same image if additional RAM is installed (or a new version with more memory is developed). This does not generally happen with evaluation boards, but it happens very often with custom boards. In most cases, customers appreciate having a single image supporting both hardware revisions (with more or less RAM).
There are also other features that rely on relocation (protected RAM, for example) and sharing memory with Linux. We cannot have a general solution if each SoC defines its own private, fixed address in RAM at which to link U-Boot.
Looking into the details shows that board_init_f() in arch/arm/lib/board.c and relocate_code() in arch/arm/cpu/armv7/start.S [1] are involved in this.
In board_init_f() the relocation destination address 'addr' is calculated. This is basically at the end of the available RAM (- some space for various stuff like TLB tables etc.). At SabreLite this results in 0x4FF8D000.
This is the reason - regardless of how much RAM you have on a Sabre, on an mx53QSB, or on the Beagleboard, U-Boot will be moved to the end of memory on all targets.
By the boot loader, the U-Boot is loaded to
CONFIG_SYS_TEXT_BASE 0x17800000
This results in relocate_code() copying U-Boot from RAM 0x17800000 to RAM 0x4FF8D000.
Right
Setting CONFIG_SYS_TEXT_BASE to the relocation destination address 0x4FF8D000 does avoid the (unnecessary?) copy by
That's right - it was used in the past. We also had a CONFIG_SKIP_RELOCATE_UBOOT during the transition phase, together with other ones (I remember CONFIG_SYS_ARM_WITHOUT_RELOC).
These CONFIG_ options are obsolete and were removed some time ago.
    cmp r0, r6
    moveq r9, #0 /* no relocation. relocation offset(r9) = 0 */
    beq clear_bss /* skip relocation */
in relocate_code().
But:
- The resulting image still runs without the relocation (CONFIG_SYS_TEXT_BASE 0x4FF8D000). But e.g. the U-Boot command line doesn't work properly any more. Most probably this is because not only the copy is skipped by the 'beq clear_bss', but the whole 'fix .rel.dyn relocations' is skipped too.
- It's hard to set CONFIG_SYS_TEXT_BASE at compile time to the relocation address calculated at runtime in board_init_f() due to the amount of #ifdef and runtime calculation done there. So finding a generic approach which could easily be defined in the config files to avoid the relocation seems difficult.
Well, this is an advantage of relocation - we do not need such a fixed address, and we have a generic way that runs on all architectures. You can of course fix things to skip relocation on your board, but it is hard to make it generic, and for the above reasons I doubt that it can flow to mainline.
As your concerns are surely related to speeding up the boot process, IMHO we can focus our efforts on adding cache support for MX5 / MX6.
Best regards, Stefano

On 03.02.2012 09:51, Stefano Babic wrote:
On 03/02/2012 08:25, Dirk Behme wrote:
Hi,
Hi Dirk,
on i.MX6 devices, e.g. ARM2 or SabreLite, the ROM boot loader copies the U-Boot image from the boot device, e.g. the SD card, to the main memory. This does mean that U-Boot is started in RAM.
The same happens on MX5 and on several other SOCs, such as TIs.
With this, one might wonder why any relocation RAM -> RAM is done anyway and if this could be skipped?
There was very long threads in the ML when it was discussed
Sorry if this was a FAQ. Many thanks for answering! :)
if and how to introduce relocation for ARM processors. U-Boot for PowerPC have always supported relocation.
Relocation has other advantages as only to make U-Boot running from RAM. The main advantage I can see is that with relocation we can find at runtime the current size of installed RAM, and then move U-Boot at the end of RAM, leaving the whole memory free for the rest.
As rest I mean also loading the kernel, and avoiding that by increasing the kernel size or loading other images (ramdisks, fpga, some other blobs) there is a conflict with the running bootloader. This happened mopre often as we can imagine ;-)
Another point is that, getting the RAM size at runtime, we can have the same image if additional RAM is installed (or a new version with more memory is developped). This does not happen generally for the evaluation boards, but it happens very often with custom boards. In most cases, customers appreciate to have a single image supporting both hardware revisions (with more or less RAM).
There are also other features running with relocation (protected RAM, for example), sharing memory with Linux. We cannot have a general solution if each SOC defines its own private and fix address in RAM to link U-Boot.
Looking into the details shows that board_init_f() in arch/arm/lib/board.c and relocate_code() in arch/arm/cpu/armv7/start.S [1] are involved in this.
In board_init_f() the relocation destination address 'addr' is calculated. This is basically at the end of the available RAM (- some space for various stuff like TLB tables etc.). At SabreLite this results in 0x4FF8D000.
This is the reason - independently how much RAM you have on a Sabre, on a mx53QSB, or on the Beagleboard, U-Boot will be moved for all targets at the end of the memory.
By the boot loader, the U-Boot is loaded to
CONFIG_SYS_TEXT_BASE 0x17800000
This results in relocate_code() copying U-Boot from RAM 0x17800000 to RAM 0x4FF8D000.
Right
Setting CONFIG_SYS_TEXT_BASE to the relocation destination address 0x4FF8D000 does avoid the (unnecessary?) copy by
That's right - it was used in the past. We had also a CONFIG_SKIP_RELOCATE_UBOOT during the transition phase, together with other ones (I remember CONFIG_SYS_ARM_WITHOUT_RELOC).
These CONFIG_ are obsolete and they were removed some times ago.
    cmp r0, r6
    moveq r9, #0 /* no relocation. relocation offset(r9) = 0 */
    beq clear_bss /* skip relocation */
in relocate_code().
But:
- The resulting image still runs without the relocation (CONFIG_SYS_TEXT_BASE 0x4FF8D000). But e.g. the U-Boot command line doesn't work properly any more. Most probably this is because not only the copy is skipped by the 'beq clear_bss', but the whole 'fix .rel.dyn relocations' is skipped too.
- It's hard to set CONFIG_SYS_TEXT_BASE at compile time to the relocation address calculated at runtime in board_init_f() due to the amount of #ifdef and runtime calculation done there. So finding a generic approach which could easily be defined in the config files to avoid the relocation seems difficult.
Well, this is an advantage of relocation - we do not need such fixed address, and we have a generic way running on all architectures. You can of couse fix things to skip relocation on your board,
Ok, understood :) Do you have any pointers or hints how to implement a board specific relocation skip? Just in case somebody wants us to implement this for a specific i.MX6 board ...
but it is hard to make it generic and for the above reasons I doubt that can flow to mainline.
As your concerns are surely related to speed up the boot process, IMHO we can focus efforts to add cache support for MX5 / MX6.
Ok, sounds good. Any idea what has to be done for this? Or what would be the steps for this? Maybe we should open a new thread or at least rename the subject of this mail for this discussion?
Best regards
Dirk

On 03/02/2012 11:18, Dirk Behme wrote:
Ok, understood :) Do you have any pointers or hints how to implement a board specific relocation skip? Just in case somebody wants us to implement this for a specific i.MX6 board ...
Not really - I think you have to dig into the git history, back to when we could skip relocation via a CONFIG_ option. Maybe someone else can give some more hints. Anyway, nobody nowadays checks whether a patch breaks things when relocation is skipped, as this option is unsupported, and the chance that your implementation will be broken by the next update is quite high...
As your concerns are surely related to speed up the boot process, IMHO we can focus efforts to add cache support for MX5 / MX6.
Ok, sounds good. Any idea what has to be done for this? Or what would be the steps for this?
Being an armv7 architecture, the MX can profit from the work already done for other SoCs. Functions for enabling / disabling / invalidating caches are already provided, in arch/arm/lib and arch/arm/cpu/armv7/cache_v7.c. This holds at least for MX5/MX6.
But we should change the MXC drivers to be cache-aware. At least the FEC driver and the MMC driver are known not to work when the dcache is on.
Maybe we should open a new thread or at least rename the subject of this mail for this discussion?
Not a bad idea.
Best regards, Stefano Babic

On Friday 03 February 2012 06:00:57 Stefano Babic wrote:
On 03/02/2012 11:18, Dirk Behme wrote:
Ok, understood :) Do you have any pointers or hints how to implement a board specific relocation skip? Just in case somebody wants us to implement this for a specific i.MX6 board ...
Not really - I think you have to dig into the git history, when we could skip relocation via a CONFIG_ OPTION. Maybe someone else can give some more hints. Anyway, nobody nowadays checks if a patch breaks when the relocation is skipped, as this option is unsupported, and the possibility that your implementation will be break by next update is quite high...
in common code, i know relocation skipping works as that is the only mode i use. for arm-specific code, i cannot say. -mike

Let's discuss how to enable the i.MX5/6 caches in U-Boot:
On 03.02.2012 12:00, Stefano Babic wrote:
On 03/02/2012 11:18, Dirk Behme wrote:
...
As your concerns are surely related to speed up the boot process, IMHO we can focus efforts to add cache support for MX5 / MX6.
Ok, sounds good. Any idea what has to be done for this? Or what would be the steps for this?
As armv7 architecture, the MX can profit of the work already done for other SOCs. Functions for enabling / disabling / invalidate caches are already provided, in arch/arm/lib and arch/arm/cpu/armv7/cache_v7.c. So at least for MX5/MX6.
But we should change MXC drivers to be cache-aware. At least the FEC driver and MMC driver are known to not work when dcache is on.
Marek, Troy, Fabio: What do you think is needed to make the i.MX5/6 FEC driver cache-aware?
Jason, Stefano: And what do you think would be needed for the MMC driver?
Best regards
Dirk

Hi Dirk,
On Saturday 04 February 2012 02:08 PM, Dirk Behme wrote:
Let's discuss how to enable the i.MX5/6 caches in U-Boot:
On 03.02.2012 12:00, Stefano Babic wrote:
On 03/02/2012 11:18, Dirk Behme wrote:
...
As your concerns are surely related to speed up the boot process, IMHO we can focus efforts to add cache support for MX5 / MX6.
Ok, sounds good. Any idea what has to be done for this? Or what would be the steps for this?
As armv7 architecture, the MX can profit of the work already done for other SOCs. Functions for enabling / disabling / invalidate caches are already provided, in arch/arm/lib and arch/arm/cpu/armv7/cache_v7.c. So at least for MX5/MX6.
But we should change MXC drivers to be cache-aware. At least the FEC driver and MMC driver are known to not work when dcache is on.
Marek, Troy, Fabio: What do you think is needed to make the i.MX5/6 FEC driver cache-aware?
Jason, Stefano: And what do you think would be needed for the MMC driver?
There is a generic README for ARM at doc/README.arm-caches
br, Aneesh

Let's discuss how to enable the i.MX5/6 caches in U-Boot:
On 03.02.2012 12:00, Stefano Babic wrote:
On 03/02/2012 11:18, Dirk Behme wrote:
...
As your concerns are surely related to speed up the boot process, IMHO we can focus efforts to add cache support for MX5 / MX6.
Ok, sounds good. Any idea what has to be done for this? Or what would be the steps for this?
As armv7 architecture, the MX can profit of the work already done for other SOCs. Functions for enabling / disabling / invalidate caches are already provided, in arch/arm/lib and arch/arm/cpu/armv7/cache_v7.c. So at least for MX5/MX6.
But we should change MXC drivers to be cache-aware. At least the FEC driver and MMC driver are known to not work when dcache is on.
Marek, Troy, Fabio: What do you think is needed to make the i.MX5/6 FEC driver cache-aware?
I already have a partly finished implementation of FEC ethernet with cache support somewhere on my drive.
M
Jason, Stefano: And what do you think would be needed for the MMC driver?
Best regards
Dirk

On 04.02.2012 11:18, Marek Vasut wrote:
Let's discuss how to enable the i.MX5/6 caches in U-Boot:
On 03.02.2012 12:00, Stefano Babic wrote:
On 03/02/2012 11:18, Dirk Behme wrote:
...
As your concerns are surely related to speed up the boot process, IMHO we can focus efforts to add cache support for MX5 / MX6.
Ok, sounds good. Any idea what has to be done for this? Or what would be the steps for this?
As armv7 architecture, the MX can profit of the work already done for other SOCs. Functions for enabling / disabling / invalidate caches are already provided, in arch/arm/lib and arch/arm/cpu/armv7/cache_v7.c. So at least for MX5/MX6.
But we should change MXC drivers to be cache-aware. At least the FEC driver and MMC driver are known to not work when dcache is on.
Marek, Troy, Fabio: What do you think is needed to make the i.MX5/6 FEC driver cache-aware?
I already have a partly finished implementation of FEC ethernet with cache support somewhere on my drive.
Would you like to share this?
Many thanks and best regards
Dirk

On 04.02.2012 11:18, Marek Vasut wrote:
Let's discuss how to enable the i.MX5/6 caches in U-Boot:
On 03.02.2012 12:00, Stefano Babic wrote:
On 03/02/2012 11:18, Dirk Behme wrote:
...
As your concerns are surely related to speed up the boot process, IMHO we can focus efforts to add cache support for MX5 / MX6.
Ok, sounds good. Any idea what has to be done for this? Or what would be the steps for this?
As armv7 architecture, the MX can profit of the work already done for other SOCs. Functions for enabling / disabling / invalidate caches are already provided, in arch/arm/lib and arch/arm/cpu/armv7/cache_v7.c. So at least for MX5/MX6.
But we should change MXC drivers to be cache-aware. At least the FEC driver and MMC driver are known to not work when dcache is on.
Marek, Troy, Fabio: What do you think is needed to make the i.MX5/6 FEC driver cache-aware?
I already have a partly finished implementation of FEC ethernet with cache support somewhere on my drive.
Do you like to share this?
Many thanks and best regards
Try the attached stuff, it's likely crap and needs rebasing. It might give you some pointers as to how to handle this.
Dirk

Hi Dirk,
On Friday 03 February 2012 12:55 PM, Dirk Behme wrote:
Hi,
on i.MX6 devices, e.g. ARM2 or SabreLite, the ROM boot loader copies the U-Boot image from the boot device, e.g. the SD card, to the main memory. This does mean that U-Boot is started in RAM.
With this, one might wonder why any relocation RAM -> RAM is done anyway and if this could be skipped?
Looking into the details shows that board_init_f() in arch/arm/lib/board.c and relocate_code() in arch/arm/cpu/armv7/start.S [1] are involved in this.
In board_init_f() the relocation destination address 'addr' is calculated. This is basically at the end of the available RAM (- some space for various stuff like TLB tables etc.). At SabreLite this results in 0x4FF8D000.
By the boot loader, the U-Boot is loaded to
CONFIG_SYS_TEXT_BASE 0x17800000
This results in relocate_code() copying U-Boot from RAM 0x17800000 to RAM 0x4FF8D000.
Setting CONFIG_SYS_TEXT_BASE to the relocation destination address 0x4FF8D000 does avoid the (unnecessary?) copy by
    cmp r0, r6
    moveq r9, #0 /* no relocation. relocation offset(r9) = 0 */
    beq clear_bss /* skip relocation */
in relocate_code().
But:
- The resulting image still runs without the relocation (CONFIG_SYS_TEXT_BASE 0x4FF8D000). But e.g. the U-Boot command line doesn't work properly any more. Most probably this is because not only the copy is skipped by the 'beq clear_bss', but the whole 'fix .rel.dyn relocations' is skipped too.
- It's hard to set CONFIG_SYS_TEXT_BASE at compile time to the relocation address calculated at runtime in board_init_f() due to the amount of #ifdef and runtime calculation done there. So finding a generic approach which could easily be defined in the config files to avoid the relocation seems difficult.
I haven't really read your mail completely, but here is an implementation I provided a long time ago for ARM. Wolfgang didn't want to take it, though. You can see the patch and the following discussion in this thread:
http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/96352
br, Aneesh

Le 04/02/2012 10:15, Aneesh V a écrit :
Hi Dirk,
On Friday 03 February 2012 12:55 PM, Dirk Behme wrote:
Hi,
on i.MX6 devices, e.g. ARM2 or SabreLite, the ROM boot loader copies the U-Boot image from the boot device, e.g. the SD card, to the main memory. This does mean that U-Boot is started in RAM.
With this, one might wonder why any relocation RAM -> RAM is done anyway and if this could be skipped?
Looking into the details shows that board_init_f() in arch/arm/lib/board.c and relocate_code() in arch/arm/cpu/armv7/start.S [1] are involved in this.
In board_init_f() the relocation destination address 'addr' is calculated. This is basically at the end of the available RAM (- some space for various stuff like TLB tables etc.). At SabreLite this results in 0x4FF8D000.
By the boot loader, the U-Boot is loaded to
CONFIG_SYS_TEXT_BASE 0x17800000
This results in relocate_code() copying U-Boot from RAM 0x17800000 to RAM 0x4FF8D000.
Setting CONFIG_SYS_TEXT_BASE to the relocation destination address 0x4FF8D000 does avoid the (unnecessary?) copy by
    cmp r0, r6
    moveq r9, #0 /* no relocation. relocation offset(r9) = 0 */
    beq clear_bss /* skip relocation */
in relocate_code().
But:
- The resulting image still runs without the relocation (CONFIG_SYS_TEXT_BASE 0x4FF8D000). But e.g. the U-Boot command line doesn't work properly any more. Most probably this is because not only the copy is skipped by the 'beq clear_bss', but the whole 'fix .rel.dyn relocations' is skipped too.
- It's hard to set CONFIG_SYS_TEXT_BASE at compile time to the relocation address calculated at runtime in board_init_f() due to the amount of #ifdef and runtime calculation done there. So finding a generic approach which could easily be defined in the config files to avoid the relocation seems difficult.
I haven't really completely read your mail. But here is an implementation I had provided long time back for ARM. But Wolfgang didn't want to take it. You can see the patch and the following discussion in this thread:
http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/96352
Recently there was a reminder by Wolfgang that debugging can be done even with relocation, provided the symbols are dropped and reloaded in gdb upon hitting the end of the relocate loop, where the jump to (the new location of) board_init_f happens; see http://www.denx.de/wiki/view/DULG/WrongDebugSymbolsAfterRelocation.
I am not a gdb specialist, but I think it could be automated too, so that if you want to debug U-Boot past relocation you would just have to enter a single command in gdb, or pass a script name when invoking gdb, to load U-Boot in low RAM, set a breakpoint at the pivot point after relocation, run to that breakpoint, drop the current symbols and reload them with the appropriate offset, possibly computed from some accessible global.
Anyone itching enough to do some research and experiments on this?
br, Aneesh
Amicalement,

On Saturday 04 February 2012 04:30 PM, Albert ARIBAUD wrote:
Le 04/02/2012 10:15, Aneesh V a écrit :
Hi Dirk,
On Friday 03 February 2012 12:55 PM, Dirk Behme wrote:
Hi,
on i.MX6 devices, e.g. ARM2 or SabreLite, the ROM boot loader copies the U-Boot image from the boot device, e.g. the SD card, to the main memory. This does mean that U-Boot is started in RAM.
With this, one might wonder why any relocation RAM -> RAM is done anyway and if this could be skipped?
Looking into the details shows that board_init_f() in arch/arm/lib/board.c and relocate_code() in arch/arm/cpu/armv7/start.S [1] are involved in this.
In board_init_f() the relocation destination address 'addr' is calculated. This is basically at the end of the available RAM (- some space for various stuff like TLB tables etc.). At SabreLite this results in 0x4FF8D000.
By the boot loader, the U-Boot is loaded to
CONFIG_SYS_TEXT_BASE 0x17800000
This results in relocate_code() copying U-Boot from RAM 0x17800000 to RAM 0x4FF8D000.
Setting CONFIG_SYS_TEXT_BASE to the relocation destination address 0x4FF8D000 does avoid the (unnecessary?) copy by
    cmp r0, r6
    moveq r9, #0 /* no relocation. relocation offset(r9) = 0 */
    beq clear_bss /* skip relocation */
in relocate_code().
But:
- The resulting image still runs without the relocation (CONFIG_SYS_TEXT_BASE 0x4FF8D000). But e.g. the U-Boot command line doesn't work properly any more. Most probably this is because not only the copy is skipped by the 'beq clear_bss', but the whole 'fix .rel.dyn relocations' is skipped too.
- It's hard to set CONFIG_SYS_TEXT_BASE at compile time to the relocation address calculated at runtime in board_init_f() due to the amount of #ifdef and runtime calculation done there. So finding a generic approach which could easily be defined in the config files to avoid the relocation seems difficult.
I haven't really completely read your mail. But here is an implementation I had provided long time back for ARM. But Wolfgang didn't want to take it. You can see the patch and the following discussion in this thread:
http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/96352
Recently there was an reminder by Wolfgang that debugging can be done even with relocation, provided the symbols are dropped and reloaded in gdb upon hitting the end of the relocate loop where the jump to (the new location of) board_init_f happens --see http://www.denx.de/wiki/view/DULG/WrongDebugSymbolsAfterRelocation.
I am not a specialist of gdb but I think it might be automated, too, so that if you want to debug u-boot past relocation then you would just have to enter a single command in gdb, or a script name when invoking gdb, to load u-boot in low RAM , set a breakpoint at the pivot point after relocation, run to that breakpoint, drop current symbols and reload symbols with the adequate offset, possibly computed from some accessible global.
Anyone itching enough to do some research and experiments on this?
I employ a different method using my JTAG debugger.
1. Look at the content of gd using the address from r8. Lauterbach allows you to cast that address to a (struct global_data *) and view the contents.
2. Get reloc_off from gd and use that to relocate the symbols in Trace32.
The advantage is that I can do all this after booting completely. No breakpoint needed.
br, Aneesh

Le 04/02/2012 12:14, Aneesh V a écrit :
On Saturday 04 February 2012 04:30 PM, Albert ARIBAUD wrote:
Le 04/02/2012 10:15, Aneesh V a écrit :
Hi Dirk,
On Friday 03 February 2012 12:55 PM, Dirk Behme wrote:
Hi,
on i.MX6 devices, e.g. ARM2 or SabreLite, the ROM boot loader copies the U-Boot image from the boot device, e.g. the SD card, to the main memory. This does mean that U-Boot is started in RAM.
With this, one might wonder why any relocation RAM -> RAM is done anyway and if this could be skipped?
Looking into the details shows that board_init_f() in arch/arm/lib/board.c and relocate_code() in arch/arm/cpu/armv7/start.S [1] are involved in this.
In board_init_f() the relocation destination address 'addr' is calculated. This is basically at the end of the available RAM (- some space for various stuff like TLB tables etc.). At SabreLite this results in 0x4FF8D000.
By the boot loader, the U-Boot is loaded to
CONFIG_SYS_TEXT_BASE 0x17800000
This results in relocate_code() copying U-Boot from RAM 0x17800000 to RAM 0x4FF8D000.
Setting CONFIG_SYS_TEXT_BASE to the relocation destination address 0x4FF8D000 does avoid the (unnecessary?) copy by
    cmp r0, r6
    moveq r9, #0 /* no relocation. relocation offset(r9) = 0 */
    beq clear_bss /* skip relocation */
in relocate_code().
But:
- The resulting image still runs without the relocation (CONFIG_SYS_TEXT_BASE 0x4FF8D000). But e.g. the U-Boot command line doesn't work properly any more. Most probably this is because not only the copy is skipped by the 'beq clear_bss', but the whole 'fix .rel.dyn relocations' is skipped too.
- It's hard to set CONFIG_SYS_TEXT_BASE at compile time to the relocation address calculated at runtime in board_init_f() due to the amount of #ifdef and runtime calculation done there. So finding a generic approach which could easily be defined in the config files to avoid the relocation seems difficult.
I haven't really completely read your mail. But here is an implementation I had provided long time back for ARM. But Wolfgang didn't want to take it. You can see the patch and the following discussion in this thread:
http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/96352
Recently there was an reminder by Wolfgang that debugging can be done even with relocation, provided the symbols are dropped and reloaded in gdb upon hitting the end of the relocate loop where the jump to (the new location of) board_init_f happens --see http://www.denx.de/wiki/view/DULG/WrongDebugSymbolsAfterRelocation.
I am not a specialist of gdb but I think it might be automated, too, so that if you want to debug u-boot past relocation then you would just have to enter a single command in gdb, or a script name when invoking gdb, to load u-boot in low RAM , set a breakpoint at the pivot point after relocation, run to that breakpoint, drop current symbols and reload symbols with the adequate offset, possibly computed from some accessible global.
Anyone itching enough to do some research and experiments on this?
I employ a different method using my JTAG debugger.
- Look at the content of gd using the address from r8. Lauterbach allows you to cast that address to a (struct global_data *) and view the contents.
- Get reloc_off from gd and use that to relocate the symbols in Trace32.
The advantage is that I can do all this after booting completely. No breakpoint needed.
Indeed, assuming you only want to debug post-relocation you can use this technique -- I guess it is applicable to GDB as well.
br, Aneesh
Amicalement,

On Sat, Feb 4, 2012 at 4:00 AM, Albert ARIBAUD albert.u.boot@aribaud.net wrote:
Recently there was a reminder by Wolfgang that debugging can be done even with relocation, provided the symbols are dropped and reloaded in gdb upon hitting the end of the relocate loop where the jump to (the new location of) board_init_f happens -- see http://www.denx.de/wiki/view/DULG/WrongDebugSymbolsAfterRelocation.
I am not a gdb specialist, but I think this could be automated too: to debug u-boot past relocation you would just enter a single command in gdb, or give a script name when invoking gdb, that loads u-boot in low RAM, sets a breakpoint at the pivot point after relocation, runs to that breakpoint, drops the current symbols and reloads them with the adequate offset, possibly computed from some accessible global.
Anyone itching enough to do some research and experiments on this?
In my experience, the offset is consistent on a given platform so once you do the dance once to figure out where it'll be placed you can just start off debugging post-relocation.

Hi Tom,
On 06/02/2012 15:34, Tom Rini wrote:
In my experience, the offset is consistent on a given platform so once you do the dance once to figure out where it'll be placed you can just start off debugging post-relocation.
That's for a given platform *and* a given U-Boot build, since the U-Boot location is computed from top of DDR down, so if U-Boot grows or shrinks, its base address will change.
Amicalement,

Dear Albert ARIBAUD,
In message 4F304463.1050901@aribaud.net you wrote:
In my experience, the offset is consistent on a given platform so once you do the dance once to figure out where it'll be placed you can just start off debugging post-relocation.
That's for a given platform *and* a given U-Boot build, since the U-Boot ...
...and for a given set of configured options and environment settings.
Change the size of the PRAM area, or change the resolution of your graphics controller (and thus the size of the frame buffer), or change the size of the log buffer, or ...
There are _plenty_ of reasons why the relocation address may change, even for a given binary image and a given piece of hardware.
Best regards,
Wolfgang Denk
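To make the point about a moving relocation address concrete, here is a simplified Python model of a top-down address calculation in the spirit of board_init_f(). It is a sketch under stated assumptions, not the actual code: the reservation order, the alignments, the RAM layout (base 0x10000000, 1 GiB) and the image size 0x63000 are all hypothetical, with the image size picked only so that the first result matches the 0x4FF8D000 quoted for SabreLite.

```python
PAGE = 4096


def align_down(addr, alignment):
    """Round addr down to a multiple of alignment (a power of two)."""
    return addr & ~(alignment - 1)


def relocation_address(ram_base, ram_size, image_size, reserved=()):
    """Walk down from the top of RAM, carving out each reserved area
    (size, alignment) and finally the U-Boot image itself."""
    addr = ram_base + ram_size
    for size, alignment in reserved:
        addr = align_down(addr - size, alignment)
    addr = align_down(addr, PAGE)
    return align_down(addr - image_size, PAGE)


if __name__ == "__main__":
    tlb = (0x4000, 0x10000)          # assumed 16 KiB table, 64 KiB aligned
    framebuffer = (0x100000, PAGE)   # assumed 1 MiB frame buffer

    # Same hardware, three different outcomes:
    print(hex(relocation_address(0x10000000, 0x40000000, 0x63000, [tlb])))
    print(hex(relocation_address(0x10000000, 0x40000000, 0x64000, [tlb])))
    print(hex(relocation_address(0x10000000, 0x40000000, 0x63000,
                                 [tlb, framebuffer])))
```

In this model, growing the image by one page or adding a frame buffer reservation both shift the result, which is the behaviour described above: the address depends on the build and on the configured reservations, not only on the board.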

Hi Wolfgang,
On Tue, Feb 7, 2012 at 9:27 AM, Wolfgang Denk wd@denx.de wrote:
Dear Albert ARIBAUD,
In message 4F304463.1050901@aribaud.net you wrote:
In my experience, the offset is consistent on a given platform so once you do the dance once to figure out where it'll be placed you can just start off debugging post-relocation.
That's for a given platform *and* a given U-Boot build, since the U-Boot ...
...and for a given set of configured options and environment settings.
Change the size of the PRAM area, or change the resolution of your graphics controller (and thus the size of the frame buffer), or change
The graphics controller memory may not be in main memory. It could be in the PCI address space, in which case it may not affect the relocation address.
the size of the log buffer, or ...
There are _plenty_ of reasons why the relocation address may change, even for a given binary image and a given piece of hardware.
Note that x86's E820 address map tells the kernel which physical pages of memory are reserved for system use. Paging takes care of defragmenting the physical address space to provide the kernel and user mode code a virtual linear address space. Technically, U-Boot does not need to be relocated for x86 at all, even if we want to keep it in RAM. We just tell the kernel that the physical address space that U-Boot resides in is reserved.
There are a lot of other arches besides PPC ;) And a lot of boards will be shipped with an extremely static configuration (lots of consumer devices like set-top boxes etc. only have one physical configuration).
As I've said before, the new INIT_CALL framework will (hopefully) provide much better control at the board level with zero impact on the arch or common code - not just for relocation, but for all sorts of SoC defined functionality.
Regards,
Graeme

On Tuesday 07 February 2012 04:11 AM, Graeme Russ wrote:
There are a lot of other arches besides PPC ;) And a lot of boards which will be shipped with an extremely static configuration (lots of consumer devices like set-top boxes etc only have one physical configuration)
I agree. Even on some platforms that are not fully static (such as having variants with different memory sizes) the minimum available memory is more than enough to allocate big enough partitions for each need at U-Boot level. And my guess (or rather speculation) is that platforms that do not have any real dynamic needs are in the majority. I sincerely believe that platforms should be allowed to enable/disable relocation based on their needs.
br, Aneesh

Dear Aneesh V,
In message 4F30D06E.8060200@ti.com you wrote:
I agree. Even on some platforms that are not fully static (such as having variants with different memory sizes) the minimum available memory is more than enough to allocate big enough partitions for each need at U-Boot level. And my guess (or rather speculation) is that platforms that do not have any real dynamic needs are in the majority. I sincerely believe that platforms should be allowed to enable/disable relocation based on their needs.
This is your opinion. It is noted, and appreciated.
But you should not try to continue to ignore all the previous discussion to that topic. There have been no new arguments in this round, so there will be no change of the previously made decisions.
Please accept that.
Best regards,
Wolfgang Denk

Dear Wolfgang,
On Wednesday 08 February 2012 04:56 AM, Wolfgang Denk wrote:
This is your opinion. It is noted, and appreciated.
But you should not try to continue to ignore all the previous discussion to that topic. There have been no new arguments in this round, so there will be no change of the previously made decisions.
First of all I am not arguing in favor of taking it in the current form. But as Graeme mentioned if the new initcall framework makes it clean and maintainable I am hoping that you will consider it more favorably. And this is indeed a new argument because the biggest objection previously was that the no-relocation case will be difficult to maintain.
As for ignoring comments, I think you are culpable of that more than me in this specific instance :) (of course I know you are a busy person, but still...). For instance, my arguments in the previous round [1] never got an answer from you.
[1] http://article.gmane.org/gmane.comp.boot-loaders.u-boot/96371
best regards, Aneesh

Dear Aneesh V,
In message 4F3219A8.7090607@ti.com you wrote:
As for ignoring comments, I think you are culpable of that more than me in this specific instance :) (of course I know you are a busy person, but still...). For instance, my arguments in the previous round [1] never got an answer from you.
[1] http://article.gmane.org/gmane.comp.boot-loaders.u-boot/96371
I don't see any question from you that has not been answered?
You suggest to do things in a way that introduces a number of often discussed disadvantages. And you claim that for you this would be good enough. What should I comment on this?
For the record:
- You continue to talk about relocation, but you mean something different (a memory copy). I am not sure if you really understand the difference.
- You claim omitting this copy operation would be important to optimize boot times, but you cannot provide any real numbers that support such a claim:
* You do not know how much time exactly is needed for this copy operation, so you don't know how much you can potentially save, and if this would result in any perceptible improvement of the boot time.
* You did not investigate how long other parts of the boot process are taking, so you don't really know where the hot spot is on which you should focus your optimization efforts.
* You did not investigate how the timing behaviour changes if you enable both instruction and data cache in the SPL. In my experience this would be a way more rewarding target for optimization efforts than omitting a little memcpy().
Best regards,
Wolfgang Denk

Dear Wolfgang,
On Wednesday 08 February 2012 07:28 PM, Wolfgang Denk wrote:
- You continue to talk about relocation, but you mean something different (a memory copy). I am not sure if you really understand the difference.
I do understand what the ELF relocation does.
You claim omitting this copy operation would be important to optimize boot times, but you cannot provide any real numbers that support such a claim:
- You do not know how much time exactly is needed for this copy operation, so you don't know how much you can potentially save, and if this would result in any perceptible improvement of the boot time.
I had provided data for OMAP4 and agreed with you that from the boot time point of view it doesn't make sense to disable relocation [1].
But since then I changed my mind due to some other factors:
1. Difficulty in debugging. I use JTAG debuggers. The workarounds available are still painful and not many people know about them.
2. On an FPGA platform, it was adding a considerable delay (I don't have the exact number, but that will be in minutes). The u-boot was already scaled down and was doing minimal stuff, but this one could not be removed easily. That's when I created that patch.
3. Unnecessary complexity without any benefit for our platform. I nearly get exhausted explaining to new u-boot users how it all works, and nearly always get confronted with the question "why do we need it?" In our platforms U-Boot starts from SDRAM and we do not need any of the flexibilities relocation may provide.
You did not investigate how long other parts of the boot process are taking, so you don't really know where the hot spot is on which you should focus your optimization efforts.
You did not investigate how the timing behaviour changes if you enable both instruction and data cache in the SPL. In my experience this would be a way more rewarding target for optimization efforts than omitting a little memcpy().
We have caches enabled. But anyway, these two points are irrelevant because my argument is not from the point of view of time.
At the end of the day I think we are making U-Boot way too complex for a bootloader and I think relocation is one of the factors.
[1] http://article.gmane.org/gmane.comp.boot-loaders.u-boot/88288

Dear Aneesh V,
In message 4F328B41.2050008@ti.com you wrote:
But since then I changed my mind due to some other factors:
- Difficulty in debugging. I use JTAG debuggers. The workarounds
available are still painful and not many people know about it.
We use JTAG debuggers all day, and have been doing so for well over 10 years. All development of PPCBoot and U-Boot has been done with JTAG debuggers. Relocation has never been a real problem here.
Reading the manual may help - this is documented in detail there.
This is not a good reason to reconsider.
- On FPGA platform, it was adding a considerable delay (I don't have
the exact number, but that will be in minutes). The u-boot was already scaled down and was doing minimal stuff, but this one could not be removed easily. That's when I created that patch.
What exactly are you talking about here that "was adding a considerable delay" - the memory copy ? Are you really sure about that?
- Un-necessary complexity without any benefit for our platform. I
Maintaining different configurations of the code that behave differently, that can cause different types of addressing, compile and link and debug issues is also adding complexity. Using a single, well tested approach is one of the benefits even for your platform.
nearly get exhausted explaining to new u-boot users how it all works and nearly always get confronted with the question "why do we need it?" In our platforms U-Boot starts from SDRAM and we do not need any of the flexibilities relocation may provide.
Maybe it would help if you add your explanations to the manual, so you can point people who ask to the manual instead of repeating this again and again?
At the end of the day I think we are making U-Boot way too complex for a bootloader and I think relocation is one of the factors.
Well, if you prefer, you can probably adapt blob (http://sourceforge.net/projects/blob/) or similar to your system. That would be definitely less complex.
Hm... isn't it your users who are asking for the features?
Best regards,
Wolfgang Denk

On Wednesday 08 February 2012 09:53 PM, Wolfgang Denk wrote:
What exactly are you talking about here that "was adding a considerable delay" - the memory copy ? Are you really sure about that?
I didn't measure it part by part, but removing relocation gave a noticeable speed-up. This platform is several orders of magnitude slower than the real silicon, so that should not be surprising.
- Un-necessary complexity without any benefit for our platform. I
Maintaining different configurations of the code that behave differently, that can cause different types of addressing, compile and link and debug issues is also adding complexity. Using a single, well tested approach is one of the benefits even for your platform.
Fair enough. But will the new INITCALL framework help? I haven't really followed the discussions on it. But if, as Graeme claims, all relocation stuff is collected in one place and is easily pluggable then maintainability is not a problem, right?
Maybe, I should stop the arguments now and wait till that framework is a reality.
best regards, Aneesh

Dear Aneesh V,
In message 4F33614D.8020904@ti.com you wrote:
What exactly are you talking about here that "was adding a considerable delay" - the memory copy ? Are you really sure about that?
I didn't measure it part by part, but removing relocation gave a noticeable speed-up. This platform is several orders of magnitude slower than the real silicon, so that should not be surprising.
Could you please start using exact terminology, so we understand what you actually refer to? Did you really remove the _relocation_, i.e. link for a static address, or did you just skip the memory copy? Note that the latter should be a no-op anyway if you just load the image to the resulting target address.
Maybe, I should stop the arguments now and wait till that framework is a reality.
I am very much convinced that you are tracking down a red herring. It does not really matter if you run the code on real silicon or in an emulation - the relative times will always be the same. Without any detailed timing analysis I simply do not believe you that you really have found a hot spot. You focus on it because you found out that it exists and you think it was "not needed" in your configuration - without spending time on real optimization.
This is a fundamentally broken approach, and it will remain broken even if new concepts get implemented that may make it easier to skip certain steps of the initialization.
If you are concerned about boot time optimization, you _must_ start with timing measurements. You know where premature optimization leads, don't you?
Best regards,
Wolfgang Denk

On Thursday 09 February 2012 05:14 PM, Wolfgang Denk wrote:
Could you please start using exact terminology, so we understand what you actually refer to? Did you really remove the _relocation_, i.e. link for a static address, or did you just skip the memory copy? Note that the latter should be a no-op anyway if you just load the image to the resulting target address.
I defeated relocation by passing to the relocate_code() function the same address as it is linked to. I patched up arch/arm/lib/board.c for this and fixed up the relocate_code() to correctly handle this special case. So, relocate_code() does only .bss init now.
Maybe, I should stop the arguments now and wait till that framework is a reality.
I am very much convinced that you are tracking down a red herring. It does not really matter if you run the code on real silicon or in an emulation - the relative times will always be the same. Without any detailed timing analysis I simply do not believe you that you really have found a hot spot. You focus on it because you found out that it exists and you think it was "not needed" in your configuration - without spending time on real optimization.
Please note that our bootloaders and kernel are customized and scaled down for this environment. For instance, u-boot doesn't load the kernel from network or a memory device. The kernel is preloaded in the modeled memory for it. So, u-boot was just used to jump to the kernel. As such, the u-boot run-time is now more dominated by pure software stuff such as relocation. The relative timing doesn't quite apply.
This is a fundamentally broken approach, and it will remain broken even if new concepts get implemented that may make it easier to skip certain steps of the initialization.
If you are concerned about boot time optimization, you _must_ start with timing measurements. You know where premature optimization leads, don't you?
As I mentioned earlier boot-time is not my key care-about. Even on an emulation platform I will probably try SPL Linux boot next time. My key concerns are about the other aspects I mentioned, namely avoidable complexity and problems with debugger.
br, Aneesh
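The distinction argued over in this exchange (memory copy versus fixing up .rel.dyn entries) and the special case of a destination equal to the link address can be modelled in a few lines of Python. This is only a sketch of the general technique: it handles R_ARM_RELATIVE entries alone, works on an in-memory byte buffer, and the function name and data layout are assumptions, not the code in start.S or in the patch mentioned above.

```python
import struct

R_ARM_RELATIVE = 23  # relocation type number from the ARM ELF ABI


def relocate_image(image, link_base, dest, rel_entries):
    """Return a copy of image as it would look when run from dest.

    rel_entries is a list of (r_offset, r_info) pairs, with r_offset
    given as a link-time address, as in an ELF32 .rel.dyn section.
    """
    offset = dest - link_base
    out = bytearray(image)           # the "memory copy" step
    if offset == 0:
        return out                   # already at the link address
    for r_offset, r_info in rel_entries:
        if r_info & 0xFF != R_ARM_RELATIVE:
            continue                 # other types are ignored in this sketch
        pos = r_offset - link_base
        (value,) = struct.unpack_from("<I", out, pos)
        struct.pack_into("<I", out, pos, (value + offset) & 0xFFFFFFFF)
    return out


if __name__ == "__main__":
    # Two-word toy image: word 0 holds a pointer to word 1.
    image = struct.pack("<II", 0x17800004, 0xDEADBEEF)
    rel = [(0x17800000, R_ARM_RELATIVE)]
    moved = relocate_image(image, 0x17800000, 0x4FF8D000, rel)
    print([hex(w) for w in struct.unpack("<II", moved)])
```

With a zero offset every fixup would add nothing, so in this model skipping the loop and running it give the same bytes. Whether that also holds for the real image depends on details (other relocation types, .bss handling) that the sketch leaves out.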

Hi,
On Sat, Feb 4, 2012 at 1:15 AM, Aneesh V aneesh@ti.com wrote:
I haven't really completely read your mail. But here is an implementation I had provided long time back for ARM. But Wolfgang didn't want to take it. You can see the patch and the following discussion in this thread:
http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/96352
From your patch Aneesh I evolved something that I still use - it deals with the case where malloc cannot fit below the text area.
I find any sort of messing with the ICE startup a pain - although I have often been able to script it. But for me I need to attach the device tree to the binary and a few other things so I might as well disable relocation at the same time. It also allows me to debug seamlessly in board_init_f() as well as afterwards.
I will send a patch.
It would be good to get something in mainline despite the protestations, if only to avoid all the work that people have to do to figure out this problem.
Regards, Simon
U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot

On Sunday 05 February 2012 11:49 AM, Simon Glass wrote:
From your patch Aneesh I evolved something that I still use - it deals with the case where malloc cannot fit below the text area.
I find any sort of messing with the ICE startup a pain - although I have often been able to script it. But for me I need to attach the device tree to the binary and a few other things so I might as well disable relocation at the same time. It also allows me to debug seamlessly in board_init_f() as well as afterwards.
I will send a patch.
Great!
It would be good to get something in mainline despite the protestations, if only to avoid all the work that people have to do to figure out this problem.
I am always in favor of that:)
best regards, Aneesh