[U-Boot] AT91: problems master vs. next

Just to report on preliminary findings I had:
Rebasing my current TOP9000 port on u-boot/master compiles and works fine. Code size increased moderately from 223592 to 223976.
Rebasing my current TOP9000 port on u-boot/next compiles after defining CONFIG_SYS_SDRAM_BASE and CONFIG_SYS_INIT_SP_ADDR. Code size increased heavily from 223592 to 245544.
And U-Boot crashes instantly (I know there is more to be done than just defining those two macros).
What bothers me really here is the huge increase in code size.
And, on almost all AT91 systems booting will be through a first boot loader, which sets up SDRAM, loads u-boot to the "correct" address and jumps to it. All low level init and relocation is not required in such cases.
It should be always possible to #define relocation off!
With Best Regards Reinhard

On 21/09/2010 14:39, Reinhard Meyer wrote:
Just to report on preliminary findings I had:
Rebasing my current TOP9000 port on u-boot/master compiles and works fine. Code size increased moderately from 223592 to 223976.
Rebasing my current TOP9000 port on u-boot/next compiles after defining CONFIG_SYS_SDRAM_BASE and CONFIG_SYS_INIT_SP_ADDR. Code size increased heavily from 223592 to 245544.
And U-Boot crashes instantly (I know there is more to be done than just defining those two macros).
What bothers me really here is the huge increase in code size.
I see similar numbers for the orion5x-based net5big, where the non-relocating build is 117252 bytes while the relocating build is 127120, an 8.4% increase (yours is 9.8%).
This is due to the fact that each routine has to recompute the PIC register. As a test, I tried adding -msingle-pic-base to -fPIC (this computes the PIC register once for the whole code) and the code size falls back to 123764 bytes, 'only' 5.5% more than the non relocating case. Using -fPIE -pie -msingle-pic-base lowers this again to 5.2%.
Of course you cannot just turn -msingle-pic-base on; you've got to have the code in start.S that computes the register. Also, switching from PIC to PIE needs to be verified. I've got the code in my local tree, but it's not tested yet. I'll test it tonight and post it if it works.
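As a rough check of the percentages quoted above, the following Python sketch recomputes them from the reported image sizes. The helper name `overhead_pct` and the labels are made up for illustration; only the byte counts come from the messages.

```python
def overhead_pct(baseline: int, grown: int) -> float:
    """Return the size increase of `grown` over `baseline`, in percent."""
    return (grown - baseline) / baseline * 100.0


# (baseline, grown) byte counts as reported in the thread.
sizes = {
    "TOP9000, rebased on u-boot/next": (223592, 245544),
    "net5big, relocating build": (117252, 127120),
    "net5big, relocating build with -msingle-pic-base": (117252, 123764),
}

for label, (baseline, grown) in sizes.items():
    delta = grown - baseline
    print(f"{label}: +{delta} bytes ({overhead_pct(baseline, grown):.2f}%)")
```

The third figure works out to roughly 5.55%, which the message quotes as 5.5%.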
And, on almost all AT91 systems booting will be through a first boot loader, which sets up SDRAM, loads u-boot to the "correct" address and jumps to it. All low level init and relocation is not required in such cases.
It should be always possible to #define relocation off!
On arm926ejs this is controlled by CONFIG_SKIP_LOWLEVEL_INIT and CONFIG_SKIP_RELOCATE_UBOOT. For instance, openrd_base, a kirkwood board, always skips lowlevel inits.
Best regards,

Dear Albert ARIBAUD,
On 21/09/2010 14:39, Reinhard Meyer wrote:
Rebasing my current TOP9000 port on u-boot/next compiles after defining CONFIG_SYS_SDRAM_BASE and CONFIG_SYS_INIT_SP_ADDR. Code size increased heavily from 223592 to 245544.
And U-Boot crashes instantly (I know there is more to be done than just defining those two macros).
What bothers me really here is the huge increase in code size.
I see similar numbers for the orion5x-based net5big, where the non-relocating build is 117252 bytes while the relocating build is 127120, an 8.4% increase (yours is 9.8%).
This is due to the fact that each routine has to recompute the PIC register. As a test, I tried adding -msingle-pic-base to -fPIC (this computes the PIC register once for the whole code) and the code size falls back to 123764 bytes, 'only' 5.5% more than the non relocating case. Using -fPIE -pie -msingle-pic-base lowers this again to 5.2%.
Of course you cannot just turn -msingle-pic-base on; you've got to have the code in start.S that computes the register. Also, switching from PIC to PIE needs to be verified. I've got the code in my local tree, but it's not tested yet. I'll test it tonight and post it if it works.
And, on almost all AT91 systems booting will be through a first boot loader, which sets up SDRAM, loads u-boot to the "correct" address and jumps to it. All low level init and relocation is not required in such cases.
It should be always possible to #define relocation off!
On arm926ejs this is controlled by CONFIG_SKIP_LOWLEVEL_INIT and CONFIG_SKIP_RELOCATE_UBOOT. For instance, openrd_base, a kirkwood board, always skips lowlevel inits.
Yep, those are set and work well with master. However, the extra code size of almost 10% (with next) will not go away with that.
Therefore I strongly suggest that all extras (PIC) needed solely for relocation should be switchable OFF by a configuration option. Who needs that relocation in the first place? ARM worked without it for years; why blow up the code now?
Reinhard

Reinhard Meyer wrote:
Therefore I strongly suggest that all extras (PIC) needed solely for relocation should be switchable OFF by a configuration option. Who needs that relocation in the first place? ARM worked without it for years; why blow up the code now?
Sorry, to be precise: the option CONFIG_SYS_ARM_WITHOUT_RELOC should stay as a permanent feature.
However, when I compile with that option defined in my board-config.h I get the following warnings for EVERY file:
/home/reinhard/embedded/u-boot/include/configs/top9000_9xe.h:74:1: warning: "CONFIG_SYS_ARM_WITHOUT_RELOC" redefined
<command-line>: warning: this is the location of the previous definition
In file included from /home/reinhard/embedded/u-boot/include/config.h:4,
                 from /home/reinhard/embedded/u-boot/include/common.h:37,
                 from stmicro.c:30:
because of that recursion:
+ifdef CONFIG_SYS_ARM_WITHOUT_RELOC
+PLATFORM_CPPFLAGS += -DCONFIG_SYS_ARM_WITHOUT_RELOC
+endif
The code size changes from 223592 to 229792, which is more acceptable, but it still crashes (I will look into that soon).
Reinhard

Dear Reinhard Meyer,
In message 4C98BEC7.9090500@emk-elektronik.de you wrote:
should be switchable OFF by a configuration option. Who needs that relocation in the first place? ARM worked without it for years; why blow up the code now?
Maintenance-wise, the relocation is needed to allow merging the ARM code back into a common source tree with other architectures.
Technically, it is needed to be able to run a single U-Boot binary image on systems where you don't know the exact RAM size at compile time (say, on boards that come in more than one RAM size configuration), or to dynamically adapt to things like PRAM, frame buffer memory, syslog buffer, etc.
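To illustrate the point about adapting at run time, here is a simplified Python sketch of how a load address can be derived from the RAM size detected at boot, by carving reserved areas off the end of RAM. It is not U-Boot's actual code; the function name, the SDRAM base, the reserved sizes, and the image size are all hypothetical values chosen for illustration.

```python
MiB = 1024 * 1024


def relocation_address(ram_base, ram_size, reserved_sizes, uboot_size, align=4096):
    """Return an address for U-Boot, working downwards from the end of RAM."""
    addr = ram_base + ram_size
    for size in reserved_sizes:   # e.g. PRAM, frame buffer, log buffer
        addr -= size
    addr -= uboot_size
    return addr & ~(align - 1)    # round down to the alignment boundary


RAM_BASE = 0x20000000             # hypothetical SDRAM base
RESERVED = [1 * MiB, 2 * MiB]     # hypothetical PRAM and frame buffer sizes
UBOOT_SIZE = 256 * 1024           # hypothetical image size

# The same binary ends up at a different address on each RAM configuration.
for ram_size in (64 * MiB, 128 * MiB):
    addr = relocation_address(RAM_BASE, ram_size, RESERVED, UBOOT_SIZE)
    print(f"{ram_size // MiB} MiB RAM: relocate to {addr:#010x}")
```

With a link-time fixed address, each combination of RAM size and reserved areas would instead need its own binary.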
Yes, ARM kind of worked without that for years, but it has always been a PITA for many of us.
Best regards,
Wolfgang Denk

Dear Albert ARIBAUD,
In message 4C98BA84.9040104@free.fr you wrote:
It should be always possible to #define relocation off!
On arm926ejs this is controlled by CONFIG_SKIP_LOWLEVEL_INIT and CONFIG_SKIP_RELOCATE_UBOOT. For instance, openrd_base, a kirkwood board, always skips lowlevel inits.
You cannot use CONFIG_SKIP_RELOCATE_UBOOT the way you used to do it.
Best regards,
Wolfgang Denk

On 21/09/2010 16:00, Albert ARIBAUD wrote:
This is due to the fact that each routine has to recompute the PIC register. As a test, I tried adding -msingle-pic-base to -fPIC (this computes the PIC register once for the whole code) and the code size falls back to 123764 bytes, 'only' 5.5% more than the non relocating case. Using -fPIE -pie -msingle-pic-base lowers this again to 5.2%.
Of course you cannot just turn -msingle-pic-base on; you've got to have the code in start.S that computes the register. Also, switching from PIC to PIE needs to be verified. I've got the code in my local tree, but it's not tested yet. I'll test it tonight and post it if it works.
I haven't fully tested yet, but I've given a look at -fPIC vs -fPIE vs -fPIE -pie, with and without -msingle-pic-base, and here are the results; note that the following numbers were measured on the edminiv2 config (I'd previously used a not-yet-submitted net5big config).
Compiling u-boot with '#define CONFIG_SYS_ARM_WITHOUT_RELOC' yields a 133700 byte u-boot.bin whereas '#undef CONFIG_SYS_ARM_WITHOUT_RELOC' yields 145588, an increase of +8.9% -- AIUI this increase is due only to relocation changes as all the rest is unchanged, including Wolfgang's environment patches. This increase is explained by the fact that all global object accesses now need an additional indirection through the GOT.
Replacing -fPIC with -fPIE reduces size down to 144856 (+8.3%), by replacing the GOT indirection for initialized data with sl-relative addressing (uninitialized data still go through the GOT). This has the added marginal benefit that the GOT shrinks from 520 down to 296 bytes. The condition for -fPIE to work is that the relative position of .data to .text must remain constant across relocations, which is the case for u-boot.
Unfortunately I haven't seen any gcc/ld option which would replace indirection by sl-relative addressing for .bss.
Adding linker option -pie to -fPIE has no noticeable effect on size, though it does move things around -- I'll have to dig deeper into this one; until I know its effects, I'll just not add it.
Finally, adding -msingle-pic-base to any of the previous obviously reduces code size by not recomputing the pic base register in every function; for -fPIE -msingle-pic-base, the u-boot size shrinks down to 140992 (+5.5%). Since this recomputation is useful only if relocation scatters .text, which is not a requirement (that I know of) for u-boot, we can safely add -msingle-pic-base, provided start.S computes the pic base register.
As for implementing these changes: going from -fPIC to -fPIE can be done at ARM level because it only requires modifying PLATFORM_RELFLAGS in arch/arm/config.mk. Adding -msingle-pic-base, though, requires a change to start.S for computing the pic base into r10 and r9 (only one of these will be used, depending on whether stack checking is enabled or not); since start.S is cpu-specific, the change must be at cpu, not arch, level. I'll provide a tested patch for arm926ejs along with the -fPIE patch; for other arm cpus, I can provide a patch but not test it.
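The edminiv2 measurements above can be lined up step by step. The following Python sketch only restates the reported byte counts and derives the deltas between successive builds; the variable names are made up for illustration.

```python
# u-boot.bin sizes for the edminiv2 config, in bytes, as reported above.
BASELINE = 133700  # built with CONFIG_SYS_ARM_WITHOUT_RELOC defined
builds = [
    ("-fPIC", 145588),
    ("-fPIE", 144856),
    ("-fPIE -msingle-pic-base", 140992),
]

previous = None
for flags, size in builds:
    overhead = size - BASELINE
    line = f"{flags:<24} {size:>7}  +{overhead} bytes ({overhead / BASELINE:.1%})"
    if previous is not None:
        line += f", {previous - size} bytes less than the previous row"
    print(line)
    previous = size

# GOT sizes reported for -fPIC and -fPIE respectively.
got_saving = 520 - 296
print(f"GOT shrinks by {got_saving} bytes when switching from -fPIC to -fPIE")
```

Going by these numbers, -msingle-pic-base accounts for most of the recovered space (3864 bytes), while the switch from -fPIC to -fPIE recovers 732 bytes.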
Best regards,

Hi Reinhard,
On Tuesday 21 September 2010 14:39:41 Reinhard Meyer wrote:
Rebasing my current TOP9000 port on u-boot/next compiles after defining CONFIG_SYS_SDRAM_BASE and CONFIG_SYS_INIT_SP_ADDR. Code size increased heavily from 223592 to 245544.
Please note that this increase is not only because of the new ARM relocation support, but also because of the environment rework done by Wolfgang:
http://lists.denx.de/pipermail/u-boot/2010-July/074125.html
As stated in this mail, the code size increase is typically 5 to 7 KiB.
Cheers, Stefan
-- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-0 Fax: (+49)-8142-66989-80 Email: office@denx.de

Stefan Roese wrote:
Please note that this increase is not only because of the new ARM relocation support, but also because of the environment rework done by Wolfgang:
Yes, that, too. About 5.5k
next w/o relocation (#define CONFIG_SYS_ARM_WITHOUT_RELOC) and w/o cache (#define CONFIG_SYS_NO_[DI]CACHE): 229440
master: 223976
delta: 5464
Cache on costs 352 bytes.
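Putting the TOP9000 figures from this thread together gives a rough breakdown of the increase from master to next. The Python sketch below is only arithmetic on the reported sizes; it assumes that the 245544-byte relocating build had the caches enabled, which the messages do not state explicitly.

```python
# Sizes in bytes, as reported in this thread for the TOP9000 port.
master = 223976
next_no_reloc_no_cache = 229440  # CONFIG_SYS_ARM_WITHOUT_RELOC, caches disabled
next_no_reloc = 229792           # CONFIG_SYS_ARM_WITHOUT_RELOC, reported earlier
next_with_reloc = 245544         # relocation enabled

# Changes in next other than relocation (mainly the environment rework).
env_rework = next_no_reloc_no_cache - master
# Cost of enabling the caches.
cache = next_no_reloc - next_no_reloc_no_cache
# Assumption: the relocating build had caches enabled, like the 229792 one.
relocation = next_with_reloc - next_no_reloc

for name, delta in (("env rework etc.", env_rework),
                    ("cache", cache),
                    ("relocation", relocation)):
    print(f"{name:<16} {delta:>6} bytes")
print(f"{'total':<16} {next_with_reloc - master:>6} bytes")
```

Under that assumption, relocation itself accounts for 15752 of the 21568 bytes added between master and next.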
Reinhard

Dear Reinhard Meyer,
In message 4C98A78D.7070407@emk-elektronik.de you wrote:
What bothers me really here is the huge increase in code size.
As has been pointed out by others, there are several factors that contribute to that code.
And, on almost all AT91 systems booting will be through a first boot loader, which sets up SDRAM, loads u-boot to the "correct" address and jumps to it.
When you have to support multiple memory configurations that "correct" address is typically right in the middle of your RAM.
Assume you have a system with 64 or 128 MB of RAM. In the old way, U-Boot will probably sit somewhere at offset 63 MB or so, close to the end of the "small" configuration.
On the big board this is neatly splitting the RAM into two small chunks - please explain to a customer why he cannot load a 64 MB image to RAM when there are 128 MB of RAM on his board, 127 MB of these actually unused?
Or why you need multiple binary images of U-Boot if you want to initialize and pass some memory at the end of RAM for further use in Linux, say frame buffer, or PRAM, or a log buffer?
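The 64 MB versus 128 MB example can be made concrete with a small Python sketch that compares the largest contiguous free region for a fixed load offset against relocation to the end of RAM. The function name and the 1 MiB footprint for U-Boot are hypothetical; the 63 MB offset and the two RAM sizes are taken from the example above.

```python
MiB = 1024 * 1024


def largest_free_chunk(ram_size, uboot_offset, uboot_footprint):
    """Largest contiguous region left once U-Boot occupies
    [uboot_offset, uboot_offset + uboot_footprint)."""
    below = uboot_offset
    above = ram_size - (uboot_offset + uboot_footprint)
    return max(below, above)


FOOTPRINT = 1 * MiB       # hypothetical space taken by U-Boot
FIXED_OFFSET = 63 * MiB   # link-time address chosen for the 64 MiB board

for ram_size in (64 * MiB, 128 * MiB):
    fixed = largest_free_chunk(ram_size, FIXED_OFFSET, FOOTPRINT)
    relocated = largest_free_chunk(ram_size, ram_size - FOOTPRINT, FOOTPRINT)
    print(f"{ram_size // MiB:>3} MiB RAM: fixed address leaves {fixed // MiB} MiB, "
          f"relocation leaves {relocated // MiB} MiB in one piece")
```

On the 128 MiB board, the fixed address leaves no region larger than 64 MiB under these assumptions, whereas relocating to the end of RAM leaves 127 MiB contiguous.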
All low level init and relocation is not required in such cases.
Relocation is still needed.
Best regards,
Wolfgang Denk
participants (4): Albert ARIBAUD, Reinhard Meyer, Stefan Roese, Wolfgang Denk