
On 21/09/2010 16:00, Albert ARIBAUD wrote:
This is due to the fact that each routine has to recompute the PIC register. As a test, I tried adding -msingle-pic-base to -fPIC (this computes the PIC register once for the whole code) and the code size falls back to 123764 bytes, 'only' 5.5% more than the non relocating case. Using -fPIE -pie -msingle-pic-base lowers this again to 5.2%.
Of course you cannot just turn -msingle-pic-base on; you've got to have the code in start.S that computes the register. Also, switching from PIC to PIE needs to be verified. I've got the code in my local tree, but it's not tested yet. I'll test it tonight and post it if it works.
I haven't fully tested yet, but I've had a look at -fPIC vs -fPIE vs -fPIE -pie, each with and without -msingle-pic-base, and here are the results. Note that the following numbers were measured on the edminiv2 config (I'd previously used a not-yet-submitted net5big config).
Compiling u-boot with '#define CONFIG_SYS_ARM_WITHOUT_RELOC' yields a 133700-byte u-boot.bin, whereas '#undef CONFIG_SYS_ARM_WITHOUT_RELOC' yields 145588 bytes, an increase of 8.9% -- AIUI this increase is due only to the relocation changes, as all the rest is unchanged, including Wolfgang's environment patches. It is explained by the fact that every access to a global object now needs an additional indirection through the GOT.
Replacing -fPIC with -fPIE reduces the size to 144856 bytes (+8.3%), by replacing the GOT indirection for initialized data with sl-relative addressing (uninitialized data still goes through the GOT). This has the added marginal benefit that the GOT shrinks from 520 to 296 bytes. The condition for -fPIE to work is that the position of .data relative to .text must remain constant across relocations, which is the case for u-boot.
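To put the GOT figures in perspective, here is a small Python sketch of the arithmetic. It assumes that the whole GOT consists of 4-byte entries (one 32-bit address each), which I have not checked against the actual section contents; the names got_entries and GOT_ENTRY_BYTES are purely illustrative.

```python
# Rough arithmetic on the GOT sizes quoted above (edminiv2 config).
# Assumption: every byte of the GOT belongs to a 4-byte (32-bit) entry.

GOT_ENTRY_BYTES = 4

GOT_PIC_BYTES = 520   # GOT size with -fPIC
GOT_PIE_BYTES = 296   # GOT size with -fPIE

IMAGE_PIC_BYTES = 145588  # u-boot.bin with -fPIC
IMAGE_PIE_BYTES = 144856  # u-boot.bin with -fPIE


def got_entries(got_bytes, entry_bytes=GOT_ENTRY_BYTES):
    """Return the number of GOT entries under the 4-byte-entry assumption."""
    return got_bytes // entry_bytes


if __name__ == "__main__":
    saved_entries = got_entries(GOT_PIC_BYTES) - got_entries(GOT_PIE_BYTES)
    saved_got = GOT_PIC_BYTES - GOT_PIE_BYTES
    saved_image = IMAGE_PIC_BYTES - IMAGE_PIE_BYTES
    print("GOT entries removed:", saved_entries)
    print("bytes saved in the GOT:", saved_got)
    print("bytes saved outside the GOT:", saved_image - saved_got)
```

Under that assumption, the GOT itself accounts for only part of the overall saving; the remainder comes from the rest of the image.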
Unfortunately, I haven't found any gcc/ld option that would replace the GOT indirection with sl-relative addressing for .bss.
Adding linker option -pie to -fPIE has no noticeable effect on size, though it does move things around -- I'll have to dig deeper into this one; until I know its effects, I'll just not add it.
Finally, adding -msingle-pic-base to any of the previous combinations reduces code size, as expected, by not recomputing the pic base register in every function; for -fPIE -msingle-pic-base, the u-boot size shrinks to 140992 bytes (+5.5%). Since this recomputation is useful only if relocation scatters .text, which is not a requirement (that I know of) for u-boot, we can safely add -msingle-pic-base, provided start.S computes the pic base register.
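For anyone who wants to recheck the percentages, the following Python sketch recomputes them from the raw u-boot.bin sizes. The byte counts are the edminiv2 figures given in this mail; the function name overhead_pct and the dictionary labels are only illustrative.

```python
# Recompute the size overhead of each relocating build relative to the
# non-relocating baseline (edminiv2 figures quoted in this mail).

BASELINE = 133700  # u-boot.bin with CONFIG_SYS_ARM_WITHOUT_RELOC defined

# Compiler flags -> u-boot.bin size in bytes (labels are illustrative).
SIZES = {
    "-fPIC": 145588,
    "-fPIE": 144856,
    "-fPIE -msingle-pic-base": 140992,
}


def overhead_pct(size, baseline=BASELINE):
    """Return the size increase over the baseline, in percent."""
    return 100.0 * (size - baseline) / baseline


if __name__ == "__main__":
    for flags, size in SIZES.items():
        print(f"{flags:26s} {size:7d} bytes  +{overhead_pct(size):.1f}%")
```

Rounded to one decimal place, this gives the same increases as quoted in the text.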
As for implementing these changes: going from -fPIC to -fPIE can be done at ARM level because it only requires modifying PLATFORM_RELFLAGS in arch/arm/config.mk. Adding -msingle-pic-base, though, requires a change to start.S for computing the pic base into r10 and r9 (only one of these will be used, depending on whether stack checking is enabled or not); since start.S is cpu-specific, the change must be at cpu, not arch, level. I'll provide a tested patch for arm926ejs along with the -fPIE patch; for other arm cpus, I can provide a patch but not test it.
Best regards,