[U-Boot] [PATCH] mpc83xx: Add -fpic relocation support

This adds relocation of the .got entries produced by -fpic. -fpic produces 2-3% smaller code and is faster. Unfortunately, gcc promotes -fpic to -fPIC when -mrelocatable is used, so one needs a very small patch to gcc too (sent upstream).
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
---
 arch/powerpc/cpu/mpc83xx/start.S    |   18 ++++++++++++++++++
 arch/powerpc/cpu/mpc83xx/u-boot.lds |    1 +
 2 files changed, 19 insertions(+), 0 deletions(-)
diff --git a/arch/powerpc/cpu/mpc83xx/start.S b/arch/powerpc/cpu/mpc83xx/start.S
index c7d85a8..95ae1d8 100644
--- a/arch/powerpc/cpu/mpc83xx/start.S
+++ b/arch/powerpc/cpu/mpc83xx/start.S
@@ -69,6 +69,8 @@
  */
 	START_GOT
 	GOT_ENTRY(_GOT2_TABLE_)
+	GOT_ENTRY(_GOT_TABLE_)
+	GOT_ENTRY(_GLOBAL_OFFSET_TABLE_)
 	GOT_ENTRY(__bss_start)
 	GOT_ENTRY(_end)
@@ -951,6 +953,22 @@ in_ram:
 	stw	r0,0(r3)
 2:	bdnz	1b
+	lwz	r4,GOT(_GLOBAL_OFFSET_TABLE_)
+	addi	r4,r4,-4 /* don't write over blrl in GOT */
+	lwz	r3,GOT(_GOT_TABLE_)
+	subf.	r4,r3,r4 /* r4 - r3 */
+	ble	3f
+	srwi	r4,r4,2 /* r4/4 */
+	mr	r5,r11
+	mtctr	r4
+	addi	r3,r3,-4
+1:	lwzu	r0,4(r3)
+	cmpwi	r0,0
+	beq-	2f
+	add	r0,r0,r11
+	stw	r0,0(r3)
+2:	bdnz	1b
+3:
 #ifndef CONFIG_NAND_SPL
 	/*
 	 * Now adjust the fixups and the pointers to the fixups
diff --git a/arch/powerpc/cpu/mpc83xx/u-boot.lds b/arch/powerpc/cpu/mpc83xx/u-boot.lds
index 0b74a13..a498a37 100644
--- a/arch/powerpc/cpu/mpc83xx/u-boot.lds
+++ b/arch/powerpc/cpu/mpc83xx/u-boot.lds
@@ -67,6 +67,7 @@ SECTIONS
   PROVIDE (erotext = .);
   .reloc   :
   {
+    _GOT_TABLE_ = .;
     *(.got)
     _GOT2_TABLE_ = .;
     *(.got2)
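Read as C, the new start.S loop walks the words from _GOT_TABLE_ up to the word just below _GLOBAL_OFFSET_TABLE_ (which holds the blrl, per the patch comment) and adds the relocation offset held in r11 to every non-zero entry. What follows is a minimal host-side sketch on an ordinary array. The function name relocate_fpic_got() and its parameters are illustrative only, not U-Boot code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative C model of the -fpic GOT loop added to start.S.
 *   got_start -> _GOT_TABLE_           (first .got word)
 *   got_sym   -> _GLOBAL_OFFSET_TABLE_ (linker-defined, mid-GOT)
 *   offset    -> r11                   (relocation offset)
 */
static void relocate_fpic_got(uint32_t *got_start, uint32_t *got_sym,
                              uint32_t offset)
{
    assert(got_start != NULL && got_sym != NULL);

    /* Number of words to visit: everything below got_sym except the
     * word at got_sym - 1, which holds the blrl and must stay intact
     * ("addi r4,r4,-4", "subf.", "srwi r4,r4,2"). */
    ptrdiff_t n = (got_sym - got_start) - 1;
    if (n <= 0)                      /* "ble 3f": nothing to do */
        return;

    for (ptrdiff_t i = 0; i < n; i++) {
        if (got_start[i] == 0)       /* "beq- 2f": leave NULL entries */
            continue;
        got_start[i] += offset;      /* "add r0,r0,r11; stw r0,0(r3)" */
    }
}
```

The zero check mirrors the assembly: a NULL GOT entry must stay NULL after relocation rather than becoming the relocation offset.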

Dear Joakim Tjernlund,
In message 1286887081-23172-1-git-send-email-Joakim.Tjernlund@transmode.se you wrote:
This adds relocation of the .got entries produced by -fpic. -fpic produces 2-3% smaller code and is faster. Unfortunately, gcc promotes -fpic to -fPIC when -mrelocatable is used, so one needs a very small patch to gcc too (sent upstream).
What happens when one uses this patch in combination with a "standard" (i. e. older, unpatched) GCC?
 	START_GOT
 	GOT_ENTRY(_GOT2_TABLE_)
+	GOT_ENTRY(_GOT_TABLE_)
+	GOT_ENTRY(_GLOBAL_OFFSET_TABLE_)
 	GOT_ENTRY(__bss_start)
 	GOT_ENTRY(_end)
@@ -951,6 +953,22 @@ in_ram:
 	stw	r0,0(r3)
 2:	bdnz	1b
+	lwz	r4,GOT(_GLOBAL_OFFSET_TABLE_)
What exactly is _GLOBAL_OFFSET_TABLE_ good for, and how does it differ from _GOT_TABLE_?
Best regards,
Wolfgang Denk

Wolfgang Denk wd@denx.de wrote on 2010/10/12 14:52:18:
Dear Joakim Tjernlund,
In message 1286887081-23172-1-git-send-email-Joakim.Tjernlund@transmode.se you wrote:
This adds relocation of the .got entries produced by -fpic. -fpic produces 2-3% smaller code and is faster. Unfortunately, gcc promotes -fpic to -fPIC when -mrelocatable is used, so one needs a very small patch to gcc too (sent upstream).
What happens when one uses this patch in combination with a "standard" (i. e. older, unpatched) GCC?
Nothing; gcc will produce -fPIC relocs and the code will/should just work.
 	START_GOT
 	GOT_ENTRY(_GOT2_TABLE_)
+	GOT_ENTRY(_GOT_TABLE_)
+	GOT_ENTRY(_GLOBAL_OFFSET_TABLE_)
 	GOT_ENTRY(__bss_start)
 	GOT_ENTRY(_end)
@@ -951,6 +953,22 @@ in_ram:
 	stw	r0,0(r3)
 2:	bdnz	1b
+	lwz	r4,GOT(_GLOBAL_OFFSET_TABLE_)
What exactly is _GLOBAL_OFFSET_TABLE_ good for, and how does it differ from _GOT_TABLE_?
_GLOBAL_OFFSET_TABLE_ is a predefined symbol that the linker defines to be in the middle of the -fpic GOT table. It marks the end of the GOT table as far as we are concerned (u-boot does not generate so many relocs that the linker needs to use the space above _GLOBAL_OFFSET_TABLE_).
There is no predefined symbol that marks the start of the -fpic relocs, so I added one (_GOT_TABLE_) to the linker script.
Jocke

Dear Joakim Tjernlund,
In message OF4DCFBD28.58E81A84-ONC12577BA.0047150B-C12577BA.0047D319@transmode.se you wrote:
What happens when one uses this patch in combination with a "standard" (i. e. older, unpatched) GCC?
Nothing; gcc will produce -fPIC relocs and the code will/should just work.
OK, so your change means effectively a no-op (except for the moderate size increase in start.S) to most of us?
You mentioned -fpic was smaller and faster; do you have any numbers for that (especially for the faster part)?
_GLOBAL_OFFSET_TABLE_ is a predefined symbol that the linker defines to be in the middle of the -fpic GOT table. It marks the end of the GOT table as far as we are concerned (u-boot does not generate so many relocs that the linker needs to use the space above _GLOBAL_OFFSET_TABLE_).
There is no predefined symbol that marks the start of the -fpic relocs, so I added one (_GOT_TABLE_) to the linker script.
OK - can you please include these explanations into the commit message? Thanks.
Best regards,
Wolfgang Denk

Wolfgang Denk wd@denx.de wrote on 2010/10/12 15:47:19:
Dear Joakim Tjernlund,
In message <OF4DCFBD28.58E81A84-ONC12577BA.0047150B-C12577BA.0047D319@transmode.se> you wrote:
What happens when one uses this patch in combination with a "standard" (i. e. older, unpatched) GCC?
Nothing; gcc will produce -fPIC relocs and the code will/should just work.
OK, so your change means effectively a no-op (except for the moderate size increase in start.S) to most of us?
Yes.
You mentioned -fpic was smaller and faster; do you have any numbers for that (especially for the faster part)?
No, but I can show a code fragment:
char *f() { return "string"; }
-fPIC -mbss-plt:
.LC1:
	.long .LC0
	.section	".text"
	.align 2
	.globl f
.LCL0:
	.long .LCTOC1-.LCF0
	.type	f, @function
f:
	stwu 1,-16(1)
	mflr 0
	bcl 20,31,.LCF0
.LCF0:
	stw 30,8(1)
	mflr 30
	stw 0,20(1)
	lwz 0,.LCL0-.LCF0(30)
	add 30,0,30
	lwz 0,20(1)
	lwz 3,.LC1-.LCTOC1(30)
	mtlr 0
	lwz 30,8(1)
	addi 1,1,16
	blr
.LC0:
	.string	"string"
	.ident	"GCC: (Gentoo 4.4.4-r2 p1.2, pie-0.4.5) 4.4.4"
	.section	.note.GNU-stack,"",@progbits
-fpic -mbss-plt:
f:
	stwu 1,-16(1)
	mflr 12
	bl _GLOBAL_OFFSET_TABLE_@local-4
	stw 30,8(1)
	mflr 30
	mtlr 12
	lwz 3,.LC0@got(30)
	lwz 30,8(1)
	addi 1,1,16
	blr
.LC0:
	.string	"string"
	.ident	"GCC: (Gentoo 4.4.4-r2 p1.2, pie-0.4.5) 4.4.4"
	.section	.note.GNU-stack,"",@progbits
_GLOBAL_OFFSET_TABLE_ is a predefined symbol that the linker defines to be in the middle of the -fpic GOT table. It marks the end of the GOT table as far as we are concerned (u-boot does not generate so many relocs that the linker needs to use the space above _GLOBAL_OFFSET_TABLE_).
There is no predefined symbol that marks the start of the -fpic relocs, so I added one (_GOT_TABLE_) to the linker script.
OK - can you please include these explanations into the commit message? Thanks.
Will do.
Jocke

On Tue, 12 Oct 2010 16:10:33 +0200 Joakim Tjernlund joakim.tjernlund@transmode.se wrote:
Wolfgang Denk wd@denx.de wrote on 2010/10/12 15:47:19:
Dear Joakim Tjernlund,
In message <OF4DCFBD28.58E81A84-ONC12577BA.0047150B-C12577BA.0047D319@transmode.se> you wrote:
What happens when one uses this patch in combination with a "standard" (i. e. older, unpatched) GCC?
Nothing; gcc will produce -fPIC relocs and the code will/should just work.
OK, so your change means effectively a no-op (except for the moderate size increase in start.S) to most of us?
Yes.
that moderate size increase in start.S breaks nand builds:
Configuring for MPC8313ERDB_NAND_66 board...
start.o:(.got2+0x4): undefined reference to `_GOT_TABLE_'
make[1]: *** [/home/r1aaha/git/u-boot/nand_spl/u-boot-spl] Error 1
...
Configuring for MPC8315ERDB_NAND board...
powerpc-linux-gnu-ld: NAND bootstrap too big
powerpc-linux-gnu-ld: NAND bootstrap too big
start.o:(.got2+0x4): undefined reference to `_GOT_TABLE_'
make[1]: *** [/home/r1aaha/git/u-boot/nand_spl/u-boot-spl] Error 1
(I'm using powerpc-linux-gnu-gcc (Sourcery G++ Lite 4.3-74) 4.3.2 atm).
Kim

On Tue, 12 Oct 2010 12:31:25 -0500 Kim Phillips kim.phillips@freescale.com wrote:
that moderate size increase in start.S breaks nand builds:
Configuring for MPC8313ERDB_NAND_66 board...
start.o:(.got2+0x4): undefined reference to `_GOT_TABLE_'
make[1]: *** [/home/r1aaha/git/u-boot/nand_spl/u-boot-spl] Error 1
I don't think that's due to the size increase (though that might end up being an issue as well); rather, the NAND SPL linker script needs to be updated with _GOT_TABLE_.
-Scott

Kim Phillips kim.phillips@freescale.com wrote on 2010/10/12 19:31:25:
On Tue, 12 Oct 2010 16:10:33 +0200 Joakim Tjernlund joakim.tjernlund@transmode.se wrote:
Wolfgang Denk wd@denx.de wrote on 2010/10/12 15:47:19:
Dear Joakim Tjernlund,
In message <OF4DCFBD28.58E81A84-ONC12577BA.0047150B-C12577BA.0047D319@transmode.se> you wrote:
What happens when one uses this patch in combination with a "standard" (i. e. older, unpatched) GCC?
Nothing; gcc will produce -fPIC relocs and the code will/should just work.
OK, so your change means effectively a no-op (except for the moderate size increase in start.S) to most of us?
Yes.
that moderate size increase in start.S breaks nand builds:
Configuring for MPC8313ERDB_NAND_66 board...
start.o:(.got2+0x4): undefined reference to `_GOT_TABLE_'
make[1]: *** [/home/r1aaha/git/u-boot/nand_spl/u-boot-spl] Error 1
Ehh, these have their own linker scripts, it seems. I could #ifdef NAND_SPL, I guess? Or possibly select one of GOT/GOT2 based on #if __pic__ == 1.
...
Configuring for MPC8315ERDB_NAND board...
powerpc-linux-gnu-ld: NAND bootstrap too big
powerpc-linux-gnu-ld: NAND bootstrap too big
start.o:(.got2+0x4): undefined reference to `_GOT_TABLE_'
make[1]: *** [/home/r1aaha/git/u-boot/nand_spl/u-boot-spl] Error 1
How much is missing?
One thing I wonder about: how come NAND_SPL needs GOT2 relocs but no FIXUPs? I figured either none or both.
Jocke
(I'm using powerpc-linux-gnu-gcc (Sourcery G++ Lite 4.3-74) 4.3.2 atm).
Kim

On Tue, 12 Oct 2010 19:41:56 +0200 Joakim Tjernlund joakim.tjernlund@transmode.se wrote:
Kim Phillips kim.phillips@freescale.com wrote on 2010/10/12 19:31:25:
that moderate size increase in start.S breaks nand builds:
Configuring for MPC8313ERDB_NAND_66 board...
start.o:(.got2+0x4): undefined reference to `_GOT_TABLE_'
make[1]: *** [/home/r1aaha/git/u-boot/nand_spl/u-boot-spl] Error 1
Ehh, these have their own linker scripts, it seems. I could #ifdef NAND_SPL, I guess? Or possibly select one of GOT/GOT2 based on #if __pic__ == 1.
I think NAND_SPL would be clearer, assuming no other differences are involved.
Configuring for MPC8315ERDB_NAND board...
powerpc-linux-gnu-ld: NAND bootstrap too big
powerpc-linux-gnu-ld: NAND bootstrap too big
start.o:(.got2+0x4): undefined reference to `_GOT_TABLE_'
make[1]: *** [/home/r1aaha/git/u-boot/nand_spl/u-boot-spl] Error 1
How much is missing?
I can't tell due to the undefined reference error.
One thing I wonder about: how come NAND_SPL needs GOT2 relocs but no FIXUPs? I figured either none or both.
I don't know; apparently FIXUPs aren't needed?
Kim

On Tue, 12 Oct 2010 13:19:38 -0500 Kim Phillips kim.phillips@freescale.com wrote:
On Tue, 12 Oct 2010 19:41:56 +0200 Joakim Tjernlund joakim.tjernlund@transmode.se wrote:
Kim Phillips kim.phillips@freescale.com wrote on 2010/10/12 19:31:25:
that moderate size increase in start.S breaks nand builds:
Configuring for MPC8313ERDB_NAND_66 board...
start.o:(.got2+0x4): undefined reference to `_GOT_TABLE_'
make[1]: *** [/home/r1aaha/git/u-boot/nand_spl/u-boot-spl] Error 1
Ehh, these have their own linker scripts, it seems. I could #ifdef NAND_SPL, I guess? Or possibly select one of GOT/GOT2 based on #if __pic__ == 1.
I think NAND_SPL would be clearer, assuming no other differences are involved.
Why? The type of PIC is the distinction. If it can be determined with __pic__, wouldn't that also avoid the extra code being present in the main U-Boot if an older toolchain is used and we end up with -fPIC? And there could be other types of SPL besides NAND.
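Scott's compile-time switch could look roughly like this. It relies only on GCC predefining __pic__ (and __PIC__) as 1 under -fpic and 2 under -fPIC. The helper names pic_level() and need_fpic_got_loop() are hypothetical; in start.S the same #if would simply wrap the new assembly loop. Because an undefined identifier evaluates to 0 inside #if, a bare `#if __pic__ == 1` is also safe when PIC is off.

```c
#include <assert.h>

/* GCC sets __pic__ to 1 for -fpic and 2 for -fPIC (undefined without PIC). */
static int pic_level(void)
{
#if defined(__pic__)
    return __pic__;
#else
    return 0;
#endif
}

/* Hypothetical predicate: does this build need the extra .got loop? */
static int need_fpic_got_loop(void)
{
#if defined(__pic__) && __pic__ == 1
    return 1;   /* -fpic: the small .got exists and must be relocated */
#else
    return 0;   /* -fPIC or no PIC: the .got loop can be compiled out */
#endif
}
```

With an unpatched gcc (which promotes -fpic to -fPIC under -mrelocatable) the predicate is false, so the extra code would not end up in start.S at all.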
The linker scripts for NAND SPL would still have to be updated, though, or else wouldn't it break with a new toolchain that actually uses -fpic? I assume we're not passing different flags when building the SPL.
-Scott

On Tue, 12 Oct 2010 13:25:40 -0500 Scott Wood scottwood@freescale.com wrote:
On Tue, 12 Oct 2010 13:19:38 -0500 Kim Phillips kim.phillips@freescale.com wrote:
On Tue, 12 Oct 2010 19:41:56 +0200 Joakim Tjernlund joakim.tjernlund@transmode.se wrote:
Kim Phillips kim.phillips@freescale.com wrote on 2010/10/12 19:31:25:
that moderate size increase in start.S breaks nand builds:
Configuring for MPC8313ERDB_NAND_66 board...
start.o:(.got2+0x4): undefined reference to `_GOT_TABLE_'
make[1]: *** [/home/r1aaha/git/u-boot/nand_spl/u-boot-spl] Error 1
Ehh, these have their own linker scripts, it seems. I could #ifdef NAND_SPL, I guess? Or possibly select one of GOT/GOT2 based on #if __pic__ == 1.
I think NAND_SPL would be clearer, assuming no other differences are involved.
Why? The type of PIC is the distinction. If it can be determined with __pic__, wouldn't that also avoid the extra code being present in the main U-Boot if an older toolchain is used and we end up with -fPIC? And there could be other types of SPL besides NAND.
that's true - I was going for more reader consistency wrt the current code.
The linker scripts for NAND SPL would still have to be updated, though, or else wouldn't it break with a new toolchain that actually uses -fpic? I assume we're not passing different flags when building the SPL.
we're not.
Kim

Scott Wood scottwood@freescale.com wrote on 2010/10/12 20:25:40:
On Tue, 12 Oct 2010 13:19:38 -0500 Kim Phillips kim.phillips@freescale.com wrote:
On Tue, 12 Oct 2010 19:41:56 +0200 Joakim Tjernlund joakim.tjernlund@transmode.se wrote:
Kim Phillips kim.phillips@freescale.com wrote on 2010/10/12 19:31:25:
that moderate size increase in start.S breaks nand builds:
Configuring for MPC8313ERDB_NAND_66 board...
start.o:(.got2+0x4): undefined reference to `_GOT_TABLE_'
make[1]: *** [/home/r1aaha/git/u-boot/nand_spl/u-boot-spl] Error 1
Ehh, these have their own linker scripts, it seems. I could #ifdef NAND_SPL, I guess? Or possibly select one of GOT/GOT2 based on #if __pic__ == 1.
I think NAND_SPL would be clearer, assuming no other differences are involved.
Why? The type of PIC is the distinction. If it can be determined with __pic__, wouldn't that also avoid the extra code being present in the main U-Boot if an older toolchain is used and we end up with -fPIC? And there could be other types of SPL besides NAND.
The PIC type can differ between files, but u-boot doesn't do that and I don't see why it should, so we should be fine.
The linker scripts for NAND SPL would still have to be updated, though, or else wouldn't it break with a new toolchain that actually uses -fpic? I assume we're not passing different flags when building the SPL.
Yes, it is a simple symbol to add. I will do it tomorrow if you don't beat me to it. It would be nice if you could test what works and what doesn't, though.
Jocke

Yes, it is a simple symbol to add. I will do it tomorrow if you don't beat me to it. It would be nice if you could test what works and what doesn't, though.
Couldn't wait; does this work for you?
diff --git a/nand_spl/board/freescale/mpc8313erdb/u-boot.lds b/nand_spl/board/freescale/mpc8313erdb/u-boot.lds
index ad82589..1a3e44f 100644
--- a/nand_spl/board/freescale/mpc8313erdb/u-boot.lds
+++ b/nand_spl/board/freescale/mpc8313erdb/u-boot.lds
@@ -38,6 +38,8 @@ SECTIONS
 	.data : {
 		*(.data*)
 		*(.sdata*)
+		_GOT_TABLE_ = .;
+		*(.got)
 		_GOT2_TABLE_ = .;
 		*(.got2)
 		__got2_entries = (. - _GOT2_TABLE_) >> 2;

On Tue, 12 Oct 2010 21:17:38 +0200 Joakim Tjernlund joakim.tjernlund@transmode.se wrote:
Yes, it is a simple symbol to add. I will do it tomorrow if you don't beat me to it. It would be nice if you could test what works and what doesn't, though.
Couldn't wait; does this work for you?
diff --git a/nand_spl/board/freescale/mpc8313erdb/u-boot.lds b/nand_spl/board/freescale/mpc8313erdb/u-boot.lds
index ad82589..1a3e44f 100644
--- a/nand_spl/board/freescale/mpc8313erdb/u-boot.lds
+++ b/nand_spl/board/freescale/mpc8313erdb/u-boot.lds
@@ -38,6 +38,8 @@ SECTIONS
 	.data : {
 		*(.data*)
 		*(.sdata*)
+		_GOT_TABLE_ = .;
+		*(.got)
 		_GOT2_TABLE_ = .;
 		*(.got2)
 		__got2_entries = (. - _GOT2_TABLE_) >> 2;
it passes a build test on the 8313, but applying the same change to the 8315 still fails to build with bootstrap too big errors, because it still suffers from the extra bits. At least now it's buildable, so we have some size info for the curious:
before (fits):
$ size ./nand_spl/board/freescale/mpc8315erdb/start.o
   text    data     bss     dec     hex filename
   1528      12       0    1540     604 ./nand_spl/board/freescale/mpc8315erdb/start.o
after (too big):
$ size ./nand_spl/board/freescale/mpc8315erdb/start.o
   text    data     bss     dec     hex filename
   1588      20       0    1608     648 ./nand_spl/board/freescale/mpc8315erdb/start.o
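A quick consistency check of these numbers against the patch, assuming 4-byte PowerPC instructions and assuming the two new GOT_ENTRY() words are what show up in the data column (the patch adds 15 instructions; "3:" is only a label):

```latex
\begin{aligned}
\Delta_{\text{text}} &= 1588 - 1528 = 60 = 15 \times 4 \ \text{bytes}\\
\Delta_{\text{data}} &= 20 - 12 = 8 = 2 \times 4 \ \text{bytes}\\
\Delta_{\text{dec}}  &= 1608 - 1540 = 68 = \texttt{0x648} - \texttt{0x604} = \texttt{0x44}
\end{aligned}
```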
and there are other 83xx nand boards; please at least MAKEALL 83xx before resubmitting.
Thanks,
Kim

Kim Phillips kim.phillips@freescale.com wrote on 2010/10/12 21:54:10:
On Tue, 12 Oct 2010 21:17:38 +0200 Joakim Tjernlund joakim.tjernlund@transmode.se wrote:
Yes, it is a simple symbol to add. I will do it tomorrow if you don't beat me to it. It would be nice if you could test what works and what doesn't, though.
Couldn't wait; does this work for you?
diff --git a/nand_spl/board/freescale/mpc8313erdb/u-boot.lds b/nand_spl/board/freescale/mpc8313erdb/u-boot.lds
index ad82589..1a3e44f 100644
--- a/nand_spl/board/freescale/mpc8313erdb/u-boot.lds
+++ b/nand_spl/board/freescale/mpc8313erdb/u-boot.lds
@@ -38,6 +38,8 @@ SECTIONS
 	.data : {
 		*(.data*)
 		*(.sdata*)
+		_GOT_TABLE_ = .;
+		*(.got)
 		_GOT2_TABLE_ = .;
 		*(.got2)
 		__got2_entries = (. - _GOT2_TABLE_) >> 2;
it passes a build test on the 8313, but applying the same change to the 8315 still fails to build with bootstrap too big errors, because it still suffers from the extra bits. At least now it's buildable, so we have some size info for the curious:
before (fits):
$ size ./nand_spl/board/freescale/mpc8315erdb/start.o
   text    data     bss     dec     hex filename
   1528      12       0    1540     604 ./nand_spl/board/freescale/mpc8315erdb/start.o
after (too big):
$ size ./nand_spl/board/freescale/mpc8315erdb/start.o
   text    data     bss     dec     hex filename
   1588      20       0    1608     648 ./nand_spl/board/freescale/mpc8315erdb/start.o
and there are other 83xx nand boards; please at least MAKEALL 83xx before resubmitting.
Any idea whether the SPL is already size-optimized to death, or if there is some low-hanging fruit left?
Jocke

On Tue, 12 Oct 2010 23:23:23 +0200 Joakim Tjernlund joakim.tjernlund@transmode.se wrote:
Any idea whether the SPL is already size-optimized to death, or if there is some low-hanging fruit left?
There are some things that could be shrunk -- such as hardcoding the page size, removing prints, etc.
Still, it would be nice if we could use #if __pic__ == 1 to remove the extra relocation code when the toolchain isn't using it. How much shrinkage might we get out of the rest of the SPL with -fpic enabled?
-Scott

Kim Phillips kim.phillips@freescale.com wrote on 2010/10/12 21:54:10:
On Tue, 12 Oct 2010 21:17:38 +0200 Joakim Tjernlund joakim.tjernlund@transmode.se wrote:
Yes, it is a simple symbol to add. I will do it tomorrow if you don't beat me to it. It would be nice if you could test what works and what doesn't, though.
Couldn't wait; does this work for you?
diff --git a/nand_spl/board/freescale/mpc8313erdb/u-boot.lds b/nand_spl/board/freescale/mpc8313erdb/u-boot.lds
index ad82589..1a3e44f 100644
--- a/nand_spl/board/freescale/mpc8313erdb/u-boot.lds
+++ b/nand_spl/board/freescale/mpc8313erdb/u-boot.lds
@@ -38,6 +38,8 @@ SECTIONS
 	.data : {
 		*(.data*)
 		*(.sdata*)
+		_GOT_TABLE_ = .;
+		*(.got)
 		_GOT2_TABLE_ = .;
 		*(.got2)
 		__got2_entries = (. - _GOT2_TABLE_) >> 2;
it passes a build test on the 8313, but applying the same change to the 8315 still fails to build with bootstrap too big errors, because it still suffers from the extra bits. At least now it's buildable, so we have some size info for the curious:
before (fits):
$ size ./nand_spl/board/freescale/mpc8315erdb/start.o
   text    data     bss     dec     hex filename
   1528      12       0    1540     604 ./nand_spl/board/freescale/mpc8315erdb/start.o
after (too big):
$ size ./nand_spl/board/freescale/mpc8315erdb/start.o
   text    data     bss     dec     hex filename
   1588      20       0    1608     648 ./nand_spl/board/freescale/mpc8315erdb/start.o
Some sizes:
with -fPIC:
size ../u-boot-spl
   text    data     bss     dec     hex filename
   3980      36       0    4016     fb0 ../u-boot-spl
with -mbss-plt -fpic -msingle-pic-base:
size ../u-boot-spl
   text    data     bss     dec     hex filename
   3928       0       0    3928     f58 ../u-boot-spl
with -mbss-plt -fpic:
size ../u-boot-spl
   text    data     bss     dec     hex filename
   3960       0       0    3960     f78 ../u-boot-spl
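Worked out relative to the -fPIC build, using the dec column (data drops from 36 bytes to 0 in both -fpic builds):

```latex
\begin{aligned}
\text{-mbss-plt -fpic:} \quad & 4016 - 3960 = 56 \ \text{bytes} \ (\approx 1.4\%)\\
\text{-mbss-plt -fpic -msingle-pic-base:} \quad & 4016 - 3928 = 88 \ \text{bytes} \ (\approx 2.2\%)
\end{aligned}
```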

Kim Phillips kim.phillips@freescale.com wrote on 2010/10/12 20:19:38:
On Tue, 12 Oct 2010 19:41:56 +0200 Joakim Tjernlund joakim.tjernlund@transmode.se wrote:
Kim Phillips kim.phillips@freescale.com wrote on 2010/10/12 19:31:25:
that moderate size increase in start.S breaks nand builds:
One thing I wonder about: how come NAND_SPL needs GOT2 relocs but no FIXUPs? I figured either none or both.
I don't know; apparently FIXUPs aren't needed?
Fixups are needed unless your link address == load address. I guess you link at the load address? If so, you should not need to relocate the GOT either.
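To illustrate why fixups matter only when the run address differs from the link address, here is a minimal sketch that adds the relocation offset to every word listed in a fixup table. It is simplified: the hypothetical apply_fixups() takes word indices, whereas the real -mrelocatable .fixup table holds the addresses of the words to patch (hence the start.S comment about adjusting "the fixups and the pointers to the fixups").

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified fixup pass. image[] is the copied image as 32-bit words;
 * fixup_idx[] lists the words that hold an absolute link-time address.
 * offset = run address - link address.
 */
static void apply_fixups(uint32_t *image, size_t image_words,
                         const size_t *fixup_idx, size_t n,
                         uint32_t offset)
{
    for (size_t i = 0; i < n; i++) {
        assert(fixup_idx[i] < image_words);
        image[fixup_idx[i]] += offset;   /* link address -> run address */
    }
}
```

With offset 0 (link address == load address) the pass changes nothing. The NAND SPL case, linked at the NAND buffer but run from RAM, has a non-zero offset.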
Jocke

On Tue, 12 Oct 2010 21:13:19 +0200 Joakim Tjernlund joakim.tjernlund@transmode.se wrote:
Kim Phillips kim.phillips@freescale.com wrote on 2010/10/12 20:19:38:
On Tue, 12 Oct 2010 19:41:56 +0200 Joakim Tjernlund joakim.tjernlund@transmode.se wrote:
Kim Phillips kim.phillips@freescale.com wrote on 2010/10/12 19:31:25:
that moderate size increase in start.S breaks nand builds:
One thing I wonder about: how come NAND_SPL needs GOT2 relocs but no FIXUPs? I figured either none or both.
I don't know; apparently FIXUPs aren't needed?
Fixups are needed unless your link address == load address. I guess you link at the load address? If so, you should not need to relocate the GOT either.
We do need to relocate with NAND SPL. We start in the NAND buffer, but we have to move to RAM to free up the buffer for loading the rest of U-Boot.
-Scott

Scott Wood scottwood@freescale.com wrote on 2010/10/12 21:20:26:
On Tue, 12 Oct 2010 21:13:19 +0200 Joakim Tjernlund joakim.tjernlund@transmode.se wrote:
Kim Phillips kim.phillips@freescale.com wrote on 2010/10/12 20:19:38:
On Tue, 12 Oct 2010 19:41:56 +0200 Joakim Tjernlund joakim.tjernlund@transmode.se wrote:
Kim Phillips kim.phillips@freescale.com wrote on 2010/10/12 19:31:25:
that moderate size increase in start.S breaks nand builds:
One thing I wonder about: how come NAND_SPL needs GOT2 relocs but no FIXUPs? I figured either none or both.
I don't know; apparently FIXUPs aren't needed?
Fixups are needed unless your link address == load address. I guess you link at the load address? If so, you should not need to relocate the GOT either.
We do need to relocate with NAND SPL. We start in the NAND buffer, but we have to move to RAM to free up the buffer for loading the rest of U-Boot.
Sure, but do you move to a specific RAM address and is it the same as your link address?

On Tue, 12 Oct 2010 21:51:44 +0200 Joakim Tjernlund joakim.tjernlund@transmode.se wrote:
Scott Wood scottwood@freescale.com wrote on 2010/10/12 21:20:26:
We do need to relocate with NAND SPL. We start in the NAND buffer, but we have to move to RAM to free up the buffer for loading the rest of U-Boot.
Sure, but do you move to a specific RAM address and is it the same as your link address?
The link address is of the pre-relocation NAND buffer.
-Scott

Scott Wood scottwood@freescale.com wrote on 2010/10/12 22:16:14:
On Tue, 12 Oct 2010 21:51:44 +0200 Joakim Tjernlund joakim.tjernlund@transmode.se wrote:
Scott Wood scottwood@freescale.com wrote on 2010/10/12 21:20:26:
We do need to relocate with NAND SPL. We start in the NAND buffer, but we have to move to RAM to free up the buffer for loading the rest of U-Boot.
Sure, but do you move to a specific RAM address and is it the same as your link address?
The link address is of the pre-relocation NAND buffer.
Hmm, then I don't understand how you get by. Initialized static/global pointers should then point into some random area. Perhaps you do something special (such as leaving a copy of u-boot at its link address)?
Jocke

On Tue, 12 Oct 2010 22:40:27 +0200 Joakim Tjernlund joakim.tjernlund@transmode.se wrote:
Scott Wood scottwood@freescale.com wrote on 2010/10/12 22:16:14:
The link address is of the pre-relocation NAND buffer.
Hmm, then I don't understand how you get by. Initialized static/global pointers should then point into some random area.
I suspect we just don't have any such pointers in the SPL.
Perhaps you do something special (such as leaving a copy of u-boot at its link address)?
No, that buffer is re-used for loading additional data via NAND once we relocate out of it.
-Scott

Scott Wood scottwood@freescale.com wrote on 2010/10/12 22:48:59:
On Tue, 12 Oct 2010 22:40:27 +0200 Joakim Tjernlund joakim.tjernlund@transmode.se wrote:
Scott Wood scottwood@freescale.com wrote on 2010/10/12 22:16:14:
The link address is of the pre-relocation NAND buffer.
Hmm, then I don't understand how you get by. Initialized static/global pointers should then point into some random area.
I suspect we just don't have any such pointers in the SPL.
Good for you :)
One could ban these in all of u-boot too. The relocation would be much simpler for those archs lacking -mrelocatable.
I suspect there is some low-hanging fruit to be picked too: char *mystr = "hello" can often be rewritten as char mystr[] = "hello", which won't cause a fixup and gives smaller code too.
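A small C illustration of the rewrite suggested above (the variable names are hypothetical). The pointer form stores the literal's link-time address in a data word, which -mrelocatable has to record as a fixup; the array form stores the characters themselves, so there is no address to patch when the image moves.

```c
#include <assert.h>
#include <string.h>

/* Pointer form: this variable holds the link-time address of the
 * string literal, so with -mrelocatable it needs a fixup. */
char *greeting_ptr = "hello";

/* Array form: this variable holds the bytes "hello\0" directly;
 * no address is stored, so nothing needs relocating. */
char greeting_arr[] = "hello";

/* Both read the same at run time. */
static int greetings_match(void)
{
    return strcmp(greeting_ptr, greeting_arr) == 0;
}
```

The rewrite only applies when the pointer is never reassigned and nothing relies on sizeof yielding the pointer size (the array is 6 bytes here), which is why it works "often" rather than always.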

On Tue, 12 Oct 2010 15:04:31 +0200 Joakim Tjernlund joakim.tjernlund@transmode.se wrote:
_GLOBAL_OFFSET_TABLE_ is a predefined symbol that the linker defines to be in the middle of the -fpic GOT table. It marks the end of the GOT table as far as we are concerned (u-boot does not generate so many relocs that the linker needs to use the space above _GLOBAL_OFFSET_TABLE_).
There is no predefined symbol that marks the start of the -fpic relocs, so I added one (_GOT_TABLE_) to the linker script.
Maybe call it _GOT_START_ or similar? _GLOBAL_OFFSET_TABLE_ and _GOT_TABLE_[1] look like synonyms.
-Scott
[1] Global offset table table? :-)

Scott Wood scottwood@freescale.com wrote on 2010/10/12 17:52:58:
On Tue, 12 Oct 2010 15:04:31 +0200 Joakim Tjernlund joakim.tjernlund@transmode.se wrote:
_GLOBAL_OFFSET_TABLE_ is a predefined symbol that the linker defines to be in the middle of the -fpic GOT table. It marks the end of the GOT table as far as we are concerned (u-boot does not generate so many relocs that the linker needs to use the space above _GLOBAL_OFFSET_TABLE_).
There is no predefined symbol that marks the start of the -fpic relocs, so I added one (_GOT_TABLE_) to the linker script.
Maybe call it _GOT_START_ or similar? _GLOBAL_OFFSET_TABLE_ and _GOT_TABLE_[1] look like synonyms.
Hmm, the other reloc symbols are named _GOT2_TABLE_ and _FIXUP_TABLE_, so I think I should follow that convention.
-Scott
[1] Global offset table table? :-)
:)
Figured I should mention that I have added -msingle-pic-base (from ARM), which works nicely with -fpic (not sure if -fPIC is possible) and reduces size even more:
size u-boot-before
   text    data     bss     dec     hex filename
 230595    6580   24228  261403   3fd1b u-boot
size u-boot-after
   text    data     bss     dec     hex filename
 222779    6580   24228  253587   3de93 u-boot
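The saving, worked out from the text column (data and bss are unchanged, so dec shrinks by the same amount):

```latex
\begin{aligned}
230595 - 222779 &= 7816 \ \text{bytes} = 261403 - 253587\\
\frac{7816}{230595} &\approx 3.4\% \ \text{of text}
\end{aligned}
```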
If you have 8 KB of free DPRAM/cache, one can move the GOT tables there while running from flash and get true PIC.
Jocke

On 12/10/2010 19:11, Joakim Tjernlund wrote:
Figured I should mention that I have added -msingle-pic-base(from ARM) which works nicely with -fpic(not sure if -fPIC is possible) and reduces size even more:
Since you seem to be following the same path as I did on ARM, I may as well ask: did you try removing -fPIC and -msingle-pic-base from compile options and adding -pie to the link options instead?
The -pie link option generates ELF relocations and, on ARM at least, does a better job than GOT relocation, which does not handle pointers in initialized data, while ELF relocation fixes them.
And since ELF relocation does not modify code (it is a linker option), you end up with the same size for text+data+rodata. You do have a bigger FLASH image, though, because the ELF reloc tables are bigger than the GOT table; but you can get rid of them / not copy them to RAM once relocated.
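To make the -pie approach concrete: the linker emits dynamic relocations (.rel.dyn), and on 32-bit ARM relocations against local addresses typically come out as R_ARM_RELATIVE entries in REL format, where the target word already holds the link-time value. A minimal sketch of applying them, under those assumptions; the struct mirrors Elf32_Rel but is defined locally, and apply_relative_relocs() is a hypothetical name. Unlike the GOT loop, it patches any word the linker listed, including pointers in initialized data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as Elf32_Rel in the ELF specification. */
struct elf32_rel {
    uint32_t r_offset;  /* link-time address of the word to patch */
    uint32_t r_info;    /* (symbol index << 8) | relocation type */
};

#define REL_TYPE(info) ((info) & 0xffu)  /* as ELF32_R_TYPE() */
#define R_ARM_RELATIVE 23u

/*
 * Patch R_ARM_RELATIVE entries in an image linked at link_base that now
 * runs at new_base. REL format: each target word already holds its
 * link-time value, so the fix is "word += new_base - link_base".
 * Returns the number of entries applied.
 */
static size_t apply_relative_relocs(uint8_t *image, size_t image_len,
                                    uint32_t link_base, uint32_t new_base,
                                    const struct elf32_rel *rel, size_t n)
{
    const uint32_t delta = new_base - link_base;
    size_t applied = 0;

    for (size_t i = 0; i < n; i++) {
        if (REL_TYPE(rel[i].r_info) != R_ARM_RELATIVE)
            continue;  /* other types need symbol lookup; not modelled */

        size_t off = (size_t)(rel[i].r_offset - link_base);
        assert(off + sizeof(uint32_t) <= image_len);

        uint32_t word;
        memcpy(&word, image + off, sizeof word);  /* avoid unaligned access */
        word += delta;
        memcpy(image + off, &word, sizeof word);
        applied++;
    }
    return applied;
}
```

The trade-off discussed in the thread shows up directly here: every patched word costs an 8-byte table entry, whereas the GOT approach relocates one table of pointers shared by all accesses.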
The move from -fPIC to ELF on ARM can be looked for in the elf_reloc branch of the u-boot-arm repo.
Best regards,

On 12/10/2010 19:11, Joakim Tjernlund wrote:
Figured I should mention that I have added -msingle-pic-base (from ARM), which works nicely with -fpic (not sure if -fPIC is possible) and reduces size even more:
Since you seem to be following the same path as I did on ARM, I may as well ask: did you try removing -fPIC and -msingle-pic-base from compile options and adding -pie to the link options instead?
looked at it briefly but -pie is really massive. Each access needs a reloc entry, even if they access the same data.
The -pie link option generates ELF relocations and, on ARM at least, does a better job than GOT relocation, which does not handle pointers in initialized data, while ELF relocation fixes them.
On ppc, -mrelocatable does the job for you and adds fixup relocs. It is a simple add-on that should be fairly easy to add to other archs too.
And since ELF reloc does not modify code (it is a linker option), you
ehh, I think you need to reloc directly in the text segment.
end up with the same size for text+data+rodata. You do have a bigger FLASH image, though, because the ELF reloc tables are bigger than the GOT table; but you can get rid of them / not copy them to RAM once relocated.
I don't think RAM is as much of a problem as flash is.
The move from -fPIC to ELF on ARM can be looked for in the elf_reloc branch of the u-boot-arm repo.
Yes, but I believe the ppc way is smaller once -fpic and -msingle-pic-base are used (in flash, anyway). Also, I don't think you will be able to do true PIC in the future without PIC code.
Jocke

On 12/10/2010 20:11, Joakim Tjernlund wrote:
On 12/10/2010 19:11, Joakim Tjernlund wrote:
Figured I should mention that I have added -msingle-pic-base (from ARM), which works nicely with -fpic (not sure if -fPIC is possible) and reduces size even more:
Since you seem to be following the same path as I did on ARM, I may as well ask: did you try removing -fPIC and -msingle-pic-base from compile options and adding -pie to the link options instead?
looked at it briefly but -pie is really massive. Each access needs a reloc entry, even if they access the same data.
OTOH, the accesses are as simple as without reloc, i.e. no indirection such as the GOT introduces. What is the size of the .rel.dyn and .dynsym sections?
The -pie link option generates ELF relocations and, on ARM at least, does a better job than GOT relocation, which does not handle pointers in initialized data, while ELF relocation fixes them.
On ppc, -mrelocatable does the job for you and adds fixup relocs. It is a simple add-on that should be fairly easy to add to other archs too.
It does not exist on ARM targets whereas -pie is general.
And since ELF reloc does not modify code (it is a linker option), you
ehh, I think you need to reloc directly in the text segment.
I meant that it does not cause the compiler to generate different code, whereas GOT relocation does generate different code, which causes the text section to grow.
end up with the same size for text+data+rodata. You do have a bigger FLASH image, though, because the ELF reloc tables are bigger than the GOT table; but you can get rid of them / not copy them to RAM once relocated.
I don't think RAM is as much of a problem as flash is.
Indeed, in some cases it isn't; but you gain some boot time if you don't have to copy the relocation table along with the code.
The move from -fPIC to ELF on ARM can be looked for in the elf_reloc branch of the u-boot-arm repo.
Yes, but I believe the ppc way is smaller once -fpic and -msingle-pic-base are used (in flash, anyway). Also, I don't think you will be able to do true PIC in the future without PIC code.
Problem is, -fPIC / -fPIE (I tried both) is not really position independent either, and requires ugly manual relocation. Besides, for the moment, true position independence is not required, although I'd like at least the u-boot FLASH startup code to be.
I do understand, though, that ppc and arm may not share a common optimal relocation method.
Best regards,

Albert ARIBAUD albert.aribaud@free.fr wrote on 2010/10/12 22:37:54:
On 12/10/2010 20:11, Joakim Tjernlund wrote:
On 12/10/2010 19:11, Joakim Tjernlund wrote:
Figured I should mention that I have added -msingle-pic-base (from ARM), which works nicely with -fpic (not sure if -fPIC is possible) and reduces size even more:
Since you seem to be following the same path as I did on ARM, I may as well ask: did you try removing -fPIC and -msingle-pic-base from compile options and adding -pie to the link options instead?
Looked at it briefly, but -pie is really massive: each access needs a reloc entry, even if several accesses refer to the same data.
OTOH, the accesses are as simple as without reloc, i.e. no indirection
On ppc that is more work than via the GOT (with -fpic at least). You need two insns to load the address into a register, compared with one to load it from the GOT.
as GOT introduces. What is the size of the .rel.dyn and .dynsym sections?
Don't have that handy.
Link option -pie generates ELF relocation and, on ARM at least, does a better job than GOT reloc, which does not handle pointers in initialized data, whereas ELF reloc fixes them.
On ppc, -mrelocatable does the job for you and adds fixup relocs. It is a simple add-on that should be fairly easy to add to other archs too.
It does not exist on ARM targets whereas -pie is general.
I know, I meant you could consider adding it to ARM.
And since ELF reloc does not modify code (it is a linker option), you
ehh, I think you need to reloc directly in the text segment.
I meant that it does not cause the compiler to generate different code, whereas GOT relocation changes the generated code, which causes the text section to grow.
Not really, it is about the same (2 insns vs. 1 insn and 1 GOT entry). What adds size is the PIC prologue that loads the GOT pointer, but that can be avoided with -msingle-pic-base.
end up with the same size for text+data+rodata. You do have a bigger FLASH image though, because the ELF reloc tables are bigger than the GOT table; but you can get rid of them / not copy them to RAM once relocated.
I don't think RAM is as much of a problem as flash is.
Indeed, in some cases it isn't; but you gain some boot time if you don't have to copy the relocation table along with the code.
The move from -fPIC to ELF on ARM can be found in the elf_reloc branch of the u-boot-arm repo.
Yes, but I believe the ppc way is smaller once -fpic and -msingle-pic-base are used (in flash, anyway). Also, I don't think you will be able to do true PIC in the future without PIC code.
Problem is, -fPIC / -fPIE (I tried both) is not really position independent either, and requires ugly manual relocation. Besides, for the moment, true position independence is not required, although I'd like at least the u-boot FLASH startup code to be.
Hmm, I can't remember, but I think you need -pie too (for the fixups). Then you can test with and without -fpic. If ARM -fpic is similar to ppc -fpic, you will probably get smaller code than with -fPIC. Then add -msingle-pic-base too.
I do understand, though, that ppc and arm may not share a common optimal relocation method.
Yes, but the difference isn't really the arch. It is the -mrelocatable flag that is the big difference.

On 12/10/2010 23:00, Joakim Tjernlund wrote:
Yes, but the difference isn't really the arch. It is the -mrelocatable flag that is the big difference.
Not only: obviously, implementing GOT relocation is not done the same on both archs, and it simply is not beneficial on ARM wrt PPC in terms of instructions. I did a pretty extensive run of tests with and without -fPIC and -fPIE on ARM, and GOT relocation clearly makes code bigger, whereas it does not on PPC.
This simply implies that -fPIC is a better choice for PPC (and hence -mrelocatable) while -fpie is a better one for ARM.
Best regards,

Albert ARIBAUD albert.aribaud@free.fr wrote on 2010/10/13 08:30:33:
On 12/10/2010 23:00, Joakim Tjernlund wrote:
Yes, but the difference isn't really the arch. It is the -mrelocatable flag that is the big difference.
Not only: obviously, implementing GOT relocation is not done the same on both archs, and it simply is not beneficial on ARM wrt PPC in terms of instructions. I did a pretty extensive run of tests with and without -fPIC and -fPIE on ARM, and GOT relocation clearly makes code bigger, whereas it does not on PPC.
Did you use -msingle-pic-base too with -fpic/-fPIC? This is what makes a difference (together with -fpic). The most interesting size is the total flash size, IMHO. Reducing insns in RAM at the expense of flash is not what most users need, I think.
This simply implies that -fPIC is a better choice for PPC (and hence -mrelocatable) while -fpie is a better one for ARM.
-fPIC isn't optimal (it is bigger), but until my gcc patch gets into gcc one cannot use -fpic (it gets promoted to -fPIC by gcc). -fpic is smaller, but one cannot build apps that have a GOT over 32KB with it.

On 13/10/2010 09:07, Joakim Tjernlund wrote:
Did you use -msingle-pic-base too with -fpic/-fPIC? This is what makes a difference (together with -fpic). The most interesting size is the total flash size, IMHO. Reducing insns in RAM at the expense of flash is not what most users need, I think.
Yes, I did use -msingle-pic-base -- actually, I am the one who submitted the patch for ARM to that effect, precisely after all my tests :) -- but the code growth I am talking about is accesses, not setup.
This simply implies that -fPIC is a better choice for PPC (and hence -mrelocatable) while -fpie is a better one for ARM.
-fPIC isn't optimal (it is bigger), but until my gcc patch gets into gcc one cannot use -fpic (it gets promoted to -fPIC by gcc). -fpic is smaller, but one cannot build apps that have a GOT over 32KB with it.
You get a GOT over 32 KiB? IIRC, the reloc tables for ARM with -pie are slightly below 19 KiB for a typical u-boot; I'm surprised that a GOT would go bigger than the ELF table for the same work.
Best regards,

Albert ARIBAUD albert.aribaud@free.fr wrote on 2010/10/13 11:05:09:
On 13/10/2010 09:07, Joakim Tjernlund wrote:
Did you use -msingle-pic-base too with -fpic/-fPIC? This is what makes a difference (together with -fpic). The most interesting size is the total flash size, IMHO. Reducing insns in RAM at the expense of flash is not what most users need, I think.
Yes, I did use -msingle-pic-base -- actually, I am the one who submitted the patch for ARM to that effect, precisely after all my tests :) -- but the code growth I am talking about is accesses, not setup.
Ah, that was you. That patch made me look at adding -msingle-pic-base to ppc :)
This simply implies that -fPIC is a better choice for PPC (and hence -mrelocatable) while -fpie is a better one for ARM.
-fPIC isn't optimal (it is bigger), but until my gcc patch gets into gcc one cannot use -fpic (it gets promoted to -fPIC by gcc). -fpic is smaller, but one cannot build apps that have a GOT over 32KB with it.
You get a GOT over 32 KiB? IIRC, the reloc tables for ARM with -pie are slightly below 19 KiB for a typical u-boot; I'm surprised that a GOT would go bigger than the ELF table for the same work.
No, -fpic can handle 32 KiB; if it gets bigger you have to use -fPIC. My board uses about 7.5 KiB of GOT+fixups.
Jocke

Albert ARIBAUD albert.aribaud@free.fr wrote on 2010/10/13 11:05:09:
On 13/10/2010 09:07, Joakim Tjernlund wrote:
Did you use -msingle-pic-base too with -fpic/-fPIC? This is what makes a difference (together with -fpic). The most interesting size is the total flash size, IMHO. Reducing insns in RAM at the expense of flash is not what most users need, I think.
Yes, I did use -msingle-pic-base -- actually, I am the one who submitted the patch for ARM to that effect, precisely after all my tests :) -- but the code growth I am talking about is accesses, not setup.
Ah, that was you. That patch made me look at adding -msingle-pic-base to ppc :)
Does -msingle-pic-base work on ARM for both -fpic and -fPIC? I have a hard time making -fPIC work, I think it isn't possible with the current impl. of -fPIC on ppc.
Jocke

On 13/10/2010 23:25, Joakim Tjernlund wrote:
Does -msingle-pic-base work on ARM for both -fpic and -fPIC? I have a hard time making -fPIC work, I think it isn't possible with the current impl. of -fPIC on ppc.
Jocke
From what I remember, there was no difference between object files produced with -fpic and -fPIC on the ARM tests I did; -msingle-pic-base thus works identically for both.
Best regards,

On 10/12/2010 11:30 PM, Albert ARIBAUD wrote:
On 12/10/2010 23:00, Joakim Tjernlund wrote:
Yes, but the difference isn't really the arch. It is the -mrelocatable flag that is the big difference.
Not only: obviously, implementing GOT relocation is not done the same on both archs, and it simply is not beneficial on ARM wrt PPC in terms of instructions. I did a pretty extensive run of tests with and without -fPIC and -fPIE on ARM, and GOT relocation clearly makes code bigger, whereas it does not on PPC.
This simply implies that -fPIC is a better choice for PPC (and hence -mrelocatable) while -fpie is a better one for ARM.
Hi All,
In particular, the PPC takes two 32-bit instructions to load the known address of a variable into a register. If the GOT is used, a single 32-bit instruction can load the address of a variable from the GOT table (pointed to by a "fixed" register) into a register. In both cases there are two memory cycles, but in the GOT case only one instruction is required. This is why the GOT-based code is smaller. However, the GOT cannot be used to address constants and some other items that are not "variables". I do think that -fPIC and -fpie are not mutually exclusive. On the PPC, the GOT references would be relocated in the loop that updates the GOT, and the references to constants would be relocated by the ELF relocation code. That is how shared libraries are relocated.
Best Regards, Bill Campbell
Best regards,

On 10/12/2010 11:30 PM, Albert ARIBAUD wrote:
On 12/10/2010 23:00, Joakim Tjernlund wrote:
Yes, but the difference isn't really the arch. It is the -mrelocatable flag that is the big difference.
Not only: obviously, implementing GOT relocation is not done the same on both archs, and it simply is not beneficial on ARM wrt PPC in terms of instructions. I did a pretty extensive run of tests with and without -fPIC and -fPIE on ARM, and GOT relocation clearly makes code bigger, whereas it does not on PPC.
This simply implies that -fPIC is a better choice for PPC (and hence -mrelocatable) while -fpie is a better one for ARM.
Hi All,
In particular, the PPC takes two 32-bit instructions to load the known address of a variable into a register. If the GOT is used, a single 32-bit instruction can load the address of a variable from the GOT table (pointed to by a "fixed" register) into a register. In both cases there are two memory cycles, but in the GOT case only one instruction is required. This is why the GOT-based code is smaller. However, the GOT cannot be used to address constants and some other items that are not "variables". I do think that -fPIC and -fpie are not mutually exclusive. On the PPC, the GOT references would be relocated in the loop that updates the GOT, and the references to constants would be relocated by the ELF relocation code. That is how shared libraries are relocated.
Hmm, what constants, and why would you relocate them?
Curious: what other data that are not "variables" are you thinking about? Could such data be present in u-boot too? Possibly the fixups (initialized static pointers)? Relocs for these are emitted with -mrelocatable for ppc, and u-boot has a small routine to relocate these too.
Jocke

Dear Scott Wood,
In message 20101012105258.372089f5@udp111988uds.am.freescale.net you wrote:
Maybe call it _GOT_START_ or similar? _GLOBAL_OFFSET_TABLE_ and _GOT_TABLE_[1] look like synonyms.
...
[1] Global offset table table? :-)
Yeah. Please display the GOT table on the LCD display ;-)
Best regards,
Wolfgang Denk

This adds relocation of .got entries produced by -fpic. -fpic produces 2-3% smaller code and is faster. Unfortunately, gcc promotes -fpic to -fPIC when -mrelocatable is used, so one needs a very small patch to gcc too (sent upstream).
-fpic puts its GOT entries in .got section(s), and the linker defines the symbol _GLOBAL_OFFSET_TABLE_ to point to the middle of this table. The entry at _GLOBAL_OFFSET_TABLE_-4 contains a blrl insn, which is used to find the table's real address by branching to _GLOBAL_OFFSET_TABLE_-4.
Here are some size examples for my board:
size with -fPIC
   text    data     bss     dec     hex filename
 224687   14400   24228  263315   40493 u-boot
size with -mbss-plt -fPIC
   text    data     bss     dec     hex filename
 222687   14400   24228  261315   3fcc3 u-boot
size with -mbss-plt -fpic
   text    data     bss     dec     hex filename
 225179    6580   24228  255987   3e7f3 u-boot
size with -mbss-plt -fpic -msingle-pic-base
   text    data     bss     dec     hex filename
 222091    6580   24228  252899   3dbe3 u-boot
Note: -msingle-pic-base is not supported upstream yet.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
---
v2: Better commit msg
    Fix linker script for two NAND boards.
    Only compile the new -fpic code if compiled with -fpic to reduce size.
 arch/powerpc/cpu/mpc83xx/start.S                |   31 +++++++++++++++++++++-
 arch/powerpc/cpu/mpc83xx/u-boot.lds             |    1 +
 nand_spl/board/freescale/mpc8313erdb/u-boot.lds |    2 +
 nand_spl/board/freescale/mpc8315erdb/u-boot.lds |    2 +
 4 files changed, 34 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/cpu/mpc83xx/start.S b/arch/powerpc/cpu/mpc83xx/start.S
index c7d85a8..8b540bc 100644
--- a/arch/powerpc/cpu/mpc83xx/start.S
+++ b/arch/powerpc/cpu/mpc83xx/start.S
@@ -69,6 +69,10 @@
 	 */
 	START_GOT
 	GOT_ENTRY(_GOT2_TABLE_)
+#if defined(__pic__) && __pic__ == 1
+	GOT_ENTRY(_GOT_TABLE_)
+	GOT_ENTRY(_GLOBAL_OFFSET_TABLE_)
+#endif
 	GOT_ENTRY(__bss_start)
 	GOT_ENTRY(_end)
 
@@ -296,7 +300,11 @@ in_flash:
 	/*------------------------------------------------------*/
 
 	GET_GOT			/* initialize GOT access	*/
-
+#if defined(__pic__) && __pic__ == 1
+	/* Needed for upcoming -msingle-pic-base */
+	bl	_GLOBAL_OFFSET_TABLE_@local-4
+	mflr	r30
+#endif
 	/* r3: IMMR */
 	lis	r3, CONFIG_SYS_IMMR@h
 	/* run low-level CPU init code (in Flash)*/
@@ -950,7 +958,26 @@ in_ram:
 	add	r0,r0,r11
 	stw	r0,0(r3)
 2:	bdnz	1b
-
+#if defined(__pic__) && __pic__ == 1
+	/*
+	 * Relocation of *.got(-fpic)
+	 */
+	lwz	r4,GOT(_GLOBAL_OFFSET_TABLE_)
+	addi	r4,r4,-4 /* don't write over blrl in GOT */
+	lwz	r3,GOT(_GOT_TABLE_)
+	subf.	r4,r3,r4 /* r4 - r3 */
+	ble	3f
+	srwi	r4,r4,2 /* r4/4 */
+	mtctr	r4
+	addi	r3,r3,-4
+1:	lwzu	r0,4(r3)
+	cmpwi	r0,0
+	beq-	2f
+	add	r0,r0,r11
+	stw	r0,0(r3)
+2:	bdnz	1b
+3:
+#endif
 #ifndef CONFIG_NAND_SPL
 	/*
 	 * Now adjust the fixups and the pointers to the fixups
diff --git a/arch/powerpc/cpu/mpc83xx/u-boot.lds b/arch/powerpc/cpu/mpc83xx/u-boot.lds
index 0b74a13..a498a37 100644
--- a/arch/powerpc/cpu/mpc83xx/u-boot.lds
+++ b/arch/powerpc/cpu/mpc83xx/u-boot.lds
@@ -67,6 +67,7 @@ SECTIONS
   PROVIDE (erotext = .);
   .reloc   :
   {
+    _GOT_TABLE_ = .;
     *(.got)
     _GOT2_TABLE_ = .;
     *(.got2)
diff --git a/nand_spl/board/freescale/mpc8313erdb/u-boot.lds b/nand_spl/board/freescale/mpc8313erdb/u-boot.lds
index ad82589..1a3e44f 100644
--- a/nand_spl/board/freescale/mpc8313erdb/u-boot.lds
+++ b/nand_spl/board/freescale/mpc8313erdb/u-boot.lds
@@ -38,6 +38,8 @@ SECTIONS
 	.data : {
 		*(.data*)
 		*(.sdata*)
+		_GOT_TABLE_ = .;
+		*(.got)
 		_GOT2_TABLE_ = .;
 		*(.got2)
 		__got2_entries = (. - _GOT2_TABLE_) >> 2;
diff --git a/nand_spl/board/freescale/mpc8315erdb/u-boot.lds b/nand_spl/board/freescale/mpc8315erdb/u-boot.lds
index ad82589..1a3e44f 100644
--- a/nand_spl/board/freescale/mpc8315erdb/u-boot.lds
+++ b/nand_spl/board/freescale/mpc8315erdb/u-boot.lds
@@ -38,6 +38,8 @@ SECTIONS
 	.data : {
 		*(.data*)
 		*(.sdata*)
+		_GOT_TABLE_ = .;
+		*(.got)
 		_GOT2_TABLE_ = .;
 		*(.got2)
 		__got2_entries = (. - _GOT2_TABLE_) >> 2;

This adds relocation of .got entries produced by -fpic. -fpic produces 2-3% smaller code and is faster. Unfortunately, gcc promotes -fpic to -fPIC when -mrelocatable is used, so one needs a very small patch to gcc too (sent upstream).
-fpic puts its GOT entries in .got section(s), and the linker defines the symbol _GLOBAL_OFFSET_TABLE_ to point to the middle of this table. The entry at _GLOBAL_OFFSET_TABLE_-4 contains a blrl insn, which is used to find the table's real address by branching to _GLOBAL_OFFSET_TABLE_-4.
Here are some size examples for my board:
size with -fPIC
   text    data     bss     dec     hex filename
 224687   14400   24228  263315   40493 u-boot
size with -mbss-plt -fPIC
   text    data     bss     dec     hex filename
 222687   14400   24228  261315   3fcc3 u-boot
size with -mbss-plt -fpic
   text    data     bss     dec     hex filename
 225179    6580   24228  255987   3e7f3 u-boot
size with -mbss-plt -fpic -msingle-pic-base
   text    data     bss     dec     hex filename
 222091    6580   24228  252899   3dbe3 u-boot
Note: -msingle-pic-base is not supported upstream yet.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
---
v3: - Make the new -fpic code have zero impact when not compiled with -fpic.
    - The linker __got*_entries syms need to be defined outside the referenced scope.
    - Add linker sym __got_entries for -fpic relocs. Very likely more *lds scripts need fixing, but that is somebody else's problem :)
    - NAND SPL still doesn't fit for MPC8315ERDB and SIMPC8313, but these didn't fit before either. Note that my tree isn't current, so it might be fixed in master.
 arch/powerpc/cpu/mpc83xx/start.S                |   26 +++++++++++++++++++++-
 arch/powerpc/cpu/mpc83xx/u-boot.lds             |    3 ++
 nand_spl/board/freescale/mpc8313erdb/u-boot.lds |    7 ++++-
 nand_spl/board/freescale/mpc8315erdb/u-boot.lds |    7 ++++-
 4 files changed, 37 insertions(+), 6 deletions(-)
diff --git a/arch/powerpc/cpu/mpc83xx/start.S b/arch/powerpc/cpu/mpc83xx/start.S
index c7d85a8..c9bb0ea 100644
--- a/arch/powerpc/cpu/mpc83xx/start.S
+++ b/arch/powerpc/cpu/mpc83xx/start.S
@@ -296,7 +296,11 @@ in_flash:
 	/*------------------------------------------------------*/
 
 	GET_GOT			/* initialize GOT access	*/
-
+#if defined(__pic__) && __pic__ == 1
+	/* Needed for upcoming -msingle-pic-base */
+	bl	_GLOBAL_OFFSET_TABLE_@local-4
+	mflr	r30
+#endif
 	/* r3: IMMR */
 	lis	r3, CONFIG_SYS_IMMR@h
 	/* run low-level CPU init code (in Flash)*/
@@ -950,7 +954,25 @@ in_ram:
 	add	r0,r0,r11
 	stw	r0,0(r3)
 2:	bdnz	1b
-
+#if defined(__pic__) && __pic__ == 1
+	/*
+	 * Relocation of *.got(-fpic)
+	 *
+	 * Adjust got pointers, no need to check for 0, this code
+	 * already puts one entry in the table.
+	 */
+	li	r0,__got_entries@sectoff@l
+	lwz	r3,_GOT_TABLE_@got(r30)
+	add	r3,r3,r11
+	mtctr	r0
+	addi	r3,r3,-4
+1:	lwzu	r0,4(r3)
+	cmpwi	r0,0
+	beq-	2f
+	add	r0,r0,r11
+	stw	r0,0(r3)
+2:	bdnz	1b
+#endif
 #ifndef CONFIG_NAND_SPL
 	/*
 	 * Now adjust the fixups and the pointers to the fixups
diff --git a/arch/powerpc/cpu/mpc83xx/u-boot.lds b/arch/powerpc/cpu/mpc83xx/u-boot.lds
index 0b74a13..8b189d9 100644
--- a/arch/powerpc/cpu/mpc83xx/u-boot.lds
+++ b/arch/powerpc/cpu/mpc83xx/u-boot.lds
@@ -67,12 +67,15 @@ SECTIONS
   PROVIDE (erotext = .);
   .reloc   :
   {
+    _GOT_TABLE_ = .;
+    PROVIDE(_GLOBAL_OFFSET_TABLE_ = . + 4);
     *(.got)
     _GOT2_TABLE_ = .;
     *(.got2)
     _FIXUP_TABLE_ = .;
     *(.fixup)
   }
+  __got_entries = ((_GLOBAL_OFFSET_TABLE_ - _GOT_TABLE_) >> 2)-1;
   __got2_entries = (_FIXUP_TABLE_ - _GOT2_TABLE_) >> 2;
   __fixup_entries = (. - _FIXUP_TABLE_) >> 2;
 
diff --git a/nand_spl/board/freescale/mpc8313erdb/u-boot.lds b/nand_spl/board/freescale/mpc8313erdb/u-boot.lds
index ad82589..a3cacf6 100644
--- a/nand_spl/board/freescale/mpc8313erdb/u-boot.lds
+++ b/nand_spl/board/freescale/mpc8313erdb/u-boot.lds
@@ -38,11 +38,14 @@ SECTIONS
 	.data : {
 		*(.data*)
 		*(.sdata*)
+		_GOT_TABLE_ = .;
+		PROVIDE(_GLOBAL_OFFSET_TABLE_ = . + 4);
+		*(.got)
 		_GOT2_TABLE_ = .;
 		*(.got2)
-		__got2_entries = (. - _GOT2_TABLE_) >> 2;
 	}
-
+	__got_entries = ((_GLOBAL_OFFSET_TABLE_ - _GOT_TABLE_) >> 2)-1;
+	__got2_entries = (. - _GOT2_TABLE_) >> 2;
 	. = ALIGN(8);
 	__bss_start = .;
 	.bss (NOLOAD) : { *(.*bss) }
diff --git a/nand_spl/board/freescale/mpc8315erdb/u-boot.lds b/nand_spl/board/freescale/mpc8315erdb/u-boot.lds
index ad82589..a3cacf6 100644
--- a/nand_spl/board/freescale/mpc8315erdb/u-boot.lds
+++ b/nand_spl/board/freescale/mpc8315erdb/u-boot.lds
@@ -38,11 +38,14 @@ SECTIONS
 	.data : {
 		*(.data*)
 		*(.sdata*)
+		_GOT_TABLE_ = .;
+		PROVIDE(_GLOBAL_OFFSET_TABLE_ = . + 4);
+		*(.got)
 		_GOT2_TABLE_ = .;
 		*(.got2)
-		__got2_entries = (. - _GOT2_TABLE_) >> 2;
 	}
-
+	__got_entries = ((_GLOBAL_OFFSET_TABLE_ - _GOT_TABLE_) >> 2)-1;
+	__got2_entries = (. - _GOT2_TABLE_) >> 2;
 	. = ALIGN(8);
 	__bss_start = .;
 	.bss (NOLOAD) : { *(.*bss) }

From: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
To: u-boot@lists.denx.de, Scott Wood <scottwood@freescale.com>, Kim Phillips <kim.phillips@freescale.com>
Cc: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Date: 2010/10/13 23:12
Subject: [U-Boot] [PATCHv3] mpc83xx: Add -fpic relocation support
Sent by: u-boot-bounces@lists.denx.de
This adds relocation of .got entries produced by -fpic. -fpic produces 2-3% smaller code and is faster. Unfortunately, gcc promotes -fpic to -fPIC when -mrelocatable is used, so one needs a very small patch to gcc too (sent upstream).
-fpic puts its GOT entries in .got section(s), and the linker defines the symbol _GLOBAL_OFFSET_TABLE_ to point to the middle of this table. The entry at _GLOBAL_OFFSET_TABLE_-4 contains a blrl insn, which is used to find the table's real address by branching to _GLOBAL_OFFSET_TABLE_-4.
Ping?
Jocke

From: Joakim Tjernlund joakim.tjernlund@transmode.se
From: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
To: u-boot@lists.denx.de, Scott Wood <scottwood@freescale.com>, Kim Phillips <kim.phillips@freescale.com>
Cc: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Date: 2010/10/13 23:12
Subject: [U-Boot] [PATCHv3] mpc83xx: Add -fpic relocation support
Sent by: u-boot-bounces@lists.denx.de
This adds relocation of .got entries produced by -fpic. -fpic produces 2-3% smaller code and is faster. Unfortunately, gcc promotes -fpic to -fPIC when -mrelocatable is used, so one needs a very small patch to gcc too (sent upstream).
-fpic puts its GOT entries in .got section(s), and the linker defines the symbol _GLOBAL_OFFSET_TABLE_ to point to the middle of this table. The entry at _GLOBAL_OFFSET_TABLE_-4 contains a blrl insn, which is used to find the table's real address by branching to _GLOBAL_OFFSET_TABLE_-4.
Ping?
Ping ping :)
I should mention that this work, together with -msingle-pic-base, paves the way for true PIC with minimal changes to C source code.
participants (7)
- Albert ARIBAUD
- J. William Campbell
- Joakim Tjernlund
- Joakim Tjernlund
- Kim Phillips
- Scott Wood
- Wolfgang Denk