[U-Boot] RFC: Aligning arch initialisation sequences

Hi All,
I've been looking at how x86 does things compared to other arches, and it seems to have one big difference: very little is initialised prior to relocation. The general order for x86 is:
- ultra-low level board initialisation
- DRAM controller initialisation
- DRAM sizing
- low-level board initialisation
- Relocation
- init_sequence[] (including timers, serial console)
Now ARM, m68k and PPC all call relocate_code well after init_sequence[].
Now I'm wondering if I should change x86 to align with these other arches. I think it can be done, but there are a few technicalities that make it a non-trivial task. Two big problems I foresee are timers and the global data pointer.
Timers for x86 use interrupts which use function hooks for the interrupt handlers - I can change all the call-back functions to use global data, and I will have to 're-hook' the call-back functions after relocation.
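A minimal sketch of the 're-hook' step, assuming the interrupt handlers are kept in a table of addresses that has to be shifted by the relocation offset once the code has been copied to RAM. The table, `NR_IRQS_SKETCH`, `irq_install_handler_sketch()` and `irq_rehook()` are hypothetical, not existing U-Boot x86 code; addresses are plain integers so the arithmetic can be checked on a host.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_IRQS_SKETCH 16	/* hypothetical number of interrupt lines */

/* Handler addresses as registered before relocation (0 = no handler). */
static uintptr_t irq_handler_addr[NR_IRQS_SKETCH];

static void irq_install_handler_sketch(int irq, uintptr_t handler)
{
	if (irq >= 0 && irq < NR_IRQS_SKETCH)
		irq_handler_addr[irq] = handler;
}

/*
 * After the code has been copied to RAM, every registered handler lives
 * reloc_off bytes away from where it was linked, so each hook is moved by
 * that offset. Empty slots are left untouched.
 */
static void irq_rehook(uintptr_t reloc_off)
{
	size_t i;

	for (i = 0; i < NR_IRQS_SKETCH; i++) {
		if (irq_handler_addr[i] != 0)
			irq_handler_addr[i] += reloc_off;
	}
}
```

The same adjustment would apply to any other code pointer captured before relocation.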
The global data pointer is worse - x86 does not have a spare register to effectively use as a global variable. I would have to do all sorts of trickery probably involving putting the global data pointer at the very top of the stack segment and doing some very fancy shuffling of registers which will make global data suffer a pretty severe performance hit.
On the other hand, if I leave things the way they are, there is beauty to be uncovered in board_init_r(). At the moment, board_init_r() consists of the init_sequence[] loop, followed by a whole raft of function calls wrapped in #ifdefs. That whole mess could all be converted to a unified init_sequence[]. Along with current discussions on weak functions, I think this might be a cleaner solution.
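A minimal sketch of the unified table described above, assuming GCC/Clang's `__attribute__((weak))` on an ELF target. The names `board_misc_init`, `steps_run` and `run_init_sequence()` are hypothetical, and the bodies are stand-ins rather than U-Boot code: each optional step gets a weak no-op default that a board can override, so the table itself needs no `#ifdef`s.

```c
#include <stddef.h>

/* Same shape as U-Boot's init_fnc_t: returns 0 to continue, non-zero on fatal error. */
typedef int (init_fnc_t)(void);

/* Hypothetical counter so the sketch has an observable effect. */
static int steps_run;

/* Weak defaults: a board that needs the step provides a strong definition
 * with the same name, and the linker uses that one instead. */
int __attribute__((weak)) arch_cpu_init(void)
{
	return 0;
}

int __attribute__((weak)) board_misc_init(void)
{
	return 0;
}

static int timer_init(void)
{
	steps_run++;
	return 0;
}

static int serial_init(void)
{
	steps_run++;
	return 0;
}

/* One flat, NULL-terminated table instead of a chain of #ifdef'd calls. */
static init_fnc_t *init_sequence[] = {
	arch_cpu_init,
	timer_init,
	serial_init,
	board_misc_init,
	NULL,
};

/* Runs each entry in order; returns the index of the first failing entry, or -1. */
static int run_init_sequence(init_fnc_t **seq)
{
	int i;

	for (i = 0; seq[i] != NULL; i++) {
		if (seq[i]() != 0)
			return i;
	}
	return -1;
}
```

A board that defines its own `board_misc_init()` replaces the weak default at link time; the table does not change.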
So the question is, what does everyone think I should do? Should all architectures strive to look as much like one another as possible? Should we accept that maybe this particular issue be thrown in the too hard basket?
Regards,
Graeme

On Sunday, November 07, 2010 05:06:26 Graeme Russ wrote:
> Should all architectures strive to look as much like one another as possible? Should we accept that maybe this particular issue be thrown in the too hard basket?
imo, we should strive to have these things in one common place and not just have the different files look the same
-mike

On 10/11/10 10:35, Mike Frysinger wrote:
> imo, we should strive to have these things in one common place and not just have the different files look the same
I was afraid someone would say that. I've been having a look and a think about the situation for x86, and I cannot come up with a sane way to emulate the way the global data pointer is handled by the other arches.
In essence, the gd pointer is a unique global variable available prior to relocation. On all other arches, this is achieved by using a reserved register which I do not have the luxury of on x86 :(
Unless I can resolve this, I cannot move x86 in line with the other arches (i.e. init functions can only be done after relocation).
I'll keep thinking about a possible solution
Regards,
Graeme

On Friday, November 12, 2010 23:16:07 Graeme Russ wrote:
> Unless I can resolve this, I cannot move x86 in line with the other arches (i.e. init functions can only be done after relocation).
> I'll keep thinking about a possible solution
i don't think the first cut needs to go all the way. if you want to start small with unifying post-reloc, that's OK too. and probably a lot easier to git bisect in case things go wrong.
-mike

Dear Graeme Russ, Dear All,
> I was afraid someone would say that. I've been having a look and a think about the situation for x86, and I cannot come up with a sane way to emulate the way the global data pointer is handled by the other arches.
> In essence, the gd pointer is a unique global variable available prior to relocation. On all other arches, this is achieved by using a reserved register which I do not have the luxury of on x86 :(
> Unless I can resolve this, I cannot move x86 in line with the other arches (i.e. init functions can only be done after relocation).
<Ducks><Flame shield> I *personally* prefer code without tool-chain specific, out of "C" constructs. So I would be fine with passing gd as a parameter to all pre-relocation functions. The way they are called with a function array that would probably only marginally increase code size. And instead of all the funky stack calculations to get the space for gd, it could be achieved in a much simpler way:
in asm: set stack to end (or begin) of SRAM (or whatever)
in c:
    board_early_init(void)
    {
        gd_t auto_gd;
        /* call all pre-relocation functions with &auto_gd as parameter */
        /* relocate and fixup (preferably in C) - would probably have arch-specific parts in the ELF fix-up */
        memcpy (&static_gd, &auto_gd, sizeof auto_gd);
        /* setup final stack and jump to relocated code... that should be a *small* assembly helper function */
    }
You get the idea ;)
</Flame Shield></Ducks>
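A rough sketch of the proposal above, assuming a cut-down `gd_t`. The member names, the init functions and their bodies are illustrative only; `board_early_init`, `auto_gd` and `static_gd` follow the names in the message. Every pre-relocation function takes the gd pointer explicitly, the structure lives on the caller's stack, and it is copied into statically allocated storage before that function is left.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down global data structure. */
typedef struct {
	unsigned long cpu_clk;
	unsigned long baudrate;
	unsigned long ram_size;
} gd_t;

/* Pre-relocation init functions receive gd explicitly instead of via a register. */
typedef int (init_fnc_gd_t)(gd_t *gd);

static int init_clocks(gd_t *gd)
{
	gd->cpu_clk = 100000000UL;	/* example value */
	return 0;
}

static int init_baud(gd_t *gd)
{
	gd->baudrate = 115200;
	return 0;
}

static int init_ram_size(gd_t *gd)
{
	gd->ram_size = 64UL << 20;	/* example: 64 MiB */
	return 0;
}

static init_fnc_gd_t *pre_reloc_sequence[] = {
	init_clocks,
	init_baud,
	init_ram_size,
	NULL,
};

/* Stands in for the gd that survives relocation. */
static gd_t static_gd;

static int board_early_init(void)
{
	gd_t auto_gd;		/* lives on the initial (SRAM) stack */
	init_fnc_gd_t **fn;

	memset(&auto_gd, 0, sizeof(auto_gd));
	for (fn = pre_reloc_sequence; *fn != NULL; fn++) {
		if ((*fn)(&auto_gd) != 0)
			return -1;
	}

	/* Copy out before auto_gd goes out of scope. */
	memcpy(&static_gd, &auto_gd, sizeof(auto_gd));
	return 0;
}
```

This only holds together while every function that touches gd, and everything that calls it, takes the extra argument.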
Best Regards, Reinhard

On 13.11.2010 07:31, Reinhard Meyer wrote:
> <Ducks><Flame shield> I *personally* prefer code without tool-chain specific, out of "C" constructs. So I would be fine with passing gd as a parameter to all pre-relocation functions.
Ok, ok, I forgot the hitch that some functions called from the pre-relocation functions also would need the auto_gd passed on, *if* they need to use gd. That might become ugly, but x86 has solved that somehow?
Best Regards, Reinhard

On 13/11/10 17:38, Reinhard Meyer wrote:
> Ok, ok, I forgot the hitch that some functions called from the pre-relocation functions also would need the auto_gd passed on, *if* they need to use gd. That might become ugly, but x86 has solved that somehow?
Yes, by declaring gd as a global (static) variable and only referencing it after relocation (well, actually, it is passed as a parameter to board_init_f() (in lieu of boot_params), where the relocation is performed).
Regards,
Graeme

Dear Reinhard Meyer,
In message 4CDE30CF.9010509@emk-elektronik.de you wrote:
> I *personally* prefer code without tool-chain specific, out of "C" constructs. So I would be fine with passing gd as a parameter to all
I think you would quickly find this becoming a pretty serious pain.
I've been there before. I used this, or tried to use this, before I dug out the technique to reserve a register. This is such a non-standard mechanism that I hesitated for a long time over whether doing something like that was acceptable. But otherwise code size will explode: tiny functions, which now are a mere few bytes, will triple their size and more, because they need to save registers, load gd from an argument, and pass it as an argument to all called functions. You will find you have to add gd as an argument to basically _all_ functions in the code, because these may call another function (which does not use gd), etc., and then 4 or 5 or even more levels down there is a single function that needs to access gd.
It is ugly, and a serious pain.
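To illustrate the argument-threading problem, here is a hypothetical call chain (all names invented for the example) in which only the innermost function reads gd, yet every intermediate level has to accept and forward the pointer even though it never uses it.

```c
#include <stdio.h>

/* Hypothetical, minimal global data. */
typedef struct {
	unsigned long baudrate;
} gd_t;

/* Level 4: the only function that actually needs gd. */
static int format_baud(const gd_t *gd, char *buf, size_t len)
{
	return snprintf(buf, len, "%lu", gd->baudrate);
}

/* Levels 3..1 never touch gd themselves, but must carry it down. */
static int print_console_info(const gd_t *gd, char *buf, size_t len)
{
	return format_baud(gd, buf, len);
}

static int show_banner(const gd_t *gd, char *buf, size_t len)
{
	return print_console_info(gd, buf, len);
}

static int init_console(const gd_t *gd, char *buf, size_t len)
{
	return show_banner(gd, buf, len);
}
```

With a reserved register holding gd, the three intermediate functions would not need the extra parameter at all; the same applies to shared helpers such as printf().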
> pre-relocation functions. The way they are called with a function array
There are no pure pre-relocation functions. They get reused after relocation. And even before relocation you run things like printf() and so on, and it is a nightmare to add gd arguments to all these standard functions.
Been there, and quickly left.
> that would probably only marginally increase code size. And instead of
Heh. Try it out, before making such statements.
> all the funky stack calculations to get the space for gd, it could be achieved in a much simpler way:
I would be happy if you were right. But you would have to come up with patches that demonstrate that your claims are actually correct to make me change my mind.
Best regards,
Wolfgang Denk

Dear Wolfgang Denk,
> I think you would quickly find this becoming a pretty serious pain.
Yes, I saw that soon after my first post :)
But what's left of my ideas is the following:
in asm: set stack to end of SRAM (or whatever) (board-config.h would not subtract GENERATED_GBL_DATA_SIZE anymore)
in c:
    board_early_init(void)
    {
        gd_t auto_gd;
        gd = &auto_gd;
        ...
    }
That would rid us of all alignment concerns: Setting stack to end of initial storage will certainly be aligned, and the auto_gd will be aligned as the toolchain deems necessary.
We would not need GENERATED_GBL_DATA_SIZE anymore.
The auto_gd space on stack will be valid even into the call to relocate_code.
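A sketch contrasting the two ways of placing the pre-relocation gd. It assumes, as an approximation, that GENERATED_GBL_DATA_SIZE is sizeof(gd_t) rounded up to 16 bytes; `GBL_DATA_SIZE_SKETCH`, `gd_below_init_sp()` and `auto_gd_is_aligned()` are hypothetical names. With the current scheme the alignment of gd depends on the initial stack address being aligned; with a local `auto_gd` the compiler guarantees the alignment of the type.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down global data. */
typedef struct {
	unsigned long flags;
	unsigned long long ram_size;
	unsigned long baudrate;
} gd_t;

/* Assumed: size rounded up to a multiple of 16 bytes. */
#define GBL_DATA_SIZE_SKETCH ((sizeof(gd_t) + 15) & ~(size_t)15)

/* Current scheme (sketch): gd sits just below the initial stack pointer. */
static uintptr_t gd_below_init_sp(uintptr_t init_sp_addr)
{
	return init_sp_addr - GBL_DATA_SIZE_SKETCH;
}

/* Proposed scheme (sketch): gd is an ordinary local; the compiler aligns it. */
static int auto_gd_is_aligned(void)
{
	gd_t auto_gd;

	return ((uintptr_t)&auto_gd % _Alignof(gd_t)) == 0;
}
```

If the initial stack address is not itself 16-byte aligned (for example 0x1004), the first scheme leaves gd misaligned by the same amount, which is the headache the proposal tries to remove.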
Best Regards, Reinhard

Dear Reinhard Meyer,
In message 4CDF04A8.4050802@emk-elektronik.de you wrote:
> But what's left of my ideas is the following:
> in asm: set stack to end of SRAM (or whatever) (board-config.h would not subtract GENERATED_GBL_DATA_SIZE anymore)
> in c: board_early_init(void) { gd_t auto_gd; gd = &auto_gd;
> That would rid us of all alignment concerns: Setting stack to end of initial storage will certainly be aligned, and the auto_gd will be aligned as the toolchain deems necessary.
> We would not need GENERATED_GBL_DATA_SIZE anymore.
> The auto_gd space on stack will be valid even into the call to relocate_code.
This has but one tiny shortcoming: we use GD to pass data around, for example to pass clock frequencies determined before relocation to the code running after relocation, which means that GD must be of a statically allocated storage class.
Your stack variable above will go out of scope as soon as we leave the board_early_init() function...
Best regards,
Wolfgang Denk

Dear Wolfgang Denk,
> This has but one tiny shortcoming: we use GD to pass data around, for example to pass clock frequencies determined before relocation to the code running after relocation, which means that GD must be of a statically allocated storage class.
> Your stack variable above will go out of scope as soon as we leave the board_early_init() function...
Correct, that's why it's even now copied over to storage in SDRAM... (at least on ARM:
    debug ("relocation Offset is: %08lx\n", gd->reloc_off);
    memcpy (id, (void *)gd, sizeof (gd_t));

    relocate_code (addr_sp, id, addr);
)
Best Regards, Reinhard

Dear Reinhard Meyer,
In message 4CDF137E.2000902@emk-elektronik.de you wrote:
> Correct, that's why it's even now copied over to storage in SDRAM... (at least on ARM:
>     debug ("relocation Offset is: %08lx\n", gd->reloc_off);
>     memcpy (id, (void *)gd, sizeof (gd_t));
>
>     relocate_code (addr_sp, id, addr);
> )
At this time board_early_init_f() has terminated long ago, i. e. the data is not available any more.
Best regards,
Wolfgang Denk

Dear Wolfgang Denk,
> At this time board_early_init_f() has terminated long ago, i. e. the data is not available any more.
Above code is *IN* board_early_init_f !
Best regards, Reinhard

Dear Reinhard Meyer,
In message 4CDF15BB.1090107@emk-elektronik.de you wrote:
>> At this time board_early_init_f() has terminated long ago, i. e. the data is not available any more.
> Above code is *IN* board_early_init_f !
That's totally broken, then.
See init_sequence[] in "arch/arm/lib/board.c":
239 init_fnc_t *init_sequence[] = {
240 #if defined(CONFIG_ARCH_CPU_INIT)
241     arch_cpu_init,          /* basic arch cpu dependent setup */
242 #endif
243 #if defined(CONFIG_BOARD_EARLY_INIT_F)
244     board_early_init_f,
245 #endif
246     timer_init,             /* initialize timer */
247 #ifdef CONFIG_FSL_ESDHC
248     get_clocks,
249 #endif
250     env_init,               /* initialize environment */
251     init_baudrate,          /* initialze baudrate settings */
252     serial_init,            /* serial communications setup */
253     console_init_f,         /* stage 1 init of console */
254     display_banner,         /* say that we are here */
255 #if defined(CONFIG_DISPLAY_CPUINFO)
256     print_cpuinfo,          /* display cpu info (and speed) */
257 #endif
258 #if defined(CONFIG_DISPLAY_BOARDINFO)
259     checkboard,             /* display board info */
260 #endif
261 #if defined(CONFIG_HARD_I2C) || defined(CONFIG_SOFT_I2C)
262     init_func_i2c,
263 #endif
264     dram_init,              /* configure available RAM banks */
265 #if defined(CONFIG_CMD_PCI) || defined (CONFIG_PCI)
266     arm_pci_init,
267 #endif
268     NULL,
269 };
board_early_init_f() [in line 244] runs a long, long time before the SDRAM has been tested and initialized, which happens in dram_init() [in line 264].
You cannot and must not touch SDRAM in board_early_init_f(). And even more, you must not at all run relocate_code() there!
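A minimal sketch of the ordering constraint, assuming a hypothetical `sdram_ready` flag. `board_early_init_f` and `dram_init` keep their U-Boot names and `int (void)` shape, but their bodies here are stand-ins; `relocate_to_sdram()` and `board_init_f_sketch()` are invented for the example. The copy to SDRAM only succeeds once the whole table, including dram_init(), has run, which is why it belongs at the end of board_init_f() and not inside the first table entry.

```c
#include <stddef.h>

typedef int (init_fnc_t)(void);

/* Hypothetical flag: set once SDRAM has been configured. */
static int sdram_ready;

static int board_early_init_f(void)
{
	/* Runs first: SDRAM is not usable yet, so nothing may be copied to it here. */
	return 0;
}

static int dram_init(void)
{
	sdram_ready = 1;
	return 0;
}

static init_fnc_t *init_sequence[] = {
	board_early_init_f,
	dram_init,
	NULL,
};

/* Hypothetical stand-in for "copy gd and code to SDRAM, then relocate". */
static int relocate_to_sdram(void)
{
	return sdram_ready ? 0 : -1;
}

/* Mirrors the shape of board_init_f(): run the whole table, then relocate. */
static int board_init_f_sketch(void)
{
	init_fnc_t **fn;

	for (fn = init_sequence; *fn != NULL; fn++) {
		if ((*fn)() != 0)
			return -1;
	}
	return relocate_to_sdram();
}
```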
Best regards,
Wolfgang Denk

On 14/11/10 11:07, Wolfgang Denk wrote:
> You cannot and must not touch SDRAM in board_early_init_f(). And even more, you must not at all run relocate_code() there!
See:
- arch/powerpc/lib/board.c
- arch/m68k/lib/board.c
- arch/arm/lib/board.c
They all malloc the final global data structure, memcpy the temporary global data to the malloc'd global data, and call relocate_code, passing a pointer to the new global data, all at the very end of board_init_f() and therefore after SDRAM has been initialised.
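A sketch of the kind of top-down carving done at the end of board_init_f() before the memcpy() and relocate_code() call. The structure, the function name and all sizes are hypothetical and only follow the general pattern (monitor at the top of SDRAM, then malloc arena, bd_t, gd_t and the stack below it); each architecture's real board.c differs in details such as extra reserved regions and alignment.

```c
#include <stdint.h>

/* Hypothetical layout result; real values come from the board config. */
struct reloc_layout {
	uintptr_t relocaddr;	/* where the monitor is copied to */
	uintptr_t malloc_start;	/* bottom of the malloc arena */
	uintptr_t bd_addr;	/* final board info structure */
	uintptr_t gd_addr;	/* final global data (the "id" pointer) */
	uintptr_t stack_top;	/* initial stack pointer after relocation */
};

static struct reloc_layout plan_relocation(uintptr_t ram_top, uintptr_t mon_len,
					   uintptr_t malloc_len, uintptr_t bd_size,
					   uintptr_t gd_size)
{
	struct reloc_layout l;
	uintptr_t addr = ram_top;

	/* Monitor image at the top of RAM, rounded down to a 4 KiB boundary. */
	addr -= mon_len;
	addr &= ~(uintptr_t)(4096 - 1);
	l.relocaddr = addr;

	/* Carve the remaining areas downward from there. */
	addr -= malloc_len;
	l.malloc_start = addr;
	addr -= bd_size;
	l.bd_addr = addr;
	addr -= gd_size;
	l.gd_addr = addr;

	/* Stack grows down from just below gd, 8-byte aligned. */
	l.stack_top = addr & ~(uintptr_t)7;
	return l;
}
```

With 0x24000000 as the top of RAM, a 0x30000-byte monitor, a 1 MiB malloc arena, a 0x20-byte bd_t and a 0x64-byte gd_t, gd would end up at 0x23ECFF7C and the stack would start at 0x23ECFF78.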
Regards,
Graeme

Dear Graeme Russ,
In message 4CDF2C8B.3060401@gmail.com you wrote:
> See: arch/powerpc/lib/board.c arch/m68k/lib/board.c arch/arm/lib/board.c
> They all malloc the final global data structure, memcpy the temporary global data to the malloc'd global data, and call relocate_code passing a pointer to the new global data all at the very end of board_init_f() and therefore after SDRAM has been initialised
Yes, and this is correct. board_init_f != board_early_init_f
Best regards,
Wolfgang Denk

On 14/11/10 11:35, Wolfgang Denk wrote:
> Yes, and this is correct. board_init_f != board_early_init_f
Ah, I see...
board_early_init_f() is (in most cases) the very first entry in the init_sequence[]
So if global data is defined on the stack in board_init_f() and copied to the heap at the end of board_init_f() we should be OK. Is global data needed prior to board_init_f()?
For x86, I allocate global data in asm and set three members. The sticking point for me is the single ulong parameter to board_init_f() which does not present enough flexibility to pass all the information I need.
I could create a stub x86_board_init_f() which has all the parameters I need, allocate global_data on its stack, set up global data based on the parameters and call board_init_f(), passing the pointer to global data. x86 could then do the same as the other arches and copy global data to a malloc'd version.
One step closer to unification.
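A minimal sketch of the stub idea, assuming a cut-down `gd_t`. `x86_board_init_f()` follows the name in the message; `board_init_f_common()`, `relocated_gd` and the struct members are hypothetical. In real firmware the common function would run init_sequence[], copy gd to SDRAM and call relocate_code() without returning; here it just records the copy so the flow can be checked.

```c
#include <string.h>

/* Hypothetical, cut-down global data. */
typedef struct {
	unsigned long flags;
	unsigned long load_offset;
	unsigned long ram_size;
} gd_t;

/* Stands in for the copy of gd that would live in SDRAM after relocation. */
static gd_t relocated_gd;

/* Stand-in for the generic board_init_f() taking a gd pointer. */
static void board_init_f_common(gd_t *gd)
{
	gd->ram_size = 32UL << 20;	/* example: 32 MiB found by DRAM sizing */
	memcpy(&relocated_gd, gd, sizeof(*gd));
	/* real code: relocate_code(...), which never returns */
}

/*
 * Hypothetical x86 stub called from start.S with as many arguments as needed.
 * It builds gd on its own stack and hands a pointer to the common code.
 */
static void x86_board_init_f(unsigned long flags, unsigned long load_offset)
{
	gd_t gd_data;

	memset(&gd_data, 0, sizeof(gd_data));
	gd_data.flags = flags;
	gd_data.load_offset = load_offset;
	board_init_f_common(&gd_data);
}
```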
One nit-pick is that, in reality, the stack space used by board_init_f() is never reclaimed because it never returns. What we could do is reset the stack pointer prior to calling board_init_r()
I still have no way to globalise gd prior to relocation though :( (still thinking)
Regards,
Graeme

Dear Graeme Russ,
In message 4CDF36E3.7060505@gmail.com you wrote:
> board_early_init_f() is (in most cases) the very first entry in the init_sequence[]
Right.
> So if global data is defined on the stack in board_init_f() and copied to the heap at the end of board_init_f() we should be OK. Is global data needed prior to board_init_f()?
Yes, as we have no writable data segment before relocation. We need a way to pass around some data.
> For x86, I allocate global data in asm and set three members. The sticking point for me is the single ulong parameter to board_init_f() which does not present enough flexibility to pass all the information I need.
Pass a pointer to a struct?
> One nit-pick is that, in reality, the stack space used by board_init_f() is never reclaimed because it never returns. What we could do is reset the stack pointer prior to calling board_init_r()
This is not correct. After relocation, we set up a completely new stack in RAM.
Best regards,
Wolfgang Denk

On 14/11/10 20:04, Wolfgang Denk wrote:
>> For x86, I allocate global data in asm and set three members. The sticking point for me is the single ulong parameter to board_init_f() which does not present enough flexibility to pass all the information I need.
> Pass a pointer to a struct?
Which is what I do (a pointer to a gd_t allocated in asm)
There are two minor perceived 'issues' with board_init_f():
1) The parameter is defined as a 'ulong bootflag'
- arm, blackfin, mips, and powerpc do not use the parameter at all
- avr redefines it as 'ulong board_type'
- x86 uses it as a gd_t *
- m68k and sparc use it as bootflag
- microblaze and nios2 do not even use board_init_f() (they define their own board_init(void))
- sh is like microblaze and nios2 but defines sh_generic_init()
2) At least one list member does not like the allocation of global data in asm and suggests that it should be on the stack of board_init_f() and copied to the heap at the end of board_init_f() (NOTE: there was some confusion with board_early_init_f(), but I think the discussion always intended to focus on board_init_f())
and a few other inconsistencies:
- blackfin and avr do not use init_sequence[] (but look like they could)
- blackfin has an interesting (read: wrong) comment:
  * The requirements for any new initalization function is simple: it
  * receives a pointer to the "global data" structure as it's only
  * argument, and returns an integer return code, where 0 means
  * "continue" and != 0 means "fatal error, hang the system".
Why don't we just change board_init_f(ulong bootflag) to board_init_f(gd_t *gd)? avr would need a slight mod to add board_type to gd_t; m68k and sparc would need a similar change to add bootflag.
So start.S would calculate the location of the initial global data struct (in cache, SRAM, Flash etc) and pass this to board_init_f(). A lot of arches would just pass a constant (arm would pass CONFIG_SYS_INIT_SP_ADDR for example)
>> One nit-pick is that, in reality, the stack space used by board_init_f() is never reclaimed because it never returns. What we could do is reset the stack pointer prior to calling board_init_r()
> This is not correct. After relocation, we set up a completely new stack in RAM.
Ah, OK - x86 does not, but after the realignment of x86 to the other arches, I will do this as well
Regards,
Graeme

On 14/11/10 21:21, Graeme Russ wrote:
> Why don't we just change board_init_f(ulong bootflag) to board_init_f(gd_t *gd)? avr would need a slight mod to add board_type to gd_t; m68k and sparc would need a similar change to add bootflag.
> So start.S would calculate the location of the initial global data struct (in cache, SRAM, Flash etc) and pass this to board_init_f(). A lot of arches would just pass a constant (arm would pass CONFIG_SYS_INIT_SP_ADDR for example)
Scratch that - with my proposed x86 changes, I do not need to pass a gd_t*
Plus, moving the init sequence into board_init_f() before relocation means I lose the ability to 'load anywhere' (which is not a totally disastrous loss), so I don't need load_offset any more. So I can still pass bootflag for warm/cold boot indication.
I still think we need to clean up microblaze, nios2 and sh
Regards,
Graeme

Dear Wolfgang Denk,
> Yes, and this is correct. board_init_f != board_early_init_f
To make it crystal clear now:
 void board_init_f (ulong bootflag)
 {
     bd_t *bd;
     init_fnc_t **init_fnc_ptr;
     gd_t *id;
     ulong addr, addr_sp;
+    gd_t auto_gd;

     /* Pointer is writable since we allocated a register for it */
-    gd = (gd_t *) (CONFIG_SYS_INIT_SP_ADDR);
+    gd = &auto_gd;
     /* compiler optimization barrier needed for GCC >= 3.4 */
     /* Q: why is that needed anyway ??? */
     __asm__ __volatile__("": : :"memory");

     memset ((void*)gd, 0, sizeof (gd_t));
     ....
     debug ("relocation Offset is: %08lx\n", gd->reloc_off);
     memcpy (id, (void *)gd, sizeof (gd_t));

     relocate_code (addr_sp, id, addr);

     /* NOTREACHED - relocate_code() does not return */
 }
This, and setting CONFIG_SYS_INIT_SP_ADDR to an aligned value in initial storage (SRAM, pinned down Cache, or other) removes a lot of the headache about making stuff aligned.
Best Regards, Reinhard

Dear Reinhard Meyer,
In message 4CDF6070.20503@emk-elektronik.de you wrote:
>> Yes, and this is correct. board_init_f != board_early_init_f
> To make it crystal clear now:
> void board_init_f (ulong bootflag)
That's not crystal clear at all. Until now, you've been talking about board_early_init_f(), and now you suddenly switch to a completely different function.
> bd_t *bd; init_fnc_t **init_fnc_ptr; gd_t *id; ulong addr, addr_sp;
> + gd_t auto_gd;
And how do we pass around data before relocation? For example, the serial driver needs information about clock settings to set up the serial console.
You would have to add lots of return values (sometimes even structs), which then get passed around as arguments, or you would have to pass around a pointer to GD _before_ that.
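A minimal sketch of the data flow described above: one early step records a clock frequency in gd, and a later step (the serial driver) derives its baud-rate divisor from it without any extra arguments. The file-scope `gd` pointer is only a host-side stand-in for the reserved register; `get_clocks_sketch()`, `serial_divisor_sketch()` and the struct members are hypothetical. The clk / (16 * baud) formula is the usual one for 16550-style UARTs; other UARTs differ.

```c
/* Hypothetical, cut-down global data. */
typedef struct {
	unsigned long bus_clk;	/* Hz, measured or computed early */
	unsigned long baudrate;
} gd_t;

/* Stand-in for U-Boot's gd; on ARM or PPC it would live in a reserved register. */
static gd_t *gd;

/* Early init step: record the clock in gd (value is just an example). */
static int get_clocks_sketch(void)
{
	gd->bus_clk = 1843200UL;
	return 0;
}

/*
 * Later init step: the serial driver derives its divisor from data left in
 * gd by an earlier step, rounding to the nearest integer.
 */
static unsigned int serial_divisor_sketch(void)
{
	unsigned long mode_x_div = 16UL * gd->baudrate;

	return (unsigned int)((gd->bus_clk + mode_x_div / 2) / mode_x_div);
}
```

With a 1.8432 MHz clock this gives a divisor of 1 at 115200 baud and 12 at 9600 baud.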
Best regards,
Wolfgang Denk

Dear Wolfgang Denk,
To make it crystal clear now, and put the complete context:
That simple change would rid us of all alignment concerns: Setting stack to end of initial storage will certainly be aligned, and the auto_gd will be aligned as the toolchain deems necessary.
We would not need GENERATED_GBL_DATA_SIZE anymore.
The auto_gd space on stack will be valid even into the call to relocate_code.
 void board_init_f (ulong bootflag)
 {
     bd_t *bd;
     init_fnc_t **init_fnc_ptr;
     gd_t *id;
     ulong addr, addr_sp;
+    gd_t auto_gd;

     /* Pointer is writable since we allocated a register for it */
-    gd = (gd_t *) (CONFIG_SYS_INIT_SP_ADDR);
+    gd = &auto_gd;
     /* compiler optimization barrier needed for GCC >= 3.4 */
     /* Q: why is that needed anyway ??? */
     __asm__ __volatile__("": : :"memory");

     memset ((void*)gd, 0, sizeof (gd_t));
     ....
     debug ("relocation Offset is: %08lx\n", gd->reloc_off);
     memcpy (id, (void *)gd, sizeof (gd_t));

     relocate_code (addr_sp, id, addr);

     /* NOTREACHED - relocate_code() does not return */
 }
This, and setting CONFIG_SYS_INIT_SP_ADDR to an aligned value in initial storage (SRAM, pinned down Cache, or other) removes a lot of the headache about making stuff aligned.
This three line change DOES NOT propose any different method of handling gd other than it is currently done.
It is ONLY a more elegant way to allocate the pre-relocation storage for gd.
Best Regards, Reinhard

On 15/11/10 00:46, Reinhard Meyer wrote:
> This three line change DOES NOT propose any different method of handling gd other than it is currently done.
> It is ONLY a more elegant way to allocate the pre-relocation storage for gd.
On the proviso that gd is not needed _BEFORE_ board_init_f()
Regards,
Graeme

Dear Graeme Russ,
> On the proviso that gd is not needed _BEFORE_ board_init_f()
At least on ARM, board_init_f() is the first C function called after basic SoC initialisation in ASM.
And my patch does not move the point of availability of gd at all.
Best Regards, Reinhard

On Mon, Nov 15, 2010 at 7:16 AM, Reinhard Meyer u-boot@emk-elektronik.de wrote:
At least on ARM, board_init_f() is the first C function called after basic SoC initialisation in ASM.
And my patch does not move the point of availability of gd at all.
But do some arches populate some of gd before calling board_init_f()?
Hint: x86 does (but might not need to in the near future) - YMMV for other arches
Regards,
Graeme

On 13/11/2010 05:16, Graeme Russ wrote:
In essence, the gd pointer is a unique global variable available prior to relocation. On all other arches, this is achieved by using a reserved register, which I do not have the luxury of on x86 :(
Dusting off ooooold knowledge of x86 and without even a glance at x86 u-boot... Since GD is the only global used pre-reloc, can you not ensure it always ends up first in the data segment, and then manage two values for the DS segment reg, one pre-reloc where only gd can be used, and one post-reloc where gd and all the other globals can be accessed?
Regards,
Graeme
Kind regards,

On 13/11/10 19:20, Albert ARIBAUD wrote:
Dusting off ooooold knowledge of x86 and without even a glance at x86 u-boot... Since GD is the only global used pre-reloc, can you not ensure it always ends up first in the data segment, and then manage two values for the DS segment reg, one pre-reloc where only gd can be used, and one post-reloc where gd and all the other globals can be accessed?
I had thought of something similar using GS (which is not generally used by U-Boot), but it is very messy.
All segments are currently set up to be full 4GB, with the initial descriptor table hard-coded in flash and then reloaded after relocation (it needs to be reloaded so it does not get clobbered when erasing flash or relocating from RAM). I did have a dynamic GDT calculated in asm using self-modifying code, but replaced that with the same 'C' methodology as Linux. I would prefer not to go back there...
So yes, it is possible, but quite frankly, I would rather leave the init functions post-relocation than mess around with the GDT pre-relocation.
Regards,
Graeme
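For reference, a sketch of how one 8-byte IA-32 GDT descriptor packs its base and limit, assuming the standard segment-descriptor layout; the helper gdt_entry() is hypothetical and is not U-Boot's or Linux's code. The flat 4GB segments described above correspond to base 0, limit 0xFFFFF and 4KiB granularity. A descriptor set aside for gd would be built the same way, with its base at the gd storage and a small byte-granular limit (the limit is the offset of the last valid byte).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack an IA-32 segment descriptor (hypothetical helper).
 *   base   - 32-bit linear base address
 *   limit  - 20-bit limit (bytes, or 4KiB pages when G=1)
 *   access - access byte (present, DPL, S, type)
 *   flags  - nibble: G (granularity), D/B (32-bit), L, AVL
 */
static uint64_t gdt_entry(uint32_t base, uint32_t limit,
			  uint8_t access, uint8_t flags)
{
	uint64_t d = 0;

	d |= (uint64_t)(limit & 0xFFFFu);		/* limit 15:0  -> bits 0..15  */
	d |= (uint64_t)(base & 0xFFFFFFu) << 16;	/* base 23:0   -> bits 16..39 */
	d |= (uint64_t)access << 40;			/* access byte -> bits 40..47 */
	d |= (uint64_t)((limit >> 16) & 0xFu) << 48;	/* limit 19:16 -> bits 48..51 */
	d |= (uint64_t)(flags & 0xFu) << 52;		/* G, D/B, L, AVL -> 52..55   */
	d |= (uint64_t)((base >> 24) & 0xFFu) << 56;	/* base 31:24  -> bits 56..63 */
	return d;
}
```

The awkward split of base and limit across the descriptor is part of why building one at run time (rather than hard-coding the table) tends to need either C code or careful asm.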

On 13/11/10 22:18, Graeme Russ wrote:
So yes, it is possible, but quite frankly, I would rather leave the init functions post-relocation than mess around with the GDT pre-relocation.
OK, I've had a good hard look at this, and setting aside a segment is the only way that I can think of. I'll need to revisit the self-modifying code I threw out, but I can live with that small ugliness to bring x86 board_init_f() in line with the other (relocating) arches.
For some bizarre reason, the BIOS emulation layer *yuck* uses GS *double yuck*. FS is available, though.
So, I can set aside FS to store the global data pointer, but this introduces a new problem: to get the pointer, I cannot simply use gd->. I will need a function to retrieve the pointer, so I need gd()->.
So I'm thinking of a #define GLOBAL_DATA, which all existing arches can define simply as gd and x86 can define as get_gd_ptr(); I would then write get_gd_ptr() to extract the pointer using FS:
static inline void *get_gd_ptr(void)
{
	void *gd_ptr;
	asm volatile("gs mov 0, %0\n" : "=r" (gd_ptr));
	return gd_ptr;
}
Now I hope the compiler will optimise this down very well. For example, accessing gd results in the following asm:
/* gd->baudrate = CONFIG_BAUDRATE; */
mov    0x602917c,%eax
movl   $0x2580,0x8(%eax)
I would expect using FS to result in something like:
/* gd->baudrate = CONFIG_BAUDRATE; */
mov    fs:0x00000000,%eax
movl   $0x2580,0x8(%eax)
So no speed or size penalty :)
Does this sound like a plan?
Regards,
Graeme
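The GLOBAL_DATA idea can be sketched in plain C so that both variants compile on a host. This is a sketch under stated assumptions: GD_VIA_ACCESSOR, gd_slot and set_baudrate() are hypothetical, and gd_slot is an ordinary variable standing in for the word at FS:0, so it shows only the source-level abstraction, not the segment-relative load or the code the compiler generates.

```c
#include <assert.h>

/* Hypothetical, cut-down global data structure. */
typedef struct global_data {
	unsigned long flags;
	unsigned long baudrate;
} gd_t;

static gd_t gd_storage;

#ifdef GD_VIA_ACCESSOR
/* x86 style: gd is fetched through a function (FS:0 on real hardware). */
static gd_t *gd_slot = &gd_storage;	/* stand-in for the word at FS:0 */

static inline gd_t *get_gd_ptr(void)
{
	return gd_slot;
}
#define GLOBAL_DATA	(get_gd_ptr())
#else
/* Register-style arches: gd is simply a (register) pointer variable. */
static gd_t *gd = &gd_storage;
#define GLOBAL_DATA	gd
#endif

/* Common code is written once against GLOBAL_DATA. */
static void set_baudrate(unsigned long rate)
{
	GLOBAL_DATA->baudrate = rate;
}
```

Building with -DGD_VIA_ACCESSOR switches common code such as set_baudrate() to the accessor without editing it; 9600 is the 0x2580 baud rate seen in the listings above.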

On 14/11/10 16:10, Graeme Russ wrote:
static inline void *get_gd_ptr(void)
{
	void *gd_ptr;
	asm volatile("gs mov 0, %0\n" : "=r" (gd_ptr));
	return gd_ptr;
}
static inline gd_t *get_gd_ptr(void)
{
	gd_t *gd_ptr;
	asm volatile("fs mov 0, %0\n" : "=r" (gd_ptr));
	return gd_ptr;
}
works a treat :)
Is there any reason the other arches could not implement an inline function as well? ARM for example:
-#define DECLARE_GLOBAL_DATA_PTR     register volatile gd_t *gd asm ("r8")
+static inline gd_t *gd(void)
+{
+	gd_t *gd_ptr;
+	asm volatile("mov %0, r8\n" : "=r" (gd_ptr));	/* copy r8 into gd_ptr */
+	return gd_ptr;
+}
Is the compiler smart enough to optimise this so that code size does not increase?
If so, all the arches can have identical implementations for using global data
Regards,
Graeme

On 14/11/2010 06:48, Graeme Russ wrote:
Is there any reason the other arches could not implement an inline function as well? ARM for example:
-#define DECLARE_GLOBAL_DATA_PTR register volatile gd_t *gd asm ("r8")
There is no way to set aside a register in ARM code for applicative purposes.
Kind regards,

Dear Albert ARIBAUD,
In message 4CDFA8E9.3050803@free.fr you wrote:
Is there any reason the other arches could not implement an inline function as well? ARM for example:
-#define DECLARE_GLOBAL_DATA_PTR register volatile gd_t *gd asm ("r8")
There is no way to set aside a register in ARM code for applicative purposes.
What exactly do you mean by that?
Best regards,
Wolfgang Denk

On 14/11/2010 11:30, Wolfgang Denk wrote:
There is no way to set aside a register in ARM code for applicative purposes.
What exactly do you mean by that?
Use of the 16 ARM registers r0 to r15 in C code is governed by ABIs; we use the one known as (GNU) EABI. The compiler will both respect this ABI when generating code and assume it is respected by any other code in the application.
The GNU EABI is based on the AAPCS, which defines the use of registers: r0 to r3 are used for call parameters; r4 to r8, r10 and r11 are used as local variables; r12 is a scratch register; r13, r14 and r15 are the stack pointer, return address holder and program counter respectively. None of these registers can be used as a global constant except r9, which is designated 'platform-specific' and potentially usable by variants of the AAPCS.
One could argue that using r9 to locate GD would constitute such a variant, platform-specific use of the AAPCS. However, some (gcc) compiler switches actually cause the generated code to follow such variants of the AAPCS and use r9 for specific purposes (some even use r10 if r9 is taken, even though this is not strictly AAPCS-compliant) -- and I don't know about potential uses by other compilers.
I would thus not favor using r9 for gd access if there is another way.
Kind regards,

Dear Albert ARIBAUD,
In message 4CDFD1AE.1070409@free.fr you wrote:
There is no way to set aside a register in ARM code for applicative purposes.
What exactly do you mean by that?
Use of the 16 ARM registers r0 to r15 in C code is governed by ABIs; we use the one known as (GNU) EABI. The compiler will both respect this ABI when generating code and assume it is respected by any other code in the application.
Well, that just means that your above statement should be completed by the phrase "as long as we strictly adhere to some specific ABI".
I would thus not favor using r9 for gd access if there is another way.
Traditionally, U-Boot has been using r9 as the GOT pointer, and we use r8 as the pointer to the global data (which is what ``register volatile gd_t *gd asm ("r8")'' means).
We make sure that the compiler respects this use by adding a "-ffixed-r8" compiler option.
This is perfectly OK, as we don't have to interface in any way with any strictly EABI conforming code - we are working in a completely controlled environment and can adjust rules as needed.
Best regards,
Wolfgang Denk

On 14/11/2010 16:01, Wolfgang Denk wrote:
This is perfectly OK, as we don't have to interface in any way with any strictly EABI conforming code - we are working in a completely controlled environment and can adjust rules as needed.
Alright, then I think we should document how we comply, or do not comply, with GNU EABI / AAPCS (maybe a README.arm that people could read up) -- and I think if there is a way to access GD both before and after relocation without making a register unavailable to the whole u-boot code, then we should use it.
Best regards,
Wolfgang Denk
Kind regards,

Dear Albert ARIBAUD,
In message 4CE0221A.7030502@free.fr you wrote:
Alright, then I think we should document how we comply, or do not comply, with GNU EABI / AAPCS (maybe a README.arm that people could read
Register use is documented in the top level README.
up) -- and I think if there is a way to access GD both before and after relocation without making a register unavailable to the whole u-boot code, then we should use it.
Agreed.
Best regards,
Wolfgang Denk

On 14/11/2010 20:06, Wolfgang Denk wrote:
Alright, then I think we should document how we comply, or do not comply, with GNU EABI / AAPCS (maybe a README.arm that people could read
Register use is documented in the top level README.
My bad: I'd missed that one because I always go straight to the doc/ directory for documentation of this kind -- the root README I never read apart from the first few pages, and I would not have thought it to give this level of detail.
BTW, a fix to this ./README is in order as GOT is not used any more with ELF ARM relocation, so r9 is not needed for this anymore...
... and even though I don't like the idea of reserving a register for gd, since we must for the moment, then using the (now available) r9 register would be *more* 'EABI/AAPCS-compliant' than using r8 (as I said, one could think of this use of r9 as 'our AAPCS variant').
Kind regards,

Dear Albert ARIBAUD,
In message 4CE0388E.2070601@free.fr you wrote:
Register use is documented in the top level README.
My bad: I'd missed that one because I always go straight to the doc/ directory for documentation of this kind -- the root README I never read apart from the first few pages, and I would not have thought it to give this level of detail.
;-)
BTW, a fix to this ./README is in order as GOT is not used any more with ELF ARM relocation, so r9 is not needed for this anymore...
Agreed.
... and even though I don't like the idea of reserving a register for gd, since we must for the moment, then using the (now available) r9 register would be *more* 'EABI/AAPCS-compliant' than using r8 (as I said, one could think of this use of r9 as 'our AAPCS variant').
Actually, the situation might be different on ARM. I just did a quick and dirty test for the TX25 board:
diff --git a/arch/arm/cpu/arm926ejs/config.mk b/arch/arm/cpu/arm926ejs/config.mk
index f8ef90f..f8bbeba 100644
--- a/arch/arm/cpu/arm926ejs/config.mk
+++ b/arch/arm/cpu/arm926ejs/config.mk
@@ -21,7 +21,7 @@
 # MA 02111-1307 USA
 #
 
-PLATFORM_RELFLAGS += -fno-common -ffixed-r8 -msoft-float
+PLATFORM_RELFLAGS += -fno-common -msoft-float
 
 PLATFORM_CPPFLAGS += -march=armv5te
 # =========================================================================
diff --git a/arch/arm/include/asm/global_data.h b/arch/arm/include/asm/global_data.h
index ada3fbb..7561523 100644
--- a/arch/arm/include/asm/global_data.h
+++ b/arch/arm/include/asm/global_data.h
@@ -86,6 +86,13 @@ typedef struct global_data {
 #define GD_FLG_DISABLE_CONSOLE	0x00040	/* Disable console (in & out)	*/
 #define GD_FLG_ENV_READY	0x00080	/* Environment imported into hash table */
 
+
+#if 0
 #define DECLARE_GLOBAL_DATA_PTR     register volatile gd_t *gd asm ("r8")
+#else /* We could use plain global data, but the resulting code is bigger */
+#define XTRN_DECLARE_GLOBAL_DATA_PTR	extern
+#define DECLARE_GLOBAL_DATA_PTR     XTRN_DECLARE_GLOBAL_DATA_PTR \
+				    gd_t *gd
+#endif
 
 #endif /* __ASM_GBL_DATA_H */
diff --git a/arch/arm/lib/board.c b/arch/arm/lib/board.c
index 1fd5f83..b0de6c7 100644
--- a/arch/arm/lib/board.c
+++ b/arch/arm/lib/board.c
@@ -647,3 +647,14 @@ void hang (void)
 	puts ("### ERROR ### Please RESET the board ###\n");
 	for (;;);
 }
+
+#if 1 /* We could use plain global data, but the resulting code is bigger */
+/*
+ * Pointer to initial global data area
+ *
+ * Here we initialize it.
+ */
+#undef	XTRN_DECLARE_GLOBAL_DATA_PTR
+#define XTRN_DECLARE_GLOBAL_DATA_PTR	/* empty = allocate here */
+DECLARE_GLOBAL_DATA_PTR = (gd_t *) (CONFIG_SYS_INIT_SP_ADDR);
+#endif /* 0 */
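The XTRN_DECLARE_GLOBAL_DATA_PTR trick in the diff above relies on the preprocessor expanding XTRN_DECLARE_GLOBAL_DATA_PTR only when DECLARE_GLOBAL_DATA_PTR is used: every file gets "extern gd_t *gd;", while the one file that redefines the prefix as empty gets a definition with an initializer. A single-file sketch, assuming a cut-down gd_t; init_area is a hypothetical stand-in for the memory at CONFIG_SYS_INIT_SP_ADDR, which a host program cannot use.

```c
#include <assert.h>

/* Hypothetical, cut-down global data structure. */
typedef struct global_data {
	unsigned long flags;
} gd_t;

/* --- what global_data.h provides --- */
#define XTRN_DECLARE_GLOBAL_DATA_PTR	extern
#define DECLARE_GLOBAL_DATA_PTR		XTRN_DECLARE_GLOBAL_DATA_PTR gd_t *gd

/* --- what every user file does: expands to "extern gd_t *gd;" --- */
DECLARE_GLOBAL_DATA_PTR;

/* --- what board.c does exactly once --- */
static gd_t init_area;	/* stand-in for CONFIG_SYS_INIT_SP_ADDR */

#undef	XTRN_DECLARE_GLOBAL_DATA_PTR
#define XTRN_DECLARE_GLOBAL_DATA_PTR	/* empty = allocate here */
/* Now expands to "gd_t *gd = &init_area;": the one real definition. */
DECLARE_GLOBAL_DATA_PTR = &init_area;
```

An extern declaration followed by a definition of the same object is valid C, which is why the header can be included in board.c itself.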
Compare sizes for "tx25":
   text    data     bss     dec     hex  filename
 158730    8668   37120  204518   31ee6  ./u-boot  with register
 158574    8672   37120  204366   31e4e  ./u-boot  with global pointer
The global pointer method saves a total of 152 bytes here (156 in .text saved, but 4 in .data needed).
OK, this is not even 0.1% of the size, but anyway...
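The figures in the table can be cross-checked with simple arithmetic: "dec" is text + data + bss, and the net saving is the .text saving minus the extra .data word. A small sketch; the struct and function names are illustrative only.

```c
#include <assert.h>

/* One row of `size` output (section sizes in bytes). */
struct size_row {
	unsigned long text, data, bss;
};

/* The "dec" column: total of all three sections. */
static unsigned long total(struct size_row r)
{
	return r.text + r.data + r.bss;
}

static const struct size_row with_register = { 158730, 8668, 37120 };
static const struct size_row with_pointer  = { 158574, 8672, 37120 };
```

152 bytes out of 204518 is about 0.074%, consistent with "not even 0.1%".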
Best regards,
Wolfgang Denk

On 14/11/2010 20:55, Wolfgang Denk wrote:
Actually, the situation might be different on ARM. I just did a quick and dirty test for the TX25 board:
[...]
The global pointer method saves a total of 152 bytes here (156 in .text saved, but 4 in .data needed).
OK, this is not even 0.1% of the size, but anyway...
If the difference in size is marginal, then I prefer the implementation that has the least 'quirks' and most closely complies with EABI/AAPCS.
BTW your quick'n'dirty test puts GD at a fixed location identical for code running before and after relocation, right? But do we not change the stack location?
Best regards,
Wolfgang Denk
Kind regards,

Dear Albert ARIBAUD,
In message 4CE04241.7070407@free.fr you wrote:
OK, this is not even 0.1% of the size, but anyway...
If the difference in size is marginal, then I prefer the implementation that has the least 'quirks' and most closely complies with EABI/AAPCS.
Yes, I agree. On ARM the global pointer method has both the advantage of being cleaner and of giving slightly smaller code.
BTW your quick'n'dirty test puts GD at a fixed location identical for code running before and after relocation, right? But do we not change the stack location?
Yes, we do. I just wanted to compile it for the code size difference; I did not attempt to make a perfect patch yet :-)
Best regards,
Wolfgang Denk

Dear Albert ARIBAUD,
In message 4CE0221A.7030502@free.fr you wrote:
Alright, then I think we should document how we comply, or do not comply, with GNU EABI / AAPCS (maybe a README.arm that people could read up) -- and I think if there is a way to access GD both before and after relocation without making a register unavailable to the whole u-boot code, then we should use it.
By the way - it should not be difficult to use a normal extern pointer to reference the global data; see "arch/powerpc/include/asm/global_data.h":
#if 1
#define DECLARE_GLOBAL_DATA_PTR     register volatile gd_t *gd asm ("r2")
#else /* We could use plain global data, but the resulting code is bigger */
#define XTRN_DECLARE_GLOBAL_DATA_PTR	extern
#define DECLARE_GLOBAL_DATA_PTR     XTRN_DECLARE_GLOBAL_DATA_PTR \
				    gd_t *gd
#endif
When I implemented this code I tested both versions. There is not much of a difference, except that the register based version results in smaller code.
Best regards,
Wolfgang Denk

On 15/11/10 06:23, Wolfgang Denk wrote:
By the way - it should be not difficult to use a normal extern pointer to reference the global data; see "arch/powerpc/include/asm/global_data.h":
#if 1
#define DECLARE_GLOBAL_DATA_PTR     register volatile gd_t *gd asm ("r2")
#else /* We could use plain global data, but the resulting code is bigger */
#define XTRN_DECLARE_GLOBAL_DATA_PTR	extern
#define DECLARE_GLOBAL_DATA_PTR     XTRN_DECLARE_GLOBAL_DATA_PTR \
				    gd_t *gd
#endif
I think you will find this peculiar to PowerPC
What you are talking about is exactly how x86 defines gd, but for x86, gd is not accessible until after relocation
When I implemented this code I tested both versions. There is not much of a difference, except that the register based version results in smaller code.
Probably due to one less register load from memory for each gd access
Regards,
Graeme

On 14/11/2010 20:34, Graeme Russ wrote:
I think you will find this peculiar to PowerPC
I don't think Wolfgang's idea is actually processor-specific.
Each processor has a way to define globals, which end up in the initialized data or bss area. BSS is not available before relocation, but initialized data is, and remains so after relocation.
So if we define gd as an initialized pointer (residing in the initialized data area), it will be available both before and after relocation.
Before relocation, this pointer will be read-only. We can set it at compile time if we know for each arch (or board) a good address in RAM or IRAM where gd can exist.
After relocation, the pointer becomes read-write: we can copy gd content from (I)RAM to RAM if necessary and then update the gd pointer.
What you are talking about is exactly how x86 defines gd, but for x86, gd is not accessible until after relocation
Could it become accessible with the idea I expose above?
Regards,
Graeme
Kind regards,
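The proposal above amounts to a two-phase pointer, which can be sketched as below under these assumptions: the gd_t fields are cut down, early_area stands in for the early (I)RAM area a board would choose, permanent_gd for the copy in relocated RAM, relocate_gd() for the step performed during relocation, and the GD_FLG_RELOC value is only illustrative. Before relocation the pointer itself is only read; code writes through it to the early area.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, cut-down global data structure. */
typedef struct global_data {
	unsigned long flags;
	unsigned long ram_size;
} gd_t;

#define GD_FLG_RELOC	0x00001	/* illustrative value for this sketch */

/* Stand-in for the early writable area (SRAM, cache-as-RAM, ...). */
static gd_t early_area;

/* Stand-in for the permanent copy in relocated RAM. */
static gd_t permanent_gd;

/* Initialized, so it is placed in initialized data rather than .bss. */
gd_t *gd = &early_area;

/* Done once after relocation, when the data section is writable RAM. */
static void relocate_gd(void)
{
	memcpy(&permanent_gd, gd, sizeof(gd_t));
	gd = &permanent_gd;	/* the pointer is only reassigned here */
	gd->flags |= GD_FLG_RELOC;
}
```

The catch raised in the replies is the early area itself: it must be writable before relocation, which is what a board-specific address such as CONFIG_SYS_INIT_SP_ADDR provides.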

On Mon, Nov 15, 2010 at 7:05 AM, Albert ARIBAUD albert.aribaud@free.fr wrote:
What you are talking about is exactly how x86 defines gd, but for x86, gd is not accessible until after relocation
Could it become accessible with the idea I expose above?
Well, I do not have CONFIG_SYS_INIT_SP_ADDR, but I guess I could add it.
What about this in board.c:
gd_t global_data;
gd_t *gd = &global_data;
Oh, the PowerPC comment is wrong...
 * Be aware of the restrictions: global data is read-only, BSS is not
 * initialized, and stack space is limited to a few kB.
And then it does a whole lot of writing to gd, starting with:
	/* Clear initial global data */
	memset ((void *) gd, 0, sizeof (gd_t));
OK bad idea :)
So we seem to have two perfectly valid theories which cannot co-exist:
1) Make gd_t *gd static and hard-code its initial value to some known location in RAM (SRAM, cache etc.). After relocation, it can be written to.
2) Use a register as we do now and potentially move the initial global data onto the stack in board_init_f().
Option #1 means I do not have to do any craziness with my Global Descriptor Table in asm (self-modifying code, FS calculations etc.), but it seems to mess around with code size on some arches.
Option #2 means there is no need for GENERATED_GBL_DATA_SIZE or a hard-coded memory location, but it does rely on there being no global data access prior to calling board_init_f().
Well, both have their pros and cons - I'll leave that decision to Wolfgang ;)
Regards,
Graeme

On 15/11/10 07:05, Albert ARIBAUD wrote:
What you are talking about is exactly how x86 defines gd, but for x86, gd is not accessible until after relocation
Could it become accessible with the idea I expose above?
Yes. The following patch uses CONFIG_SYS_INIT_SP_ADDR as a temporary global data area, which is copied to a permanent global data structure located in the data section.
I honestly don't know where I stand on this solution, though. By using FS, I can emulate the 'global register variable', and the initial global data structure can end up anywhere it needs to be without defining CONFIG_SYS_INIT_SP_ADDR, but that involves self-modifying code I would prefer not to re-introduce.
However, this method requires CONFIG_SYS_INIT_SP_ADDR, which casts a certain memory location (in RAM) in stone. Now, the eNET has some battery-backed SRAM on board which I can point CONFIG_SYS_INIT_SP_ADDR at, so I could move SDRAM initialisation into C code, but this is not guaranteed for every x86 board (and for the eNET, it reduces the amount of battery-backed configuration memory available). I am looking to port U-Boot to a VIA EPIA EN15000 single board computer, which has no SRAM.
The VIA board has a C7 processor which coreboot has a 'Cache-as-RAM' (CAR) implementation for, and the SC520 might support CAR (still looking). If this is the case, I think I can unequivocally support the solution based on CONFIG_SYS_INIT_SP_ADDR and move low-level init into board_init_f() in line with the other arches.
For now, consider it a +0.5 vote for the patch below :)
Regards,
Graeme
commit e38af43f0246335578b8c207e8097fd0c5fca520
Author: Graeme Russ graeme.russ@gmail.com
Date:   Mon Nov 15 21:15:52 2010 +1100
x86 Global Data Mods
diff --git a/arch/i386/cpu/start.S b/arch/i386/cpu/start.S index aaf9dba..4c04a5a 100644 --- a/arch/i386/cpu/start.S +++ b/arch/i386/cpu/start.S @@ -127,14 +127,14 @@ mem_ok: /* Set the upper memory limit parameter */ subl $CONFIG_SYS_STACK_SIZE, %eax
- /* Reserve space for global data */ - subl $(GD_SIZE * 4), %eax - - /* %eax points to the global data structure */ + /* Load some required values into the global data structure */ + movl $CONFIG_SYS_INIT_SP_ADDR, %eax movl %esp, (GD_RAM_SIZE * 4)(%eax) - movl %ebx, (GD_FLAGS * 4)(%eax) movl %ecx, (GD_LOAD_OFF * 4)(%eax)
+	/* Setup bootflags parameter to board_init_f() */
+	movl	%ebx, %eax
+	call	board_init_f	/* Enter, U-boot! */
 
 	/* indicate (lack of) progress */
diff --git a/arch/i386/include/asm/global_data.h b/arch/i386/include/asm/global_data.h
index e9000c3..03ecc3c 100644
--- a/arch/i386/include/asm/global_data.h
+++ b/arch/i386/include/asm/global_data.h
@@ -88,6 +88,12 @@ extern gd_t *gd;
 #define GD_FLG_WARM_BOOT	0x00200	/* Warm Boot */
 
+#if 0
 #define DECLARE_GLOBAL_DATA_PTR
+#else
+#define XTRN_DECLARE_GLOBAL_DATA_PTR	extern
+#define DECLARE_GLOBAL_DATA_PTR	XTRN_DECLARE_GLOBAL_DATA_PTR \
+gd_t *gd
+#endif
 
 #endif /* __ASM_GBL_DATA_H */
diff --git a/arch/i386/lib/board.c b/arch/i386/lib/board.c
index 1a962d3..11e6569 100644
--- a/arch/i386/lib/board.c
+++ b/arch/i386/lib/board.c
@@ -45,7 +45,24 @@
 #include <miiphy.h>
 #endif
 
-DECLARE_GLOBAL_DATA_PTR;
+#if 1 /* We could use plain global data, but the resulting code is bigger */
+/*
+ * Pointer to initial global data area
+ *
+ * Here we initialize it.
+ */
+#undef	XTRN_DECLARE_GLOBAL_DATA_PTR
+#define XTRN_DECLARE_GLOBAL_DATA_PTR	/* empty = allocate here */
+DECLARE_GLOBAL_DATA_PTR = (gd_t *) (CONFIG_SYS_INIT_SP_ADDR);
+#endif /* 0 */
+
+static inline gd_t *get_gd_ptr(void)
+{
+	gd_t *gd_ptr;
+
+	asm volatile("gs mov 0, %0\n" : "=r" (gd_ptr));
+	return gd_ptr;
+}
 
 /* Exports from the Linker Script */
 extern ulong __text_start;
@@ -163,12 +180,10 @@ init_fnc_t *init_sequence[] = {
 	NULL,
 };
 
-gd_t *gd;
-
 /*
  * Load U-Boot into RAM, initialize BSS, perform relocation adjustments
  */
-void board_init_f (ulong gdp)
+void board_init_f (ulong bootflag)
 {
 	void *text_start = &__text_start;
 	void *data_end = &__data_end;
@@ -186,12 +201,14 @@ void board_init_f (ulong gdp)
 	Elf32_Rel *re_src;
 	Elf32_Rel *re_end;
 
+	gd->flags = bootflag;
+
 	/* Calculate destination RAM Address and relocation offset */
-	dest_addr = (void *)gdp - (bss_end - text_start);
+	dest_addr = (void *)gd->ram_size - (bss_end - text_start);
 	rel_offset = dest_addr - text_start;
 
 	/* Perform low-level initialization only when cold booted */
-	if (((gd_t *)gdp)->flags & GD_FLG_COLD_BOOT) {
+	if (gd->flags & GD_FLG_COLD_BOOT) {
 		/* First stage CPU initialization */
 		if (cpu_init_f() != 0)
 			hang();
@@ -203,8 +220,8 @@ void board_init_f (ulong gdp)
 
 	/* Copy U-Boot into RAM */
 	dst_addr = (ulong *)dest_addr;
-	src_addr = (ulong *)(text_start + ((gd_t *)gdp)->load_off);
-	end_addr = (ulong *)(data_end + ((gd_t *)gdp)->load_off);
+	src_addr = (ulong *)(text_start + gd->load_off);
+	end_addr = (ulong *)(data_end + gd->load_off);
 
 	while (src_addr < end_addr)
 		*dst_addr++ = *src_addr++;
@@ -217,8 +234,8 @@ void board_init_f (ulong gdp)
 		*dst_addr++ = 0x00000000;
 
 	/* Perform relocation adjustments */
-	re_src = (Elf32_Rel *)(rel_dyn_start + ((gd_t *)gdp)->load_off);
-	re_end = (Elf32_Rel *)(rel_dyn_end + ((gd_t *)gdp)->load_off);
+	re_src = (Elf32_Rel *)(rel_dyn_start + gd->load_off);
+	re_end = (Elf32_Rel *)(rel_dyn_end + gd->load_off);
 
 	do {
 		if (re_src->r_offset >= CONFIG_SYS_TEXT_BASE)
@@ -226,11 +243,11 @@ void board_init_f (ulong gdp)
 				*(Elf32_Addr *)(re_src->r_offset + rel_offset) += rel_offset;
 	} while (re_src++ < re_end);
 
-	((gd_t *)gdp)->reloc_off = rel_offset;
-	((gd_t *)gdp)->flags |= GD_FLG_RELOC;
+	gd->reloc_off = rel_offset;
+	gd->flags |= GD_FLG_RELOC;
 
 	/* Enter the relocated U-Boot! */
-	(board_init_r + rel_offset)((gd_t *)gdp, (ulong)dest_addr);
+	(board_init_r + rel_offset)(gd, (ulong)dest_addr);
 
 	/* NOTREACHED - board_init_f() does not return */
 	while(1);
@@ -242,11 +259,15 @@ void board_init_r(gd_t *id, ulong dest_addr)
 	int i;
 	ulong size;
 	static bd_t bd_data;
+	static gd_t static_gd;
+	init_fnc_t **init_fnc_ptr;
 
 	show_boot_progress(0x21);
 
-	gd = id;
+	memcpy (&static_gd, id, sizeof static_gd);
+	gd = &static_gd;
+
 	/* compiler optimization barrier needed for GCC >= 3.4 */
 	__asm__ __volatile__("": : :"memory");
 
diff --git a/include/configs/eNET.h b/include/configs/eNET.h
index a04cc9a..b821e2d 100644
--- a/include/configs/eNET.h
+++ b/include/configs/eNET.h
@@ -172,6 +172,7 @@
  * Memory organization
  */
 #define CONFIG_SYS_STACK_SIZE		0x8000	/* Size of bootloader stack */
+#define CONFIG_SYS_INIT_SP_ADDR		0x1000000
 #define CONFIG_SYS_BL_START_FLASH	0x38040000	/* Address of relocated code */
 #define CONFIG_SYS_BL_START_RAM		0x03fd0000	/* Address of relocated code */
 #define CONFIG_SYS_MONITOR_BASE		CONFIG_SYS_TEXT_BASE
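The core of the patch is a hand-off of the global data pointer: gd starts out pointing at a fixed early area (CONFIG_SYS_INIT_SP_ADDR on eNET), board_init_f() fills it in, and board_init_r() copies it into a static gd_t before repointing gd. A host-runnable sketch of that hand-off follows. It assumes a cut-down structure and uses a static object in place of the fixed address; mini_gd_t, MINI_FLG_RELOC, early_phase() and late_phase() are hypothetical names, not U-Boot API.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical, cut-down stand-in for gd_t: only fields the patch touches.
 * The real structure has many more members.
 */
typedef struct {
	unsigned long flags;
	unsigned long ram_size;
	unsigned long load_off;
	unsigned long reloc_off;
} mini_gd_t;

#define MINI_FLG_RELOC 0x1UL	/* hypothetical stand-in for GD_FLG_RELOC */

/*
 * Stand-in for the fixed pre-relocation area at CONFIG_SYS_INIT_SP_ADDR.
 * On the target this is raw memory at a known address; here it is a
 * static object so the sketch runs on a host.
 */
static mini_gd_t init_area;

/* Plain global pointer, statically initialised to the early area. */
static mini_gd_t *gd = &init_area;

/* Early phase (cf. board_init_f): fill in global data through gd. */
static void early_phase(unsigned long bootflag, unsigned long reloc_off)
{
	gd->flags = bootflag;
	gd->reloc_off = reloc_off;
	gd->flags |= MINI_FLG_RELOC;
}

/* Late phase (cf. board_init_r): copy into permanent storage, repoint gd. */
static mini_gd_t *late_phase(mini_gd_t *id)
{
	static mini_gd_t static_gd;

	assert(id != NULL);
	memcpy(&static_gd, id, sizeof static_gd);
	gd = &static_gd;
	return gd;
}
```

A likely reason for copying instead of keeping the early pointer is that the early area is not guaranteed to stay reserved once the relocated code is running; in the sketch, writes to init_area after late_phase() no longer affect gd.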

Dear Mike Frysinger,
In message 201011091835.38581.vapier@gentoo.org you wrote:
> > Should all architectures strive to look as much like one another as possible? Should we accept that maybe this particular issue be thrown in the too hard basket?
> imo, we should strive to have these things in one common place and not just have the different files look the same
Agreed. The former is just a (necessary) step for the latter.
Best regards,
Wolfgang Denk
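One way to read "one common place" is a single generic loop over a NULL-terminated init_sequence[] table, which is also the direction Graeme proposed for cleaning up board_init_r(). The sketch below assumes init_fnc_t has the shape int (void) and returns 0 on success, as in the arch board.c files; run_init_sequence(), step_ok() and step_fail() are hypothetical helpers. Where U-Boot would call hang() on a failing step, the sketch returns the failing position instead so it can be exercised on a host.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to match the arch board.c typedef: returns 0 on success. */
typedef int (init_fnc_t)(void);

/*
 * Run every entry of a NULL-terminated table in order. Returns 0 if all
 * entries succeed, otherwise the 1-based position of the first failing
 * entry (no later entries are run).
 */
static int run_init_sequence(init_fnc_t **seq)
{
	int pos = 1;
	init_fnc_t **fn;

	assert(seq != NULL);
	for (fn = seq; *fn != NULL; fn++, pos++) {
		if ((*fn)() != 0)
			return pos;
	}
	return 0;
}

/* Hypothetical init steps standing in for timer, serial console, etc. */
static int calls;
static int step_ok(void)   { calls++; return 0; }
static int step_fail(void) { calls++; return -1; }
```

With a runner like this, per-arch or per-board steps become table entries (possibly weak functions, as discussed earlier in the thread) rather than #ifdef'd calls scattered through board_init_r().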
participants (5)
- Albert ARIBAUD
- Graeme Russ
- Mike Frysinger
- Reinhard Meyer
- Wolfgang Denk