
On 30.03.19 23:37, Simon Glass wrote:
Hi Simon,
On Sat, 30 Mar 2019 at 15:40, Simon Goldschmidt simon.k.r.goldschmidt@gmail.com wrote:
Simon Glass sjg@chromium.org schrieb am Sa., 30. März 2019, 22:18:
Hi Simon,
On Sat, 30 Mar 2019 at 14:50, Simon Goldschmidt simon.k.r.goldschmidt@gmail.com wrote:
Simon Goldschmidt simon.k.r.goldschmidt@gmail.com schrieb am Sa., 30. März 2019, 21:18:
Simon Glass sjg@chromium.org schrieb am Sa., 30. März 2019, 21:06:
Hi Simon,
On Wed, 27 Mar 2019 at 13:40, Simon Goldschmidt simon.k.r.goldschmidt@gmail.com wrote: > > This introduces a new Kconfig option SPL_CLEAR_BSS_F. If enabled, it clears > the bss before calling board_init_f() instead of clearing it before calling > board_init_r(). > > This also ensures that variables placed in BSS can be shared between > board_init_f() and board_init_r() in SPL. > > Such global variables are used, for example, when loading things from FAT > before SDRAM is available: the full heap required for FAT uses global > variables and clearing BSS after board_init_f() would reset the heap state. > An example for such a usage is socfpa_arria10 where an FPGA configuration > is required before SDRAM can be used. > > Make the new option depend on ARM for now until more implementations follow. >
I still have objections to this series and I think we should discuss other ways of solving this problem.
Does socfgpa have SRAM that could be used before SDRAM is available? If so, can we not use that for the configuration? What various are actually in BSS that are needed before board_init_r() is called? Can they not be in a struct created from malloc()?
The problem is the board needs to load an FPGA configuration from FAT before SDRAM is available. Yes, this is loaded into SRAM of course, but the whole code until that is done uses so many malloc/free iterations that The simple mall of implementation would require too much memory.
And it's the full malloc state variables only that use BSS, not the FAT code.
One way out could be to move the full mall of state variables into 'gd'...
Maybe I should point out again that the whole purpose of this series is to have an SPL that uses full malloc right from the start. This is currently not possible as full malloc needs BSS.
Right, and our assumption/design is that full malloc() requires SRAM.
Here we have an architecture that requires FAT just to init its SDRAM. FAT requires malloc() and free() and there is not enough SRAM to skip the free() calls. So we have to use full malloc() and that uses BSS. But BSS is not available before board_init_r(). But we need to init SDRAM before board_init_r().
Firstly I'd point out that someone should speak to the chip designers. Did anyone try to boot U-Boot on the SoC model?
Well, it's a U-Boot thing to load it from FAT. It could probably be loaded from RAW mmc without problems, so I don't know if it's a chip designers issue. I think it's an issue that we need to fix in U-Boot: we have a good full malloc implementation but it's not usable in all cases were it should be.
OK then why use FAT? I assumed it was a boot-ROM requirement. How does the boot ROM load SPL?
Honestly, I don't know. It's Altera/Intel that decided for this boot flow. However, since FPGA development is often done from Windows, I guess it's convenient for FPGA development to write the resulting binary to a FAT partition on a sdcard.
I think it is possible to change dlmalloc to put its variables in a struct. Then I suppose the struct pointer could be kept in gd. That would be one solution. Then full malloc() could be inited in spl_common_init() perhaps.
That might be worth a try.
Yes shouldn't be too painful. I suspect it would be neutral on code size, too.
Another option might be to update the FAT code to use ALLOC_CACHE_ALIGN_BUFFER() instead of malloc(), as it already does in places, and perhaps to disable all caching for this case. Then it might work with simple malloc().
Hmm, then the next platform will have problems because allocating via malloc would be preferable. If really rather fix using dlmalloc instead.
Hmm but there is no obvious analysis behind your preference. If we have code like this:
do_something_with_fat() { void *buf = malloc(...);
...
free(buf);
}
it seems to me we should convert it to use the stack instead, unless there is some recursion problem, etc. Otherwise we are just causing pointless churn on malloc().
I don't think it's that easy. There are platforms where a big stack size hurts. I don't think such big buffers like MAX_CLUSTERSIZE should be allocated on stack.
Another option would be to put the FPGA image in a known position on the media, outside the FAT partition. The position could perhaps be written into a FAT file, and reading just that file in SPL might not involve many free() operations.
Sounds like a workaround, too. I think the U-Boot infrastructure should work for the boards, not placing restrictions on them.
I'll await your answer to my first question in this email before passing judgement.
I hesitate to suggest enhancing simple malloc() to support a free list. That would increase the code size and I'm not sure it would be better than using full malloc().
Yes, I think that might result in making a second dlmalloc.... :-)
Thanks for your thoughts and input!
Overall I feel that the current trade-offs and phases of boot are reasonable. We should be suspicious of attempts to make them more complex.
I think denying BSS access in SPL before board_init_r is a bit outdated. In times where board_init_f was ASM, it could have been OK, but nowadays where this is C and you can't check the code easily to *not* use BSS, I'm not too sure this limitation should holf.
Anyway, I realize there are people wanting to keep this up, so I'll work in that direction.
The initial intention of this series was to use full heap just like tiny heap in SPL. Tiny heap is configured via Kconfig by setting its size, full heap (available some time during board_init_r) currently needs CONFIG_SYS_SPL_MALLOC_SIZE and CONFIG_SYS_SPL_MALLOC_START
I'd prefer it if the full heap implementation could use the same allocation strategy as the tiny heap. That should make configuration easier, not more complex. Maybe I started it the wrong way, maybe you have a better idea of how to make is simpler in that direction?
Regards, Simon
Regards, Simon