[U-Boot] [PATCH v3 0/6] spl: full-featured heap cleanups

Some platforms cannot use simple malloc even in very early stages, e.g. when using FAT before DRAM is available. Such platforms currently often use non-Kconfig defines to initialize full malloc and rely on simple heap before that.
This series makes some adjustments to ensure SPL behaves the same with simple or full malloc: when CONFIG_SPL_SYS_MALLOC_F_LEN is != 0, both heap types can be used (by changing CONFIG_SPL_SYS_MALLOC_SIMPLE), without manually supplying an address range for the full heap.
Changes in v3:
- improve commit message to show why CONFIG_CLEAR_BSS_F is needed
- fixed summary ("stack" -> "heap")
- enable CONFIG_SPL_CLEAR_BSS_F for socfpga_arria10 using full malloc early in SPL
- rebased

Changes in v2:
- make CONFIG_SPL_CLEAR_BSS_F depend on ARM for now
- add CONFIG_SPL_CLEAR_BSS_F implementation for arm64 also
- use if() instead of #if
- adapt documentation to using CONFIG_SPL_SYS_MALLOC_F_LEN for full-featured heap as well
- ensure SPL_CLEAR_BSS_F is set when using SYS_MALLOC_F_LEN for full featured heap (or else, the heap status stored in bss will be overwritten between board_init_f and board_init_r)

Simon Goldschmidt (6):
  spl: add Kconfig option to clear bss early
  spl: arm: implement SPL_CLEAR_BSS_F
  dlmalloc: fix malloc range at end of ram
  dlmalloc: be compatible to tiny printf
  spl: support using full malloc with SYS_MALLOC_F_LEN
  arm: socfpga: a10: move SPL heap size to Kconfig

 Kconfig                           | 24 ++++++++++++++--------
 README                            | 15 ++++++++++----
 arch/arm/lib/crt0.S               | 22 +++++++++++++++++++++
 arch/arm/lib/crt0_64.S            | 14 +++++++++++++
 common/dlmalloc.c                 |  6 +++++-
 common/spl/Kconfig                | 12 +++++++++++
 common/spl/spl.c                  | 10 ++++++++--
 configs/socfpga_arria10_defconfig |  2 ++
 drivers/core/Kconfig              | 33 +++++++++++++++----------------
 include/configs/socfpga_common.h  | 14 -------------
 10 files changed, 106 insertions(+), 46 deletions(-)

This introduces a new Kconfig option SPL_CLEAR_BSS_F. If enabled, it clears the bss before calling board_init_f() instead of clearing it before calling board_init_r().
This also ensures that variables placed in BSS can be shared between board_init_f() and board_init_r() in SPL.
Such global variables are used, for example, when loading things from FAT before SDRAM is available: the full heap required for FAT uses global variables and clearing BSS after board_init_f() would reset the heap state. An example of such usage is socfpga_arria10, where an FPGA configuration is required before SDRAM can be used.
Make the new option depend on ARM for now until more implementations follow.
Signed-off-by: Simon Goldschmidt simon.k.r.goldschmidt@gmail.com
---
Changes in v3:
- improve commit message to show why CONFIG_CLEAR_BSS_F is needed

Changes in v2:
- make CONFIG_SPL_CLEAR_BSS_F depend on ARM for now

 common/spl/Kconfig | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/common/spl/Kconfig b/common/spl/Kconfig
index 206c24076d..6a4270516a 100644
--- a/common/spl/Kconfig
+++ b/common/spl/Kconfig
@@ -156,6 +156,18 @@ config SPL_STACK_R_MALLOC_SIMPLE_LEN
 	  to give board_init_r() a larger heap then the initial heap in SRAM which is limited to SYS_MALLOC_F_LEN bytes.
 
+config SPL_CLEAR_BSS_F
+	bool "Clear BSS section before calling board_init_f"
+	depends on ARM
+	help
+	  The BSS section is initialized to zero. In SPL, this is normally done
+	  before calling board_init_r().
+	  For platforms using BSS in board_init_f() already, enable this to
+	  clear the BSS section before calling board_init_f() instead of
+	  clearing it before calling board_init_r(). This also ensures that
+	  variables placed in BSS can be shared between board_init_f() and
+	  board_init_r().
+
 config SPL_SEPARATE_BSS
 	bool "BSS section is in a different memory region from text"
 	help

Hi Simon,
On Wed, 27 Mar 2019 at 13:40, Simon Goldschmidt simon.k.r.goldschmidt@gmail.com wrote:
> This introduces a new Kconfig option SPL_CLEAR_BSS_F. If enabled, it clears the bss before calling board_init_f() instead of clearing it before calling board_init_r().
>
> This also ensures that variables placed in BSS can be shared between board_init_f() and board_init_r() in SPL.
>
> Such global variables are used, for example, when loading things from FAT before SDRAM is available: the full heap required for FAT uses global variables and clearing BSS after board_init_f() would reset the heap state. An example for such a usage is socfpga_arria10 where an FPGA configuration is required before SDRAM can be used.
>
> Make the new option depend on ARM for now until more implementations follow.
I still have objections to this series and I think we should discuss other ways of solving this problem.
Does socfpga have SRAM that could be used before SDRAM is available? If so, can we not use that for the configuration? What variables are actually in BSS that are needed before board_init_r() is called? Can they not be in a struct created from malloc()?
If this is a limitation of FAT, then I think we should fix that, instead.
Regards, Simon

Simon Glass sjg@chromium.org wrote on Sat, 30 March 2019, 21:06:
> I still have objections to this series and I think we should discuss other ways of solving this problem.
>
> Does socfpga have SRAM that could be used before SDRAM is available? If so, can we not use that for the configuration? What variables are actually in BSS that are needed before board_init_r() is called? Can they not be in a struct created from malloc()?
The problem is the board needs to load an FPGA configuration from FAT before SDRAM is available. Yes, this is loaded into SRAM of course, but the whole code until that is done uses so many malloc/free iterations that the simple malloc implementation would require too much memory.
And it's the full malloc state variables only that use BSS, not the FAT code.
One way out could be to move the full malloc state variables into 'gd'...
Another way would be to continue into board_init_f without SDRAM enabled and init it later...
Regards, Simon

Simon Goldschmidt simon.k.r.goldschmidt@gmail.com wrote on Sat, 30 March 2019, 21:18:
> One way out could be to move the full malloc state variables into 'gd'...
Maybe I should point out again that the whole purpose of this series is to have an SPL that uses full malloc right from the start. This is currently not possible as full malloc needs BSS.
Regards, Simon

Hi Simon,
On Sat, 30 Mar 2019 at 14:50, Simon Goldschmidt simon.k.r.goldschmidt@gmail.com wrote:
> Maybe I should point out again that the whole purpose of this series is to have an SPL that uses full malloc right from the start. This is currently not possible as full malloc needs BSS.
Right, and our assumption/design is that full malloc() requires SRAM.
Here we have an architecture that requires FAT just to init its SDRAM. FAT requires malloc() and free() and there is not enough SRAM to skip the free() calls. So we have to use full malloc() and that uses BSS. But BSS is not available before board_init_r(). But we need to init SDRAM before board_init_r().
Firstly I'd point out that someone should speak to the chip designers. Did anyone try to boot U-Boot on the SoC model?
I think it is possible to change dlmalloc to put its variables in a struct. Then I suppose the struct pointer could be kept in gd. That would be one solution. Then full malloc() could be inited in spl_common_init() perhaps.
Another option might be to update the FAT code to use ALLOC_CACHE_ALIGN_BUFFER() instead of malloc(), as it already does in places, and perhaps to disable all caching for this case. Then it might work with simple malloc().
Another option would be to put the FPGA image in a known position on the media, outside the FAT partition. The position could perhaps be written into a FAT file, and reading just that file in SPL might not involve many free() operations.
I hesitate to suggest enhancing simple malloc() to support a free list. That would increase the code size and I'm not sure it would be better than using full malloc().
[..]
Regards, Simon

Simon Glass sjg@chromium.org wrote on Sat, 30 March 2019, 22:18:
> Right, and our assumption/design is that full malloc() requires SRAM.
>
> Here we have an architecture that requires FAT just to init its SDRAM. FAT requires malloc() and free() and there is not enough SRAM to skip the free() calls. So we have to use full malloc() and that uses BSS. But BSS is not available before board_init_r(). But we need to init SDRAM before board_init_r().
>
> Firstly I'd point out that someone should speak to the chip designers. Did anyone try to boot U-Boot on the SoC model?
Well, it's a U-Boot thing to load it from FAT. It could probably be loaded from raw MMC without problems, so I don't know if it's a chip designers' issue. I think it's an issue that we need to fix in U-Boot: we have a good full malloc implementation but it's not usable in all cases where it should be.
> I think it is possible to change dlmalloc to put its variables in a struct. Then I suppose the struct pointer could be kept in gd. That would be one solution. Then full malloc() could be inited in spl_common_init() perhaps.
That might be worth a try.
> Another option might be to update the FAT code to use ALLOC_CACHE_ALIGN_BUFFER() instead of malloc(), as it already does in places, and perhaps to disable all caching for this case. Then it might work with simple malloc().
Hmm, then the next platform will have problems because allocating via malloc would be preferable. I'd really rather fix using dlmalloc instead.
> Another option would be to put the FPGA image in a known position on the media, outside the FAT partition. The position could perhaps be written into a FAT file, and reading just that file in SPL might not involve many free() operations.
Sounds like a workaround, too. I think the U-Boot infrastructure should work for the boards, not place restrictions on them.
> I hesitate to suggest enhancing simple malloc() to support a free list. That would increase the code size and I'm not sure it would be better than using full malloc().
Yes, I think that might result in making a second dlmalloc.... :-)
Thanks for your thoughts and input!
Regards, Simon

Hi Simon,
On Sat, 30 Mar 2019 at 15:40, Simon Goldschmidt simon.k.r.goldschmidt@gmail.com wrote:
> Well, it's a U-Boot thing to load it from FAT. It could probably be loaded from raw MMC without problems, so I don't know if it's a chip designers' issue. I think it's an issue that we need to fix in U-Boot: we have a good full malloc implementation but it's not usable in all cases where it should be.
OK then why use FAT? I assumed it was a boot-ROM requirement. How does the boot ROM load SPL?
>> I think it is possible to change dlmalloc to put its variables in a struct. Then I suppose the struct pointer could be kept in gd. That would be one solution. Then full malloc() could be inited in spl_common_init() perhaps.
> That might be worth a try.
Yes, it shouldn't be too painful. I suspect it would be neutral on code size, too.
>> Another option might be to update the FAT code to use ALLOC_CACHE_ALIGN_BUFFER() instead of malloc(), as it already does in places, and perhaps to disable all caching for this case. Then it might work with simple malloc().
> Hmm, then the next platform will have problems because allocating via malloc would be preferable. I'd really rather fix using dlmalloc instead.
Hmm but there is no obvious analysis behind your preference. If we have code like this:

do_something_with_fat()
{
	void *buf = malloc(...);

	...
	free(buf);
}

it seems to me we should convert it to use the stack instead, unless there is some recursion problem, etc. Otherwise we are just causing pointless churn on malloc().
>> Another option would be to put the FPGA image in a known position on the media, outside the FAT partition. The position could perhaps be written into a FAT file, and reading just that file in SPL might not involve many free() operations.
> Sounds like a workaround, too. I think the U-Boot infrastructure should work for the boards, not place restrictions on them.
I'll await your answer to my first question in this email before passing judgement.
>> I hesitate to suggest enhancing simple malloc() to support a free list. That would increase the code size and I'm not sure it would be better than using full malloc().
> Yes, I think that might result in making a second dlmalloc.... :-)
> Thanks for your thoughts and input!
Overall I feel that the current trade-offs and phases of boot are reasonable. We should be suspicious of attempts to make them more complex.
Regards, Simon

On 30.03.19 23:37, Simon Glass wrote:
> OK then why use FAT? I assumed it was a boot-ROM requirement. How does the boot ROM load SPL?
Honestly, I don't know. It's Altera/Intel that decided on this boot flow. However, since FPGA development is often done on Windows, I guess it's convenient to write the resulting binary to a FAT partition on an SD card.
> Hmm but there is no obvious analysis behind your preference. If we have code like this:
>
> do_something_with_fat()
> {
> 	void *buf = malloc(...);
>
> 	...
> 	free(buf);
> }
>
> it seems to me we should convert it to use the stack instead, unless there is some recursion problem, etc. Otherwise we are just causing pointless churn on malloc().
I don't think it's that easy. There are platforms where a big stack size hurts. I don't think such big buffers like MAX_CLUSTERSIZE should be allocated on stack.
> Overall I feel that the current trade-offs and phases of boot are reasonable. We should be suspicious of attempts to make them more complex.
I think denying BSS access in SPL before board_init_r is a bit outdated. In times when board_init_f was ASM, it could have been OK, but nowadays, when this is C and you can't easily check the code to *not* use BSS, I'm not too sure this limitation should hold.
Anyway, I realize there are people wanting to keep this up, so I'll work in that direction.
The initial intention of this series was to use full heap just like tiny heap in SPL. Tiny heap is configured via Kconfig by setting its size; full heap (available some time during board_init_r) currently needs CONFIG_SYS_SPL_MALLOC_SIZE and CONFIG_SYS_SPL_MALLOC_START.
I'd prefer it if the full heap implementation could use the same allocation strategy as the tiny heap. That should make configuration easier, not more complex. Maybe I started it the wrong way, maybe you have a better idea of how to make it simpler in that direction?
Regards, Simon

Hello Simon,
On 30.03.2019 at 23:37, Simon Glass wrote:
Hi Simon,
On Sat, 30 Mar 2019 at 15:40, Simon Goldschmidt simon.k.r.goldschmidt@gmail.com wrote:
Simon Glass sjg@chromium.org schrieb am Sa., 30. März 2019, 22:18:
Hi Simon,
On Sat, 30 Mar 2019 at 14:50, Simon Goldschmidt simon.k.r.goldschmidt@gmail.com wrote:
Simon Goldschmidt simon.k.r.goldschmidt@gmail.com schrieb am Sa., 30. März 2019, 21:18:
Simon Glass sjg@chromium.org schrieb am Sa., 30. März 2019, 21:06:
Hi Simon,
On Wed, 27 Mar 2019 at 13:40, Simon Goldschmidt simon.k.r.goldschmidt@gmail.com wrote: > > This introduces a new Kconfig option SPL_CLEAR_BSS_F. If enabled, it clears > the bss before calling board_init_f() instead of clearing it before calling > board_init_r(). > > This also ensures that variables placed in BSS can be shared between > board_init_f() and board_init_r() in SPL. > > Such global variables are used, for example, when loading things from FAT > before SDRAM is available: the full heap required for FAT uses global > variables and clearing BSS after board_init_f() would reset the heap state. > An example for such a usage is socfpa_arria10 where an FPGA configuration > is required before SDRAM can be used. > > Make the new option depend on ARM for now until more implementations follow. >
I still have objections to this series and I think we should discuss other ways of solving this problem.
Does socfpga have SRAM that could be used before SDRAM is available? If so, can we not use that for the configuration? What variables are actually in BSS that are needed before board_init_r() is called? Can they not be in a struct created from malloc()?
The problem is the board needs to load an FPGA configuration from FAT before SDRAM is available. Yes, this is loaded into SRAM of course, but the whole code until that is done uses so many malloc/free iterations that the simple malloc implementation would require too much memory.
And it's the full malloc state variables only that use BSS, not the FAT code.
One way out could be to move the full malloc state variables into 'gd'...
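To illustrate why repeated malloc/free cycles exhaust a simple heap, here is a minimal sketch of a bump allocator whose free() does nothing. All names are hypothetical and this is not the U-Boot simple malloc code, only an assumed model of the behaviour described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a simple (bump) heap; names are illustrative. */
struct simple_heap {
	uint8_t *base;	/* start of the heap region, e.g. in SRAM */
	size_t limit;	/* total size in bytes */
	size_t used;	/* bytes handed out so far; this only ever grows */
};

static void simple_heap_init(struct simple_heap *h, void *base, size_t limit)
{
	assert(h != NULL && base != NULL);
	h->base = base;
	h->limit = limit;
	h->used = 0;
}

/* Allocate by advancing an offset, 8-byte aligned. Returns NULL when full. */
static void *simple_alloc(struct simple_heap *h, size_t bytes)
{
	size_t aligned = (bytes + 7u) & ~(size_t)7u;
	void *p;

	if (aligned > h->limit - h->used)
		return NULL;
	p = h->base + h->used;
	h->used += aligned;
	return p;
}

/* free() is a no-op: memory is never returned to the heap. */
static void simple_free(struct simple_heap *h, void *p)
{
	(void)h;
	(void)p;
}

/* Count how many alloc/free rounds of 'bytes' succeed before exhaustion. */
static unsigned int rounds_until_full(struct simple_heap *h, size_t bytes)
{
	unsigned int rounds = 0;
	void *p;

	while ((p = simple_alloc(h, bytes)) != NULL) {
		simple_free(h, p);
		rounds++;
	}
	return rounds;
}
```

With a 1 KiB region, only four rounds of a 256-byte allocate-then-free succeed, although never more than 256 bytes are live at once. A full allocator would reuse the freed block on every round.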
Maybe I should point out again that the whole purpose of this series is to have an SPL that uses full malloc right from the start. This is currently not possible as full malloc needs BSS.
Right, and our assumption/design is that full malloc() requires SRAM.
Here we have an architecture that requires FAT just to init its SDRAM. FAT requires malloc() and free() and there is not enough SRAM to skip the free() calls. So we have to use full malloc() and that uses BSS. But BSS is not available before board_init_r(). But we need to init SDRAM before board_init_r().
Firstly I'd point out that someone should speak to the chip designers. Did anyone try to boot U-Boot on the SoC model?
Well, it's a U-Boot thing to load it from FAT. It could probably be loaded from raw MMC without problems, so I don't know if it's a chip designer's issue. I think it's an issue that we need to fix in U-Boot: we have a good full malloc implementation but it's not usable in all cases where it should be.
OK then why use FAT? I assumed it was a boot-ROM requirement. How does the boot ROM load SPL?
I think it is possible to change dlmalloc to put its variables in a struct. Then I suppose the struct pointer could be kept in gd. That would be one solution. Then full malloc() could be inited in spl_common_init() perhaps.
That might be worth a try.
Yes shouldn't be too painful. I suspect it would be neutral on code size, too.
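A rough sketch of that idea, under stated assumptions: the allocator's bookkeeping lives in a struct placed at the start of the heap region, and only a pointer to it is kept in global data, so nothing has to sit in BSS. The struct layout, the `malloc_state` member and the function names are all hypothetical and do not reflect the real dlmalloc or gd definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bundle of what would otherwise be file-scope variables. */
struct malloc_state_sketch {
	uintptr_t brk_start;	/* lowest usable heap address */
	uintptr_t brk_end;	/* one past the highest heap address */
	uintptr_t brk_cur;	/* current break */
};

/* Reduced stand-in for global data; the member below is an assumption. */
struct gd_sketch {
	struct malloc_state_sketch *malloc_state;
};

/*
 * Put the state at the start of the heap region itself and remember it
 * in gd. 'region' must be suitably aligned for the state struct.
 */
static void heap_init_sketch(struct gd_sketch *gd, void *region, size_t size)
{
	struct malloc_state_sketch *st = region;

	assert(gd != NULL && size > sizeof(*st));
	memset(st, 0, sizeof(*st));
	st->brk_start = (uintptr_t)region + sizeof(*st);
	st->brk_end = (uintptr_t)region + size;
	st->brk_cur = st->brk_start;
	gd->malloc_state = st;
}

/* sbrk-like helper that reads and updates state only through gd. */
static void *sbrk_sketch(struct gd_sketch *gd, size_t increment)
{
	struct malloc_state_sketch *st = gd->malloc_state;
	uintptr_t old = st->brk_cur;

	if (increment > st->brk_end - old)
		return NULL;
	st->brk_cur = old + increment;
	return (void *)old;
}
```

Because every access goes through gd, clearing BSS between board_init_f() and board_init_r() would not disturb the heap state in this model.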
Sorry for jumping in so late here. I just stumbled over the same problem ... but we cannot use BSS before relocation on all platforms, so I think this patch is not an option.
Hmm... I have not taken a deep look at the dlmalloc implementation, but maybe my "fix" for the problem with Stefan's patch, see here:
http://patchwork.ozlabs.org/patch/1065508/#2130443
is also an option for dlmalloc?
If not, I think the option suggested by Simon here is the way to go ...
bye, Heiko

On Mon, Apr 1, 2019 at 8:07 AM Heiko Schocher hs@denx.de wrote:
Hello Simon,
On 30.03.2019 at 23:37, Simon Glass wrote:
Hi Simon,
On Sat, 30 Mar 2019 at 15:40, Simon Goldschmidt simon.k.r.goldschmidt@gmail.com wrote:
Simon Glass sjg@chromium.org wrote on Sat, 30 March 2019, 22:18:
Hi Simon,
On Sat, 30 Mar 2019 at 14:50, Simon Goldschmidt simon.k.r.goldschmidt@gmail.com wrote:
Simon Goldschmidt simon.k.r.goldschmidt@gmail.com wrote on Sat, 30 March 2019, 21:18:
Simon Glass sjg@chromium.org wrote on Sat, 30 March 2019, 21:06:
>
> Hi Simon,
>
> On Wed, 27 Mar 2019 at 13:40, Simon Goldschmidt
> simon.k.r.goldschmidt@gmail.com wrote:
>>
>> This introduces a new Kconfig option SPL_CLEAR_BSS_F. If enabled, it clears
>> the bss before calling board_init_f() instead of clearing it before calling
>> board_init_r().
>>
>> This also ensures that variables placed in BSS can be shared between
>> board_init_f() and board_init_r() in SPL.
>>
>> Such global variables are used, for example, when loading things from FAT
>> before SDRAM is available: the full heap required for FAT uses global
>> variables and clearing BSS after board_init_f() would reset the heap state.
>> An example for such a usage is socfpga_arria10 where an FPGA configuration
>> is required before SDRAM can be used.
>>
>> Make the new option depend on ARM for now until more implementations follow.
>>
>
> I still have objections to this series and I think we should discuss
> other ways of solving this problem.
>
> Does socfpga have SRAM that could be used before SDRAM is available?
> If so, can we not use that for the configuration? What variables are
> actually in BSS that are needed before board_init_r() is called? Can
> they not be in a struct created from malloc()?
The problem is the board needs to load an FPGA configuration from FAT before SDRAM is available. Yes, this is loaded into SRAM of course, but the whole code until that is done uses so many malloc/free iterations that the simple malloc implementation would require too much memory.
And it's the full malloc state variables only that use BSS, not the FAT code.
One way out could be to move the full malloc state variables into 'gd'...
Maybe I should point out again that the whole purpose of this series is to have an SPL that uses full malloc right from the start. This is currently not possible as full malloc needs BSS.
Right, and our assumption/design is that full malloc() requires SRAM.
Here we have an architecture that requires FAT just to init its SDRAM. FAT requires malloc() and free() and there is not enough SRAM to skip the free() calls. So we have to use full malloc() and that uses BSS. But BSS is not available before board_init_r(). But we need to init SDRAM before board_init_r().
Firstly I'd point out that someone should speak to the chip designers. Did anyone try to boot U-Boot on the SoC model?
Well, it's a U-Boot thing to load it from FAT. It could probably be loaded from raw MMC without problems, so I don't know if it's a chip designer's issue. I think it's an issue that we need to fix in U-Boot: we have a good full malloc implementation but it's not usable in all cases where it should be.
OK then why use FAT? I assumed it was a boot-ROM requirement. How does the boot ROM load SPL?
I think it is possible to change dlmalloc to put its variables in a struct. Then I suppose the struct pointer could be kept in gd. That would be one solution. Then full malloc() could be inited in spl_common_init() perhaps.
That might be worth a try.
Yes shouldn't be too painful. I suspect it would be neutral on code size, too.
Sorry for jumping in so late here. I just stumbled over the same problem ... but we cannot use BSS before relocation on all platforms, so I think this patch is not an option.
Right, by now I know that folks want to keep it that way.
Hmm... I have not taken a deep look at the dlmalloc implementation, but maybe my "fix" for the problem with Stefan's patch, see here:
http://patchwork.ozlabs.org/patch/1065508/#2130443
is also an option for dlmalloc?
Moving variables from 'bss' into 'data' in an unoverridable way by adding the 'section' attribute? No, I think that makes it even worse. Why would 'data' always be available when 'bss' is not?
If not, I think the option suggested by Simon here is the way to go ...
I'm not too convinced of that either. I'll take the time to re-think this specific problem (using full-malloc in SPL without explicitly defining an address range) and see what other solutions I can think of.
Regards, Simon

Hello Simon,
On 01.04.2019 at 10:43, Simon Goldschmidt wrote:
On Mon, Apr 1, 2019 at 8:07 AM Heiko Schocher hs@denx.de wrote:
Hello Simon,
On 30.03.2019 at 23:37, Simon Glass wrote:
Hi Simon,
On Sat, 30 Mar 2019 at 15:40, Simon Goldschmidt simon.k.r.goldschmidt@gmail.com wrote:
Simon Glass sjg@chromium.org wrote on Sat, 30 March 2019, 22:18:
Hi Simon,
On Sat, 30 Mar 2019 at 14:50, Simon Goldschmidt simon.k.r.goldschmidt@gmail.com wrote:
Simon Goldschmidt simon.k.r.goldschmidt@gmail.com wrote on Sat, 30 March 2019, 21:18:
>
> Simon Glass sjg@chromium.org wrote on Sat, 30 March 2019, 21:06:
>>
>> Hi Simon,
>>
>> On Wed, 27 Mar 2019 at 13:40, Simon Goldschmidt
>> simon.k.r.goldschmidt@gmail.com wrote:
>>>
>>> This introduces a new Kconfig option SPL_CLEAR_BSS_F. If enabled, it clears
>>> the bss before calling board_init_f() instead of clearing it before calling
>>> board_init_r().
>>>
>>> This also ensures that variables placed in BSS can be shared between
>>> board_init_f() and board_init_r() in SPL.
>>>
>>> Such global variables are used, for example, when loading things from FAT
>>> before SDRAM is available: the full heap required for FAT uses global
>>> variables and clearing BSS after board_init_f() would reset the heap state.
>>> An example for such a usage is socfpga_arria10 where an FPGA configuration
>>> is required before SDRAM can be used.
>>>
>>> Make the new option depend on ARM for now until more implementations follow.
>>>
>>
>> I still have objections to this series and I think we should discuss
>> other ways of solving this problem.
>>
>> Does socfpga have SRAM that could be used before SDRAM is available?
>> If so, can we not use that for the configuration? What variables are
>> actually in BSS that are needed before board_init_r() is called? Can
>> they not be in a struct created from malloc()?
>
> The problem is the board needs to load an FPGA configuration from FAT before SDRAM is available. Yes, this is loaded into SRAM of course, but the whole code until that is done uses so many malloc/free iterations that the simple malloc implementation would require too much memory.
>
> And it's the full malloc state variables only that use BSS, not the FAT code.
>
> One way out could be to move the full malloc state variables into 'gd'...
Maybe I should point out again that the whole purpose of this series is to have an SPL that uses full malloc right from the start. This is currently not possible as full malloc needs BSS.
Right, and our assumption/design is that full malloc() requires SRAM.
Here we have an architecture that requires FAT just to init its SDRAM. FAT requires malloc() and free() and there is not enough SRAM to skip the free() calls. So we have to use full malloc() and that uses BSS. But BSS is not available before board_init_r(). But we need to init SDRAM before board_init_r().
Firstly I'd point out that someone should speak to the chip designers. Did anyone try to boot U-Boot on the SoC model?
Well, it's a U-Boot thing to load it from FAT. It could probably be loaded from raw MMC without problems, so I don't know if it's a chip designer's issue. I think it's an issue that we need to fix in U-Boot: we have a good full malloc implementation but it's not usable in all cases where it should be.
OK then why use FAT? I assumed it was a boot-ROM requirement. How does the boot ROM load SPL?
I think it is possible to change dlmalloc to put its variables in a struct. Then I suppose the struct pointer could be kept in gd. That would be one solution. Then full malloc() could be inited in spl_common_init() perhaps.
That might be worth a try.
Yes shouldn't be too painful. I suspect it would be neutral on code size, too.
Sorry for jumping in so late here. I just stumbled over the same problem ... but we cannot use BSS before relocation on all platforms, so I think this patch is not an option.
Right, by now I know that folks want to keep it that way.
Hmm... I have not taken a deep look at the dlmalloc implementation, but maybe my "fix" for the problem with Stefan's patch, see here:
http://patchwork.ozlabs.org/patch/1065508/#2130443
is also an option for dlmalloc?
Moving variables from 'bss' into 'data' in an unoverridable way by adding the 'section' attribute? No, I think that makes it even worse. Why would 'data' always be available when 'bss' is not?
Why does the section attribute make the variable unoverridable in U-Boot?
I could not find any special comment in: https://gcc.gnu.org/onlinedocs/gcc-7.4.0/gcc/Common-Variable-Attributes.html...
Moving this variable into the data section has the effect that the variable lies between __image_copy_start and __image_copy_end, and as SPL copies the U-Boot image into writable memory, it is guaranteed that variables in data are in a writable storage medium ...
And this should apply to SPL as well, as your ROM bootloader must copy SPL to some writable place ... or is your SPL code executed from ROM?
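As an illustration of the approach being debated, the sketch below contrasts a variable left in .bss with one forced into .data through the GCC/Clang section attribute. The variable and function names are hypothetical, the section name assumes an ELF target, and the "clear" function only simulates what startup code does to BSS; it is not taken from the linked patch.

```c
#include <assert.h>

/* No initializer: the toolchain normally places this in .bss. */
static int probed_bss;

/*
 * GCC/Clang extension: force the variable into .data even though it has
 * no initializer. It then occupies space in the image like any other
 * initialized data. The section name ".data" assumes an ELF target.
 */
static int probed_data __attribute__((section(".data")));

/* Record that a (hypothetical) peripheral was probed. */
static void mark_probed(int value)
{
	assert(value == 0 || value == 1);
	probed_bss = value;
	probed_data = value;
}

/*
 * Stand-in for startup code zeroing BSS between board_init_f() and
 * board_init_r(); real code clears the whole section, not one variable.
 */
static void simulate_bss_clear(void)
{
	probed_bss = 0;
}

static int probed_bss_value(void)
{
	return probed_bss;
}

static int probed_data_value(void)
{
	return probed_data;
}
```

In this model the .data copy keeps its value across the simulated clear while the .bss copy is reset. Whether .data is actually writable that early on a given platform is exactly the open question raised above.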
If not, I think the option suggested by Simon here is the way to go ...
I'm not too convinced of that either. I'll take the time to re-think this specific problem (using full-malloc in SPL without explicitly defining an address range) and see what other solutions I can think of.
Ok!
bye, Heiko

Simon,
On Sat, Mar 30, 2019 at 09:18:08PM +0100, Simon Goldschmidt wrote:
Simon Glass sjg@chromium.org wrote on Sat, 30 March 2019, 21:06:
Hi Simon,
On Wed, 27 Mar 2019 at 13:40, Simon Goldschmidt simon.k.r.goldschmidt@gmail.com wrote:
This introduces a new Kconfig option SPL_CLEAR_BSS_F. If enabled, it clears the bss before calling board_init_f() instead of clearing it before calling board_init_r().
This also ensures that variables placed in BSS can be shared between board_init_f() and board_init_r() in SPL.
Such global variables are used, for example, when loading things from FAT before SDRAM is available: the full heap required for FAT uses global variables and clearing BSS after board_init_f() would reset the heap state. An example for such a usage is socfpga_arria10 where an FPGA configuration is required before SDRAM can be used.
Make the new option depend on ARM for now until more implementations follow.
I still have objections to this series and I think we should discuss other ways of solving this problem.
Does socfpga have SRAM that could be used before SDRAM is available? If so, can we not use that for the configuration? What variables are actually in BSS that are needed before board_init_r() is called? Can they not be in a struct created from malloc()?
The problem is the board needs to load an FPGA configuration from FAT before SDRAM is available. Yes, this is loaded into SRAM of course, but the whole code until that is done uses so many malloc/free iterations that the simple malloc implementation would require too much memory.
And it's the full malloc state variables only that use BSS, not the FAT code.
I've actually faced very similar issues working on our TI AM654x "System Firmware Loader" implementation (will post upstream soon), where I need to load this firmware and other files from media such as MMC/FAT in a very memory-constrained SPL pre-relocation environment *before* I can bring up DDR.
Initially, I modified the fat.c driver to re-use memory so it is not as wasteful during SYS_MALLOC_SIMPLE. While I'm not proud of this solution [1] this allowed us to get going, allowing to load multiple files without issues in pre-relocation SPL.
In the quest of creating something more upstream-friendly I had then switched to using full malloc in pre-relocation SPL so that I didn't have to hack the FAT driver, encountering similar issues like you brought up and got this working, but ultimately abandoned this approach after bundling all files needed to get loaded into a single image tree blob which no longer required any of those solutions.
What remained till today however is a need to preserve specific BSS state from pre-relocation SPL over to post-relocation SPL environment, namely flags set to avoid the (expensive) re-probing of peripheral drivers by the SPL loader. For that I introduced a Kconfig option that allows skipping the automatic clearing of BSS during relocation [2].
Seeing this very related discussion here got me thinking about how else I can carry over this "state" from pre- to post relocation but that's probably a discussion to be had once I post my "System Firmware Loader Series", probably next week.
PS: If you want to save a ton of memory during FAT loading you can try something like CONFIG_FS_FAT_MAX_CLUSTSIZE=16384, I argue the default is overkill for all practical scenarios.
-- Andreas Dannenberg Texas Instruments Inc
[1] http://git.ti.com/gitweb/?p=ti-u-boot/ti-u-boot.git;a=commitdiff;h=3de26c27d... [2] http://git.ti.com/gitweb/?p=ti-u-boot/ti-u-boot.git;a=commitdiff;h=9ab4a405c...
One way out could be to move the full malloc state variables into 'gd'...
Another way would be to continue into board_init_f without SDRAM enabled and init it later...
Regards, Simon
If this is a limitation of FAT, then I think we should fix that, instead.
Regards, Simon
Signed-off-by: Simon Goldschmidt simon.k.r.goldschmidt@gmail.com
Changes in v3:
- improve commit message to show why CONFIG_CLEAR_BSS_F is needed
Changes in v2:
- make CONFIG_SPL_CLEAR_BSS_F depend on ARM for now
common/spl/Kconfig | 12 ++++++++++++ 1 file changed, 12 insertions(+)
diff --git a/common/spl/Kconfig b/common/spl/Kconfig
index 206c24076d..6a4270516a 100644
--- a/common/spl/Kconfig
+++ b/common/spl/Kconfig
@@ -156,6 +156,18 @@ config SPL_STACK_R_MALLOC_SIMPLE_LEN
 	  to give board_init_r() a larger heap then the initial heap in
 	  SRAM which is limited to SYS_MALLOC_F_LEN bytes.
 
+config SPL_CLEAR_BSS_F
+	bool "Clear BSS section before calling board_init_f"
+	depends on ARM
+	help
+	  The BSS section is initialized to zero. In SPL, this is normally done
+	  before calling board_init_r().
+	  For platforms using BSS in board_init_f() already, enable this to
+	  clear the BSS section before calling board_init_f() instead of
+	  clearing it before calling board_init_r(). This also ensures that
+	  variables placed in BSS can be shared between board_init_f() and
+	  board_init_r().
+
 config SPL_SEPARATE_BSS
 	bool "BSS section is in a different memory region from text"
 	help
--
2.17.1
U-Boot mailing list U-Boot@lists.denx.de https://lists.denx.de/listinfo/u-boot

Hi Andreas,
On Fri, 3 May 2019 at 14:25, Andreas Dannenberg dannenberg@ti.com wrote:
Simon,
On Sat, Mar 30, 2019 at 09:18:08PM +0100, Simon Goldschmidt wrote:
Simon Glass sjg@chromium.org wrote on Sat, 30 March 2019, 21:06:
Hi Simon,
On Wed, 27 Mar 2019 at 13:40, Simon Goldschmidt simon.k.r.goldschmidt@gmail.com wrote:
This introduces a new Kconfig option SPL_CLEAR_BSS_F. If enabled, it clears the bss before calling board_init_f() instead of clearing it before calling board_init_r().
This also ensures that variables placed in BSS can be shared between board_init_f() and board_init_r() in SPL.
Such global variables are used, for example, when loading things from FAT before SDRAM is available: the full heap required for FAT uses global variables and clearing BSS after board_init_f() would reset the heap state. An example for such a usage is socfpga_arria10 where an FPGA configuration is required before SDRAM can be used.
Make the new option depend on ARM for now until more implementations follow.
I still have objections to this series and I think we should discuss other ways of solving this problem.
Does socfpga have SRAM that could be used before SDRAM is available? If so, can we not use that for the configuration? What variables are actually in BSS that are needed before board_init_r() is called? Can they not be in a struct created from malloc()?
The problem is the board needs to load an FPGA configuration from FAT before SDRAM is available. Yes, this is loaded into SRAM of course, but the whole code until that is done uses so many malloc/free iterations that the simple malloc implementation would require too much memory.
And it's the full malloc state variables only that use BSS, not the FAT code.
I've actually faced very similar issues working on our TI AM654x "System Firmware Loader" implementation (will post upstream soon), where I need to load this firmware and other files from media such as MMC/FAT in a very memory-constrained SPL pre-relocation environment *before* I can bring up DDR.
Initially, I modified the fat.c driver to re-use memory so it is not as wasteful during SYS_MALLOC_SIMPLE. While I'm not proud of this solution [1] this allowed us to get going, allowing to load multiple files without issues in pre-relocation SPL.
That seems to point the way to a useful solution I think. We could have a struct containing allocated pointers which is private to FAT, and just allocate them the first time.
I wonder if that would be enough for
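A minimal sketch of that suggestion, assuming a private state struct that allocates its scratch buffer on first use and hands the same buffer back afterwards. The struct, its fields and the function name are hypothetical and do not reflect the actual fs/fat code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical private state; not the layout used by the FAT driver. */
struct fat_scratch {
	void *buf;		/* scratch buffer, allocated on first use */
	size_t len;		/* size of buf in bytes */
	unsigned int allocs;	/* number of real allocations performed */
};

/*
 * Return a scratch buffer of at least 'len' bytes. The buffer is
 * allocated on the first call and reused on later calls. Returns NULL
 * if the allocation fails or if a later caller asks for more than was
 * allocated the first time.
 */
static void *fat_scratch_get(struct fat_scratch *s, size_t len)
{
	assert(s != NULL && len > 0);

	if (s->buf == NULL) {
		s->buf = malloc(len);
		if (s->buf == NULL)
			return NULL;
		s->len = len;
		s->allocs++;
	}
	return len <= s->len ? s->buf : NULL;
}
```

With this pattern, loading several files costs one allocation in total, which is what makes it workable on a heap that cannot free.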
In the quest of creating something more upstream-friendly I had then switched to using full malloc in pre-relocation SPL so that I didn't have to hack the FAT driver, encountering similar issues like you brought up and got this working, but ultimately abandoned this approach after bundling all files needed to get loaded into a single image tree blob which no longer required any of those solutions.
What remained till today however is a need to preserve specific BSS state from pre-relocation SPL over to post-relocation SPL environment, namely flags set to avoid the (expensive) re-probing of peripheral drivers by the SPL loader. For that I introduced a Kconfig option that allows skipping the automatic clearing of BSS during relocation [2].
Seeing this very related discussion here got me thinking about how else I can carry over this "state" from pre- to post relocation but that's probably a discussion to be had once I post my "System Firmware Loader Series", probably next week.
Since this is SPL I don't think you mean 'relocation' here. I think you mean board_init_f() to board_init_r()?
You can use global_data to store state, or malloc() to allocate memory and put things there. But using BSS seems wrong to me. If you are doing something in board_init_f() in SPL that needs BSS, can you not just move that code to board_init_r()?
PS: If you want to save a ton of memory during FAT loading you can try something like CONFIG_FS_FAT_MAX_CLUSTSIZE=16384, I argue the default is overkill for all practical scenarios.
-- Andreas Dannenberg Texas Instruments Inc
[1] http://git.ti.com/gitweb/?p=ti-u-boot/ti-u-boot.git;a=commitdiff;h=3de26c27d... [2] http://git.ti.com/gitweb/?p=ti-u-boot/ti-u-boot.git;a=commitdiff;h=9ab4a405c...
One way out could be to move the full malloc state variables into 'gd'...
Another way would be to continue into board_init_f without SDRAM enabled and init it later...
Regards, Simon
If this is a limitation of FAT, then I think we should fix that, instead.
Regards, Simon
Regards, Simon

Hi Simon,
On Mon, May 06, 2019 at 09:51:56PM -0600, Simon Glass wrote:
Hi Andreas,
On Fri, 3 May 2019 at 14:25, Andreas Dannenberg dannenberg@ti.com wrote:
Simon,
On Sat, Mar 30, 2019 at 09:18:08PM +0100, Simon Goldschmidt wrote:
Simon Glass sjg@chromium.org wrote on Sat, 30 March 2019, 21:06:
Hi Simon,
On Wed, 27 Mar 2019 at 13:40, Simon Goldschmidt simon.k.r.goldschmidt@gmail.com wrote:
This introduces a new Kconfig option SPL_CLEAR_BSS_F. If enabled, it clears the bss before calling board_init_f() instead of clearing it before calling board_init_r().
This also ensures that variables placed in BSS can be shared between board_init_f() and board_init_r() in SPL.
Such global variables are used, for example, when loading things from FAT before SDRAM is available: the full heap required for FAT uses global variables and clearing BSS after board_init_f() would reset the heap state. An example for such a usage is socfpga_arria10 where an FPGA configuration is required before SDRAM can be used.
Make the new option depend on ARM for now until more implementations follow.
I still have objections to this series and I think we should discuss other ways of solving this problem.
Does socfpga have SRAM that could be used before SDRAM is available? If so, can we not use that for the configuration? What variables are actually in BSS that are needed before board_init_r() is called? Can they not be in a struct created from malloc()?
The problem is the board needs to load an FPGA configuration from FAT before SDRAM is available. Yes, this is loaded into SRAM of course, but the whole code until that is done uses so many malloc/free iterations that the simple malloc implementation would require too much memory.
And it's the full malloc state variables only that use BSS, not the FAT code.
I've actually faced very similar issues working on our TI AM654x "System Firmware Loader" implementation (will post upstream soon), where I need to load this firmware and other files from media such as MMC/FAT in a very memory-constrained SPL pre-relocation environment *before* I can bring up DDR.
Initially, I modified the fat.c driver to re-use memory so it is not as wasteful during SYS_MALLOC_SIMPLE. While I'm not proud of this solution [1] this allowed us to get going, allowing to load multiple files without issues in pre-relocation SPL.
That seems to point the way to a useful solution I think. We could have a struct containing allocated pointers which is private to FAT, and just allocate them the first time.
The board_init_f()-based loader solution we use extends beyond MMC/FAT; it also covers OSPI, X/Y-Modem, and (later) USB, network, etc.
Background: On our "TI K3" devices we need to do a whole bunch of stuff before DDR is up with limited memory, namely loading and installing a firmware that controls the entire SoC called "System Firmware". It is only after this FW is loaded from boot media and successfully started that I can bring up DDR. So all this is done in SPL board_init_f(), which as the last step brings up DDR.
Not having BSS available to carry over certain state to the board_init_r() world would lead to a bunch of hacky changes across the board I'm afraid, more below.
I wonder if that would be enough for
In the quest of creating something more upstream-friendly I had then switched to using full malloc in pre-relocation SPL so that I didn't have to hack the FAT driver, encountering similar issues like you brought up and got this working, but ultimately abandoned this approach after bundling all files needed to get loaded into a single image tree blob which no longer required any of those solutions.
What remained till today however is a need to preserve specific BSS state from pre-relocation SPL over to post-relocation SPL environment, namely flags set to avoid the (expensive) re-probing of peripheral drivers by the SPL loader. For that I introduced a Kconfig option that allows skipping the automatic clearing of BSS during relocation [2].
Seeing this very related discussion here got me thinking about how else I can carry over this "state" from pre- to post relocation but that's probably a discussion to be had once I post my "System Firmware Loader Series", probably next week.
Since this is SPL I don't think you mean 'relocation' here. I think you mean board_init_f() to board_init_r()?
Yes that's what I mean. AFAIK relocation in SPL is still called relocation from what I have seen working on U-Boot, it just relocates gd and stack but not the actual code (personally I find it misleading calling what SPL does "relocation", but I got used to it).
You can use global_data to store state,
I thought the idea was to stay away from gd, so we can eventually get rid of it altogether?
or malloc() to allocate memory and put things there.
The challenge with potentially having to incorporate such a custom solution for state preservation into several drivers as explained earlier (SPI, USB, network, etc.) is that it does not appear to scale well. I think using BSS instead would make all those additions cleaner and simpler.
But using BSS seems wrong to me.
I've seen you saying this a few times :) Why?
If you are doing something in board_init_f() in SPL that needs BSS, can you not just move that code to board_init_r()?
I need to access media drivers in board_init_f(), for which currently I'm using BSS so I can preserve some limited state, such as that the peripheral init was done, so that it doesn't get re-done by the actual SPL loader later. board_init_r() requires DDR to be available which I can't use without doing all that work in board_init_f() first to load/start the system controller firmware, so it's a bit of a chicken and egg issue here.
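For comparison, the global_data alternative suggested earlier in the thread could look roughly like the sketch below: a "peripheral already initialized" flag is kept in a gd-like struct rather than in BSS, so it survives from board_init_f() to board_init_r(). The struct, the flag bit and the function names are assumptions for illustration, not existing U-Boot definitions.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_EARLY_MMC_READY (1u << 0)	/* hypothetical flag bit */

/* Reduced stand-in for global data; both members are assumptions. */
struct gd_early {
	unsigned int early_flags;	/* state carried across init phases */
	unsigned int probe_count;	/* only here so the sketch can be checked */
};

/* Stand-in for an expensive peripheral probe. */
static void mmc_probe_sketch(struct gd_early *gd)
{
	gd->probe_count++;
}

/* Probe only if no earlier phase has recorded a successful init. */
static void mmc_ensure_ready(struct gd_early *gd)
{
	assert(gd != NULL);

	if (gd->early_flags & SKETCH_EARLY_MMC_READY)
		return;
	mmc_probe_sketch(gd);
	gd->early_flags |= SKETCH_EARLY_MMC_READY;
}
```

Calling mmc_ensure_ready() once from each phase probes the device a single time. The scaling concern raised above still applies, since every driver would need its own flag or equivalent.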
My system firmware loader patch series is about ready and I was planning on posting it tomorrow. How about with the entire approach being in the open we use this as an opportunity to re-look at potential alternative solutions...
Thanks,
-- Andreas Dannenberg Texas Instruments Inc
PS: If you want to save a ton of memory during FAT loading you can try something like CONFIG_FS_FAT_MAX_CLUSTSIZE=16384, I argue the default is overkill for all practical scenarios.
-- Andreas Dannenberg Texas Instruments Inc
[1] http://git.ti.com/gitweb/?p=ti-u-boot/ti-u-boot.git;a=commitdiff;h=3de26c27d... [2] http://git.ti.com/gitweb/?p=ti-u-boot/ti-u-boot.git;a=commitdiff;h=9ab4a405c...
One way out could be to move the full malloc state variables into 'gd'...
Another way would be to continue into board_init_f without SDRAM enabled and init it later...
Regards, Simon
If this is a limitation of FAT, then I think we should fix that, instead.
Regards, Simon
Regards, Simon

Hi Andreas,
On Mon, 6 May 2019 at 22:49, Andreas Dannenberg dannenberg@ti.com wrote:
Hi Simon,
On Mon, May 06, 2019 at 09:51:56PM -0600, Simon Glass wrote:
Hi Andreas,
On Fri, 3 May 2019 at 14:25, Andreas Dannenberg dannenberg@ti.com wrote:
Simon,
On Sat, Mar 30, 2019 at 09:18:08PM +0100, Simon Goldschmidt wrote:
Simon Glass sjg@chromium.org wrote on Sat, 30 March 2019, 21:06:
Hi Simon,
On Wed, 27 Mar 2019 at 13:40, Simon Goldschmidt simon.k.r.goldschmidt@gmail.com wrote:
This introduces a new Kconfig option SPL_CLEAR_BSS_F. If enabled, it clears the bss before calling board_init_f() instead of clearing it before calling board_init_r().
This also ensures that variables placed in BSS can be shared between board_init_f() and board_init_r() in SPL.
Such global variables are used, for example, when loading things from FAT before SDRAM is available: the full heap required for FAT uses global variables and clearing BSS after board_init_f() would reset the heap state.
An example of such usage is socfpga_arria10, where an FPGA configuration is required before SDRAM can be used.
Make the new option depend on ARM for now until more implementations follow.
I still have objections to this series and I think we should discuss other ways of solving this problem.
Does socfpga have SRAM that could be used before SDRAM is available? If so, can we not use that for the configuration? What variables are actually in BSS that are needed before board_init_r() is called? Can they not be in a struct created from malloc()?
The problem is the board needs to load an FPGA configuration from FAT before SDRAM is available. Yes, this is loaded into SRAM of course, but the whole code until that is done uses so many malloc/free iterations that the simple malloc implementation would require too much memory.
And it's the full malloc state variables only that use BSS, not the FAT code.
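To make the failure mode described above concrete, here is a minimal sketch in plain C, not U-Boot code. It assumes a toy allocator whose bookkeeping lives in a zero-initialised global, the way full malloc keeps its state in BSS. All names (toy_heap_state, toy_alloc(), toy_clear_bss()) are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for allocator bookkeeping kept in BSS. */
struct toy_heap_state {
	size_t next_free;	/* offset of the first unused byte in the arena */
	int initialised;	/* set once toy_heap_init() has run */
};

static struct toy_heap_state heap_state;	/* zero-initialised global: lives in BSS */
static unsigned char arena[256];		/* stand-in for the heap in SRAM */

static void toy_heap_init(void)
{
	heap_state.next_free = 0;
	heap_state.initialised = 1;
}

static void *toy_alloc(size_t len)
{
	void *p;

	if (!heap_state.initialised ||
	    len > sizeof(arena) - heap_state.next_free)
		return NULL;

	p = &arena[heap_state.next_free];
	heap_state.next_free += len;
	return p;
}

/* Models BSS being zeroed between board_init_f() and board_init_r(). */
static void toy_clear_bss(void)
{
	memset(&heap_state, 0, sizeof(heap_state));
}
```

If toy_clear_bss() runs after allocations were made, the allocator forgets them: further calls fail until it is re-initialised, and after re-initialisation it hands out memory that is still in use. Clearing BSS before board_init_f() instead avoids this in the sketch, because nothing has been allocated yet.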
I've actually faced very similar issues working on our TI AM654x "System Firmware Loader" implementation (will post upstream soon), where I need to load this firmware and other files from media such as MMC/FAT in a very memory-constrained SPL pre-relocation environment *before* I can bring up DDR.
Initially, I modified the fat.c driver to re-use memory so it is not as wasteful with SYS_MALLOC_SIMPLE. While I'm not proud of this solution [1], it allowed us to get going and to load multiple files without issues in pre-relocation SPL.
That seems to point the way to a useful solution I think. We could have a struct containing allocated pointers which is private to FAT, and just allocate them the first time.
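The "allocate once, keep the pointers in a private struct" idea could look roughly like the sketch below. It is an illustration under stated assumptions, not the actual fat.c code: the struct, the buffer sizes and the helper name toy_fat_get_bufs() are all hypothetical.

```c
#include <stdlib.h>

/* Hypothetical buffer sizes; the real FAT code sizes its buffers differently. */
#define TOY_SECTOR_BUF_LEN	512
#define TOY_DIR_BUF_LEN		1024

/* All long-lived buffers collected in one struct private to the filesystem code. */
struct toy_fat_bufs {
	unsigned char *sector_buf;
	unsigned char *dir_buf;
};

/*
 * Return the buffers stored in *cache, allocating them on first use only.
 * The caller owns *cache, so nothing here depends on BSS.
 */
static struct toy_fat_bufs *toy_fat_get_bufs(struct toy_fat_bufs **cache)
{
	struct toy_fat_bufs *b = *cache;

	if (b)
		return b;	/* already allocated: reuse, no new heap usage */

	b = malloc(sizeof(*b));
	if (!b)
		return NULL;

	b->sector_buf = malloc(TOY_SECTOR_BUF_LEN);
	b->dir_buf = malloc(TOY_DIR_BUF_LEN);
	if (!b->sector_buf || !b->dir_buf) {
		free(b->sector_buf);
		free(b->dir_buf);
		free(b);
		return NULL;
	}

	*cache = b;
	return b;
}
```

With an allocator that never reclaims freed memory, this pattern bounds heap usage to one set of buffers no matter how many files are loaded.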
The board_init_f()-based loader solution we use extends beyond MMC/FAT to OSPI, X/Y-Modem, and (later) USB, network, etc.
Background: On our "TI K3" devices we need to do a whole bunch of stuff before DDR is up with limited memory, namely loading and installing a firmware that controls the entire SoC called "System Firmware". It is only after this FW is loaded from boot media and successfully started that I can bring up DDR. So all this is done in SPL board_init_f(), which as the last step brings up DDR.
Not having BSS available to carry over certain state to the board_init_r() world would lead to a bunch of hacky changes across the board I'm afraid, more below.
This is really unfortunate.
It seems to me that we have two choices:
1. Hack around with board_init_f() such as to remove the distinction between this and board_init_r().
2. Enter board_init_r() without DRAM ready, and deal with setting it up there.
I feel that the second solution is worth exploring. We could have some board-specific init in board_init_r(). We already have spl_board_init() so perhaps we could have spl_early_board_init() called right near the top?
We can refactor a few of the functions in spl/spl.c so they can be called from board-specific code if necessary. We could also add new flags to global_data to control the behaviour of the SPL code, and the board code could set these.
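A rough sketch of what such an early board hook plus behaviour flags might look like follows. Everything here is hypothetical: toy_spl_early_board_init(), the flag names and struct toy_global_data are illustration-only stand-ins, not existing U-Boot symbols, and the weak attribute assumes a GCC-compatible compiler.

```c
#include <stdint.h>

/* Hypothetical flags that board code could set to steer the generic SPL code. */
#define TOY_GD_FLG_DRAM_READY	(1u << 0)
#define TOY_GD_FLG_SKIP_REPROBE	(1u << 1)

/* Minimal stand-in for global_data. */
struct toy_global_data {
	uint32_t flags;
};

/* Weak default: boards that need no early init simply do not override this. */
__attribute__((weak)) int toy_spl_early_board_init(struct toy_global_data *gd)
{
	(void)gd;
	return 0;
}

static int toy_board_init_r(struct toy_global_data *gd)
{
	int ret;

	/* Board-specific hook, called right near the top. */
	ret = toy_spl_early_board_init(gd);
	if (ret)
		return ret;

	/* Generic code only brings up DRAM if the board hook has not done so. */
	if (!(gd->flags & TOY_GD_FLG_DRAM_READY)) {
		/* a real implementation would call its DRAM init here */
		gd->flags |= TOY_GD_FLG_DRAM_READY;
	}

	return 0;
}
```

A board that must load firmware before DRAM comes up would provide its own non-weak toy_spl_early_board_init(), do that work there, and set the flags it wants the generic code to honour.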
I wonder if that would be enough for
In the quest of creating something more upstream-friendly I had then switched to using full malloc in pre-relocation SPL so that I didn't have to hack the FAT driver, encountering similar issues like you brought up and got this working, but ultimately abandoned this approach after bundling all files needed to get loaded into a single image tree blob which no longer required any of those solutions.
What remained till today however is a need to preserve specific BSS state from pre-relocation SPL over to post-relocation SPL environment, namely flags set to avoid the (expensive) re-probing of peripheral drivers by the SPL loader. For that I introduced a Kconfig option that allows skipping the automatic clearing of BSS during relocation [2].
Seeing this very related discussion here got me thinking about how else I can carry over this "state" from pre- to post relocation but that's probably a discussion to be had once I post my "System Firmware Loader Series", probably next week.
Since this is SPL I don't think you mean 'relocation' here. I think you mean board_init_f() to board_init_r()?
Yes that's what I mean. AFAIK relocation in SPL is still called relocation from what I have seen working on U-Boot, it just relocates gd and stack but not the actual code (personally I find it misleading calling what SPL does "relocation", but I got used to it).
You can use global_data to store state,
I thought the idea was to stay away from gd, so we can eventually get rid of it altogether?
Not that I know of. It is how we communicate state before we have BSS.
or malloc() to allocate memory and put things there.
The challenge with potentially having to incorporate such a custom solution for state preservation into several drivers as explained earlier (SPI, USB, network, etc.) is that it does not appear to scale well. I think using BSS instead would make all those additions cleaner and simpler.
But using BSS seems wrong to me.
I've seen you saying this a few times :) Why?
Driver model does its own allocation of memory and this is all attached to the DM structures.
Drivers themselves cannot assume they are the only instance running, so data should be attached to their private-data pointers. Similarly for uclasses, if we put everything in the uclass-private data, then we don't need BSS and don't have any problems dealing with whether it is available yet. In general, BSS creates a lot of problems early in U-Boot's execution, and we don't actually need to use it.
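As an illustration of the private-data argument, the sketch below keeps the "init already done" state in per-device data instead of a file-scope variable. It only mimics the shape of driver model: struct toy_udevice, struct toy_mmc_priv and toy_mmc_probe() are hypothetical names, not the real DM API.

```c
#include <stdlib.h>

/* Minimal stand-in for a device with a private-data pointer. */
struct toy_udevice {
	const char *name;
	void *priv;		/* per-instance data, allocated on first probe */
};

/* State that might otherwise have been a static variable in BSS. */
struct toy_mmc_priv {
	int init_done;		/* peripheral init already performed? */
	int hw_inits;		/* how often the expensive init actually ran */
};

static int toy_mmc_probe(struct toy_udevice *dev)
{
	struct toy_mmc_priv *priv = dev->priv;

	if (!priv) {
		priv = calloc(1, sizeof(*priv));
		if (!priv)
			return -1;
		dev->priv = priv;
	}

	if (priv->init_done)
		return 0;	/* skip the expensive re-probe */

	priv->hw_inits++;	/* placeholder for the real hardware init */
	priv->init_done = 1;
	return 0;
}
```

Because the state hangs off the device, two instances of the same driver never share it, and it remains valid for as long as the heap it was allocated from does.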
If you look at the DM design you'll see that we try to avoid malloc() and BSS as much as possible. I suppose this series is another example of why :-)
If you are doing something in board_init_f() in SPL that needs BSS, can you not just move that code to board_init_r()?
I need to access media drivers in board_init_f(), for which currently I'm using BSS so I can preserve some limited state, such as that the peripheral init was done, so that it doesn't get re-done by the actual SPL loader later. board_init_r() requires DDR to be available which I can't use without doing all that work in board_init_f() first to load/start the system controller firmware, so it's a bit of a chicken and egg issue here.
Let's try moving the egg into board_init_r() and putting the chicken after it, as mentioned above.
My system firmware loader patch series is about ready and I was planning on posting it tomorrow. How about with the entire approach being in the open we use this as an opportunity to re-look at potential alternative solutions...
Sure. I hope I've explained my POV above.
Regards, Simon

Hi Simon (Glass),
On Sat, May 18, 2019 at 10:08:19AM -0600, Simon Glass wrote:
Hi Andreas,
On Mon, 6 May 2019 at 22:49, Andreas Dannenberg dannenberg@ti.com wrote:
It seems to me that we have two choices:
- Hack around with board_init_f() such as to remove the distinction between this and board_init_r().
- Enter board_init_r() without DRAM ready, and deal with setting it up there.
I feel that the second solution is worth exploring. We could have some board-specific init in board_init_r(). We already have spl_board_init() so perhaps we could have spl_early_board_init() called right near the top?
We can refactor a few of the functions in spl/spl.c so they can be called from board-specific code if necessary. We could also add new flags to global_data to control the behaviour of the SPL code, and the board code could set these.
Let me explore this option. I can probably make something work but I don't yet see how to do it cleanly, maybe it becomes clearer once I put some code down. Currently by definition board_init_r() has DDR available, and much of the code is geared towards it (for example the calling of spl_relocate_stack_gd() before entering board_init_r() which will already switch over to DDR).
Also, not to suggest that we can't improve upon the status quo, but there are already a ton of boards using such an "early BSS" scheme...
$ git grep --show-function 'memset.*bss' | grep board_init_f | wc -l
35
You can use global_data to store state,
I thought the idea was to stay away from gd, so we can eventually get rid of it altogether?
Not that I know of. It is how we communicate state before we have BSS.
Oh ok I see now, I guess I have taken the comment [1] from arch/arm/lib/spl.c out of context.
[1] https://github.com/u-boot/u-boot/blob/master/arch/arm/lib/spl.c#L22
It is how we communicate state before we have BSS.
Understood.
Drivers themselves cannot assume they are the only instance running, so data should be attached to their private-data pointers. Similarly for uclasses, if we put everything in the uclass-private data, then we don't need BSS and don't have any problems dealing with whether it is available yet. In general, BSS creates a lot of problems early in U-Boot's execution, and we don't actually need to use it.
Yes, understood. The DM concept with private-data pointers is quite clean from an OOP point of view.
Let's try moving the egg into board_init_r() and putting the chicken after it, as mentioned above.
Well I'll give this a shot.
My system firmware loader patch series is about ready and I was planning on posting it tomorrow. How about with the entire approach being in the open we use this as an opportunity to re-look at potential alternative solutions...
Sure. I hope I've explained my POV above.
Yes thanks for the review and comments.
-- Andreas Dannenberg Texas Instruments Inc

Hi Andreas,
On Tue, 21 May 2019 at 15:01, Andreas Dannenberg dannenberg@ti.com wrote:
Hi Simon (Glass),
Also, not to suggest that we can't improve upon the status quo, but there are already a ton of boards using such an "early BSS" scheme...
$ git grep --show-function 'memset.*bss' | grep board_init_f | wc -l
35
Yes I know :-(
We should migrate these boards to use the generic SPL framework.
Oh ok I see now, I guess I have taken the comment [1] from arch/arm/lib/spl.c out of context.
[1] https://github.com/u-boot/u-boot/blob/master/arch/arm/lib/spl.c#L22
Ah yes. That is referring to putting global_data in the data section. Perhaps we can delete that code now and see what breaks?
Yes thanks for the review and comments.
You're welcome, and good luck with it.
Regards, Simon

On Wed, May 22, 2019 at 2:53 AM Simon Glass sjg@chromium.org wrote:
Hi Andreas,
Also, not to suggest that we can't improve upon the status quo, but there are already a ton of boards using such an "early BSS" scheme...
$ git grep --show-function 'memset.*bss' | grep board_init_f | wc -l
35
Yes I know :-(
We should migrate these boards to use the generic SPL framework.
socfpga_gen5 is one of the architectures listed here. I'm not even sure whether that's actually needed. However, it's hard to test, isn't it? How do you actually tell BSS isn't used before entering board_init_r?
To be sure, we'd need to initialize unused memory to some magic constant and check that it has been left untouched later (on boards where BSS is available in board_init_r and remains in place when moving on).
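The magic-constant check could be sketched as follows. This is a host-side illustration with hypothetical helper names (toy_poison(), toy_count_touched()) and an arbitrarily chosen poison byte; on a real board the region would be the BSS range, and the check would have to run before BSS is cleared.

```c
#include <stddef.h>
#include <stdint.h>

/* Arbitrary poison byte, an assumption of this sketch. */
#define TOY_POISON	0xA5u

/* Fill the region with the poison pattern before board_init_f() would run. */
static void toy_poison(uint8_t *start, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		start[i] = TOY_POISON;
}

/* Count the bytes that no longer hold the poison value. */
static size_t toy_count_touched(const uint8_t *start, size_t len)
{
	size_t i, touched = 0;

	for (i = 0; i < len; i++)
		if (start[i] != TOY_POISON)
			touched++;

	return touched;
}
```

A non-zero count shows that something wrote to the region early. A zero count is only strong evidence, not proof: a write that happens to store the poison value goes unnoticed, and reads of uninitialised BSS are not detected at all.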
Regards, Simon
I wonder if that would be enough for
In the quest of creating something more upstream-friendly I had then switched to using full malloc in pre-relocation SPL so that I didn't have to hack the FAT driver, encountering similar issues like you brought up and got this working, but ultimately abandoned this approach after bundling all files needed to get loaded into a single image tree blob which no longer required any of those solutions.
What remained till today however is a need to preserve specific BSS state from pre-relocation SPL over to post-relocation SPL environment, namely flags set to avoid the (expensive) re-probing of peripheral drivers by the SPL loader. For that I introduced a Kconfig option that allows skipping the automatic clearing of BSS during relocation [2].
Seeing this very related discussion here got me thinking about how else I can carry over this "state" from pre- to post relocation but that's probably a discussion to be had once I post my "System Firmware Loader Series", probably next week.
Since this is SPL I don't you mean 'relocation' here. I think you mean board_init_f() to board_init_r()?
Yes that's what I mean. AFAIK relocation in SPL is still called relocation from what I have seen working on U-boot, it just relocates gd and stack but not the actual code (personally I find it misleading calling what SPL does "relocation", but I got used to it).
You can use global_data to store state,
I thought the idea was to stay away from gd, so we can eventually get rid of it altogether?
Not that I know of. It is how we communicate state before we have BSS.
Oh ok I see now, I guess I have taken the comment [1] from arch/arm/lib/spl.c out of context.
[1] https://github.com/u-boot/u-boot/blob/master/arch/arm/lib/spl.c#L22
Ah yes. That is referring to putting global_data in the data section. Perhaps we can delete that code now and see what breaks?
It is how we communicate state before we have BSS.
Understood.
or malloc() to allocate memory and put things there.
The challenge with potentially having to incorporate such a custom solution for state preservation into several drivers as explained earlier (SPI, USB, network, etc.) is that it does not appear to scale well. I think using BSS instead would make all those additions cleaner and simpler.
But using BSS seems wrong to me.
I've seen you saying this a few times :) Why?
Driver model does its own allocation of memory and this is all attached to the DM structures.
Drivers themselves cannot assume they are the only instance running, so data should be attached to their private-data pointers. Similarly for uclasses, if we put everything in the uclass-private data, then we don't need BSS and don't have any problems dealing with whether it is available yet. In general, BSS creates a lot of problems early in U-Boot's execution, and we don't actually need to use it.
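A minimal sketch of the private-data idea, not the actual driver model API: each device instance owns its state through a pointer, so no file-scope (BSS) variable is needed and two instances cannot trample each other. All names here are hypothetical, and calloc() merely stands in for the allocation that driver model would do on the driver's behalf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-device state; in U-Boot this would be driver private data. */
struct media_priv_sketch {
	bool init_done;
	int init_count;
};

/* Hypothetical stand-in for a DM device carrying a private-data pointer. */
struct device_sketch {
	const char *name;
	void *priv;
};

/* Allocate zeroed private data for one device instance; 0 on success. */
static int device_sketch_bind(struct device_sketch *dev, const char *name)
{
	assert(dev);
	dev->name = name;
	dev->priv = calloc(1, sizeof(struct media_priv_sketch));
	return dev->priv ? 0 : -1;
}

/* Initialise the peripheral at most once per device, tracked in its priv. */
static void device_sketch_init(struct device_sketch *dev)
{
	struct media_priv_sketch *priv = dev->priv;

	if (priv->init_done)
		return;
	priv->init_count++;		/* stands in for the real hardware setup */
	priv->init_done = true;
}

/* Release the private data again. */
static void device_sketch_unbind(struct device_sketch *dev)
{
	free(dev->priv);
	dev->priv = NULL;
}
```

The "init already done" state lives with the device, so whether BSS is available at that point no longer matters.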
Yes, understood. The DM concept with private-data pointers is quite clean from an OOP point of view.
If you look at the DM design you'll see that we try to avoid malloc() and BSS as much as possible. I suppose this series is another example of why :-)
If you are doing something in board_init_f() in SPL that needs BSS, can you not just move that code to board_init_r()?
I need to access media drivers in board_init_f(), for which currently I'm using BSS so I can preserve some limited state, such as that the peripheral init was done, so that it doesn't get re-done by the actual SPL loader later. board_init_r() requires DDR to be available which I can't use without doing all that work in board_init_f() first to load/start the system controller firmware, so it's a bit of a chicken and egg issue here.
Let's try moving the egg into board_init_r() and putting the chicken after it, as mentioned above.
Well I'll give this a shot.
My system firmware loader patch series is about ready and I was planning on posting it tomorrow. How about with the entire approach being in the open we use this as an opportunity to re-look at potential alternative solutions...
Sure. I hope I've explained my POV above.
Yes thanks for the review and comments.
You're welcome, and good luck with it.
Regards, Simon

Hi Simon,
On Wed, 22 May 2019 at 02:05, Simon Goldschmidt simon.k.r.goldschmidt@gmail.com wrote:
On Wed, May 22, 2019 at 2:53 AM Simon Glass sjg@chromium.org wrote:
Hi Andreas,
On Tue, 21 May 2019 at 15:01, Andreas Dannenberg dannenberg@ti.com wrote:
Hi Simon (Glass),
On Sat, May 18, 2019 at 10:08:19AM -0600, Simon Glass wrote:
Hi Andreas,
On Mon, 6 May 2019 at 22:49, Andreas Dannenberg dannenberg@ti.com wrote:
Hi Simon,
On Mon, May 06, 2019 at 09:51:56PM -0600, Simon Glass wrote:
Hi Andreas,
On Fri, 3 May 2019 at 14:25, Andreas Dannenberg dannenberg@ti.com wrote:
>
> Simon,
>
> On Sat, Mar 30, 2019 at 09:18:08PM +0100, Simon Goldschmidt wrote:
> > Simon Glass sjg@chromium.org wrote on Sat, 30 Mar 2019, 21:06:
> >
> > > Hi Simon,
> > >
> > > On Wed, 27 Mar 2019 at 13:40, Simon Goldschmidt
> > > simon.k.r.goldschmidt@gmail.com wrote:
> > > >
> > > > This introduces a new Kconfig option SPL_CLEAR_BSS_F. If enabled, it
> > > > clears the bss before calling board_init_f() instead of clearing it
> > > > before calling board_init_r().
> > > >
> > > > This also ensures that variables placed in BSS can be shared between
> > > > board_init_f() and board_init_r() in SPL.
> > > >
> > > > Such global variables are used, for example, when loading things from FAT
> > > > before SDRAM is available: the full heap required for FAT uses global
> > > > variables and clearing BSS after board_init_f() would reset the heap
> > > > state.
> > > > An example for such a usage is socfpga_arria10 where an FPGA configuration
> > > > is required before SDRAM can be used.
> > > >
> > > > Make the new option depend on ARM for now until more implementations
> > > > follow.
> > >
> > > I still have objections to this series and I think we should discuss
> > > other ways of solving this problem.
> > >
> > > Does socfpga have SRAM that could be used before SDRAM is available?
> > > If so, can we not use that for the configuration? What variables are
> > > actually in BSS that are needed before board_init_r() is called? Can
> > > they not be in a struct created from malloc()?
> >
> > The problem is the board needs to load an FPGA configuration from FAT
> > before SDRAM is available. Yes, this is loaded into SRAM of course, but the
> > whole code until that is done uses so many malloc/free iterations that the
> > simple malloc implementation would require too much memory.
> >
> > And it's the full malloc state variables only that use BSS, not the FAT
> > code.
>
> I've actually faced very similar issues working on our TI AM654x "System
> Firmware Loader" implementation (will post upstream soon), where I need
> to load this firmware and other files from media such as MMC/FAT in a very
> memory-constrained SPL pre-relocation environment *before* I can bring up
> DDR.
>
> Initially, I modified the fat.c driver to re-use memory so it is not as
> wasteful during SYS_MALLOC_SIMPLE. While I'm not proud of this solution [1]
> this allowed us to get going, allowing to load multiple files without
> issues in pre-relocation SPL.
That seems to point the way to a useful solution I think. We could have a struct containing allocated pointers which is private to FAT, and just allocate them the first time.
The board_init_f()-based loader solution we use extends beyond MMC/FAT, but also for OSPI, X/Y-Modem, and (later) USB, network, etc.
Background: On our "TI K3" devices we need to do a whole bunch of stuff before DDR is up with limited memory, namely loading and installing a firmware that controls the entire SoC called "System Firmware". It is only after this FW is loaded from boot media and successfully started that I can bring up DDR. So all this is done in SPL board_init_f(), which as the last step brings up DDR.
Not having BSS available to carry over certain state to the board_init_r() world would lead to a bunch of hacky changes across the board I'm afraid, more below.
This is really unfortunate.
It seems to me that we have two choices:
- Hack around with board_init_f() such as to remove the distinction between this and board_init_r().
- Enter board_init_r() without DRAM ready, and deal with setting it up there.
I feel that the second solution is worth exploring. We could have some board-specific init in board_init_r(). We already have spl_board_init() so perhaps we could have spl_early_board_init() called right near the top?
We can refactor a few of the functions in spl/spl.c so they can be called from board-specific code if necessary. We could also add new flags to global_data to control the behaviour of the SPL code, and the board code could set these.
Let me explore this option. I can probably make something work but I don't yet see how to do it cleanly, maybe it becomes clearer once I put some code down. Currently by definition board_init_r() has DDR available, and much of the code is geared towards it (for example the calling of spl_relocate_stack_gd() before entering board_init_r() which will already switch over to DDR).
Also, and not to discourage that we can't improve upon the status quo, but there is already a ton of boards using such an "early BSS" scheme...
$ git grep --show-function 'memset.*bss' | grep board_init_f | wc -l
35
Yes I know :-(
We should migrate these boards to use the generic SPL framework.
socfpga_gen5 is one of the architectures listed here. I'm not even sure whether that's actually needed. However, it's hard to test, isn't it? How do you actually tell BSS isn't used before entering board_init_r?
One way might be to link the SPL code without the call to board_init_r() and then check the map to make sure BSS is empty.
To be sure, we'd need to initialize unused memory to some magic constant and check that it has been left untouched later (on boards where BSS is available in board_init_r and remains in place when moving on).
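The runtime check described here can be sketched as follows. This is only an illustration: the poison byte and both helper names are made up, and on a real board the region would be bounded by the linker-provided BSS start/end symbols rather than a plain buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Arbitrary magic byte used to pre-fill the region (an assumption). */
#define BSS_POISON_SKETCH 0xA5u

/* Fill the region with the poison byte before the early phase runs. */
static void poison_region(uint8_t *start, size_t len)
{
	assert(start);
	for (size_t i = 0; i < len; i++)
		start[i] = BSS_POISON_SKETCH;
}

/*
 * Return the offset of the first byte that no longer holds the poison
 * value, or len if the whole region is still untouched.
 */
static size_t first_touched(const uint8_t *start, size_t len)
{
	assert(start);
	for (size_t i = 0; i < len; i++)
		if (start[i] != BSS_POISON_SKETCH)
			return i;
	return len;
}
```

Calling poison_region() before board_init_f() and first_touched() right before BSS is cleared would flag early writes. A write that happens to store the poison value itself goes unnoticed, and reads of BSS are not caught at all, which is one more argument for a build-time check.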
I prefer a build-time check. We might even be able to automate it.
Regards, Simon
Regards, Simon

Simon Glass sjg@chromium.org wrote on Wed, 22 May 2019, 21:34:
Also, and not to discourage that we can't improve upon the status quo, but there is already a ton of boards using such an "early BSS" scheme...
$ git grep --show-function 'memset.*bss' | grep board_init_f | wc -l
35
Yes I know :-(
We should migrate these boards to use the generic SPL framework.
socfpga_gen5 is one of the architectures listed here. I'm not even sure whether that's actually needed. However, it's hard to test, isn't it? How do you actually tell BSS isn't used before entering board_init_r?
One way might be to link the SPL code without the call to board_init_r() and then check the map to make sure BSS is empty.
That would be worth a try.
To be sure, we'd need to initialize unused memory to some magic constant and check that it has been left untouched later (on boards where BSS is available in board_init_r and remains in place when moving on).
I prefer a build-time check. We might even be able to automate it.
Of course a build-time check is better here than a runtime check. I just couldn't come up with one.
I think it would be really beneficial to add such a check to all boards so we know what we're discussing: I'll bet there are many of them just using bss early because no one noticed...
And in board-local code, that could even be ok, it's just not ok for code shared with boards not having access to bss early.
Regards, Simon
Regards, Simon
Regards, Simon

Hi Simon,
On Wed, 22 May 2019 at 13:42, Simon Goldschmidt simon.k.r.goldschmidt@gmail.com wrote:
Simon Glass sjg@chromium.org wrote on Wed, 22 May 2019, 21:34:
Also, and not to discourage that we can't improve upon the status quo, but there is already a ton of boards using such an "early BSS" scheme...
$ git grep --show-function 'memset.*bss' | grep board_init_f | wc -l
35
Yes I know :-(
We should migrate these boards to use the generic SPL framework.
socfpga_gen5 is one of the architectures listed here. I'm not even sure whether that's actually needed. However, it's hard to test, isn't it? How do you actually tell BSS isn't used before entering board_init_r?
One way might be to link the SPL code without the call to board_init_r() and then check the map to make sure BSS is empty.
That would be worth a try.
Yes, I did something like this a few years back and it worked OK.
The way I did it (from my fading memory) was using an if() around the call, or perhaps by removing symbols using a linker option... I can't remember. But in any case garbage collection removed dependent code.
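A rough sketch of the if() variant, under the assumption that a compile-time constant guards the call; the macro and function names are invented for illustration and this is not taken from any actual patch. With the constant set to 1, nothing references the late-phase function any more, so the compiler and linker garbage collection can discard it together with the data only it uses.

```c
#include <assert.h>

/* Hypothetical build switch: define as 1 to build the "early only" image. */
#ifndef SKETCH_EARLY_ONLY
#define SKETCH_EARLY_ONLY 0
#endif

/* Zero-initialised file-scope state, so it is placed in .bss. */
static int late_state;

/* Early phase: must not touch .bss, so it works on caller-provided memory. */
static void board_init_f_sketch(int *early_calls)
{
	assert(early_calls);
	(*early_calls)++;
}

/* Late phase: the only user of late_state. */
static void board_init_r_sketch(void)
{
	late_state++;
}

/* Entry point: the late call disappears when SKETCH_EARLY_ONLY is 1. */
static void spl_entry_sketch(int *early_calls)
{
	board_init_f_sketch(early_calls);
	if (!SKETCH_EARLY_ONLY)
		board_init_r_sketch();
}
```

Building such an image with -DSKETCH_EARLY_ONLY=1 plus -ffunction-sections -fdata-sections and -Wl,--gc-sections,-Map=spl.map, then inspecting the map file, should show whether any .bss input sections are still pulled in by the early code alone. How well this maps onto the real SPL link is something that would need trying out.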
To be sure, we'd need to initialize unused memory to some magic constant and check that it has been left untouched later (on boards where BSS is available in board_init_r and remains in place when moving on).
I prefer a build-time check. We might even be able to automate it.
Of course a build-time check is better here than a runtime check. I just couldn't come up with one.
I think it would be really beneficial to add such a check to all boards so we know what we're discussing: I'll bet there are many of them just using bss early because no one noticed...
And in board-local code, that could even be ok, it's just not ok for code shared with boards not having access to bss early.
Yes this would be a really great thing to have.
[..]
Regards, Simon

Hi Simon (Glass)
On Tue, May 21, 2019 at 06:53:31PM -0600, Simon Glass wrote: <snip>
Not having BSS available to carry over certain state to the board_init_r() world would lead to a bunch of hacky changes across the board I'm afraid, more below.
This is really unfortunate.
It seems to me that we have two choices:
- Hack around with board_init_f() such as to remove the distinction between this and board_init_r().
- Enter board_init_r() without DRAM ready, and deal with setting it up there.
I feel that the second solution is worth exploring. We could have some board-specific init in board_init_r(). We already have spl_board_init() so perhaps we could have spl_early_board_init() called right near the top?
We can refactor a few of the functions in spl/spl.c so they can be called from board-specific code if necessary. We could also add new flags to global_data to control the behaviour of the SPL code, and the board code could set these.
I have an alternative solution working, that essentially makes board_init_f() more useful. I understand that this is not what you wanted to see but I wanted to throw it out here anyways so we can have another look at it.
Please see attached RFC for the general concept of allowing to move BSS setup prior to board_init_f() for SPL via a Kconfig option. This should also allow a few folks to get rid of the "hacky" memset() calls to manually clear BSS in board_init_f() and with this bring some cleanup across the board (no pun intended). Of course such a solution would need to go along with comment/documentation updates that are not yet included in this RFC.
Background: I played with adding a hook early into SPL's board_init_r() but, as expected, it was not very straightforward. One challenge, for example, is that gd/stack are "relocated" to DDR prior to board_init_r(). Since I don't have DDR until I can use BSS to bring up DDR, the hook in board_init_r() would have to bring up DDR itself, and I couldn't see a good way to first skip and then re-do some of the work usually done in crt0.S after my early board_init_r() hook has run, without making changes to crt0.S itself...
I'm still thinking about it...
-- Andreas Dannenberg Texas Instruments Inc

Simon & Simon,
On Wed, May 22, 2019 at 05:08:40PM -0500, Andreas Dannenberg wrote:
Hi Simon (Glass)
On Tue, May 21, 2019 at 06:53:31PM -0600, Simon Glass wrote:
<snip>
> > > > Not having BSS available to carry over certain state to the
> > > > board_init_r() world would lead to a bunch of hacky changes across
> > > > the board I'm afraid, more below.
> > >
> > > This is really unfortunate.
> > >
> > > It seems to me that we have two choices:
> > >
> > > 1. Hack around with board_init_f() such as to remove the distinction
> > > between this and board_init_r().
> > >
> > > 2. Enter board_init_r() without DRAM ready, and deal with setting it up there.
> > >
> > > I feel that the second solution is worth exploring. We could have some
> > > board-specific init in board_init_r(). We already have
> > > spl_board_init() so perhaps we could have spl_early_board_init()
> > > called right near the top?
> > >
> > > We can refactor a few of the functions in spl/spl.c so they can be
> > > called from board-specific code if necessary. We could also add new
> > > flags to global_data to control the behaviour of the SPL code, and the
> > > board code could set these.
Ok I just realized yesterday after I sent that RFC that it was essentially the same approach that was already part of Simon's patch series here... give or take that my approach was using a macro to avoid the duplication of BSS clearing code in crt0.S. So it's not really adding to the discussion here of coming up with something entirely different. Should have looked at the original patch more closely, my bad.
-- Andreas Dannenberg Texas Instruments Inc

This implements the new option to clear BSS early in SPL for standard arm and arm64 crt0.
With this option, BSS is cleared before calling board_init_f(). It is then not cleared again before calling board_init_r(), since BSS is not relocated in SPL.
Signed-off-by: Simon Goldschmidt <simon.k.r.goldschmidt@gmail.com>
---

Changes in v3: None
Changes in v2:
- add CONFIG_SPL_CLEAR_BSS_F implementation for arm64 also

 arch/arm/lib/crt0.S    | 22 ++++++++++++++++++++++
 arch/arm/lib/crt0_64.S | 14 ++++++++++++++
 2 files changed, 36 insertions(+)
diff --git a/arch/arm/lib/crt0.S b/arch/arm/lib/crt0.S
index fe312db690..b06e54e144 100644
--- a/arch/arm/lib/crt0.S
+++ b/arch/arm/lib/crt0.S
@@ -80,6 +80,26 @@ ENTRY(_main)
 	mov	r9, r0
 	bl	board_init_f_init_reserve

+#if defined(CONFIG_SPL_BUILD) && defined(CONFIG_SPL_CLEAR_BSS_F)
+	ldr	r0, =__bss_start
+
+#ifdef CONFIG_USE_ARCH_MEMSET
+	ldr	r3, =__bss_end
+	mov	r1, #0x00000000		/* prepare zero to clear BSS */
+
+	subs	r2, r3, r0		/* r2 = memset len */
+	bl	memset
+#else
+	ldr	r1, =__bss_end
+	mov	r2, #0x00000000		/* prepare zero to clear BSS */
+
+clbss_l:cmp	r0, r1			/* while not at end of BSS */
+	strlo	r2, [r0]		/* clear 32-bit BSS word */
+	addlo	r0, r0, #4		/* move to next */
+	blo	clbss_l
+#endif
+#endif
+
 	mov	r0, #0
 	bl	board_init_f

@@ -124,6 +144,7 @@ here:
 	movne	sp, r0
 	movne	r9, r0
 # endif
+#if !defined(CONFIG_SPL_BUILD) || !defined(CONFIG_SPL_CLEAR_BSS_F)
 	ldr	r0, =__bss_start	/* this is auto-relocated! */

 #ifdef CONFIG_USE_ARCH_MEMSET
@@ -141,6 +162,7 @@ clbss_l:cmp	r0, r1			/* while not at end of BSS */
 	addlo	r0, r0, #4		/* move to next */
 	blo	clbss_l
 #endif
+#endif

 #if ! defined(CONFIG_SPL_BUILD)
 	bl	coloured_LED_init
diff --git a/arch/arm/lib/crt0_64.S b/arch/arm/lib/crt0_64.S
index d6b632aa87..82f643f737 100644
--- a/arch/arm/lib/crt0_64.S
+++ b/arch/arm/lib/crt0_64.S
@@ -86,6 +86,18 @@ ENTRY(_main)
 	mov	x18, x0
 	bl	board_init_f_init_reserve

+#if defined(CONFIG_SPL_BUILD) && defined(CONFIG_SPL_CLEAR_BSS_F)
+/*
+ * Clear BSS section
+ */
+	ldr	x0, =__bss_start
+	ldr	x1, =__bss_end
+clear_loop:
+	str	xzr, [x0], #8
+	cmp	x0, x1
+	b.lo	clear_loop
+#endif
+
 	mov	x0, #0
 	bl	board_init_f

@@ -136,6 +148,7 @@ relocation_return:
 	mov	sp, x0
 #endif

+#if !defined(CONFIG_SPL_BUILD) || !defined(CONFIG_SPL_CLEAR_BSS_F)
 /*
  * Clear BSS section
  */
@@ -145,6 +158,7 @@ clear_loop:
 	str	xzr, [x0], #8
 	cmp	x0, x1
 	b.lo	clear_loop
+#endif

 	/* call board_init_r(gd_t *id, ulong dest_addr) */
 	mov	x0, x18			/* gd_t */
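
For readers less familiar with ARM assembly, the 32-bit fallback loop (the branch taken when CONFIG_USE_ARCH_MEMSET is not set) can be approximated in C. This is only a sketch: clear_bss_words() is a hypothetical name used for illustration, and the real code works on the linker symbols __bss_start and __bss_end rather than on pointers passed in by a caller.

```c
#include <stdint.h>

/*
 * Hypothetical C model of the 32-bit clbss_l loop: store zero words
 * from 'start' up to (not including) 'end'. As in the assembly, the
 * bound is checked before each store, so an empty range writes nothing.
 */
void clear_bss_words(uint32_t *start, uint32_t *end)
{
	uint32_t *p = start;	/* r0 */

	while (p < end) {	/* cmp r0, r1 ... blo clbss_l */
		*p = 0;		/* strlo r2, [r0] */
		p++;		/* addlo r0, r0, #4 */
	}
}
```

Note that the arm64 loop in the patch is shaped differently: it stores first and compares afterwards, so it always writes at least one 8-byte word.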

If the malloc range passed to mem_malloc_init() is at the end of the address range and 'start + size' overflows to 0, subsequent allocations fail because mem_malloc_end is zero (which looks like an uninitialized heap).
Fix this by subtracting 1 if 'start + size' overflows to zero.
Signed-off-by: Simon Goldschmidt <simon.k.r.goldschmidt@gmail.com>
---

Changes in v3: None
Changes in v2: None

 common/dlmalloc.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/common/dlmalloc.c b/common/dlmalloc.c
index edaad299bb..51d3bd671a 100644
--- a/common/dlmalloc.c
+++ b/common/dlmalloc.c
@@ -603,6 +603,10 @@ void mem_malloc_init(ulong start, ulong size)
 	mem_malloc_start = start;
 	mem_malloc_end = start + size;
 	mem_malloc_brk = start;
+	if (start && size && !mem_malloc_end) {
+		/* overflow: malloc area is at end of address range */
+		mem_malloc_end--;
+	}

 	debug("using memory %#lx-%#lx for malloc()\n", mem_malloc_start,
 	      mem_malloc_end);
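
The wrap-around described in the commit message can be reproduced on a host with a fixed-width type. The sketch below assumes a 32-bit ulong (modelled with uint32_t) and uses a hypothetical helper, malloc_end_for(), that mirrors only the end-address computation from the patch; the addresses mentioned afterwards are made up for illustration.

```c
#include <stdint.h>

/*
 * Model of the end-address computation in mem_malloc_init() on a
 * 32-bit target. uint32_t stands in for a 32-bit ulong (assumption).
 */
uint32_t malloc_end_for(uint32_t start, uint32_t size)
{
	uint32_t end = start + size;	/* wraps modulo 2^32 */

	if (start && size && !end) {
		/* overflow: malloc area is at end of address range */
		end--;
	}
	return end;
}
```

With start = 0xFFFF0000 and size = 0x10000 the sum wraps to 0, and the check moves the end to 0xFFFFFFFF. A range that does not reach the top of the address space is returned unchanged.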

Convert debug output from '%#lx' to '0x%lx' to be compatible with tiny printf used in SPL.
Signed-off-by: Simon Goldschmidt <simon.k.r.goldschmidt@gmail.com>
---

Changes in v3: None
Changes in v2: None

 common/dlmalloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/common/dlmalloc.c b/common/dlmalloc.c
index 51d3bd671a..af6f43dcc9 100644
--- a/common/dlmalloc.c
+++ b/common/dlmalloc.c
@@ -608,7 +608,7 @@ void mem_malloc_init(ulong start, ulong size)
 		mem_malloc_end--;
 	}

-	debug("using memory %#lx-%#lx for malloc()\n", mem_malloc_start,
+	debug("using memory 0x%lx-0x%lx for malloc()\n", mem_malloc_start,
 	      mem_malloc_end);
 #ifdef CONFIG_SYS_MALLOC_CLEAR_ON_INIT
 	memset((void *)mem_malloc_start, 0x0, size);
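
As a side note on the two format strings: with a full C library printf they are equivalent for non-zero values, but not for zero, because the '#' flag only adds the 0x prefix to non-zero results. The host-side sketch below illustrates this with the standard snprintf(); formats_match() is a hypothetical helper and says nothing about how tiny printf itself is implemented.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if "%#lx" and "0x%lx" render 'val' identically, else 0. */
int formats_match(unsigned long val)
{
	char alt[32];
	char plain[32];

	snprintf(alt, sizeof(alt), "%#lx", val);	/* needs '#' flag support */
	snprintf(plain, sizeof(plain), "0x%lx", val);	/* literal prefix + %lx */

	return strcmp(alt, plain) == 0;
}
```

For a typical heap address such as 0x1000 both strings are "0x1000". For 0, "%#lx" yields "0" while "0x%lx" yields "0x0", so the debug output after this patch differs slightly in that one case.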

Some platforms (like socfpga A10) need a big heap before SDRAM is available (e.g. because FAT is used). For such platforms, simple_malloc is often not a good option as it does not support freeing memory. These platforms often use the non-Kconfig defines CONFIG_SYS_SPL_MALLOC_START (and its SIZE).
This patch allows enabling CONFIG_SPL_SYS_MALLOC_F_LEN while leaving CONFIG_SPL_SYS_MALLOC_SIMPLE disabled. In this case, the full malloc heap is made available as early as the simple_malloc heap would be normally.
This way, platforms can drop the non-Kconfig options to set up the full heap and rely on the same automatically calculated heap allocation used for simple heap.
Signed-off-by: Simon Goldschmidt <simon.k.r.goldschmidt@gmail.com>
---

Changes in v3: None
Changes in v2:
- use if() instead of #if
- adapt documentation to using CONFIG_SPL_SYS_MALLOC_F_LEN for full-featured heap as well
- ensure SPL_CLEAR_BSS_F is set when using SYS_MALLOC_F_LEN for full featured heap (or else, the heap status stored in bss will be overwritten between board_init_f and board_init_r)

 Kconfig              | 24 ++++++++++++++++--------
 README               | 15 +++++++++++----
 common/spl/spl.c     | 10 ++++++++--
 drivers/core/Kconfig | 33 ++++++++++++++++-----------------
 4 files changed, 51 insertions(+), 31 deletions(-)
diff --git a/Kconfig b/Kconfig
index 305b265ed7..e4165692d1 100644
--- a/Kconfig
+++ b/Kconfig
@@ -155,22 +155,30 @@ config SYS_MALLOC_LEN
 config SPL_SYS_MALLOC_F_LEN
 	hex "Size of malloc() pool in SPL before relocation"
 	depends on SYS_MALLOC_F
+	depends on SPL_SYS_MALLOC_SIMPLE || SPL_CLEAR_BSS_F
 	default SYS_MALLOC_F_LEN
 	help
-	  Before relocation, memory is very limited on many platforms. Still,
-	  we can provide a small malloc() pool if needed. Driver model in
-	  particular needs this to operate, so that it can allocate the
-	  initial serial device and any others that are needed.
+	  Before relocation (before calling board_init_r, that is), memory is
+	  very limited on many platforms. Still, we can provide a small
+	  malloc() pool if needed. Driver model in particular needs this to
+	  operate, so that it can allocate the initial serial device and any
+	  others that are needed.
+	  This option controls the size of this initial malloc() pool by
+	  default.

 config TPL_SYS_MALLOC_F_LEN
 	hex "Size of malloc() pool in TPL before relocation"
 	depends on SYS_MALLOC_F
+	depends on TPL_SYS_MALLOC_SIMPLE || SPL_CLEAR_BSS_F
 	default SYS_MALLOC_F_LEN
 	help
-	  Before relocation, memory is very limited on many platforms. Still,
-	  we can provide a small malloc() pool if needed. Driver model in
-	  particular needs this to operate, so that it can allocate the
-	  initial serial device and any others that are needed.
+	  Before relocation (before calling board_init_r, that is), memory is
+	  very limited on many platforms. Still, we can provide a small
+	  malloc() pool if needed. Driver model in particular needs this to
+	  operate, so that it can allocate the initial serial device and any
+	  others that are needed.
+	  This option controls the size of this initial malloc() pool by
+	  default.

 menuconfig EXPERT
 	bool "Configure standard U-Boot features (expert users)"
diff --git a/README b/README
index c9a20db34f..7c0fb8e4a7 100644
--- a/README
+++ b/README
@@ -2462,13 +2462,19 @@ FIT uImage format:
 		CONFIG_SPL_STACK.

 		CONFIG_SYS_SPL_MALLOC_START
-		Starting address of the malloc pool used in SPL.
+		This is one way of providing the starting address of the malloc
+		pool used in SPL. If CONFIG_SPL_SYS_MALLOC_SIMPLE isn't set,
+		the full-featured heap will be used and it will allocate its
+		memory from the initial stack if CONFIG_SPL_SYS_MALLOC_F_LEN is
+		!= 0. If you need it to use a dedicated area, use this option
+		to set an absolute address for the initial heap.
 		When this option is set the full malloc is used in SPL and
 		it is set up by spl_init() and before that, the simple malloc()
-		can be used if CONFIG_SYS_MALLOC_F is defined.
+		can still be used if CONFIG_SPL_SYS_MALLOC_F_LEN is defined.

 		CONFIG_SYS_SPL_MALLOC_SIZE
-		The size of the malloc pool used in SPL.
+		The size of the malloc pool used in SPL if
+		CONFIG_SYS_SPL_MALLOC_START is set.

 		CONFIG_SPL_OS_BOOT
 		Enable booting directly to an OS from SPL.
@@ -2743,7 +2749,8 @@ Configuration Settings:
 - CONFIG_SYS_MALLOC_SIMPLE
 		Provides a simple and small malloc() and calloc() for those
 		boards which do not use the full malloc in SPL (which is
-		enabled with CONFIG_SYS_SPL_MALLOC_START).
+		enabled by default with CONFIG_SYS_SPL_MALLOC_START or
+		CONFIG_SPL_SYS_MALLOC_F_LEN).

 - CONFIG_SYS_NONCACHED_MEMORY:
 		Size of non-cached memory area. This area of memory will be
diff --git a/common/spl/spl.c b/common/spl/spl.c
index 88d4b8a9bf..dec06c6e07 100644
--- a/common/spl/spl.c
+++ b/common/spl/spl.c
@@ -383,8 +383,14 @@ static int spl_common_init(bool setup_malloc)
 #ifdef CONFIG_MALLOC_F_ADDR
 		gd->malloc_base = CONFIG_MALLOC_F_ADDR;
 #endif
-		gd->malloc_limit = CONFIG_VAL(SYS_MALLOC_F_LEN);
-		gd->malloc_ptr = 0;
+		if (CONFIG_IS_ENABLED(SYS_MALLOC_SIMPLE)) {
+			gd->malloc_limit = CONFIG_VAL(SYS_MALLOC_F_LEN);
+			gd->malloc_ptr = 0;
+		} else {
+			mem_malloc_init(gd->malloc_base,
+					CONFIG_VAL(SYS_MALLOC_F_LEN));
+			gd->flags |= GD_FLG_FULL_MALLOC_INIT;
+		}
 	}
 #endif
 	ret = bootstage_init(true);
diff --git a/drivers/core/Kconfig b/drivers/core/Kconfig
index ddf2fb3fb8..297c19383f 100644
--- a/drivers/core/Kconfig
+++ b/drivers/core/Kconfig
@@ -12,28 +12,27 @@ config SPL_DM
 	bool "Enable Driver Model for SPL"
 	depends on DM && SPL
 	help
-	  Enable driver model in SPL. You will need to provide a
-	  suitable malloc() implementation. If you are not using the
-	  full malloc() enabled by CONFIG_SYS_SPL_MALLOC_START,
-	  consider using CONFIG_SYS_MALLOC_SIMPLE. In that case you
-	  must provide CONFIG_SPL_SYS_MALLOC_F_LEN to set the size.
-	  In most cases driver model will only allocate a few uclasses
-	  and devices in SPL, so 1KB should be enable. See
-	  CONFIG_SPL_SYS_MALLOC_F_LEN for more details on how to enable it.
+	  Enable driver model in SPL. You will need to provide a suitable
+	  malloc() implementation. In most cases driver model will only
+	  allocate a few uclasses and devices in SPL wihout freeing them, so
+	  1KB should be enough.
+	  If full malloc() (default if CONFIG_SYS_SPL_MALLOC_START or
+	  CONFIG_SPL_SYS_MALLOC_F_LEN are set) is too big for your board,
+	  consider using CONFIG_SPL_SYS_MALLOC_SIMPLE (see help of that option
+	  or CONFIG_SPL_SYS_MALLOC_F_LEN for more info).

 config TPL_DM
 	bool "Enable Driver Model for TPL"
 	depends on DM && TPL
 	help
-	  Enable driver model in TPL. You will need to provide a
-	  suitable malloc() implementation. If you are not using the
-	  full malloc() enabled by CONFIG_SYS_SPL_MALLOC_START,
-	  consider using CONFIG_SYS_MALLOC_SIMPLE. In that case you
-	  must provide CONFIG_SPL_SYS_MALLOC_F_LEN to set the size.
-	  In most cases driver model will only allocate a few uclasses
-	  and devices in SPL, so 1KB should be enough. See
-	  CONFIG_SPL_SYS_MALLOC_F_LEN for more details on how to enable it.
-	  Disable this for very small implementations.
+	  Enable driver model in TPL. You will need to provide a suitable
+	  malloc() implementation. In most cases driver model will only
+	  allocate a few uclasses and devices in TPL wihout freeing them, so
+	  1KB should be enough.
+	  If full malloc() (default if CONFIG_SYS_SPL_MALLOC_START or
+	  CONFIG_TPL_SYS_MALLOC_F_LEN are set) is too big for your board,
+	  consider using CONFIG_TPL_SYS_MALLOC_SIMPLE (see help of that option
+	  or CONFIG_TPL_SYS_MALLOC_F_LEN for more info).

 config DM_WARN
 	bool "Enable warnings in driver model"
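
To illustrate why the commit message says the simple heap "does not support freeing memory", here is a deliberately simplified bump-allocator model. It is not U-Boot's simple malloc implementation: struct simple_heap, simple_alloc() and simple_free() are hypothetical names, the fields only loosely correspond to gd->malloc_base, gd->malloc_limit and gd->malloc_ptr from the hunk above, and alignment handling is left out.

```c
#include <stddef.h>

/* Hypothetical stand-in for the gd fields used by the simple heap. */
struct simple_heap {
	unsigned char *base;	/* cf. gd->malloc_base */
	size_t limit;		/* cf. gd->malloc_limit */
	size_t ptr;		/* cf. gd->malloc_ptr */
};

/* Bump allocation: hand out the next 'bytes', or fail if they do not fit. */
void *simple_alloc(struct simple_heap *h, size_t bytes)
{
	void *p;

	if (bytes > h->limit - h->ptr)
		return NULL;
	p = h->base + h->ptr;
	h->ptr += bytes;
	return p;
}

/* Nothing is ever returned to the pool in this model. */
void simple_free(struct simple_heap *h, void *p)
{
	(void)h;
	(void)p;
}
```

In this model a 16-byte pool that has served one 12-byte request cannot serve a second one even after the first has been "freed". That is the kind of behaviour that makes such an allocator a poor fit for code that allocates and releases buffers repeatedly, as the commit message suggests is the case for FAT.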

Instead of fixing the SPL heap to 64 KiB in the board config header via CONFIG_SYS_SPL_MALLOC_SIZE, let's just use CONFIG_SPL_SYS_MALLOC_F_LEN in the defconfig.
This also has the advantage that it removes sub-mach specific ifdefs in socfpga_common.h.
Signed-off-by: Simon Goldschmidt <simon.k.r.goldschmidt@gmail.com>
---

Changes in v3:
- fixed summary ("stack" -> "heap")
- enable CONFIG_SPL_CLEAR_BSS_F for socfpga_arria10 using full malloc early in SPL
- rebased

Changes in v2: None

 configs/socfpga_arria10_defconfig |  2 ++
 include/configs/socfpga_common.h  | 14 --------------
 2 files changed, 2 insertions(+), 14 deletions(-)
diff --git a/configs/socfpga_arria10_defconfig b/configs/socfpga_arria10_defconfig
index f321a0ac3b..094232e847 100644
--- a/configs/socfpga_arria10_defconfig
+++ b/configs/socfpga_arria10_defconfig
@@ -2,6 +2,7 @@ CONFIG_ARM=y
 CONFIG_ARCH_SOCFPGA=y
 CONFIG_SYS_TEXT_BASE=0x01000040
 CONFIG_SYS_MALLOC_F_LEN=0x2000
+CONFIG_SPL_SYS_MALLOC_F_LEN=0x10000
 CONFIG_TARGET_SOCFPGA_ARRIA10_SOCDK=y
 CONFIG_SPL=y
 CONFIG_IDENT_STRING="socfpga_arria10"
@@ -13,6 +14,7 @@ CONFIG_BOOTARGS="console=ttyS0,115200"
 CONFIG_DEFAULT_FDT_FILE="socfpga_arria10_socdk_sdmmc.dtb"
 CONFIG_DISPLAY_BOARDINFO_LATE=y
 CONFIG_BOUNCE_BUFFER=y
+CONFIG_SPL_CLEAR_BSS_F=y
 CONFIG_SPL_FPGA_SUPPORT=y
 CONFIG_SPL_SPI_LOAD=y
 CONFIG_CMD_ASKENV=y
diff --git a/include/configs/socfpga_common.h b/include/configs/socfpga_common.h
index 09c9b7ca9e..24f8665c24 100644
--- a/include/configs/socfpga_common.h
+++ b/include/configs/socfpga_common.h
@@ -252,16 +252,6 @@ unsigned int cm_get_qspi_controller_clk_hz(void);
 #define CONFIG_SPL_MAX_SIZE		CONFIG_SYS_INIT_RAM_SIZE
 #endif

-#if defined(CONFIG_TARGET_SOCFPGA_ARRIA10)
-/* SPL memory allocation configuration, this is for FAT implementation */
-#ifndef CONFIG_SYS_SPL_MALLOC_START
-#define CONFIG_SYS_SPL_MALLOC_SIZE	0x00010000
-#define CONFIG_SYS_SPL_MALLOC_START	(CONFIG_SYS_INIT_RAM_SIZE - \
-					 CONFIG_SYS_SPL_MALLOC_SIZE + \
-					 CONFIG_SYS_INIT_RAM_ADDR)
-#endif
-#endif
-
 /* SPL SDMMC boot support */
 #ifdef CONFIG_SPL_MMC_SUPPORT
 #if defined(CONFIG_SPL_FS_FAT) || defined(CONFIG_SPL_FS_EXT4)
@@ -295,11 +285,7 @@ unsigned int cm_get_qspi_controller_clk_hz(void);
 /*
  * Stack setup
  */
-#if defined(CONFIG_TARGET_SOCFPGA_GEN5)
 #define CONFIG_SPL_STACK		CONFIG_SYS_INIT_SP_ADDR
-#elif defined(CONFIG_TARGET_SOCFPGA_ARRIA10)
-#define CONFIG_SPL_STACK		CONFIG_SYS_SPL_MALLOC_START
-#endif

 /* Extra Environment */
 #ifndef CONFIG_SPL_BUILD
participants (4)
- Andreas Dannenberg
- Heiko Schocher
- Simon Glass
- Simon Goldschmidt