[U-Boot] [PATCH 0/4] sunxi: SPL size reduction patches

Hi All,
Inspired by Simon's work to make the FEL SPL and regular SPL builds more similar I've been looking into reducing the size of the SPL, resulting in the following patch series. This all seems quite safe, but we are past rc1, so may be best to keep this on a branch for now, or not ...
Simon, I think the malloc_simple changes should go through your tree, once those are merged I'll add the sunxi patch enabling malloc_simple usage in the SPL.
With this series we're already quite close to getting a full-blown SPL to fit in 16K, I've looked at removing CONFIG_SPL_LIBCOMMON_SUPPORT but that won't fly well I think. We could however add a CONFIG_SPL_NO_PRINTF, which automatically gets set when CONFIG_SPL_LIBCOMMON_SUPPORT is not set, and use that on sunxi, assuming that you (Simon) are ok with not being able to use printf in the device-model code which gets used in the SPL.
Things already almost built when not setting CONFIG_SPL_LIBCOMMON_SUPPORT since there are already a lot of places checking for that before calling printf. We could change all the CONFIG_SPL_LIBCOMMON_SUPPORT to CONFIG_SPL_NO_PRINTF checks and define CONFIG_SPL_NO_PRINTF on sunxi to win another "large" chunk of RAM.
Something else to look at is at the memory map of the first 32k of SRAM when in FEL mode. The only documentation we've is: https://github.com/hno/Allwinner-Info/blob/master/FEL-usb/USB-protocol.txt
Which is reverse engineered and not 100% clear on the memory map. We could add a memset to 0xaa for 0x6000 - 0x8000 to the fel spl as a test, as I've the feeling that what hno has found there are just scratch buffers and that using that in the SPL will be save, if that is the case we could use 0x2000 - 0x0000 (growing downwards) as stack, and always use a text-base of 0x2000 for the SPL, with the SPL code / data segments living at 0x2000 - 0x8000 and then everything should fit as is, and we can have one unified SPL binary for both FEL and sdcard booting.
I don't have the time to look into this further atm, so I hope that one of you (Simon or Siarhei) has time to look into this further.
I see that we also never merged this related fix: http://lists.denx.de/pipermail/u-boot/2014-July/183986.html
We should probably merge a version of that when Simon's patches for fixing FEL have landed. I think it may be best to just always set the L2EN bit on sun4i/sun5i, without detecting whether we're in fel mode or not, setting it when it is already said should be a nop, and thus harmless.
Regards,
Hans

Move the dram helper functions to a separate C file, rather then having them as inline helpers in dram.h. This saves 144 bytes in the .text segment for sun6i builds.
Signed-off-by: Hans de Goede hdegoede@redhat.com --- arch/arm/cpu/armv7/sunxi/Makefile | 1 + arch/arm/cpu/armv7/sunxi/dram_helpers.c | 37 +++++++++++++++++++++++++++++++++ arch/arm/include/asm/arch-sunxi/dram.h | 28 ++----------------------- 3 files changed, 40 insertions(+), 26 deletions(-) create mode 100644 arch/arm/cpu/armv7/sunxi/dram_helpers.c
diff --git a/arch/arm/cpu/armv7/sunxi/Makefile b/arch/arm/cpu/armv7/sunxi/Makefile index f8a6bea..bf95928 100644 --- a/arch/arm/cpu/armv7/sunxi/Makefile +++ b/arch/arm/cpu/armv7/sunxi/Makefile @@ -12,6 +12,7 @@ obj-y += timer.o obj-y += board.o obj-y += clock.o obj-y += cpu_info.o +obj-y += dram_helpers.o obj-y += pinmux.o ifndef CONFIG_MACH_SUN9I obj-y += usbc.o diff --git a/arch/arm/cpu/armv7/sunxi/dram_helpers.c b/arch/arm/cpu/armv7/sunxi/dram_helpers.c new file mode 100644 index 0000000..9a94e1b --- /dev/null +++ b/arch/arm/cpu/armv7/sunxi/dram_helpers.c @@ -0,0 +1,37 @@ +/* + * DRAM init helper functions + * + * (C) Copyright 2015 Hans de Goede hdegoede@redhat.com + * + * SPDX-License-Identifier: GPL-2.0+ + */ + +#include <common.h> +#include <asm/io.h> +#include <asm/arch/dram.h> + +/* + * Wait up to 1s for value to be set in given part of reg. + */ +void mctl_await_completion(u32 *reg, u32 mask, u32 val) +{ + unsigned long tmo = timer_get_us() + 1000000; + + while ((readl(reg) & mask) != val) { + if (timer_get_us() > tmo) + panic("Timeout initialising DRAM\n"); + } +} + +/* + * Test if memory at offset offset matches memory at begin of DRAM + */ +bool mctl_mem_matches(u32 offset) +{ + /* Try to write different values to RAM at two addresses */ + writel(0, CONFIG_SYS_SDRAM_BASE); + writel(0xaa55aa55, CONFIG_SYS_SDRAM_BASE + offset); + /* Check if the same value is actually observed when reading back */ + return readl(CONFIG_SYS_SDRAM_BASE) == + readl(CONFIG_SYS_SDRAM_BASE + offset); +} diff --git a/arch/arm/include/asm/arch-sunxi/dram.h b/arch/arm/include/asm/arch-sunxi/dram.h index 7ff43e6..aedd194 100644 --- a/arch/arm/include/asm/arch-sunxi/dram.h +++ b/arch/arm/include/asm/arch-sunxi/dram.h @@ -25,31 +25,7 @@ #endif
unsigned long sunxi_dram_init(void); - -/* - * Wait up to 1s for value to be set in given part of reg. - */ -static inline void mctl_await_completion(u32 *reg, u32 mask, u32 val) -{ - unsigned long tmo = timer_get_us() + 1000000; - - while ((readl(reg) & mask) != val) { - if (timer_get_us() > tmo) - panic("Timeout initialising DRAM\n"); - } -} - -/* - * Test if memory at offset offset matches memory at begin of DRAM - */ -static inline bool mctl_mem_matches(u32 offset) -{ - /* Try to write different values to RAM at two addresses */ - writel(0, CONFIG_SYS_SDRAM_BASE); - writel(0xaa55aa55, CONFIG_SYS_SDRAM_BASE + offset); - /* Check if the same value is actually observed when reading back */ - return readl(CONFIG_SYS_SDRAM_BASE) == - readl(CONFIG_SYS_SDRAM_BASE + offset); -} +void mctl_await_completion(u32 *reg, u32 mask, u32 val); +bool mctl_mem_matches(u32 offset);
#endif /* _SUNXI_DRAM_H */

On 4 February 2015 at 05:05, Hans de Goede hdegoede@redhat.com wrote:
Move the dram helper functions to a separate C file, rather then having them as inline helpers in dram.h. This saves 144 bytes in the .text segment for sun6i builds.
Signed-off-by: Hans de Goede hdegoede@redhat.com
arch/arm/cpu/armv7/sunxi/Makefile | 1 + arch/arm/cpu/armv7/sunxi/dram_helpers.c | 37 +++++++++++++++++++++++++++++++++ arch/arm/include/asm/arch-sunxi/dram.h | 28 ++----------------------- 3 files changed, 40 insertions(+), 26 deletions(-) create mode 100644 arch/arm/cpu/armv7/sunxi/dram_helpers.c
Reviewed-by: Simon Glass sjg@chromium.org

Before this patch malloc_simple would always allocate a chunk of RAM from the stack. This commit adds a CONFIG_SYS_MALLOC_F_BASE define, which when set directly specifies the memory address to use for the heap with malloc_simple.
Signed-off-by: Hans de Goede hdegoede@redhat.com --- arch/arm/lib/crt0.S | 2 +- common/board_f.c | 4 ++++ common/spl/spl.c | 3 +++ 3 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/arm/lib/crt0.S b/arch/arm/lib/crt0.S index 22df3e5..a80dbf7 100644 --- a/arch/arm/lib/crt0.S +++ b/arch/arm/lib/crt0.S @@ -78,7 +78,7 @@ clr_gd: strlo r0, [r1] /* clear 32-bit GD word */ addlo r1, r1, #4 /* move to next */ blo clr_gd -#if defined(CONFIG_SYS_MALLOC_F_LEN) +#if defined(CONFIG_SYS_MALLOC_F_LEN) && !defined(CONFIG_SYS_MALLOC_F_BASE) sub sp, sp, #CONFIG_SYS_MALLOC_F_LEN str sp, [r9, #GD_MALLOC_BASE] #endif diff --git a/common/board_f.c b/common/board_f.c index 7953137..504dc1c 100644 --- a/common/board_f.c +++ b/common/board_f.c @@ -786,7 +786,11 @@ static int mark_bootstage(void) static int initf_malloc(void) { #ifdef CONFIG_SYS_MALLOC_F_LEN +#if defined(CONFIG_SYS_MALLOC_F_BASE) + gd->malloc_base = CONFIG_SYS_MALLOC_F_BASE; +#else assert(gd->malloc_base); /* Set up by crt0.S */ +#endif gd->malloc_limit = gd->malloc_base + CONFIG_SYS_MALLOC_F_LEN; gd->malloc_ptr = 0; #endif diff --git a/common/spl/spl.c b/common/spl/spl.c index daaeb50..f751fcc 100644 --- a/common/spl/spl.c +++ b/common/spl/spl.c @@ -146,6 +146,9 @@ void board_init_r(gd_t *dummy1, ulong dummy2) CONFIG_SYS_SPL_MALLOC_SIZE); gd->flags |= GD_FLG_FULL_MALLOC_INIT; #elif defined(CONFIG_SYS_MALLOC_F_LEN) +#if defined(CONFIG_SYS_MALLOC_F_BASE) + gd->malloc_base = CONFIG_SYS_MALLOC_F_BASE; +#endif gd->malloc_limit = gd->malloc_base + CONFIG_SYS_MALLOC_F_LEN; gd->malloc_ptr = 0; #endif

On 4 February 2015 at 05:05, Hans de Goede hdegoede@redhat.com wrote:
Before this patch malloc_simple would always allocate a chunk of RAM from the stack. This commit adds a CONFIG_SYS_MALLOC_F_BASE define, which when set directly specifies the memory address to use for the heap with malloc_simple.
Signed-off-by: Hans de Goede hdegoede@redhat.com
arch/arm/lib/crt0.S | 2 +- common/board_f.c | 4 ++++ common/spl/spl.c | 3 +++ 3 files changed, 8 insertions(+), 1 deletion(-)
Reviewed-by: Simon Glass sjg@chromium.org

All callers of malloc should already do error checking, and may even be able to continue without the alloc succeeding.
Moreover, common/malloc_simple.c is the only user of .rodata.str1.1 in common/built-in.o when building the SPL, triggering this gcc bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54303
Causing .rodata to grow with e.g. 0xc21 bytes, nullifying all benefits of using malloc_simple in the first place.
Signed-off-by: Hans de Goede hdegoede@redhat.com --- common/malloc_simple.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/common/malloc_simple.c b/common/malloc_simple.c index afdacff..64ae036 100644 --- a/common/malloc_simple.c +++ b/common/malloc_simple.c @@ -19,7 +19,7 @@ void *malloc_simple(size_t bytes)
new_ptr = gd->malloc_ptr + bytes; if (new_ptr > gd->malloc_limit) - panic("Out of pre-reloc memory"); + return NULL; ptr = map_sysmem(gd->malloc_base + gd->malloc_ptr, bytes); gd->malloc_ptr = ALIGN(new_ptr, sizeof(new_ptr)); return ptr;

On 4 February 2015 at 05:05, Hans de Goede hdegoede@redhat.com wrote:
All callers of malloc should already do error checking, and may even be able to continue without the alloc succeeding.
Moreover, common/malloc_simple.c is the only user of .rodata.str1.1 in common/built-in.o when building the SPL, triggering this gcc bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54303
Causing .rodata to grow with e.g. 0xc21 bytes, nullifying all benefits of using malloc_simple in the first place.
Signed-off-by: Hans de Goede hdegoede@redhat.com
common/malloc_simple.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
Great find, thanks!
Acked-by: Simon Glass sjg@chromium.org

On Wed, 4 Feb 2015 13:05:50 +0100 Hans de Goede hdegoede@redhat.com wrote:
All callers of malloc should already do error checking, and may even be able to continue without the alloc succeeding.
Moreover, common/malloc_simple.c is the only user of .rodata.str1.1 in common/built-in.o when building the SPL, triggering this gcc bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54303
Causing .rodata to grow with e.g. 0xc21 bytes, nullifying all benefits of using malloc_simple in the first place.
Signed-off-by: Hans de Goede hdegoede@redhat.com
common/malloc_simple.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/common/malloc_simple.c b/common/malloc_simple.c index afdacff..64ae036 100644 --- a/common/malloc_simple.c +++ b/common/malloc_simple.c @@ -19,7 +19,7 @@ void *malloc_simple(size_t bytes)
new_ptr = gd->malloc_ptr + bytes; if (new_ptr > gd->malloc_limit)
panic("Out of pre-reloc memory");
ptr = map_sysmem(gd->malloc_base + gd->malloc_ptr, bytes); gd->malloc_ptr = ALIGN(new_ptr, sizeof(new_ptr)); return ptr;return NULL;
The other patches look great, but I'm not convinced that requiring the malloc callers to do error checking is such a great idea. This means a lot of checks in a lot of places with extra code paths instead of just a single check in one place for the "impossible to happen" critical failure. I think that we should normally assume that malloc always succeeds in the production code and the panic may only happen while debugging.
If the malloc pool is in the DRAM, then we usually have orders of magnitude more space than necessary. While the code might be still in the SRAM at the same time (the extra branching code logic for errors checking may be wasting the scarce SRAM space).
If the malloc pool is in the SRAM and potentially may fail allocations, then this is a major reliability problem by itself. The malloc pool is always inefficient, has fragmentation problems, etc. If this is the case, then IMHO the only right solution is to replace such problematic dynamic allocations with static reservations in the ".data" section. Otherwise the reliability critical things (something like Mars rovers for example) will be sometimes failing. The Murphy law exists for a reason :-)
The workaround for the GCC compiler bug is orthogonal to this. Maybe there is some other solution?

Hi,
On 05-02-15 12:24, Siarhei Siamashka wrote:
On Wed, 4 Feb 2015 13:05:50 +0100 Hans de Goede hdegoede@redhat.com wrote:
All callers of malloc should already do error checking, and may even be able to continue without the alloc succeeding.
Moreover, common/malloc_simple.c is the only user of .rodata.str1.1 in common/built-in.o when building the SPL, triggering this gcc bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54303
Causing .rodata to grow with e.g. 0xc21 bytes, nullifying all benefits of using malloc_simple in the first place.
Signed-off-by: Hans de Goede hdegoede@redhat.com
common/malloc_simple.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/common/malloc_simple.c b/common/malloc_simple.c index afdacff..64ae036 100644 --- a/common/malloc_simple.c +++ b/common/malloc_simple.c @@ -19,7 +19,7 @@ void *malloc_simple(size_t bytes)
new_ptr = gd->malloc_ptr + bytes; if (new_ptr > gd->malloc_limit)
panic("Out of pre-reloc memory");
ptr = map_sysmem(gd->malloc_base + gd->malloc_ptr, bytes); gd->malloc_ptr = ALIGN(new_ptr, sizeof(new_ptr)); return ptr;return NULL;
The other patches look great, but I'm not convinced that requiring the malloc callers to do error checking is such a great idea. This means a lot of checks in a lot of places with extra code paths instead of just a single check in one place for the "impossible to happen" critical failure. I think that we should normally assume that malloc always succeeds in the production code and the panic may only happen while debugging.
If the malloc pool is in the DRAM, then we usually have orders of magnitude more space than necessary. While the code might be still in the SRAM at the same time (the extra branching code logic for errors checking may be wasting the scarce SRAM space).
If the malloc pool is in the SRAM and potentially may fail allocations, then this is a major reliability problem by itself. The malloc pool is always inefficient, has fragmentation problems, etc. If this is the case, then IMHO the only right solution is to replace such problematic dynamic allocations with static reservations in the ".data" section. Otherwise the reliability critical things (something like Mars rovers for example) will be sometimes failing. The Murphy law exists for a reason :-)
Actually we had a discussion about this at the u-boot miniconf at ELCE, I was actually advocating for having malloc behave as gmalloc from glib2 does, so basically what you're advocating for, but I lost the discussion. All this patch does is bring the simple, light-weight malloc from malloc_simple.c inline with the regular malloc implementation which can already fail.
The workaround for the GCC compiler bug is orthogonal to this. Maybe there is some other solution?
To workaround this gcc bug we need to not use any const strings. I guess we could do something like this and still panic ():
char str[4];
str[0] = 'O'; str[1] = 'O'; str[2] = 'M'; str[3] = 0;
But I think that just removing the panic is better, as it makes malloc_simple.c consistent with the regular malloc. In the end we should really get the gcc bug fixed, this will likely trim down .rodata significantly.
Regards,
Hans
p.s.
I've also seen your reply to 0/4, I will reply when I've some more time.

Hi,
On 5 February 2015 at 10:53, Hans de Goede hdegoede@redhat.com wrote:
Hi,
On 05-02-15 12:24, Siarhei Siamashka wrote:
On Wed, 4 Feb 2015 13:05:50 +0100 Hans de Goede hdegoede@redhat.com wrote:
All callers of malloc should already do error checking, and may even be able to continue without the alloc succeeding.
Moreover, common/malloc_simple.c is the only user of .rodata.str1.1 in common/built-in.o when building the SPL, triggering this gcc bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54303
Causing .rodata to grow with e.g. 0xc21 bytes, nullifying all benefits of using malloc_simple in the first place.
Signed-off-by: Hans de Goede hdegoede@redhat.com
common/malloc_simple.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/common/malloc_simple.c b/common/malloc_simple.c index afdacff..64ae036 100644 --- a/common/malloc_simple.c +++ b/common/malloc_simple.c @@ -19,7 +19,7 @@ void *malloc_simple(size_t bytes)
new_ptr = gd->malloc_ptr + bytes; if (new_ptr > gd->malloc_limit)
panic("Out of pre-reloc memory");
return NULL; ptr = map_sysmem(gd->malloc_base + gd->malloc_ptr, bytes); gd->malloc_ptr = ALIGN(new_ptr, sizeof(new_ptr)); return ptr;
The other patches look great, but I'm not convinced that requiring the malloc callers to do error checking is such a great idea. This means a lot of checks in a lot of places with extra code paths instead of just a single check in one place for the "impossible to happen" critical failure. I think that we should normally assume that malloc always succeeds in the production code and the panic may only happen while debugging.
If the malloc pool is in the DRAM, then we usually have orders of magnitude more space than necessary. While the code might be still in the SRAM at the same time (the extra branching code logic for errors checking may be wasting the scarce SRAM space).
If the malloc pool is in the SRAM and potentially may fail allocations, then this is a major reliability problem by itself. The malloc pool is always inefficient, has fragmentation problems, etc. If this is the case, then IMHO the only right solution is to replace such problematic dynamic allocations with static reservations in the ".data" section. Otherwise the reliability critical things (something like Mars rovers for example) will be sometimes failing. The Murphy law exists for a reason :-)
Actually we had a discussion about this at the u-boot miniconf at ELCE, I was actually advocating for having malloc behave as gmalloc from glib2 does, so basically what you're advocating for, but I lost the discussion. All this patch does is bring the simple, light-weight malloc from malloc_simple.c inline with the regular malloc implementation which can already fail.
The workaround for the GCC compiler bug is orthogonal to this. Maybe there is some other solution?
To workaround this gcc bug we need to not use any const strings. I guess we could do something like this and still panic ():
char str[4];
str[0] = 'O'; str[1] = 'O'; str[2] = 'M'; str[3] = 0;
But I think that just removing the panic is better, as it makes malloc_simple.c consistent with the regular malloc. In the end we should really get the gcc bug fixed, this will likely trim down .rodata significantly.
Yes I'd like to apply this one. It works around a really annoying gcc bug.
Hans I think you are right that we can deal with the errors at the top level, as we do with other things. SPL error handling is probably not great, but that's a separate issue.
Regards, Simon

On 5 February 2015 at 11:00, Simon Glass sjg@chromium.org wrote:
Hi,
On 5 February 2015 at 10:53, Hans de Goede hdegoede@redhat.com wrote:
Hi,
On 05-02-15 12:24, Siarhei Siamashka wrote:
On Wed, 4 Feb 2015 13:05:50 +0100 Hans de Goede hdegoede@redhat.com wrote:
All callers of malloc should already do error checking, and may even be able to continue without the alloc succeeding.
Moreover, common/malloc_simple.c is the only user of .rodata.str1.1 in common/built-in.o when building the SPL, triggering this gcc bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54303
Causing .rodata to grow with e.g. 0xc21 bytes, nullifying all benefits of using malloc_simple in the first place.
Signed-off-by: Hans de Goede hdegoede@redhat.com
common/malloc_simple.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/common/malloc_simple.c b/common/malloc_simple.c index afdacff..64ae036 100644 --- a/common/malloc_simple.c +++ b/common/malloc_simple.c @@ -19,7 +19,7 @@ void *malloc_simple(size_t bytes)
new_ptr = gd->malloc_ptr + bytes; if (new_ptr > gd->malloc_limit)
panic("Out of pre-reloc memory");
return NULL; ptr = map_sysmem(gd->malloc_base + gd->malloc_ptr, bytes); gd->malloc_ptr = ALIGN(new_ptr, sizeof(new_ptr)); return ptr;
The other patches look great, but I'm not convinced that requiring the malloc callers to do error checking is such a great idea. This means a lot of checks in a lot of places with extra code paths instead of just a single check in one place for the "impossible to happen" critical failure. I think that we should normally assume that malloc always succeeds in the production code and the panic may only happen while debugging.
If the malloc pool is in the DRAM, then we usually have orders of magnitude more space than necessary. While the code might be still in the SRAM at the same time (the extra branching code logic for errors checking may be wasting the scarce SRAM space).
If the malloc pool is in the SRAM and potentially may fail allocations, then this is a major reliability problem by itself. The malloc pool is always inefficient, has fragmentation problems, etc. If this is the case, then IMHO the only right solution is to replace such problematic dynamic allocations with static reservations in the ".data" section. Otherwise the reliability critical things (something like Mars rovers for example) will be sometimes failing. The Murphy law exists for a reason :-)
Actually we had a discussion about this at the u-boot miniconf at ELCE, I was actually advocating for having malloc behave as gmalloc from glib2 does, so basically what you're advocating for, but I lost the discussion. All this patch does is bring the simple, light-weight malloc from malloc_simple.c inline with the regular malloc implementation which can already fail.
The workaround for the GCC compiler bug is orthogonal to this. Maybe there is some other solution?
To workaround this gcc bug we need to not use any const strings. I guess we could do something like this and still panic ():
char str[4];
str[0] = 'O'; str[1] = 'O'; str[2] = 'M'; str[3] = 0;
But I think that just removing the panic is better, as it makes malloc_simple.c consistent with the regular malloc. In the end we should really get the gcc bug fixed, this will likely trim down .rodata significantly.
Yes I'd like to apply this one. It works around a really annoying gcc bug.
Hans I think you are right that we can deal with the errors at the top level, as we do with other things. SPL error handling is probably not great, but that's a separate issue.
Regards, Simon
Applied to u-boot-dm, thanks!

common/dlmalloc.c is quite big, both in .text and .data usage. E.g. for a Mele_M9 sun6i board build this reduces .text from 0x4214 to 0x3b94 bytes, and .data from 0x54c to 0x144 bytes.
Signed-off-by: Hans de Goede hdegoede@redhat.com --- include/configs/sunxi-common.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/include/configs/sunxi-common.h b/include/configs/sunxi-common.h index 33c0614..b0a360e 100644 --- a/include/configs/sunxi-common.h +++ b/include/configs/sunxi-common.h @@ -18,6 +18,7 @@ */ #define CONFIG_SUNXI /* sunxi family */ #ifdef CONFIG_SPL_BUILD +#define CONFIG_SYS_MALLOC_SIMPLE #ifndef CONFIG_SPL_FEL #define CONFIG_SYS_THUMB_BUILD /* Thumbs mode to save space in SPL */ #endif @@ -180,8 +181,8 @@ /* end of 32 KiB in sram */ #define LOW_LEVEL_SRAM_STACK 0x00008000 /* End of sram */ #define CONFIG_SPL_STACK LOW_LEVEL_SRAM_STACK -#define CONFIG_SYS_SPL_MALLOC_START 0x4ff00000 -#define CONFIG_SYS_SPL_MALLOC_SIZE 0x00080000 /* 512 KiB */ +#define CONFIG_SYS_MALLOC_F_BASE 0x4ff00000 +#define CONFIG_SYS_MALLOC_F_LEN 0x00080000 /* 512 KiB */
/* I2C */ #if defined CONFIG_AXP152_POWER || defined CONFIG_AXP209_POWER

On 4 February 2015 at 05:05, Hans de Goede hdegoede@redhat.com wrote:
common/dlmalloc.c is quite big, both in .text and .data usage. E.g. for a Mele_M9 sun6i board build this reduces .text from 0x4214 to 0x3b94 bytes, and .data from 0x54c to 0x144 bytes.
Signed-off-by: Hans de Goede hdegoede@redhat.com
include/configs/sunxi-common.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
Reviewed-by: Simon Glass sjg@chromium.org

On Wed, 2015-02-04 at 13:05 +0100, Hans de Goede wrote:
Hi All,
Inspired by Simon's work to make the FEL SPL and regular SPL builds more similar I've been looking into reducing the size of the SPL, resulting in the following patch series. This all seems quite safe, but we are past rc1, so may be best to keep this on a branch for now, or not ...
I think #next is probably best?
Anyway, for #1 and #4: Acked-by: Ian Campbell ijc@hellion.org.uk
For #2 and #3, well they look good to me, I'm not the maintainer of that code but you can add my Ack if you like
Ian.

On Wed, 4 Feb 2015 13:05:47 +0100 Hans de Goede hdegoede@redhat.com wrote:
Hi All,
Inspired by Simon's work to make the FEL SPL and regular SPL builds more similar I've been looking into reducing the size of the SPL, resulting in the following patch series. This all seems quite safe, but we are past rc1, so may be best to keep this on a branch for now, or not ...
Simon, I think the malloc_simple changes should go through your tree, once those are merged I'll add the sunxi patch enabling malloc_simple usage in the SPL.
With this series we're already quite close to getting a full-blown SPL to fit in 16K, I've looked at removing CONFIG_SPL_LIBCOMMON_SUPPORT but that won't fly well I think. We could however add a CONFIG_SPL_NO_PRINTF, which automatically gets set when CONFIG_SPL_LIBCOMMON_SUPPORT is not set, and use that on sunxi, assuming that you (Simon) are ok with not being able to use printf in the device-model code which gets used in the SPL.
Things already almost built when not setting CONFIG_SPL_LIBCOMMON_SUPPORT since there are already a lot of places checking for that before calling printf. We could change all the CONFIG_SPL_LIBCOMMON_SUPPORT to CONFIG_SPL_NO_PRINTF checks and define CONFIG_SPL_NO_PRINTF on sunxi to win another "large" chunk of RAM.
Thanks for these patches. The SPL size optimization is very useful, regardless of whether we want to wilfully confine ourself to just 16 KiB or not. Even with full 32 KiB, we still have NAND support pending, the potential device-model code size overhead, not to mention my little project which needs DRAM init code for multiple SoCs in a single SPL. And extra headroom is always good to safeguard against poor code generation by certain versions of compilers, etc.
I was intending to do something like this myself eventually, but did not like the idea of being the only guy who actually cares about the SPL size. Sometimes it has to get worse before it gets better ;-)
Something else to look at is at the memory map of the first 32k of SRAM when in FEL mode. The only documentation we've is: https://github.com/hno/Allwinner-Info/blob/master/FEL-usb/USB-protocol.txt
Which is reverse engineered and not 100% clear on the memory map. We could add a memset to 0xaa for 0x6000 - 0x8000 to the fel spl as a test, as I've the feeling that what hno has found there are just scratch buffers and that using that in the SPL will be save, if that is the case we could use 0x2000 - 0x0000 (growing downwards) as stack, and always use a text-base of 0x2000 for the SPL, with the SPL code / data segments living at 0x2000 - 0x8000 and then everything should fit as is, and we can have one unified SPL binary for both FEL and sdcard booting.
The BROM sets up two stacks for itself. One at 0x2000 (and growing down) for the IRQ handler. And another one at 0x7000 (and also growing down) for the regular code. This unfortunately splits the usable SRAM address space into disconnected chunks. Clashing with the upper parts of these stacks is not a great idea. For example:
$ fel clear 0x0000 0x1F00 $ fel ver AWUSBFEX soc=00165100(A20) 00000001 ver=0001 44 08 scratchpad=00007e00 00000000 00000000
Great, we are still alive! But:
$ fel clear 0x1F00 0x100 $ fel ver
Here it dies and shows that messing with the addresses just below 0x2000 (the IRQ stack) is bad. The same applies to another stack.
Now I'm trying to get the best use of the available free SRAM areas with a patched 'fel' tool: http://lists.denx.de/pipermail/u-boot/2015-February/204024.html
The layout of the usable SRAM locations can be SoC variant specific. This needs further investigation.
I don't have the time to look into this further atm, so I hope that one of you (Simon or Siarhei) has time to look into this further.
I see that we also never merged this related fix: http://lists.denx.de/pipermail/u-boot/2014-July/183986.html
We should probably merge a version of that when Simon's patches for fixing FEL have landed. I think it may be best to just always set the L2EN bit on sun4i/sun5i, without detecting whether we're in fel mode or not, setting it when it is already said should be a nop, and thus harmless.
This could be surely picked up, but IMHO it is not that urgent.
participants (4)
-
Hans de Goede
-
Ian Campbell
-
Siarhei Siamashka
-
Simon Glass