
On 15/03/2023 16.24, Frieder Schrempf wrote:
On 15.03.23 15:42, Frieder Schrempf wrote:
On 15.03.23 15:17, Michael Nazzareno Trimarchi wrote:
Hi
On Wed, Mar 15, 2023 at 3:13 PM Frieder Schrempf frieder.schrempf@kontron.de wrote:
Hi,
I'm trying to bring up a new board based on the i.MX8MP and I have an issue I'm hoping someone can help solving.
I'm seeing failures in the early SPL code, usually in the DDR initialization. Often they look like:
U-Boot SPL 2023.04-rc3 (Mar 07 2023 - 14:32:34 +0000) Training FAILED Failed to initialize DDR RAM! ### ERROR ### Please RESET the board ###
But sometimes ddr_init() doesn't even return an error and only the get_ram_size() afterwards which tries to allocate the memory fails.
In my experience you don't have space inside the cpu internal memory. It means that you overlap some stack with the code. Change the printf means move a bit. So you have problem but depends what you are going to destroy
Thanks for your reply. That's exactly what I'm thinking, too.
The strange thing is that the issues appear or disappear deterministically on the binary level. This means I sometimes get a U-Boot binary which runs just fine in 100% of cases. Then I change for example one of the following:
- Adding a single printf() somewhere in the boards spl.c
- Using the same binary but booting from SD card instead of USB loader
- Using the same source but switching from the OS cross compiler to the
one from Yocto/OE
And afterwards I get 100% failure rate with an error as described above.
My suspicion is that there is some memory corruption/conflict. My SPL is quite large and I wonder if it exceeds some limit.
SPL is loaded to 0x920000 and CONFIG_SPL_STACK is set to 0x960000, which leaves 256 KiB in between for the SPL. But all i.MX8MP boards seem to set CONFIG_SPL_MAX_SIZE=0x26000 (152 KiB) for some reason. My u-boot-spl-ddr.bin currently has around 193 KiB but I don't get any warning about exceeding the SPL_MAX_SIZE.
I also ran into this problem a while back, but that was back when the ddr firmware files were padded to 16K and 32K each to make the magic offset computations work; now that binman symbols are used, they only take up as much space as they actually use (give or take some 4-byte padding perhaps), and I no longer need the debug code I put in place in our 2022.07 branch.
Remember that from the stack, the initial (and in SPL only) malloc arena is carved out, and if you haven't adjusted SPL_SYS_MALLOC_F_LEN, you probably have that set to the default SYS_MALLOC_F_LEN, which in turn (on imx8m) defaults to 0x10000 aka 64KiB. So that could easily explain why you collide with the firmware.
Maybe you can use the debug code I added to our copy of spl.c; I also include most of my commit-message-for-future-me. But just something as simple as
int dummy; printf("stack is around %p\n", &dummy);
can be quite valuable.
===
add sanity check of SPL stack versus DDR firmware
The DDR firmware blobs are added via binman to the end of the SPL, and they are found by the DDR init code via somewhat magic offset tricks. To make those offset tricks work, binman has to pad the firmware files to known sizes of 32KiB and 16KiB, so all four files end up occupying 2*(32+16) = 96KiB in the image.
Since the SPL image is loaded at 0x920000, and we use a stack growing down from 0x960000 (recently changed to 0x968000), from which we also carve out space for the malloc arena and global data, there's a risk that normal use of the stack accidentally overwrites parts of the firmware data. That's exactly what happened when I added another 14K table of DDR training data.
We can compute exactly where the firmware lives, and also where our stack starts (after the mentioned reservations), so to help future me, do a sanity check as soon as we have the console. We might already have scribbled over the DDR firmware (the main stack use is actually in the DM init code which runs before we do), but at least we then now to stop and tell the developer what's wrong, instead of just silently failing when loading corrupt firmware.
#include <asm/mach-imx/iomux-v3.h> #include <asm/mach-imx/mxc_i2c.h> #include <asm/arch/ddr.h> +#include <asm/sections.h> #include <power/pmic.h> #include <power/pca9450.h>
@@ -102,6 +103,35 @@ int board_fit_config_name_match(const char *name) } #endif
+#define TOP_OF_STACK \ + ((CONFIG_SPL_STACK - CONFIG_VAL(SYS_MALLOC_F_LEN) - sizeof(struct global_data)) & ~15); + +static void check_stack_versus_ddr_firmware(void) +{ + unsigned long fdt_end = roundup((unsigned long)&_end + fdt_totalsize(gd->fdt_blob), 4); + unsigned long ddr_firmware_end = fdt_end + 2*(16 + 32)*1024; + unsigned long top_of_stack = TOP_OF_STACK; + unsigned long available = top_of_stack - ddr_firmware_end; + + if (0) { + printf("fdt_end = 0x%08lx\n", fdt_end); + printf("ddr_end = 0x%08lx\n", ddr_firmware_end); + printf("stack = 0x%08lx\n", top_of_stack); + printf("available stack = %8ld\n", available); + } + + if (top_of_stack <= ddr_firmware_end) + panic("Top of stack 0x%08lx already below DDR firmware end 0x%08lx!\n", + top_of_stack, ddr_firmware_end); + /* + * Some ad hoc instrumentation shows that our worst case stack + * usage is around 3.5K. Requiring at least 16K should be + * future-proof for a while. + */ + if (available <= 16*1024) + panic("Available stack space %ld too small!\n", available); +} + /* Do not use BSS area in this phase */ void board_init_f(ulong dummy) { @@ -118,6 +148,8 @@ void board_init_f(ulong dummy)
preloader_console_init();
+ check_stack_versus_ddr_firmware(); + enable_tzc380();
power_init_board();