[U-Boot] 83xx fails to boot with moderately sized kernels

Hello all,
I am using vanilla U-Boot on an MPC8349EMDS board with 256MB of RAM. I have been tracking down a kernel hang very early during boot, and have traced it back to a U-Boot bug. The bug has to do with the way CONFIG_SYS_BOOTMAPSZ is set.
Please read through this thread for lots more information: http://lists.ozlabs.org/pipermail/linuxppc-dev/2010-September/085524.html
On most (all?) 83xx boards, CONFIG_SYS_BOOTMAPSZ is set to 8MB. The comment says:
    /*
     * For booting Linux, the board info and command line data
     * have to be in the first 8 MB of memory, since this is
     * the maximum mapped by the Linux kernel during initialization.
     */
    #define CONFIG_SYS_BOOTMAPSZ (8 << 20) /* Initial Memory map for Linux*/
I am booting with a FIT image containing a kernel, FDT, and ramdisk. U-Boot's bootm command makes sure that the FDT is relocated such that it is located at an address lower than CONFIG_SYS_BOOTMAPSZ. The ramdisk is placed at the end of memory. The relevant U-Boot messages:
    Booting using the fdt blob at 0x226a278
    Uncompressing Kernel Image ... OK
    Loading Ramdisk to 0fe9f000, end 0ff75699 ... OK
    Loading Device Tree to 007f8000, end 007ff78f ... OK
You'll notice the ramdisk was loaded at the end of RAM (near 256MB), while the FDT was loaded near the end of CONFIG_SYS_BOOTMAPSZ (near 8MB).
Newer Linux kernels are getting pretty large. In fact, even a fairly minimal kernel is large enough such that the uncompressed kernel code + BSS is >8MB. When this happens, Linux overwrites the FDT and the boot hangs silently.
U-Boot cannot detect this, since it does not load ELF images. It does not know the BSS size.
Below is the size of my two kernels, toggling only the CONFIG_PROVE_LOCKING feature on or off. As stated above, the rest of the kernel is relatively minimal.
With CONFIG_PROVE_LOCKING:
    c0369000 A __bss_start
    c08a0c48 A __bss_stop
0xc08a0c48 - 0xc0369000 == 5471304 bytes BSS
Kernel size (uncompressed): 3573068 vmlinux.bin
Total (code + bss): 3573068 + 5471304 == 9044372 (8.625MB)
Without CONFIG_PROVE_LOCKING:
    c0363000 A __bss_start
    c0786c08 A __bss_stop
0xc0786c08 - 0xc0363000 == 4340744 bytes BSS
Kernel size (uncompressed): 3548492 vmlinux.bin
Total (code + bss): 3548492 + 4340744 == 7889236 (7.523MB)
So, just adding CONFIG_PROVE_LOCKING=y made a 1MB difference, most of it in the bss section.
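The arithmetic above can be reproduced with a short script. This is only a minimal sketch in Python: it assumes the symbol addresses come from nm output in the form quoted above and that the uncompressed size is the byte size of vmlinux.bin. The helper name kernel_footprint and the dictionary layout are illustrative, not part of any existing tool.

```python
# Sketch: reproduce the "code + BSS" totals quoted above.
# kernel_footprint() is a hypothetical helper name, not part of any tool.

BOOTMAPSZ = 8 << 20  # CONFIG_SYS_BOOTMAPSZ as set on this board (8 MB)


def kernel_footprint(image_size, bss_start, bss_stop):
    """Return (bss_bytes, total_bytes), using the same sum as the text."""
    bss_bytes = bss_stop - bss_start
    return bss_bytes, image_size + bss_bytes


# Values copied from the two builds described above:
# (size of vmlinux.bin, __bss_start, __bss_stop)
builds = {
    "with CONFIG_PROVE_LOCKING": (3573068, 0xC0369000, 0xC08A0C48),
    "without CONFIG_PROVE_LOCKING": (3548492, 0xC0363000, 0xC0786C08),
}

for name, (size, start, stop) in builds.items():
    bss, total = kernel_footprint(size, start, stop)
    verdict = "exceeds" if total > BOOTMAPSZ else "fits within"
    print(f"{name}: bss={bss} total={total}, {verdict} the 8 MB boot map")
```

Only the build with CONFIG_PROVE_LOCKING crosses the 8 MB line, which matches the observed hang.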
Now, as the U-Boot output shows (with CONFIG_SYS_BOOTMAPSZ == 8MB):
    Booting using the fdt blob at 0x226a278
    Uncompressing Kernel Image ... OK
    Loading Ramdisk to 0fe9f000, end 0ff75699 ... OK
    Loading Device Tree to 007f8000, end 007ff78f ... OK
Now, since the Linux kernel is loaded to address 0x0, this means that after it is running and has initialized its BSS, the end of the kernel will be at address 0x786154. The working kernel is just barely squeaking by with about 450KB of space between the kernel and FDT.
Note that on PowerPC, Linux initializes its BSS very, very early. The BSS is initialized before the FDT is parsed.
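The margin quoted above follows from the load addresses in the U-Boot log. Here is a minimal sketch, assuming the kernel occupies physical memory from 0x0 up to its code + BSS total and that the FDT starts at the "Loading Device Tree to" address. The name fdt_margin is illustrative only.

```python
# Sketch: space between the end of the kernel and the relocated FDT.
# fdt_margin() is a hypothetical helper used only for this illustration.

FDT_START = 0x007F8000  # from "Loading Device Tree to 007f8000"


def fdt_margin(kernel_total_bytes, fdt_start=FDT_START):
    """Bytes left between the end of the kernel (loaded at 0x0) and the FDT.

    A negative result means the kernel's BSS would overlap the FDT.
    """
    return fdt_start - kernel_total_bytes


working = fdt_margin(7889236)  # build without CONFIG_PROVE_LOCKING
failing = fdt_margin(9044372)  # build with CONFIG_PROVE_LOCKING

print(f"working kernel ends at {7889236:#x}, margin {working} bytes")
print(f"failing kernel margin {failing} bytes (negative means overlap)")
```

A negative margin for the larger kernel is the case where clearing the BSS wipes out the FDT before it is ever parsed.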
Nothing that Timur or I can find indicates that CONFIG_SYS_BOOTMAPSZ is actually limited to 8MB, as the comment states. Timur and Kumar chatted, and realized that they had seen this bug before, on 85xx. They raised the value of CONFIG_SYS_BOOTMAPSZ to 16MB without issue.
Reading through the Linux code, I cannot find any reason why this should be limited to 8MB, or even 16MB. Should it be limited to CONFIG_MAX_MEM_MAPPED?
I'm worried that others either have hit this bug, or will hit it soon. By building lots of drivers into the kernel, I'm sure that even the 16MB boundary could be overrun.
Does anyone know the true maximum value for CONFIG_SYS_BOOTMAPSZ on Linux (if one even exists)?
Thanks, Ira

Dear "Ira W. Snyder",
In message 20100909225241.GI3496@ovro.caltech.edu you wrote:
> On most (all?) 83xx boards, CONFIG_SYS_BOOTMAPSZ is set to 8MB. The comment says:
> ...
> Does anyone know the true maximum value for CONFIG_SYS_BOOTMAPSZ on Linux (if one even exists)?
The CONFIG_SYS_BOOTMAPSZ thing is as old as U-Boot and PPCBoot, i.e. well over a decade. IIRC there was such a limitation on the then-current 2.2.13 Linux kernels, at least on MPC8xx and PPC40x systems, which is where it all started.
I am pretty sure that as long as nobody ran into any problems, nobody looked into that code, so it was copied from architecture to architecture without much thinking, if any.
Best regards,
Wolfgang Denk

Hi Ira & Wolfgang,
On Friday 10 September 2010 13:18:55 Wolfgang Denk wrote:
>> Does anyone know the true maximum value for CONFIG_SYS_BOOTMAPSZ on Linux (if one even exists)?
> The CONFIG_SYS_BOOTMAPSZ thing is as old as U-Boot and PPCBoot exists, i. e. well over a decade. IIRC there was such a limitation on the then current 2.2.13 Linux kernels, at least on MPC8xx and PPC40x systems, which is where all started from.
> I am pretty sure that as long as nobody ran into any problems, nobody looked into that code, so it was copied from architecture to architecture without much thinking, if any.
I looked at it a bit over a year ago and committed this change for the AMCC/APM eval boards:
    commit 6942efc2be1b90054fa4afa5cda7023469fe08b9
    Author: Stefan Roese <sr@denx.de>
    Date:   Tue Jul 28 10:50:32 2009 +0200

        ppc4xx: amcc: Set CONFIG_SYS_BOOTMAPSZ to 16MB for big kernels

        This patch changes CONFIG_SYS_BOOTMAPSZ from 8MB to 16MB which is the initial TLB on 40x PPC's in the Linux kernel. With this change even bigger Linux kernels (> 8MB) can be booted.

        This patch also sets CONFIG_SYS_BOOTM_LEN to 16MB (default 8MB) to enable decompression of bigger images.

        Signed-off-by: Stefan Roese <sr@denx.de>
So we have this 16MiB initial TLB restriction at least for PPC405 (not PPC440). I'm pretty sure that 83xx has no such tight restrictions.
Cheers, Stefan

On Fri, Sep 10, 2010 at 01:29:48PM +0200, Stefan Roese wrote:
> So we have this 16MiB initial TLB restriction at least for PPC405 (not PPC440). I'm pretty sure that 83xx has no such tight restrictions.
I've just gone through both the 40x code (as a source for comparison) and the 6xx code (generic 32-bit powerpc: 83xx, 85xx, others?).
arch/powerpc/kernel/setup_32.c's machine_init() function is the first thing to access the device tree. The full MMU setup has not been done at this point, so only the initial MMU configuration is in effect. That initial configuration is done in arch/powerpc/kernel/head_32.S's initial_bats() function. On 6xx, it creates a 256MB mapping:
    /*
     * On 601, we use 3 BATs to map up to 24M of RAM at _PAGE_OFFSET
     * (we keep one for debugging) and on others, we use one 256M BAT.
     */
Inside U-Boot, common/image.c's boot_relocate_fdt() function uses lmb_alloc_base() to allocate memory to hold the FDT for Linux. That shouldn't return an invalid memory location. (It doesn't return a pointer to unpopulated memory on a board.)
Based on that, I think it should be fine to increase CONFIG_SYS_BOOTMAPSZ to 256MB on all 32-bit 6xx processors. This includes 83xx and 85xx. Is 86xx included too (IIRC, it has 64bit modes)?
A boot test on my MPC8349EMDS confirms that it works:
    Booting using the fdt blob at 0x2242d6c
    Uncompressing Kernel Image ... OK
    Loading Ramdisk to 0fe9f000, end 0ff75699 ... OK
    Loading Device Tree to 0fe97000, end 0fe9e84f ... OK
    [ 0.000000] Using MPC834x MDS machine description
    [ 0.000000] Linux version 2.6.31.12-00018-g306aebe
    [output trimmed]
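Both "Loading Device Tree" addresses seen in this thread are consistent with a top-down, page-aligned allocation below some limit. The following is a simplified model and not U-Boot's actual lmb code: alloc_top_down and span are hypothetical helpers, the 0x1000 alignment is an assumption, the region sizes are derived from the printed start and end addresses, and the second case assumes the limit was raised to 256MB so that the ramdisk start becomes the effective ceiling.

```python
# Simplified model of a top-down, aligned allocation below a limit.
# This is NOT U-Boot's lmb implementation; names and alignment are assumed.

PAGE = 0x1000  # assumed alignment for the relocated FDT


def alloc_top_down(size, limit, align=PAGE):
    """Highest aligned start such that [start, start + size) ends at or below limit."""
    return (limit - size) & ~(align - 1)


def span(start, end_inclusive):
    """Size in bytes of a region printed as 'to START, end END'."""
    return end_inclusive - start + 1


# Case 1: limit is CONFIG_SYS_BOOTMAPSZ = 8 MB.
fdt_8mb = alloc_top_down(span(0x007F8000, 0x007FF78F), 8 << 20)

# Case 2: limit raised to 256 MB; the ramdisk already sits at the top of RAM,
# so the effective ceiling is the ramdisk start address.
fdt_256mb = alloc_top_down(span(0x0FE97000, 0x0FE9E84F), 0x0FE9F000)

print(hex(fdt_8mb), hex(fdt_256mb))
```

Under these assumptions the model lands on the same two start addresses that U-Boot printed, which illustrates why the FDT moves up next to the ramdisk once the limit no longer sits at 8MB.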
Would you prefer a patch only for the MPC8349EMDS, or should I try and convert the other boards too? How should I know which boards are safe? Grep for CONFIG_E300?
Thanks, Ira

On Fri, 10 Sep 2010 11:10:23 -0700 "Ira W. Snyder" iws@ovro.caltech.edu wrote:
> Would you prefer a patch only for the MPC8349EMDS, or should I try and convert the other boards too? How should I know which boards are safe? Grep for CONFIG_E300?
CONFIG_MPC83xx, since E300 gets hits for some mpc51xx parts. Not to mention it clarifies which git tree you're targeting, so as to not confuse custodians.
Thanks,
Kim

Dear Kim Phillips,
In message 20100910132606.cb2debcd.kim.phillips@freescale.com you wrote:
>> Would you prefer a patch only for the MPC8349EMDS, or should I try and convert the other boards too? How should I know which boards are safe? Grep for CONFIG_E300?
> CONFIG_MPC83xx, since E300 gets hits for some mpc51xx parts. Not to mention it clarifies which git tree you're targeting, so as to not confuse custodians.
Do you think mpc51xx behaves differently?
Best regards,
Wolfgang Denk

On Fri, 10 Sep 2010 20:53:56 +0200 Wolfgang Denk wd@denx.de wrote:
> Do you think mpc51xx behaves differently?
I don't know, I'm just trying to encourage some sort of organization to the breadth of application.
But based on your other email, this looks like it'll be a Wolfgang-direct patch.
Kim

Dear "Ira W. Snyder",
In message 20100910181022.GA18510@ovro.caltech.edu you wrote:
> Would you prefer a patch only for the MPC8349EMDS, or should I try and convert the other boards too? How should I know which boards are safe? Grep for CONFIG_E300?
I think we should try and update all boards where this applies. I'm not sure, but my wild guess is that this means all processors except 40x and 8xx (probably 5xx as well, but AFAICT there is no Linux available for these anyway, so it does not matter).
Best regards,
Wolfgang Denk

> Dear "Ira W. Snyder",
> In message 20100910181022.GA18510@ovro.caltech.edu you wrote:
>> Would you prefer a patch only for the MPC8349EMDS, or should I try and convert the other boards too? How should I know which boards are safe? Grep for CONFIG_E300?
> I think we should try and update all boards where this applies. I'm not sure, but my wild guess is that this means all processors except 40x and 8xx (probably 5xx as well, but AFAICT there is no Linux available for these anyway, so it does not matter).
I think 8xx can be fixed quite easily to do 16 MB if needed. Probably won't be needed though.
Jocke

On Sat, 11 Sep 2010 09:19:34 +0200 Joakim Tjernlund joakim.tjernlund@transmode.se wrote:
> I think 8xx can be fixed quite easily to do 16 MB if needed. Probably won't be needed though.
Actually, I got a complaint a while ago (I think it was Ben H) about problems with large kernels with debugging options on 8xx.
-Scott

Scott Wood scottwood@freescale.com wrote on 2010/09/13 21:21:33:
> Actually, I got a complaint a while ago (I think it was Ben H) about problems with large kernels with debugging options on 8xx.
I see. I think the fix is basically to enable pinned TLBs. Then a much bigger area will be mapped. Did they try that?
Jocke

On Mon, 13 Sep 2010 23:52:00 +0200 Joakim Tjernlund joakim.tjernlund@transmode.se wrote:
> I see. I think the fix is basically enable pinned TLBs. Then a much bigger area will be mapped. Did they try that?
Don't know if the kernel in question fit in 16 MiB or not... it wasn't a request for assistance, just a "hey, I ran into this, you might want to look at mapping more memory by default" note.
-Scott
participants (6)
- Ira W. Snyder
- Joakim Tjernlund
- Kim Phillips
- Scott Wood
- Stefan Roese
- Wolfgang Denk