[U-Boot] Chain loading an u-boot from an u-boot

Hi,
to give you some background why we would want to do something (strange) like this:
- we have a hardware design bug
- we have a few hundred i.MX31 TT-01 devices in the field
- the i.MX31 rom boot loader is only capable of using 1bit HW-ECC (loading the first page (2k) from the NAND)
- the NAND chip specifies a requirement of 1bit ECC for the first 128kB (PEB) and 4bit ECC for the rest
- our current u-boot uses 1-bit HW-ECC, the kernel uses UBIFS and 1bit HW-ECC
- we face increasing bit errors in the field in the PEBs used by u-boot.
Using UBIFS in the kernel mitigates the requirements of 4bit ECC for the whole NAND because it moves PEBs when bit errors show up. The real problem is the area where u-boot is located (currently approx. 450kB, including UBIFS, USB ethernet support and more ..).
So the idea was:
- use a small u-boot (<128kB) in the first PEB of the NAND (written with 1bit HW-ECC) that supports 4bit BCH
- let it load a second u-boot (<512kB) from the next 4 PEBs (written with 4bit BCH)
- jump to the second u-boot and load the kernel from an UBI volume using 1bit HW-ECC again
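As a rough sanity check of the layout proposed above, the following sketch computes where the two stages would sit in the NAND. It assumes 128 kB PEBs as stated in the mail; the function names and the example image sizes are illustrative only and do not come from the actual board setup.

```python
PEB_SIZE = 128 * 1024  # erase block size as stated in the thread


def pebs_needed(image_size: int, peb_size: int = PEB_SIZE) -> int:
    """Number of whole PEBs an image of image_size bytes occupies."""
    return -(-image_size // peb_size)  # ceiling division


def layout(first_size: int, second_size: int, peb_size: int = PEB_SIZE):
    """Return (offset, length, ecc) tuples for both boot stages.

    Stage 1 has to fit into PEB 0, which is written with 1-bit HW-ECC.
    Stage 2 follows directly and is written with 4-bit BCH.
    """
    if pebs_needed(first_size, peb_size) > 1:
        raise ValueError("first u-boot does not fit into the first PEB")
    second_len = pebs_needed(second_size, peb_size) * peb_size
    return [
        (0, peb_size, "1-bit HW-ECC"),
        (peb_size, second_len, "4-bit BCH"),
    ]


if __name__ == "__main__":
    # 120 kB and 450 kB are example sizes within the limits from the mail.
    for offset, length, ecc in layout(120 * 1024, 450 * 1024):
        print(f"0x{offset:06x} +0x{length:06x} {ecc}")
```

With these example sizes the second stage occupies four PEBs, which matches the "next 4 PEBs" in the plan.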
I did all that and it seemed to work just fine, but jumping to the second u-boot almost always crashes the system. In detail we do:
- romboot loads the SPL (2kB)
- SPL loads the first u-boot stage (which relocates and runs nicely)
- the first u-boot 'boots' the second u-boot by loading it from the NAND
- the second u-boot is loaded to the link address minus 2kB (for SPL)
- this is the same for the first and the second u-boot (link address 0x87e00000 - 0x800 = 0x87dff800)
- it jumps to 0x87e00000 omitting the SPL for the second u-boot
- the second u-boot should relocate itself again
The second u-boot is verified in RAM with crc32 and it is valid.
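For anyone wanting to reproduce this kind of check, a reference CRC can be computed on the build host and compared with what U-Boot's `crc32` command prints for the loaded region. The sketch below assumes that U-Boot's `crc32` uses the common CRC-32 as implemented by zlib; the helper names are made up for illustration.

```python
import zlib


def image_crc32(data: bytes) -> int:
    """CRC-32 of an in-memory image, as an unsigned 32-bit value."""
    return zlib.crc32(data) & 0xFFFFFFFF


def file_crc32(path: str, chunk_size: int = 64 * 1024) -> int:
    """CRC-32 of a file, read in chunks to keep memory use small."""
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)  # continue from the running value
    return crc & 0xFFFFFFFF
```

The length passed to `crc32` on the target has to be the exact image length, otherwise padding bytes from the NAND read end up in the checksum.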
I've tested many configurations and found that it only works if both u-boots are identical:
- different builds of the same code work (different build date, but same code)
- different configurations never work
- it does not matter if caches are turned on or off
- I skipped the relocation of the second u-boot (actually not necessary) to no avail
I also tried u-boot standalone applications which always work (after fixing a bug in u-boot r8<->r9 for gd), again independent of caches. With different configurations I never get any serial output from the second u-boot (board info) or debugging stuff. If I set a breakpoint in the second u-boot (after relocation) and continue from there, it works until it tries to get the SPI clock (mxc_get_clock() when accessing CCM_CCMR) for the PMIC access. If I throw in a mxc_dump_clocks() earlier, it hangs there.
I'm pretty much running out of ideas, so any pointers are appreciated.
Thx Helmut
-- Scanned by MailScanner.

Dear Helmut Raiger,
On 02/10/2014 12:11 PM, Helmut Raiger wrote:
Hi,
to give you some background why we would want to do something
(strange) like this:
- we have a hardware design bug
- we have a few hundred i.MX31 TT-01 devices in the field
- the i.MX31 rom boot loader is only capable of using 1bit HW-ECC
(loading the first page (2k) from the NAND)
- the NAND chip specifies a requirement of 1bit ECC for the first 128kB
(PEB) and 4bit ECC for the rest
- our current u-boot uses 1-bit HW-ECC, the kernel uses UBIFS and 1bit
HW-ECC
D'oh!
- we face increasing bit errors in the field in the PEBs used by u-boot.
Using UBIFS in the kernel mitigates the requirements of 4bit ECC for the whole NAND because it moves PEBs when bit errors show up. The real problem is the area where u-boot is located (currently approx. 450kB, including UBIFS, USB ethernet support and more ..).
I wouldn't say it is a good solution to have 1-bit ECC on NAND that requires 4-bit, even though there is another layer reacting to bit errors. I guess your BBT will grow significantly in a very short time.
So the idea was:
- use a small u-boot (<128kB) in the first PEB of the NAND (written with
1bit HW-ECC) that supports 4bit BCH
How about using SPL here? I don't know the Freescale universe but wonder if SPL is fixed to 2k. Building SPL with SW BCH in less than 2k does not seem doable to me.
- let it load a second u-boot (<512kB) from the next 4 PEBs (written
with 4bit BCH)
- jump to the second u-boot and load the kernel from an UBI volume using
1bit HW-ECC again
I did all that and it seemed to work just fine, but jumping to the second u-boot almost always crashes the system. In detail we do:
- romboot loads the SPL (2kb)
- SPL loads the first u-boot stage (which relocates and runs nicely)
- the first u-boot 'boots' the second u-boot by loading it from the NAND
- the second u-boot is loaded to the link address minus 2kB (for SPL)
- this is the same for the first and the second u-boot (link address
0x87e00000 - 0x800 = 0x87dff800)
The offset is about 125MiB; current mainline code tells me that the tt-01 board has just 128 MiB. It is likely that your second u-boot overwrites the code of your first one while copying. You should link your code to run at a far away address, maybe 0x80000000 ;)
- it jumps to 0x87e00000 omitting the SPL for the second u-boot
- the second u-boot should relocate itself again
The second u-boot is verified in RAM with crc32 and it is valid.
I've tested many configurations and found that it only works if both u-boots are identical:
- different builds of the same code work (different build date, but same
code)
- different configurations never work
- it does not matter if caches are turned on or off
- I skipped the relocation of the second u-boot (actually not necessary)
to no avail
I also tried u-boot standalone applications which always work (after fixing a bug in u-boot r8<->r9 for gd), again independent of caches. With different configurations I never get any serial output from the second u-boot (board info) or debugging stuff. If I set a breakpoint in the second u-boot (after relocation) and continue from there, it works until it tries to get the SPI clock (mxc_get_clock() when accessing CCM_CCMR) for the PMIC access. If I throw in a mxc_dump_clocks() earlier, it hangs there.
Well, it may be related to some Freescale internals I do not know. However, it is likely that you really overwrite the first u-boot version with the second one.
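The overlap suspicion can be checked with a little address arithmetic. The sketch below takes the link address and SPL size from the thread and 0x80000000 as the RAM base; the size of the area that the relocated first u-boot occupies at the top of RAM (`reserved`) is a purely hypothetical parameter, since the real value depends on image size, malloc pool, stack and so on. Note that the link address works out to exactly 126 MiB above the RAM base.

```python
MIB = 1024 * 1024
RAM_BASE = 0x80000000   # assumed start of SDRAM
LINK_ADDR = 0x87E00000  # link address given in the thread
SPL_SIZE = 0x800        # 2 kB SPL preceding the image

LOAD_ADDR = LINK_ADDR - SPL_SIZE               # where the second image is copied
OFFSET_MIB = (LINK_ADDR - RAM_BASE) / MIB      # distance from the RAM base


def overlaps(a_start: int, a_len: int, b_start: int, b_len: int) -> bool:
    """True if the half-open ranges [a, a+a_len) and [b, b+b_len) intersect."""
    return a_start < b_start + b_len and b_start < a_start + a_len


def load_hits_relocated_uboot(ram_size: int, image_len: int, reserved: int) -> bool:
    """Check whether loading the second image touches the relocated first one.

    reserved is a hypothetical size for everything the running u-boot
    keeps at the top of RAM after relocation.
    """
    reloc_start = RAM_BASE + ram_size - reserved
    return overlaps(LOAD_ADDR, image_len, reloc_start, reserved)


if __name__ == "__main__":
    print(f"load address 0x{LOAD_ADDR:08x}, {OFFSET_MIB:.0f} MiB above RAM base")
    # 4 MiB reserved at the top of RAM is an example value, not a measurement.
    print(load_hits_relocated_uboot(128 * MIB, 512 * 1024, 4 * MIB))
```

Whether the ranges really collide on a 128 MiB board therefore depends on how much the running u-boot occupies below the top of RAM, which only the actual board configuration can answer.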
I'm pretty much running out of ideas, so any pointers are appreciated.
Hope it helps ...
Best regards
Andreas Bießmann

On 02/10/2014 01:14 PM, Andreas Bießmann wrote:
- we have a hardware design bug
- we have a few hundred i.MX31 TT-01 devices in the field
- the i.MX31 rom boot loader is only capable of using 1bit HW-ECC
(loading the first page (2k) from the NAND)
- the NAND chip specifies a requirement of 1bit ECC for the first 128kB
(PEB) and 4bit ECC for the rest
- our current u-boot uses 1-bit HW-ECC, the kernel uses UBIFS and 1bit
HW-ECC D'oh!
just about what I thought ...
- we face increasing bit errors in the field in the PEBs used by u-boot.
Using UBIFS in the kernel mitigates the requirements of 4bit ECC for the whole NAND because it moves PEBs when bit errors show up. The real problem is the area where u-boot is located (currently approx. 450kB, including UBIFS, USB ethernet support and more ..). I wouldn't say it is a good solution to have 1-bit ECC on NAND that requires 4-bit, even though there is another layer reacting to bit errors. I guess your BBT will grow significantly in a very short time.
Most operations are reads (we use a separate YAFFS partition for time-predictable writes), so UBI will relocate read-only blocks anyway (due to read disturbances). I think the effect won't be too dramatic, but don't make me prove that ;-)
So the idea was:
- use a small u-boot (<128kB) in the first PEB of the NAND (written with
1bit HW-ECC) that supports 4bit BCH
How about using SPL here? I don't know the freescale universe but wonder if SPL is fixed to 2k. Building SPL with SW BCH in less than 2k seems not doable for me.
SPL on i.MX31 is limited to 2kB so we can't use BCH 4 here, just as you guessed.
- let it load a second u-boot (<512kB) from the next 4 PEBs (written
with 4bit BCH)
- jump to the second u-boot and load the kernel from an UBI volume using
1bit HW-ECC again
I did all that and it seemed to work just fine, but jumping to the second u-boot almost always crashes the system. In detail we do:
- romboot loads the SPL (2kb)
- SPL loads the first u-boot stage (which relocates and runs nicely)
- the first u-boot 'boots' the second u-boot by loading it from the NAND
- the second u-boot is loaded to the link address minus 2kB (for SPL)
- this is the same for the first and the second u-boot (link address
0x87e00000 - 0x800 = 0x87dff800)
The offset is about 125MiB; current mainline code tells me that the tt-01 board has just 128 MiB. It is likely that your second u-boot overwrites the code of your first one while copying. You should link your code to run at a far away address, maybe 0x80000000 ;)
We have 256MiB (not yet contributed). First u-boot is loaded to 0x87e00000, then relocates to 0x8f... something. Second u-boot is loaded to 0x87e00000 again and relocates to 0x8f..., the same locations for both; the second u-boot is verified in RAM before jumping to it. If I set a breakpoint in do_go_exec() I can step right into the second u-boot.
Well, it may be related to some Freescale internals I do not know. However, it is likely that you really overwrite the first u-boot version with the
You'd be right for 128MiB! I'll try to crc32 the relocated area of the first u-boot, anyway.
The strange thing is that the serial output does not show up unless I set breakpoints. This might be pointing to some clock setup problem?!
Thx for caring, Helmut

On Mon, 2014-02-10 at 13:57 +0100, Helmut Raiger wrote:
On 02/10/2014 01:14 PM, Andreas Bießmann wrote:
- we have a hardware design bug
- we have a few hundred i.MX31 TT-01 devices in the field
- the i.MX31 rom boot loader is only capable of using 1bit HW-ECC
(loading the first page (2k) from the NAND)
- the NAND chip specifies a requirement of 1bit ECC for the first 128kB
(PEB) and 4bit ECC for the rest
- our current u-boot uses 1-bit HW-ECC, the kernel uses UBIFS and 1bit
HW-ECC D'oh!
just about what I thought ...
- we face increasing bit errors in the field in the PEBs used by u-boot.
Using UBIFS in the kernel mitigates the requirements of 4bit ECC for the whole NAND because it moves PEBs when bit errors show up. The real problem is the area where u-boot is located (currently approx. 450kB, including UBIFS, USB ethernet support and more ..). I wouldn't say it is a good solution to have 1-bit ECC on NAND that requires 4-bit, even though there is another layer reacting to bit errors. I guess your BBT will grow significantly in a very short time.
Most operations are reads (we use a separate YAFFS partition for time-predictable writes), so UBI will relocate read-only blocks anyway (due to read disturbances). I think the effect won't be too dramatic, but don't make me prove that ;-)
This sounds like a very bad idea.
So the idea was:
- use a small u-boot (<128kB) in the first PEB of the NAND (written with
1bit HW-ECC) that supports 4bit BCH
How about using SPL here? I don't know the freescale universe but wonder if SPL is fixed to 2k. Building SPL with SW BCH in less than 2k seems not doable for me.
SPL on i.MX31 is limited to 2kB so we can't use BCH 4 here, just as you guessed.
You could use TPL (three stage extension of SPL). 2K SPL loads 126K TPL, which has BCH code and can load the real U-Boot.
See doc/README.TPL, and include/configs/p1_p2_rdb_pc.h for an example.
- let it load a second u-boot (<512kB) from the next 4 PEBs (written
with 4bit BCH)
- jump to the second u-boot and load the kernel from an UBI volume using
1bit HW-ECC again
I did all that and it seemed to work just fine, but jumping to the second u-boot almost always crashes the system. In detail we do:
- romboot loads the SPL (2kb)
- SPL loads the first u-boot stage (which relocates and runs nicely)
- the first u-boot 'boots' the second u-boot by loading it from the NAND
- the second u-boot is loaded to the link address minus 2kB (for SPL)
- this is the same for the first and the second u-boot (link address
0x87e00000 - 0x800 = 0x87dff800)
The offset is about 125MiB; current mainline code tells me that the tt-01 board has just 128 MiB. It is likely that your second u-boot overwrites the code of your first one while copying. You should link your code to run at a far away address, maybe 0x80000000 ;)
We have 256MiB (not yet contributed). First u-boot is loaded to 0x87e00000, then relocates to 0x8f... something. Second u-boot is loaded to 0x87e00000 again and relocates to 0x8f..., the same locations for both; the second u-boot is verified in RAM before jumping to it. If I set a breakpoint in do_go_exec() I can step right into the second u-boot.
Make sure you're cleaning the cache for that second load, if required.
-Scott

On 02/12/2014 10:59 PM, Scott Wood wrote:
Most operations are reads (we use a separate YAFFS partition for time-predictable writes), so UBI will relocate read-only blocks anyway (due to read disturbances). I think the effect won't be too dramatic, but don't make me prove that ;-)
This sounds like a very bad idea.
Agreed.
SPL on i.MX31 is limited to 2kB so we can't use BCH 4 here, just as you guessed.
You could use TPL (three stage extension of SPL). 2K SPL loads 126K TPL, which has BCH code and can load the real U-Boot.
See doc/README.TPL, and include/configs/p1_p2_rdb_pc.h for an example.
Yes, I read that, but it's not done for i.MX31 and I thought it might be harder to do than just a second u-boot. This might prove wrong.
jumping to it. If I set a breakpoint in the do_go_exec() I can step right into the second u-boot.
Make sure you're cleaning the cache for that second load, if required.
Currently I turned off caches in the first u-boot and hoped that would do.
Helmut

Hi Helmut,
On 10/02/2014 12:11, Helmut Raiger wrote:
So the idea was:
- use a small u-boot (<128kB) in the first PEB of the NAND (written with
1bit HW-ECC) that supports 4bit BCH
- let it load a second u-boot (<512kB) from the next 4 PEBs (written
with 4bit BCH)
- jump to the second u-boot and load the kernel from an UBI volume using
1bit HW-ECC again
I understand the first two points, but why do you store the kernel again with 1bit HW-ECC ? The second U-Boot is able to check with 4bit BCH and your NAND requires 4bit.
I did all that and it seemed to work just fine, but jumping to the second u-boot almost always crashes the system. In detail we do:
- romboot loads the SPL (2kb)
- SPL loads the first u-boot stage (which relocates and runs nicely)
- the first u-boot 'boots' the second u-boot by loading it from the NAND
- the second u-boot is loaded to the link address minus 2kB (for SPL)
- this is the same for the first and the second u-boot (link address
0x87e00000 - 0x800 = 0x87dff800)
- it jumps to 0x87e00000 omitting the SPL for the second u-boot
- the second u-boot should relocate itself again
The second u-boot is verified in RAM with crc32 and it is valid.
I've tested many configurations and found that it only works if both u-boots are identical:
- different builds of the same code work (different build date, but same
code)
I agree with Andreas' analysis. It seems that the second u-boot overwrites your running U-Boot, and only if they are identical do you have no problem, because then you are not changing the running code.
Regards, Stefano Babic

Hi Stefano,
Hi Helmut,
I understand the first two points, but why do you store the kernel again with 1bit HW-ECC ? The second U-Boot is able to check with 4bit BCH and your NAND requires 4bit.
This is mainly due to performance requirements. Using 4bit BCH increases overhead and makes DMA (currently not used in the kernel driver) a lot slower. We thought we might slip through with 1bit HW-ECC, but we will test this (hopefully not in the field this time ;-) )
I agree with Andreas' analysis. It seems that the second u-boot overwrites your running U-Boot, and only if they are identical do you have no problem, because then you are not changing the running code.
I double-checked now, the running u-boot is not overwritten. When the 2nd u-boot relocates it overwrites the first one, but that shouldn't be a problem. The first u-boot keeps working after loading (but not running) the second one without issues.
Only the 'go' crashes the system. u-boot starts stand-alone applications fine, just like the kernel. I really can't see why another u-boot should be any different?!
Helmut

Hi Helmut,
On 12/02/2014 10:56, Helmut Raiger wrote:
This is mainly due to performance requirements. Using 4bit BCH increases overhead and makes DMA (currently not used in the kernel driver) a lot slower. We thought we might slip through with 1bit HW-ECC, but we will test this (hopefully not in the field this time ;-) )
I know the MX31's NAND controller can only handle 1bit HW ECC ;-(
Let us know your results. In my experience, when the NAND controller (I am not speaking about the MX31, anyway) provides fewer ECC bits than the NAND requires, I have always had problems. Sometimes UBIFS recovered, but I reached the point where the rootfs could no longer be mounted.
I double-checked now, the running u-boot is not overwritten. When the 2nd u-boot relocates it overwrites the first one, but that shouldn't be a problem. The first u-boot keeps working after loading (but not running) the second one without issues.
Only the 'go' crashes the system. u-boot starts stand-alone applications fine, just like the kernel. I really can't see why another u-boot should be any different?!
OK, then it could be that something is set twice (by the first and the second U-Boot), and it crashes the second time it is set. Of course, the DDR controller must not be set up by the second U-Boot, but I suppose you have already commented that out.
Maybe setting the main clock again (I see it in board_setup_clocks()) can cause some problems. Is it called again in your modified U-Boot? I would expect board_setup_sdram() and board_setup_clocks() to be called only by your SPL.
Best regards, Stefano Babic

Hi Helmut,
On 02/12/2014 10:56 AM, Helmut Raiger wrote:
I understand the first two points, but why do you store the kernel again with 1bit HW-ECC ? The second U-Boot is able to check with 4bit BCH and your NAND requires 4bit.
This is mainly due to performance requirements. Using 4bit BCH increases overhead and makes DMA (currently not used in the kernel driver) a lot slower. We thought we might slip through with 1bit HW-ECC, but we will test this (hopefully not in the field this time ;-) )
If your HW requires 4-bit ECC, it is highly recommended to use it. Otherwise you will run your HW out of spec, and I think it is hard to qualify that NAND requiring 4-bit ECC runs reliably with 1-bit ECC and UBIFS, as you stated in a previous mail.
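To get a feeling for why the ECC strength matters, one can estimate how often a codeword collects more bit flips than the ECC can correct. The sketch below is a simplified model only: it assumes independent bit errors, a 512-byte codeword, and a raw bit error rate that is invented for illustration; none of these numbers come from the NAND datasheet or from field data.

```python
import math


def uncorrectable_probability(n_bits: int, ber: float, t: int) -> float:
    """Probability that more than t of n_bits are flipped.

    Assumes every bit fails independently with probability ber
    (a binomial model), and that the ECC corrects up to t bit errors.
    """
    correctable = sum(
        math.comb(n_bits, k) * ber**k * (1.0 - ber) ** (n_bits - k)
        for k in range(t + 1)
    )
    # Guard against tiny negative values caused by floating-point rounding.
    return max(0.0, 1.0 - correctable)


if __name__ == "__main__":
    codeword_bits = 512 * 8  # assumed codeword size, spare area ignored
    example_ber = 1e-5       # hypothetical raw bit error rate
    for strength in (1, 4):
        p = uncorrectable_probability(codeword_bits, example_ber, strength)
        print(f"{strength}-bit ECC: P(uncorrectable codeword) = {p:.3e}")
```

In this model the gap between 1-bit and 4-bit correction is several orders of magnitude, and it widens as the raw error rate rises with wear. That is only an illustration of the trend, not a qualification of any particular device.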
I agree with Andreas' analysis. It seems that the second u-boot overwrites your running U-Boot, and only if they are identical do you have no problem, because then you are not changing the running code.
I double-checked now, the running u-boot is not overwritten. When the 2nd u-boot relocates it overwrites the first one, but that shouldn't be a problem. The first u-boot keeps working after loading (but not running) the second one without issues.
Only the 'go' crashes the system. u-boot starts stand-alone applications fine, just like the kernel. I really can't see why another u-boot should be any different?!
Just thinking ... have you checked the global data pointer? Is it possible that the global data of the first u-boot influences the global data of the second one?
Best regards
Andreas Bießmann

On 02/12/2014 11:45 AM, Andreas Bießmann wrote:
Hi Helmut,
On 02/12/2014 10:56 AM, Helmut Raiger wrote:
I understand the first two points, but why do you store the kernel again with 1bit HW-ECC ? The second U-Boot is able to check with 4bit BCH and your NAND requires 4bit.
This is mainly due to performance requirements. Using 4bit BCH increases overhead and makes DMA (currently not used in the kernel driver) a lot slower. We thought we might slip through with 1bit HW-ECC, but we will test this (hopefully not in the field this time ;-) )
If your HW requires 4-bit ECC, it is highly recommended to use it. Otherwise you will run your HW out of spec, and I think it is hard to qualify that NAND requiring 4-bit ECC runs reliably with 1-bit ECC and UBIFS, as you stated in a previous mail.
You guys are right. I'm just cornered, as performance is a big issue as well. We'll try to qualify the NAND under proper climate conditions and with some UBIFS supervision to gain more insight. It's just that the application software guys suggested improving the kernel driver to use DMA to increase overall performance.
why another u-boot should be any different?!
Just thinking ... have you checked the global data pointer? Is it possible that the global data of the first u-boot influences the global data of the second one?
The global data pointer is set up right before the newly set stack pointer in arch/arm/lib/crt0.S, so it should be reset anyway.
And answering Stefano's question. The RAM setup is only in the SPL which is skipped when I jump to the second u-boot.
But you just inspired me! There are probably interrupts running for some time when the second u-boot starts and the relocation might destroy part of the interrupt entry points ....
Thx for asking the right questions. I'll have to check this.
Helmut

On 02/13/2014 10:03 AM, Helmut Raiger wrote:
But you just inspired me! There are probably interrupts running for some time when the second u-boot starts and the relocation might destroy part of the interrupt entry points ....
Thx for asking the right questions. I'll have to check this.
Helmut
I'm finally able to start the second u-boot (other things in between, as always). If I do a cleanup_before_linux(), i.e. turn off interrupts and caches, before the actual 'go', it works just fine.
For testing I patched the go command, but obviously this can't be contributed as such.
Anyone having a suggestion on how to do this?
1) add option to 'go' command, which is hard as it has variable arguments
2) add another go command
3) use an environment variable to set the option for 'go'
Theoretically I could use a u-boot image to encapsulate the second u-boot and use 'bootm', but I think I'll stumble over the same kind of questions.
Helmut

Hi Helmut,
On 31 March 2014 05:29, Helmut Raiger helmut.raiger@hale.at wrote:
On 02/13/2014 10:03 AM, Helmut Raiger wrote:
But you just inspired me! There are probably interrupts running for some time when the second u-boot starts and the relocation might destroy part of the interrupt entry points ....
Thx for asking the right questions. I'll have to check this.
Helmut
I'm finally able to start the second u-boot (other things in between, as always). If I do a cleanup_before_linux(), i.e. turn off interrupts and caches, before the actual 'go', it works just fine.
For testing I patched the go command, but obviously this can't be contributed as such.
Anyone having a suggestion on how to do this?
- add option to 'go' command, which is hard as it has variable arguments
- add another go command
- use an environment variable to set the option for 'go'
Theoretically I could use a u-boot image to encapsulate the second u-boot and use 'bootm', but I think I'll stumble over the same kind of questions.
There is already 'dcache off' but I wonder if something like 'go prepare' would be useful? Another option is that bootm has a prepare state, but it requires an image.
Regards, Simon
U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot

Hi Helmut,
On 04/04/2014 01:13, Simon Glass wrote:
- add option to 'go' command, which is hard as it has variable arguments
- add another go command
- use an environment variable to set the option for 'go'
Theoretically I could use a u-boot image to encapsulate the second u-boot and use 'bootm', but I think I'll stumble over the same kind of questions.
There is already 'dcache off' but I wonder if something like 'go prepare' would be useful? Another option is that bootm has a prepare state, but it requires an image.
I agree with Simon. If you have not changed your target, you are using an MX31, and what cleanup_linux for arm1136 does is turn off the caches. What if you turn off the i-cache and d-cache in a script before the go command?
Best regards, Stefano

On 04/04/2014 11:25 AM, Stefano Babic wrote:
Hi Helmut,
On 04/04/2014 01:13, Simon Glass wrote:
- add option to 'go' command, which is hard as it has variable arguments
- add another go command
- use an environment variable to set the option for 'go'
Theoretically I could use a u-boot image to encapsulate the second u-boot and use 'bootm', but I think I'll stumble over the same kind of questions.
There is already 'dcache off' but I wonder if something like 'go prepare' would be useful? Another option is that bootm has a prepare state, but it requires an image.
I agree with Simon. If you have not changed your target, you are using an MX31, and what cleanup_linux for arm1136 does is turn off the caches. What if you turn off the i-cache and d-cache in a script before the go command?
Best regards, Stefano
Hi Stefano,
cleanup_before_linux() also disables interrupts and flushes the cache(s). Simply turning off the caches did not do the trick.
Using 'go prepare' would be a solution as the 2nd argument should be an address (i.e. numeric) in any case.
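The reason 'go prepare' can be told apart from an address is that "prepare" is not a valid hexadecimal number. The following sketch models that argument handling in Python purely for illustration; 'go prepare' is only a proposal in this thread, and `parse_go_args` is a hypothetical helper, not existing U-Boot code.

```python
def parse_go_args(argv):
    """Model the argument handling of a hypothetical 'go [prepare] addr [arg ...]'.

    Returns a tuple (prepare, addr, rest) where prepare says whether the
    cleanup (interrupts off, caches flushed and disabled) should run
    before jumping, addr is the entry address and rest the remaining
    arguments passed on to the started code.
    """
    usage = "usage: go [prepare] addr [arg ...]"
    args = list(argv[1:])  # drop the command name itself
    if not args:
        raise ValueError(usage)
    prepare = args[0] == "prepare"
    if prepare:
        args = args[1:]
    if not args:
        raise ValueError(usage)
    # Addresses are taken as hexadecimal, with or without a 0x prefix.
    addr = int(args[0], 16)
    return prepare, addr, args[1:]
```

An existing invocation such as `go 87e00000` keeps its meaning in this model, because the keyword is only recognised in the position where an address would otherwise be required.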
Thx, Helmut
participants (5)
- Andreas Bießmann
- Helmut Raiger
- Scott Wood
- Simon Glass
- Stefano Babic