
Dear Helmut Raiger,
On 02/10/2014 12:11 PM, Helmut Raiger wrote:
Hi,
to give you some background why we would want to do something
(strange) like this:
- we have a hardware design bug
- we have a few hundred i.MX31 TT-01 devices in the field
- the i.MX31 rom boot loader is only capable of using 1bit HW-ECC
(loading the first page (2k) from the NAND)
- the NAND chip specifies a requirement of 1bit ECC for the first 128kB
(PEB) and 4bit ECC for the rest
- our current u-boot uses 1-bit HW-ECC, the kernel uses UBIFS and 1bit
HW-ECC
D'oh!
- we face increasing bit errors in the field in the PEBs used by u-boot.
Using UBIFS in the kernel mitigates the requirement of 4bit ECC for the whole NAND because it moves PEBs when bit errors show up. The real problem is the area where u-boot is located (currently approx. 450kB, including UBIFS, USB ethernet support and more).
I wouldn't say it is a good solution to use 1-bit ECC on NAND that requires 4-bit, even though there is another layer reacting to bit errors. I guess your BBT will grow significantly in a very short time.
So the idea was:
- use a small u-boot (<128kB) in the first PEB of the NAND (written with
1bit HW-ECC) that supports 4bit BCH
How about using SPL here? I don't know the Freescale universe, but I wonder whether SPL is fixed to 2k. Building an SPL with SW BCH in less than 2k does not seem doable to me.
- let it load a second u-boot (<512kB) from the next 4 PEBs (written
with 4bit BCH)
- jump to the second u-boot and load the kernel from a UBI volume using
1bit HW-ECC again
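
To make the planned flash layout concrete, here is a small sketch that turns the sizes from the list above into byte offsets. The 128kB PEB size and the "one PEB, then four PEBs" split come from the description; the helper `peb_range` and the region labels are only illustrative names, not anything from u-boot.

```python
# Sizes quoted in the thread.
PEB_SIZE = 128 * 1024   # one physical erase block (PEB)

def peb_range(first_peb, count):
    """Return (start, end) byte offsets covering `count` PEBs from `first_peb`."""
    start = first_peb * PEB_SIZE
    return start, start + count * PEB_SIZE

# Hypothetical region labels, for illustration only.
LAYOUT = [
    ("first u-boot incl. SPL, 1bit HW-ECC", peb_range(0, 1)),
    ("second u-boot, 4bit BCH", peb_range(1, 4)),
]

for name, (start, end) in LAYOUT:
    # end is exclusive: the first byte after the region
    print(f"{start:#010x} .. {end:#010x}  {name}")
```

With these numbers the second u-boot would start at flash offset 0x20000 and the region reserved for it would end at 0xa0000.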
I did all that and it seemed to work just fine, but jumping to the second u-boot almost always crashes the system. In detail we do:
- romboot loads the SPL (2kB)
- SPL loads the first u-boot stage (which relocates and runs nicely)
- the first u-boot 'boots' the second u-boot by loading it from the NAND
- the second u-boot is loaded to the link address minus 2kB (for SPL)
- this is the same for the first and the second u-boot (link address
0x87e00000 - 0x800 = 0x87dff800)
The offset is about 126 MiB (0x87e00000 - 0x80000000 = 0x07e00000), and current mainline code tells me that the tt-01 board has just 128 MiB. It is likely that your second u-boot overwrites the code of your first one while it is being copied. You should link your code to run at a far-away address, maybe 0x80000000 ;)
- it jumps to 0x87e00000 omitting the SPL for the second u-boot
- the second u-boot should relocate itself again
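
The address arithmetic behind the overlap concern can be checked with a short sketch. The link address, SPL size, 512kB upper bound and 128 MiB RAM size are taken from this thread; the RAM base of 0x80000000 is the i.MX31 SDRAM base. Where the running first u-boot actually lives after relocation is NOT known from the thread, so the ranges passed to `overlaps` below are purely hypothetical; the real value would have to be read from the running system (for example from `gd->relocaddr`).

```python
MIB = 1024 * 1024

RAM_BASE = 0x80000000        # i.MX31 SDRAM base
RAM_SIZE = 128 * MIB         # tt-01 RAM size according to mainline
LINK_ADDR = 0x87E00000       # link address of both u-boots
SPL_SIZE = 0x800             # 2kB SPL in front of the image
STAGE2_MAX = 512 * 1024      # upper bound for the second u-boot

load_start = LINK_ADDR - SPL_SIZE      # where the second image is copied to
load_end = load_start + STAGE2_MAX     # exclusive, worst case
ram_end = RAM_BASE + RAM_SIZE
headroom = ram_end - load_start        # RAM left from the load address upwards

def overlaps(a_start, a_end, b_start, b_end):
    """True if the half-open ranges [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end

print(f"load range : {load_start:#x} .. {load_end:#x}")
print(f"offset     : {(LINK_ADDR - RAM_BASE) // MIB} MiB into RAM")
print(f"headroom   : {headroom:#x} bytes up to end of RAM")

# HYPOTHETICAL relocation ranges for the running first u-boot:
print(overlaps(load_start, load_end, 0x87E40000, ram_end))  # inside the copy target
print(overlaps(load_start, load_end, 0x87F00000, ram_end))  # above the copy target
```

The copy starts only about 2 MiB below the end of RAM. The relocated first u-boot, its malloc arena and its stack all sit in that top region, so a collision depends only on how much the first stage reserves there.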
The second u-boot is verified in RAM with crc32 and it is valid.
I've tested many configurations and found that it only works if both u-boots are identical:
- different builds of the same code work (different build date, but same
code)
- different configurations never work
- it does not matter if caches are turned on or off
- I skipped the relocation of the second u-boot (actually not necessary)
to no avail
I also tried u-boot standalone applications, which always work (after fixing a bug in u-boot r8<->r9 for gd), again independent of caches. With different configurations I never get any serial output from the second u-boot (board info) or debugging stuff. If I set a breakpoint in the second u-boot (after relocation) and continue from there, it works until it tries to get the SPI clock (mxc_get_clock() when accessing CCM_CCMR) for the PMIC access. If I throw in an mxc_dump_clocks() earlier, it hangs there.
Well, it may be related to some Freescale internals I do not know. However, it is likely that you really are overwriting the first u-boot with the second one.
I'm pretty much running out of ideas, so any pointers are appreciated.
Hope it helps ...
Best regards
Andreas Bießmann