
On Mon, 2014-02-10 at 13:57 +0100, Helmut Raiger wrote:
On 02/10/2014 01:14 PM, Andreas Bießmann wrote:
- we have a hardware design bug
- we have a few hundred i.MX31 TT-01 devices in the field
- the i.MX31 rom boot loader is only capable of using 1bit HW-ECC
(loading the first page (2k) from the NAND)
- the NAND chip specifies a requirement of 1bit ECC for the first 128kB
(PEB) and 4bit ECC for the rest
- our current u-boot uses 1-bit HW-ECC, the kernel uses UBIFS and 1bit
HW-ECC
D'oh!
just about what I thought ...
- we face increasing bit errors in the field in the PEBs used by u-boot.
Using UBIFS in the kernel mitigates the requirement of 4bit ECC for the whole NAND because it moves PEBs when bit errors show up. The real problem is the area where u-boot is located (currently approx. 450kB, including UBIFS, USB ethernet support and more ...).
I wouldn't say it is a good solution to have 1bit ECC on a NAND that requires 4bit, even though there is another layer reacting to bit errors. I guess your BBT will grow significantly in a very short time.
Most operations are reads (we use a separate YAFFS partition for time-predictable writes), so UBI will relocate read-only blocks anyway (due to read disturbance). I think the effect won't be too dramatic, but don't make me prove that ;-)
This sounds like a very bad idea.
So the idea was:
- use a small u-boot (<128kB) in the first PEB of the NAND (written with
1bit HW-ECC) that supports 4bit BCH
How about using SPL here? I don't know the Freescale universe but wonder if SPL is fixed to 2k. Building SPL with SW BCH in less than 2k does not seem doable to me.
SPL on i.MX31 is limited to 2kB so we can't use BCH 4 here, just as you guessed.
You could use TPL (three stage extension of SPL). 2K SPL loads 126K TPL, which has BCH code and can load the real U-Boot.
See doc/README.TPL, and include/configs/p1_p2_rdb_pc.h for an example.
- let it load a second u-boot (<512kB) from the next 4 PEBs (written
with 4bit BCH)
- jump to the second u-boot and load the kernel from an UBI volume using
1bit HW-ECC again
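The layout proposed in the list above can be written down as a small sketch. The sizes (2 kB SPL page, 128 kB PEB, four PEBs for the second u-boot) come from the thread; the macro and function names are made up for illustration and are not taken from any U-Boot board config.

```c
#include <assert.h>
#include <stdint.h>

#define PEB_BYTES    (128u * 1024u)  /* erase block size quoted in the thread */
#define SPL_BYTES    (2u * 1024u)    /* first NAND page, read by the ROM loader */
#define STAGE2_PEBS  4u              /* "next 4 PEBs" for the second u-boot */

enum ecc_mode { ECC_HW_1BIT, ECC_SW_BCH4 };

/* Room left for the first u-boot stage behind the SPL in PEB 0. */
static inline uint32_t stage1_max_size(void)
{
    return PEB_BYTES - SPL_BYTES;
}

/* NAND offset and maximum size of the second u-boot. */
static inline uint32_t stage2_offset(void)   { return PEB_BYTES; }
static inline uint32_t stage2_max_size(void) { return STAGE2_PEBS * PEB_BYTES; }

/* ECC scheme each NAND offset is written with under this proposal. */
static inline enum ecc_mode ecc_for_offset(uint32_t off)
{
    if (off < PEB_BYTES)
        return ECC_HW_1BIT;   /* SPL + first u-boot, readable by the ROM loader */
    if (off < stage2_offset() + stage2_max_size())
        return ECC_SW_BCH4;   /* second u-boot */
    return ECC_HW_1BIT;       /* UBI area, unchanged for the kernel */
}
```

Note that stage1_max_size() comes out at 126 kB, the same figure as the TPL size suggested above.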
I did all that and it seemed to work just fine, but jumping to the second u-boot almost always crashes the system. In detail we do:
- romboot loads the SPL (2kB)
- SPL loads the first u-boot stage (which relocates and runs nicely)
- the first u-boot 'boots' the second u-boot by loading it from the NAND
- the second u-boot is loaded to the link address minus 2kB (for SPL)
- this is the same for the first and the second u-boot (link address
0x87e00000 - 0x800 = 0x87dff800)
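The address arithmetic in the last two items can be checked with a few lines of C. This is only a sketch of the calculation; the helper names are hypothetical, the two constants are the ones quoted above.

```c
#include <assert.h>
#include <stdint.h>

#define UBOOT_LINK_ADDR 0x87e00000u  /* link address quoted in the thread */
#define SPL_SIZE        0x800u       /* 2 kB occupied by the SPL in the image */

/* Where the combined image (SPL + u-boot) has to be placed so that the
 * u-boot part ends up exactly at its link address. */
static inline uint32_t image_load_addr(uint32_t link_addr, uint32_t spl_size)
{
    assert(link_addr >= spl_size);
    return link_addr - spl_size;
}

/* Address to jump to once the image sits at load_addr. */
static inline uint32_t uboot_entry(uint32_t load_addr, uint32_t spl_size)
{
    return load_addr + spl_size;
}
```

With the values above, image_load_addr(UBOOT_LINK_ADDR, SPL_SIZE) gives 0x87dff800 and uboot_entry() of that address gives back the link address.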
The offset is about 125MiB; current mainline code tells me that the tt-01 board has just 128 MiB. It is likely that your second u-boot overwrites the code of your first one while copying. You should link your code to run at a faraway address, maybe 0x80000000 ;)
We have 256MiB (not yet contributed). First u-boot is loaded to 0x87e00000, then relocates to 0x8f... something. Second u-boot is loaded to 0x87e00000 again and relocates to 0x8f..., the same locations for both. The second u-boot is verified in RAM before jumping to it. If I set a breakpoint in do_go_exec() I can step right into the second u-boot.
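Whether the load window collides with the running u-boot depends on the RAM size, which is the point of the exchange above. A rough sketch of that check follows. It assumes SDRAM starts at 0x80000000 and that the running u-boot occupies some reserved area at the very top of RAM; the 2 MiB reserve used in the usage note is a made-up placeholder, not a value from the thread, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define RAM_BASE   0x80000000ull       /* assumed SDRAM base address */
#define MIB        (1024ull * 1024ull)
#define LOAD_ADDR  0x87dff800ull       /* load address quoted in the thread */
#define IMAGE_MAX  (512ull * 1024ull)  /* upper bound for the second u-boot */

/* True if [a, a + a_len) and [b, b + b_len) share at least one byte. */
static inline int ranges_overlap(uint64_t a, uint64_t a_len,
                                 uint64_t b, uint64_t b_len)
{
    return a < b + b_len && b < a + a_len;
}

/* Start of the area assumed to be in use by the relocated u-boot. */
static inline uint64_t top_area_start(uint64_t ram_size, uint64_t reserve)
{
    assert(reserve <= ram_size);
    return RAM_BASE + ram_size - reserve;
}

/* Would loading the second image overwrite the running u-boot? */
static inline int load_hits_running_uboot(uint64_t ram_size, uint64_t reserve)
{
    return ranges_overlap(LOAD_ADDR, IMAGE_MAX,
                          top_area_start(ram_size, reserve), reserve);
}
```

With the placeholder reserve of 2 MiB, load_hits_running_uboot(128 * MIB, 2 * MIB) reports a collision, while load_hits_running_uboot(256 * MIB, 2 * MIB) does not, which would be consistent with relocation ending up in the 0x8f... range on a 256MiB board.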
Make sure you're cleaning the cache for that second load, if required.
-Scott
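For the cache hint above, the range handed to a cache maintenance routine normally has to cover whole cache lines. The sketch below only computes that aligned range for a freshly loaded image. The 32-byte line size is an assumption for this core and should be checked against the CPU manual, and the struct and function names are hypothetical. In U-Boot the resulting bounds would typically be passed to flush_dcache_range(start, stop), followed by invalidate_icache_all(), where the platform implements them.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE 32u  /* assumed cache line size, verify for the actual core */

struct flush_range {
    uint32_t start;  /* rounded down to a cache line boundary */
    uint32_t stop;   /* rounded up to a cache line boundary, exclusive */
};

/* Smallest line-aligned range covering [addr, addr + len). */
static inline struct flush_range cache_flush_range(uint32_t addr, uint32_t len)
{
    struct flush_range r;

    assert(len > 0);
    r.start = addr & ~(CACHE_LINE - 1u);
    r.stop  = (addr + len + CACHE_LINE - 1u) & ~(CACHE_LINE - 1u);
    return r;
}
```

For an image loaded at 0x87dff800 the start is already line-aligned, so only the end of the range may need rounding up.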