Re: [U-Boot-Users] [PATCH RFC] ARM: Davinci: NAND fix for large page ECC and linux compatibility

[offlist thread cc'd back to the list - hope you don't mind Scott!]
Hi Scott,
Thanks for your feedback.
On Fri, Jun 27, 2008 at 12:04:00PM -0500, Scott Wood wrote:
Bernard Blackham wrote:
What I would like to know though, is are you the right person to push this through, and can it make it into U-boot 1.3.4? I'm asking because it contains potential compatibility breaks, and I'd like to document the specific U-boot versions that will be affected.
There is a general ARM tree, but I can take it through mine if that's what is preferred.
That was the suggestion on #uboot. I'd say it's more NAND related than ARM related too.
+/*
- Previous versions of u-boot (1.3.3 and prior) and Montavista Linux kernels
- generated bogus ECCs on large-page NAND. Both large and small page NAND ECCs
- were incompatible with the Linux davinci git tree (since NAND was integrated
- in 2.6.24).
- Don't turn this on if you want backwards compatibility.
- Do turn this on if you want u-boot to be able to read and write NAND
- that can be written or read by the Linux davinci git kernel.
+#define CFG_LINUX_COMPATIBLE_ECC
- */
It seems odd that backwards compatibility requires turning *off* an option with "compatible" in the name... I'd invert the sense of the ifdef, and have it be something like CFG_BROKEN_ECC_COMPATIBILITY.
The concern with this is people that use their own custom config files will need to add this #define when they upgrade. How about just changing the name to CFG_NEW_NAND_ECC_FORMAT then?
If the old way of doing small page ECC was valid, should we preserve that (and change Linux back)?
That's a little controversial. Basically, the old OOB layout didn't match any other layout used (except by the MV kernel), the actual ECC layout meant that the method for correction was overly complex (with 170 non-obvious lines of code), and allegedly broken: http://article.gmane.org/gmane.comp.boot-loaders.u-boot/32035
The new code is about 30 lines, really simple, and I can even prove it's correctness (which I couldn't even begin to with the old code). Troy (cc'd) I believe was the original author. It could probably do with some comments though to make it dead obvious to the casual observer what's going on. I'll add them in.
We should probably default to doing it the right way, not the broken-but-compatible way for large pages, though.
It depends if you put backwards compatibility over reliability though. Many davinci users are still running the MontaVista-supplied 2.6.10 kernel which has the same broken ECC code and I've heard no word from MV on fixing it yet (and they're probably struggling to deal with the same backwards compatibility issue).
How about this solution: in davinci/nand.c, we add something like this:
#if defined(CFG_NAND_LARGEPAGE) && !defined(CFG_LINUX_COMPATIBLE_ECC) /* Comment this #error out only if you really really have to. */ #error "You are using old ECC code that is broken on large page devices. See doc/README.davinci" #endif
This forces the user to make a choice - they'll probably curse while they're doing it, but they can't plead ignorance when they find their large page NAND isn't detecting ECC errors.
Perhaps we could use some currently unused OOB byte as a marker for new/old ECC layout?
Could do, but any filesystems which use the OOB bytes might step on these. It also complicates the code even moreso and creates a lot more scenarios to test and that could go wrong.
I really do believe it should be a clean switch from one format to the other, for both small and large page NAND, with no run-time backwards compatibility. But that's just my POV.
Cheers, Bernard.

+#define CFG_LINUX_COMPATIBLE_ECC
- */
It seems odd that backwards compatibility requires turning *off* an option with "compatible" in the name... I'd invert the sense of the ifdef, and have it be something like CFG_BROKEN_ECC_COMPATIBILITY.
The concern with this is people that use their own custom config files will need to add this #define when they upgrade. How about just changing the name to CFG_NEW_NAND_ECC_FORMAT then?
I like CFG_NEW_NAND_ECC_FORMAT better as well.
#if defined(CFG_NAND_LARGEPAGE) && !defined(CFG_LINUX_COMPATIBLE_ECC) /* Comment this #error out only if you really really have to. */ #error "You are using old ECC code that is broken on large page devices. See doc/README.davinci" #endif
This forces the user to make a choice - they'll probably curse while they're doing it, but they can't plead ignorance when they find their large page NAND isn't detecting ECC errors.
I like this too. Maybe a #warning for small pages as well. Of course both would also depend on #ifdef CFG_NAND_HW_ECC.
Perhaps we could use some currently unused OOB byte as a marker for new/old ECC layout?
Could do, but any filesystems which use the OOB bytes might step on these. It also complicates the code even moreso and creates a lot more scenarios to test and that could go wrong.
I really do believe it should be a clean switch from one format to the other, for both small and large page NAND, with no run-time backwards compatibility. But that's just my POV.
I hope that eventually we can remove the old format, but this patch has my ack.
Thanks Bernard
Troy

On Sat, Jun 28, 2008 at 11:31:18AM +0800, Bernard Blackham wrote:
It seems odd that backwards compatibility requires turning *off* an option with "compatible" in the name... I'd invert the sense of the ifdef, and have it be something like CFG_BROKEN_ECC_COMPATIBILITY.
The concern with this is people that use their own custom config files will need to add this #define when they upgrade. How about just changing the name to CFG_NEW_NAND_ECC_FORMAT then?
How about having both, and #erroring if one or the other isn't defined (similar to what you suggest below, but for both small and large page)?
Also, both should probably be CFG_DAVINCI_xxx rather than CFG_xxx, to make clear that it's not a general NAND issue.
If the old way of doing small page ECC was valid, should we preserve that (and change Linux back)?
That's a little controversial. Basically, the old OOB layout didn't match any other layout used (except by the MV kernel), the actual ECC layout meant that the method for correction was overly complex (with 170 non-obvious lines of code), and allegedly broken: http://article.gmane.org/gmane.comp.boot-loaders.u-boot/32035
The new code is about 30 lines, really simple, and I can even prove it's correctness (which I couldn't even begin to with the old code).
OK -- was just making sure that it wasn't a gratuitous change.
We should probably default to doing it the right way, not the broken-but-compatible way for large pages, though.
It depends if you put backwards compatibility over reliability though.
In the long term, I value the latter -- compatibility should be possible, but it shouldn't cause new users to continue to generate bad ECC indefinetly (both causing them reliability problems and expanding the number of people that would be affected if the default were to change down the road).
This forces the user to make a choice - they'll probably curse while they're doing it, but they can't plead ignorance when they find their large page NAND isn't detecting ECC errors.
Or when they end up getting lots of ECC errors when using non-MV Linux.
I really do believe it should be a clean switch from one format to the other, for both small and large page NAND, with no run-time backwards compatibility. But that's just my POV.
Agreed, that's the simplest route if it can be managed without surprise breakage.
-Scott

It depends if you put backwards compatibility over reliability though.
In the long term, I value the latter -- compatibility should be possible, but it shouldn't cause new users to continue to generate bad ECC indefinetly (both causing them reliability problems and expanding the number of people that would be affected if the default were to change down the road).
I spoke to Bernard and he told me he moved to other things and would probably not have the time to modify his patch based on the comments he got and to rebase it against the latest U-Boot git repository.
So that is why I send a patch based on Bernard's original work.
Hugo V.
participants (4)
-
Bernard Blackham
-
Hugo Villeneuve
-
Scott Wood
-
Troy Kisky