Re: [U-Boot] [PATCH v6 2/8] lpc32xx: mtd: nand: add MLC NAND controller

23 Mar 2015

      Bonjour Scott,
Le Fri, 20 Mar 2015 17:41:11 -0500, Scott Wood
scottwood@freescale.com a écrit :
...
On Fri, 2015-03-20 at 10:35 +0100, Albert ARIBAUD wrote:
...
Hi Scott,
Le Thu, 19 Mar 2015 16:39:42 -0500, Scott Wood
scottwood@freescale.com a écrit :
...
On Wed, 2015-03-18 at 10:04 +0100, Albert ARIBAUD (3ADEV) wrote:
...
+int nand_spl_load_image(uint32_t offs, unsigned int size, void *dst)
+{

int block_good;

bool?
...

struct lpc32xx_oob oob;
unsigned int page, left;

/* if starting mid-block consider block good */
block_good = 1;

for (page = offs / BYTES_PER_PAGE,
    left = DIV_ROUND_UP(size, BYTES_PER_PAGE);

    left && (page < PAGES_PER_CHIP_MAX); page++) {

/* read inband and oob page data anyway */

int res = read_single_page(dst, page, &oob);

/* return error if a good block's page was not readable */

if ((res < 0) && block_good)

	return -1;

Why even try to read it if it's been marked bad?
Actually, at this point, we may not know yet if the block is good or
bad, since we might be reading a page of it for the first time. Will
fix, see below.
...
...

/* if first page in a block, update 'block good' status */

The non-SPL driver appears to be checking the last two pages in the
block, not the first page.
Thanks for pointing out this discrepancy.
I took the first page option here after checking the NAND's datasheet,
which says that for factory bad block marking at least, while all pages
in a bad block /should/ bear the bad block marker, only the first one is
/guaranteed/ to have it. The driver might have inherited the 'last page'
option from the previous NAND chip used, or from a mistake of my own.
Are there any plans for this controller to be used with different chips?
Not that I know; but in the same vein, the current LPC32XX maintainer
did not consider there were any plans for a few devices in it, and then
came my patch series :) so I cannot rule out that someday a new LPC32XX
board pops up in U-Boot with another NAND device.
...
...
BTW, is there a standard way to ask a NAND chip which page(s) in a bad
block should be checked?
Yes and no:
http://lists.infradead.org/pipermail/linux-mtd/2011-November/038419.html
... right.
How about this: the driver expects a driver-specific configuration
option which tells it which page(s) in a block, and which byte in a
page's OOB area, should be scanned for bad block markers, and "my"
board provides a value for said option.
This way, when the driver is used with a new NAND chip in another
board, it can be configured for this board's specific case.
...
...
...
...

if ((page % PAGES_PER_BLOCK) == 0) {

	/* assume block is good */

	block_good = 1;

	/* if page could not be read, assume block is bad */

	if (res < 0)

		block_good = 0;

No.  A block is to be skipped if and only if it has a bad block marker.
ECC errors should not be silently ignored just because they happen on
the first page of a block.  If the writer of this data didn't skip the
block, then skipping it when reading will result in unusable data
regardless of the underlying ECC problem.
You're correct of course.
However, I do want the SPL chainloading to be as resilient as
possible, and we have established that it will have to read some part
of a bad block -- possibly resulting in a read failure -- before
knowing that the block is bad, which means it may have to finally
ignore the read failure.
FWIW, the eLBC/IFC SPL functions have the same problem regarding halting
on an ECC error before checking the bad block status.  Instead it should
return an error code, and let the caller halt after it's sure it's not
marked bad.
Maybe we could generalize an SPL chainloading common/spl/spl.c routine
and have boards /SoCs only provide the hardware-specific page/OOB read
function(s) ? Honestly, that would be way beyond my scope for this
patch series.
...
...
I guess I could wait to declare the block good or bad until I could
read at least one if its pages (actually, one of its pages' OOB data)
properly, only bail out if the OOB says the block is supposed to be
good.
Now, if none of the block's pages could be read, this still prevents us
from knowing whether these read failures are catastrophic.
If the block was actually bad and skipped, then the resulting image
might be intact, will pass the checksum test, and will run; if the
block was actually good, SPL will detect it and not boot the corrupt
image -- except if the completely unreadable good block was the first
one, which holds the signature and checksum, in which case SPL will
fall back to considering a raw, unsigned, unverifiable image, and will
jump into corrupt code.
This may happen; but I think the probability of having a completely
unreadable sector is very low, and the probability of this sector being
the first one of the image is very low too [1].
Which leads me to two possible courses of action:

consider the risk even though the probabilities are very low, thus
consider any totally unreadable sector as good, and bail out in this
case, or

I was thinking instead to distinguish between a hard failure where you
got no data from the NAND (e.g. a timeout), versus an ECC failure.  In
the later case you can still look in the OOB for a bad block marker.
...

add an option to SPL that prevents common/spl/spl.c from falling back
to running the image as raw if no signature is found, and consider
any totally unreadable sector as bad and therefore ignore the read
errors. Either the sector was actually good, and SPL will catch the
image as corrupt or missing a signature, or the sector was actually
bad, and the image will run as expected.

...but this seems reasonable as well (it wouldn't work for eLBC/IFC
since they need to support tiny SPLs that can only handle raw images).
This leads me to a half-OT question: so those SPL, while too tiny to
handle non-raw images, still do include the whole common/spl/spl.c
and thus contain code which will never apply to them? Then maybe we
should have /two/ options, one enabling raw images, and one enabling
'non-raw' images (and at least one enabled for any SPL), so that each
SPL could choose what code it wants in. Again, I am not offering to do
that in this patch series.
...
If you want to do this, just put a comment in explaining why you're
skipping in that situation.
Ok -- I will go with the 'option' method then, and add a (lengthy, I'm
afraid) comment in common/spl/spl.c explaining that skipping the raw
image handling is required when chainloading from NAND and the NAND
driver considers totally unreadable sectors bad, in order not to
confuse a missing image header with a raw image.
...
-Scott
Thanks for the feedback!
Cordialement,
Albert ARIBAUD
3ADEV