Re: [U-Boot] [PATCH 2/4] env_nand.c: support falling back to redundant env when writing

20 Dec 2012


      On 12/20/2012 03:28:39 PM, Phil Sutter wrote:
On Tue, Dec 11, 2012 at 05:12:32PM -0600, Scott Wood wrote:
Erase blocks are larger than write pages, yes.  I've never heard erase
blocks called "pages" or write pages called "blocks" -- but my main
point is that the unit of erasing and the unit of badness are the
same.
Ah, OK. Please excuse my humble nomenclature, I never cared enough to
sort out what is called what. Of course, this is not the best basis for
a discussion about these things.
But getting back to the topic: The assumption of blocks going bad, not
pages within a block, means that for any kind of bad block prevention,
multiple blocks need to be used. Although, honestly speaking, I'm not
really sure why this needs to be like that. Maybe the bad page marking
would disappear when erasing the block it belongs to?
Yes, it would disappear.  This is why erase operations skip bad blocks,
unless the scrub option is used.
The block to hold the environment is stored in the OOB of block zero,
which is usually guaranteed to not be bad.
Erase or write block? Note that every write block has its own OOB.
"block" means "erase block".
Every write page has its own OOB, but it is erase blocks that are
marked bad.  Typically the block can be marked bad in either the first
or the second page of the erase block.
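To make the block/page distinction concrete, here is a minimal in-memory sketch, not U-Boot code. It assumes a geometry of 64 pages per block, 64 OOB bytes per page, and a marker in OOB byte 0 (the position varies by chip). All names (`sim_block`, `sim_block_is_bad`, `sim_erase`, ...) are hypothetical. It shows why a plain erase has to skip bad blocks: wiping the block would also wipe the marker.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SIM_PAGES_PER_BLOCK 64  /* assumed geometry, varies by chip */
#define SIM_OOB_SIZE        64  /* assumed OOB bytes per page */
#define SIM_BBM_OFFSET      0   /* assumed marker position in the OOB */

/* Hypothetical model of one erase block: only the OOB areas are kept. */
struct sim_block {
    uint8_t oob[SIM_PAGES_PER_BLOCK][SIM_OOB_SIZE];
};

/* Unconditional erase: every byte, including any bad marker, becomes 0xFF. */
static void sim_erase_raw(struct sim_block *b)
{
    memset(b->oob, 0xFF, sizeof(b->oob));
}

/* The whole block counts as bad if the marker byte in the first or the
 * second page is anything other than 0xFF. */
static bool sim_block_is_bad(const struct sim_block *b)
{
    return b->oob[0][SIM_BBM_OFFSET] != 0xFF ||
           b->oob[1][SIM_BBM_OFFSET] != 0xFF;
}

/* Mark the block bad by clearing the marker byte in the first page. */
static void sim_mark_bad(struct sim_block *b)
{
    b->oob[0][SIM_BBM_OFFSET] = 0x00;
}

/* Erase that skips bad blocks unless `scrub` is set.
 * Returns 0 if the block was erased, -1 if it was skipped. */
static int sim_erase(struct sim_block *b, bool scrub)
{
    if (sim_block_is_bad(b) && !scrub)
        return -1;
    sim_erase_raw(b);
    return 0;
}
```

With this model, marking a block bad and then calling `sim_erase(&b, false)` leaves the marker in place, while `sim_erase(&b, true)` erases the block and the marker along with it.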
Interesting. I had the impression of pages being marked bad and the
block's badness being taken from whether it contains bad pages.
Probably the 'nand markbad' command tricked me.
Do you mean the lack of error checking if you pass a non-block-aligned  
offset into "nand markbad"?
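For illustration, the kind of alignment check in question might look like the sketch below. These are hypothetical helpers, not existing U-Boot functions, and they assume the erase block size is a power of two.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when `offset` lies on an erase-block boundary.
 * Assumes `erasesize` is a power of two. */
static bool offset_is_block_aligned(uint64_t offset, uint32_t erasesize)
{
    return (offset & ((uint64_t)erasesize - 1)) == 0;
}

/* Hypothetical helper: round `offset` down to the start of the erase
 * block that contains it. Same power-of-two assumption. */
static uint64_t block_start(uint64_t offset, uint32_t erasesize)
{
    return offset & ~((uint64_t)erasesize - 1);
}
```

With a 128 KiB erase block (0x20000), offset 0x20800 is not aligned, and `block_start()` maps it back to 0x20000, the block that would actually be marked.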
So that assumes that any block initially identified 'good' will ever
turn 'bad' later on?
We don't currently have any mechanism for that to happen with the
environment -- which could be another good reason to have real
redundancy that doesn't get crippled from day one by having one copy
land on a factory bad block.  Of course, that requires someone to
implement support for redundant environment combined with
CONFIG_ENV_OFFSET_OOB.
Well, as long as CONFIG_ENV_OFFSET_REDUND supported falling back to the
other copy in case of error there would be a working system in three of
four cases instead of only one.
I'm not sure what you mean here -- where do "three", "four", and "one"  
come from?
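As a rough sketch of the fallback-on-write idea being discussed (not the code from the patch), saving could try the preferred copy first and move to the other one on error. The callback type, the function name, and the demo stubs below are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical callback: writes the environment to one location and
 * returns 0 on success, nonzero on failure. */
typedef int (*env_write_fn)(void *ctx);

/* Try the preferred copy first, then fall back to the other one.
 * Returns the index (0 or 1) of the copy that was written, or -1 if
 * both writes failed. */
static int env_save_with_fallback(env_write_fn write_copy[2], void *ctx[2],
                                  int preferred)
{
    int first = preferred ? 1 : 0;
    int order[2] = { first, 1 - first };

    for (int i = 0; i < 2; i++) {
        int idx = order[i];

        if (write_copy[idx](ctx[idx]) == 0)
            return idx;
    }
    return -1;
}

/* Demo stubs for trying the function out: each one counts its calls in
 * the int that `ctx` points to. */
static int demo_write_ok(void *ctx)
{
    ++*(int *)ctx;
    return 0;
}

static int demo_write_fail(void *ctx)
{
    ++*(int *)ctx;
    return -1;
}
```

Passing `demo_write_fail` for copy 0 and `demo_write_ok` for copy 1 with `preferred` set to 0 returns 1 after one attempt on each copy. If both stubs fail, the result is -1 and no copy holds the new environment.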
Maybe a better option is to implement support for storing the
environment in ubi, although usually if your environment is in NAND
that means your U-Boot image is in NAND, so you have the same problem
there.  Maybe you could have an SPL that contains ubi support, that
fits in the guaranteed-good first block.
Do you have any data on how often a block might go bad that wasn't
factory-bad, to what extent reads versus writes matter, and whether
there is anything special about block zero beyond not being
factory-bad?
No, sadly not. I'd guess this information depends on what hardware is
being used specifically. But I suppose block zero is prone to becoming
worn just like any other block, although it not being erased as often
should help a lot.
Assuming a certain number of erase cycles after which each block is worn
out, and given the fact that CONFIG_ENV_OFFSET_REDUND always has both
blocks written (unless power failure occurs), they would turn bad at the
same time, rendering the environment useless with or without
fallback. :)
That depends on whether the specified number of erase cycles per block  
is a minimum for any block not marked factory-bad, or whether some  
fraction of non-factory-bad blocks may fail early.
-Scott