
On 12/20/2012 03:28:39 PM, Phil Sutter wrote:
On Tue, Dec 11, 2012 at 05:12:32PM -0600, Scott Wood wrote:
Erase blocks are larger than write pages, yes. I've never heard erase blocks called "pages" or write pages called "blocks" -- but my main point is that the unit of erasing and the unit of badness are the same.
Ah, OK. Please excuse my humble nomenclature, I never cared enough to sort out what is called what. Of course, this is not the best basis for a discussion about these things.
But getting back to the topic: The assumption that blocks go bad, rather than pages within a block, means that any kind of bad block prevention has to use multiple blocks. Honestly, though, I'm not really sure why it needs to be like that. Maybe the bad page marking would disappear when erasing the block it belongs to?
Yes, it would disappear. This is why erase operations skip bad blocks, unless the scrub option is used.
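As a rough illustration of that behaviour, here is a toy model in Python. It is not U-Boot code: the dict layout, the "bad" flag standing in for the OOB marker, and the name erase_blocks are all made up for this sketch.

```python
# Toy model of erase-with-skip; not U-Boot code.
# Each block is a dict holding its data and a "bad" flag that stands in
# for the bad-block marker kept in the block's OOB area (an assumption).
def erase_blocks(blocks, scrub=False):
    """Erase every block, skipping marked-bad ones unless scrub is set.

    Returns the indices of the blocks that were actually erased.
    """
    erased = []
    for index, block in enumerate(blocks):
        if block["bad"] and not scrub:
            continue  # leave the block and its bad marker untouched
        block["data"] = None
        block["bad"] = False  # erasing wipes the OOB, marker included
        erased.append(index)
    return erased
```

A normal erase leaves the marked block alone, so the marking survives; only the scrub variant erases it and, with it, the marker.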
The block to hold the environment is stored in the OOB of block zero, which is usually guaranteed to not be bad.
Erase or write block? Note that every write block has its own OOB.
"block" means "erase block".
Every write page has its own OOB, but it is erase blocks that are marked bad. Typically the block can be marked bad in either the first or the second page of the erase block.
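A minimal sketch of that per-block marking, again as a toy Python model rather than U-Boot code. The geometry (2048-byte pages, 64 pages per erase block), the marker at OOB byte 0, and the 0xFF-means-good convention are assumptions for illustration; real chips differ. All function names are hypothetical.

```python
# Toy model of NAND bad-block marking; not U-Boot code.
# Assumed geometry and marker convention, chosen only for illustration.
PAGE_SIZE = 2048
PAGES_PER_BLOCK = 64
BLOCK_SIZE = PAGE_SIZE * PAGES_PER_BLOCK
GOOD_MARKER = 0xFF


def block_start(offset):
    """Round a byte offset down to the start of its erase block."""
    return offset - (offset % BLOCK_SIZE)


def block_is_bad(oob, offset):
    """Return True if the erase block containing offset is marked bad.

    oob maps a page index to that page's OOB bytes; pages missing from
    the map are treated as erased (all 0xFF). Only the first and second
    page of the block are inspected.
    """
    first_page = block_start(offset) // PAGE_SIZE
    for page in (first_page, first_page + 1):
        marker = oob.get(page, bytes([GOOD_MARKER]))[0]
        if marker != GOOD_MARKER:
            return True
    return False


def mark_bad(oob, offset):
    """Mark the block at offset bad via the OOB of its first page.

    Rejecting unaligned offsets is a choice made in this sketch.
    """
    if offset % BLOCK_SIZE != 0:
        raise ValueError("offset is not erase-block aligned")
    oob[offset // PAGE_SIZE] = bytes([0x00])
```

Any offset inside a marked block reports bad, since the marker belongs to the erase block and not to an individual page.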
Interesting. I had the impression of pages being marked bad and the block's badness being taken from whether it contains bad pages. Probably the 'nand markbad' command tricked me.
Do you mean the lack of error checking if you pass a non-block-aligned offset into "nand markbad"?
So that assumes that any block initially identified 'good' will ever turn 'bad' later on?
We don't currently have any mechanism for that to happen with the environment -- which could be another good reason to have real redundancy that doesn't get crippled from day one by having one copy land on a factory bad block. Of course, that requires someone to implement support for redundant environment combined with CONFIG_ENV_OFFSET_OOB.
Well, as long as CONFIG_ENV_OFFSET_REDUND supported falling back to the other copy in case of error there would be a working system in three of four cases instead of only one.
I'm not sure what you mean here -- where do "three", "four", and "one" come from?
Maybe a better option is to implement support for storing the environment in ubi, although usually if your environment is in NAND that means your U-Boot image is in NAND, so you have the same problem there. Maybe you could have an SPL that contains ubi support, that fits in the guaranteed-good first block.
Do you have any data on how often a block might go bad that wasn't factory-bad, to what extent reads versus writes matter, and whether there is anything special about block zero beyond not being factory-bad?
No, sadly not. I'd guess this information depends on the specific hardware being used. But I suppose block zero is as prone to wearing out as any other block, although it not being erased as often should help a lot.
Assuming each block wears out after a certain number of erase cycles, and given that CONFIG_ENV_OFFSET_REDUND always has both blocks written (unless a power failure occurs), they would turn bad at the same time, rendering the environment useless with or without fallback. :)
That depends on whether the specified number of erase cycles per block is a minimum for any block not marked factory-bad, or whether some fraction of non-factory-bad blocks may fail early.
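To make the two cases concrete, here is a small toy wear model in Python. It takes the premise above at face value (every save erases both copies) and treats a copy as failed once its erase count reaches its endurance. The function name and the endurance figures are invented for illustration and are not taken from any datasheet.

```python
# Toy wear model; the endurance figures used with it are made up.
def fallback_window(endurance_a, endurance_b):
    """Number of saves during which exactly one copy has worn out.

    Assumes every save erases both copies, so both copies always share
    the same erase count, and a copy fails once that count reaches its
    endurance. During the returned window, falling back to the other
    copy is what keeps the environment readable.
    """
    first_failure = min(endurance_a, endurance_b)
    both_failed = max(endurance_a, endurance_b)
    return both_failed - first_failure
```

With identical endurance, for example fallback_window(100000, 100000), the window is zero and fallback buys nothing. If one block fails early, for example fallback_window(100000, 20000), fallback covers the 80000 saves in between.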
-Scott