[U-Boot] PATCH: bugfix for nand erase failure with bad blocks

Hi all, this patch fixes a bug for 'nand erase' command: when bad blocks are present into erasing area, they were skipped but the erased size was updated anyway.
Regards, Michele Jr De Candia

Dear "Michele De Candia (VT)",
In message 4A3798C4.8000303@valueteam.com you wrote:
this patch fixes a bug for 'nand erase' command: when bad blocks are present into erasing area, they were skipped but the erased size was updated anyway.
And what exactly is the bug in this behaviour?
Given the fact that you don't know the number of bad blocks in advance, what do you use as reference for 100% in your display, then?
Best regards,
Wolfgang Denk

Wolfgang Denk wrote:
Dear "Michele De Candia (VT)",
In message 4A3798C4.8000303@valueteam.com you wrote:
this patch fixes a bug for 'nand erase' command: when bad blocks are present into erasing area, they were skipped but the erased size was updated anyway.
And what exactly is the bug in this behaviour?
I think that 'erase' should have the same behaviour of 'write' and 'read' commands: skip bad blocks until read/write size is reached. If you write a script that erases and then writes a NAND area and bad blocks are not skipped while erasing (as U-Boot actually does), the following 'write' is successfully done but ECC checks fail on next read on the same area.
Given the fact that you don't know the number of bad blocks in advance, what do you use as reference for 100% in your display, then?
I used the size passed by the user from command line as target and the actual erased size as reference while erasing blocks and skipping bad ones.
Best regards,
Wolfgang Denk

Dear "Michele De Candia (VT)",
In message 4A37F7BF.2090101@valueteam.com you wrote:
this patch fixes a bug for 'nand erase' command: when bad blocks are present into erasing area, they were skipped but the erased size was updated anyway.
And what exactly is the bug in this behaviour?
I think that 'erase' should have the same behaviour of 'write' and 'read' commands: skip bad blocks until read/write size is reached. If you write a script that erases and then writes a NAND area and bad blocks are not skipped while erasing (as U-Boot actually does), the following 'write' is successfully done but ECC checks fail on next read on the same area.
I see - thanks for the explanation.
Hm... actually I think the write should fail in such a case...
Scott, what do you think?
Best regards,
Wolfgang Denk

Wolfgang Denk wrote:
Dear "Michele De Candia (VT)",
In message 4A37F7BF.2090101@valueteam.com you wrote:
this patch fixes a bug for 'nand erase' command: when bad blocks are present into erasing area, they were skipped but the erased size was updated anyway.
And what exactly is the bug in this behaviour?
I think that 'erase' should have the same behaviour of 'write' and 'read' commands: skip bad blocks until read/write size is reached. If you write a script that erases and then writes a NAND area and bad blocks are not skipped while erasing (as U-Boot actually does), the following 'write' is successfully done but ECC checks fail on next read on the same area.
I see - thanks for the explanation.
Hm... actually I think the write should fail in such a case...
Scott, what do you think?
I think the current behavior is reasonable. You're erasing a specific region of flash, not an amount needed to hold a certain amount of data.
While I can see the appeal of Michele's suggestion, I think it would be more error-prone as people trying to erase a region rather than just the size of data could erase too much.
It definitely should not be an error to erase a region that happens to contain a bad block. Bad blocks are expected and we need to work around them.
-Scott

Scott Wood wrote:
Wolfgang Denk wrote:
Dear "Michele De Candia (VT)",
In message 4A37F7BF.2090101@valueteam.com you wrote:
this patch fixes a bug for 'nand erase' command: when bad blocks are present into erasing area, they were skipped but the erased size was updated anyway.
And what exactly is the bug in this behaviour?
I think that 'erase' should have the same behaviour of 'write' and 'read' commands: skip bad blocks until read/write size is reached. If you write a script that erases and then writes a NAND area and bad blocks are not skipped while erasing (as U-Boot actually does), the following 'write' is successfully done but ECC checks fail on next read on the same area.
I see - thanks for the explanation.
Hm... actually I think the write should fail in such a case...
Scott, what do you think?
I think the current behavior is reasonable. You're erasing a specific region of flash, not an amount needed to hold a certain amount of data.
From this point of view you're in the right; maybe this would be explained in 'README.nand' documentation or what do you think about add ing an option to 'nand erase' command to consider 'size' field as the effective blocks size to be erased and not as the area size?
While I can see the appeal of Michele's suggestion, I think it would be more error-prone as people trying to erase a region rather than just the size of data could erase too much.
It definitely should not be an error to erase a region that happens to contain a bad block. Bad blocks are expected and we need to work around them.
-Scott

Michele De Candia (VT) wrote:
Scott Wood wrote:
Wolfgang Denk wrote:
Dear "Michele De Candia (VT)",
In message 4A37F7BF.2090101@valueteam.com you wrote:
this patch fixes a bug for 'nand erase' command: when bad blocks are present into erasing area, they were skipped but the erased size was updated anyway.
And what exactly is the bug in this behaviour?
I think that 'erase' should have the same behaviour of 'write' and 'read' commands: skip bad blocks until read/write size is reached. If you write a script that erases and then writes a NAND area and bad blocks are not skipped while erasing (as U-Boot actually does), the following 'write' is successfully done but ECC checks fail on next read on the same area.
I see - thanks for the explanation.
Hm... actually I think the write should fail in such a case...
Scott, what do you think?
I think the current behavior is reasonable. You're erasing a specific region of flash, not an amount needed to hold a certain amount of data.
From this point of view you're in the right; maybe this would be explained in 'README.nand' documentation or what do you think about add ing an option to 'nand erase' command to consider 'size' field as the effective blocks size to be erased and not as the area size?
Moreover, I think that if you want to erase a specific NAND area, the correct way to use 'nand erase' command would be:
'nand erase start end'
If you want to erase an area but you want to be sure that 'size' bytes were erased, you should use:
'nand erase off size'
In this case a bad block should be skipped and next block will be erased.
Obviously this behavior must be implemented.
What do you think about it?
While I can see the appeal of Michele's suggestion, I think it would be more error-prone as people trying to erase a region rather than just the size of data could erase too much.
It definitely should not be an error to erase a region that happens to contain a bad block. Bad blocks are expected and we need to work around them.
-Scott

Michele De Candia (VT) wrote:
Scott Wood wrote:
Wolfgang Denk wrote:
Dear "Michele De Candia (VT)",
In message 4A37F7BF.2090101@valueteam.com you wrote:
this patch fixes a bug for 'nand erase' command: when bad blocks are present into erasing area, they were skipped but the erased size was updated anyway.
And what exactly is the bug in this behaviour?
I think that 'erase' should have the same behaviour of 'write' and 'read' commands: skip bad blocks until read/write size is reached. If you write a script that erases and then writes a NAND area and bad blocks are not skipped while erasing (as U-Boot actually does), the following 'write' is successfully done but ECC checks fail on next read on the same area.
I see - thanks for the explanation.
Hm... actually I think the write should fail in such a case...
Scott, what do you think?
I think the current behavior is reasonable. You're erasing a specific region of flash, not an amount needed to hold a certain amount of data.
From this point of view you're in the right; maybe this would be explained in 'README.nand' documentation or what do you think about add ing an option to 'nand erase' command to consider 'size' field as the effective blocks size to be erased and not as the area size?
Moreover, I think that if you want to erase a specific NAND area, the correct way to use 'nand erase' command would be:
'nand erase start end'
If you want to erase an area but you want to be sure that 'size' bytes were erased, you should use:
'nand erase off size'
In this case a bad block should be skipped and next block will be erased.
Obviously this behavior must be implemented.
What do you think about it?
While I can see the appeal of Michele's suggestion, I think it would be more error-prone as people trying to erase a region rather than just the size of data could erase too much.
It definitely should not be an error to erase a region that happens to contain a bad block. Bad blocks are expected and we need to work around them.
-Scott

On Wed, Jun 17, 2009 at 09:44:41AM +0200, Michele De Candia (VT) wrote:
Moreover, I think that if you want to erase a specific NAND area, the correct way to use 'nand erase' command would be:
'nand erase start end'
If you want to erase an area but you want to be sure that 'size' bytes were erased, you should use:
'nand erase off size'
How would the "nand erase" command reliably distinguish between the two alternatives?
What we could do is extend the "plus" semantics (which currently allow rounding the size up to a block boundary) so that if you have a plus sign before the size it is interpreted the same as read/write.
I'm a little uneasy about changing the normal erase command from size to end -- it would break existing uses. Though, it would make it consistent with the NOR erase command. Perhaps a period where it warns but accepts anyway a size, if the second parameter is less than the first.
-Scott

Scott Wood wrote:
On Wed, Jun 17, 2009 at 09:44:41AM +0200, Michele De Candia (VT) wrote:
Moreover, I think that if you want to erase a specific NAND area, the correct way to use 'nand erase' command would be:
'nand erase start end'
If you want to erase an area but you want to be sure that 'size' bytes were erased, you should use:
'nand erase off size'
How would the "nand erase" command reliably distinguish between the two alternatives?
What we could do is extend the "plus" semantics (which currently allow rounding the size up to a block boundary) so that if you have a plus sign before the size it is interpreted the same as read/write.
As you has suggested we could use:
'nand erase start end'
and
'nand erase off +size'
I'm a little uneasy about changing the normal erase command from size to end -- it would break existing uses. Though, it would make it consistent with the NOR erase command. Perhaps a period where it warns but accepts anyway a size, if the second parameter is less than the first.
This doesn't work always: for example, when you erase at the NAND begin, second parameter could be greater than first one.
It can always warn user when he uses the first erase way.
-Scott _______________________________________________ U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot

Michele De Candia (VT) wrote:
I'm a little uneasy about changing the normal erase command from size to end -- it would break existing uses. Though, it would make it consistent with the NOR erase command. Perhaps a period where it warns but accepts anyway a size, if the second parameter is less than the first.
This doesn't work always: for example, when you erase at the NAND begin, second parameter could be greater than first one.
Hmm... perhaps check the alignment? If "end" is supposed to be the last to-be-erased byte, not the first not-to-be-erased byte, then if the low bits are 0 it's a size (and gets a warning) and if they're 1 it's an end?
Or just always use the new syntax, announce it loudly, and grep the board config files for scripts to update. Or leave it alone and only change the plus variant. :-)
It can always warn user when he uses the first erase way.
Then what would be the correct, non-warning-producing way to erase a region of flash regardless of its bad block content?
-Scott

Dear Scott,
In message 4A39685F.6030304@freescale.com you wrote:
Hmm... perhaps check the alignment? If "end" is supposed to be the last to-be-erased byte, not the first not-to-be-erased byte, then if the low bits are 0 it's a size (and gets a warning) and if they're 1 it's an end?
Stop here. Don't add fancy stuf that nobody expects.
Ask yourself what the end user expects - we all think of "erase" preparing the grounds for "write", right? So both should work oin the same NAND flash region when given the same arguments (offset and seze, no matter how these are specified).
Best regards,
Wolfgang Denk

Wolfgang Denk wrote:
Dear Scott,
In message 4A39685F.6030304@freescale.com you wrote:
Hmm... perhaps check the alignment? If "end" is supposed to be the last to-be-erased byte, not the first not-to-be-erased byte, then if the low bits are 0 it's a size (and gets a warning) and if they're 1 it's an end?
Stop here. Don't add fancy stuf that nobody expects.
Nobody expects the semantics to silently change...
I'm not particularly advocating this approach, just throwing out alternatives. Leaving it alone is probably best.
Ask yourself what the end user expects - we all think of "erase" preparing the grounds for "write", right?
Sometimes. Other times, it could be preparing for use by a filesystem (which may not use the same bad block handling scheme), to reset the environment, for testing, etc.
We have the plus syntax specifically for the use case of erase+write of a specific number of bytes; IMO that's the place for this kind of interpretation.
So both should work oin the same NAND flash region when given the same arguments (offset and seze,
So then there would be no reliable way of erasing a specific range of flash. To properly use the block-skipping approach, the user needs to have in mind a range of actual blocks that the data can take up (i.e. a maximum number of bad blocks to expect), so that they can avoid locating the next partition within that range. I don't think it makes sense to try to completely hide this.
Perhaps "write" should accept an optional limit argument, returning an error if enough bad blocks were encountered to bust the limit.
no matter how these are specified).
If the syntax were changed to start/end, but the erase could go beyond end anyway, that would be extremely confusing.
-Scott

Scott Wood wrote:
Wolfgang Denk wrote:
Dear Scott,
In message 4A39685F.6030304@freescale.com you wrote:
Hmm... perhaps check the alignment? If "end" is supposed to be the last to-be-erased byte, not the first not-to-be-erased byte, then if the low bits are 0 it's a size (and gets a warning) and if they're 1 it's an end?
Stop here. Don't add fancy stuf that nobody expects.
Nobody expects the semantics to silently change...
I'm not particularly advocating this approach, just throwing out alternatives. Leaving it alone is probably best.
Ask yourself what the end user expects - we all think of "erase" preparing the grounds for "write", right?
Sometimes. Other times, it could be preparing for use by a filesystem (which may not use the same bad block handling scheme), to reset the environment, for testing, etc.
We have the plus syntax specifically for the use case of erase+write of a specific number of bytes; IMO that's the place for this kind of interpretation.
So both should work oin the same NAND flash region when given the same arguments (offset and seze,
So then there would be no reliable way of erasing a specific range of flash. To properly use the block-skipping approach, the user needs to have in mind a range of actual blocks that the data can take up (i.e. a maximum number of bad blocks to expect), so that they can avoid locating the next partition within that range. I don't think it makes sense to try to completely hide this.
Perhaps "write" should accept an optional limit argument, returning an error if enough bad blocks were encountered to bust the limit.
no matter how these are specified).
If the syntax were changed to start/end, but the erase could go beyond end anyway, that would be extremely confusing.
-Scott
Summarizing, there are two alternatives:
- change 'erase' command aligning it with 'write' and 'read' command (what the patch does);
- add a third field to the 'erase' command, maybe called 'limit', over which erasing can't be done:
'nand erase start offset limit'
In this case, I think that 'write' command may be aligned with this new syntax too.
It's up to you.
Regards, Michele

Dear Scott,
In message 20090617155421.GB6333@loki.buserror.net you wrote:
If you want to erase an area but you want to be sure that 'size' bytes were erased, you should use:
'nand erase off size'
How would the "nand erase" command reliably distinguish between the two alternatives?
It cannot.
What we could do is extend the "plus" semantics (which currently allow rounding the size up to a block boundary) so that if you have a plus sign before the size it is interpreted the same as read/write.
But it's not only with the "plus". I think erase should work similar as write - if we allow write to skip bad blocks and "exceed" thenetto size, then we must do the same for erase.
I'm a little uneasy about changing the normal erase command from size to end -- it would break existing uses. Though, it would make it consistent with the NOR erase command. Perhaps a period where it warns but accepts anyway a size, if the second parameter is less than the first.
"if the second parameter is less than the first" ? Sorry, can't parse that. What do you have in mind?
Best regards,
Wolfgang Denk

Dear Scott,
In message 4A37FE47.3030203@freescale.com you wrote:
I see - thanks for the explanation.
Hm... actually I think the write should fail in such a case...
Scott, what do you think?
I think the current behavior is reasonable. You're erasing a specific region of flash, not an amount needed to hold a certain amount of data.
While I can see the appeal of Michele's suggestion, I think it would be more error-prone as people trying to erase a region rather than just the size of data could erase too much.
That was my initial thought, too, which is why I asked Michele for an explanation.
when I think about typiical use cases like automatic uypdate script similar to:
=> tftp 200000 filename => nand erase 0 +${filesize} => nand write 200000 0 ${filesize}
I (and probably any other user) will expect that the "erase" and "nand write" commands use the same interpretation for the size argument, i. e. that and "erase" followed by a "write" will have made sufficient room to write all data.
Thus I reconsidered and think the patch is actually reasonable, as it does what is the most practical use case needs.
It definitely should not be an error to erase a region that happens to contain a bad block. Bad blocks are expected and we need to work around them.
Agreed.
But it should be an error when writing to not erase flash blocks.
Best regards,
Wolfgang Denk
participants (3)
-
Michele De Candia (VT)
-
Scott Wood
-
Wolfgang Denk