[U-Boot] PATCH: bugfix for nand erase failure with bad blocks

newer
[U-Boot] [PATCH 1/2] sf: sst: fix...

older
Re: [U-Boot] Problem with memcpy...

Michele De Candia (VT)

16 Jun 2009 16 Jun '09

3:06 p.m.

Hi all, this patch fixes a bug for 'nand erase' command: when bad blocks are present into erasing area, they were skipped but the erased size was updated anyway.

Regards, Michele Jr De Candia

Attachments:

nand_erase_count_bad_blocks.patch (text/x-patch — 1.2 KB)

Show replies by date

Wolfgang Denk

16 Jun 16 Jun

8:10 p.m.

Dear "Michele De Candia (VT)",

In message 4A3798C4.8000303@valueteam.com you wrote:

...

this patch fixes a bug for 'nand erase' command: when bad blocks are present into erasing area, they were skipped but the erased size was updated anyway.

And what exactly is the bug in this behaviour?

Given the fact that you don't know the number of bad blocks in advance, what do you use as reference for 100% in your display, then?

Best regards,

Wolfgang Denk

Michele De Candia (VT)

9:51 p.m.

Wolfgang Denk wrote:

...

Dear "Michele De Candia (VT)",

In message 4A3798C4.8000303@valueteam.com you wrote:

...
this patch fixes a bug for 'nand erase' command: when bad blocks are present into erasing area, they were skipped but the erased size was updated anyway.

And what exactly is the bug in this behaviour?

I think that 'erase' should have the same behaviour of 'write' and 'read' commands: skip bad blocks until read/write size is reached. If you write a script that erases and then writes a NAND area and bad blocks are not skipped while erasing (as U-Boot actually does), the following 'write' is successfully done but ECC checks fail on next read on the same area.

...

Given the fact that you don't know the number of bad blocks in advance, what do you use as reference for 100% in your display, then?

I used the size passed by the user from command line as target and the actual erased size as reference while erasing blocks and skipping bad ones.

...

Best regards,

Wolfgang Denk

-- *Michele Jr **De Candia* ------------------------------------------------------------------------ Value Team Via Vespri Siciliani, 9 20146 Milano Tel. +39 0248985722 michele.decandia@valueteam.com mailto:michele.decandia@valueteam.com http://www.valueteam.com CONFIDENTIALITY NOTICE -This message and its attachments (if any) may contain confidential, proprietary or legally privileged information and is intended only for the use of the addressee named above. No confidentiality or privilege is waived or lost by any mistransmission. If you are not the intended recipient of this message you are hereby notified that you must not use, disseminate, copy it in any form or take any action in reliance on it. If you have received this message in error please delete it and any copies of it and kindly inform the sender of this e-mail by replying or go to www.valueteam.com http://www.valueteam.com on 'contacts'.

Wolfgang Denk

10:09 p.m.

Dear "Michele De Candia (VT)",

In message 4A37F7BF.2090101@valueteam.com you wrote:

...

...
...
this patch fixes a bug for 'nand erase' command: when bad blocks are present into erasing area, they were skipped but the erased size was updated anyway.

And what exactly is the bug in this behaviour?

I think that 'erase' should have the same behaviour of 'write' and 'read' commands: skip bad blocks until read/write size is reached. If you write a script that erases and then writes a NAND area and bad blocks are not skipped while erasing (as U-Boot actually does), the following 'write' is successfully done but ECC checks fail on next read on the same area.

I see - thanks for the explanation.

Hm... actually I think the write should fail in such a case...

Scott, what do you think?

Best regards,

Wolfgang Denk

Scott Wood

10:19 p.m.

Wolfgang Denk wrote:

...

Dear "Michele De Candia (VT)",

In message 4A37F7BF.2090101@valueteam.com you wrote:

...
...
...
this patch fixes a bug for 'nand erase' command: when bad blocks are present into erasing area, they were skipped but the erased size was updated anyway.

And what exactly is the bug in this behaviour?

I think that 'erase' should have the same behaviour of 'write' and 'read' commands: skip bad blocks until read/write size is reached. If you write a script that erases and then writes a NAND area and bad blocks are not skipped while erasing (as U-Boot actually does), the following 'write' is successfully done but ECC checks fail on next read on the same area.

I see - thanks for the explanation.

Hm... actually I think the write should fail in such a case...

Scott, what do you think?

I think the current behavior is reasonable. You're erasing a specific region of flash, not an amount needed to hold a certain amount of data.

While I can see the appeal of Michele's suggestion, I think it would be more error-prone as people trying to erase a region rather than just the size of data could erase too much.

It definitely should not be an error to erase a region that happens to contain a bad block. Bad blocks are expected and we need to work around them.

-Scott

Michele De Candia (VT)

17 Jun 17 Jun

9:18 a.m.

Scott Wood wrote:

...

Wolfgang Denk wrote:

...
Dear "Michele De Candia (VT)",

In message 4A37F7BF.2090101@valueteam.com you wrote:

...
...
...
this patch fixes a bug for 'nand erase' command: when bad blocks are present into erasing area, they were skipped but the erased size was updated anyway.

And what exactly is the bug in this behaviour?

I think that 'erase' should have the same behaviour of 'write' and 'read' commands: skip bad blocks until read/write size is reached. If you write a script that erases and then writes a NAND area and bad blocks are not skipped while erasing (as U-Boot actually does), the following 'write' is successfully done but ECC checks fail on next read on the same area.

I see - thanks for the explanation.

Hm... actually I think the write should fail in such a case...

Scott, what do you think?

I think the current behavior is reasonable. You're erasing a specific region of flash, not an amount needed to hold a certain amount of data.

From this point of view you're in the right; maybe this would be explained in 'README.nand' documentation or what do you think about add ing an option to 'nand erase' command to consider 'size' field as the effective blocks size to be erased and not as the area size?

...

While I can see the appeal of Michele's suggestion, I think it would be more error-prone as people trying to erase a region rather than just the size of data could erase too much.

It definitely should not be an error to erase a region that happens to contain a bad block. Bad blocks are expected and we need to work around them.

-Scott

Michele De Candia (VT)

9:43 a.m.

Michele De Candia (VT) wrote:

...

Scott Wood wrote:

...
Wolfgang Denk wrote:

...
Dear "Michele De Candia (VT)",

In message 4A37F7BF.2090101@valueteam.com you wrote:

...
...
...
this patch fixes a bug for 'nand erase' command: when bad blocks are present into erasing area, they were skipped but the erased size was updated anyway.

And what exactly is the bug in this behaviour?

I think that 'erase' should have the same behaviour of 'write' and 'read' commands: skip bad blocks until read/write size is reached. If you write a script that erases and then writes a NAND area and bad blocks are not skipped while erasing (as U-Boot actually does), the following 'write' is successfully done but ECC checks fail on next read on the same area.

I see - thanks for the explanation.

Hm... actually I think the write should fail in such a case...

Scott, what do you think?

I think the current behavior is reasonable. You're erasing a specific region of flash, not an amount needed to hold a certain amount of data.

From this point of view you're in the right; maybe this would be explained in 'README.nand' documentation or what do you think about add ing an option to 'nand erase' command to consider 'size' field as the effective blocks size to be erased and not as the area size?

Moreover, I think that if you want to erase a specific NAND area, the correct way to use 'nand erase' command would be:

'nand erase start end'

If you want to erase an area but you want to be sure that 'size' bytes were erased, you should use:

'nand erase off size'

In this case a bad block should be skipped and next block will be erased.

Obviously this behavior must be implemented.

What do you think about it?

...

...
While I can see the appeal of Michele's suggestion, I think it would be more error-prone as people trying to erase a region rather than just the size of data could erase too much.

It definitely should not be an error to erase a region that happens to contain a bad block. Bad blocks are expected and we need to work around them.

-Scott

Michele De Candia (VT)

9:44 a.m.

Michele De Candia (VT) wrote:

...

Scott Wood wrote:

...
Wolfgang Denk wrote:

...
Dear "Michele De Candia (VT)",

In message 4A37F7BF.2090101@valueteam.com you wrote:

...
...
...
this patch fixes a bug for 'nand erase' command: when bad blocks are present into erasing area, they were skipped but the erased size was updated anyway.

And what exactly is the bug in this behaviour?

I think that 'erase' should have the same behaviour of 'write' and 'read' commands: skip bad blocks until read/write size is reached. If you write a script that erases and then writes a NAND area and bad blocks are not skipped while erasing (as U-Boot actually does), the following 'write' is successfully done but ECC checks fail on next read on the same area.

I see - thanks for the explanation.

Hm... actually I think the write should fail in such a case...

Scott, what do you think?

I think the current behavior is reasonable. You're erasing a specific region of flash, not an amount needed to hold a certain amount of data.

From this point of view you're in the right; maybe this would be explained in 'README.nand' documentation or what do you think about add ing an option to 'nand erase' command to consider 'size' field as the effective blocks size to be erased and not as the area size?

Moreover, I think that if you want to erase a specific NAND area, the correct way to use 'nand erase' command would be:

'nand erase start end'

If you want to erase an area but you want to be sure that 'size' bytes were erased, you should use:

'nand erase off size'

In this case a bad block should be skipped and next block will be erased.

Obviously this behavior must be implemented.

What do you think about it?

...

...
While I can see the appeal of Michele's suggestion, I think it would be more error-prone as people trying to erase a region rather than just the size of data could erase too much.

It definitely should not be an error to erase a region that happens to contain a bad block. Bad blocks are expected and we need to work around them.

-Scott

Scott Wood

5:54 p.m.

On Wed, Jun 17, 2009 at 09:44:41AM +0200, Michele De Candia (VT) wrote:

...

Moreover, I think that if you want to erase a specific NAND area, the correct way to use 'nand erase' command would be:

'nand erase start end'

If you want to erase an area but you want to be sure that 'size' bytes were erased, you should use:

'nand erase off size'

How would the "nand erase" command reliably distinguish between the two alternatives?

What we could do is extend the "plus" semantics (which currently allow rounding the size up to a block boundary) so that if you have a plus sign before the size it is interpreted the same as read/write.

I'm a little uneasy about changing the normal erase command from size to end -- it would break existing uses. Though, it would make it consistent with the NOR erase command. Perhaps a period where it warns but accepts anyway a size, if the second parameter is less than the first.

-Scott

Michele De Candia (VT)

6:17 p.m.

Scott Wood wrote:

...

On Wed, Jun 17, 2009 at 09:44:41AM +0200, Michele De Candia (VT) wrote:

...
Moreover, I think that if you want to erase a specific NAND area, the correct way to use 'nand erase' command would be:

'nand erase start end'

If you want to erase an area but you want to be sure that 'size' bytes were erased, you should use:

'nand erase off size'

How would the "nand erase" command reliably distinguish between the two alternatives?

What we could do is extend the "plus" semantics (which currently allow rounding the size up to a block boundary) so that if you have a plus sign before the size it is interpreted the same as read/write.

As you has suggested we could use:

'nand erase start end'

and

'nand erase off +size'

...

I'm a little uneasy about changing the normal erase command from size to end -- it would break existing uses. Though, it would make it consistent with the NOR erase command. Perhaps a period where it warns but accepts anyway a size, if the second parameter is less than the first.

This doesn't work always: for example, when you erase at the NAND begin, second parameter could be greater than first one.

It can always warn user when he uses the first erase way.

...

-Scott _______________________________________________ U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot

Scott Wood

18 Jun 18 Jun

12:04 a.m.

Michele De Candia (VT) wrote:

...

...
I'm a little uneasy about changing the normal erase command from size to end -- it would break existing uses. Though, it would make it consistent with the NOR erase command. Perhaps a period where it warns but accepts anyway a size, if the second parameter is less than the first.

This doesn't work always: for example, when you erase at the NAND begin, second parameter could be greater than first one.

Hmm... perhaps check the alignment? If "end" is supposed to be the last to-be-erased byte, not the first not-to-be-erased byte, then if the low bits are 0 it's a size (and gets a warning) and if they're 1 it's an end?

Or just always use the new syntax, announce it loudly, and grep the board config files for scripts to update. Or leave it alone and only change the plus variant. :-)

...

It can always warn user when he uses the first erase way.

Then what would be the correct, non-warning-producing way to erase a region of flash regardless of its bad block content?

-Scott

Wolfgang Denk

12:15 a.m.

Dear Scott,

In message 4A39685F.6030304@freescale.com you wrote:

...

Hmm... perhaps check the alignment? If "end" is supposed to be the last to-be-erased byte, not the first not-to-be-erased byte, then if the low bits are 0 it's a size (and gets a warning) and if they're 1 it's an end?

Stop here. Don't add fancy stuf that nobody expects.

Ask yourself what the end user expects - we all think of "erase" preparing the grounds for "write", right? So both should work oin the same NAND flash region when given the same arguments (offset and seze, no matter how these are specified).

Best regards,

Wolfgang Denk

Scott Wood

12:34 a.m.

Wolfgang Denk wrote:

...

Dear Scott,

In message 4A39685F.6030304@freescale.com you wrote:

...
Hmm... perhaps check the alignment? If "end" is supposed to be the last to-be-erased byte, not the first not-to-be-erased byte, then if the low bits are 0 it's a size (and gets a warning) and if they're 1 it's an end?

Stop here. Don't add fancy stuf that nobody expects.

Nobody expects the semantics to silently change...

I'm not particularly advocating this approach, just throwing out alternatives. Leaving it alone is probably best.

...

Ask yourself what the end user expects - we all think of "erase" preparing the grounds for "write", right?

Sometimes. Other times, it could be preparing for use by a filesystem (which may not use the same bad block handling scheme), to reset the environment, for testing, etc.

We have the plus syntax specifically for the use case of erase+write of a specific number of bytes; IMO that's the place for this kind of interpretation.

...

So both should work oin the same NAND flash region when given the same arguments (offset and seze,

So then there would be no reliable way of erasing a specific range of flash. To properly use the block-skipping approach, the user needs to have in mind a range of actual blocks that the data can take up (i.e. a maximum number of bad blocks to expect), so that they can avoid locating the next partition within that range. I don't think it makes sense to try to completely hide this.

Perhaps "write" should accept an optional limit argument, returning an error if enough bad blocks were encountered to bust the limit.

...

no matter how these are specified).

If the syntax were changed to start/end, but the erase could go beyond end anyway, that would be extremely confusing.

-Scott

Michele De Candia (VT)

19 Jun 19 Jun

9:01 a.m.

Scott Wood wrote:

...

Wolfgang Denk wrote:

...
Dear Scott,

In message 4A39685F.6030304@freescale.com you wrote:

...
Hmm... perhaps check the alignment? If "end" is supposed to be the last to-be-erased byte, not the first not-to-be-erased byte, then if the low bits are 0 it's a size (and gets a warning) and if they're 1 it's an end?

Stop here. Don't add fancy stuf that nobody expects.

Nobody expects the semantics to silently change...

I'm not particularly advocating this approach, just throwing out alternatives. Leaving it alone is probably best.

...
Ask yourself what the end user expects - we all think of "erase" preparing the grounds for "write", right?

Sometimes. Other times, it could be preparing for use by a filesystem (which may not use the same bad block handling scheme), to reset the environment, for testing, etc.

We have the plus syntax specifically for the use case of erase+write of a specific number of bytes; IMO that's the place for this kind of interpretation.

...
So both should work oin the same NAND flash region when given the same arguments (offset and seze,

So then there would be no reliable way of erasing a specific range of flash. To properly use the block-skipping approach, the user needs to have in mind a range of actual blocks that the data can take up (i.e. a maximum number of bad blocks to expect), so that they can avoid locating the next partition within that range. I don't think it makes sense to try to completely hide this.

Perhaps "write" should accept an optional limit argument, returning an error if enough bad blocks were encountered to bust the limit.

...
no matter how these are specified).

If the syntax were changed to start/end, but the erase could go beyond end anyway, that would be extremely confusing.

-Scott

Summarizing, there are two alternatives:

- change 'erase' command aligning it with 'write' and 'read' command (what the patch does);

- add a third field to the 'erase' command, maybe called 'limit', over which erasing can't be done:

'nand erase start offset limit'

In this case, I think that 'write' command may be aligned with this new syntax too.

It's up to you.

Regards, Michele

Wolfgang Denk

18 Jun 18 Jun

12:11 a.m.

Dear Scott,

In message 20090617155421.GB6333@loki.buserror.net you wrote:

...

...
If you want to erase an area but you want to be sure that 'size' bytes were erased, you should use:

'nand erase off size'

How would the "nand erase" command reliably distinguish between the two alternatives?

It cannot.

...

What we could do is extend the "plus" semantics (which currently allow rounding the size up to a block boundary) so that if you have a plus sign before the size it is interpreted the same as read/write.

But it's not only with the "plus". I think erase should work similar as write - if we allow write to skip bad blocks and "exceed" thenetto size, then we must do the same for erase.

...

I'm a little uneasy about changing the normal erase command from size to end -- it would break existing uses. Though, it would make it consistent with the NOR erase command. Perhaps a period where it warns but accepts anyway a size, if the second parameter is less than the first.

"if the second parameter is less than the first" ? Sorry, can't parse that. What do you have in mind?

Best regards,

Wolfgang Denk

-- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de The so-called "desktop metaphor" of today's workstations is instead an "airplane-seat" metaphor. Anyone who has shuffled a lap full of papers while seated between two portly passengers will recognize the difference -- one can see only a very few things at once. - Fred Brooks, Jr.

Wolfgang Denk

17 Jun 17 Jun

11:18 a.m.

Dear Scott,

In message 4A37FE47.3030203@freescale.com you wrote:

...

...
I see - thanks for the explanation.

Hm... actually I think the write should fail in such a case...

Scott, what do you think?

I think the current behavior is reasonable. You're erasing a specific region of flash, not an amount needed to hold a certain amount of data.

While I can see the appeal of Michele's suggestion, I think it would be more error-prone as people trying to erase a region rather than just the size of data could erase too much.

That was my initial thought, too, which is why I asked Michele for an explanation.

when I think about typiical use cases like automatic uypdate script similar to:

=> tftp 200000 filename => nand erase 0 +${filesize} => nand write 200000 0 ${filesize}

I (and probably any other user) will expect that the "erase" and "nand write" commands use the same interpretation for the size argument, i. e. that and "erase" followed by a "write" will have made sufficient room to write all data.

Thus I reconsidered and think the patch is actually reasonable, as it does what is the most practical use case needs.

...

It definitely should not be an error to erase a region that happens to contain a bad block. Bad blocks are expected and we need to work around them.

Agreed.

But it should be an error when writing to not erase flash blocks.

Best regards,

Wolfgang Denk

-- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de "...this does not mean that some of us should not want, in a rather dispassionate sort of way, to put a bullet through csh's head." - Larry Wall in 1992Aug6.221512.5963@netlabs.com

5801

Age (days ago)

5804

Last active (days ago)

List overview

Download

15 comments

3 participants

tags (0)

participants (3)

Michele De Candia (VT)
Scott Wood
Wolfgang Denk