Re: [U-Boot-Users] skipping bad blocks when erasing nand

Silently ignoring such errors without even printing a message and setting the return code of the command is not acceptable.
I guess I don't understand what's going on enough then. Is it really an error to come across a bad block? I thought that was part of the deal when using nand chips. If, for example, I want to erase a whole chip, then silently skipping bad blocks seems like the way to go. I could see keeping some kind of debug message, but something less severe than KERN_WARNING. Would that make this acceptable?
Thanks for your help.
-DB

In message 000001c6a6ca$bd0b3230$a134800a@RudiDell you wrote:
Silently ignoring such errors without even printing a message and setting the return code of the command is not acceptable.
I guess I don't understand what's going on enough then. Is it really an error to come across a bad block? I thought that was part of the deal when
It should at least be reported to the user - otherwise he assumes to have an error-free medium.
using nand chips. If, for example, I want to erase a whole chip, then silently skipping bad blocks seems like the way to go. I could see keeping
No, I think I would want to see reports about unerasable blocks here.
some kind of debug message, but something less severe than KERN_WARNING. Would that make this acceptable?
It should be a regular message, not only debug.
Best regards,
Wolfgang Denk

Hi,
Replying to the whole list this time around....
It should at least be reported to the user - otherwise he assumes to have an error-free medium.
Hard-drives have bad blocks, but you don't see them being reported when you format.
Somebody who knows that NAND chips have bad blocks as a normal course of events won't be surprised to see them. People (i.e. end users - not the programmers) who are unaware that NAND chips can have bad blocks will be surprised, so the whole thing about bad blocks being normal now has to be documented.
I think that if a block is marked as bad, then doing an erase should just ignore blocks that are already marked bad. However, if a block goes bad as a result of the formatting, then that's worth reporting (although, this is also a fact of life).
I think having a command to list the bad blocks would be useful.

Hi,
Dave Hylands wrote:
Hard-drives have bad blocks, but you don't see them being reported when you format.
Somebody who knows that NAND chips have bad blocks as a normal course of events won't be surprised to see them. People (i.e. end users - not the programmers) who are unaware that NAND chips can have bad blocks will be surprised, so the whole thing about bad blocks being normal now has to be documented.
I think that if a block is marked as bad, then doing an erase should just ignore blocks that are already marked bad. However, if a block goes bad as a result of the formatting, then that's worth reporting (although, this is also a fact of life).
I also think the patch from David is OK, because the bad blocks are not skipped, which would be fatal when crossing partition boundaries, but just ignored. I think even reporting these ignored blocks isn't necessary. Instead it would be much more interesting to know how much space is really free of bad blocks.
I think having a command to list the bad blocks would be useful.
This already exists (nand bad)
Regards Joachim

On Friday 14 July 2006 00:45, Wolfgang Denk wrote:
In message 000001c6a6ca$bd0b3230$a134800a@RudiDell you wrote:
I guess I don't understand what's going on enough then. Is it really an error to come across a bad block? I thought that was part of the deal when
It should at least be reported to the user - otherwise he assumes to have an error-free medium.
I am "voting" for David's implementation, since bad blocks are "normal" on NAND chips. And if I remember correctly, the "old" U-Boot NAND driver also just skipped the bad block upon erasing without reporting it.
Best regards, Stefan

Hello Stefan,
in message 200607140914.07427.sr@denx.de you wrote:
I am "voting" for David's implementation, since bad blocks are "normal" on NAND chips. And if I remember correctly, the "old" U-Boot NAND driver also just skipped the bad block upon erasing without reporting it.
I'm sorry, but I disagree. This code is not coming out of thin air. It is a more or less verbatim copy of the corresponding Linux MTD NAND code, see "drivers/mtd/nand/nand_base.c" in your Linux kernel tree.
In U-Boot, we have the additional #define NAND_ALLOW_ERASE_ALL which can be enabled if you don't like this behaviour.
If you believe that the U-Boot behaviour is wrong, then the Linux MTD driver would be wrong, too. In this case discussion should continue on the MTD mailing list.
As long as I don't see any changes to the current MTD Linux code you will need really good arguments to talk me into changing the U-Boot code.
Best regards,
Wolfgang Denk
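
The behaviour described in this message can be modelled in isolation. The following is only a simplified sketch, not the code from nand_base.c: the block table, sketch_block_is_bad() and sketch_erase_range() are hypothetical names, and SKETCH_ALLOW_ERASE_ALL merely stands in for the role that NAND_ALLOW_ERASE_ALL plays in U-Boot. It assumes the erase stops at the first bad block unless the switch is defined at compile time.

```c
#include <stdio.h>
#include <stddef.h>

#define SKETCH_NUM_BLOCKS 8

/* Hypothetical bad block table for this model: 1 = marked bad. */
static const unsigned char sketch_bad[SKETCH_NUM_BLOCKS] = {
    0, 0, 1, 0, 0, 0, 0, 0
};

static int sketch_block_is_bad(size_t block)
{
    return sketch_bad[block];
}

/*
 * Erase blocks [first, first + count).
 * Returns 0 on success and -1 if the range is invalid or a bad block is
 * met; in the latter case the remaining blocks are left untouched.
 * *erased receives the number of blocks processed before returning.
 */
static int sketch_erase_range(size_t first, size_t count, size_t *erased)
{
    size_t block;

    *erased = 0;
    if (first > SKETCH_NUM_BLOCKS || count > SKETCH_NUM_BLOCKS - first)
        return -1;

    for (block = first; block < first + count; block++) {
#ifndef SKETCH_ALLOW_ERASE_ALL
        /* Refuse to touch a block that is marked bad and give up. */
        if (sketch_block_is_bad(block)) {
            printf("erase: attempt to erase bad block %zu\n", block);
            return -1;
        }
#endif
        /* A real driver would issue the erase command here. */
        (*erased)++;
    }
    return 0;
}
```

With the table above, erasing blocks 0 to 7 stops at block 2 after two blocks have been erased; building with -DSKETCH_ALLOW_ERASE_ALL removes the check and the whole range is processed.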

Hello Wolfgang,
On Friday 14 July 2006 10:28, Wolfgang Denk wrote:
in message 200607140914.07427.sr@denx.de you wrote:
I am "voting" for David's implementation, since bad blocks are "normal" on NAND chips. And if I remember correctly, the "old" U-Boot NAND driver also just skipped the bad block upon erasing without reporting it.
I'm sorry, but I disagree. This code is not coming out of thin air. It is a more or less verbatim copy of the corresponding Linux MTD NAND code, see "drivers/mtd/nand/nand_base.c" in your Linux kernel tree.
In U-Boot, we have the additional #define NAND_ALLOW_ERASE_ALL which can be enabled if you don't like this behaviour.
If you believe that the U-Boot behaviour is wrong, then the Linux MTD driver would be wrong, too. In this case discussion should continue on the MTD mailing list.
I find it hard to believe that a linux NAND erase operation will stop upon reaching a bad block. But you are right: the code in question is the same in the current linux mtd driver. From my experience, I never had problems erasing NAND flash devices (from U-Boot & linux) which had bad blocks.
As long as I don't see any changes to the current MTD Linux code you will need really good arguments to talk me into changing the U-Boot code.
I see. You have a good point here. This needs some testing on a device with bad blocks. "Unfortunately" the device on my desk has no bad blocks at all:
=> nand bad
Device 0 bad blocks:
Perhaps somebody else can jump in here and test the current linux mtd driver behavior on a device with bad blocks. Thanks.
Best regards, Stefan

Dear Stefan,
in message 200607141047.08126.sr@denx.de you wrote:
I find it hard to believe that a linux NAND erase operation will stop upon reaching a bad block. But you are right: the code in question is the same in the current linux mtd driver. From my experience, I never had problems erasing NAND flash devices (from U-Boot & linux) which had bad blocks.
The code in the MTD CVS is also the same.
Note that in my understanding this code never gets executed for bad blocks in Linux (see the comment "we do not erase bad blocks"), i.e. it attempts to catch previous errors.
I see. You have a good point here. This needs some testing on a device with bad blocks. "Unfortunately" the device on my desk has no bad blocks at all:
=> nand bad
Device 0 bad blocks:
Perhaps somebody else can jump in here and test the current linux mtd driver behavior on a device with bad blocks. Thanks.
If you like you can use the "dave" board in our virtual lab. It has a couple of bad blocks in device 1:
=> nand device 1
Device 1: NAND 32MiB 3,3V 8-bit... is now current device
=> nand bad
Device 1 bad blocks:
  00000000
  00004000
  00008000
  0000c000
  00010000
=>
Best regards,
Wolfgang Denk

Hi Stefan,
Stefan Roese wrote:
Perhaps somebody else can jump in here and test the current linux mtd driver behavior on a device with bad blocks. Thanks.
I only can provide the output based on the 2.6.12.5 kernel:
NAND device: Manufacturer ID: 0xec, Chip ID: 0xd3 (Samsung NAND 1GiB 3,3V 8-bit)
Scanning device for bad blocks
Bad eraseblock 4861 at 0x25fa0000
Bad eraseblock 5098 at 0x27d40000
Bad eraseblock 5163 at 0x28560000
Bad eraseblock 6252 at 0x30d80000
Bad eraseblock 7952 at 0x3e200000
Creating 3 MTD partitions on "NAND 1GiB 3,3V 8-bit":
0x00000000-0x00040000 : "u-boot"
0x00040000-0x00300000 : "kernel"
0x00300000-0x40000000 : "rootfs"
/ # mtd_debug erase /dev/mtd/2 0x22f00000 0x3000000
nand_erase: attempt to erase a bad block at page 0x0004bf40
MEMERASE: Input/output error
I think, bad blocks should not prevent U-Boot from erasing a partition, which is needed to write an OS there, which needs to be booted.
Best regards Joachim
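
The numbers in the log above can be cross-checked with a little arithmetic. The sketch below assumes that /dev/mtd/2 is the "rootfs" partition starting at 0x00300000, and it infers a 128 KiB erase block and a 2 KiB page from the log itself (0x25fa0000 / 4861 and 0x25fa0000 / 0x4bf40); neither size is stated in the thread, and all names are made up for the illustration.

```c
#include <stdint.h>

#define SKETCH_PART_START   0x00300000u /* "rootfs" start, from the boot log */
#define SKETCH_ERASE_OFFSET 0x22f00000u /* mtd_debug offset inside the partition */
#define SKETCH_ERASE_LENGTH 0x03000000u /* mtd_debug length */
#define SKETCH_BLOCK_SIZE   0x00020000u /* assumed: 128 KiB erase block */
#define SKETCH_PAGE_SIZE    0x00000800u /* assumed: 2 KiB page */

/* Absolute chip address of the first byte to be erased. */
static uint32_t sketch_abs_start(void)
{
    return SKETCH_PART_START + SKETCH_ERASE_OFFSET;
}

/* Absolute chip address one past the last byte to be erased. */
static uint32_t sketch_abs_end(void)
{
    return sketch_abs_start() + SKETCH_ERASE_LENGTH;
}

/* Non-zero if the absolute address lies inside the requested range. */
static int sketch_in_range(uint32_t addr)
{
    return addr >= sketch_abs_start() && addr < sketch_abs_end();
}

/* Erase block index and page number of an absolute address. */
static uint32_t sketch_addr_to_block(uint32_t addr)
{
    return addr / SKETCH_BLOCK_SIZE;
}

static uint32_t sketch_addr_to_page(uint32_t addr)
{
    return addr / SKETCH_PAGE_SIZE;
}
```

Under these assumptions the request covers 0x23200000 up to 0x26200000 on the chip, bad eraseblock 4861 at 0x25fa0000 lies inside that range, and its first page is 0x0004bf40, which matches the page reported by nand_erase.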

In message 44B76ABA.3000607@fsforth.de you wrote:
/ # mtd_debug erase /dev/mtd/2 0x22f00000 0x3000000
nand_erase: attempt to erase a bad block at page 0x0004bf40
MEMERASE: Input/output error
That's the behaviour I would expect to see from looking at the code.
And what you will see in U-Boot, too - same code, same behaviour.
I think, bad blocks should not prevent U-Boot from erasing a partition, which is needed to write an OS there, which needs to be booted.
I tend to agree here, which was the reason that I said I was willing to accept the patch, but that at least the printf() should be kept so that the user gets the warning.
Best regards,
Wolfgang Denk

Dear Wolfgang,
On Fri, Jul 14, 2006 at 01:19:23PM +0200, Wolfgang Denk wrote:
In message 44B76ABA.3000607@fsforth.de you wrote:
/ # mtd_debug erase /dev/mtd/2 0x22f00000 0x3000000
nand_erase: attempt to erase a bad block at page 0x0004bf40
MEMERASE: Input/output error
That's the behaviour I would expect to see from looking at the code.
And what you will see in U-Boot, too - same code, same behaviour.
Indeed, but the MTD code in U-Boot is invoked in a different way (that's my bad, and I apologize for the late reply). Most Linux users are probably using mtd-utils to erase NAND. flash_erase issues the MEMERASE ioctl directly, and that fails on an attempt to erase a bad block. flash_eraseall, however, first asks about bad blocks using the MEMGETBADBLOCK ioctl and skips any it finds, printing the info message "Skipping bad block at 0x%08x". The average U-Boot user probably wants the flash_eraseall behaviour...
I think, bad blocks should not prevent U-Boot from erasing a partition, which is needed to write an OS there, which needs to be booted.
I tend to agree here, which was the reason that I said I was willing to accept the patch, but that at least the printf() should be kept so that the user gets the warning.
So, it seems the only unanswered question is the implementation. You can either accept the second version of the proposed patch or ask for a different implementation similar to flash_eraseall (i.e. bad block checking done outside the MTD code, so it can remain unmodified). Not being a U-Boot maintainer, I leave the decision to you :-)
Best regards, ladis
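
The flash_eraseall-style alternative mentioned in this message, with the bad block check done by the caller rather than inside the erase routine, might look roughly like the sketch below. It is a self-contained model rather than mtd-utils or U-Boot code: sketch_is_bad() and sketch_erase_block() are hypothetical stand-ins for the bad block query and the driver erase call, the bad block offsets are the ones listed for device 1 earlier in the thread, and the 16 KiB erase block size is an assumption inferred from their spacing.

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

#define SKETCH_ERASE_SIZE 0x4000u /* assumed: 16 KiB erase block */

/* Bad block offsets as reported by "nand bad" for device 1. */
static const uint32_t sketch_bad_offsets[] = {
    0x00000000u, 0x00004000u, 0x00008000u, 0x0000c000u, 0x00010000u
};

/* Stand-in for a bad block query such as MEMGETBADBLOCK. */
static int sketch_is_bad(uint32_t offset)
{
    size_t i;
    size_t n = sizeof(sketch_bad_offsets) / sizeof(sketch_bad_offsets[0]);

    for (i = 0; i < n; i++)
        if (sketch_bad_offsets[i] == offset)
            return 1;
    return 0;
}

/* Stand-in for the driver's erase call; always succeeds in this model. */
static int sketch_erase_block(uint32_t offset)
{
    (void)offset;
    return 0;
}

/*
 * Erase [start, start + length), skipping blocks already marked bad.
 * Returns the number of blocks erased, or -1 if an erase fails.
 * *skipped receives the number of bad blocks that were skipped.
 */
static int sketch_erase_skip_bad(uint32_t start, uint32_t length,
                                 unsigned *skipped)
{
    uint32_t off;
    int erased = 0;

    *skipped = 0;
    for (off = start; off < start + length; off += SKETCH_ERASE_SIZE) {
        if (sketch_is_bad(off)) {
            /* Known bad block: tell the user and carry on. */
            printf("Skipping bad block at 0x%08x\n", (unsigned)off);
            (*skipped)++;
            continue;
        }
        if (sketch_erase_block(off) != 0) {
            /* A block that fails now is worth a real error. */
            printf("Erase failed at 0x%08x\n", (unsigned)off);
            return -1;
        }
        erased++;
    }
    return erased;
}
```

Erasing the first 0x20000 bytes in this model skips the five known bad blocks with one message each and erases the remaining three, so the user still sees a report while the command as a whole succeeds.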

participants (6)
- Dave Hylands
- David Byron
- Joachim Jaeger
- Ladislav Michl
- Stefan Roese
- Wolfgang Denk