[U-Boot-Users] bug in cfi_flash error detection

Hi all,
recently I got back a board from our customer, where some bits in the (nor) flash could not be programmed due to an error in the flash chip (-> very rare error).
I tought, this would be a good reason, to test the common cfi_flash driver with this board (so far we used a board specific flash driver on this board).
But the cfi_flash driver does not detect the error as I expected. I took a closer look to the driver and found some bugs (at least in my opinion).
I added a patch wich tries to fix it, but I'm not sure if this is the proper way it should be done (e. g. because it adds bytes to cfi_flash.o).
Here a brief description of what the patch tries to fix:
- Timeout calculation in flash_status_check(): The function calculates seconds instead of miliseconds. And on some boards the range of ulong is exceeded in the calculation (because CFG_HZ is very big on some boards).
- In flash_status_check() a bogus error message is printed (secotor number is always 0 and the printed data value is not from the address where the error occours).
- flash_status_check() uses always the erase block timeout, regardles if eraseing flash or writing flash.
- If the write timeout of the flash device is below 1 ms it is erroneously calculated to zero.
Any comment would be welcome.
Regards, Martin
TQ-Systems GmbH Mühlstraße 2, Gut Delling, 82229 Seefeld Tel. +49 (8153) 93 08-157, Fax +49 (8153) 93 08-7157 martin.krause@tqs.de www.tq-group.com

Martin Krause wrote:
Hi all,
Here a brief description of what the patch tries to fix:
Timeout calculation in flash_status_check(): The function calculates seconds instead of miliseconds. And on some boards the range of ulong is exceeded in the calculation (because CFG_HZ is very big on some boards).
In flash_status_check() a bogus error message is printed (secotor number is always 0 and the printed data value is not from the address where the error occours).
flash_status_check() uses always the erase block timeout, regardles if eraseing flash or writing flash.
If the write timeout of the flash device is below 1 ms it is erroneously calculated to zero.
Some of the issues you are fixing are also fixed my earlier patch in the pipeline (from July). See my patch below for items 1,3,4 in your list.
http://sourceforge.net/mailarchive/message.php?msg_id=12234135
I think, in flash_status_check() your patch timeout check is not correct. get_timer() returns in msec as well as tout is in msecs. So, they are directly comparable without involving CFG_HZ.
I think the full status check needs to be applied to sector 0 irrespective of what sector you are operating with because this is where the flash status register resides. I think some flash ignore the sector address while others require sector 0. I would be careful in changing this without verifying for a wide range of flash chips. Yes, the sector number will be wrong in error if you activate debug outputs but it is only cosmetic as far as I can see. Also, finding the sector number each time from address is a time consuming search that will slow the flash write operation unnecessarily. I am more willing to take the wrong sector number in debug output (if enabled) than slowed down status check which is used in a number of places.
Best regards, Tolunay

In message 47F3F98010FF784EBEE6526EAAB078D1C05E04@tq-mailsrv.tq-net.de you wrote:
- Timeout calculation in flash_status_check(): The function calculates seconds instead of miliseconds.=20 And on some boards the range of ulong is exceeded in the calculation (because CFG_HZ is very big on some boards).
This is a bug with such boards. CFG_HZ should always be set to 1000. Looking back I have to admit that it was a major design error to create this variable at all.
Best regards,
Wolfgang Denk
participants (3)
-
Martin Krause
-
Tolunay Orkun
-
Wolfgang Denk