[U-Boot] nand_spl/nand_boot.c: why can't we do anything on ECC error?

Hi,
in nand_spl/nand_boot.c in function nand_read_page() one can read the comment in the case of ECC errors: "No chance to do something with the possible error message from correct_data(). We just hope that all possible errors are corrected by this routine."
Why can't we do anything? If an uncorrectable error has been detected, we could at least enter an endless loop or issue a reset. Depending on the bit errors and their location in the U-Boot image, U-Boot may still boot, and a runtime error might appear never, later, or only under special circumstances. Because this is a risk (the image is corrupted), what do you think of inserting some blocking functionality?
Or did I miss something in interpreting this code?
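To make the proposal concrete, here is a minimal host-side sketch of checking the correction result for each ECC step instead of discarding it. It is not the actual nand_read_page(): check_page_ecc() and fake_correct() are hypothetical names, and the return convention (negative means uncorrectable, zero or positive means clean or corrected) is an assumption modelled on the usual MTD convention.

```c
#include <stddef.h>

/* Hypothetical stand-in for the driver's ECC correction hook.
 * Assumed convention: >= 0 is the number of corrected bits,
 * < 0 means the error could not be corrected. */
typedef int (*ecc_correct_fn)(unsigned char *data,
			      const unsigned char *read_ecc,
			      const unsigned char *calc_ecc);

/* Toy correction routine for the sketch only: identical ECC bytes are
 * "clean", a single differing bit is "corrected", anything else is
 * "uncorrectable". Real ECC algorithms work differently. */
static int fake_correct(unsigned char *data,
			const unsigned char *read_ecc,
			const unsigned char *calc_ecc)
{
	unsigned char diff = read_ecc[0] ^ calc_ecc[0];

	(void)data;
	if (diff == 0)
		return 0;
	if ((diff & (diff - 1)) == 0)
		return 1;
	return -1;
}

/* Walk all ECC steps of one page and remember whether any step was
 * uncorrectable. Returns 0 if the page is usable, -1 otherwise. */
static int check_page_ecc(ecc_correct_fn correct, unsigned char *page,
			  const unsigned char *ecc_code,
			  const unsigned char *ecc_calc,
			  int eccsteps, int eccsize, int eccbytes)
{
	unsigned char *p = page;
	int ret = 0;
	int i;

	for (i = 0; eccsteps; eccsteps--, i += eccbytes, p += eccsize) {
		if (correct(p, &ecc_code[i], &ecc_calc[i]) < 0)
			ret = -1;	/* keep going, but report the failure */
	}
	return ret;
}
```

The caller could then decide what to do with a -1 result (loop forever, reset, or continue as today).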
Kind regards, Jens

On Wed, Aug 13, 2008 at 07:03:25PM +0200, Jens Gehrlein wrote:
> Hi, in nand_spl/nand_boot.c in function nand_read_page() one can read the comment in the case of ECC errors: "No chance to do something with the possible error message from correct_data(). We just hope that all possible errors are corrected by this routine."
> Why can't we do anything?
We can't fit printf() into the 4K NAND loader. We could fit puts(), though (it's included in the 8313erdb's NAND loader).
> If an uncorrectable error has been detected, we could at least enter an endless loop or issue a reset. Depending on the bit errors and their location in the U-Boot image, U-Boot may still boot, and a runtime error might appear never, later, or only under special circumstances. Because this is a risk (the image is corrupted), what do you think of inserting some blocking functionality?
I'm open to halting if the image is corrupt (it's what nand_boot_fsl_elbc does), though I'm concerned about bricking boards that might otherwise survive well enough to be reflashed.
We should definitely try to get some sort of message out.
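As a rough illustration of "get a message out, then halt" without printf(), here is a host-side sketch. Everything in it is hypothetical: uart_putc() captures bytes in a buffer instead of touching a real UART, and spl_puts() and load_failed() are made-up names, not existing U-Boot functions.

```c
/* Host-side stand-in for a UART: bytes are captured in a buffer.
 * A real board would poll its transmit register here instead. */
static char uart_log[64];
static unsigned int uart_len;

static void uart_putc(char c)
{
	if (uart_len < sizeof(uart_log) - 1)
		uart_log[uart_len++] = c;
}

/* Minimal puts(): no format parsing, just a byte loop, which is why it
 * is far smaller than printf(). */
static void spl_puts(const char *s)
{
	while (*s)
		uart_putc(*s++);
}

/* Returns 1 if the load must be treated as failed, 0 otherwise.
 * Assumes a negative ecc_status means "uncorrectable". */
static int load_failed(int ecc_status)
{
	if (ecc_status < 0) {
		spl_puts("NAND: uncorrectable ECC error\n");
		return 1;	/* on target the caller might do: for (;;) ; */
	}
	return 0;
}
```

Whether the caller really halts or continues after the message is exactly the policy question raised above; the sketch leaves that to the caller.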
-Scott

Scott Wood wrote:
> On Wed, Aug 13, 2008 at 07:03:25PM +0200, Jens Gehrlein wrote:
>> Hi, in nand_spl/nand_boot.c in function nand_read_page() one can read the comment in the case of ECC errors: "No chance to do something with the possible error message from correct_data(). We just hope that all possible errors are corrected by this routine."
>> Why can't we do anything?
> We can't fit printf() into the 4K NAND loader. We could fit puts(), though (it's included in the 8313erdb's NAND loader).
>> If an uncorrectable error has been detected, we could at least enter an endless loop or issue a reset. Depending on the bit errors and their location in the U-Boot image, U-Boot may still boot, and a runtime error might appear never, later, or only under special circumstances. Because this is a risk (the image is corrupted), what do you think of inserting some blocking functionality?
> I'm open to halting if the image is corrupt (it's what nand_boot_fsl_elbc does), though I'm concerned about bricking boards that might otherwise survive well enough to be reflashed.
What about a function pointer or something similar, so that the board developer can decide what to do in this new routine? Of course, it has to fit into the 4 KiB block.
In some cases it could make sense to block, so that the system does not run into a critical state because, for instance, peripheral hardware has been wrongly initialized.
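A minimal sketch of the function-pointer idea, written for the host so it can be tried out. All names (nand_ecc_fail_hook, ecc_fail_hook, handle_ecc_result(), record_failure()) are hypothetical and not part of U-Boot, and a negative status is assumed to mean "uncorrectable".

```c
#include <stddef.h>

/* Board-supplied policy for uncorrectable ECC errors. */
typedef void (*nand_ecc_fail_hook)(int block, int page);

/* NULL keeps today's behaviour: carry on and hope for the best. */
static nand_ecc_fail_hook ecc_fail_hook = NULL;

/* Example hook for the sketch: only records where the failure was.
 * A real board hook might instead loop forever, reset, or blink an LED. */
static int last_block = -1;
static int last_page = -1;

static void record_failure(int block, int page)
{
	last_block = block;
	last_page = page;
}

/* Called with the result of one ECC correction step. */
static int handle_ecc_result(int stat, int block, int page)
{
	if (stat >= 0)
		return 0;		/* clean or corrected */
	if (ecc_fail_hook)
		ecc_fail_hook(block, page);	/* board decides what happens */
	return -1;
}
```

A board would set ecc_fail_hook once during early init; the pointer and the NULL check cost only a few bytes, which matters for the 4 KiB limit.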
> We should definitely try to get some sort of message out.
Better than nothing, although some boards won't be connected to, for instance, a serial terminal in the final version at the customer's site. Depending on the board and its application, there may be no way to signal the problem to the user. One way, for instance, is that it simply doesn't boot. The service staff or board vendor could then at least do a post-mortem analysis if the error is reproducible.
Kind regards, Jens