[U-Boot-Users] Loading from NAND using 'nboot' Periodically Fails Where 'nand read' Succeeds

Before I jump in with the BDI and start debugging, has anyone else using 'nboot' and FIT images noticed that 'nboot' periodically fails where 'nand read.i' of the SAME region of NAND succeeds?
=> echo $bootaddr
800000
=> echo $boot0
nboot ${bootaddr} 0 0 && setenv bootargs root=/dev/mtdblock9 && run addjffs2 addtty && bootm ${bootaddr}
=> run boot0
Loading from NAND 64MiB 3,3V 8-bit, offset 0x0
** Bad FIT image format
=> nand read.i ${bootaddr} 0 400000
NAND read: device 0 offset 0x0, size 0x400000
Reading data from 0x3ffe00 -- 100% complete.
 4194304 bytes read: OK
=> setenv bootargs root=/dev/mtdblock9
=> run addjffs2 addtty
=> bootm ${bootaddr}
## Booting kernel from FIT Image at 00800000 ...
   Using 'config@1' configuration
   Trying 'kernel@1' kernel subimage
...
Using Haleakala machine description
Linux version 2.6.25-rc3-00951-g6514352-dirty (gerickson@ubuntu-fusion) (gcc version 4.0.0 (DENX ELDK 4.1 4.0.0)) #2 Wed May 28 22:49:36 PDT 2008
Zone PFN ranges:
  DMA 0 -> 65536
  Normal 65536 -> 65536
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
...
This is using AMCC's Haleakala board with Samsung K9F1208U0B NAND, though I suspect that doesn't make any difference since nand read.i works fine.
Regards,
Grant

Hi Grant,
On Monday 02 June 2008, Grant Erickson wrote:
Before I jump in with the BDI and start debugging, has anyone else using 'nboot' and FIT images noticed that 'nboot' periodically fails where 'nand read.i' of the SAME region of NAND succeeds?
Not sure here, since I have never used nboot before. But "nand read.i" skips bad blocks, and perhaps "nboot" does not? I suggest that you check whether this is the case and whether you have bad blocks in this NAND area.
Best regards, Stefan
===================================================================== DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: +49-8142-66989-0 Fax: +49-8142-66989-80 Email: office@denx.de =====================================================================

On Mon, Jun 02, 2008 at 08:22:21AM +0200, Stefan Roese wrote:
Not sure here, since I have never used nboot before. But "nand read.i" skips bad blocks, and perhaps "nboot" does not? I suggest that you check whether this is the case and whether you have bad blocks in this NAND area.
It is indeed the case -- you need to use "nboot.i".
-Scott

On 6/2/08 11:21 AM, Scott Wood wrote:
It is indeed the case -- you need to use "nboot.i".
-Scott
Scott and Stefan,
Thanks for the suggestion. That solved it. As an academic exercise, is there any practical reason a system would want to use nboot, as I erroneously chose to do, without .i|.jffs2|.e?
Regards,
Grant

Grant Erickson wrote:
Thanks for the suggestion. That solved it. As an academic exercise, is there any practical reason a system would want to use nboot, as I erroneously chose to do, without .i|.jffs2|.e?
I don't think so, though I don't know the history involved. Does anyone actually use the non-block-skipping versions of any of the nand commands (intentionally, that is)? If the answer is no, then we could make it the default.
-Scott

I would vote for making bad block handling the default. I've been working on fixing up a design of ours that mistakenly used the non-block-skipping versions, and I've been trying to find all the places where bad blocks were not being skipped and fix them. Our system only uses NAND flash and people are very concerned about it.
Stuart
On Mon, Jun 2, 2008 at 6:07 PM, Scott Wood scottwood@freescale.com wrote:
Grant Erickson wrote:
Thanks for the suggestion. That solved it. As an academic exercise, is there any practical reason a system would want to use nboot, as I erroneously chose to do, without .i|.jffs2|.e?
I don't think so, though I don't know the history involved. Does anyone actually use the non-block-skipping versions of any of the nand commands (intentionally, that is)? If the answer is no, then we could make it the default.
-Scott

On Tuesday 03 June 2008, Scott Wood wrote:
Grant Erickson wrote:
Thanks for the suggestion. That solved it. As an academic exercise, is there any practical reason a system would want to use nboot, as I erroneously chose to do, without .i|.jffs2|.e?
I don't think so, though I don't know the history involved. Does anyone actually use the non-block-skipping versions of any of the nand commands (intentionally, that is)? If the answer is no, then we could make it the default.
I'm fine with making bad-block-skipping the default. I never used the "other" version and I don't know what it's really useful for.
Best regards, Stefan

Hi Scott,
We are using the NAND stuff on a couple of boards. All of them use the .i or .jffs2 extension, so I also vote for making skipping the default. But the extensions should be preserved :-)
Matthias
On Tuesday 03 June 2008 00:07, Scott Wood wrote:
Grant Erickson wrote:
Thanks for the suggestion. That solved it. As an academic exercise, is there any practical reason a system would want to use nboot, as I erroneously chose to do, without .i|.jffs2|.e?
I don't think so, though I don't know the history involved. Does anyone actually use the non-block-skipping versions of any of the nand commands (intentionally, that is)? If the answer is no, then we could make it the default.
-Scott

Hi Scott,
Grant Erickson wrote:
Thanks for the suggestion. That solved it. As an academic exercise, is there any practical reason a system would want to use nboot, as I erroneously chose to do, without .i|.jffs2|.e?
I don't think so, though I don't know the history involved. Does anyone actually use the non-block-skipping versions of any of the nand commands (intentionally, that is)? If the answer is no, then we could make it the default.
I also vote for the commands to be bad-block aware by default, as I have already been bitten by that too.
Cheers Detlev

On 6/2/08 3:02 PM, Grant Erickson wrote:
Scott and Stefan,
Thanks for the suggestion. That solved it. As an academic exercise, is there any practical reason a system would want to use nboot, as I erroneously chose to do, without .i|.jffs2|.e?
It would appear I was slightly too quick to regard this as fixed. What I found this morning is the following with the AMCC "Haleakala" board:
1) A 48+ hour reboot test bouncing between the boot0 and boot1 partitions passed, where boot0 and boot1 are defined as:
=> printenv bootaddr bootcmd boot0 boot1
bootaddr=800000
bootcmd=run boot0 || run boot1 || reset
boot0=nboot.i ${bootaddr} 0 0 && setenv bootargs root=/dev/mtdblock9 && run addjffs2 addtty && bootm ${bootaddr}
boot1=nboot.i ${bootaddr} 0 1C00000 && setenv bootargs root=/dev/mtdblock11 && run addjffs2 addtty && bootm ${bootaddr}
2) I stopped that test and added power-cycling to the mix and it, again, immediately failed with:
Loading from NAND 64MiB 3,3V 8-bit, offset 0x0
** Bad FIT image format
Loading from NAND 64MiB 3,3V 8-bit, offset 0x1c00000
** Bad FIT image format
So, resets are not enough to trigger this issue; it takes a power cycle.
I have found that this state is recoverable by issuing a 'nand read.i' and then re-running 'boot0':
=> nand read.i ${bootaddr} 0 400000 && run boot0
From that point forward, both boot0 and boot1 work flawlessly.
I have also found that NFS booting to get Linux up and running, restarting and then running boot0 or boot1 also works from that point forward until the next power cycle.
So, there seems to be some specific state the PowerPC NDFC (NAND controller) or Samsung K9F1208U0B NAND gets into, where either 'nand read.i' or the Linux MTD driver kicks one or both in such a way as to get out of whatever state prevents nboot.i from working.
Strangely though, nand read.i and nboot.i both exercise the same nand_read_opts path in nand_util.c.
Any thoughts?
Regards,
Grant

On 6/5/08 1:47 PM, Grant Erickson wrote:
It would appear I was slightly too quick to regard this as fixed. What I found this morning is the following with the AMCC "Haleakala" board:
- A 48+ hour reboot test bouncing between the boot0 and boot1 partitions passed, where boot0 and boot1 are defined as:
=> printenv bootaddr bootcmd boot0 boot1
bootaddr=800000
bootcmd=run boot0 || run boot1 || reset
boot0=nboot.i ${bootaddr} 0 0 && setenv bootargs root=/dev/mtdblock9 && run addjffs2 addtty && bootm ${bootaddr}
boot1=nboot.i ${bootaddr} 0 1C00000 && setenv bootargs root=/dev/mtdblock11 && run addjffs2 addtty && bootm ${bootaddr}
- I stopped that test and added power-cycling to the mix and it, again, immediately failed with:
Loading from NAND 64MiB 3,3V 8-bit, offset 0x0
** Bad FIT image format
Loading from NAND 64MiB 3,3V 8-bit, offset 0x1c00000
** Bad FIT image format
So, resets are not enough to trigger this issue; it takes a power cycle.
I have found that this state is recoverable by issuing a 'nand read.i' and then re-running 'boot0':
=> nand read.i ${bootaddr} 0 400000 && run boot0
From that point forward, both boot0 and boot1 work flawlessly.
I have also found that NFS booting to get Linux up and running, restarting and then running boot0 or boot1 also works from that point forward until the next power cycle.
So, there seems to be some specific state the PowerPC NDFC (NAND controller) or Samsung K9F1208U0B NAND gets into, where either 'nand read.i' or the Linux MTD driver kicks one or both in such a way as to get out of whatever state prevents nboot.i from working.
Strangely though, nand read.i and nboot.i both exercise the same nand_read_opts path in nand_util.c.
Any thoughts?
Marian:
I'm following up with you on this since 'git blame cmd_nand.c' seems to indicate you added the CONFIG_FIT support to this file.
Based on stepping through with the debugger, my initial guess about hardware issues may have been incorrect. Is there an implicit assumption in the following snippet from nand_load_image() in cmd_nand.c:
    ...
    cnt = nand->oobblock;
    if (jffs2) {
        nand_read_options_t opts;
        memset(&opts, 0, sizeof(opts));
        opts.buffer = (u_char*) addr;
        opts.length = cnt;
        opts.offset = offset;
        opts.quiet = 1;
        r = nand_read_opts(nand, &opts);
    } else {
        r = nand_read(nand, offset, &cnt, (u_char *) addr);
    }
    if (r) {
        puts("** Read error\n");
        show_boot_progress (-56);
        return 1;
    }
    ...
    switch (genimg_get_format ((void *)addr)) {
    ...
#if defined(CONFIG_FIT)
    case IMAGE_FORMAT_FIT:
        fit_hdr = (const void *)addr;
        if (!fit_check_format (fit_hdr)) {
            show_boot_progress (-150);
            puts ("** Bad FIT image format\n");
            return 1;
        }
        show_boot_progress (151);
        puts ("Fit image detected...\n");

        cnt = fit_get_size (fit_hdr);
        break;
#endif
    ...
that casting 'addr' to 'fit_hdr' represents more than 512 bytes of valid data to be accessed by fit_check_format()? If so, should not 'cnt = nand->oobblock' be explicitly set to match that assumption?
I am guessing that NFS booting and 'nand read.i' appeared to address the issue only because the memory at the 8 MiB load address they write to was not being reused or otherwise overwritten between resets after the kernel booted, which allowed subsequent runs of 'nboot' to "leverage" the stale data.
Regards,
Grant

On 6/5/08 3:30 PM, Grant Erickson wrote:
I'm following up with you on this since 'git blame cmd_nand.c' seems to indicate you added the CONFIG_FIT support to this file.
Based on stepping through with the debugger, my initial guess about hardware issues may have been incorrect. Is there an implicit assumption in the following snippet from nand_load_image() in cmd_nand.c:
[ code omitted ]
that casting 'addr' to 'fit_hdr' represents more than 512 bytes of valid data to be accessed by fit_check_format()? If so, should not 'cnt = nand->oobblock' be explicitly set to match that assumption?
I am guessing that NFS booting and 'nand read.i' appeared to address the issue only because the memory at the 8 MiB load address they write to was not being reused or otherwise overwritten between resets after the kernel booted, which allowed subsequent runs of 'nboot' to "leverage" the stale data.
The boot.itb image I have in NAND is 0x13CB98 bytes in size. Running a series of 'nand read.i ${bootaddr} 0 <...>':
=> nand read.i ${bootaddr} 0 1000 && iminfo ${bootaddr}
## Checking Image at 00800000 ...
   FIT image found
Bad FIT image format!
=> nand read.i ${bootaddr} 0 2000 && iminfo ${bootaddr}
...
=> nand read.i ${bootaddr} 0 4000 && iminfo ${bootaddr}
...
=> nand read.i ${bootaddr} 0 8000 && iminfo ${bootaddr}
...
=> nand read.i ${bootaddr} 0 10000 && iminfo ${bootaddr}
...
=> nand read.i ${bootaddr} 0 20000 && iminfo ${bootaddr}
...
=> nand read.i ${bootaddr} 0 40000 && iminfo ${bootaddr}
...
=> nand read.i ${bootaddr} 0 80000 && iminfo ${bootaddr}
...
=> nand read.i ${bootaddr} 0 100000 && iminfo ${bootaddr}
...
=> nand read.i ${bootaddr} 0 200000 && iminfo ${bootaddr}
...
## Checking Image at 00800000 ...
   FIT image found
   FIT description: Linux Kernel with Device Tree
    Image 0 (kernel@1)
     Description: Kernel
     Type: Kernel Image
     Compression: gzip compressed
     Data Start: 0x008000c8
...
So, at least for this trivial boot.itb of a kernel and DTB, it would appear that the answer to how large the initial value of 'cnt' must be is "as large as the image being nboot'ed". That said, it looks like nboot and FIT images may not work together with today's code.
Any thoughts?
Regards,
Grant
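
One way to see why a single 512-byte page cannot be enough is to look at the blob header itself. A FIT image is a flattened device tree blob, and its header records both the total blob size and where the strings block lives. The sketch below assumes the version 17 header layout and uses hypothetical helper names (be32_at(), fdt_blob_fully_loaded()); it is not U-Boot or libfdt code. In a typical blob the strings block follows the structure block that carries the kernel data, so a check that looks up property names probably has to touch the far end of the image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FDT_MAGIC      0xd00dfeedu
#define FDT_HEADER_MIN 40u   /* header bytes up to size_dt_struct (v17) */

/* Read a big-endian 32-bit value; FDT header fields are big-endian. */
uint32_t be32_at(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Returns  1 if 'loaded' bytes cover the whole blob described by the header,
 *          0 if the header is plausible but more data must be read,
 *         -1 if the buffer is too short to parse or is not an FDT blob.
 * Only the first FDT_HEADER_MIN bytes of buf are ever dereferenced.
 */
int fdt_blob_fully_loaded(const unsigned char *buf, size_t loaded)
{
    uint32_t totalsize, off_strings, size_strings;

    if (loaded < FDT_HEADER_MIN)
        return -1;
    if (be32_at(buf) != FDT_MAGIC)
        return -1;

    totalsize    = be32_at(buf + 4);    /* totalsize       */
    off_strings  = be32_at(buf + 12);   /* off_dt_strings  */
    size_strings = be32_at(buf + 32);   /* size_dt_strings */

    /* The strings block must lie inside the blob. */
    if (off_strings > totalsize || size_strings > totalsize - off_strings)
        return -1;

    return loaded >= totalsize ? 1 : 0;
}
```

For a header describing a 0x13CB98-byte blob, this reports "not fully loaded" for 0x200 or 0x100000 bytes and "fully loaded" for 0x200000 bytes, which matches the pattern in the iminfo runs above.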

Hi Grant,
Grant Erickson wrote:
On 6/5/08 3:30 PM, Grant Erickson wrote:
I'm following up with you on this since 'git blame cmd_nand.c' seems to indicate you added the CONFIG_FIT support to this file.
Based on stepping through with the debugger, my initial guess about hardware issues may have been incorrect. Is there an implicit assumption in the following snippet from nand_load_image() in cmd_nand.c:
[ code omitted ]
that casting 'addr' to 'fit_hdr' represents more than 512 bytes of valid data to be accessed by fit_check_format()? If so, should not 'cnt = nand->oobblock' be explicitly set to match that assumption?
I am guessing that NFS booting and 'nand read.i' appeared to address the issue only because the memory at the 8 MiB load address they write to was not being reused or otherwise overwritten between resets after the kernel booted, which allowed subsequent runs of 'nboot' to "leverage" the stale data.
The boot.itb image I have in NAND is 0x13CB98 bytes in size. Running a series of 'nand read.i ${bootaddr} 0 <...>':
...
So, at least for this trivial boot.itb of a kernel and DTB, it would appear that the answer to how large the initial value of 'cnt' must be is "as large as the image being nboot'ed". That said, it looks like nboot and FIT images may not work together with today's code.
Any thoughts?
Doing a FIT format check on only the first sector of data is obviously wrong. This was a good spot for such a check in the initial implementation of the routine, but it should have been corrected after that changed, stupid me. As you'll note, the fit_print_contents() call is deferred for the same reason: it needs the whole image data to operate on. The same goes for the format check; it cannot be done any earlier than that.
I'll post a patch later today that fixes it, please give it a try on your system.
Cheers, m.
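
The ordering Marian describes can be sketched roughly as follows. This is an illustration of the idea only, not the patch mentioned above: struct mem_source, mem_read(), toy_check_format() and load_image_then_check() are hypothetical names, the flash is simulated by a memory buffer, and the stand-in check simply inspects the last byte of the blob to model a check that needs the whole image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FDT_MAGIC 0xd00dfeedu

/* Hypothetical stand-in for the NAND device: a plain memory buffer. */
struct mem_source {
    const unsigned char *data;
    size_t size;
};

int mem_read(const struct mem_source *src, size_t offset,
             unsigned char *dst, size_t len)
{
    if (offset > src->size || len > src->size - offset)
        return -1;
    memcpy(dst, src->data + offset, len);
    return 0;
}

uint32_t be32_at(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Stand-in format check: it needs the LAST byte of the blob to be valid. */
int toy_check_format(const unsigned char *blob, size_t size)
{
    return size > 0 && blob[size - 1] == 0x5a;
}

/* Returns 0 on success, -1 on read/size error, -2 bad magic, -3 bad format. */
int load_image_then_check(const struct mem_source *src, size_t page_size,
                          unsigned char *dst, size_t dst_cap, size_t *out_size)
{
    uint32_t total;

    if (page_size < 8 || page_size > dst_cap)
        return -1;

    /* Stage 1: read one page, only to identify the format and its size. */
    if (mem_read(src, 0, dst, page_size) != 0)
        return -1;
    if (be32_at(dst) != FDT_MAGIC)
        return -2;
    total = be32_at(dst + 4);
    if (total < 8 || total > dst_cap)
        return -1;

    /* Stage 2: read the whole image. */
    if (mem_read(src, 0, dst, total) != 0)
        return -1;

    /* Stage 3: only now run the check that walks the full blob. */
    if (!toy_check_format(dst, total))
        return -3;

    *out_size = total;
    return 0;
}
```

Running the check right after stage 1 against a destination buffer filled with junk (as RAM might be after a power cycle) fails, while the same check after stage 2 passes. The same model also suggests why stale RAM contents from an earlier full read could mask the problem.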
participants (7): Detlev Zundel, Grant Erickson, Marian Balakowicz, Matthias Fuchs, Scott Wood, Stefan Roese, Stuart Wood