Re: [U-Boot] Strange NAND issue on a P1014

On Thu, 2013-08-15 at 19:35 +0000, ANDY KENNEDY wrote:
-----Original Message----- From: Scott Wood [mailto:scottwood@freescale.com] Sent: Thursday, August 15, 2013 2:09 PM
On Thu, 2013-08-15 at 18:34 +0000, ANDY KENNEDY wrote:
All,
We are attempting to set up a NAND chip on our board through U-Boot. Strange things are happening.
What sort of "strange things"?
e.g. NAND dropping out.
Again, could you be more specific? Is there a particular error message you're getting?
Have you double checked your NAND timings?
Command lines that are > 1000 bytes getting truncated, etc.
Have you increased CONFIG_SYS_CBSIZE? Though I'm not sure why the behavior of exceeding CONFIG_SYS_CBSIZE would be unpredictable, unless there's some bounds checking bug.
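For illustration only, here is a minimal sketch of what predictable handling of an oversized command line could look like. It is not U-Boot's actual console code; CMD_BUF_SIZE and cmd_buf_store() are hypothetical names, with CMD_BUF_SIZE standing in for CONFIG_SYS_CBSIZE and kept tiny so the truncation path is easy to exercise.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer size, standing in for CONFIG_SYS_CBSIZE. */
#define CMD_BUF_SIZE 16

/*
 * Copy src into a fixed-size command buffer.
 * Returns 0 if the whole line fit, -1 if it had to be truncated.
 * The buffer is always NUL-terminated, so an oversized line is cut
 * at a known point instead of overrunning adjacent memory.
 */
static int cmd_buf_store(char buf[CMD_BUF_SIZE], const char *src)
{
	size_t len;

	assert(buf != NULL && src != NULL);

	len = strlen(src);
	if (len >= CMD_BUF_SIZE) {
		/* Keep the first CMD_BUF_SIZE - 1 bytes and terminate. */
		memcpy(buf, src, CMD_BUF_SIZE - 1);
		buf[CMD_BUF_SIZE - 1] = '\0';
		return -1;
	}
	memcpy(buf, src, len + 1);	/* includes the terminating NUL */
	return 0;
}
```

With a check like this, a line that is too long is always cut at the same place and the caller is told about it, which is the opposite of the unpredictable behavior described in this thread.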
During our debugging (of release 2013.04), we found the issue seemed to be in drivers/mtd/nand/nand_base.c, around line 2640:
chip->cmdfunc(mtd, NAND_CMD_READID, 0x00, -1);
An interesting comment is below this line:
/*
 * Try again to make sure, as some systems the bus-hold or other
 * interface concerns can cause random data which looks like a
 * possibly credible NAND flash to appear. If the two results do
 * not match, ignore the device completely.
 */
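The idea in that comment can be sketched roughly as follows. This is not the code from nand_base.c; probe_id_twice(), read_id_fn, and the two fake readers are hypothetical names used only to show the "read the ID twice and compare" check in isolation. The ID bytes 0x2c and 0xf1 are the ones Linux reports for this chip later in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: issues READID and returns the two ID bytes. */
typedef void (*read_id_fn)(void *ctx, uint8_t *maf_id, uint8_t *dev_id);

/*
 * Read the ID twice and accept the device only if both reads agree.
 * Returns 0 on a consistent ID, -1 if the two reads differ.
 */
static int probe_id_twice(read_id_fn read_id, void *ctx,
			  uint8_t *maf_id, uint8_t *dev_id)
{
	uint8_t maf1, dev1, maf2, dev2;

	assert(read_id != NULL && maf_id != NULL && dev_id != NULL);

	read_id(ctx, &maf1, &dev1);
	read_id(ctx, &maf2, &dev2);

	if (maf1 != maf2 || dev1 != dev2)
		return -1;	/* inconsistent reads: ignore the device */

	*maf_id = maf1;
	*dev_id = dev1;
	return 0;
}

/* Stand-in for a healthy bus: always returns the same ID. */
static void fake_stable_read(void *ctx, uint8_t *maf_id, uint8_t *dev_id)
{
	(void)ctx;
	*maf_id = 0x2c;
	*dev_id = 0xf1;
}

/* Stand-in for a flaky bus: the device byte changes on every call. */
static void fake_flaky_read(void *ctx, uint8_t *maf_id, uint8_t *dev_id)
{
	unsigned int *calls = ctx;

	*maf_id = 0x2c;
	*dev_id = (uint8_t)(0xf1 + (*calls)++);
}
```

If the bus is marginal, a check of this kind fails on some boots and passes on others, which would match a device that appears and disappears between cold starts.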
Stranger still is that adding in a putc('x') makes the problem go away (tested via cold boot ~ 20 times, warm boot ~ 10 times). In fact, adding in a dummy function and calling it seems to do just as well.
Other information is that we had issues with long command lines in U-Boot. To "fix" this (a serious hack), we adjusted config.mk's optimization level to -O1 from -Os. It seems that putting this back to -Os makes the problem *better*, but it does seem to move it from a cold-start to a warm-start issue.
Long command lines? What sort of "issues"?
Long command lines = 1000 bytes or more. Issues: command lines get truncated, NAND dropping out from cold start to cold start (nothing else changing), doing a "reset" from within U-Boot detects a non-detected flash OR doesn't detect a detected flash, etc.
"detects a non-detected flash"? Do you mean detects a flash that isn't really there?
Please double check your DDR setup. It sounds like you may have general flakiness rather than a specific issue with NAND or command lines. Have you seen this on more than one board?
We had one of your guys sit with us and go through all the DDR settings (actually, it was one guy here and another on the phone with us -- some engineer by the name of ??Leigh/Lee??, I cannot recall). So, if it is wrong, it is your fault ;)!
BTW, we have hundreds of systems currently running (some even in the field) and we haven't seen any other problems. Several of us run Linux on these pretty much all day long and we have yet to see anything tank. I'm not convinced that it is a DDR issue (though, you may be 100% correct).
It may not be the DDR -- it's just the first thing to check when you see unpredictable weirdness, especially in multiple different functional areas in code that is well-established.
We modified the default configuration file for the P1014 to address our specific NAND flash (Linux reports this as: NAND device: Manufacturer ID: 0x2c, Chip ID: 0xf1 (Micron MT29F1G08ABADAWP)). This is a replacement for an end-of-life dumb NAND. We are configuring this chip to be in the dumb, non-embedded ECC mode.
What does "dumb, non-embedded ECC mode" mean?
Nowadays, most of the NAND flash chips have self-managing ECC. These generally have 2048-byte write pages, 128K erase pages, (8-bit) 64K OOB areas, etc.
This one does not conform to such "general" practices. The fact of the matter is that we had to use the IFC to handle the ECC for us as this chip uses a 4-bit OOB area (thus JFFS2 won't read/write to it). So, we disable the internal ECC of the chip and allow EVERYTHING (refresh, etc) to be done by the IFC. This chip has been placed into what Micron called "raw mode", IIRC.
IFC doing the ECC is what I'd consider normal usage... I'm not familiar with non-raw modes.
What do you mean by a 4-bit OOB area? Searching for this chip shows it to have 2048 byte pages, and 64 bytes of OOB per page. NOP is 4. This chip should work with JFFS2.
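To make the geometry concrete, here is a small sketch of the arithmetic. The 2048-byte page and 64-byte OOB figures are the ones quoted above; the 128 KiB erase block is an assumption taken from the "128K erase pages" figure earlier in the thread, not something checked against the datasheet. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a NAND chip's layout. */
struct nand_geometry {
	uint32_t page_size;	/* data bytes per page */
	uint32_t oob_size;	/* spare (OOB) bytes per page */
	uint32_t block_size;	/* data bytes per erase block (assumed) */
};

/* Number of pages in one erase block. */
static uint32_t pages_per_block(const struct nand_geometry *g)
{
	assert(g != NULL && g->page_size != 0);
	return g->block_size / g->page_size;
}

/* Total OOB bytes available across one erase block. */
static uint32_t oob_bytes_per_block(const struct nand_geometry *g)
{
	return pages_per_block(g) * g->oob_size;
}
```

With page_size = 2048, oob_size = 64 and block_size = 128 * 1024, this gives 64 pages per block and 4096 OOB bytes per block, so the spare area is measured in bytes per page rather than bits.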
-Scott