[U-Boot] NAND on Davinci boards

Hi all,
I have seen an incompatibility between the NAND driver in u-boot for the davinci boards and the linux driver (kernel 2.6.38, mainline).
I think it is not related to the specific board I use. In any case, I am using the ea20 board (OMAP-L138 based, in u-boot mainline), and I have added NAND support with the following settings for the driver:
#define CONFIG_SYS_NAND_4BIT_HW_ECC_OOBFIRST
#define CONFIG_SYS_NAND_USE_FLASH_BBT
This is done in the linux driver, too. Both drivers are using ECC in Hardware, the number of bits for ECC is set to 4, the OOB is set first, and the oob layout is the same.
The NAND driver under u-boot works flawlessly. The kernel is stored on NAND (the board boots from a SPI-Flash), and everything seems correct.
The same holds under linux: I am able to set up a root filesystem with UBIFS, and everything is ok. Problems arise when u-boot tries to access data written from Linux and vice versa. It seems to me that the management of ECC is different in u-boot and in the kernel.
If I write a kernel image to a NAND partition under Linux, after a reset I am not able to read that partition from u-boot. After enabling MTDDEBUG and adding a couple of printf calls, I see that nand_do_read_ops() reports an error:
    if (mtd->ecc_stats.failed - stats.failed)
        return -EBADMSG;
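For readers following along, the check quoted above can be modelled outside the driver. This is only a minimal sketch: `struct ecc_stats_sketch` and `read_status_sketch()` are hypothetical names, not the MTD API, and the real nand_do_read_ops() does more than this. It shows why a read fails as soon as the count of uncorrectable pages grows during that read. On Linux, EBADMSG is 74, which matches the "failed -74" in the board output.

```c
#include <errno.h>   /* EBADMSG (74 on Linux) */

/* Hypothetical, simplified stand-in for the MTD ECC counters. */
struct ecc_stats_sketch {
    unsigned int corrected; /* bit flips that ECC managed to fix */
    unsigned int failed;    /* pages that ECC could not fix */
};

/*
 * Compare a snapshot taken before the read with the counters after it.
 * Any growth in 'failed' means at least one page was uncorrectable,
 * so the whole read is reported as -EBADMSG.
 */
static int read_status_sketch(const struct ecc_stats_sketch *before,
                              const struct ecc_stats_sketch *after)
{
    if (after->failed - before->failed)
        return -EBADMSG;
    return 0;
}
```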
This is the output of my board:
nboot aKernel
Loading from nand0, offset 0x0
Bad block table found at page 262080, version 0x01
Bad block table found at page 262016, version 0x01
Returning error 4 0
nand_bbt: ECC error while reading bad block table
nand_read_bbt: Bad block at 0x000002220000
nand_read_bbt: Bad block at 0x00000b120000
nand_read_bbt: Bad block at 0x00001f100000
nand_isbad_bbt(): bbt info for offs 0x00000000: (block 0) 0x00
Returning error 8 4
NAND read from offset 0 failed -74
** Read error
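The page numbers in this log can be converted to byte offsets to see where the two bad block tables sit. A minimal sketch, assuming 2048-byte pages (the page size mentioned later in the thread) and 64 pages per block (128 KiB erase blocks, which is consistent with the bad block addresses in the log but is not stated anywhere in the thread). The macro and function names are made up for illustration.

```c
#include <stdint.h>

/* Assumed geometry; only the page size is taken from the thread. */
#define SKETCH_PAGE_SIZE       2048u
#define SKETCH_PAGES_PER_BLOCK 64u

/* Byte offset of the first byte of a page. */
static uint64_t page_to_offset(uint32_t page)
{
    return (uint64_t)page * SKETCH_PAGE_SIZE;
}

/* Index of the erase block that contains a page. */
static uint32_t page_to_block(uint32_t page)
{
    return page / SKETCH_PAGES_PER_BLOCK;
}
```

Under these assumptions, page 262080 is offset 0x1ffe0000 (block 4095) and page 262016 is offset 0x1ffc0000 (block 4094).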
Checking the two drivers, it seems to me that they are doing different things. However, I do not know which one is correct. I would have thought that the driver in u-boot is old and must be synchronized with the Linux driver, but after checking how u-boot gets the ECC errors from the hardware, it seems to me that it is correct.
Has anyone seen the same issue, or does anyone have a proposal for which driver should be modified?
Best regards, Stefano Babic

On 16/03/11 08:22, Stefano Babic wrote:
> Hi all,
> I have seen an incompatibility between the NAND driver in u-boot for the davinci boards and the linux driver (kernel 2.6.38, mainline).
> I think it is not related to the specific board I use. In any case, I am using the ea20 board (OMAP-L138 based, in u-boot mainline), and I have added NAND support with the following settings for the driver:
I'm using da830evm (OMAP-L137) with more or less up-to-date U-Boot, but quite old 2.6.18+ kernel from Montavista.
> #define CONFIG_SYS_NAND_4BIT_HW_ECC_OOBFIRST
> #define CONFIG_SYS_NAND_USE_FLASH_BBT
I don't have BBT enabled.
> This is done in the linux driver, too. Both drivers are using ECC in Hardware, the number of bits for ECC is set to 4, the OOB is set first, and the oob layout is the same.
> The NAND driver under u-boot works flawlessly. The kernel is stored on NAND (the board boots from a SPI-Flash), and everything seems correct.
> The same holds under linux: I am able to set up a root filesystem with UBIFS, and everything is ok. Problems arise when u-boot tries to access data written from Linux and vice versa. It seems to me that the management of ECC is different in u-boot and in the kernel.
I almost always update my NAND kernel image from within Linux. U-Boot reads this with no errors and can reboot the system correctly.
> If I write a kernel image to a NAND partition under Linux, after a reset I am not able to read that partition from u-boot. After enabling MTDDEBUG and adding a couple of printf calls, I see that nand_do_read_ops() reports an error:
>     if (mtd->ecc_stats.failed - stats.failed)
>         return -EBADMSG;
> This is the output of my board:
> nboot aKernel
> Loading from nand0, offset 0x0
> Bad block table found at page 262080, version 0x01
> Bad block table found at page 262016, version 0x01
> Returning error 4 0
> nand_bbt: ECC error while reading bad block table
> nand_read_bbt: Bad block at 0x000002220000
> nand_read_bbt: Bad block at 0x00000b120000
> nand_read_bbt: Bad block at 0x00001f100000
> nand_isbad_bbt(): bbt info for offs 0x00000000: (block 0) 0x00
> Returning error 8 4
> NAND read from offset 0 failed -74
> ** Read error
> Checking the two drivers, it seems to me that they are doing different things. However, I do not know which one is correct. I would have thought that the driver in u-boot is old and must be synchronized with the Linux driver, but after checking how u-boot gets the ECC errors from the hardware, it seems to me that it is correct.
> Has anyone seen the same issue, or does anyone have a proposal for which driver should be modified?
Neither ;)
I think the U-Boot NAND driver for davinci has always been set up to be compatible with the TI & Montavista Kernels. What I'm not sure about is whether those Kernels are compatible with mainline Linux, and in particular the very latest mainline kernels. In fact I'm not sure whether it is compatible with the very latest TI Kernels.
Have you tried "nand dump" of a Linux programmed Kernel and compared it with "nand dump" of a U-Boot programmed Kernel? You should see identical data in each case, but you will be able to compare the OOB for differences. You only need to look at the first page to see whether the OOB data or the position of the OOB data differs.
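The comparison suggested here can also be done off-target once both OOB areas have been captured. A rough sketch, assuming the two dumps are already available as byte arrays; `first_oob_diff()` is a hypothetical helper, not a U-Boot command or function.

```c
#include <stddef.h>

/*
 * Return the index of the first byte that differs between two OOB
 * buffers of the same length, or -1 if they are identical.
 */
static long first_oob_diff(const unsigned char *a,
                           const unsigned char *b, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i])
            return (long)i;
    }
    return -1;
}
```

A result of -1 means the two writers produced the same OOB; any other value points at the first byte where the layout or the ECC contents diverge.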
Your errors all seem to be in the BBT handling. I don't use BBT here.
Nick.

On 03/16/2011 11:01 AM, Nick Thompson wrote:
Hi Nick,
> I'm using da830evm (OMAP-L137) with more or less up-to-date U-Boot, but quite old 2.6.18+ kernel from Montavista.
>> #define CONFIG_SYS_NAND_4BIT_HW_ECC_OOBFIRST
>> #define CONFIG_SYS_NAND_USE_FLASH_BBT
> I don't have BBT enabled.
Thanks, I have tried to disable it. No improvement; I always get ECC errors.
> Neither ;)
> I think the U-Boot NAND driver for davinci has always been set up to be compatible with the TI & Montavista Kernels.
This is probably the problem. I cannot check the montavista sources (I do not use them, and it seems that the old source.mvista.com went offline), but if u-boot sticks to the mvista kernel, it is surely not aligned with the mainline kernel.
> What I'm not sure about is whether those Kernels are compatible with mainline Linux, and in particular the very latest mainline kernels. In fact I'm not sure whether it is compatible with the very latest TI Kernels.
I think I can answer: no. I checked PSP_3.20 from Texas, and also the arago project. When TI went from the 2.x version of their PSP tools to 3.x, they moved from the mvista kernel to the mainline kernel, and probably at that point u-boot and the kernel were no longer compatible. I have not seen patches to drivers/mtd/davinci_nand.c meant to make it suitable for newer kernels.
> Have you tried "nand dump" of a Linux programmed Kernel and compared it with "nand dump" of a U-Boot programmed Kernel?
I have now dumped the first page (=2048 bytes) from both and compared them byte by byte. They are identical, including the OOB part.
> You should see identical data in each case, but you will be able to compare the OOB for differences. You only need to look at the first page to see whether the OOB data or the position of the OOB data differs.
No differences at all. For both, I get in the oob:
OOB:
ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
However, u-boot complains about it:
nand read kernel_addr_r 0 800
NAND read: device 0 offset 0x0, size 0x800
Getting too many errors
Getting too many errors
Getting too many errors
Getting too many errors
2048 bytes read: OK
> Your errors all seem to be in the BBT handling. I don't use BBT here.
Disabling it is not enough: I already tried that and got the same errors. It seems I can rule out errors in writing or reading the raw data on the NAND. It seems the problem is related to a different interpretation of the ECC results.
Stefano

On 16/03/11 12:01, Stefano Babic wrote:
> On 03/16/2011 11:01 AM, Nick Thompson wrote:
> Hi Nick,
>> I'm using da830evm (OMAP-L137) with more or less up-to-date U-Boot, but quite old 2.6.18+ kernel from Montavista.
>>> #define CONFIG_SYS_NAND_4BIT_HW_ECC_OOBFIRST
>>> #define CONFIG_SYS_NAND_USE_FLASH_BBT
>> I don't have BBT enabled.
> Thanks, I have tried to disable it. No improvement; I always get ECC errors.
>> Neither ;)
>> I think the U-Boot NAND driver for davinci has always been set up to be compatible with the TI & Montavista Kernels.
> This is probably the problem. I cannot check the montavista sources (I do not use them, and it seems that the old source.mvista.com went offline), but if u-boot sticks to the mvista kernel, it is surely not aligned with the mainline kernel.
>> What I'm not sure about is whether those Kernels are compatible with mainline Linux, and in particular the very latest mainline kernels. In fact I'm not sure whether it is compatible with the very latest TI Kernels.
> I think I can answer: no. I checked PSP_3.20 from Texas, and also the arago project. When TI went from the 2.x version of their PSP tools to 3.x, they moved from the mvista kernel to the mainline kernel, and probably at that point u-boot and the kernel were no longer compatible. I have not seen patches to drivers/mtd/davinci_nand.c meant to make it suitable for newer kernels.
You may be correct, but maybe you have another problem first...
>> Have you tried "nand dump" of a Linux programmed Kernel and compared it with "nand dump" of a U-Boot programmed Kernel?
> I have now dumped the first page (=2048 bytes) from both and compared them byte by byte. They are identical, including the OOB part.
>> You should see identical data in each case, but you will be able to compare the OOB for differences. You only need to look at the first page to see whether the OOB data or the position of the OOB data differs.
> No differences at all. For both, I get in the oob:
> OOB:
> ff ff ff ff ff ff ff ff
> ff ff ff ff ff ff ff ff
> ff ff ff ff ff ff ff ff
> 00 00 00 00 00 00 00 00
> 00 00 00 00 00 00 00 00
> 00 00 00 00 00 00 00 00
> 00 00 00 00 00 00 00 00
> 00 00 00 00 00 00 00 00
Is this really from the OOB for the first _kernel_ page? It looks wrong.
I see:
nand dump 0x100000
<snip>
ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff
9a ea 40 97 85 bc 5f f5
2e 15 91 c2 c6 93 14 c0
03 e3 b6 4c 35 40 2d 8f
7e 74 10 13 59 47 cf 09
24 10 6a 0a 8b e2 f1 b0
The part after all the ff's is the ECC. IIRC a zero ECC implies all the data in the page is zero also. That would be an odd start to a Kernel image.
Can you confirm what it is you dumped?
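The observation about a zero ECC can be checked mechanically. The sketch below assumes the layout visible in the dumps in this thread: a 64-byte OOB with 24 bytes of 0xff followed by 40 bytes of ECC. The constants and `ecc_region_all_zero()` are illustrative names only, not part of either driver.

```c
#include <stdbool.h>
#include <stddef.h>

/* Layout read off the dumps in this thread (an assumption, not an API). */
#define SKETCH_OOB_SIZE   64u
#define SKETCH_ECC_OFFSET 24u

/* True if every byte of the ECC region of a 64-byte OOB is 0x00. */
static bool ecc_region_all_zero(const unsigned char *oob)
{
    for (size_t i = SKETCH_ECC_OFFSET; i < SKETCH_OOB_SIZE; i++) {
        if (oob[i] != 0x00)
            return false;
    }
    return true;
}
```

A page of real kernel data whose ECC region passes this check is a hint that the writer never stored hardware ECC for it.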
> However, u-boot complains about it:
> nand read kernel_addr_r 0 800
> NAND read: device 0 offset 0x0, size 0x800
> Getting too many errors
> Getting too many errors
> Getting too many errors
> Getting too many errors
> 2048 bytes read: OK
>> Your errors all seem to be in the BBT handling. I don't use BBT here.
> Disabling it is not enough: I already tried that and got the same errors. It seems I can rule out errors in writing or reading the raw data on the NAND. It seems the problem is related to a different interpretation of the ECC results.
> Stefano
Nick.

On 03/16/2011 01:12 PM, Nick Thompson wrote:
> You may be correct, but maybe you have another problem first...
Yes, you are right...
>>> Have you tried "nand dump" of a Linux programmed Kernel and compared it with "nand dump" of a U-Boot programmed Kernel?
>> I have now dumped the first page (=2048 bytes) from both and compared them byte by byte. They are identical, including the OOB part.
>>> You should see identical data in each case, but you will be able to compare the OOB for differences. You only need to look at the first page to see whether the OOB data or the position of the OOB data differs.
>> No differences at all. For both, I get in the oob:
>> OOB:
>> ff ff ff ff ff ff ff ff
>> ff ff ff ff ff ff ff ff
>> ff ff ff ff ff ff ff ff
>> 00 00 00 00 00 00 00 00
>> 00 00 00 00 00 00 00 00
>> 00 00 00 00 00 00 00 00
>> 00 00 00 00 00 00 00 00
>> 00 00 00 00 00 00 00 00
> Is this really from the OOB for the first _kernel_ page? It looks wrong.
Yes, this is when I write the kernel from linux.
> I see:
> nand dump 0x100000
> <snip>
> ff ff ff ff ff ff ff ff
> ff ff ff ff ff ff ff ff
> ff ff ff ff ff ff ff ff
> 9a ea 40 97 85 bc 5f f5
> 2e 15 91 c2 c6 93 14 c0
> 03 e3 b6 4c 35 40 2d 8f
> 7e 74 10 13 59 47 cf 09
> 24 10 6a 0a 8b e2 f1 b0
> The part after all the ff's is the ECC. IIRC a zero ECC implies all the data in the page is zero also. That would be an odd start to a Kernel image.
> Can you confirm what it is you dumped?
Yes. However, when I write the kernel with u-boot, I get:
OOB:
ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff
a7 af c5 ed 87 86 2f 1c
f9 31 10 92 4a 34 5a 7d
91 cf e0 fd b6 3f 4b ae
ca 63 86 9c 2d 91 d2 6c
95 73 1b 4b e0 09 ed a3
It looks like Linux has not written the ECCs at all....
Stefano

Stefano,
On 16/03/11 12:36, Stefano Babic wrote:
> It looks like Linux has not written the ECCs at all....
So I'll leave you to look into that problem.
You could still be correct about Kernel compatibilities, though I hope not. I'm encouraged that the zeros were in the "correct" place, but the ECC nibbles could still be packed differently.
Good luck, Nick.

Hi Stefano,
On Wed, Mar 16, 2011 at 4:22 AM, Stefano Babic sbabic@denx.de wrote:
> Hi all,
> I have seen an incompatibility between the NAND driver in u-boot for the davinci boards and the linux driver (kernel 2.6.38, mainline).
> I think it is not related to the specific board I use. In any case, I am using the ea20 board (OMAP-L138 based, in u-boot mainline), and I have added NAND support with the following settings for the driver:
I've noticed some problems with u-boot and 2.6.38 writing to NAND also, here on a da850evm.
I'm reasonably sure that I had a working combination at some point. I'm in the process of investigating also.
I hope we can combine our efforts -- please keep me on the CC here.
Best Regards, Ben Gardiner
--- Nanometrics Inc. http://www.nanometrics.ca

On 03/16/2011 02:05 PM, Ben Gardiner wrote:
> Hi Stefano,
Hi Ben,
> I hope we can combine our efforts -- please keep me on the CC here.
Sure. I will keep you informed of my progress (if any...).
Best regards, Stefano Babic

On 03/16/2011 03:44 PM, Stefano Babic wrote:
> Sure. I will keep you informed of my progress (if any...).
Solved. It was a misunderstanding about how to set up the NAND driver in linux. I think the usage of the "id" field in the platform device can be confusing, as it was for me. I thought the id was used only as an index for multiple devices, but the driver takes it as the chip select, even though the value of the chip select itself could be derived from the resource structure.
The driver then uses the value of the chip select to enable the ECC, and because I passed the wrong chip select, the hw ECC was disabled, even though I could correctly write and read the NAND under Linux.
I have checked the two drivers and I came to the conclusion that they are fully compatible. They do exactly the same things in the same sequence, no problem at all.
My problem was caused by the fact that I had set up the platform device with "id = 1", and this is translated by the driver into chip select = 3. However, on my board chip select 2 is used ;-(
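The mapping described above can be written down as a one-line rule. This is a sketch under an assumption: the thread only confirms that id = 1 ends up as chip select 3, and the linear rule (id 0..3 gives chip select 2..5) is inferred from that single data point. `chipselect_from_platform_id()` is a hypothetical helper, not a function in the kernel driver.

```c
/*
 * Assumed mapping from the platform device id to the external chip
 * select: id 0..3 selects chip select 2..5. Returns -1 for any id
 * outside that range.
 */
static int chipselect_from_platform_id(int id)
{
    if (id < 0 || id > 3)
        return -1;
    return id + 2;
}
```

Under this rule, a board wired to chip select 2, as here, would need id = 0.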
Ben, I have already tested writing from linux, and u-boot can now read the partition correctly. I think there is no incompatibility problem with the mainline kernel.
Best regards, Stefano Babic

Hi Stefano,
On Wed, Mar 16, 2011 at 1:24 PM, Stefano Babic sbabic@denx.de wrote:
> [...] Ben, I have already tested writing from linux, and u-boot can now read the partition correctly. I think there is no incompatibility problem with the mainline kernel.
Thanks. I guess that eliminates quite a few possible causes on my end. The current most likely candidate is between the keyboard and the chair.
Best Regards, Ben Gardiner
--- Nanometrics Inc. http://www.nanometrics.ca