
On Mon, Apr 4, 2022 at 9:26 AM Sean Anderson sean.anderson@seco.com wrote:
On 4/3/22 11:56 AM, Dario Binacchi wrote:
Hi Sean,
Il 01/04/2022 23:43 Michael Nazzareno Trimarchi michael@amarulasolutions.com ha scritto:
Hi Sean
On Fri, Apr 1, 2022 at 8:53 PM Sean Anderson sean.anderson@seco.com wrote:
On 4/1/22 2:46 PM, Sean Anderson wrote:
Hi all,
I don't understand how spl_nand_fit_read is supposed to work. This function has been seemingly hacked together by multiple people over the years, and it (presumably) (somehow) works despite mass confusion of units. I tried to do some refactoring, but the contradictions make it very difficult to determine what is a bug and what is intentional.
Lets start with the basics. spl_nand_fit_read is used to implement spl_load_info.load. I've reproduced the documentation for this function below:
read() - Read from device
@load: Information about the load state @sector: Sector number to read from (each @load->bl_len bytes) @count: Number of sectors to read @buf: Buffer to read into @return number of sectors read, 0 on error
In particular, both @sector and @count are in units of load.bl_len. In addition, the return value should be @count on success.
static ulong spl_nand_fit_read(struct spl_load_info *load, ulong offs, ulong size, void *dst) { int err; #ifdef CONFIG_SYS_NAND_BLOCK_SIZE
The following logic was introduced in commit 9f6a14c47f ("spl: fit: nand: fix fit loading in case of bad blocks"). However, at the time it was not guarded by the above ifdef. nand_spl_adjust_offset is not defined for all nand drivers. It is defined for at least mxs, am335x, atmel, and simple. Additionally, some drivers (such as rockchip) implement bad block skipping in nand_spl_load_image.
It's not at all clear to me that there is common config which determines whether nand_spl_adjust_offset is implemented. Nevertheless, in commit aa0032f672 ("spl: fit: nand: allow for non-page-aligned elements"), the above directive selects different code if CONFIG_SYS_NAND_BLOCK_SIZE is defined.
ulong sector; sector = *(int *)load->priv;
This is the offset passed to nand_spl_load_image. This offset is in units of bytes, and is aligned to the block size. However, it is *not* set if CONFIG_SPL_LOAD_IMX_CONTAINER is enabled. Oops.
offs = sector + nand_spl_adjust_offset(sector, offs - sector);
I forgot to mention this, but this also adds sector to offs, but offs already includes the sector:
return spl_load_simple_fit(spl_image, &load, offset / bl_len, header);
The spl_nand_fit_read was called from info->read that was in units of sectors count = info->read(info, sector, sectors, fit); debug("fit read sector %lx, sectors=%d, dst=%p, count=%lu, size=0x%lx\n", sector, sectors, fit, count, size);
load.dev = NULL; load.priv = &offset; load.filename = NULL; load.bl_len = 1;
Ah, here is the problem. If bl_len is always 1, then there is no problem going back and forth between units. But aa0032f672 ("spl: fit: nand: allow for non-page-aligned elements") means that bl_len is not always 1, so now the conversion matters.
So I suppose my question is now for Tim: how did you test your patch?
Hi Sean,
I maintain a set of boards using gwventana_nand_defconfg that have various NAND parts.
Yes, I vaguely recall running into problems when moving my boards to DM. It was some time ago but if I recall I found nand_spl_adjust_offset didn't exist for MXS and when I noticed the requirement for it was introduced in 9f6a14c47ff9 ("spl: fit: nand: fix fit loading in case of bad blocks") I looked at that patch and noticed it added the notion of sectors which lead me to submitting 39cb85043cdb "(spl: fit: nand: skip bad block handling if NAND chip not fully defined)" which added the ifdef check around it to essentially move it back to bytes.
As far as testing goes I test with IMX6 boards using NAND as a boot device with board/gateworks/gw_ventana and gwventana_nand_defconfig
With some debugging added my boot looks like this: Booting from NAND PMIC: PFUZE100 ID=0x10 Trying to boot from NAND spl: nand - using hw ecc spl_nand_load_element bl_len=0xe00000 offset=0x1000 mtd=18200268 Found FIT spl_nand_fit_read offset=0xe00 size=0x3 bl_len=0x1000 DTB: imx6q-gw54xx spl_nand_fit_read offset=0xe02 size=0xca bl_len=0x1000 spl_nand_fit_read offset=0xf07 size=0xc bl_len=0x1000
For my config CONFIG_SYS_NAND_BLOCK_SIZE is not defined because the boards supported use a variety of devices: 2GiB Micron Technology MT29F16G08ADACAH4 02120768 (4k page size, 256k erase size, 224 oob size, legacy layout ecc str=16) 2GiB Cypress S34ML16G202BHI000 02120939 (2k page size, 128k erase size, 128B oob size, legacy layout ecc str=16) 1GiB Micron MT29F8G08ABACAH4 02120782 (4k page size, 156k erase size, 224 oob size, legacy layout ecc str=16) 256MiB Micron MT29F2G08ABAEAH4 02120770 (2k page size, 128k erase size, 64B oob size, legacy layout ecc str=16)
So maybe the confusion is all around CONFIG_SYS_NAND_BLOCK_SIZE?
I appreciate you trying to unwind and clean all of this up. I recall being thoroughly confused when it came time for me to move my boards to DM and there were so many changes to unravel.
Best Regards,
Tim
load.read = spl_nand_fit_read;
static ulong spl_nand_fit_read(struct spl_load_info *load, ulong offs, ulong size, void *dst) { ulong sector; int ret;
sector = *(int *)load->priv; offs = sector + nand_spl_adjust_offset(sector, offs - sector);
This line is ONLY ok if bl_len = 1
ret = nand_spl_load_image(offs, size, dst); if (!ret) return size; else return 0;
}
Sorry it's a bit of time and I think that we are discussing about this commit at the time was introduced Michael
Which means that we add in sector twice (but with different units!)
The documentation for this this function is as follows
nand_spl_adjust_offset - Adjust offset from a starting sector @sector: Address of the sector @offs: Offset starting from @sector
If one or more bad blocks are in the address space between @sector and @sector + @offs, @offs is increased by the NAND block size for each bad block found.
In particular, note that offs is in units of bytes. However, we know from above that at this point in the function, offs is in units of bl_len. Oops.
#else offs *= load->bl_len; size *= load->bl_len;
This code path adjusts both offs and size to be in units of
*units of bytes
#endif err = nand_spl_load_image(offs, size, dst);
So by the time we get here, one code path is in units of bytes, and the other is in units of bl_len. nand_spl_load_image seems to expect bytes (though there is no documentation, so I can't be sure). Which of course begs the question: how did the code introduced in 9f6a14c47f ever work?
The commit 9f6a14c47f only added the code strictly necessary to skip a bad block that was in that memory area. I heavily tested it on a custom board that presented this problem and that without it it would not have been usable. The patch itself was also tested on SOCs that did not have bad block, therefore backward compatible. I think that your doubts arise from the patches following mine and that, however, I have not had the opportunity to test.
Have you tested v2021.07 or later?
--Sean
if (err) return 0; return size / load->bl_len;
And of course, this should only be done if size is in units of bytes.
}
I have some patches for what I think are bugs, but the logic of this function is so confused that I am unable to determine if they actually are bugs. I would greatly appreciate if some of the authors of the mentioned commits could comment on their intent, and what they thought the units of offs and size were.
--Sean