Re: The contradictory nature of spl_nand_fit_read

4 Apr 2022

      On Mon, Apr 4, 2022 at 9:26 AM Sean Anderson sean.anderson@seco.com wrote:
...
On 4/3/22 11:56 AM, Dario Binacchi wrote:
...
Hi Sean,
...
Il 01/04/2022 23:43 Michael Nazzareno Trimarchi michael@amarulasolutions.com ha scritto:
Hi Sean
On Fri, Apr 1, 2022 at 8:53 PM Sean Anderson sean.anderson@seco.com wrote:
...
On 4/1/22 2:46 PM, Sean Anderson wrote:
...
Hi all,
I don't understand how spl_nand_fit_read is supposed to work. This
function has been seemingly hacked together by multiple people over the
years, and it (presumably) (somehow) works despite mass confusion of
units. I tried to do some refactoring, but the contradictions make it
very difficult to determine what is a bug and what is intentional.
Lets start with the basics. spl_nand_fit_read is used to implement
spl_load_info.load. I've reproduced the documentation for this function
below:
...
read() - Read from device
@load: Information about the load state
 @sector: Sector number to read from (each @load->bl_len bytes)
 @count: Number of sectors to read
 @buf: Buffer to read into
 @return number of sectors read, 0 on error
In particular, both @sector and @count are in units of load.bl_len. In
addition, the return value should be @count on success.
...
static ulong spl_nand_fit_read(struct spl_load_info *load, ulong offs,
                            ulong size, void *dst)
{
     int err;
#ifdef CONFIG_SYS_NAND_BLOCK_SIZE
The following logic was introduced in commit 9f6a14c47f ("spl: fit:
nand: fix fit loading in case of bad blocks"). However, at the time it
was not guarded by the above ifdef. nand_spl_adjust_offset is not
defined for all nand drivers. It is defined for at least mxs, am335x,
atmel, and simple. Additionally, some drivers (such as rockchip)
implement bad block skipping in nand_spl_load_image.
It's not at all clear to me that there is common config which determines
whether nand_spl_adjust_offset is implemented. Nevertheless, in commit
aa0032f672 ("spl: fit: nand: allow for non-page-aligned elements"), the
above directive selects different code if CONFIG_SYS_NAND_BLOCK_SIZE is
defined.
...
 ulong sector;

 sector = *(int *)load->priv;

This is the offset passed to nand_spl_load_image. This
offset is in units of bytes, and is aligned to the block size. However,
it is *not* set if CONFIG_SPL_LOAD_IMX_CONTAINER is enabled. Oops.
...
 offs = sector + nand_spl_adjust_offset(sector, offs - sector);

I forgot to mention this, but this also adds sector to offs, but offs already includes the sector:
...
return spl_load_simple_fit(spl_image, &load, offset / bl_len, header);
The spl_nand_fit_read was called from info->read that was in units of sectors
count = info->read(info, sector, sectors, fit);
debug("fit read sector %lx, sectors=%d, dst=%p, count=%lu, size=0x%lx\n",
              sector, sectors, fit, count, size);
load.dev = NULL;
load.priv = &offset;
load.filename = NULL;
load.bl_len = 1;
Ah, here is the problem. If bl_len is always 1, then there is no problem
going back and forth between units. But aa0032f672 ("spl: fit: nand: allow
for non-page-aligned elements") means that bl_len is not always 1, so
now the conversion matters.
So I suppose my question is now for Tim: how did you test your patch?
Hi Sean,
I maintain a set of boards using gwventana_nand_defconfg that have
various NAND parts.
Yes, I vaguely recall running into problems when moving my boards to
DM. It was some time ago but if I recall I found
nand_spl_adjust_offset didn't exist for MXS and when I noticed the
requirement for it was introduced in 9f6a14c47ff9 ("spl: fit: nand:
fix fit loading in case of bad blocks") I looked at that patch and
noticed it added the notion of sectors which lead me to submitting
39cb85043cdb "(spl: fit: nand: skip bad block handling if NAND chip
not fully defined)" which added the ifdef check around it to
essentially move it back to bytes.
As far as testing goes I test with IMX6 boards using NAND as a boot
device with board/gateworks/gw_ventana and gwventana_nand_defconfig
With some debugging added my boot looks like this:
Booting from NAND
PMIC:  PFUZE100 ID=0x10
Trying to boot from NAND
spl: nand - using hw ecc
spl_nand_load_element bl_len=0xe00000 offset=0x1000 mtd=18200268
Found FIT
spl_nand_fit_read offset=0xe00 size=0x3 bl_len=0x1000
DTB:   imx6q-gw54xx
spl_nand_fit_read offset=0xe02 size=0xca bl_len=0x1000
spl_nand_fit_read offset=0xf07 size=0xc bl_len=0x1000
For my config CONFIG_SYS_NAND_BLOCK_SIZE is not defined because the
boards supported use a variety of devices:
2GiB Micron Technology MT29F16G08ADACAH4 02120768 (4k page size, 256k
erase size, 224 oob size, legacy layout ecc str=16)
2GiB Cypress S34ML16G202BHI000 02120939 (2k page size, 128k erase
size, 128B oob size, legacy layout ecc str=16)
1GiB Micron MT29F8G08ABACAH4 02120782 (4k page size, 156k erase size,
224 oob size, legacy layout ecc str=16)
256MiB Micron MT29F2G08ABAEAH4 02120770 (2k page size, 128k erase
size, 64B oob size, legacy layout ecc str=16)
So maybe the confusion is all around CONFIG_SYS_NAND_BLOCK_SIZE?
I appreciate you trying to unwind and clean all of this up. I recall
being thoroughly confused when it came time for me to move my boards
to DM and there were so many changes to unravel.
Best Regards,
Tim
...
...
...
load.read = spl_nand_fit_read;
static ulong spl_nand_fit_read(struct spl_load_info *load, ulong offs,
                               ulong size, void *dst)
{
        ulong sector;
        int ret;
    sector = *(int *)load->priv;
    offs = sector + nand_spl_adjust_offset(sector, offs - sector);

This line is ONLY ok if bl_len = 1
...
...
    ret = nand_spl_load_image(offs, size, dst);
    if (!ret)
            return size;
    else
            return 0;

}
Sorry it's a bit of time and I think that we are discussing about this
commit at the time was introduced
Michael
...
Which means that we add in sector twice (but with different units!)
...
The documentation for this this function is as follows
...
nand_spl_adjust_offset - Adjust offset from a starting sector
@sector:     Address of the sector
@offs:       Offset starting from @sector
If one or more bad blocks are in the address space between @sector
and @sector + @offs, @offs is increased by the NAND block size for
each bad block found.
In particular, note that offs is in units of bytes. However, we know
from above that at this point in the function, offs is in units of
bl_len. Oops.
...
#else
     offs *= load->bl_len;
     size *= load->bl_len;
This code path adjusts both offs and size to be in units of
*units of bytes
...
...
#endif
     err = nand_spl_load_image(offs, size, dst);
So by the time we get here, one code path is in units of bytes, and the
other is in units of bl_len. nand_spl_load_image seems to expect bytes
(though there is no documentation, so I can't be sure).  Which of course
begs the question: how did the code introduced in 9f6a14c47f ever work?
The commit 9f6a14c47f only added the code strictly necessary to skip a bad block
that was in that memory area. I heavily tested it on a custom board that presented
this problem and that without it it would not have been usable. The patch itself
was also tested on SOCs that did not have bad block, therefore backward compatible.
I think that your doubts arise from the patches following mine and that, however,
I have not had the opportunity to test.
Have you tested v2021.07 or later?
--Sean
...
...
...
...
...
 if (err)
         return 0;

 return size / load->bl_len;

And of course, this should only be done if size is in units of bytes.
...
}
I have some patches for what I think are bugs, but the logic of this
function is so confused that I am unable to determine if they actually
are bugs. I would greatly appreciate if some of the authors of the
mentioned commits could comment on their intent, and what they thought
the units of offs and size were.
--Sean

Re: The contradictory nature of spl_nand_fit_read

Tim Harvey