[U-Boot] [RFC PATCH 0/1] bug writing to NAND 1 byte less than page size

Hi,
I found this bug on v2015.04 but I guess it happens also in v2016.07 as I believe it has been there since 2007. It is so strange that nobody caught this before that I fear I'm doing something wrong or not taking something into account on the suggested fix, so I'd like to know if others can reproduce it and if the fix looks correct.
I reproduced it on an i.MX6UL platform and on an i.MX28 platform but it is a bug on the NAND core driver, so it should be reproducible with any architecture. In fact, it is reproducible with any size, as long as it is a multiple of the page size, less one byte.
Thanks in advance for your comments.
Hector Palacios (1): mtd: nand: fix bug writing 1 byte less than page size
drivers/mtd/nand/nand_base.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)

nand_do_write_ops() determines if it is writing a partial page with the formula: part_pagewr = (column || writelen < (mtd->writesize - 1))
When 'writelen' is exactly 1 byte less than the NAND page size the formula equates to zero, so the code doesn't process it as a partial write, although it should. As a consequence the function remains in the while(1) loop with 'writelen' becoming 0xffffffff and iterating until the watchdog timeout triggers.
To reproduce the issue on a NAND with 2K page (0x800): => nand erase.part <partition> => nand write $loadaddr <partition> 7ff
Signed-off-by: Hector Palacios hector.palacios@digi.com --- drivers/mtd/nand/nand_base.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c index 689716753ae6..c8be74849e56 100644 --- a/drivers/mtd/nand/nand_base.c +++ b/drivers/mtd/nand/nand_base.c @@ -877,7 +877,7 @@ static int nand_wait(struct mtd_info *mtd, struct nand_chip *chip)
u32 timer = (CONFIG_SYS_HZ * timeo) / 1000; u32 time_start; - + time_start = get_timer(0); while (get_timer(time_start) < timer) { if (chip->dev_ready) { @@ -2411,7 +2411,7 @@ static int nand_do_write_ops(struct mtd_info *mtd, loff_t to, int cached = writelen > bytes && page != blockmask; uint8_t *wbuf = buf; int use_bufpoi; - int part_pagewr = (column || writelen < (mtd->writesize - 1)); + int part_pagewr = (column || writelen < mtd->writesize);
if (part_pagewr) use_bufpoi = 1;

On Fri, 2016-07-15 at 09:45 +0200, Hector Palacios wrote:
nand_do_write_ops() determines if it is writing a partial page with the formula: part_pagewr = (column || writelen < (mtd->writesize - 1))
When 'writelen' is exactly 1 byte less than the NAND page size the formula equates to zero, so the code doesn't process it as a partial write, although it should. As a consequence the function remains in the while(1) loop with 'writelen' becoming 0xffffffff and iterating until the watchdog timeout triggers.
To reproduce the issue on a NAND with 2K page (0x800): => nand erase.part <partition> => nand write $loadaddr <partition> 7ff
Signed-off-by: Hector Palacios hector.palacios@digi.com
drivers/mtd/nand/nand_base.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c index 689716753ae6..c8be74849e56 100644 --- a/drivers/mtd/nand/nand_base.c +++ b/drivers/mtd/nand/nand_base.c @@ -877,7 +877,7 @@ static int nand_wait(struct mtd_info *mtd, struct nand_chip *chip) u32 timer = (CONFIG_SYS_HZ * timeo) / 1000; u32 time_start; -
time_start = get_timer(0); while (get_timer(time_start) < timer) { if (chip->dev_ready) {
This change is unrelated.
@@ -2411,7 +2411,7 @@ static int nand_do_write_ops(struct mtd_info *mtd, loff_t to, int cached = writelen > bytes && page != blockmask; uint8_t *wbuf = buf; int use_bufpoi;
int part_pagewr = (column || writelen < (mtd->writesize -
1));
int part_pagewr = (column || writelen < mtd->writesize);
if (part_pagewr) use_bufpoi = 1;
Could you send a non-RFC patch? This also needs to get fixed in Linux.
-Scott

On 07/15/2016 08:49 PM, Scott Wood wrote:
On Fri, 2016-07-15 at 09:45 +0200, Hector Palacios wrote:
nand_do_write_ops() determines if it is writing a partial page with the formula: part_pagewr = (column || writelen < (mtd->writesize - 1))
When 'writelen' is exactly 1 byte less than the NAND page size the formula equates to zero, so the code doesn't process it as a partial write, although it should. As a consequence the function remains in the while(1) loop with 'writelen' becoming 0xffffffff and iterating until the watchdog timeout triggers.
To reproduce the issue on a NAND with 2K page (0x800): => nand erase.part <partition> => nand write $loadaddr <partition> 7ff
Signed-off-by: Hector Palacios hector.palacios@digi.com
drivers/mtd/nand/nand_base.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c index 689716753ae6..c8be74849e56 100644 --- a/drivers/mtd/nand/nand_base.c +++ b/drivers/mtd/nand/nand_base.c @@ -877,7 +877,7 @@ static int nand_wait(struct mtd_info *mtd, struct nand_chip *chip)
u32 timer = (CONFIG_SYS_HZ * timeo) / 1000; u32 time_start;
- time_start = get_timer(0); while (get_timer(time_start) < timer) { if (chip->dev_ready) {
This change is unrelated.
Ok. Removed.
@@ -2411,7 +2411,7 @@ static int nand_do_write_ops(struct mtd_info *mtd, loff_t to, int cached = writelen > bytes && page != blockmask; uint8_t *wbuf = buf; int use_bufpoi;
int part_pagewr = (column || writelen < (mtd->writesize -
1));
int part_pagewr = (column || writelen < mtd->writesize);
if (part_pagewr) use_bufpoi = 1;
Could you send a non-RFC patch? This also needs to get fixed in Linux.
Done. I will prepare a patch for Linux, but do you know how to reproduce it in Linux? 'nandwrite' checks that the file size is a multiple of the page size or else it will ask you to force padding, and 'dd' over the mtdblock device is also rounding up the write size to a page size multiple. Maybe that's the reason why it went undetected for so long.
-- Hector Palacios
participants (2)
-
Hector Palacios
-
Scott Wood