[PATCH] mtd: nand: pxa3xx: Incorrect bitflip return on page read

Once a page is read with higher bitflips all subsequent reads are returning the same bitflip value even though they have none. max_bitflip variable is not being reset to 0 across page reads.
This is causing problems like incorrectly marking erase blocks bad by UBI and causing read failures.
Verified the change with both MTD reads and UBI. This change is inline with other NFC drivers.
Sample error log where a block is marked bad incorrectly:
ubi0: fixable bit-flip detected at PEB 125 ubi0: run torture test for PEB 125 ubi0: fixable bit-flip detected at PEB 125 ubi0 error: torture_peb: read problems on freshly erased PEB 125, must be bad ubi0 error: erase_worker: failed to erase PEB 125, error -5 ubi0: mark PEB 125 as bad
Signed-off-by: rminnikanti rminnikanti@marvell.com --- drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c index 1d9a6d107b..d2a4faad56 100644 --- a/drivers/mtd/nand/raw/pxa3xx_nand.c +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c @@ -800,6 +800,11 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command) info->ecc_err_cnt = 0; info->ndcb3 = 0; info->need_wait = 0; + /* + * Reset max_bitflips to zero. Once command is complete, + * max_bitflips for this READ is returned in ecc.read_page() + */ + info->max_bitflips = 0;
switch (command) { case NAND_CMD_READ0:

On Sun, Apr 28, 2024 at 4:15 AM Ravi Minnikanti rminnikanti@marvell.com wrote:
Once a page is read with higher bitflips all subsequent reads are returning the same bitflip value even though they have none. max_bitflip variable is not being reset to 0 across page reads.
This is causing problems like incorrectly marking erase blocks bad by UBI and causing read failures.
Verified the change with both MTD reads and UBI. This change is inline with other NFC drivers.
Sample error log where a block is marked bad incorrectly:
ubi0: fixable bit-flip detected at PEB 125 ubi0: run torture test for PEB 125 ubi0: fixable bit-flip detected at PEB 125 ubi0 error: torture_peb: read problems on freshly erased PEB 125, must be bad ubi0 error: erase_worker: failed to erase PEB 125, error -5 ubi0: mark PEB 125 as bad
Signed-off-by: rminnikanti rminnikanti@marvell.com
Looks good to me
Reviewed-by: Chris Packham judge.packham@gmail.com
drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c index 1d9a6d107b..d2a4faad56 100644 --- a/drivers/mtd/nand/raw/pxa3xx_nand.c +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c @@ -800,6 +800,11 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command) info->ecc_err_cnt = 0; info->ndcb3 = 0; info->need_wait = 0;
/*
* Reset max_bitflips to zero. Once command is complete,
* max_bitflips for this READ is returned in ecc.read_page()
*/
info->max_bitflips = 0; switch (command) { case NAND_CMD_READ0:
-- 2.17.1

On Mon, Apr 29, 2024 at 6:22 PM Chris Packham judge.packham@gmail.com wrote:
On Sun, Apr 28, 2024 at 4:15 AM Ravi Minnikanti rminnikanti@marvell.com wrote:
Once a page is read with higher bitflips all subsequent reads are returning the same bitflip value even though they have none. max_bitflip variable is not being reset to 0 across page reads.
This is causing problems like incorrectly marking erase blocks bad by UBI and causing read failures.
Verified the change with both MTD reads and UBI. This change is inline with other NFC drivers.
Sample error log where a block is marked bad incorrectly:
ubi0: fixable bit-flip detected at PEB 125 ubi0: run torture test for PEB 125 ubi0: fixable bit-flip detected at PEB 125 ubi0 error: torture_peb: read problems on freshly erased PEB 125, must be bad ubi0 error: erase_worker: failed to erase PEB 125, error -5 ubi0: mark PEB 125 as bad
Signed-off-by: rminnikanti rminnikanti@marvell.com
Looks good to me
Reviewed-by: Chris Packham judge.packham@gmail.com
drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c index 1d9a6d107b..d2a4faad56 100644 --- a/drivers/mtd/nand/raw/pxa3xx_nand.c +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c @@ -800,6 +800,11 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command) info->ecc_err_cnt = 0; info->ndcb3 = 0; info->need_wait = 0;
/*
* Reset max_bitflips to zero. Once command is complete,
* max_bitflips for this READ is returned in ecc.read_page()
*/
info->max_bitflips = 0;
Why this should not be put to 0 in read_page instead on prepare_start_command?
Michael
switch (command) { case NAND_CMD_READ0:
-- 2.17.1

On 4/29/24 09:59, Michael Nazzareno Trimarchi wrote:
On Mon, Apr 29, 2024 at 6:22 PM Chris Packham judge.packham@gmail.com wrote:
On Sun, Apr 28, 2024 at 4:15 AM Ravi Minnikanti rminnikanti@marvell.com wrote:
Once a page is read with higher bitflips all subsequent reads are returning the same bitflip value even though they have none. max_bitflip variable is not being reset to 0 across page reads.
This is causing problems like incorrectly marking erase blocks bad by UBI and causing read failures.
Verified the change with both MTD reads and UBI. This change is inline with other NFC drivers.
Sample error log where a block is marked bad incorrectly:
ubi0: fixable bit-flip detected at PEB 125 ubi0: run torture test for PEB 125 ubi0: fixable bit-flip detected at PEB 125 ubi0 error: torture_peb: read problems on freshly erased PEB 125, must be bad ubi0 error: erase_worker: failed to erase PEB 125, error -5 ubi0: mark PEB 125 as bad
Signed-off-by: rminnikanti rminnikanti@marvell.com
Looks good to me
Reviewed-by: Chris Packham judge.packham@gmail.com
drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c index 1d9a6d107b..d2a4faad56 100644 --- a/drivers/mtd/nand/raw/pxa3xx_nand.c +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c @@ -800,6 +800,11 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command) info->ecc_err_cnt = 0; info->ndcb3 = 0; info->need_wait = 0;
/*
* Reset max_bitflips to zero. Once command is complete,
* max_bitflips for this READ is returned in ecc.read_page()
*/
info->max_bitflips = 0;
Why this should not be put to 0 in read_page instead on prepare_start_command?
Michael
ecc.read_page is invoked after the read command execution. First chip->cmdfunc is executed with NAND_CMD_READ0 and then ecc.read_page is invoked to read the page from buffer. So, by the time read_page is invoked, info->max_bitflips must already have the bit flip value.
Thanks, Ravi.
switch (command) { case NAND_CMD_READ0:
-- 2.17.1

Hi Ravi
On Tue, Apr 30, 2024 at 6:25 AM Ravi Minnikanti rminnikanti@marvell.com wrote:
On 4/29/24 09:59, Michael Nazzareno Trimarchi wrote:
On Mon, Apr 29, 2024 at 6:22 PM Chris Packham judge.packham@gmail.com wrote:
On Sun, Apr 28, 2024 at 4:15 AM Ravi Minnikanti rminnikanti@marvell.com wrote:
Once a page is read with higher bitflips all subsequent reads are returning the same bitflip value even though they have none. max_bitflip variable is not being reset to 0 across page reads.
This is causing problems like incorrectly marking erase blocks bad by UBI and causing read failures.
Verified the change with both MTD reads and UBI. This change is inline with other NFC drivers.
Sample error log where a block is marked bad incorrectly:
ubi0: fixable bit-flip detected at PEB 125 ubi0: run torture test for PEB 125 ubi0: fixable bit-flip detected at PEB 125 ubi0 error: torture_peb: read problems on freshly erased PEB 125, must be bad ubi0 error: erase_worker: failed to erase PEB 125, error -5 ubi0: mark PEB 125 as bad
Signed-off-by: rminnikanti rminnikanti@marvell.com
Looks good to me
Reviewed-by: Chris Packham judge.packham@gmail.com
drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c index 1d9a6d107b..d2a4faad56 100644 --- a/drivers/mtd/nand/raw/pxa3xx_nand.c +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c @@ -800,6 +800,11 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command) info->ecc_err_cnt = 0; info->ndcb3 = 0; info->need_wait = 0;
/*
* Reset max_bitflips to zero. Once command is complete,
* max_bitflips for this READ is returned in ecc.read_page()
*/
info->max_bitflips = 0;
Why this should not be put to 0 in read_page instead on prepare_start_command?
Michael
ecc.read_page is invoked after the read command execution. First chip->cmdfunc is executed with NAND_CMD_READ0 and then ecc.read_page is invoked to read the page from buffer. So, by the time read_page is invoked, info->max_bitflips must already have the bit flip value.
All the other implementation has a slight different way to handle.
From what you said the reset should
be done on for NAND_CMD_READ0 command and should be sufficient. Technically should be moved in switch and not unconditionally.
Michael
Thanks, Ravi.
switch (command) { case NAND_CMD_READ0:
-- 2.17.1

On 5/6/24 00:35, Michael Nazzareno Trimarchi wrote:
Hi Ravi
On Tue, Apr 30, 2024 at 6:25 AM Ravi Minnikanti rminnikanti@marvell.com wrote:
On 4/29/24 09:59, Michael Nazzareno Trimarchi wrote:
On Mon, Apr 29, 2024 at 6:22 PM Chris Packham judge.packham@gmail.com wrote:
On Sun, Apr 28, 2024 at 4:15 AM Ravi Minnikanti rminnikanti@marvell.com wrote:
Once a page is read with higher bitflips all subsequent reads are returning the same bitflip value even though they have none. max_bitflip variable is not being reset to 0 across page reads.
This is causing problems like incorrectly marking erase blocks bad by UBI and causing read failures.
Verified the change with both MTD reads and UBI. This change is inline with other NFC drivers.
Sample error log where a block is marked bad incorrectly:
ubi0: fixable bit-flip detected at PEB 125 ubi0: run torture test for PEB 125 ubi0: fixable bit-flip detected at PEB 125 ubi0 error: torture_peb: read problems on freshly erased PEB 125, must be bad ubi0 error: erase_worker: failed to erase PEB 125, error -5 ubi0: mark PEB 125 as bad
Signed-off-by: rminnikanti rminnikanti@marvell.com
Looks good to me
Reviewed-by: Chris Packham judge.packham@gmail.com
drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c index 1d9a6d107b..d2a4faad56 100644 --- a/drivers/mtd/nand/raw/pxa3xx_nand.c +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c @@ -800,6 +800,11 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command) info->ecc_err_cnt = 0; info->ndcb3 = 0; info->need_wait = 0;
/*
* Reset max_bitflips to zero. Once command is complete,
* max_bitflips for this READ is returned in ecc.read_page()
*/
info->max_bitflips = 0;
Why this should not be put to 0 in read_page instead on prepare_start_command?
Michael
ecc.read_page is invoked after the read command execution. First chip->cmdfunc is executed with NAND_CMD_READ0 and then ecc.read_page is invoked to read the page from buffer. So, by the time read_page is invoked, info->max_bitflips must already have the bit flip value.
All the other implementation has a slight different way to handle. From what you said the reset should be done on for NAND_CMD_READ0 command and should be sufficient. Technically should be moved in switch and not unconditionally.
Michael
max_bitflip is not being reset to 0 across page reads. Once a page is read with higher bitflips all subsequent reads are returning the same bitflip value even though they have none.
This is causing problems like incorrectly marking erase blocks bad by UBI and read failures.
Tested it with both MTD reads and UBI attach. This change is inline with other NFC drivers.
Sample error log where a block is marked bad incorrectly:
ubi0: fixable bit-flip detected at PEB 125 ubi0: run torture test for PEB 125 ubi0: fixable bit-flip detected at PEB 125 ubi0 error: torture_peb: read problems on freshly erased PEB 125, must be bad ubi0 error: erase_worker: failed to erase PEB 125, error -5 ubi0: mark PEB 125 as bad
Signed-off-by: rminnikanti rminnikanti@marvell.com --- drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c index 1d9a6d107b..97f250483f 100644 --- a/drivers/mtd/nand/raw/pxa3xx_nand.c +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c @@ -803,6 +803,11 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command)
switch (command) { case NAND_CMD_READ0: + /* + * Reset max_bitflips to zero. Once command is complete, + * max_bitflips for this READ is returned in ecc.read_page() + */ + info->max_bitflips = 0; case NAND_CMD_READOOB: case NAND_CMD_PAGEPROG: if (!info->force_raw)

Hi Ravi
On Mon, May 6, 2024 at 7:33 PM Ravi Minnikanti rminnikanti@marvell.com wrote:
On 5/6/24 00:35, Michael Nazzareno Trimarchi wrote:
Hi Ravi
On Tue, Apr 30, 2024 at 6:25 AM Ravi Minnikanti rminnikanti@marvell.com wrote:
On 4/29/24 09:59, Michael Nazzareno Trimarchi wrote:
On Mon, Apr 29, 2024 at 6:22 PM Chris Packham judge.packham@gmail.com wrote:
On Sun, Apr 28, 2024 at 4:15 AM Ravi Minnikanti rminnikanti@marvell.com wrote:
Once a page is read with higher bitflips all subsequent reads are returning the same bitflip value even though they have none. max_bitflip variable is not being reset to 0 across page reads.
This is causing problems like incorrectly marking erase blocks bad by UBI and causing read failures.
Verified the change with both MTD reads and UBI. This change is inline with other NFC drivers.
Sample error log where a block is marked bad incorrectly:
ubi0: fixable bit-flip detected at PEB 125 ubi0: run torture test for PEB 125 ubi0: fixable bit-flip detected at PEB 125 ubi0 error: torture_peb: read problems on freshly erased PEB 125, must be bad ubi0 error: erase_worker: failed to erase PEB 125, error -5 ubi0: mark PEB 125 as bad
Signed-off-by: rminnikanti rminnikanti@marvell.com
Looks good to me
Reviewed-by: Chris Packham judge.packham@gmail.com
drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c index 1d9a6d107b..d2a4faad56 100644 --- a/drivers/mtd/nand/raw/pxa3xx_nand.c +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c @@ -800,6 +800,11 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command) info->ecc_err_cnt = 0; info->ndcb3 = 0; info->need_wait = 0;
/*
* Reset max_bitflips to zero. Once command is complete,
* max_bitflips for this READ is returned in ecc.read_page()
*/
info->max_bitflips = 0;
Why this should not be put to 0 in read_page instead on prepare_start_command?
Michael
ecc.read_page is invoked after the read command execution. First chip->cmdfunc is executed with NAND_CMD_READ0 and then ecc.read_page is invoked to read the page from buffer. So, by the time read_page is invoked, info->max_bitflips must already have the bit flip value.
All the other implementation has a slight different way to handle. From what you said the reset should be done on for NAND_CMD_READ0 command and should be sufficient. Technically should be moved in switch and not unconditionally.
Michael
max_bitflip is not being reset to 0 across page reads. Once a page is read with higher bitflips all subsequent reads are returning the same bitflip value even though they have none.
This is causing problems like incorrectly marking erase blocks bad by UBI and read failures.
Tested it with both MTD reads and UBI attach. This change is inline with other NFC drivers.
Sample error log where a block is marked bad incorrectly:
ubi0: fixable bit-flip detected at PEB 125 ubi0: run torture test for PEB 125 ubi0: fixable bit-flip detected at PEB 125 ubi0 error: torture_peb: read problems on freshly erased PEB 125, must be bad ubi0 error: erase_worker: failed to erase PEB 125, error -5 ubi0: mark PEB 125 as bad
Signed-off-by: rminnikanti rminnikanti@marvell.com
drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c index 1d9a6d107b..97f250483f 100644 --- a/drivers/mtd/nand/raw/pxa3xx_nand.c +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c @@ -803,6 +803,11 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command)
switch (command) { case NAND_CMD_READ0:
/*
* Reset max_bitflips to zero. Once command is complete,
* max_bitflips for this READ is returned in ecc.read_page()
*/
info->max_bitflips = 0; case NAND_CMD_READOOB: case NAND_CMD_PAGEPROG: if (!info->force_raw)
-- 2.17.1
Thanks Michael. Fixed it.
Thanks, Ravi.
switch (command) { case NAND_CMD_READ0:
-- 2.17.1
Acked-by: Michael Trimarchi michael@amarulasolutions.com

Hi,
Can you please merge this PR, if there are no more review comments?
Thanks, Ravi On 5/6/24 11:28, Michael Nazzareno Trimarchi wrote:
Hi Ravi
On Mon, May 6, 2024 at 7:33 PM Ravi Minnikanti rminnikanti@marvell.com wrote:
On 5/6/24 00:35, Michael Nazzareno Trimarchi wrote:
Hi Ravi
On Tue, Apr 30, 2024 at 6:25 AM Ravi Minnikanti rminnikanti@marvell.com wrote:
On 4/29/24 09:59, Michael Nazzareno Trimarchi wrote:
On Mon, Apr 29, 2024 at 6:22 PM Chris Packham judge.packham@gmail.com wrote:
On Sun, Apr 28, 2024 at 4:15 AM Ravi Minnikanti rminnikanti@marvell.com wrote: > > Once a page is read with higher bitflips all subsequent reads > are returning the same bitflip value even though they have none. > max_bitflip variable is not being reset to 0 across page reads. > > This is causing problems like incorrectly > marking erase blocks bad by UBI and causing read failures. > > Verified the change with both MTD reads and UBI. > This change is inline with other NFC drivers. > > Sample error log where a block is marked bad incorrectly: > > ubi0: fixable bit-flip detected at PEB 125 > ubi0: run torture test for PEB 125 > ubi0: fixable bit-flip detected at PEB 125 > ubi0 error: torture_peb: read problems on freshly erased PEB 125, > must be bad > ubi0 error: erase_worker: failed to erase PEB 125, error -5 > ubi0: mark PEB 125 as bad > > Signed-off-by: rminnikanti rminnikanti@marvell.com
Looks good to me
Reviewed-by: Chris Packham judge.packham@gmail.com
> --- > drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c > index 1d9a6d107b..d2a4faad56 100644 > --- a/drivers/mtd/nand/raw/pxa3xx_nand.c > +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c > @@ -800,6 +800,11 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command) > info->ecc_err_cnt = 0; > info->ndcb3 = 0; > info->need_wait = 0; > + /* > + * Reset max_bitflips to zero. Once command is complete, > + * max_bitflips for this READ is returned in ecc.read_page() > + */ > + info->max_bitflips = 0; >
Why this should not be put to 0 in read_page instead on prepare_start_command?
Michael
ecc.read_page is invoked after the read command execution. First chip->cmdfunc is executed with NAND_CMD_READ0 and then ecc.read_page is invoked to read the page from buffer. So, by the time read_page is invoked, info->max_bitflips must already have the bit flip value.
All the other implementation has a slight different way to handle. From what you said the reset should be done on for NAND_CMD_READ0 command and should be sufficient. Technically should be moved in switch and not unconditionally.
Michael
max_bitflip is not being reset to 0 across page reads. Once a page is read with higher bitflips all subsequent reads are returning the same bitflip value even though they have none.
This is causing problems like incorrectly marking erase blocks bad by UBI and read failures.
Tested it with both MTD reads and UBI attach. This change is inline with other NFC drivers.
Sample error log where a block is marked bad incorrectly:
ubi0: fixable bit-flip detected at PEB 125 ubi0: run torture test for PEB 125 ubi0: fixable bit-flip detected at PEB 125 ubi0 error: torture_peb: read problems on freshly erased PEB 125, must be bad ubi0 error: erase_worker: failed to erase PEB 125, error -5 ubi0: mark PEB 125 as bad
Signed-off-by: rminnikanti rminnikanti@marvell.com
drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c index 1d9a6d107b..97f250483f 100644 --- a/drivers/mtd/nand/raw/pxa3xx_nand.c +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c @@ -803,6 +803,11 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command)
switch (command) { case NAND_CMD_READ0:
/*
* Reset max_bitflips to zero. Once command is complete,
* max_bitflips for this READ is returned in ecc.read_page()
*/
info->max_bitflips = 0; case NAND_CMD_READOOB: case NAND_CMD_PAGEPROG: if (!info->force_raw)
-- 2.17.1
Thanks Michael. Fixed it.
Thanks, Ravi.
> switch (command) { > case NAND_CMD_READ0: > -- > 2.17.1
Acked-by: Michael Trimarchi michael@amarulasolutions.com

Hi Dario
Can you add to next pull?
Michael
On Tue, May 21, 2024, 4:31 PM Ravi Minnikanti rminnikanti@marvell.com wrote:
Hi,
Can you please merge this PR, if there are no more review comments?
Thanks, Ravi On 5/6/24 11:28, Michael Nazzareno Trimarchi wrote:
Hi Ravi
On Mon, May 6, 2024 at 7:33 PM Ravi Minnikanti rminnikanti@marvell.com
wrote:
On 5/6/24 00:35, Michael Nazzareno Trimarchi wrote:
Hi Ravi
On Tue, Apr 30, 2024 at 6:25 AM Ravi Minnikanti <
rminnikanti@marvell.com> wrote:
On 4/29/24 09:59, Michael Nazzareno Trimarchi wrote:
On Mon, Apr 29, 2024 at 6:22 PM Chris Packham <
judge.packham@gmail.com> wrote:
> > On Sun, Apr 28, 2024 at 4:15 AM Ravi Minnikanti <
rminnikanti@marvell.com> wrote:
>> >> Once a page is read with higher bitflips all subsequent reads >> are returning the same bitflip value even though they have none. >> max_bitflip variable is not being reset to 0 across page reads. >> >> This is causing problems like incorrectly >> marking erase blocks bad by UBI and causing read failures. >> >> Verified the change with both MTD reads and UBI. >> This change is inline with other NFC drivers. >> >> Sample error log where a block is marked bad incorrectly: >> >> ubi0: fixable bit-flip detected at PEB 125 >> ubi0: run torture test for PEB 125 >> ubi0: fixable bit-flip detected at PEB 125 >> ubi0 error: torture_peb: read problems on freshly erased PEB 125, >> must be bad >> ubi0 error: erase_worker: failed to erase PEB 125, error -5 >> ubi0: mark PEB 125 as bad >> >> Signed-off-by: rminnikanti rminnikanti@marvell.com > > Looks good to me > > Reviewed-by: Chris Packham judge.packham@gmail.com > >> --- >> drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++ >> 1 file changed, 5 insertions(+) >> >> diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c
b/drivers/mtd/nand/raw/pxa3xx_nand.c
>> index 1d9a6d107b..d2a4faad56 100644 >> --- a/drivers/mtd/nand/raw/pxa3xx_nand.c >> +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c >> @@ -800,6 +800,11 @@ static void prepare_start_command(struct
pxa3xx_nand_info *info, int command)
>> info->ecc_err_cnt = 0; >> info->ndcb3 = 0; >> info->need_wait = 0; >> + /* >> + * Reset max_bitflips to zero. Once command is complete, >> + * max_bitflips for this READ is returned in
ecc.read_page()
>> + */ >> + info->max_bitflips = 0; >>
Why this should not be put to 0 in read_page instead on
prepare_start_command?
Michael
ecc.read_page is invoked after the read command execution. First chip->cmdfunc is executed with NAND_CMD_READ0 and then
ecc.read_page is invoked
to read the page from buffer. So, by the time read_page is invoked,
info->max_bitflips
must already have the bit flip value.
All the other implementation has a slight different way to handle. From what you said the reset should be done on for NAND_CMD_READ0 command and should be sufficient. Technically should be moved in switch and not unconditionally.
Michael
max_bitflip is not being reset to 0 across page reads. Once a page is read with higher bitflips all subsequent reads are returning the same bitflip value even though they have none.
This is causing problems like incorrectly marking erase blocks bad by UBI and read failures.
Tested it with both MTD reads and UBI attach. This change is inline with other NFC drivers.
Sample error log where a block is marked bad incorrectly:
ubi0: fixable bit-flip detected at PEB 125 ubi0: run torture test for PEB 125 ubi0: fixable bit-flip detected at PEB 125 ubi0 error: torture_peb: read problems on freshly erased PEB 125, must be bad ubi0 error: erase_worker: failed to erase PEB 125, error -5 ubi0: mark PEB 125 as bad
Signed-off-by: rminnikanti rminnikanti@marvell.com
drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c
b/drivers/mtd/nand/raw/pxa3xx_nand.c
index 1d9a6d107b..97f250483f 100644 --- a/drivers/mtd/nand/raw/pxa3xx_nand.c +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c @@ -803,6 +803,11 @@ static void prepare_start_command(struct
pxa3xx_nand_info *info, int command)
switch (command) { case NAND_CMD_READ0:
/*
* Reset max_bitflips to zero. Once command is complete,
* max_bitflips for this READ is returned in
ecc.read_page()
*/
info->max_bitflips = 0; case NAND_CMD_READOOB: case NAND_CMD_PAGEPROG: if (!info->force_raw)
-- 2.17.1
Thanks Michael. Fixed it.
Thanks, Ravi.
>> switch (command) { >> case NAND_CMD_READ0: >> -- >> 2.17.1
Acked-by: Michael Trimarchi michael@amarulasolutions.com

Hi
Can this be merged? Let me know if I missed any process.
Thanks, Ravi
On 5/21/24 13:15, Michael Nazzareno Trimarchi wrote:
Hi Dario
Can you add to next pull?
Michael
On Tue, May 21, 2024, 4:31 PM Ravi Minnikanti rminnikanti@marvell.com wrote:
Hi,
Can you please merge this PR, if there are no more review comments?
Thanks, Ravi On 5/6/24 11:28, Michael Nazzareno Trimarchi wrote:
Hi Ravi
On Mon, May 6, 2024 at 7:33 PM Ravi Minnikanti rminnikanti@marvell.com
wrote:
On 5/6/24 00:35, Michael Nazzareno Trimarchi wrote:
Hi Ravi
On Tue, Apr 30, 2024 at 6:25 AM Ravi Minnikanti <
rminnikanti@marvell.com> wrote:
On 4/29/24 09:59, Michael Nazzareno Trimarchi wrote:
>
> On Mon, Apr 29, 2024 at 6:22 PM Chris Packham <
judge.packham@gmail.com> wrote:
>> >> On Sun, Apr 28, 2024 at 4:15 AM Ravi Minnikanti <
rminnikanti@marvell.com> wrote:
>>> >>> Once a page is read with higher bitflips all subsequent reads >>> are returning the same bitflip value even though they have none. >>> max_bitflip variable is not being reset to 0 across page reads. >>> >>> This is causing problems like incorrectly >>> marking erase blocks bad by UBI and causing read failures. >>> >>> Verified the change with both MTD reads and UBI. >>> This change is inline with other NFC drivers. >>> >>> Sample error log where a block is marked bad incorrectly: >>> >>> ubi0: fixable bit-flip detected at PEB 125 >>> ubi0: run torture test for PEB 125 >>> ubi0: fixable bit-flip detected at PEB 125 >>> ubi0 error: torture_peb: read problems on freshly erased PEB 125, >>> must be bad >>> ubi0 error: erase_worker: failed to erase PEB 125, error -5 >>> ubi0: mark PEB 125 as bad >>> >>> Signed-off-by: rminnikanti rminnikanti@marvell.com >> >> Looks good to me >> >> Reviewed-by: Chris Packham judge.packham@gmail.com >> >>> --- >>> drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++ >>> 1 file changed, 5 insertions(+) >>> >>> diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c
b/drivers/mtd/nand/raw/pxa3xx_nand.c
>>> index 1d9a6d107b..d2a4faad56 100644 >>> --- a/drivers/mtd/nand/raw/pxa3xx_nand.c >>> +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c >>> @@ -800,6 +800,11 @@ static void prepare_start_command(struct
pxa3xx_nand_info *info, int command)
>>> info->ecc_err_cnt = 0; >>> info->ndcb3 = 0; >>> info->need_wait = 0; >>> + /* >>> + * Reset max_bitflips to zero. Once command is complete, >>> + * max_bitflips for this READ is returned in
ecc.read_page()
>>> + */ >>> + info->max_bitflips = 0; >>> > > Why this should not be put to 0 in read_page instead on
prepare_start_command?
> > Michael >
ecc.read_page is invoked after the read command execution. First chip->cmdfunc is executed with NAND_CMD_READ0 and then
ecc.read_page is invoked
to read the page from buffer. So, by the time read_page is invoked,
info->max_bitflips
must already have the bit flip value.
All the other implementation has a slight different way to handle. From what you said the reset should be done on for NAND_CMD_READ0 command and should be sufficient. Technically should be moved in switch and not unconditionally.
Michael
max_bitflip is not being reset to 0 across page reads. Once a page is read with higher bitflips all subsequent reads are returning the same bitflip value even though they have none.
This is causing problems like incorrectly marking erase blocks bad by UBI and read failures.
Tested it with both MTD reads and UBI attach. This change is inline with other NFC drivers.
Sample error log where a block is marked bad incorrectly:
ubi0: fixable bit-flip detected at PEB 125 ubi0: run torture test for PEB 125 ubi0: fixable bit-flip detected at PEB 125 ubi0 error: torture_peb: read problems on freshly erased PEB 125, must be bad ubi0 error: erase_worker: failed to erase PEB 125, error -5 ubi0: mark PEB 125 as bad
Signed-off-by: rminnikanti rminnikanti@marvell.com
drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c
b/drivers/mtd/nand/raw/pxa3xx_nand.c
index 1d9a6d107b..97f250483f 100644 --- a/drivers/mtd/nand/raw/pxa3xx_nand.c +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c @@ -803,6 +803,11 @@ static void prepare_start_command(struct
pxa3xx_nand_info *info, int command)
switch (command) { case NAND_CMD_READ0:
/*
* Reset max_bitflips to zero. Once command is complete,
* max_bitflips for this READ is returned in
ecc.read_page()
*/
info->max_bitflips = 0; case NAND_CMD_READOOB: case NAND_CMD_PAGEPROG: if (!info->force_raw)
-- 2.17.1
Thanks Michael. Fixed it.
Thanks, Ravi.
>>> switch (command) { >>> case NAND_CMD_READ0: >>> -- >>> 2.17.1 >
Acked-by: Michael Trimarchi michael@amarulasolutions.com
participants (3)
-
Chris Packham
-
Michael Nazzareno Trimarchi
-
Ravi Minnikanti