[U-Boot] [PATCH] mem_mtest: bail out after finding 1st memory error.

The basic memtest function tries to watch for ^C after each pattern pass as an escape mechanism, but if things are horribly wrong, we'll be stuck in an inner loop flooding the console with error messages and never check for ^C. To make matters worse, if the user waits for all the error messages to complete, we then incorrectly report the test passed without errors.
By inspecting the code, it is clear that the test was originally written with returning after the 1st error in mind (which is what the optional more extensive test does). Making it do this also solves the endless console flood problem if a person tests really bad RAM.
Signed-off-by: Paul Gortmaker paul.gortmaker@windriver.com --- common/cmd_mem.c | 5 ++--- 1 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/common/cmd_mem.c b/common/cmd_mem.c index 9850800..abcd8fd 100644 --- a/common/cmd_mem.c +++ b/common/cmd_mem.c @@ -631,7 +631,6 @@ int do_mem_mtest (cmd_tbl_t *cmdtp, int flag, int argc, char *argv[]) vu_long *addr, *start, *end; ulong val; ulong readback; - int rcode = 0; int iterations = 1; int iteration_limit;
@@ -923,7 +922,7 @@ int do_mem_mtest (cmd_tbl_t *cmdtp, int flag, int argc, char *argv[]) printf ("\nMem error @ 0x%08X: " "found %08lX, expected %08lX\n", (uint)addr, readback, val); - rcode = 1; + return 1; } val += incr; } @@ -943,7 +942,7 @@ int do_mem_mtest (cmd_tbl_t *cmdtp, int flag, int argc, char *argv[]) incr = -incr; } #endif - return rcode; + return 0; }

Dear Paul Gortmaker,
In message 1254338488-15332-1-git-send-email-paul.gortmaker@windriver.com you wrote:
The basic memtest function tries to watch for ^C after each pattern pass as an escape mechanism, but if things are horribly wrong, we'll be stuck in an inner loop flooding the console with error messages and never check for ^C. To make matters worse, if the user waits for all the error messages to complete, we then incorrectly report the test passed without errors.
By inspecting the code, it is clear that the test was originally written with returning after the 1st error in mind (which is what the optional more extensive test does). Making it do this also solves the endless console flood problem if a person tests really bad RAM.
Please don't change the behaviour, rather fix the problems with it.
If you like, please feel free to add code to bail out after a number of errors, but that should be optional (for example using an additional argument on the command line).
Best regards,
Wolfgang Denk

Wolfgang Denk wrote:
Dear Paul Gortmaker,
In message 1254338488-15332-1-git-send-email-paul.gortmaker@windriver.com you wrote:
The basic memtest function tries to watch for ^C after each pattern pass as an escape mechanism, but if things are horribly wrong, we'll be stuck in an inner loop flooding the console with error messages and never check for ^C. To make matters worse, if the user waits for all the error messages to complete, we then incorrectly report the test passed without errors.
By inspecting the code, it is clear that the test was originally written with returning after the 1st error in mind (which is what the optional more extensive test does). Making it do this also solves the endless console flood problem if a person tests really bad RAM.
Please don't change the behaviour, rather fix the problems with it.
If you like, please feel free to add code to bail out after a number of errors, but that should be optional (for example using an additional argument on the command line).
I agree in principle, and I'd actually 1st created a patch that watched for ^C in the inner loop. But the more I looked at the code, the more I felt that the original intention of the code was in fact the "new" behaviour.
For example, the CONFIG_SYS_ALT_MEMTEST contains:
printf ("\nFAILURE: ....); return 1;
in several places throughout the test. And in the default test, the code has:
if (iteration_limit && iterations > iteration_limit) { printf("Tested %d iteration(s) without errors.\n", iterations-1); return 0; }
i.e. there was never any provision for checking the rcode variable or counting the errors -- it assumed that if it ran the full iteration count, then there were no errors.
If you still think it is best to maintain current behaviour and not stop after the 1st error, that is fine, I can do that, but I just wanted to be sure it was clear why I did it this way.
Thanks, Paul.
Best regards,
Wolfgang Denk

Dear Paul Gortmaker,
In message 4AC3C540.9050004@windriver.com you wrote:
If you still think it is best to maintain current behaviour and not stop after the 1st error, that is fine, I can do that, but I just wanted to be sure it was clear why I did it this way.
I have used the code many times (well, to be honest, not sooo many times, but several times) exactly that way: letting it run forever (or, for a long time), while manipulating the hardware (like using a hair dryer resp. cooling spray on it). In such a situation it is very useful when the code does _not_ terminate after the first error (even is this might have been the intention in earlier versions).
So beause (1) it is the behaviour users might be used to, (2) I see use cases for this and (3) adding a new option will allow to have both beheaviours so anybody can chose what he wants, I think we should do as I suggested.
Best regards,
Wolfgang Denk

Wolfgang Denk wrote:
Dear Paul Gortmaker,
In message 4AC3C540.9050004@windriver.com you wrote:
If you still think it is best to maintain current behaviour and not stop after the 1st error, that is fine, I can do that, but I just wanted to be sure it was clear why I did it this way.
I have used the code many times (well, to be honest, not sooo many times, but several times) exactly that way: letting it run forever (or, for a long time), while manipulating the hardware (like using a hair dryer resp. cooling spray on it). In such a situation it is very useful when the code does _not_ terminate after the first error (even is this might have been the intention in earlier versions).
Definitely a valid use case. Hopefully one I never have to use personally, mind you.
So beause (1) it is the behaviour users might be used to, (2) I see use cases for this and (3) adding a new option will allow to have both beheaviours so anybody can chose what he wants, I think we should do as I suggested.
OK. I can do that. What about the CONFIG_ALT_MEMTEST then? Should it be changed to run continuously as well, so at least the two tests are consistent in their default behaviours?
Paul.
Best regards,
Wolfgang Denk

Dear Paul,
in message 4AC4B612.7020306@windriver.com you wrote:
So beause (1) it is the behaviour users might be used to, (2) I see use cases for this and (3) adding a new option will allow to have both beheaviours so anybody can chose what he wants, I think we should do as I suggested.
OK. I can do that. What about the CONFIG_ALT_MEMTEST then? Should it be changed to run continuously as well, so at least the two tests are consistent in their default behaviours?
Yes, I guess that would be best. Thanks.
Best regards,
Wolfgang Denk

The basic memtest function tries to watch for ^C after each pattern pass as an escape mechanism, but if things are horribly wrong, we'll be stuck in an inner loop flooding the console with error messages and never check for ^C. To make matters worse, if the user waits for all the error messages to complete, we then incorrectly report the test passed without errors.
Adding a check for ^C after any error is printed will give the end user an escape mechanism from a console flood without slowing down the overall test speed on a slow processor.
Also, the more extensive memtest quit after just a single error, which is inconsistent with the normal memtest, and not useful if if you are doing dynamic environmental impact testing, such as heating/cooling etc.
Both tests now track the error count and report it properly at test completion.
Signed-off-by: Paul Gortmaker paul.gortmaker@windriver.com --- common/cmd_mem.c | 54 ++++++++++++++++++++++++++++++++++++++++++------------ 1 files changed, 42 insertions(+), 12 deletions(-)
diff --git a/common/cmd_mem.c b/common/cmd_mem.c index 9850800..e1a7964 100644 --- a/common/cmd_mem.c +++ b/common/cmd_mem.c @@ -631,7 +631,7 @@ int do_mem_mtest (cmd_tbl_t *cmdtp, int flag, int argc, char *argv[]) vu_long *addr, *start, *end; ulong val; ulong readback; - int rcode = 0; + ulong errs = 0; int iterations = 1; int iteration_limit;
@@ -698,8 +698,8 @@ int do_mem_mtest (cmd_tbl_t *cmdtp, int flag, int argc, char *argv[])
if (iteration_limit && iterations > iteration_limit) { - printf("Tested %d iteration(s) without errors.\n", - iterations-1); + printf("Tested %d iteration(s) with %lu errors.\n", + iterations-1, errs); return 0; }
@@ -732,9 +732,14 @@ int do_mem_mtest (cmd_tbl_t *cmdtp, int flag, int argc, char *argv[]) *dummy = ~val; /* clear the test data off of the bus */ readback = *addr; if(readback != val) { - printf ("FAILURE (data line): " + printf ("FAILURE (data line): " "expected %08lx, actual %08lx\n", val, readback); + errs++; + if (ctrlc()) { + putc ('\n'); + return 1; + } } *addr = ~val; *dummy = val; @@ -743,6 +748,11 @@ int do_mem_mtest (cmd_tbl_t *cmdtp, int flag, int argc, char *argv[]) printf ("FAILURE (data line): " "Is %08lx, should be %08lx\n", readback, ~val); + errs++; + if (ctrlc()) { + putc ('\n'); + return 1; + } } } } @@ -808,7 +818,11 @@ int do_mem_mtest (cmd_tbl_t *cmdtp, int flag, int argc, char *argv[]) printf ("\nFAILURE: Address bit stuck high @ 0x%.8lx:" " expected 0x%.8lx, actual 0x%.8lx\n", (ulong)&start[offset], pattern, temp); - return 1; + errs++; + if (ctrlc()) { + putc ('\n'); + return 1; + } } } start[test_offset] = pattern; @@ -826,7 +840,11 @@ int do_mem_mtest (cmd_tbl_t *cmdtp, int flag, int argc, char *argv[]) printf ("\nFAILURE: Address bit stuck low or shorted @" " 0x%.8lx: expected 0x%.8lx, actual 0x%.8lx\n", (ulong)&start[offset], pattern, temp); - return 1; + errs++; + if (ctrlc()) { + putc ('\n'); + return 1; + } } } start[test_offset] = pattern; @@ -864,7 +882,11 @@ int do_mem_mtest (cmd_tbl_t *cmdtp, int flag, int argc, char *argv[]) printf ("\nFAILURE (read/write) @ 0x%.8lx:" " expected 0x%.8lx, actual 0x%.8lx)\n", (ulong)&start[offset], pattern, temp); - return 1; + errs++; + if (ctrlc()) { + putc ('\n'); + return 1; + } }
anti_pattern = ~pattern; @@ -882,7 +904,11 @@ int do_mem_mtest (cmd_tbl_t *cmdtp, int flag, int argc, char *argv[]) printf ("\nFAILURE (read/write): @ 0x%.8lx:" " expected 0x%.8lx, actual 0x%.8lx)\n", (ulong)&start[offset], anti_pattern, temp); - return 1; + errs++; + if (ctrlc()) { + putc ('\n'); + return 1; + } } start[offset] = 0; } @@ -897,8 +923,8 @@ int do_mem_mtest (cmd_tbl_t *cmdtp, int flag, int argc, char *argv[]) }
if (iteration_limit && iterations > iteration_limit) { - printf("Tested %d iteration(s) without errors.\n", - iterations-1); + printf("Tested %d iteration(s) with %lu errors.\n", + iterations-1, errs); return 0; } ++iterations; @@ -923,7 +949,11 @@ int do_mem_mtest (cmd_tbl_t *cmdtp, int flag, int argc, char *argv[]) printf ("\nMem error @ 0x%08X: " "found %08lX, expected %08lX\n", (uint)addr, readback, val); - rcode = 1; + errs++; + if (ctrlc()) { + putc ('\n'); + return 1; + } } val += incr; } @@ -943,7 +973,7 @@ int do_mem_mtest (cmd_tbl_t *cmdtp, int flag, int argc, char *argv[]) incr = -incr; } #endif - return rcode; + return 0; }

On Thursday 01 October 2009 19:52:27 Paul Gortmaker wrote:
if (iteration_limit && iterations > iteration_limit) {
printf("Tested %d iteration(s) without errors.\n",
iterations-1);
printf("Tested %d iteration(s) with %lu errors.\n",
iterations-1, errs); return 0;
if you're showing the errs variable, then presumably it could possibly be non- zero, so you wouldnt want to return 0 right ? return !!errs;
char *argv[]) incr = -incr; } #endif
- return rcode;
- return 0;
i dont think you want to return 0 all the time here right ? return !!errs;
otherwise, the basic ^C handling is something that has annoyed me in the past, so acked-by for that :) -mike

Mike Frysinger wrote:
On Thursday 01 October 2009 19:52:27 Paul Gortmaker wrote:
if (iteration_limit && iterations > iteration_limit) {
printf("Tested %d iteration(s) without errors.\n",
iterations-1);
printf("Tested %d iteration(s) with %lu errors.\n",
iterations-1, errs); return 0;
if you're showing the errs variable, then presumably it could possibly be non- zero, so you wouldnt want to return 0 right ? return !!errs;
char *argv[]) incr = -incr; } #endif
- return rcode;
- return 0;
i dont think you want to return 0 all the time here right ? return !!errs;
Doh! I had it in my mind to "return errs!=0;" and then forgot.
Thanks, I'll respin and resend tomorrow. Paul.
otherwise, the basic ^C handling is something that has annoyed me in the past, so acked-by for that :) -mike

The basic memtest function tries to watch for ^C after each pattern pass as an escape mechanism, but if things are horribly wrong, we'll be stuck in an inner loop flooding the console with error messages and never check for ^C. To make matters worse, if the user waits for all the error messages to complete, we then incorrectly report the test passed without errors.
Adding a check for ^C after any error is printed will give the end user an escape mechanism from a console flood without slowing down the overall test speed on a slow processor.
Also, the more extensive memtest quit after just a single error, which is inconsistent with the normal memtest, and not useful if if you are doing dynamic environmental impact testing, such as heating/cooling etc.
Both tests now track the error count and report it properly at test completion.
Signed-off-by: Paul Gortmaker paul.gortmaker@windriver.com ---
Changes: fixed return values since prev. version.
common/cmd_mem.c | 58 ++++++++++++++++++++++++++++++++++++++++------------- 1 files changed, 44 insertions(+), 14 deletions(-)
diff --git a/common/cmd_mem.c b/common/cmd_mem.c index 9850800..a34b342 100644 --- a/common/cmd_mem.c +++ b/common/cmd_mem.c @@ -631,7 +631,7 @@ int do_mem_mtest (cmd_tbl_t *cmdtp, int flag, int argc, char *argv[]) vu_long *addr, *start, *end; ulong val; ulong readback; - int rcode = 0; + ulong errs = 0; int iterations = 1; int iteration_limit;
@@ -698,9 +698,9 @@ int do_mem_mtest (cmd_tbl_t *cmdtp, int flag, int argc, char *argv[])
if (iteration_limit && iterations > iteration_limit) { - printf("Tested %d iteration(s) without errors.\n", - iterations-1); - return 0; + printf("Tested %d iteration(s) with %lu errors.\n", + iterations-1, errs); + return errs != 0; }
printf("Iteration: %6d\r", iterations); @@ -732,9 +732,14 @@ int do_mem_mtest (cmd_tbl_t *cmdtp, int flag, int argc, char *argv[]) *dummy = ~val; /* clear the test data off of the bus */ readback = *addr; if(readback != val) { - printf ("FAILURE (data line): " + printf ("FAILURE (data line): " "expected %08lx, actual %08lx\n", val, readback); + errs++; + if (ctrlc()) { + putc ('\n'); + return 1; + } } *addr = ~val; *dummy = val; @@ -743,6 +748,11 @@ int do_mem_mtest (cmd_tbl_t *cmdtp, int flag, int argc, char *argv[]) printf ("FAILURE (data line): " "Is %08lx, should be %08lx\n", readback, ~val); + errs++; + if (ctrlc()) { + putc ('\n'); + return 1; + } } } } @@ -808,7 +818,11 @@ int do_mem_mtest (cmd_tbl_t *cmdtp, int flag, int argc, char *argv[]) printf ("\nFAILURE: Address bit stuck high @ 0x%.8lx:" " expected 0x%.8lx, actual 0x%.8lx\n", (ulong)&start[offset], pattern, temp); - return 1; + errs++; + if (ctrlc()) { + putc ('\n'); + return 1; + } } } start[test_offset] = pattern; @@ -826,7 +840,11 @@ int do_mem_mtest (cmd_tbl_t *cmdtp, int flag, int argc, char *argv[]) printf ("\nFAILURE: Address bit stuck low or shorted @" " 0x%.8lx: expected 0x%.8lx, actual 0x%.8lx\n", (ulong)&start[offset], pattern, temp); - return 1; + errs++; + if (ctrlc()) { + putc ('\n'); + return 1; + } } } start[test_offset] = pattern; @@ -864,7 +882,11 @@ int do_mem_mtest (cmd_tbl_t *cmdtp, int flag, int argc, char *argv[]) printf ("\nFAILURE (read/write) @ 0x%.8lx:" " expected 0x%.8lx, actual 0x%.8lx)\n", (ulong)&start[offset], pattern, temp); - return 1; + errs++; + if (ctrlc()) { + putc ('\n'); + return 1; + } }
anti_pattern = ~pattern; @@ -882,7 +904,11 @@ int do_mem_mtest (cmd_tbl_t *cmdtp, int flag, int argc, char *argv[]) printf ("\nFAILURE (read/write): @ 0x%.8lx:" " expected 0x%.8lx, actual 0x%.8lx)\n", (ulong)&start[offset], anti_pattern, temp); - return 1; + errs++; + if (ctrlc()) { + putc ('\n'); + return 1; + } } start[offset] = 0; } @@ -897,9 +923,9 @@ int do_mem_mtest (cmd_tbl_t *cmdtp, int flag, int argc, char *argv[]) }
if (iteration_limit && iterations > iteration_limit) { - printf("Tested %d iteration(s) without errors.\n", - iterations-1); - return 0; + printf("Tested %d iteration(s) with %lu errors.\n", + iterations-1, errs); + return errs != 0; } ++iterations;
@@ -923,7 +949,11 @@ int do_mem_mtest (cmd_tbl_t *cmdtp, int flag, int argc, char *argv[]) printf ("\nMem error @ 0x%08X: " "found %08lX, expected %08lX\n", (uint)addr, readback, val); - rcode = 1; + errs++; + if (ctrlc()) { + putc ('\n'); + return 1; + } } val += incr; } @@ -943,7 +973,7 @@ int do_mem_mtest (cmd_tbl_t *cmdtp, int flag, int argc, char *argv[]) incr = -incr; } #endif - return rcode; + return 0; /* not reached */ }

On Friday 02 October 2009 18:18:33 Paul Gortmaker wrote:
The basic memtest function tries to watch for ^C after each pattern pass as an escape mechanism, but if things are horribly wrong, we'll be stuck in an inner loop flooding the console with error messages and never check for ^C. To make matters worse, if the user waits for all the error messages to complete, we then incorrectly report the test passed without errors.
Adding a check for ^C after any error is printed will give the end user an escape mechanism from a console flood without slowing down the overall test speed on a slow processor.
Also, the more extensive memtest quit after just a single error, which is inconsistent with the normal memtest, and not useful if if you are doing dynamic environmental impact testing, such as heating/cooling etc.
Both tests now track the error count and report it properly at test completion.
thanks, Acked-by: Mike Frysinger vapier@gentoo.org -mike

Dear Paul Gortmaker,
In message 1254521913-25655-1-git-send-email-paul.gortmaker@windriver.com you wrote:
The basic memtest function tries to watch for ^C after each pattern pass as an escape mechanism, but if things are horribly wrong, we'll be stuck in an inner loop flooding the console with error messages and never check for ^C. To make matters worse, if the user waits for all the error messages to complete, we then incorrectly report the test passed without errors.
Adding a check for ^C after any error is printed will give the end user an escape mechanism from a console flood without slowing down the overall test speed on a slow processor.
Also, the more extensive memtest quit after just a single error, which is inconsistent with the normal memtest, and not useful if if you are doing dynamic environmental impact testing, such as heating/cooling etc.
Both tests now track the error count and report it properly at test completion.
Signed-off-by: Paul Gortmaker paul.gortmaker@windriver.com
Changes: fixed return values since prev. version.
common/cmd_mem.c | 58 ++++++++++++++++++++++++++++++++++++++++------------- 1 files changed, 44 insertions(+), 14 deletions(-)
Applied, thanks.
Best regards,
Wolfgang Denk
participants (3)
-
Mike Frysinger
-
Paul Gortmaker
-
Wolfgang Denk