[U-Boot] mtest issue

Hello everyone,
I have an LPC2468 board and am currently playing with U-Boot. The U-Boot version is 1.1.6 from Embedded Artists. I already have U-Boot loading and executing user commands. Everything seems to work fine, except the memory test (mtest) command:
lpc-board# mtest
Pattern 00000000 Writing... Reading... Mem error @ 0xA075AB74: found A0780000, expected 001D6ADD
Mem error @ 0xA075AB78: found 001D6ADD, expected 001D6ADE
Pattern FFFFFFFF Writing...
Once mtest starts checking RAM with the "0xFFFFFFFF" pattern, the system stops responding. With CFG_ALT_MEMTEST defined, mtest also fails with an error:
Testing a0000000 ... a0780000: Iteration: 1 FAILURE (read/write) @ 0xa075ab68: expected 0x001d6adb, actual 0xffe29525)
You can see the ram dump made with "md a075ab40" command:
a075ab30: ffe29532 ffe29531 ffe29530 ffe2952f    2...1...0.../...
a075ab40: a075ab68 a0788b64 a079ad7c a079ad7c    h.u.d.x.|.y.|.y.
a075ab50: 0000000d 0000000d a075ab68 a075ab74    ........h.u.t.u.
a075ab60: 00000000 00000040 00000000 00000040    ....@.......@...
a075ab70: 00000000 00000040 a079decc 00000002
This segment of RAM (addresses from 0xa075ab40) seems to be already initialized and used somehow by U-Boot. I checked a second board in the same way. The result is the same, so a hardware problem seems to be ruled out. The question is: what is it? Is it the stack or something else? Should I pay attention to it, or can I ignore it? Is it possible to make mtest pass?
As you can see, I'm totally new to U-Boot and have no experience at all, so my question may make you laugh. Anyway, could you please help me find the shortest way to fix this issue?
Thank you for your time wasted on me, Alexey Goncharov, Russia

Alexey Goncharov wrote:
Hello everyone,
I have a LPC2468 board and currently play with u-boot. U-boot version is 1.1.6 from Embedded Artists. I've already made u-boot loading and
1.1.6 is really old.
executing user commands. Everything seems to work fine, except the memory test (mtest) command:
lpc-board# mtest
Pattern 00000000 Writing... Reading... Mem error @ 0xA075AB74: found A0780000, expected 001D6ADD
What does the value A0780000 mean to your system? That looks like a magic memory address. Who would store that into memory? That may be a clue for finding the culprit.
Mem error @ 0xA075AB78: found 001D6ADD, expected 001D6ADE
Oh-oh, the found and expected values differ by one. Since the memory test writes and then verifies sequential numbers in sequential memory locations, this implies your hardware either read the wrong location (a timing problem with the address lines or memory?) or latched the data wrong, so that the previous value was still on the data bus and was latched instead of the correct value (a different flavor of timing problem).
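The expected values in the log can be reproduced from the addresses alone. The sketch below assumes that the first pass of the default test starts at 0xA0000000 (the start address reported elsewhere in this thread) and writes the pattern plus the word index into each successive 32-bit word. The function name mtest_expected is made up for illustration; it is not a U-Boot API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed model of the first pass of the default mtest:
 * the word at `start` receives `pattern`, and every following
 * 32-bit word receives the previous value plus one.
 */
uint32_t mtest_expected(uint32_t start, uint32_t pattern, uint32_t addr)
{
    /* Precondition: addr is a word-aligned address inside the tested range. */
    assert(addr >= start && (addr - start) % 4u == 0u);

    return pattern + (addr - start) / 4u;
}
```

Under this assumption, mtest_expected(0xA0000000, 0, 0xA075AB74) gives 0x001D6ADD and the next word gives 0x001D6ADE, which matches the "expected" column of both error lines.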
Pattern FFFFFFFF Writing...
Once mtest starts checking RAM with the "0xFFFFFFFF" pattern, the system stops responding.
You need to compare the default mtest test range with your memory map and figure out where there is overlap. We don't know anything about your u-boot configuration, hardware, or memory map, so we cannot help.
If you specify a smaller, known unused, area to the mtest command (help mtest), does it work? Work your way through your "unused" memory map and find where it breaks.
With CFG_ALT_MEMTEST defined, mtest also fails with an error:
Testing a0000000 ... a0780000: Iteration: 1 FAILURE (read/write) @ 0xa075ab68: expected 0x001d6adb, actual 0xffe29525)
Note that the expected and actual values are almost bitwise inverses: ~0x001d6adb == 0xffe29524, one less than the actual 0xffe29525 (which is ~0x001d6ada). IIRC, the memory test stores the pattern, checks it, stores the ~pattern, and then checks that. This smells like a memory timing issue again.
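The arithmetic can be checked with a minimal sketch, assuming the alternative test fills word n (counting from the start address a0000000) with n + 1 and later overwrites each word with its bitwise inverse. The helper names are hypothetical, not U-Boot functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fill pass: the word at index n from `start` holds n + 1. */
uint32_t alt_pattern(uint32_t start, uint32_t addr)
{
    /* Precondition: addr is a word-aligned address inside the tested range. */
    assert(addr >= start && (addr - start) % 4u == 0u);

    return (addr - start) / 4u + 1u;
}

/* Assumed second pass: each word is overwritten with the inverse pattern. */
uint32_t alt_anti_pattern(uint32_t start, uint32_t addr)
{
    return ~alt_pattern(start, addr);
}
```

Under this assumption the expected value at 0xa075ab68 is 0x001d6adb and its bitwise inverse is 0xffe29524, while 0xffe29525 is the inverse that belongs to the preceding word at 0xa075ab64. The dump row at a075ab30 (ffe29532 ffe29531 ffe29530 ffe2952f) fits the same model.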
You can see the ram dump made with "md a075ab40" command:
a075ab30: ffe29532 ffe29531 ffe29530 ffe2952f    2...1...0.../...
a075ab40: a075ab68 a0788b64 a079ad7c a079ad7c    h.u.d.x.|.y.|.y.
a075ab50: 0000000d 0000000d a075ab68 a075ab74    ........h.u.t.u.
a075ab60: 00000000 00000040 00000000 00000040    ....@.......@...
a075ab70: 00000000 00000040 a079decc 00000002
This segment of RAM (addresses from 0xa075ab40) seems to be already initialized and used somehow by U-Boot. I checked a second board in the same way. The result is the same, so a hardware problem seems to be ruled out. The question is: what is it?
We have no information, we have no clue. Use your information (memory map, deductive reasoning, detective skills).
Is it the stack or something else?
Yes.
Should I pay attention to it, or can I ignore it?
YES, YOU MUST PAY ATTENTION TO THAT!
Is it possible to make mtest pass?
Yes, if your hardware and software work. If mtest is failing, you have a hardware or software problem. If you don't find and fix it, you will have an unreliable widget.
As you can see, I'm totally new to U-Boot and have no experience at all, so my question may make you laugh. Anyway, could you please help me find the shortest way to fix this issue?
Shortest: Send money to an expert to help you. Ask Embedded Artists? Is it their board or a custom board? Either way, they probably have the best knowledge of your board.
Most satisfying: Figure it out and fix it.
Thank you for your time wasted on me, Alexey Goncharov, Russia
Good luck, gvb

Hello Jerry!
I have a LPC2468 board and currently play with u-boot. U-boot version is 1.1.6 from Embedded Artists. I've already made u-boot loading and
1.1.6 is really old.
I'll keep this in mind and will update as soon as I fix this issue.
Pattern 00000000 Writing... Reading... Mem error @ 0xA075AB74: found A0780000, expected 001D6ADD
What does the value A0780000 mean to your system? That looks like a magic memory address. Who would store that into memory? That may be a clue for finding the culprit.
Sorry, I didn't provide you with any of that information. The system has 8 MB of SDRAM, and addresses from 0xA0780000 (7.5 MB) to 0xA0800000 (8 MB) are reserved for the U-Boot image and data. 0xA0780000 is TEXT_BASE (and CFG_MEMTEST_END) on the target system. I spent all day yesterday navigating the source tree and found the following excerpt in u-boot/cpu/arm720t/start.S:
    ldr r0, _TEXT_BASE              /* upper 128 KiB: relocated uboot */
    sub r0, r0, #CFG_MALLOC_LEN     /* malloc area */
    sub r0, r0, #CFG_GBL_DATA_SIZE  /* bdinfo */
#ifdef CONFIG_USE_IRQ
    sub r0, r0, #(CONFIG_STACKSIZE_IRQ+CONFIG_STACKSIZE_FIQ)
#endif
    sub sp, r0, #12                 /* leave 3 words for abort-stack */
This means we have the memory map illustrated below (from lower to higher addresses):
1. Stack (grows "back" from higher addresses to lower addresses)
2. 12 bytes for abort-stack
3. FIQ stack (grows "back" from higher addresses to lower addresses)
4. IRQ stack (grows "back" from higher addresses to lower addresses)
5. Global data
6. Heap
--TEXT_BASE = 0xA0780000
7. U-Boot image
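The subtraction chain in the start.S excerpt can be replayed on a host machine to see where the stack begins. This is only a sketch: TEXT_BASE is the value given in this thread, but every size below is a placeholder assumption (the real values live in the board configuration header), and the function name initial_sp is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* TEXT_BASE is the value given in the thread. Every size below is a
 * placeholder assumption; take the real values from your board config. */
#define TEXT_BASE             0xA0780000u
#define CFG_MALLOC_LEN        (128u * 1024u)  /* assumed */
#define CFG_GBL_DATA_SIZE     128u            /* assumed */
#define CONFIG_STACKSIZE_IRQ  (4u * 1024u)    /* assumed */
#define CONFIG_STACKSIZE_FIQ  (4u * 1024u)    /* assumed */

/* Mirrors the start.S excerpt step by step. use_irq stands in for
 * CONFIG_USE_IRQ being defined. Returns the initial stack pointer. */
uint32_t initial_sp(int use_irq)
{
    uint32_t r0 = TEXT_BASE;

    r0 -= CFG_MALLOC_LEN;        /* malloc area sits directly below TEXT_BASE */
    r0 -= CFG_GBL_DATA_SIZE;     /* then the global data (bdinfo) */
    if (use_irq)
        r0 -= CONFIG_STACKSIZE_IRQ + CONFIG_STACKSIZE_FIQ;
    assert(r0 > 12u);            /* sanity check against underflow */
    return r0 - 12u;             /* 3 words left for the abort-stack */
}
```

With these placeholder sizes the stack would start at 0xA075FF74 (or 0xA075DF74 with the IRQ and FIQ stacks) and grow downward from there, so an mtest range that ends at TEXT_BASE would run straight through it.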
After calculating all the segment addresses, I figured out that the addresses that give an error are in the stack segment (for example, 0xA075AB74). This brings us to the question: how can a memory segment be tested if it is already occupied by the stack? My guess is that mtest fails because it "doesn't want" to corrupt the fragile data on the stack. I don't clearly understand why U-Boot keeps the stack at addresses below TEXT_BASE. Addresses from TEXT_BASE to the end of SDRAM (as given by PHYS_SDRAM_1_SIZE) are not supposed to be checked by mtest, so it might be better to keep fragile data in that segment.
If you specify a smaller, known unused, area to the mtest command (help mtest), does it work? Work your way through your "unused" memory map and find where it breaks.
I also used mtest to check the memory segments right after "the broken cell", and everything there is fine. Only a range of about 20 bytes is affected, and some addresses within this range give an mtest error.
With CFG_ALT_MEMTEST defined, mtest also fails with an error:
Testing a0000000 ... a0780000: Iteration: 1 FAILURE (read/write) @ 0xa075ab68: expected 0x001d6adb, actual 0xffe29525)
Note that the expected and actual values are almost bitwise inverses: ~0x001d6adb == 0xffe29524, one less than the actual 0xffe29525 (which is ~0x001d6ada). IIRC, the memory test stores the pattern, checks it, stores the ~pattern, and then checks that. This smells like a memory timing issue again.
I also used a very simple low-level test, written by a co-worker, to check the SDRAM before U-Boot is loaded. Here is its source:
unsigned int testSDRAM_32(void)
{
    unsigned int i, j;

    for (i = 0, j = 0; i < (1024*1024*8); i += sizeof(unsigned int), j++) {
        *(unsigned int *)((unsigned int)(SDRAM_BASE_ADDR + i)) = (unsigned int)j;
    }
    for (i = 0, j = 0; i < (1024*1024*8); i += sizeof(unsigned int), j++) {
        if (*(unsigned int *)((unsigned int)(SDRAM_BASE_ADDR + i)) != (unsigned int)j)
            return(FALSE);
    }
    return(TRUE);
}

{
    static const unsigned char test_mem_BAD[] = "\r\nTest SDRAM Failed!\r\n";
    unsigned char *pMsg;
    unsigned int i;

    if (testSDRAM_32() == TRUE) {
        static const unsigned char test_mem_OK[] = "\r\nTest SDRAM OK.\r\n";
        pMsg = (unsigned char *)test_mem_OK - (unsigned char *)TEXT_BASE;
        for (i = 0; i < (sizeof(test_mem_OK) - 1); i++) {
            while ((U0LSR & (1 << 5)) == 0)
                ;   /* Wait for empty U0THR */
            U0THR = *pMsg++;
        }
    } else {
        static const unsigned char test_mem_BAD[] = "\r\nTest SDRAM Failed!\r\n";
        pMsg = (unsigned char *)test_mem_BAD - (unsigned char *)TEXT_BASE;
        for (i = 0; i < (sizeof(test_mem_BAD) - 1); i++) {
            while ((U0LSR & (1 << 5)) == 0)
                ;   /* Wait for empty U0THR */
            U0THR = *pMsg++;
        }
    }
}
So I would say that there are no errors in general, and no timing issues in particular.
So, let's gather all the questions in one list:
1. Am I right in saying that TEXT_BASE and CFG_MEMTEST_END are the same? Can I decrease CFG_MEMTEST_END to avoid the overlap between the mtest range and the stack (or whatever it is)? I have already decreased CFG_MEMTEST_END to 0xa0750000. Mtest passes, but, obviously, addresses from 0xa0750000 to 0xa0800000 are not checked. Is that OK?
2. Is the memory map I "illustrated" above right?
3. What is the abort-stack (12 bytes)? (Please don't laugh.)
4. What does the memory segment from TEXT_BASE to the end of SDRAM (as given by PHYS_SDRAM_1_SIZE) contain? Only the U-Boot image (copied from flash), or the U-Boot image and U-Boot data?

Hello Jerry, Wolfgang and all the u-boot maillist participants!
Sorry for the broken encoding in my previous mail.
I have a LPC2468 board and currently play with u-boot. U-boot version is 1.1.6 from Embedded Artists. I've already made u-boot loading and
1.1.6 is really old.
I'll keep this in mind and will make the update as soon as I fix the current issue.
Pattern 00000000 Writing... Reading... Mem error @ 0xA075AB74: found A0780000, expected 001D6ADD
What does the value A0780000 mean to your system? That looks like a magic memory address. Who would store that into memory? That may be a clue for finding the culprit.
Sorry, I didn't provide you with any of that information. The system has 8 MB of SDRAM, and addresses from 0xA0780000 (7.5 MB) to 0xA0800000 (8 MB) are reserved for the U-Boot image and data. 0xA0780000 is TEXT_BASE (and CFG_MEMTEST_END) on the target system. I found the following excerpt in u-boot/cpu/arm720t/start.S:
    ldr r0, _TEXT_BASE              /* upper 128 KiB: relocated uboot */
    sub r0, r0, #CFG_MALLOC_LEN     /* malloc area */
    sub r0, r0, #CFG_GBL_DATA_SIZE  /* bdinfo */
#ifdef CONFIG_USE_IRQ
    sub r0, r0, #(CONFIG_STACKSIZE_IRQ+CONFIG_STACKSIZE_FIQ)
#endif
    sub sp, r0, #12                 /* leave 3 words for abort-stack */
This means we have the memory map illustrated below (from lower to higher addresses):
1. Stack (grows "back" from higher addresses to lower addresses)
2. 12 bytes for abort-stack
3. FIQ stack (grows "back" from higher addresses to lower addresses)
4. IRQ stack (grows "back" from higher addresses to lower addresses)
5. Global data
6. Heap
--TEXT_BASE = 0xA0780000
7. U-Boot image
After calculating all the segment addresses, I figured out that the addresses that give an mtest error are in the stack segment (for example, 0xA075AB74). This brings us to the question: how can a memory segment be tested if it is already occupied by the stack? My guess is that mtest fails because it "doesn't want" to corrupt the fragile data on the stack.
If you specify a smaller, known unused, area to the mtest command (help mtest), does it work? Work your way through your "unused" memory map and find where it breaks.
I also used mtest to check the memory segments right after "the broken cell", and everything there is fine. Only a range of about 20 bytes is affected, and some addresses within this range give an mtest error.
With CFG_ALT_MEMTEST defined, mtest also fails with an error:
Testing a0000000 ... a0780000: Iteration: 1 FAILURE (read/write) @ 0xa075ab68: expected 0x001d6adb, actual 0xffe29525)
Note that the expected and actual values are almost bitwise inverses: ~0x001d6adb == 0xffe29524, one less than the actual 0xffe29525 (which is ~0x001d6ada). IIRC, the memory test stores the pattern, checks it, stores the ~pattern, and then checks that. This smells like a memory timing issue again.
I also used a very simple low-level test, written by a co-worker, to check the SDRAM before U-Boot is loaded. Here is its source:
unsigned int testSDRAM_32(void)
{
    unsigned int i, j;

    for (i = 0, j = 0; i < (1024*1024*8); i += sizeof(unsigned int), j++) {
        *(unsigned int *)((unsigned int)(SDRAM_BASE_ADDR + i)) = (unsigned int)j;
    }
    for (i = 0, j = 0; i < (1024*1024*8); i += sizeof(unsigned int), j++) {
        if (*(unsigned int *)((unsigned int)(SDRAM_BASE_ADDR + i)) != (unsigned int)j)
            return(FALSE);
    }
    return(TRUE);
}

{
    static const unsigned char test_mem_BAD[] = "\r\nTest SDRAM Failed!\r\n";
    unsigned char *pMsg;
    unsigned int i;

    if (testSDRAM_32() == TRUE) {
        static const unsigned char test_mem_OK[] = "\r\nTest SDRAM OK.\r\n";
        pMsg = (unsigned char *)test_mem_OK - (unsigned char *)TEXT_BASE;
        for (i = 0; i < (sizeof(test_mem_OK) - 1); i++) {
            while ((U0LSR & (1 << 5)) == 0)
                ;   /* Wait for empty U0THR */
            U0THR = *pMsg++;
        }
    } else {
        static const unsigned char test_mem_BAD[] = "\r\nTest SDRAM Failed!\r\n";
        pMsg = (unsigned char *)test_mem_BAD - (unsigned char *)TEXT_BASE;
        for (i = 0; i < (sizeof(test_mem_BAD) - 1); i++) {
            while ((U0LSR & (1 << 5)) == 0)
                ;   /* Wait for empty U0THR */
            U0THR = *pMsg++;
        }
    }
}
So I would say that there are no errors in general, and no timing issues in particular.
So, let's gather all the questions in one list:
1. Can I decrease CFG_MEMTEST_END to avoid the overlap between the mtest range and the stack (or whatever it is)? I have already decreased CFG_MEMTEST_END to 0xa0750000. Mtest passes, but, obviously, addresses from 0xa0750000 to 0xa0800000 are not checked. Is that OK?
2. Is the memory map I "illustrated" above right?
3. What is the abort-stack (12 bytes)? (Please don't laugh.)

Dear Alexey Goncharov,
In message 1bd99ff90911222207v227942d0h4ed03e884bd5ac6d@mail.gmail.com you wrote:
- Can i decrease the CFG_MEMTEST_END to avoid the overlaying of mtest check-segment and stack (or whatever)? I mean i've already decreased CFG_MEMTEST_END to 0xa0750000. Mtest passes, but, obviously, addresses from 0xa0750000 to 0xa0800000 are not being checked. is it ok?
You are not only allowed, but you are actually supposed to configure CFG_MEMTEST_* such that these settings do not conflict with any used memory ranges on your board.
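For the board discussed in this thread, such a configuration might look like the sketch below. CFG_MEMTEST_START and CFG_MEMTEST_END are the board-config macros, with the values taken from the thread; the helper memtest_range_is_safe and its lowest_used argument are hypothetical, added only to show the non-overlap condition, and mtest is assumed to treat the end address as exclusive.

```c
#include <assert.h>
#include <stdint.h>

#define CFG_MEMTEST_START  0xA0000000u  /* start of SDRAM, per the mtest log */
#define CFG_MEMTEST_END    0xA0750000u  /* reduced end reported in the thread */

/*
 * Hypothetical helper: the tested range [start, end) is safe when it ends
 * at or below the lowest address that U-Boot itself is using (stack,
 * global data, heap, relocated image).
 */
int memtest_range_is_safe(uint32_t start, uint32_t end, uint32_t lowest_used)
{
    assert(start <= end);  /* a reversed range is a configuration bug */

    return start < end && end <= lowest_used;
}
```

Taking 0xA075AB40 (the lowest address the dump shows in use) as lowest_used, the reduced range passes this check, while a range ending at 0xA0780000 does not. The stack grows downward, so a real configuration should leave more headroom than the dump suggests.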
- Is the memory map i "illustrated" above right?
Probably. But note that it is you who defines the memory map on a specific board.
- What is abort-stack (12 bytes)? (please, don't laugh)
Dunno.
Best regards,
Wolfgang Denk
Participants (3): Alexey Goncharov, Jerry Van Baren, Wolfgang Denk