[U-Boot-Users] best way to debug memory address problems?

Help; I'm trying to figure out if these errors are hardware or software. And of course, this is our first ppc/uboot system, so I'm still stepping through a lot of learning as I go.
I'm supposed to have a memory region from 0 to 7fff_ffff assigned to our 128MB memory. However, I run into several issues when trying to run mtest: I end up with large sections misbehaving. I don't see any conflicts in the BRx registers, and I believe my OR1 is set up properly, so I'm not sure how to proceed.
good:
  #> md 0x00100000 1; mw 0x00100000 0xfff0000f ; md 0x00100000 1;
  00100000: ffffffff    ....
  00100000: fff0000f    ....
bad:
  #> md 0x00b8ac98 1; mw 0x00b8ac98 0xffff0000 ; md 0x00b8ac98 1;
  00b8ac98: 61633938    ac98
  00b8ac98: 61633938    ac98
  #> md 0x00b8ac98 1; mw 0x00b8ac98 0x0000ffff ; md 0x00b8ac98 1;
  00b8ac98: 61633938    ac98
  00b8ac98: 61633938    ac98
  #> md 0x00b8ac98 1; mw 0x00b8ac98 0xff0000ff ; md 0x00b8ac98 1;
  00b8ac98: 61633938    ac98
  00b8ac98: 61633938    ac98
System: mpc8248 - custom board based on ep8248e
128 MB RAM; 128 MB Flash; 128 MB Flash
  #> memcinfo
  BR0  = f8001801   OR0  = f80018c2
  BR1  = 00001841   OR1  = f8002b00
  BR2  = f4000801   OR2  = fff018c4
  BR3  = 00000000   OR3  = 00000000
  BR4  = e8001801   OR4  = f80018c2
  BR5  = f4100801   OR5  = fff00864
  BR6  = f4200801   OR6  = fff00864
  BR7  = 00000000   OR7  = 00000000
  BR8  = 00000000   OR8  = 00000000
  BR9  = 00000000   OR9  = 00000000
  BR10 = 00000000   OR10 = 00000000
  BR11 = 00000000   OR11 = 00000000
  MAR   = adf32865
  MAMR  = 00000000  MBMR  = 00000000  MCMR = 00000000
  MPTPR = 1300      MDR   = 1d005815
  PSDMR = c2672522  LSDMR = 00000000
  PURT  = 21        PSRT  = 4b
  LURT  = 12        LSRT  = a9
  IMMR  = f0000c10
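As a rough cross-check of the dump above, the window covered by a BRx/ORx pair can be computed from the two register values. This is a minimal sketch, not taken from the thread: it assumes that the upper 17 bits of BRx hold the base address and the upper 17 bits of ORx hold the address mask, and the names `ADDR_FIELD` and `bank_window` are made up for illustration.

```python
# Assumption: upper 17 bits carry the base address (BRx) / address mask (ORx).
ADDR_FIELD = 0xFFFF8000


def bank_window(br, or_):
    """Return (base, size_in_bytes) for one BRx/ORx pair under the assumption above."""
    base = br & ADDR_FIELD
    mask = or_ & ADDR_FIELD
    # Every address bit cleared in the mask is passed through to the device,
    # so the window size is the inverted mask plus one.
    size = (~mask & 0xFFFFFFFF) + 1
    return base, size


# Register values copied from the memcinfo output in the post.
for name, br, or_ in [("BR0/OR0", 0xF8001801, 0xF80018C2),
                      ("BR1/OR1", 0x00001841, 0xF8002B00)]:
    base, size = bank_window(br, or_)
    print(f"{name}: {base:#010x}..{base + size - 1:#010x} ({size >> 20} MB)")
```

Under that assumption, BR1/OR1 describes a 128 MB window starting at address 0, so its last byte sits at 0x07ffffff.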

Hi Alan,
> Help; I'm trying to figure out if these errors are hardware or software. And of course, this is our first ppc/uboot system, so I'm still stepping through a lot of learning as I go.
> I'm supposed to have a memory region from 0 to 7fff_ffff assigned to our 128MB memory. However, I run into several issues when trying to run mtest: I end up with large sections misbehaving. I don't see any conflicts in the BRx registers, and I believe my OR1 is set up properly, so I'm not sure how to proceed.
First thing would be to start with an oscilloscope.
I'm going to have to do this in a week or two.
Is your memory in a module, or is it directly soldered? Is the memory SDRAM, DDR, DDR2, etc.?
Regardless of the memory type, use a scope to look at the waveforms, relative timing of clock and data, and compare the measurements to the data sheet.
The memory controllers have lots of options with regard to drive strength and timing, so those need to be customized for a specific board. Chips such as DDR also have internal registers for configuring drive strength.
For example, on the MPC8349EA-MDS board, they use a DDR DIMM module, and require a termination regulator. On my design, the DDR memory will be on the board, and space was a problem, so we simulated 2.5V DDR without a 1.25V termination regulator. The system works in theory :) But I know I will be scoping things out and messing with the drive strength registers using the BDI2000 to talk to the board before I even attempt to run a memory test, let alone attempt to boot from Flash.
Cheers, Dave

Alan Bennett wrote:
> Help; I'm trying to figure out if these errors are hardware or software. And of course, this is our first ppc/uboot system, so I'm still stepping through a lot of learning as I go.
> I'm supposed to have a memory region from 0 to 7fff_ffff assigned to
Looks like 07ff_ffff to me, although it doesn't matter for this discussion.
> our 128MB memory. However, I run into several issues when trying to run mtest: I end up with large sections misbehaving. I don't see any conflicts in the BRx registers, and I believe my OR1 is set up properly, so I'm not sure how to proceed.
> good:
>   #> md 0x00100000 1; mw 0x00100000 0xfff0000f ; md 0x00100000 1;
>   00100000: ffffffff    ....
>   00100000: fff0000f    ....
> bad:
>   #> md 0x00b8ac98 1; mw 0x00b8ac98 0xffff0000 ; md 0x00b8ac98 1;
>   00b8ac98: 61633938    ac98
>   00b8ac98: 61633938    ac98
OK, the *data bits* in the memory are the *ASCII* values that form the lower half of the address: 0x61 = 'a', 0x63 = 'c', 0x39 = '9', 0x38 = '8'. There is *no way* this can be a coincidence unless you wrote those values in intentionally before doing the above example "mw" command. Going down that decision tree...
a) If you put 0x61633938 in that location of memory, how did you do it when it worked as opposed to the above illustration where it failed?
b) If you *didn't* put 0x61633938 in that memory location, who did? There is no way hardware mis-wrote half the address as *ASCII* values as part of the "mw" command which means either there is a software bug involved or your hardware is seriously sick (does u-boot run reliably?). If this is the case, what happens when you use capital letter addresses?
  md 0x00b8AC98 1; mw 0x00b8AC98 0xffff0000 ; md 0x00b8AC98 1;
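The ASCII observation above is easy to reproduce. The following is a small sketch that decodes the word read back by "md" as four big-endian bytes and compares it with the low 16 bits of the address printed as lowercase hex; the variable names are illustrative only.

```python
value = 0x61633938      # word read back at the failing location
address = 0x00B8AC98    # address used in the md/mw commands

# PowerPC is big-endian, so the most significant byte comes first.
text = value.to_bytes(4, "big").decode("ascii")

# Lower half of the address, formatted the way it was typed on the command line.
low_half = f"{address & 0xFFFF:04x}"

print(text, low_half, text == low_half)
```

The stored word matches the typed address text, not any of the data values passed to "mw", which is why a capital-letter address is a useful follow-up experiment.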
[snip]
Puzzled, gvb

Jerry Van Baren wrote:
Hi Alan,
[snip]
The above symptoms could be because your u-boot scratchpad memory (.data and/or .bss) resides in the memory locations you are trying to test - 0x00b8xxxx - either intentionally or unintentionally. If unintentionally, you have either a hardware problem or a software misconfiguration.
Is your memory space clean or does it get replicated? If you write a unique pattern (say 0xcafeface) to address 0x0000_0000, do you see it again at an address related by one address bit - e.g. 0x0400_0000, 0x0200_0000, 0x0100_0000, ... 0x0000_0010, 0x0000_0008, 0x0000_0004?
Hardware problems would tend to be miswiring or shorts/opens on the address bus. That would be Very Bad. :-(
Software problems would tend to be errors in SDRAM (DDR/DDR2) configuration - if, for instance, you have your processor initialized with the wrong address split on the bank select, you will be sending the wrong number of bits for RAS and CAS, causing a memory "duplication" problem. That would not be good, but *NOT* Very Bad since it is dead simple to fix (once you figure out what the configuration _should_ be, that is ;-).
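The replication check described above can be tried out on a toy model before touching the board. This is a sketch under stated assumptions, not a U-Boot facility: `AliasedMemory` is a hypothetical stand-in for a RAM whose decoder ignores some address bits, and `find_aliased_probes` writes the pattern at address 0 and then reads back every address that differs from 0 by a single bit.

```python
class AliasedMemory:
    """Toy RAM model: address bits set in ignored_mask never reach the device."""

    def __init__(self, ignored_mask):
        self.ignored_mask = ignored_mask
        self.cells = {}

    def write(self, addr, value):
        self.cells[addr & ~self.ignored_mask] = value

    def read(self, addr):
        return self.cells.get(addr & ~self.ignored_mask, 0)


def find_aliased_probes(mem, size, pattern=0xCAFEFACE):
    """Return the single-bit addresses that read back the pattern written at 0."""
    probes = []
    bit = 4                      # start at the first word-aligned address bit
    while bit < size:
        probes.append(bit)
        bit <<= 1
    for addr in probes:          # clear the probe locations first
        mem.write(addr, 0)
    mem.write(0, pattern)        # then write the unique pattern at address 0
    return [addr for addr in probes if mem.read(addr) == pattern]


# Example: a 128 MB space in which address bits 24..26 are ignored.
broken = AliasedMemory(ignored_mask=0x07000000)
print([hex(a) for a in find_aliased_probes(broken, 0x08000000)])
```

On real hardware the same idea would be done by hand with the "mw" and "md" commands already used in this thread: write the pattern at 0, then display one word at each power-of-two address.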
Good luck, gvb

Jerry; this is interesting: on further examination, I see replication occurring on the 16 MB boundary, i.e. writing to 0 results in replication when reading from 01000000 / 02000000 / 03000000 etc. Are you saying that this is due to a misconfiguration of the BR/OR registers?
As for the mtest errors, I see 340 bytes of errors just running the initial mtest routines. I'll see if I can find out what might be in that area, but of course, that's after figuring out this replication problem.
mtest results:
  size of memory error area: 0x00B8BDE4 - 0x00B8BC90 = 0x154
  Mem error @ 0x00B8BC90: found 07B8BCA0, expected 002A2F24
  ...
  Mem error @ 0x00B8BDE4: found 07FDC3D0, expected 002A2F79
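The numbers above can be checked with a few lines of arithmetic. The sketch below confirms the size of the error area and, assuming the 16 MB replication reported in this message applies across the whole 128 MB, lists the addresses that would share storage with the first failing location. Whether anything in U-Boot actually uses one of those addresses is not established in the thread.

```python
first_error = 0x00B8BC90
last_error = 0x00B8BDE4
span = last_error - first_error          # size of the failing area in bytes

stride = 16 * 1024 * 1024                # replication boundary reported above
ram_size = 128 * 1024 * 1024             # RAM size stated in the original post

# Addresses that would map to the same storage as first_error if every
# 16 MB block were a copy of the first one (assumption).
aliases = [first_error + k * stride for k in range(ram_size // stride)]

print(hex(span), span)
print([f"{a:#010x}" for a in aliases])
```

The highest alias in that list lies 0x10 bytes below the value 07B8BCA0 found at the first failing address, which would be consistent with live data near the top of RAM showing through at the lower address.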
-Thanks again, Alan

Alan Bennett wrote:
[snip]
Hi Alan,
Note: I've forgotten which processor you use and whether it is SDRAM, DDR, or DDR2. In the discussion below, I'm talking about SDRAM and am somewhat vague, but the concepts apply to all configurations. DDR/DDR2 are simply improved ways of implementing synchronous dynamic RAM (SDRAM).
I can think of three ways of having your memory replication problems (BR/OR configuration does not appear to be one of the ways).
1) If your SDRAM initialization is wrong such that you set up the processor's bank/page addresses to a value that doesn't match your SDRAM internals, you will get replications on the boundaries of your banks/pages.
1a) Wrong banks per device or wrong row start address: I would expect this to be around 4K-32K, depending on your SDRAM. Not your situation.
1b) Wrong number of rows: Definite possibility - you need to have the right number of address lines configured in your SDRAM machine (row_start_address - 1) + banks_per_device + number_of_rows (note row_start_address - 1 == columns) so that they match the number of address lines (memory size) of your memory. If you have too few address lines configured in your SDRAM configuration, you would create the symptoms you are seeing.
2) If you have a hardware wiring (layout) error with a missing address line, you will have replication based on that line since the CPU will toggle it appropriately but the memory won't see it. Given the multiplexed nature of SDRAM addresses, this is somewhat less likely because a missing/broken address line would tend to hit both row and column addresses. However, if the missing/broken address line is only used for the row address, it would match your symptoms.
3) If you have a fabrication problem (short/open), the affected address line will generally be stuck low, stuck high, or driven by an adjacent line (which is almost always another address line). In all three cases, the CPU toggles the affected address line but the memory doesn't see it, causing a replication scenario.
Since you say you have a replication on a 16MB boundary, the address line that is suspect is A23 (2^24 - WARNING, I'm using the "standard" bit numbering here, NOT PowerPC). #1b or #3 are the most likely problems. For #1b, I would verify the SDRAM (DDR, I forgot what you are running) configuration.
For #3, X-ray machines for checking balls (solder quality) and VOMs for checking continuity and shorts are where I would go next.
In parallel, I would task the hardware designer and/or layout person with verifying that the address lines are connected properly, especially A22, A23, A24, (and the multiplexed equiv. going to the SDRAM) and any other address lines that may be adjacent to A23.
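The address-line accounting in #1b and the mapping from a replication boundary to an address bit can be sketched as follows. The geometry used here (9 column bits, 2 bank-select bits, 13 row bits, an 8-byte-wide bus) is a hypothetical example chosen only because it adds up to 128 MB; it is not taken from this board, and `addressable_bytes` is an illustrative helper.

```python
def addressable_bytes(column_bits, bank_bits, row_bits, bus_bytes):
    """Memory reachable when the controller drives this many address bits."""
    return (1 << (column_bits + bank_bits + row_bits)) * bus_bytes


# Hypothetical geometry that adds up to 128 MB.
full = addressable_bytes(column_bits=9, bank_bits=2, row_bits=13, bus_bytes=8)
# Same geometry with three row bits too few configured in the controller.
short = addressable_bytes(column_bits=9, bank_bits=2, row_bits=10, bus_bytes=8)
print(full >> 20, "MB vs", short >> 20, "MB")

# Which CPU address bit corresponds to a 16 MB replication boundary?
stride = 16 * 1024 * 1024
bit_lsb0 = stride.bit_length() - 1   # index counting the least significant bit as 0
bit_msb0 = 31 - bit_lsb0             # same bit in 32-bit PowerPC (MSB = 0) numbering
print(bit_lsb0, bit_msb0)
```

How a bit index maps to a net name such as A23 or A24 depends on where the schematic starts counting, so the bit weight (2^24) is the safer thing to compare against the board documentation.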
Good luck, gvb

Good news; I found an error in my PSDMR and after correcting that, I'm off and running mtest and the replication has also disappeared.
Thanks for your help! BTW. It is a 128MB SDRAM MPC8248 design with 2 banks of 128MB flash
-Thanks!
On 9/11/07, Jerry Van Baren gerald.vanbaren@smiths-aerospace.com wrote:
[snip]

Alan Bennett wrote:
> Good news; I found an error in my PSDMR and after correcting that, I'm off and running mtest and the replication has also disappeared.
> Thanks for your help! BTW. It is a 128MB SDRAM MPC8248 design with 2 banks of 128MB flash
> -Thanks!
Very good news! Cheap solution (#1)! Some days you eat bear, some days the bear eats you. I hear bear is very tasty. ;-D
gvb
Participants: Alan Bennett, David Hawkins, Jerry Van Baren