[U-Boot-Users] Problems writing to memory with mw

Hi all,
I'm trying to track down problems loading a Linux kernel on my custom 8548 board running U-Boot 1.3RC3. It sometimes boots via a ramdisk and gives me a bash shell, but most of the time it crashes in unusual, different places.
I ran mtest in the monitor and it crashes at 00000a90. When using mw I get:
=> mw 00000a90 cafecafe
NIP: CAFECAFC XER: 00000000 LR: 1FFC109C REGS: 1ff9dc40 TRAP: 0700 DAR: 00000000
MSR: 00001000 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 00

GPR00: 1FFD9C44 1FF9DD30 00000200 1FF9DD40 00000338 00000018 1FFEA258 1FFEA308
GPR08: 1FFEA2F0 1FF9E00C 00000C01 1FFA03F0 1FFEA250 00000000 1FFEFB00 20040000
GPR16: 00000000 00000000 00000000 00000000 00001000 1FF9DD30 00000000 1FFC109C
GPR24: CAFECAFE 1FFEA244 00000339 1FFE71B8 00000049 1FF9DF80 1FFF0860 00000338
** Illegal Instruction **
Call backtrace:
00000000 1FFD6C80 1FFD74CC 1FFDB6A8 1FFDC0A4 1FFDC6D0 1FFD0AE0 1FFC8294 1FFC161C
I can write to 00000a90 via the BDI. U-Boot otherwise runs perfectly. Any ideas?
Robert

robert lazarski wrote:
<snip>
98% probability you have an SDRAM configuration problem; 2% probability you have a hardware problem. I'm rooting for an SDRAM configuration problem, and you probably should too. ;-)
http://www.denx.de/wiki/view/DULG/UBootCrashAfterRelocation
Writing to location 0x0A90 doesn't sound like a good idea to me. I'm not familiar with the 8548, but this is in the middle of the exception vectors. You are probably overwriting exception-handling code (check your 85xx user's manual), so that would be an invalid test (a red herring).
gvb
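As a rough illustration of picking a safe mtest range, the sketch below rejects any range that overlaps a reserved region. It assumes two such regions: low memory holding the exception vectors, and the top of RAM, where the register dump above (LR 1FFC109C, stack pointer in GPR01 1FF9DD30) suggests the relocated U-Boot, its heap and its stack live on what is presumably a 512 MiB board. The boundary values and function names are hypothetical placeholders, not taken from the 85xx manual or the U-Boot source; check them against your own board's memory map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open address range [start, end). */
struct range {
    uint32_t start;
    uint32_t end;
};

/* HYPOTHETICAL reserved regions for a 512 MiB board; adjust to your map. */
static const struct range reserved[] = {
    { 0x00000000u, 0x00004000u },  /* low memory: exception vectors etc. */
    { 0x1FF00000u, 0x20000000u },  /* relocated U-Boot, malloc area, stack */
};

/* True if the two half-open ranges share at least one byte. */
static bool overlaps(struct range a, struct range b)
{
    return a.start < b.end && b.start < a.end;
}

/* True if a memory test over [start, end) avoids every reserved region. */
bool mtest_range_is_safe(uint32_t start, uint32_t end)
{
    assert(start < end);
    struct range r = { start, end };
    for (size_t i = 0; i < sizeof reserved / sizeof reserved[0]; i++) {
        if (overlaps(r, reserved[i]))
            return false;
    }
    return true;
}
```

U-Boot's mtest takes start and end addresses, so a range that passes a check like this could then be exercised with something like `mtest 00100000 1fe00000`.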

On Nov 20, 2007 9:54 AM, Jerry Van Baren <gerald.vanbaren@ge.com> wrote:
<snip>
Ahh, it seems my test is invalid; thanks for pointing that out. I defined and successfully ran my board's testram() when I brought the board up, and that seems to write to 0x0A90 et al. when it's safe to do so. I'll try again, but it's a long test.
I see no problems with U-Boot; relocation seems to work. The link suggests following my memory specs "to the letter". So far I'm just calling spd_sdram() like most 85xx boards do; should I look there? Since my kernel sometimes boots into bash but usually doesn't, I'm trying to confirm that my memory is functioning. When the kernel fails to boot the ELDK 85xx uRamdisk, it crashes at several different places before it loads the RFS. A few times everything just worked, which makes me think it's a hardware issue. Any suggestions for tracking down that type of problem?
Thanks,
Robert
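Confirming that memory is functioning is often approached with two classic patterns: a walking-ones test on a single word (exercising each data line) and an address-in-address test over a range (catching address-line faults and aliasing). The sketch below is a minimal, hypothetical illustration of those patterns, not U-Boot's mtest or the board's testram() code. On the target, `base` would point at a DRAM range that avoids the exception vectors and U-Boot itself; note that with the data cache enabled, reads may be served from the cache and hide SDRAM errors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Walking-ones test on a single word: write each one-hot pattern and
 * read it back, exercising every data line individually.
 * Returns 0 on success, otherwise the first pattern that read back wrong.
 */
uint32_t walking_ones(volatile uint32_t *addr)
{
    assert(addr != NULL);
    for (uint32_t pattern = 1; pattern != 0; pattern <<= 1) {
        *addr = pattern;
        if (*addr != pattern)
            return pattern;
    }
    return 0;
}

/* Fill every word with (index ^ mask), then read everything back. */
static size_t fill_and_check(volatile uint32_t *base, size_t nwords,
                             uint32_t mask)
{
    for (size_t i = 0; i < nwords; i++)
        base[i] = (uint32_t)i ^ mask;
    for (size_t i = 0; i < nwords; i++) {
        if (base[i] != ((uint32_t)i ^ mask))
            return i;
    }
    return nwords;
}

/*
 * Address-in-address test: each word holds its own index, first plain
 * and then inverted, so shorted/open address lines (aliasing) and stuck
 * data bits both show up as mismatches.
 * Returns nwords on success, otherwise the index of the first bad word.
 */
size_t address_in_address(volatile uint32_t *base, size_t nwords)
{
    assert(base != NULL);
    size_t bad = fill_and_check(base, nwords, 0x00000000u);
    if (bad != nwords)
        return bad;
    return fill_and_check(base, nwords, 0xFFFFFFFFu);
}
```

Both functions return a sentinel on success (0 for walking_ones(), nwords for address_in_address()), so a caller can print the failing pattern or word offset.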

robert lazarski wrote:
<snip>
Hi Robert,
SDRAM configuration problems typically show up when the cache is enabled, because that is when the pipelining really becomes active and configuration errors show up. There isn't a good way to debug this that I'm aware of, other than reading the user's manuals and data sheets repeatedly (!!!) and working out the numbers (painfully!). My recollection of SPD is that mapping SPD values to the SDRAM controller configuration is not nearly as straightforward as you would expect.
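Much of that mapping comes down to converting SPD timing bytes into whole controller clocks. The sketch below shows that arithmetic for a few fields, assuming the JEDEC DDR2 SPD encoding as I understand it (tCK in byte 9 with a nibble-coded fractional part; tRP, tRRD and tRCD in quarter-nanosecond units). Verify the byte layout against the JEDEC SPD specification before relying on it. The function names are hypothetical, and this is not the spd_sdram() implementation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decode an SPD cycle-time byte (e.g. DDR2 tCK, byte 9) to picoseconds.
 * Upper nibble: whole ns. Lower nibble: tenths of a ns (0-9), or the
 * special codes 0xA-0xD for 0.25, 0.33, 0.66 and 0.75 ns (assumed).
 */
uint32_t spd_tck_to_ps(uint8_t b)
{
    static const uint16_t frac_ps[16] = {
        0, 100, 200, 300, 400, 500, 600, 700, 800, 900,
        250, 330, 660, 750, 0, 0   /* 0xE, 0xF: reserved */
    };
    return (uint32_t)(b >> 4) * 1000u + frac_ps[b & 0x0F];
}

/* Decode a quarter-nanosecond SPD byte (e.g. tRP, tRCD) to picoseconds. */
uint32_t spd_quarter_ns_to_ps(uint8_t b)
{
    return (uint32_t)b * 250u;
}

/*
 * Convert a minimum time in ps to controller clocks, rounding UP.
 * Truncating here would program the controller one clock short.
 */
uint32_t ps_to_clocks(uint32_t t_ps, uint32_t tck_ps)
{
    assert(tck_ps > 0);
    return (t_ps + tck_ps - 1) / tck_ps;
}
```

For example, a DDR2-533 part reports tCK as 0x3D (3.75 ns); a tRCD byte of 60 is 15 ns, and ps_to_clocks(15000, 3750) gives 4 clocks. Truncating instead of rounding up is one way to end up a clock short of what the SDRAM needs.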
You could try different memory sticks to see if a different manufacturer and/or speed rating stops the crashing.
You could also try running with caching disabled (start with both the data and instruction caches disabled, then try the other three combinations). It slows the boot down substantially, but if it runs, that is an indication that you likely have a configuration problem.
WRT hardware problems, you can probably dial back the speed of your bus. If disabling caching doesn't fix the problem but dialing back the speed does, that would indicate a hardware problem.
For hardware problems, I would suspect the traces: perhaps they are not acceptably close to equal length (outside of the spec), or they are routed poorly (through a noisy part of the board). Looking at the layout may show up something. I've seen poor routing cause problems that were "solved" by slowing down the bus.
Good luck,
gvb

On Nov 20, 2007 12:47 PM, Jerry Van Baren <gerald.vanbaren@ge.com> wrote:
<snip>
> You could try different memory sticks to see if a different manufacturer and/or speed rating stops the crashing.
Good idea. Is any DDR2 manufacturer known to work well? We've gotten this far with Kingston.
> You could also try running caching disabled (start with both data and instruction disabled, then the other 3 combinations) - it slows the boot down substantially, but if it runs, it is an indication that you likely have a configuration problem.
The monitor no longer seems to have the icache and dcache commands. I tried this in my board config, but it seemed to have no effect:
#undef CONFIG_CMD_CACHE
cpu/mpc85xx/cpu.c has this, but it wasn't clear to me how to disable the caches:
puts("L1: D-cache 32 kB enabled\n I-cache 32 kB enabled\n");
Is disabling the I-cache and D-cache valid in U-Boot for 85xx? How do I do that?
> WRT hardware problems, probably you can dial back the speed of your bus. If disabling caching doesn't fix the problem but dialing back the speed does, it would indicate a hardware problem.
> For hardware problems, I would suspect trace problems: perhaps the traces are not acceptably close to equal length (outside of the spec) or if they are routed poorly (through a noisy part of the board). Looking at the layout may show up something. I've seen poor routing cause problems that were "solved" by slowing down the bus.
We are looking into this.
Thanks,
Robert

On Nov 20, 2007 12:47 PM, Jerry Van Baren <gerald.vanbaren@ge.com> wrote:
<snip>
> You could also try running caching disabled (start with both data and instruction disabled, then the other 3 combinations) - it slows the boot down substantially, but if it runs, it is an indication that you likely have a configuration problem.
Disabling the instruction cache fixes the problem, thanks!!! I called icache_disable() right before spd_sdram(). What does this mean? Which configuration do you think could be wrong, the SPD settings?
Incidentally, I couldn't call icache_disable() and get it to return, either before or after spd_sdram().
Robert

robert lazarski wrote:
<snip>
> Instruction cache disabled fixes the problem, thanks!!! I called icache_disable() right before spd_sdram () . What does this mean? What configuration do you think could be wrong, spd ?
EXCELLENT! The good news: you just won the hardware lottery. :-) The bad news: you lost the "read the manual" lottery. :-/
Something is wrong with your SDRAM initialization. There is no way I can give any useful advice other than to read the user's manuals for both the SDRAM and the processor's SDRAM controller configuration.
Note that there is a U-Boot "i2c sdram" command that will dump the SDRAM SPD configuration, which often is helpful (you need to define CONFIG_CMD_SDRAM).
http://www.denx.de/cgi-bin/gitweb.cgi?p=u-boot.git;a=blob;f=common/cmd_i2c.c;h=a684a580e6edc24f54f057ad0a93de8d40659d95;hb=HEAD#l657
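A minimal sketch of enabling that command, assuming a board header under include/configs/ (the file name below is a placeholder) and that the board's I2C driver options are already set up, since spd_sdram() reads the SPD EEPROM over I2C anyway. The chip address in the usage comment is only an example; DIMM SPD EEPROMs typically sit somewhere in 0x50 to 0x57 depending on slot strapping.

```c
/* include/configs/<your_board>.h (placeholder name) */
#define CONFIG_CMD_I2C      /* "i2c" command family */
#define CONFIG_CMD_SDRAM    /* adds "i2c sdram <chip>" to decode SPD contents */

/*
 * Then, at the U-Boot prompt (chip address in hex; 51 is only an example):
 *
 *   => i2c sdram 51
 */
```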
Your Kingston memory sticks should be good (they are a reputable outfit). The point of my suggestion to try different speed grades and different suppliers is that you may get lucky and find one that works. This helps in two ways: 1) it gets your manager/customer off your back, because you can show him *something* that works, and 2) the difference between the working ones and the "broken" ones may help point out where the brokenness lies.
What is likely happening is that the SDRAM is being configured with one set of clocking criteria and the processor with a slightly different set. The way SDRAM works (in an ideal world) is that the processor launches an address, the SDRAM goes and finds that address internally, and some time later they get back together to exchange the data (read/write). While the first address is being looked up, the processor can launch another address or complete a previous read/write operation (or open pages, close pages, do refreshes, or...). There can actually be a lot of operations "in flight" at any given time (possibly in the double digits). In a misconfigured world, the SDRAM and the processor don't get back together on the *exact same* clock cycle and things crash.
The key is that the SDRAM and the processor are independently but *S*ynchronously running identically configured, very complex state machines which track addresses, commands, and data.
The problem generally is that the processor and SDRAM state machines are *not* identically configured: somewhere there are different clock cycle values in their respective state machines. Figuratively speaking, one is doing the Zamba and the other is doing the Samba. http://en.wikipedia.org/wiki/Samba As a result, instead of dancing with the stars, one steps on the other's toes and they both fall down.
The cache-disable test leverages the fact that, when caching is disabled, the processor (tends to?) launch only one address at a time and wait for the result before going on to the next address. This "papers over" the SDRAM configuration error.
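A toy model of that argument, under heavily simplified assumptions: a single minimum command spacing stands in for all the timing parameters, the controller is programmed for `ctrl_gap` clocks, and the SDRAM really needs `dram_gap`. The names and the one-parameter model are hypothetical and not how the 85xx controller is actually specified. Back-to-back (cached, burst) traffic is issued at the controller's minimum spacing and violates the SDRAM's requirement; sparse (uncached, one-at-a-time) traffic leaves large gaps and never does, which is how disabling the cache can paper over a misconfiguration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * request[i] is the clock at which the CPU asks for command i
 * (non-decreasing). The controller issues each command as soon as it is
 * requested, but never sooner than ctrl_gap clocks after the previous
 * one, because that is the spacing it was programmed with. The SDRAM
 * actually needs dram_gap clocks. Returns how many commands arrived
 * too early for the SDRAM.
 */
size_t count_violations(const uint32_t *request, size_t n,
                        uint32_t ctrl_gap, uint32_t dram_gap)
{
    assert(ctrl_gap > 0);
    size_t violations = 0;
    uint32_t last = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t issue = request[i];
        if (i > 0) {
            assert(request[i] >= request[i - 1]);
            if (issue < last + ctrl_gap)
                issue = last + ctrl_gap;   /* controller's idea of "legal" */
            if (issue - last < dram_gap)
                violations++;              /* the SDRAM needed more time */
        }
        last = issue;
    }
    return violations;
}
```

With the controller programmed for a 2-clock gap where the SDRAM needs 3, four requests arriving together produce 3 violations, while the same four requests spaced 10 clocks apart produce none; programming the controller for 3 clocks removes the violations in both cases.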
HTH & good luck,
gvb

Hi Robert,
> The bad news: you lost the "read the manual" lottery. :-/
> [snip]
> The cache disable test is leveraging on the fact that, when caching is disabled, the processor (tends to?) launches only one address at a time and waits for the result before going on to the next address. This "papers over" the SDRAM configuration error.
If you want to probe the board with single-cycle accesses versus burst accesses without triggering a crash due to CPU cache issues, I'd suggest reading the user's manual section on the DMA controller.
You should be able to DMA a block of data from, say, internal SRAM, a device, or the local bus to the DDR memory, or from DDR to another location.
Were you involved with the board layout? Were DDR voltage and timing simulations performed? In my MPC8349E design, I'm using DDR1 SDRAM soldered to the board. The voltage and timing simulations show that I could eliminate the VTT termination regulator due to the short transmission lines, i.e., the ringing on the unterminated transmission lines was not enough to violate any 'rules'. However, I will have to use a specific MPC driver output impedance and specific source-termination resistor values, and I will have to configure the DDR1 memory for half-strength drivers. So although I copied a lot of stuff from the MPC8349EA-MDS-PB reference board, when I get the boards I will need to sit down for a few days and get all the memory configuration correct.
It's this subtle information you need before you can have any confidence that your memory interface is working correctly. Actually, you need the simulation info before you even build your board, to know that it's possible to get your memory interface working correctly!
Since you are using DDR DIMMs, you have fewer variables. When I was routing the DDR on my board, I looked at Micron's site, and they have DIMM design PCBs. I imagine that most DDR DIMMs either use these PCB designs directly or are virtually identical. Kingston memory is pretty good; I've used it in desktop machines and had no problems.
If your design uses DDR2, then it has on-die termination, so the board design/layout would have been a little simpler than with DDR1. I used DDR1 since it uses 2.5V and I needed that voltage elsewhere. However, all the traces on the board would still have required careful length matching to operate at DDR speeds of 333 MHz and above.
What clock frequency are you trying to run the DDR at for your tests? If you back off to a slower clock frequency, you open up the setup/hold windows and give yourself a little more timing margin. The MPC8349E has a delay register for launching address/control signals at various delays, so you could try the same kind of control on your processor to adjust the address/control timing too.
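The arithmetic behind opening up the window: with data transferred on both clock edges, each bit gets half a clock period (the unit interval), and whatever is left after setup, hold and skew is margin. A back-of-the-envelope sketch follows; the function name is hypothetical, the setup, hold and skew figures used with it are placeholders rather than data-sheet values, and it ignores jitter and the other terms a real timing budget would include.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rough data-eye margin for a DDR interface. One clock period at f MHz
 * is 1e6 / f ps; DDR moves a bit on each edge, so the unit interval is
 * half of that: 500000 / f ps. Negative means the budget does not close.
 */
int32_t ddr_margin_ps(uint32_t clock_mhz, uint32_t setup_ps,
                      uint32_t hold_ps, uint32_t skew_ps)
{
    assert(clock_mhz > 0);
    int32_t ui_ps = (int32_t)(500000u / clock_mhz);
    return ui_ps - (int32_t)(setup_ps + hold_ps + skew_ps);
}
```

With a placeholder budget of 500 ps setup, 500 ps hold and 300 ps skew, dropping the clock from 250 MHz to 200 MHz raises the margin from 700 ps to 1200 ps.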
That said, given that your issue is related to bursting, I'd be more suspicious of the data bus waveforms or the timing of the burst data. A scope or a DDR2 DIMM logic analyzer probe would be useful.
For probing data, you'll want to look at the timing of the DQ signals relative to the DQS (strobe) signal, and at DM (mask). Those are the DDR1 names; I think DDR2 uses the same ones, but the DQS signals can be differential.
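When deciding which DQ/DQS group to probe, it can help to know which byte lanes ever fail. Given the expected and actual 64-bit words from a DMA or CPU copy test, the sketch below ORs together the differing bits and reports the failing lanes as a bitmask. It assumes lane n is bits 8n..8n+7 of the value as the CPU reads it, which is a simplification: on a big-endian PowerPC the manual's bit numbering may run the other way, and an ECC DIMM adds a ninth lane, so mapping the result back to physical pins needs the schematic. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * OR together (expected ^ actual) over all words, then report which byte
 * lanes (8 DQ bits sharing one DQS strobe) ever failed, as a bitmask:
 * bit n set means lane n (bits 8n..8n+7 of the 64-bit value) failed.
 */
uint8_t failing_byte_lanes(const uint64_t *expected, const uint64_t *actual,
                           size_t nwords)
{
    assert(expected != NULL && actual != NULL);
    uint64_t bad_bits = 0;
    for (size_t i = 0; i < nwords; i++)
        bad_bits |= expected[i] ^ actual[i];

    uint8_t lanes = 0;
    for (unsigned lane = 0; lane < 8; lane++) {
        if ((bad_bits >> (8 * lane)) & 0xFFu)
            lanes |= (uint8_t)(1u << lane);
    }
    return lanes;
}
```

Errors confined to one or two lanes point toward specific DQS groups (or their routing), while errors spread across every lane look more like a global timing or configuration issue.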
Cheers,
Dave

In message <4744757E.60604@ge.com> you wrote:
> The good news: you just won the hardware lottery. :-)
I think it's too early for such a statement. There may still be layout issues on the board which only get triggered by the specific access patterns in burst mode. I've seen such cases more than once.
There are a lot of things you can forget or do wrong, especially with DDR or higher speeds.
Best regards,
Wolfgang Denk

Dear Robert,
in message <f87675ee0711210922r4917e03ep8efb25d046f3c0ed@mail.gmail.com> you wrote:
> Instruction cache disabled fixes the problem, thanks!!! I called icache_disable() right before spd_sdram () . What does this mean? What configuration do you think could be wrong, spd ?
It is a 99.9999% reliable indication that the FAQ at http://www.denx.de/wiki/view/DULG/UBootCrashAfterRelocation applies to your situation.
Your RAM is not working reliably when it is accessed in burst mode. The cause may be anything, from bugs in your port of U-Boot to bad memory modules or board layout issues.
Best regards,
Wolfgang Denk

robert lazarski wrote:
<snip>
> A few times everything just worked fine, which makes me think its a hardware issue. Any suggestions to tracking that type of problem down?
If you suspect hardware problems or critical timing, you could probably (in addition to Jerry's ideas) vary the voltages as well as the temperature of your board. If you see any correlation with the number of crashes, you definitely have a hardware problem.
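One simple way to check for such a correlation is to log crashes per fixed number of boots at several temperatures (or supply voltages) and fit a straight line. The sketch below computes the least-squares slope; the function name is hypothetical, and with a handful of noisy samples a non-zero slope is a hint, not a significance test.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Least-squares slope of y against x, e.g. crashes per N boots (y)
 * versus board temperature in degrees C (x). A slope clearly away from
 * zero hints at a correlation.
 */
double ls_slope(const double *x, const double *y, size_t n)
{
    assert(n >= 2);
    double mx = 0.0, my = 0.0;
    for (size_t i = 0; i < n; i++) {
        mx += x[i];
        my += y[i];
    }
    mx /= (double)n;
    my /= (double)n;

    double sxy = 0.0, sxx = 0.0;
    for (size_t i = 0; i < n; i++) {
        sxy += (x[i] - mx) * (y[i] - my);
        sxx += (x[i] - mx) * (x[i] - mx);
    }
    assert(sxx > 0.0);   /* needs at least two different x values */
    return sxy / sxx;
}
```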
Last week I had a problem with some flash memory on a broken 8548 board which worked more reliably at higher temperatures... @:-]
Regards,
participants (5)
- Clemens Koller
- David Hawkins
- Jerry Van Baren
- robert lazarski
- Wolfgang Denk