
On Fri, 09 Sep 2005 11:54:15 +0200, Clemens Koller writes:
> How do you test the DDR memory in your case? It would be interesting to do the same over here. Is there some code available?
Thanks very much for the quick response... I use the "mtest" command in U-Boot. There is also a Linux memory tester called "memtester" (URL: http://pyropus.ca/software/memtester/ - another page that might interest you resides at: http://linuxquality.sunsite.dk/articles/testsuites/).
But the dead giveaway is that, while running in Linux, I get an Illegal Instruction exception on a legal instruction in a program that ran fine a number of times before and after, doing identical tasks (I also get random Segmentation Violations) - plus I get random exceptions in kernel mode that crash Linux. I am mounting the root filesystem via NFS.
It appears that I can get the memory stable in the memory testers, but something to do with the activity that Linux generates seems to trigger the problem (although the boot code will hang randomly if left long enough - the processor goes into a checkstop condition - I am guessing it gets a memory error in the middle of handling an exception - would this do it?).
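For reference, the kind of static pattern pass that memory testers commonly run can be sketched as below. This is only an illustration under stated assumptions: `FaultyMemory` is a hypothetical list-backed model with a data bit stuck low, not real DDR, and the two functions are generic walking-ones and address-in-address checks rather than the actual code of "mtest" or "memtester". Passes like these catch stuck or shorted lines; they do not necessarily reproduce failures that only show up under the access patterns Linux generates.

```python
class FaultyMemory:
    """Hypothetical memory model: word cells with optional data bits stuck at 0."""

    def __init__(self, size, stuck_low_mask=0):
        self.cells = [0] * size
        self.stuck_low_mask = stuck_low_mask

    def __setitem__(self, addr, value):
        # Bits in stuck_low_mask can never be stored as 1.
        self.cells[addr] = value & ~self.stuck_low_mask

    def __getitem__(self, addr):
        return self.cells[addr]


def walking_ones(mem, addr, width=32):
    """Write each single-bit pattern to one cell; return a mask of failing bits."""
    bad_bits = 0
    for bit in range(width):
        pattern = 1 << bit
        mem[addr] = pattern
        bad_bits |= pattern ^ mem[addr]  # any differing bit is a failure
    return bad_bits


def address_test(mem, size):
    """Store each cell's own address, then read all back; return failing addresses."""
    for addr in range(size):
        mem[addr] = addr
    return [addr for addr in range(size) if mem[addr] != addr]


if __name__ == "__main__":
    mem = FaultyMemory(64, stuck_low_mask=1 << 5)  # assumed fault: data bit 5 stuck low
    print(hex(walking_ones(mem, 0)))
    print(address_test(mem, 64))
```

A clean run of both functions on a fault-free model returns 0 and an empty list, which is the situation described above: the static tests pass even though the system is not stable.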
I can't really rule out some problem in the Linux virtual memory system I suppose, but it is too suspicious. I am running with all caches turned off, and L2 is even disabled in HID1 (along with PCI and RIO which we don't use - I was hoping that completely disabling these parts of the chip in DEVDISR might reduce the internal noise).
A relevant point is that our DDR memory bus tracks are relatively short, I believe - all are the same length @ 54mm (~300ps propagation delay). I base this on various things I have read e.g. the worked CPO example in AN2583 uses 800-1000ps of propagation delay on the PCB - three times ours (but the difference isn't enough to make the default CPO value not work). We also do not use ECC - I was wondering if ECC might protect others in a similar situation e.g. if they only get single bit errors which are corrected.
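As a rough cross-check of the ~300ps figure for a 54mm track, the one-way delay can be estimated from the track length and the effective dielectric constant of the board. The value of 2.8 used below is an assumption (a plausible figure for an outer-layer track on FR-4), not something measured on this board; inner-layer tracks would see a higher value and a longer delay.

```python
C_MM_PER_NS = 299.792458  # speed of light in vacuum, in mm per ns


def trace_delay_ps(length_mm, eps_eff):
    """One-way propagation delay of a PCB track in picoseconds.

    eps_eff is the effective relative dielectric constant seen by the signal;
    the signal travels at c / sqrt(eps_eff).
    """
    velocity_mm_per_ns = C_MM_PER_NS / eps_eff ** 0.5
    return length_mm / velocity_mm_per_ns * 1000.0


if __name__ == "__main__":
    # 54 mm is the track length quoted above; eps_eff = 2.8 is an assumed value.
    print(round(trace_delay_ps(54.0, 2.8)), "ps")
```

With that assumed constant the estimate lands close to 300ps, well short of the 800-1000ps used in the AN2583 worked example.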
I am hoping someone will reply saying "you fool - you forgot xxxxx" :-)
> Over here, I am working on an MPC8540 board from MicroSys which is able to do ECC. Currently, I haven't seen any errors looking at the ECC status registers after several days uptime and lots of memory I/O, so I think the DDR is okay. I turned off ECC for some performance tests (no results yet)... never had a problem there, either.
...
> Mine says: PPC8540 PX833LB 2L71V MSIA QEAD0412 working fine @166MHz
OK thanks for the info - it is much appreciated. Cheers!
Murray...
-- 
Murray Jensen, CSIRO Manufacturing & Infra. Tech.
Locked Bag No. 9, Preston, Vic, 3072, Australia.
Phone: +61 3 9662 7763
Fax: +61 3 9662 7853
Internet: Murray.Jensen@csiro.au