[U-Boot-Users] MPC8560 DDR controller

Hi U-Boot Users, I noticed a patch go through CVS a few days ago which updated the DDR DLL workaround for the tqm8560 and tqm8540 boards and it piqued my interest - mainly because we are having trouble with the DDR memory on our MPC8560 board.
I am interested in other people's experiences with the DDR controller on their MPC8560 based boards - either positive or negative. Here is our tale ...
It appears that noise internal to the processor is corrupting reads from DDR memory (by causing the DDR DLL to drop out). We have played extensively with the DDR configuration, following instructions in AN2583 (rev 2) and the DDR11 errata, with some success - but I don't think we have ever been able to completely eliminate the problem, only greatly reduce its frequency.
The things that seemed to help the most were setting the Reduced Drive Strength extended operating mode in the DDR memory chips themselves, and slowing the bus rate from 166MHz to 150MHz. Neither of these really has anything to do with the processor's DDR controller or any errata workarounds; presumably they just combine to reduce noise on the bus.
Our processors are marked "PPC8560..." rather than "MPC8560.." and are dated 0412 (week 12 of 2004 I believe ??). We were thinking that maybe these were pre-production chips and it might help things if we replaced them with "younger chips". By the way, I believe our chips are version 2.0.2.
The board is currently usable for development, but would be unacceptable for production use - but even in development, if you get a crash, you can't really be sure if there was a real problem or whether the memory read simply failed.
Does anyone have any experiences and/or advice they could share with us? Thank you for listening. Cheers! Murray...

Hello, Murray!
Murray.Jensen@csiro.au wrote:
Hi U-Boot Users, I noticed a patch go through CVS a few days ago which updated the DDR DLL workaround for the tqm8560 and tqm8540 boards and it piqued my interest - mainly because we are having trouble with the DDR memory on our MPC8560 board.
How do you test the DDR memory in your case? It would be interesting to do the same over here. Is there some code available?
I am interested in other people's experiences with the DDR controller on their MPC8560 based boards - either positive or negative. Here is our tale ...
Over here, I am working on an MPC8540 board from MicroSys which is able to do ECC. So far, I haven't seen any errors in the ECC status registers after several days of uptime and lots of memory I/O, so I think the DDR is okay. I turned off ECC for some performance tests (no results yet)... never had a problem there either.
Our processors are marked "PPC8560..." rather than "MPC8560.." and are dated 0412 (week 12 of 2004 I believe ??). We were thinking that maybe these were pre-production chips and it might help things if we replaced them with "younger chips". By the way, I believe our chips are version 2.0.2.
Mine says: PPC8540 PX833LB 2L71V MSIA QEAD0412 working fine @166MHz
Greets,
Clemens Koller _______________________________ R&D Imaging Devices Anagramm GmbH Rupert-Mayer-Str. 45/1 81379 Muenchen Germany
http://www.anagramm.de Phone: +49-89-741518-50 Fax: +49-89-741518-19

On Fri, 09 Sep 2005 11:54:15 +0200, Clemens Koller writes:
How do you test the DDR memory in your case? It would be interesting to do the same over here. Is there some code available?
Thanks very much for the quick response... I use the "mtest" command in U-Boot. There is also a Linux memory tester called "memtester" (URL: http://pyropus.ca/software/memtester/ - another page that might interest you resides at: http://linuxquality.sunsite.dk/articles/testsuites/).
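For anyone who wants something small to experiment with, here is a minimal sketch of the kind of write/read-back pattern test that memory testers perform. It is not the code of "mtest" or "memtester"; the function name test_region and the seed value are made up for illustration, and on a real target the region would be an uncached physical range rather than an ordinary buffer.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from U-Boot or memtester): writes an
 * address-dependent pattern, verifies it, then repeats with the
 * complement so every bit is tested as both 0 and 1.
 * Returns the number of mismatching words.
 */
size_t test_region(volatile uint32_t *base, size_t words, uint32_t seed)
{
    size_t errors = 0;
    size_t i;

    /* Pass 1: fill the region with seed XOR word index. */
    for (i = 0; i < words; i++)
        base[i] = seed ^ (uint32_t)i;

    /* Pass 2: verify the pattern, then overwrite with its complement. */
    for (i = 0; i < words; i++) {
        uint32_t expect = seed ^ (uint32_t)i;

        if (base[i] != expect)
            errors++;
        base[i] = (uint32_t)~expect;
    }

    /* Pass 3: verify the complemented pattern. */
    for (i = 0; i < words; i++) {
        uint32_t expect = (uint32_t)~(seed ^ (uint32_t)i);

        if (base[i] != expect)
            errors++;
    }

    return errors;
}
```

A caller would pass the start of the region under test, its length in 32-bit words, and any seed. A test like this passing does not prove much, as noted below: the access pattern is far more regular than what Linux generates.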
But the dead giveaway is while running in Linux, getting an Illegal Instruction exception on a legal instruction in a program that ran fine a number of times before and after, doing identical tasks (also get random Segmentation Violations) - plus I get random exceptions in kernel mode that crash Linux. I am mounting the root filesystem via NFS.
It appears that I can get the memory stable in the memory testers, but something to do with the activity that Linux generates seems to trigger the problem (although the boot code will hang randomly if left long enough - the processor goes into a checkstop condition - I am guessing it gets a memory error in the middle of handling an exception - would this do it?).
I can't really rule out some problem in the Linux virtual memory system I suppose, but it is too suspicious. I am running with all caches turned off, and L2 is even disabled in HID1 (along with PCI and RIO which we don't use - I was hoping that completely disabling these parts of the chip in DEVDISR might reduce the internal noise).
A relevant point is that our DDR memory bus tracks are relatively short, I believe - all are the same length @ 54mm (~300ps propagation delay). I base this on various things I have read e.g. the worked CPO example in AN2583 uses 800-1000ps of propagation delay on the PCB - three times ours (but the difference isn't enough to make the default CPO value not work). We also do not use ECC - I was wondering if ECC might protect others in a similar situation e.g. if they only get single bit errors which are corrected.
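As a rough cross-check of the ~300ps figure, the arithmetic can be sketched as below. The velocity factor of 0.6 (signal speed as a fraction of c) is an assumption that is only roughly typical for FR-4 style boards, not a measured value for this PCB, and the function names are made up for illustration.

```c
/* Speed of light in vacuum, in millimetres per picosecond. */
#define C_MM_PER_PS 0.299792458

/*
 * One-way delay in picoseconds of a trace of length_mm, where
 * velocity_factor is the assumed signal speed as a fraction of c.
 */
double trace_delay_ps(double length_mm, double velocity_factor)
{
    return length_mm / (velocity_factor * C_MM_PER_PS);
}

/* Clock period in picoseconds for a frequency given in MHz. */
double clock_period_ps(double freq_mhz)
{
    return 1.0e6 / freq_mhz;
}
```

With these assumptions, trace_delay_ps(54.0, 0.6) comes out at roughly 300ps, against a clock period of about 6000ps at 166MHz and the 800-1000ps of PCB delay used in the AN2583 worked example.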
I am hoping someone will reply saying "you fool - you forgot xxxxx" :-)
Over here, I am working on an MPC8540 board from MicroSys which is able to do ECC. So far, I haven't seen any errors in the ECC status registers after several days of uptime and lots of memory I/O, so I think the DDR is okay. I turned off ECC for some performance tests (no results yet)... never had a problem there either.
...
Mine says: PPC8540 PX833LB 2L71V MSIA QEAD0412 working fine @166MHz
OK thanks for the info - it is much appreciated. Cheers! Murray... -- Murray Jensen, CSIRO Manufacturing & Infra. Tech. Phone: +61 3 9662 7763 Locked Bag No. 9, Preston, Vic, 3072, Australia. Fax: +61 3 9662 7853 Internet: Murray.Jensen@csiro.au

Hi, Murray!
(Sorry, this got OT now)
Murray.Jensen@csiro.au wrote:
Thanks very much for the quick response... I use the "mtest" command in U-Boot. There is also a Linux memory tester called "memtester" (URL: http://pyropus.ca/software/memtester/ - another page that might interest you resides at: http://linuxquality.sunsite.dk/articles/testsuites/).
Good stuff. Thanks!
But the dead giveaway is while running in Linux, getting an Illegal Instruction exception on a legal instruction in a program that ran fine a number of times before and after, doing identical tasks (also get random Segmentation Violations) - plus I get random exceptions in kernel mode that crash Linux. I am mounting the root filesystem via NFS.
Okay, so, you speak about an error rate of about 1..10 cph (crash per hour)?! Never had this over here.
I can't really rule out some problem in the Linux virtual memory system
Oh, I wouldn't blame the kernel in that case. 2.6.8 to 2.6.13 work really fine over here. What linux do you use?
A relevant point is that our DDR memory bus tracks are relatively short, I believe - all are the same length @ 54mm (~300ps propagation delay). I base this on various things I have read e.g. the worked CPO example in AN2583 uses 800-1000ps of propagation delay on the PCB - three times ours (but the difference isn't enough to make the default CPO value not work). We also do not use ECC - I was wondering if ECC might protect others in a similar situation e.g. if they only get single bit errors which are corrected.
Check the datasheet regarding ECC. AFAIK it can correct single-bit errors and detect double-bit errors. If you had an ECC-capable system, you could pretty easily check the reliability of the DDR by looking at the ECC status registers. And just for completeness: it's possible to turn off ECC during Linux runtime pretty reliably with something like this:
    tmp = immap->im_ddr.sdram_cfg;
    mb();   /* sync */
    wmb();  /* eieio */
    immap->im_ddr.sdram_cfg = tmp & ~0x20000000;
    mb();   /* sync */
    wmb();  /* eieio */
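To illustrate what "correct single-bit errors and detect double-bit errors" means in practice, here is a toy sketch of a SECDED code: an extended Hamming code protecting 4 data bits with 4 check bits. This is not the code the MPC85xx DDR controller uses (a real 64-bit memory interface uses a much wider code); the names secded_encode and secded_decode are made up for illustration.

```c
#include <stdint.h>

enum ecc_result { ECC_OK = 0, ECC_CORRECTED = 1, ECC_UNCORRECTABLE = 2 };

/* Returns 1 if v has an odd number of set bits, else 0. */
static unsigned parity8(uint8_t v)
{
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1u;
}

/*
 * Encode the low 4 bits of nibble. Bit i of the result is Hamming
 * position i (1..7): p1 p2 d1 p4 d2 d3 d4. Bit 0 is an overall
 * parity bit that makes the total number of set bits even.
 */
uint8_t secded_encode(uint8_t nibble)
{
    unsigned d1 = nibble & 1u, d2 = (nibble >> 1) & 1u;
    unsigned d3 = (nibble >> 2) & 1u, d4 = (nibble >> 3) & 1u;
    uint8_t cw = 0;

    cw |= (uint8_t)((d1 ^ d2 ^ d4) << 1);  /* p1 covers 1,3,5,7 */
    cw |= (uint8_t)((d1 ^ d3 ^ d4) << 2);  /* p2 covers 2,3,6,7 */
    cw |= (uint8_t)(d1 << 3);
    cw |= (uint8_t)((d2 ^ d3 ^ d4) << 4);  /* p4 covers 4,5,6,7 */
    cw |= (uint8_t)(d2 << 5);
    cw |= (uint8_t)(d3 << 6);
    cw |= (uint8_t)(d4 << 7);
    cw |= (uint8_t)parity8(cw);            /* overall parity in bit 0 */
    return cw;
}

/* Decode cw into *nibble, correcting one flipped bit if necessary. */
enum ecc_result secded_decode(uint8_t cw, uint8_t *nibble)
{
    unsigned syndrome = 0, pos;
    enum ecc_result res = ECC_OK;

    /* XOR of the positions of all set bits is 0 for a valid codeword. */
    for (pos = 1; pos < 8; pos++)
        if (cw & (1u << pos))
            syndrome ^= pos;

    if (parity8(cw)) {
        /* Odd overall parity: one bit flipped; syndrome 0 means bit 0. */
        cw ^= (uint8_t)(1u << syndrome);
        res = ECC_CORRECTED;
    } else if (syndrome != 0) {
        /* Even parity but non-zero syndrome: two bits flipped. */
        return ECC_UNCORRECTABLE;
    }

    *nibble = (uint8_t)(((cw >> 3) & 1u) | (((cw >> 5) & 1u) << 1) |
                        (((cw >> 6) & 1u) << 2) | (((cw >> 7) & 1u) << 3));
    return res;
}
```

The relevant point for this thread is that single-bit read errors on an ECC system are silently repaired and only show up in the status registers, which is why a non-ECC board can look much worse than an ECC board with the same underlying noise problem.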
I am hoping someone will reply saying "you fool - you forgot xxxxx" :-)
There is a story around here (from a guy who did a DDR for MPC8540) that you might have to put on really good capacitors on some of the ddr vref voltages on the CPU side because something is bouncing there more than you would expect...
You also might want to check what happens if you increase/decrease supply voltages here and there... well... you know, all these EE tricks.
Greets,
Clemens Koller _______________________________ R&D Imaging Devices Anagramm GmbH Rupert-Mayer-Str. 45/1 81379 Muenchen Germany
http://www.anagramm.de Phone: +49-89-741518-50 Fax: +49-89-741518-19

On Fri, 09 Sep 2005 14:37:21 +0200, Clemens Koller writes:
I can't really rule out some problem in the Linux virtual memory system
Oh, I wouldn't blame the kernel in that case. 2.6.8 to 2.6.13 work really fine over here. What linux do you use?
We are using 2.6.13, with some patches from the ozlabs patch tracker.
If you had an ECC-capable system, you could pretty easily check the reliability of the DDR by looking at the ECC status registers.
Unfortunately, this would require a board re-design.
There is a story around here (from a guy who did a DDR for MPC8540) that you might have to put on really good capacitors on some of the ddr vref voltages on the CPU side because something is bouncing there more than you would expect...
Thanks - we will take a look at them ...
You also might want to check what happens if you increase/decrease supply voltages here and there... well... you know, all these EE tricks.
Your response is very much appreciated. Cheers! Murray...