[U-Boot] amcc kilauea odd crashes

Hi List,
Just updated my kilauea to top of git (2008.10-00092-gd9d8c7c) and am seeing odd crashes:
- sometimes u-boot comes up normally, but printenv displays random crap.
- sometimes I get machine checks such as here [1].
- sometimes it resets without any message at all.
The major visible change since the last update is the "DRAM: auto calibration" eye candy.
Does anyone see any similar problems? To me this smells like a faulty DRAM configuration.
Best regards Markus
P.S.: I'll go back and test some prior version, but unfortunately the board is bricked and I first need to find a BDI.
[1]:
U-Boot 2008.10-00092-gd9d8c7c (Oct 23 2008 - 10:59:45)
CPU:   AMCC PowerPC 405EX Rev. A at 400 MHz (PLB=200, OPB=100, EBC=100 MHz)
       Security support
       Bootstrap Option H - Boot ROM Location I2C (Addr 0x52)
       16 kB I-Cache 16 kB D-Cache
Board: Kilauea - AMCC PPC405EX Evaluation Board
I2C:   ready
DTT1:  44 C
DRAM:  256 MB
FLASH: NIP: 0FFADE30 XER: 00000000 LR: 0FFADEC8 REGS: 0fea0d20 TRAP: 0700 DEAR: 00000000
MSR: 00021000 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 00
GPR00: 0FEA0020 0FEA0E10 0FEA0F44 0FFF8DFC 000000F0 0FEA0E2F 000000F0 5D396680
GPR08: 0FEA0E1C 0FEA0E1C 00000000 00000000 00002FAF 00000000 0FFF1800 10005000
GPR16: 0FFE6A14 0FFEC8E8 00000000 00000000 00000001 00000000 00000000 00000000
GPR24: 00000000 FC000000 00000000 00000000 0FFF8DFC 000000F0 0FEAAA00 17D7FFE8
** Illegal Instruction **
Call backtrace:
17D78400 0FFADEB4 0FFAED50 0FFAF398 0FFA8DC8 0FFA76A4
Program Check Exception
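For anyone cross-checking the dump above, here is a minimal C sketch that decodes the MSR value into the flag summary printed next to it and names the trap vector. The bit masks are the conventional PowerPC 4xx definitions and are an assumption here, not something stated in this thread; the helper names (`msr_bit`, `format_msr`, `trap_name`) are hypothetical and not U-Boot functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed MSR bit masks for PowerPC 4xx cores (verify against the
 * 405EX user manual before relying on them). */
#define MSR_CE 0x00020000u /* critical interrupt enable */
#define MSR_EE 0x00008000u /* external interrupt enable */
#define MSR_PR 0x00004000u /* problem (user) state */
#define MSR_FP 0x00002000u /* floating point available */
#define MSR_ME 0x00001000u /* machine check enable */
#define MSR_IR 0x00000020u /* instruction relocate */
#define MSR_DR 0x00000010u /* data relocate */

/* Return 1 if any bit of mask is set in msr, else 0. */
static int msr_bit(uint32_t msr, uint32_t mask)
{
    return (msr & mask) ? 1 : 0;
}

/* Hypothetical helper: build the same flag summary the dump shows. */
static int format_msr(uint32_t msr, char *buf, size_t len)
{
    return snprintf(buf, len, "EE: %d PR: %d FP: %d ME: %d IR/DR: %d%d",
                    msr_bit(msr, MSR_EE), msr_bit(msr, MSR_PR),
                    msr_bit(msr, MSR_FP), msr_bit(msr, MSR_ME),
                    msr_bit(msr, MSR_IR), msr_bit(msr, MSR_DR));
}

/* Map the two trap vectors relevant to this report to a name. */
static const char *trap_name(uint32_t trap)
{
    switch (trap) {
    case 0x0200: return "Machine Check";
    case 0x0700: return "Program Check";
    default:     return "other";
    }
}
```

Passing 0x00021000 to `format_msr` reproduces the summary in the dump, and `trap_name(0x0700)` matches the "Program Check Exception" line, so this particular trace is a program check (illegal instruction) rather than a machine check. The remaining set bit, 0x00020000, is the critical interrupt enable under the assumed masks.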

Hi Markus,
On Thursday 23 October 2008, Markus Klotzbücher wrote:
Just updated my kilauea to top of git (2008.10-00092-gd9d8c7c) and am seeing odd crashes:
Is this a 600MHz Kilauea? This is a known issue, that the new autocalibration code has a problem here. I reported this problem a few weeks ago. AMCC is currently working on a fix for this.
sometimes u-boot comes up normally, but printenv displays random crap.
sometimes I get machine checks such as here [1].
sometimes it resets without any message at all.
The major visible change since the last update is the "DRAM: auto calibration" eye candy.
Does anyone see any similar problems? To me this smells like a faulty DRAM configuration.
Yes.
Best regards Markus
P.S.: I'll go back and test some prior version, but unfortunately the board is bricked and I first need to find a BDI.
:-(
Best regards, Stefan
===================================================================== DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: +49-8142-66989-0 Fax: +49-8142-66989-80 Email: office@denx.de =====================================================================

Dear Stefan,
In message 200810231251.36841.sr@denx.de you wrote:
Is this a 600MHz Kilauea? This is a known issue, that the new autocalibration
No, this is a CPU Rev. A board at 400 MHz.
code has a problem here. I reported this problem a few weeks ago. AMCC is currently working on a fix for this.
Should we not back out the autocalib patches that cause the problem until a stable working solution is found?
Best regards,
Wolfgang Denk

Hi Wolfgang,
On Thursday 23 October 2008, Wolfgang Denk wrote:
In message 200810231251.36841.sr@denx.de you wrote:
Is this a 600MHz Kilauea? This is a known issue, that the new autocalibration
No, this is a CPU Rev. A board at 400 MHz.
Probably with the same DDR2 frequency.
code has a problem here. I reported this problem a few weeks ago. AMCC is currently working on a fix for this.
Should we not back out the autocalib patches that cause the problem until a stable working solution is found?
Not sure. My hope is that AMCC find a solution quickly. They should receive the failing board this week.
And they already did send a "fix" (more a workaround) for this problem:
[PATCH v2] ppc4xx: Fix DDR2 auto calibration on Kilauea 600MHz
which you rejected. So I suggest to wait for a few days.
Victor, Adam, did you already receive my board?
Best regards, Stefan

Dear Stefan,
In message 200810231315.08807.sr@denx.de you wrote:
Should we not back out the autocalib patches that cause the problem until a stable working solution is found?
Not sure. My hope is that AMCC find a solution quickly. They should receive the failing board this week.
And they already did send a "fix" (more a workaround) for this problem:
[PATCH v2] ppc4xx: Fix DDR2 auto calibration on Kilauea 600MHz
which you rejected. So I suggest to wait for a few days.
Well, that was one full month ago, and nothing happened since.
I see people running into problems with the current code, so I vote to back out the culprit until a real fix has been found.
Best regards,
Wolfgang Denk

Hi Wolfgang,
On Thursday 23 October 2008, Wolfgang Denk wrote:
Should we not back out the autocalib patches that cause the problem until a stable working solution is found?
Not sure. My hope is that AMCC find a solution quickly. They should receive the failing board this week.
And they already did send a "fix" (more a workaround) for this problem:
[PATCH v2] ppc4xx: Fix DDR2 auto calibration on Kilauea 600MHz
which you rejected. So I suggest to wait for a few days.
Well, that was one full month ago, and nothing happened since.
That's not correct. One patch got checked in which definitely made the situation better:
f8a00dea841d5d75de1f8e8107e90ee1beeddf5f
ppc4xx: Reset and relock memory DLL after SDRAM_CLKTR change
After changing SDRAM_CLKTR phase value rerun the memory preload initialization sequence (INITPLR) to reset and relock the memory DLL. Changing the SDRAM_CLKTR memory clock phase coarse timing adjustment effects the phase relationship of the internal, to the PPC chip, and external, to the PPC chip, versions of MEMCLK_OUT.
Signed-off-by: Adam Graham <agraham@amcc.com>
Signed-off-by: Victor Gallardo <vgallardo@amcc.com>
Signed-off-by: Stefan Roese <sr@denx.de>
Unfortunately it didn't fix all problems. AMCC already provided another patch for testing purposes. Not to the list but to me (and you) directly. Please find it attached again. Would be great if Markus could test it on the failing Kilauea.
I see people running into problems with the current code, so I vote to back out the culprit until a real fix has been found.
Hmmm, "people" have been running into this problem before too. That was me.
Again, let's please wait at least for Adam and/or Victor to comment on this issue. It should only be a few hours until they read their mails.
Best regards, Stefan

On Thu, Oct 23, 2008 at 01:30:08PM +0200, Stefan Roese wrote:
Hi Wolfgang,
On Thursday 23 October 2008, Wolfgang Denk wrote:
Should we not back out the autocalib patches that cause the problem until a stable working solution is found?
Not sure. My hope is that AMCC find a solution quickly. They should receive the failing board this week.
And they already did send a "fix" (more a workaround) for this problem:
[PATCH v2] ppc4xx: Fix DDR2 auto calibration on Kilauea 600MHz
which you rejected. So I suggest to wait for a few days.
Well, that was one full month ago, and nothing happened since.
That's not correct. One patch got checked in which definitely made the situation better:
f8a00dea841d5d75de1f8e8107e90ee1beeddf5f
ppc4xx: Reset and relock memory DLL after SDRAM_CLKTR change
After changing SDRAM_CLKTR phase value rerun the memory preload initialization sequence (INITPLR) to reset and relock the memory DLL. Changing the SDRAM_CLKTR memory clock phase coarse timing adjustment effects the phase relationship of the internal, to the PPC chip, and external, to the PPC chip, versions of MEMCLK_OUT.
Signed-off-by: Adam Graham <agraham@amcc.com>
Signed-off-by: Victor Gallardo <vgallardo@amcc.com>
Signed-off-by: Stefan Roese <sr@denx.de>
Unfortunately it didn't fix all problems. AMCC already provided another patch for testing purposes. Not to the list but to me (and you) directly. Please find it attached again. Would be great if Markus could test it on the failing Kilauea.
I tested it and it's still failing. I dare say the patch makes things worse. After about 20 hard resets the board didn't reach the u-boot console a single time.
:-(
Thanks Markus

On Thursday 23 October 2008, Markus Klotzbücher wrote:
Unfortunately it didn't fix all problems. AMCC already provided another patch for testing purposes. Not to the list but to me (and you) directly. Please find it attached again. Would be great if Markus could test it on the failing Kilauea.
I tested it and it's still failing. I dare say the patch makes things worse. After about 20 hard resets the board didn't reach the u-boot console a single time.
:-(
Too bad. Thanks for testing though.
Adam & Victor, any ideas? If you don't see a "quick" solution, please provide a temporary fix for this problem.
Thanks.
Best regards, Stefan

Subject: Re: [U-Boot] amcc kilauea odd crashes
On Thursday 23 October 2008, Markus Klotzbücher wrote:
Unfortunately it didn't fix all problems. AMCC already provided another patch for testing purposes. Not to the list but to me (and you) directly. Please find it attached again. Would be great if Markus could test it on the failing Kilauea.
I tested it and it's still failing. I dare say the patch makes things worse. After about 20 hard resets the board didn't reach the u-boot console a single time.
:-(
Markus, yes, thank you for your testing and debug efforts.
Too bad. Thanks for testing though.
Adam & Victor, any ideas? If you don't see a "quick" solution, please provide a temporary fix for this problem.
Markus, the only quick solution that we have for immediate application is to disable the DDR autocalibration and therefore run with the original Kilauea SDRAM static values. This will require you to recompile U-Boot and download it onto your board.
To disable the DDR autocalibration code, in the include/configs/kilauea.h file, undef the "CONFIG_PPC4xx_DDR_AUTOCALIBRATION" definition and recompile U-Boot.
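As a minimal sketch of the change described above: it assumes the option is switched on by a plain `#define` in the board header, and it does not reproduce the actual contents of the file.

```c
/* include/configs/kilauea.h (fragment only; the rest of the file is not shown) */

/*
 * Assumption: the option is enabled by a #define earlier in this header.
 * Either remove that #define, or undefine the symbol after it as below,
 * so the board falls back to the static SDRAM values.
 */
#undef CONFIG_PPC4xx_DDR_AUTOCALIBRATION
```

After this edit, U-Boot has to be rebuilt for the Kilauea target and reflashed, as noted above.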
Once we receive Stefan's Kilauea board we can debug his DDR autocalibration issue, which probably is a similar issue to what you are seeing on your board. We will keep you in the loop.
As an aside, I grabbed the top of the git tree and tried it on all my Kilauea boards, including the latest board, which supports (CPU=600MHz, PLB=200MHz) clocking, and I cannot get DDR autocalibration to fail. I also successfully tried the PPC405EX chip built-in bootstraps of (CPU=333MHz PLB=166MHz, CPU=400MHz PLB=200MHz) and the bootstrap EEPROM strapping of (CPU=533MHz PLB=177MHz).
Thanks, Adam
Thanks.
Best regards, Stefan

On Friday 24 October 2008, Adam Graham wrote:
Too bad. Thanks for testing though.
Adam & Victor, any ideas? If you don't see a "quick" solution, please provide a temporary fix for this problem.
Markus, the only quick solution that we have for immediate application is to disable the DDR autocalibration and therefore run with the original Kilauea SDRAM static values. This will require you to recompile U-Boot and download it onto your board.
To disable the DDR autocalibration code, in the include/configs/kilauea.h file, undef the "CONFIG_PPC4xx_DDR_AUTOCALIBRATION" definition and recompile U-Boot.
I just sent a patch disabling the autocalibration to the list. I intend to push this to Wolfgang soon. We can enable the autocalibration again when the issue is fixed.
Best regards, Stefan

On Thu, Oct 23, 2008 at 07:51:30AM -0700, prodyut hazarika wrote:
I tested it and it's still failing. I dare say the patch makes things worse. After about 20 hard resets the board didn't reach the u-boot console a single time.
Markus, have you got access to BDI?
I sure do!
Regards Markus

Unfortunately it didn't fix all problems. AMCC already provided another patch for testing purposes. Not to the list but to me (and you) directly. Please find it attached again. Would be great if Markus could test it on the failing Kilauea.
I tested it and it's still failing. I dare say the patch makes things worse. After about 20 hard resets the board didn't reach the u-boot console a single time.
:-(
Thanks Markus
Markus,
Sorry, ignore my prior request for you to try out those unreleased patches as I see that you already tried them out and the patches did not work.
Thanks, Adam

Hi Wolfgang,
On Thursday 23 October 2008, Wolfgang Denk wrote:
Should we not back out the autocalib patches that cause the problem until a stable working solution is found?
Not sure. My hope is that AMCC find a solution quickly. They should receive the failing board this week.
We have not received this board yet and hope to try out our unreleased patches on your Kilauea board. (That being said, it looks like Markus' Kilauea configuration can also be used to test the unreleased patches - CPU 400MHz, PLB 200MHz.)
And they already did send a "fix" (more a workaround) for this problem:
[PATCH v2] ppc4xx: Fix DDR2 auto calibration on Kilauea 600MHz
which you rejected. So I suggest to wait for a few days.
Well, that was one full month ago, and nothing happened since.
That's not correct. One patch got checked in which definitely made the situation better:
f8a00dea841d5d75de1f8e8107e90ee1beeddf5f
ppc4xx: Reset and relock memory DLL after SDRAM_CLKTR change
After changing SDRAM_CLKTR phase value rerun the memory preload initialization sequence (INITPLR) to reset and relock the memory DLL. Changing the SDRAM_CLKTR memory clock phase coarse timing adjustment effects the phase relationship of the internal, to the PPC chip, and external, to the PPC chip, versions of MEMCLK_OUT.
Signed-off-by: Adam Graham <agraham@amcc.com>
Signed-off-by: Victor Gallardo <vgallardo@amcc.com>
Signed-off-by: Stefan Roese <sr@denx.de>
Unfortunately it didn't fix all problems. AMCC already provided another patch for testing purposes. Not to the list but to me (and you) directly. Please find it attached again. Would be great if Markus could test it on the failing Kilauea.
Stefan, thank you for sending the test patch to Markus.
Markus, if you would like to test out this patch and send us (AMCC) the results, that would be appreciated. To get more DDR autocalibration information, you can set the U-Boot environment variable "autocalib" to "loop". This will display all the passing and non-passing write-read-compare memory windows as well as the final result that was chosen.
=> setenv autocalib loop
=> saveenv
=> reset
To remove the "autocalib" verbosity, unset this "autocalib" environment variable.
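The "passing and non-passing write-read-compare memory windows" mentioned above suggest a scan over candidate timing settings, each one tested by writing a pattern to DRAM, reading it back, and comparing. As a rough illustration only, and not the actual U-Boot ppc4xx code, the C sketch below assumes the pass/fail result for each setting has already been collected into an array. It then picks the widest run of passing settings and chooses its midpoint. The midpoint rule and all names (`struct window`, `widest_passing_window`, `choose_setting`) are assumptions made for this example.

```c
#include <stdbool.h>

/* One contiguous run of passing settings. */
struct window {
    int start;  /* index of the first passing setting, -1 if none */
    int length; /* number of consecutive passing settings */
};

/* Find the longest run of passing results in results[0..n-1]. */
static struct window widest_passing_window(const bool *results, int n)
{
    struct window best = { -1, 0 };
    int run_start = -1;

    /* Iterate one step past the end so a run that reaches the last
     * element is closed like any other run. */
    for (int i = 0; i <= n; i++) {
        bool pass = (i < n) && results[i];

        if (pass) {
            if (run_start < 0)
                run_start = i;
        } else if (run_start >= 0) {
            int len = i - run_start;

            if (len > best.length) {
                best.start = run_start;
                best.length = len;
            }
            run_start = -1;
        }
    }
    return best;
}

/* Choose the middle of the window, or -1 if no setting passed. */
static int choose_setting(struct window w)
{
    return (w.length > 0) ? w.start + (w.length - 1) / 2 : -1;
}
```

A setting in the middle of a wide passing window keeps the most margin on both sides, which is the usual motivation for this kind of search. A board where every window is narrow or empty would show up clearly in the verbose "loop" output.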
Thanks, Adam Graham AMCC

-----Original Message-----
From: Stefan Roese [mailto:sr@denx.de]
Sent: Thursday, October 23, 2008 4:15 AM
To: u-boot@lists.denx.de
Cc: Wolfgang Denk; Markus Klotzbücher; Adam Graham; Victor Gallardo
Subject: Re: [U-Boot] amcc kilauea odd crashes
Hi Wolfgang,
On Thursday 23 October 2008, Wolfgang Denk wrote:
In message 200810231251.36841.sr@denx.de you wrote:
Is this a 600MHz Kilauea? This is a known issue, that the new autocalibration
No, this is a CPU Rev. A board at 400 MHz.
Probably with the same DDR2 frequency.
Correct, the important clocking is the DDR2 frequency and not the CPU frequency (i.e. the PLB of 200MHz set on this board).
code has a problem here. I reported this problem a few weeks ago. AMCC is currently working on a fix for this.
Should we not back out the autocalib patches that cause the problem until a stable working solution is found?
Not sure. My hope is that AMCC find a solution quickly. They should receive the failing board this week.
And they already did send a "fix" (more a workaround) for this problem:
[PATCH v2] ppc4xx: Fix DDR2 auto calibration on Kilauea 600MHz
which you rejected. So I suggest to wait for a few days.
Victor, Adam, did you already receive my board?
We have not received your board yet. Do you have a shipping tracking number we can check?
Thanks, Adam
Best regards, Stefan
participants (5)
- Adam Graham
- Markus Klotzbücher
- prodyut hazarika
- Stefan Roese
- Wolfgang Denk