[PATCH u-boot-marvell 1/2] arm: mvebu: spl: Add option to reset the board on DDR training failure

From: Marek Behún marek.behun@nic.cz
Some boards may occasionally fail DDR training. Currently we hang() in this case. Add an option that makes the board do an immediate reset in such a case, so that a new training is tried as soon as possible, instead of hanging and possibly waiting for a watchdog to reset the board.
Signed-off-by: Marek Behún marek.behun@nic.cz
---
 arch/arm/mach-mvebu/Kconfig | 9 +++++++++
 arch/arm/mach-mvebu/spl.c   | 6 +++++-
 2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/arm/mach-mvebu/Kconfig b/arch/arm/mach-mvebu/Kconfig
index d23cc0c760..ed957be6e1 100644
--- a/arch/arm/mach-mvebu/Kconfig
+++ b/arch/arm/mach-mvebu/Kconfig
@@ -213,6 +213,15 @@ config DDR_LOG_LEVEL
 	  At level 3, rovides the windows margin of each DQ as a results of
 	  DQS centeralization.
 
+config DDR_RESET_ON_TRAINING_FAILURE
+	bool "Reset the board on DDR training failure instead of hanging"
+	depends on ARMADA_38X || ARMADA_XP
+	help
+	  If DDR training fails in SPL, reset the board instead of hanging.
+	  Some boards are known to fail DDR training occasionally and an
+	  immediate reset may be preferable to waiting until the board is
+	  reset by watchdog (if there even is one).
+
 config SYS_BOARD
 	default "clearfog" if TARGET_CLEARFOG
 	default "helios4" if TARGET_HELIOS4
diff --git a/arch/arm/mach-mvebu/spl.c b/arch/arm/mach-mvebu/spl.c
index 273ecb8bd6..d3c3bdc74d 100644
--- a/arch/arm/mach-mvebu/spl.c
+++ b/arch/arm/mach-mvebu/spl.c
@@ -4,6 +4,7 @@
  */
 
 #include <common.h>
+#include <cpu_func.h>
 #include <dm.h>
 #include <fdtdec.h>
 #include <hang.h>
@@ -330,7 +331,10 @@ void board_init_f(ulong dummy)
 	ret = ddr3_init();
 	if (ret) {
 		printf("ddr3_init() failed: %d\n", ret);
-		hang();
+		if (IS_ENABLED(CONFIG_DDR_RESET_ON_TRAINING_FAILURE))
+			reset_cpu();
+		else
+			hang();
 	}
 #endif
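Editorial sketch, not part of the patch: the change relies on testing a Kconfig symbol with IS_ENABLED() inside an ordinary if, so both branches are always compiled and the unused one can be dropped by the compiler. The standalone C below illustrates the general preprocessor trick behind such a macro. All DEMO_* and CONFIG_DEMO_* names are hypothetical stand-ins; the real macro in the U-Boot tree is more elaborate.

```c
/* Stand-in for a Kconfig bool symbol; enabled options are defined to 1. */
#define CONFIG_DEMO_RESET_ON_FAILURE 1
/* CONFIG_DEMO_OTHER_OPTION is deliberately left undefined. */

/*
 * If the option expands to 1, DEMO_PLACEHOLDER_1 expands to "0," and
 * shifts a literal 1 into the second argument slot. Otherwise the pasted
 * token is junk that stays in the first slot, and the second slot is 0.
 */
#define DEMO_PLACEHOLDER_1 0,
#define DEMO_SECOND_ARG(ignored, val, ...) val
#define DEMO_IS_DEFINED3(arg1_or_junk) DEMO_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define DEMO_IS_DEFINED2(val) DEMO_IS_DEFINED3(DEMO_PLACEHOLDER_##val)
#define DEMO_IS_ENABLED(option) DEMO_IS_DEFINED2(option)

enum demo_action { DEMO_ACTION_HANG, DEMO_ACTION_RESET };

/* Both branches are type-checked; the condition is a compile-time constant. */
static enum demo_action demo_action_on_training_failure(void)
{
	if (DEMO_IS_ENABLED(CONFIG_DEMO_RESET_ON_FAILURE))
		return DEMO_ACTION_RESET;
	else
		return DEMO_ACTION_HANG;
}
```

Compared with an #ifdef block, this form keeps the disabled branch visible to the compiler, so it cannot silently bit-rot.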

From: Marek Behún marek.behun@nic.cz
The current DDR training code for Armada 38x is in such a state that we cannot be sure it will always train successfully. Since the last change we have not yet found a board that fails DDR training, but from experience over the last 2 years we know that it is possible.
Experience also tells us that in many cases a board fails training only occasionally, and that training succeeds after a reset.
Enable the new option that makes the board reset itself on DDR training failure immediately. Until now we called hang() in such a case, which meant that the board was reset by the MCU after 120 seconds.
Signed-off-by: Marek Behún marek.behun@nic.cz
---
 configs/turris_omnia_defconfig | 1 +
 1 file changed, 1 insertion(+)
diff --git a/configs/turris_omnia_defconfig b/configs/turris_omnia_defconfig
index d6f70caeaf..010d69adcc 100644
--- a/configs/turris_omnia_defconfig
+++ b/configs/turris_omnia_defconfig
@@ -11,6 +11,7 @@ CONFIG_NR_DRAM_BANKS=2
 CONFIG_SYS_MEMTEST_START=0x00800000
 CONFIG_SYS_MEMTEST_END=0x00ffffff
 CONFIG_TARGET_TURRIS_OMNIA=y
+CONFIG_DDR_RESET_ON_TRAINING_FAILURE=y
 CONFIG_ENV_SIZE=0x10000
 CONFIG_ENV_OFFSET=0xF0000
 CONFIG_ENV_SECT_SIZE=0x10000
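Editorial sketch, not part of the patch: to put the 120-second MCU reset in perspective, here is a rough model in C. It assumes each training attempt fails independently with probability p_fail, which is a hypothetical parameter (the thread gives no failure rate), and that every failure costs a fixed number of seconds before the next attempt starts. Under that assumption the expected number of failures before the first success is p_fail / (1 - p_fail).

```c
/*
 * Expected extra boot delay, in seconds, caused by failed DDR training
 * attempts before the first successful one.
 *
 * p_fail:              assumed per-attempt failure probability, in [0, 1)
 * seconds_per_failure: time lost per failure (e.g. 120 when waiting for
 *                      the MCU to reset the board after hang())
 *
 * Returns -1.0 for an out-of-range probability.
 */
static double expected_extra_delay(double p_fail, double seconds_per_failure)
{
	if (p_fail < 0.0 || p_fail >= 1.0)
		return -1.0;

	/* Mean number of failures of a geometric distribution, times cost. */
	return seconds_per_failure * p_fail / (1.0 - p_fail);
}
```

With an immediate reset the per-failure cost shrinks from the 120-second wait to roughly the duration of one training attempt, so the expected delay shrinks by the same factor.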

On 2/17/22 01:08, Marek Behún wrote:
From: Marek Behún marek.behun@nic.cz
The current DDR training code for Armada 38x is in such a state that we cannot be sure it will always train successfully. Since the last change we have not yet found a board that fails DDR training, but from experience over the last 2 years we know that it is possible.
Experience also tells us that in many cases a board fails training only occasionally, and that training succeeds after a reset.
Enable the new option that makes the board reset itself on DDR training failure immediately. Until now we called hang() in such a case, which meant that the board was reset by the MCU after 120 seconds.
Signed-off-by: Marek Behún marek.behun@nic.cz
Reviewed-by: Stefan Roese sr@denx.de
Thanks, Stefan
 configs/turris_omnia_defconfig | 1 +
 1 file changed, 1 insertion(+)
diff --git a/configs/turris_omnia_defconfig b/configs/turris_omnia_defconfig
index d6f70caeaf..010d69adcc 100644
--- a/configs/turris_omnia_defconfig
+++ b/configs/turris_omnia_defconfig
@@ -11,6 +11,7 @@ CONFIG_NR_DRAM_BANKS=2
 CONFIG_SYS_MEMTEST_START=0x00800000
 CONFIG_SYS_MEMTEST_END=0x00ffffff
 CONFIG_TARGET_TURRIS_OMNIA=y
+CONFIG_DDR_RESET_ON_TRAINING_FAILURE=y
 CONFIG_ENV_SIZE=0x10000
 CONFIG_ENV_OFFSET=0xF0000
 CONFIG_ENV_SECT_SIZE=0x10000
Best regards, Stefan Roese

On 2/17/22 01:08, Marek Behún wrote:
From: Marek Behún marek.behun@nic.cz
Some boards may occasionally fail DDR training. Currently we hang() in this case. Add an option that makes the board do an immediate reset in such a case, so that a new training is tried as soon as possible, instead of hanging and possibly waiting for a watchdog to reset the board.
Signed-off-by: Marek Behún marek.behun@nic.cz
 arch/arm/mach-mvebu/Kconfig | 9 +++++++++
 arch/arm/mach-mvebu/spl.c   | 6 +++++-
 2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/arm/mach-mvebu/Kconfig b/arch/arm/mach-mvebu/Kconfig
index d23cc0c760..ed957be6e1 100644
--- a/arch/arm/mach-mvebu/Kconfig
+++ b/arch/arm/mach-mvebu/Kconfig
@@ -213,6 +213,15 @@ config DDR_LOG_LEVEL
 	  At level 3, rovides the windows margin of each DQ as a results of
 	  DQS centeralization.
 
+config DDR_RESET_ON_TRAINING_FAILURE
+	bool "Reset the board on DDR training failure instead of hanging"
+	depends on ARMADA_38X || ARMADA_XP
+	help
+	  If DDR training fails in SPL, reset the board instead of hanging.
+	  Some boards are known to fail DDR training occasionally and an
+	  immediate reset may be preferable to waiting until the board is
+	  reset by watchdog (if there even is one).
+
I'm wondering a bit, if we could/should make this a global Kconfig symbol, so that other boards/platforms might use it as well. But we can always generalize this at a later point.
Reviewed-by: Stefan Roese sr@denx.de
Thanks, Stefan
 config SYS_BOARD
 	default "clearfog" if TARGET_CLEARFOG
 	default "helios4" if TARGET_HELIOS4
diff --git a/arch/arm/mach-mvebu/spl.c b/arch/arm/mach-mvebu/spl.c
index 273ecb8bd6..d3c3bdc74d 100644
--- a/arch/arm/mach-mvebu/spl.c
+++ b/arch/arm/mach-mvebu/spl.c
@@ -4,6 +4,7 @@
  */
 
 #include <common.h>
+#include <cpu_func.h>
 #include <dm.h>
 #include <fdtdec.h>
 #include <hang.h>
@@ -330,7 +331,10 @@ void board_init_f(ulong dummy)
 	ret = ddr3_init();
 	if (ret) {
 		printf("ddr3_init() failed: %d\n", ret);
-		hang();
+		if (IS_ENABLED(CONFIG_DDR_RESET_ON_TRAINING_FAILURE))
+			reset_cpu();
+		else
+			hang();
 	}
 #endif
Best regards, Stefan Roese

On Thursday 17 February 2022 01:08:48 Marek Behún wrote:
From: Marek Behún marek.behun@nic.cz
Some boards may occasionally fail DDR training. Currently we hang() in this case. Add an option that makes the board do an immediate reset in such a case, so that a new training is tried as soon as possible, instead of hanging and possibly waiting for a watchdog to reset the board.
Signed-off-by: Marek Behún marek.behun@nic.cz
Reviewed-by: Stefan Roese sr@denx.de
 arch/arm/mach-mvebu/Kconfig | 9 +++++++++
 arch/arm/mach-mvebu/spl.c   | 6 +++++-
 2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/arm/mach-mvebu/Kconfig b/arch/arm/mach-mvebu/Kconfig
index d23cc0c760..ed957be6e1 100644
--- a/arch/arm/mach-mvebu/Kconfig
+++ b/arch/arm/mach-mvebu/Kconfig
@@ -213,6 +213,15 @@ config DDR_LOG_LEVEL
 	  At level 3, rovides the windows margin of each DQ as a results of
 	  DQS centeralization.
 
+config DDR_RESET_ON_TRAINING_FAILURE
+	bool "Reset the board on DDR training failure instead of hanging"
+	depends on ARMADA_38X || ARMADA_XP
+	help
+	  If DDR training fails in SPL, reset the board instead of hanging.
+	  Some boards are known to fail DDR training occasionally and an
+	  immediate reset may be preferable to waiting until the board is
+	  reset by watchdog (if there even is one).
+
 config SYS_BOARD
 	default "clearfog" if TARGET_CLEARFOG
 	default "helios4" if TARGET_HELIOS4
diff --git a/arch/arm/mach-mvebu/spl.c b/arch/arm/mach-mvebu/spl.c
index 273ecb8bd6..d3c3bdc74d 100644
--- a/arch/arm/mach-mvebu/spl.c
+++ b/arch/arm/mach-mvebu/spl.c
@@ -4,6 +4,7 @@
  */
 
 #include <common.h>
+#include <cpu_func.h>
 #include <dm.h>
 #include <fdtdec.h>
 #include <hang.h>
@@ -330,7 +331,10 @@ void board_init_f(ulong dummy)
 	ret = ddr3_init();
 	if (ret) {
 		printf("ddr3_init() failed: %d\n", ret);
-		hang();
+		if (IS_ENABLED(CONFIG_DDR_RESET_ON_TRAINING_FAILURE))
+			reset_cpu();
You should not call reset_cpu() from SPL loaded via UART. This will confuse x-modem software. Either return failure to BootROM or hang() like it was before.
+		else
+			hang();
 	}
 #endif

On 2/17/22 12:37, Pali Rohár wrote:
On Thursday 17 February 2022 01:08:48 Marek Behún wrote:
From: Marek Behún marek.behun@nic.cz
Some boards may occasionally fail DDR training. Currently we hang() in this case. Add an option that makes the board do an immediate reset in such a case, so that a new training is tried as soon as possible, instead of hanging and possibly waiting for a watchdog to reset the board.
Signed-off-by: Marek Behún marek.behun@nic.cz
Reviewed-by: Stefan Roese sr@denx.de
 arch/arm/mach-mvebu/Kconfig | 9 +++++++++
 arch/arm/mach-mvebu/spl.c   | 6 +++++-
 2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/arm/mach-mvebu/Kconfig b/arch/arm/mach-mvebu/Kconfig
index d23cc0c760..ed957be6e1 100644
--- a/arch/arm/mach-mvebu/Kconfig
+++ b/arch/arm/mach-mvebu/Kconfig
@@ -213,6 +213,15 @@ config DDR_LOG_LEVEL
 	  At level 3, rovides the windows margin of each DQ as a results of
 	  DQS centeralization.
 
+config DDR_RESET_ON_TRAINING_FAILURE
+	bool "Reset the board on DDR training failure instead of hanging"
+	depends on ARMADA_38X || ARMADA_XP
+	help
+	  If DDR training fails in SPL, reset the board instead of hanging.
+	  Some boards are known to fail DDR training occasionally and an
+	  immediate reset may be preferable to waiting until the board is
+	  reset by watchdog (if there even is one).
+
 config SYS_BOARD
 	default "clearfog" if TARGET_CLEARFOG
 	default "helios4" if TARGET_HELIOS4
diff --git a/arch/arm/mach-mvebu/spl.c b/arch/arm/mach-mvebu/spl.c
index 273ecb8bd6..d3c3bdc74d 100644
--- a/arch/arm/mach-mvebu/spl.c
+++ b/arch/arm/mach-mvebu/spl.c
@@ -4,6 +4,7 @@
  */
 
 #include <common.h>
+#include <cpu_func.h>
 #include <dm.h>
 #include <fdtdec.h>
 #include <hang.h>
@@ -330,7 +331,10 @@ void board_init_f(ulong dummy)
 	ret = ddr3_init();
 	if (ret) {
 		printf("ddr3_init() failed: %d\n", ret);
-		hang();
+		if (IS_ENABLED(CONFIG_DDR_RESET_ON_TRAINING_FAILURE))
+			reset_cpu();
You should not call reset_cpu() from SPL loaded via UART. This will confuse x-modem software. Either return failure to BootROM or hang() like it was before.
I see that this might confuse the kwboot / x-modem use-case. But AFAIU, this is more targeted to the normal use-case, that the board boots from SPI NOR (etc). And here, a complete re-start could be helpful to at least get the board up and running after a few retries.
Or am I missing something?
Thanks, Stefan

On Thursday 17 February 2022 12:42:58 Stefan Roese wrote:
On 2/17/22 12:37, Pali Rohár wrote:
On Thursday 17 February 2022 01:08:48 Marek Behún wrote:
From: Marek Behún marek.behun@nic.cz
Some boards may occasionally fail DDR training. Currently we hang() in this case. Add an option that makes the board do an immediate reset in such a case, so that a new training is tried as soon as possible, instead of hanging and possibly waiting for a watchdog to reset the board.
Signed-off-by: Marek Behún marek.behun@nic.cz
Reviewed-by: Stefan Roese sr@denx.de
 arch/arm/mach-mvebu/Kconfig | 9 +++++++++
 arch/arm/mach-mvebu/spl.c   | 6 +++++-
 2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/arm/mach-mvebu/Kconfig b/arch/arm/mach-mvebu/Kconfig
index d23cc0c760..ed957be6e1 100644
--- a/arch/arm/mach-mvebu/Kconfig
+++ b/arch/arm/mach-mvebu/Kconfig
@@ -213,6 +213,15 @@ config DDR_LOG_LEVEL
 	  At level 3, rovides the windows margin of each DQ as a results of
 	  DQS centeralization.
 
+config DDR_RESET_ON_TRAINING_FAILURE
+	bool "Reset the board on DDR training failure instead of hanging"
+	depends on ARMADA_38X || ARMADA_XP
+	help
+	  If DDR training fails in SPL, reset the board instead of hanging.
+	  Some boards are known to fail DDR training occasionally and an
+	  immediate reset may be preferable to waiting until the board is
+	  reset by watchdog (if there even is one).
+
 config SYS_BOARD
 	default "clearfog" if TARGET_CLEARFOG
 	default "helios4" if TARGET_HELIOS4
diff --git a/arch/arm/mach-mvebu/spl.c b/arch/arm/mach-mvebu/spl.c
index 273ecb8bd6..d3c3bdc74d 100644
--- a/arch/arm/mach-mvebu/spl.c
+++ b/arch/arm/mach-mvebu/spl.c
@@ -4,6 +4,7 @@
  */
 
 #include <common.h>
+#include <cpu_func.h>
 #include <dm.h>
 #include <fdtdec.h>
 #include <hang.h>
@@ -330,7 +331,10 @@ void board_init_f(ulong dummy)
 	ret = ddr3_init();
 	if (ret) {
 		printf("ddr3_init() failed: %d\n", ret);
-		hang();
+		if (IS_ENABLED(CONFIG_DDR_RESET_ON_TRAINING_FAILURE))
+			reset_cpu();
You should not call reset_cpu() from SPL loaded via UART. This will confuse x-modem software. Either return failure to BootROM or hang() like it was before.
I see that this might confuse the kwboot / x-modem use-case. But AFAIU, this is more targeted to the normal use-case, that the board boots from SPI NOR (etc).
But the kwb image is universal for both UART and SPI booting, so it targets both use cases.
And here, a complete re-start could be helpful to at least get the board up and running after a few retries.
Of course, for SPI boot source it is required.
Or am I missing something?
Only the reset_cpu() call should be skipped when SPL is loaded via UART.
Thanks, Stefan

On Thu, 17 Feb 2022 12:37:54 +0100 Pali Rohár pali@kernel.org wrote:
On Thursday 17 February 2022 01:08:48 Marek Behún wrote:
From: Marek Behún marek.behun@nic.cz
Some boards may occasionally fail DDR training. Currently we hang() in this case. Add an option that makes the board do an immediate reset in such a case, so that a new training is tried as soon as possible, instead of hanging and possibly waiting for a watchdog to reset the board.
Signed-off-by: Marek Behún marek.behun@nic.cz
Reviewed-by: Stefan Roese sr@denx.de
 arch/arm/mach-mvebu/Kconfig | 9 +++++++++
 arch/arm/mach-mvebu/spl.c   | 6 +++++-
 2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/arm/mach-mvebu/Kconfig b/arch/arm/mach-mvebu/Kconfig
index d23cc0c760..ed957be6e1 100644
--- a/arch/arm/mach-mvebu/Kconfig
+++ b/arch/arm/mach-mvebu/Kconfig
@@ -213,6 +213,15 @@ config DDR_LOG_LEVEL
 	  At level 3, rovides the windows margin of each DQ as a results of
 	  DQS centeralization.
 
+config DDR_RESET_ON_TRAINING_FAILURE
+	bool "Reset the board on DDR training failure instead of hanging"
+	depends on ARMADA_38X || ARMADA_XP
+	help
+	  If DDR training fails in SPL, reset the board instead of hanging.
+	  Some boards are known to fail DDR training occasionally and an
+	  immediate reset may be preferable to waiting until the board is
+	  reset by watchdog (if there even is one).
+
 config SYS_BOARD
 	default "clearfog" if TARGET_CLEARFOG
 	default "helios4" if TARGET_HELIOS4
diff --git a/arch/arm/mach-mvebu/spl.c b/arch/arm/mach-mvebu/spl.c
index 273ecb8bd6..d3c3bdc74d 100644
--- a/arch/arm/mach-mvebu/spl.c
+++ b/arch/arm/mach-mvebu/spl.c
@@ -4,6 +4,7 @@
  */
 
 #include <common.h>
+#include <cpu_func.h>
 #include <dm.h>
 #include <fdtdec.h>
 #include <hang.h>
@@ -330,7 +331,10 @@ void board_init_f(ulong dummy)
 	ret = ddr3_init();
 	if (ret) {
 		printf("ddr3_init() failed: %d\n", ret);
-		hang();
+		if (IS_ENABLED(CONFIG_DDR_RESET_ON_TRAINING_FAILURE))
+			reset_cpu();
You should not call reset_cpu() from SPL loaded via UART. This will confuse x-modem software. Either return failure to BootROM or hang() like it was before.
Hmm, I didn't consider that.
What will happen if I return failure to BootROM? Will it try booting from a different medium?
I think that the best thing to do on failure when booting via UART is to hang()...
Marek

On Thursday 17 February 2022 12:48:08 Marek Behún wrote:
On Thu, 17 Feb 2022 12:37:54 +0100 Pali Rohár pali@kernel.org wrote:
On Thursday 17 February 2022 01:08:48 Marek Behún wrote:
From: Marek Behún marek.behun@nic.cz
Some boards may occasionally fail DDR training. Currently we hang() in this case. Add an option that makes the board do an immediate reset in such a case, so that a new training is tried as soon as possible, instead of hanging and possibly waiting for a watchdog to reset the board.
Signed-off-by: Marek Behún marek.behun@nic.cz
Reviewed-by: Stefan Roese sr@denx.de
 arch/arm/mach-mvebu/Kconfig | 9 +++++++++
 arch/arm/mach-mvebu/spl.c   | 6 +++++-
 2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/arm/mach-mvebu/Kconfig b/arch/arm/mach-mvebu/Kconfig
index d23cc0c760..ed957be6e1 100644
--- a/arch/arm/mach-mvebu/Kconfig
+++ b/arch/arm/mach-mvebu/Kconfig
@@ -213,6 +213,15 @@ config DDR_LOG_LEVEL
 	  At level 3, rovides the windows margin of each DQ as a results of
 	  DQS centeralization.
 
+config DDR_RESET_ON_TRAINING_FAILURE
+	bool "Reset the board on DDR training failure instead of hanging"
+	depends on ARMADA_38X || ARMADA_XP
+	help
+	  If DDR training fails in SPL, reset the board instead of hanging.
+	  Some boards are known to fail DDR training occasionally and an
+	  immediate reset may be preferable to waiting until the board is
+	  reset by watchdog (if there even is one).
+
 config SYS_BOARD
 	default "clearfog" if TARGET_CLEARFOG
 	default "helios4" if TARGET_HELIOS4
diff --git a/arch/arm/mach-mvebu/spl.c b/arch/arm/mach-mvebu/spl.c
index 273ecb8bd6..d3c3bdc74d 100644
--- a/arch/arm/mach-mvebu/spl.c
+++ b/arch/arm/mach-mvebu/spl.c
@@ -4,6 +4,7 @@
  */
 
 #include <common.h>
+#include <cpu_func.h>
 #include <dm.h>
 #include <fdtdec.h>
 #include <hang.h>
@@ -330,7 +331,10 @@ void board_init_f(ulong dummy)
 	ret = ddr3_init();
 	if (ret) {
 		printf("ddr3_init() failed: %d\n", ret);
-		hang();
+		if (IS_ENABLED(CONFIG_DDR_RESET_ON_TRAINING_FAILURE))
+			reset_cpu();
You should not call reset_cpu() from SPL loaded via UART. This will confuse x-modem software. Either return failure to BootROM or hang() like it was before.
Hmm, I didn't consider that.
What will happen if I return failure to BootROM? Will it try booting from a different medium?
Looks like returning failure to A385 BootROM is broken...
I think that the best thing to do on failure when booting via UART is to hang()...
Marek
... so for UART boot source stay with hang() as before.
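Editorial sketch, not a revised patch from the thread: the outcome of the discussion (reset on failure when the option is enabled, but keep hang() when SPL was loaded via UART) can be modelled as a small pure function in standalone C. The enum and function names here are hypothetical; how real SPL code determines its boot source is not shown in this thread.

```c
#include <stdbool.h>

/* Hypothetical boot sources, for illustration only. */
enum demo_boot_source {
	DEMO_BOOT_SPI,
	DEMO_BOOT_MMC,
	DEMO_BOOT_UART,
};

enum demo_failure_action {
	DEMO_FAILURE_HANG,
	DEMO_FAILURE_RESET,
};

/*
 * Decide what to do after a DDR training failure.
 *
 * reset_option_enabled models CONFIG_DDR_RESET_ON_TRAINING_FAILURE.
 * A reset during a UART (x-modem) transfer would confuse the sending
 * tool, so UART boot always keeps the old hang() behaviour.
 */
static enum demo_failure_action
demo_ddr_failure_action(bool reset_option_enabled, enum demo_boot_source src)
{
	if (reset_option_enabled && src != DEMO_BOOT_UART)
		return DEMO_FAILURE_RESET;

	return DEMO_FAILURE_HANG;
}
```

Keeping the decision separate from the actual reset or hang call makes the UART exception easy to check in isolation.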
participants (3)
- Marek Behún
- Pali Rohár
- Stefan Roese