
On Mon, Oct 10, 2022 at 02:14:25PM -0400, Tom Rini wrote:
On Mon, Oct 10, 2022 at 08:01:23PM +0200, Pali Rohár wrote:
On Monday 10 October 2022 13:56:10 Tom Rini wrote:
On Mon, Oct 10, 2022 at 07:44:05PM +0200, Pali Rohár wrote:
On Monday 10 October 2022 13:40:38 Tom Rini wrote:
On Mon, Oct 10, 2022 at 07:22:56PM +0200, Pali Rohár wrote:
On Monday 10 October 2022 12:28:18 Tom Rini wrote:
> On Sun, Oct 09, 2022 at 09:12:25PM +0200, Pali Rohár wrote:
> > Hello! Watchdog code seems to be broken in u-boot master branch.
> > On Nokia N900 I'm getting following message in qemu:
> >
> > cyclic function rx51_watchdog took too long: 10000us vs 1000us max, disabling
> >
> > Seems that watchdog core code is not prepared for "slower" watchdogs
> > which communicate over slower i2c bus, like it is the case for N900.
> >
> > Disabling slower watchdog is a bad idea as it would result in reboot
> > loop instead of slower - but working code.
>
> So, looking at this in more detail, we have
> CONFIG_CYCLIC_MAX_CPU_TIME_US as a configuration option (which is where
> the too long comes from). And picking a random CI run:
> https://source.denx.de/u-boot/u-boot/-/jobs/511177
> I do see we hit this in CI once, but not every time, QEMU runs here. Is
> that the max time is configurable enough to satisfy your concerns here?
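For reference, the budget check behind the "took too long ... disabling" message quoted above can be modelled roughly as follows. This is a simplified sketch and not the actual U-Boot cyclic code: the struct, the function name, the simulated cost field and the MAX_CPU_TIME_US macro (standing in for CONFIG_CYCLIC_MAX_CPU_TIME_US, with the 1000us value taken from the message) are all made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for CONFIG_CYCLIC_MAX_CPU_TIME_US (value from the log). */
#define MAX_CPU_TIME_US 1000

/* Simplified model of one registered cyclic function; not U-Boot's real struct. */
struct cyclic_model {
	const char *name;
	uint64_t cost_us;	/* simulated CPU time consumed by one call */
	unsigned int runs;	/* how many times the function was called */
	bool enabled;
};

/*
 * Call the function once if it is still enabled, then compare the time it
 * took against the budget. Returns true if the function stays enabled.
 */
static bool cyclic_model_run(struct cyclic_model *c)
{
	if (!c->enabled)
		return false;

	c->runs++;	/* the call itself still happens this time */

	if (c->cost_us > MAX_CPU_TIME_US) {
		printf("cyclic function %s took too long: %lluus vs %dus max, disabling\n",
		       c->name, (unsigned long long)c->cost_us, MAX_CPU_TIME_US);
		c->enabled = false;	/* never called again after this */
	}

	return c->enabled;
}
```

In this model a watchdog kick that costs 10000us runs exactly once and is then skipped forever, which is the behaviour the report is worried about: the watchdog keeps counting but nothing services it any more.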
We need to investigate how to _properly_ fix this issue, not just work around it. Other boards may be affected too.
So it's the cyclic watchdog code, which we merged as early as possible, that's the reason here. And it was merged as early as we could to see if there are problems. Are there problems? We're seeing "system too slow, disabling" on QEMU, sometimes, and the value of "too slow" is configurable. I know you reported other problems with n900 HW, so we can't see if it's failing there.
I tested it with the older asm code (as described in that other email, via git checkout commit -- file) on n900 HW and the watchdog problem is there too. The phone reboots in about 20 seconds. But as I do not have a serial console, I do not know if that "disabling" message is printed there too (but I guess it is).
I think I'm a bit baffled at this point, honestly. The watchdog timeout is 60 seconds. If you're confident in it being about 20 seconds, consistently, changing WATCHDOG_TIMEOUT_MSECS to say 10000 (so, 10 seconds) should let you see if U-Boot has configured the watchdog and it's being tripped, or if it's still at the prior stage value.
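The experiment suggested here boils down to comparing the observed time-to-reset with the timeout U-Boot was built with. A minimal sketch of that comparison, assuming the time is measured from the point where U-Boot starts the watchdog; the enum, the function name and the tolerance parameter are hypothetical and only illustrate the reasoning:

```c
#include <stdint.h>

enum reset_cause {
	RESET_UBOOT_TIMEOUT,	/* reset matches the timeout U-Boot programmed */
	RESET_OTHER,		/* prior-stage timeout still active, or another cause */
};

/*
 * Compare the observed time-to-reset with the configured watchdog timeout.
 * tolerance_ms is an arbitrary allowance for imprecise (stopwatch) timing.
 */
static enum reset_cause classify_reset(int64_t observed_ms,
				       int64_t configured_ms,
				       int64_t tolerance_ms)
{
	int64_t diff = observed_ms - configured_ms;

	if (diff < 0)
		diff = -diff;

	return diff <= tolerance_ms ? RESET_UBOOT_TIMEOUT : RESET_OTHER;
}
```

With the timeout built as 10000, a reset after roughly 10 seconds would point at U-Boot's own watchdog setup not being serviced, while a reset that stays at about 20 seconds would point at the prior-stage value or at something else entirely.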
$ git grep CONFIG_WATCHDOG_TIMEOUT_MSECS configs/nokia_rx51_defconfig
configs/nokia_rx51_defconfig:CONFIG_WATCHDOG_TIMEOUT_MSECS=31000
Also, the watchdog is started by NOLO (which loads and executes U-Boot), so there can be some smaller timeout.
So I have a feeling that the real HW has the same issue: the cyclic code disabled watchdog kicking and then the watchdog restarted the phone.
I do not remember exact time (if it is 20s or 25s; I have not measured it precisely), but it sounds plausible.
OK, so what happens if you increase CONFIG_CYCLIC_MAX_CPU_TIME_US to something very high (so we should still enable the watchdog and configure the timeout) along with CONFIG_WATCHDOG_TIMEOUT_MSECS being high too (so that if we really can't service it in time, the delay is long enough to be noticeable)? Or set CONFIG_WATCHDOG_TIMEOUT_MSECS to something much lower (so that if the device is resetting quicker we're crashing elsewhere)?
OK, on my beagleboard xM with a small change:
diff --git a/drivers/watchdog/omap_wdt.c b/drivers/watchdog/omap_wdt.c
index ca2bc7cfb59e..f0e57b4f7286 100644
--- a/drivers/watchdog/omap_wdt.c
+++ b/drivers/watchdog/omap_wdt.c
@@ -39,7 +39,7 @@
 #include <common.h>
 #include <log.h>
 #include <watchdog.h>
-#include <asm/arch/hardware.h>
+#include <asm/ti-common/omap_wdt.h>
 #include <asm/io.h>
 #include <asm/processor.h>
 #include <asm/arch/cpu.h>
On my beagleboard xM I now see:
U-Boot SPL 2022.10-00459-g73e741b8ee46-dirty (Oct 10 2022 - 15:18:38 -0400)
Trying to boot from MMC1
U-Boot 2022.10-00459-g73e741b8ee46-dirty (Oct 10 2022 - 15:18:38 -0400)
OMAP3630/3730-GP ES1.1, CPU-OPP2, L3-200MHz, Max CPU Clock 800 MHz
Model: TI OMAP3 BeagleBoard
OMAP3 Beagle board + LPDDR/NAND
I2C: ready
DRAM: 256 MiB
Core: 45 devices, 19 uclasses, devicetree: separate
WDT: Started wdt@48314000 without servicing (60s timeout)
NAND: 0 MiB
MMC: OMAP SD/MMC: 0
Loading Environment from NAND... *** Warning - readenv() failed, using default environment
Beagle xM Rev A/B
No EEPROM on expansion board
OMAP die ID: 6e5e00211ff00000015739eb08031024
Net: No ethernet found.
Hit any key to stop autoboot: 0
So, this is as close as I can get to testing on n900 HW, and it's fine here.