[PATCH] watchdog: allow overriding the rate-limiting logic

On the MPC8309-derived board I'm currently working on, the current rate-limiting logic in the generic watchdog_reset function poses two problems:
First, the hard-coded limit of 1000ms is too large for the external GPIO-triggered watchdog we have.
Second, in the SPL, the decrementer interrupt is not enabled, so the get_timer() call calls always returns 0, so wdt_reset() is never actually called. Enabling that interrupt (i.e. calling interrupt_init() somewhere in my early board code) is a bit problematic, since U-Boot proper gets loaded to address 0, hence overwriting exception vectors - and the reason I even need to care about the watchdog in the SPL is that the signature verification takes a rather long time, so just disabling interrupts before loading U-Boot proper doesn't work.
Both problems can be solved by allowing the board to override the rate-limiting logic. For example, in my case I will implement the function in terms of get_ticks() (i.e. the time base registers on PPC) which do not depend on interrupts, and use a threshold of gd->bus_clk / (4*16) ticks (~62ms).
Signed-off-by: Rasmus Villemoes rasmus.villemoes@prevas.dk ---
This is on top of https://patchwork.ozlabs.org/patch/1242772/, but it's obviously trivial to do on master instead.
drivers/watchdog/wdt-uclass.c | 21 ++++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-)
diff --git a/drivers/watchdog/wdt-uclass.c b/drivers/watchdog/wdt-uclass.c index 309a0e9c5b..ad53e86b80 100644 --- a/drivers/watchdog/wdt-uclass.c +++ b/drivers/watchdog/wdt-uclass.c @@ -67,6 +67,19 @@ int wdt_expire_now(struct udevice *dev, ulong flags) }
#if defined(CONFIG_WATCHDOG) +__weak int watchdog_reset_ratelimit(void) +{ + static ulong next_reset; + ulong now; + + now = get_timer(0); + if (time_after(now, next_reset)) { + next_reset = now + 1000; /* reset every 1000ms */ + return 1; + } + return 0; +} + /* * Called by macro WATCHDOG_RESET. This function be called *very* early, * so we need to make sure, that the watchdog driver is ready before using @@ -74,19 +87,13 @@ int wdt_expire_now(struct udevice *dev, ulong flags) */ void watchdog_reset(void) { - static ulong next_reset; - ulong now; - /* Exit if GD is not ready or watchdog is not initialized yet */ if (!gd || !(gd->flags & GD_FLG_WDT_READY)) return;
/* Do not reset the watchdog too often */ - now = get_timer(0); - if (time_after(now, next_reset)) { - next_reset = now + 1000; /* reset every 1000ms */ + if (watchdog_reset_ratelimit()) wdt_reset(gd->watchdog_dev); - } } #endif

Hi Rasmus,
(added Christophe to Cc)
On 12.03.20 12:40, Rasmus Villemoes wrote:
On the MPC8309-derived board I'm currently working on, the current rate-limiting logic in the generic watchdog_reset function poses two problems:
First, the hard-coded limit of 1000ms is too large for the external GPIO-triggered watchdog we have.
I remember a discussion a few weeks ago with Christophe, where you pointed out, that there is already the official "hw_margin_ms" DT property for this delay here. Why don't you introduce it here so that all boards can use it as well?
Second, in the SPL, the decrementer interrupt is not enabled, so the get_timer() call calls always returns 0, so wdt_reset() is never actually called. Enabling that interrupt (i.e. calling interrupt_init() somewhere in my early board code) is a bit problematic, since U-Boot proper gets loaded to address 0, hence overwriting exception vectors - and the reason I even need to care about the watchdog in the SPL is that the signature verification takes a rather long time, so just disabling interrupts before loading U-Boot proper doesn't work.
Both problems can be solved by allowing the board to override the rate-limiting logic. For example, in my case I will implement the function in terms of get_ticks() (i.e. the time base registers on PPC) which do not depend on interrupts, and use a threshold of gd->bus_clk / (4*16) ticks (~62ms).
I would assume that you might run into multiple issues, when your timer infrastructure is not working correctly in SPL. Can't you instead "fix" this by using this get_ticks() option for the get_timer() functions in SPL so that you can use the common functions here and in all other places in SPL as well?
Signed-off-by: Rasmus Villemoes rasmus.villemoes@prevas.dk
This is on top of https://patchwork.ozlabs.org/patch/1242772/, but it's obviously trivial to do on master instead.
drivers/watchdog/wdt-uclass.c | 21 ++++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-)
diff --git a/drivers/watchdog/wdt-uclass.c b/drivers/watchdog/wdt-uclass.c index 309a0e9c5b..ad53e86b80 100644 --- a/drivers/watchdog/wdt-uclass.c +++ b/drivers/watchdog/wdt-uclass.c @@ -67,6 +67,19 @@ int wdt_expire_now(struct udevice *dev, ulong flags) }
#if defined(CONFIG_WATCHDOG) +__weak int watchdog_reset_ratelimit(void) +{
- static ulong next_reset;
- ulong now;
- now = get_timer(0);
- if (time_after(now, next_reset)) {
next_reset = now + 1000; /* reset every 1000ms */
return 1;
- }
- return 0;
+}
- /*
- Called by macro WATCHDOG_RESET. This function be called *very* early,
- so we need to make sure, that the watchdog driver is ready before using
@@ -74,19 +87,13 @@ int wdt_expire_now(struct udevice *dev, ulong flags) */ void watchdog_reset(void) {
static ulong next_reset;
ulong now;
/* Exit if GD is not ready or watchdog is not initialized yet */ if (!gd || !(gd->flags & GD_FLG_WDT_READY)) return;
/* Do not reset the watchdog too often */
now = get_timer(0);
if (time_after(now, next_reset)) {
next_reset = now + 1000; /* reset every 1000ms */
- if (watchdog_reset_ratelimit()) wdt_reset(gd->watchdog_dev);
- } } #endif
Thanks, Stefan

On 12/03/2020 12.58, Stefan Roese wrote:
Hi Rasmus,
(added Christophe to Cc)
On 12.03.20 12:40, Rasmus Villemoes wrote:
On the MPC8309-derived board I'm currently working on, the current rate-limiting logic in the generic watchdog_reset function poses two problems:
First, the hard-coded limit of 1000ms is too large for the external GPIO-triggered watchdog we have.
I remember a discussion a few weeks ago with Christophe, where you pointed out, that there is already the official "hw_margin_ms" DT property for this delay here. Why don't you introduce it here so that all boards can use it as well?
Well, I considered that. Reading the value is probably just adding a dev_read_u32_default(..., 1000) next to the one reading the timeout-sec property in initr_watchdog(). But what is a good place to stash the result? It really belongs with the device, but as long as only one is supported I suppose one could move initr_watchdog to wdt-uclass.c and use a static variable in that file.
I can certainly do that. That leaves the other problem.
Second, in the SPL, the decrementer interrupt is not enabled, so the get_timer() call calls always returns 0, so wdt_reset() is never actually called. Enabling that interrupt (i.e. calling interrupt_init() somewhere in my early board code) is a bit problematic, since U-Boot proper gets loaded to address 0, hence overwriting exception vectors - and the reason I even need to care about the watchdog in the SPL is that the signature verification takes a rather long time, so just disabling interrupts before loading U-Boot proper doesn't work.
Both problems can be solved by allowing the board to override the rate-limiting logic. For example, in my case I will implement the function in terms of get_ticks() (i.e. the time base registers on PPC) which do not depend on interrupts, and use a threshold of gd->bus_clk / (4*16) ticks (~62ms).
I would assume that you might run into multiple issues, when your timer infrastructure is not working correctly in SPL.
None so far, except for the ratelimiting in terms of get_timer() not working.
Can't you instead "fix" this by using this get_ticks() option for the get_timer() functions in SPL so that you can use the common functions here and in all other places in SPL as well?
I don't understand what you mean. On PPC, get_timer(0) just returns a static variable which is incremented from the timer_interrupt(). When that interrupt is not enabled, that variable is always 0.
I can enable the timer interrupt and get the time (in terms of get_timer) going, but as I wrote, I would have to disable that interrupt again before actually loading U-Boot [you can probably imagine the fun of debugging what happens when one overwrites the exception vectors], so time would stop passing as far as get_timer() is concerned, which means that again the watchdog would no longer be serviced. So if there is to be any ratelimiting at all, it really has to be based on a time that does tick without relying on interrupts.
Rasmus

Some watchdogs must be reset more often than the once-per-second ratelimit used by the generic watchdog_reset function in wdt-uclass.c. There's precedent (from the gpio-wdt driver in linux) for using a property called hw_margin_ms to let the device tree tell the driver how often the device needs resetting. So use that generically. No change in default behaviour.
On top of https://patchwork.ozlabs.org/patch/1242772/ .
Stefan, something like this? That at least solves half my problems and might be useful to others as well. Then I'll have to figure out the time-stands-still problem in some other way.
Rasmus Villemoes (3): watchdog: remove stale ifndef CONFIG_WATCHDOG_TIMEOUT_MSECS from wdt.h watchdog: move initr_watchdog() to wdt-uclass.c watchdog: honour hw_margin_ms DT property
drivers/watchdog/wdt-uclass.c | 43 ++++++++++++++++++++++++++++++++++- include/wdt.h | 38 +------------------------------ 2 files changed, 43 insertions(+), 38 deletions(-)

Since WATCHDOG_TIMEOUT_MSECS was converted to Kconfig (commit ca51ef7c0c), CONFIG_WATCHDOG_TIMEOUT_MSECS has been guaranteed to be defined. So remove the dead fallback ifdeffery.
Signed-off-by: Rasmus Villemoes rasmus.villemoes@prevas.dk --- include/wdt.h | 3 --- 1 file changed, 3 deletions(-)
diff --git a/include/wdt.h b/include/wdt.h index 5bcff24ab3..e833d3a772 100644 --- a/include/wdt.h +++ b/include/wdt.h @@ -107,9 +107,6 @@ struct wdt_ops { };
#if CONFIG_IS_ENABLED(WDT) -#ifndef CONFIG_WATCHDOG_TIMEOUT_MSECS -#define CONFIG_WATCHDOG_TIMEOUT_MSECS (60 * 1000) -#endif #define WATCHDOG_TIMEOUT_SECS (CONFIG_WATCHDOG_TIMEOUT_MSECS / 1000)
static inline int initr_watchdog(void)

On 13.03.20 17:04, Rasmus Villemoes wrote:
Since WATCHDOG_TIMEOUT_MSECS was converted to Kconfig (commit ca51ef7c0c), CONFIG_WATCHDOG_TIMEOUT_MSECS has been guaranteed to be defined. So remove the dead fallback ifdeffery.
Signed-off-by: Rasmus Villemoes rasmus.villemoes@prevas.dk
include/wdt.h | 3 --- 1 file changed, 3 deletions(-)
diff --git a/include/wdt.h b/include/wdt.h index 5bcff24ab3..e833d3a772 100644 --- a/include/wdt.h +++ b/include/wdt.h @@ -107,9 +107,6 @@ struct wdt_ops { };
#if CONFIG_IS_ENABLED(WDT) -#ifndef CONFIG_WATCHDOG_TIMEOUT_MSECS -#define CONFIG_WATCHDOG_TIMEOUT_MSECS (60 * 1000) -#endif #define WATCHDOG_TIMEOUT_SECS (CONFIG_WATCHDOG_TIMEOUT_MSECS / 1000)
static inline int initr_watchdog(void)
Reviewed-by: Stefan Roese sr@denx.de
Thanks, Stefan

This function is a bit large for an inline function, and for U-Boot proper, it is called via a function pointer anyway (in board_r.c), so cannot be inlined.
It will shortly set a global variable to be used by the watchdog_reset() function in wdt-uclass.c, so this also allows making that variable local to wdt-uclass.c.
The WATCHDOG_TIMEOUT_SECS define is not used elsewhere.
Signed-off-by: Rasmus Villemoes rasmus.villemoes@prevas.dk --- drivers/watchdog/wdt-uclass.c | 33 +++++++++++++++++++++++++++++++++ include/wdt.h | 35 +---------------------------------- 2 files changed, 34 insertions(+), 34 deletions(-)
diff --git a/drivers/watchdog/wdt-uclass.c b/drivers/watchdog/wdt-uclass.c index 309a0e9c5b..fb3e247c5f 100644 --- a/drivers/watchdog/wdt-uclass.c +++ b/drivers/watchdog/wdt-uclass.c @@ -13,6 +13,39 @@
DECLARE_GLOBAL_DATA_PTR;
+#define WATCHDOG_TIMEOUT_SECS (CONFIG_WATCHDOG_TIMEOUT_MSECS / 1000) + +int initr_watchdog(void) +{ + u32 timeout = WATCHDOG_TIMEOUT_SECS; + + /* + * Init watchdog: This will call the probe function of the + * watchdog driver, enabling the use of the device + */ + if (uclass_get_device_by_seq(UCLASS_WDT, 0, + (struct udevice **)&gd->watchdog_dev)) { + debug("WDT: Not found by seq!\n"); + if (uclass_get_device(UCLASS_WDT, 0, + (struct udevice **)&gd->watchdog_dev)) { + printf("WDT: Not found!\n"); + return 0; + } + } + + if (CONFIG_IS_ENABLED(OF_CONTROL)) { + timeout = dev_read_u32_default(gd->watchdog_dev, "timeout-sec", + WATCHDOG_TIMEOUT_SECS); + } + + wdt_start(gd->watchdog_dev, timeout * 1000, 0); + gd->flags |= GD_FLG_WDT_READY; + printf("WDT: Started with%s servicing (%ds timeout)\n", + IS_ENABLED(CONFIG_WATCHDOG) ? "" : "out", timeout); + + return 0; +} + int wdt_start(struct udevice *dev, u64 timeout_ms, ulong flags) { const struct wdt_ops *ops = device_get_ops(dev); diff --git a/include/wdt.h b/include/wdt.h index e833d3a772..aea5abc768 100644 --- a/include/wdt.h +++ b/include/wdt.h @@ -106,39 +106,6 @@ struct wdt_ops { int (*expire_now)(struct udevice *dev, ulong flags); };
-#if CONFIG_IS_ENABLED(WDT) -#define WATCHDOG_TIMEOUT_SECS (CONFIG_WATCHDOG_TIMEOUT_MSECS / 1000) - -static inline int initr_watchdog(void) -{ - u32 timeout = WATCHDOG_TIMEOUT_SECS; - - /* - * Init watchdog: This will call the probe function of the - * watchdog driver, enabling the use of the device - */ - if (uclass_get_device_by_seq(UCLASS_WDT, 0, - (struct udevice **)&gd->watchdog_dev)) { - debug("WDT: Not found by seq!\n"); - if (uclass_get_device(UCLASS_WDT, 0, - (struct udevice **)&gd->watchdog_dev)) { - printf("WDT: Not found!\n"); - return 0; - } - } - - if (CONFIG_IS_ENABLED(OF_CONTROL)) { - timeout = dev_read_u32_default(gd->watchdog_dev, "timeout-sec", - WATCHDOG_TIMEOUT_SECS); - } - - wdt_start(gd->watchdog_dev, timeout * 1000, 0); - gd->flags |= GD_FLG_WDT_READY; - printf("WDT: Started with%s servicing (%ds timeout)\n", - IS_ENABLED(CONFIG_WATCHDOG) ? "" : "out", timeout); - - return 0; -} -#endif +int initr_watchdog(void);
#endif /* _WDT_H_ */

On 13.03.20 17:04, Rasmus Villemoes wrote:
This function is a bit large for an inline function, and for U-Boot proper, it is called via a function pointer anyway (in board_r.c), so cannot be inlined.
It will shortly set a global variable to be used by the watchdog_reset() function in wdt-uclass.c, so this also allows making that variable local to wdt-uclass.c.
The WATCHDOG_TIMEOUT_SECS define is not used elsewhere.
Signed-off-by: Rasmus Villemoes rasmus.villemoes@prevas.dk
drivers/watchdog/wdt-uclass.c | 33 +++++++++++++++++++++++++++++++++ include/wdt.h | 35 +---------------------------------- 2 files changed, 34 insertions(+), 34 deletions(-)
diff --git a/drivers/watchdog/wdt-uclass.c b/drivers/watchdog/wdt-uclass.c index 309a0e9c5b..fb3e247c5f 100644 --- a/drivers/watchdog/wdt-uclass.c +++ b/drivers/watchdog/wdt-uclass.c @@ -13,6 +13,39 @@
DECLARE_GLOBAL_DATA_PTR;
+#define WATCHDOG_TIMEOUT_SECS (CONFIG_WATCHDOG_TIMEOUT_MSECS / 1000)
+int initr_watchdog(void) +{
- u32 timeout = WATCHDOG_TIMEOUT_SECS;
- /*
* Init watchdog: This will call the probe function of the
* watchdog driver, enabling the use of the device
*/
- if (uclass_get_device_by_seq(UCLASS_WDT, 0,
(struct udevice **)&gd->watchdog_dev)) {
debug("WDT: Not found by seq!\n");
if (uclass_get_device(UCLASS_WDT, 0,
(struct udevice **)&gd->watchdog_dev)) {
printf("WDT: Not found!\n");
return 0;
}
- }
- if (CONFIG_IS_ENABLED(OF_CONTROL)) {
timeout = dev_read_u32_default(gd->watchdog_dev, "timeout-sec",
WATCHDOG_TIMEOUT_SECS);
- }
- wdt_start(gd->watchdog_dev, timeout * 1000, 0);
- gd->flags |= GD_FLG_WDT_READY;
- printf("WDT: Started with%s servicing (%ds timeout)\n",
IS_ENABLED(CONFIG_WATCHDOG) ? "" : "out", timeout);
- return 0;
+}
- int wdt_start(struct udevice *dev, u64 timeout_ms, ulong flags) { const struct wdt_ops *ops = device_get_ops(dev);
diff --git a/include/wdt.h b/include/wdt.h index e833d3a772..aea5abc768 100644 --- a/include/wdt.h +++ b/include/wdt.h @@ -106,39 +106,6 @@ struct wdt_ops { int (*expire_now)(struct udevice *dev, ulong flags); };
-#if CONFIG_IS_ENABLED(WDT) -#define WATCHDOG_TIMEOUT_SECS (CONFIG_WATCHDOG_TIMEOUT_MSECS / 1000)
-static inline int initr_watchdog(void) -{
- u32 timeout = WATCHDOG_TIMEOUT_SECS;
- /*
* Init watchdog: This will call the probe function of the
* watchdog driver, enabling the use of the device
*/
- if (uclass_get_device_by_seq(UCLASS_WDT, 0,
(struct udevice **)&gd->watchdog_dev)) {
debug("WDT: Not found by seq!\n");
if (uclass_get_device(UCLASS_WDT, 0,
(struct udevice **)&gd->watchdog_dev)) {
printf("WDT: Not found!\n");
return 0;
}
- }
- if (CONFIG_IS_ENABLED(OF_CONTROL)) {
timeout = dev_read_u32_default(gd->watchdog_dev, "timeout-sec",
WATCHDOG_TIMEOUT_SECS);
- }
- wdt_start(gd->watchdog_dev, timeout * 1000, 0);
- gd->flags |= GD_FLG_WDT_READY;
- printf("WDT: Started with%s servicing (%ds timeout)\n",
IS_ENABLED(CONFIG_WATCHDOG) ? "" : "out", timeout);
- return 0;
-} -#endif +int initr_watchdog(void);
#endif /* _WDT_H_ */
Reviewed-by: Stefan Roese sr@denx.de
Thanks, Stefan

Some watchdog devices, e.g. external gpio-triggered ones, must be reset more often than once per second, which means that the current rate-limiting logic in watchdog_reset() fails to keep the board alive.
gpio-wdt.txt in the linux source tree defines a "hw_margin_ms" property used to specifiy the maximum time allowed between resetting the device. Allow any watchdog device to specify such a property, and then use a reset period of one quarter of that. We keep the current default of resetting once every 1000ms.
Signed-off-by: Rasmus Villemoes rasmus.villemoes@prevas.dk --- drivers/watchdog/wdt-uclass.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/watchdog/wdt-uclass.c b/drivers/watchdog/wdt-uclass.c index fb3e247c5f..436a52fd08 100644 --- a/drivers/watchdog/wdt-uclass.c +++ b/drivers/watchdog/wdt-uclass.c @@ -15,6 +15,12 @@ DECLARE_GLOBAL_DATA_PTR;
#define WATCHDOG_TIMEOUT_SECS (CONFIG_WATCHDOG_TIMEOUT_MSECS / 1000)
+/* + * Reset every 1000ms, or however often is required as indicated by a + * hw_margin_ms property. + */ +static ulong reset_period = 1000; + int initr_watchdog(void) { u32 timeout = WATCHDOG_TIMEOUT_SECS; @@ -36,6 +42,8 @@ int initr_watchdog(void) if (CONFIG_IS_ENABLED(OF_CONTROL)) { timeout = dev_read_u32_default(gd->watchdog_dev, "timeout-sec", WATCHDOG_TIMEOUT_SECS); + reset_period = dev_read_u32_default(gd->watchdog_dev, "hw_margin_ms", + 4*reset_period)/4; }
wdt_start(gd->watchdog_dev, timeout * 1000, 0); @@ -117,7 +125,7 @@ void watchdog_reset(void) /* Do not reset the watchdog too often */ now = get_timer(0); if (time_after(now, next_reset)) { - next_reset = now + 1000; /* reset every 1000ms */ + next_reset = now + reset_period; wdt_reset(gd->watchdog_dev); } }

On 13.03.20 17:04, Rasmus Villemoes wrote:
Some watchdog devices, e.g. external gpio-triggered ones, must be reset more often than once per second, which means that the current rate-limiting logic in watchdog_reset() fails to keep the board alive.
gpio-wdt.txt in the linux source tree defines a "hw_margin_ms" property used to specifiy the maximum time allowed between resetting the device. Allow any watchdog device to specify such a property, and then use a reset period of one quarter of that. We keep the current default of resetting once every 1000ms.
Signed-off-by: Rasmus Villemoes rasmus.villemoes@prevas.dk
drivers/watchdog/wdt-uclass.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/watchdog/wdt-uclass.c b/drivers/watchdog/wdt-uclass.c index fb3e247c5f..436a52fd08 100644 --- a/drivers/watchdog/wdt-uclass.c +++ b/drivers/watchdog/wdt-uclass.c @@ -15,6 +15,12 @@ DECLARE_GLOBAL_DATA_PTR;
#define WATCHDOG_TIMEOUT_SECS (CONFIG_WATCHDOG_TIMEOUT_MSECS / 1000)
+/*
- Reset every 1000ms, or however often is required as indicated by a
- hw_margin_ms property.
- */
+static ulong reset_period = 1000;
- int initr_watchdog(void) { u32 timeout = WATCHDOG_TIMEOUT_SECS;
@@ -36,6 +42,8 @@ int initr_watchdog(void) if (CONFIG_IS_ENABLED(OF_CONTROL)) { timeout = dev_read_u32_default(gd->watchdog_dev, "timeout-sec", WATCHDOG_TIMEOUT_SECS);
reset_period = dev_read_u32_default(gd->watchdog_dev, "hw_margin_ms",
4*reset_period)/4;
Nitpicking: checkpatch will most likely complain about missing spaces here.
}
wdt_start(gd->watchdog_dev, timeout * 1000, 0); @@ -117,7 +125,7 @@ void watchdog_reset(void) /* Do not reset the watchdog too often */ now = get_timer(0); if (time_after(now, next_reset)) {
next_reset = now + 1000; /* reset every 1000ms */
wdt_reset(gd->watchdog_dev); } }next_reset = now + reset_period;
Other than that:
Reviewed-by: Stefan Roese sr@denx.de
Thanks, Stefan

On 14/03/2020 12.57, Stefan Roese wrote:
On 13.03.20 17:04, Rasmus Villemoes wrote:
int initr_watchdog(void) { u32 timeout = WATCHDOG_TIMEOUT_SECS; @@ -36,6 +42,8 @@ int initr_watchdog(void) if (CONFIG_IS_ENABLED(OF_CONTROL)) { timeout = dev_read_u32_default(gd->watchdog_dev, "timeout-sec", WATCHDOG_TIMEOUT_SECS); + reset_period = dev_read_u32_default(gd->watchdog_dev, "hw_margin_ms", + 4*reset_period)/4;
Nitpicking: checkpatch will most likely complain about missing spaces here.
Right. Want me to resend, or can you fix if/when applying?
Rasmus

On 16.03.20 16:39, Rasmus Villemoes wrote:
On 14/03/2020 12.57, Stefan Roese wrote:
On 13.03.20 17:04, Rasmus Villemoes wrote:
int initr_watchdog(void) { u32 timeout = WATCHDOG_TIMEOUT_SECS; @@ -36,6 +42,8 @@ int initr_watchdog(void) if (CONFIG_IS_ENABLED(OF_CONTROL)) { timeout = dev_read_u32_default(gd->watchdog_dev, "timeout-sec", WATCHDOG_TIMEOUT_SECS); + reset_period = dev_read_u32_default(gd->watchdog_dev, "hw_margin_ms", + 4*reset_period)/4;
Nitpicking: checkpatch will most likely complain about missing spaces here.
Right. Want me to resend, or can you fix if/when applying?
If there will be no other changes requiring a v2, then I can fix this while applying.
Thanks, Stefan

On 13.03.20 17:04, Rasmus Villemoes wrote:
Some watchdogs must be reset more often than the once-per-second ratelimit used by the generic watchdog_reset function in wdt-uclass.c. There's precedent (from the gpio-wdt driver in linux) for using a property called hw_margin_ms to let the device tree tell the driver how often the device needs resetting. So use that generically. No change in default behaviour.
On top of https://patchwork.ozlabs.org/patch/1242772/ .
Stefan, something like this?
Yes, thanks for looking into this.
That at least solves half my problems and might be useful to others as well. Then I'll have to figure out the time-stands-still problem in some other way.
If its too hard to enable interrupts in SPL for you or to provide some other means of a working get_timer() API, then we needto find another solution. You started with this weak function, which of course works. What other options are there? Adding a callback mechanism to register platform specific callback functions? Even though this might get a little bit too complicated.
Thanks, Stefan

On 14/03/2020 13.04, Stefan Roese wrote:
On 13.03.20 17:04, Rasmus Villemoes wrote:
That at least solves half my problems and might be useful to others as well. Then I'll have to figure out the time-stands-still problem in some other way.
If its too hard to enable interrupts in SPL for you or to provide some other means of a working get_timer() API, then we needto find another solution. You started with this weak function, which of course works. What other options are there? Adding a callback mechanism to register platform specific callback functions? Even though this might get a little bit too complicated.
Now that I dig a bit more into this, I seem to remember that we actually also had problems in U-Boot proper when loading a compressed kernel, so for now we're using an uncompressed kernel in our FIT images. I will have to re-investigate, but it now occurs to me that it might be due to the fact that interrupts get disabled during bootm (which makes sense, the same reason I stated previously of interrupt vectors about to be overwritten), so even in U-Boot proper, time as measured by get_timer() ceases to pass after that point, so all the WATCHDOG_RESET() calls from the inflate code effectively get ignored.
So it may be necessary to have some wdt_ratelimit_disable() hook that can be called from bootm_disable_interrupts() and e.g. some board-specific SPL code. I'll do some experiments and figure out if I do indeed need such a hook.
Rasmus

On 16/03/2020 16.52, Rasmus Villemoes wrote:
On 14/03/2020 13.04, Stefan Roese wrote:
On 13.03.20 17:04, Rasmus Villemoes wrote:
That at least solves half my problems and might be useful to others as well. Then I'll have to figure out the time-stands-still problem in some other way.
If its too hard to enable interrupts in SPL for you or to provide some other means of a working get_timer() API, then we needto find another solution. You started with this weak function, which of course works. What other options are there? Adding a callback mechanism to register platform specific callback functions? Even though this might get a little bit too complicated.
Now that I dig a bit more into this, I seem to remember that we actually also had problems in U-Boot proper when loading a compressed kernel, so for now we're using an uncompressed kernel in our FIT images. I will have to re-investigate, but it now occurs to me that it might be due to the fact that interrupts get disabled during bootm (which makes sense, the same reason I stated previously of interrupt vectors about to be overwritten), so even in U-Boot proper, time as measured by get_timer() ceases to pass after that point, so all the WATCHDOG_RESET() calls from the inflate code effectively get ignored.
So it may be necessary to have some wdt_ratelimit_disable() hook that can be called from bootm_disable_interrupts() and e.g. some board-specific SPL code. I'll do some experiments and figure out if I do indeed need such a hook.
OK, I have now had time to do some more experiments. I have enabled the timer tick in SPL, so get_timer() now "normally" works. Together with the .dts based read of the hardware margin, that makes the watchdog handling mostly work.
But, as I suspected, I do have a problem when loading a compressed kernel image - what I write above "so even in U-Boot proper, time as measured by get_timer() ceases to pass after that point, so all the WATCHDOG_RESET() calls from the inflate code effectively get ignored." is indeed the case.
So, what's the best way to proceed? Should there be a hook disabling the rate-limiting logic that bootm_disable_interrupts() can call? Or must get_timer() always return a sensible result even with interrupts disabled?
Rasmus

On 02.06.20 15:29, Rasmus Villemoes wrote:
On 16/03/2020 16.52, Rasmus Villemoes wrote:
On 14/03/2020 13.04, Stefan Roese wrote:
On 13.03.20 17:04, Rasmus Villemoes wrote:
That at least solves half my problems and might be useful to others as well. Then I'll have to figure out the time-stands-still problem in some other way.
If its too hard to enable interrupts in SPL for you or to provide some other means of a working get_timer() API, then we needto find another solution. You started with this weak function, which of course works. What other options are there? Adding a callback mechanism to register platform specific callback functions? Even though this might get a little bit too complicated.
Now that I dig a bit more into this, I seem to remember that we actually also had problems in U-Boot proper when loading a compressed kernel, so for now we're using an uncompressed kernel in our FIT images. I will have to re-investigate, but it now occurs to me that it might be due to the fact that interrupts get disabled during bootm (which makes sense, the same reason I stated previously of interrupt vectors about to be overwritten), so even in U-Boot proper, time as measured by get_timer() ceases to pass after that point, so all the WATCHDOG_RESET() calls from the inflate code effectively get ignored.
So it may be necessary to have some wdt_ratelimit_disable() hook that can be called from bootm_disable_interrupts() and e.g. some board-specific SPL code. I'll do some experiments and figure out if I do indeed need such a hook.
OK, I have now had time to do some more experiments. I have enabled the timer tick in SPL, so get_timer() now "normally" works. Together with the .dts based read of the hardware margin, that makes the watchdog handling mostly work.
But, as I suspected, I do have a problem when loading a compressed kernel image - what I write above "so even in U-Boot proper, time as measured by get_timer() ceases to pass after that point, so all the WATCHDOG_RESET() calls from the inflate code effectively get ignored." is indeed the case.
So, what's the best way to proceed? Should there be a hook disabling the rate-limiting logic that bootm_disable_interrupts() can call? Or must get_timer() always return a sensible result even with interrupts disabled?
Wouldn't it make sense to move the bootm_disable_interrupts() call to after loading and uncompressing the OS image? To right before jumping to the OS?
Thanks, Stefan

On 02/06/2020 16.53, Stefan Roese wrote:
On 02.06.20 15:29, Rasmus Villemoes wrote:
On 16/03/2020 16.52, Rasmus Villemoes wrote:
On 14/03/2020 13.04, Stefan Roese wrote:
On 13.03.20 17:04, Rasmus Villemoes wrote:
That at least solves half my problems and might be useful to others as well. Then I'll have to figure out the time-stands-still problem in some other way.
If its too hard to enable interrupts in SPL for you or to provide some other means of a working get_timer() API, then we needto find another solution. You started with this weak function, which of course works. What other options are there? Adding a callback mechanism to register platform specific callback functions? Even though this might get a little bit too complicated.
Now that I dig a bit more into this, I seem to remember that we actually also had problems in U-Boot proper when loading a compressed kernel, so for now we're using an uncompressed kernel in our FIT images. I will have to re-investigate, but it now occurs to me that it might be due to the fact that interrupts get disabled during bootm (which makes sense, the same reason I stated previously of interrupt vectors about to be overwritten), so even in U-Boot proper, time as measured by get_timer() ceases to pass after that point, so all the WATCHDOG_RESET() calls from the inflate code effectively get ignored.
So it may be necessary to have some wdt_ratelimit_disable() hook that can be called from bootm_disable_interrupts() and e.g. some board-specific SPL code. I'll do some experiments and figure out if I do indeed need such a hook.
OK, I have now had time to do some more experiments. I have enabled the timer tick in SPL, so get_timer() now "normally" works. Together with the .dts based read of the hardware margin, that makes the watchdog handling mostly work.
But, as I suspected, I do have a problem when loading a compressed kernel image - what I write above "so even in U-Boot proper, time as measured by get_timer() ceases to pass after that point, so all the WATCHDOG_RESET() calls from the inflate code effectively get ignored." is indeed the case.
So, what's the best way to proceed? Should there be a hook disabling the rate-limiting logic that bootm_disable_interrupts() can call? Or must get_timer() always return a sensible result even with interrupts disabled?
Wouldn't it make sense to move the bootm_disable_interrupts() call to after loading and uncompressing the OS image? To right before jumping to the OS?
No, because the point of disabling interrupts is that we may start writing to physical address 0 (e.g. if that's the load= address in the FIT image), which is also where the interrupt vectors reside - i.e., we're about to overwrite 0x900 (the decrementer interrupt vector), so if we don't disable interrupts, we'll crash on the very next decrementer interrupt (i.e., within one millisecond).
Rasmus

On 02/06/2020 17.38, Rasmus Villemoes wrote:
On 02/06/2020 16.53, Stefan Roese wrote:
On 02.06.20 15:29, Rasmus Villemoes wrote:
On 16/03/2020 16.52, Rasmus Villemoes wrote:
On 14/03/2020 13.04, Stefan Roese wrote:
On 13.03.20 17:04, Rasmus Villemoes wrote:
But, as I suspected, I do have a problem when loading a compressed kernel image - what I write above "so even in U-Boot proper, time as measured by get_timer() ceases to pass after that point, so all the WATCHDOG_RESET() calls from the inflate code effectively get ignored." is indeed the case.
So, what's the best way to proceed? Should there be a hook disabling the rate-limiting logic that bootm_disable_interrupts() can call? Or must get_timer() always return a sensible result even with interrupts disabled?
Wouldn't it make sense to move the bootm_disable_interrupts() call to after loading and uncompressing the OS image? To right before jumping to the OS?
No, because the point of disabling interrupts is that we may start writing to physical address 0 (e.g. if that's the load= address in the FIT image), which is also where the interrupt vectors reside - i.e., we're about to overwrite 0x900 (the decrementer interrupt vector), so if we don't disable interrupts, we'll crash on the very next decrementer interrupt (i.e., within one millisecond).
FWIW, the below very hacky patch makes get_timer() return sensible results on ppc even when interrupts are disabled, and hence ensures that the watchdog does get petted. It's obviously not meant for inclusion as is (it's prepared for being a proper config option, but for toying around it's easier to have it all in one file - also, I don't really like the name of the config knob). But it's also kind of expensive to do that do_div(), so I'm not sure I think this is even the right approach.
You previously rejected allowing the board to provide an override for the rate-limiting, and at least the "hw_margin_ms" parsing now solves part of what I wanted to use that for. What about implementing the rate-limiting instead in terms of get_ticks() (the hw_margin_ms can trivially be translated to a number of ticks at init time - there's already a usec_to_tick helper)? Are there any boards where get_ticks() doesn't return something sensible?
Rasmus
_Not_ for inclusion:
diff --git a/arch/powerpc/lib/interrupts.c b/arch/powerpc/lib/interrupts.c index 23ac5bca1e..e6b6a967ae 100644 --- a/arch/powerpc/lib/interrupts.c +++ b/arch/powerpc/lib/interrupts.c @@ -10,11 +10,14 @@ #include <common.h> #include <irq_func.h> #include <asm/processor.h> +#include <div64.h> #include <watchdog.h> #ifdef CONFIG_LED_STATUS #include <status_led.h> #endif
+#define CONFIG_GET_TIMER_IRQ 1 + #ifndef CONFIG_MPC83XX_TIMER #ifndef CONFIG_SYS_WATCHDOG_FREQ #define CONFIG_SYS_WATCHDOG_FREQ (CONFIG_SYS_HZ / 2) @@ -39,8 +42,33 @@ static __inline__ void set_dec (unsigned long val) } #endif /* !CONFIG_MPC83XX_TIMER */
+static u64 irq_off_ticks = 0; +static int interrupts_enabled = 0; +static volatile ulong timestamp = 0; + +static u32 irq_off_msecs(void) +{ + u64 t; + u32 d = get_tbclk(); + + if (!d) + return 0; + t = get_ticks() - irq_off_ticks; + t *= 1000; + do_div(t, d); + return t; +} + void enable_interrupts(void) { + ulong msr = get_msr (); + + if (IS_ENABLED(CONFIG_GET_TIMER_IRQ) && !(msr & MSR_EE)) { + /* Account for the time interrupts were off. */ + timestamp += irq_off_msecs(); + interrupts_enabled = 1; + } + set_msr (get_msr () | MSR_EE); }
@@ -50,6 +78,13 @@ int disable_interrupts(void) ulong msr = get_msr ();
set_msr (msr & ~MSR_EE); + + if (IS_ENABLED(CONFIG_GET_TIMER_IRQ) && (msr & MSR_EE)) { + /* Record when interrupts were disabled. */ + irq_off_ticks = get_ticks(); + interrupts_enabled = 0; + } + return ((msr & MSR_EE) != 0); }
@@ -61,13 +96,11 @@ int interrupt_init(void)
set_dec (decrementer_count);
- set_msr (get_msr () | MSR_EE); + enable_interrupts();
return (0); }
-static volatile ulong timestamp = 0; - void timer_interrupt(struct pt_regs *regs) { /* call cpu specific function from $(CPU)/interrupts.c */ @@ -90,6 +123,12 @@ void timer_interrupt(struct pt_regs *regs)
ulong get_timer (ulong base) { - return (timestamp - base); + ulong ts = timestamp; + + /* If called within an irq-off section, account for the time since irqs were turned off. */ + if (IS_ENABLED(CONFIG_GET_TIMER_IRQ) && !interrupts_enabled) + ts += irq_off_msecs(); + + return (ts - base); } #endif /* !CONFIG_MPC83XX_TIMER */

On 04.06.20 10:28, Rasmus Villemoes wrote:
On 02/06/2020 17.38, Rasmus Villemoes wrote:
On 02/06/2020 16.53, Stefan Roese wrote:
On 02.06.20 15:29, Rasmus Villemoes wrote:
On 16/03/2020 16.52, Rasmus Villemoes wrote:
On 14/03/2020 13.04, Stefan Roese wrote:
On 13.03.20 17:04, Rasmus Villemoes wrote:
But, as I suspected, I do have a problem when loading a compressed kernel image - what I write above "so even in U-Boot proper, time as measured by get_timer() ceases to pass after that point, so all the WATCHDOG_RESET() calls from the inflate code effectively get ignored." is indeed the case.
So, what's the best way to proceed? Should there be a hook disabling the rate-limiting logic that bootm_disable_interrupts() can call? Or must get_timer() always return a sensible result even with interrupts disabled?
Wouldn't it make sense to move the bootm_disable_interrupts() call to after loading and uncompressing the OS image? To right before jumping to the OS?
No, because the point of disabling interrupts is that we may start writing to physical address 0 (e.g. if that's the load= address in the FIT image), which is also where the interrupt vectors reside - i.e., we're about to overwrite 0x900 (the decrementer interrupt vector), so if we don't disable interrupts, we'll crash on the very next decrementer interrupt (i.e., within one millisecond).
Ah, thanks for refreshing me on this.
FWIW, the below very hacky patch makes get_timer() return sensible results on ppc even when interrupts are disabled, and hence ensures that the watchdog does get petted. It's obviously not meant for inclusion as is (it's prepared for being a proper config option, but for toying around it's easier to have it all in one file - also, I don't really like the name of the config knob). But it's also kind of expensive to do that do_div(), so I'm not sure I think this is even the right approach.
I agree. This does not look as its going into the right direction. But thanks for working on this anyways.
You previously rejected allowing the board to provide an override for the rate-limiting,
From my memory, I "just" suggested working on a different, more generic approach. But if this fails, I'm open to re-visit the options.
and at least the "hw_margin_ms" parsing now solves part of what I wanted to use that for. What about implementing the rate-limiting instead in terms of get_ticks() (the hw_margin_ms can trivially be translated to a number of ticks at init time - there's already a usec_to_tick helper)? Are there any boards where get_ticks() doesn't return something sensible?
Could you please send a new version of a patch(set) to address these issues by using the "override for the rate-limiting" or some other idea you have right now for this? I'll review it soon'ish.
Thanks, Stefan
Rasmus
_Not_ for inclusion:
diff --git a/arch/powerpc/lib/interrupts.c b/arch/powerpc/lib/interrupts.c index 23ac5bca1e..e6b6a967ae 100644 --- a/arch/powerpc/lib/interrupts.c +++ b/arch/powerpc/lib/interrupts.c @@ -10,11 +10,14 @@ #include <common.h> #include <irq_func.h> #include <asm/processor.h> +#include <div64.h> #include <watchdog.h> #ifdef CONFIG_LED_STATUS #include <status_led.h> #endif
+#define CONFIG_GET_TIMER_IRQ 1
- #ifndef CONFIG_MPC83XX_TIMER #ifndef CONFIG_SYS_WATCHDOG_FREQ #define CONFIG_SYS_WATCHDOG_FREQ (CONFIG_SYS_HZ / 2)
@@ -39,8 +42,33 @@ static __inline__ void set_dec (unsigned long val) } #endif /* !CONFIG_MPC83XX_TIMER */
+static u64 irq_off_ticks = 0; +static int interrupts_enabled = 0; +static volatile ulong timestamp = 0;
+static u32 irq_off_msecs(void) +{
- u64 t;
- u32 d = get_tbclk();
- if (!d)
return 0;
- t = get_ticks() - irq_off_ticks;
- t *= 1000;
- do_div(t, d);
- return t;
+}
- void enable_interrupts(void) {
- ulong msr = get_msr ();
- if (IS_ENABLED(CONFIG_GET_TIMER_IRQ) && !(msr & MSR_EE)) {
/* Account for the time interrupts were off. */
timestamp += irq_off_msecs();
interrupts_enabled = 1;
- }
- set_msr (get_msr () | MSR_EE); }
@@ -50,6 +78,13 @@ int disable_interrupts(void) ulong msr = get_msr ();
set_msr (msr & ~MSR_EE);
- if (IS_ENABLED(CONFIG_GET_TIMER_IRQ) && (msr & MSR_EE)) {
/* Record when interrupts were disabled. */
irq_off_ticks = get_ticks();
interrupts_enabled = 0;
- }
- return ((msr & MSR_EE) != 0); }
@@ -61,13 +96,11 @@ int interrupt_init(void)
set_dec (decrementer_count);
- set_msr (get_msr () | MSR_EE);
enable_interrupts();
return (0); }
-static volatile ulong timestamp = 0;
- void timer_interrupt(struct pt_regs *regs) { /* call cpu specific function from $(CPU)/interrupts.c */
@@ -90,6 +123,12 @@ void timer_interrupt(struct pt_regs *regs)
ulong get_timer (ulong base) {
- return (timestamp - base);
- ulong ts = timestamp;
- /* If called within an irq-off section, account for the time since
irqs were turned off. */
- if (IS_ENABLED(CONFIG_GET_TIMER_IRQ) && !interrupts_enabled)
ts += irq_off_msecs();
- return (ts - base); } #endif /* !CONFIG_MPC83XX_TIMER */
Viele Grüße, Stefan

On 13.03.20 17:04, Rasmus Villemoes wrote:
Some watchdogs must be reset more often than the once-per-second ratelimit used by the generic watchdog_reset function in wdt-uclass.c. There's precedent (from the gpio-wdt driver in linux) for using a property called hw_margin_ms to let the device tree tell the driver how often the device needs resetting. So use that generically. No change in default behaviour.
On top of https://patchwork.ozlabs.org/patch/1242772/ .
Stefan, something like this? That at least solves half my problems and might be useful to others as well. Then I'll have to figure out the time-stands-still problem in some other way.
Rasmus Villemoes (3): watchdog: remove stale ifndef CONFIG_WATCHDOG_TIMEOUT_MSECS from wdt.h watchdog: move initr_watchdog() to wdt-uclass.c watchdog: honour hw_margin_ms DT property
drivers/watchdog/wdt-uclass.c | 43 ++++++++++++++++++++++++++++++++++- include/wdt.h | 38 +------------------------------ 2 files changed, 43 insertions(+), 38 deletions(-)
Series applied to u-boot-marvell/master
Thanks, Stefan
participants (2)
-
Rasmus Villemoes
-
Stefan Roese