Re: [U-Boot] [PATCH] watchdog: omap_wdt: improve watchdog reset path

On Sat, 2018-03-03 at 12:00 +0000, u-boot-request@lists.denx.de wrote:
On 1 March 2018 at 16:33, Tom Rini trini@konsulko.com wrote:
On Thu, Mar 01, 2018 at 04:10:44PM +0200, Ruslan Bilovol wrote:
Hi Lukasz,
On Thu, Mar 1, 2018 at 12:53 PM, Lukasz Majewski lukma@denx.de wrote:
Hi Ruslan,
Remove busy looping during watchdog reset. Each polling of W_PEND_WTGR bit ("finish posted write") after watchdog reset takes 120-140us on BeagleBone Black board. Current U-Boot code has watchdog resets in random places and often there is situation when watchdog is reset few times in a row in nested functions. This adds extra delays and slows the whole system.
Instead of polling W_PEND_WTGR bit, we skip watchdog reset if the bit is set. Anyway, watchdog is in the middle of reset *right now*, so we can just return.
It seems like similar problem may be in the Linux kernel driver: v4.16-rc1/source/drivers/watchdog/omap_wdt.c#L74
Linux driver also waits for the write.
Right, Linux driver has similar code but it doesn't affect performance. This is because of watchdog usage in Linux, it is enabled and reset by userspace. This is quite rare event. Moreover, since Linux has interrupts enabled and has scheduling, such busy loops in the omap driver are not very different to just mdelay() usage. The system can handle interrupts, and can even do preemption if PREEMPT is enabled. So use of such busy loops in that particular case shouldn't cause any noticeable performance issues.
True. But, can you also submit a patch to the kernel side (and CC me) ? That's far more likely to get the attention of the engineers that might know of corner cases with the various SoC families where we might need to keep doing what we're doing or other possible problems. Thanks!
Some additional input from my side.
My previous measurements were wrong, due to usage of slow USB hub port on my laptop. Using another USB port I have next transfer speed for "fastboot flash" operation: - without patch: 2.84 MiB/sec - with patch: 7.81 MiB/sec
So it gives us about 2.75 times improvement, as stated by Ruslan in his commit message. Also, I tested that WDT continues to work correctly, and it does (tried several use-cases, and also debug patch with infinite loop). So again:
Tested-by: Sam Protsenko semen.protsenko@linaro.org
Also:
Reviewed-by: Sam Protsenko semen.protsenko@linaro.org
I checked the code and I can say that there shouldn't be any concerns about WDT functionality with this patch. By the way, in my opinion, for kernel this patch doesn't make much sense, and there might be even confusion in case we send it... If there any concerns about it, we can invite related engineers in this discussion, asking them to review this.
Overall, I think this patch is "must have" in U-Boot, as it gives us dramatic improvement without any drawbacks, especially for AM335x boards. It's probably the best approach for WDT reset procedure until interrupts are enabled for ARM architecture (if we even consider it).
Tested patch with AM335x on custom u-boot-2013.10 based branch. Patch cleanly applied, obviously not much in this wdt driver has changed since then.
Fastboot flash Before patch: 2.89MiB/s After patch: 8.68MiB/s
Tested-by: Jonathan Cormier jcormier@criticallink.com Reviewed-by: Jonathan Cormier jcormier@criticallink.com

On Mon, Mar 5, 2018 at 6:00 PM, Jonathan Cormer jcormier@criticallink.com wrote:
On Sat, 2018-03-03 at 12:00 +0000, u-boot-request@lists.denx.de wrote:
On 1 March 2018 at 16:33, Tom Rini trini@konsulko.com wrote:
On Thu, Mar 01, 2018 at 04:10:44PM +0200, Ruslan Bilovol wrote:
Hi Lukasz,
On Thu, Mar 1, 2018 at 12:53 PM, Lukasz Majewski lukma@denx.de wrote:
Hi Ruslan,
Remove busy looping during watchdog reset. Each polling of W_PEND_WTGR bit ("finish posted write") after watchdog reset takes 120-140us on BeagleBone Black board. Current U-Boot code has watchdog resets in random places and often there is situation when watchdog is reset few times in a row in nested functions. This adds extra delays and slows the whole system.
Instead of polling W_PEND_WTGR bit, we skip watchdog reset if the bit is set. Anyway, watchdog is in the middle of reset *right now*, so we can just return.
It seems like similar problem may be in the Linux kernel driver: v4.16-rc1/source/drivers/watchdog/omap_wdt.c#L74
Linux driver also waits for the write.
Right, Linux driver has similar code but it doesn't affect performance. This is because of watchdog usage in Linux, it is enabled and reset by userspace. This is quite rare event. Moreover, since Linux has interrupts enabled and has scheduling, such busy loops in the omap driver are not very different to just mdelay() usage. The system can handle interrupts, and can even do preemption if PREEMPT is enabled. So use of such busy loops in that particular case shouldn't cause any noticeable performance issues.
True. But, can you also submit a patch to the kernel side (and CC me) ? That's far more likely to get the attention of the engineers that might know of corner cases with the various SoC families where we might need to keep doing what we're doing or other possible problems. Thanks!
Some additional input from my side.
My previous measurements were wrong, due to usage of slow USB hub port on my laptop. Using another USB port I have next transfer speed for "fastboot flash" operation:
- without patch: 2.84 MiB/sec
- with patch: 7.81 MiB/sec
So it gives us about 2.75 times improvement, as stated by Ruslan in his commit message. Also, I tested that WDT continues to work correctly, and it does (tried several use-cases, and also debug patch with infinite loop). So again:
Tested-by: Sam Protsenko <semen.protsenko@linaro.org>
Also:
Reviewed-by: Sam Protsenko <semen.protsenko@linaro.org>
I checked the code and I can say that there shouldn't be any concerns about WDT functionality with this patch. By the way, in my opinion, for kernel this patch doesn't make much sense, and there might be even confusion in case we send it... If there any concerns about it, we can invite related engineers in this discussion, asking them to review this.
Overall, I think this patch is "must have" in U-Boot, as it gives us dramatic improvement without any drawbacks, especially for AM335x boards. It's probably the best approach for WDT reset procedure until interrupts are enabled for ARM architecture (if we even consider it).
Tested patch with AM335x on custom u-boot-2013.10 based branch. Patch cleanly applied, obviously not much in this wdt driver has changed since then.
Fastboot flash Before patch: 2.89MiB/s After patch: 8.68MiB/s
Cool, that's even a little bit better than I've got on my setup! Thanks for testing.
Regards, Ruslan

Because someone was wondering what the usb speed was in linux. My kernel is based on 3.2. Note I have USB DMA disabled so that doesn't help the speed. Not sure if u-boot is using DMA but I'd guess not.
ADB over usb $ time adb -s 16019491 push userdata.img /mnt/obb/ userdata.img: 1 file pushed. 3.5 MB/s (45007716 bytes in 12.391s)
ADB over rndis $ time adb -s 192.168.2.1 push userdata.img /mnt/obb/ userdata.img: 1 file pushed. 2.9 MB/s (45007716 bytes in 14.900s)
Netcat over rndis 1|root@android:/ # time busybox nc 192.168.2.2 12345 > /mnt/obb/userdata.img 0m11.12s real 0m0.11s user 0m8.30s system ~3.9MB/s
On Mon, Mar 5, 2018 at 3:45 PM, Ruslan Bilovol ruslan.bilovol@gmail.com wrote:
On Mon, Mar 5, 2018 at 6:00 PM, Jonathan Cormer jcormier@criticallink.com wrote:
On Sat, 2018-03-03 at 12:00 +0000, u-boot-request@lists.denx.de wrote:
On 1 March 2018 at 16:33, Tom Rini trini@konsulko.com wrote:
On Thu, Mar 01, 2018 at 04:10:44PM +0200, Ruslan Bilovol wrote:
Hi Lukasz,
On Thu, Mar 1, 2018 at 12:53 PM, Lukasz Majewski lukma@denx.de wrote:
Hi Ruslan,
> > Remove busy looping during watchdog reset. > Each polling of W_PEND_WTGR bit ("finish posted > write") after watchdog reset takes 120-140us > on BeagleBone Black board. Current U-Boot code > has watchdog resets in random places and often > there is situation when watchdog is reset > few times in a row in nested functions. > This adds extra delays and slows the whole system. > > Instead of polling W_PEND_WTGR bit, we skip > watchdog reset if the bit is set. Anyway, watchdog > is in the middle of reset *right now*, so we can > just return. It seems like similar problem may be in the Linux kernel driver: v4.16-rc1/source/drivers/watchdog/omap_wdt.c#L74
Linux driver also waits for the write.
Right, Linux driver has similar code but it doesn't affect performance. This is because of watchdog usage in Linux, it is enabled and reset by userspace. This is quite rare event. Moreover, since Linux has interrupts enabled and has scheduling, such busy loops in the omap driver are not very different to just mdelay() usage. The system can handle interrupts, and can even do preemption if PREEMPT is enabled. So use of such busy loops in that particular case shouldn't cause any noticeable performance issues.
True. But, can you also submit a patch to the kernel side (and CC me) ? That's far more likely to get the attention of the engineers that might know of corner cases with the various SoC families where we might need to keep doing what we're doing or other possible problems. Thanks!
Some additional input from my side.
My previous measurements were wrong, due to usage of slow USB hub port on my laptop. Using another USB port I have next transfer speed for "fastboot flash" operation:
- without patch: 2.84 MiB/sec
- with patch: 7.81 MiB/sec
So it gives us about 2.75 times improvement, as stated by Ruslan in his commit message. Also, I tested that WDT continues to work correctly, and it does (tried several use-cases, and also debug patch with infinite loop). So again:
Tested-by: Sam Protsenko <semen.protsenko@linaro.org>
Also:
Reviewed-by: Sam Protsenko <semen.protsenko@linaro.org>
I checked the code and I can say that there shouldn't be any concerns about WDT functionality with this patch. By the way, in my opinion, for kernel this patch doesn't make much sense, and there might be even confusion in case we send it... If there any concerns about it, we can invite related engineers in this discussion, asking them to review this.
Overall, I think this patch is "must have" in U-Boot, as it gives us dramatic improvement without any drawbacks, especially for AM335x boards. It's probably the best approach for WDT reset procedure until interrupts are enabled for ARM architecture (if we even consider it).
Tested patch with AM335x on custom u-boot-2013.10 based branch. Patch cleanly applied, obviously not much in this wdt driver has changed since then.
Fastboot flash Before patch: 2.89MiB/s After patch: 8.68MiB/s
Cool, that's even a little bit better than I've got on my setup! Thanks for testing.
Regards, Ruslan
participants (3)
-
Jon Cormier
-
Jonathan Cormer
-
Ruslan Bilovol