
On 18/06/2024 23.03, Tim Harvey wrote:
> On Tue, Jun 18, 2024 at 7:32 AM Tom Rini <trini@konsulko.com> wrote:
>
> Stefan and Tom,
>
> I'm seeing CI issues here even with 5000us [1]:
>
>   host bind 0 /tmp/sandbox/persistent-cyclic function wdt-gpio-level took too long: 5368us vs 5000us max

Yes, 5ms is way too little when you're not the only thing running on the cpu, which is why I went with 100ms.
Random thoughts and questions:
(1) Do we have any way to programmatically grab all the logs from azure CI, so we can get some kind of objective statistics on the number after "took too long:"? Clicking through the web interface and randomly searching is too painful.
It would also be helpful to know what percentage of CI runs have failed due to that, versus due to some genuine error.
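To make the statistics part of (1) concrete, here is a rough sketch of what could be run over the logs once they have been downloaded as plain text. It does not solve the fetching problem. The message format is assumed from the single line quoted above, and parse_overrun(), collect_overruns() and struct overrun_stats are names made up for this illustration, not anything in the U-Boot tree.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: summary of all "took too long" messages seen. */
struct overrun_stats {
	unsigned long count;		/* number of matching lines */
	unsigned long max_us;		/* worst reported run time */
	unsigned long long sum_us;	/* sum of run times, for computing a mean */
};

/*
 * Returns 1 and fills in *took_us and *max_us if the line contains a
 * message of the assumed form "took too long: <N>us vs <M>us max",
 * and 0 otherwise.
 */
int parse_overrun(const char *line, unsigned long *took_us,
		  unsigned long *max_us)
{
	static const char marker[] = "took too long: ";
	const char *p = strstr(line, marker);

	if (!p)
		return 0;

	return sscanf(p + strlen(marker), "%luus vs %luus max",
		      took_us, max_us) == 2;
}

/* Read a log from 'in' line by line and accumulate the statistics. */
void collect_overruns(FILE *in, struct overrun_stats *st)
{
	char line[1024];
	unsigned long took, limit;

	memset(st, 0, sizeof(*st));
	while (fgets(line, sizeof(line), in)) {
		if (!parse_overrun(line, &took, &limit))
			continue;
		st->count++;
		st->sum_us += took;
		if (took > st->max_us)
			st->max_us = took;
	}
}
```

Calling collect_overruns(stdin, &st) on concatenated logs would give the count, the worst case, and the mean as sum_us / count when count is non-zero. Telling those failures apart from genuine errors would still need per-run information that this sketch does not look at.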
(2) I considered a patch that just added a
default $something big if SANDBOX
to config CYCLIC_MAX_CPU_TIME_US, but since the problem also hit qemu, I dropped that. But, if my patch is too ugly (and I might tend to think that myself...), perhaps at least this would be an added improvement over the generic bump to 5000us.
(3) I also thought that perhaps for sandbox, we should simply measure the time using clock_gettime(CLOCK_PROCESS_CPUTIME_ID), instead of wallclock time. But it's a little ugly to implement since the "now" variable is used both to decide if it's time to run the callback and as the starting point for measuring cpu time, and we probably still want the "is it time" check to be based on wallclock time and not on however much cpu time the u-boot process has been given. Or maybe we don't, and CLOCK_PROCESS_CPUTIME_ID would simply be a better backend for os_get_nsec(). Sure, time in the sandbox would progress slower than on the host, but does that actually matter?
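To illustrate the difference (3) is about, here is a standalone sketch, not a patch against the sandbox code, that times a callback against both clocks. clock_gettime(), CLOCK_MONOTONIC and CLOCK_PROCESS_CPUTIME_ID are standard POSIX; clock_nsec(), time_callback(), struct elapsed and sleep_20ms() are names invented for the example, with the sleep standing in for the process being descheduled on a busy CI host.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() and nanosleep() */
#endif
#include <stdint.h>
#include <time.h>

/* Read the given POSIX clock in nanoseconds; returns 0 on failure. */
uint64_t clock_nsec(clockid_t id)
{
	struct timespec ts;

	if (clock_gettime(id, &ts))
		return 0;

	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical result type: how long a callback took on each clock. */
struct elapsed {
	uint64_t wall_ns;	/* CLOCK_MONOTONIC delta */
	uint64_t cpu_ns;	/* CLOCK_PROCESS_CPUTIME_ID delta */
};

/* Run fn(ctx) once and measure it against both clocks. */
struct elapsed time_callback(void (*fn)(void *), void *ctx)
{
	struct elapsed e;
	uint64_t wall0 = clock_nsec(CLOCK_MONOTONIC);
	uint64_t cpu0 = clock_nsec(CLOCK_PROCESS_CPUTIME_ID);

	fn(ctx);

	e.cpu_ns = clock_nsec(CLOCK_PROCESS_CPUTIME_ID) - cpu0;
	e.wall_ns = clock_nsec(CLOCK_MONOTONIC) - wall0;

	return e;
}

/* Stand-in for being descheduled: sleeps 20ms without burning cpu. */
void sleep_20ms(void *ctx)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = 20 * 1000 * 1000 };

	(void)ctx;
	nanosleep(&ts, NULL);
}
```

With time_callback(sleep_20ms, NULL), the wallclock delta covers the whole sleep while the cpu-time delta stays small, which is the behaviour one would want from the "took too long" check on a loaded host. Whether the "is it time" decision should also move to cpu time is the open question above; the sketch does not take a position on that.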
(4) Btw., what kind of clock tick do we even get when run under qemu? I don't have much experience with qemu, but from quick googling it seems that -icount would be interesting. Also see https://github.com/zephyrproject-rtos/zephyr/issues/14173 . From quick reading it seems there were some issues back in 2019, but that today it mostly works for them, except some SMP issues (that are certainly not relevant to U-Boot).
The current situation is a frustrating waste of developer and maintainer time and CI resources.
Rasmus