[U-Boot] [RFC PATCH] lib/timer: initialize timebase_l/timebase_h

On systems using the generic timer routines defined in lib/time.c we use timebase_l and timebase_h fields from the gd to detect wraparounds in our tick counter. The tick calculcation algorithm silently assumes that a long is only 32 bits, which leads to wrong results when timebase_h is not 0 on 64-bit systems. As one possible fix lets initialize timebase_h (and timebase_l) to 0, so on 64-bit systems timebase_h will never(TM) be bigger than 0 and thus cannot spoil timebase_l by being ORed into it.
This fixes occasional timeout issues on the Pine64 (and possibly other ARM64 systems).
Signed-off-by: Andre Przywara andre.przywara@arm.com --- Hi,
I am bit puzzled what the proper fix is, this one is the easiest I could come up with. Not sure if the gd should be zeroed normally (and it's just broken on sunxi/arm64 because of some linker issues) or whether we really forgot to initialize those fields and just got away with it. Other fixes I tried are implementing get_ticks() in armv8/generic_timer.c or fixing the generic get_ticks() implementation by using "#if BITS_PER_LONG < 64" in the function to shortcut the code on 64-bit systems. Please have your say on what's best.
Cheers, Andre.
P.S. "Never" above seems to be 195 years assuming a worst-case 3GHz arch timer.
lib/time.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/lib/time.c b/lib/time.c index f37150f..b77d134 100644 --- a/lib/time.c +++ b/lib/time.c @@ -109,6 +109,8 @@ static uint64_t notrace tick_to_time(uint64_t tick)
int __weak timer_init(void) { + gd->timebase_l = 0; + gd->timebase_h = 0; return 0; }

On 25/10/2016 02:51, Andre Przywara wrote:
On systems using the generic timer routines defined in lib/time.c we use timebase_l and timebase_h fields from the gd to detect wraparounds in our tick counter. The tick calculcation algorithm silently assumes that a long is only 32 bits, which leads to wrong results when timebase_h is not 0 on 64-bit systems. As one possible fix lets initialize timebase_h (and timebase_l) to 0, so on 64-bit systems timebase_h will never(TM) be bigger than 0 and thus cannot spoil timebase_l by being ORed into it.
This fixes occasional timeout issues on the Pine64 (and possibly other ARM64 systems).
Signed-off-by: Andre Przywara andre.przywara@arm.com
Hi,
I am bit puzzled what the proper fix is, this one is the easiest I could come up with. Not sure if the gd should be zeroed normally (and it's just broken on sunxi/arm64 because of some linker issues) or whether we really forgot to initialize those fields and just got away with it.
The gd clearing happens via crt0_64.S -> board_init_f_init_reserve(). There we should have fully cleared all memory associated with global data.
I can't see anything obviously wrong in that code. Could you try to dump gd if the timer offsets are != 0 on init? Maybe we can conclude something from the data in it.
Alex

On 25/10/16 08:52, Alexander Graf wrote:
Hi Alex,
thanks for looking at this!
On 25/10/2016 02:51, Andre Przywara wrote:
On systems using the generic timer routines defined in lib/time.c we use timebase_l and timebase_h fields from the gd to detect wraparounds in our tick counter. The tick calculcation algorithm silently assumes that a long is only 32 bits, which leads to wrong results when timebase_h is not 0 on 64-bit systems. As one possible fix lets initialize timebase_h (and timebase_l) to 0, so on 64-bit systems timebase_h will never(TM) be bigger than 0 and thus cannot spoil timebase_l by being ORed into it.
This fixes occasional timeout issues on the Pine64 (and possibly other ARM64 systems).
Well, not really (this fix isn't sufficient), but read on ...
Signed-off-by: Andre Przywara andre.przywara@arm.com
Hi,
I am bit puzzled what the proper fix is, this one is the easiest I could come up with. Not sure if the gd should be zeroed normally (and it's just broken on sunxi/arm64 because of some linker issues) or whether we really forgot to initialize those fields and just got away with it.
The gd clearing happens via crt0_64.S -> board_init_f_init_reserve(). There we should have fully cleared all memory associated with global data.
Ah,thanks for pointing to that. I was a bit clueless where to start looking for it - "git grep gd" is obviously not a good idea ;-)
I can't see anything obviously wrong in that code. Could you try to dump gd if the timer offsets are != 0 on init? Maybe we can conclude something from the data in it.
So I agree that this code looks sane and indeed in all my dumps it looks like the initialization works fine. I did some more debugging and learned that the increased timebase_h comes from detected overflows: in fact some readings are really lower than previous ones: ... MMC: SUNXI SD/MMC: 0 get_ticks() overflow: now: 118046720, tbl: 118063103, tbh: 0 get_ticks() overflow: now: 118439936, tbl: 118456318, tbh: 1 get_ticks() overflow: now: 118571008, tbl: 118587391, tbh: 2 get_ticks() overflow: now: 118734848, tbl: 118751230, tbh: 3 get_ticks() overflow: now: 119422976, tbl: 119439358, tbh: 4 get_ticks() overflow: now: 119783424, tbl: 119799807, tbh: 5 get_ticks() overflow: now: 120045568, tbl: 120061950, tbh: 6 get_ticks() overflow: now: 120406016, tbl: 120422398, tbh: 7 *** Warning - bad CRC, using default environment ...... Not sure how this actually happens - I am not aware of any such severe hardware errata in the A53 r0p4 or the timer, also I think that would have bitten us elsewhere already. As ATF keeps the secondaries in WFI, U-Boot only runs on CPU0 (confirmed by MPIDR reads). Also U-Boot reads the physical counter, so a bogus CNTVOFF can also not be the culprit.
So I can fix this annoying issue by using one of the other proposed fixes (handling timebase_h only if BITS_PER_LONG < 64 or defining get_ticks in armv8/generic_timer.c), but it would still be interesting to find the real root cause.
Thanks, Andre.

On 10/26/2016 02:07 AM, André Przywara wrote:
On 25/10/16 08:52, Alexander Graf wrote:
Hi Alex,
thanks for looking at this!
On 25/10/2016 02:51, Andre Przywara wrote:
On systems using the generic timer routines defined in lib/time.c we use timebase_l and timebase_h fields from the gd to detect wraparounds in our tick counter. The tick calculcation algorithm silently assumes that a long is only 32 bits, which leads to wrong results when timebase_h is not 0 on 64-bit systems. As one possible fix lets initialize timebase_h (and timebase_l) to 0, so on 64-bit systems timebase_h will never(TM) be bigger than 0 and thus cannot spoil timebase_l by being ORed into it.
This fixes occasional timeout issues on the Pine64 (and possibly other ARM64 systems).
Well, not really (this fix isn't sufficient), but read on ...
Signed-off-by: Andre Przywara andre.przywara@arm.com
Hi,
I am bit puzzled what the proper fix is, this one is the easiest I could come up with. Not sure if the gd should be zeroed normally (and it's just broken on sunxi/arm64 because of some linker issues) or whether we really forgot to initialize those fields and just got away with it.
The gd clearing happens via crt0_64.S -> board_init_f_init_reserve(). There we should have fully cleared all memory associated with global data.
Ah,thanks for pointing to that. I was a bit clueless where to start looking for it - "git grep gd" is obviously not a good idea ;-)
I can't see anything obviously wrong in that code. Could you try to dump gd if the timer offsets are != 0 on init? Maybe we can conclude something from the data in it.
So I agree that this code looks sane and indeed in all my dumps it looks like the initialization works fine. I did some more debugging and learned that the increased timebase_h comes from detected overflows: in fact some readings are really lower than previous ones: ... MMC: SUNXI SD/MMC: 0 get_ticks() overflow: now: 118046720, tbl: 118063103, tbh: 0 get_ticks() overflow: now: 118439936, tbl: 118456318, tbh: 1 get_ticks() overflow: now: 118571008, tbl: 118587391, tbh: 2 get_ticks() overflow: now: 118734848, tbl: 118751230, tbh: 3 get_ticks() overflow: now: 119422976, tbl: 119439358, tbh: 4 get_ticks() overflow: now: 119783424, tbl: 119799807, tbh: 5 get_ticks() overflow: now: 120045568, tbl: 120061950, tbh: 6 get_ticks() overflow: now: 120406016, tbl: 120422398, tbh: 7 *** Warning - bad CRC, using default environment ...... Not sure how this actually happens - I am not aware of any such severe hardware errata in the A53 r0p4 or the timer, also I think that would have bitten us elsewhere already. As ATF keeps the secondaries in WFI, U-Boot only runs on CPU0 (confirmed by MPIDR reads). Also U-Boot reads the physical counter, so a bogus CNTVOFF can also not be the culprit.
So I can fix this annoying issue by using one of the other proposed fixes (handling timebase_h only if BITS_PER_LONG < 64 or defining get_ticks in armv8/generic_timer.c), but it would still be interesting to find the real root cause.
Can you try and ask around? If this bites us in U-Boot, there's a good chance Linux timers should be broken too, no?
I remember that NXP had similar problems with the timer:
https://patchwork.kernel.org/patch/9344727/
Alex

Hi,
On 26/10/16 08:14, Alexander Graf wrote:
On 10/26/2016 02:07 AM, André Przywara wrote:
On 25/10/16 08:52, Alexander Graf wrote:
Hi Alex,
thanks for looking at this!
On 25/10/2016 02:51, Andre Przywara wrote:
On systems using the generic timer routines defined in lib/time.c we use timebase_l and timebase_h fields from the gd to detect wraparounds in our tick counter. The tick calculcation algorithm silently assumes that a long is only 32 bits, which leads to wrong results when timebase_h is not 0 on 64-bit systems. As one possible fix lets initialize timebase_h (and timebase_l) to 0, so on 64-bit systems timebase_h will never(TM) be bigger than 0 and thus cannot spoil timebase_l by being ORed into it.
This fixes occasional timeout issues on the Pine64 (and possibly other ARM64 systems).
Well, not really (this fix isn't sufficient), but read on ...
Signed-off-by: Andre Przywara andre.przywara@arm.com
Hi,
I am bit puzzled what the proper fix is, this one is the easiest I could come up with. Not sure if the gd should be zeroed normally (and it's just broken on sunxi/arm64 because of some linker issues) or whether we really forgot to initialize those fields and just got away with it.
The gd clearing happens via crt0_64.S -> board_init_f_init_reserve(). There we should have fully cleared all memory associated with global data.
Ah,thanks for pointing to that. I was a bit clueless where to start looking for it - "git grep gd" is obviously not a good idea ;-)
I can't see anything obviously wrong in that code. Could you try to dump gd if the timer offsets are != 0 on init? Maybe we can conclude something from the data in it.
So I agree that this code looks sane and indeed in all my dumps it looks like the initialization works fine. I did some more debugging and learned that the increased timebase_h comes from detected overflows: in fact some readings are really lower than previous ones: ... MMC: SUNXI SD/MMC: 0 get_ticks() overflow: now: 118046720, tbl: 118063103, tbh: 0 get_ticks() overflow: now: 118439936, tbl: 118456318, tbh: 1 get_ticks() overflow: now: 118571008, tbl: 118587391, tbh: 2 get_ticks() overflow: now: 118734848, tbl: 118751230, tbh: 3 get_ticks() overflow: now: 119422976, tbl: 119439358, tbh: 4 get_ticks() overflow: now: 119783424, tbl: 119799807, tbh: 5 get_ticks() overflow: now: 120045568, tbl: 120061950, tbh: 6 get_ticks() overflow: now: 120406016, tbl: 120422398, tbh: 7 *** Warning - bad CRC, using default environment ...... Not sure how this actually happens - I am not aware of any such severe hardware errata in the A53 r0p4 or the timer, also I think that would have bitten us elsewhere already. As ATF keeps the secondaries in WFI, U-Boot only runs on CPU0 (confirmed by MPIDR reads). Also U-Boot reads the physical counter, so a bogus CNTVOFF can also not be the culprit.
So I can fix this annoying issue by using one of the other proposed fixes (handling timebase_h only if BITS_PER_LONG < 64 or defining get_ticks in armv8/generic_timer.c), but it would still be interesting to find the real root cause.
Can you try and ask around? If this bites us in U-Boot, there's a good chance Linux timers should be broken too, no?
I remember that NXP had similar problems with the timer:
So I did some more experiments, and indeed it looks very much like a silicon bug in the A64. Philipp found this in the BSP kernel: https://github.com/longsleep/linux-pine64/blob/pine64-hacks-1.2/drivers/cloc...
So I have a direct test that reads both the CNTVCNT register and CLOCK_MONOTONIC_RAW back to back for a few million times and looks for decreasing values. On the Freescale box it triggers on every boot, on A64 only on certain boots.
But the error pattern looks similar (though not identical), on the A64 it is: CNTVCNT before: xxxx7fff, CNTVCNT after: xxxx4000 CNTVCNT before: xxxxffff, CNTVCNT after: xxxxc000 On the Freescale box I see different lengths of bogus lower bits, on the A64 it's always the lower 15 bits.
These are at least the cases that I could spot (because the later read returns a smaller value), there may be more corruptions that are harder to find.
Setting the new Freescale erratum DT property in the arch timer node fixes at least the Linux interface on the A64, as expected. But as stated above this doesn't happen on every boot, so there might be the chance that it's a missing initialization somewhere.
Since clock_gettime() gets disabled in the VDSO with the workaround, I'd like to do more research to get an idea why it sometimes works and sometimes not.
But unless we find another fix, I'll send a patch to enable the FSL workaround (both in U-Boot and Linux).
Cheers, Andre.
participants (3)
-
Alexander Graf
-
Andre Przywara
-
André Przywara