Re: [U-Boot] [RFC][Timer API] Revised Specification - Implementation details

27 May 2011

      On 5/26/2011 11:54 PM, Graeme Russ wrote:
...
On Fri, May 27, 2011 at 4:33 PM, J. William Campbell
jwilliamcampbell@comcast.net  wrote:
...
On 5/26/2011 9:33 PM, Graeme Russ wrote:
...
Hi Bill,
<snip>
[massive snip]
OK, you have my ears pricked - Can you give me code samples for:

get_ticks()
sync_timbase() (no need to implement the whole lot if that is too
much effort right now)
timer_isr()

that handle the following hardware tick counter scenarios:
a) 64-bit up counter
b) 64-bit down counter
Hi Graeme,
...
c) 32-bit up counter, wraps at 65000
Do you mean 32 bits or 16 bits? doesn't make much difference, but just 
checking.
...
d) 16-bit microsecond up counter 0-999 which wraps and triggers a 16-bit
    millisecond up counter. Reading milliseconds latched microseconds and
    clears milliseconds (look in arch/x86/cpu/sc520/timer.c for example)
e) 24-bit counter occupying bits 2-25 of a 32-bit word (just to be
    difficult)
f) Any other option anyone cares to throw ;)
All of these options must be covered using:

Minimal global data (we would like it to work before relocation, but
not mandatory - GD footprint would be nice)
All use the same sync_timebase function
Demonstrate using an ISR NOT synced to the hardware tick counter and
an ISR that is
Parameters to get_ticks() and sync_timer() are permitted, but not for
timer_isr() (naturally)

OK! Once again, I accept the challenge. One Caveat. My real work 
has been sliding due to the time I have been spending on this. I am 
flying from San Francisco to Sydney tonight , (to work, not to play),  
so I will be off-grid for 14 hours+. You will not get this code for a 
few days, like probably 3 days. I have looked at the requirements, and I 
see no real problems that I don't know how to solve.
...
...
I don't' see any reason to push this down to a lower level. It is just one
more thing to get messed up across implementations.
Agreed
...
...
...
detection in the non-interrupt case to sync_timebase as well.
Sync_timebase
can also invert the down-counting counters, removing that from get_ticks.
The wrap detection code can be #ifdef out if one is using interrupts and
Urghh - Why are you adding unnecessary ugliness - #ifdef in the middle of
code is usually a sign you are doing something wrong
As I said, this is an optional optimization. I do not agree that an #ifdef
in the middle of code indicates you have a bad design. Lots and Lots of
ifdefs certainly indicates a bad design. An ifdef to eliminate code if some
option is not selected is hardly such a strange thing, especially only a
single #ifdef. However, feel free to not have it if you so desire.
OK, I'll let this slide for the moment - please include in above example
Will Do.
...
...
...
...
offended by it's presence. Thanks for pointing this out and compelling me
to
reduce the number of cases! Making get_ticks more lightweight is a good
idea
in my opinion.
...
Lets say you have a platform with a 32-bit tick counter running at a
reasonably long rollover time so you decide not to use interrupts. Then
you create a new platform with the same tick counter, but it runs much
faster and you realise you need to implement the interrupt routine to
make get_timer() work for long enough periods - Fine, you add an ISR
for the new platform that calls sync_timebase - No other changes are
required.
The last thing we want is for the 64-bit tick counter to be conceptually
different across platforms
I just realised - the ISR _does not need to call the sync_timebase at
all_
The ISR only needs to call get_ticks(), so it will be fast anyway
The problem is that the way we previously detected wrapping does not work
if
the interrupt rate is == to the counter wrap time, which it essentially
always is. If get_ticks is trying to update the wrap count when an
interrupt
Is it, always, on every platform?
Yes, pretty much. You get a terminal count/counter underflow interrupt and
that is it.
Not on sc520 - The micro/millisecond counter cannot be used to driver an
interrupt - you need to setup a seperate timer. I think the x86 internal
performance counters are the same
What is true, as you have stated, is that the micro/millisecond counter 
on the sc520 does not interrupt at all. Nor do the x86 performance 
counters. The x86 performance counters are a non-problem because they 
are 64 bits long. We don't need interrupts for them. Now, if you choose 
to use the sc520 micro/millisecond counter, then you need another source 
of interrupts. Due to the fact that reading the sc520 counter resets it, 
we must accumulate the elapsed time in software. That means the 
interrupt routine must do a bit more work, but it also allows reading 
the counters in non-interrupt code (with interrupts disabled) to not 
mess up the accumulated count. We don't detect rollover in the sc520 
counters, we just read and accumulate the value. So no problem there.
...
...
...
...
comes in, it will do it wrong. If the interrupt rate was faster, it would
work, because get_ticks would always know the correct answer. Consider
the
following. Get ticks is called by the counter overflowing (which resets
it
to 0) and executes with the counter value at 0. Later, at the next
rollover,
with no intervening calls to get ticks, the interrupt routine calls get
ticks. Assuming negligible  interrupt latency, get_ticks just sees the
counter is still 0, so it will not detect a wrap condition. So now you
loose
But wait a minute, don't you KNOW that the interrupt gets triggered on a
rollover so therefore there MUST have been a rollover, therefore there has
been 2^32 (or 2^16 or 2^whatever) ticks which you can just subtract from
the
last known tick count (which would be zero if there had been no get_timer
calls in between)
Except if this interrupt was delayed because get_ticks turned off
interrupts, get_ticks may have already compensated the number. When we then
get the (delayed) interrupt, we will do it twice.
That would mean get_timer() got called EXACTLY on the rollover - A race
condition that should be avoided. But still, a race can still occur through
using disable/enable interrupts which could push the timer isr out
...
...
...
a period. I thought by passing in a flag we could force get_ticks to do
the
right thing in this case, but since we must disable interrupts when we
call
get ticks from the get_timer routine, the outer call to get ticks may
have
already detected the rollover and the flag will force an additional
rollover
to be detected. It is a classic race condition. If we detect rollover
only
Aha! - In nearly all situations, a race is better avoided than run!
Yes, that is why the new approach does NOT turn off interrupts when
get_ticks is called, but rather loops if the overflow count was changed
while reading the count lsb. That ensures get_ticks always gets both words
either before the counter wrapped or after the counter wrapped, but not
halfway in between. Some ISYNCs may be required to make sure there are no
outstanding changes pending to memory, depending on your architecture.
OK, lets have a closer look
...
...
...
in the interrupt routine, we do not need to protect the rollover variable
from access by the main program, so we can be sure the results of
get_ticks
is coherent. If we try do do it in both places, we will have trouble. If
the
interrupt occurred at a faster rate, like say twice the rollover, we
wouldn't have a problem. But it doesn't. In most cases it happens at the
rollover rate, or just a bit more sometimes due to interrupt latency not
Again does it really - for all arches?
Yep, pretty much. Almost all timers I know of only give one interrupt per
cycle.
A bold assumption - and one that is wrong ;)
Ok, except for timers that don't interrupt at all and need an outside 
assist! In those cases, you need to interrupt faster than the counter 
period, which as your original constraint. But in that case, the 
interrupt is coming from elsewhere, not the timer being processed.
...
...
...
...
being a constant. It may appear to happen at somewhat less than a
rollover
as well, if the interrupt latency was long at time n-1 and short at time
n.
I think this problem can be overcome in the old design by keeping track
of
whether the last update was by a non-interrupt or interrupt call to
get_ticks , but I think the new design is better anyway.
I disagree - The whole point of a HAL is to Abstract the Hardware
The HAL should abstract what needs abstracting, and no more. We are really
talking about literally 4 lines of code here, but I think those four lines
certainly belong in sync_timer. All the math in one place is a good thing.
In most cases, get_ticks becomes trivial. Only in cases where the timer is
wrong endian or embedded in another bit field (both things being possible
but not all that common) does get_ticks need to do anything other than just
read the registers and loop until the msb is stable. Simple here is good,
because this is what new implementers need to add/change. The less here, the
better! In fact, the looping until the msb is stable can also be moved up to
sync_timer. That is even simpler.
OK, you got me - "show me the money^H^H^H^H^Hcode"
Wilco! You will get them by Monday Afternoon, California time, or sooner 
if I can stay awake long enough.
Best Regards,
Bill Campbell
...
Regards,
Graeme