Re: [U-Boot] [RFC][Timer API] Revised Specification - Implementation details

27 May 2011


      Hi Bill,
On Fri, May 27, 2011 at 1:54 PM, J. William Campbell
jwilliamcampbell@comcast.net wrote:
...
On 5/26/2011 6:51 PM, Graeme Russ wrote:
...
Hi Bill,
On Fri, May 27, 2011 at 11:26 AM, J. William Campbell
jwilliamcampbell@comcast.net  wrote:
...
[snip]
...
...
...
Yes, that is the problem. I have come to the view that  two 32 bit words
are
the best approach. Note that the lsb may actually not fill the full 32
bits.
Urghhh
Hi All,
   Consider the case where we have a 16 bit counter clocked by 1.0 MHz (a
nice number) interrupting on overflow. We could return (overflow counter <<
16) | hardware counter from the get_ticks routine, which maintains the
"binaryness" of the "number".
OTOH, suppose the counter is a down counter clocked at 1.19318 MHz with a 1
ms period, like the PC counter is set up at present.  What do we return
then?   Whatever we do, it depends on the clock  rate, which may change, and
involves some math, which may not work for all clock rates. In effect, you
have a dual radix number. Yes, conversion is possible but it is messy and
would be present in different forms in all the various get_tick routines.
get_ticks() does not care about the clock rate - It simply looks at the
current value of the hardware tick counter and the value of the hardware
tick counter the last time get_ticks() was called, calculates the difference
and adds that to the 64-bit software tick-counter
I don't see how it being a down counter makes that any more difficult
...
This is neither simple nor maintainable. Further, it is un-necessary, as the
sync_timer routine is just going to convert the number from whatever radix
we converted it to into millisec.  If we leave the two numbers as split, all
that complexity is removed from get_ticks and sent upward to the common
routine that converts the answer into ms anyway. This makes the system more
maintainable by placing the minimum requirement on get_ticks. The "tick"
should be opaque to anybody but sync_timebase anyway.
But how is the common routine going to know if it is a split timer, up
timer, down timer, little endian, big endian, etc, etc.
get_ticks() abstracts the hardware implementation of the hardware timer
from sync_timer()
...
...
...
The top 32 bits are the rollover count and the bottom 32 bits are the
current counter. If the counter is a full 32 bits, so much the better.
Ahhhhh - Lets keep it that way
...
Again, one could put them together inside the interrupt routine , but it
is
easier to check for a changed value if you don't do this. Otherwise, you
have to check both words. It also makes the isr faster. It is just an
As I said before - Simple First, Fast Later
I am in favor of simple. That is why I want get_ticks to do as little as
possible. It should just read the hardware register and the overflow counter
if it is separate. Make sure the overflow didn't change while we were
reading. This is redundant if we are not using interrupts but we can leave
the code in. It just won't do anything.  We can also move the rollover
detection to sync_timebase. It will be redundant if we are using interrupts,
because time will never "back up". But we can do it this way. This
centralizes the overflow detection, which is a good thing.
That does not sound simple to me. This, however, does:
u64 get_ticks(void)
{
    static u64 last_hw_tick_count;
    static u64 last_sw_tick_count;
/* Now for the platform specific stuff - Read hardware tick counter */
    u64 current_hw_tick_count = /* Read hw registers etc */
/* Deal with hardware weirdness - errata, stupid hw engineers etc */
u64 elapsed_ticks = current_hw_tick_count - last_hw_tick_count;
    last_sw_tick_count += elapsed_ticks;
return last_sw_tick_count;
}
The '/* Read hw registers etc */' bit will always be the same, no matter
what way you do it
The '/* Deal with hardware weirdness - errata, stupid hw engineers etc */'
bit is where we are truly abstracting the hardware away - This is the
bit you propose to leave mangled and deal with in sync_time? You make a
lot of assumptions about the consistency of this highly variable logic
across all current (and future) U-Boot platforms
What if the hardware engineer figures he can save a squeeze some extra
mileage out of a FPGA by implementing a 24-bit counter + 8-bit status
in one 32-bit register - get_ticks() strips all that out
You want to do some in get_ticks(), some in the ISR and some in
sync_timer() - Put all the weird stuff on one place - The HAL
...
...
...
increment of the overflow counter, like the PPC is now. I happen to think
it
is easier to convert the two 32 bit words to milliseconds one at a time,
but
if you feel you must use 64 bit words, that is fine too. Just remember,
the
counter does not always fill the entire bottom word.
Urghhh
...
In cases where there are no interrupts, get_ticks has to detect that the
timer has "backed up" and increment the overflow counter itself, unless
the
counter is 64 bits to begin with and overflow is impossible anyway.
get_ticks should NOT try to detect overflows if interrupts are available.
If
it got both words before an interrupt happened, the answer is correct. If
it
got an interrupt in between fetching the words, the event will be
detected
and the value re-fetched. All sync_timebase would do now is convert the
returned value to milliseconds.
So, if you have a 64 bit hardware counter, get_ticks reads and returns
it.
Else if you have interrupts, get_ticks reads the overflow counter into
the
msb. Next, it reads the hardware timer into the lsb. If the counter is a
down counter, the lsb is = to the counter max - the lsb. The msb is then
checked to make sure it hasn't changed, if it has, repeat the process.
All
the interrupt routine does is increase the overflow count.
If you don't have interrupts get_ticks reads the hardware counter into
the
lsb. If the counter is a down counter, the lsb is = to the counter max -
the
lsb. If the lsb is less than it was in the previous call to get ticks,
the
overflow counter is increased. get_ticks then loads the overflow counter
into the msb.
sync_timebase converts the msb and lsb into millisec. It may do this by a
64
bit divide, or some shifting to align the lsb with then msb and the a 64
bit
divide, or a bunch of 32 bit fractional multiplies, or any such approach
that works.
How does that sound?
The fact that you have described three different implementations of
get_ticks() with two of these differentiated between whether you have
interrupts or not immediately suggests this solution is inherently more
complex and less maintainable.
Yes, It is less maintainable because there are three cases.  As shown above,
we can reduce the number of cases to one at the expense of some redundant
code. There are problems with our original approach when the interrupt rate
is == to the counter rollover period. This problem is detailed below. You
placed the constraint that the interrupt rate was faster than the rollover
period on the previous implementation. However, that constraint is not met
in essentially all interrupt driven cases. Also, note that these get_ticks
routines, even while different, are pretty simple. OTOH, some of the 64 bit
counters can't be read atomically either, so having the "redundant"check in
the code is not so bad. The again, we are talking about a total of probably
5 lines of code or so anyway. It is a good idea to move the rollover
detection in the non-interrupt case to sync_timebase as well. Sync_timebase
can also invert the down-counting counters, removing that from get_ticks.
The wrap detection code can be #ifdef out if one is using interrupts and
Urghh - Why are you adding unnecessary ugliness - #ifdef in the middle of
code is usually a sign you are doing something wrong
...
offended by it's presence. Thanks for pointing this out and compelling me to
reduce the number of cases! Making get_ticks more lightweight is a good idea
in my opinion.
...
Lets say you have a platform with a 32-bit tick counter running at a
reasonably long rollover time so you decide not to use interrupts. Then
you create a new platform with the same tick counter, but it runs much
faster and you realise you need to implement the interrupt routine to
make get_timer() work for long enough periods - Fine, you add an ISR
for the new platform that calls sync_timebase - No other changes are
required.
The last thing we want is for the 64-bit tick counter to be conceptually
different across platforms
I just realised - the ISR _does not need to call the sync_timebase at all_
The ISR only needs to call get_ticks(), so it will be fast anyway
The problem is that the way we previously detected wrapping does not work if
the interrupt rate is == to the counter wrap time, which it essentially
always is. If get_ticks is trying to update the wrap count when an interrupt
Is it, always, on every platform?
...
comes in, it will do it wrong. If the interrupt rate was faster, it would
work, because get_ticks would always know the correct answer. Consider the
following. Get ticks is called by the counter overflowing (which resets it
to 0) and executes with the counter value at 0. Later, at the next rollover,
with no intervening calls to get ticks, the interrupt routine calls get
ticks. Assuming negligible  interrupt latency, get_ticks just sees the
counter is still 0, so it will not detect a wrap condition. So now you loose
But wait a minute, don't you KNOW that the interrupt gets triggered on a
rollover so therefore there MUST have been a rollover, therefore there has
been 2^32 (or 2^16 or 2^whatever) ticks which you can just subtract from the
last known tick count (which would be zero if there had been no get_timer
calls in between)
...
a period. I thought by passing in a flag we could force get_ticks to do the
right thing in this case, but since we must disable interrupts when we call
get ticks from the get_timer routine, the outer call to get ticks may have
already detected the rollover and the flag will force an additional rollover
to be detected. It is a classic race condition. If we detect rollover only
Aha! - In nearly all situations, a race is better avoided than run!
...
in the interrupt routine, we do not need to protect the rollover variable
from access by the main program, so we can be sure the results of get_ticks
is coherent. If we try do do it in both places, we will have trouble. If the
interrupt occurred at a faster rate, like say twice the rollover, we
wouldn't have a problem. But it doesn't. In most cases it happens at the
rollover rate, or just a bit more sometimes due to interrupt latency not
Again does it really - for all arches?
...
being a constant. It may appear to happen at somewhat less than a rollover
as well, if the interrupt latency was long at time n-1 and short at time n.
I think this problem can be overcome in the old design by keeping track of
whether the last update was by a non-interrupt or interrupt call to
get_ticks , but I think the new design is better anyway.
I disagree - The whole point of a HAL is to Abstract the Hardware
Regards,
Graeme