
Hi Bill,
On Fri, May 27, 2011 at 1:54 PM, J. William Campbell jwilliamcampbell@comcast.net wrote:
On 5/26/2011 6:51 PM, Graeme Russ wrote:
Hi Bill,
On Fri, May 27, 2011 at 11:26 AM, J. William Campbell jwilliamcampbell@comcast.net wrote:
[snip]
Yes, that is the problem. I have come to the view that two 32 bit words are the best approach. Note that the lsb may actually not fill the full 32 bits.
Urghhh
Hi All, Consider the case where we have a 16 bit counter clocked by 1.0 MHz (a nice number) interrupting on overflow. We could return (overflow counter << 16) | hardware counter from the get_ticks routine, which maintains the "binaryness" of the "number". OTOH, suppose the counter is a down counter clocked at 1.19318 MHz with a 1 ms period, like the PC counter is set up at present. What do we return then? Whatever we do, it depends on the clock rate, which may change, and involves some math, which may not work for all clock rates. In effect, you have a dual radix number. Yes, conversion is possible but it is messy and would be present in different forms in all the various get_tick routines.
get_ticks() does not care about the clock rate - It simply looks at the current value of the hardware tick counter and the value of the hardware tick counter the last time get_ticks() was called, calculates the difference and adds that to the 64-bit software tick-counter
I don't see how it being a down counter makes that any more difficult
This is neither simple nor maintainable. Further, it is un-necessary, as the sync_timer routine is just going to convert the number from whatever radix we converted it to into millisec. If we leave the two numbers as split, all that complexity is removed from get_ticks and sent upward to the common routine that converts the answer into ms anyway. This makes the system more maintainable by placing the minimum requirement on get_ticks. The "tick" should be opaque to anybody but sync_timebase anyway.
But how is the common routine going to know if it is a split timer, up timer, down timer, little endian, big endian, etc, etc.
get_ticks() abstracts the hardware implementation of the hardware timer from sync_timer()
The top 32 bits are the rollover count and the bottom 32 bits are the current counter. If the counter is a full 32 bits, so much the better.
Ahhhhh - Lets keep it that way
Again, one could put them together inside the interrupt routine , but it is easier to check for a changed value if you don't do this. Otherwise, you have to check both words. It also makes the isr faster. It is just an
As I said before - Simple First, Fast Later
I am in favor of simple. That is why I want get_ticks to do as little as possible. It should just read the hardware register and the overflow counter if it is separate. Make sure the overflow didn't change while we were reading. This is redundant if we are not using interrupts but we can leave the code in. It just won't do anything. We can also move the rollover detection to sync_timebase. It will be redundant if we are using interrupts, because time will never "back up". But we can do it this way. This centralizes the overflow detection, which is a good thing.
That does not sound simple to me. This, however, does:
u64 get_ticks(void) { static u64 last_hw_tick_count; static u64 last_sw_tick_count;
/* Now for the platform specific stuff - Read hardware tick counter */ u64 current_hw_tick_count = /* Read hw registers etc */
/* Deal with hardware weirdness - errata, stupid hw engineers etc */
u64 elapsed_ticks = current_hw_tick_count - last_hw_tick_count; last_sw_tick_count += elapsed_ticks;
return last_sw_tick_count; }
The '/* Read hw registers etc */' bit will always be the same, no matter what way you do it
The '/* Deal with hardware weirdness - errata, stupid hw engineers etc */' bit is where we are truly abstracting the hardware away - This is the bit you propose to leave mangled and deal with in sync_time? You make a lot of assumptions about the consistency of this highly variable logic across all current (and future) U-Boot platforms
What if the hardware engineer figures he can save a squeeze some extra mileage out of a FPGA by implementing a 24-bit counter + 8-bit status in one 32-bit register - get_ticks() strips all that out
You want to do some in get_ticks(), some in the ISR and some in sync_timer() - Put all the weird stuff on one place - The HAL
increment of the overflow counter, like the PPC is now. I happen to think it is easier to convert the two 32 bit words to milliseconds one at a time, but if you feel you must use 64 bit words, that is fine too. Just remember, the counter does not always fill the entire bottom word.
Urghhh
In cases where there are no interrupts, get_ticks has to detect that the timer has "backed up" and increment the overflow counter itself, unless the counter is 64 bits to begin with and overflow is impossible anyway. get_ticks should NOT try to detect overflows if interrupts are available. If it got both words before an interrupt happened, the answer is correct. If it got an interrupt in between fetching the words, the event will be detected and the value re-fetched. All sync_timebase would do now is convert the returned value to milliseconds.
So, if you have a 64 bit hardware counter, get_ticks reads and returns it. Else if you have interrupts, get_ticks reads the overflow counter into the msb. Next, it reads the hardware timer into the lsb. If the counter is a down counter, the lsb is = to the counter max - the lsb. The msb is then checked to make sure it hasn't changed, if it has, repeat the process. All the interrupt routine does is increase the overflow count. If you don't have interrupts get_ticks reads the hardware counter into the lsb. If the counter is a down counter, the lsb is = to the counter max - the lsb. If the lsb is less than it was in the previous call to get ticks, the overflow counter is increased. get_ticks then loads the overflow counter into the msb.
sync_timebase converts the msb and lsb into millisec. It may do this by a 64 bit divide, or some shifting to align the lsb with then msb and the a 64 bit divide, or a bunch of 32 bit fractional multiplies, or any such approach that works.
How does that sound?
The fact that you have described three different implementations of get_ticks() with two of these differentiated between whether you have interrupts or not immediately suggests this solution is inherently more complex and less maintainable.
Yes, It is less maintainable because there are three cases. As shown above, we can reduce the number of cases to one at the expense of some redundant code. There are problems with our original approach when the interrupt rate is == to the counter rollover period. This problem is detailed below. You placed the constraint that the interrupt rate was faster than the rollover period on the previous implementation. However, that constraint is not met in essentially all interrupt driven cases. Also, note that these get_ticks routines, even while different, are pretty simple. OTOH, some of the 64 bit counters can't be read atomically either, so having the "redundant"check in the code is not so bad. The again, we are talking about a total of probably 5 lines of code or so anyway. It is a good idea to move the rollover detection in the non-interrupt case to sync_timebase as well. Sync_timebase can also invert the down-counting counters, removing that from get_ticks. The wrap detection code can be #ifdef out if one is using interrupts and
Urghh - Why are you adding unnecessary ugliness - #ifdef in the middle of code is usually a sign you are doing something wrong
offended by it's presence. Thanks for pointing this out and compelling me to reduce the number of cases! Making get_ticks more lightweight is a good idea in my opinion.
Lets say you have a platform with a 32-bit tick counter running at a reasonably long rollover time so you decide not to use interrupts. Then you create a new platform with the same tick counter, but it runs much faster and you realise you need to implement the interrupt routine to make get_timer() work for long enough periods - Fine, you add an ISR for the new platform that calls sync_timebase - No other changes are required.
The last thing we want is for the 64-bit tick counter to be conceptually different across platforms
I just realised - the ISR _does not need to call the sync_timebase at all_ The ISR only needs to call get_ticks(), so it will be fast anyway
The problem is that the way we previously detected wrapping does not work if the interrupt rate is == to the counter wrap time, which it essentially always is. If get_ticks is trying to update the wrap count when an interrupt
Is it, always, on every platform?
comes in, it will do it wrong. If the interrupt rate was faster, it would work, because get_ticks would always know the correct answer. Consider the following. Get ticks is called by the counter overflowing (which resets it to 0) and executes with the counter value at 0. Later, at the next rollover, with no intervening calls to get ticks, the interrupt routine calls get ticks. Assuming negligible interrupt latency, get_ticks just sees the counter is still 0, so it will not detect a wrap condition. So now you loose
But wait a minute, don't you KNOW that the interrupt gets triggered on a rollover so therefore there MUST have been a rollover, therefore there has been 2^32 (or 2^16 or 2^whatever) ticks which you can just subtract from the last known tick count (which would be zero if there had been no get_timer calls in between)
a period. I thought by passing in a flag we could force get_ticks to do the right thing in this case, but since we must disable interrupts when we call get ticks from the get_timer routine, the outer call to get ticks may have already detected the rollover and the flag will force an additional rollover to be detected. It is a classic race condition. If we detect rollover only
Aha! - In nearly all situations, a race is better avoided than run!
in the interrupt routine, we do not need to protect the rollover variable from access by the main program, so we can be sure the results of get_ticks is coherent. If we try do do it in both places, we will have trouble. If the interrupt occurred at a faster rate, like say twice the rollover, we wouldn't have a problem. But it doesn't. In most cases it happens at the rollover rate, or just a bit more sometimes due to interrupt latency not
Again does it really - for all arches?
being a constant. It may appear to happen at somewhat less than a rollover as well, if the interrupt latency was long at time n-1 and short at time n. I think this problem can be overcome in the old design by keeping track of whether the last update was by a non-interrupt or interrupt call to get_ticks , but I think the new design is better anyway.
I disagree - The whole point of a HAL is to Abstract the Hardware
Regards,
Graeme