
On 5/26/2011 4:28 PM, Graeme Russ wrote:
Hi Bill,
On Fri, May 27, 2011 at 2:56 AM, J. William Campbell <jwilliamcampbell@comcast.net> wrote:
On 5/26/2011 6:27 AM, Graeme Russ wrote:
Hello Everyone,
OK - starting a new thread to discuss implementation details. This is a heads-up for arch/platform maintainers - once this is a bit more stable, I will put it on the wiki.
Assumed Capabilities of the Platform
- Has a 'tick counter' that does not rely on software to increment
Hi All, The nios2 with the most basic timer does not meet this requirement. It will not count at all without the 10 ms interrupt. I don't think this requirement matters anyway. We need a 'tick counter' that 'ticks'. If it takes software to make it tick, we don't much care. There may be problems with early use of udelay in that case, but that is a different issue.
I think we will need to define get_timer() as weak - Nios will have to override the default implementation to cater for its (Nios') limitations.
Hi All, Yes, that will probably be required here.
- tick interval may be a fixed constant which cannot be controlled via software, or it could be programmable (PIT)
API Functions (/lib/timer.c)
- u32 get_timer(u32 start)
- Returns the number of elapsed milliseconds since 'start'
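A minimal sketch of the get_timer(start) contract, in C. It assumes a hypothetical millisecond counter timer_ms that sync_timebase() keeps up to date (stubbed here). Because the subtraction is done in u32 arithmetic, an interval that spans a 32-bit rollover of the counter still comes out right.

```c
#include <stdint.h>

typedef uint32_t u32;

/* Hypothetical millisecond counter maintained by sync_timebase(). */
static u32 timer_ms;

/* Stub for illustration: the real helper updates timer_ms from the tick counter. */
static void sync_timebase(void)
{
}

/* Returns the number of milliseconds elapsed since 'start'.
 * Unsigned subtraction wraps modulo 2^32, so the result is correct
 * even if timer_ms rolled over between the two calls. */
u32 get_timer(u32 start)
{
	sync_timebase();
	return timer_ms - start;
}
```

Typical use is a timeout loop: take start = get_timer(0), then poll until get_timer(start) exceeds the limit.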
API Functions (/arch/...)
- void udelay(u32 delay)
- Used for 'short' delays (generally up to several seconds)
- Can use the tick counter if it is fast enough
- MUST NOT RESET THE TICK COUNTER
There is a requirement that udelay be available before relocation and before the BSS is available. One can use the tick counter to provide udelay as long as sync_timebase is not called OR sync_timebase does not use BSS. It appears many implementations ignore this requirement at present. We should try to fix this, but it should not be a requirement.
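To illustrate the early-udelay point, here is a sketch of a udelay() that only reads the tick counter, never resets it, and keeps its state in local variables, so it needs neither BSS nor gd. get_ticks() and ticks_per_sec() are hypothetical stand-ins backed by a simulated 1 MHz counter that advances on every read.

```c
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

/* Simulated hardware for illustration only: a 1 MHz counter that
 * advances 7 ticks per read. A real get_ticks() reads a register. */
static u64 sim_ticks;
static u64 get_ticks(void) { sim_ticks += 7; return sim_ticks; }
static u32 ticks_per_sec(void) { return 1000000; }

/* Busy-wait for at least 'usec' microseconds.
 * Only stack variables are used, and the tick counter is only read,
 * never written, so other users of the counter are unaffected. */
void udelay(u32 usec)
{
	u64 start = get_ticks();
	/* Round up so the delay is never shorter than requested. */
	u64 wait = ((u64)usec * ticks_per_sec() + 999999) / 1000000;

	while (get_ticks() - start < wait)
		;
}
```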
If you really wanted to, sync_timebase() could use global data (it doesn't have many static variables) in which case all timer functions would be available before relocation
Yes, my implementation of the sync_timebase routine was written that way, using gd-> for the required variables.
'Helper' Functions (/lib/timer.c)
I think this function should be weak, so that it is possible for people to override it with a "custom" function. The fully general sync_timebase has lots of code in it that can be simplified in special cases. We want and need a fully general function to be available, but other users who are really tight on space may want a cut-down version. We should make that easily possible.
Agree
- void sync_timebase(void)
- Updates the millisecond timer
- Utilises HAL functions to access the platform's tick counter
- Must be called more often than the rollover period of the platform's tick counter
- Does not need to be called with a regular frequency (immune to interrupt skew)
- Is always called by get_timer()
- For platforms with short tick counter rollovers it should be called by an ISR
- Bill Campbell wrote a good example which proved this can be common and arbitrary (and optionally free of divides and capable of maintaining accuracy even if the tick frequency is not an even division of 1ms)
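The following sketches the general idea (not Bill's code) under stated assumptions: a hypothetical 32768 Hz tick rate, which is not a whole number of ticks per millisecond, a 64-bit get_ticks() simulated in software, and state grouped in one struct so it could live in global data. Unlike the divide-free variant mentioned above, this version uses a divide and a modulo, but it carries the leftover ticks forward so no time is lost between calls.

```c
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

/* Hypothetical tick rate: 32768 Hz is not an integer number of ticks per ms. */
#define TICKS_PER_SEC 32768u

/* Simulated tick counter; a real get_ticks() reads the hardware. */
static u64 sim_ticks;
static u64 get_ticks(void) { return sim_ticks; }

/* Timebase state, kept in one struct so it could live in gd rather than BSS. */
struct timebase {
	u64 last_ticks; /* tick count at the previous sync */
	u64 rem;        /* leftover (ticks * 1000) not yet converted, < TICKS_PER_SEC */
	u32 ms;         /* millisecond counter used by get_timer() */
};
static struct timebase tb;

void sync_timebase(void)
{
	u64 now = get_ticks();
	u64 delta = now - tb.last_ticks;
	/* Scale to "milli-ticks" so nothing is lost when TICKS_PER_SEC is
	 * not a multiple of 1000. Assumes calls are frequent enough that
	 * delta * 1000 cannot overflow. */
	u64 acc = tb.rem + delta * 1000;

	tb.last_ticks = now;
	tb.ms += (u32)(acc / TICKS_PER_SEC);
	tb.rem = acc % TICKS_PER_SEC;
}
```

Because the remainder is carried, the millisecond count depends only on the total ticks elapsed, not on how often sync_timebase() happens to be called.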
HAL Functions (/arch/... or /board/...)
- u64 get_ticks(void)
For what it's worth, I would like to propose using a (gasp!) typedef here. It seems to me there are a whole lot of cases where the max number of ticks is a u32 or less. In those cases, the wrap at 32 bits helps things a lot. If the tick counter is really 64 bits, the function of sync_timebase is simply to convert the tick value to millisec, and that is it. Otherwise, if it is 32 bits or less then some other actions will be required. I think this is one of those times where a typedef would help. We could define a type called timer_tick_t to describe this quantity. That would allow a pure 32-bit implementation where appropriate.
Another suggestion is that perhaps we want a u32 get_ticks_lsb(void) as well as a regular get_ticks. The lsb version would be used for udelay and could possibly come from another timer if that was necessary/desirable. See the requirement for early udelay availability.
I think this all adds unnecessary complexity
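A sketch of the timer_tick_t idea; CONFIG_TIMER_TICK_64 is a hypothetical option name. With a counter that uses the full width of the type, unsigned wraparound makes a plain subtraction correct across a single rollover, which is the "wrap at 32 bits helps" point above.

```c
#include <stdint.h>

/* Hypothetical per-platform choice of tick width. */
#ifdef CONFIG_TIMER_TICK_64
typedef uint64_t timer_tick_t;
#else
typedef uint32_t timer_tick_t;
#endif

/* Ticks elapsed between two readings. Unsigned arithmetic wraps modulo
 * 2^32 (or 2^64), so this is correct across a single counter rollover,
 * provided the counter really uses the full width of timer_tick_t. */
static inline timer_tick_t ticks_elapsed(timer_tick_t start, timer_tick_t now)
{
	return now - start;
}
```

A counter that wraps at some value other than all ones (see the get_ticks discussion) does not get this for free and needs explicit rollover handling.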
- Returns a tick count as an unsigned 64-bit integer
- Abstracts the implementation of the platform tick counter (the platform may have a 32-bit 3 MHz counter, or a 16-bit 0-999 microsecond plus 16-bit 0-65535 millisecond pair, etc.)
- u64 ticks_per_millisecond()
- Returns the number of ticks (as returned by get_ticks()) per millisecond
I think ticks_per_sec would be a better choice. First, such a function already exists in almost all u-boots. Second, if one wants the best accuracy for things like udelay, you need better accuracy than millisec. Since this function is used only infrequently, when things are initialized, a divide to get ticks_per_millisec (if that is what you really want) is no big deal. Lastly, I think this function can remain u32. Yes, there is a 4 GHz limit on
Don't underestimate the ability of existing platforms to already exceed this limit - scientific equipment can easily have a 1 nanosecond tick counter (with extreme precision)
True enough. I have already agreed that usec2ticks and ticks2usec are fine for this purpose.
the clock rate. If this ever becomes an issue, we can change the type to timer_tick_t. When the CPU clock rate gets quite high, it is an advantage to divide it down for performance measurement anyway. The AMD/Intel chips already do this. If the hardware doesn't do it, shift the timer value right two bits. I doubt you will miss the resolution lost by going from 0.1 ns to 0.4 ns ticks on your 10 GHz timestamp.
Why mess around with bit shifting (which you would then have to kludge into your platform code) when carting around a 64-bit value is relatively cheap, transparent and portable (on all supported up-to-date toolchains)?
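If the rate stays a u32 ticks_per_sec(), the usec2ticks()/ticks2usec() conversions mentioned above can still avoid losing precision by doing the arithmetic in 64 bits. The signatures and the 3 MHz rate below are assumptions for illustration, not necessarily those of existing U-Boot code.

```c
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

/* Hypothetical platform value: a 3 MHz tick counter.
 * A u32 can describe rates up to about 4.29 GHz. */
static u32 ticks_per_sec(void) { return 3000000u; }

/* Microseconds to ticks, rounded up so delays are never short. */
u64 usec2ticks(u32 usec)
{
	return ((u64)usec * ticks_per_sec() + 999999u) / 1000000u;
}

/* Ticks to microseconds, rounded down. The product stays exact as long
 * as ticks is below 2^64 / 10^6 (about 1.8e13 ticks). */
u64 ticks2usec(u64 ticks)
{
	return ticks * 1000000u / ticks_per_sec();
}
```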
I really STRONGLY disagree with this statement. If you actually needed 64 bit variables, fine use them. But as I have already shown, you do not need them in general. We are computing a 32 bit result. There is some entropy argument that says you shouldn't need 64 bits to do so. Another way to look at it is that converting the top 32 bit word and the bottom 32 bit word to ms separately can be easier than doing them both together at once. However, as we will see below, I do agree we need two 32 bit words to make this process go smoothly. I just don't agree that they should/will constitute a 64 bit binary word. See below.
- void timer_isr()
- Optional (particularly if the tick counter rollover period is exceptionally long, which is usually the case for native 64-bit tick counters)
- Simply calls sync_timebase()
- For platforms without any tick counter, this can implement one (but accuracy will be harmed due to the use of disable_interrupts() and enable_interrupts() in U-Boot)
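For a platform with no free-running counter (such as the basic nios2 timer), the ISR itself can be the tick counter. In this sketch a hypothetical periodic interrupt bumps a 64-bit software counter and calls a stubbed sync_timebase(). Since a 64-bit load is not atomic on a 32-bit CPU, get_ticks() re-reads until two consecutive reads agree rather than disabling interrupts.

```c
#include <stdint.h>

typedef uint64_t u64;

/* Software tick counter, incremented once per periodic interrupt
 * (e.g. a hypothetical 10 ms timer, so 1 tick = 10 ms). */
static volatile u64 soft_ticks;

/* Stub for illustration; the real helper converts ticks to milliseconds. */
static void sync_timebase(void)
{
}

/* Hooked to the periodic timer interrupt. */
void timer_isr(void)
{
	soft_ticks++;
	sync_timebase();
}

/* A 64-bit load is not atomic on a 32-bit CPU, so re-read until two
 * consecutive reads agree (no interrupt changed the value in between). */
u64 get_ticks(void)
{
	u64 a, b;

	do {
		a = soft_ticks;
		b = soft_ticks;
	} while (a != b);
	return a;
}
```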
So to get the new API up and running, only two functions are mandatory:
get_ticks() which reads the hardware tick counter and deals with any 'funny stuff' including rollovers, short timers (12-bit for example), composite counters (16-bit 0-999 microsecond + 16-bit millisecond) and maintains a 'clean' 64-bit tick counter which rolls over from all 1's to all 0's. The
I think it is the task of get_ticks to return the hardware tick counter as an increasing counter, period. The counter may wrap at some final count that is not all ones. That is ok. Sync_timebase deals with the rollovers if
The hardware tick counter may; the 64-bit software tick counter maintained by get_ticks() may not.
necessary. get_ticks is very lightweight. get_ticks should deal with decrementing counters by returning the complement of the counter. The sc520 case is a bit more complex if you intend to use the 0-999 and 16 bit millisec registers, in that you do need to add them to the previous value to
As I mentioned in another post, this is a problem for the platform maintainer and is abstracted away through the platform-specific implementation of get_ticks()
make an increasing counter. Sync_timebase "likes" short counters in that they are easy to convert to millisec and tick remainders.
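A sketch of how get_ticks() could turn a composite counter into a clean, increasing 64-bit count by adding the difference from the previous reading. The register layout is simulated and hypothetical (a 0-999 microsecond field plus a 16-bit millisecond field); it is not meant to model real SC520 register behaviour, such as any side effects of reading.

```c
#include <stdint.h>

typedef uint16_t u16;
typedef uint32_t u32;
typedef uint64_t u64;

/* Simulated composite counter (hypothetical registers):
 * a 0-999 microsecond field and a free-running 16-bit millisecond field. */
static u16 sim_ms, sim_us;
static u16 read_ms_reg(void) { return sim_ms; }
static u16 read_us_reg(void) { return sim_us; }

/* The composite value wraps after 65536 ms. */
#define COMPOSITE_PERIOD (65536u * 1000u)

static u32 last_composite; /* previous composite reading, in us */
static u64 tick_count;     /* increasing 64-bit count, 1 tick = 1 us */

/* Read both fields consistently: retry if the ms field changed
 * while the us field was being read. */
static u32 read_composite(void)
{
	u16 ms, us;

	do {
		ms = read_ms_reg();
		us = read_us_reg();
	} while (read_ms_reg() != ms);
	return (u32)ms * 1000u + us;
}

/* Must be called at least once per 65.536 s so no wrap is missed. */
u64 get_ticks(void)
{
	u32 now = read_composite();
	u32 delta = (now + COMPOSITE_PERIOD - last_composite) % COMPOSITE_PERIOD;

	last_composite = now;
	tick_count += delta;
	return tick_count;
}
```

No interrupt is needed here, and the 64-bit count simply starts from whatever the hardware holds rather than from zero.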
The compiler should handle using 64-bit rather than 32-bit transparently
True enough. But you don't need 64-bit variables at this point; two 32-bit ones work just fine, in fact better in most cases.
64-bit tick counter does not need to be reset to zero ever (even on startup - sync_timebase takes care of all the details)
True, but sync_timebase does have to be initialized (as does the timer itself in most cases, so this is not a restriction).
This can be done in timer_init() via a call to sync_timebase() after the timer has been configured. This should bring everything into line.
ticks_per_millisecond() simply returns the number of ticks in a millisecond
- This may be as simple as:
static inline u64 ticks_per_millisecond(void) { return CONFIG_SYS_TICK_PER_MS; }
But it may be trickier if you have a programmable tick frequency
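For a programmable tick frequency, ticks_per_millisecond() might compute its result from the input clock and the divider that timer_init() programmed. INPUT_CLOCK_HZ and pit_divider are hypothetical.

```c
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

/* Hypothetical platform values: a 66 MHz input clock feeding a
 * programmable prescaler that timer_init() has configured. */
#define INPUT_CLOCK_HZ 66000000u
static u32 pit_divider = 16;

/* Ticks per millisecond for the currently programmed divider.
 * Integer division truncates: 66000000 / 16 = 4125000 Hz is exactly
 * 4125 ticks per ms, but other dividers may not divide evenly. */
u64 ticks_per_millisecond(void)
{
	return (u64)INPUT_CLOCK_HZ / pit_divider / 1000u;
}
```

With a divider of 7 the true rate is about 9428.57 ticks per ms, truncated to 9428, so a millisecond counter based on it runs fast by roughly 0.006%. That error accumulates, which is part of the case for ticks_per_sec or for carrying remainders in sync_timebase().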
You will have to call the routine that initializes sync_timebase. This routine should have a name, like void init_sync_timebase(void)?
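One way the initialisation routine could look, using Bill's suggested name init_sync_timebase() (hypothetical, not existing code): it snapshots the current tick count instead of resetting the hardware, and timer_init() calls it once the timer is configured. The sync_timebase() here keeps leftover ticks by advancing last_ticks only by the whole milliseconds it consumed; the tick rate is assumed.

```c
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

/* Simulated free-running counter that was already running at power-on. */
static u64 sim_ticks = 123456789;
static u64 get_ticks(void) { return sim_ticks; }

#define TICKS_PER_MS 1000u /* hypothetical tick rate */

/* Timebase state (could live in gd so it works before relocation). */
static u64 last_ticks;
static u32 timer_ms;

void sync_timebase(void)
{
	u64 now = get_ticks();
	/* Consume whole milliseconds only; leftover ticks remain in
	 * (now - last_ticks) for the next call. */
	u64 ms = (now - last_ticks) / TICKS_PER_MS;

	timer_ms += (u32)ms;
	last_ticks += ms * TICKS_PER_MS;
}

/* Start counting from "now" without resetting the hardware counter. */
void init_sync_timebase(void)
{
	last_ticks = get_ticks();
	timer_ms = 0;
}

void timer_init(void)
{
	/* ... configure the hardware timer here ... */
	init_sync_timebase();
}
```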
The optional timer ISR is required if the tick counter has a short roll over duration (short is up to you - 1 second is short, 1 hour might be, 1 century is not)
Regards,
Graeme
It is probably true that sync_timebase should have a parameter flag. The reason is that if the timer isr is called only when the timer wraps, then the calls to sync_timebase may be slightly more than a full timer period apart (due to interrupt latency). Therefore, when the timer difference is computed, if the current update is due to a wrap AND the previous update is due to a wrap, the difference should be approximately 1 wrap. If it comes up really short, you must add a wrap. This isn't necessary if the routine is called more often than once per wrap. Also, when sync_timebase is called in
timer_isr() MUST be called more often than the rollover period of the underlying hardware tick counter - This is a requirement
The equality case can be made to work. If the extension of the counter is done in the interrupt routine, not in get_ticks, get_ticks just needs to read the msb of the counter, read the lsb of the counter, then verify that the msb has not changed. If you have interrupts that work, that is the easiest way to go. If the lsb of the counter represents 1 ms or less, you can just drop it (equivalent to what the PPC does now). If the interrupt is slower than that, you must use at least part of the LSB. If you don't have interrupts, the point is moot.
get_timer, you must first disable interrupts and then enable interrupts after sync_timebase returns
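A sketch of the parameter-flag idea for a counter whose ISR runs only on wrap. The counter value is passed in to keep the example testable (a real routine would read the hardware), the 16-bit period is an assumption, and "really short" is interpreted here, also as an assumption, as less than half a period.

```c
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

/* Hypothetical 16-bit up-counter; the ISR fires when it wraps. */
#define COUNTER_PERIOD 65536u

static u32 last_count;     /* counter value at the previous update */
static int last_from_wrap; /* was the previous update made by the wrap ISR? */
static u64 total_ticks;    /* extended tick count */

/* from_wrap is nonzero when called from the wrap interrupt. */
void sync_timebase(u32 count, int from_wrap)
{
	u64 delta = (count + COUNTER_PERIOD - last_count) % COUNTER_PERIOD;

	/* Two consecutive wrap-driven updates are about one full period
	 * apart, but latency can make the modular difference come out
	 * tiny instead. In that case a whole wrap must be added. */
	if (from_wrap && last_from_wrap && delta < COUNTER_PERIOD / 2)
		delta += COUNTER_PERIOD;

	total_ticks += delta;
	last_count = count;
	last_from_wrap = from_wrap;
}
```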
Why? - get_ticks() provides an atomic read of the hardware tick counter. If get_ticks() needs to disable and enable interrupts to do so, that is a problem for the platform maintainer
Admittedly, sync_timebase() will not be re-entrant, but how will it ever be called concurrently? - Ah, I see - a call to get_timer() interrupted by the timer ISR :)
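The race described here (get_timer() interrupted by the timer ISR, both inside sync_timebase()) can be avoided by masking interrupts only around the update. The sketch follows the common U-Boot pattern in which disable_interrupts() reports whether interrupts were previously enabled; both functions are stubbed here, and timer_ms++ stands in for the real update.

```c
#include <stdint.h>

typedef uint32_t u32;

/* Stubs standing in for U-Boot's disable_interrupts() and
 * enable_interrupts(); disable returns nonzero if interrupts were on. */
static int irq_enabled = 1;

static int disable_interrupts(void)
{
	int was_enabled = irq_enabled;

	irq_enabled = 0;
	return was_enabled;
}

static void enable_interrupts(void)
{
	irq_enabled = 1;
}

static u32 timer_ms;

/* Stand-in for the real update; the timer ISR would call this too. */
static void sync_timebase(void)
{
	timer_ms++;
}

u32 get_timer(u32 start)
{
	/* Keep the timer ISR out while the shared timebase state is
	 * updated, then restore the caller's interrupt state. */
	int flag = disable_interrupts();

	sync_timebase();
	if (flag)
		enable_interrupts();
	return timer_ms - start;
}
```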
Yes, that is the problem. I have come to the view that two 32 bit words are the best approach. Note that the lsb may actually not fill the full 32 bits. The top 32 bits are the rollover count and the bottom 32 bits are the current counter. If the counter is a full 32 bits, so much the better. Again, one could put them together inside the interrupt routine, but it is easier to check for a changed value if you don't do this. Otherwise, you have to check both words. It also makes the isr faster. It is just an increment of the overflow counter, like the PPC does now. I happen to think it is easier to convert the two 32 bit words to milliseconds one at a time, but if you feel you must use 64 bit words, that is fine too. Just remember, the counter does not always fill the entire bottom word. In cases where there are no interrupts, get_ticks has to detect that the timer has "backed up" and increment the overflow counter itself, unless the counter is 64 bits to begin with and overflow is impossible anyway. get_ticks should NOT try to detect overflows if interrupts are available. If it got both words before an interrupt happened, the answer is correct. If it got an interrupt in between fetching the words, the event will be detected and the value re-fetched. All sync_timebase would do now is convert the returned value to milliseconds.
So, if you have a 64 bit hardware counter, get_ticks reads and returns it. Otherwise, if you have interrupts, get_ticks reads the overflow counter into the msb. Next, it reads the hardware timer into the lsb. If the counter is a down counter, the lsb is set to the counter max minus the lsb. The msb is then checked to make sure it hasn't changed; if it has, the process is repeated. All the interrupt routine does is increase the overflow count. If you don't have interrupts, get_ticks reads the hardware counter into the lsb. If the counter is a down counter, the lsb is set to the counter max minus the lsb. If the lsb is less than it was in the previous call to get_ticks, the overflow counter is increased. get_ticks then loads the overflow counter into the msb.
sync_timebase converts the msb and lsb into millisec. It may do this by a 64 bit divide, or some shifting to align the lsb with the msb and then a 64 bit divide, or a bunch of 32 bit fractional multiplies, or any such approach that works.
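The get_ticks() scheme just described, sketched in C with simulated hardware: a hypothetical 24-bit down-counter (so the lsb does not fill 32 bits), an ISR that only increments the overflow count, and a polled variant for platforms without interrupts. The struct layout and function names are assumptions.

```c
#include <stdint.h>

typedef uint32_t u32;

/* Hypothetical 24-bit down-counter: the lsb does not fill 32 bits. */
#define COUNTER_MAX 0xFFFFFFu

/* Simulated hardware register and overflow count. */
static volatile u32 hw_counter = COUNTER_MAX;
static volatile u32 overflow_count;

struct ticks {
	u32 msb; /* number of counter rollovers */
	u32 lsb; /* current count, converted to counting up */
};

/* Interrupt-driven variant: the ISR only increments overflow_count. */
void timer_isr(void)
{
	overflow_count++;
}

/* Assumes the ISR runs promptly after a wrap; a wrap whose interrupt
 * is still pending is not detected here. */
struct ticks get_ticks_irq(void)
{
	struct ticks t;

	do {
		t.msb = overflow_count;
		t.lsb = COUNTER_MAX - hw_counter; /* down-counter -> up-count */
	} while (overflow_count != t.msb);    /* an ISR ran: read again */
	return t;
}

/* Polled variant for platforms without interrupts. Must be called at
 * least once per counter period. */
static u32 last_lsb;

struct ticks get_ticks_polled(void)
{
	struct ticks t;

	t.lsb = COUNTER_MAX - hw_counter;
	if (t.lsb < last_lsb) /* counter "backed up": it wrapped */
		overflow_count++;
	last_lsb = t.lsb;
	t.msb = overflow_count;
	return t;
}
```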
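The "shift, then one 64-bit divide" option, sketched for the same kind of hypothetical 24-bit counter at an assumed 3 MHz. On 32-bit CPUs the 64-bit divide is compiled into a libgcc helper call (such as __udivdi3), which is the cost the 32-bit alternatives are meant to avoid.

```c
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

/* Hypothetical 24-bit counter (lsb in 0..2^24-1) running at 3 MHz. */
#define COUNTER_BITS 24
#define TICKS_PER_MS 3000u

/* Shift the rollover count up to align it above the lsb, then do one
 * 64-bit divide. If the counter period were not a power of two,
 * multiply msb by the period instead of shifting. */
u32 ticks_to_ms(u32 msb, u32 lsb)
{
	u64 ticks = ((u64)msb << COUNTER_BITS) | lsb;

	return (u32)(ticks / TICKS_PER_MS);
}
```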
How does that sound? Best Regards. Bill Campbell
Regards,
Graeme