[U-Boot] [RFC][Timer API] Revised Specification - Implementation details

Hello Everyone,
OK - Starting a new thread to discuss implementation details. This is a heads-up for arch/platform maintainers - Once this is a bit more stable, I will put it on the wiki
Assumed Capabilities of the Platform
- Has a 'tick counter' that does not rely on software to increment
- Tick interval may be a fixed constant which cannot be controlled via software, or it could be programmable (PIT)

API Functions (/lib/timer.c)
- u32 get_timer(u32 start)
- Returns the number of elapsed milliseconds since 'start'

API Functions (/arch/...)
- void udelay(u32 delay)
- Used for 'short' delays (generally up to several seconds)
- Can use the tick counter if it is fast enough
- MUST NOT RESET THE TICK COUNTER

'Helper' Functions (/lib/timer.c)
- void sync_timebase(void)
- Updates the millisecond timer
- Utilises HAL functions to access the platform's tick counter
- Must be called more often than the rollover period of the platform's tick counter
- Does not need to be called with a regular frequency (immune to interrupt skew)
- Is always called by get_timer()
- For platforms with short tick counter rollovers it should be called by an ISR
- Bill Campbell wrote a good example which proved this can be common and arbitrary (and optionally free of divides and capable of maintaining accuracy even if the tick frequency is not an even division of 1ms)

HAL Functions (/arch/... or /board/...)
- u64 get_ticks(void)
- Returns a tick count as an unsigned 64-bit integer
- Abstracts the implementation of the platform tick counter (the platform may have a 32-bit 3MHz counter, or a 16-bit 0-999 microsecond counter plus a 16-bit 0-65535 millisecond counter, etc.)
- u64 ticks_per_millisecond()
- Returns the number of ticks (as returned by get_ticks()) per millisecond
- void timer_isr()
- Optional (particularly if the tick counter rollover period is exceptionally long, which is usually the case for native 64-bit tick counters)
- Simply calls sync_timebase()
- For platforms without any tick counter, this can implement one (but accuracy will be harmed due to usage of disable_interrupts() and enable_interrupts() in U-Boot)
So to get the new API up and running, only two functions are mandatory:
get_ticks() which reads the hardware tick counter and deals with any 'funny stuff' including rollovers, short timers (12-bit for example), composite counters (16-bit 0-999 microsecond + 16-bit millisecond) and maintains a 'clean' 64-bit tick counter which rolls over from all 1's to all 0's. The 64-bit tick counter does not need to be reset to zero ever (even on startup - sync_timebase takes care of all the details)
ticks_per_millisecond() simply returns the number of ticks in a millisecond - This may be as simple as:
inline u64 ticks_per_millisecond(void) { return CONFIG_SYS_TICK_PER_MS; }
But it may be trickier if you have a programmable tick frequency
The optional timer ISR is required if the tick counter has a short roll over duration (short is up to you - 1 second is short, 1 hour might be, 1 century is not)
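
To make this concrete, here is a minimal sketch of get_ticks() for a hypothetical free-running 32-bit up-counter (read_hw_counter() stands in for the platform register read; it is not an existing U-Boot function). Note the statics live in BSS, so a pre-relocation version would keep this state in gd-> instead:

static u64 tick_base;	/* ticks accumulated across hardware rollovers */
static u32 last_raw;	/* previous raw hardware reading */

u64 get_ticks(void)
{
	u32 now = read_hw_counter();	/* illustrative 32-bit register read */

	/*
	 * Unsigned subtraction yields the correct delta across a single
	 * rollover, provided this runs (directly or via the timer ISR)
	 * more often than the rollover period.
	 */
	tick_base += now - last_raw;
	last_raw = now;

	return tick_base;
}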
Regards,
Graeme

Hi Graeme,
Thanks very much for doing this. I have been following the discussion and am very happy that you have continued with it.
On Thu, May 26, 2011 at 6:27 AM, Graeme Russ graeme.russ@gmail.com wrote:
Hello Everyone,
OK - Starting a new thread to discuss implementation details. This is a heads-up for arch/platform maintainers - Once this is a bit more stable, I will put it on the wiki
Assumed Capabilities of the Platform - Has a 'tick counter' that does not rely on software to increment - tick interval may be a fixed constant which cannot be controlled via software, or it could be programmable (PIT)
API Functions (/lib/timer.c) - u32 get_timer(u32 start) - Returns the number of elapsed milliseconds since 'start'
Can we have a microsecond one also please? Some sort of microsecond support is needed for udelay anyway. It can be implemented as get_timer() * 1000 as a fallback. I saw this in your original proposal. Given the 100 emails you have endured I understand if death is losing its sting, but it is still a useful feature.
API Functions (/arch/...) - void udelay(u32 delay) - Used for 'short' delays (generally up to several seconds) - Can use the tick counter if it is fast enough - MUST NOT RESET THE TICK COUNTER
'Helper' Functions (/lib/timer.c) - void sync_timebase(void) - Updates the millisecond timer - Utilises HAL functions to access the platform's tick counter - Must be called more often than the rollover period of the platform's tick counter - Does not need to be called with a regular frequency (immune to interrupt skew) - Is always called by get_timer() - For platforms with short tick counter rollovers it should be called by an ISR - Bill Campbell wrote a good example which proved this can be common and arbitrary (and optionally free of divides and capable of maintaining accuracy even if the tick frequency is not an even division of 1ms)
HAL Functions (/arch/... or /board/...) - u64 get_ticks(void) - Returns a tick count as an unsigned 64-bit integer - Abstracts the implementation of the platform tick counter (platform may be a 32-bit 3MHz counter, might be a 16-bit 0-999 microsecond plus 16-bit 0-65535 millisecond etc) - u64 ticks_per_millisecond() - Returns the number of ticks (as returned by get_ticks()) per millisecond - void timer_isr() - Optional (particularly if tick counter rollover period is exceptionally long which is usually the case for native 64-bit tick counters) - Simply calls sync_timebase() - For platforms without any tick counter, this can implement one (but accuracy will be harmed due to usage of disable_interrupts() and enable_interrupts() in U-Boot)
I suppose this isn't the U-Boot way, but perhaps these could have names which obviously indicate they are time and HAL-related, and need to be implemented by a board. Perhaps a timer_ prefix?
So to get the new API up and running, only two functions are mandatory:
get_ticks() which reads the hardware tick counter and deals with any 'funny stuff' including rollovers, short timers (12-bit for example), composite counters (16-bit 0-999 microsecond + 16-bit millisecond) and maintains a 'clean' 64-bit tick counter which rolls over from all 1's to all 0's. The 64-bit tick counter does not need to be reset to zero ever (even on startup
- sync_timebase takes care of all the details)
ticks_per_millisecond() simply returns the number of ticks in a millisecond
- This may be as simple as:
inline u64 ticks_per_millisecond(void) { return CONFIG_SYS_TICK_PER_MS; }
But it may be trickier if you have a programmable tick frequency
This sounds fine, as I assume it allows the compiler to optimize to avoid division, etc. For the microsecond case, ticks_to_microseconds(u64) might be better since the factor may not be an integer.
The optional timer ISR is required if the tick counter has a short roll over duration (short is up to you - 1 second is short, 1 hour might be, 1 century is not)
Regards, Simon
Regards,
Graeme

Dear Simon Glass,
In message BANLkTikWwuymrJtMEHBZkvNgNBK1e=RdWA@mail.gmail.com you wrote:
Can we have a microsecond one also please? Some sort of microsecond
I guess you cannot, at least not in general. In worst case that would mean we have to process 1e6 interrupts per second, which leaves little time for anything useful.
Best regards,
Wolfgang Denk

On Fri, May 27, 2011 at 3:28 AM, Wolfgang Denk wd@denx.de wrote:
Dear Simon Glass,
In message BANLkTikWwuymrJtMEHBZkvNgNBK1e=RdWA@mail.gmail.com you wrote:
Can we have a microsecond one also please? Some sort of microsecond
I guess you cannot, at least not in general. In worst case that would mean we have to process 1e6 interrupts per second, which leaves little time for anything useful.
If we implemented a sync_us_timer(), we could either:
a) Never kick it using an interrupt at all (only kick it in udelay())
b) Kick it in a much slower interrupt (1ms+ period)
Remember, the kicking of the sync function does not need to correlate to the incrementing of the tick counter - Only to the roll-over period of the tick counter.
For a 64-bit sub-microsecond tick counter, interrupts will probably not ever be needed (unless the tick frequency is ludicrous - even a nanosecond tick counter will take about 213,500 days, over 580 years, to wrap) so in this case, sync_us_timer() would be fine
Regards,
Graeme

On Thu, May 26, 2011 at 3:44 PM, Graeme Russ graeme.russ@gmail.com wrote:
On Fri, May 27, 2011 at 3:28 AM, Wolfgang Denk wd@denx.de wrote:
Dear Simon Glass,
In message BANLkTikWwuymrJtMEHBZkvNgNBK1e=RdWA@mail.gmail.com you wrote:
Can we have a microsecond one also please? Some sort of microsecond
I guess you cannot, at least not in general. In worst case that would mean we have to process 1e6 interrupts per second, which leaves little time for anything useful.
Sorry Wolfgang I don't really understand this. We would only process when we read it, and then hopefully only a simple multiple or shift, after compiler optimizations kick in. Probably I am just missing what you are saying.
If we implemented a sync_us_timer(), we could either:
a) Never kick it using an interrupt at all (only kick it in udelay())
b) Kick it in a much slower interrupt (1ms+ period)
Remember, the kicking of the sync function does not need to correlate to the incrementing of the tick counter - Only to the roll-over period of the tick counter.
For a 64-bit sub-microsecond tick counter, interrupts will probably not ever be needed (unless the tick frequency is ludicrous - even a nanosecond tick counter will take about 213,500 days, over 580 years, to wrap) so in this case, sync_us_timer() would be fine
Hi Graeme,
Well yes, I feel that you either worry about rollover or you use 64-bits. Since you are using 64-bits it shouldn't matter.
I hope we can avoid integer division in the microsecond case. Someone stated that time delays are the main use for the timer, but some of us have performance-monitoring plans.
Re the atomicity of handling 64-bit numbers, how about just disable/enable interrupts around this? I think 64-bit is overkill but at least it is simple, and I prefer a u64 to a struct { u32 lo, hi; }.
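
Something like this sketch, assuming an ISR-maintained timebase_ticks variable (illustrative) and U-Boot's disable_interrupts(), which returns non-zero if interrupts were previously enabled:

static volatile u64 timebase_ticks;	/* illustrative, updated by the ISR */

u64 read_ticks_atomic(void)
{
	int was_enabled = disable_interrupts();
	u64 t = timebase_ticks;		/* no ISR can tear this read now */

	if (was_enabled)
		enable_interrupts();

	return t;
}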
Regards, Simon
Regards,
Graeme

Dear Simon Glass,
In message BANLkTinxp1wuA9+_EvC0ppK+7Uj89UkN-g@mail.gmail.com you wrote:
I guess you cannot, at least not in general. In worst case that would mean we have to process 1e6 interrupts per second, which leaves little time for anything useful.
Sorry Wolfgang I don't really understand this. We would only process when we read it, and then hopefully only a simple multiple or shift, after compiler optimizations kick in. Probably I am just missing what you are saying.
You assume that there is a counter register that can be read. This may not be the case. You may have just a timer which fires an interrupt every X time units, so you can implement a counter in the ISR. This is for example how the tick is implemented on PPC now: we get an interrupt every millisecond and increment a counter then.
For a microsecond tick you need in such a setup one million interrupts per second.
I hope we can avoid integer division in the microsecond case. Someone stated that time delays are the main use for the timer, but some of us have performance-monitoring plans.
Re the atomicity of handling 64-bit numbers, how about just disable/enable interrupts around this? I think 64-bit is overkill but at least it is simple, and I prefer a u64 to a struct { u32 lo, hi; }.
Enabling and disabling interrupts is not exactly performance-neutral either.
Best regards,
Wolfgang Denk

On Fri, May 27, 2011 at 12:40 AM, Wolfgang Denk wd@denx.de wrote:
Dear Simon Glass,
In message BANLkTinxp1wuA9+_EvC0ppK+7Uj89UkN-g@mail.gmail.com you wrote:
I guess you cannot, at least not in general. In worst case that would mean we have to process 1e6 interrupts per second, which leaves little time for anything useful.
Sorry Wolfgang I don't really understand this. We would only process when we read it, and then hopefully only a simple multiple or shift, after compiler optimizations kick in. Probably I am just missing what you are saying.
You assume that there is a counter register that can be read. This may not be the case. You may have just a timer which fires an interrupt every X time units, so you can implement a counter in the ISR. This is for example how the tick is implemented on PPC now: we get an interrupt every millisecond and increment a counter then.
For a microsecond tick you need in such a setup one million interrupts per second.
I thought PPC had a performance counter? But if not, then it will just have to live with a millisecond timer.
I hope we can avoid integer division in the microsecond case. Someone stated that time delays are the main use for the timer, but some of us have performance-monitoring plans.
Re the atomicity of handling 64-bit numbers, how about just disable/enable interrupts around this? I think 64-bit is overkill but at least it is simple, and I prefer a u64 to a struct { u32 lo, hi; }.
Enabling and disabling interrupts is not exactly performance-neutral either.
Unfortunately I know very little about PPC but at least on ARM UP this is not expensive. We can compare that against just reading the 64-bit counter until it doesn't change...
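
The compare approach would look something like this sketch (again with an illustrative ISR-maintained timebase_ticks):

u64 read_ticks_stable(void)
{
	u64 a, b;

	/* Retry until two consecutive reads agree, i.e. no ISR update
	 * landed between the two halves of a non-atomic 64-bit load. */
	do {
		a = timebase_ticks;
		b = timebase_ticks;
	} while (a != b);

	return a;
}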
Best regards,
Wolfgang Denk
Regards, Simon

On 5/26/2011 6:27 AM, Graeme Russ wrote:
Hello Everyone,
OK - Starting a new thread to discuss implementation details. This is a heads-up for arch/platform maintainers - Once this is a bit more stable, I will put it on the wiki
Assumed Capabilities of the Platform
- Has a 'tick counter' that does not rely on software to increment
Hi All, The nios2 with the most basic timer does not meet this requirement. It will not count at all without the 10 ms interrupt. I don't think this requirement matters anyway. We need a 'tick counter' that 'ticks'. If it takes software to make it tick, we don't much care. There may be problems with early use of udelay in that case, but that is a different issue.
- tick interval may be a fixed constant which cannot be controlled via software, or it could be programmable (PIT)
API Functions (/lib/timer.c)
- u32 get_timer(u32 start)
- Returns the number of elapsed milliseconds since 'start'
API Functions (/arch/...)
- void udelay(u32 delay)
- Used for 'short' delays (generally up to several seconds)
- Can use the tick counter if it is fast enough
- MUST NOT RESET THE TICK COUNTER
There is a requirement that udelay be available before relocation and before the BSS is available. One can use the tick counter to provide udelay as long as sync_timebase is not called OR sync_timebase does not use BSS. It appears many implementations ignore this requirement at present. We should try to fix this, but it should not be a requirement.
'Helper' Functions (/lib/timer.c)
I think this function should be weak, so that it is possible for people to override it with a "custom" function. The fully general sync_timebase has lots of code in it that can be simplified in special cases. We want and need a fully general function to be available, but other users who are real tight on space may want a cut down version. We should make that easily possible.
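
In U-Boot convention that might look like this sketch, with the general version marked weak so a board can substitute a cut-down one:

/* lib/timer.c - fully general default */
void __attribute__((weak)) sync_timebase(void)
{
	/* generic rollover handling and tick-to-millisecond conversion */
}

/* board/.../timer.c - optional space-saving override */
void sync_timebase(void)
{
	/* simplified version, e.g. exploiting a power-of-two tick rate */
}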
- void sync_timebase(void)
- Updates the millisecond timer
- Utilises HAL functions to access the platform's tick counter
- Must be called more often than the rollover period of the platform's tick counter
- Does not need to be called with a regular frequency (immune to interrupt skew)
- Is always called by get_timer()
- For platforms with short tick counter rollovers it should be called by an ISR
- Bill Campbell wrote a good example which proved this can be common and arbitrary (and optionally free of divides and capable of maintaining accuracy even if the tick frequency is not an even division of 1ms)
HAL Functions (/arch/... or /board/...)
- u64 get_ticks(void)
For what it's worth, I would like to propose using a (gasp!) typedef here. It seems to me there are a whole lot of cases where the max number of ticks is a u32 or less. In those cases, the wrap at 32 bits helps things a lot. If the tick counter is really 64 bits, the function of sync_timebase is simply to convert the tick value to millisec, and that is it. Otherwise, if it is 32 bits or less then some other actions will be required. I think this is one of those times where a typedef would help. We could define a type called timer_tick_t to describe this quantity. That would allow a pure 32 bit implementation where appropriate.
Another suggestion is that perhaps we want a u32 get_ticks_lsb(void) as well as a regular get_ticks. The lsb version would be used for udelay and could possibly come from another timer if that was necessary/desirable. See the requirement for early udelay availability.
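
A sketch of the two suggestions above; timer_tick_t, CONFIG_SYS_TIMER_32BIT and get_ticks_lsb() are proposed (illustrative) names, not existing U-Boot symbols:

#ifdef CONFIG_SYS_TIMER_32BIT
typedef u32 timer_tick_t;	/* counter wraps naturally at 32 bits */
#else
typedef u64 timer_tick_t;
#endif

timer_tick_t get_ticks(void);	/* full-width tick count */
u32 get_ticks_lsb(void);	/* low bits only, e.g. for early udelay() */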
- Returns a tick count as an unsigned 64-bit integer - Abstracts the implementation of the platform tick counter (platform may be a 32-bit 3MHz counter, might be a 16-bit 0-999 microsecond plus 16-bit 0-65535 millisecond etc)
- u64 ticks_per_millisecond()
- Returns the number of ticks (as returned by get_ticks()) per millisecond
I think ticks_per_sec would be a better choice. First, such a function already exists in almost all u-boots. Second, if one wants the best accuracy for things like udelay, you need better accuracy than millisec. Since this function is used only infrequently, when things are initialized, a divide to get ticks_per_millisec (if that is what you really want) is no big deal. Lastly, I think this function can remain u32. Yes, there is a 4 GHz limit on the clock rate. If this ever becomes an issue, we can change the type to timer_tick_t. When the CPU clock rate gets quite high, it is an advantage to divide it down for performance measurement anyway. The AMD/Intel chips already do this. If the hardware doesn't do it, shift the timer value right two bits. I doubt you will miss the 0.4 nanoseconds resolution loss from your 10 GHz timestamp.
- void timer_isr()
- Optional (particularly if tick counter rollover period is exceptionally long which is usually the case for native 64-bit tick counters)
- Simply calls sync_timebase()
- For platforms without any tick counter, this can implement one (but accuracy will be harmed due to usage of disable_interrupts() and enable_interrupts() in U-Boot)
So to get the new API up and running, only two functions are mandatory:
get_ticks() which reads the hardware tick counter and deals with any 'funny stuff' including rollovers, short timers (12-bit for example), composite counters (16-bit 0-999 microsecond + 16-bit millisecond) and maintains a 'clean' 64-bit tick counter which rolls over from all 1's to all 0's. The
I think it is the task of get_ticks to return the hardware tick counter as an increasing counter, period. The counter may wrap at some final count that is not all ones. That is ok. Sync_timebase deals with the rollovers if necessary. get_ticks is very lightweight. get_ticks should deal with decrementing counters by returning the complement of the counter. The sc520 case is a bit more complex if you intend to use the 0-999 and 16 bit millisec registers, in that you do need to add them to the previous value to make an increasing counter. Sync_timebase "likes" short counters in that they are easy to convert to millisec and tick remainders.
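
For the decrementing case Bill mentions, the complement trick is a one-liner (read_down_counter() is an illustrative register read):

u32 get_raw_ticks(void)
{
	/* ~x turns a max-to-zero down-counter into an increasing count */
	return ~read_down_counter();
}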
64-bit tick counter does not need to be reset to zero ever (even on startup
- sync_timebase takes care of all the details)
True, but sync_timebase does have to be initialized (as does the timer itself in most cases, so this is not a restriction).
ticks_per_millisecond() simply returns the number of ticks in a millisecond
- This may be as simple as:
inline u64 ticks_per_millisecond(void) { return CONFIG_SYS_TICK_PER_MS; }
But it may be trickier if you have a programmable tick frequency
You will have to call the routine that initializes sync_timebase. This routine should have a name, like void init_sync_timebase(void)?
The optional timer ISR is required if the tick counter has a short roll over duration (short is up to you - 1 second is short, 1 hour might be, 1 century is not)
Regards,
Graeme
It is probably true that sync_timebase should have a parameter flag. The reason is that if the timer isr is called only when the timer wraps, then the calls to sync_timebase may be slightly more than a full timer period apart (due to interrupt latency). Therefore, when the timer difference is computed, if the current update is due to a wrap AND the previous update is due to a wrap, the difference should be approximately 1 wrap. If it comes up real short, you must add a wrap. This isn't necessary if the routine is called more often than once per wrap. Also, when sync_timebase is called in get_timer, you must first disable interrupts and then enable interrupts after sync_timebase returns
Best Regards, Bill Campbell

Dear "J. William Campbell",
In message 4DDE8639.3090909@comcast.net you wrote:
I think it is the task of get_ticks to return the hardware tick counter as an increasing counter, period. The counter may wrap at some final count that is not all ones. That is ok. Sync_timebase deals with the
NO! We want to be able to compute time differences using simple unsigned arithmetic, even after a rollover of the counter. For this it is mandatory that the counter always gets only incremented until it wraps around at the end of its number range, and never gets reset before.
You will have to call the routine that initializes sync_timebase. This routine should have a name, like void init_sync_timebase(void)?
init_timebase().
Best regards,
Wolfgang Denk

On 5/26/2011 10:53 AM, Wolfgang Denk wrote:
Dear "J. William Campbell",
In message4DDE8639.3090909@comcast.net you wrote:
I think it is the task of get_ticks to return the hardware tick counter as an increasing counter, period. The counter may wrap at some final count that is not all ones. That is ok. Sync_timebase deals with the
NO! We want to be able to compute time differences using simple unsigned arithmetic, even after a rollover of the counter. For this it is mandatory that the counter always gets only incremented until it wraps around at the end of its number range, and never gets reset
Hi All, I agree that that is what must happen, but it should happen inside sync_timebase. Sync_timebase does what is needed to convert the less-than-fully capable counters into a fully capable one. You could move that functionality down into get_ticks, but if you do, you end up much like the present scheme where the multiple different get_ticks routines are expected to cope with expanding the hardware counter properly. To date, it has been shown conclusively that this process cannot be relied upon, or we wouldn't be having this discussion. If we put that functionality inside sync_timebase, it is in one place and debuggable once. All sync_timebase requires to do this is ticks per second and maximum tick value. I do request that counters that decrement be negated in the get_ticks routine, but beyond that, it should be a simple read of the tick register(s).
Best Regards, Bill Campbell
before.
You will have to call the routine that initializes sync_timebase. This routine should have a name, like void init_sync_timebase(void)?
init_timebase().
Best regards,
Wolfgang Denk

Dear "J. William Campbell",
In message 4DDEA165.9010403@comcast.net you wrote:
I think it is the task of get_ticks to return the hardware tick counter as an increasing counter, period. The counter may wrap at some final count that is not all ones. That is ok. Sync_timebase deals with the
NO! We want to be able to compute time differences using simple unsigned arithmetic, even after a rollover of the counter. For this it is mandatory that the counter always gets only incremented until it wraps around at the end of its number range, and never gets reset
I agree that that is what must happen, but it should happen inside
sync_timebase. Sync_timebase does what is needed to convert the less-than-fully capable counters into a fully capable one. You could
I think you also want this behaviour for get_ticks().
To date, it has been shown conclusively that this process cannot be relied upon, or we wouldn't be having this discussion. If we put that functionality inside sync_timebase, it is in one place and debuggable once. All sync_timebase requires to do this is ticks per second and maximum tick value. I do request that counters that decrement be negated in the get_ticks routine, but beyond that, it should be a simple read of the tick register(s).
I think using ticks per second is not a good idea. It may exceed ULONG_MAX, and having to use 64 bit for all calculations is probably overkill. The existing ticks2usec/usec2ticks work fine so far.
Best regards,
Wolfgang Denk

On 5/26/2011 12:16 PM, Wolfgang Denk wrote:
Dear "J. William Campbell",
In message4DDEA165.9010403@comcast.net you wrote:
I think it is the task of get_ticks to return the hardware tick counter as an increasing counter, period. The counter may wrap at some final count that is not all ones. That is ok. Sync_timebase deals with the
NO! We want to be able to compute time differences using simple unsigned arithmetic, even after a rollover of the counter. For this it is mandatory that the counter always gets only incremented until it wraps around at the end of its number range, and never gets reset
I agree that that is what must happen, but it should happen inside
sync_timebase. Sync_timebase does what is needed to convert the less-than-fully capable counters into a fully capable one. You could
I think you also want this behaviour for get_ticks().
Hi Wolfgang, I understand why that might be nice. But to do that with common code would require get_ticks to call a generic routine (say sync_ticks) that would expand the counter to 64 bits. Note that this is not always totally trivial, as the timer may roll over at 10 ms or some other not-so-nice number. Then sync_timebase would convert the 64 bit number to milliseconds. That approach will work. However, I think that is overkill, as we really want the result in milliseconds. If you look at the prototype sync_timebase routine, you can see an example of how this is possible without resorting to 64 bit math. I think that avoiding the 64 bit math on processors that don't have a 64 bit tick counter (and are therefore probably less capable) is worthwhile. I also think that the purpose of the get_timer routine abstracting the time into milliseconds is to avoid dealing with ticks anywhere except in the timer routines. Presumably, nobody but sync_timebase would ever have reason to call get_ticks. If that is not your thinking, fine, we just disagree on that point, and having a sync_ticks to expand the tick counter certainly can be done.
To date, it has been shown conclusively that this process cannot be relied upon, or we wouldn't be having this discussion. If we put that functionality inside sync_timebase, it is in one place and debuggable once. All sync_timebase requires to do this is ticks per second and maximum tick value. I do request that counters that decrement be negated in the get_ticks routine, but beyond that, it should be a simple read of the tick register(s).
I think using ticks per second is not a good idea. It may exceed ULONG_MAX, and having to use 64 bit for all calculations is probably overkill. The existing ticks2usec/usec2ticks work fine so far.
I certainly agree using 64 bits for all calculations is vast overkill. In fact, I think using 64 bit calculations on systems that have only a 32 bit or less timer register is probably overkill. :-) However, to date, AFAIK, no processor has exceeded the u32 in ticks per second. As I pointed out in a previous e-mail, if they ever do this, we can just drop one or 2 bits off the 64 bit counter and in millisecond resolution, nobody will ever know. Also as previously pointed out, usec2ticks is not present yet in lots of implementations. Also, if the fundamental clock frequency is 32 kHz (or anything less than 1 MHz), usec2ticks is 0! This probably rules out using it to get ticks per millisec or ticks per sec.
Best Regards, Bill Campbell
Best regards,
Wolfgang Denk

Dear "J. William Campbell",
In message 4DDEAFE0.8060905@comcast.net you wrote:
I certainly agree using 64 bits for all calculations is vast overkill. In fact, I think using 64 bit calculations on systems that have only a 32 bit or less timer register is probably overkill. :-) However, to date, AFAIK, no processor has exceeded the u32 in ticks per second. As I
Not yet. But it makes no sense to start a new design with settings already in critical range, especially since there is zero problem with breaking it down by a factor of 1000 or 1e6.
pointed out in a previous e-mail, if they ever do this, we can just drop one or 2 bits off the 64 bit counter and in millisecond resolution, nobody will ever know. Also as previously pointed out, usec2ticks is
No. I will not accept a design that is so close on the edge of breaking.
What is your exact problem with the existing interfaces ticks2usec() and usec2ticks() ?
not present yet in lots of implementations. Also, if the fundamental clock frequency is 32 kHz (or anything less than 1 MHz), usec2ticks is 0! This probably rules out using it to get ticks per millisec or ticks per sec.
The statement "usec2ticks is 0" makes absolutely no sense as long as you don't say which argument you pass in. You get a return value of 0 even for a tick rate in the GHz range if you pass 0 as argument.
However, usec2ticks(1000) or maybe usec2ticks(100000) will probably return pretty useful values.
[Note that by passing properly scaled arguments you can also avoid a number of rounding errors.]
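
Worked numbers for a 32 kHz tick (0.032 ticks per microsecond) make the point:

usec2ticks(1);		/* 0.032 truncates to 0 - useless */
usec2ticks(1000);	/* 32 ticks */
usec2ticks(100000);	/* 3200 ticks - rounding error well under 1% */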
Best regards,
Wolfgang Denk

On 5/26/2011 1:27 PM, Wolfgang Denk wrote:
Dear "J. William Campbell",
In message4DDEAFE0.8060905@comcast.net you wrote:
I certainly agree using 64 bits for all calculations is vast overkill. In fact, I think using 64 bit calculations on systems that have only a 32 bit or less timer register is probably overkill. :-) However, to date, AFAIK, no processor has exceeded the u32 in ticks per second. As I
Not yet. But it makes no sense to start a new design with settings already in critical range, especially since there is zero problem with breaking it down by a factor of 1000 or 1e6.
pointed out in a previous e-mail, if they ever do this, we can just drop one or 2 bits off the 64 bit counter and in millisecond resolution, nobody will ever know. Also as previously pointed out, usec2ticks is
No. I will not accept a design that is so close on the edge of breaking.
What is your exact problem with the existing interfaces ticks2usec() and usec2ticks() ?
not present yet in lots of implementations. Also, if the fundamental clock frequency is 32 kHz (or anything less than 1 MHz), usec2ticks is 0! This probably rules out using it to get ticks per millisec or ticks per sec.
The statement "usec2ticks is 0" makes absolutely no sense as long as you don't say which argument you pass in. You get a return value of 0 even for a tick rate in the GHz range if you pass 0 as argument.
However, usec2ticks(1000) or maybe usec2ticks(100000) will probably return pretty useful values.
[Note that by passing properly scaled arguments you can also avoid a number of rounding errors.]
Hi Wolfgang, Yes, you are correct. I was thinking usec2ticks(1), which is certainly not the way to do it. I am happy with usec2ticks and ticks2usec. That works for me. Sorry for the noise.
How about the first part of my response? Are you still thinking about it or is it just too bad for words :-) ?
Best Regards, Bill Campbell
Best regards,
Wolfgang Denk

On Fri, May 27, 2011 at 4:52 AM, J. William Campbell jwilliamcampbell@comcast.net wrote:
On 5/26/2011 10:53 AM, Wolfgang Denk wrote:
Dear "J. William Campbell",
In message4DDE8639.3090909@comcast.net you wrote:
I think it is the task of get_ticks to return the hardware tick counter as an increasing counter, period. The counter may wrap at some final count that is not all ones. That is ok. Sync_timebase deals with the
NO! We want to be able to compute time differences using simple unsigned arithmetic, even after a rollover of the counter. For this it is mandatory that the counter always gets only incremented until it wraps around at the end of its number range, and never gets reset
Hi All, I agree that that is what must happen, but it should happen inside sync_timebase. Sync_timebase does what is needed to convert the less-than-fully capable counters into a fully capable one. You could move
How will sync_timebase() know that a rollover has happened before all 1's?
We would then need to tell sync_timebase() what the maximum value returned by get_ticks() is. It is easier to have get_ticks() handle it, as that is a platform specific detail
that functionality down into get_ticks, but if you do, you end up much like the present scheme where the multiple different get_ticks routines are expected to cope with expanding the hardware counter properly. To date, it has been
Correct - the tick counter is a platform specific implementation detail and therefore the implementation of the HAL (get_ticks()) must also be platform specific
shown conclusively that this process cannot be relied upon, or we wouldn't be having this discussion. If we put that functionality inside sync_timebase, it is in one place and debuggable once. All sync_timebase requires to do this is ticks per second and maximum tick value. I do request that counters that decrement be negated in the get_ticks routine, but beyond that, it should be a simple read of the tick register(s).
But how do you handle cases like the sc520 - A microsecond counter which counts 0-999, kicking a 16-bit millisecond counter on rollover. Reading of the millisecond counter latches the microsecond counter and resets the millisecond counter to zero
There is no uniform tick counter to read - It has to be fudged - in get_ticks()
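
A hedged sketch of how that fudging might look for the sc520-style pair of registers (accessor names are illustrative; reading the millisecond register returns the milliseconds elapsed since the previous read and latches the 0-999 microsecond register):

u64 get_ticks(void)
{
	static u64 total_us;			/* completed milliseconds, in us */
	u16 ms = read_millisecond_count();	/* delta since last read, clears on read */
	u16 us = read_latched_microseconds();	/* 0-999 within the current ms */

	total_us += (u64)ms * 1000;

	return total_us + us;	/* monotonic composite tick count */
}

With this scheme a 'tick' is simply a microsecond, so ticks_per_millisecond() returns 1000.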
Regards,
Graeme

Hi Bill,
On Fri, May 27, 2011 at 2:56 AM, J. William Campbell jwilliamcampbell@comcast.net wrote:
On 5/26/2011 6:27 AM, Graeme Russ wrote:
Hello Everyone,
OK - Starting a new thread to discuss implementation details. This is a heads-up for arch/platform maintainers - Once this is a bit more stable, I will put it on the wiki
Assumed Capabilities of the Platform - Has a 'tick counter' that does not rely on software to increment
Hi All, The nios2 with the most basic timer does not meet this requirement. It will not count at all without the 10 ms interrupt. I don't think this requirement matters anyway. We need a 'tick counter' that 'ticks'. If it takes software to make it tick, we don't much care. There may be problems with early use of udelay in that case, but that is a different issue.
I think we will need to define get_timer() weak - Nios will have to override the default implementation to cater for its (Nios') limitations
- tick interval may be a fixed constant which cannot be controlled via software, or it could be programmable (PIT)
API Functions (/lib/timer.c) - u32 get_timer(u32 start) - Returns the number of elapsed milliseconds since 'start'
API Functions (/arch/...) - void udelay(u32 delay) - Used for 'short' delays (generally up to several seconds) - Can use the tick counter if it is fast enough - MUST NOT RESET THE TICK COUNTER
There is a requirement that udelay be available before relocation and before the BSS is available. One can use the tick counter to provide udelay as long as sync_timebase is not called OR sync_timebase does not use BSS. It appears many implementations ignore this requirement at present. We should try to fix this, but it should not be a requirement.
If you really wanted to, sync_timebase() could use global data (it doesn't have many static variables) in which case all timer functions would be available before relocation
'Helper' Functions (/lib/timer.c)
I think this function should be weak, so that it is possible for people to override it with a "custom" function. The fully general sync_timebase has lots of code in it that can be simplified in special cases. We want and need a fully general function to be available, but other users who are real tight on space may want a cut down version. We should make that easily possible.
Agree
- void sync_timebase(void) - Updates the millisecond timer - Utilises HAL functions to access the platform's tick counter - Must be called more often than the rollover period of the platform's tick counter - Does not need to be called with a regular frequency (immune to interrupt skew) - Is always called by get_timer() - For platforms with short tick counter rollovers it should be called by an ISR - Bill Campbell wrote a good example which proved this can be common and arbitrary (and optionally free of divides and capable of maintaining accuracy even if the tick frequency is not an even division of 1ms)
HAL Functions (/arch/... or /board/...) - u64 get_ticks(void)
For what it's worth, I would like to propose using a (gasp!) typedef here. It seems to me there are a whole lot of cases where the max number of ticks is a u32 or less. In those cases, the wrap at 32 bits helps things a lot. If the tick counter is really 64 bits, the function of sync_timebase is simply to convert the tick value to millisec, and that is it. Otherwise, if it is 32 bits or less then some other actions will be required. I think this is one of those times where a typedef would help. We could define a type called timer_tick_t to describe this quantity. That would allow a pure 32 bit implementation where appropriate.
Another suggestion is that perhaps we want a u32 get_ticks_lsb(void) as well as a regular get_ticks. The lsb version would be used for udelay and could possibly come from another timer if that was necessary/desirable. See the requirement for early udelay availability.
I think this all adds unnecessary complexity
- Returns a tick count as an unsigned 64-bit integer - Abstracts the implementation of the platform tick counter (platform may be a 32-bit 3MHz counter, might be a 16-bit 0-999 microsecond plus 16-bit 0-65535 millisecond etc) - u64 ticks_per_millisecond() - Returns the number of ticks (as returned by get_ticks()) per millisecond
I think ticks_per_sec would be a better choice. First, such a function already exists in almost all u-boots. Second, if one wants the best accuracy for things like udelay, you need better accuracy than millisec. Since this function is used only infrequently, when things are initialized, a divide to get ticks_per_millisec (if that is what you really want) is no big deal. Lastly, I think this function can remain u32. Yes, there is a 4 GHz limit on
Don't underestimate the ability of existing platforms to already exceed this limit - Scientific equipment can easily have a 1 nanosecond tick counter (with extreme precision)
the clock rate. If this ever becomes an issue, we can change the type to timer_tick_t. When the CPU clock rate gets quite high, it is an advantage to divide it down for performance measurement anyway. The AMD/Intel chips already do this. If the hardware doesn't do it, shift the timer value right two bits. I doubt you will miss the 0.4 nanoseconds resolution loss from your 10 GHz timestamp.
Why mess around with bit shifting (which you would then have to kludge into your platform code) when carting around a 64-bit value is relatively cheap, transparent and portable (on all supported up-to-date tool chains)?
- void timer_isr() - Optional (particularly if tick counter rollover period is exceptionally long which is usually the case for native 64-bit tick counters) - Simply calls sync_timebase() - For platforms without any tick counter, this can implement one (but accuracy will be harmed due to usage of disable_interrupts() and enable_interrupts() in U-Boot)
So to get the new API up and running, only two functions are mandatory:
get_ticks() which reads the hardware tick counter and deals with any 'funny stuff' including rollovers, short timers (12-bit for example), composite counters (16-bit 0-999 microsecond + 16-bit millisecond) and maintains a 'clean' 64-bit tick counter which rolls over from all 1's to all 0's. The
I think it is the task of get_ticks to return the hardware tick counter as an increasing counter, period. The counter may wrap at some final count that is not all ones. That is ok. Sync_timebase deals with the rollovers if
The hardware tick counter may, the 64-bit software tick counter maintained by get_ticks() may not
necessary. get_ticks is very lightweight. get_ticks should deal with decrementing counters by returning the complement of the counter. The sc520 case is a bit more complex if you intend to use the 0-999 and 16 bit millisec registers, in that you do need to add them to the previous value to
As I mentioned in another post, this is a problem for the platform maintainer and is abstracted away through the platform specific implementation of get_ticks()
make an increasing counter. Sync_timebase "likes" short counters in that they are easy to convert to millisec and tick remainders.
The compiler should handle using 64-bit rather than 32-bit transparently
64-bit tick counter does not need to be reset to zero ever (even on startup
- sync_timebase takes care of all the details)
True, but sync_timebase does have to be initialized (as does the timer itself in most cases, so this is not a restriction).
This can be done in timer_init() via a call to sync_timebase() after the timer has been configured. This should bring everything into line
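
A sketch of that ordering (the hardware setup call is illustrative):

int timer_init(void)
{
	configure_hw_tick_counter();	/* platform-specific setup */
	sync_timebase();		/* prime the millisecond timebase */

	return 0;
}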
ticks_per_millisecond() simply returns the number of ticks in a millisecond
- This may be as simple as:
inline u64 ticks_per_millisecond(void) { return CONFIG_SYS_TICK_PER_MS; }
But it may be trickier if you have a programmable tick frequency
You will have to call the routine that initializes sync_timebase. This routine should have a name, like void init_sync_timebase(void)?
The optional timer ISR is required if the tick counter has a short roll over duration (short is up to you - 1 second is short, 1 hour might be, 1 century is not)
Regards,
Graeme
It is probably true that sync_timebase should have a parameter flag. The reason is that if the timer isr is called only when the timer wraps, then the calls to sync_timebase may be slightly more than a full timer period apart (due to interrupt latency). Therefore, when the timer difference is computed, if the current update is due to a wrap AND the previous update is due to a wrap, the difference should be approximately 1 wrap. If it comes up real short, you must add a wrap. This isn't necessary if the routine is called more often than once per wrap. Also, when sync_timebase is called in
timer_isr() MUST be called more often than the rollover period of the underlying hardware tick counter - This is a requirement
get_timer, you must first disable interrupts and then enable interrupts after sync_timebase returns
Why? - get_ticks() provides an atomic read of the hardware tick counter. If get_ticks() needs to disable and enable interrupts to do so, that is a problem for the platform maintainer
Admittedly, sync_timebase() will not be re-entrant, but how will it ever be called concurrently? - Ah, I see - a call to get_timer() interrupted by the timer ISR :)
Regards,
Graeme

On 5/26/2011 4:28 PM, Graeme Russ wrote:
Hi Bill,
On Fri, May 27, 2011 at 2:56 AM, J. William Campbell jwilliamcampbell@comcast.net wrote:
On 5/26/2011 6:27 AM, Graeme Russ wrote:
Hello Everyone,
OK - Starting a new thread to discuss implementation details. This is a heads-up for arch/platform maintainers - Once this is a bit more stable, I will put it on the wiki
Assumed Capabilities of the Platform
- Has a 'tick counter' that does not rely on software to increment
Hi All, The nios2 with the most basic timer does not meet this requirement. It will not count at all without the 10 ms interrupt. I don't think this requirement matters anyway. We need a 'tick counter' that 'ticks'. If it takes software to make it tick, we don't much care. There may be problems with early use of udelay in that case, but that is a different issue.
I think we will need to define get_timer() weak - Nios will have to override the default implementation to cater for its (Nios') limitations
Hi All, Yes, that will probably be required here.
- tick interval may be a fixed constant which cannot be controlled via software, or it could be programmable (PIT)
API Functions (/lib/timer.c)
- u32 get_timer(u32 start)
- Returns the number of elapsed milliseconds since 'start'
API Functions (/arch/...)
- void udelay(u32 delay)
- Used for 'short' delays (generally up to several seconds)
- Can use the tick counter if it is fast enough
- MUST NOT RESET THE TICK COUNTER
There is a requirement that udelay be available before relocation and before the BSS is available. One can use the tick counter to provide udelay as long as sync_timebase is not called OR sync_timebase does not use BSS. It appears many implementations ignore this requirement at present. We should try to fix this, but it should not be a requirement.
If you really wanted to, sync_timebase() could use global data (it doesn't have many static variables) in which case all timer functions would be available before relocation
Yes, my implementation of the sync_timebase routine was written that way, using gd-> for the required variables.
'Helper' Functions (/lib/timer.c)
I think this function should be weak, so that it is possible for people to override it with a "custom" function. The fully general sync_timebase has lots of code in it that can be simplified in special cases. We want and need a fully general function to be available, but other users who are real tight on space may want a cut down version. We should make that easily possible.
Agree
- void sync_timebase(void)
- Updates the millisecond timer
- Utilises HAL functions to access the platform's tick counter
- Must be called more often than the rollover period of the platform's tick counter
- Does not need to be called with a regular frequency (immune to interrupt skew)
- Is always called by get_timer()
- For platforms with short tick counter rollovers it should be called by an ISR
- Bill Campbell wrote a good example which proved this can be common and arbitrary (and optionally free of divides and capable of maintaining accuracy even if the tick frequency is not an even division of 1ms)
HAL Functions (/arch/... or /board/...)
- u64 get_ticks(void)
For what it's worth, I would like to propose using a (gasp!) typedef here. It seems to me there are a whole lot of cases where the max number of ticks is a u32 or less. In those cases, the wrap at 32 bits helps things a lot. If the tick counter is really 64 bits, the function of sync_timebase is simply to convert the tick value to millisec, and that is it. Otherwise, if it is 32 bits or less then some other actions will be required. I think this is one of those times where a typedef would help. We could define a type called timer_tick_t to describe this quantity. That would allow a pure 32 bit implementation where appropriate.
Another suggestion is that perhaps we want a u32 get_ticks_lsb(void) as well as a regular get_ticks. The lsb version would be used for udelay and could possibly come from another timer if that was necessary/desirable. See the requirement for early udelay availability.
I think this all adds unnecessary complexity
- Returns a tick count as an unsigned 64-bit integer - Abstracts the implementation of the platform tick counter (platform may be a 32-bit 3MHz counter, might be a 16-bit 0-999 microsecond plus 16-bit 0-65535 millisecond etc)
- u64 ticks_per_millisecond()
- Returns the number of ticks (as returned by get_ticks()) per millisecond
I think ticks_per_sec would be a better choice. First, such a function already exists in almost all u-boots. Second, if one wants the best accuracy for things like udelay, you need better accuracy than millisec. Since this function is used only infrequently, when things are initialized, a divide to get ticks_per_millisec (if that is what you really want) is no big deal. Lastly, I think this function can remain u32. Yes, there is a 4 GHz limit on
Don't underestimate the ability of existing platforms to already exceed this limit - Scientific equipment can easily have a 1 nanosecond tick counter (with extreme precision)
True enough. I have already agreed that usec2ticks and ticks2usec are fine for this purpose.
the clock rate. If this ever becomes an issue, we can change the type to timer_tick_t. When the CPU clock rate gets quite high, it is an advantage to divide it down for performance measurement anyway. The AMD/Intel chips already do this. If the hardware doesn't do it, shift the timer value right two bits. I doubt you will miss the 0.4 nanoseconds resolution loss from your 10 GHz timestamp.
Why mess around with bit shifting (which you would then have to kludge into your platform code) when carting around a 64-bit value is relatively cheap, transparent and portable (on all supported up-to-date tool chains)?
I really STRONGLY disagree with this statement. If you actually needed 64 bit variables, fine use them. But as I have already shown, you do not need them in general. We are computing a 32 bit result. There is some entropy argument that says you shouldn't need 64 bits to do so. Another way to look at it is that converting the top 32 bit word and the bottom 32 bit word to ms separately can be easier than doing them both together at once. However, as we will see below, I do agree we need two 32 bit words to make this process go smoothly. I just don't agree that they should/will constitute a 64 bit binary word. See below.
- void timer_isr()
- Optional (particularly if tick counter rollover period is exceptionally long which is usually the case for native 64-bit tick counters)
- Simply calls sync_timebase()
- For platforms without any tick counter, this can implement one (but accuracy will be harmed due to usage of disable_interrupts()
and enable_interrupts() in U-Boot)
So to get the new API up and running, only two functions are mandatory:
get_ticks() which reads the hardware tick counter and deals with any 'funny stuff' including rollovers, short timers (12-bit for example), composite counters (16-bit 0-999 microsecond + 16-bit millisecond) and maintains a 'clean' 64-bit tick counter which rolls over from all 1's to all 0's. The
I think it is the task of get_ticks to return the hardware tick counter as an increasing counter, period. The counter may wrap at some final count that is not all ones. That is ok. Sync_timebase deals with the rollovers if
The hardware tick counter may, the 64-bit software tick counter maintained by get_ticks() may not
necessary. get_ticks is very lightweight. get_ticks should deal with decrementing counters by returning the complement of the counter. The sc520 case is a bit more complex if you intend to use the 0-999 and 16 bit millisec registers, in that you do need to add them to the previous value to
As I mentioned in another post, this is a problem for the platform maintainer and is abstracted away through the platform specific implementation of get_ticks()
make an increasing counter. Sync_timebase "likes" short counters in that they are easy to convert to millisec and tick remainders.
The compiler should handle using 64-bit rather than 32-bit transparently
True enough. But you don't need 64 bit variables at this point; two 32-bit ones work just fine, in fact better in most cases.
64-bit tick counter does not need to be reset to zero ever (even on startup
- sync_timebase takes care of all the details)
True, but sync_timebase does have to be initialized (as does the timer itself in most cases, so this is not a restriction).
This can be done in timer_init() via a call to sync_timebase() after the timer has been configured. This should bring everything into line
ticks_per_millisecond() simply returns the number of ticks in a millisecond
- This may be as simple as:
inline u64 ticks_per_millisecond(void) { return CONFIG_SYS_TICK_PER_MS; }
But it may be trickier if you have a programmable tick frequency
You will have to call the routine that initializes sync_timebase. This routine should have a name, like void init_sync_timebase(void)?
The optional timer ISR is required if the tick counter has a short roll over duration (short is up to you - 1 second is short, 1 hour might be, 1 century is not)
Regards,
Graeme
It is probably true that sync_timebase should have a parameter flag. The reason is that if the timer isr is called only when the timer wraps, then the calls to sync_timebase may be slightly more than a full timer period apart (due to interrupt latency). Therefore, when the timer difference is computed, if the current update is due to a wrap AND the previous update is due to a wrap, the difference should be approximately 1 wrap. If it comes up real short, you must add a wrap. This isn't necessary if the routine is called more often than once per wrap. Also, when sync_timebase is called in
timer_isr() MUST be called more often than the rollover period of the underlying hardware tick counter - This is a requirement
The equality case can be made to work. If the extension of the counter is done in the interrupt routine, not in get_ticks, get_ticks just needs to read the msb of the counter, read the lsb of the counter, then verify that the msb has not changed. If you have interrupts that work, that is the easiest way to go. If the lsb of the counter represents 1 ms or less, you can just drop it (equivalent to what the PPC does now). If the interrupt is slower than that, you must use at least part of the LSB. If you don't have interrupts, the point is moot.
get_timer, you must first disable interrupts and then enable interrupts after sync_timebase returns
Why? - get_ticks() provides an atomic read of the hardware tick counter. If get_ticks() needs to disable and enable interrupts to do so, that is a problem for the platform maintainer
Admittedly, sync_timebase() will not be re-entrant, but how will it ever be called concurrently? - Ah, I see - a call to get_timer() interrupted by the timer ISR :)
Yes, that is the problem. I have come to the view that two 32 bit words are the best approach. Note that the lsb may actually not fill the full 32 bits. The top 32 bits are the rollover count and the bottom 32 bits are the current counter. If the counter is a full 32 bits, so much the better. Again, one could put them together inside the interrupt routine, but it is easier to check for a changed value if you don't do this. Otherwise, you have to check both words. It also makes the isr faster. It is just an increment of the overflow counter, like the PPC is now. I happen to think it is easier to convert the two 32 bit words to milliseconds one at a time, but if you feel you must use 64 bit words, that is fine too. Just remember, the counter does not always fill the entire bottom word. In cases where there are no interrupts, get_ticks has to detect that the timer has "backed up" and increment the overflow counter itself, unless the counter is 64 bits to begin with and overflow is impossible anyway. get_ticks should NOT try to detect overflows if interrupts are available. If it got both words before an interrupt happened, the answer is correct. If it got an interrupt in between fetching the words, the event will be detected and the value re-fetched. All sync_timebase would do now is convert the returned value to milliseconds.
So, if you have a 64 bit hardware counter, get_ticks reads and returns it. Else if you have interrupts, get_ticks reads the overflow counter into the msb. Next, it reads the hardware timer into the lsb. If the counter is a down counter, the lsb is set to the counter max minus the lsb. The msb is then checked to make sure it hasn't changed; if it has, repeat the process. All the interrupt routine does is increase the overflow count. If you don't have interrupts, get_ticks reads the hardware counter into the lsb. If the counter is a down counter, the lsb is set to the counter max minus the lsb. If the lsb is less than it was in the previous call to get_ticks, the overflow counter is increased. get_ticks then loads the overflow counter into the msb.
sync_timebase converts the msb and lsb into millisec. It may do this by a 64 bit divide, or some shifting to align the lsb with the msb and then a 64 bit divide, or a bunch of 32 bit fractional multiplies, or any such approach that works.
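
Pulling that together, the interrupt-driven variant might look like this sketch (read_hw_counter() and the full-32-bit counter width are illustrative assumptions):

static volatile u32 wrap_count;	/* msb word: bumped once per hardware wrap */

void timer_isr(void)
{
	wrap_count++;	/* must fire at least once per counter wrap */
}

u64 get_ticks(void)
{
	u32 msb, lsb;

	do {
		msb = wrap_count;
		lsb = read_hw_counter();	/* complement this if down-counting */
	} while (msb != wrap_count);		/* wrap hit mid-read: re-fetch */

	return ((u64)msb << 32) | lsb;
}

sync_timebase() then only has to convert the returned value to milliseconds.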
How does that sound?
Best Regards,
Bill Campbell
Regards,
Graeme

Hi Bill,
On Fri, May 27, 2011 at 11:26 AM, J. William Campbell jwilliamcampbell@comcast.net wrote:
On 5/26/2011 4:28 PM, Graeme Russ wrote:
Why mess around with bit shifting (which you would then have to kludge into your platform code) when carting around a 64-bit value is relatively cheap, transparent and portable (on all supported up-to-date tool chains)
I really STRONGLY disagree with this statement. If you actually needed 64 bit variables, fine, use them. But as I have already shown, you do not need them in general. We are computing a 32 bit result. There is some entropy argument that says you shouldn't need 64 bits to do so. Another way to look at it is that converting the top 32 bit word and the bottom 32 bit word to ms separately can be easier than doing them both together at once. However, as we will see below, I do agree we need two 32 bit words to make this process go smoothly. I just don't agree that they should/will constitute a 64 bit binary word. See below.
- void timer_isr() - Optional (particularly if tick counter rollover period is exceptionally long which is usually the case for native 64-bit tick counters) - Simply calls sync_timebase() - For platforms without any tick counter, this can implement one (but accuracy will be harmed due to usage of disable_interrupts() and enable_interrupts() in U-Boot)
So to get the new API up and running, only two functions are mandatory:
get_ticks() which reads the hardware tick counter and deals with any 'funny stuff' including rollovers, short timers (12-bit for example), composite counters (16-bit 0-999 microsecond + 16-bit millisecond) and maintains a 'clean' 64-bit tick counter which rolls over from all 1's to all 0's. The
I think it is the task of get_ticks to return the hardware tick counter as an increasing counter, period. The counter may wrap at some final count that is not all ones. That is ok. Sync_timebase deals with the rollovers if
The hardware tick counter may, the 64-bit software tick counter maintained by get_ticks() may not
necessary. get_ticks is very lightweight. get_ticks should deal with decrementing counters by returning the complement of the counter. The sc520 case is a bit more complex if you intend to use the 0-999 and 16 bit millisec registers, in that you do need to add them to the previous value to
As I mentioned in another post, this is a problem for the platform maintainer and is abstracted away through the platform specific implementation of get_ticks()
make an increasing counter. Sync_timebase "likes" short counters in that they are easy to convert to millisec and tick remainders.
The compiler should handle using 64-bit rather than 32-bit transparently
True enough. But you don't need 64 bit variables at this point - two 32 bit ones work just fine, in fact better in most cases.
Remember, we are not dealing with a high performance OS here. The primary goal is portability - Performance optimisations (which do not break portability) can be performed later
64-bit tick counter does not need to be reset to zero ever (even on startup
- sync_timebase takes care of all the details)
True, but sync_timebase does have to be initialized (as does the timer itself in most cases, so this is not a restriction).
This can be done in timer_init() via a call to sync_timebase() after the timer has been configured. This should bring everything into line
ticks_per_millisecond() simply returns the number of ticks in a millisecond
- This may be as simple as:
inline u64 ticks_per_millisecond(void) { return CONFIG_SYS_TICK_PER_MS; }
But it may be trickier if you have a programmable tick frequency
You will have to call the routine that initializes sync_timebase. This routine should have a name, like void init_sync_timebase(void)?
The optional timer ISR is required if the tick counter has a short roll over duration (short is up to you - 1 second is short, 1 hour might be, 1 century is not)
Regards,
Graeme
It is probably true that sync_timebase should have a parameter flag. The reason is that if the timer isr is called only when the timer wraps, then the calls to sync_timebase may be slightly more than a full timer period apart (due to interrupt latency). Therefore, when the timer difference is computed, if the current update is due to a wrap AND the previous update is due to a wrap, the difference should be approximately 1 wrap. If it comes up real short, you must add a wrap. This isn't necessary if the routine is called more often than once per wrap. Also, when sync_timebase is called in
timer_isr() MUST be called more often than the rollover period of the underlying hardware tick counter - This is a requirement
The equality case can be made to work. If the extension of the counter is done in the interrupt routine, not in get_ticks, get_ticks just needs to read the msb of the counter, read the lsb of the counter, then verify that the msb has not changed. If you have interrupts that work, that is the easiest way to go. If the lsb of the counter represents 1 ms or less, you can just drop it (equivalent to what the PPC does now). If the interrupt is slower than that, you must use at least part of the LSB. If you don't have interrupts, the point is moot.
So now we have a complicated ISR and a complicated get_ticks() and you have to change get_ticks() when you decide to implement an ISR
get_timer, you must first disable interrupts and then enable interrupts after sync_timebase returns
Why? - get_ticks() provides an atomic read of the hardware tick counter. If get_ticks() needs to disable and enable interrupts to do so, that is a problem for the platform maintainer
Admittedly, sync_timebase() will not be re-entrant, but how will it ever be called concurrently? - Ah, I see - a call to get_timer() interrupted by the timer ISR :)
Yes, that is the problem. I have come to the view that two 32 bit words are the best approach. Note that the lsb may actually not fill the full 32 bits.
Urghhh
The top 32 bits are the rollover count and the bottom 32 bits are the current counter. If the counter is a full 32 bits, so much the better.
Ahhhhh - Let's keep it that way
Again, one could put them together inside the interrupt routine, but it is easier to check for a changed value if you don't do this. Otherwise, you have to check both words. It also makes the isr faster. It is just an
As I said before - Simple First, Fast Later
increment of the overflow counter, like the PPC is now. I happen to think it is easier to convert the two 32 bit words to milliseconds one at a time, but if you feel you must use 64 bit words, that is fine too. Just remember, the counter does not always fill the entire bottom word.
Urghhh
In cases where there are no interrupts, get_ticks has to detect that the timer has "backed up" and increment the overflow counter itself, unless the counter is 64 bits to begin with and overflow is impossible anyway. get_ticks should NOT try to detect overflows if interrupts are available. If it got both words before an interrupt happened, the answer is correct. If it got an interrupt in between fetching the words, the event will be detected and the value re-fetched. All sync_timebase would do now is convert the returned value to milliseconds.
So, if you have a 64 bit hardware counter, get_ticks reads and returns it. Else if you have interrupts, get_ticks reads the overflow counter into the msb. Next, it reads the hardware timer into the lsb. If the counter is a down counter, the lsb is set to (counter max - lsb). The msb is then checked to make sure it hasn't changed, if it has, repeat the process. All the interrupt routine does is increase the overflow count. If you don't have interrupts, get_ticks reads the hardware counter into the lsb. If the counter is a down counter, the lsb is set to (counter max - lsb). If the lsb is less than it was in the previous call to get_ticks, the overflow counter is increased. get_ticks then loads the overflow counter into the msb.
sync_timebase converts the msb and lsb into millisec. It may do this by a 64 bit divide, or some shifting to align the lsb with the msb and then a 64-bit divide, or a bunch of 32 bit fractional multiplies, or any such approach that works.
How does that sound?
The fact that you have described three different implementations of get_ticks(), with two of them differentiated by whether you have interrupts or not, immediately suggests this solution is inherently more complex and less maintainable.
Let's say you have a platform with a 32-bit tick counter running at a reasonably long rollover time so you decide not to use interrupts. Then you create a new platform with the same tick counter, but it runs much faster and you realise you need to implement the interrupt routine to make get_timer() work for long enough periods - Fine, you add an ISR for the new platform that calls sync_timebase - No other changes are required.
The last thing we want is for the 64-bit tick counter to be conceptually different across platforms
I just realised - the ISR _does not need to call sync_timebase at all_. The ISR only needs to call get_ticks(), so it will be fast anyway
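i.e. the whole ISR collapses to something like this sketch:

void timer_isr(void)
{
	get_ticks();	/* folds the rollover into the 64-bit count */
}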
Regards,
Graeme

On 5/26/2011 6:51 PM, Graeme Russ wrote:
Hi Bill,
On Fri, May 27, 2011 at 11:26 AM, J. William Campbell jwilliamcampbell@comcast.net wrote:
On 5/26/2011 4:28 PM, Graeme Russ wrote:
Why mess around with bit shifting (which you would then have to kludge into your platform code) when carting around a 64-bit value is relatively cheap, transparent and portable (on all supported up-to-date tool chains)
I really STRONGLY disagree with this statement. If you actually needed 64 bit variables, fine, use them. But as I have already shown, you do not need them in general. We are computing a 32 bit result. There is some entropy argument that says you shouldn't need 64 bits to do so. Another way to look at it is that converting the top 32 bit word and the bottom 32 bit word to ms separately can be easier than doing them both together at once. However, as we will see below, I do agree we need two 32 bit words to make this process go smoothly. I just don't agree that they should/will constitute a 64 bit binary word. See below.
- void timer_isr()
- Optional (particularly if tick counter rollover period is exceptionally long which is usually the case for native 64-bit tick counters)
- Simply calls sync_timebase()
- For platforms without any tick counter, this can implement one (but accuracy will be harmed due to usage of disable_interrupts()
and enable_interrupts() in U-Boot)
So to get the new API up and running, only two functions are mandatory:
get_ticks() which reads the hardware tick counter and deals with any 'funny stuff' including rollovers, short timers (12-bit for example), composite counters (16-bit 0-999 microsecond + 16-bit millisecond) and maintains a 'clean' 64-bit tick counter which rolls over from all 1's to all 0's. The
I think it is the task of get_ticks to return the hardware tick counter as an increasing counter, period. The counter may wrap at some final count that is not all ones. That is ok. Sync_timebase deals with the rollovers if
The hardware tick counter may, the 64-bit software tick counter maintained by get_ticks() may not
necessary. get_ticks is very lightweight. get_ticks should deal with decrementing counters by returning the complement of the counter. The sc520 case is a bit more complex if you intend to use the 0-999 and 16 bit millisec registers, in that you do need to add them to the previous value to
As I mentioned in another post, this is a problem for the platform maintainer and is abstracted away through the platform specific implementation of get_ticks()
make an increasing counter. Sync_timebase "likes" short counters in that they are easy to convert to millisec and tick remainders.
The compiler should handle using 64-bit rather than 32-bit transparently
True enough. But you don't need 64 bit variables at this point - two 32 bit ones work just fine, in fact better in most cases.
Remember, we are not dealing with a high performance OS here. The primary goal is portability - Performance optimisations (which do not break portability) can be performed later
64-bit tick counter does not need to be reset to zero ever (even on startup
- sync_timebase takes care of all the details)
True, but sync_timebase does have to be initialized (as does the timer itself in most cases, so this is not a restriction).
This can be done in timer_init() via a call to sync_timebase() after the timer has been configured. This should bring everything into line
ticks_per_millisecond() simply returns the number of ticks in a millisecond
- This may be as simple as:
inline u64 ticks_per_millisecond(void) { return CONFIG_SYS_TICK_PER_MS; }
But it may be trickier if you have a programmable tick frequency
You will have to call the routine that initializes sync_timebase. This routine should have a name, like void init_sync_timebase(void)?
The optional timer ISR is required if the tick counter has a short roll over duration (short is up to you - 1 second is short, 1 hour might be, 1 century is not)
Regards,
Graeme
It is probably true that sync_timebase should have a parameter flag. The reason is that if the timer isr is called only when the timer wraps, then the calls to sync_timebase may be slightly more than a full timer period apart (due to interrupt latency). Therefore, when the timer difference is computed, if the current update is due to a wrap AND the previous update is due to a wrap, the difference should be approximately 1 wrap. If it comes up real short, you must add a wrap. This isn't necessary if the routine is called more often than once per wrap. Also, when sync_timebase is called in
timer_isr() MUST be called more often than the rollover period of the underlying hardware tick counter - This is a requirement
The equality case can be made to work. If the extension of the counter is done in the interrupt routine, not in get_ticks, get_ticks just needs to read the msb of the counter, read the lsb of the counter, then verify that the msb has not changed. If you have interrupts that work, that is the easiest way to go. If the lsb of the counter represents 1 ms or less, you can just drop it (equivalent to what the PPC does now). If the interrupt is slower than that, you must use at least part of the LSB. If you don't have interrupts, the point is moot.
So now we have a complicated ISR and a complicated get_ticks() and you have to change get_ticks() when you decide to implement an ISR
get_timer, you must first disable interrupts and then enable interrupts after sync_timebase returns
Why? - get_ticks() provides an atomic read of the hardware tick counter. If get_ticks() needs to disable and enable interrupts to do so, that is a problem for the platform maintainer
Admittedly, sync_timebase() will not be re-entrant, but how will it ever be called concurrently? - Ah, I see - a call to get_timer() interrupted by the timer ISR :)
Yes, that is the problem. I have come to the view that two 32 bit words are the best approach. Note that the lsb may actually not fill the full 32 bits.
Urghhh
Hi All, Consider the case where we have a 16 bit counter clocked by 1.0 MHz (a nice number) interrupting on overflow. We could return (overflow counter << 16) | hardware counter from the get_ticks routine, which maintains the "binaryness" of the "number". OTOH, suppose the counter is a down counter clocked at 1.19318 MHz with a 1 ms period, like the PC counter is set up at present. What do we return then? Whatever we do, it depends on the clock rate, which may change, and involves some math, which may not work for all clock rates. In effect, you have a dual radix number. Yes, conversion is possible but it is messy and would be present in different forms in all the various get_tick routines. This is neither simple nor maintainable. Further, it is un-necessary, as the sync_timer routine is just going to convert the number from whatever radix we converted it to into millisec. If we leave the two numbers as split, all that complexity is removed from get_ticks and sent upward to the common routine that converts the answer into ms anyway. This makes the system more maintainable by placing the minimum requirement on get_ticks. The "tick" should be opaque to anybody but sync_timebase anyway.
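To put numbers on it (illustrative values only - the real divisor depends on how the timer is programmed):

/* ovf = rollover count from the ISR, hw_count = raw hardware reading */

/* 16-bit counter at 1.0 MHz, wrapping at 65536: a shift is enough */
ticks = ((u64)ovf << 16) | hw_count;

/* PC-style down counter at 1.19318 MHz wrapping every 1 ms, i.e. roughly
 * every 1193 ticks: now it takes a multiply, and the constant moves if
 * the clock rate changes */
ticks = (u64)ovf * 1193 + (1192 - hw_count);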
The top 32 bits are the rollover count and the bottom 32 bits are the current counter. If the counter is a full 32 bits, so much the better.
Ahhhhh - Let's keep it that way
Again, one could put them together inside the interrupt routine, but it is easier to check for a changed value if you don't do this. Otherwise, you have to check both words. It also makes the isr faster. It is just an
As I said before - Simple First, Fast Later
I am in favor of simple. That is why I want get_ticks to do as little as possible. It should just read the hardware register and the overflow counter if it is separate. Make sure the overflow didn't change while we were reading. This is redundant if we are not using interrupts but we can leave the code in. It just won't do anything. We can also move the rollover detection to sync_timebase. It will be redundant if we are using interrupts, because time will never "back up". But we can do it this way. This centralizes the overflow detection, which is a good thing.
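In other words, roughly this inside sync_timebase (a sketch; last_lsb would live in gd or a static):

/* Centralized wrap detection: if time "backed up", the hardware
 * counter wrapped since the last call. Redundant but harmless when
 * an ISR is counting rollovers, since time never backs up then. */
if (lsb < last_lsb)
	msb++;
last_lsb = lsb;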
increment of the overflow counter, like the PPC is now. I happen to think it is easier to convert the two 32 bit words to milliseconds one at a time, but if you feel you must use 64 bit words, that is fine too. Just remember, the counter does not always fill the entire bottom word.
Urghhh
In cases where there are no interrupts, get_ticks has to detect that the timer has "backed up" and increment the overflow counter itself, unless the counter is 64 bits to begin with and overflow is impossible anyway. get_ticks should NOT try to detect overflows if interrupts are available. If it got both words before an interrupt happened, the answer is correct. If it got an interrupt in between fetching the words, the event will be detected and the value re-fetched. All sync_timebase would do now is convert the returned value to milliseconds.
So, if you have a 64 bit hardware counter, get_ticks reads and returns it. Else if you have interrupts, get_ticks reads the overflow counter into the msb. Next, it reads the hardware timer into the lsb. If the counter is a down counter, the lsb is set to (counter max - lsb). The msb is then checked to make sure it hasn't changed, if it has, repeat the process. All the interrupt routine does is increase the overflow count. If you don't have interrupts, get_ticks reads the hardware counter into the lsb. If the counter is a down counter, the lsb is set to (counter max - lsb). If the lsb is less than it was in the previous call to get_ticks, the overflow counter is increased. get_ticks then loads the overflow counter into the msb.
sync_timebase converts the msb and lsb into millisec. It may do this by a 64 bit divide, or some shifting to align the lsb with the msb and then a 64-bit divide, or a bunch of 32 bit fractional multiplies, or any such approach that works.
How does that sound?
The fact that you have described three different implementations of get_ticks(), with two of them differentiated by whether you have interrupts or not, immediately suggests this solution is inherently more complex and less maintainable.
Yes, it is less maintainable because there are three cases. As shown above, we can reduce the number of cases to one at the expense of some redundant code. There are problems with our original approach when the interrupt rate is == to the counter rollover period. This problem is detailed below. You placed the constraint that the interrupt rate was faster than the rollover period on the previous implementation. However, that constraint is not met in essentially all interrupt driven cases. Also, note that these get_ticks routines, even while different, are pretty simple. OTOH, some of the 64 bit counters can't be read atomically either, so having the "redundant" check in the code is not so bad. Then again, we are talking about a total of probably 5 lines of code or so anyway. It is a good idea to move the rollover detection in the non-interrupt case to sync_timebase as well. Sync_timebase can also invert the down-counting counters, removing that from get_ticks. The wrap detection code can be #ifdef out if one is using interrupts and offended by its presence. Thanks for pointing this out and compelling me to reduce the number of cases! Making get_ticks more lightweight is a good idea in my opinion.
Let's say you have a platform with a 32-bit tick counter running at a reasonably long rollover time so you decide not to use interrupts. Then you create a new platform with the same tick counter, but it runs much faster and you realise you need to implement the interrupt routine to make get_timer() work for long enough periods - Fine, you add an ISR for the new platform that calls sync_timebase - No other changes are required.
The last thing we want is for the 64-bit tick counter to be conceptually different across platforms
I just realised - the ISR _does not need to call sync_timebase at all_. The ISR only needs to call get_ticks(), so it will be fast anyway
The problem is that the way we previously detected wrapping does not work if the interrupt rate is == to the counter wrap time, which it essentially always is. If get_ticks is trying to update the wrap count when an interrupt comes in, it will do it wrong. If the interrupt rate was faster, it would work, because get_ticks would always know the correct answer. Consider the following. Get ticks is called by the counter overflowing (which resets it to 0) and executes with the counter value at 0. Later, at the next rollover, with no intervening calls to get ticks, the interrupt routine calls get ticks. Assuming negligible interrupt latency, get_ticks just sees the counter is still 0, so it will not detect a wrap condition. So now you lose a period. I thought by passing in a flag we could force get_ticks to do the right thing in this case, but since we must disable interrupts when we call get ticks from the get_timer routine, the outer call to get ticks may have already detected the rollover and the flag will force an additional rollover to be detected. It is a classic race condition. If we detect rollover only in the interrupt routine, we do not need to protect the rollover variable from access by the main program, so we can be sure the results of get_ticks are coherent. If we try to do it in both places, we will have trouble. If the interrupt occurred at a faster rate, like say twice the rollover, we wouldn't have a problem. But it doesn't. In most cases it happens at the rollover rate, or just a bit more sometimes due to interrupt latency not being a constant. It may appear to happen at somewhat less than a rollover as well, if the interrupt latency was long at time n-1 and short at time n. I think this problem can be overcome in the old design by keeping track of whether the last update was by a non-interrupt or interrupt call to get_ticks, but I think the new design is better anyway.
Best Regards, Bill Campbell
Regards,
Graeme

Hi Bill,
On Fri, May 27, 2011 at 1:54 PM, J. William Campbell jwilliamcampbell@comcast.net wrote:
On 5/26/2011 6:51 PM, Graeme Russ wrote:
Hi Bill,
On Fri, May 27, 2011 at 11:26 AM, J. William Campbell jwilliamcampbell@comcast.net wrote:
[snip]
Yes, that is the problem. I have come to the view that two 32 bit words are the best approach. Note that the lsb may actually not fill the full 32 bits.
Urghhh
Hi All, Consider the case where we have a 16 bit counter clocked by 1.0 MHz (a nice number) interrupting on overflow. We could return (overflow counter << 16) | hardware counter from the get_ticks routine, which maintains the "binaryness" of the "number". OTOH, suppose the counter is a down counter clocked at 1.19318 MHz with a 1 ms period, like the PC counter is set up at present. What do we return then? Whatever we do, it depends on the clock rate, which may change, and involves some math, which may not work for all clock rates. In effect, you have a dual radix number. Yes, conversion is possible but it is messy and would be present in different forms in all the various get_tick routines.
get_ticks() does not care about the clock rate - It simply looks at the current value of the hardware tick counter and the value of the hardware tick counter the last time get_ticks() was called, calculates the difference and adds that to the 64-bit software tick-counter
I don't see how it being a down counter makes that any more difficult
This is neither simple nor maintainable. Further, it is un-necessary, as the sync_timer routine is just going to convert the number from whatever radix we converted it to into millisec. If we leave the two numbers as split, all that complexity is removed from get_ticks and sent upward to the common routine that converts the answer into ms anyway. This makes the system more maintainable by placing the minimum requirement on get_ticks. The "tick" should be opaque to anybody but sync_timebase anyway.
But how is the common routine going to know if it is a split timer, up timer, down timer, little endian, big endian, etc, etc.?
get_ticks() abstracts the hardware implementation of the hardware timer from sync_timer()
The top 32 bits are the rollover count and the bottom 32 bits are the current counter. If the counter is a full 32 bits, so much the better.
Ahhhhh - Let's keep it that way
Again, one could put them together inside the interrupt routine, but it is easier to check for a changed value if you don't do this. Otherwise, you have to check both words. It also makes the isr faster. It is just an
As I said before - Simple First, Fast Later
I am in favor of simple. That is why I want get_ticks to do as little as possible. It should just read the hardware register and the overflow counter if it is separate. Make sure the overflow didn't change while we were reading. This is redundant if we are not using interrupts but we can leave the code in. It just won't do anything. We can also move the rollover detection to sync_timebase. It will be redundant if we are using interrupts, because time will never "back up". But we can do it this way. This centralizes the overflow detection, which is a good thing.
That does not sound simple to me. This, however, does:
u64 get_ticks(void)
{
	static u64 last_hw_tick_count;
	static u64 last_sw_tick_count;

	/* Now for the platform specific stuff - Read hardware tick counter */
	u64 current_hw_tick_count = 0;	/* Read hw registers etc */

	/* Deal with hardware weirdness - errata, stupid hw engineers etc */

	u64 elapsed_ticks = current_hw_tick_count - last_hw_tick_count;
	last_sw_tick_count += elapsed_ticks;

	return last_sw_tick_count;
}
The '/* Read hw registers etc */' bit will always be the same, no matter what way you do it
The '/* Deal with hardware weirdness - errata, stupid hw engineers etc */' bit is where we are truly abstracting the hardware away - This is the bit you propose to leave mangled and deal with in sync_time? You make a lot of assumptions about the consistency of this highly variable logic across all current (and future) U-Boot platforms
What if the hardware engineer figures he can squeeze some extra mileage out of an FPGA by implementing a 24-bit counter + 8-bit status in one 32-bit register - get_ticks() strips all that out
You want to do some in get_ticks(), some in the ISR and some in sync_timer() - Put all the weird stuff in one place - The HAL
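e.g. for that 24-bit-counter-plus-status register, get_ticks() boils down to something like this (a sketch - CONFIG_SYS_TIMER_REG and the bit layout are invented for the example):

u64 get_ticks(void)
{
	static u32 last_hw_count;
	static u64 sw_tick_count;
	u32 reg = readl((void *)CONFIG_SYS_TIMER_REG);

	/* The counter lives in bits 2-25, the rest is status - strip it */
	u32 hw_count = (reg >> 2) & 0xffffff;

	/* Difference modulo 2^24 absorbs the rollover automatically */
	sw_tick_count += (hw_count - last_hw_count) & 0xffffff;
	last_hw_count = hw_count;

	return sw_tick_count;
}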
increment of the overflow counter, like the PPC is now. I happen to think it is easier to convert the two 32 bit words to milliseconds one at a time, but if you feel you must use 64 bit words, that is fine too. Just remember, the counter does not always fill the entire bottom word.
Urghhh
In cases where there are no interrupts, get_ticks has to detect that the timer has "backed up" and increment the overflow counter itself, unless the counter is 64 bits to begin with and overflow is impossible anyway. get_ticks should NOT try to detect overflows if interrupts are available. If it got both words before an interrupt happened, the answer is correct. If it got an interrupt in between fetching the words, the event will be detected and the value re-fetched. All sync_timebase would do now is convert the returned value to milliseconds.
So, if you have a 64 bit hardware counter, get_ticks reads and returns it. Else if you have interrupts, get_ticks reads the overflow counter into the msb. Next, it reads the hardware timer into the lsb. If the counter is a down counter, the lsb is set to (counter max - lsb). The msb is then checked to make sure it hasn't changed, if it has, repeat the process. All the interrupt routine does is increase the overflow count. If you don't have interrupts, get_ticks reads the hardware counter into the lsb. If the counter is a down counter, the lsb is set to (counter max - lsb). If the lsb is less than it was in the previous call to get_ticks, the overflow counter is increased. get_ticks then loads the overflow counter into the msb.
sync_timebase converts the msb and lsb into millisec. It may do this by a 64 bit divide, or some shifting to align the lsb with the msb and then a 64-bit divide, or a bunch of 32 bit fractional multiplies, or any such approach that works.
How does that sound?
The fact that you have described three different implementations of get_ticks(), with two of them differentiated by whether you have interrupts or not, immediately suggests this solution is inherently more complex and less maintainable.
Yes, it is less maintainable because there are three cases. As shown above, we can reduce the number of cases to one at the expense of some redundant code. There are problems with our original approach when the interrupt rate is == to the counter rollover period. This problem is detailed below. You placed the constraint that the interrupt rate was faster than the rollover period on the previous implementation. However, that constraint is not met in essentially all interrupt driven cases. Also, note that these get_ticks routines, even while different, are pretty simple. OTOH, some of the 64 bit counters can't be read atomically either, so having the "redundant" check in the code is not so bad. Then again, we are talking about a total of probably 5 lines of code or so anyway. It is a good idea to move the rollover detection in the non-interrupt case to sync_timebase as well. Sync_timebase can also invert the down-counting counters, removing that from get_ticks. The wrap detection code can be #ifdef out if one is using interrupts and
Urghh - Why are you adding unnecessary ugliness - #ifdef in the middle of code is usually a sign you are doing something wrong
offended by its presence. Thanks for pointing this out and compelling me to reduce the number of cases! Making get_ticks more lightweight is a good idea in my opinion.
Let's say you have a platform with a 32-bit tick counter running at a reasonably long rollover time so you decide not to use interrupts. Then you create a new platform with the same tick counter, but it runs much faster and you realise you need to implement the interrupt routine to make get_timer() work for long enough periods - Fine, you add an ISR for the new platform that calls sync_timebase - No other changes are required.
The last thing we want is for the 64-bit tick counter to be conceptually different across platforms
I just realised - the ISR _does not need to call sync_timebase at all_. The ISR only needs to call get_ticks(), so it will be fast anyway
The problem is that the way we previously detected wrapping does not work if the interrupt rate is == to the counter wrap time, which it essentially always is. If get_ticks is trying to update the wrap count when an interrupt
Is it, always, on every platform?
comes in, it will do it wrong. If the interrupt rate was faster, it would work, because get_ticks would always know the correct answer. Consider the following. Get ticks is called by the counter overflowing (which resets it to 0) and executes with the counter value at 0. Later, at the next rollover, with no intervening calls to get ticks, the interrupt routine calls get ticks. Assuming negligible interrupt latency, get_ticks just sees the counter is still 0, so it will not detect a wrap condition. So now you lose
But wait a minute, don't you KNOW that the interrupt gets triggered on a rollover so therefore there MUST have been a rollover, therefore there has been 2^32 (or 2^16 or 2^whatever) ticks which you can just subtract from the last known tick count (which would be zero if there had been no get_timer calls in between)
a period. I thought by passing in a flag we could force get_ticks to do the right thing in this case, but since we must disable interrupts when we call get ticks from the get_timer routine, the outer call to get ticks may have already detected the rollover and the flag will force an additional rollover to be detected. It is a classic race condition. If we detect rollover only
Aha! - In nearly all situations, a race is better avoided than run!
in the interrupt routine, we do not need to protect the rollover variable from access by the main program, so we can be sure the results of get_ticks are coherent. If we try to do it in both places, we will have trouble. If the interrupt occurred at a faster rate, like say twice the rollover, we wouldn't have a problem. But it doesn't. In most cases it happens at the rollover rate, or just a bit more sometimes due to interrupt latency not
Again, does it really - for all arches?
being a constant. It may appear to happen at somewhat less than a rollover as well, if the interrupt latency was long at time n-1 and short at time n. I think this problem can be overcome in the old design by keeping track of whether the last update was by a non-interrupt or interrupt call to get_ticks , but I think the new design is better anyway.
I disagree - The whole point of a HAL is to Abstract the Hardware
Regards,
Graeme

On 5/26/2011 9:33 PM, Graeme Russ wrote:
Hi Bill,
<snip>
get_ticks() does not care about the clock rate - It simply looks at the current value of the hardware tick counter and the value of the hardware tick counter the last time get_ticks() was called, calculates the difference and adds that to the 64-bit software tick-counter

I don't see how it being a down counter makes that any more difficult
This is neither simple nor maintainable. Further, it is un-necessary, as the sync_timer routine is just going to convert the number from whatever radix we converted it to into millisec. If we leave the two numbers as split, all that complexity is removed from get_ticks and sent upward to the common routine that converts the answer into ms anyway. This makes the system more maintainable by placing the minimum requirement on get_ticks. The "tick" should be opaque to anybody but sync_timebase anyway.
But how is the common routine going to know if it is a split timer, up timer, down timer, little endian, big endian, etc, etc.?
get_ticks() abstracts the hardware implementation of the hardware timer from sync_timer()
Hi All, I understand your point. I prefer a higher level of abstraction. You are correct that there are some aspects of the tick counter that are very hardware quirky, and these attributes are hard to abstract. If the timer is embedded into a bit field with other variables, it is reasonable to expect get_ticks to extract the bit field and right justify the number. If there are endian issues, the get_ticks routine must return the number in the "natural" endianness of the platform. However, after that point, the values are extremely "regular". The fact that a counter is a down counter can be expressed in a data structure as a boolean. The high limit of the hardware counter is a number. The number of ticks per millisecond is obtainable from usec2ticks(1000), or 10000 if we want to avoid some roundoff. From these values, sync_timer can take the two part ticks value and convert it to millisec. Trust me on this. I have the routines to do it. This puts as much of the abstraction of the two numbers as possible into ONE COMMON ROUTINE, sync_timer.

Now it is clearly possible to move some of the abstraction down a level below sync_timer. For instance you could move inverting the counter down to that level, and then multiply the msb by the maximum value of the lsb counter and add in the lsb. It is clearly possible to move ALL of sync_timer down into get_ticks, if one wanted to. It is clearly possible to replace general values in gd-> with platform specific constant values. However, if you do that, you end up with a lot of duplicate, or almost duplicate, code running around. That has proven to be error prone, and it has left new ports of u-boot to sort of fend for themselves in figuring out how things should work. I prefer to abstract that all up into sync_timer. That way, all the math is in one place, and is table driven so it is easy to change.
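Roughly like this (field and function names invented for illustration - the real values would live in gd):

struct timer_info {
	int	counts_down;	/* hardware counter decrements */
	u32	lsb_max;	/* highest value the hw counter reaches */
	u32	ticks_per_ms;	/* e.g. usec2ticks(1000) */
};

u32 split_ticks_to_ms(const struct timer_info *t, u32 msb, u32 lsb)
{
	u32 ms_per_wrap = (t->lsb_max + 1) / t->ticks_per_ms;

	if (t->counts_down)
		lsb = t->lsb_max - lsb;

	/* Exact when ticks_per_ms evenly divides the wrap period; the
	 * real routine must carry the remainder across the two words */
	return msb * ms_per_wrap + lsb / t->ticks_per_ms;
}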
The top 32 bits are the rollover count and the bottom 32 bits are the current counter. If the counter is a full 32 bits, so much the better.
Ahhhhh - Let's keep it that way
Again, one could put them together inside the interrupt routine, but it is easier to check for a changed value if you don't do this. Otherwise, you have to check both words. It also makes the isr faster. It is just an
As I said before - Simple First, Fast Later
I am in favor of simple. That is why I want get_ticks to do as little as possible. It should just read the hardware register and the overflow counter if it is separate. Make sure the overflow didn't change while we were reading. This is redundant if we are not using interrupts but we can leave the code in. It just won't do anything. We can also move the rollover detection to sync_timebase. It will be redundant if we are using interrupts, because time will never "back up". But we can do it this way. This centralizes the overflow detection, which is a good thing.
That does not sound simple to me. This, however, does:
u64 get_ticks(void)
{
	static u64 last_hw_tick_count;
	static u64 last_sw_tick_count;

	/* Now for the platform specific stuff - Read hardware tick counter */
	u64 current_hw_tick_count = 0;	/* Read hw registers etc */

	/* Deal with hardware weirdness - errata, stupid hw engineers etc */

	u64 elapsed_ticks = current_hw_tick_count - last_hw_tick_count;
	last_sw_tick_count += elapsed_ticks;

	return last_sw_tick_count;
}
The '/* Read hw registers etc */' bit will always be the same, no matter what way you do it
Agree.
The '/* Deal with hardware weirdness - errata, stupid hw engineers etc */' bit is where we are truly abstracting the hardware away - This is the bit you propose to leave mangled and deal with in sync_time?
Not totally. The get_ticks routine must mask off any extra bits and right justify the hardware counter. If the counter is of "improper" endianness (not common, but probably not unknown), it should be fixed in get_ticks.
You make a lot of assumptions about the consistency of this highly variable logic across all current (and future) U-Boot platforms
Not really. The assumptions are that the two numbers are both binary numbers
What if the hardware engineer figures he can squeeze some extra mileage out of an FPGA by implementing a 24-bit counter + 8-bit status in one 32-bit register - get_ticks() strips all that out
Correct, it should strip out the extra bits
You want to do some in get_ticks(), some in the ISR and some in sync_timer() -
Well, not really. All the ISR ever does is increment the count msb. Just like it does now on the PPC. Always that, exactly that, and nothing more. Get ticks "extracts" the hardware counter into an msb and lsb part. It makes sure that the msb hasn't changed while it extracted the lsb. This avoids a race with the isr, and is not necessary if interrupts are not used. Both your version and my version of the get_ticks routine will need to do this.
Put all the weird stuff in one place - The HAL
I agree. I just don't consider compensating for the timer counting down instead of up to be "weird". I consider it common. I also don't consider the fact that the hardware counter may not contain 32 bits and may not cover a power of 2 to be "weird". I consider it to be common also. The total difference between what you would put in get_ticks and what I would put in sync_timer is something like this:
/* if timer is a down counter, reverse it */
if (gd->timer_counts_down)
	count.lsb = gd->lsb_max_value - count.lsb;

/* you would add the following to make a number out of the two parts.
 * I would do this at the next level up, after I had checked for wrapping.
 * That way I can avoid 64 bits for longer, possibly forever.
 * Note that a 0 lsb_max_value corresponds to a full 32 bits */
if (gd->lsb_max_value > 0)
	count.u64 = count.msb * gd->lsb_max_value + count.lsb;

/* if it was a 64 bit number to begin with, we don't have to do any
 * multiply/add */
I don't see any reason to push this down to a lower level. It is just one more thing to get messed up across implementations.
detection in the non-interrupt case to sync_timebase as well. Sync_timebase can also invert the down-counting counters, removing that from get_ticks. The wrap detection code can be #ifdef out if one is using interrupts and
Urghh - Why are you adding unnecessary ugliness - #ifdef in the middle of code is usually a sign you are doing something wrong
As I said, this is an optional optimization. I do not agree that an #ifdef in the middle of code indicates you have a bad design. Lots and Lots of ifdefs certainly indicates a bad design. An ifdef to eliminate code if some option is not selected is hardly such a strange thing, especially only a single #ifdef. However, feel free to not have it if you so desire.
offended by its presence. Thanks for pointing this out and compelling me to reduce the number of cases! Making get_ticks more lightweight is a good idea in my opinion.
Let's say you have a platform with a 32-bit tick counter running at a reasonably long rollover time so you decide not to use interrupts. Then you create a new platform with the same tick counter, but it runs much faster and you realise you need to implement the interrupt routine to make get_timer() work for long enough periods - Fine, you add an ISR for the new platform that calls sync_timebase - No other changes are required.
The last thing we want is for the 64-bit tick counter to be conceptually different across platforms
I just realised - the ISR _does not need to call sync_timebase at all_. The ISR only needs to call get_ticks(), so it will be fast anyway
The problem is that the way we previously detected wrapping does not work if the interrupt rate is == to the counter wrap time, which it essentially always is. If get_ticks is trying to update the wrap count when an interrupt
Is it, always, on every platform?
Yes, pretty much. You get a terminal count/counter underflow interrupt and that is it.
comes in, it will do it wrong. If the interrupt rate was faster, it would work, because get_ticks would always know the correct answer. Consider the following. Get ticks is called by the counter overflowing (which resets it to 0) and executes with the counter value at 0. Later, at the next rollover, with no intervening calls to get ticks, the interrupt routine calls get ticks. Assuming negligible interrupt latency, get_ticks just sees the counter is still 0, so it will not detect a wrap condition. So now you lose
But wait a minute, don't you KNOW that the interrupt gets triggered on a rollover so therefore there MUST have been a rollover, therefore there has been 2^32 (or 2^16 or 2^whatever) ticks which you can just subtract from the last known tick count (which would be zero if there had been no get_timer calls in between)
Except if this interrupt was delayed because get_ticks turned off interrupts, get_ticks may have already compensated the number. When we then get the (delayed) interrupt, we will do it twice.
a period. I thought by passing in a flag we could force get_ticks to do the right thing in this case, but since we must disable interrupts when we call get ticks from the get_timer routine, the outer call to get ticks may have already detected the rollover and the flag will force an additional rollover to be detected. It is a classic race condition. If we detect rollover only
Aha! - In nearly all situations, a race is better avoided than run!
Yes, that is why the new approach does NOT turn off interrupts when get_ticks is called, but rather loops if the overflow count was changed while reading the count lsb. That ensures get_ticks always gets both words either before the counter wrapped or after the counter wrapped, but not halfway in between. Some ISYNCs may be required to make sure there are no outstanding changes pending to memory, depending on your architecture.
in the interrupt routine, we do not need to protect the rollover variable from access by the main program, so we can be sure the results of get_ticks are coherent. If we try to do it in both places, we will have trouble. If the interrupt occurred at a faster rate, like say twice the rollover, we wouldn't have a problem. But it doesn't. In most cases it happens at the rollover rate, or just a bit more sometimes due to interrupt latency not
Again, does it really - for all arches?
Yep, pretty much. Almost all timers I know of only give one interrupt per cycle.
being a constant. It may appear to happen at somewhat less than a rollover as well, if the interrupt latency was long at time n-1 and short at time n. I think this problem can be overcome in the old design by keeping track of whether the last update was by a non-interrupt or interrupt call to get_ticks , but I think the new design is better anyway.
I disagree - The whole point of a HAL is to Abstract the Hardware
The HAL should abstract what needs abstracting, and no more. We are really talking about literally 4 lines of code here, but I think those four lines certainly belong in sync_timer. All the math in one place is a good thing. In most cases, get_ticks becomes trivial. Only in cases where the timer is wrong endian or embedded in another bit field (both things being possible but not all that common) does get_ticks need to do anything other than just read the registers and loop until the msb is stable. Simple here is good, because this is what new implementers need to add/change. The less here, the better! In fact, the looping until the msb is stable can also be moved up to sync_timer. That is even simpler.
Best Regards, Bill Campbell
Regards,
Graeme

On Fri, May 27, 2011 at 4:33 PM, J. William Campbell jwilliamcampbell@comcast.net wrote:
On 5/26/2011 9:33 PM, Graeme Russ wrote:
Hi Bill,
<snip>
[massive snip]
OK, you have my ears pricked - Can you give me code samples for:
- get_ticks()
- sync_timebase() (no need to implement the whole lot if that is too much effort right now)
- timer_isr()
that handle the following hardware tick counter scenarios:
a) 64-bit up counter
b) 64-bit down counter
c) 32-bit up counter, wraps at 65000
d) 16-bit microsecond up counter 0-999 which wraps and triggers a 16-bit millisecond up counter. Reading milliseconds latches microseconds and clears milliseconds (look in arch/x86/cpu/sc520/timer.c for example)
e) 24-bit counter occupying bits 2-25 of a 32-bit word (just to be difficult)
f) Any other option anyone cares to throw ;)
All of these options must be covered using:
- Minimal global data (we would like it to work before relocation, but not mandatory - GD footprint would be nice)
- All use the same sync_timebase function
- Demonstrate using an ISR NOT synced to the hardware tick counter and an ISR that is
- Parameters to get_ticks() and sync_timer() are permitted, but not for timer_isr() (naturally)
I don't see any reason to push this down to a lower level. It is just one more thing to get messed up across implementations.
Agreed
detection in the non-interrupt case to sync_timebase as well. Sync_timebase can also invert the down-counting counters, removing that from get_ticks. The wrap detection code can be #ifdef out if one is using interrupts and
Urghh - Why are you adding unnecessary ugliness - #ifdef in the middle of code is usually a sign you are doing something wrong
As I said, this is an optional optimization. I do not agree that an #ifdef in the middle of code indicates you have a bad design. Lots and Lots of ifdefs certainly indicates a bad design. An ifdef to eliminate code if some option is not selected is hardly such a strange thing, especially only a single #ifdef. However, feel free to not have it if you so desire.
OK, I'll let this slide for the moment - please include in above example
offended by its presence. Thanks for pointing this out and compelling me to reduce the number of cases! Making get_ticks more lightweight is a good idea in my opinion.
Let's say you have a platform with a 32-bit tick counter running at a reasonably long rollover time so you decide not to use interrupts. Then you create a new platform with the same tick counter, but it runs much faster and you realise you need to implement the interrupt routine to make get_timer() work for long enough periods - Fine, you add an ISR for the new platform that calls sync_timebase - No other changes are required.
The last thing we want is for the 64-bit tick counter to be conceptually different across platforms
I just realised - the ISR _does not need to call sync_timebase at all_. The ISR only needs to call get_ticks(), so it will be fast anyway
The problem is that the way we previously detected wrapping does not work if the interrupt rate is == to the counter wrap time, which it essentially always is. If get_ticks is trying to update the wrap count when an interrupt
Is it, always, on every platform?
Yes, pretty much. You get a terminal count/counter underflow interrupt and that is it.
Not on sc520 - The micro/millisecond counter cannot be used to drive an interrupt - you need to set up a separate timer. I think the x86 internal performance counters are the same
comes in, it will do it wrong. If the interrupt rate was faster, it would work, because get_ticks would always know the correct answer. Consider the following. Get ticks is called by the counter overflowing (which resets it to 0) and executes with the counter value at 0. Later, at the next rollover, with no intervening calls to get ticks, the interrupt routine calls get ticks. Assuming negligible interrupt latency, get_ticks just sees the counter is still 0, so it will not detect a wrap condition. So now you lose
But wait a minute, don't you KNOW that the interrupt gets triggered on a rollover so therefore there MUST have been a rollover, therefore there has been 2^32 (or 2^16 or 2^whatever) ticks which you can just subtract from the last known tick count (which would be zero if there had been no get_timer calls in between)
Except if this interrupt was delayed because get_ticks turned off interrupts, get_ticks may have already compensated the number. When we then get the (delayed) interrupt, we will do it twice.
That would mean get_timer() got called EXACTLY on the rollover - A race condition that should be avoided. But a race can still occur through using disable/enable interrupts which could push the timer isr out
a period. I thought by passing in a flag we could force get_ticks to do the right thing in this case, but since we must disable interrupts when we call get ticks from the get_timer routine, the outer call to get ticks may have already detected the rollover and the flag will force an additional rollover to be detected. It is a classic race condition. If we detect rollover only
Aha! - In nearly all situations, a race is better avoided than run!
Yes, that is why the new approach does NOT turn off interrupts when get_ticks is called, but rather loops if the overflow count was changed while reading the count lsb. That ensures get_ticks always gets both words either before the counter wrapped or after the counter wrapped, but not halfway in between. Some ISYNCs may be required to make sure there are no outstanding changes pending to memory, depending on your architecture.
OK, let's have a closer look
in the interrupt routine, we do not need to protect the rollover variable from access by the main program, so we can be sure the results of get_ticks are coherent. If we try to do it in both places, we will have trouble. If the interrupt occurred at a faster rate, like say twice the rollover, we wouldn't have a problem. But it doesn't. In most cases it happens at the rollover rate, or just a bit more sometimes due to interrupt latency not
Again, does it really - for all arches?
Yep, pretty much. Almost all timers I know of only give one interrupt per cycle.
A bold assumption - and one that is wrong ;)
being a constant. It may appear to happen at somewhat less than a rollover as well, if the interrupt latency was long at time n-1 and short at time n. I think this problem can be overcome in the old design by keeping track of whether the last update was by a non-interrupt or interrupt call to get_ticks , but I think the new design is better anyway.
I disagree - The whole point of a HAL is to Abstract the Hardware
The HAL should abstract what needs abstracting, and no more. We are really talking about literally 4 lines of code here, but I think those four lines certainly belong in sync_timer. All the math in one place is a good thing. In most cases, get_ticks becomes trivial. Only in cases where the timer is wrong endian or embedded in another bit field (both things being possible but not all that common) does get_ticks need to do anything other than just read the registers and loop until the msb is stable. Simple here is good, because this is what new implementers need to add/change. The less here, the better! In fact, the looping until the msb is stable can also be moved up to sync_timer. That is even simpler.
OK, you got me - "show me the money^H^H^H^H^Hcode"
Regards,
Graeme

On 5/26/2011 11:54 PM, Graeme Russ wrote:
On Fri, May 27, 2011 at 4:33 PM, J. William Campbell jwilliamcampbell@comcast.net wrote:
On 5/26/2011 9:33 PM, Graeme Russ wrote:
Hi Bill,
<snip>
[massive snip]
OK, you have my ears pricked - Can you give me code samples for:
- get_ticks()
- sync_timebase() (no need to implement the whole lot if that is too much effort right now)
- timer_isr()
that handle the following hardware tick counter scenarios:
a) 64-bit up counter
b) 64-bit down counter
Hi Graeme,
c) 32-bit up counter, wraps at 65000
Do you mean 32 bits or 16 bits? It doesn't make much difference, but just checking.
d) 16-bit microsecond up counter 0-999 which wraps and triggers a 16-bit millisecond up counter. Reading milliseconds latches microseconds and clears milliseconds (look in arch/x86/cpu/sc520/timer.c for example)
e) 24-bit counter occupying bits 2-25 of a 32-bit word (just to be difficult)
f) Any other option anyone cares to throw ;)
All of these options must be covered using:
- Minimal global data (we would like it to work before relocation, but not mandatory - GD footprint would be nice)
- All use the same sync_timebase function
- Demonstrate using an ISR NOT synced to the hardware tick counter and an ISR that is
- Parameters to get_ticks() and sync_timer() are permitted, but not for timer_isr() (naturally)
OK! Once again, I accept the challenge. One Caveat. My real work has been sliding due to the time I have been spending on this. I am flying from San Francisco to Sydney tonight (to work, not to play), so I will be off-grid for 14 hours+. You will not get this code for a few days, like probably 3 days. I have looked at the requirements, and I see no real problems that I don't know how to solve.
I don't see any reason to push this down to a lower level. It is just one more thing to get messed up across implementations.
Agreed
detection in the non-interrupt case to sync_timebase as well. Sync_timebase can also invert the down-counting counters, removing that from get_ticks. The wrap detection code can be #ifdef out if one is using interrupts and
Urghh - Why are you adding unnecessary ugliness - #ifdef in the middle of code is usually a sign you are doing something wrong
As I said, this is an optional optimization. I do not agree that an #ifdef in the middle of code indicates you have a bad design. Lots and Lots of ifdefs certainly indicates a bad design. An ifdef to eliminate code if some option is not selected is hardly such a strange thing, especially only a single #ifdef. However, feel free to not have it if you so desire.
OK, I'll let this slide for the moment - please include in above example
Will Do.
offended by its presence. Thanks for pointing this out and compelling me to reduce the number of cases! Making get_ticks more lightweight is a good idea in my opinion.
Lets say you have a platform with a 32-bit tick counter running at a reasonably long rollover time so you decide not to use interrupts. Then you create a new platform with the same tick counter, but it runs much faster and you realise you need to implement the interrupt routine to make get_timer() work for long enough periods - Fine, you add an ISR for the new platform that calls sync_timebase - No other changes are required.
The last thing we want is for the 64-bit tick counter to be conceptually different across platforms
I just realised - the ISR _does not need to call the sync_timebase at all_ The ISR only needs to call get_ticks(), so it will be fast anyway
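i.e. the optional ISR could collapse to something as small as this sketch:

void timer_isr(void)
{
	/* get_ticks() folds any pending rollover into the 64-bit count */
	(void)get_ticks();
}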
The problem is that the way we previously detected wrapping does not work if the interrupt rate is == to the counter wrap time, which it essentially always is. If get_ticks is trying to update the wrap count when an interrupt
Is it, always, on every platform?
Yes, pretty much. You get a terminal count/counter underflow interrupt and that is it.
Not on sc520 - The micro/millisecond counter cannot be used to drive an interrupt - you need to set up a separate timer. I think the x86 internal performance counters are the same
What is true, as you have stated, is that the micro/millisecond counter on the sc520 does not interrupt at all. Nor do the x86 performance counters. The x86 performance counters are a non-problem because they are 64 bits long. We don't need interrupts for them. Now, if you choose to use the sc520 micro/millisecond counter, then you need another source of interrupts. Due to the fact that reading the sc520 counter resets it, we must accumulate the elapsed time in software. That means the interrupt routine must do a bit more work, but it also allows reading the counters in non-interrupt code (with interrupts disabled) to not mess up the accumulated count. We don't detect rollover in the sc520 counters, we just read and accumulate the value. So no problem there.
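A sketch of that read-and-accumulate approach (read_sc520_counters() is a hypothetical helper returning the elapsed ticks, with the hardware clearing the count as a side effect of the read):

static u64 total_ticks;	/* software-accumulated tick count */

u64 get_ticks(void)
{
	/* caller must hold off interrupts so this update is not torn */
	total_ticks += read_sc520_counters();

	return total_ticks;
}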
comes in, it will do it wrong. If the interrupt rate was faster, it would work, because get_ticks would always know the correct answer. Consider the following. get_ticks is called by the counter overflowing (which resets it to 0) and executes with the counter value at 0. Later, at the next rollover, with no intervening calls to get_ticks, the interrupt routine calls get_ticks. Assuming negligible interrupt latency, get_ticks just sees the counter is still 0, so it will not detect a wrap condition. So now you lose
But wait a minute, don't you KNOW that the interrupt gets triggered on a rollover so therefore there MUST have been a rollover, therefore there have been 2^32 (or 2^16 or 2^whatever) ticks which you can just subtract from the last known tick count (which would be zero if there had been no get_timer calls in between)
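i.e. something like this sketch - a hypothetical ISR that simply credits one full counter period per interrupt (which, as the replies below point out, races with a get_ticks() that also compensates for wraps):

static u64 tick_base;	/* ticks accumulated from completed wraps */

void rollover_isr(void)	/* hypothetical: fires once per counter wrap */
{
	tick_base += 1ULL << 32;	/* credit one full 32-bit period */
}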
Except if this interrupt was delayed because get_ticks turned off interrupts, get_ticks may have already compensated the number. When we then get the (delayed) interrupt, we will do it twice.
That would mean get_timer() got called EXACTLY on the rollover - A race condition that should be avoided. Even so, a race can still occur through using disable/enable interrupts, which could push the timer isr out
a period. I thought by passing in a flag we could force get_ticks to do the right thing in this case, but since we must disable interrupts when we call get_ticks from the get_timer routine, the outer call to get_ticks may have already detected the rollover and the flag will force an additional rollover to be detected. It is a classic race condition. If we detect rollover only
Aha! - In nearly all situations, a race is better avoided than run!
Yes, that is why the new approach does NOT turn off interrupts when get_ticks is called, but rather loops if the overflow count was changed while reading the count lsb. That ensures get_ticks always gets both words either before the counter wrapped or after the counter wrapped, but not halfway in between. Some ISYNCs may be required to make sure there are no outstanding changes pending to memory, depending on your architecture.
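A sketch of that lock-free scheme (names are illustrative; the timer ISR is assumed to be the only writer of overflow_count):

static volatile u32 overflow_count;	/* incremented only by the timer ISR */

u64 get_ticks(void)
{
	u32 wraps, lsb;

	do {
		wraps = overflow_count;
		lsb = read_hw_counter();	/* hypothetical HAL read */
	} while (wraps != overflow_count);	/* retry if a wrap intervened */

	return ((u64)wraps << 32) | lsb;
}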
OK, lets have a closer look
in the interrupt routine, we do not need to protect the rollover variable from access by the main program, so we can be sure the results of get_ticks are coherent. If we try to do it in both places, we will have trouble. If the interrupt occurred at a faster rate, like say twice the rollover, we wouldn't have a problem. But it doesn't. In most cases it happens at the rollover rate, or just a bit more sometimes due to interrupt latency not
Again, does it really - for all arches?
Yep, pretty much. Almost all timers I know of only give one interrupt per cycle.
A bold assumption - and one that is wrong ;)
Ok, except for timers that don't interrupt at all and need an outside assist! In those cases, you need to interrupt faster than the counter period, which was your original constraint. But in that case, the interrupt is coming from elsewhere, not the timer being processed.
being a constant. It may appear to happen at somewhat less than a rollover as well, if the interrupt latency was long at time n-1 and short at time n. I think this problem can be overcome in the old design by keeping track of whether the last update was by a non-interrupt or interrupt call to get_ticks, but I think the new design is better anyway.
I disagree - The whole point of a HAL is to Abstract the Hardware
The HAL should abstract what needs abstracting, and no more. We are really talking about literally 4 lines of code here, but I think those four lines certainly belong in sync_timer. All the math in one place is a good thing. In most cases, get_ticks becomes trivial. Only in cases where the timer is wrong endian or embedded in another bit field (both things being possible but not all that common) does get_ticks need to do anything other than just read the registers and loop until the msb is stable. Simple here is good, because this is what new implementers need to add/change. The less here, the better! In fact, the looping until the msb is stable can also be moved up to sync_timer. That is even simpler.
OK, you got me - "show me the money^H^H^H^H^Hcode"
Wilco! You will get them by Monday Afternoon, California time, or sooner if I can stay awake long enough.
Best Regards, Bill Campbell
Regards,
Graeme

On 28/05/11 01:49, J. William Campbell wrote:
On 5/26/2011 11:54 PM, Graeme Russ wrote:
On Fri, May 27, 2011 at 4:33 PM, J. William Campbell jwilliamcampbell@comcast.net wrote:
On 5/26/2011 9:33 PM, Graeme Russ wrote:
Hi Bill,
<snip>
[massive snip]
[another big snip]
I just realised - the ISR _does not need to call the sync_timebase at all_ The ISR only needs to call get_ticks(), so it will be fast anyway
The problem is that the way we previously detected wrapping does not work if the interrupt rate is == to the counter wrap time, which it essentially always is. If get_ticks is trying to update the wrap count when an interrupt
Is it, always, on every platform?
Yes, pretty much. You get a terminal count/counter underflow interrupt and that is it.
Not on sc520 - The micro/millisecond counter cannot be used to drive an interrupt - you need to set up a separate timer. I think the x86 internal performance counters are the same
What is true, as you have stated, is that the micro/millisecond counter on the sc520 does not interrupt at all. Nor do the x86 performance counters. The x86 performance counters are a non-problem because they are 64 bits long. We don't need interrupts for them. Now, if you choose to use the sc520 micro/millisecond counter, then you need another source of interrupts. Due to the fact that reading the sc520 counter resets it, we must accumulate the elapsed time in software. That means the interrupt routine must do a bit more work, but it also allows reading the counters in non-interrupt code (with interrupts disabled) to not mess up the accumulated count.
I had planned to do _all_ of that in get_ticks(), so all the complicated code is in one spot which is called by sync_ms_timer() which is in turn called by either timer_isr() or get_timer()
Of course the problem we have now identified is reentrancy - Looking forward to seeing your solution
Regards,
Graeme

Dear "J. William Campbell",
In message 4DDF2072.5090802@comcast.net you wrote: ...
The problem is that the way we previously detected wrapping does not work if the interrupt rate is == to the counter wrap time, which it essentially always is. If get_ticks is trying to update the wrap count
You ignore the fact that this is only ever a problem when the rollover cannot signal through an interrupt or similar. Also, some processors allow daisy-chaining of timers, etc.
Again, I would really like to know about how many exotic systems we are talking that fulfil your worst-case expectations. I bet the overwhelming majority behaves absolutely harmless.
Best regards,
Wolfgang Denk

On 5/27/2011 12:33 AM, Wolfgang Denk wrote:
Dear "J. William Campbell",
In message 4DDF2072.5090802@comcast.net you wrote: ...
The problem is that the way we previously detected wrapping does not work if the interrupt rate is == to the counter wrap time, which it essentially always is. If get_ticks is trying to update the wrap count
You ignore the fact that this is only ever a problem when the rollover cannot signal through an interrupt or similar. Also, some processors allow daisy-chaining of timers, etc.
Again, I would really like to know about how many exotic systems we are talking that fulfil your worst-case expectations. I bet the overwhelming majority behaves absolutely harmless.
Hi Wolfgang, I think that in fact the opposite is true. The problems occur if both the main program and the interrupt routine are trying to update the timer msb using the same code, as we were originally talking about. There is no problem if only the interrupt routine detects the rollover. That is the correct way to go if your interrupts work. There was nothing particularly exotic required. It was the "normal" case. Take a look at what would happen on the PPC if the main program was reading the decrementer, detecting wraps and increasing the timestamp while the interrupt routine was also incrementing the timestamp. Every so often you get a double increment. Why were we doing this? Because I was trying to re-use exactly the same code in the interrupt case and the non-interrupt case. Not a good idea, in fact a bad idea as it turns out.
Best Regards, Bill Campbell
Best regards,
Wolfgang Denk

Dear "J. William Campbell",
In message 4DDEFDBC.7050403@comcast.net you wrote:
I really STRONGLY disagree with this statement. If you actually needed 64 bit variables, fine, use them. But as I have already shown, you do not need them in general. We are computing a 32 bit result. There is some entropy argument that says you shouldn't need 64 bits to do so. Another way to look at it is that converting the top 32 bit word and the bottom 32 bit word to ms separately can be easier than doing them both together at once. However, as we will see below, I do agree we need two 32 bit words to make this process go smoothly. I just don't agree that they should/will constitute a 64 bit binary word. See below.
And I disagree with this.
Yes, that is the problem. I have come to the view that two 32 bit words are the best approach. Note that the lsb may actually not fill the full 32 bits. The top 32 bits are the rollover count and the bottom 32 bits are the current counter. If the counter is a full 32 bits, so much the better. Again, one could put them together inside the interrupt routine, but it is easier to check for a changed value if you don't do this.
It's even easier if you use a single 64 bit variable, because then you can simply use ==.
Otherwise, you have to check both words. It also makes the isr faster.
Drop any thoughts about "make FOO faster" for now, please. Especially at this stage it is much more important to have a simple and clean design. If split in two variables, even a simple read access will turn into code like
do {
	upper = timebase_upper;
	lower = timebase_lower;
} while (upper != timebase_upper);
This is not exactly as simple as you claimed.
Best regards,
Wolfgang Denk

On 5/27/2011 12:28 AM, Wolfgang Denk wrote:
Dear "J. William Campbell",
In message 4DDEFDBC.7050403@comcast.net you wrote:
I really STRONGLY disagree with this statement. If you actually needed 64 bit variables, fine, use them. But as I have already shown, you do not need them in general. We are computing a 32 bit result. There is some entropy argument that says you shouldn't need 64 bits to do so. Another way to look at it is that converting the top 32 bit word and the bottom 32 bit word to ms separately can be easier than doing them both together at once. However, as we will see below, I do agree we need two 32 bit words to make this process go smoothly. I just don't agree that they should/will constitute a 64 bit binary word. See below.
And I disagree with this.
Hi Wolfgang, OK, I hear you.
Yes, that is the problem. I have come to the view that two 32 bit words are the best approach. Note that the lsb may actually not fill the full 32 bits. The top 32 bits are the rollover count and the bottom 32 bits are the current counter. If the counter is a full 32 bits, so much the better. Again, one could put them together inside the interrupt routine, but it is easier to check for a changed value if you don't do this.
It's even easier if you use a single 64 bit variable, because then you can simply use ==.
In general, no you can't, or at least you probably don't want to. If you are reading a 64 bit performance counter, it is quite likely that you cannot read it twice without the "clock" having "ticked". If the CPU executes 1 instruction (or fewer, if an SPR/memory reference is involved) per performance counter tick, which is the goal of the performance counter, == is an infinite loop!!!! A similar condition exists if you are combining a software counter with a fairly fast hardware counter. It might require flipping the hardware counter (if it is a down counter) and a 64 bit multiply add, which must be done in software/a subroutine if the cpu has no 64 by 64 multiply. By the time that is done, the timer LSB may have ticked. Consider the m68K case.
Otherwise, you have to check both words. It also makes the isr faster.
Drop any thoughts about "make FOO faster" for now, please. Especially at this stage it is much more important to have a simple and clean design. If split in two variables, even a simple read access will turn into code like
do {
	upper = timebase_upper;
	lower = timebase_lower;
} while (upper != timebase_upper);
This is not exactly as simple as you claimed.
True, but if you look at a lot of 64 bit performance counters, that is EXACTLY what the handbook recommends on how to read them. There is no atomic way to read them both at once, and reading one half doesn't freeze the other half. This code is also required if timebase_upper is altered in the interrupt routine. YMMV, but in a lot, dare I say most, cases this is required anyway. And while the code is more complex than a simple assignment statement, it is not very complex.
Best Regards, Bill Campbell
Best regards,
Wolfgang Denk

Dear Graeme Russ,
In message BANLkTinWvY9b4QzeLNawF7MKT9z1zeMXyg@mail.gmail.com you wrote:
I think we will need to define get_timer() weak - Nios will have to override the default implementation to cater for its (Nios') limitations
Please don't - isn't the purpose of this whole discussion to use common code for this ?
Best regards,
Wolfgang Denk

Hi Wolfgang,
On 27/05/11 17:13, Wolfgang Denk wrote:
Dear Graeme Russ,
In message BANLkTinWvY9b4QzeLNawF7MKT9z1zeMXyg@mail.gmail.com you wrote:
I think we will need to define get_timer() weak - Nios will have to override the default implementation to cater for its (Nios') limitations
Please don't - isn't the purpose of this whole discussion to use common code for this ?
Yes, but Nios is particularly bad - It has a 10ms tick counter :(
I don't see a reason to hamstring other platforms when a simple weak function can get around it
Regards,
Graeme

Dear Graeme Russ,
In message 4DDF543D.6020506@gmail.com you wrote:
I think we will need to define get_timer() weak - Nios will have to override the default implementation to cater for its (Nios') limitations
Please don't - isn't the purpose of this whole discussion to use common code for this ?
Yes, but Nios is particularly bad - It has a 10ms tick counter :(
I don't see a reason to hamstring other platforms when a simple weak function can get around it
Why does NIOS require a different get_timer() implementation?
Nobody claims that get_timer() has any specific resolution. It is perfectly legal that a loop like
for (;;) {
	u32 t = get_time();
	printf("t=%lu\n", t);
}
returns 100 millisecond increments.
Best regards,
Wolfgang Denk

Hi Wolfgang
On Friday, May 27, 2011, Wolfgang Denk wd@denx.de wrote:
Dear Graeme Russ,
In message 4DDF543D.6020506@gmail.com you wrote:
I think we will need to define get_timer() weak - Nios will have to override the default implementation to cater for its (Nios') limitations
Please don't - isn't the purpose of this whole discussion to use common code for this ?
Yes, but Nios is particularly bad - It has a 10ms tick counter :(
I don't see a reason to hamstring other platforms when a simple weak function can get around it
Why does NIOS require a different get_timer() implementation?
Nobody claims that get_timer() has any specific resolution. It is perfectly legal that a loop like
for (;;) {
	u32 t = get_time();
	printf("t=%lu\n", t);
}
returns 100 millisecond increments.
Except everyone expects it to tick at something vaguely close to 1ms. When you commented about accuracy, I didn't expect 1000% error to be acceptable...
Regards,
Graeme

On Friday, May 27, 2011, Graeme Russ graeme.russ@gmail.com wrote:
Hi Wolfgang
On Friday, May 27, 2011, Wolfgang Denk wd@denx.de wrote:
Dear Graeme Russ,
In message 4DDF543D.6020506@gmail.com you wrote:
I think we will need to define get_timer() weak - Nios will have to override the default implementation to cater for its (Nios') limitations
Please don't - isn't the purpose of this whole discussion to use common code for this ?
Yes, but Nios is particularly bad - It has a 10ms tick counter :(
I don't see a reason to hamstring other platforms when a simple weak function can get around it
Why does NIOS require a different get_timer() implementation?
Nobody claims that get_timer() has any specific resolution. It is perfectly legal that a loop like
for (;;) {
	u32 t = get_time();
	printf("t=%lu\n", t);
}
returns 100 millisecond increments.
Except everyone expects it to tick at something vaguely close to 1ms. When you commented about accuracy, I didn't expect 1000% error to be acceptable...
Besides, Nios can return an increment of 10 (presumably ms) between two immediately consecutive calls. This causes early timeouts in CFI driver
Regards,
Graeme

Dear Graeme Russ,
In message BANLkTik2SUm4Sm8aLjCrCmz+kcMGWgEzKw@mail.gmail.com you wrote:
Besides, Nios can return an increment of 10 (presumably ms) between two immediately consecutive calls. This causes early timeouts in CFI driver
Now this in turn is a bug in the timer implementation that needs to be fixed.
Best regards,
Wolfgang Denk

Hi Wolfgang
On Friday, May 27, 2011, Wolfgang Denk wd@denx.de wrote:
Dear Graeme Russ,
In message BANLkTik2SUm4Sm8aLjCrCmz+kcMGWgEzKw@mail.gmail.com you wrote:
Besides, Nios can return an increment of 10 (presumably ms) between two immediately consecutive calls. This causes early timeouts in CFI driver
Now this in turn is a bug in the timer implementation that needs to be fixed.
Agreed, but that is not something I can achieve - I don't want to hold up this whole show that we have all put so much effort into for the sake of one weak function
Regards,
Graeme

Graeme Russ wrote:
Hi Wolfgang
On Friday, May 27, 2011, Wolfgang Denk wd@denx.de wrote:
Dear Graeme Russ,
In message BANLkTik2SUm4Sm8aLjCrCmz+kcMGWgEzKw@mail.gmail.com you wrote:
Besides, Nios can return an increment of 10 (presumably ms) between two immediately consecutive calls. This causes early timeouts in CFI driver
Now this in turn is a bug in the timer implementation that needs to be fixed.
And this is what reset_timer() corrected.
Agreed, but that is not something I can achieve - I don't want to hold up this whole show that we have all put so much effort into for the sake of one weak function
And I don't want to see something that currently works become broken because we "improved" a feature ... simply because the resolution of the timestamp is 10 msec rather than 1 msec.
And just to be clear. This is not a Nios issue. Currently, if the timestamp is incremented via a fixed period interrupt, and the period of the interrupt is longer than 1 msec, calls to get_timer() may produce early timeouts ... regardless of platform.
--Scott

On 5/27/2011 6:07 AM, Scott McNutt wrote:
Graeme Russ wrote:
Hi Wolfgang
On Friday, May 27, 2011, Wolfgang Denk wd@denx.de wrote:
Dear Graeme Russ,
In message BANLkTik2SUm4Sm8aLjCrCmz+kcMGWgEzKw@mail.gmail.com you wrote:
Besides, Nios can return an increment of 10 (presumably ms) between two immediately consecutive calls. This causes early timeouts in CFI driver
Now this in turn is a bug in the timer implementation that needs to be fixed.
And this is what reset_timer() corrected.
Agreed, but that is not something I can achieve - I don't want to hold up this whole show that we have all put so much effort into for the sake of one weak function
And I don't want to see something that currently works become broken because we "improved" a feature ... simply because the resolution of the timestamp is 10 msec rather than 1 msec.
And just to be clear. This is not a Nios issue. Currently, if the timestamp is incremented via a fixed period interrupt, and the period of the interrupt is longer than 1 msec, calls to get_timer() may produce early timeouts ... regardless of platform.
Hi All, A more precise statement of the problem is that all timer delays may be shortened by the timer resolution. So this means that if you have a timeout of 1 ms in a loop like do { } while (get_timer(start) < 1), then your actual delay may be anywhere between 0 and 1 ms. The problem arises when some piece of common code uses a delay of, say, 8 millisec, expecting the actual delay to be between 7 and 8. If the resolution is 10 ms, the delay will be between 0 and 10 ms, 0 being particularly bad. This can be fixed in get_timer, making the 8 ms delay become a minimum of 10 ms at the expense of it becoming up to 20 ms sometimes. Since these delays are used mostly for error conditions, making them longer will probably be ok, and doesn't require changing any of the common code. It probably will not make things slower either, because the error timeouts should not be reached. The reset of the hardware timer would cause all "short" delays to become 10 ms. This reset approach is bad in that it prevents proper nesting of timing loops. However, in this case it isn't so bad, in that the nested loops are just extended, not shortened. Note that if the reset is only resetting the HARDWARE interrupt generator, not the actual timestamp itself, we are just extending all existing timeouts by 0 to 10 ms. So this just lengthens all pending timeouts. The other fix is in my opinion nicer, because it affects the nested loops less. If the inner loop is executed 100 times, with the reset, the outer loop timeout is extended by up to 1000 ms.
Best Regards, Bill Campbell
--Scott

On Fri, May 27, 2011 at 8:00 AM, J. William Campbell jwilliamcampbell@comcast.net wrote: [snip]
Hi All, A more precise statement of the problem is that all timer delays may be shortened by the timer resolution. So this means that if you have a timeout of 1 ms in a loop like do { } while (get_timer(start) < 1), then your actual delay may be anywhere between 0 and 1 ms. The problem arises when some piece of common code uses a delay of, say, 8 millisec, expecting the actual delay to be between 7 and 8. If the resolution is 10 ms, the delay will be between 0 and 10 ms, 0 being particularly bad. This can be fixed in get_timer, making the 8 ms delay become a minimum of 10 ms at the expense of it becoming up to 20 ms sometimes. Since these delays are used mostly for error conditions, making them longer will probably be ok, and doesn't require changing any of the common code. It probably will not make things slower either, because the error timeouts should not be reached. The reset of the hardware timer would cause all "short" delays to become 10 ms. This reset approach is bad in that it prevents proper nesting of timing loops. However, in this case it isn't so bad, in that the nested loops are just extended, not shortened. Note that if the reset is only resetting the HARDWARE interrupt generator, not the actual timestamp itself, we are just extending all existing timeouts by 0 to 10 ms. So this just lengthens all pending timeouts. The other fix is in my opinion nicer, because it affects the nested loops less. If the inner loop is executed 100 times, with the reset, the outer loop timeout is extended by up to 1000 ms.
Best Regards, Bill Campbell
Hi Bill,
Yes I agree that this is ugly - I didn't realize that this is what reset_timer() does, but I think these 10ms platforms should have to live with the fact that timeouts will be 0-10ms longer than hoped. Perhaps reset_timer() should become a non-standard board thing that is deprecated. Really if you have a 10ms timer and are asking for a 10ms timeout you are being a bit hopeful.
But perhaps this argues for a function to check timeouts - at the moment get_timer() returns the time since an event and it is used at the start of the loop and the end. Perhaps we should have:
#define TIMEOUT_MS 2000
stop_time = get_future_time(TIMEOUT_MS); /* returns current time + TIMEOUT_MS + (resolution of timer) */
while (get_timer(stop_time) < 0) {	/* I would much prefer while (!timed_out(stop_time)) */
	/* wait for something */
}
Regards, Simon

On 5/27/2011 8:13 AM, Simon Glass wrote:
On Fri, May 27, 2011 at 8:00 AM, J. William Campbell jwilliamcampbell@comcast.net wrote: [snip]
Hi All, A more precise statement of the problem is that all timer delays may be shortened by the timer resolution. So this means that if you have a timeout of 1 ms in a loop like do { } while (get_timer(start) < 1), then your actual delay may be anywhere between 0 and 1 ms. The problem arises when some piece of common code uses a delay of, say, 8 millisec, expecting the actual delay to be between 7 and 8. If the resolution is 10 ms, the delay will be between 0 and 10 ms, 0 being particularly bad. This can be fixed in get_timer, making the 8 ms delay become a minimum of 10 ms at the expense of it becoming up to 20 ms sometimes. Since these delays are used mostly for error conditions, making them longer will probably be ok, and doesn't require changing any of the common code. It probably will not make things slower either, because the error timeouts should not be reached. The reset of the hardware timer would cause all "short" delays to become 10 ms. This reset approach is bad in that it prevents proper nesting of timing loops. However, in this case it isn't so bad, in that the nested loops are just extended, not shortened. Note that if the reset is only resetting the HARDWARE interrupt generator, not the actual timestamp itself, we are just extending all existing timeouts by 0 to 10 ms. So this just lengthens all pending timeouts. The other fix is in my opinion nicer, because it affects the nested loops less. If the inner loop is executed 100 times, with the reset, the outer loop timeout is extended by up to 1000 ms.
Best Regards, Bill Campbell
Hi Bill,
Yes I agree that this is ugly - I didn't realize that this is what reset_timer() does, but I think these 10ms platforms should have to live with the fact that timeouts will be 0-10ms longer than hoped. Perhaps reset_timer() should become a non-standard board thing that is deprecated. Really if you have a 10ms timer and are asking for a 10ms timeout you are being a bit hopeful.
Hi All, Yes, but the person writing the driver was writing "common" code. He probably didn't even know there was a timer whose resolution was not 1 ms.
But perhaps this argues for a function to check timeouts - at the moment get_timer() returns the time since an event and it is used at the start of the loop and the end. Perhaps we should have:
#define TIMEOUT_MS 2000
stop_time = get_future_time(TIMEOUT_MS); /* returns current time + TIMEOUT_MS + (resolution of timer) */
while (get_timer(stop_time) < 0) {	/* I would much prefer while (!timed_out(stop_time)) */
	/* wait for something */
}
Regards, Simon
In the existing system, you can get the same result by running the while loop with a condition of (get_timer(base) < TIMEOUT_MS + TIMER_RESOLUTION). We could just make TIMER_RESOLUTION a mandatory define for all u-boots. Then common code would be wrong if the TIMER_RESOLUTION were omitted. For all I know, there may be such a define already. Anybody know of one?
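For illustration, a sketch of how such a mandatory define might be used (CONFIG_SYS_TIMER_RESOLUTION is a hypothetical name - the thread has not settled on one):

/* hypothetical define - each platform would set its real resolution in ms */
#ifndef CONFIG_SYS_TIMER_RESOLUTION
#define CONFIG_SYS_TIMER_RESOLUTION	1
#endif

start = get_timer(0);
while (get_timer(start) < TIMEOUT_MS + CONFIG_SYS_TIMER_RESOLUTION)
	;	/* padding by one resolution step prevents an early timeout */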
Best Regards, Bill Campbell

J. William Campbell wrote:
On 5/27/2011 6:07 AM, Scott McNutt wrote:
Graeme Russ wrote:
Hi Wolfgang
On Friday, May 27, 2011, Wolfgang Denk wd@denx.de wrote:
Dear Graeme Russ,
In message BANLkTik2SUm4Sm8aLjCrCmz+kcMGWgEzKw@mail.gmail.com you wrote:
Besides, Nios can return an increment of 10 (presumably ms) between two immediately consecutive calls. This causes early timeouts in CFI driver
Now this in turn is a bug in the timer implementation that needs to be fixed.
And this is what reset_timer() corrected.
Agreed, but that is not something I can achieve - I don't want to hold up this whole show that we have all put so much effort into for the sake of one weak function
And I don't want to see something that currently works become broken because we "improved" a feature ... simply because the resolution of the timestamp is 10 msec rather than 1 msec.
And just to be clear. This is not a Nios issue. Currently, if the timestamp is incremented via a fixed period interrupt, and the period of the interrupt is longer than 1 msec, calls to get_timer() may produce early timeouts ... regardless of platform.
<snip>
This can be fixed in get_timer, making the 8 ms delay become a minimum of 10 ms at the expense of it becoming up to 20 ms sometimes.
Ok. Now I get it. Thanks.
<snip>
This reset approach is bad in that it prevents proper nesting of timing loops.
In my particular case, because reset_timer() resets the timestamp to zero rather than simply restarting the timer. I believe leaving the timestamp alone would solve the nesting problem.
<snip>
The other fix is in my opinion nicer, because it affects the nested loops less. If the inner loop is executed 100 times, with the reset, the outer loop timeout is extended by up to 1000 ms.
Bill, thank you for explaining -- probably for the nth time -- but it did finally sink in.
Regards, --Scott

On 5/27/2011 8:44 AM, Scott McNutt wrote:
J. William Campbell wrote:
On 5/27/2011 6:07 AM, Scott McNutt wrote:
Graeme Russ wrote:
Hi Wolfgang
On Friday, May 27, 2011, Wolfgang Denk wd@denx.de wrote:
Dear Graeme Russ,
In message BANLkTik2SUm4Sm8aLjCrCmz+kcMGWgEzKw@mail.gmail.com you wrote:
Besides, Nios can return an increment of 10 (presumably ms) between two immediately consecutive calls. This causes early timeouts in CFI driver
Now this in turn is a bug in the timer implementation that needs to be fixed.
And this is what reset_timer() corrected.
Agreed, but that is not something I can achieve - I don't want to hold up this whole show that we have all put so much effort into for the sake of one weak function
And I don't want to see something that currently works become broken because we "improved" a feature ... simply because the resolution of the timestamp is 10 msec rather than 1 msec.
And just to be clear. This is not a Nios issue. Currently, if the timestamp is incremented via a fixed period interrupt, and the period of the interrupt is longer than 1 msec, calls to get_timer() may produce early timeouts ... regardless of platform.
<snip>
This can be fixed in get_timer, making the 8 ms delay become a minimum of 10 ms at the expense of it becoming up to 20 ms sometimes.
Ok. Now I get it. Thanks.
<snip>
This reset approach is bad in that it prevents proper nesting of timing loops.
In my particular case, because reset_timer() resets the timestamp to zero rather than simply restarting the timer. I believe leaving the timestamp alone would solve the nesting problem.
<snip>
The other fix is in my opinion nicer, because it affects the nested loops less. If the inner loop is executed 100 times, with the reset, the outer loop timeout is extended by up to 1000 ms.
Bill, thank you for explaining -- probably for the nth time -- but it did finally sink in.
Hi Scott, Glad to help - I think I finally understand it myself after looking into it further! I think we have a good way ahead that should keep everything working. We will get you an alpha copy of whatever we do as soon as possible so you can verify we didn't break nios2! Best Regards, Bill Campbell
Regards, --Scott

Dear Scott McNutt,
In message 4DDFA206.5050101@psyent.com you wrote:
Besides, Nios can return an increment of 10 (presumably ms) between two immediately consecutive calls. This causes early timeouts in CFI driver
...
And this is what reset_timer() corrected.
I cannot see how reset_timer() could ever correct the bug that two successive calls to get_timer() return a delta of 10 milliseconds?
Agreed, but that is not something I can achieve - I don't want to hold up this whole show that we have all put so much effort into for the sake of one weak function
And I don't want to see something that currently works become broken because we "improved" a feature ... simply because the resolution of the timestamp is 10 msec rather than 1 msec.
We agree on that. Yet, an implementation with a resolution of 10 milliseconds must only return a new value (incremented by ten milliseconds) after (at least) 10 milliseconds have passed.
What I've been told is that this condition is violated in the code, which would be a bug that needs to be fixed.
And just to be clear. This is not a Nios issue. Currently, if the timestamp is incremented via a fixed period interrupt, and the period of the interrupt is longer than 1 msec, calls to get_timer() may produce early timeouts ... regardless of platform.
Please point out which other implementations show this problem, too, so we can fix these as well.
Best regards,
Wolfgang Denk

On Sun, May 29, 2011 at 8:55 AM, Wolfgang Denk wd@denx.de wrote:
Dear Scott McNutt,
In message 4DDFA206.5050101@psyent.com you wrote:
Besides, Nios can return an increment of 10 (presumably ms) between two immediately consecutive calls. This causes early timeouts in CFI driver
...
And this is what reset_timer() corrected.
I cannot see how reset_timer() could ever correct the bug that two successive calls to get_timer() return a delta of 10 milliseconds?
Agreed, but that is not something I can achieve - I don't want to hold up this whole show that we have all put so much effort into for the sake of one weak function
And I don't want to see something that currently works become broken because we "improved" a feature ... simply because the resolution of the timestamp is 10 msec rather than 1 msec.
We agree on that. Yet, an implementation with a resolution of 10 milliseconds must only return a new value (incremented by ten milliseconds) after (at least) 10 milliseconds have passed.
Hi Wolfgang,
Sure if you are tracking the timer, and wait for it to increment, and then wait for it to increment a second time, you can be confident that the time between the first and second increments is 10ms.
But in general it is possible that your first read of the timer happens at 9.999ms after the timer last incremented, just because you are unlucky. Then perhaps two successive reads of the timer only 1us apart will see a difference of 10ms in time.
I believe this resolution problem could/should be solved by a function which returns a time not less than n ms in the future. It would work by returning something like current_time_ms + (delay_ms + resolution_ms * 2 - 1) / resolution_ms * resolution_ms, where resolution_ms is 10 in this case. I think someone else mentioned that too.
When the timer reaches that time then it is guaranteed that at least delay_ms ms has passed, although it might be up to delay_ms + 10.
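A sketch of that hypothetical get_future_time() - the function name is Simon's suggestion, and the resolution value here is an assumption baked in for illustration:

u32 get_future_time(u32 delay_ms)
{
	u32 res = 10;	/* assumed platform resolution in ms (10 on Nios) */

	/* round delay_ms up to a multiple of res, then add one extra res */
	return get_timer(0) + (delay_ms + res * 2 - 1) / res * res;
}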
Regards, Simon
What I've been told is that this condition is violated in the code, which would be a bug that needs to be fixed.
And just to be clear. This is not a Nios issue. Currently, if the timestamp is incremented via a fixed period interrupt, and the period of the interrupt is longer than 1 msec, calls to get_timer() may produce early timeouts ... regardless of platform.
Please point out which other implementations show this problem, too, so we can fix these as well.
Best regards,
Wolfgang Denk

Dear Simon Glass,
In message BANLkTim-JZyMmzw_tWHhHQQYoVmK9mZyVw@mail.gmail.com you wrote:
Sure if you are tracking the timer, and wait for it to increment, and then wait for it to increment a second time, you can be confident that the time between the first and second increments is 10ms.
OK. Good.
But in general it is possible that your first read of the timer happens at 9.999ms after the timer last incremented, just because you are unlucky. Then perhaps two successive reads of the timer only 1us apart will see a difference of 10ms in time.
Agreed.
I believe this resolution problem could/should be solved by a function which returns a time not less than n ms in the future. It would work by returning something like current_time_ms + (delay_ms + resolution_ms * 2 - 1) / resolution_ms * resolution_ms, where resolution_ms is 10 in this case. I think someone else mentioned that too.
When the timer reaches that time then it is guaranteed that at least delay_ms ms has passed, although it might be up to delay_ms + 10.
We (well, Detlev and me) discussed this. We think it is important to realize (and to document) that the timing information provided by get_timer() is inherently subject to the principles of interval arithmetic with an implementation dependent interval width.
So far, most (all ?) of the code ignored this property, or silently assumed that the interval width was equal to or better than 1 millisecond, which is the timing unit used by get_timer().
I think it is important to realize the (most important) use cases of get_timer(): 1) [longish] timeouts and 2) other timing computations. For completeness, I also include a third situation here:
0) [short] timeouts:
Short timeouts are obviously all timeouts that are shorter than one millisecond - it is obvious that get_timer() cannot be used for these, and instead we use udelay() based delay loops.
Note that this method also can be used for longer timeouts, especially when we really have to wait for some event to happen, i. e. when we cannot do "something useful" while waiting.
A typical implementation to wait for some event with a timeout of 5 seconds might look like this:
int loops = 10;	/* timeout after 10 * 500 milliseconds */

while (test_for_event() == 0) {
	if (--loops <= 0)
		handle_timeout();

	udelay(500000);	/* delay 500 milliseconds */
}
Note that this implementation has the disadvantage that it may introduce an unnecessary delay of up to 500 milliseconds if the event happens quickly after the call to test_for_event(), so typically it is more efficient to use a large number of short loops instead - due to the loop overhead these are less accurate, but for timeouts this usually does not matter, and at least they insert less heavy delays. Example:
int loops = 50000;	/* timeout after 50,000 * 100 microseconds */

while (test_for_event() == 0) {
	if (--loops <= 0)
		handle_timeout();

	udelay(100);
}
Here we lose a maximum of 100 us in the delay call.
The inherent disadvantage of these delay based timeout loops is the low accuracy - depending on the loop overhead (for example, the time spent in the test_for_event() call) and on the size of the argument to the udelay() call, the total execution time of the loop is always bigger than the assumed duration, especially for short delays. Usually this is not a problem in such contexts; also it is possible to trade accuracy for additional delays after the event happens - see first code example above.
1) [longish] timeouts:
Timeouts in the order of several milliseconds or longer are frequently implemented using get_timer(). Above 5 second timeout could be implemented like this:
u32 start, now;
u32 timeout = 5 * CONFIG_SYS_HZ;	/* timeout after 5 seconds */

start = get_timer(0);

while (test_for_event() == 0) {
	now = get_timer(0);
	if ((now - start) > timeout)
		handle_timeout();

	udelay(100);
}
Here the loop overhead caused by the short delay (which may result in a high number of calls to test_for_event()) gets eliminated, because the time reference is independent of the delay.
2) Other timing computations:
These are usually used to measure the time needed to perform some specific actions, for example like this:
u32 start, now;
start = get_timer(0);
do_something_complicated();
now = get_timer(0);
printf("execution time: %ld millisec\n", now - start);
Neither 1) nor 2) takes into account that both "start" and "now" time stamps depend on the resolution of the underlying, implementation dependent timer. For large delays this is not critical, but for short delays (a few milliseconds) this introduces large errors, and can even be fatal.
One solution to the problem could be to take the interval length into consideration. However, it seems unwise to make such a low level, implementation specific detail visible to normal "application" code.
Instead, we suggest to introduce a new function delta_timer() which hides this implementation detail from the user. delta_timer() will compute the difference between two time stamps from get_timer(), and will return the number of milliseconds that are guaranteed to have passed AT LEAST between these two moments:
/* Everybody uses a 1 millisecond interval, except NIOS2 which uses 10 */
#ifdef CONFIG_NIOS2
static inline u32 timer_interval_size(void) { return 10; }
#else
static inline u32 timer_interval_size(void) { return 1; }
#endif
u32 delta_timer(u32 from, u32 to)
{
	u32 delta = to - from;

	if (delta < timer_interval_size())
		return 0;

	return delta - timer_interval_size();
}
We could also design a more complicated API like this one, but I doubt this is needed:
/*
 * round - used to control rounding:
 *  <0 : round down, return time that has passed AT LEAST
 *  =0 : don't round, provide raw time difference
 *  >0 : round up, return time that has passed AT MOST
 */
u32 delta_timer(u32 from, u32 to, int round)
{
	u32 delta = to - from;

	if (round == 0)		/* raw result, no rounding */
		return delta;

	if (round > 0)		/* round up */
		return delta + timer_interval_size();

	/* round down */
	if (delta < timer_interval_size())
		return 0;

	return delta - timer_interval_size();
}
So our timeout from case 1) above would now be written like this:
u32 start, now;
u32 timeout = 5 * CONFIG_SYS_HZ;	/* timeout after 5 seconds */

start = get_timer(0);

while (test_for_event() == 0) {
	now = get_timer(0);
	if (delta_timer(start, now) > timeout)
		handle_timeout();

	udelay(100);
}
and would be guaranteed never to terminate early.
Comments?
Best regards,
Wolfgang Denk

Hi Wolfgang,
On 30/05/11 20:57, Wolfgang Denk wrote:
Dear Simon Glass,
In message BANLkTim-JZyMmzw_tWHhHQQYoVmK9mZyVw@mail.gmail.com you wrote:
Sure if you are tracking the timer, and wait for it to increment, and then wait for it to increment a second time, you can be confident that the time between the first and second increments is 10ms.
OK. Good.
But in general it is possible that your first read of the timer happens at 9.999ms after the timer last incremented, just because you are unlucky. Then perhaps two successive reads of the timer only 1us apart will see a difference of 10ms in time.
Agreed.
I believe this resolution problem could/should be solved by a function which returns a time not less than n ms in the future. It would work by returning something like current_time_ms + (delay_ms + resolution_ms * 2 - 1) / resolution_ms * resolution_ms, where resolution_ms is 10 in this case. I think someone else mentioned that too.
When the timer reaches that time then it is guaranteed that at least delay_ms ms has passed, although it might be up to delay_ms + 10.
We (well, Detlev and me) discussed this. We think it is important to realize (and to document) that the timing information provided by get_timer() is inherently subject to the principles of interval arithmetic with an implementation dependent interval width.
Agree - I fully intend to collate all of this information into an API document in the source tree
So far, most (all ?) of the code ignored this property, or silently assumed that the interval width was equal to or better than 1 millisecond, which is the timing unit used by get_timer().
Well, we were until I tried to clean up CFI and found out about the Nios limitation - Since then, I think we have all been acutely aware that it was the 'elephant in the room'
I think it is important to realize the (most important) use cases of get_timer(): 1) [longish] timeouts and 2) other timing computations. For completeness, I also include a third situation here:
0) [short] timeouts:
Short timeouts are obviously all timeouts that are shorter than one millisecond - it is obvious that get_timer() cannot be used for these, and instead we use udelay() based delay loops.
Note that this method also can be used for longer timeouts, especially when we really have to wait for some event to happen, i. e. when we cannot do "something useful" while waiting.
A typical implementation to wait for some event with a timeout of 5 seconds might look like this:
int loops = 10;	/* timeout after 10 * 500 milliseconds */

while (test_for_event() == 0) {
	if (--loops <= 0)
		handle_timeout();

	udelay(500000);	/* delay 500 milliseconds */
}
Note that this implementation has the disadvantage that it may introduce an unnecessary delay of up to 500 milliseconds if the event happens quickly after the call to test_for_event(), so typically it is more efficient to use a large number of short loops instead - due to the loop overhead these are less accurate, but for timeouts this usually does not matter, and at least they insert less heavy delays. Example:
int loops = 50000;	/* timeout after 50,000 * 100 microseconds */

while (test_for_event() == 0) {
	if (--loops <= 0)
		handle_timeout();

	udelay(100);
}
Here we lose a maximum of 100 us in the delay call.
The inherent disadvantage of these delay based timeout loops is the low accuracy - depending on the loop overhead (for example, the time spent in the test_for_event() call) and on the size of the argument to the udelay() call, the total execution time of the loop is always bigger than the assumed duration, especially for short delays. Usually this is not a problem in such contexts; also it is possible to trade accuracy for additional delays after the event happens - see first code example above.
Some platforms are _way_ worse than this - I am sure I have seen a udelay() implemented using the millisecond timer - So udelay(100) could be closer to udelay(1000) - So your above 5 second delay could take as long as 50 seconds!!!
1) [longish] timeouts:
Timeouts in the order of several milliseconds or longer are frequently implemented using get_timer(). Above 5 second timeout could be implemented like this:
u32 start, now;
u32 timeout = 5 * CONFIG_SYS_HZ;	/* timeout after 5 seconds */

start = get_timer(0);

while (test_for_event() == 0) {
	now = get_timer(0);
	if ((now - start) > timeout)
		handle_timeout();

	udelay(100);
}
Here the loop overhead caused by the short delay (which may result in a high number of calls to test_for_event()) gets eliminated, because the time reference is independent of the delay.
I personally think _any_ timeout greater than 1s (maybe even >500ms) should be done this way
2) Other timing computations:
These are usually used to measure the time needed to perform some specific actions, for example like this:
u32 start, now;
start = get_timer(0);
do_something_complicated();
now = get_timer(0);
printf("execution time: %ld millisec\n", now - start);
Currently fails spectacularly if do_something_complicated() involves a delay loop which calls reset_timer()
I think this may become more popular for performance tuning once the API has been established. The desire for boot profiling was what initially made me investigate the API given the current use of reset_timer() prohibits profiling across a timeout loop.
And you missed one:
3) Raw Delay
When you know an operation takes AT LEAST 'x' ms/us and there is no way to detect if the operation has completed before then
do_something();
udelay(500);
/* Operation has finished - Keep going */
OR
u32 start;
do_something();
start = get_timer(0);
while (get_timer(start) < 500)
	;
/* Operation has finished - Keep going */
Neither 1) nor 2) takes into account that both "start" and "now" time stamps depend on the resolution of the underlying, implementation dependent timer. For large delays this is not critical, but for short delays (a few milliseconds) this introduces large errors, and can even be fatal.
One solution to the problem could be to take the interval length into consideration. However, it seems unwise to make such a low level, implementation specific detail visible to normal "application" code.
Agree - The user should never have to consider individual platform 'quirks'
Instead, we suggest to introduce a new function delta_timer() which hides this implementation detail from the user. delta_timer() will compute the difference between two time stamps from get_timer(), and will return the number of milliseconds that are guaranteed to have passed AT LEAST between these two moments:
/* Everybody uses a 1 millisecond interval, except NIOS2 which uses 10 */
#ifdef CONFIG_NIOS2
static inline u32 timer_interval_size(void) { return 10; }
#else
static inline u32 timer_interval_size(void) { return 1; }
#endif
u32 delta_timer(u32 from, u32 to)
{
	u32 delta = to - from;

	if (delta < timer_interval_size())
		return 0;

	return delta - timer_interval_size();
}
We could also design a more complicated API like this one, but I doubt this is needed:
/*
 * round - used to control rounding:
 *  <0 : round down, return time that has passed AT LEAST
 *  =0 : don't round, provide raw time difference
 *  >0 : round up, return time that has passed AT MOST
 */
u32 delta_timer(u32 from, u32 to, int round)
{
	u32 delta = to - from;

	if (round == 0)		/* raw result, no rounding */
		return delta;

	if (round > 0)		/* round up */
		return delta + timer_interval_size();

	/* round down */
	if (delta < timer_interval_size())
		return 0;

	return delta - timer_interval_size();
}
So our timeout from case 1) above would now be written like this:
u32 start, now;
u32 timeout = 5 * CONFIG_SYS_HZ;	/* timeout after 5 seconds */

start = get_timer(0);

while (test_for_event() == 0) {
	now = get_timer(0);
	if (delta_timer(start, now) > timeout)
		handle_timeout();

	udelay(100);
}
and would be guaranteed never to terminate early.
Comments?
I figured we would need another API function - I was thinking along the lines of:
u32 start = get_current_ms();
while (test_for_event() == 0) {
	if (time_elapsed(start, timeout))
		handle_timeout();

	udelay(100);
}
I don't like the 'time_elapsed' function name though
Both are essentially identical
Regards,
Graeme

Dear Graeme Russ,
In message 4DE383D3.7020008@gmail.com you wrote:
Some platforms are _way_ worse than this - I am sure I have seen a udelay() implemented using the millisecond timer - So udelay(100) could be closer to udelay(1000) - So your above 5 second delay could take as long as 50 seconds!!!
That should show up quickly as soon as you run a "sleep 5" command on the console (which is implemented like that).
while (test_for_event() == 0) {
	now = get_timer(0);
	if ((now - start) > timeout)
		handle_timeout();

	udelay(100);
}
Here the loop overhead caused by the short delay (which may result in a high number of calls to test_for_event()) gets eliminated, because the time reference is independent of the delay.
I personally think _any_ timeout greater than 1s (maybe even >500ms) should be done this way
Agreed. As soon as the timeout is >> the interval size of the underlying timer service this is the best we can do.
start = get_timer(0);
do_something_complicated();
now = get_timer(0);
printf("execution time: %ld millisec\n", now - start);
Currently fails spectacularly if do_something_complicated() involves a delay loop which calls reset_timer()
Note that I (intentionally) always used argument 0 in the calls to get_timer(). I think we really should get rid of this argument.
I think this may become more popular for performance tuning once the API has been established. The desire for boot profiling was what initially made me investigate the API given the current use of reset_timer() prohibits profiling across a timeout loop.
Agreed.
Best regards,
Wolfgang Denk

Hi Wolfgang,
On 30/05/11 22:31, Wolfgang Denk wrote:
Dear Graeme Russ,
In message 4DE383D3.7020008@gmail.com you wrote:
Some platforms are _way_ worse than this - I am sure I have seen a udelay() implemented using the millisecond timer - So udelay(100) could be closer to udelay(1000) - So your above 5 second delay could take as long as 50 seconds!!!
That should show up quickly as soon as you run a "sleep 5" command on the console (which is implemented like that).
while (test_for_event() == 0) {
	now = get_timer(0);
	if ((now - start) > timeout)
		handle_timeout();

	udelay(100);
}
Here the loop overhead caused by the short delay (which may result in a high number of calls to test_for_event()) gets eliminated, because the time reference is independent of the delay.
I personally think _any_ timeout greater than 1s (maybe even >500ms) should be done this way
Agreed. As soon as the timeout is >> the interval size of the underlying timer service this is the best we can do.
start = get_timer(0);
do_something_complicated();
now = get_timer(0);
printf("execution time: %ld millisec\n", now - start);
Currently fails spectacularly if do_something_complicated() involves a delay loop which calls reset_timer()
Note that I (intentionally) always used argument 0 in the calls to get_timer(). I think we really should get rid of this argument.
Agreed - I am more than willing to update all existing timer usages to a completely new set of timer API functions. I think suffering the pain of moving to a more well defined API would be better than trying to clunk around the existing one.
I think we all fairly well agree on how U-Boot will maintain a millisecond (and possibly microsecond) timer in a platform independent manner using a platform defined tick counter. Defining a timer API around that is simply a matter of taste - particularly when it comes to dealing with precision issues. I think your delta function is a good start - However maybe something more like ms_delta(u32 from, u32 to) would be a more appropriate name as it clears the way for us_delta()
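i.e. essentially the same body as delta_timer() above, just renamed - a sketch, reusing the timer_interval_size() helper from Wolfgang's earlier example:

u32 ms_delta(u32 from, u32 to)
{
	u32 delta = to - from;

	if (delta < timer_interval_size())
		return 0;

	return delta - timer_interval_size();
}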
Regards,
Graeme

Dear ALL,
it still escapes me why everyone tries to make things so complicated INSIDE the loop.
Why not just define an API like this:
u32 timeout = make_timeout(5);	/* minimum 5 millisecond timeout */
u32 start = get_timer();

while ((get_timer() - start) < timeout)
	...
make_timeout() can be arch/soc/platform specific and can take the interval into account to return at least such a value that the timeout is never cut short. (In case of a 10 ms NIOS timer, make_timeout(5) would have to return the value 20, resulting in a real timeout of at least 10 ms but up to 20 ms.)
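A sketch of such a make_timeout(), reusing the timer_interval_size() helper suggested earlier in the thread (the rounding rule is the one just described):

u32 make_timeout(u32 ms)
{
	u32 res = timer_interval_size();

	/* round up to a multiple of res, plus one extra interval */
	return (ms + res * 2 - 1) / res * res;
}

With res = 10, make_timeout(5) returns 20, as in the NIOS example above.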
If anyone sees the need, make_timeout (or what ever it might be called) could have a second parameter, indicating whether round up or round down is desired.
...
I also agree to remove the parameter of get_timer(), but we should also get rid of CONFIG_SYS_HZ.
Reinhard

Hi Reinhard,
On Tue, May 31, 2011 at 4:57 AM, Reinhard Meyer u-boot@emk-elektronik.de wrote:
Dear ALL,
it still escapes me why everyone tries to make things so complicated INSIDE the loop.
Why not just define an API like this:
u32 timeout = make_timeout(5); /* minimum 5 millisecond timeout */
u32 start = get_timer();
while ((get_timer() - start) < timeout) ...
That would work if we typedef'd 'timeout'. Otherwise, one runs the risk of not calling make_timeout() and assuming get_timer() always has a 1 ms resolution
make_timeout() can be arch/soc/platform specific and must return a value large enough that the timeout is never cut short. (In the case of a 10 ms NIOS timer, make_timeout(5) would have to return the value 20, resulting in a real timeout of at least 10 ms but up to 20 ms.)
What about this:
u32 start = get_timer();
while (!timer_expired(start, timeout)) ...
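For illustration only, timer_expired() could be a few lines; a minimal sketch, assuming the parameterless get_timer() proposed above and a timer_interval_size() helper that reports the timer resolution in ms (the helper name is borrowed from elsewhere in this thread):

static int timer_expired(u32 start, u32 timeout)
{
    u32 elapsed = get_timer() - start;  /* wrap-safe unsigned delta */

    /* Discard one timer interval so we never report expiry early */
    if (elapsed < timer_interval_size())
        return 0;

    return (elapsed - timer_interval_size()) >= timeout;
}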
If anyone sees the need, make_timeout (or whatever it might be called) could have a second parameter, indicating whether rounding up or rounding down is desired.
...
I also agree to remove the parameter of get_timer(), but we should also get rid of CONFIG_SYS_HZ.
Wholeheartedly agree
Regards,
Graeme

Dear Graeme Russ,
Hi Reinhard,
On Tue, May 31, 2011 at 4:57 AM, Reinhard Meyer u-boot@emk-elektronik.de wrote:
Dear ALL,
it still escapes me why everyone tries to make things so complicated INSIDE the loop.
Why not just define an API like this:
u32 timeout = make_timeout(5); /* minimum 5 millisecond timeout */
u32 start = get_timer();
while ((get_timer() - start) < timeout) ...
That would work if we typedef'd 'timeout'. Otherwise, one runs the risk of not calling make_timeout() and assuming get_timer() always has a 1 ms resolution
If you think people cannot follow API conventions, then typedef it...
make_timeout() can be arch/soc/platform specific and must return a value large enough that the timeout is never cut short. (In the case of a 10 ms NIOS timer, make_timeout(5) would have to return the value 20, resulting in a real timeout of at least 10 ms but up to 20 ms.)
What about this:
u32 start = get_timer();
while (!timer_expired(start, timeout)) ...
Again.. why put the complicated calculations INSIDE the loop?
If anyone sees the need, make_timeout (or whatever it might be called) could have a second parameter, indicating whether rounding up or rounding down is desired.
...
I also agree to remove the parameter of get_timer(), but we should also get rid of CONFIG_SYS_HZ.
Wholeheartedly agree
Regards, Reinhard

Hi Reinhard,
On Tue, May 31, 2011 at 2:07 PM, Reinhard Meyer u-boot@emk-elektronik.de wrote:
Dear Graeme Russ,
Hi Reinhard,
On Tue, May 31, 2011 at 4:57 AM, Reinhard Meyer u-boot@emk-elektronik.de wrote:
Dear ALL,
it still escapes me why everyone tries to make things so complicated INSIDE the loop.
Why not just define an API like this:
u32 timeout = make_timeout(5); /* minimum 5 millisecond timeout */
u32 start = get_timer();
while ((get_timer() - start) < timeout) ...
That would work if we typedef'd 'timeout'. Otherwise, one runs the risk of not calling make_timeout() and assuming get_timer() always has a 1 ms resolution
If you think people cannot follow API conventions, then typedef it...
make_timeout() can be arch/soc/platform specific and must return a value large enough
Actually, make_timeout() would not need to be platform specific - all it needs is a get_min_ms_resolution(), which would be a simple inline usually returning a const, so the compiler would optimise it away
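As a sketch of what that could look like (CONFIG_SYS_MS_RESOLUTION is an illustrative assumption, not an existing symbol):

/* Hypothetical: the platform reports its coarsest ms-timer step */
static inline u32 get_min_ms_resolution(void)
{
    return CONFIG_SYS_MS_RESOLUTION;    /* e.g. 10 on NIOS */
}

/*
 * Round 'ms' up to the next resolution boundary and add one extra
 * interval so the timeout can never expire early: make_timeout(5)
 * with 10 ms resolution returns 20, matching the example above.
 */
static u32 make_timeout(u32 ms)
{
    u32 res = get_min_ms_resolution();

    return ((ms + res - 1) / res) * res + res;
}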
that the timeout is never cut short. (In the case of a 10 ms NIOS timer, make_timeout(5) would have to return the value 20, resulting in a real timeout of at least 10 ms but up to 20 ms.)
What about this:
u32 start = get_timer();
while (!timer_expired(start, timeout)) ...
Again.. why put the complicated calculations INSIDE the loop?
Well, the calculations are hidden from the user, and we aren't running a high performance system. Also, the most complex calculations will be performed each time you call get_timer() anyway. The additional overhead of performing a precision rounding will be insignificant
Best to make the API as defensive as possible rather than try to trim off a few CPU instructions per loop.
Regards,
Graeme

Dear Graeme Russ,
Hi Reinhard,
On Tue, May 31, 2011 at 2:07 PM, Reinhard Meyer u-boot@emk-elektronik.de wrote:
Dear Graeme Russ,
Hi Reinhard,
On Tue, May 31, 2011 at 4:57 AM, Reinhard Meyer u-boot@emk-elektronik.de wrote:
Dear ALL,
it still escapes me why everyone tries to make things so complicated INSIDE the loop.
Why not just define an API like this:
u32 timeout = make_timeout(5); /* minimum 5 millisecond timeout */
u32 start = get_timer();
while ((get_timer() - start) < timeout) ...
That would work if we typedef'd 'timeout'. Otherwise, one runs the risk of not calling make_timeout() and assuming get_timer() always has a 1 ms resolution
If you think people cannot follow API conventions, then typedef it...
make_timeout() can be arch/soc/platform specific and must return a value large enough
Actually, make_timeout() would not need to be platform specific - all it needs is a get_min_ms_resolution(), which would be a simple inline usually returning a const, so the compiler would optimise it away
that the timeout is never cut short. (In the case of a 10 ms NIOS timer, make_timeout(5) would have to return the value 20, resulting in a real timeout of at least 10 ms but up to 20 ms.)
What about this:
u32 start = get_timer();
while (!timer_expired(start, timeout)) ...
Again.. why put the complicated calculations INSIDE the loop?
Well, the calculations are hidden from the user, and we aren't running a high performance system. Also, the most complex calculations will be performed each time you call get_timer() anyway. The additional overhead of performing a precision rounding will be insignificant
Best to make the API as defensive as possible rather than try to trim off a few CPU instructions per loop.
Excuse me, but THIS API does not prevent the user from doing a "(get_timer() - start) < timeout" inside the loop, making your argument moot. But as I said before, it escapes me why by all means the loop must be more complicated and obscure (on the user side) than essentially necessary...
Regards,
Reinhard

Hi Reinhard,
On Tue, May 31, 2011 at 2:36 PM, Reinhard Meyer u-boot@emk-elektronik.de wrote:
Dear Graeme Russ,
Hi Reinhard,
On Tue, May 31, 2011 at 2:07 PM, Reinhard Meyer u-boot@emk-elektronik.de wrote:
Dear Graeme Russ,
Hi Reinhard,
On Tue, May 31, 2011 at 4:57 AM, Reinhard Meyer u-boot@emk-elektronik.de wrote:
Dear ALL,
it still escapes me why everyone tries to make things so complicated INSIDE the loop.
Why not just define an API like this:
u32 timeout = make_timeout(5); /* minimum 5 millisecond timeout */
u32 start = get_timer();
while ((get_timer() - start) < timeout) ...
That would work if we typedef'd 'timeout'. Otherwise, one runs the risk of not calling make_timeout() and assuming get_timer() always has a 1 ms resolution
If you think people cannot follow API conventions, then typedef it...
make_timeout() can be arch/soc/platform specific and must return a value large enough
Actually, make_timeout() would not need to be platform specific - all it needs is a get_min_ms_resolution(), which would be a simple inline usually returning a const, so the compiler would optimise it away
that the timeout is never cut short. (In the case of a 10 ms NIOS timer, make_timeout(5) would have to return the value 20, resulting in a real timeout of at least 10 ms but up to 20 ms.)
What about this:
u32 start = get_timer();
while (!timer_expired(start, timeout)) ...
Again.. why put the complicated calculations INSIDE the loop?
Well, the calculations are hidden from the user, and we aren't running a high performance system. Also, the most complex calculations will be performed each time you call get_timer() anyway. The additional overhead of performing a precision rounding will be insignificant
Best to make the API as defensive as possible rather than try to trim off a few CPU instructions per loop.
Excuse me, but THIS API does not prevent the user from doing a "(get_timer() - start) < timeout" inside the loop, making your argument moot.
Ah true - Oops ;)
But as I said before, it escapes me why by all means the loop must be more complicated and obscure (on the user side) than essentially necessary...
What about Simon's solution (next post):
u32 stop = time_get_future_ms(1234);
while (!time_reached(stop)) ..
I really like the idea of a simple while(!something(whatever))
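For the record, a rough sketch of that pair, assuming a millisecond timebase, a time_now_ms() accessor and make_timeout()-style rounding (all of these names are still proposals at this point):

/* Return a time at least 'ms' milliseconds in the future */
static u32 time_get_future_ms(u32 ms)
{
    return time_now_ms() + make_timeout(ms);
}

/* Wrap-safe "has 'stop' passed yet?" test */
static int time_reached(u32 stop)
{
    /* The unsigned difference, viewed as signed, stays negative
     * until 'stop' has been reached - even across counter wrap. */
    return (s32)(time_now_ms() - stop) >= 0;
}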
Regards,
Graeme

Dear Reinhard Meyer,
In message 4DE47046.3010703@emk-elektronik.de you wrote:
Excuse me, but THIS API does not prevent the user from doing a "(get_timer() - start) < timeout" inside the loop, making your argument moot.
You can be pretty sure that I will NAK any design that _prevents_ me from doing this when I have specific reasons to do exactly this or something similar.
It is definitely a good idea to provide simple and reliable ways for standard tasks - but you also must provide the freedom to do things differently when the standard way does not fit for a reason or another.
This is also why I consider it mandatory that get_timer() (or time_read() or whatever it is going to be called) uses a standard unit of time like milliseconds, and not some random internal scaling.
Best regards,
Wolfgang Denk

On Mon, May 30, 2011 at 5:24 PM, Graeme Russ graeme.russ@gmail.com wrote:
Hi Reinhard,
On Tue, May 31, 2011 at 4:57 AM, Reinhard Meyer
...
make_timeout() can be arch/soc/platform specific and must return a value large enough that the timeout is never cut short. (In the case of a 10 ms NIOS timer, make_timeout(5) would have to return the value 20, resulting in a real timeout of at least 10 ms but up to 20 ms.)
What about this:
u32 start = get_timer();
while (!timer_expired(start, timeout)) ...
Hi Graeme,
I like this, although I have a small preference for:
u32 stop = time_get_future_ms(1234);
while (!time_reached(stop)) ..
since it possibly means the processing happens up front. However any such function is good and I hope you can add it to your API.
If anyone sees the need, make_timeout (or whatever it might be called) could have a second parameter, indicating whether rounding up or rounding down is desired.
...
I also agree to remove the parameter of get_timer(), but we should also get rid of CONFIG_SYS_HZ.
Wholeheartedly agree
SGTM. Things are getting better all the time.
Regards, Simon
Regards,
Graeme

Dear Simon Glass,
On Mon, May 30, 2011 at 5:24 PM, Graeme Russ graeme.russ@gmail.com wrote:
Hi Reinhard,
On Tue, May 31, 2011 at 4:57 AM, Reinhard Meyer
...
make_timeout() can be arch/soc/platform specific and must return a value large enough that the timeout is never cut short. (In the case of a 10 ms NIOS timer, make_timeout(5) would have to return the value 20, resulting in a real timeout of at least 10 ms but up to 20 ms.)
What about this:
u32 start = get_timer();
while (!timer_expired(start, timeout)) ...
Hi Graeme,
I like this, although I have a small preference for:
u32 stop = time_get_future_ms(1234);
while (!time_reached(stop)) ..
I would be perfectly happy with such a solution; it is equivalent to what I was proposing almost a year ago!
since it possibly means the processing happens up front. However any such function is good and I hope you can add it to your API.
Exactly! And (saying it silently) this would not mandate that the now hidden internal timer needs to be in ms units, it could be the bare "natural" tick of the hardware... Making time_get_future() return the "tick" (in whatever granularity) that has to be passed would reduce time_reached() to a very simple function.
But I get the feeling that exactly this simplicity of above concept is the problem for people that have the urge to invent elaborate and complicated solutions ;)
Regards,
Reinhard

On Tue, May 31, 2011 at 2:53 PM, Reinhard Meyer u-boot@emk-elektronik.de wrote:
Dear Simon Glass,
On Mon, May 30, 2011 at 5:24 PM, Graeme Russ graeme.russ@gmail.com wrote:
Hi Reinhard,
On Tue, May 31, 2011 at 4:57 AM, Reinhard Meyer
...
make_timeout() can be arch/soc/platform specific and must return a value large enough that the timeout is never cut short. (In the case of a 10 ms NIOS timer, make_timeout(5) would have to return the value 20, resulting in a real timeout of at least 10 ms but up to 20 ms.)
What about this:
u32 start = get_timer();
while (!timer_expired(start, timeout)) ...
Hi Graeme,
I like this, although I have a small preference for:
u32 stop = time_get_future_ms(1234);
while (!time_reached(stop)) ..
I would be perfectly happy with such a solution; it is equivalent to what I was proposing almost a year ago!
Don't forget the API will have a get_current_ms() so we can do duration measurements. So you could still accidentally do:
u32 stop = get_current_ms() + 1234;
bypassing the resolution correction. If time_reached() did the resolution correction, would that solve the problem of API misuse? (Yes, I know it puts a complicated calculation back in the loop.)
since it possibly means the processing happens up front. However any such function is good and I hope you can add it to your API.
Exactly! And (saying it silently) this would not mandate that the now hidden internal timer needs to be in ms units, it could be the bare "natural" tick of the hardware... Making time_get_future() return the "tick" (in whatever granularity) that has to be passed would reduce time_reached() to a very simple function.
Half the point of refreshing the timer API was to solidify the fact that timers operate on a fixed time base (milliseconds or microseconds) so they can be used trivially for a variety of things (delays, timeouts, duration measurement, etc). Ticks can be very short, so measuring durations would require a 64-bit 'start tick' and a conversion at the end:
u64 start = get_current_tick();
... do something ...
u32 duration = ticks_to_ms(get_current_tick() - start);
Yetch! - We will not be exposing ticks!
But I get the feeling that exactly this simplicity of above concept is the problem for people that have the urge to invent elaborate and complicated solutions ;)
I like simple as much as the next guy - I also like hard to misuse ;)
Regards,
Graeme

Dear Graeme Russ,
On Tue, May 31, 2011 at 2:53 PM, Reinhard Meyer u-boot@emk-elektronik.de wrote:
Dear Simon Glass,
On Mon, May 30, 2011 at 5:24 PM, Graeme Russ graeme.russ@gmail.com wrote:
Hi Reinhard,
On Tue, May 31, 2011 at 4:57 AM, Reinhard Meyer
...
make_timeout() can be arch/soc/platform specific and must return a value large enough that the timeout is never cut short. (In the case of a 10 ms NIOS timer, make_timeout(5) would have to return the value 20, resulting in a real timeout of at least 10 ms but up to 20 ms.)
What about this:
u32 start = get_timer();
while (!timer_expired(start, timeout)) ...
Hi Graeme,
I like this, although I have a small preference for:
u32 stop = time_get_future_ms(1234);
while (!time_reached(stop)) ..
I would be perfectly happy with such a solution; it is equivalent to what I was proposing almost a year ago!
Don't forget the API will have a get_current_ms() so we can do duration measurements. So you could still accidentally do:
u32 stop = get_current_ms() + 1234;
bypassing the resolution correction. If time_reached() did the resolution correction, would that solve the problem of API misuse? (Yes, I know it puts a complicated calculation back in the loop.)
since it possibly means the processing happens up front. However any such function is good and I hope you can add it to your API.
Exactly! And (saying it silently) this would not mandate that the now hidden internal timer needs to be in ms units, it could be the bare "natural" tick of the hardware... Making time_get_future() return the "tick" (in whatever granularity) that has to be passed would reduce time_reached() to a very simple function.
Half the point of refreshing the timer API was to solidify the fact that timers operate on a fixed time base (milliseconds or microseconds) so they can be used trivially for a variety of things (delays, timeouts, duration measurement, etc). Ticks can be very short, so measuring durations would require a 64-bit 'start tick' and a conversion at the end:
u64 start = get_current_tick();
... do something ...
u32 duration = ticks_to_ms(get_current_tick() - start);
Yetch! - We will not be exposing ticks!
Moot argument again. Any fast 64-bit tick can very simply be brought down to a 32-bit value with sub-ms granularity by a simple right shift. But I would be happy with 64 bits as well, since all calculations in the loop would then be just adds/subtracts and no muls/divs.
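To illustrate the shift (the 50 MHz figure and the shift count are assumptions, chosen per platform):

/*
 * With a 50 MHz tick (20 ns period), a right shift of 6 gives units
 * of 64 ticks = 1.28 us - still sub-millisecond - and a u32 of those
 * units spans roughly 90 minutes before wrapping.
 */
static inline u32 ticks_to_fast_time(u64 ticks)
{
    return (u32)(ticks >> 6);
}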
But I get the feeling that exactly this simplicity of above concept is the problem for people that have the urge to invent elaborate and complicated solutions ;)
I like simple as much as the next guy - I also like hard to misuse ;)
typedefs would prevent accidental misuse; there is no cure for deliberate misuse except peer review...
Reinhard

Dear Graeme Russ,
In message BANLkTikZUkk1EaYqCg=5mB2npva2iE6Lew@mail.gmail.com you wrote:
Don't forget the API will have a get_current_ms() so we can do duration
I don't think we will have this.
We have get_timer() (or, as recently suggested, renamed to time_read() or similar). We don't need yet another function that does the same thing under a different name.
bypassing the resolution correction. If time_reached() did the resolution correction, would this solve the problem of API misuse (yes, I know it puts a complicated calculation back in the loop)
Complicated? Come on, guys.
And please don't forget that these are usually delay or timeout loops, so who cares how long it takes?
Yetch! - We will not be exposing ticks!
Oh, I'm not so sure about this. We will not use it in common code, but the interface should be available for special purposes.
I like simple as much as the next guy - I also like hard to misuse ;)
NAK. What you today consider "misuse" might actually be a clever solution to my problem tomorrow.
An API is good when the standard solution actually covers 99.9% of all use cases and is so convenient to use that you don't even think about doing it differently. But it is extremely useful when you also can use it for things the designer/implementor never even dreamt of.
So don't try to prevent "misuse".
Best regards,
Wolfgang Denk

Hi Wolfgang,
On Tue, May 31, 2011 at 4:03 PM, Wolfgang Denk wd@denx.de wrote:
Dear Graeme Russ,
In message BANLkTikZUkk1EaYqCg=5mB2npva2iE6Lew@mail.gmail.com you wrote:
Don't forget the API will have a get_current_ms() so we can do duration
I don't think we will have this.
We have get_timer() (or, as recently suggested, renamed to time_read() or similar). We don't need yet another function that does the same thing under a different name.
No, I did not mean a new function which does the same as an existing function - I just meant there would be a function to give us the current ms, which means any 'wait_until_ms(x)' style function is ripe for 'abuse'
bypassing the resolution correction. If time_reached() did the resolution correction, would this solve the problem of API misuse (yes, I know it puts a complicated calculation back in the loop)
Complicated? Come on, guys.
And please don't forget that these are usually delay or timeout loops, so who cares how long it takes?
Yetch! - We will not be exposing ticks!
Oh, I'm not so sure about this. We will not use it in common code, but the interface should be available for special purposes.
Well yes, the suggested API does expose ticks via get_ticks(). And my current demo implementation also stores ticks in global data as storage for 'last ticks', in order to do tick deltas to calculate us/ms deltas...
I like simple as much as the next guy - I also like hard to misuse ;)
NAK. What you today consider "misuse" might actually be a clever solution to my problem tomorrow.
Well, now that we are doing much more stringent peer reviews on new code (and re-written code), I guess the misuse argument does become a little bit irrelevant
An API is good when the standard solution actually covers 99.9% of all use cases and is so convenient to use that you don't even think about doing it differently. But it is extremely useful when you also can use it for things the designer/implementor never even dreamt of.
So don't try to prevent "misuse".
OK
Regards,
Graeme

Dear Reinhard Meyer,
In message 4DE4743C.5040003@emk-elektronik.de you wrote:
Exactly! And (saying it silently) this would not mandate that the now hidden internal timer needs to be in ms units, it could be the bare "natural" tick of the hardware...
Yes. We can throw everything away and restart evolution from the amoeba, or even with a big bang.
Are we discussing API changes or a complete rewrite of everything?
Best regards,
Wolfgang Denk

Dear Wolfgang Denk,
Dear Reinhard Meyer,
In message 4DE4743C.5040003@emk-elektronik.de you wrote:
Exactly! And (saying it silently) this would not mandate that the now hidden internal timer needs to be in ms units, it could be the bare "natural" tick of the hardware...
Yes. We can throw everything away and restart evolution from the amoeba, or even with a big bang.
All you throw into the timer discussion is criticism and pointless remarks; I see no productive input from you, except sometimes pointing out how powerpc does it.
Are we discussing API changes or a complete rewrite of everything?
We DO have most of the tick based functions in the current API...
And as it seems, the current ms based API is a Pandora's box of problems:
- granularity problems in some hardware
- early timeouts that need complicated solutions inside the loop
- complicated algorithms to convert a natural, non-ms tick into ms values
All this would go if the timer would be left in natural ticks of the hardware and simple helper functions be used to convert to and from that to ms, if so desired:
tick_t start = get_tick();
...
tick_t end = get_tick();
printf("Elapsed time in ms: %u", ticks2ms(end - start));
OR:
/* parameter is in ms; the tick is calculated such that it causes at least that many ms of delay */
tick_t end_tick = time_get_future_tick(1234);
while (!tick_reached(end_tick)) ...
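A sketch of how cheap the loop test becomes in that scheme, assuming tick_t is a u64 and using the get_ticks()/ticks_per_millisecond() HAL functions already proposed (the +1 guarantees at least the requested delay when the tick is no coarser than 1 ms):

typedef u64 tick_t;

/* ms -> future tick, rounded up by one so the delay is never short */
static tick_t time_get_future_tick(u32 ms)
{
    return get_ticks() + (u64)ms * ticks_per_millisecond() + 1;
}

/* Trivially cheap, wrap-safe test inside the loop */
static int tick_reached(tick_t end_tick)
{
    return (s64)(get_ticks() - end_tick) >= 0;
}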
Perhaps just too simple and too brilliant...
Best Regards, Reinhard

Dear Reinhard Meyer,
In message 4DE47EA1.1090100@emk-elektronik.de you wrote:
All you throw into the timer discussion is criticism and pointless remarks; I see no productive input from you, except sometimes pointing out how powerpc does it.
Thank you very much.
We DO have most of the tick based functions in the current API...
There are a number of good reasons NOT to use ticks at the external API level. They have been explained before. Go and re-read the archives.
And as it seems, the current ms based API is a Pandora's box of problems:
- granularity problems in some hardware
- early timeouts that need complicated solutions inside the loop
- complicated algorithms to convert a natural, non-ms tick into ms values
All this would go if the timer would be left in natural ticks of the hardware and simple helper functions be used to convert to and from that to ms, if so desired:
You obviously ignore a lot of things here. I will not repeat all the stuff again for you. Go and re-read the thread.
Perhaps just too simple and too brilliant...
It's a shame others don't agree blindly, isn't it?
Wolfgang Denk

On Mon, May 30, 2011 at 3:57 AM, Wolfgang Denk wd@denx.de wrote:
Dear Simon Glass,
In message BANLkTim-JZyMmzw_tWHhHQQYoVmK9mZyVw@mail.gmail.com you wrote:
Sure if you are tracking the timer, and wait for it to increment, and then wait for it to increment a second time, you can be confident that the time between the first and second increments is 10ms.
OK. Good.
But in general it is possible that your first read of the timer happens at 9.999ms after the timer last incremented, just because you are unlucky. Then perhaps two successive reads of the timer only 1us apart will see a difference of 10ms in time.
Agreed.
I believe this resolution problem could/should be solved by a function which returns a time not less than n ms in the future. It would work by returning something like current_time_ms + (delay_ms + resolution_ms * 2 - 1) / resolution_ms * resolution_ms, where resolution_ms is 10 in this case. I think someone else mentioned that too.
When the timer reaches that time then it is guaranteed that at least delay_ms ms has passed, although it might be up to delay_ms + 10.
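Spelled out as code, that suggestion might look like this (a sketch; current_time_ms() stands in for whatever accessor the final API provides):

/*
 * Return a time guaranteed to be at least 'delay_ms' in the future,
 * though possibly up to one interval more; resolution_ms would be 10
 * for the NIOS case discussed here.
 */
static u32 future_time_ms(u32 delay_ms, u32 resolution_ms)
{
    return current_time_ms() +
           (delay_ms + resolution_ms * 2 - 1) / resolution_ms *
           resolution_ms;
}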
We (well, Detlev and me) discussed this. We think it is important to realize (and to document) that the timing information provided by get_timer() is inherently subject to the principles of interval arithmetics with an implementation dependent interval width.
So far, most (all?) of the code ignored this property, or silently assumed that the interval width was equal to or better than 1 millisecond, which is the timing unit used by get_timer().
I think it is important to realize the (most important) use cases of get_timer(): 1) [longish] timeouts and 2) other timing computations. For completeness, I also include a third situation here: 0) [short] timeouts:
[lots of horrible things that I believe we all want to deprecate]
Instead, we suggest to introduce a new function delta_timer() which hides this implementation detail from the user. delta_timer() will compute the difference between two time stamps from get_timer(), and will return the number of milliseconds that are guaranteed to have passed AT LEAST between these two moments:
/* Everybody uses a 1 millisecond interval, except NIOS2 which uses 10 */
#ifdef CONFIG_NIOS2
static inline u32 timer_interval_size(void) { return 10; }
#else
static inline u32 timer_interval_size(void) { return 1; }
#endif

u32 delta_timer(u32 from, u32 to)
{
    u32 delta = to - from;

    if (delta < timer_interval_size())
        return 0;

    return delta - timer_interval_size();
}
Hi Wolfgang,
I think this is a big step forward from what we might call the 'manually coded' loops.
We could also design a more complicated API like this one, but I doubt this is needed:
I doubt it too.
[snip]
So our timeout from case 1) above would now be written like this:
u32 start, now;
u32 timeout = 5 * CONFIG_SYS_HZ; /* timeout after 5 seconds */

start = get_timer(0);

while (test_for_event() == 0) {
    now = get_timer(0);

    if (delta_timer(start, now) > timeout)
        handle_timeout();

    udelay(100);
}
and would be guaranteed never to terminate early.
Comments?
Great!
I do think it would be nice to put a time_ prefix before all the time functions, but this is a pretty minor point.
See my other message about computing a future time. But the less ad-hoc time calculation we can leave to callers of get_timer() the better. I think these things are actually a sign of an API which is too low level. There is a certain purity and simplicity with get_timer(), sort of minimalist, but the ugly constructs that people build on top of it need to be considered and brought into the equation too.
Regards, Simon
Best regards,
Wolfgang Denk

Dear Simon Glass,
In message BANLkTi=T_pzB9TOPtQuNZXarvQsHN80P3g@mail.gmail.com you wrote:
I do think it would be nice to put a time_ prefix before all the time functions, but this is a pretty minor point.
Agree.
By now, I also find get_timer() kind of misleading - one might expect from that name that it allocates one of (possibly several available) timers. We should probably rename it to time_read(); the newly suggested function would then become time_delta() [or time_diff()].
See my other message about computing a future time. But the less
I disagree with this, mostly because it seems a too narrow design to me. There is not always and only the need for "wait-until-time-X" type of tasks. The time_delta() way to do things also gives you the option to compare timestamps that have been recorded any time before.
ad-hoc time calculation we can leave to callers of get_timer() the better. I think these things are actually a sign of an API which is too low level. There is a certain purity and simplicity with get_timer(), sort of minimalist, but the ugly constructs that people build on top of it need to be considered and brought into the equation too.
It is certainly a good idea to provide simple and reliable ways for standard tasks - but see sig below.
Best regards,
Wolfgang Denk

Hi Wolfgang,
On Tue, May 31, 2011 at 3:49 PM, Wolfgang Denk wd@denx.de wrote:
Dear Simon Glass,
In message BANLkTi=T_pzB9TOPtQuNZXarvQsHN80P3g@mail.gmail.com you wrote:
I do think it would be nice to put a time_ prefix before all the time functions, but this is a pretty minor point.
Agree.
By now, I also find get_timer() kind of misleading - one might expect from that name that it allocates one of (possibly several available) timers. We should probably rename it to time_read(); the newly suggested function would then become time_delta() [or time_diff()].
I would personally like to see two sets of parallel time_* functions, one for milliseconds and one for microseconds. The microsecond API is going to be critical for boot profiling and I see no reason to think about it now rather than later.
Regards,
Graeme

going to be critical for boot profiling and I see no reason to think about it now rather than later.
going to be critical for boot profiling and I see no reason NOT to think about it now rather than later.
Oops,
Regards,
Graeme

Hi Wolfgang,
Since discussion seems to have died down, I have assumed pseudo consensus and have started hitting the timer cleanup in earnest. All I can say is:
Wow! What a mess ;)
We could also design a more complicated API like this one, but I doubt this is needed:
Well, it is needed if you are measuring how long something has taken (say erasing a Flash, performing an I/O operation, profiling boot-up etc)
/*
 * round - used to control rounding:
 *  <0 : round down, return time that has passed AT LEAST
 *  =0 : don't round, provide raw time difference
 *  >0 : round up, return time that has passed AT MOST
 */
u32 delta_timer(u32 from, u32 to, int round)
{
[snip]
}
I decided to implement three separate functions:
u32 time_ms_delta_min(u32 from, u32 to)
u32 time_ms_delta_max(u32 from, u32 to)
u32 time_ms_delta_raw(u32 from, u32 to)
So if you only use one, the rest get stripped out of the binary
The ms_ part allows for:
u32 time_us_delta_min(u32 from, u32 to)
u32 time_us_delta_max(u32 from, u32 to)
u32 time_us_delta_raw(u32 from, u32 to)
I intend to let the time_us_delta* functions drop to ms resolution if the underlying tick counter is not sub-millisecond. Where the tick counter is microsecond (or better), the arch-specific udelay becomes a trivial implementation of a loop using time_us_now() and time_us_delta_min() - actually, we can make this a weak default and have arches with a non-microsecond tick counter override it.
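A minimal sketch of that weak default, using GCC's weak attribute and the time_us_now()/time_us_delta_min() names proposed above:

/*
 * Default udelay() for platforms whose tick counter has microsecond
 * (or better) resolution; platforms with a coarser tick counter
 * override this with their own implementation.
 */
void __attribute__((weak)) udelay(u32 delay)
{
    u32 start = time_us_now();

    while (time_us_delta_min(start, time_us_now()) < delay)
        ;
}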
So our timeout from case 1) above would now be written like this:
u32 start, now;
u32 timeout = 5 * CONFIG_SYS_HZ; /* timeout after 5 seconds */
start = get_timer(0);
I now have: start = time_ms_now();
while (test_for_event() == 0) {
    now = get_timer(0);

    if (delta_timer(start, now) > timeout)
        handle_timeout();

    udelay(100);
}
and would be guaranteed never to terminate early.
Comments?
With the 'time_ms_' prefix, it's starting to get rather long, and I'm pushing a lot of timeout checks beyond 80 columns - especially when existing code has variables like 'start_time_tx'. I'm starting to consider dropping the time_ prefix (all time functions will still have a ms_ or us_ prefix anyway) and renaming a lot of variables
Thoughts?
Regards,
Graeme

Hi Graeme,
On Wed, Jun 15, 2011 at 6:17 AM, Graeme Russ graeme.russ@gmail.com wrote:
Hi Wolfgang,
...
/*
 * round - used to control rounding:
 *  <0 : round down, return time that has passed AT LEAST
 *  =0 : don't round, provide raw time difference
 *  >0 : round up, return time that has passed AT MOST
 */
u32 delta_timer(u32 from, u32 to, int round)
{
[snip]
}
I decided to implement three separate functions:
u32 time_ms_delta_min(u32 from, u32 to)
u32 time_ms_delta_max(u32 from, u32 to)
u32 time_ms_delta_raw(u32 from, u32 to)
So if you only use one, the rest get stripped out of the binary
Here is my 2p worth:
- the common case is min I think, so let's get rid of the min prefix so everyone will use it without question or needing to read screeds of doc
- would prefer the ms and us at the end as I think it reads better. Getting the time is the important bit - the units are generally at the end
This code from your excellent wiki page bothers me. Can we find a way to shrink it?
now = timer_ms_now();
if (time_ms_delta_min(start, now) > timeout)
How about:
if (time_since_ms(start) > timeout)
The idea of the time 'since' an event is more natural than the delta between then and now which seems more abstract.
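time_since_ms() would then be a thin wrapper over the delta function; a minimal sketch in terms of the names already on the table:

/* Wrap-safe elapsed ms since 'start', rounded down by one interval
 * so the caller is never told more time has passed than really has */
static u32 time_since_ms(u32 start)
{
    return time_ms_delta_min(start, time_ms_now());
}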
The ms_ part allows for:
u32 time_us_delta_min(u32 from, u32 to)
u32 time_us_delta_max(u32 from, u32 to)
u32 time_us_delta_raw(u32 from, u32 to)
I intend to let the time_us_delta* functions drop to ms resolution if the underlying tick counter is not sub-millisecond. Where the tick counter is microsecond (or better), the arch-specific udelay becomes a trivial implementation of a loop using time_us_now() and time_us_delta_min() - actually, we can make this a weak default and have arches with a non-microsecond tick counter override it.
So our timeout from case 1) above would now be written like this:
u32 start, now;
u32 timeout = 5 * CONFIG_SYS_HZ; /* timeout after 5 seconds */
start = get_timer(0);
I now have: start = time_ms_now();
while (test_for_event() == 0) {
    now = get_timer(0);

    if (delta_timer(start, now) > timeout)
        handle_timeout();

    udelay(100);
}
and would be guaranteed never to terminate early.
Comments?
With the 'time_ms_' prefix, it's starting to get rather long, and I'm pushing a lot of timeout checks beyond 80 columns - especially when existing code has variables like 'start_time_tx'. I'm starting to consider dropping the time_ prefix (all time functions will still have a ms_ or us_ prefix anyway) and renaming a lot of variables
Now I see why you want units at the start. Seems a bit ugly to me - which lines are getting long - do you mean the time delta check line? If so see above, or perhaps it is shorter without _min.
Regards, Simon
Thoughts?
Regards,
Graeme

On 16/06/11 02:03, Simon Glass wrote:
Hi Graeme,
On Wed, Jun 15, 2011 at 6:17 AM, Graeme Russ graeme.russ@gmail.com wrote:
Hi Wolfgang,
...
/*
 * round - used to control rounding:
 *  <0 : round down, return time that has passed AT LEAST
 *  =0 : don't round, provide raw time difference
 *  >0 : round up, return time that has passed AT MOST
 */
u32 delta_timer(u32 from, u32 to, int round)
{
[snip]
}
I decided to implement three separate functions:
u32 time_ms_delta_min(u32 from, u32 to)
u32 time_ms_delta_max(u32 from, u32 to)
u32 time_ms_delta_raw(u32 from, u32 to)
So if you only use one, the rest get stripped out of the binary
Here is my 2p worth:
- the common case is min I think, so let's get rid of the min prefix
so everyone will use it without question or needing to read screeds of doc
I don't like this idea - Extrapolate this to dropping the 'ms' common case and you end up with:
u32 time_delta(u32 from, u32 to)
u32 time_delta_max(u32 from, u32 to)
u32 time_delta_raw(u32 from, u32 to)
u32 time_us_delta(u32 from, u32 to)
u32 time_us_delta_max(u32 from, u32 to)
u32 time_us_delta_raw(u32 from, u32 to)
Not as grep'able IMHO
- would prefer the ms and us at the end as I think it reads better.
Getting the time is the important bit - the units are generally at the end
Hmm, time_since_ms or time_ms_since - I suppose either reads OK. But if I keep min/max/raw, I find time_min_ms_since (barely) easier on the eye than time_min_since_ms - I'm not bothered either way and will go with the general consensus
This code from your excellent wiki page bothers me. Can we find a way to shrink it?
now = timer_ms_now();
if (time_ms_delta_min(start, now) > timeout)
How about:
if (time_since_ms(start) > timeout)
The idea of the time 'since' an event is more natural than the delta between then and now which seems more abstract.
I tend to agree - I have been thinking about 'since' functions for a while now
[snip]
With the 'time_ms_' prefix, it's starting to get rather long, and I'm pushing a lot of timeout checks beyond 80 columns - especially when existing code has variables like 'start_time_tx'. I'm starting to consider dropping the time_ prefix (all time functions will still have a ms_ or us_ prefix anyway) and renaming a lot of variables
Now I see why you want units at the start. Seems a bit ugly to me - which lines are getting long - do you mean the time delta check line? If so see above, or perhaps it is shorter without _min.
An example from drivers/net/ns7520_eth.c:
ulStartJiffies = time_ms_now();
while (time_ms_delta_min(ulStartJiffies, time_ms_now()) < NS7520_MII_NEG_DELAY) {
Could be reduced to:
ulStartJiffies = time_ms_now();
while (time_min_ms_since(ulStartJiffies) < NS7520_MII_NEG_DELAY) {
And with a rename:
start_ms = time_ms_now();
while (time_min_ms_since(start_ms) < NS7520_MII_NEG_DELAY) {
Regards,
Graeme

Hi Graeme,
On Wed, Jun 15, 2011 at 1:38 PM, Graeme Russ graeme.russ@gmail.com wrote:
On 16/06/11 02:03, Simon Glass wrote:
- the common case is min I think, so let's get rid of the min prefix
so everyone will use it without question or needing to read screeds of doc
I don't like this idea - Extrapolate this to dropping the 'ms' common case and you end up with:
u32 time_delta(u32 from, u32 to)
u32 time_delta_max(u32 from, u32 to)
u32 time_delta_raw(u32 from, u32 to)
u32 time_us_delta(u32 from, u32 to)
u32 time_us_delta_max(u32 from, u32 to)
u32 time_us_delta_raw(u32 from, u32 to)
Not as grep'able IMHO
The reason I suggested this is not because it is common, in the sense that ms is more common than us. It is that users will find time_ms_delta_min confusing. There is a min, a max and a raw - which should they use? While everyone on this list is a genius, there will be other people who will choose one at random and run with it.
To my mind 'min' is the safe one and the others will be rarely used.
I agree that removing ms is a bad idea. But this is pretty grep'able for sensible values of grep:
u32 time_delta_ms(u32 from, u32 to)
u32 time_delta_us(u32 from, u32 to)
[hide these away in the closet:
u32 time_delta_max_ms(u32 from, u32 to)
u32 time_delta_raw_ms(u32 from, u32 to)
u32 time_delta_max_us(u32 from, u32 to)
u32 time_delta_raw_us(u32 from, u32 to)
]
int time_since_ms(u32 from)
Also bear in mind that _us() will be seldom used. I will cheerfully use it for boot timing though.
BTW should the deltas return a signed value?
- would prefer the ms and us at the end as I think it reads better.
Getting the time is the important bit - the units are generally at the end
Hmm, time_since_ms or time_ms_since - I suppose either reads OK. But if I keep min/max/raw, I find time_min_ms_since (barely) easier on the eye than time_min_since_ms - I'm not bothered either way and will go with the general consensus
No I mean _ms at the end, so it is always last. time_min_ms_since() sounds like a martian trying to learn English. See my comment above: time_since_ms() - we shouldn't even define time_since_raw_ms() and time_since_max_ms() in my view.
This code from your excellent wiki page bothers me. Can we find a way to shrink it?
now = timer_ms_now();
if (time_ms_delta_min(start, now) > timeout)
How about:
if (time_since_ms(start) > timeout)
The idea of the time 'since' an event is more natural than the delta between then and now which seems more abstract.
I tend to agree - I have been thinking about 'since' functions for a while now
Cool.
[snip]
With the 'time_ms_' prefix, it's starting to get rather long, and I'm pushing a lot of timeout checks beyond 80 columns - especially when existing code has variables like 'start_time_tx'. I'm starting to consider dropping the time_ prefix (all time functions will still have a ms_ or us_ prefix anyway) and renaming a lot of variables
Now I see why you want units at the start. Seems a bit ugly to me - which lines are getting long - do you mean the time delta check line? If so see above, or perhaps it is shorter without _min.
An example from drivers/net/ns7520_eth.c:
ulStartJiffies = time_ms_now();
while (time_ms_delta_min(ulStartJiffies, time_ms_now()) < NS7520_MII_NEG_DELAY) {
Could be reduced to:
ulStartJiffies = time_ms_now();
while (time_min_ms_since(ulStartJiffies) < NS7520_MII_NEG_DELAY) {
And with a rename:
start_ms = time_ms_now();
while (time_min_ms_since(start_ms) < NS7520_MII_NEG_DELAY) {
or:
start_ms = time_now_ms();
while (time_since_ms(start_ms) < NS7520_MII_NEG_DELAY) {
Regards, Simon
Regards,
Graeme

Hi Simon,
On Thu, Jun 16, 2011 at 7:58 AM, Simon Glass sjg@chromium.org wrote:
Hi Graeme,
On Wed, Jun 15, 2011 at 1:38 PM, Graeme Russ graeme.russ@gmail.com wrote:
On 16/06/11 02:03, Simon Glass wrote:
- the common case is min I think, so let's get rid of the min prefix
so everyone will use it without question or needing to read screeds of doc
I don't like this idea - Extrapolate this to dropping the 'ms' common case and you end up with:
u32 time_delta(u32 from, u32 to)
u32 time_delta_max(u32 from, u32 to)
u32 time_delta_raw(u32 from, u32 to)
u32 time_us_delta(u32 from, u32 to)
u32 time_us_delta_max(u32 from, u32 to)
u32 time_us_delta_raw(u32 from, u32 to)
Not as grep'able IMHO
The reason I suggested this is not because it is common, in the sense that ms is more common than us. It is that users will find time_ms_delta_min confusing. There is a min, a max and a raw - which should they use? While everyone on this list is a genius, there will be other people who will choose one at random and run with it.
To my mind 'min' is the safe one and the others will be rarely used.
I agree that removing ms is a bad idea. But this is pretty grep'able for sensible values of grep:
u32 time_delta_ms(u32 from, u32 to)
u32 time_delta_us(u32 from, u32 to)
[hide these away in the closet:
u32 time_delta_max_ms(u32 from, u32 to)
u32 time_delta_raw_ms(u32 from, u32 to)
u32 time_delta_max_us(u32 from, u32 to)
u32 time_delta_raw_us(u32 from, u32 to)
]
int time_since_ms(u32 from)
Also bear in mind that _us() will be seldom used. I will cheerfully use it for boot timing though.
BTW should the deltas return a signed value?
No - times are unsigned, utilising the entire range of the u32, so (to - from) will always return the unsigned delta, even if there is a wrap between them. I cannot imagine we would ever need to measure time backward anyway.
- would prefer the ms and us at the end as I think it reads better.
Getting the time is the important bit - the units are generally at the end
Hmm, time_since_ms or time_ms_since - I suppose either reads OK. But if I keep min/max/raw, I find time_min_ms_since (barely) easier on the eye than time_min_since_ms - I'm not bothered either way and will go with the general consensus
No I mean _ms at the end, so it is always last. time_min_ms_since() sounds like a martian trying to learn English. See my comment above:
I read it like: Time API - Minimum Milliseconds Since (Epoch). Last time I checked I wasn't a martian ;)
time_min_since_ms reads like: Time API - Minimum Since Millisecond (Epoch)
Anyway, time_since_ms() is fine by me
time_since_ms() - we shouldn't even define time_since_raw_ms() and time_since_max_ms() in my view.
I agree that the raw versions should be dropped entirely. The max versions should be kept (and will be used for boot profiling as you are after the maximum time an operation could possibly have run)
This code from your excellent wiki page bothers me. Can we find a way to shrink it?
now = timer_ms_now();
if (time_ms_delta_min(start, now) > timeout)
How about:
if (time_since_ms(start) > timeout)
The idea of the time 'since' an event is more natural than the delta between then and now which seems more abstract.
I tend to agree - I have been thinking about 'since' functions for a while now
Cool.
[snip]
With the 'time_ms_' prefix, it's starting to get rather long, and I'm pushing a lot of timeout checks beyond 80 columns - especially when existing code has variables like 'start_time_tx'. I'm starting to consider dropping the time_ prefix (all time functions will still have a ms_ or us_ prefix anyway) and renaming a lot of variables
Now I see why you want units at the start. Seems a bit ugly to me - which lines are getting long - do you mean the time delta check line? If so see above, or perhaps it is shorter without _min.
An example from drivers/net/ns7520_eth.c:
ulStartJiffies = time_ms_now();
while (time_ms_delta_min(ulStartJiffies, time_ms_now()) < NS7520_MII_NEG_DELAY) {
Could be reduced to:
ulStartJiffies = time_ms_now();
while (time_min_ms_since(ulStartJiffies) < NS7520_MII_NEG_DELAY) {
And with a rename:
start_ms = time_ms_now();
while (time_min_ms_since(start_ms) < NS7520_MII_NEG_DELAY) {
or:
start_ms = time_now_ms();
while (time_since_ms(start_ms) < NS7520_MII_NEG_DELAY) {
OK, So the API could now look like:
/* Functions to retrieve the current value of the timers */
u32 time_now_ms(void);
u32 time_now_us(void);
u64 time_now_ticks(void);

/* Functions to determine 'minimum' time elapsed (the usual case) */
u32 time_since_ms(u32 start);
u32 time_since_us(u32 start);
u64 time_since_us(u64 start);

/* Functions to determine 'maximum' time elapsed (performance profiling) */
u32 time_max_since_ms(u32 start);
u32 time_max_since_us(u32 start);
u64 time_max_since_ticks(u64 start);

/* Default udelay() */
void udelay(u32 delay)
{
    u32 start = time_now_us();

    while (time_since_us(start) < delay)
        ;
}
Regards,
Graeme

Hi Graeme,
On Wed, Jun 15, 2011 at 4:09 PM, Graeme Russ graeme.russ@gmail.com wrote: [snip]
BTW should the deltas return a signed value?
No - times are unsigned, utilising the entire range of the u32, so (to - from) will always return the unsigned delta, even if there is a wrap between them. I cannot imagine we would ever need to measure time backward anyway.
Can you please explain this again in different words as I don't follow. The difference between two unsigned times may be +ve or -ve, surely. For example, say:
now = 100
event_time = 200
then the delta is currently -100. Some people might loop until the delta is >= 0. This is like setting a future time and waiting for it to happen. If the delta is unsigned then this won't work.
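A small illustration of that 'future time' pattern (a sketch; time_now_ms() as proposed in this thread, and the deadline must be less than half the counter range away for the signed view to be correct):

static void wait_until_ms(u32 deadline)
{
    /* The signed view of the unsigned difference is negative until
     * the deadline passes, and this stays correct across wrap */
    while ((s32)(time_now_ms() - deadline) < 0)
        ;
}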
No I mean _ms at the end, so it is always last. time_min_ms_since() sounds like a martian trying to learn English. See my comment above:
I read it like: Time API - Minimum Milliseconds Since (Epoch). Last time I checked I wasn't a martian ;)
I see. I wonder how you checked? Skin color and height I suppose.
time_min_since_ms reads like: Time API - Minimum Since Millisecond (Epoch)
Anyway, time_since_ms() is fine by me
time_since_ms() - we shouldn't even define time_since_raw_ms() and time_since_max_ms() in my view.
I agree that the raw versions should be dropped entirely. The max versions should be kept (and will be used for boot profiling as you are after the maximum time an operation could possibly have run)
OK I wasn't suggesting dropping raw, just hiding it. My main point was just that the common one should have the obvious name. Anyway, I suppose raw is not that useful.
This code from your excellent wiki page bothers me. Can we find a way to shrink it?
now = timer_ms_now();
if (time_ms_delta_min(start, now) > timeout)
How about:
if (time_since_ms(start) > timeout)
The idea of the time 'since' an event is more natural than the delta between then and now which seems more abstract.
I tend to agree - I have been thinking about 'since' functions for a while now
Cool.
[snip]
With the 'time_ms_' prefix, it's starting to get rather long, and I'm pushing a lot of timeout checks beyond 80 columns - especially when existing code has variables like 'start_time_tx'. I'm starting to consider dropping the time_ prefix (all time functions will still have a ms_ or us_ prefix anyway) and renaming a lot of variables
Now I see why you want units at the start. Seems a bit ugly to me - which lines are getting long - do you mean the time delta check line? If so see above, or perhaps it is shorter without _min.
An example from drivers/net/ns7520_eth.c:
ulStartJiffies = time_ms_now();
while (time_ms_delta_min(ulStartJiffies, time_ms_now()) < NS7520_MII_NEG_DELAY) {
Could be reduced to:
ulStartJiffies = time_ms_now();
while (time_min_ms_since(ulStartJiffies) < NS7520_MII_NEG_DELAY) {
And with a rename:
start_ms = time_ms_now();
while (time_min_ms_since(start_ms) < NS7520_MII_NEG_DELAY) {
or:
start_ms = time_now_ms();
while (time_since_ms(start_ms) < NS7520_MII_NEG_DELAY) {
OK, So the API could now look like:
/* Functions to retrieve the current value of the timers */
u32 time_now_ms(void);
u32 time_now_us(void);
u64 time_now_ticks(void);

/* Functions to determine 'minimum' time elapsed (the usual case) */
u32 time_since_ms(u32 start);
u32 time_since_us(u32 start);
u64 time_since_us(u64 start);
time_since_ticks()
/* Functions to determine 'maximum' time elapsed (performance profiling) */
u32 time_max_since_ms(u32 start);
u32 time_max_since_us(u32 start);
u64 time_max_since_ticks(u64 start);
/* Default udelay() */
void udelay(u32 delay)
{
    u32 start = time_now_us();

    while (time_since_us(start) < delay)
        ;
}
Well it has my vote - does it solve your 80-column problems? Some might complain about the proliferation of 'since' functions. I don't have a strong view on this - but it is perhaps preferable to having everyone do things manually. The strongly orthogonal nature of your functions helps with understanding anyway I think.
One interesting aspect to me is whether it might be possible for the 'since' functions to have a debug mode which prints out a message when the boot appears to be hung. Sometimes U-Boot hangs on USB or similar and there is no message displayed. I imagine it might be possible to detect a hung or repeating timeout and print a stack trace. Another thing is how much time was spent hanging around during a boot (as a summary at the end). Just a thought, and NOT suggesting doing anything about it now.
Regards, Simon
Regards,
Graeme

Hi Simon,
On Thu, Jun 16, 2011 at 3:53 PM, Simon Glass sjg@chromium.org wrote:
Hi Graeme,
On Wed, Jun 15, 2011 at 4:09 PM, Graeme Russ graeme.russ@gmail.com wrote: [snip]
BTW should the deltas return a signed value?
No - times are unsigned, utilising the entire range of the u32, so (to - from) will always return the unsigned delta, even if there is a wrap between them. I cannot imagine we would ever need to measure time backward anyway.
Can you please explain this again in different words as I don't follow. The difference between two unsigned times may be +ve or -ve, surely. For example, say:
now = 100
event_time = 200
then the delta is currently -100. Some people might loop until the delta is >= 0. This is like setting a future time and waiting for it to happen. If the delta is unsigned then this won't work.
start = 0xffffff00
now   = 0x00000010
So there was a timer wrap between 'start' and 'now'
delta = now - start = 0x110
Remember that the new timer API does not rely on the tick counter (and therefore the ms/us timers) starting at zero - it must be allowed to wrap, without causing timer glitches, at any arbitrary time
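The arithmetic in that example, spelled out as a standalone snippet (plain C, nothing U-Boot specific):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* Unsigned subtraction is modulo 2^32, so the delta is correct
     * even though the counter wrapped between the two readings */
    uint32_t start = 0xffffff00u;
    uint32_t now = 0x00000010u;

    printf("delta = 0x%x\n", now - start); /* prints delta = 0x110 */
    return 0;
}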
Regards,
Graeme

On Wed, Jun 15, 2011 at 11:27 PM, Graeme Russ graeme.russ@gmail.com wrote:
Hi Simon,
On Thu, Jun 16, 2011 at 3:53 PM, Simon Glass sjg@chromium.org wrote:
Hi Graeme,
On Wed, Jun 15, 2011 at 4:09 PM, Graeme Russ graeme.russ@gmail.com wrote: [snip]
BTW should the deltas return a signed value?
No - times are unsigned, utilising the entire range of the u32, so (to - from) will always return the unsigned delta, even if there is a wrap between them. I cannot imagine we would ever need to measure time backward anyway.
Can you please explain this again in different words as I don't follow. The difference between two unsigned times may be +ve or -ve, surely. For example, say:
now = 100
event_time = 200
then the delta is currently -100. Some people might loop until the delta is >= 0. This is like setting a future time and waiting for it to happen. If the delta is unsigned then this won't work.
start = 0xffffff00
now   = 0x00000010
So there was a timer wrap between 'start' and 'now'
delta = now - start = 0x110
Remember that the new timer API does not rely on the tick counter (and therefore the ms/us timers) starting at zero - it must be allowed to wrap, without causing timer glitches, at any arbitrary time
Hi Graeme,
Yes I'm fine with that last bit.
Taking your example, say now is 0xffffff00. Then:
unsigned stop_at = time_now_ms() + 0x110; // stop_at = 0x10
int diff = time_since_ms(stop_at);
I think this will set diff to -0x110, right? You can then wait until diff goes +ve, at which point you know your timeout has passed. I think it is useful for the time delta to be negative or positive, for this reason. I suppose what I am saying is that there are two ways of doing timeouts:
1. Record the start time, then wait until the delta gets big enough
2. Record the finish time, then wait until it arrives.
The latter can be useful if you have multiple timeouts in progress at once, something I am thinking we may want to do with USB scanning.
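For what it's worth, a sketch of how the 'record the finish time' style handles several timeouts in flight at once (time_get_future_ms()/time_reached() as proposed earlier; the poll_device_*() helpers are hypothetical):

extern void poll_device_a(void);
extern void poll_device_b(void);

static void scan_two_devices(void)
{
    /* Each device gets its own deadline; keep polling whichever
     * devices have not yet timed out */
    u32 stop_a = time_get_future_ms(100);
    u32 stop_b = time_get_future_ms(500);

    while (!time_reached(stop_a) || !time_reached(stop_b)) {
        if (!time_reached(stop_a))
            poll_device_a();
        if (!time_reached(stop_b))
            poll_device_b();
    }
}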
Regards, Simon
Regards,
Graeme

Dear Graeme Russ,
In message BANLkTimB4YKkpk10gZkoRmLUYqbPxTHUKQ@mail.gmail.com you wrote:
Nobody claims that get_timer() has any specific resolution. It is perfectly legal that a loop like
for (;;) {
    u32 t = get_time();
    printf("t=%u\n", t);
}
returns 100 millisecond increments.
Except everyone expects it to tick at something vaguely close to 1 ms. From your comments about accuracy, I didn't expect 1000% error to be acceptable...
This is not an error. It is resolution, which is a completely different thing.
Under Linux, gettimeofday() also returns time information in seconds and microseconds, yet the same loop as above will print the time in 10 ms jumps when you set HZ=100.
This is normal behaviour for this type of functions.
Best regards,
Wolfgang Denk

On 5/27/2011 12:35 AM, Graeme Russ wrote:
Hi Wolfgang,
On 27/05/11 17:13, Wolfgang Denk wrote:
Dear Graeme Russ,
In messageBANLkTinWvY9b4QzeLNawF7MKT9z1zeMXyg@mail.gmail.com you wrote:
I think we will need to define get_timer() weak - Nios will have to override the default implementation to cater for it's (Nios') limitations
Please don't - isn't the purpose of this whole discussion to use common code for this ?
Yes, but Nios is particularly bad - It has a 10ms tick counter :(
Hi All, And a hardware timer that you can't read to subdivide the 10 ms. Note that this is not necessarily a problem with all NIOS implementations. The timer characteristics can be controlled when you generate the bitstream for the FPGA. You can make the counter both faster and readable if you want. It just uses a bit more silicon. Sad to say, it probably will require a per-board get_ticks routine. For the "old" nios2 timers, however, overriding get_timer with a /board routine is probably the only way to go.
I don't see a reason for hamstringing other platforms when a simple weak function can get around it
Agree. Best Regards, Bill Campbell
Regards,
Graeme

Hi Bill,
On 28/05/11 00:23, J. William Campbell wrote:
On 5/27/2011 12:35 AM, Graeme Russ wrote:
Hi Wolfgang,
On 27/05/11 17:13, Wolfgang Denk wrote:
Dear Graeme Russ,
In messageBANLkTinWvY9b4QzeLNawF7MKT9z1zeMXyg@mail.gmail.com you wrote:
I think we will need to define get_timer() weak - Nios will have to override the default implementation to cater for it's (Nios') limitations
Please don't - isn't the purpose of this whole discussion to use common code for this ?
Yes, but Nios is particularly bad - It has a 10ms tick counter :(
Hi All, And a hardware timer that you can't read to subdivide the 10 ms. Note that this is not necessarily a problem with all NIOS implementations. The timer characteristics can be controlled when you generate the bitstream for the FPGA. You can make the counter both faster and readable if you want. It just uses a bit more silicon. Sad to say, it probably will require a per-board get_ticks routine. For the "old" nios2 timers, however, overriding get_timer with a /board routine is probably the only way to go.
Actually, using the logic you presented in the "Remove calls to [get,reset]_timer outside arch/" thread, we could have a common get_timer() implementation which deals with all timer resolution issues:
#if !defined(CONFIG_MIN_TIMER_RESOLUTION)
#define CONFIG_MIN_TIMER_RESOLUTION 1
#endif

u32 get_timer(u32 base)
{
	if (base != 0) {
		if (timer - base < (CONFIG_MIN_TIMER_RESOLUTION * 2))
			return 0;
		else
			return timer - base - CONFIG_MIN_TIMER_RESOLUTION;
	} else {
		return timer;
	}
}
This would also guarantee that timeout loops run for _at least_ the timeout period
Regards,
Graeme
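A sketch of the effect on a platform with a 10ms tick (an editorial example with CONFIG_MIN_TIMER_RESOLUTION set to 10, using the get_timer() above):

static void ten_ms_pause(void)
{
	u32 start = get_timer(0);

	/* get_timer(start) returns 0 until (timer - start) reaches two
	 * resolution steps, then reports the elapsed time less one step.
	 * This loop therefore spins for at least 10ms and at most about
	 * 20ms, instead of possibly exiting after only 1ms. */
	while (get_timer(start) < 10)
		;
}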

Dear Graeme Russ,
u32 get_timer(u32 base)
{
	if (base != 0) {
		if (timer - base < (CONFIG_MIN_TIMER_RESOLUTION * 2))
			return 0;
		else
			return timer - base - CONFIG_MIN_TIMER_RESOLUTION;
	} else {
		return timer;
	}
}
This is not good. get_timer() should do no more than return the timer difference since the epoch; it should not contain any funky logic.
I might point again to my futile suggestion of simply calculating the "end tick" of a timeout loop at the beginning of the loop, and inside the loop just checking whether that "end tick" has passed.
If that were used, none of the complicated quirks devised in this thread would be needed. Just calculate the "end tick" by rounding it up by one and all is pretty simple...
Best Regards, Reinhard

On 28/05/11 16:18, Reinhard Meyer wrote:
Dear Graeme Russ,
u32 get_timer(u32 base)
{
	if (base != 0) {
		if (timer - base < (CONFIG_MIN_TIMER_RESOLUTION * 2))
			return 0;
		else
			return timer - base - CONFIG_MIN_TIMER_RESOLUTION;
	} else {
		return timer;
	}
}
This is not good. get_timer() should do no more than return the timer difference since the epoch; it should not contain any funky logic.
I might point again to my futile suggestion of simply calculating the "end tick" of a timeout loop at the beginning of the loop, and inside the loop just checking whether that "end tick" has passed.
If that were used, none of the complicated quirks devised in this thread would be needed. Just calculate the "end tick" by rounding it up by one and all is pretty simple...
This will not work - Example (10ms wait with 10ms timer resolution)
get_timer(0) @ 109ms returns 100
end_time = 110
end_time will be reached in 1ms, not 10 as expected
To make this work, you will need to do the resolution adjustment _at the timing loop_ - the last place one would expect to have to make such an adjustment
Alternatively, add _another_ API function: end_time() which does the adjustment:
end_time = end_time(timeout);
while (get_timer(0) < end_time) { /* Do Stuff */ }
But this also is not failsafe:
end_time = get_timer(0) + timeout
is, to me, a more natural expression of what is desired and what many programmers are likely to write (naively ignorant of resolution concerns)
It's all a matter of degrees and what is the lesser of two (or three or four) evils ;)
Regards,
Graeme
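Reinhard's "end tick" scheme, sketched editorially with hypothetical helper names (timeout_init()/timed_out(), assuming the CONFIG_MIN_TIMER_RESOLUTION value from the proposal above), putting the round-up in one place instead of in every loop:

static u32 timeout_init(u32 timeout_ms)
{
	/* Round the deadline up by one resolution step so the loop
	 * always waits at least timeout_ms even when started mid-tick
	 * (this cures the 109ms/110ms example above). */
	return get_timer(0) + timeout_ms + CONFIG_MIN_TIMER_RESOLUTION;
}

static int timed_out(u32 end_time)
{
	return (s32)(get_timer(0) - end_time) >= 0;	/* wrap-safe */
}

static void example(void)
{
	u32 end = timeout_init(10);

	while (!timed_out(end)) {
		/* Do Stuff */
	}
}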

On 5/27/2011 10:53 PM, Graeme Russ wrote:
Hi Bill,
On 28/05/11 00:23, J. William Campbell wrote:
On 5/27/2011 12:35 AM, Graeme Russ wrote:
Hi Wolfgang,
On 27/05/11 17:13, Wolfgang Denk wrote:
Dear Graeme Russ,
In messageBANLkTinWvY9b4QzeLNawF7MKT9z1zeMXyg@mail.gmail.com you wrote:
I think we will need to define get_timer() weak - Nios will have to override the default implementation to cater for its (Nios') limitations
Please don't - isn't the purpose of this whole discussion to use common code for this ?
Yes, but Nios is particularly bad - It has a 10ms tick counter :(
Hi All, And a hardware timer that you can't read to subdivide the 10 ms. Note that this is not necessarily a problem with all NIOS implementations. The timer characteristics can be controlled when you generate the bitstream for the FPGA. You can make the counter both faster and readable if you want. It just uses a bit more silicon. Sad to say, it probably will require a per-board get_ticks() routine. For the "old" nios2 timers, however, overriding get_timer() with a /board routine is probably the only way to go.
Actually, using the logic you presented in the "Remove calls to [get,reset]_timer outside arch/" thread, we could have a common get_timer() implementation which deals with all timer resolution issues:
#if !defined(CONFIG_MIN_TIMER_RESOLUTION)
#define CONFIG_MIN_TIMER_RESOLUTION 1
#endif

u32 get_timer(u32 base)
{
	if (base != 0) {
		if (timer - base < (CONFIG_MIN_TIMER_RESOLUTION * 2))
			return 0;
		else
			return timer - base - CONFIG_MIN_TIMER_RESOLUTION;
	} else {
		return timer;
	}
}
This would also guarantee that timeout loops run for _at least_ the timeout period
Hi Graeme, You are almost correct. The problem is that if you are calling get_timer() early in start-up, the timer can actually be 0. A loop initializing with get_timer(0) will then get back a base of 0, defeating the base != 0 test. If you can initialize the timer to a number != 0, then the above code works. If the timer is totally hardware derived, such an initialization may not be easy. For nios2 it is easy, since the counter is software interrupt driven - it can just be initialized to 1, and the 10 ms updates will then run like normal. It also won't work if the user is doing his own timer arithmetic by subtracting get_timer(0) values, but there is nothing we can do about that case anyway, so it is moot. Good job of spotting this approach.
Best Regards, Bill Campbell
Regards,
Graeme
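Bill's fix is tiny in code; a sketch for a software-incremented tick like nios2's (editorial; the variable name 'timer' follows the proposal above):

/* Start at 1, never 0: a base recorded at boot can then never
 * collide with the base == 0 special case in get_timer(). */
static u32 timer = 1;

void timer_isr(void)
{
	timer += 10;	/* nios2-style 10ms periodic interrupt */
}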

Dear Graeme Russ,
In message 4DDE5548.3020906@gmail.com you wrote:
Assumed Capabilities of the Platform
- Has a 'tick counter' that does not rely on software to increment
I think we should delete the "does not rely on software to increment" part. It is not really essential.
- tick interval may be a fixed constant which cannot be controlled via software, or it could be programmable (PIT)
API Functions (/lib/timer.c)
- u32 get_timer(u32 start)
- Returns the number of elapsed milliseconds since 'start'
If you really want to make the code flexible, then say "returns the number of "1/CONFIG_SYS_HZ" time units since 'start'"
The 1 ms resolution is actually tied to the CONFIG_SYS_HZ definition (which is the reason why I always fought that CONFIG_SYS_HZ must be defined as 1000).
Of course we could also drop this definition completely...
API Functions (/arch/...)
- void udelay(u32 delay)
- Used for 'short' delays (generally up to several seconds)
no. only good for delays well below one second.
- Can use the tick counter if it is fast enough - MUST NOT RESET THE TICK COUNTER
Should we not state that the tick counter must never be reset during the lifetime of U-Boot?
We should also note that neither get_timer() nor udelay() make any guarantee for the precision of the returned timing information, except that udelay(N) is always supposed to delay for _at_least_ N microseconds.
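A sketch of a tick-polling udelay() honouring that at-least-N guarantee, built on the get_ticks()/usec2ticks() HAL described in this thread (editorial, not a reference implementation):

void udelay(u32 usec)
{
	/* Round up by one tick so we never return early; the tick
	 * counter is only read here - it is never reset. */
	u64 end = get_ticks() + usec2ticks(usec) + 1;

	while (get_ticks() < end)
		;	/* a 64-bit counter makes wrap a non-issue in practice */
}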
'Helper' Functions (/lib/timer.c)
- void sync_timebase(void)
Can be a macro (to avoid call overhead).
- Updates the millisecond timer
We need to define that term. Eventually you want to change the definition of get_timer() into "returns the content of the millisecond timer" or similar.
- Utilises HAL functions to access the platform's tick counter - Must be called more often than the rollover period of the platform's tick counter - Does not need to be called with a regular frequency (immune to interrupt skew) - Is always called by get_timer() - For platforms with short tick counter rollovers it should be called by an ISR - Bill Campbell wrote a good example which proved this can be common and arbitrary (and optionally free of divides and capable of maintaining accuracy even if the tick frequency is not an even division of 1ms)
HAL Functions (/arch/... or /board/...)
- u64 get_ticks(void)
- Returns a tick count as an unsigned 64-bit integer
- Abstracts the implementation of the platform tick counter (platform may be a 32-bit 3MHz counter, might be a 16-bit 0-999 microsecond plus 16-bit 0-65535 millisecond etc)
- u64 ticks_per_millisecond()
- Returns the number of ticks (as returned by get_ticks()) per millisecond
I recommend to stick with the existing ticks2usec() and usec2ticks(); you can then use usec2ticks(1000) to get the ticks per millisecond.
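With that, the helper collapses to a one-liner (sketch, using the existing usec2ticks()):

static inline u64 ticks_per_millisecond(void)
{
	return usec2ticks(1000);	/* ticks per 1000 microseconds */
}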
- void timer_isr()
- Optional (particularly if tick counter rollover period is exceptionally long, which is usually the case for native 64-bit tick counters)
- Simply calls sync_timebase()
- For platforms without any tick counter, this can implement one (but accuracy will be harmed due to usage of disable_interrupts() and enable_interrupts() in U-Boot)
So to get the new API up and running, only two functions are mandatory:
get_ticks() which reads the hardware tick counter and deals with any 'funny stuff' including rollovers, short timers (12-bit for example), composite counters (16-bit 0-999 microsecond + 16-bit millisecond) and maintains a 'clean' 64-bit tick counter which rolls over from all 1's to all 0's. The 64-bit tick counter does not need to be reset to zero ever (even on startup
- sync_timebase takes care of all the details)
ticks_per_millisecond() simply returns the number of ticks in a millisecond
- This may be as simple as:
inline u64 ticks_per_millisecond(void) { return CONFIG_SYS_TICK_PER_MS; }
Better move this part up to the description of usec2ticks() / ticks2usec().
Best regards,
Wolfgang Denk
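For reference, a minimal editorial sketch of the mandatory get_ticks() over a free-running 32-bit hardware counter, extending it to the 'clean' 64-bit counter described above (readl() is U-Boot's register accessor; TIMER_COUNT_REG is a placeholder for the platform's counter register):

u64 get_ticks(void)
{
	static u64 ticks;	/* the clean 64-bit counter */
	static u32 last;
	u32 now = readl(TIMER_COUNT_REG);

	/* Unsigned subtraction absorbs 32-bit rollover, provided this
	 * is called more often than the counter's wrap period. */
	ticks += (u32)(now - last);
	last = now;

	return ticks;
}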

On Fri, May 27, 2011 at 3:49 AM, Wolfgang Denk wd@denx.de wrote:
Dear Graeme Russ,
In message 4DDE5548.3020906@gmail.com you wrote:
Assumed Capabilities of the Platform - Has a 'tick counter' that does not rely on software to increment
I think we should delete the "does not rely on software to increment" part. It is not really essential.
I am sure there has been confusion, with a misunderstanding that the tick counter is NOT interrupt driven
- tick interval may be a fixed constant which cannot be controlled via software, or it could be programmable (PIT)
API Functions (/lib/timer.c) - u32 get_timer(u32 start) - Returns the number of elapsed milliseconds since 'start'
If you really want to make the code flexible, then say "returns the number of "1/CONFIG_SYS_HZ" time units since 'start'"
The 1 ms resolution is actually tied to the CONFIG_SYS_HZ definition (which is the reason why I always fought that CONFIG_SYS_HZ must be defined as 1000).
Of course we could also drop this definition completely...
I think we should - If CONFIG_SYS_HZ _MUST_ be 1000 anyway, what is the point? Also, get_timer() utilisation as it stands for the most part already assumes a 1ms time base. Maybe we should change get_timer() to get_ms_timer() to avoid any ambiguity
API Functions (/arch/...) - void udelay(u32 delay) - Used for 'short' delays (generally up to several seconds)
no. only good for delays well below one second.
- Can use the tick counter if it is fast enough - MUST NOT RESET THE TICK COUNTER
Should we not state that the tick counter must never be reset during the lifetime of U-Boot?
Yes, I think you are right
We should also note that neither get_timer() nor udelay() make any guarantee for the precision of the returned timing information, except that udelay(N) is always supposed to delay for _at_least_ N microseconds.
Correct
'Helper' Functions (/lib/timer.c) - void sync_timebase(void)
Can be a macro (to avoid call overhead).
True
- Updates the millisecond timer
We need to define that term. Eventually you want to change the definition of get_timer() into "returns the content of the millisecond timer" or similar.
- Utilises HAL functions to access the platform's tick counter - Must be called more often than the rollover period of the platform's tick counter - Does not need to be called with a regular frequency (immune to interrupt skew) - Is always called by get_timer() - For platforms with short tick counter rollovers it should be called by an ISR - Bill Campbell wrote a good example which proved this can be common and arbitrary (and optionally free of divides and capable of maintaining accuracy even if the tick frequency is not an even division of 1ms)
HAL Functions (/arch/... or /board/...) - u64 get_ticks(void) - Returns a tick count as an unsigned 64-bit integer - Abstracts the implementation of the platform tick counter (platform may be a 32-bit 3MHz counter, might be a 16-bit 0-999 microsecond plus 16-bit 0-65535 millisecond etc) - u64 ticks_per_millisecond() - Returns the number of ticks (as returned by get_ticks()) per millisecond
I recommend to stick with the existing ticks2usec() and usec2ticks(); you can then use usec2ticks(1000) to get the ticks per millisecond.
Can do - And that would mean a sync_us_timer() would call usec2ticks(1)
- void timer_isr() - Optional (particularly if tick counter rollover period is exceptionally long, which is usually the case for native 64-bit tick counters) - Simply calls sync_timebase() - For platforms without any tick counter, this can implement one (but accuracy will be harmed due to usage of disable_interrupts() and enable_interrupts() in U-Boot)
So to get the new API up and running, only two functions are mandatory:
get_ticks() which reads the hardware tick counter and deals with any 'funny stuff' including rollovers, short timers (12-bit for example), composite counters (16-bit 0-999 microsecond + 16-bit millisecond) and maintains a 'clean' 64-bit tick counter which rolls over from all 1's to all 0's. The 64-bit tick counter does not need to be reset to zero ever (even on startup
- sync_timebase takes care of all the details)
ticks_per_millisecond() simply returns the number of ticks in a millisecond
- This may be as simple as:
inline u64 ticks_per_millisecond(void) { return CONFIG_SYS_TICK_PER_MS; }
Better move this part up to the description of usec2ticks() / ticks2usec().
OK
Regards,
Graeme

Dear Graeme Russ,
In message BANLkTi=Nj09smJ+tTuE4p=rWFz=r9GBhwQ@mail.gmail.com you wrote:
I think we should - If CONFIG_SYS_HZ _MUST_ be 1000 anyway, what is the point? Also, get_timer() utilisation as it stands for the most part already assumes a 1ms time base. Maybe we should change get_timer() to get_ms_timer() to avoid any ambiguity
No. At least not unless you also provide other get_<some unit>_timer() functions which we most likely will not do.
Best regards,
Wolfgang Denk

Hi Wolfgang,
On 27/05/11 17:17, Wolfgang Denk wrote:
Dear Graeme Russ,
In message BANLkTi=Nj09smJ+tTuE4p=rWFz=r9GBhwQ@mail.gmail.com you wrote:
I think we should - If CONFIG_SYS_HZ _MUST_ be 1000 anyway, what is the point? Also, get_timer() utilisation as it stands for the most part already assumes a 1ms time base. Maybe we should change get_timer() to get_ms_timer() to avoid any ambiguity
No. At least not unless you also provide other get_<some unit>_timer() functions which we most likely will not do.
I think you will find most platforms will support get_us_timer() trivially. Those that can't can use get_ms_timer() * 1000 (plus taking wrapping into consideration)
Regards,
Graeme
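A sketch of that fallback (get_us_timer()/get_ms_timer() are names floated in this thread, not existing API; the u32 microsecond value wraps every ~71.6 minutes, so callers need wrap-safe, subtraction-based comparisons):

u32 get_us_timer(void)
{
	return get_ms_timer() * 1000;	/* 1ms granularity, us units */
}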

Dear Graeme Russ,
In message 4DDF53D3.1060504@gmail.com you wrote:
No. At least not unless you also provide other get_<some unit>_timer() functions which we most likely will not do.
I think you will find most platforms will support get_us_timer() trivially. Those that can't can use get_ms_timer() * 1000 (plus taking wrapping into consideration)
To use common code, we have to assume the "cannot" case, and be prepared to get only millisecond resolution - we can then simply use get_timer() right away and keep the code simple.
Best regards,
Wolfgang Denk

On Fri, May 27, 2011 at 12:45 AM, Wolfgang Denk wd@denx.de wrote:
Dear Graeme Russ,
In message 4DDF53D3.1060504@gmail.com you wrote:
No. At least not unless you also provide other get_<some unit>_timer() functions which we most likely will not do.
I think you will find most platforms will support get_us_timer() trivially. Those that can't can use get_ms_timer() * 1000 (plus taking wrapping into consideration)
To use common code, we have to assume the "cannot" case, and be prepared to get only millisecond resolution - we can then simply use get_timer() right away and keep the code simple.
Hi Wolfgang,
Can we just assume for a minute that some people want to make use of the microsecond-resolution timers available on some SOC?
If so then we can either say:
1. Sorry, you're on your own, this is an SOC-specific feature and we won't accept patches which use it. Why do you need it anyway, etc.
2. Yes sure, we have a basic framework, please plumb in here.
I would love to see U-Boot take the second approach. It could be as simple as CONFIG_MICROSECOND_TIMER whereupon the library code avoids defining (and the board code must define) get_timer_us(). I don't even believe it needs to be 64-bit, since microsecond timing is obviously not about time of day. Let's keep it simple, but we should have something IMO.
Regards, Simon
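Simon's option 2 might plumb in like this (editorial sketch; CONFIG_MICROSECOND_TIMER and get_timer_us() are proposals from this message, not existing U-Boot symbols):

/* lib/timer.c */
#ifdef CONFIG_MICROSECOND_TIMER
u32 get_timer_us(u32 base);	/* board code must provide this */
#else
static inline u32 get_timer_us(u32 base)
{
	/* Fallback: 1000us granularity derived from the millisecond
	 * timer; wrap-safe by unsigned subtraction. */
	return get_timer(0) * 1000 - base;
}
#endif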
Best regards,
Wolfgang Denk