
Hi Simon,
On Wed, Aug 31, 2022 at 7:27 PM Simon Glass sjg@chromium.org wrote:
Hi Tony,
On Wed, 31 Aug 2022 at 19:39, Tony Dinh mibodhi@gmail.com wrote:
Hi Stefan,
On Wed, Aug 31, 2022 at 2:53 PM Tony Dinh mibodhi@gmail.com wrote:
Hi Stefan,
On Wed, Aug 31, 2022 at 8:08 AM Tony Dinh mibodhi@gmail.com wrote:
Hi Stefan,
On Wed, Aug 31, 2022 at 12:22 AM Stefan Roese sr@denx.de wrote:
Hi Tony,
On 31.08.22 08:30, Tony Dinh wrote:
Hi Stefan,
On Tue, Aug 30, 2022 at 10:08 PM Stefan Roese sr@denx.de wrote: > > Hi Tony, > > On 31.08.22 07:02, Stefan Roese wrote: >> Hi Tony, >> >> On 31.08.22 00:15, Tony Dinh wrote: >>> Hi Stefan, >>> >>> On Tue, Aug 30, 2022 at 4:53 AM Stefan Roese sr@denx.de wrote: >>>> >>>> This patchset enhaces the recently added Orion Timer driver to support >>>> all other Kirkwood & 32bit MVEBU Armada platforms. Additionally, this >>>> timer support is then enabled per default for those platforms, so that >>>> the board config files don't need to be changed. Also necessary is >>>> some dts hacking, so that the timer DT node is available in early >>>> U-Boot stages. >>>> >>>> I've successfully tested this patchset on an Armada XP board. Additional >>>> test on other boards and platforms are very welcome and necessary. >>> >>> I've run some tests with the following 2 Kirkwood boards: Cloud >>> Engines Pogo V4 88F6192 (with CONFIG_DM_RTC and CONFIG_RTC_EMULATION), >>> and Marvell Sheevaplug 88F6281 (with CONFIG_DM_RTC and CONFIG_RTC_MV). >>> >>> It seems that it was either frozen or the timer did not expire at some >>> subsequent sleep commands. Sometime it happened at 2nd command, some >>> time at a later sleep command. For example, >>> >>> === Pogo V4 (the 1st sleep command works correctly at 10 seconds on my >>> stopwatch) >>> >>> U-Boot 2022.10-rc3-00048-g66ccd87a9c-dirty (Aug 30 2022 - 13:38:24 -0700) >>> Pogoplug V4 >>> >>> Hit any key to stop autoboot: 0 >>> Pogo_V4> sleep 10 >>> Pogo_V4> sleep 31.5 >>> <frozen here> >> >> Does the cmd interface support fractial numbers? Please test again with >> 31 or other integral numbers. >> >>> === Sheevaplug (RTC battery is old, so the date was not updated, but >>> the clock seems OK) >>> >>> U-Boot 2022.10-rc3-00048-g66ccd87a9c-dirty (Aug 30 2022 - 14:14:24 -0700) >>> Marvell-Sheevaplug >>> Hit any key to stop autoboot: 0 >>> => date >>> Date: 2000-01-01 (Saturday) Time: 0:02:55 >>> => sleep 10 >>> => date >>> Date: 2000-01-01 (Saturday) Time: 0:03:18 >>> => sleep 10 >>> => sleep 20.1 >>> <frozen here> >>> >>> Please let me know what I can do (i.e. perhaps running a debug patch). >> >> Please see above. I assume that the fractional numbers result in very >> long numbers internally, which result in a frozen / hanging system. > > I just tested fractional numbers on another board and hey, it just > works. Learned something new. So we seem to have a problem here. Let > me see, if I can find something.
I've added debug printfs and possibly tracked down this issue. Seems like in the 2nd call to sleep, get_timer(0) did not reset the start number.
cmd/sleep.c static int do_sleep(struct cmd_tbl *cmdtp, int flag, int argc, char *const argv[]) { ulong start = get_timer(0);
"do_sleep got a timer start = 2015" is the 1st call to sleep 5. "do_sleep got a timer start = 16304" is the 2nd call to sleep 10.
<BEGIN log> Pogo_V4> sleep 5 do_sleep got a timer start = 2015 do_sleep delay = 5000 do_sleep delay = 5000 do_sleep sleeping... do_sleep start 2015 curent 100 do_sleep start 2015 curent 200 do_sleep start 2015 curent 300 <snip> do_sleep start 2015 curent 4900 do_sleep end of sleep ... current = 5000 Pogo_V4>
Pogo_V4> sleep 10 do_sleep got a timer start = 16304 do_sleep delay = 10000 do_sleep delay = 10000 do_sleep sleeping...
<snip>
<END log>
So somewhere in the DM timer, "start" got accumulated. I think each get_timer(0) should be a different timer instance. It looks like the same timer instance is used again and again, causing the "start "to grow bigger, and at one point it might just overflow.
Frankly I don't really understand the problem you describe above. What do you mean with "timer instance"? get_timer(0) will return different values, depending on when you call this function. So where exactly is the problem with the 2nd "sleep 10" above?
Please ignore what I said above! I misunderstood what get_timer() does. I'll try again on another KIrkwood board to see if the behavior will be the same.
I've run the tests again with the Seagate GoFlex Home board (Kirkwood 88F6281). Below is the log, with my annotated comments starting with ****. In these tests, current = get_timer(start) is inside the while loop. And I printed out the start and current values when (current % 100 == 0).
<BEGIN LOG> U-Boot 2022.10-rc3-00048-g66ccd87a9c-dirty (Aug 31 2022 - 13:06:04 -0700) Seagate GoFlex Home
SoC: Kirkwood 88F6281_A1 DRAM: 128 MiB Core: 12 devices, 10 uclasses, devicetree: separate NAND: orion_timer_probe successful 256 MiB Loading Environment from NAND... OK In: serial Out: serial Err: serial Net: eth0: ethernet-controller@72000 Hit any key to stop autoboot: 0
GoFlexHome> date Date: 2022-08-31 (Wednesday) Time: 20:43:03
GoFlexHome> sleep 5 do_sleep got a timer start = 1244 do_sleep delay = 5000 do_sleep delay = 5000 do_sleep sleeping... do_sleep start 1244 current 100
<snip> do_sleep start 1244 current 2300 <snip> do_sleep start 1244 current 3700 <snip> do_sleep start 1244 current 4800 do_sleep start 1244 current 4900 do_sleep end of sleep ... current = 5000
**** working as intended
GoFlexHome> sleep 10 do_sleep got a timer start = 11428 do_sleep delay = 10000 do_sleep delay = 10000 do_sleep sleeping... do_sleep start 11428 current 100
<snip> do_sleep start 11428 current 2300 <snip> do_sleep start 11428 current 9900 do_sleep end of sleep ... current = 10000
**** working as intended
GoFlexHome> sleep 20.5 do_sleep got a timer start = 15031 do_sleep delay = 20000 do_sleep delay = 20500 do_sleep sleeping... do_sleep start 15031 current 100
<snip> do_sleep start 15031 current 6400 do_sleep end of sleep ... current = 4294952265
*** Something strange happened here. current should be 6500, but it seems to have garbage. So the loop exits prematurely.
GoFlexHome> sleep 20.5 do_sleep got a timer start = 8339 do_sleep delay = 20000 do_sleep delay = 20500 do_sleep sleeping... do_sleep start 8339 current 100 do_sleep start 8339 current 200
<snip> do_sleep start 8339 current 12300 do_sleep start 8339 current 12400 do_sleep start 8339 current 12500 do_sleep start 8339 current 12600 do_sleep start 8339 current 12700 do_sleep start 8339 current 12800 do_sleep start 8339 current 12900 do_sleep start 8339 current 13000 do_sleep start 8339 current 13100
*** It was frozen right here. Control-C could not interrupt the loop, so my guess *** either it was stuck during the call to get_timer(). Or the current value *** overflowed and crashed the system.
<END LOG>
Finally, I removed the patch set, and rerun the sleep tests on this board. They all worked fine, I did not see anything strange.
Some ideas.
The get_timer() function looks wrong assigning an uint64_t to ulong.
lib/time.c
static uint64_t notrace tick_to_time(uint64_t tick) uint64_t notrace get_ticks(void) uint64_t __weak notrace get_ticks(void) ulong __weak get_timer(ulong base) { return tick_to_time(get_ticks()) - base; }
Most of the timer infrastructure is using uint64_t. I'm seeing this __weak function get_timer was invoked in Kirkwood boards. Both in sleep and timer commands.
The get_ticks() thing can run at 1MHz but the timer is 1KHz, so that is why we don't need a u64 for the timer.
Thanks for your explanation! However, would you agree that code is problematic and needed some improvement ? IOW, depending what the compiler does, it might return the 1st 32 bit of the 64-bit integer result?
Thanks, Tony
Regards, Simon