an off-by-one error in dm_test_rtc_set_get()?

newer
[PATCH v4 0/5] sf: Add...

older
Re: [PATCH v2 13/41] dm: core: Add...

Bin Meng

27 Oct 2021 27 Oct '21

6:43 a.m.

Hi Simon,

gitlab reported the following test error below:

=================================== FAILURES =================================== __________________________ test_ut[ut_dm_rtc_set_get] __________________________ test/py/tests/test_ut.py:43: in test_ut assert output.endswith('Failures: 0') E AssertionError: assert False E + where False = <built-in method endswith of str object at 0x7f3bb792dcb0>('Failures: 0') E + where <built-in method endswith of str object at 0x7f3bb792dcb0> = 'Test: dm_test_rtc_set_get: rtc.c\r\r\nexpected: 27/10/2021 03:38:15\r\r\nactual: 27/10/2021 03:38:14\r\r\ntest/dm/rtc...w, &cmp, 1): Expected 0x0 (0), got 0xffffffea (-22)\r\r\nTest: dm_test_rtc_set_get: rtc.c (flat tree)\r\r\nFailures: 1'.endswith ----------------------------- Captured stdout call ----------------------------- =>

See https://source.denx.de/u-boot/custodians/u-boot-x86/-/jobs/341905

But the same branch same commit, azure test results passed: https://dev.azure.com/bmeng/GitHub/_build/results?buildId=460&view=resul...

It looks like the error is an off-by-one where actual time is 1 second behind the expected time?

expected: 27/10/2021 03:38:15 actual: 27/10/2021 03:38:14

Is this a known issue?

Regards, Bin

Show replies by date

Tom Rini

27 Oct 27 Oct

3:22 p.m.

On Wed, Oct 27, 2021 at 12:43:38PM +0800, Bin Meng wrote:

...

Hi Simon,

gitlab reported the following test error below:

=================================== FAILURES =================================== __________________________ test_ut[ut_dm_rtc_set_get] __________________________ test/py/tests/test_ut.py:43: in test_ut assert output.endswith('Failures: 0') E AssertionError: assert False E + where False = <built-in method endswith of str object at 0x7f3bb792dcb0>('Failures: 0') E + where <built-in method endswith of str object at 0x7f3bb792dcb0> = 'Test: dm_test_rtc_set_get: rtc.c\r\r\nexpected: 27/10/2021 03:38:15\r\r\nactual: 27/10/2021 03:38:14\r\r\ntest/dm/rtc...w, &cmp, 1): Expected 0x0 (0), got 0xffffffea (-22)\r\r\nTest: dm_test_rtc_set_get: rtc.c (flat tree)\r\r\nFailures: 1'.endswith ----------------------------- Captured stdout call ----------------------------- =>

See https://source.denx.de/u-boot/custodians/u-boot-x86/-/jobs/341905

But the same branch same commit, azure test results passed: https://dev.azure.com/bmeng/GitHub/_build/results?buildId=460&view=resul...

It looks like the error is an off-by-one where actual time is 1 second behind the expected time?

expected: 27/10/2021 03:38:15 actual: 27/10/2021 03:38:14

Is this a known issue?

Yes, which is why the test checks for a certain amount of "fuzz" around the return value. I've wondered about if we need to increase that value slightly sometimes, or just live with hitting the re-run failed jobs button on whatever CI system was a bit too slow sometimes.

-- Tom

Bin Meng

3:43 p.m.

On Wed, Oct 27, 2021 at 9:22 PM Tom Rini trini@konsulko.com wrote:

...

On Wed, Oct 27, 2021 at 12:43:38PM +0800, Bin Meng wrote:

...
Hi Simon,

gitlab reported the following test error below:

=================================== FAILURES =================================== __________________________ test_ut[ut_dm_rtc_set_get] __________________________ test/py/tests/test_ut.py:43: in test_ut assert output.endswith('Failures: 0') E AssertionError: assert False E + where False = <built-in method endswith of str object at 0x7f3bb792dcb0>('Failures: 0') E + where <built-in method endswith of str object at 0x7f3bb792dcb0> = 'Test: dm_test_rtc_set_get: rtc.c\r\r\nexpected: 27/10/2021 03:38:15\r\r\nactual: 27/10/2021 03:38:14\r\r\ntest/dm/rtc...w, &cmp, 1): Expected 0x0 (0), got 0xffffffea (-22)\r\r\nTest: dm_test_rtc_set_get: rtc.c (flat tree)\r\r\nFailures: 1'.endswith ----------------------------- Captured stdout call ----------------------------- =>

See https://source.denx.de/u-boot/custodians/u-boot-x86/-/jobs/341905

But the same branch same commit, azure test results passed: https://dev.azure.com/bmeng/GitHub/_build/results?buildId=460&view=resul...

It looks like the error is an off-by-one where actual time is 1 second behind the expected time?

expected: 27/10/2021 03:38:15 actual: 27/10/2021 03:38:14

Is this a known issue?

Yes, which is why the test checks for a certain amount of "fuzz" around the return value. I've wondered about if we need to increase that value slightly sometimes, or just live with hitting the re-run failed jobs button on whatever CI system was a bit too slow sometimes.

Thanks Tom. So it is a known issue. It's just my first time seeing this failure in CI :)

Regards, Bin

Rasmus Villemoes

28 Oct 28 Oct

9:22 a.m.

On 27/10/2021 15.22, Tom Rini wrote:

...

On Wed, Oct 27, 2021 at 12:43:38PM +0800, Bin Meng wrote:

...
Hi Simon,

gitlab reported the following test error below:

=================================== FAILURES =================================== __________________________ test_ut[ut_dm_rtc_set_get] __________________________ test/py/tests/test_ut.py:43: in test_ut assert output.endswith('Failures: 0') E AssertionError: assert False E + where False = <built-in method endswith of str object at 0x7f3bb792dcb0>('Failures: 0') E + where <built-in method endswith of str object at 0x7f3bb792dcb0> = 'Test: dm_test_rtc_set_get: rtc.c\r\r\nexpected: 27/10/2021 03:38:15\r\r\nactual: 27/10/2021 03:38:14\r\r\ntest/dm/rtc...w, &cmp, 1): Expected 0x0 (0), got 0xffffffea (-22)\r\r\nTest: dm_test_rtc_set_get: rtc.c (flat tree)\r\r\nFailures: 1'.endswith ----------------------------- Captured stdout call ----------------------------- =>

See https://source.denx.de/u-boot/custodians/u-boot-x86/-/jobs/341905

But the same branch same commit, azure test results passed: https://dev.azure.com/bmeng/GitHub/_build/results?buildId=460&view=resul...

It looks like the error is an off-by-one where actual time is 1 second behind the expected time?

expected: 27/10/2021 03:38:15 actual: 27/10/2021 03:38:14

Is this a known issue?

Yes, which is why the test checks for a certain amount of "fuzz" around the return value.

You said the same thing about dm_test_rtc_reset() in https://lore.kernel.org/u-boot/20210831124441.GC858@bill-the-cat/ , but I can't find anything about any fuzz in the code. Could you point out where you think that's implemented? In both cases, the expected and actual values were just 1 from each other, and I fail to see how any fuzz value could be smaller than that.

I've wondered about if we need to increase that value

...

slightly sometimes, or just live with hitting the re-run failed jobs button on whatever CI system was a bit too slow sometimes.

It has nothing to do with a CI being slow, it's plain and simple buggy test code AFAICT. It's essentially "assert(time(NULL) == time(NULL))". If a call to time() takes 1us, do this a million times and it will on average fail once. Obviously, a loaded system increases the chance of being preempted between the two calls and hence effectively increases the delta and proportionally the probability of hitting this.

Rasmus

Tom Rini

1:58 p.m.

On Thu, Oct 28, 2021 at 09:22:30AM +0200, Rasmus Villemoes wrote:

...

On 27/10/2021 15.22, Tom Rini wrote:

...
On Wed, Oct 27, 2021 at 12:43:38PM +0800, Bin Meng wrote:

...
Hi Simon,

gitlab reported the following test error below:

=================================== FAILURES =================================== __________________________ test_ut[ut_dm_rtc_set_get] __________________________ test/py/tests/test_ut.py:43: in test_ut assert output.endswith('Failures: 0') E AssertionError: assert False E + where False = <built-in method endswith of str object at 0x7f3bb792dcb0>('Failures: 0') E + where <built-in method endswith of str object at 0x7f3bb792dcb0> = 'Test: dm_test_rtc_set_get: rtc.c\r\r\nexpected: 27/10/2021 03:38:15\r\r\nactual: 27/10/2021 03:38:14\r\r\ntest/dm/rtc...w, &cmp, 1): Expected 0x0 (0), got 0xffffffea (-22)\r\r\nTest: dm_test_rtc_set_get: rtc.c (flat tree)\r\r\nFailures: 1'.endswith ----------------------------- Captured stdout call ----------------------------- =>

See https://source.denx.de/u-boot/custodians/u-boot-x86/-/jobs/341905

But the same branch same commit, azure test results passed: https://dev.azure.com/bmeng/GitHub/_build/results?buildId=460&view=resul...

It looks like the error is an off-by-one where actual time is 1 second behind the expected time?

expected: 27/10/2021 03:38:15 actual: 27/10/2021 03:38:14

Is this a known issue?

Yes, which is why the test checks for a certain amount of "fuzz" around the return value.

You said the same thing about dm_test_rtc_reset() in https://lore.kernel.org/u-boot/20210831124441.GC858@bill-the-cat/ , but I can't find anything about any fuzz in the code. Could you point out where you think that's implemented? In both cases, the expected and actual values were just 1 from each other, and I fail to see how any fuzz value could be smaller than that.

I've wondered about if we need to increase that value

...
slightly sometimes, or just live with hitting the re-run failed jobs button on whatever CI system was a bit too slow sometimes.

It has nothing to do with a CI being slow, it's plain and simple buggy test code AFAICT. It's essentially "assert(time(NULL) == time(NULL))". If a call to time() takes 1us, do this a million times and it will on average fail once. Obviously, a loaded system increases the chance of being preempted between the two calls and hence effectively increases the delta and proportionally the probability of hitting this.

You're right, I confused this with the sleep test, which does have a bit of fuzz to it.

-- Tom

Simon Glass

25 Nov 25 Nov

1:11 a.m.

Hi,

I filed

https://source.denx.de/u-boot/u-boot/-/issues/4

Regards, Simon

On Thu, 28 Oct 2021 at 05:58, Tom Rini trini@konsulko.com wrote:

...

On Thu, Oct 28, 2021 at 09:22:30AM +0200, Rasmus Villemoes wrote:

...
On 27/10/2021 15.22, Tom Rini wrote:

...
On Wed, Oct 27, 2021 at 12:43:38PM +0800, Bin Meng wrote:

...
Hi Simon,

gitlab reported the following test error below:

=================================== FAILURES =================================== __________________________ test_ut[ut_dm_rtc_set_get] __________________________ test/py/tests/test_ut.py:43: in test_ut assert output.endswith('Failures: 0') E AssertionError: assert False E + where False = <built-in method endswith of str object at 0x7f3bb792dcb0>('Failures: 0') E + where <built-in method endswith of str object at 0x7f3bb792dcb0> = 'Test: dm_test_rtc_set_get: rtc.c\r\r\nexpected: 27/10/2021 03:38:15\r\r\nactual: 27/10/2021 03:38:14\r\r\ntest/dm/rtc...w, &cmp, 1): Expected 0x0 (0), got 0xffffffea (-22)\r\r\nTest: dm_test_rtc_set_get: rtc.c (flat tree)\r\r\nFailures: 1'.endswith ----------------------------- Captured stdout call ----------------------------- =>

See https://source.denx.de/u-boot/custodians/u-boot-x86/-/jobs/341905

But the same branch same commit, azure test results passed: https://dev.azure.com/bmeng/GitHub/_build/results?buildId=460&view=resul...

It looks like the error is an off-by-one where actual time is 1 second behind the expected time?

expected: 27/10/2021 03:38:15 actual: 27/10/2021 03:38:14

Is this a known issue?

Yes, which is why the test checks for a certain amount of "fuzz" around the return value.

You said the same thing about dm_test_rtc_reset() in https://lore.kernel.org/u-boot/20210831124441.GC858@bill-the-cat/ , but I can't find anything about any fuzz in the code. Could you point out where you think that's implemented? In both cases, the expected and actual values were just 1 from each other, and I fail to see how any fuzz value could be smaller than that.

I've wondered about if we need to increase that value

...
slightly sometimes, or just live with hitting the re-run failed jobs button on whatever CI system was a bit too slow sometimes.

It has nothing to do with a CI being slow, it's plain and simple buggy test code AFAICT. It's essentially "assert(time(NULL) == time(NULL))". If a call to time() takes 1us, do this a million times and it will on average fail once. Obviously, a loaded system increases the chance of being preempted between the two calls and hence effectively increases the delta and proportionally the probability of hitting this.

You're right, I confused this with the sleep test, which does have a bit of fuzz to it.

-- Tom

1259

Age (days ago)

1288

Last active (days ago)

List overview

Download

5 comments

4 participants

tags (0)

participants (4)

Bin Meng
Rasmus Villemoes
Simon Glass
Tom Rini