an off-by-one error in dm_test_rtc_set_get()?

Hi Simon,
gitlab reported the following test error below:
=================================== FAILURES =================================== __________________________ test_ut[ut_dm_rtc_set_get] __________________________ test/py/tests/test_ut.py:43: in test_ut assert output.endswith('Failures: 0') E AssertionError: assert False E + where False = <built-in method endswith of str object at 0x7f3bb792dcb0>('Failures: 0') E + where <built-in method endswith of str object at 0x7f3bb792dcb0> = 'Test: dm_test_rtc_set_get: rtc.c\r\r\nexpected: 27/10/2021 03:38:15\r\r\nactual: 27/10/2021 03:38:14\r\r\ntest/dm/rtc...w, &cmp, 1): Expected 0x0 (0), got 0xffffffea (-22)\r\r\nTest: dm_test_rtc_set_get: rtc.c (flat tree)\r\r\nFailures: 1'.endswith ----------------------------- Captured stdout call ----------------------------- =>
See https://source.denx.de/u-boot/custodians/u-boot-x86/-/jobs/341905
But the same branch same commit, azure test results passed: https://dev.azure.com/bmeng/GitHub/_build/results?buildId=460&view=resul...
It looks like the error is an off-by-one where actual time is 1 second behind the expected time?
expected: 27/10/2021 03:38:15 actual: 27/10/2021 03:38:14
Is this a known issue?
Regards, Bin

On Wed, Oct 27, 2021 at 12:43:38PM +0800, Bin Meng wrote:
Hi Simon,
gitlab reported the following test error below:
=================================== FAILURES =================================== __________________________ test_ut[ut_dm_rtc_set_get] __________________________ test/py/tests/test_ut.py:43: in test_ut assert output.endswith('Failures: 0') E AssertionError: assert False E + where False = <built-in method endswith of str object at 0x7f3bb792dcb0>('Failures: 0') E + where <built-in method endswith of str object at 0x7f3bb792dcb0> = 'Test: dm_test_rtc_set_get: rtc.c\r\r\nexpected: 27/10/2021 03:38:15\r\r\nactual: 27/10/2021 03:38:14\r\r\ntest/dm/rtc...w, &cmp, 1): Expected 0x0 (0), got 0xffffffea (-22)\r\r\nTest: dm_test_rtc_set_get: rtc.c (flat tree)\r\r\nFailures: 1'.endswith ----------------------------- Captured stdout call ----------------------------- =>
See https://source.denx.de/u-boot/custodians/u-boot-x86/-/jobs/341905
But the same branch same commit, azure test results passed: https://dev.azure.com/bmeng/GitHub/_build/results?buildId=460&view=resul...
It looks like the error is an off-by-one where actual time is 1 second behind the expected time?
expected: 27/10/2021 03:38:15 actual: 27/10/2021 03:38:14
Is this a known issue?
Yes, which is why the test checks for a certain amount of "fuzz" around the return value. I've wondered about if we need to increase that value slightly sometimes, or just live with hitting the re-run failed jobs button on whatever CI system was a bit too slow sometimes.

On Wed, Oct 27, 2021 at 9:22 PM Tom Rini trini@konsulko.com wrote:
On Wed, Oct 27, 2021 at 12:43:38PM +0800, Bin Meng wrote:
Hi Simon,
gitlab reported the following test error below:
=================================== FAILURES =================================== __________________________ test_ut[ut_dm_rtc_set_get] __________________________ test/py/tests/test_ut.py:43: in test_ut assert output.endswith('Failures: 0') E AssertionError: assert False E + where False = <built-in method endswith of str object at 0x7f3bb792dcb0>('Failures: 0') E + where <built-in method endswith of str object at 0x7f3bb792dcb0> = 'Test: dm_test_rtc_set_get: rtc.c\r\r\nexpected: 27/10/2021 03:38:15\r\r\nactual: 27/10/2021 03:38:14\r\r\ntest/dm/rtc...w, &cmp, 1): Expected 0x0 (0), got 0xffffffea (-22)\r\r\nTest: dm_test_rtc_set_get: rtc.c (flat tree)\r\r\nFailures: 1'.endswith ----------------------------- Captured stdout call ----------------------------- =>
See https://source.denx.de/u-boot/custodians/u-boot-x86/-/jobs/341905
But the same branch same commit, azure test results passed: https://dev.azure.com/bmeng/GitHub/_build/results?buildId=460&view=resul...
It looks like the error is an off-by-one where actual time is 1 second behind the expected time?
expected: 27/10/2021 03:38:15 actual: 27/10/2021 03:38:14
Is this a known issue?
Yes, which is why the test checks for a certain amount of "fuzz" around the return value. I've wondered about if we need to increase that value slightly sometimes, or just live with hitting the re-run failed jobs button on whatever CI system was a bit too slow sometimes.
Thanks Tom. So it is a known issue. It's just my first time seeing this failure in CI :)
Regards, Bin

On 27/10/2021 15.22, Tom Rini wrote:
On Wed, Oct 27, 2021 at 12:43:38PM +0800, Bin Meng wrote:
Hi Simon,
gitlab reported the following test error below:
=================================== FAILURES =================================== __________________________ test_ut[ut_dm_rtc_set_get] __________________________ test/py/tests/test_ut.py:43: in test_ut assert output.endswith('Failures: 0') E AssertionError: assert False E + where False = <built-in method endswith of str object at 0x7f3bb792dcb0>('Failures: 0') E + where <built-in method endswith of str object at 0x7f3bb792dcb0> = 'Test: dm_test_rtc_set_get: rtc.c\r\r\nexpected: 27/10/2021 03:38:15\r\r\nactual: 27/10/2021 03:38:14\r\r\ntest/dm/rtc...w, &cmp, 1): Expected 0x0 (0), got 0xffffffea (-22)\r\r\nTest: dm_test_rtc_set_get: rtc.c (flat tree)\r\r\nFailures: 1'.endswith ----------------------------- Captured stdout call ----------------------------- =>
See https://source.denx.de/u-boot/custodians/u-boot-x86/-/jobs/341905
But the same branch same commit, azure test results passed: https://dev.azure.com/bmeng/GitHub/_build/results?buildId=460&view=resul...
It looks like the error is an off-by-one where actual time is 1 second behind the expected time?
expected: 27/10/2021 03:38:15 actual: 27/10/2021 03:38:14
Is this a known issue?
Yes, which is why the test checks for a certain amount of "fuzz" around the return value.
You said the same thing about dm_test_rtc_reset() in https://lore.kernel.org/u-boot/20210831124441.GC858@bill-the-cat/ , but I can't find anything about any fuzz in the code. Could you point out where you think that's implemented? In both cases, the expected and actual values were just 1 from each other, and I fail to see how any fuzz value could be smaller than that.
I've wondered about if we need to increase that value
slightly sometimes, or just live with hitting the re-run failed jobs button on whatever CI system was a bit too slow sometimes.
It has nothing to do with a CI being slow, it's plain and simple buggy test code AFAICT. It's essentially "assert(time(NULL) == time(NULL))". If a call to time() takes 1us, do this a million times and it will on average fail once. Obviously, a loaded system increases the chance of being preempted between the two calls and hence effectively increases the delta and proportionally the probability of hitting this.
Rasmus

On Thu, Oct 28, 2021 at 09:22:30AM +0200, Rasmus Villemoes wrote:
On 27/10/2021 15.22, Tom Rini wrote:
On Wed, Oct 27, 2021 at 12:43:38PM +0800, Bin Meng wrote:
Hi Simon,
gitlab reported the following test error below:
=================================== FAILURES =================================== __________________________ test_ut[ut_dm_rtc_set_get] __________________________ test/py/tests/test_ut.py:43: in test_ut assert output.endswith('Failures: 0') E AssertionError: assert False E + where False = <built-in method endswith of str object at 0x7f3bb792dcb0>('Failures: 0') E + where <built-in method endswith of str object at 0x7f3bb792dcb0> = 'Test: dm_test_rtc_set_get: rtc.c\r\r\nexpected: 27/10/2021 03:38:15\r\r\nactual: 27/10/2021 03:38:14\r\r\ntest/dm/rtc...w, &cmp, 1): Expected 0x0 (0), got 0xffffffea (-22)\r\r\nTest: dm_test_rtc_set_get: rtc.c (flat tree)\r\r\nFailures: 1'.endswith ----------------------------- Captured stdout call ----------------------------- =>
See https://source.denx.de/u-boot/custodians/u-boot-x86/-/jobs/341905
But the same branch same commit, azure test results passed: https://dev.azure.com/bmeng/GitHub/_build/results?buildId=460&view=resul...
It looks like the error is an off-by-one where actual time is 1 second behind the expected time?
expected: 27/10/2021 03:38:15 actual: 27/10/2021 03:38:14
Is this a known issue?
Yes, which is why the test checks for a certain amount of "fuzz" around the return value.
You said the same thing about dm_test_rtc_reset() in https://lore.kernel.org/u-boot/20210831124441.GC858@bill-the-cat/ , but I can't find anything about any fuzz in the code. Could you point out where you think that's implemented? In both cases, the expected and actual values were just 1 from each other, and I fail to see how any fuzz value could be smaller than that.
I've wondered about if we need to increase that value
slightly sometimes, or just live with hitting the re-run failed jobs button on whatever CI system was a bit too slow sometimes.
It has nothing to do with a CI being slow, it's plain and simple buggy test code AFAICT. It's essentially "assert(time(NULL) == time(NULL))". If a call to time() takes 1us, do this a million times and it will on average fail once. Obviously, a loaded system increases the chance of being preempted between the two calls and hence effectively increases the delta and proportionally the probability of hitting this.
You're right, I confused this with the sleep test, which does have a bit of fuzz to it.

Hi,
I filed
https://source.denx.de/u-boot/u-boot/-/issues/4
Regards, Simon
On Thu, 28 Oct 2021 at 05:58, Tom Rini trini@konsulko.com wrote:
On Thu, Oct 28, 2021 at 09:22:30AM +0200, Rasmus Villemoes wrote:
On 27/10/2021 15.22, Tom Rini wrote:
On Wed, Oct 27, 2021 at 12:43:38PM +0800, Bin Meng wrote:
Hi Simon,
gitlab reported the following test error below:
=================================== FAILURES =================================== __________________________ test_ut[ut_dm_rtc_set_get] __________________________ test/py/tests/test_ut.py:43: in test_ut assert output.endswith('Failures: 0') E AssertionError: assert False E + where False = <built-in method endswith of str object at 0x7f3bb792dcb0>('Failures: 0') E + where <built-in method endswith of str object at 0x7f3bb792dcb0> = 'Test: dm_test_rtc_set_get: rtc.c\r\r\nexpected: 27/10/2021 03:38:15\r\r\nactual: 27/10/2021 03:38:14\r\r\ntest/dm/rtc...w, &cmp, 1): Expected 0x0 (0), got 0xffffffea (-22)\r\r\nTest: dm_test_rtc_set_get: rtc.c (flat tree)\r\r\nFailures: 1'.endswith ----------------------------- Captured stdout call ----------------------------- =>
See https://source.denx.de/u-boot/custodians/u-boot-x86/-/jobs/341905
But the same branch same commit, azure test results passed: https://dev.azure.com/bmeng/GitHub/_build/results?buildId=460&view=resul...
It looks like the error is an off-by-one where actual time is 1 second behind the expected time?
expected: 27/10/2021 03:38:15 actual: 27/10/2021 03:38:14
Is this a known issue?
Yes, which is why the test checks for a certain amount of "fuzz" around the return value.
You said the same thing about dm_test_rtc_reset() in https://lore.kernel.org/u-boot/20210831124441.GC858@bill-the-cat/ , but I can't find anything about any fuzz in the code. Could you point out where you think that's implemented? In both cases, the expected and actual values were just 1 from each other, and I fail to see how any fuzz value could be smaller than that.
I've wondered about if we need to increase that value
slightly sometimes, or just live with hitting the re-run failed jobs button on whatever CI system was a bit too slow sometimes.
It has nothing to do with a CI being slow, it's plain and simple buggy test code AFAICT. It's essentially "assert(time(NULL) == time(NULL))". If a call to time() takes 1us, do this a million times and it will on average fail once. Obviously, a loaded system increases the chance of being preempted between the two calls and hence effectively increases the delta and proportionally the probability of hitting this.
You're right, I confused this with the sleep test, which does have a bit of fuzz to it.
-- Tom
participants (4)
-
Bin Meng
-
Rasmus Villemoes
-
Simon Glass
-
Tom Rini