
Hi Tom,
On Mon, 13 Jan 2025 at 13:20, Tom Rini trini@konsulko.com wrote:
On Mon, Jan 13, 2025 at 12:01:36PM -0700, Simon Glass wrote:
Hi Tom,
On Fri, 10 Jan 2025 at 09:48, Tom Rini trini@konsulko.com wrote:
On Fri, Jan 10, 2025 at 06:40:37AM -0700, Simon Glass wrote:
Hi Tom,
On Thu, 9 Jan 2025 at 09:51, Tom Rini trini@konsulko.com wrote:
On Thu, Jan 09, 2025 at 08:02:01AM -0700, Simon Glass wrote:
Hi Tom,
On Wed, 8 Jan 2025 at 12:15, Tom Rini trini@konsulko.com
wrote:
> > On Wed, Jan 08, 2025 at 10:02:52AM -0700, Simon Glass wrote: > > Hi Heinrich, Tom, > > > > On Tue, 7 Jan 2025 at 08:47, Heinrich Schuchardt <
xypron.glpk@gmx.de> wrote:
> > > > > > On 07.01.25 16:11, Tom Rini wrote: > > > > On Tue, Jan 07, 2025 at 06:57:50AM -0700, Simon Glass
wrote:
> > > >> Hi Heinrich, > > > >> > > > >> On Tue, 7 Jan 2025 at 06:11, Heinrich Schuchardt <
xypron.glpk@gmx.de> wrote:
> > > >>> > > > >>> On 07.01.25 13:15, Simon Glass wrote: > > > >>>> Hi Heinrich, > > > >>>> > > > >>>> On Mon, 6 Jan 2025 at 10:00, Heinrich Schuchardt <
xypron.glpk@gmx.de> wrote:
> > > >>>>> > > > >>>>> On 06.01.25 15:47, Simon Glass wrote: > > > >>>>>> This test was hamstrung in code review so this
series is an attempt to
> > > >>>>>> complete the intended functionality: > > > >>>>>> > > > >>>>>> - Check memory allocations look correct > > > >>>>>> - Check that exit-boot-services removes active-DMA
devices
> > > >>>>>> - Check that the bootflow is still present after
testapp finishes
> > > >>>>>> > > > >>>>>> The EFI functionality duplicates
bootm_announce_and_cleanup() and still
> > > >>>>>> uses the defunct board_quiesce_devices() so a nice
cleanup would be to
> > > >>>>>> call the bootm function instead, with suitable
modifications. That would
> > > >>>>>> allow bootstage to work too. > > > >>>>>> > > > >>>>>> This series is based on sjg/master since the EFI
logging was rejected so
> > > >>>>>> far. > > > >>>>> > > > >>>>> Yes, it was rejected because a solution at the
lib/log.c level would be
> > > >>>>> more generic. > > > >>>> > > > >>>> As I mentioned, that idea isn't suitable for
programmatic use.
> > > >>> > > > >>> What can be done with show_addr("mem", rec->memory);
that log_debug()
> > > >>> does not offer or which you could not do with a new
log function in
> > > >>> lib/log.c that takes variadic arguments? > > > >> > > > >> There are asserts in [1], for example. How do you
propose to handle
> > > >> that? See [2] for my previous explanation, quoted here: > > > >> > > > >>> CONFIG_LOG with a bloblist option would be a great
idea, but it's hard
> > > >>> to programmatically scan text...plus only the
external call sites are
> > > >>> actually logged. > > > >> > > > >> Also see the discussion on the original patch [3].
There was also your
> > > >> reply at [4], but I think you missed that this is
intended for use in
> > > >> unit tests (i.e. with ut_assert()). > > > >> > > > >> You also requested that this be generalised, rather
than being
> > > >> EFI-loader-specific. I have no objection to that, but
don't have a use
> > > >> case for it yet, so have deferred that to later. It's
a fairly simple
> > > >> change, if/when needed. If the series was not NAKed,
I'd be happy to
> > > >> do it now. > > > >> > > > >>>> > > > >>>>> > > > >>>>> Tom suggested not to send patches that are for
private enjoyment to the
> > > >>>>> mailing list. > > > >>>> > > > >>>> My contributions to U-Boot are only ever about
private enjoyment :-)
> > > >>>> > > > >>>> Do you have any comments on the patches? > > > >> > > > >> Regards, > > > >> Simon > > > >> > > > >> [1]
https://patchwork.ozlabs.org/project/uboot/patch/20250106144755.3054780-6-sj...
> > > >> [2]
https://lore.kernel.org/u-boot/CAFLszTjxOE_037+kR0jgdax80sBombYo_k0YgiuVnP=K...
> > > >> [3]
https://lore.kernel.org/u-boot/CAC_iWjKtaN54B98OKbkoXkC_GmKJ=x+M4=UY_O6roSOp...
> > > >> [4]
https://lore.kernel.org/u-boot/D513D326-41A6-425E-B11F-85958065BCD2@gmx.de/
> > > > > > > > Looking at the logging portions of the original series
again, especially
> > > > if this was made generic, we probably don't want to
print to actual
> > > > console every time we're making a note of some memory
allocation for
> > > > example, that would be unreadable outside of a debug
context. The point
> > > > of this really seems to be "log things for verifying in
tests later".
> > > > Does that end up being useful? I don't know. Heinrich
or Ilias, do the
> > > > tests in [1] look generally useful? > > > > > > > > > > The tests in [1] are not documented, not even in the
commit message. So
> > > the reasoning behind the tests remains Simon's secret. > > > > Are you asking for code comments in the test? If so, I can
add some.
> > > > > > > > At first sight the tests in [1] don't make much sense.
E.g. that only a
> > > subset of memory types have been used does not tell that
the right
> > > memory type has been used for the right object. > > > > It is a pretty good start, though. It makes sure that the
memory types
> > are sane, checks addresses are within DRAM, etc. With [5]
it makes
> > sure that devices are removed. > > > > > > > > Implementing a specific tracing functionality for EFI is
definitively
> > > the wrong way forward as it will lead to code duplication. > > > > We can cross that bridge when we come to it. > > Well, no. It's backwards to make a bridge in one place when
everyone
> agrees it needs to be moved somewhere else. I mean [5] is a
generic
> issue and test/py/tests/test_net_boot.py or some other test
we already
> have which tests booting an OS should confirm that we've
quiesced
> devices before moving on. And as a bonus it's in python where
dealing
> with strings doesn't suck.
I really don't want to write C tests in Python. CI is slow
enough as
it is, something realy want to fix. I'm also not sure how you
can tell
if a device has been removed. Run 'dm tree' and look for the
missing
'star' in the resulting 300 lines of text?
As I'm in a bisect-hell in our C tests you'll have to forgive me
for not
thinking the C tests are noticeably faster than python tests. Or
that
they aren't their own potential source of corner-case bugs. But I digress..
Welcome to my world. I bisected my lab devices so many times to try
to
isolate all the breakages that have crept in. What is the problem, maybe I can help?
Sure. test/cmd/hash.c::dm_test_cmd_hash_md5 fails randomly, in maybe 1 out of 100 runs, via pytest, in sandbox. Not via "./u-boot -T -c 'ut
dm
dm_test_cmd_hash_md5'" however (I stopped checking after 1000 iterations). I was iterating over "and built with clang" but I think
it
happens with gcc too, from the actual failures in CI. And you can use "-k ut" to limit to just what's matched there, so it's a quicker iteration.
Hmmm do you have a link? It's hard to imagine what it is, but perhaps a dependency on a previous test.
Sure: https://source.denx.de/u-boot/u-boot/-/jobs/993200#L286
My current gut feeling is that we've got some section overlap issue somewhere. And, FWIW, if I turn off EFI_LOADER I still see it. And if you use the same binary it won't always fail.
The only way I think the hash command can return CMD_RET_USAGE is if there are not enough arguments.
The only way that can happen with this code:
ut_assertok(run_command("hash md5 $loadaddr 0", 0));
is if loadaddr is undefined.
But (like you, I suspect) I cannot find how that could be.
In fact, I can't even build with clang:
$ buildman -O clang-17 --bo sandbox -w -o /tmp/b/sandbox-clang Building current source for 1 boards (1 thread, 32 jobs per thread) sandbox: + sandbox +/usr/bin/llvm-ar: error: arch/sandbox/cpu/built-in.o: Opaque pointers are only supported in -opaque-pointers mode (Producer: 'LLVM17.0.6' Reader: 'LLVM 14.0.0') +make[2]: *** [scripts/Makefile.build:333: arch/sandbox/cpu/built-in.o] Error 1 +make[1]: *** [Makefile:1906: arch/sandbox/cpu] Error 2 +make: *** [Makefile:177: sub-make] Error 2 0 0 1 /1 sandbox
At present 'ut all' fails so I am going to take a look at that. Quite a bit of clean-up needed in test system, though. Ideally we could run the tests in random order so we can find and fix the dependencies. For driver model we reinit as needed, but that's not the case for EFI, for example.
Personally, for making pytest faster I'd look at the general recommendations various blog posts about "make your pytest run faster" and then go from there.
I think the problem is that you are looking at the C tests through a Python lens, so everything seems a bit slow.
'ut all' takes about 18 seconds for me.
And yes, taking a bunch of text and parsing it, is what python is
fast
at. And easier to write.
But actually [5] is not generic, since EFI uses its own code to
remove
devices. This test is solely focussed on EFI.
Yes, you're testing the EFI version of the code in arch/$(ARCH)/lib/bootm.c. The remove devices functions being
called in
both cases are generic.
The code in EFI is:
if (!efi_st_keep_devices) { bootm_disable_interrupts(); if (IS_ENABLED(CONFIG_USB_DEVICE)) udc_disconnect(); board_quiesce_devices(); dm_remove_devices_active(); }
It does call somewhat the same functions, but is doing its own
thing,
not even using the arch-specific code. As I mentioned, a nice
clean-up
would be to make bootm_announce_and_cleanup() common.
Yes, we almost agree? Both the EFI code, and arch/$(ARCH)/lib/bootm.c have functions that make the above calls. A nice clean-up would be to have something common.
Yes indeed. It still does not provide a test for the EFI bootmeth, though, where I found half a dozen bugs.
Yes, the bootmeth code is it's own thing.
No I mean there were bugs in the EFI implementation.
Actually, now that I see efi_st_keep_devices, I wonder why Heinrich didn't want my ANSI patch[6] which serves a similar function.
No? Your patch disables ANSI output in those tests, that variable is
for
making sure those tests can accomplish (if I skim things right)
similar
kinds of tests you've asked for before, but with an EFI app instead?
But
perhaps better to not start yet another tangent here...
I wouldn't know where to start, anyway...
If you want the logging to be renamed and placed centrally I
don't
mind doing it now. But note that only EFI will use it for now.
> > > > > > > > > We already have function _log() which is variadic. > > > > > > Simon could write a new log driver that parses the
`format` parameter
> > > and saves the binary data in an appropriate format for
analysis by the
> > > unit tests: > > > > > > * For %s the driver should save the string and not the
address of the
> > > string. > > > * For %pD the driver should save the device path instead
of the pointer.
> > > * ... > > > > > > Some changes to the log driver interface will be needed
to pass the
> > > variadic arguments instead of the formatted message. > > > > Perhaps the word 'log' is confusing people. But the above
suggestion
> > is quite a complicated way of handling things. We have no
way to
> > decode printf() strings in this way. See log_dispatch() for
how this
> > is handled today. It uses sprintf(). Trying to test based
on text
> > output would be very clumsy (lots of regexes and sscan()
calls?) and
> > result in a huge amount of parsing code, highly dependent
on the
> > printf() format, etc. > > > > I very-much doubt that would produce a useful
implementation, but if
> > you would like to try it out then I would be happy to look
at it.
> > > > I mentioned this several times, but even if we did go that
way, we
> > only have logging on the external calls, so much of the
EFI-memory
> > allocation in U-Boot would not be logged. > > > > Regards, > > Simon > > > > [5]
https://patchwork.ozlabs.org/project/uboot/patch/20250106144755.3054780-9-sj...
> > Yes, calling this a "log" when it's intended for capturing
information
> for tests got some of this off on the wrong track. But that
also helps
> explain now that this is still on the wrong track and should
instead be
> following normal design practices for testing and expanding
existing
> infrastructure and not inventing a new everything. So if you
don't like
> Heinrich's suggestion, take a look at Caleb's suggestion.
I don't have the energy to port the tracing framework from
Linux to
U-Boot, although I agree it would be useful. Still, function
tracing
is quite fragile and confusing to work with when refactoring
code. I
don't like that idea much for this use case, although if
function
tracing did exist in U-Boot I would likely have used it.
I mean yes, it would be good if you went back and expanded on the
trace
functionality you did before.
I still don't believe it is the best solution and seems like yet another ocean I should avoid sticking my heater into.
I strongly disagree. If you go back to the trace code you brought in
to
start with and make it more useful / include newer features existing elsewhere you're not going to end up in conflict with everyone asking why you're doing something subsystem specific.
Perhaps someone else could do this? It would be a substantial amount of work to bring runtime tooling into U-Boot, bpf and the like. It would be quite a pain to use, I suspect, and certainly not possible to write a simple C test as I have done here.
I'm not sure where bpf and the like comes from in what I said here. But again, if you find that logging thing you wrote handy, keep it in a topic branch somewhere and then later on you can "Told You So" the rest of us later on.
I'm not sure what you are suggesting re tracing, but I assumed it was that we patch the code at runtime and set tracepoints to check for particular calls in a test. I'm not sure how else we would get access to the function arguments. Then, presumably we need a bpf rule to collect the function args, etc.? It wasn't my idea, so perhaps we should let Caleb explain the intent.
> And if you > don't like Caleb's suggestion, go put this in a topic branch
you can
> merge when you need to debug some problem that seemingly
nothing else
> will catch.
Here we are over a year after I reported the bug and we still
don't
have a test to cover it. This series is better than the
available
alternatives, IMO.
Well, no. We have commit dabaa4ae3206 ("dm: Add dm_remove_devices_active() for ordered device removal") we have a
test
for the underlying problem. We need more functional boot tests,
but we
need those to be in python too, and not more C code.
That is a nice improvement, but did not fix the underlying problem. The underlying problem was that EFI was calling exit-boot-services, causing U-Boot to free up data structures which were needed to boot. This was on x86_64. I never quite figured out which one (very hard when you cannot get back to U-Boot to check).
There were quite a lot of problems, actually. There v2 series is at
[7]
Only a C test can check what actually happens inside U-Boot.
Yes, I think now we get back to disagreeing on which symptoms lead to which code problems and then what to do about them.
OK
And you're not just coming up with a test, you're refactoring a
bunch of
code and introducing new subsystems in order to do that. When as
I keep
pointing out, we don't need that. We could easily extend the
existing OS
boot tests we have to script booting an ISO. And we only run
those when
say "ENABLE_SLOW_TESTS" is set, and only do that on tagged
releases.
Yes of course we need to refactor to make tests work. This is not necessarily a bad thing, as it helps us break code down into
testable
chunks. We cannot rely only on large functional-tests, not that you are suggesting that. See [8], but they are too slow, too hard to
debug
when they fail. They also tend to devolve into chaos as people get lazy and stop writing unit/smaller tests.
I'll just note that I don't ever even think to use "make tests" or "qcheck" or any of the others since they never work for me.
Would you mind filing an issue on that? I use 'make pcheck' all the
time.
Done: https://source.denx.de/u-boot/custodians/u-boot-dm/-/issues/33
I didn't file one for "qcheck" which fails differently in that same environment.
OK, I'll reply there, thanks.
Regards, Simon