Re: [PATCH 0/8] efi_loader: Complete the bootflow_efi() test

18 Jan 2025


      Hi Tom,
On Mon, 13 Jan 2025 at 13:20, Tom Rini trini@konsulko.com wrote:
...
On Mon, Jan 13, 2025 at 12:01:36PM -0700, Simon Glass wrote:
...
Hi Tom,
On Fri, 10 Jan 2025 at 09:48, Tom Rini trini@konsulko.com wrote:
...
On Fri, Jan 10, 2025 at 06:40:37AM -0700, Simon Glass wrote:
...
Hi Tom,
On Thu, 9 Jan 2025 at 09:51, Tom Rini trini@konsulko.com wrote:
...
On Thu, Jan 09, 2025 at 08:02:01AM -0700, Simon Glass wrote:
...
Hi Tom,
On Wed, 8 Jan 2025 at 12:15, Tom Rini trini@konsulko.com
wrote:
...
...
...
...
...
...
>
> On Wed, Jan 08, 2025 at 10:02:52AM -0700, Simon Glass wrote:
> > Hi Heinrich, Tom,
> >
> > On Tue, 7 Jan 2025 at 08:47, Heinrich Schuchardt <
xypron.glpk@gmx.de> wrote:
...
...
...
...
...
...
> > >
> > > On 07.01.25 16:11, Tom Rini wrote:
> > > > On Tue, Jan 07, 2025 at 06:57:50AM -0700, Simon Glass
wrote:
...
...
...
...
...
...
> > > >> Hi Heinrich,
> > > >>
> > > >> On Tue, 7 Jan 2025 at 06:11, Heinrich Schuchardt <
xypron.glpk@gmx.de> wrote:
...
...
...
...
...
...
> > > >>>
> > > >>> On 07.01.25 13:15, Simon Glass wrote:
> > > >>>> Hi Heinrich,
> > > >>>>
> > > >>>> On Mon, 6 Jan 2025 at 10:00, Heinrich Schuchardt <
xypron.glpk@gmx.de> wrote:
...
...
...
...
...
...
> > > >>>>>
> > > >>>>> On 06.01.25 15:47, Simon Glass wrote:
> > > >>>>>> This test was hamstrung in code review so this
series is an attempt to
...
...
...
...
...
...
> > > >>>>>> complete the intended functionality:
> > > >>>>>>
> > > >>>>>> - Check memory allocations look correct
> > > >>>>>> - Check that exit-boot-services removes active-DMA
devices
...
...
...
...
...
...
> > > >>>>>> - Check that the bootflow is still present after
testapp finishes
...
...
...
...
...
...
> > > >>>>>>
> > > >>>>>> The EFI functionality duplicates
bootm_announce_and_cleanup() and still
...
...
...
...
...
...
> > > >>>>>> uses the defunct board_quiesce_devices() so a nice
cleanup would be to
...
...
...
...
...
...
> > > >>>>>> call the bootm function instead, with suitable
modifications. That would
...
...
...
...
...
...
> > > >>>>>> allow bootstage to work too.
> > > >>>>>>
> > > >>>>>> This series is based on sjg/master since the EFI
logging was rejected so
...
...
...
...
...
...
> > > >>>>>> far.
> > > >>>>>
> > > >>>>> Yes, it was rejected because a solution at the
lib/log.c level would be
...
...
...
...
...
...
> > > >>>>> more generic.
> > > >>>>
> > > >>>> As I mentioned, that idea isn't suitable for
programmatic use.
...
...
...
...
...
...
> > > >>>
> > > >>> What can be done with show_addr("mem", rec->memory);
that log_debug()
...
...
...
...
...
...
> > > >>> does not offer or which you could not do with a new
log function in
...
...
...
...
...
...
> > > >>> lib/log.c that takes variadic arguments?
> > > >>
> > > >> There are asserts in [1], for example. How do you
propose to handle
...
...
...
...
...
...
> > > >> that? See [2] for my previous explanation, quoted here:
> > > >>
> > > >>> CONFIG_LOG with a bloblist option would be a great
idea, but it's hard
...
...
...
...
...
...
> > > >>> to programmatically scan text...plus only the
external call sites are
...
...
...
...
...
...
> > > >>> actually logged.
> > > >>
> > > >> Also see the discussion on the original patch [3].
There was also your
...
...
...
...
...
...
> > > >> reply at [4], but I think you missed that this is
intended for use in
...
...
...
...
...
...
> > > >> unit tests (i.e. with ut_assert()).
> > > >>
> > > >> You also requested that this be generalised, rather
than being
...
...
...
...
...
...
> > > >> EFI-loader-specific. I have no objection to that, but
don't have a use
...
...
...
...
...
...
> > > >> case for it yet, so have deferred that to later. It's
a fairly simple
...
...
...
...
...
...
> > > >> change, if/when needed. If the series was not NAKed,
I'd be happy to
...
...
...
...
...
...
> > > >> do it now.
> > > >>
> > > >>>>
> > > >>>>>
> > > >>>>> Tom suggested not to send patches that are for
private enjoyment to the
...
...
...
...
...
...
> > > >>>>> mailing list.
> > > >>>>
> > > >>>> My contributions to U-Boot are only ever about
private enjoyment :-)
...
...
...
...
...
...
> > > >>>>
> > > >>>> Do you have any comments on the patches?
> > > >>
> > > >> Regards,
> > > >> Simon
> > > >>
> > > >> [1]
https://patchwork.ozlabs.org/project/uboot/patch/20250106144755.3054780-6-sj...
...
...
...
...
...
...
> > > >> [2]
https://lore.kernel.org/u-boot/CAFLszTjxOE_037+kR0jgdax80sBombYo_k0YgiuVnP=K...
...
...
...
...
...
...
> > > >> [3]
https://lore.kernel.org/u-boot/CAC_iWjKtaN54B98OKbkoXkC_GmKJ=x+M4=UY_O6roSOp...
...
...
...
...
...
...
> > > >> [4]
https://lore.kernel.org/u-boot/D513D326-41A6-425E-B11F-85958065BCD2@gmx.de/
...
...
...
...
...
...
> > > >
> > > > Looking at the logging portions of the original series
again, especially
...
...
...
...
...
...
> > > > if this was made generic, we probably don't want to
print to actual
...
...
...
...
...
...
> > > > console every time we're making a note of some memory
allocation for
...
...
...
...
...
...
> > > > example, that would be unreadable outside of a debug
context. The point
...
...
...
...
...
...
> > > > of this really seems to be "log things for verifying in
tests later".
...
...
...
...
...
...
> > > > Does that end up being useful? I don't know. Heinrich
or Ilias, do the
...
...
...
...
...
...
> > > > tests in [1] look generally useful?
> > > >
> > >
> > > The tests in [1] are not documented, not even in the
commit message. So
...
...
...
...
...
...
> > > the reasoning behind the tests remains Simon's secret.
> >
> > Are you asking for code comments in the test? If so, I can
add some.
...
...
...
...
...
...
> >
> > >
> > > At first sight the tests in [1] don't make much sense.
E.g. that only a
...
...
...
...
...
...
> > > subset of memory types have been used does not tell that
the right
...
...
...
...
...
...
> > > memory type has been used for the right object.
> >
> > It is a pretty good start, though. It makes sure that the
memory types
...
...
...
...
...
...
> > are sane, checks addresses are within DRAM, etc. With [5]
it makes
...
...
...
...
...
...
> > sure that devices are removed.
> >
> > >
> > > Implementing a specific tracing functionality for EFI is
definitively
...
...
...
...
...
...
> > > the wrong way forward as it will lead to code duplication.
> >
> > We can cross that bridge when we come to it.
>
> Well, no. It's backwards to make a bridge in one place when
everyone
...
...
...
...
...
...
> agrees it needs to be moved somewhere else. I mean [5] is a
generic
...
...
...
...
...
...
> issue and test/py/tests/test_net_boot.py or some other test
we already
...
...
...
...
...
...
> have which tests booting an OS should confirm that we've
quiesced
...
...
...
...
...
...
> devices before moving on. And as a bonus it's in python where
dealing
...
...
...
...
...
...
> with strings doesn't suck.
I really don't want to write C tests in Python. CI is slow
enough as
...
...
...
...
...
...
it is, something realy want to fix. I'm also not sure how you
can tell
...
...
...
...
...
...
if a device has been removed. Run 'dm tree' and look for the
missing
...
...
...
...
...
...
'star' in the resulting 300 lines of text?
As I'm in a bisect-hell in our C tests you'll have to forgive me
for not
...
...
...
...
...
thinking the C tests are noticeably faster than python tests. Or
that
...
...
...
...
...
they aren't their own potential source of corner-case bugs. But I
digress..
Welcome to my world. I bisected my lab devices so many times to try
to
...
...
...
...
isolate all the breakages that have crept in. What is the problem,
maybe I can help?
Sure. test/cmd/hash.c::dm_test_cmd_hash_md5 fails randomly, in maybe 1
out of 100 runs, via pytest, in sandbox. Not via "./u-boot -T -c 'ut
dm
...
...
...
dm_test_cmd_hash_md5'" however (I stopped checking after 1000
iterations). I was iterating over "and built with clang" but I think
it
...
...
...
happens with gcc too, from the actual failures in CI. And you can use
"-k ut" to limit to just what's matched there, so it's a quicker
iteration.
Hmmm do you have a link? It's hard to imagine what it is, but perhaps
a dependency on a previous test.
Sure:
https://source.denx.de/u-boot/u-boot/-/jobs/993200#L286
My current gut feeling is that we've got some section overlap issue
somewhere. And, FWIW, if I turn off EFI_LOADER I still see it. And if
you use the same binary it won't always fail.
The only way I think the hash command can return CMD_RET_USAGE is if there
are not enough arguments.
The only way that can happen with this code:
ut_assertok(run_command("hash md5 $loadaddr 0", 0));
is if loadaddr is undefined.
But (like you, I suspect) I cannot find how that could be.
In fact, I can't even build with clang:
$ buildman -O clang-17 --bo sandbox -w -o /tmp/b/sandbox-clang
Building current source for 1 boards (1 thread, 32 jobs per thread)
   sandbox:  +   sandbox
+/usr/bin/llvm-ar: error: arch/sandbox/cpu/built-in.o: Opaque pointers are
only supported in -opaque-pointers mode (Producer: 'LLVM17.0.6' Reader:
'LLVM 14.0.0')
+make[2]: *** [scripts/Makefile.build:333: arch/sandbox/cpu/built-in.o]
Error 1
+make[1]: *** [Makefile:1906: arch/sandbox/cpu] Error 2
+make: *** [Makefile:177: sub-make] Error 2
    0    0    1 /1              sandbox
...
...
At present 'ut all' fails so I am going to take a look at that. Quite
a bit of clean-up needed in test system, though. Ideally we could run
the tests in random order so we can find and fix the dependencies. For
driver model we reinit as needed, but that's not the case for EFI, for
example.
Personally, for making pytest faster I'd look at the general
recommendations various blog posts about "make your pytest run faster"
and then go from there.
I think the problem is that you are looking at the C tests through a Python
lens, so everything seems a bit slow.
'ut all' takes about 18 seconds for me.
...
...
...
...
...
And yes, taking a bunch of text and parsing it, is what python is
fast
...
...
...
...
...
at. And easier to write.
...
But actually [5] is not generic, since EFI uses its own code to
remove
...
...
...
...
...
...
devices. This test is solely focussed on EFI.
Yes, you're testing the EFI version of the code in
arch/$(ARCH)/lib/bootm.c. The remove devices functions being
called in
...
...
...
...
...
both cases are generic.
The code in EFI is:
if (!efi_st_keep_devices) {
bootm_disable_interrupts();
if (IS_ENABLED(CONFIG_USB_DEVICE))
udc_disconnect();
board_quiesce_devices();
dm_remove_devices_active();
}
It does call somewhat the same functions, but is doing its own
thing,
...
...
...
...
not even using the arch-specific code. As I mentioned, a nice
clean-up
...
...
...
...
would be to make bootm_announce_and_cleanup() common.
Yes, we almost agree? Both the EFI code, and arch/$(ARCH)/lib/bootm.c
have functions that make the above calls. A nice clean-up would be to
have something common.
Yes indeed. It still does not provide a test for the EFI bootmeth,
though, where I found half a dozen bugs.
Yes, the bootmeth code is it's own thing.
No I mean there were bugs in the EFI implementation.
...
...
...
...
Actually, now that I see efi_st_keep_devices, I wonder why Heinrich
didn't want my ANSI patch[6] which serves a similar function.
No? Your patch disables ANSI output in those tests, that variable is
for
...
...
...
making sure those tests can accomplish (if I skim things right)
similar
...
...
...
kinds of tests you've asked for before, but with an EFI app instead?
But
...
...
...
perhaps better to not start yet another tangent here...
I wouldn't know where to start, anyway...
...
...
...
...
If you want the logging to be renamed and placed centrally I
don't
...
...
...
...
...
...
mind doing it now. But note that only EFI will use it for now.
>
> >
> > >
> > > We already have function _log() which is variadic.
> > >
> > > Simon could write a new log driver that parses the
`format` parameter
...
...
...
...
...
...
> > > and saves the binary data in an appropriate format for
analysis by the
...
...
...
...
...
...
> > > unit tests:
> > >
> > > * For %s the driver should save the string and not the
address of the
...
...
...
...
...
...
> > > string.
> > > * For %pD the driver should save the device path instead
of the pointer.
...
...
...
...
...
...
> > > * ...
> > >
> > > Some changes to the log driver interface will be needed
to pass the
...
...
...
...
...
...
> > > variadic arguments instead of the formatted message.
> >
> > Perhaps the word 'log' is confusing people. But the above
suggestion
...
...
...
...
...
...
> > is quite a complicated way of handling things. We have no
way to
...
...
...
...
...
...
> > decode printf() strings in this way. See log_dispatch() for
how this
...
...
...
...
...
...
> > is handled today. It uses sprintf(). Trying to test based
on text
...
...
...
...
...
...
> > output would be very clumsy (lots of regexes and sscan()
calls?) and
...
...
...
...
...
...
> > result in a huge amount of parsing code, highly dependent
on the
...
...
...
...
...
...
> > printf() format, etc.
> >
> > I very-much doubt that would produce a useful
implementation, but if
...
...
...
...
...
...
> > you would like to try it out then I would be happy to look
at it.
...
...
...
...
...
...
> >
> > I mentioned this several times, but even if we did go that
way, we
...
...
...
...
...
...
> > only have logging on the external calls, so much of the
EFI-memory
...
...
...
...
...
...
> > allocation in U-Boot would not be logged.
> >
> > Regards,
> > Simon
> >
> > [5]
https://patchwork.ozlabs.org/project/uboot/patch/20250106144755.3054780-9-sj...
...
...
...
...
...
...
>
> Yes, calling this a "log" when it's intended for capturing
information
...
...
...
...
...
...
> for tests got some of this off on the wrong track. But that
also helps
...
...
...
...
...
...
> explain now that this is still on the wrong track and should
instead be
...
...
...
...
...
...
> following normal design practices for testing and expanding
existing
...
...
...
...
...
...
> infrastructure and not inventing a new everything. So if you
don't like
...
...
...
...
...
...
> Heinrich's suggestion, take a look at Caleb's suggestion.
I don't have the energy to port the tracing framework from
Linux to
...
...
...
...
...
...
U-Boot, although I agree it would be useful. Still, function
tracing
...
...
...
...
...
...
is quite fragile and confusing to work with when refactoring
code. I
...
...
...
...
...
...
don't like that idea much for this use case, although if
function
...
...
...
...
...
...
tracing did exist in U-Boot I would likely have used it.
I mean yes, it would be good if you went back and expanded on the
trace
...
...
...
...
...
functionality you did before.
I still don't believe it is the best solution and seems like yet
another ocean I should avoid sticking my heater into.
I strongly disagree. If you go back to the trace code you brought in
to
...
...
...
start with and make it more useful / include newer features existing
elsewhere you're not going to end up in conflict with everyone asking
why you're doing something subsystem specific.
Perhaps someone else could do this? It would be a substantial amount
of work to bring runtime tooling into U-Boot, bpf and the like. It
would be quite a pain to use, I suspect, and certainly not possible to
write a simple C test as I have done here.
I'm not sure where bpf and the like comes from in what I said here. But
again, if you find that logging thing you wrote handy, keep it in a
topic branch somewhere and then later on you can "Told You So" the rest
of us later on.
I'm not sure what you are suggesting re tracing, but I assumed it was that
we patch the code at runtime and set tracepoints to check for particular
calls in a test. I'm not sure how else we would get access to the function
arguments. Then, presumably we need a bpf rule to collect the function
args, etc.? It wasn't my idea, so perhaps we should let Caleb explain the
intent.
...
...
...
...
...
...
> And if you
> don't like Caleb's suggestion, go put this in a topic branch
you can
...
...
...
...
...
...
> merge when you need to debug some problem that seemingly
nothing else
...
...
...
...
...
...
> will catch.
Here we are over a year after I reported the bug and we still
don't
...
...
...
...
...
...
have a test to cover it. This series is better than the
available
...
...
...
...
...
...
alternatives, IMO.
Well, no. We have commit dabaa4ae3206 ("dm: Add
dm_remove_devices_active() for ordered device removal") we have a
test
...
...
...
...
...
for the underlying problem. We need more functional boot tests,
but we
...
...
...
...
...
need those to be in python too, and not more C code.
That is a nice improvement, but did not fix the underlying problem.
The underlying problem was that EFI was calling exit-boot-services,
causing U-Boot to free up data structures which were needed to boot.
This was on x86_64. I never quite figured out which one (very hard
when you cannot get back to U-Boot to check).
There were quite a lot of problems, actually. There v2 series is at
[7]
...
...
...
...
Only a C test can check what actually happens inside U-Boot.
Yes, I think now we get back to disagreeing on which symptoms lead to
which code problems and then what to do about them.
OK
...
...
...
And you're not just coming up with a test, you're refactoring a
bunch of
...
...
...
...
...
code and introducing new subsystems in order to do that. When as
I keep
...
...
...
...
...
pointing out, we don't need that. We could easily extend the
existing OS
...
...
...
...
...
boot tests we have to script booting an ISO. And we only run
those when
...
...
...
...
...
say "ENABLE_SLOW_TESTS" is set, and only do that on tagged
releases.
...
...
...
...
Yes of course we need to refactor to make tests work. This is not
necessarily a bad thing, as it helps us break code down into
testable
...
...
...
...
chunks. We cannot rely only on large functional-tests, not that you
are suggesting that. See [8], but they are too slow, too hard to
debug
...
...
...
...
when they fail. They also tend to devolve into chaos as people get
lazy and stop writing unit/smaller tests.
I'll just note that I don't ever even think to use "make tests" or
"qcheck" or any of the others since they never work for me.
Would you mind filing an issue on that? I use 'make pcheck' all the
time.
...
Done:
https://source.denx.de/u-boot/custodians/u-boot-dm/-/issues/33
I didn't file one for "qcheck" which fails differently in that same
environment.
OK, I'll reply there, thanks.
Regards,
Simon