
Apologies for the late public reply.
On Thu, Oct 19, 2023 at 03:06:48PM +0100, Peter Robinson wrote:
On Fri, Oct 13, 2023 at 6:48 PM Tom Rini trini@konsulko.com wrote:
On Fri, Oct 13, 2023 at 05:22:03PM +0100, Peter Robinson wrote:
On Fri, Oct 13, 2023 at 5:09 PM Peter Robinson pbrobinson@gmail.com wrote:
On Tue, Oct 10, 2023 at 3:58 PM Simon Glass sjg@chromium.org wrote:
Hi,
On Tue, 10 Oct 2023 at 04:39, Guillaume Gardet Guillaume.Gardet@arm.com wrote:
> -----Original Message-----
> From: Peter Robinson pbrobinson@gmail.com
> Sent: Tuesday, October 10, 2023 12:22 PM
> To: Guillaume Gardet Guillaume.Gardet@arm.com
> Cc: mbrugger@suse.com; Ivan Ivanov ivan.ivanov@suse.com; Simon Glass sjg@chromium.org; u-boot@lists.denx.de
> Subject: Re: U-Boot 2023.10 does not boot from uSD on RPi4
>
> On Tue, Oct 10, 2023 at 10:26 AM Guillaume Gardet Guillaume.Gardet@arm.com wrote:
> >
> > Hi,
> >
> > U-Boot 2023.10 does not boot from uSD on RPi4.
> > This has been found on openSUSE Tumbleweed. The only diff we need is:
> > -CONFIG_OF_EMBED=y
> > +CONFIG_OF_BOARD=y
> > To use firmware provided Device Tree. But that should not affect the mmc behavior too much, I think.
>
> I've been booting Fedora fine on a RPi4 BUT there's issues with the display
> turning off [1] when the accelerated display modules load
> (vc4) as a result of this patch set. Can you confirm if that's the same problem
> you're seeing?
No, that's not my problem. My issue is grub was not loaded by u-boot from uSD. It seems more like Simon's problem: https://lists.denx.de/pipermail/u-boot/2023-October/533162.html
@Simon, can you check if the patch below fixes your boot problem on RPi4, please?
This has been reported at least twice before. There is a fix [2] which is in my queue to apply.
Looking at that patch it scans the first 3 devices, how does it handle non storage devices like SDIO WiFi modules? It shouldn't be trying to scan those.
And in the case of the RPi the other enabled SDHCI interface is the WiFi, why would we even be trying to boot off a non storage interface, something here just feels broken/wrong in general.
The patch does make it work with pure upstream, and I'm not sure why the Fedora build boots it fine out of the box, but the patch just feels like it's hacking around some other underlying problem with bootstd, we didn't have this problem with the previous method and trying to boot off non storage interfaces feels like it could cause other problems.
I think the answer here is that we're doing the best we can given that we just don't know until run time what we have. In the case where sdhci
Well, that's not entirely true in the case of mmc/sdhci. We know what each device is, whether an mSD, an eMMC or a WiFi interface, and those don't change from boot to boot: an SDHCI interface on one boot is not mysteriously going to become an eMMC storage unit on the next boot.
Getting in to specifics here, I believe one of the issues is that RPi 3 uses SDHCI #1 for WiFi and #0 for micro SD card and RPi 4 is the other way. So rpi_arm64 U-Boot binaries have to talk at both devices to see what's there. We _should_ be doing this in such a way that we discover both quickly enough and safely enough that we don't have a block device and stop. It's possible there's some quirk handling code needed from the upstream sdhci drivers for these chipsets, but I don't know them well enough.
is something other than storage, we get as far as asking "are you a block device?" which then fails when sdhci is a WiFi and not an eMMC. This does mean the user could notice "Card did not respond to voltage select! : -110" being printed, and I don't know if we should do anything about that (it's a handy message when your uSD isn't fully inserted, etc). But since we (can) support everything on a single build, we just have to figure it out at run time.
It has caused issues and it causes bug reports from users which is an issue for me as a maintainer as it wastes my time. In short it's not a great user experience.
Are we talking about the "Card did not ..." messages? If so, maybe we should lower the priority there from pr_err. If the probe itself leads to further errors, I would _really_ love to see the reports and how to reproduce it. As best I can see through the code, we're doing things safely and the command/response is "this is not a block device" and stop.
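To make the intended behavior concrete, here is a rough sketch of the run-time scan described above. It is plain Python, not U-Boot code: `Controller`, `probe_block_device` and `find_boot_candidates` are hypothetical names made up for illustration, and the only detail taken from this thread is the -110 (timeout) result when nothing answers as a storage card. The idea is that a controller which does not answer is skipped and logged at debug level rather than as an error, and the scan moves on to the next one.

```python
import logging
from dataclasses import dataclass

ETIMEDOUT = 110  # errno value; reported as -110 in the message quoted above

log = logging.getLogger("mmc_scan")


@dataclass
class Controller:
    """Hypothetical stand-in for one SDHCI controller."""
    name: str
    has_card: bool  # True for uSD/eMMC, False for e.g. an SDIO WiFi module


def probe_block_device(ctrl: Controller) -> int:
    """Return 0 if a storage card answered, else a negative errno."""
    return 0 if ctrl.has_card else -ETIMEDOUT


def find_boot_candidates(controllers):
    """Scan every controller and keep only those that are block devices."""
    candidates = []
    for ctrl in controllers:
        ret = probe_block_device(ctrl)
        if ret == -ETIMEDOUT:
            # Not a block device (or no card inserted): note it quietly, move on.
            log.debug("%s: card did not respond (%d), skipping", ctrl.name, ret)
            continue
        if ret < 0:
            # Any other failure is unexpected and still worth an error.
            log.error("%s: probe failed (%d)", ctrl.name, ret)
            continue
        candidates.append(ctrl.name)
    return candidates
```

With this shape, the same scan copes with either ordering of the WiFi and uSD controllers, which is the RPi 3 versus RPi 4 situation mentioned earlier.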
Overall the last few U-Boot releases have been a nightmare from my PoV, I have spent *all* my available time for upstream U-Boot dealing with regressions.
First, I appreciate your time here. We've all been testing things to the best of our time and resources, but that's everyone's constraint too. And some of the regressions I believe you've had to deal with are unfortunate and part of how other components we have to use are / aren't documented. I hope we've gotten the Rockchip side of things sorted out now but I think one or two of those cases truly confused most of us.
In the case of the RPi I currently have 3 issues, 1) display 2) mSD 3) USB (that Ivan has also mentioned). The 3 of these together make things very hard to bisect and I am struggling. I also have 3 other devices with issues I'm trying to debug for the Fedora release, and the asahi people have also reported [1] regressions in their fork. I honestly regret applying the bootstd patches.
For the record, Peter and I worked on these off-list. The display one (and some other Fedora issues) came down to the expected device tree not being the one that was passed over to Linux. That wasn't bootstd's fault either. We didn't sync up on the mSD issue. The USB one that Ivan mentioned has been fixed, and was a bug (and missing test) that Simon has addressed in the thread where that was fixed.
Looking at the USB XHCI error, I think this is yet another case of U-Boot being in a "stuck" position as Marek has been asking for someone to please work on re-syncing the driver with the kernel portion but simply not having the time to do it. I believe Eugen Hristev has volunteered to start re-syncing (incrementally rather than all the way up to current).
When even Simon [2] is losing track of things I think we need to change approach, the problems here upstream are nearly breaking me and for Fedora I am now considering just forking U-Boot and cherry picking the patches from upstream we need for particular devices and features. It's absolutely not something I want to do but I feel it's getting to the point I need to do it for the Fedora users and my sanity.
I want to say that it seems that one high level issue was that U-Boot's device tree was the one being passed to the Fedora kernel and not the device tree from the kernel and so issues that arose once Linux was booted were from that. And that was not a bootstd issue but the combination of using the bootmenu and efi bootmgr, which did not end up loading the expected dtb.
I like the concept of bootstd and other features but the quantity of patches, and sometimes other series of change for change's sake, where the testing is clearly either not there, or is relying on "it works on CI" [3] (and other examples) and is clearly not tested on real HW makes some of the churn hugely problematic,
Here's one problem we all have, and I'm not sure how we can fully address it. Today, I put all of my merges through our pytest tests, on real hardware, on a few platforms. I also try and remember to at least towards the end of the release fire up the console on them and let whatever OS I had installed autoboot and come up to a prompt. But since it's manual testing, it doesn't happen consistently. I'm working towards getting a lab setup here (Konsulko) and using lab manager to manage the devices. Then do what needs doing so that kernelci can run the U-Boot pytests. And then have at least some of the quick kernelci tests themselves fire off. But this is a thing I want, not a business case, so work progresses when our engineers have spare time.
We also have some companies doing upstream testing at their own frequency and availability. I know in the past NVIDIA folks have mentioned they monitor my tree and run pytest on their hardware and speak up when it breaks, but I haven't heard from them in a while. I also know TI does regular testing of upstream on a large number of their platforms. And Toradex has recently given a talk about how they test a combination of upstream components (U-Boot, Kernel, OE) regularly and report regressions (and just did re some iMX changes). Oh, and I know Linaro is doing some specific testing too, but I don't recall the details (and Ilias has offered to walk me through it). I don't know what else is being tested out there, with regularity.
In some ways it's understandable that we don't have as much hardware testing going on compared with the Linux Kernel in that you either need a platform that lets you load firmware via USB/UART/etc or you need some type of sdmux board which isn't commercially available (but are made+sold by people). I'd love more hardware testing. And I want to get more progress on making our pytest suite be able to be triggered by kernelci, or at least lab manager or something as that might reduce the overhead for other groups with labs to turn this testing on. But it's all volunteer, and it depends on what people have available to them. Since you've mentioned bootstd, I know Simon tested what he has available at a number of combinations of distributions, and that's how we've gotten a lot of issues addressed before it came to trying it out on Pi and so getting picked up by Fedora/SuSE/others.
similarly the applying of patches when there's been opposition and push back for the sake of it (eg NFSv1 patch) as is things like force enabling people's pet projects (looking at VBE here) where there's no actual real world users and real security ramifications (alternate unaudited boot methods of devices) also adds to my thought process for forking.
So this is where things are a lot trickier I think. One person's pet-project is another person's production use-case. I don't get the NFSv1 use case myself, but someone that is using it for production work, and has been for a while, contributed it upstream. Yay for new contributors, that's how we grow. And it didn't impact the overall build size by much (which is a common concern when adding a new feature). In hindsight, yes, we also should have stopped enabling NFS by default since it (and especially the forms we support) is a legacy protocol.
VBE is a chicken-and-egg problem. Is it widely used right now? No. But it also builds off of (iirc) the ChromeOS way of doing secure boot and some of the lessons learned there over the years, and leverages old and well tested at this point technologies like signed FIT images to solve the problems that people are trying to figure out how exactly to solve instead with UKI, on the EFI side. And not everyone is in agreement that the EFI path is the best path forward in every case for modern chips. So is Fedora right to disable VBE by default? Sure. But also, personally, I'm tired of "security" as a reason. We let users modify arbitrary locations in memory with arbitrary values by default and load and execute arbitrary payloads. Do we protect ourselves at runtime now? Yes, sure. Could someone work around that? Yes. To be clear, I do see "make a secure U-Boot that users can Trust" as a good and valid use case. And yes, I see "companies want to Trust their deployed platform" as a valid use case too. So if you have an end goal of "Fedora ships U-Boot that users can Trust", disabling VBE isn't the first step, but working with Simon on the things he's doing so that you can't drop down to a prompt and start modifying memory should be on that list.
One of those things I do for every pull request / merge of my branches is do a world build before/after, and see what's growing where size-wise, and for what platforms. I try and keep global behavior from changing without reason, be it bug fix or pretty important new feature.
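The before/after comparison can be pictured with a small sketch like the one below. This is only an illustration in Python, not the tooling actually used for the world build: `size_growth` is a hypothetical helper, and the board names and byte counts in the usage note are made up. It takes per-board image sizes from two builds and reports which boards changed and by how much, largest growth first.

```python
def size_growth(before, after):
    """Compare per-board image sizes (in bytes) from two builds.

    Both arguments map a board name to its image size. Boards missing from
    either build are ignored. Returns (board, delta) pairs for every board
    whose size changed, sorted with the largest growth first.
    """
    deltas = {}
    for board, new_size in after.items():
        old_size = before.get(board)
        if old_size is None:
            continue  # board was not built before; nothing to compare against
        delta = new_size - old_size
        if delta != 0:
            deltas[board] = delta
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)
```

For example, with made-up numbers, `size_growth({"rpi_arm64": 600000, "sandbox": 1000}, {"rpi_arm64": 600512, "sandbox": 900})` reports that `rpi_arm64` grew by 512 bytes and `sandbox` shrank by 100.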
I feel we as a project need to have a proper discussion about these things.
Yes, we should all talk more. Maybe we're long enough in to COVID now that some of the virtual meeting fatigue has subsided, and we take a page from OpenEmbedded and set up a regular time-rolling video/audio chat.
And building off of something I had mentioned to you, yes, I do need to reach out to more people, more often, myself. So this is also an invitation to anyone else reading along and saying to themselves that I've missed something or I'm wrong about something or just need to tell me something, send me an email, and if you want to talk, we can schedule something. And I should email a number of people directly too, with that message.