
On 12/10/2022 16.52, Tom Rini wrote:
Option 1 has the benefit that we don't do any of the blob handling, so it just dies right away. Perhaps this is a philosophical question, but it is a little subtle and I'm not actually sure people would notice the difference so long as they get the errors as expected.
The way I'm thinking of it is that there are two cases. The first case is someone who is testing on the hardware that requires these files. Stop ASAP, because either they'll know they forgot to pass the X/Y/Z files, or they'll re-read the board instructions page and see, oh, they need to grab binaries X/Y/Z too. Waiting to collect all the missing-file errors doesn't save time.
Indeed. If I forgot to put lpddr4_pmu_train_1d_dmem_202006.bin in my build folder, it is extremely unlikely that I wouldn't remember the other three lpddr*.bin files at the same time. And even if there's some fifth file I also forgot, re-running make (yes, make) after putting the lpddr*.bin files in place doesn't take long, since the whole tree is already built. I don't think any board really needs more than a handful of blobs (hopefully), and certainly not from more than a few sources (i.e., I count the imx8 lpddr*.bin files as one), so in practice there's not much difference between "stop as soon as one is missing" and "give an error at the end and print all the missing stuff".
Personally, I usually prefer the "stop at the first fatal error" model, because that makes it easier to see the actual problem - some of the follow-on errors/warnings could be due to the first problem, and sometimes not stopping on that first error means the same problem gets printed over and over, making it hard to scroll back to find the first occurrence. Somewhat hand-wavy, yes, and I can't give any good examples.
The counter-problem is that this isn't the first time someone has come up and noted how much time they've wasted because we defaulted to fake binaries. I think we've optimized too much for the people that build a thousand configs all the time (us) instead of the people that build one or two configs at a time (most people?).
To that end, I am really curious what Rasmus has to say here, or anyone else that has a different workflow from you and me.
Indeed, my workflow never involves building a thousand configs; I leave that to upstream's CI.
I have roughly four different ways of building. All of them must fail if the resulting binary is known to be unusable (and I think we all agree on that part), and preferably without having to pass special flags, however the build is done (i.e., failing must be the default, and I also think we're in agreement there), because otherwise I know those flags would sometimes be missed. Just to enumerate:
(1) I do local development, building in my xterm, and testing on target, either by booting the binary with uuu or (for some other boards/SoCs) scp'ing it to the target, or however it can most easily be deployed and tested.
(2) Once this has stabilized, I update our bitbake metadata to point at the new branch/commit, and do a local build with Yocto (which is always done inside a Docker container based on a specific image that we and our customers agree on using). That primarily catches things like missing host tools or libraries that I may happen to have on my dev machine but which are not in the Docker image, or not exposed by Yocto. This can then mean either that the recipe needs to grow a new DEPENDS (a sketch of what that looks like follows this list), or (thankfully pretty rarely) that our Docker image definition needs to be updated.
(3) When I can build with Yocto, it's time for my customer's CI to chew on it. Depending on the project, that sometimes involves automatically deploying the new bootloader to a test rack - which is why it's so important that a build does not pass if the binary is known to be broken.
(4) [And we're not there yet, but pretty close, which is why I've been rather actively pushing stuff upstream over the past few months.] We want to have a CI job set up that automatically merges upstream master into our private branch, at least at every -rc release, build-tests that, and, if successful, deploys it to target; if not (or if the merge cannot be done automatically in the first place), it sends an email so we're aware of the problem before the next release happens. So far, I've found three bugs in v2022.10 that could have been avoided (i.e., fixed before release) if we/I had had this in place.
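
To make the DEPENDS case in (2) concrete, here's a minimal sketch; the tool name is just an illustrative example, not necessarily what our recipe actually needed:

    # In a u-boot_%.bbappend (hypothetical example): the build needs a
    # host tool that my dev machine happens to have but the Docker
    # image / Yocto sysroot does not, so declare it explicitly.
    DEPENDS:append = " swig-native"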
Since I'm writing this wall of text anyway, let me explain how I was bitten by the build not failing: I had added a new blob requirement (not a "proprietary" thing, just changing the logic so that the boot script is included in the U-Boot FIT image as a "loadable", and thus automatically loaded to a known location by SPL, instead of having U-Boot itself load it from somewhere), and locally added that bootscript.itb to my build folder when testing. I had also duly updated the bitbake recipe to copy bootscript.itb to "${B}" before do_compile, but failed to remember that that was not the right place to put it, because the actual build folder is "${B}/<config name>". My own Yocto build succeeded, I deployed the binary to the board on my desk, and it didn't work... Only then did I go look in do_compile.log and find the warning(s).
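
For reference, what the recipe needed was something along these lines - a sketch only, assuming the usual UBOOT_CONFIG/UBOOT_MACHINE setup from u-boot.inc (so each config builds in its own subdirectory of ${B}) and that bootscript.itb is listed in SRC_URI so it ends up in ${WORKDIR}:

    # Hypothetical fix in the u-boot .bbappend: put bootscript.itb into
    # each per-config build directory rather than ${B} itself, since
    # that is the directory the build actually runs in.
    do_compile:prepend() {
        for config in ${UBOOT_MACHINE}; do
            install -m 0644 ${WORKDIR}/bootscript.itb ${B}/${config}/
        done
    }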
Rasmus