Re: Broken build on OpenBSD

25 Feb 2021


      Hi Alex,
On Tue, 23 Feb 2021 at 17:23, Alex G. mr.nuke.me@gmail.com wrote:
...
On 2/23/21 3:18 PM, Simon Glass wrote:
...
Hi Alex,
On Tue, 23 Feb 2021 at 14:48, Alex G. mr.nuke.me@gmail.com wrote:
...
On 2/23/21 1:07 PM, Mark Kettenis wrote:
...
Hi Simon,
Commit c5819701a3de61e2ba2ef7ad0b616565b32305e5 broke the build on
OpenBSD and probably other non-Linux systems.  ENODATA, which is now
used in fit_check_format(), isn't defined.  It isn't part of POSIX[1]
and generally not available on BSD-derived systems.  Could you pick
another error code for this case?
Hi Mark,
I looked at the commit you mentioned, and I think it's fundamentally
broken. The errors represent -EINVAL, and trying to assign different
error codes doesn't make sense.
"Wrong FIT format: no images parent node":
-ENOENT "No such file or directory".
This just doesn't make sense. We obviously have the file data at this
point, and we know the data is wrong. This should be -EINVAL.
"Wrong FIT format: no description":
-ENOMSG "No message of desired type".
Again, this doesn't make sense. We're not dealing with messaging APIs or
send()/recv(). I think this should be -EINVAL.
"Wrong FIT format: not a flattened device tree":
-ENOEXEC "Exec format error"
This one is amusing, as it's comparing a flattened devicetree to an
executable. An FDT might have executable code, which is in the wrong
format, but this is not why we're failing here.
Simon,
I'd suggest using the correct error code, which, for each case is
-EINVAL, as the log messages also confirm: "Wrong [input value] format".
We might have issues with the "configurations", an "@" in a signature
name, and so forth. There just aren't enough error codes to cover the
set of possible failures. And in any case, there likely can't be a
reasonable 1:1 mapping to _distinct_ errno codes.
Does any user even check the error code beyond "less than zero"? Take
different decisions based on what the negative code indicates? If
information as to what is wrong with the input value (FIT) is needed,
then I'd suggest using a separate enum, and stick to -EINVAL.
Actually I make an effort to use different codes where possible, so
there is some indication what went wrong. Of course devs can whip out
the JTAG debugger or start filling the code with printf()s but normal
users cannot, so having an idea what is wrong is helpful.
We don't have to cover every case, but years ago U-Boot used to return
-1 for lots of failures and it was certainly frustrating to debug
things.
I agree with most of these arguments. And I agree with using errno codes
to represent errno codes. However, when we deviate from the agreed upon
convention, can we still apply the said convention? Each function
acquires its own set of rules. And when each function has its own set of
rules, the source code is needed to derive the meaning.
+Tom Rini too
I am not a fan of each function having its own rules, but the overall
U-Boot rules are very broad. Something like:
-EPERM - the old -1
-ENOENT - entry or object not found
-EIO - failed to perform I/O
-ENXIO - couldn't find device/address
-EAGAIN - try later (e.g. dependencies not ready)
-ENOMEM - out of memory
-EINVAL - dev_read_...() failed
-ENODEV - do not bind device
-ENOSPC - ran out of space
-EREMOTEIO - cannot talk to peripheral, e.g. i2c
-EPFNOSUPPORT - missing uclass
There are others. If we only used these it would not be much better
than using -1 / -EPERM.
...
You make the argument that these codes give normal users an idea of what
is wrong. I assume that normal users respond better to human-readable
strings than to negative integers -- for which they would have to go to
he source code anyway to decipher the meaning. Because, in order to be
useful, error codes require the, they cannot be useful for normal users.
The problem is that we don't want to print strings willy nilly in
drivers and library code. That makes it manageable for higher-level
code that is actually in control. If an object is not found and an
error is reported, but the caller knows it is optional and continues,
the user sees the error and assumes something is wrong. Better to
print errors just in the top-level code. At that point all we have is
the error number. As things are today, top-level code can use that to
print useful messages if it wants to, but if  all the numbers returned
are the same, it can't.
...
I believe this rebukes your central point around the unconventional use
of errno codes.
To the extent that it is unconventional, that reflects the decision to
avoid adding U-Boot-specific error numbers and perhaps also to avoid
having a different error number for each possible failure in U-Boot.
...
So then the question is how to cover error cases without returning '-1',
and without making things a nightmare to debug.
If you need to tell the user that there are "no images parent node",
then tell the user -ENOFDTIMAGESNODE, or FIT_ERROR_NO_IMAGES_NODE. How
can someone know that -ENOENT really comes from fit_check_format()
instead of the FAT code, and really means "FIT has no images node"
instead of "there is no FIT file"? I guess we could  bust out the old
JTAG to check.
For the case you mention, the FIT code would have to call into FAT, or
vice versa, otherwise the top-level code would see the two errors
separately, one after calling FIT and another after calling FAT.
Typically subsystems are not linked that deeply. So in practice the
current scheme works fairly well. Also, subsystems to have the option
to change the error code on the way back up the stack. For example,
libfdt functions return their own error numbers and these are
typically converted to ernno values.
We don't invent our own error numbers at present, nor even #define new
names for existing ones. I can see some small appeal to the latter,
but it does not obviate the need to comment a function's return value,
so there is little benefit in it other than the name. We should avoid
adding distinctions without a difference. When code needs to deal with
errors from lower levels, a smaller set of well-known error numbers
has some appeal.
...
...
BTW -EINVAL is mostly reserved for of_to_plat() failure in U-Boot. It
indicates something is wrong with your devicetree data for a device.
Reserving -EINVAL for a special class of input value errors, but not
others is breaking convention, so all my arguments above apply.
Perhaps -EDOM for generic invalid function args? But unless this comes
from user data, it is likely a programming error so should not happen
/ is caught by tests.
Regards,
Simon