
Hi Alex,
On Tue, 23 Feb 2021 at 17:23, Alex G. mr.nuke.me@gmail.com wrote:
On 2/23/21 3:18 PM, Simon Glass wrote:
Hi Alex,
On Tue, 23 Feb 2021 at 14:48, Alex G. mr.nuke.me@gmail.com wrote:
On 2/23/21 1:07 PM, Mark Kettenis wrote:
Hi Simon,
Commit c5819701a3de61e2ba2ef7ad0b616565b32305e5 broke the build on OpenBSD and probably other non-Linux systems. ENODATA, which is now used in fit_check_format(), isn't defined. It isn't part of POSIX[1] and generally not available on BSD-derived systems. Could you pick another error code for this case?
Hi Mark,
I looked at the commit you mentioned, and I think it's fundamentally broken. The errors represent -EINVAL, and trying to assign different error codes doesn't make sense.
"Wrong FIT format: no images parent node": -ENOENT "No such file or directory". This just doesn't make sense. We obviously have the file data at this point, and we know the data is wrong. This should be -EINVAL.
"Wrong FIT format: no description": -ENOMSG "No message of desired type". Again, this doesn't make sense. We're not dealing with messaging APIs or send()/recv(). I think this should be -EINVAL.
"Wrong FIT format: not a flattened device tree": -ENOEXEC "Exec format error" This one is amusing, as it's comparing a flattened devicetree to an executable. An FDT might have executable code, which is in the wrong format, but this is not why we're failing here.
Simon, I'd suggest using the correct error code, which, for each case is -EINVAL, as the log messages also confirm: "Wrong [input value] format". We might have issues with the "configurations", an "@" in a signature name, and so forth. There just aren't enough error codes to cover the set of possible failures. And in any case, there likely can't be a reasonable 1:1 mapping to _distinct_ errno codes.
Does any user even check the error code beyond "less than zero"? Take different decisions based on what the negative code indicates? If information as to what is wrong with the input value (FIT) is needed, then I'd suggest using a separate enum, and stick to -EINVAL.
Actually I make an effort to use different codes where possible, so there is some indication what went wrong. Of course devs can whip out the JTAG debugger or start filling the code with printf()s but normal users cannot, so having an idea what is wrong is helpful.
We don't have to cover every case, but years ago U-Boot used to return -1 for lots of failures and it was certainly frustrating to debug things.
I agree with most of these arguments. And I agree with using errno codes to represent errno codes. However, when we deviate from the agreed upon convention, can we still apply the said convention? Each function acquires its own set of rules. And when each function has its own set of rules, the source code is needed to derive the meaning.
+Tom Rini too
I am not a fan of each function having its own rules, but the overall U-Boot rules are very broad. Something like:
-EPERM - the old -1 -ENOENT - entry or object not found -EIO - failed to perform I/O -ENXIO - couldn't find device/address -EAGAIN - try later (e.g. dependencies not ready) -ENOMEM - out of memory -EINVAL - dev_read_...() failed -ENODEV - do not bind device -ENOSPC - ran out of space -EREMOTEIO - cannot talk to peripheral, e.g. i2c -EPFNOSUPPORT - missing uclass
There are others. If we only used these it would not be much better than using -1 / -EPERM.
You make the argument that these codes give normal users an idea of what is wrong. I assume that normal users respond better to human-readable strings than to negative integers -- for which they would have to go to he source code anyway to decipher the meaning. Because, in order to be useful, error codes require the, they cannot be useful for normal users.
The problem is that we don't want to print strings willy nilly in drivers and library code. That makes it manageable for higher-level code that is actually in control. If an object is not found and an error is reported, but the caller knows it is optional and continues, the user sees the error and assumes something is wrong. Better to print errors just in the top-level code. At that point all we have is the error number. As things are today, top-level code can use that to print useful messages if it wants to, but if all the numbers returned are the same, it can't.
I believe this rebukes your central point around the unconventional use of errno codes.
To the extent that it is unconventional, that reflects the decision to avoid adding U-Boot-specific error numbers and perhaps also to avoid having a different error number for each possible failure in U-Boot.
So then the question is how to cover error cases without returning '-1', and without making things a nightmare to debug.
If you need to tell the user that there are "no images parent node", then tell the user -ENOFDTIMAGESNODE, or FIT_ERROR_NO_IMAGES_NODE. How can someone know that -ENOENT really comes from fit_check_format() instead of the FAT code, and really means "FIT has no images node" instead of "there is no FIT file"? I guess we could bust out the old JTAG to check.
For the case you mention, the FIT code would have to call into FAT, or vice versa, otherwise the top-level code would see the two errors separately, one after calling FIT and another after calling FAT. Typically subsystems are not linked that deeply. So in practice the current scheme works fairly well. Also, subsystems to have the option to change the error code on the way back up the stack. For example, libfdt functions return their own error numbers and these are typically converted to ernno values.
We don't invent our own error numbers at present, nor even #define new names for existing ones. I can see some small appeal to the latter, but it does not obviate the need to comment a function's return value, so there is little benefit in it other than the name. We should avoid adding distinctions without a difference. When code needs to deal with errors from lower levels, a smaller set of well-known error numbers has some appeal.
BTW -EINVAL is mostly reserved for of_to_plat() failure in U-Boot. It indicates something is wrong with your devicetree data for a device.
Reserving -EINVAL for a special class of input value errors, but not others is breaking convention, so all my arguments above apply.
Perhaps -EDOM for generic invalid function args? But unless this comes from user data, it is likely a programming error so should not happen / is caught by tests.
Regards, Simon