Broken build on OpenBSD

Mark Kettenis

23 Feb 2021

8:07 p.m.

Hi Simon,

Commit c5819701a3de61e2ba2ef7ad0b616565b32305e5 broke the build on OpenBSD and probably other non-Linux systems. ENODATA, which is now used in fit_check_format(), isn't defined. It isn't part of POSIX[1] and generally not available on BSD-derived systems. Could you pick another error code for this case?

Thanks,

Mark

[1] It is mentioned in the Open Group Base Specification, however it is part of the obsolete XSI STREAMS extension which was never part of POSIX proper.

Alex G.

23 Feb

8:48 p.m.

On 2/23/21 1:07 PM, Mark Kettenis wrote:

...

Hi Simon,

Commit c5819701a3de61e2ba2ef7ad0b616565b32305e5 broke the build on OpenBSD and probably other non-Linux systems. ENODATA, which is now used in fit_check_format(), isn't defined. It isn't part of POSIX[1] and generally not available on BSD-derived systems. Could you pick another error code for this case?

Hi Mark,

I looked at the commit you mentioned, and I think it's fundamentally broken. The errors represent -EINVAL, and trying to assign different error codes doesn't make sense.

"Wrong FIT format: no images parent node": -ENOENT "No such file or directory". This just doesn't make sense. We obviously have the file data at this point, and we know the data is wrong. This should be -EINVAL.

"Wrong FIT format: no description": -ENOMSG "No message of desired type". Again, this doesn't make sense. We're not dealing with messaging APIs or send()/recv(). I think this should be -EINVAL.

"Wrong FIT format: not a flattened device tree": -ENOEXEC "Exec format error" This one is amusing, as it's comparing a flattened devicetree to an executable. An FDT might have executable code, which is in the wrong format, but this is not why we're failing here.

Simon, I'd suggest using the correct error code, which, for each case is -EINVAL, as the log messages also confirm: "Wrong [input value] format". We might have issues with the "configurations", an "@" in a signature name, and so forth. There just aren't enough error codes to cover the set of possible failures. And in any case, there likely can't be a reasonable 1:1 mapping to _distinct_ errno codes.

Does any user even check the error code beyond "less than zero"? Take different decisions based on what the negative code indicates? If information as to what is wrong with the input value (FIT) is needed, then I'd suggest using a separate enum, and stick to -EINVAL.
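A minimal sketch of the "separate enum, stick to -EINVAL" idea follows. Everything in it is hypothetical: `struct fit_summary`, `enum fit_format_problem` and `check_fit_summary()` are illustrative names, not existing U-Boot code, and the three checks only mirror the log messages quoted above.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical reason codes; these names are illustrative, not from U-Boot. */
enum fit_format_problem {
	FIT_PROBLEM_NONE = 0,
	FIT_PROBLEM_NOT_FDT,
	FIT_PROBLEM_NO_DESCRIPTION,
	FIT_PROBLEM_NO_IMAGES_NODE,
};

/* Simplified stand-in for the properties a real check would read from the FIT. */
struct fit_summary {
	int is_fdt;
	int has_description;
	int has_images_node;
};

/*
 * Returns 0 if the summary looks valid, otherwise -EINVAL.
 * If 'problem' is non-NULL, it receives the specific reason.
 */
int check_fit_summary(const struct fit_summary *fit,
		      enum fit_format_problem *problem)
{
	enum fit_format_problem why = FIT_PROBLEM_NONE;

	if (!fit->is_fdt)
		why = FIT_PROBLEM_NOT_FDT;
	else if (!fit->has_description)
		why = FIT_PROBLEM_NO_DESCRIPTION;
	else if (!fit->has_images_node)
		why = FIT_PROBLEM_NO_IMAGES_NODE;

	if (problem)
		*problem = why;

	return why == FIT_PROBLEM_NONE ? 0 : -EINVAL;
}
```

Callers that only test for "less than zero" keep working unchanged, while callers that want detail can pass a pointer and inspect the reason.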

Alex

Simon Glass

10:18 p.m.

Hi Alex,

On Tue, 23 Feb 2021 at 14:48, Alex G. mr.nuke.me@gmail.com wrote:

...

On 2/23/21 1:07 PM, Mark Kettenis wrote:

...
Hi Simon,

Commit c5819701a3de61e2ba2ef7ad0b616565b32305e5 broke the build on OpenBSD and probably other non-Linux systems. ENODATA, which is now used in fit_check_format(), isn't defined. It isn't part of POSIX[1] and generally not available on BSD-derived systems. Could you pick another error code for this case?

Hi Mark,

I looked at the commit you mentioned, and I think it's fundamentally broken. The errors represent -EINVAL, and trying to assign different error codes doesn't make sense.

"Wrong FIT format: no images parent node": -ENOENT "No such file or directory". This just doesn't make sense. We obviously have the file data at this point, and we know the data is wrong. This should be -EINVAL.

"Wrong FIT format: no description": -ENOMSG "No message of desired type". Again, this doesn't make sense. We're not dealing with messaging APIs or send()/recv(). I think this should be -EINVAL.

"Wrong FIT format: not a flattened device tree": -ENOEXEC "Exec format error" This one is amusing, as it's comparing a flattened devicetree to an executable. An FDT might have executable code, which is in the wrong format, but this is not why we're failing here.

Simon, I'd suggest using the correct error code, which, for each case is -EINVAL, as the log messages also confirm: "Wrong [input value] format". We might have issues with the "configurations", an "@" in a signature name, and so forth. There just aren't enough error codes to cover the set of possible failures. And in any case, there likely can't be a reasonable 1:1 mapping to _distinct_ errno codes.

Does any user even check the error code beyond "less than zero"? Take different decisions based on what the negative code indicates? If information as to what is wrong with the input value (FIT) is needed, then I'd suggest using a separate enum, and stick to -EINVAL.

Actually I make an effort to use different codes where possible, so there is some indication what went wrong. Of course devs can whip out the JTAG debugger or start filling the code with printf()s but normal users cannot, so having an idea what is wrong is helpful.

We don't have to cover every case, but years ago U-Boot used to return -1 for lots of failures and it was certainly frustrating to debug things.

BTW -EINVAL is mostly reserved for of_to_plat() failure in U-Boot. It indicates something is wrong with your devicetree data for a device.

Regards, Simon

Alex G.

11:23 p.m.

On 2/23/21 3:18 PM, Simon Glass wrote:

...

Hi Alex,

On Tue, 23 Feb 2021 at 14:48, Alex G. mr.nuke.me@gmail.com wrote:

...
On 2/23/21 1:07 PM, Mark Kettenis wrote:

...
Hi Simon,

Commit c5819701a3de61e2ba2ef7ad0b616565b32305e5 broke the build on OpenBSD and probably other non-Linux systems. ENODATA, which is now used in fit_check_format(), isn't defined. It isn't part of POSIX[1] and generally not available on BSD-derived systems. Could you pick another error code for this case?

Hi Mark,

I looked at the commit you mentioned, and I think it's fundamentally broken. The errors represent -EINVAL, and trying to assign different error codes doesn't make sense.

"Wrong FIT format: no images parent node": -ENOENT "No such file or directory". This just doesn't make sense. We obviously have the file data at this point, and we know the data is wrong. This should be -EINVAL.

"Wrong FIT format: no description": -ENOMSG "No message of desired type". Again, this doesn't make sense. We're not dealing with messaging APIs or send()/recv(). I think this should be -EINVAL.

"Wrong FIT format: not a flattened device tree": -ENOEXEC "Exec format error" This one is amusing, as it's comparing a flattened devicetree to an executable. An FDT might have executable code, which is in the wrong format, but this is not why we're failing here.

Simon, I'd suggest using the correct error code, which, for each case is -EINVAL, as the log messages also confirm: "Wrong [input value] format". We might have issues with the "configurations", an "@" in a signature name, and so forth. There just aren't enough error codes to cover the set of possible failures. And in any case, there likely can't be a reasonable 1:1 mapping to _distinct_ errno codes.

Does any user even check the error code beyond "less than zero"? Take different decisions based on what the negative code indicates? If information as to what is wrong with the input value (FIT) is needed, then I'd suggest using a separate enum, and stick to -EINVAL.

Actually I make an effort to use different codes where possible, so there is some indication what went wrong. Of course devs can whip out the JTAG debugger or start filling the code with printf()s but normal users cannot, so having an idea what is wrong is helpful.

We don't have to cover every case, but years ago U-Boot used to return -1 for lots of failures and it was certainly frustrating to debug things.

I agree with most of these arguments. And I agree with using errno codes to represent errno codes. However, when we deviate from the agreed upon convention, can we still apply the said convention? Each function acquires its own set of rules. And when each function has its own set of rules, the source code is needed to derive the meaning.

You make the argument that these codes give normal users an idea of what is wrong. I assume that normal users respond better to human-readable strings than to negative integers -- for which they would have to go to the source code anyway to decipher the meaning. Because error codes require the source code in order to be useful, they cannot be useful for normal users.

I believe this rebuts your central point around the unconventional use of errno codes.

So then the question is how to cover error cases without returning '-1', and without making things a nightmare to debug.

If you need to tell the user that there are "no images parent node", then tell the user -ENOFDTIMAGESNODE, or FIT_ERROR_NO_IMAGES_NODE. How can someone know that -ENOENT really comes from fit_check_format() instead of the FAT code, and really means "FIT has no images node" instead of "there is no FIT file"? I guess we could bust out the old JTAG to check.
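The ambiguity can be sketched as follows, assuming a single caller chains a file lookup and a format check and passes errors up unchanged. The functions `load_file()`, `check_format()` and `load_and_check()` are hypothetical stand-ins, not U-Boot functions.

```c
#include <errno.h>

/* Hypothetical stand-in for a filesystem lookup: -ENOENT if the file is missing. */
static int load_file(int file_exists)
{
	return file_exists ? 0 : -ENOENT;
}

/* Hypothetical stand-in for a format check: -ENOENT if the images node is missing. */
static int check_format(int has_images_node)
{
	return has_images_node ? 0 : -ENOENT;
}

/* Propagates the first failure unchanged, as error-passing code often does. */
int load_and_check(int file_exists, int has_images_node)
{
	int ret;

	ret = load_file(file_exists);
	if (ret)
		return ret;

	return check_format(has_images_node);
}
```

Under these assumptions, `load_and_check(0, 1)` and `load_and_check(1, 0)` return the same value, so the caller cannot tell "no file" from "no images node" by the number alone.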

...

BTW -EINVAL is mostly reserved for of_to_plat() failure in U-Boot. It indicates something is wrong with your devicetree data for a device.

Reserving -EINVAL for a special class of input value errors, but not others is breaking convention, so all my arguments above apply.

Alex

Simon Glass

25 Feb

8:31 p.m.

Hi Alex,

On Tue, 23 Feb 2021 at 17:23, Alex G. mr.nuke.me@gmail.com wrote:

...

On 2/23/21 3:18 PM, Simon Glass wrote:

...
Hi Alex,

On Tue, 23 Feb 2021 at 14:48, Alex G. mr.nuke.me@gmail.com wrote:

...
On 2/23/21 1:07 PM, Mark Kettenis wrote:

...
Hi Simon,

Commit c5819701a3de61e2ba2ef7ad0b616565b32305e5 broke the build on OpenBSD and probably other non-Linux systems. ENODATA, which is now used in fit_check_format(), isn't defined. It isn't part of POSIX[1] and generally not available on BSD-derived systems. Could you pick another error code for this case?

Hi Mark,

I looked at the commit you mentioned, and I think it's fundamentally broken. The errors represent -EINVAL, and trying to assign different error codes doesn't make sense.

"Wrong FIT format: no images parent node": -ENOENT "No such file or directory". This just doesn't make sense. We obviously have the file data at this point, and we know the data is wrong. This should be -EINVAL.

"Wrong FIT format: no description": -ENOMSG "No message of desired type". Again, this doesn't make sense. We're not dealing with messaging APIs or send()/recv(). I think this should be -EINVAL.

"Wrong FIT format: not a flattened device tree": -ENOEXEC "Exec format error" This one is amusing, as it's comparing a flattened devicetree to an executable. An FDT might have executable code, which is in the wrong format, but this is not why we're failing here.

Simon, I'd suggest using the correct error code, which, for each case is -EINVAL, as the log messages also confirm: "Wrong [input value] format". We might have issues with the "configurations", an "@" in a signature name, and so forth. There just aren't enough error codes to cover the set of possible failures. And in any case, there likely can't be a reasonable 1:1 mapping to _distinct_ errno codes.

Does any user even check the error code beyond "less than zero"? Take different decisions based on what the negative code indicates? If information as to what is wrong with the input value (FIT) is needed, then I'd suggest using a separate enum, and stick to -EINVAL.

Actually I make an effort to use different codes where possible, so there is some indication what went wrong. Of course devs can whip out the JTAG debugger or start filling the code with printf()s but normal users cannot, so having an idea what is wrong is helpful.

We don't have to cover every case, but years ago U-Boot used to return -1 for lots of failures and it was certainly frustrating to debug things.

I agree with most of these arguments. And I agree with using errno codes to represent errno codes. However, when we deviate from the agreed upon convention, can we still apply the said convention? Each function acquires its own set of rules. And when each function has its own set of rules, the source code is needed to derive the meaning.

+Tom Rini too

I am not a fan of each function having its own rules, but the overall U-Boot rules are very broad. Something like:

-EPERM - the old -1
-ENOENT - entry or object not found
-EIO - failed to perform I/O
-ENXIO - couldn't find device/address
-EAGAIN - try later (e.g. dependencies not ready)
-ENOMEM - out of memory
-EINVAL - dev_read_...() failed
-ENODEV - do not bind device
-ENOSPC - ran out of space
-EREMOTEIO - cannot talk to peripheral, e.g. i2c
-EPFNOSUPPORT - missing uclass

There are others. If we only used these it would not be much better than using -1 / -EPERM.

...

You make the argument that these codes give normal users an idea of what is wrong. I assume that normal users respond better to human-readable strings than to negative integers -- for which they would have to go to the source code anyway to decipher the meaning. Because error codes require the source code in order to be useful, they cannot be useful for normal users.

The problem is that we don't want to print strings willy nilly in drivers and library code. That makes it manageable for higher-level code that is actually in control. If an object is not found and an error is reported, but the caller knows it is optional and continues, the user sees the error and assumes something is wrong. Better to print errors just in the top-level code. At that point all we have is the error number. As things are today, top-level code can use that to print useful messages if it wants to, but if all the numbers returned are the same, it can't.
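As a rough sketch of that split, assuming a library routine that stays silent and a top-level caller that knows whether the object is optional: `find_object()` and `top_level_probe()` are hypothetical names, and the messages are invented for illustration.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical library-level lookup: reports failure only through its return
 * value and prints nothing itself.
 */
static int find_object(int present, int io_ok)
{
	if (!io_ok)
		return -EIO;
	if (!present)
		return -ENOENT;
	return 0;
}

/*
 * Hypothetical top-level code: it knows whether the object is optional, so it
 * decides whether a failure is worth reporting. Returns 0 if it can continue.
 */
int top_level_probe(int present, int io_ok, int optional)
{
	int ret = find_object(present, io_ok);

	if (!ret)
		return 0;

	if (ret == -ENOENT && optional)
		return 0;	/* Missing but optional: continue silently */

	if (ret == -ENOENT)
		fprintf(stderr, "required object not found (err %d)\n", ret);
	else
		fprintf(stderr, "lookup failed (err %d)\n", ret);

	return ret;
}
```

The distinction between -ENOENT and -EIO is what lets the caller stay quiet in one case and report in the other; if both came back as the same number, that choice would not be available.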

...

I believe this rebuts your central point around the unconventional use of errno codes.

To the extent that it is unconventional, that reflects the decision to avoid adding U-Boot-specific error numbers and perhaps also to avoid having a different error number for each possible failure in U-Boot.

...

So then the question is how to cover error cases without returning '-1', and without making things a nightmare to debug.

If you need to tell the user that there are "no images parent node", then tell the user -ENOFDTIMAGESNODE, or FIT_ERROR_NO_IMAGES_NODE. How can someone know that -ENOENT really comes from fit_check_format() instead of the FAT code, and really means "FIT has no images node" instead of "there is no FIT file"? I guess we could bust out the old JTAG to check.

For the case you mention, the FIT code would have to call into FAT, or vice versa, otherwise the top-level code would see the two errors separately, one after calling FIT and another after calling FAT. Typically subsystems are not linked that deeply. So in practice the current scheme works fairly well. Also, subsystems do have the option to change the error code on the way back up the stack. For example, libfdt functions return their own error numbers and these are typically converted to errno values.
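Converting error numbers on the way back up the stack might look roughly like this. The `enum lib_err` values are invented stand-ins for a library's private codes (they are not libfdt's actual constants), and the mapping chosen in `lib_err_to_errno()` is an arbitrary assumption for illustration.

```c
#include <errno.h>

/*
 * Hypothetical library-specific error numbers, standing in for something like
 * libfdt's own codes. The names and values here are invented for illustration.
 */
enum lib_err {
	LIB_OK = 0,
	LIB_ERR_NOTFOUND = 1,
	LIB_ERR_NOSPACE = 2,
	LIB_ERR_BADFORMAT = 3,
};

/* Translate a library-specific code into a negative errno value. */
int lib_err_to_errno(enum lib_err err)
{
	switch (err) {
	case LIB_OK:
		return 0;
	case LIB_ERR_NOTFOUND:
		return -ENOENT;
	case LIB_ERR_NOSPACE:
		return -ENOSPC;
	case LIB_ERR_BADFORMAT:
	default:
		/* Arbitrary choice for this sketch; unknown codes land here too */
		return -EINVAL;
	}
}
```

A subsystem boundary is a natural place for such a translation, since the caller above it then only has to understand one numbering scheme.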

We don't invent our own error numbers at present, nor even #define new names for existing ones. I can see some small appeal to the latter, but it does not obviate the need to comment a function's return value, so there is little benefit in it other than the name. We should avoid adding distinctions without a difference. When code needs to deal with errors from lower levels, a smaller set of well-known error numbers has some appeal.

...

...
BTW -EINVAL is mostly reserved for of_to_plat() failure in U-Boot. It indicates something is wrong with your devicetree data for a device.

Reserving -EINVAL for a special class of input value errors, but not others is breaking convention, so all my arguments above apply.

Perhaps -EDOM for generic invalid function args? But unless this comes from user data, it is likely a programming error so should not happen / is caught by tests.

Regards, Simon

Alex G.

26 Feb

8:01 p.m.

On 2/25/21 1:31 PM, Simon Glass wrote:

...

Hi Alex,

To the extent that it is unconventional, that reflects the decision to avoid adding U-Boot-specific error numbers and perhaps also to avoid having a different error number for each possible failure in U-Boot.

The set of errno codes is much smaller than the set of possible failures. It is objectively impossible to map the set of possible failures onto the set of errno codes. And that's why I think this decision is wrong.

The following arguments are subjective:

Compared to TF-A and OP-TEE, I find u-boot sources more difficult to work with. One of the reasons is that different parts have different idiosyncrasies. TF-A and OP-TEE are bad in their own ways, but they are at the very least, consistent wrt conventions of the C language. Now we're talking about every u-boot function potentially having its own semantics. This is going from bad to worse. And now the code is returning error codes that don't even make sense in context.

What you're describing (not quoted in this reply) is a mechanism to allow users to handle failures. We first need to define user and how the user interfaces with the software product. For example, is someone who presses the power button also expected to resolve storage media corruption? Only then can we spec out the requirements for this mechanism. We somehow have the solution to a problem that isn't properly defined yet.

This is a textbook example of when all you have is a hammer, everything looks like a nail.

Alex

Tom Rini

1 Mar

3:04 p.m.

On Fri, Feb 26, 2021 at 01:01:58PM -0600, Alex G. wrote:

...

On 2/25/21 1:31 PM, Simon Glass wrote:

...
Hi Alex,

To the extent that it is unconventional, that reflects the decision to avoid adding U-Boot-specific error numbers and perhaps also to avoid having a different error number for each possible failure in U-Boot.

The set of errno codes is much smaller than the set of possible failures. It is objectively impossible to map the set of possible failures onto the set of errno codes. And that's why I think this decision is wrong.

The following arguments are subjective:

Compared to TF-A and OP-TEE, I find u-boot sources more difficult to work with. One of the reasons is that different parts have different idiosyncrasies. TF-A and OP-TEE are bad in their own ways, but they are at the very least, consistent wrt conventions of the C language. Now we're talking about every u-boot function potentially having its own semantics. This is going from bad to worse. And now the code is returning error codes that don't even make sense in context.

What you're describing (not quoted in this reply) is a mechanism to allow users to handle failures. We first need to define user and how the user interfaces with the software product. For example, is someone who presses the power button also expected to resolve storage media corruption? Only then can we spec out the requirements for this mechanism. We somehow have the solution to a problem that isn't properly defined yet.

This is a textbook example of when all you have is a hammer, everything looks like a nail.

There are two different problems here. The first problem is that for user space tools (which is what this problem report is about), there are very well understood conventions and we need to follow them. We in fact (and this is hard) need to follow a slightly more reduced set of possible values than we might otherwise, as some BSDs do not have all POSIX.1-2001 values, _and_ it needs to be consistent with the general understanding of the values too.

The second problem is that we need to be internally consistent about what we use for error return codes, and what they mean, and doc/driver-model/design.rst needs an update. DM is the thing that ties all of the various subsystems together and leads to consistency between various similar functions.

Things get tricky when we (for generally good reason) share the same code between both cases.

-- Tom

Simon Glass

23 Feb

10:19 p.m.

+Tom Rini

Hi Mark,

On Tue, 23 Feb 2021 at 14:07, Mark Kettenis mark.kettenis@xs4all.nl wrote:

...

Hi Simon,

Commit c5819701a3de61e2ba2ef7ad0b616565b32305e5 broke the build on OpenBSD and probably other non-Linux systems. ENODATA, which is now used in fit_check_format(), isn't defined. It isn't part of POSIX[1] and generally not available on BSD-derived systems. Could you pick another error code for this case?

OK. I think this was reported already, but will take a look.

Are we able to get this into the CI system so there is a test for it?

Regards, Simon

Tom Rini

10:24 p.m.

On Tue, Feb 23, 2021 at 04:19:35PM -0500, Simon Glass wrote:

...

+Tom Rini

Hi Mark,

On Tue, 23 Feb 2021 at 14:07, Mark Kettenis mark.kettenis@xs4all.nl wrote:

...
Hi Simon,

Commit c5819701a3de61e2ba2ef7ad0b616565b32305e5 broke the build on OpenBSD and probably other non-Linux systems. ENODATA, which is now used in fit_check_format(), isn't defined. It isn't part of POSIX[1] and generally not available on BSD-derived systems. Could you pick another error code for this case?

OK. I think this was reported already, but will take a look.

Yeah, I think the suggestion was to use EBADMSG instead? I was hoping for a patch.
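For reference, the substitution being suggested might look like the sketch below. `check_required_data()` is a hypothetical function, not U-Boot's fit_check_format(), and which check in the real code would be affected is not shown here.

```c
#include <errno.h>

/*
 * Hypothetical check (not U-Boot's fit_check_format()): returns 0 if a
 * required piece of data is present, or -EBADMSG if it is missing.
 * EBADMSG is part of the POSIX base errno set, whereas ENODATA belongs to the
 * XSI STREAMS option mentioned earlier in the thread.
 */
int check_required_data(int present)
{
	if (!present)
		return -EBADMSG;
	return 0;
}
```

Because the function never names ENODATA, it should also compile on hosts whose `<errno.h>` does not define that macro.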

...

Are we able to get this into the CI system so there is a test for it?

I think it wasn't easy, or at least wasn't free, to enable a BSD host system in Azure. But if someone wants to donate a BSD host that we can plug in to GitLab for running a subset of builds on (I was thinking sandbox+pytest and host-tools perhaps) it would be greatly appreciated.

-- Tom

Tom Rini

1 Mar

3:02 p.m.

On Tue, Feb 23, 2021 at 08:07:21PM +0100, Mark Kettenis wrote:

...

Hi Simon,

Commit c5819701a3de61e2ba2ef7ad0b616565b32305e5 broke the build on OpenBSD and probably other non-Linux systems. ENODATA, which is now used in fit_check_format(), isn't defined. It isn't part of POSIX[1] and generally not available on BSD-derived systems. Could you pick another error code for this case?

Thanks,

Mark

[1] It is mentioned in the Open Group Base Specification, however it is part of the obsolete XSI STREAMS extension which was never part of POSIX proper.

Just for the record:

https://man7.org/linux/man-pages/man3/errno.3.html:

ENODATA No message is available on the STREAM head read queue (POSIX.1-2001).

So perhaps you want to also raise this with the Linux man pages project folks?

-- Tom

Mark Kettenis

7 Mar

2:48 p.m.

...

Date: Mon, 1 Mar 2021 09:02:18 -0500
From: Tom Rini trini@konsulko.com

On Tue, Feb 23, 2021 at 08:07:21PM +0100, Mark Kettenis wrote:

...
Hi Simon,

Commit c5819701a3de61e2ba2ef7ad0b616565b32305e5 broke the build on OpenBSD and probably other non-Linux systems. ENODATA, which is now used in fit_check_format(), isn't defined. It isn't part of POSIX[1] and generally not available on BSD-derived systems. Could you pick another error code for this case?

Thanks,

Mark

[1] It is mentioned in the Open Group Base Specification, however it is part of the obsolete XSI STREAMS extension which was never part of POSIX proper.

Just for the record:

https://man7.org/linux/man-pages/man3/errno.3.html:

ENODATA No message is available on the STREAM head read queue (POSIX.1-2001).

So perhaps you want to also raise this with the Linux man pages project folks?

Looks like somebody made a mistake and put the XSI STREAMS option marker for it on ENOBUFS just above.

I filed a bug report:

https://bugzilla.kernel.org/show_bug.cgi?id=212103

Cheers,

Mark
