
Hi Tom,
On Thu, 12 Aug 2021 at 07:48, Tom Rini trini@konsulko.com wrote:
On Thu, Aug 12, 2021 at 08:40:21AM +0200, Wolfgang Denk wrote:
Dear Tom,
In message 20210811124318.GT858@bill-the-cat you wrote:
This argument fits on all types or effors: they are supposed to never ever happen - at least in theory; in reality they do, and more often than we like.
And a proper error message is mandatory for correct error handling.
Error messages are a fine line to walk. We've got to have been very badly corrupted to go down this error path. There's going to be lots of other error messages popping out. Saving a bit of .text is good. It makes it easier to justify spending a little .text later.
Letting errors slip through unnoticed when there is a trival way to at least inform the user of the problem is simply unacceptable.
Please do not let U-Boot degrade into such a crappy piece of code.
There are tons of other places where we don't even mention code size, so if you want to save on that, there are many bette4r places to save than error handling.
Alright, lets take a look at what kind of area of the code we're talking about. uclass_get is a pretty fundamental thing. If that fails, your system is on fire. Things are massively corrupt. Lets look at other existing callers to see what happens. Most callers check the return code, like you need to, and pass it up the chain to deal with. We have a few board specific ones such as board/Marvell/octeontx2/board.c::board_quiesce_devices() that is also conceptually like the x530 case in the next part of the series. That does print on failure. The rest of the ones that print an error message are about commands and it's somewhat helpful there.
So yes, return codes need to be checked and passed. But no, not every single error path needs to print to the user along every part of an error path either.
And here I agree, catch an error code, pass the error code back to the caller. That's far more important than making sure that every error code we catch logs a message by default every time.
It does not matter where the error is reported - in the called function, or in some caller firther up the call tree. But it _must_ be reportet at least once.
So if we don't issue an error message here, we need to check and fix the callers, too.
That would be the next patch in the series where the BSP author isn't currently checking the return value, and this series doesn't change that. Perhaps it should, and CC the maintainer. But I think has been said a few times over the course of this series, what exactly is one going to do about the failure? Getting specific for a moment, if you're in the case of "shutdown the watchdog" and the watchdog doesn't shutdown like you want it to, do you hang and hope the watchdog is alive to kick things still? hang and lock the system? Figure the system is on fire anyhow but add another message to the failure spew?
Again, I think the change that's needed to this patch is to make it return the error code to the caller. Let the caller decide. And make sure to CC the board maintainer on the next go-round so they can chime in about that.
I strongly agree with this.I try to encourage people not to report errors inside drivers, since the caller should be able to deal with it, particularly if the error number provides useful information. It bloats the code.
But we do have these strange cases where there is no obvious thing to do. Where we are probing several devices to fire them up and one fails, people worry that this means that some of them won't get probed. In that case I tend to continue on and probe them all, but then return error if any of them failed. At least the caller knows.
Also I like the new log_ret() and log_msg_ret() functions, which can log an error as it goes up the stack if LOG_ERROR_RETURN is enabled. It is nice to be able to see where an error came from. We could perhaps improve this by logging the error when it is created (the first time some calls 'return -Exxx').
I'm not a fan of board code which calls a function and doesn't check the error. The board may not operate correctly, or it may limp along, but the board author should be able to get all the bugs of it at the start so that we are not requested invalid clocks, etc.
Regards, Simon