[U-Boot] [PATCH] fw_printenv: Don't bail out directly after one env read error

From: Joe Hershberger joe.hershberger@ni.com
When using a redundant environment a read error should simply mean to not use that copy instead of giving up completely. The other copy may be just fine.
Signed-off-by: Joe Hershberger joe.hershberger@ni.com Signed-off-by: Ioan-Adrian Ratiu adrian.ratiu@ni.com --- tools/env/fw_env.c | 24 +++++++++++++++--------- 1 file changed, 15 insertions(+), 9 deletions(-)
diff --git a/tools/env/fw_env.c b/tools/env/fw_env.c index a5d75958e1..3a5ad026f0 100644 --- a/tools/env/fw_env.c +++ b/tools/env/fw_env.c @@ -1427,14 +1427,21 @@ int fw_env_open(struct env_opts *opts) }
dev_current = 0; - if (flash_io(O_RDONLY)) { + + if (!flash_io(O_RDONLY)) { + crc0 = crc32(0, (uint8_t *)environment.data, ENV_SIZE); + crc0_ok = (crc0 == *environment.crc); + } else if (have_redund_env) { + /* + * to give the redundant env a chance, maybe it's good: + * mark env crc0 invalid then test below if crc1 is ok + */ + crc0_ok = 0; + } else { ret = -EIO; goto open_cleanup; }
- crc0 = crc32(0, (uint8_t *)environment.data, ENV_SIZE); - - crc0_ok = (crc0 == *environment.crc); if (!have_redund_env) { if (!crc0_ok) { fprintf(stderr, @@ -1462,8 +1469,10 @@ int fw_env_open(struct env_opts *opts) */ environment.image = addr1; if (flash_io(O_RDONLY)) { - ret = -EIO; - goto open_cleanup; + crc1_ok = 0; + } else { + crc1 = crc32(0, (uint8_t *)redundant->data, ENV_SIZE); + crc1_ok = (crc1 == redundant->crc); }
/* Check flag scheme compatibility */ @@ -1489,9 +1498,6 @@ int fw_env_open(struct env_opts *opts) goto open_cleanup; }
- crc1 = crc32(0, (uint8_t *)redundant->data, ENV_SIZE); - - crc1_ok = (crc1 == redundant->crc); flag1 = redundant->flags;
if (crc0_ok && !crc1_ok) {

On Tue, Jun 26, 2018 at 4:37 AM, Ioan-Adrian Ratiu adrian.ratiu@ni.com wrote:
From: Joe Hershberger joe.hershberger@ni.com
When using a redundant environment a read error should simply mean to not use that copy instead of giving up completely. The other copy may be just fine.
Signed-off-by: Joe Hershberger joe.hershberger@ni.com Signed-off-by: Ioan-Adrian Ratiu adrian.ratiu@ni.com
Hey Tom, can you pull this in?
Thanks, -Joe
tools/env/fw_env.c | 24 +++++++++++++++--------- 1 file changed, 15 insertions(+), 9 deletions(-)
diff --git a/tools/env/fw_env.c b/tools/env/fw_env.c index a5d75958e1..3a5ad026f0 100644 --- a/tools/env/fw_env.c +++ b/tools/env/fw_env.c @@ -1427,14 +1427,21 @@ int fw_env_open(struct env_opts *opts) }
dev_current = 0;
if (flash_io(O_RDONLY)) {
if (!flash_io(O_RDONLY)) {
crc0 = crc32(0, (uint8_t *)environment.data, ENV_SIZE);
crc0_ok = (crc0 == *environment.crc);
} else if (have_redund_env) {
/*
* to give the redundant env a chance, maybe it's good:
* mark env crc0 invalid then test below if crc1 is ok
*/
crc0_ok = 0;
} else { ret = -EIO; goto open_cleanup; }
crc0 = crc32(0, (uint8_t *)environment.data, ENV_SIZE);
crc0_ok = (crc0 == *environment.crc); if (!have_redund_env) { if (!crc0_ok) { fprintf(stderr,
@@ -1462,8 +1469,10 @@ int fw_env_open(struct env_opts *opts) */ environment.image = addr1; if (flash_io(O_RDONLY)) {
ret = -EIO;
goto open_cleanup;
crc1_ok = 0;
} else {
crc1 = crc32(0, (uint8_t *)redundant->data, ENV_SIZE);
crc1_ok = (crc1 == redundant->crc); } /* Check flag scheme compatibility */
@@ -1489,9 +1498,6 @@ int fw_env_open(struct env_opts *opts) goto open_cleanup; }
crc1 = crc32(0, (uint8_t *)redundant->data, ENV_SIZE);
crc1_ok = (crc1 == redundant->crc); flag1 = redundant->flags; if (crc0_ok && !crc1_ok) {
-- 2.17.1
U-Boot mailing list U-Boot@lists.denx.de https://lists.denx.de/listinfo/u-boot

Dear Joe,
In message CANr=Z=atFzdNO6gNhMgopCHaQ-KXPGMfaOz2+_KCVrKwkMOhuw@mail.gmail.com you wrote:
When using a redundant environment a read error should simply mean to not use that copy instead of giving up completely. The other copy may be just fine.
Signed-off-by: Joe Hershberger joe.hershberger@ni.com Signed-off-by: Ioan-Adrian Ratiu adrian.ratiu@ni.com
Hey Tom, can you pull this in?
NO! Please don't!!
NAK!!
This patch can lead to reading incorrect (old, no longer valid) values without any way for the user to see what is happening.
This must not be done!
Best regards,
Wolfgang Denk

On Fri, Jun 29, 2018 at 12:57:45PM +0200, Wolfgang Denk wrote:
Dear Joe,
In message CANr=Z=atFzdNO6gNhMgopCHaQ-KXPGMfaOz2+_KCVrKwkMOhuw@mail.gmail.com you wrote:
When using a redundant environment a read error should simply mean to not use that copy instead of giving up completely. The other copy may be just fine.
Signed-off-by: Joe Hershberger joe.hershberger@ni.com Signed-off-by: Ioan-Adrian Ratiu adrian.ratiu@ni.com
Hey Tom, can you pull this in?
NO! Please don't!!
NAK!!
This patch can lead to reading incorrect (old, no longer valid) values without any way for the user to see what is happening.
This must not be done!
I'm not 100% sure, after reading all of the code, if there's a problem. What we indeed do not want to do is be silent in seeing that the first environment location we read from failed. But AFAICT if flash_io returns non-zero we also output something useful to stderr, so it should be visible to the user that something went wrong. The next question is, if half of the redundant environment has failed, is the other half considered valid (so long as the crc passes) or would only the built-in be valid? I would think the other half is the valid one.

Dear Tom,
In message 20180629181550.GG18596@bill-the-cat you wrote:
This patch can lead to reading incorrect (old, no longer valid) values without any way for the user to see what is happening.
This must not be done!
I'm not 100% sure, after reading all of the code, if there's a problem.
There is.
What we indeed do not want to do is be silent in seeing that the first environment location we read from failed. But AFAICT if flash_io returns non-zero we also output something useful to stderr, so it should be visible to the user that something went wrong.
But there is no indication about the problem in the return code, so any tools using this have no way to determine there was a problem.
The next question is, if half of the redundant environment has failed, is the other half considered valid (so long as the crc passes) or would only the built-in be valid? I would think the other half is the valid one.
It may be valid, but it is still not what we want.
redundant environment is used to provide a fall-back in case of poblems when _writing_ the environment, i. e. like a power-off while a "saveenv" is running. The purpose is to make sute that even then we will always have a valid environment, with correct checksum.
But as soon as the saveenv has succesfully completed, we do not have to equal, exchangable copies - instead, we have one which holds the current data (and only this is what represents the "correct" state), and another copy, which holds the old state, i. e. which is obsolete.
Sielntly swapping the current and the obsolete data is a serious bug, which must never happen. It may be OK as part of some recovery procedure to limit the damage, but it must never ever go unnoticed.
Assume you switch to another boot device / boot partion as part of a system update procedure. Then booting the wrong system is definitely not what you want.
Please revert!
Best regards,
Wolfgang Denk

On Sat, Jun 30, 2018 at 12:35:49PM +0200, Wolfgang Denk wrote:
Dear Tom,
In message 20180629181550.GG18596@bill-the-cat you wrote:
This patch can lead to reading incorrect (old, no longer valid) values without any way for the user to see what is happening.
This must not be done!
I'm not 100% sure, after reading all of the code, if there's a problem.
There is.
What we indeed do not want to do is be silent in seeing that the first environment location we read from failed. But AFAICT if flash_io returns non-zero we also output something useful to stderr, so it should be visible to the user that something went wrong.
But there is no indication about the problem in the return code, so any tools using this have no way to determine there was a problem.
Ah, true, thanks!
The next question is, if half of the redundant environment has failed, is the other half considered valid (so long as the crc passes) or would only the built-in be valid? I would think the other half is the valid one.
It may be valid, but it is still not what we want.
redundant environment is used to provide a fall-back in case of poblems when _writing_ the environment, i. e. like a power-off while a "saveenv" is running. The purpose is to make sute that even then we will always have a valid environment, with correct checksum.
But as soon as the saveenv has succesfully completed, we do not have to equal, exchangable copies - instead, we have one which holds the current data (and only this is what represents the "correct" state), and another copy, which holds the old state, i. e. which is obsolete.
Sielntly swapping the current and the obsolete data is a serious bug, which must never happen. It may be OK as part of some recovery procedure to limit the damage, but it must never ever go unnoticed.
Assume you switch to another boot device / boot partion as part of a system update procedure. Then booting the wrong system is definitely not what you want.
Please revert!
Joe, Ioan-Adrian, I've reverted this. Please re-work the changes such that the use case you need to support is handled in some manner (likely a new flag being passed to fw_printenv/setenv) such that the current behavior is still the default, given Wolfgang's explanations above, thanks!

On Tue, Jun 26, 2018 at 12:37:59PM +0300, Ioan-Adrian Ratiu wrote:
From: Joe Hershberger joe.hershberger@ni.com
When using a redundant environment a read error should simply mean to not use that copy instead of giving up completely. The other copy may be just fine.
Signed-off-by: Joe Hershberger joe.hershberger@ni.com Signed-off-by: Ioan-Adrian Ratiu adrian.ratiu@ni.com
Applied to u-boot/master, thanks!

Dear Tom,
In message 20180628022059.GR4609@bill-the-cat.ec.rr.com you wrote:
From: Joe Hershberger joe.hershberger@ni.com =20 When using a redundant environment a read error should simply mean to not use that copy instead of giving up completely. The other copy may be just fine. =20 Signed-off-by: Joe Hershberger joe.hershberger@ni.com Signed-off-by: Ioan-Adrian Ratiu adrian.ratiu@ni.com
Applied to u-boot/master, thanks!
Please revert. This is introducing a source for nasty, almost impossible to detect bugs!
Read errors must never be ignored.
Best regards,
Wolfgang Denk

Dear Adrian,
In message 20180626093759.24018-1-adrian.ratiu@ni.com you wrote:
From: Joe Hershberger joe.hershberger@ni.com
When using a redundant environment a read error should simply mean to not use that copy instead of giving up completely. The other copy may be just fine.
While the general idea is fine, I think we should NOT automatically read data from the backup copy, at least not without clearly letting the user know about this - and such notification should also work in automated scripts or cod calling these routines, so a plain warning message is NOT sufficient.
I suggest that the default remains as is: environment read errors cause an error return of this function.
But it would probably nice for recovery purposes or such to add an option to switch into some "permissive" mode - here the fall-back to the redundant copy would be permitted, and the return code should indicate what happened (read of primary env copy OK; read failed, but redundant copy could ne read; all read attemtps failed).
Thanks!
Best regards,
Wolfgang Denk
participants (4)
-
Ioan-Adrian Ratiu
-
Joe Hershberger
-
Tom Rini
-
Wolfgang Denk