[U-Boot-Users] Redundant environment expected behavior vs current

Hi,
I have a question about how U-Boot environment redundancy is supposed to work.
For a new embedded board, I decided to implement a redundant U-Boot environment to make it more resilient against corruption. I had not done this before, so I did not know how it was actually implemented.
I've found that the redundant environment does not really create an exact duplicate environment (as far as environment variables are concerned). Instead, the active environment is rotated on each save, and only the newly active copy contains the changes to the environment.
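The rotation described above can be modeled in a few lines of C. This is a simplified sketch, not U-Boot's code: the structure, the toy checksum, and the function names (`write_copy`, `model_init`, `model_saveenv`) are hypothetical, and the 1/0 flag values are only assumed to mirror the active/obsolete flags used in common/env_flash.c.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified model of a redundant flash environment (hypothetical names). */
#define ENV_DATA_SIZE 32
#define FLAG_ACTIVE   1   /* copy holds the current environment */
#define FLAG_OBSOLETE 0   /* copy has been superseded */

typedef struct {
    uint32_t crc;               /* checksum over data[] */
    unsigned char flags;        /* FLAG_ACTIVE or FLAG_OBSOLETE */
    char data[ENV_DATA_SIZE];   /* "name=value" payload */
} env_copy;

typedef struct {
    env_copy copy[2];
    int active;                 /* index of the copy currently in use */
} redund_env;

/* Toy checksum standing in for the real CRC32. */
static uint32_t toy_sum(const char *p, size_t n) {
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s = s * 31u + (unsigned char)p[i];
    return s;
}

/* Write one complete copy: data, checksum, then flag. */
static void write_copy(env_copy *c, const char *contents, unsigned char flag) {
    assert(strlen(contents) < ENV_DATA_SIZE);
    memset(c->data, 0, sizeof c->data);
    strcpy(c->data, contents);
    c->crc = toy_sum(c->data, sizeof c->data);
    c->flags = flag;
}

/* Start with both copies identical, copy 0 active. */
static void model_init(redund_env *e, const char *contents) {
    write_copy(&e->copy[0], contents, FLAG_ACTIVE);
    write_copy(&e->copy[1], contents, FLAG_OBSOLETE);
    e->active = 0;
}

/* One "saveenv": write the new contents to the inactive copy, then mark
 * the previously active copy obsolete. The old copy's data is untouched. */
static void model_saveenv(redund_env *e, const char *contents) {
    int target = 1 - e->active;
    write_copy(&e->copy[target], contents, FLAG_ACTIVE);
    e->copy[e->active].flags = FLAG_OBSOLETE;
    e->active = target;
}
```

After `model_init(&e, "ip=1")` and one `model_saveenv(&e, "ip=2")`, copy 1 holds `ip=2` while copy 0 still holds `ip=1`. Only a second `model_saveenv(&e, "ip=2")` brings both copies in sync, which corresponds to the "saveenv;saveenv" approach discussed in this thread.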
So, at this stage, if your active environment is lost or corrupted, your latest changes to the environment are lost as well, and they might be important for booting your system. The idea behind redundancy (IMHO) is that if one environment is lost, the backup can provide everything that was in the active one, but in the current implementation this is not possible with a single saveenv command. To get a truly redundant environment that is an exact duplicate, you have to save the environment twice.
I personally think this is not quite how a redundant environment should be implemented. I think that once the update of one environment is completed, the second environment should be updated with the same contents. What do you think? Am I the only one who expects complete sync of both sectors? Should I submit a patch?
Best regards, Tolunay
Other cosmetic issues noticed:
1) Defining CFG_ENV_ADDR_REDUND or CFG_ENV_OFFSET_REDUND results in the definition of CFG_REDUNDAND_ENVIRONMENT. I might be nitpicking, but the correct spelling should have been CFG_REDUNDANT_ENVIRONMENT.
2) The output of saveenv, intermixed with output from the flash driver, looks rather untidy and confusing. For example, the ". done" below comes from "drivers/cfi_flash.c". We get ". done" right after the "Saving Environment to Flash..." message, and the other messages follow. However, the first ". done" was really for the unprotect operation, which is reported afterwards. I think that, as U-Boot already does for the erase step, we should announce the operation first (i.e. "Un-Protecting..." / "Protecting...") so the flash driver output matches the current operation.
=> saveenv
Saving Environment to Flash...
. done
Un-Protected 1 sectors
. done
Un-Protected 1 sectors
Erasing Flash...
. done
Erased 1 sectors
Writing to Flash... done
. done
Protected 1 sectors
. done
Protected 1 sectors

In message 444E7C7D.2060106@orkun.us you wrote:
I've found that the redundant environment does not really create an exact duplicate environment (as far as environment variables are concerned). Instead, the active environment is rotated on each save, and only the newly active copy contains the changes to the environment.
Correct. That's how it is supposed to work.
So, at this stage, if your active environment is lost or corrupted, your latest changes to the environment are lost as well, and they might be important for booting your system. The idea behind redundancy (IMHO) is that if one environment is lost, the backup can provide everything that was in the active one, but in the current implementation this is not possible with a single saveenv command. To get a truly redundant environment that is an exact duplicate, you have to save the environment twice.
Corruption usually happens only because of a power loss, reset, or crash while the new environment is being written. In this case you keep the old, known-good state.
Redundant environment implements something like an atomic transaction.
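The "transaction" property comes from how the two copies are chosen at boot. The following is a simplified sketch under stated assumptions: a copy whose CRC does not match its data is ignored, and among valid copies the one flagged active (1) is preferred. The type and function names are hypothetical, and U-Boot's actual selection logic may handle more cases than this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* What boot code can observe about one environment copy (hypothetical). */
typedef struct {
    bool crc_ok;            /* stored CRC matches the data */
    unsigned char flags;    /* 1 = active, 0 = obsolete */
} copy_state;

#define USE_DEFAULT (-1)    /* neither copy usable: fall back to default env */

/* Pick the copy to load. A save interrupted while writing leaves the new
 * copy with a bad CRC, so the previous copy is used: the save either
 * happened completely or, as far as boot is concerned, not at all. */
static int select_copy(copy_state a, copy_state b) {
    if (a.crc_ok && b.crc_ok) {
        /* Prefer the copy flagged active; if the flags do not decide
         * (e.g. a reset just before the old flag was cleared, leaving two
         * complete copies), this sketch simply takes copy 0. */
        return (b.flags == 1 && a.flags != 1) ? 1 : 0;
    }
    if (a.crc_ok)
        return 0;
    if (b.crc_ok)
        return 1;
    return USE_DEFAULT;
}
```

In a model like this, the torn copy is also the next write target, so the following saveenv simply overwrites it.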
I personally think this is not quite how a redundant environment should be implemented. I think that once the update of one environment is completed, the second environment should be updated with the same contents. What do you think?
If you want this behaviour, then just use it. All you need to do is type "saveenv;saveenv". Next question, please.
Am I the only one who expects complete sync of both sectors? Should I submit a patch?
No. Nothing is broken.
Other cosmetic issues noticed:
- Defining CFG_ENV_ADDR_REDUND or CFG_ENV_OFFSET_REDUND results in the definition of CFG_REDUNDAND_ENVIRONMENT. I might be nitpicking, but the correct spelling should have been CFG_REDUNDANT_ENVIRONMENT.
Typo. Submit a patch.
- The output of saveenv, intermixed with output from the flash driver, looks rather untidy and confusing. For example, the ". done" below comes from "drivers/cfi_flash.c". We get ". done" right after the "Saving Environment to Flash..." message, and the other messages follow. However, the first ". done" was really for the unprotect operation, which is reported afterwards. I think that, as U-Boot already does for the erase step, we should announce the operation first (i.e. "Un-Protecting..." / "Protecting...") so the flash driver output matches the current operation.
The CFI driver is a bit noisy, indeed.
Best regards,
Wolfgang Denk

Wolfgang Denk wrote:
So, at this stage, if your active environment is lost or corrupted, your latest changes to the environment are lost as well, and they might be important for booting your system. The idea behind redundancy (IMHO) is that if one environment is lost, the backup can provide everything that was in the active one, but in the current implementation this is not possible with a single saveenv command. To get a truly redundant environment that is an exact duplicate, you have to save the environment twice.
Corruption usually happens only because of a power loss, reset, or crash while the new environment is being written. In this case you keep the old, known-good state.
I agree, but once you have written one copy of the environment and protected it (if you have hardware support), that copy is already securely written and you can go ahead and write the second environment. We changed the environment because the old one was not right, so keeping the old environment after one copy is written might not save us in certain situations.
Redundant environment implements something like an atomic transaction.
I know it only overwrites the flag byte of the old environment with 00 (in NOR flash you can always change 1s to 0s without erasing). I guess this is what you are referring to as an atomic transaction.
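The NOR rule mentioned here can be captured in a tiny model: an erase sets every bit to 1, and programming can only clear bits, so the stored result is the bitwise AND of the old cell and the new value. This is a sketch with hypothetical helper names; real flash programming also involves command sequences and status polling, which are ignored here.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified NOR flash cell model (hypothetical helpers):
 * erase sets every bit to 1, programming can only clear bits. */
static uint8_t nor_erase(void) {
    return 0xFF;
}

static uint8_t nor_program(uint8_t cell, uint8_t value) {
    return (uint8_t)(cell & value);   /* a 0 bit cannot become 1 again */
}

/* True if `value` can be stored over `cell` without an erase cycle. */
static int programmable_without_erase(uint8_t cell, uint8_t value) {
    return (uint8_t)(cell & value) == value;
}
```

Going from active (0x01) to obsolete (0x00) only clears a bit, so it fits within the current erase cycle. Going back would require erasing the whole sector, which is why the next save writes the other copy instead.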
I personally think this is not quite how a redundant environment should be implemented. I think that once the update of one environment is completed, the second environment should be updated with the same contents. What do you think?
If you want this behaviour, then just use it. All you need to do is type "saveenv;saveenv". Next question, please.
That depends on the user doing this, which might not happen. Heck, even I forget to do this sort of thing after a while. It would be great if the sync were provided as an option. How about CFG_ENV_REDUND_SYNC (or something similar) that runs the save operation twice internally, or something to that effect?
I guess the same applies when using fw_setenv from Linux.
The current scheme also does not sync the environment from the good copy if one environment is detected as bad during boot. Should U-Boot fix the bad one from the good one automatically? Currently, I think there is not even a diagnostic message that one environment is bad.
How about U-Boot commands to verify the environment, so we can use them to do the sync etc. in a script?
Other cosmetic issues noticed:
- Defining CFG_ENV_ADDR_REDUND or CFG_ENV_OFFSET_REDUND results in the definition of CFG_REDUNDAND_ENVIRONMENT. I might be nitpicking, but the correct spelling should have been CFG_REDUNDANT_ENVIRONMENT.
Typo. Submit a patch.
Will do.
- The output of saveenv, intermixed with output from the flash driver, looks rather untidy and confusing. For example, the ". done" below comes from "drivers/cfi_flash.c". We get ". done" right after the "Saving Environment to Flash..." message, and the other messages follow. However, the first ". done" was really for the unprotect operation, which is reported afterwards. I think that, as U-Boot already does for the erase step, we should announce the operation first (i.e. "Un-Protecting..." / "Protecting...") so the flash driver output matches the current operation.
The CFI driver is a bit noisy, indeed.
Should I add "Protecting..." / "Un-Protecting..." messages before the operations to compensate for the flash driver output?
Best regards, Tolunay

In message 444EB1D5.1000804@orkun.us you wrote:
I agree, but once you have written one copy of the environment and protected it (if you have hardware support), that copy is already securely written and you can go ahead and write the second environment. We changed
You just said that the data is securely written and protected.
the environment because the old one was not right, so keeping the old environment after one copy is written might not save us in certain situations.
Which are?
I know it only overwrites the flag byte of the old environment with 00 (in NOR flash you can always change 1s to 0s without erasing). I guess this is what you are referring to as an atomic transaction.
No. I mean that "saveenv" has the character of a transaction: either it succeeds and you end up with the new environment, or it fails and you end up with the previous one.
Writing the new environment twice just adds flash wear.
So far, I haven't seen a situation where it would have been useful.
If you want this behaviour, then just use it. All you need to do is type "saveenv;saveenv". Next question, please.
That depends on the user doing this, which might not happen. Heck, even I
Provide a simple update command in a variable. I really don't think this is generally useful. It just wastes flash lifetime.
forget to do this sort of thing after a while. It would be great if the sync were provided as an option. How about CFG_ENV_REDUND_SYNC (or something similar) that runs the save operation twice internally, or something to that effect?
Feel free to add this as a local extension. I don't think I would ever enable this on any of my boards.
Other opinions? Is there anybody who thinks this would improve the robustness of his devices?
The current scheme also does not sync the environment from the good copy if one environment is detected as bad during boot. Should U-Boot fix the bad one from the good one automatically? Currently, I think there is not even a
U-Boot never does any automatic writing to flash. This is something I consider evil.
diagnostic message that one environment is bad.
No, should there be one? Obviously a "saveenv" command did not complete successfully; maybe just one millisecond earlier it would not have been started at all.
That's what I mean by "transaction": if it does not complete successfully, then it did not take place at all. This is not considered a failure mode.
How about U-Boot commands to verify the environment, so we can use them to do the sync etc. in a script?
They are all in place (crc, test). Just use them as needed. But frankly: did you ever see any corruption of NOR flash except when erasing / writing? And if you did, are you only concerned about the contents of the environment variables?
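As a sketch of the kind of check that can be done with the existing commands, the C code below recomputes a CRC-32 over each copy's data and reports mismatches. It assumes the stored checksum is a standard CRC-32 (the zlib / IEEE 802.3 polynomial) computed over the data area only; the function names and the warning text are hypothetical, not U-Boot output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), the same checksum
 * family as zlib's crc32(). Slow but table-free. */
static uint32_t crc32_ieee(const unsigned char *p, size_t n) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical diagnostic: report each copy whose stored CRC does not
 * match its data. Returns the number of bad copies (0, 1 or 2). */
static int check_env_copies(const unsigned char *data[2], size_t len,
                            const uint32_t stored_crc[2]) {
    int bad = 0;
    assert(data != NULL && stored_crc != NULL);
    for (int i = 0; i < 2; i++) {
        if (crc32_ieee(data[i], len) != stored_crc[i]) {
            printf("Warning: environment copy %d has a bad CRC\n", i);
            bad++;
        }
    }
    return bad;
}
```

As Tolunay notes, the hard part in a script is knowing where each copy lives and how it is laid out; the checksum itself is simple.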
The CFI driver is a bit noisy, indeed.
Should I add "Protecting..." / "Un-Protecting..." messages before the operations to compensate for the flash driver output?
Ummm... no! I said it already is too noisy, so adding more output cannot be an improvement.
Best regards,
Wolfgang Denk

Wolfgang Denk wrote:
In message 444EB1D5.1000804@orkun.us you wrote:
I agree, but once you have written one copy of the environment and protected it (if you have hardware support), that copy is already securely written and you can go ahead and write the second environment. We changed
You just said that the data is securely written and protected.
Due to aging, flash sectors that have been written can change, in which case the newly written copy might show up as corrupt **over time**. At that time U-Boot will switch to the second copy, but the second copy does not have the latest changes we made because we did not sync them.
What I am trying to say is that redundancy means that, within "certain limitations", we can recover the data that is redundant. This redundancy scheme does not provide for that. By "certain limitations" I mean things like the number of correctable bits in RAM, the number of simultaneous disk failures in a RAID 5 array, etc.
the environment because the old one was not right, so keeping the old environment after one copy is written might not save us in certain situations.
Which are?
Say I am booting one of two kernel/initrd images in flash, or NFS-booting from a different IP address, or supplying a different kernel command line.
Writing the new environment twice just adds flash wear.
I agree, but we are already adding wear by writing the flag byte of that sector. Failure of the flag byte will make the copy unusable as well. Besides, if we are not going to update the environment frequently, wear due to repeated writes is not a concern. Having a truly redundant environment is of greater importance, in my opinion.
forget to do this sort of thing after a while. It would be great if the sync were provided as an option. How about CFG_ENV_REDUND_SYNC (or something similar) that runs the save operation twice internally, or something to that effect?
Feel free to add this as a local extension. I don't think I would ever enable this on any of my boards.
I will have to add the code associated with this option into common/env_flash.c. If CFG_ENV_REDUND_SYNC is not defined, no new code is added and the existing behavior is maintained. So if you do not see any benefit in this, you will not have to change anything. I can add this option to the other environment drivers (say env_nand.c) as well, for consistency.
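A minimal sketch of what such an option could look like, assuming the existing single-copy save path is factored into its own function. `CFG_ENV_REDUND_SYNC` is only the name proposed in this thread, and `save_one_copy` / `model_saveenv` are hypothetical stand-ins; the bookkeeping just counts writes per copy instead of touching flash.

```c
#include <assert.h>

/* Option proposed in this thread (hypothetical); remove this line to get
 * the existing behavior. */
#define CFG_ENV_REDUND_SYNC

static int writes_to_copy[2];   /* how often each copy was written */
static int active_copy;         /* copy currently in use */

/* Stand-in for the existing save path: always writes the inactive copy
 * and then makes it the active one. Returns 0 on success. */
static int save_one_copy(void) {
    int target = 1 - active_copy;
    writes_to_copy[target]++;
    active_copy = target;
    return 0;
}

static int model_saveenv(void) {
    int rc = save_one_copy();
#ifdef CFG_ENV_REDUND_SYNC
    /* Refresh the other copy only after the first write succeeded; if this
     * second write is interrupted, the freshly written copy is still valid. */
    if (rc == 0)
        rc = save_one_copy();
#endif
    return rc;
}
```

With the option undefined, the preprocessor removes the second call, so boards that do not enable it keep today's behavior. The trade-off is the flash wear raised in this thread: every save then rewrites both sectors.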
The current scheme also does not sync the environment from the good copy if one environment is detected as bad during boot. Should U-Boot fix the bad one from the good one automatically? Currently, I think there is not even a
U-Boot never does any automatic writing to flash. This is something I consider evil.
Yes, I agree. But I think we need to know if one copy of the environment is bad, just as we want to know when there has been a correctable parity error or when one disk of a RAID 5 array has failed, so that corrective action can be taken.
diagnostic message that one environment is bad.
No, should there be one? Obviously a "saveenv" command did not complete successfully; maybe just one millisecond earlier it would not have been started at all.
Maybe saveenv completed correctly, and over time charge decay in the flash cells caused some bits to flip...
How about U-Boot commands to verify the environment, so we can use them to do the sync etc. in a script?
They are all in place (crc, test). Just use them as needed. But
crc is too general-purpose. I would have to add knowledge of where the environment is stored and how it is organized, etc., which is not a big deal, but it is not clean to use in a script. Luckily, the U-Boot environment structure is simpler than uImage files.
frankly: did you ever see any corruption of NOR flash except when erasing / writing? And if you did, are you only concerned about the contents of the environment variables?
I did see this happen in aging flash. It is not common, and more recent flash parts probably have better charge retention etc., but it happens.
Best regards, Tolunay

In message 444EC5F1.10205@orkun.us you wrote:
Due to aging, flash sectors that have been written can change, in which case the newly written copy might show up as corrupt **over time**. At that
The chances that the problem happens while writing are much higher than the chances that flash sectors which are only being read will lose their contents. Of course this is possible, but then all flash sectors are affected, including those storing the U-Boot code. If you are concerned about such things, you will have to add additional security measures.
But seriously, have you ever seen such a thing happen in real life?
time U-Boot will switch to the second copy, but the second copy does not have the latest changes we made because we did not sync them.
If this is your concern, then sync it. Nothing prevents you from doing this.
not provide for that. By "certain limitations" I mean things like the number of correctable bits in RAM, the number of simultaneous disk failures in a RAID 5 array, etc.
So you probably want ECC on your boot flash?
environment after one copy is written might not save us in certain situations.
Which are?
Say I am booting one of two kernel/initrd images in flash, or NFS-booting from a different IP address, or supplying a different kernel command line.
How would this corrupt an already stored and write protected environment sector?
I agree, but we are already adding wear by writing the flag byte of that sector. Failure of the flag byte will make the copy unusable as well.
No. This does not add a new erase cycle.
Besides, if we are not going to update the environment frequently, wear due to repeated writes is not a concern. Having a truly redundant environment is of greater importance, in my opinion.
This is your opinion, OK. As mentioned before, all you need to do is run "saveenv" twice.
I will have to add the code associated with this option into common/env_flash.c. If CFG_ENV_REDUND_SYNC is not defined, no new code is
You can keep this as local extensions / patches. I don't think I'm going to add this, unless at least some other people speak up here on the list and say that they need this, too.
U-Boot never does any automatic writing to flash. This is something I consider evil.
Yes, I agree. But I think we need to know if one copy of the environment is bad, just as we want to know when there has been a correctable parity error or when one disk of a RAID 5 array has failed, so that corrective action can be taken.
This is a different story. With a RAID5 array, you have a disk that needs to be replaced because it is broken.
With the redundant environment, in 99% or more of all cases nothing is broken; the only problem was a reset of the system at an unlucky moment (while storing the new environment). This situation goes away automatically with the next "saveenv" command. Until then, no problem exists - you have a valid environment.
I do not see any problems here.
Maybe saveenv completed correctly, and over time charge decay in the flash cells caused some bits to flip...
If this is your concern, you need to protect / check all other flash sectors as well. But if you don't trust the contents of the flash memory, why would you then trust the program that is running from this memory and let it check itself? If you reach such a level of paranoia, you need parity or ECC for your flash memory.
crc is too general-purpose. I would have to add knowledge of where the environment is stored and how it is organized, etc., which is not a big deal, but it is not clean to use in a script. Luckily, the U-Boot environment structure is simpler than uImage files.
U-Boot provides astonishing flexibility through scripts. I request that you use this flexibility instead of bloating the common code with rarely used features that can be implemented trivially as a script.
frankly: did you ever see any corruption of NOR flash except when erasing / writing? And if you did, are you only concerned about the contents of the environment variables?
I did see this happen in aging flash. It is not common, and more recent flash parts probably have better charge retention etc., but it happens.
Did it really happen in a situation where the flash was only read?
Best regards,
Wolfgang Denk

Wolfgang Denk wrote:
In message 444EC5F1.10205@orkun.us you wrote:
time U-Boot will switch to the second copy, but the second copy does not have the latest changes we made because we did not sync them.
If this is your concern, then sync it. Nothing prevents you from doing this.
environment after one copy is written might not save us in certain situations.
Which are?
Say I am booting one of two kernel/initrd images in flash, or NFS-booting from a different IP address, or supplying a different kernel command line.
How would this corrupt an already stored and write protected environment sector?
I am new to this, but once we have successfully written the environment and protected it, how does (or should) one go about writing the same environment to the redundant sector? Is it done with saveenv if the redundant environment is enabled? I guess I need to go look at the code.
-Randy Smith

In message 444F684A.8060406@imagemap.com you wrote:
I am new to this, but once we have successfully written the environment and protected it, how does (or should) one go about writing the same environment to the redundant sector? Is it done with saveenv if the redundant environment is enabled? I guess I need to go look at the
Correct. All you need to do is call "saveenv" twice. [That's why I can't see a real need for changing the code.]
Best regards,
Wolfgang Denk

Wolfgang Denk wrote:
In message 444F684A.8060406@imagemap.com you wrote:
I am new to this, but once we have successfully written the environment and protected it, how does (or should) one go about writing the same environment to the redundant sector? Is it done with saveenv if the redundant environment is enabled? I guess I need to go look at the
Correct. All you need to do is call "saveenv" twice. [That's why I can't see a real need for changing the code.]
Best regards,
Wolfgang Denk
Just my uninformed opinion... It seems to me that calling saveenv twice is misleading. What I mean is that I assume I have a "golden" copy of my environment in the redundant area. I should be able to call saveenv as many times as I wish without touching the contents of the "golden" copy, and there should be another mechanism to update the "golden" copy. I had no idea that calling saveenv twice would overwrite the redundant area, but then again, I am new to this.
-Randy Smith

In message 444F74B1.8060909@imagemap.com you wrote:
Just my uninformed opinion... It seems to me that calling saveenv twice is misleading. What I mean is that I assume I have a "golden" copy of my environment in the redundant area. I should be able to call
No. That "golden" copy is what we call the "default environment" - you get this when you lose your environment (with the redundant environment, that means losing both copies).
saveenv as many times as I wish without touching the contents of the "golden" copy, and there should be another mechanism to update the "golden" copy. I had no idea that calling saveenv twice would overwrite
That's not how redundancy is defined. You are looking for a backup copy, which is provided by the default environment.
Note that updating the default environment is not trivial, as this is compile-time defined.
Best regards,
Wolfgang Denk

Let me drop in some final remarks in this post...
Wolfgang Denk wrote:
In message 444F74B1.8060909@imagemap.com you wrote:
Just my uninformed opinion... It seems to me that calling saveenv twice is misleading. What I mean is that I assume I have a "golden" copy of my environment in the redundant area. I should be able to call
No. That "golden" copy is what we call the "default environment" - you get this when you lose your environment (with the redundant environment, that means losing both copies).
Yet the default environment normally does not contain important settings such as ethaddr, which is assigned per board.
saveenv as many times as I wish without touching the contents of the "golden" copy, and there should be another mechanism to update the "golden" copy. I had no idea that calling saveenv twice would overwrite
That's not how redundancy is defined. You are looking for a backup copy, which is provided by the default environment.
Indeed, environment redundancy in U-Boot is rather different from what I am accustomed to.
Anyway, at least I now understand how this works much better, and I hope this discussion will help other developers understand that what they get with the redundant environment right now might not be exactly what they think they are getting.
I really hoped you would allow introducing this choice of behavior; it would break no existing boards. I will have to think about maintaining an out-of-tree patch for this :(
Best regards, Tolunay

In message 444F91D0.9080904@orkun.us you wrote:
Yet the default environment normally does not contain important settings such as ethaddr, which is assigned per board.
If you consider parameters like the MAC address and serial number critical, then make sure they get initialized automatically; see for example board/tqm8xx/load_sernum_ethaddr.c. There, a special information block is written to a reserved area in flash (as part of production, when the MAC address and serial number get assigned to the board). If they are not set in the environment, U-Boot will automatically pick up the values from there. Thus you can combine a common default environment with board-specific parameters like serial# or ethaddr.
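The "environment first, reserved block as fallback" pattern can be sketched as follows. This is not the code in board/tqm8xx/load_sernum_ethaddr.c: the tiny environment table, `demo_env_get` / `demo_env_add`, the `info_block` layout, and its placeholder values are all hypothetical, chosen only to show the lookup order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_VARS 8

/* Tiny stand-in for the environment (hypothetical, append-only). */
static struct { char name[16]; char value[32]; } env_tab[MAX_VARS];
static int env_count;

static const char *demo_env_get(const char *name) {
    for (int i = 0; i < env_count; i++)
        if (strcmp(env_tab[i].name, name) == 0)
            return env_tab[i].value;
    return NULL;
}

static void demo_env_add(const char *name, const char *value) {
    assert(env_count < MAX_VARS);
    assert(strlen(name) < sizeof env_tab[0].name);
    assert(strlen(value) < sizeof env_tab[0].value);
    strcpy(env_tab[env_count].name, name);
    strcpy(env_tab[env_count].value, value);
    env_count++;
}

/* Hypothetical per-board block written to reserved flash during
 * production; the values are placeholders. */
static const struct { const char *serial; const char *ethaddr; } info_block = {
    "PLACEHOLDER-0001", "02:00:00:00:00:01"
};

/* Values already in the environment win; missing ones are filled in from
 * the reserved block, so a generic default environment still yields
 * board-specific serial# and ethaddr. */
static void load_board_params(void) {
    if (demo_env_get("serial#") == NULL)
        demo_env_add("serial#", info_block.serial);
    if (demo_env_get("ethaddr") == NULL)
        demo_env_add("ethaddr", info_block.ethaddr);
}
```

Because the per-board data lives outside the environment sectors, losing both environment copies and falling back to the default environment does not lose serial# or ethaddr.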
Just look around a bit in the code. Many problems have been solved before. Don't reinvent the wheel. Steal the code - it's free software.
Best regards,
Wolfgang Denk

On Wednesday, 26. April 2006 09:53, Wolfgang Denk wrote:
I will have to add the code associated with this option into common/env_flash.c. If CFG_ENV_REDUND_SYNC is not defined, no new code is
You can keep this as local extensions / patches. I don't think I'm going to add this, unless at least some other people speak up here on the list and say that they need this, too.
I have to admit that I was also a little astonished at how the redundant environment works when I first used it. I would have expected (as Tolunay did) the second copy to be totally synced.
I do see the benefit that the current implementation erases each flash sector only half as often as a normal (non-redundant) flash environment or a fully synced redundant environment does. But is this really a problem? If so, all non-redundant flash environment implementations in U-Boot would have a problem too.
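The erase-count comparison can be made explicit with a small calculation. This is a simplified model, assuming each environment copy occupies its own flash sector, every write of a copy costs exactly one erase of that sector, and clearing the flag byte costs none; the names are illustrative.

```c
#include <assert.h>

typedef enum { ENV_SINGLE, ENV_REDUND_ALTERNATE, ENV_REDUND_SYNC } env_scheme;

/* Erase cycles seen by the most-used environment sector after n_saves
 * "saveenv" commands, under the one-erase-per-write assumption. */
static unsigned long max_erases_per_sector(env_scheme s, unsigned long n_saves) {
    switch (s) {
    case ENV_SINGLE:            /* one sector, rewritten on every save */
        return n_saves;
    case ENV_REDUND_ALTERNATE:  /* saves alternate between two sectors */
        return (n_saves + 1) / 2;
    case ENV_REDUND_SYNC:       /* both sectors rewritten on every save */
        return n_saves;
    }
    return 0;
}
```

Under these assumptions, a fully synced redundant environment wears each sector exactly as fast as a non-redundant one, while the current alternating scheme halves the per-sector erase count.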
I also tend to forget things like running "saveenv" twice, so I would vote to let Tolunay implement this new behavior (using CFG_ENV_REDUND_SYNC) so that the user (or developer) can choose between the two versions.
And if Wolfgang agrees to accept this patch, please include a short description of both implementations in the README. That would be very helpful.
Best regards, Stefan

Stefan Roese wrote:
On Wednesday, 26. April 2006 09:53, Wolfgang Denk wrote:
I will have to add the code associated with this option into common/env_flash.c. If CFG_ENV_REDUND_SYNC is not defined, no new code is
You can keep this as local extensions / patches. I don't think I'm going to add this, unless at least some other people speak up here on the list and say that they need this, too.
I have to admit that I was also a little astonished at how the redundant environment works when I first used it. I would have expected (as Tolunay did) the second copy to be totally synced.
I do see the benefit that the current implementation erases each flash sector only half as often as a normal (non-redundant) flash environment or a fully synced redundant environment does. But is this really a problem? If so, all non-redundant flash environment implementations in U-Boot would have a problem too.
I also tend to forget things like running "saveenv" twice, so I would vote to let Tolunay implement this new behavior (using CFG_ENV_REDUND_SYNC) so that the user (or developer) can choose between the two versions.
And if Wolfgang agrees to accept this patch, please include a short description of both implementations in the README. That would be very helpful.
Best regards, Stefan
My 2 cents:
-----------
1c: What U-Boot currently has I would label a "backup" env, not a "redundant" env. The analogy would be a backup tape's copy of your hard drive vs. a RAID-1's redundant copy. Both copy mechanisms are useful, but in different ways.
2c: I've been working with "full featured" and flash EEPROMs for 20 years and have not seen any lose their contents after being successfully programmed (and properly programmed - losing power during a program cycle is a Bad Thing[tm] DAMHIK :-/). OTOH, I am in development and may have missed Manufacturing's warranty repair bulletins :-/.
gvb

In message 444EC5F1.10205@orkun.us you wrote:
I agree, but we are already adding wear by writing the flag byte of that sector. Failure of the flag byte will make the copy unusable as well.
One more comment on this: all we do is flipping a single bit from 1 to 0.
This does NOT cause any additional flash wear, as it does not require an erase cycle. And what can fail? The bit might turn out to be no longer programmable. Then we have a flash error, which will be reported. Your device is defective then and needs to be replaced, like with any other failing flash sector.
Best regards,
Wolfgang Denk
Participants (5): Jerry Van Baren, Randy Smith, Stefan Roese, Tolunay Orkun, Wolfgang Denk