Re: [U-Boot-Users] Re: Redundant environment

22 May 2006

I am sorry for responding to this so late; I have been very busy recently
and had accumulated over 1000 emails from the public lists I follow.
Wolfgang Denk wrote:
...
Dear Tolunay,
in message 445B8086.9000404@orkun.us you wrote:
...
This patch would solve an issue that exists today: when the "active" 
environment is lost or corrupted for some reason, the "redundant" 
environment would contain an exact copy of the primary, so the board 
comes up without the need to redo the changes that were lost on
Actually I think that you will not achieve this with your patch.
This is why I'm concerned. You see, if you feel better having this
patch I would not complain, but I am afraid that a lot of people
might just activate it because they think it would do them some good
when it doesn't (and actually it just hurts).
I can only offer a detailed description of what it does, and of the 
conditions under which it might be useful or might hurt, in the 
README (and perhaps the Wiki).
...
There is only one occasion when we have any significant likelihood of
losing the environment data: this is when a call to "saveenv" fails
because either a) we have a power loss, b) we have an otherwise
induced reset of the CPU, or c) the flash sector that shall be
erased/written is failing.
So where exactly does your modification improve  anything?  Let's  go
through this step by step.
Case 1: power loss/reset happens during the first "saveenv", i.e.
        when writing the first copy of the new environment data.
    In this case the first copy contains no valid data; the
    second copy of the environment contains valid, but old data.

    This is exactly the same as we have with the current
    implementation. I don't see any improvement.

This is a tie in terms of functionality between the two implementations.
...
Case 2: power loss/reset happens during the second "saveenv", i.e.
        when writing the second copy of the new environment data.
    In this case the first copy contains valid new data, while
    the second copy of the environment does not contain valid
    data.

    In the current implementation, the first (and only) saveenv
    would have completed, too, and the reset would hit after
    leaving this part of code, so we would have valid new data in
    the first copy, and valid (but old) data in the second one.

    Again, this is not an improvement. Actually I think the
    current implementation is even more useful.

I would call this a tie too.
...
Case 3: A flash sector in the first copy of the  environment  becomes
        defective  while  we  erase or write it. In this case we will
        see appropriate error conditions, and the  "saveenv"  command
        will abort.
    This is the same as case 1: no valid data in copy  1,  valid,
    but  old  data  in copy 2; no difference between the existing
    and your new implementation.

Tie.
...
Case 4: A flash sector in the second copy of the environment  becomes
        defective  while  we  erase or write it. In this case we will
        see appropriate error conditions, and the  "saveenv"  command
        will abort.
    This is the same as case 2: valid new  data  in  copy  1,  no
    valid  data  in copy 2 with your implementation, but probably
    valid old data with the existing code.

Tie.
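
The four cases above can be replayed in a small model. This is only a
sketch under stated assumptions: the "current" scheme is taken to
rewrite one copy per "saveenv" and the proposed scheme to rewrite both,
one after the other. The type and function names are hypothetical and
are not taken from the U-Boot sources. Cases 3 and 4 (a failing sector)
are modelled the same way as an interrupted write, since either way the
copy being written ends up invalid.

```c
#include <assert.h>
#include <stdbool.h>

/* State of one stored environment copy after a "saveenv" attempt. */
enum copy_state { COPY_INVALID, COPY_OLD, COPY_NEW };

/* Hypothetical labels for the two schemes discussed in this thread. */
enum scheme { WRITE_ONE_COPY, WRITE_BOTH_COPIES };

struct env_pair {
    enum copy_state first;   /* copy written first by "saveenv" */
    enum copy_state second;  /* the other copy */
};

/*
 * fail_step: 0 = no failure,
 *            1 = write of the first copy is interrupted or fails,
 *            2 = write of the second copy is interrupted or fails.
 * Both copies are assumed to hold valid (old) data beforehand.
 */
static struct env_pair model_saveenv(enum scheme s, int fail_step)
{
    struct env_pair env = { COPY_OLD, COPY_OLD };

    assert(fail_step >= 0 && fail_step <= 2);

    if (fail_step == 1) {
        env.first = COPY_INVALID;   /* half-written sector */
        return env;                 /* second copy never touched */
    }
    env.first = COPY_NEW;

    /* Only the proposed scheme goes on to rewrite the second copy. */
    if (s == WRITE_BOTH_COPIES)
        env.second = (fail_step == 2) ? COPY_INVALID : COPY_NEW;

    return env;
}

/* The board can come up as long as at least one copy is valid. */
static bool board_can_boot(struct env_pair env)
{
    return env.first != COPY_INVALID || env.second != COPY_INVALID;
}
```

In this model every failure case leaves at least one valid copy under
both schemes, which matches the "tie" verdicts; the only visible
difference is whether the second copy ends up old, new, or invalid.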
...
I guess I must have missed some cases because there was none yet
where the new implementation would improve the reliability. Please
fill in these missing cases.
You are right, there is little difference under these conditions. The 
alternate implementation I've proposed takes care of the things that 
happen after "saveenv" has completed successfully.
1) Charge loss/fading on flash cells.
The primary environment can be partially lost due to charge loss in the 
flash cells. It is true that under perfect conditions the cells should 
retain their charge for a long time, but a positive ripple on the power 
supply while the flash was written, combined with a low supply voltage 
while it is read, could shorten the retention time significantly. Good 
power supply regulation and good power distribution on the PCB largely 
prevent this, but an aging flash chip may be more susceptible.
2) If the power supply is lost while flash is being written/erased, the 
ongoing write might sometimes affect other cells/blocks that were not 
the target. True, when this occurs the environment is not the only thing 
we should be concerned about, but if the damage actually lands in the 
environment we can recover from it.
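
The recovery path for both situations can be sketched as a CRC check at
boot that falls back to the second copy. This is a minimal
illustration, assuming each copy stores a CRC-32 over its data; the
struct layout, the 64-byte size, and all function names are
hypothetical and are not the actual U-Boot code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ENV_DATA_SIZE 64  /* hypothetical; real environments are larger */

struct env_copy {
    uint32_t crc;                       /* CRC-32 over data[] */
    unsigned char data[ENV_DATA_SIZE];  /* "name=value" text, zero padded */
};

/* Plain bitwise CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t crc32_bitwise(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Fill one copy with the given text and a matching checksum. */
static void env_copy_store(struct env_copy *c, const char *text)
{
    assert(c != NULL && text != NULL);
    memset(c->data, 0, sizeof(c->data));
    strncpy((char *)c->data, text, sizeof(c->data) - 1);
    c->crc = crc32_bitwise(c->data, sizeof(c->data));
}

static bool env_copy_valid(const struct env_copy *c)
{
    return c->crc == crc32_bitwise(c->data, sizeof(c->data));
}

/* Prefer the primary copy; fall back to the redundant one; else NULL. */
static const struct env_copy *env_select(const struct env_copy *primary,
                                         const struct env_copy *redundant)
{
    if (env_copy_valid(primary))
        return primary;
    if (env_copy_valid(redundant))
        return redundant;
    return NULL;
}
```

If both copies were written with the same contents, a bit that decays
in the primary after a successful save makes env_select() return the
redundant copy with identical settings; with alternating writes the
fallback copy would hold the previous settings instead.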
...
But, and I think this is an undisputed fact, the current
implementation needs only half the number of erase/write cycles, so
it causes much less flash wear than your code. [Actually your code
will see the same level of flash wear as you have now without the
redundant environment enabled; it's that enabling the current
implementation of redundancy *improves* flash lifetime by halving
the number of erase/write cycles to the environment.]
As I pointed out earlier, if you are not writing the environment often, 
this is not a concern. If you are updating the environment every time 
the board boots, it might be a concern. The documentation would note 
that and let implementors decide for their situation.
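
The wear trade-off comes down to simple arithmetic. A minimal sketch,
assuming every "saveenv" erases one environment sector per copy it
writes and that a sector tolerates a fixed number of erase cycles (the
endurance figure has to come from the flash data sheet; all names here
are hypothetical):

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Erase cycles seen by the most-used environment sector after `saves`
 * calls to "saveenv". Assumption: the alternating scheme rewrites only
 * one of the two copies per save, the mirrored scheme rewrites both.
 */
static unsigned long erases_on_worst_sector(unsigned long saves,
                                            bool mirror_both)
{
    if (mirror_both)
        return saves;
    /* Alternating: the copy written first gets the odd extra save. */
    return saves / 2 + saves % 2;
}

/*
 * Largest number of saves that keeps every environment sector within
 * its rated endurance (erase cycles per sector).
 */
static unsigned long saves_within_endurance(unsigned long endurance,
                                            bool mirror_both)
{
    assert(endurance <= ULONG_MAX / 2);  /* avoid overflow below */
    return mirror_both ? endurance : 2 * endurance;
}
```

With a sector rated for 100000 cycles, for example, the mirrored
scheme allows 100000 saves and the alternating scheme 200000. Whether
that factor of two matters depends on how often the board actually
calls "saveenv".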
...
...
Among the things that can cause one environment to go corrupt would be 
charge decays in memory cells in aging flash, supply variations/noise
I think that the likelihood of such a thing happening during read
accesses only is infinitesimal.
I've experienced it. It has been some years and the controllers were 
deployed in factory environments (EMI noise issues) ... You might call 
me unlucky, or perhaps we had a bad chip to begin with. Perhaps it is 
not an issue with more modern/reliable production techniques. Who knows...
Well, I think this has dragged on way too long. If you are not convinced 
that it might be useful, I will drop this patch proposal from consideration.
Best regards,
Tolunay

Tolunay Orkun