[U-Boot] UBIFS seeing corrupt blank pages when image flashed via u-boot

Hi All,
I have been facing a weird problem; maybe someone has a solution.
*_Case-1_ Flashing UBIFS image from u-boot using 'nand write' utility*
For a partially written erase block:
(a) the 1st page is written with the 'erase-header',
(b) the 2nd page is written with the 'volume-header',
(c) the 3rd page is written with 'some data',
(d) the 4th to last pages of the block should be left blank, but they are written with 0xFF.
As an effect of (d), the ECC calculated for all-0xFF data is written to the OOB area of every page from the 4th page to the last page of the PEB.
As per my understanding, after mounting UBIFS as root, the kernel tries to append data to some files in the rootfs, so the leftover pages from (d) get written with the appended data _without_ the PEB being erased. This corrupts the ECC bytes, because the OOB of the 'unused' pages was already written with the ECC of all-0xFF data by u-boot in step (d). On the next reboot, the kernel sees 'un-corrected ECC' errors while booting.
*_Case-2_ Flashing UBIFS image from kernel using 'ubiformat' utility*
When the same image is flashed using the 'ubiformat' utility, after booting the kernel from another root source, 'ubiformat' automatically skips the empty pages (4th to last page) in the erase block. Thus the OOB area of the pages from the 4th page to the last page is untouched. Hence, when the kernel appends to the rootfs files there is no ECC corruption, as the OOB area of the 4th to last pages of the PEB was blank. So everything works fine.
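A toy model of the two cases, for illustration only. It assumes that NAND programming can only clear bits (1 to 0) until the block is erased, and it uses a one-byte XOR checksum (`toy_ecc`, a hypothetical stand-in) in place of the controller's real ECC. The page size and all names are made up for the example.

```python
PAGE_SIZE = 8                      # toy page size, far smaller than real NAND
ERASED_DATA = b"\xff" * PAGE_SIZE  # data area of an erased page
ERASED_OOB = b"\xff"               # OOB area of an erased page (1 toy ECC byte)


def program(cells: bytes, new: bytes) -> bytes:
    """Programming NAND can only clear bits; model it as a bitwise AND."""
    return bytes(c & n for c, n in zip(cells, new))


def toy_ecc(data: bytes) -> bytes:
    """Hypothetical stand-in for hardware ECC: XOR of all data bytes."""
    acc = 0
    for b in data:
        acc ^= b
    return bytes([acc])


def write_page(data_area: bytes, oob_area: bytes, payload: bytes):
    """Program the payload into the data area and its ECC into the OOB."""
    return program(data_area, payload), program(oob_area, toy_ecc(payload))


def ecc_ok(data_area: bytes, oob_area: bytes) -> bool:
    """True if the stored ECC matches the ECC of the stored data."""
    return toy_ecc(data_area) == oob_area


payload = b"appended"

# Case-1: the flasher writes an all-0xFF page (storing its ECC in OOB),
# then the kernel programs the same page again without an erase.
data1, oob1 = write_page(ERASED_DATA, ERASED_OOB, b"\xff" * PAGE_SIZE)
data1, oob1 = write_page(data1, oob1, payload)

# Case-2: the flasher skips the empty page, so the kernel's write is the
# first and only programming of this page.
data2, oob2 = write_page(ERASED_DATA, ERASED_OOB, payload)

print("Case-1 ECC consistent:", ecc_ok(data1, oob1))
print("Case-2 ECC consistent:", ecc_ok(data2, oob2))
```

In this model the data area survives in both cases, because 0xFF leaves every bit available. Only the OOB differs: in Case-1 it was already programmed once, so the second ECC cannot be stored correctly.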
Now my queries: (1) while _appending_ to a file (a) does UBIFS write the appended data to the same existing PEB, if there is enough space in the PEB to accommodate the new data? OR (b) does UBIFS copy the existing data to a newer PEB along with the appended data?
(2) In case of (b), can someone point me to what the issue could possibly be? (Any references to UBIFS docs on infradead.org may also help.)
with regards, pekon

Hi Pekon,
On Fri, 2014-01-03 at 11:45 +0000, Gupta, Pekon wrote:
*_Case-1_ Flashing UBIFS image from u-boot using 'nand write' utility*
For a partially written erase block:
(a) the 1st page is written with the 'erase-header',
(b) the 2nd page is written with the 'volume-header',
(c) the 3rd page is written with 'some data',
(d) the 4th to last pages of the block should be left blank, but they are written with 0xFF.
As an effect of (d), the ECC calculated for all-0xFF data is written to the OOB area of every page from the 4th page to the last page of the PEB.
Yup.
As per my understanding, after mounting UBIFS as root, the kernel tries to append data to some files in the rootfs, so the leftover pages from (d) get written with the appended data _without_ the PEB being erased.
Right.
This corrupts the ECC bytes, because the OOB of the 'unused' pages was already written with the ECC of all-0xFF data by u-boot in step (d). On the next reboot, the kernel sees 'un-corrected ECC' errors while booting.
Sure.
*_Case-2_ Flashing UBIFS image from kernel using 'ubiformat' utility*
When the same image is flashed using the 'ubiformat' utility, after booting the kernel from another root source, 'ubiformat' automatically skips the empty pages (4th to last page) in the erase block. Thus the OOB area of the pages from the 4th page to the last page is untouched. Hence, when the kernel appends to the rootfs files there is no ECC corruption, as the OOB area of the 4th to last pages of the PEB was blank. So everything works fine.
Exactly!
Now my queries: (1) while _appending_ to a file (a) does UBIFS write the appended data to the same existing PEB, if there is enough space in the PEB to accommodate the new data?
UBIFS always writes to the Journal PEB, whatever it happens to be. So no, it is unlikely that the data will go to the same PEB.
OR (b) does UBIFS copy the existing data to a newer PEB along with the appended data?
No, the existing data stays where it is. New data goes to the journal. Then the journal PEB gets indexed. The data nodes end up in different PEBs (i.e., fragmented).
If you are worried about fragmentation, we can discuss this separately. You can find more about UBIFS journal in my very old UBIFS presentation, which explains basic ideas behind the UBIFS wandering journal:
http://www.linux-mtd.infradead.org/doc/ubifs.html#L_documentation
There is also Adrian's white paper with some design description there.
(2) In case of (b), can someone point me to what the issue could possibly be? (Any references to UBIFS docs on infradead.org may also help.)
I do not understand the question. There are no problems in your (b), nor in the "*_Case-2_" you described.
If you meant "*_Case-1_", then yes, there is a piece of doc:
http://www.linux-mtd.infradead.org/doc/ubi.html#L_flasher_algo
Basically, "ubiformat" is the "correct" UBI-aware flasher, while u-boot's "nand write" seems to be a dumb flasher. I guess you have 2 options:
1. Teach u-boot's "nand write" to skip empty pages, or maybe implement a separate "clever" flashing command.
2. Use UBIFS's "space fixup" feature. This will cause UBIFS to fix up all empty pages by basically copying all partially-used PEBs to different PEBs, skipping the empty pages. This will be done on the first mount, only once, and may cause considerable delays.
See http://www.linux-mtd.infradead.org/faq/ubifs.html#L_free_space_fixup
P.S. Looking at the MTD web-site now, when I have not been doing any UBI/UBIFS/MTD work for a few years, I am impressed by how much stuff I actually documented there :-)
HTH.
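A minimal sketch of option 1, assuming the flasher handles one PEB-sized buffer at a time and simply drops the all-0xFF pages at the end of that buffer before writing. The function name and the tiny page size are hypothetical; this is not u-boot or ubiformat code.

```python
def pages_to_write(peb_image: bytes, page_size: int):
    """Return [(page_index, page_bytes), ...] for one PEB, leaving out the
    trailing pages that contain only 0xFF so they are never programmed."""
    if page_size <= 0 or len(peb_image) % page_size != 0:
        raise ValueError("PEB image must be a whole number of pages")

    pages = [peb_image[off:off + page_size]
             for off in range(0, len(peb_image), page_size)]
    blank = b"\xff" * page_size

    # Walk back from the end of the PEB until a non-blank page is found.
    last = len(pages)
    while last > 0 and pages[last - 1] == blank:
        last -= 1

    # Blank pages in the middle are kept; only the blank tail is skipped.
    return list(enumerate(pages[:last]))


# Example: 3 used pages followed by 5 blank pages, 4-byte toy pages.
peb = b"EC.." + b"VID." + b"data" + b"\xff" * (5 * 4)
for index, page in pages_to_write(peb, 4):
    print(index, page)
```

Only the blank tail is dropped, so the pages the file system will append to later are still in their erased state, with an untouched OOB area.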

Hi Gupta,
On 03/01/2014 13:59, Artem Bityutskiy wrote:
Basically, "ubiformat" is the "correct" UBI-aware flasher, while u-boot's "nand write" seems to be a dumb flasher.
It is. It is *not* recommended for a UBI volume without "ubinizing" your image.
I guess you have 2 options:
- Teach u-boot's "nand write" to skip empty pages, or maybe implement
a separate "clever" flashing command.
You can also add (my preferred way) UBI support to your U-Boot, if it does not have it yet. Then you will have "ubi part" (corresponds to ubiattach in Artem's MTD utilities), "ubi createvol" and "ubi writevol" (ubimkvol and ubiupdatevol, respectively). If you use dumb NAND utilities, you have to erase the flash first with "nand erase", and this will lose the erase counters.
Best regards, Stefano Babic

Hi Artem,
I wanted to check the 'white-space-fixup' and re-read your documentation first, so I got delayed in replying. Also, my mail got moderated again by mailman.
From: Artem Bityutskiy [mailto:artem.bityutskiy@linux.intel.com]
[...]
If you are worried about fragmentation, we can discuss this separately. You can find more about UBIFS journal in my very old UBIFS presentation, which explains basic ideas behind the UBIFS wandering journal:
http://www.linux-mtd.infradead.org/doc/ubifs.html#L_documentation
There is also Adrian's white paper with some design description there.
Thanks much for reminding me about this. I had read your slides long back, but never dug deep into Adrian's slides, so this was still on my 'To Read' list. I really appreciate your work and presentation.
[...]
I do not understand the question. There are no problems in your (b), nor in the "*_Case-2_" you described.
If you meant "*_Case-1_", then yes, there is a piece of doc:
http://www.linux-mtd.infradead.org/doc/ubi.html#L_flasher_algo
Basically, "ubiformat" is the "correct" UBI-aware flasher, while u-boot's "nand write" seems to be a dumb flasher. I guess you have 2 options:
- Teach u-boot's "nand write" to skip empty pages, or maybe implement
a separate "clever" flashing command.
Yes, I'll try Stefano Babic's suggestion of using the u-boot UBI tools.
- Use UBIFS's "space fixup" feature. This will cause UBIFS to fix up
all empty pages by basically copying all partially-used PEBs to different PEBs, skipping the empty pages. This will be done on the first mount, only once, and may cause considerable delays.
See http://www.linux-mtd.infradead.org/faq/ubifs.html#L_free_space_fixup
Though I had read about the 'white-space-fixup' feature earlier too, somewhere in the back of my mind I thought it was only for "free PEBs" (erased-blocks which had a corrupted or no volume-header). But after re-reading the FAQ page, I realized that 'white-space-fixup' is done for all pages, whether in a 'free-PEB' or a 'used-PEB'.
So, this solved my problem. Thanks much.
P.S. Looking at the MTD web-site now, when I have not been doing any UBI/UBIFS/MTD work for a few years, I am impressed by how much stuff I actually documented there :-)
Absolutely agree. That is why your file-system is so popular. The MTD and UBI documentation especially is not limited to 'how to use it'; I think it has advanced details, explanations and reasoning which were quite ahead of their time when written.
This is something which you and the other MTD, UBI and UBIFS authors and maintainers should be proud of.
Thanks again.
with regards, pekon

Hi,
On Fri, Jan 3, 2014 at 6:29 PM, Artem Bityutskiy artem.bityutskiy@linux.intel.com wrote:
Hi Pekon,
On Fri, 2014-01-03 at 11:45 +0000, Gupta, Pekon wrote:
*_Case-1_ Flashing UBIFS image from u-boot using 'nand write' utility*
For a partially written erase block:
(a) the 1st page is written with the 'erase-header',
(b) the 2nd page is written with the 'volume-header',
(c) the 3rd page is written with 'some data',
(d) the 4th to last pages of the block should be left blank, but they are written with 0xFF.
As an effect of (d), the ECC calculated for all-0xFF data is written to the OOB area of every page from the 4th page to the last page of the PEB.
Yup.
If the 4th to last pages are left blank and not covered by ECC, what will happen in case of bit flips on the blank pages? There was an issue reported some time back: http://lists.infradead.org/pipermail/linux-mtd/2012-January/039256.html
Does UBI/UBIFS take care of this now?
Thanks, Calvin

On Mon, 2014-01-13 at 17:49 +0530, Calvin Johnson wrote:
If the 4th to last pages are left blank and not covered by ECC, what will happen in case of bit flips on the blank pages? There was an issue reported some time back: http://lists.infradead.org/pipermail/linux-mtd/2012-January/039256.html
Does UBI/UBIFS take care of this now?
No. UBIFS still assumes that blank pages are ECC-protected by the driver. No one stepped in and took care of changing this yet.

Hi Calvin,
From: Artem Bityutskiy [mailto:artem.bityutskiy@linux.intel.com]
On Mon, 2014-01-13 at 17:49 +0530, Calvin Johnson wrote:
If the 4th to last pages are left blank and not covered by ECC, what will happen in case of bit flips on the blank pages? There was an issue reported some time back: http://lists.infradead.org/pipermail/linux-mtd/2012-January/039256.html
Does UBI/UBIFS take care of this now?
No. UBIFS still assumes that blank pages are ECC-protected by the driver. No one stepped in and took care of changing this yet.
Yes, it's true that with newer technologies (especially < 28nm flash) we are seeing a lot of erased-pages with bit-flips, and UBIFS complains about them. But as there is no ECC stored in an erased-page, bit-flips in an erased-page cannot be corrected, unless you compare each byte of read_data. However, there are other ways of handling bit-flips in an erased-page. The following are a few ways in which the OMAP NAND driver handles them:
*Case-1*: If bit-flips are found in the data-region of an erased-page. An erased-page implicitly means that its data-region should *only* contain 0xff, so it's safe to fill read_buf() with 0xff. But the controller driver should report the correctable/un-correctable bit-flips to the upper layer, so that upper layers like UBI can take corrective action by re-erasing this block before using it. (Refer) http://lists.infradead.org/pipermail/linux-mtd/2014-January/051368.html
*Case-2*: If bit-flips are found in the oob-region of an erased-page. This is a bit non-trivial, because if there are bit-flips in the ecc-layout (OOB region) of an erased-page, it is difficult to differentiate between an erased-page and a programmed-page. You can keep a 'marker' in the ecc-layout reserved for detecting programmed-pages, but that marker byte itself can be subject to bit-flips (on MLC and newer technology NAND, bit-flips are common). So, the OMAP NAND driver takes a probabilistic approach. (Refer) http://lists.infradead.org/pipermail/linux-mtd/2014-January/051367.html
-----------------------
This patch 'assumes' any page to be 'erased':
(a) if all(read_ecc) == 0xff
(b) else if all(read_data) == 0xff
-----------------------
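The two conditions quoted above, combined with the Case-1 handling, can be sketched as follows. This is an illustration of the rule as stated in this mail, not the code of the referenced patches; the function names are hypothetical.

```python
def looks_erased(read_data: bytes, read_ecc: bytes) -> bool:
    """Apply the two conditions in order: (a) ECC bytes all 0xff,
    else (b) data bytes all 0xff."""
    if all(b == 0xFF for b in read_ecc):
        return True
    if all(b == 0xFF for b in read_data):
        return True
    return False


def count_zero_bits(buf: bytes) -> int:
    """Number of 0 bits in buf; in an erased region each one is a bit-flip."""
    return sum(8 - bin(b).count("1") for b in buf)


def read_erased_page(read_data: bytes, read_ecc: bytes):
    """If the page is assumed erased, return a clean all-0xff buffer together
    with the number of data bit-flips to report upward; otherwise None."""
    if not looks_erased(read_data, read_ecc):
        return None
    return b"\xff" * len(read_data), count_zero_bits(read_data)


# Erased page with a single bit-flip in the data region.
print(read_erased_page(b"\xff\xfe\xff\xff", b"\xff\xff"))
# Programmed page: neither region is all 0xff.
print(read_erased_page(b"\x12\x34\x56\x78", b"\x0a\x0b"))
```

The caller gets back both the cleaned buffer and the flip count, so an upper layer can still decide to re-erase the block, as described in Case-1.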
Currently both the UBI and UBIFS layers check that an erased-page is all(0xff). But I think it's overkill to put this burden on the UBI or UBIFS layer, because low-level controller drivers can handle this easily. So, if Artem and Brian agree to the above approaches, then I can submit a patch for removal of:
- "ubi_self_check_all_ff()" from UBI layer.
- checking of 'buf == 0xff' from ubifs_scan_leb() in UBIFS layer.
with regards, pekon

On Mon, 2014-01-13 at 13:16 +0000, Gupta, Pekon wrote:
Currently both the UBI and UBIFS layers check that an erased-page is all(0xff). But I think it's overkill to put this burden on the UBI or UBIFS layer, because low-level controller drivers can handle this easily. So, if Artem and Brian agree to the above approaches, then I can submit a patch for removal of:
- "ubi_self_check_all_ff()" from UBI layer.
Well, this is just debugging and sanity check stuff.
- checking of 'buf == 0xff' from ubifs_scan_leb() in UBIFS layer.
I do not think this is a good idea. Let me do some quick braindump, thankfully I still remember the reasons behind this.
This is about the recovery, and this is the code path where we actually do these checks.
Just like in defensive programming you try to assume the worst, we tried to assume the worst too. And the worst is - you cannot make any assumption about what is on the media.
Now, we wanted to make UBIFS robust in a sense that you can cut the power off at any point, and you can be sure the UBIFS driver is still able to mount your flash. You can lose some data because it did not make it to the media yet by the time of power cut. But you never lose the data which made it to the media before the power cut.
And the file-system should mount the media without any user-space tools like 'fsck.ubifs'. The system should recover itself (detect half-written garbage and get rid of it, preparing a "clean" blank flash area for writing new data).
When you mount a file-system, UBIFS scans the journal. Suppose it hits a corrupted data node. At this point UBIFS needs to decide whether this is a node which was corrupted because of a power cut, or a piece of data which has to be correct, but got corrupted because of, say, under-voltage problems, or NAND wear, or radiation, etc.
In the first case - you recover silently, and you do not bother the user with warnings.
In the second case - you report loudly. You do not do anything because you risk losing important user data (an expensive bitcoin!)
Right? So you gotta be very careful, because this is user data.
To put it differently, we specifically targeted one type of corruption: power-cut related corruption. We made related assumptions. And we were very careful about validating these assumptions.
So UBIFS always starts with fully erased LEBs. Then it writes there sequentially, NAND page-by-page, from beginning to the end.
(Well, it is a bit more complex than that, but this is not important in this discussion. The complexity is that there are several journal heads, so UBIFS writes to more than one LEB, but it is sequential anyway. Also, we write in so-called "max. write units", which are usually the same as a NAND page in case of NAND anyway).
When UBIFS mounts a file-system, it scans the journal. When it meets a corrupted node in NAND page X, it looks at NAND page X+1 and checks if it is blank or not. If it is blank, this looks normal, and X was just the NAND page UBIFS presumably was writing to just before the power cut.
If NAND page X+1 contains something, then page X cannot be corrupted due to a power cut, and this is something else. And we, the FS authors, do not know how to deal with this; we did not think about this type of corruption. So we just complain and exit. This is better than trying to erase something and make you lose your data, right?
That's the logic. And of course people are welcome to extend it and improve it.
Conclusion: all UBIFS needs is a way to ask the driver - is this NAND page blank or not? UBIFS does not really have to compare to all 0xFFs.
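The decision described in this mail can be sketched as below. It is a simplified illustration, not UBIFS code: `is_blank` is a hypothetical callback standing in for "ask the driver whether this NAND page is blank", and treating a corrupted last page of the LEB as a power cut is an assumption made here, since the mail does not cover that case.

```python
from typing import Callable

POWER_CUT = "power-cut: recover silently"
UNEXPECTED = "unexpected corruption: complain and exit"


def classify_corruption(x: int, pages_per_leb: int,
                        is_blank: Callable[[int], bool]) -> str:
    """Classify a corrupted node found in page x of a sequentially written LEB."""
    if not 0 <= x < pages_per_leb:
        raise ValueError("page index outside the LEB")

    # Assumption: nothing follows the last page, so there is no evidence
    # against an interrupted write.
    if x == pages_per_leb - 1:
        return POWER_CUT

    # LEBs are written page by page from the start, so a blank page X+1
    # means page X was the one being written when the power was cut.
    if is_blank(x + 1):
        return POWER_CUT

    # Data after the corrupted page rules out an interrupted write.
    return UNEXPECTED


# Example: pages 0..3 were written, pages 4..7 are still blank.
written = {0, 1, 2, 3}


def blank(page: int) -> bool:
    return page not in written


print(classify_corruption(3, 8, blank))  # corrupted page is the last one written
print(classify_corruption(1, 8, blank))  # pages 2 and 3 follow it
```

Passing the blank check in as a callback mirrors the conclusion above: the caller only needs an answer to "is this page blank?", and how the driver arrives at it (tolerating bit-flips or not) stays out of the recovery logic.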
participants (4)
- Artem Bityutskiy
- Calvin Johnson
- Gupta, Pekon
- Stefano Babic