
Hi,
On 09/05/2011 05:57 PM, Holger Brunck wrote:
On 09/05/2011 04:37 PM, Stefan Roese wrote:
BTW: Is this problem reproducible on one of your systems?
Yes, we found a way to reproduce the bug on one of our boards. We need a special bit pattern in one UBI PEB, then force a bitflip, and afterwards the problem is present.
I have investigated further. It is not true that we need a special bit pattern in one UBI PEB. All we need is a situation where the UBI layer in u-boot finds a fixable bitflip in NAND; u-boot then gets stuck.
The loop in which u-boot gets stuck is in drivers/mtd/ubi/wl.c:
    schedule_erase <--------
         |                 |
    erase_worker           |
         |                 |
    ensure_wear_leveling   |
         |                 |
    wear_leveling_worker --|
And from this loop we will never return.
I have seen this fix in the UBI layer of the mainline kernel:
commit b86a2c56e512f46d140a4bcb4e35e8a7d4a99a4b
Author: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Date:   Sun May 24 14:13:34 2009 +0300
UBI: do not switch to R/O mode on read errors
This patch improves UBI errors handling. ATM UBI switches to R/O mode when the WL worker fails to read the source PEB. This means that the upper layers (e.g., UBIFS) has no chances to unmap the erroneous PEB and fix the error. This patch changes this behaviour and makes UBI put PEBs like this into a separate RB-tree, thus preventing the WL worker from hitting the same read errors again and again.
[...]
This sounds like the problem I see in u-boot. But the patch is not easy to port to u-boot because of the changes the kernel's UBI layer had already undergone before it...
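As a rough illustration of the idea described in the commit message above, here is a toy model in C. It is not the UBI code from u-boot or the kernel: every name in it (toy_ubi, toy_pick_source, toy_run_wl) is hypothetical, a plain array stands in for the separate RB-tree, and the assumption that the read of the source PEB fails on every attempt is mine. It only shows why a worker that keeps selecting the same failing PEB never finishes, and why setting that PEB aside ends the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PEBS 8

/* Hypothetical toy model; none of these names exist in u-boot or Linux. */
struct toy_ubi {
    bool read_fails[TOY_MAX_PEBS];  /* PEB whose read reports an error every time */
    bool quarantined[TOY_MAX_PEBS]; /* stands in for the separate RB-tree */
    size_t npebs;
};

/*
 * Pick the next source PEB for the worker: the first PEB that still
 * needs handling and has not been set aside. Returns -1 if there is none.
 */
static int toy_pick_source(const struct toy_ubi *ubi)
{
    for (size_t i = 0; i < ubi->npebs; i++)
        if (ubi->read_fails[i] && !ubi->quarantined[i])
            return (int)i;
    return -1;
}

/*
 * Run the worker until no work is left or max_rounds is reached.
 * Returns the number of rounds used; a return value equal to
 * max_rounds means the loop would not have ended on its own.
 */
static unsigned toy_run_wl(struct toy_ubi *ubi, bool use_quarantine,
                           unsigned max_rounds)
{
    unsigned rounds = 0;

    assert(ubi->npebs <= TOY_MAX_PEBS);

    while (rounds < max_rounds) {
        int src = toy_pick_source(ubi);

        if (src < 0)
            break; /* nothing left to do */
        rounds++;

        /* In this model the read of the source PEB always fails here. */
        if (use_quarantine)
            ubi->quarantined[src] = true; /* never pick this PEB again */
        /* Otherwise the PEB stays eligible and is picked again next round. */
    }
    return rounds;
}
```

With one failing PEB, toy_run_wl(&ubi, false, 1000) uses up all 1000 rounds, while toy_run_wl(&ubi, true, 1000) stops after a single round. The round limit exists only so that the model itself terminates.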
Best regards
Holger Brunck