
On Mon, 2019-08-26 at 14:43 +0200, Felix Brack wrote:
Hello Weijie,
On 26.08.19 10:19, Weijie Gao wrote:
On Thu, 2019-08-22 at 15:58 +0200, Felix Brack wrote:
On 11.07.19 09:10, Weijie Gao wrote:
Some storage devices have multiple hw partitions and both address from zero, for example eMMC. However currently block cache invalidation only applies to block write/erase. This can cause a problem that data of current hw partition is cached before switching to another hw partition. And the following read operation of the latter hw partition will get wrong data when reading from the addresses that have been cached previously.
To solve this problem, invalidate block cache after a successful select_hwpart operation.
Signed-off-by: Weijie Gao weijie.gao@mediatek.com
This patch breaks correct operation of PDU001 board.
drivers/block/blk-uclass.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/block/blk-uclass.c b/drivers/block/blk-uclass.c index baaf431e5e0..c23b6682a6c 100644 --- a/drivers/block/blk-uclass.c +++ b/drivers/block/blk-uclass.c @@ -208,7 +208,11 @@ int blk_select_hwpart_devnum(enum if_type if_type, int devnum, int hwpart) if (ret) return ret;
- return blk_select_hwpart(dev, hwpart);
- ret = blk_select_hwpart(dev, hwpart);
- if (!ret)
blkcache_invalidate(if_type, devnum);
- return ret;
}
int blk_list_part(enum if_type if_type) @@ -348,7 +352,13 @@ int blk_select_hwpart(struct udevice *dev, int hwpart)
int blk_dselect_hwpart(struct blk_desc *desc, int hwpart) {
- return blk_select_hwpart(desc->bdev, hwpart);
- int ret;
- ret = blk_select_hwpart(desc->bdev, hwpart);
- if (!ret)
blkcache_invalidate(desc->if_type, desc->devnum);
Commenting the invalidation of the block cache on this line results in proper working of the board.
- return ret;
}
int blk_first_device(int if_type, struct udevice **devp)
With the patch active, files from the boot FAT partition of the SD-card (mmc device 1) do not load anymore, hence booting fails. Using the eMMC (mmc device 0) instead works fine, files can bee loaded and the board boots.
To isolate the problem I have modified the configuration to only load a 10k test file (test.bin) instead of loading the DTB and zImage files followed by booting LINUX. Furthermore I have enabled debugging for some parts of the code.
When I boot from the SD-card (which fails) I get the following logged:
CPU : AM335X-GP rev 1.0
This time the file test.bin gets loaded correctly.
To be honest I don't really see why things are going wrong. It looks like the block cache gets filled but then again invalidated before it's data is used. I also feel that this problem is related to the SD-card not being the first (number 0) but the second device (number 1). Can anybody shed some light on this and give me a hint?
many thanks, Felix
Hi Felix,
I've found the root cause.
Many thanks for looking into this!
There is a bug in the FAT driver. In function file_fat_read_at(), malloc_cache_aligned() allocates memory for the itr. The itr is a pointer to struct fat_itr. fat_itr has a member block[MAX_CLUSTSIZE] used for storing blocks read from MMC.
The FAT driver resolves the file table and stores the required dir entry in the member block. Then the pointer of the dir entry is passed to get_contents() to get the contents of the file.
However the itr is freed BEFORE the call to get_contents(). This means the contents dir entry points to is no longer valid. And it results to your situation.
Indeed this is the root of all trouble. The freeing of itr of course also invalidates the assignment made earlier:
dir_entry *dentptr = itr->dent;
As dentptr is used in the call to get_contents() the evil takes its course...
Again many thanks for analyzing and explaining this as detailed as you did.
But this only happens when CONFIG_BLOCK_CACHE=y and the patch applied. I've also found its cause.
The MMC driver calls blk_dselect_hwpart() unconditionally in mmc_bread(). This causes block cache being invalidated and refilled many times during the FAT load operation.
Yes I have noticed this frequent invalidations of the cache due to calls to blk_dselect_hwpart(). Some optimization, if possible, would be great.
The frequent block cache invalidation and refilling finally results in frequent memory allocation and freeing. There is a mechanism in the dlmalloc.c: if the total free space reaches a threshold, the malloc_trim() will be called. malloc_trim() calls sbrk(), which fills the memory being freed with zeros and of course the data of itr is damaged, otherwise the original data remains unchanged.
So this is the reason the bug only appears when block cache is enabled and cache invalidation is called after every select_hwpart.
I see and now understand the different behavior of eMMC and SD-card.
I have made a patch, could you please have a try? I've tested it worked. If it works for you too, I will submit it to u-boot. And I will submit another patch to optimize the cache invalidation for select_hwpart.
--- a/fs/fat/fat.c +++ b/fs/fat/fat.c @@ -1174,10 +1174,6 @@ int file_fat_read_at(const char *filename, loff_t pos, void *buffer, /* For saving default max clustersize memory allocated to malloc pool */ dir_entry *dentptr = itr->dent;
- free(itr);
- itr = NULL;
- ret = get_contents(&fsdata, dentptr, pos, buffer, maxsize, actread);
out_free_both:
I can confirm that your patch works perfect, thanks!
Please CC me when submitting the two patches you mentioned above. It will be my pleasure to test them on my hardware.
BTW, I reproduced this failure on a MT7623 board, which also has an eMMC and a SD. And this patch is indeed used for MT7623 to solve the bug on switching hwpart on eMMC.
Best Regards,
Weijie
regard, Felix
Hi Felix,
I found there's already a commit merged 6 hours ago (mailed on Aug 20) that fixed this issue: d7af2a863017be1f5fd1b65a858ddc7e87d7b876
So there's no need to send patch again. What you need is to pull the latest master branch.
I will still send a patch to optimize the invalidation in select_hwpart.
Best Regards,
Weijie