[U-Boot] [PATCH] blk: Invalidate block cache when switching hwpart

Some storage devices have multiple hw partitions and both address from zero, for example eMMC. However currently block cache invalidation only applies to block write/erase. This can cause a problem that data of current hw partition is cached before switching to another hw partition. And the following read operation of the latter hw partition will get wrong data when reading from the addresses that have been cached previously.
To solve this problem, invalidate block cache after a successful select_hwpart operation.
Signed-off-by: Weijie Gao weijie.gao@mediatek.com --- drivers/block/blk-uclass.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/block/blk-uclass.c b/drivers/block/blk-uclass.c index baaf431e5e0..c23b6682a6c 100644 --- a/drivers/block/blk-uclass.c +++ b/drivers/block/blk-uclass.c @@ -208,7 +208,11 @@ int blk_select_hwpart_devnum(enum if_type if_type, int devnum, int hwpart) if (ret) return ret;
- return blk_select_hwpart(dev, hwpart); + ret = blk_select_hwpart(dev, hwpart); + if (!ret) + blkcache_invalidate(if_type, devnum); + + return ret; }
int blk_list_part(enum if_type if_type) @@ -348,7 +352,13 @@ int blk_select_hwpart(struct udevice *dev, int hwpart)
int blk_dselect_hwpart(struct blk_desc *desc, int hwpart) { - return blk_select_hwpart(desc->bdev, hwpart); + int ret; + + ret = blk_select_hwpart(desc->bdev, hwpart); + if (!ret) + blkcache_invalidate(desc->if_type, desc->devnum); + + return ret; }
int blk_first_device(int if_type, struct udevice **devp)

On Thu, Jul 11, 2019 at 03:10:23PM +0800, Weijie Gao wrote:
Some storage devices have multiple hw partitions and both address from zero, for example eMMC. However currently block cache invalidation only applies to block write/erase. This can cause a problem that data of current hw partition is cached before switching to another hw partition. And the following read operation of the latter hw partition will get wrong data when reading from the addresses that have been cached previously.
To solve this problem, invalidate block cache after a successful select_hwpart operation.
Signed-off-by: Weijie Gao weijie.gao@mediatek.com
Applied to u-boot/master, thanks!

On 11.07.19 09:10, Weijie Gao wrote:
Some storage devices have multiple hw partitions and both address from zero, for example eMMC. However currently block cache invalidation only applies to block write/erase. This can cause a problem that data of current hw partition is cached before switching to another hw partition. And the following read operation of the latter hw partition will get wrong data when reading from the addresses that have been cached previously.
To solve this problem, invalidate block cache after a successful select_hwpart operation.
Signed-off-by: Weijie Gao weijie.gao@mediatek.com
This patch breaks correct operation of PDU001 board.
drivers/block/blk-uclass.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/block/blk-uclass.c b/drivers/block/blk-uclass.c index baaf431e5e0..c23b6682a6c 100644 --- a/drivers/block/blk-uclass.c +++ b/drivers/block/blk-uclass.c @@ -208,7 +208,11 @@ int blk_select_hwpart_devnum(enum if_type if_type, int devnum, int hwpart) if (ret) return ret;
- return blk_select_hwpart(dev, hwpart);
- ret = blk_select_hwpart(dev, hwpart);
- if (!ret)
blkcache_invalidate(if_type, devnum);
- return ret;
}
int blk_list_part(enum if_type if_type) @@ -348,7 +352,13 @@ int blk_select_hwpart(struct udevice *dev, int hwpart)
int blk_dselect_hwpart(struct blk_desc *desc, int hwpart) {
- return blk_select_hwpart(desc->bdev, hwpart);
- int ret;
- ret = blk_select_hwpart(desc->bdev, hwpart);
- if (!ret)
blkcache_invalidate(desc->if_type, desc->devnum);
Commenting the invalidation of the block cache on this line results in proper working of the board.
- return ret;
}
int blk_first_device(int if_type, struct udevice **devp)
With the patch active, files from the boot FAT partition of the SD-card (mmc device 1) do not load anymore, hence booting fails. Using the eMMC (mmc device 0) instead works fine, files can bee loaded and the board boots.
To isolate the problem I have modified the configuration to only load a 10k test file (test.bin) instead of loading the DTB and zImage files followed by booting LINUX. Furthermore I have enabled debugging for some parts of the code.
When I boot from the SD-card (which fails) I get the following logged: === CPU : AM335X-GP rev 1.0 Model: EETS,PDU001 DRAM: 256 MiB mmc_bind: alias ret=-2, devnum=-1 mmc_bind: alias ret=-2, devnum=-1 MMC: OMAP SD/MMC: 0, OMAP SD/MMC: 1 In: serial@481a6000 Out: serial@481a6000 Err: serial@481a6000 Press SPACE to abort autoboot in 1 seconds blk_get_devnum_by_typename: if_type=6, devnum=1: mmc@48060000.blk, 6, 0 blk_get_devnum_by_typename: if_type=6, devnum=1: mmc@481d8000.blk, 6, 1 clock is disabled (0Hz) clock is enabled (400000Hz) clock is enabled (50000000Hz) miss: start 0, count 1 blk_find_device: if_type=6, devnum=1: mmc@48060000.blk, 6, 0 blk_find_device: if_type=6, devnum=1: mmc@481d8000.blk, 6, 1 fill: start 0, count 1 part_init: try 'EFI': ret=-1 hit: start 0, count 1 part_init: try 'DOS': ret=0 blk_get_devnum_by_typename: Device desc 8dfb3a98 miss: start 0, count 1 blk_find_device: if_type=6, devnum=1: mmc@48060000.blk, 6, 0 blk_find_device: if_type=6, devnum=1: mmc@481d8000.blk, 6, 1 fill: start 0, count 1 part_init: try 'EFI': ret=-1 hit: start 0, count 1 part_init: try 'DOS': ret=0 hit: start 0, count 1 miss: start 800, count 1 blk_find_device: if_type=6, devnum=1: mmc@48060000.blk, 6, 0 blk_find_device: if_type=6, devnum=1: mmc@481d8000.blk, 6, 1 fill: start 800, count 1 hit: start 800, count 1 FAT16, fat_sect: 1, fatlength: 64 Rootdir begins at cluster: 0, sector: 129, offset: 10200 Data begins at: 153 Sector size: 512, cluster size: 4 FAT read(sect=129), clust_size=4, read_size=4, DIRENTSPERBLOCK=16 miss: start 881, count 4 blk_find_device: if_type=6, devnum=1: mmc@48060000.blk, 6, 0 blk_find_device: if_type=6, devnum=1: mmc@481d8000.blk, 6, 1 fill: start 881, count 4 miss: start 800, count 1 blk_find_device: if_type=6, devnum=1: mmc@48060000.blk, 6, 0 blk_find_device: if_type=6, devnum=1: mmc@481d8000.blk, 6, 1 fill: start 800, count 1 FAT16, fat_sect: 1, fatlength: 64 Rootdir begins at cluster: 0, sector: 129, offset: 10200 Data begins at: 153 Sector size: 512, cluster size: 4 FAT read(sect=129), clust_size=4, read_size=4, DIRENTSPERBLOCK=16 miss: start 881, count 4 blk_find_device: if_type=6, devnum=1: mmc@48060000.blk, 6, 0 blk_find_device: if_type=6, devnum=1: mmc@481d8000.blk, 6, 1 fill: start 881, count 4 reading test.bin at pos 0 Filesize: 0 bytes Read position past EOF: 0 0 bytes read in 94 ms (0 Bytes/s) ===
Obviously the the file test.bin does not get loaded.
Booting from the eMMC (which works) logs the following: === CPU : AM335X-GP rev 1.0 Model: EETS,PDU001 DRAM: 256 MiB mmc_bind: alias ret=-2, devnum=-1 mmc_bind: alias ret=-2, devnum=-1 MMC: OMAP SD/MMC: 0, OMAP SD/MMC: 1 In: serial@481a6000 Out: serial@481a6000 Err: serial@481a6000 Press SPACE to abort autoboot in 1 seconds blk_get_devnum_by_typename: if_type=6, devnum=0: mmc@48060000.blk, 6, 0 clock is disabled (0Hz) clock is enabled (400000Hz) clock is enabled (25000000Hz) clock is enabled (52000000Hz) miss: start 0, count 1 blk_find_device: if_type=6, devnum=0: mmc@48060000.blk, 6, 0 fill: start 0, count 1 part_init: try 'EFI': ret=-1 hit: start 0, count 1 part_init: try 'DOS': ret=0 blk_get_devnum_by_typename: Device desc 8dfb3800 miss: start 0, count 1 blk_find_device: if_type=6, devnum=0: mmc@48060000.blk, 6, 0 fill: start 0, count 1 part_init: try 'EFI': ret=-1 hit: start 0, count 1 part_init: try 'DOS': ret=0 hit: start 0, count 1 miss: start 1, count 1 blk_find_device: if_type=6, devnum=0: mmc@48060000.blk, 6, 0 fill: start 1, count 1 hit: start 1, count 1 FAT16, fat_sect: 1, fatlength: 20 Rootdir begins at cluster: 0, sector: 41, offset: 5200 Data begins at: 65 Sector size: 512, cluster size: 4 FAT read(sect=41), clust_size=4, read_size=4, DIRENTSPERBLOCK=16 miss: start 2a, count 4 blk_find_device: if_type=6, devnum=0: mmc@48060000.blk, 6, 0 fill: start 2a, count 4 miss: start 1, count 1 blk_find_device: if_type=6, devnum=0: mmc@48060000.blk, 6, 0 fill: start 1, count 1 FAT16, fat_sect: 1, fatlength: 20 Rootdir begins at cluster: 0, sector: 41, offset: 5200 Data begins at: 65 Sector size: 512, cluster size: 4 FAT read(sect=41), clust_size=4, read_size=4, DIRENTSPERBLOCK=16 miss: start 2a, count 4 blk_find_device: if_type=6, devnum=0: mmc@48060000.blk, 6, 0 fill: start 2a, count 4 reading test.bin at pos 0 Filesize: 10240 bytes 10240 bytes FAT16: entry: 0x000000a3 = 163, offset: 0x00a3 = 163 miss: start 2, count 6 blk_find_device: if_type=6, devnum=0: mmc@48060000.blk, 6, 0 fill: start 2, count 6 FAT16: ret: 0x000000a4, entry: 0x000000a3, offset: 0x00a3 FAT16: entry: 0x000000a4 = 164, offset: 0x00a4 = 164 FAT16: ret: 0x000000a5, entry: 0x000000a4, offset: 0x00a4 FAT16: entry: 0x000000a5 = 165, offset: 0x00a5 = 165 FAT16: ret: 0x000000a6, entry: 0x000000a5, offset: 0x00a5 FAT16: entry: 0x000000a6 = 166, offset: 0x00a6 = 166 FAT16: ret: 0x000000a7, entry: 0x000000a6, offset: 0x00a6 gc - clustnum: 163, startsect: 717 miss: start 2ce, count 20 blk_find_device: if_type=6, devnum=0: mmc@48060000.blk, 6, 0 10240 bytes read in 135 ms (73.2 KiB/s) ===
This time the file test.bin gets loaded correctly.
To be honest I don't really see why things are going wrong. It looks like the block cache gets filled but then again invalidated before it's data is used. I also feel that this problem is related to the SD-card not being the first (number 0) but the second device (number 1). Can anybody shed some light on this and give me a hint?
many thanks, Felix

On Thu, 2019-08-22 at 15:58 +0200, Felix Brack wrote:
On 11.07.19 09:10, Weijie Gao wrote:
Some storage devices have multiple hw partitions and both address from zero, for example eMMC. However currently block cache invalidation only applies to block write/erase. This can cause a problem that data of current hw partition is cached before switching to another hw partition. And the following read operation of the latter hw partition will get wrong data when reading from the addresses that have been cached previously.
To solve this problem, invalidate block cache after a successful select_hwpart operation.
Signed-off-by: Weijie Gao weijie.gao@mediatek.com
This patch breaks correct operation of PDU001 board.
drivers/block/blk-uclass.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/block/blk-uclass.c b/drivers/block/blk-uclass.c index baaf431e5e0..c23b6682a6c 100644 --- a/drivers/block/blk-uclass.c +++ b/drivers/block/blk-uclass.c @@ -208,7 +208,11 @@ int blk_select_hwpart_devnum(enum if_type if_type, int devnum, int hwpart) if (ret) return ret;
- return blk_select_hwpart(dev, hwpart);
- ret = blk_select_hwpart(dev, hwpart);
- if (!ret)
blkcache_invalidate(if_type, devnum);
- return ret;
}
int blk_list_part(enum if_type if_type) @@ -348,7 +352,13 @@ int blk_select_hwpart(struct udevice *dev, int hwpart)
int blk_dselect_hwpart(struct blk_desc *desc, int hwpart) {
- return blk_select_hwpart(desc->bdev, hwpart);
- int ret;
- ret = blk_select_hwpart(desc->bdev, hwpart);
- if (!ret)
blkcache_invalidate(desc->if_type, desc->devnum);
Commenting the invalidation of the block cache on this line results in proper working of the board.
- return ret;
}
int blk_first_device(int if_type, struct udevice **devp)
With the patch active, files from the boot FAT partition of the SD-card (mmc device 1) do not load anymore, hence booting fails. Using the eMMC (mmc device 0) instead works fine, files can bee loaded and the board boots.
To isolate the problem I have modified the configuration to only load a 10k test file (test.bin) instead of loading the DTB and zImage files followed by booting LINUX. Furthermore I have enabled debugging for some parts of the code.
When I boot from the SD-card (which fails) I get the following logged:
CPU : AM335X-GP rev 1.0
This time the file test.bin gets loaded correctly.
To be honest I don't really see why things are going wrong. It looks like the block cache gets filled but then again invalidated before it's data is used. I also feel that this problem is related to the SD-card not being the first (number 0) but the second device (number 1). Can anybody shed some light on this and give me a hint?
many thanks, Felix
Hi Felix,
I've found the root cause.
There is a bug in the FAT driver. In function file_fat_read_at(), malloc_cache_aligned() allocates memory for the itr. The itr is a pointer to struct fat_itr. fat_itr has a member block[MAX_CLUSTSIZE] used for storing blocks read from MMC.
The FAT driver resolves the file table and stores the required dir entry in the member block. Then the pointer of the dir entry is passed to get_contents() to get the contents of the file.
However the itr is freed BEFORE the call to get_contents(). This means the contents dir entry points to is no longer valid. And it results to your situation.
But this only happens when CONFIG_BLOCK_CACHE=y and the patch applied. I've also found its cause.
The MMC driver calls blk_dselect_hwpart() unconditionally in mmc_bread(). This causes block cache being invalidated and refilled many times during the FAT load operation.
The frequent block cache invalidation and refilling finally results in frequent memory allocation and freeing. There is a mechanism in the dlmalloc.c: if the total free space reaches a threshold, the malloc_trim() will be called. malloc_trim() calls sbrk(), which fills the memory being freed with zeros and of course the data of itr is damaged, otherwise the original data remains unchanged.
So this is the reason the bug only appears when block cache is enabled and cache invalidation is called after every select_hwpart.
I have made a patch, could you please have a try? I've tested it worked. If it works for you too, I will submit it to u-boot. And I will submit another patch to optimize the cache invalidation for select_hwpart.
--- a/fs/fat/fat.c +++ b/fs/fat/fat.c @@ -1174,10 +1174,6 @@ int file_fat_read_at(const char *filename, loff_t pos, void *buffer, /* For saving default max clustersize memory allocated to malloc pool */ dir_entry *dentptr = itr->dent;
- free(itr); - - itr = NULL; - ret = get_contents(&fsdata, dentptr, pos, buffer, maxsize, actread);
out_free_both:
BTW, I reproduced this failure on a MT7623 board, which also has an eMMC and a SD. And this patch is indeed used for MT7623 to solve the bug on switching hwpart on eMMC.
Best Regards,
Weijie

Hello Weijie,
On 26.08.19 10:19, Weijie Gao wrote:
On Thu, 2019-08-22 at 15:58 +0200, Felix Brack wrote:
On 11.07.19 09:10, Weijie Gao wrote:
Some storage devices have multiple hw partitions and both address from zero, for example eMMC. However currently block cache invalidation only applies to block write/erase. This can cause a problem that data of current hw partition is cached before switching to another hw partition. And the following read operation of the latter hw partition will get wrong data when reading from the addresses that have been cached previously.
To solve this problem, invalidate block cache after a successful select_hwpart operation.
Signed-off-by: Weijie Gao weijie.gao@mediatek.com
This patch breaks correct operation of PDU001 board.
drivers/block/blk-uclass.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/block/blk-uclass.c b/drivers/block/blk-uclass.c index baaf431e5e0..c23b6682a6c 100644 --- a/drivers/block/blk-uclass.c +++ b/drivers/block/blk-uclass.c @@ -208,7 +208,11 @@ int blk_select_hwpart_devnum(enum if_type if_type, int devnum, int hwpart) if (ret) return ret;
- return blk_select_hwpart(dev, hwpart);
- ret = blk_select_hwpart(dev, hwpart);
- if (!ret)
blkcache_invalidate(if_type, devnum);
- return ret;
}
int blk_list_part(enum if_type if_type) @@ -348,7 +352,13 @@ int blk_select_hwpart(struct udevice *dev, int hwpart)
int blk_dselect_hwpart(struct blk_desc *desc, int hwpart) {
- return blk_select_hwpart(desc->bdev, hwpart);
- int ret;
- ret = blk_select_hwpart(desc->bdev, hwpart);
- if (!ret)
blkcache_invalidate(desc->if_type, desc->devnum);
Commenting the invalidation of the block cache on this line results in proper working of the board.
- return ret;
}
int blk_first_device(int if_type, struct udevice **devp)
With the patch active, files from the boot FAT partition of the SD-card (mmc device 1) do not load anymore, hence booting fails. Using the eMMC (mmc device 0) instead works fine, files can bee loaded and the board boots.
To isolate the problem I have modified the configuration to only load a 10k test file (test.bin) instead of loading the DTB and zImage files followed by booting LINUX. Furthermore I have enabled debugging for some parts of the code.
When I boot from the SD-card (which fails) I get the following logged:
CPU : AM335X-GP rev 1.0
This time the file test.bin gets loaded correctly.
To be honest I don't really see why things are going wrong. It looks like the block cache gets filled but then again invalidated before it's data is used. I also feel that this problem is related to the SD-card not being the first (number 0) but the second device (number 1). Can anybody shed some light on this and give me a hint?
many thanks, Felix
Hi Felix,
I've found the root cause.
Many thanks for looking into this!
There is a bug in the FAT driver. In function file_fat_read_at(), malloc_cache_aligned() allocates memory for the itr. The itr is a pointer to struct fat_itr. fat_itr has a member block[MAX_CLUSTSIZE] used for storing blocks read from MMC.
The FAT driver resolves the file table and stores the required dir entry in the member block. Then the pointer of the dir entry is passed to get_contents() to get the contents of the file.
However the itr is freed BEFORE the call to get_contents(). This means the contents dir entry points to is no longer valid. And it results to your situation.
Indeed this is the root of all trouble. The freeing of itr of course also invalidates the assignment made earlier:
dir_entry *dentptr = itr->dent;
As dentptr is used in the call to get_contents() the evil takes its course...
Again many thanks for analyzing and explaining this as detailed as you did.
But this only happens when CONFIG_BLOCK_CACHE=y and the patch applied. I've also found its cause.
The MMC driver calls blk_dselect_hwpart() unconditionally in mmc_bread(). This causes block cache being invalidated and refilled many times during the FAT load operation.
Yes I have noticed this frequent invalidations of the cache due to calls to blk_dselect_hwpart(). Some optimization, if possible, would be great.
The frequent block cache invalidation and refilling finally results in frequent memory allocation and freeing. There is a mechanism in the dlmalloc.c: if the total free space reaches a threshold, the malloc_trim() will be called. malloc_trim() calls sbrk(), which fills the memory being freed with zeros and of course the data of itr is damaged, otherwise the original data remains unchanged.
So this is the reason the bug only appears when block cache is enabled and cache invalidation is called after every select_hwpart.
I see and now understand the different behavior of eMMC and SD-card.
I have made a patch, could you please have a try? I've tested it worked. If it works for you too, I will submit it to u-boot. And I will submit another patch to optimize the cache invalidation for select_hwpart.
--- a/fs/fat/fat.c +++ b/fs/fat/fat.c @@ -1174,10 +1174,6 @@ int file_fat_read_at(const char *filename, loff_t pos, void *buffer, /* For saving default max clustersize memory allocated to malloc pool */ dir_entry *dentptr = itr->dent;
- free(itr);
- itr = NULL;
- ret = get_contents(&fsdata, dentptr, pos, buffer, maxsize, actread);
out_free_both:
I can confirm that your patch works perfect, thanks!
Please CC me when submitting the two patches you mentioned above. It will be my pleasure to test them on my hardware.
BTW, I reproduced this failure on a MT7623 board, which also has an eMMC and a SD. And this patch is indeed used for MT7623 to solve the bug on switching hwpart on eMMC.
Best Regards,
Weijie
regard, Felix

On Mon, 2019-08-26 at 14:43 +0200, Felix Brack wrote:
Hello Weijie,
On 26.08.19 10:19, Weijie Gao wrote:
On Thu, 2019-08-22 at 15:58 +0200, Felix Brack wrote:
On 11.07.19 09:10, Weijie Gao wrote:
Some storage devices have multiple hw partitions and both address from zero, for example eMMC. However currently block cache invalidation only applies to block write/erase. This can cause a problem that data of current hw partition is cached before switching to another hw partition. And the following read operation of the latter hw partition will get wrong data when reading from the addresses that have been cached previously.
To solve this problem, invalidate block cache after a successful select_hwpart operation.
Signed-off-by: Weijie Gao weijie.gao@mediatek.com
This patch breaks correct operation of PDU001 board.
drivers/block/blk-uclass.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/block/blk-uclass.c b/drivers/block/blk-uclass.c index baaf431e5e0..c23b6682a6c 100644 --- a/drivers/block/blk-uclass.c +++ b/drivers/block/blk-uclass.c @@ -208,7 +208,11 @@ int blk_select_hwpart_devnum(enum if_type if_type, int devnum, int hwpart) if (ret) return ret;
- return blk_select_hwpart(dev, hwpart);
- ret = blk_select_hwpart(dev, hwpart);
- if (!ret)
blkcache_invalidate(if_type, devnum);
- return ret;
}
int blk_list_part(enum if_type if_type) @@ -348,7 +352,13 @@ int blk_select_hwpart(struct udevice *dev, int hwpart)
int blk_dselect_hwpart(struct blk_desc *desc, int hwpart) {
- return blk_select_hwpart(desc->bdev, hwpart);
- int ret;
- ret = blk_select_hwpart(desc->bdev, hwpart);
- if (!ret)
blkcache_invalidate(desc->if_type, desc->devnum);
Commenting the invalidation of the block cache on this line results in proper working of the board.
- return ret;
}
int blk_first_device(int if_type, struct udevice **devp)
With the patch active, files from the boot FAT partition of the SD-card (mmc device 1) do not load anymore, hence booting fails. Using the eMMC (mmc device 0) instead works fine, files can bee loaded and the board boots.
To isolate the problem I have modified the configuration to only load a 10k test file (test.bin) instead of loading the DTB and zImage files followed by booting LINUX. Furthermore I have enabled debugging for some parts of the code.
When I boot from the SD-card (which fails) I get the following logged:
CPU : AM335X-GP rev 1.0
This time the file test.bin gets loaded correctly.
To be honest I don't really see why things are going wrong. It looks like the block cache gets filled but then again invalidated before it's data is used. I also feel that this problem is related to the SD-card not being the first (number 0) but the second device (number 1). Can anybody shed some light on this and give me a hint?
many thanks, Felix
Hi Felix,
I've found the root cause.
Many thanks for looking into this!
There is a bug in the FAT driver. In function file_fat_read_at(), malloc_cache_aligned() allocates memory for the itr. The itr is a pointer to struct fat_itr. fat_itr has a member block[MAX_CLUSTSIZE] used for storing blocks read from MMC.
The FAT driver resolves the file table and stores the required dir entry in the member block. Then the pointer of the dir entry is passed to get_contents() to get the contents of the file.
However the itr is freed BEFORE the call to get_contents(). This means the contents dir entry points to is no longer valid. And it results to your situation.
Indeed this is the root of all trouble. The freeing of itr of course also invalidates the assignment made earlier:
dir_entry *dentptr = itr->dent;
As dentptr is used in the call to get_contents() the evil takes its course...
Again many thanks for analyzing and explaining this as detailed as you did.
But this only happens when CONFIG_BLOCK_CACHE=y and the patch applied. I've also found its cause.
The MMC driver calls blk_dselect_hwpart() unconditionally in mmc_bread(). This causes block cache being invalidated and refilled many times during the FAT load operation.
Yes I have noticed this frequent invalidations of the cache due to calls to blk_dselect_hwpart(). Some optimization, if possible, would be great.
The frequent block cache invalidation and refilling finally results in frequent memory allocation and freeing. There is a mechanism in the dlmalloc.c: if the total free space reaches a threshold, the malloc_trim() will be called. malloc_trim() calls sbrk(), which fills the memory being freed with zeros and of course the data of itr is damaged, otherwise the original data remains unchanged.
So this is the reason the bug only appears when block cache is enabled and cache invalidation is called after every select_hwpart.
I see and now understand the different behavior of eMMC and SD-card.
I have made a patch, could you please have a try? I've tested it worked. If it works for you too, I will submit it to u-boot. And I will submit another patch to optimize the cache invalidation for select_hwpart.
--- a/fs/fat/fat.c +++ b/fs/fat/fat.c @@ -1174,10 +1174,6 @@ int file_fat_read_at(const char *filename, loff_t pos, void *buffer, /* For saving default max clustersize memory allocated to malloc pool */ dir_entry *dentptr = itr->dent;
- free(itr);
- itr = NULL;
- ret = get_contents(&fsdata, dentptr, pos, buffer, maxsize, actread);
out_free_both:
I can confirm that your patch works perfect, thanks!
Please CC me when submitting the two patches you mentioned above. It will be my pleasure to test them on my hardware.
BTW, I reproduced this failure on a MT7623 board, which also has an eMMC and a SD. And this patch is indeed used for MT7623 to solve the bug on switching hwpart on eMMC.
Best Regards,
Weijie
regard, Felix
Hi Felix,
I found there's already a commit merged 6 hours ago (mailed on Aug 20) that fixed this issue: d7af2a863017be1f5fd1b65a858ddc7e87d7b876
So there's no need to send patch again. What you need is to pull the latest master branch.
I will still send a patch to optimize the invalidation in select_hwpart.
Best Regards,
Weijie

Hello Weijie,
On 27.08.19 05:07, Weijie Gao wrote:
Hi Felix,
I found there's already a commit merged 6 hours ago (mailed on Aug 20) that fixed this issue: d7af2a863017be1f5fd1b65a858ddc7e87d7b876
Thanks for the information. The patch referenced can be found here: https://patchwork.ozlabs.org/patch/1151091/
So there's no need to send patch again. What you need is to pull the latest master branch.
I will still send a patch to optimize the invalidation in select_hwpart.
Best Regards,
Weijie
regards, Felix
participants (3)
-
Felix Brack
-
Tom Rini
-
Weijie Gao