[U-Boot] ext4 and caching

Hi all,
I've been seeing the same sort of issues repoted by Ionut and as addressed by this patch: http://lists.denx.de/pipermail/u-boot/2014-January/171459.html
That patch was added in commit fc0fc50 and reverted in commit 715b56f.
It no longer applies cleanly, and when I tried to resurrect it, I saw errors traversing directories and perhaps something went wrong with my merge.
Ionut, do you have a current version of this patch?
When I looked a little further, I found that a read of a ~150 MiB was around 30x slower than typical, and that the **same** 8 blocks accounted for the majority of the reads.
In fact, these same blocks were read back-to-back.
The following is a quick picture of the output from a simple printf in mmc_bread of the block number and count during a load of the problem file.
~$ uniq -c < mmc_bread.log | sort -n 1 mmc_bread: 0/1 1 mmc_bread: 2293760/1 1 mmc_bread: 2293762/2 1 mmc_bread: 2293768/1 1 mmc_bread: 2293768/1 1 mmc_bread: 2293768/1 1 mmc_bread: 2293768/1 1 mmc_bread: 2295264/1 1 mmc_bread: 2295270/1 1 mmc_bread: 2295290/1 1 mmc_bread: 2295645/1 1 mmc_bread: 7536640/131072 1 mmc_bread: 7667712/32768 1 mmc_bread: 7700480/16384 1 mmc_bread: 7729152/65536 1 mmc_bread: 7798784/130722 1 mmc_bread: 7929506/1 1 mmc_init: 0, time 129 2 mmc_bread: 0/1 6 mmc_bread: 2359120/1 10 mmc_bread: 2358752/1 34 mmc_bread: 2358808/1 2048 mmc_bread: 2557936/8 4096 mmc_bread: 2557936/8 8193 mmc_bread: 2557936/8 16340 mmc_bread: 2557936/8 16384 mmc_bread: 2557936/8
~$ sort < mmc_bread.log | uniq -c | sort -n | tail -n 1 47061 mmc_bread: 2557936/8
In English, the 8 blocks starting at 2557936 are read 47061 times, and back to back in large (2k/4k/16k) bunches, so a very simple **single** block cache of the last read will fix the speed issue and I hacked something up to verify that.
Is anybody else working on things in this area?
I think this is something that's probably easier to fix at the block device level rather than within the ext4 filesystem code.
That said, the 2k/4k/16k bunches above may also indicate a simpler fix in the ext4 code.
Please chime in with your thoughts.
Regards,
Eric Nelson

Here's an example of a very simple cache for block devices that will prevent duplicate back-to-back reads from the same block device.
By itself, this is sufficient to speed reads of certain files from ext4 by 30x as described in this thread.
The areas I think could benefit most from some level of block cache in U-Boot are: - reads from partition tables - directory searches
Having more than one cache entry is probably better for both of those, but I'd like to get some feedback first.
The implementation explicitly prevents reads of more than 8 blocks to prevent slowing down the most speed-critical operations (reading files) and I'm not sure this is the right choice of size.
The small size limit also prevents the overhead in the interface to cache_block_fill() from being a problem. A more robust approach would allow the cache to take over previously allocated blocks, but this would also be more invasive, since some cacheable blocks are currently allocated on the stack.
Eric Nelson (2): add block device cache mmc: add support for block device cache
drivers/block/Makefile | 1 + drivers/block/cache_block.c | 76 +++++++++++++++++++++++++++++++++++++++++++++ drivers/mmc/mmc.c | 10 +++++- drivers/mmc/mmc_write.c | 7 +++++ include/part.h | 65 ++++++++++++++++++++++++++++++++++++++ 5 files changed, 158 insertions(+), 1 deletion(-) create mode 100644 drivers/block/cache_block.c

Signed-off-by: Eric Nelson eric@nelint.com --- drivers/block/Makefile | 1 + drivers/block/cache_block.c | 76 +++++++++++++++++++++++++++++++++++++++++++++ include/part.h | 65 ++++++++++++++++++++++++++++++++++++++ 3 files changed, 142 insertions(+) create mode 100644 drivers/block/cache_block.c
diff --git a/drivers/block/Makefile b/drivers/block/Makefile index b5c7ae1..056a48b 100644 --- a/drivers/block/Makefile +++ b/drivers/block/Makefile @@ -24,3 +24,4 @@ obj-$(CONFIG_IDE_SIL680) += sil680.o obj-$(CONFIG_SANDBOX) += sandbox.o obj-$(CONFIG_SCSI_SYM53C8XX) += sym53c8xx.o obj-$(CONFIG_SYSTEMACE) += systemace.o +obj-$(CONFIG_BLOCK_CACHE) += cache_block.o diff --git a/drivers/block/cache_block.c b/drivers/block/cache_block.c new file mode 100644 index 0000000..12e60ac --- /dev/null +++ b/drivers/block/cache_block.c @@ -0,0 +1,76 @@ +/* + * Copyright (C) Nelson Integration, LLC 2016 + * Author: Eric Nelsoneric@nelint.com + * + * SPDX-License-Identifier: GPL-2.0+ + * + */ +#include <config.h> +#include <common.h> +#include <malloc.h> +#include <part.h> + +#define MAX_CACHEBLOCKS 8 + +static int cache_iftype = -1; +static int cache_devnum = -1; +static lbaint_t cache_start = -1; +static lbaint_t cache_blkcnt = -1; +static unsigned long cache_blksz; +static void *cache; + +int cache_block_read(int iftype, int dev, + lbaint_t start, lbaint_t blkcnt, + unsigned long blksz, void *buffer) +{ + if ((iftype == cache_iftype) && + (dev == cache_devnum) && + (start == cache_start) && + (blkcnt <= cache_blkcnt) && + (blksz == cache_blksz) && + (cache != 0)) { + memcpy(buffer, cache, blksz*blkcnt); + return 1; + } + return 0; +} + +void cache_block_fill(int iftype, int dev, + lbaint_t start, lbaint_t blkcnt, + unsigned long blksz, void const *buffer) +{ + lbaint_t bytes; + + /* don't cache big stuff */ + if (blkcnt > MAX_CACHEBLOCKS) + return; + + bytes = blksz*blkcnt; + if (cache != 0) { + if (bytes != (cache_blksz*cache_blkcnt)) { + free(cache); + cache = malloc(blksz*blkcnt); + if (!cache) + return; + } /* change in size */ + } else { + cache = malloc(blksz*blkcnt); + if (!cache) + return; + } + memcpy(cache, buffer, bytes); + cache_iftype = iftype; + cache_devnum = dev; + cache_start = start; + cache_blkcnt = blkcnt; + cache_blksz = blksz; +} + +void cache_block_invalidate(int iftype, int dev) +{ + cache_iftype = -1; + if (cache) { + free(cache); + cache = 0; + } +} diff --git a/include/part.h b/include/part.h index 6d8f520..21f820f 100644 --- a/include/part.h +++ b/include/part.h @@ -369,4 +369,69 @@ int gpt_verify_partitions(struct blk_desc *dev_desc, gpt_header *gpt_head, gpt_entry **gpt_pte); #endif
+#ifdef CONFIG_BLOCK_CACHE +/** + * cache_block_read() - attempt to read a set of blocks from cache + * + * @param iftype - IF_TYPE_x for type of device + * @param dev - device index of particular type + * @param start - starting block number + * @param blkcnt - number of blocks to read + * @param blksz - size in bytes of each block + * @param buf - buffer to contain cached data + * + * @return - '1' if block returned from cache, '0' otherwise. + */ +int cache_block_read + (int iftype, int dev, + lbaint_t start, lbaint_t blkcnt, + unsigned long blksz, void *buffer); + +/** + * cache_block_fill() - make data read from a block device available + * to the block cache + * + * @param iftype - IF_TYPE_x for type of device + * @param dev - device index of particular type + * @param start - starting block number + * @param blkcnt - number of blocks available + * @param blksz - size in bytes of each block + * @param buf - buffer containing data to cache + * + */ +void cache_block_fill + (int iftype, int dev, + lbaint_t start, lbaint_t blkcnt, + unsigned long blksz, void const *buffer); + +/** + * cache_block_invalidate() - discard the cache for a set of blocks + * because of a write or device (re)initialization. + * + * @param iftype - IF_TYPE_x for type of device + * @param dev - device index of particular type + */ +void cache_block_invalidate + (int iftype, int dev); + +#else + +static inline int cache_block_read + (int iftype, int dev, + lbaint_t start, lbaint_t blkcnt, + unsigned long blksz, void *buffer) +{ + return 0; +} + +static inline void cache_block_fill + (int iftype, int dev, + lbaint_t start, lbaint_t blkcnt, + unsigned long blksz, void const *buffer) {} + +static inline void cache_block_invalidate + (int iftype, int dev) {} + +#endif + #endif /* _PART_H */

On 03/16/2016 03:40 PM, Eric Nelson wrote:
Signed-off-by: Eric Nelson eric@nelint.com
A patch description would be useful here; the cover letter wouldn't be checked in.
diff --git a/drivers/block/cache_block.c b/drivers/block/cache_block.c
+static int cache_iftype = -1; +static int cache_devnum = -1; +static lbaint_t cache_start = -1; +static lbaint_t cache_blkcnt = -1; +static unsigned long cache_blksz; +static void *cache;
Since variable "cache" is in BSS and hence is 0, which is included in all checks below, none of the other values need to be initialized.
+int cache_block_read(int iftype, int dev,
lbaint_t start, lbaint_t blkcnt,
unsigned long blksz, void *buffer)
+{
- if ((iftype == cache_iftype) &&
(dev == cache_devnum) &&
(start == cache_start) &&
(blkcnt <= cache_blkcnt) &&
(blksz == cache_blksz) &&
(cache != 0)) {
I'd suggest putting (cache != 0) first, since that check indicates whether any of the others are relevant.
memcpy(buffer, cache, blksz*blkcnt);
Nit: space around the * operator. Same comment everywhere.
It's probably hard to fix this given the simple API and lack of MMU page-tables to remap on a per-page basis, but one deficiency in this (deliberately) simple implementation is that if someone caches blocks 0..7 then later tries to read 2..10 (or even 2..3) they won't get a cache hit. While partial hits (the 2..10 case) are hard to deal with, perhaps it's useful to make subset cases (2..3 with 0..7 cached) work?
+void cache_block_fill(int iftype, int dev,
lbaint_t start, lbaint_t blkcnt,
unsigned long blksz, void const *buffer)
- bytes = blksz*blkcnt;
- if (cache != 0) {
if (bytes != (cache_blksz*cache_blkcnt)) {
free(cache);
cache = malloc(blksz*blkcnt);
if (!cache)
return;
} /* change in size */
- } else {
cache = malloc(blksz*blkcnt);
if (!cache)
return;
- }
Is it better to put the if (!cache) test after the whole if/else block so it's unified?
+void cache_block_invalidate(int iftype, int dev) +{
- cache_iftype = -1;
That isn't actually necessary, since assigning 0 to cache below will prevent anything from using the cache.
- if (cache) {
free(cache);
cache = 0;
- }
+}
diff --git a/include/part.h b/include/part.h
+#ifdef CONFIG_BLOCK_CACHE +/**
- cache_block_read() - attempt to read a set of blocks from cache
- @param iftype - IF_TYPE_x for type of device
- @param dev - device index of particular type
- @param start - starting block number
- @param blkcnt - number of blocks to read
- @param blksz - size in bytes of each block
- @param buf - buffer to contain cached data
s/contain/receive/?
- @return - '1' if block returned from cache, '0' otherwise.
- */
+int cache_block_read
- (int iftype, int dev,
lbaint_t start, lbaint_t blkcnt,
unsigned long blksz, void *buffer);
Nit: ( and probably the first n parameters should be on the same line as the function name.

Thanks for the review(s) Stephen.
On 03/17/2016 02:16 PM, Stephen Warren wrote:
On 03/16/2016 03:40 PM, Eric Nelson wrote:
Signed-off-by: Eric Nelson eric@nelint.com
A patch description would be useful here; the cover letter wouldn't be checked in.
Yeah. Please hote the RFC.
I was really hoping for some broader feedback about whether this is a better approach than the more specialized ext4 extent cache.
If I can get an ack on the approach, I think a minimal block device cache would support at least 2 or 4 entries, and I'd need to be able to answer the questions from your other response:
Do you have any stats on how many operations this saves for typical FS operations such as:
- Partition table type identification (with various types such as MBR/DOS, GPT, ...)
- Partition enumeration
- Filesystem identification (with various filesystems such as FAT, ext, ...)
- File reads
Should I interpret this as support of a small(ish) block device cache?
Regards,
Eric

On 03/17/2016 03:33 PM, Eric Nelson wrote:
Thanks for the review(s) Stephen.
On 03/17/2016 02:16 PM, Stephen Warren wrote:
On 03/16/2016 03:40 PM, Eric Nelson wrote:
Signed-off-by: Eric Nelson eric@nelint.com
A patch description would be useful here; the cover letter wouldn't be checked in.
Yeah. Please hote the RFC.
I was really hoping for some broader feedback about whether this is a better approach than the more specialized ext4 extent cache.
If I can get an ack on the approach, I think a minimal block device cache would support at least 2 or 4 entries, and I'd need to be able to answer the questions from your other response:
Do you have any stats on how many operations this saves for typical FS operations such as:
- Partition table type identification (with various types such as MBR/DOS, GPT, ...)
- Partition enumeration
- Filesystem identification (with various filesystems such as FAT, ext, ...)
- File reads
Should I interpret this as support of a small(ish) block device cache?
Conceptually I think it sounds OK provided it yields some demonstrable improvement across a variety of use-cases, and the design doesn't prevent it addressing obvious other cases it should address.
You probably want to Cc (or hope they see this and provide feedback anyway) a few other people such as Tom Rini, Simon Glass, Marek Vasut, etc.

On Thu, Mar 17, 2016 at 02:33:04PM -0700, Eric Nelson wrote:
Thanks for the review(s) Stephen.
On 03/17/2016 02:16 PM, Stephen Warren wrote:
On 03/16/2016 03:40 PM, Eric Nelson wrote:
Signed-off-by: Eric Nelson eric@nelint.com
A patch description would be useful here; the cover letter wouldn't be checked in.
Yeah. Please hote the RFC.
I was really hoping for some broader feedback about whether this is a better approach than the more specialized ext4 extent cache.
Well, I guess it comes down to how hard it is to also re-use this on say USB (another common case and one I assume you can test on your HW) since all of EXT4 and FAT and MMC and USB are fairly common and we should be able to address these with your approach, or at least be within "rework some other stuff and then.." distance.

Thanks for the feedback Tom,
On 03/20/2016 03:13 PM, Tom Rini wrote:
On Thu, Mar 17, 2016 at 02:33:04PM -0700, Eric Nelson wrote:
Thanks for the review(s) Stephen.
On 03/17/2016 02:16 PM, Stephen Warren wrote:
On 03/16/2016 03:40 PM, Eric Nelson wrote:
Signed-off-by: Eric Nelson eric@nelint.com
A patch description would be useful here; the cover letter wouldn't be checked in.
Yeah. Please hote the RFC.
I was really hoping for some broader feedback about whether this is a better approach than the more specialized ext4 extent cache.
Well, I guess it comes down to how hard it is to also re-use this on say USB (another common case and one I assume you can test on your HW) since all of EXT4 and FAT and MMC and USB are fairly common and we should be able to address these with your approach, or at least be within "rework some other stuff and then.." distance.
It should be trivial to include this support for USB, SATA, IDE, or any other block device, but not transparent.
I'm close to a V2 RFC patch so people can throw stones at an actual implementation.
Regards,
Eric

Signed-off-by: Eric Nelson eric@nelint.com --- drivers/mmc/mmc.c | 10 +++++++++- drivers/mmc/mmc_write.c | 7 +++++++ 2 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/drivers/mmc/mmc.c b/drivers/mmc/mmc.c index 8b2e606..956f4e1 100644 --- a/drivers/mmc/mmc.c +++ b/drivers/mmc/mmc.c @@ -6,7 +6,6 @@ * * SPDX-License-Identifier: GPL-2.0+ */ - #include <config.h> #include <common.h> #include <command.h> @@ -240,6 +239,8 @@ static ulong mmc_bread(struct blk_desc *block_dev, lbaint_t start, int dev_num = block_dev->devnum; int err; lbaint_t cur, blocks_todo = blkcnt; + void *outbuf = dst; + lbaint_t outblk = start;
if (blkcnt == 0) return 0; @@ -260,6 +261,10 @@ static ulong mmc_bread(struct blk_desc *block_dev, lbaint_t start, return 0; }
+ if (cache_block_read(IF_TYPE_MMC, dev_num, start, blkcnt, + mmc->read_bl_len, dst)) + return blkcnt; + if (mmc_set_blocklen(mmc, mmc->read_bl_len)) { debug("%s: Failed to set blocklen\n", __func__); return 0; @@ -277,6 +282,9 @@ static ulong mmc_bread(struct blk_desc *block_dev, lbaint_t start, dst += cur * mmc->read_bl_len; } while (blocks_todo > 0);
+ cache_block_fill(IF_TYPE_MMC, dev_num, outblk, blkcnt, + mmc->read_bl_len, outbuf); + return blkcnt; }
diff --git a/drivers/mmc/mmc_write.c b/drivers/mmc/mmc_write.c index 7b186f8..a877c78 100644 --- a/drivers/mmc/mmc_write.c +++ b/drivers/mmc/mmc_write.c @@ -12,6 +12,7 @@ #include <part.h> #include <div64.h> #include <linux/math64.h> +#include <part.h> #include "mmc_private.h"
static ulong mmc_erase_t(struct mmc *mmc, ulong start, lbaint_t blkcnt) @@ -20,6 +21,8 @@ static ulong mmc_erase_t(struct mmc *mmc, ulong start, lbaint_t blkcnt) ulong end; int err, start_cmd, end_cmd;
+ cache_block_invalidate(IF_TYPE_MMC, mmc->block_dev.dev); + if (mmc->high_capacity) { end = start + blkcnt - 1; } else { @@ -82,6 +85,8 @@ unsigned long mmc_berase(struct blk_desc *block_dev, lbaint_t start, if (err < 0) return -1;
+ cache_block_invalidate(IF_TYPE_MMC, dev_num); + /* * We want to see if the requested start or total block count are * unaligned. We discard the whole numbers and only care about the @@ -186,6 +191,8 @@ ulong mmc_bwrite(struct blk_desc *block_dev, lbaint_t start, lbaint_t blkcnt, if (err < 0) return 0;
+ cache_block_invalidate(IF_TYPE_MMC, dev_num); + if (mmc_set_blocklen(mmc, mmc->write_bl_len)) return 0;

On 03/16/2016 03:40 PM, Eric Nelson wrote:
Signed-off-by: Eric Nelson eric@nelint.com
Patch description.
drivers/mmc/mmc.c | 10 +++++++++- drivers/mmc/mmc_write.c | 7 +++++++
Presumably it makes sense for the cache to work for IDE, SATA, USB, SCSI, ... too. I wonder if it's possible to put this code somewhere more central than mmc*.c so it automatically applies to dev_desc->block_read() (see include/part.h). Perhaps not since each implementation supplies its own block_read function directly, so the cache calls do need to be duplicated everywhere.
diff --git a/drivers/mmc/mmc.c b/drivers/mmc/mmc.c
- SPDX-License-Identifier: GPL-2.0+
*/
- #include <config.h>
Nit: unrelated change.
I think there is a missing call to cache_block_invalidate() when the MMC device gets re-enumerated/re-initialized. The user would do something to trigger this (e.g. mmc rescan) when they'd swapped an SD card out for example.
Do you have any stats on how many operations this saves for typical FS operations such as:
- Partition table type identification (with various types such as MBR/DOS, GPT, ...) - Partition enumeration - Filesystem identification (with various filesystems such as FAT, ext, ...) - File reads

Hi Stephen,
On 03/17/2016 02:23 PM, Stephen Warren wrote:
On 03/16/2016 03:40 PM, Eric Nelson wrote:
Signed-off-by: Eric Nelson eric@nelint.com
Patch description.
drivers/mmc/mmc.c | 10 +++++++++- drivers/mmc/mmc_write.c | 7 +++++++
Presumably it makes sense for the cache to work for IDE, SATA, USB, SCSI, ... too. I wonder if it's possible to put this code somewhere more central than mmc*.c so it automatically applies to dev_desc->block_read() (see include/part.h). Perhaps not since each implementation supplies its own block_read function directly, so the cache calls do need to be duplicated everywhere.
Yeah. I haven't found a spot that would allow interception of the various block_read/write functions.
The get_dev_hwpart() routine in disk/part.c is close, but it seems to be bypassed by cmd_mmc, cmd_sata, et cetera.
I think the best that can be done is to provide a common shim that can easily be inserted from within the various block driver code.
diff --git a/drivers/mmc/mmc.c b/drivers/mmc/mmc.c
- SPDX-License-Identifier: GPL-2.0+
*/
- #include <config.h>
Nit: unrelated change.
I think there is a missing call to cache_block_invalidate() when the MMC device gets re-enumerated/re-initialized. The user would do something to trigger this (e.g. mmc rescan) when they'd swapped an SD card out for example.
Good catch.
Do you have any stats on how many operations this saves for typical FS operations such as:
- Partition table type identification (with various types such as
MBR/DOS, GPT, ...)
- Partition enumeration
- Filesystem identification (with various filesystems such as FAT, ext,
...)
- File reads
Not yet, but I'm working something up that will allow this to be gathered easily. As soon as we implement a cache, it provides a nice spot for tracing operations.
Regards,
Eric

On Sun, Mar 20, 2016 at 12:35:53PM -0700, Eric Nelson wrote:
Hi Stephen,
On 03/17/2016 02:23 PM, Stephen Warren wrote:
On 03/16/2016 03:40 PM, Eric Nelson wrote:
Signed-off-by: Eric Nelson eric@nelint.com
Patch description.
drivers/mmc/mmc.c | 10 +++++++++- drivers/mmc/mmc_write.c | 7 +++++++
Presumably it makes sense for the cache to work for IDE, SATA, USB, SCSI, ... too. I wonder if it's possible to put this code somewhere more central than mmc*.c so it automatically applies to dev_desc->block_read() (see include/part.h). Perhaps not since each implementation supplies its own block_read function directly, so the cache calls do need to be duplicated everywhere.
Yeah. I haven't found a spot that would allow interception of the various block_read/write functions.
Shouldn't DM also help here?

Hi Tom,
On 03/20/2016 03:13 PM, Tom Rini wrote:
On Sun, Mar 20, 2016 at 12:35:53PM -0700, Eric Nelson wrote:
Hi Stephen,
On 03/17/2016 02:23 PM, Stephen Warren wrote:
On 03/16/2016 03:40 PM, Eric Nelson wrote:
Signed-off-by: Eric Nelson eric@nelint.com
Patch description.
drivers/mmc/mmc.c | 10 +++++++++- drivers/mmc/mmc_write.c | 7 +++++++
Presumably it makes sense for the cache to work for IDE, SATA, USB, SCSI, ... too. I wonder if it's possible to put this code somewhere more central than mmc*.c so it automatically applies to dev_desc->block_read() (see include/part.h). Perhaps not since each implementation supplies its own block_read function directly, so the cache calls do need to be duplicated everywhere.
Yeah. I haven't found a spot that would allow interception of the various block_read/write functions.
Shouldn't DM also help here?
I haven't yet looked, but this may be true.
I'm seeing some build breakage on master surrounding the use of DM though.
If I select DM and BLK on top of nitrogen6q_defconfig, I get lots of build errors.
I want to get a V2 RFC patch out before digging through the details of that.
Regards,
Eric

Hi Tom,
On 03/20/2016 03:54 PM, Eric Nelson wrote:
On 03/20/2016 03:13 PM, Tom Rini wrote:
On Sun, Mar 20, 2016 at 12:35:53PM -0700, Eric Nelson wrote:
On 03/17/2016 02:23 PM, Stephen Warren wrote:
On 03/16/2016 03:40 PM, Eric Nelson wrote:
Signed-off-by: Eric Nelson eric@nelint.com
Patch description.
drivers/mmc/mmc.c | 10 +++++++++- drivers/mmc/mmc_write.c | 7 +++++++
Presumably it makes sense for the cache to work for IDE, SATA, USB, SCSI, ... too. I wonder if it's possible to put this code somewhere more central than mmc*.c so it automatically applies to dev_desc->block_read() (see include/part.h). Perhaps not since each implementation supplies its own block_read function directly, so the cache calls do need to be duplicated everywhere.
Yeah. I haven't found a spot that would allow interception of the various block_read/write functions.
Shouldn't DM also help here?
I haven't yet looked, but this may be true.
I'm seeing some build breakage on master surrounding the use of DM though.
If I select DM and BLK on top of nitrogen6q_defconfig, I get lots of build errors.
I want to get a V2 RFC patch out before digging through the details of that.
I'm obviously not up to speed on the state of DM and I hadn't seen Simon's patch adding blk.h.
The new blk_dread/write/erase functions do provide a convenient spot for checking cache, though they're not universally used yet.
In particular, hooking up the cache there will lose visibility into things like the "mmc write" command.
I'm also not sure of the current state of DM with respect to block drivers and wonder if a block cache should wait a cycle or two.
Simon, I'd appreciate some feedback when you have a chance.
Regards,
Eric

Hi all,
On 03/21/2016 11:31 AM, Eric Nelson wrote:
On 03/20/2016 03:54 PM, Eric Nelson wrote:
On 03/20/2016 03:13 PM, Tom Rini wrote:
On Sun, Mar 20, 2016 at 12:35:53PM -0700, Eric Nelson wrote:
On 03/17/2016 02:23 PM, Stephen Warren wrote:
On 03/16/2016 03:40 PM, Eric Nelson wrote:
Signed-off-by: Eric Nelson eric@nelint.com
...
I'm seeing some build breakage on master surrounding the use of DM though.
Peng's patch made it clear that DM wasn't supported by fsl_esdhc.
If I select DM and BLK on top of nitrogen6q_defconfig, I get lots of build errors.
It's CONFIG_BLK that produces lots of issues, and from what I can tell, it's only currently supported for sandbox.
Out of ignorance, I was conflating the two.
I want to get a V2 RFC patch out before digging through the details of that.
I'm obviously not up to speed on the state of DM and I hadn't seen Simon's patch adding blk.h.
The new blk_dread/write/erase functions do provide a convenient spot for checking cache, though they're not universally used yet.
In particular, hooking up the cache there will lose visibility into things like the "mmc write" command.
I'm also not sure of the current state of DM with respect to block drivers and wonder if a block cache should wait a cycle or two.
Simon, I'd appreciate some feedback when you have a chance.
I think I have a better handle on this now.
I'm still a bit confused on what needs to be done in order for CONFIG_BLK to work against real hardware.
From what I can tell, the some modules in cmd/ need to be
updated to use blk_dread/blk_dwrite/blk_derase and some kind of re-structuring needs to occur in drivers/mmc to support the "blk" uclass.
Does that sound about right?
Is somebody currently working on this?
Please advise,
Eric

Hi Eric,
On 20 March 2016 at 16:54, Eric Nelson eric@nelint.com wrote:
Hi Tom,
On 03/20/2016 03:13 PM, Tom Rini wrote:
On Sun, Mar 20, 2016 at 12:35:53PM -0700, Eric Nelson wrote:
Hi Stephen,
On 03/17/2016 02:23 PM, Stephen Warren wrote:
On 03/16/2016 03:40 PM, Eric Nelson wrote:
Signed-off-by: Eric Nelson eric@nelint.com
Patch description.
drivers/mmc/mmc.c | 10 +++++++++- drivers/mmc/mmc_write.c | 7 +++++++
Presumably it makes sense for the cache to work for IDE, SATA, USB, SCSI, ... too. I wonder if it's possible to put this code somewhere more central than mmc*.c so it automatically applies to dev_desc->block_read() (see include/part.h). Perhaps not since each implementation supplies its own block_read function directly, so the cache calls do need to be duplicated everywhere.
Yeah. I haven't found a spot that would allow interception of the various block_read/write functions.
Shouldn't DM also help here?
I haven't yet looked, but this may be true.
I'm seeing some build breakage on master surrounding the use of DM though.
If I select DM and BLK on top of nitrogen6q_defconfig, I get lots of build errors.
I want to get a V2 RFC patch out before digging through the details of that.
I'm about to send out a series that rationalises the block devices a bit. In the meantime, see u-boot-dm/blkb-working for some MMC ideas.
Regards, Simon

Hi Simon,
On 04/09/2016 10:55 AM, Simon Glass wrote:
On 20 March 2016 at 16:54, Eric Nelson eric@nelint.com wrote:
On 03/20/2016 03:13 PM, Tom Rini wrote:
On Sun, Mar 20, 2016 at 12:35:53PM -0700, Eric Nelson wrote:
On 03/17/2016 02:23 PM, Stephen Warren wrote:
On 03/16/2016 03:40 PM, Eric Nelson wrote:
Signed-off-by: Eric Nelson eric@nelint.com
Patch description.
drivers/mmc/mmc.c | 10 +++++++++- drivers/mmc/mmc_write.c | 7 +++++++
Presumably it makes sense for the cache to work for IDE, SATA, USB, SCSI, ... too. I wonder if it's possible to put this code somewhere more central than mmc*.c so it automatically applies to dev_desc->block_read() (see include/part.h). Perhaps not since each implementation supplies its own block_read function directly, so the cache calls do need to be duplicated everywhere.
Yeah. I haven't found a spot that would allow interception of the various block_read/write functions.
Shouldn't DM also help here?
I haven't yet looked, but this may be true.
I'm seeing some build breakage on master surrounding the use of DM though.
If I select DM and BLK on top of nitrogen6q_defconfig, I get lots of build errors.
I want to get a V2 RFC patch out before digging through the details of that.
I'm about to send out a series that rationalises the block devices a bit. In the meantime, see u-boot-dm/blkb-working for some MMC ideas.
I figured things out and sent V2 and V3 versions of these patches.
Tom applied V3 to master, though I do still have three patches to address Stephen's review pending:
http://lists.denx.de/pipermail/u-boot/2016-April/thread.html#250331
https://patchwork.ozlabs.org/patch/605421/ https://patchwork.ozlabs.org/patch/605420/ https://patchwork.ozlabs.org/patch/605422/
Regards,
Eric

Hi Stephen,
On 03/20/2016 12:35 PM, Eric Nelson wrote:
On 03/17/2016 02:23 PM, Stephen Warren wrote:
On 03/16/2016 03:40 PM, Eric Nelson wrote:
...
Do you have any stats on how many operations this saves for typical FS operations such as:
- Partition table type identification (with various types such as
MBR/DOS, GPT, ...)
- Partition enumeration
- Filesystem identification (with various filesystems such as FAT, ext,
...)
- File reads
Not yet, but I'm working something up that will allow this to be gathered easily. As soon as we implement a cache, it provides a nice spot for tracing operations.
The V2 RFC patch set implements a command (blkcache) that allows tracing block read operations as they pass through the cache, among other things:
=> blkcache trace => load mmc 0 10008000 /zImage miss: start 0, count 1 fill: start 0, count 1 miss: start 2000, count 1 fill: start 2000, count 1 reading /zImage hit: start 2000, count 1 miss: start 2011, count 2 miss: start 2001, count 6 miss: start 2031, count 9678 miss: start 45ff, count 1 fill: start 45ff, count 1 4955304 bytes read in 258 ms (18.3 MiB/s)
http://lists.denx.de/pipermail/u-boot/2016-March/249155.html https://patchwork.ozlabs.org/patch/599942/

Hi Eric,
On Wed, Mar 16, 2016 at 11:42:55AM -0700, EXT Eric Nelson wrote:
Hi all,
I've been seeing the same sort of issues repoted by Ionut and as addressed by this patch: http://lists.denx.de/pipermail/u-boot/2014-January/171459.html
That patch was added in commit fc0fc50 and reverted in commit 715b56f.
It no longer applies cleanly, and when I tried to resurrect it, I saw errors traversing directories and perhaps something went wrong with my merge.
Ionut, do you have a current version of this patch?
We reverted that patch because it was breaking ext4 write support. I initially developed the patch on top of an older u-boot which didn't have write supoort at all.
I have tried to refactor my patch to work with both read/write, but I gave up when I saw that the ext4 write code relied on the old behavior of read_allocated_block/ext4fs_get_extent_block. I could have worked around that, but then the whole thing would have looked like hack, so I didn't like it ...
If you are interested in a current version of this patch, I could try to revive it. But it has the limitation I mentioned above, so I guess you could use it just for some ext4 read performance measurement test.
Regards, Ioan

Thanks Ioan,
On 03/19/2016 08:42 AM, Ioan Nicu wrote:
Hi Eric,
On Wed, Mar 16, 2016 at 11:42:55AM -0700, EXT Eric Nelson wrote:
Hi all,
I've been seeing the same sort of issues repoted by Ionut and as addressed by this patch: http://lists.denx.de/pipermail/u-boot/2014-January/171459.html
That patch was added in commit fc0fc50 and reverted in commit 715b56f.
It no longer applies cleanly, and when I tried to resurrect it, I saw errors traversing directories and perhaps something went wrong with my merge.
Ionut, do you have a current version of this patch?
We reverted that patch because it was breaking ext4 write support. I initially developed the patch on top of an older u-boot which didn't have write supoort at all.
I have tried to refactor my patch to work with both read/write, but I gave up when I saw that the ext4 write code relied on the old behavior of read_allocated_block/ext4fs_get_extent_block. I could have worked around that, but then the whole thing would have looked like hack, so I didn't like it ...
If you are interested in a current version of this patch, I could try to revive it. But it has the limitation I mentioned above, so I guess you could use it just for some ext4 read performance measurement test.
I do want to get rid of the performance problem, but as I suggested by my patch set, I think that a more general-purpose cache is a better approach.
Regards,
Eric
participants (5)
-
Eric Nelson
-
Ioan Nicu
-
Simon Glass
-
Stephen Warren
-
Tom Rini