
On 03/31/2016 02:24 PM, Eric Nelson wrote:
Hi Stephen,
On 03/30/2016 02:57 PM, Stephen Warren wrote:
On 03/30/2016 11:34 AM, Eric Nelson wrote:
Thanks again for the detailed review, Stephen.
On 03/30/2016 07:36 AM, Stephen Warren wrote:
On 03/28/2016 11:05 AM, Eric Nelson wrote:
Add a block device cache to speed up repeated reads of block devices by various filesystems. diff --git a/disk/part.c b/disk/part.c
@@ -268,6 +268,8 @@ void part_init(struct blk_desc *dev_desc) const int n_ents = ll_entry_count(struct part_driver, part_driver); struct part_driver *entry;
- blkcache_invalidate(dev_desc->if_type, dev_desc->devnum);
Doesn't this invalidate the cache far too often? I expect that function is called for command the user executes from the command-line, whereas it'd be nice if the cache persisted across commands. I suppose this is a reasonable (and very safe) first implementation though, and saves having to go through each storage provider type and find out the right place to detect media changes.
I'm not sure it does. I traced through the mmc initialization and it's only called when the card itself is initialized.
I don't believe U-Boot caches the partition structure across user commands. Doesn't each user command (e.g. part list, ls, load, save) first look up the block device, then scan the partition table, then "mount" the filesystem, then perform its action, then throw all that state away? Conversely, "mmc rescan" only happens under explicit user control. Still as I said, the current implementation is probably fine to start with, and at least is safe.
At least for MMC, this isn't the case. Various filesystem commands operate without calling part_init.
Interesting; that step is indeed only performed when the device is first probed for MMC and USB.
diff --git a/drivers/block/blkcache.c b/drivers/block/blkcache.c
+struct block_cache_node {
- struct list_head lh;
- int iftype;
- int devnum;
- lbaint_t start;
- lbaint_t blkcnt;
- unsigned long blksz;
- char *cache;
+};
+static LIST_HEAD(block_cache);
+static struct block_cache_stats _stats = {
- .max_blocks_per_entry = 2,
- .max_entries = 32
+};
Now is a good time to mention another reason why I don't like using a dynamically allocated linked list for this: Memory fragmentation. By dynamically allocating the cache, we could easily run into a situation where the user runs a command that allocates memory and also adds to the block cache, then most of that memory gets freed when U-Boot returns to the command prompt, then the user runs the command again but it fails since it can't allocate the memory due to fragmentation of the heap. This is a real problem I've seen e.g. with the "ums" and "dfu" commands, since they might initialize the USB controller the first time they're run, which allocates some new memory. Statically allocation would avoid this.
We're going to allocate a block or set of blocks every time we allocate a new node for the list, so having the list in an array doesn't fix the problem.
We could allocate the data storage for the block cache at the top of RAM before relocation, like many other things are allocated, and hence not use malloc() for that.
Hmmm. We seem to have gone from a discussion about data structures to type of allocation.
I'm interested in seeing how that works. Can you provide hints about what's doing this now?
Something like common/board_f.c:reserve_mmu() and many other functions there. relocaddr starts at approximately the top of RAM, continually gets adjusted down as many static allocations are reserved, and eventually becomes the address that U-Boot is relocated to. Simply adding another entry into init_sequence_f[] for the disk cache might work.
While re-working the code, I also thought more about using an array and still don't see how the implementation doesn't get more complex.
The key bit is that the list is implemented in MRU order so invalidating the oldest is trivial.
Yes, the MRU logic would make it more complex. Is that particularly useful, i.e. is it an intrinsic part of the speedup?
It's not a question of speed with small numbers of entries. The code to handle eviction would just be more complex.
My thought was that if the eviction algorithm wasn't important (i.e. most of the speedup comes from have some (any) kind of cache, but the eviction algorithm makes little difference to the gain from having the cache), we could just drop MRU completely. If that's not possible, then indeed a list would make implementing MRU easier.
You could still do a list with a statically allocated set of list nodes, especially since the length of the list is bounded.
Given that the command "blkcache configure 0 0" will discard all cache and since both dfu and ums should properly have the cache disabled, I'd like to proceed as-is with the list and heap approach.
I don't understand "since both dfu and ums should properly have the cache disabled"; I didn't see anything that did that. Perhaps you're referring to the fact that writes invalidate the cache?
Eventually it seems better to keep the cache enabled for at least DFU to a filesystem (rather than raw block device) since presumably parsing the directory structure to write to a file for DFU would benefit from the cache just like anything else.