[PATCH 0/8] u-boot: fs: add generic unaligned read handling

[BACKGROUND] Unlike FUSE/Kernel which always pass aligned read range, U-boot fs code just pass the request range to underlying fses.
Under most case, this works fine, as U-boot only really needs to read the whole file (aka, 0 for both offset and len, len will be later determined using file size).
But if some advanced user/script wants to extract kernel/initramfs from combined image, we may need to do unaligned read in that case.
[ADVANTAGE] This patchset will handle unaligned read range in _fs_read():
- Get blocksize of the underlying fs
- Read the leading block contianing the unaligned range The full block will be stored in a local buffer, then only copy the bytes in the unaligned range into the destination buffer.
If the first block covers the whole range, we just call it aday.
- Read the aligned range if there is any
- Read the tailing block containing the unaligned range And copy the covered range into the destination.
[DISADVANTAGE] There are mainly two problems:
- Extra memory allocation for every _fs_read() call For the leading and tailing block.
- Extra path resolving All those supported fs will have to do extra path resolving up to 2 times (one for the leading block, one for the tailing block). This may slow down the read.
[SUPPORTED FSES]
- Btrfs (manually tested*) - Ext4 (manually tested) - FAT (manually tested) - Erofs - sandboxfs - ubifs
*: Failed to get the test cases run, thus have to go sandbox mode, and attach an image with target fs, load the target file (with unaligned range) and compare the result using md5sum.
For EXT4/FAT, they may need extra cleanup, as their existing unaligned range handling is no longer needed anymore, cleaning them up should free more code lines than the added one.
Just not confident enough to modify them all by myself.
[UNSUPPORTED FSES] - Squashfs They don't support non-zero offset, thus it can not handle the tailing block reading. Need extra help to add block aligned offset support.
- Semihostfs It's using hardcoded trap to do system calls, not sure how it would work for stat() call.
Extra testing/feedback is always appreciated.
Qu Wenruo (8): fs: fat: unexport file_fat_read_at() fs: always get the file size in _fs_read() fs: btrfs: move the unaligned read code to _fs_read() for btrfs fs: ext4: rely on _fs_read() to pass block aligned range into ext4fs_read_file() fs: fat: rely on higher layer to get block aligned read range fs: sandboxfs: add sandbox_fs_get_blocksize() fs: ubifs: rely on higher layer to do unaligned read fs: erofs: add unaligned read range handling
arch/sandbox/cpu/os.c | 11 +++ fs/btrfs/btrfs.c | 24 +++--- fs/btrfs/inode.c | 84 ++------------------ fs/erofs/internal.h | 1 + fs/erofs/super.c | 6 ++ fs/ext4/ext4fs.c | 11 +++ fs/fat/fat.c | 17 ++++- fs/fs.c | 169 +++++++++++++++++++++++++++++++++++++++-- fs/sandbox/sandboxfs.c | 14 ++++ fs/ubifs/ubifs.c | 13 ++-- include/btrfs.h | 1 + include/erofs.h | 1 + include/ext4fs.h | 1 + include/fat.h | 3 +- include/os.h | 8 ++ include/sandboxfs.h | 1 + include/ubifs_uboot.h | 1 + 17 files changed, 258 insertions(+), 108 deletions(-)

That function is only utilized inside fat driver, unexport it.
Signed-off-by: Qu Wenruo wqu@suse.com --- fs/fat/fat.c | 4 ++-- include/fat.h | 2 -- 2 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/fs/fat/fat.c b/fs/fat/fat.c index df9ea2c028fc..dcceccbcee0a 100644 --- a/fs/fat/fat.c +++ b/fs/fat/fat.c @@ -1243,8 +1243,8 @@ out_free_itr: return ret; }
-int file_fat_read_at(const char *filename, loff_t pos, void *buffer, - loff_t maxsize, loff_t *actread) +static int file_fat_read_at(const char *filename, loff_t pos, void *buffer, + loff_t maxsize, loff_t *actread) { fsdata fsdata; fat_itr *itr; diff --git a/include/fat.h b/include/fat.h index bd8e450b33a3..a9756fb4cd1b 100644 --- a/include/fat.h +++ b/include/fat.h @@ -200,8 +200,6 @@ static inline u32 sect_to_clust(fsdata *fsdata, int sect) int file_fat_detectfs(void); int fat_exists(const char *filename); int fat_size(const char *filename, loff_t *size); -int file_fat_read_at(const char *filename, loff_t pos, void *buffer, - loff_t maxsize, loff_t *actread); int file_fat_read(const char *filename, void *buffer, int maxsize); int fat_set_blk_dev(struct blk_desc *rbdd, struct disk_partition *info); int fat_register_device(struct blk_desc *dev_desc, int part_no);

On Tue, 28 Jun 2022 at 01:28, Qu Wenruo wqu@suse.com wrote:
That function is only utilized inside fat driver, unexport it.
Signed-off-by: Qu Wenruo wqu@suse.com
fs/fat/fat.c | 4 ++-- include/fat.h | 2 -- 2 files changed, 2 insertions(+), 4 deletions(-)
Reviewed-by: Simon Glass sjg@chromium.org
diff --git a/fs/fat/fat.c b/fs/fat/fat.c index df9ea2c028fc..dcceccbcee0a 100644 --- a/fs/fat/fat.c +++ b/fs/fat/fat.c @@ -1243,8 +1243,8 @@ out_free_itr: return ret; }
-int file_fat_read_at(const char *filename, loff_t pos, void *buffer,
loff_t maxsize, loff_t *actread)
+static int file_fat_read_at(const char *filename, loff_t pos, void *buffer,
loff_t maxsize, loff_t *actread)
{ fsdata fsdata; fat_itr *itr; diff --git a/include/fat.h b/include/fat.h index bd8e450b33a3..a9756fb4cd1b 100644 --- a/include/fat.h +++ b/include/fat.h @@ -200,8 +200,6 @@ static inline u32 sect_to_clust(fsdata *fsdata, int sect) int file_fat_detectfs(void); int fat_exists(const char *filename); int fat_size(const char *filename, loff_t *size); -int file_fat_read_at(const char *filename, loff_t pos, void *buffer,
loff_t maxsize, loff_t *actread);
int file_fat_read(const char *filename, void *buffer, int maxsize); int fat_set_blk_dev(struct blk_desc *rbdd, struct disk_partition *info); int fat_register_device(struct blk_desc *dev_desc, int part_no); -- 2.36.1

For _fs_read(), @len == 0 means we read the whole file. And we just pass @len == 0 to make each filesystem to handle it.
In fact we have info->size() call to properly get the filesize.
So we can not only call info->size() to grab the file_size for len == 0 case, but also detect invalid @len (e.g. @len > file_size) in advance or truncate @len.
This behavior also allows us to handle unaligned better in the incoming patches.
Signed-off-by: Qu Wenruo wqu@suse.com --- fs/fs.c | 25 +++++++++++++++++++++---- 1 file changed, 21 insertions(+), 4 deletions(-)
diff --git a/fs/fs.c b/fs/fs.c index 6de1a3eb6d5d..d992cdd6d650 100644 --- a/fs/fs.c +++ b/fs/fs.c @@ -579,6 +579,7 @@ static int _fs_read(const char *filename, ulong addr, loff_t offset, loff_t len, { struct fstype_info *info = fs_get_info(fs_type); void *buf; + loff_t file_size; int ret;
#ifdef CONFIG_LMB @@ -589,10 +590,26 @@ static int _fs_read(const char *filename, ulong addr, loff_t offset, loff_t len, } #endif
- /* - * We don't actually know how many bytes are being read, since len==0 - * means read the whole file. - */ + ret = info->size(filename, &file_size); + if (ret < 0) { + log_err("** Unable to get file size for %s, %d **\n", + filename, ret); + return ret; + } + if (offset >= file_size) { + log_err( + "** Invalid offset, offset (%llu) >= file size (%llu)\n", + offset, file_size); + return -EINVAL; + + } + if (len == 0 || offset + len > file_size) { + if (len > file_size) + log_info( + "** Truncate read length from %llu to %llu, as file size is %llu **\n", + len, file_size, file_size); + len = file_size - offset; + } buf = map_sysmem(addr, len); ret = info->read(filename, buf, offset, len, actread); unmap_sysmem(buf);

Hi, wenruo,
在 2022/6/28 15:28, Qu Wenruo 写道:
For _fs_read(), @len == 0 means we read the whole file. And we just pass @len == 0 to make each filesystem to handle it.
In fact we have info->size() call to properly get the filesize.
So we can not only call info->size() to grab the file_size for len == 0 case, but also detect invalid @len (e.g. @len > file_size) in advance or truncate @len.
This behavior also allows us to handle unaligned better in the incoming patches.
Signed-off-by: Qu Wenruo wqu@suse.com
fs/fs.c | 25 +++++++++++++++++++++---- 1 file changed, 21 insertions(+), 4 deletions(-)
diff --git a/fs/fs.c b/fs/fs.c index 6de1a3eb6d5d..d992cdd6d650 100644 --- a/fs/fs.c +++ b/fs/fs.c @@ -579,6 +579,7 @@ static int _fs_read(const char *filename, ulong addr, loff_t offset, loff_t len, { struct fstype_info *info = fs_get_info(fs_type); void *buf;
loff_t file_size; int ret;
#ifdef CONFIG_LMB
@@ -589,10 +590,26 @@ static int _fs_read(const char *filename, ulong addr, loff_t offset, loff_t len, } #endif
- /*
* We don't actually know how many bytes are being read, since len==0
* means read the whole file.
*/
- ret = info->size(filename, &file_size);
I get an error when running the erofs test cases. The return value isn't as expected when reading symlink file. For symlink file, erofs_size will return the size of the symlink itself here.
- if (ret < 0) {
log_err("** Unable to get file size for %s, %d **\n",
filename, ret);
return ret;
- }
- if (offset >= file_size) {
log_err(
"** Invalid offset, offset (%llu) >= file size (%llu)\n",
offset, file_size);
return -EINVAL;
- }
- if (len == 0 || offset + len > file_size) {
if (len > file_size)
log_info(
- "** Truncate read length from %llu to %llu, as file size is %llu **\n",
len, file_size, file_size);
len = file_size - offset;
Then, we will get a wrong len in the case of len==0. So I think we need to do something extra with the symlink file?
Thanks, Jianan
- } buf = map_sysmem(addr, len); ret = info->read(filename, buf, offset, len, actread); unmap_sysmem(buf);

On 2022/6/28 20:36, Huang Jianan wrote:
Hi, wenruo,
在 2022/6/28 15:28, Qu Wenruo 写道:
For _fs_read(), @len == 0 means we read the whole file. And we just pass @len == 0 to make each filesystem to handle it.
In fact we have info->size() call to properly get the filesize.
So we can not only call info->size() to grab the file_size for len == 0 case, but also detect invalid @len (e.g. @len > file_size) in advance or truncate @len.
This behavior also allows us to handle unaligned better in the incoming patches.
Signed-off-by: Qu Wenruo wqu@suse.com
fs/fs.c | 25 +++++++++++++++++++++---- 1 file changed, 21 insertions(+), 4 deletions(-)
diff --git a/fs/fs.c b/fs/fs.c index 6de1a3eb6d5d..d992cdd6d650 100644 --- a/fs/fs.c +++ b/fs/fs.c @@ -579,6 +579,7 @@ static int _fs_read(const char *filename, ulong addr, loff_t offset, loff_t len, { struct fstype_info *info = fs_get_info(fs_type); void *buf; + loff_t file_size; int ret; #ifdef CONFIG_LMB @@ -589,10 +590,26 @@ static int _fs_read(const char *filename, ulong addr, loff_t offset, loff_t len, } #endif - /* - * We don't actually know how many bytes are being read, since len==0 - * means read the whole file. - */ + ret = info->size(filename, &file_size);
I get an error when running the erofs test cases. The return value isn't as expected when reading symlink file. For symlink file, erofs_size will return the size of the symlink itself here.
Indeed, this is a problem, in fact I also checked other fses, it looks like we all just return the inode size for the softlink, thus size() call can not be relied on for those cases.
While for the read() calls, every fs will do extra resolving for soft links, thus it doesn't cause problems.
This can still be solved by not calling size() calls at all, and only do the unaligned read handling for the leading block.
Thank you very much for pointing this bug out, would update the patchset for that.
Thanks, Qu
+ if (ret < 0) { + log_err("** Unable to get file size for %s, %d **\n", + filename, ret); + return ret; + } + if (offset >= file_size) { + log_err( + "** Invalid offset, offset (%llu) >= file size (%llu)\n", + offset, file_size); + return -EINVAL;
+ } + if (len == 0 || offset + len > file_size) { + if (len > file_size) + log_info( + "** Truncate read length from %llu to %llu, as file size is %llu **\n", + len, file_size, file_size); + len = file_size - offset;
Then, we will get a wrong len in the case of len==0. So I think we need to do something extra with the symlink file?
Thanks, Jianan
+ } buf = map_sysmem(addr, len); ret = info->read(filename, buf, offset, len, actread); unmap_sysmem(buf);

Unlike FUSE or kernel, U-boot filesystem code makes the underly fs code to handle the unaligned read (aka, read range is not aligned to fs block size).
This makes underlying fs code harder to implement, as unlike FUSE/kernel code, now they have to handle unaligned read all by themselves.
This patch will change the behavior, starting from btrfs, by moving the unaligned read code into _fs_read().
The idea is pretty simple, if we have an unaligned read request, we handle it in the following steps:
1. Grab the blocksize of the fs
2. Read the leading unaligned range We will read the block that @offset is in, and copy the requested part into buf.
The the block we read covers the whole range, we just call it a day.
3. Read the aligned part
4. Read the tailing unaligned range Pretty much the same as the leading unaligned range, just read the whole block where @offset + @len is, then copy the requested range in the buffer.
This has been tested with a proper randomly populated btrfs file, then tried in sandbox mode with different aligned and unaligned range and compare the output with md5sum.
Cc: Marek Behun marek.behun@nic.cz Cc: linux-btrfs@vger.kernel.org Signed-off-by: Qu Wenruo wqu@suse.com --- fs/btrfs/btrfs.c | 24 ++++---- fs/btrfs/inode.c | 84 ++------------------------- fs/fs.c | 148 +++++++++++++++++++++++++++++++++++++++++++++-- include/btrfs.h | 1 + 4 files changed, 160 insertions(+), 97 deletions(-)
diff --git a/fs/btrfs/btrfs.c b/fs/btrfs/btrfs.c index 741c6e20f533..f9a67468d508 100644 --- a/fs/btrfs/btrfs.c +++ b/fs/btrfs/btrfs.c @@ -223,17 +223,27 @@ out: return ret; }
+int btrfs_get_blocksize(const char *filename) +{ + struct btrfs_fs_info *fs_info = current_fs_info; + + return fs_info->sectorsize; +} + int btrfs_read(const char *file, void *buf, loff_t offset, loff_t len, loff_t *actread) { struct btrfs_fs_info *fs_info = current_fs_info; struct btrfs_root *root; - loff_t real_size = 0; u64 ino; u8 type; int ret;
ASSERT(fs_info); + + /* Higher layer should always pass correct @len in. */ + ASSERT(len); + ret = btrfs_lookup_path(fs_info->fs_root, BTRFS_FIRST_FREE_OBJECTID, file, &root, &ino, &type, 40); if (ret < 0) { @@ -246,18 +256,6 @@ int btrfs_read(const char *file, void *buf, loff_t offset, loff_t len, return -EINVAL; }
- if (!len) { - ret = btrfs_size(file, &real_size); - if (ret < 0) { - error("Failed to get inode size: %s", file); - return ret; - } - len = real_size; - } - - if (len > real_size - offset) - len = real_size - offset; - ret = btrfs_file_read(root, ino, offset, len, buf); if (ret < 0) { error("An error occurred while reading file %s", file); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index d00b5153336d..5615143fab82 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -620,44 +620,6 @@ check_next: return 1; }
-static int read_and_truncate_page(struct btrfs_path *path, - struct btrfs_file_extent_item *fi, - int start, int len, char *dest) -{ - struct extent_buffer *leaf = path->nodes[0]; - struct btrfs_fs_info *fs_info = leaf->fs_info; - u64 aligned_start = round_down(start, fs_info->sectorsize); - u8 extent_type; - char *buf; - int page_off = start - aligned_start; - int page_len = fs_info->sectorsize - page_off; - int ret; - - ASSERT(start + len <= aligned_start + fs_info->sectorsize); - buf = malloc_cache_aligned(fs_info->sectorsize); - if (!buf) - return -ENOMEM; - - extent_type = btrfs_file_extent_type(leaf, fi); - if (extent_type == BTRFS_FILE_EXTENT_INLINE) { - ret = btrfs_read_extent_inline(path, fi, buf); - memcpy(dest, buf + page_off, min(page_len, ret)); - free(buf); - return len; - } - - ret = btrfs_read_extent_reg(path, fi, - round_down(start, fs_info->sectorsize), - fs_info->sectorsize, buf); - if (ret < 0) { - free(buf); - return ret; - } - memcpy(dest, buf + page_off, page_len); - free(buf); - return len; -} - int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 file_offset, u64 len, char *dest) { @@ -676,31 +638,12 @@ int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 file_offset, u64 len, /* Set the whole dest all zero, so we won't need to bother holes */ memset(dest, 0, len);
- /* Read out the leading unaligned part */ - if (aligned_start != file_offset) { - ret = lookup_data_extent(root, &path, ino, aligned_start, - &next_offset); - if (ret < 0) - goto out; - if (ret == 0) { - /* Read the unaligned part out*/ - fi = btrfs_item_ptr(path.nodes[0], path.slots[0], - struct btrfs_file_extent_item); - ret = read_and_truncate_page(&path, fi, file_offset, - round_up(file_offset, fs_info->sectorsize) - - file_offset, dest); - if (ret < 0) - goto out; - cur += fs_info->sectorsize; - } else { - /* The whole file is a hole */ - if (!next_offset) { - memset(dest, 0, len); - return len; - } - cur = next_offset; - } - } + /* + * Higher layer should ensure the offset/len is already sectorsize + * aligned. + */ + ASSERT(IS_ALIGNED(file_offset, fs_info->sectorsize)); + ASSERT(IS_ALIGNED(len, fs_info->sectorsize));
/* Read the aligned part */ while (cur < aligned_end) { @@ -752,21 +695,6 @@ int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 file_offset, u64 len, goto out; cur += min(extent_num_bytes, aligned_end - cur); } - - /* Read the tailing unaligned part*/ - if (file_offset + len != aligned_end) { - btrfs_release_path(&path); - ret = lookup_data_extent(root, &path, ino, aligned_end, - &next_offset); - /* <0 is error, >0 means no extent */ - if (ret) - goto out; - fi = btrfs_item_ptr(path.nodes[0], path.slots[0], - struct btrfs_file_extent_item); - ret = read_and_truncate_page(&path, fi, aligned_end, - file_offset + len - aligned_end, - dest + aligned_end - file_offset); - } out: btrfs_release_path(&path); if (ret < 0) diff --git a/fs/fs.c b/fs/fs.c index d992cdd6d650..30696ac6c1a3 100644 --- a/fs/fs.c +++ b/fs/fs.c @@ -25,9 +25,11 @@ #include <asm/io.h> #include <div64.h> #include <linux/math64.h> +#include <linux/kernel.h> #include <efi_loader.h> #include <squashfs.h> #include <erofs.h> +#include <memalign.h>
DECLARE_GLOBAL_DATA_PTR;
@@ -139,6 +141,11 @@ static inline int fs_mkdir_unsupported(const char *dirname) return -1; }
+static inline int fs_get_blocksize_unsupported(const char *filename) +{ + return 0; +} + struct fstype_info { int fstype; char *name; @@ -156,6 +163,13 @@ struct fstype_info { int (*ls)(const char *dirname); int (*exists)(const char *filename); int (*size)(const char *filename, loff_t *size); + /* + * Get the minimal unit of data IO. + * If implemented, U-boot generic fs code will handle the unaligned + * read, and the underlying code will only need to bother fully aligned + * read. + */ + int (*get_blocksize)(const char *filename); int (*read)(const char *filename, void *buf, loff_t offset, loff_t len, loff_t *actread); int (*write)(const char *filename, void *buf, loff_t offset, @@ -193,6 +207,7 @@ static struct fstype_info fstypes[] = { .exists = fat_exists, .size = fat_size, .read = fat_read_file, + .get_blocksize = fs_get_blocksize_unsupported, #if CONFIG_IS_ENABLED(FAT_WRITE) .write = file_fat_write, .unlink = fat_unlink, @@ -221,6 +236,7 @@ static struct fstype_info fstypes[] = { .exists = ext4fs_exists, .size = ext4fs_size, .read = ext4_read_file, + .get_blocksize = fs_get_blocksize_unsupported, #ifdef CONFIG_CMD_EXT4_WRITE .write = ext4_write_file, .ln = ext4fs_create_link, @@ -245,6 +261,7 @@ static struct fstype_info fstypes[] = { .exists = sandbox_fs_exists, .size = sandbox_fs_size, .read = fs_read_sandbox, + .get_blocksize = fs_get_blocksize_unsupported, .write = fs_write_sandbox, .uuid = fs_uuid_unsupported, .opendir = fs_opendir_unsupported, @@ -264,6 +281,7 @@ static struct fstype_info fstypes[] = { .exists = fs_exists_unsupported, .size = smh_fs_size, .read = smh_fs_read, + .get_blocksize = fs_get_blocksize_unsupported, .write = smh_fs_write, .uuid = fs_uuid_unsupported, .opendir = fs_opendir_unsupported, @@ -284,6 +302,7 @@ static struct fstype_info fstypes[] = { .exists = ubifs_exists, .size = ubifs_size, .read = ubifs_read, + .get_blocksize = fs_get_blocksize_unsupported, .write = fs_write_unsupported, .uuid = fs_uuid_unsupported, .opendir = fs_opendir_unsupported, @@ -305,6 +324,7 @@ static struct fstype_info fstypes[] = { .exists = btrfs_exists, .size = btrfs_size, .read = btrfs_read, + .get_blocksize = btrfs_get_blocksize, .write = fs_write_unsupported, .uuid = btrfs_uuid, .opendir = fs_opendir_unsupported, @@ -324,6 +344,7 @@ static struct fstype_info fstypes[] = { .readdir = sqfs_readdir, .ls = fs_ls_generic, .read = sqfs_read, + .get_blocksize = fs_get_blocksize_unsupported, .size = sqfs_size, .close = sqfs_close, .closedir = sqfs_closedir, @@ -345,6 +366,7 @@ static struct fstype_info fstypes[] = { .readdir = erofs_readdir, .ls = fs_ls_generic, .read = erofs_read, + .get_blocksize = fs_get_blocksize_unsupported, .size = erofs_size, .close = erofs_close, .closedir = erofs_closedir, @@ -366,6 +388,7 @@ static struct fstype_info fstypes[] = { .exists = fs_exists_unsupported, .size = fs_size_unsupported, .read = fs_read_unsupported, + .get_blocksize = fs_get_blocksize_unsupported, .write = fs_write_unsupported, .uuid = fs_uuid_unsupported, .opendir = fs_opendir_unsupported, @@ -578,8 +601,15 @@ static int _fs_read(const char *filename, ulong addr, loff_t offset, loff_t len, int do_lmb_check, loff_t *actread) { struct fstype_info *info = fs_get_info(fs_type); + /* Buffer for the unalgiend read. */ + void *block_buffer; void *buf; loff_t file_size; + loff_t real_start = offset; + loff_t real_len = len; + loff_t bytes_read = 0; + loff_t total_read = 0; + int blocksize; int ret;
#ifdef CONFIG_LMB @@ -610,15 +640,121 @@ static int _fs_read(const char *filename, ulong addr, loff_t offset, loff_t len, len, file_size, file_size); len = file_size - offset; } + blocksize = info->get_blocksize(filename); + if (blocksize < 0) { + log_err("** Unable to get blocksize, %d **\n", blocksize); + return blocksize; + } + + /* + * The fs doesn't yet report its blocksize, then let the read() + * call to handle it + * + * This should be deleted when all fs support blocksize reporting. + */ + if (blocksize == 0) { + buf = map_sysmem(addr, len); + ret = info->read(filename, buf, offset, len, actread); + unmap_sysmem(buf); + if (*actread < len) + log_debug( + "** %s short read, off %llu len %llu actual read %llu **\n", + filename, real_start, real_len, bytes_read); + return ret; + } + + block_buffer = malloc_cache_aligned(blocksize); + if (!block_buffer) { + log_err("** Unable to alloc memory for one block **\n"); + return -ENOMEM; + } + memset(block_buffer, 0, blocksize); + + real_start = round_up(offset, blocksize); + buf = map_sysmem(addr, len); - ret = info->read(filename, buf, offset, len, actread); - unmap_sysmem(buf); + /* Read the leading unaligned part. */ + if (!IS_ALIGNED(offset, blocksize)) { + loff_t read_off = round_down(offset, blocksize); + + ret = info->read(filename, block_buffer, read_off, blocksize, + &bytes_read); + if (ret < 0) { + log_err("** Failed to read %s at offset %llu, %d **\n", + filename, read_off, ret); + goto out; + }
- /* If we requested a specific number of bytes, check we got it */ - if (ret == 0 && len && *actread != len) - log_debug("** %s shorter than offset + len **\n", filename); - fs_close(); + /* + * Calculate the real lengh we should copy from the block to + * the buffer. + */ + bytes_read = min(blocksize - offset % blocksize, len); + + memcpy(buf, block_buffer + offset % blocksize, bytes_read); + total_read += bytes_read; + + /* + * The whole read range is covered by the one block, we have + * finished the whole read. + */ + if (round_up(offset, blocksize) > + round_down(offset + len, blocksize)) + goto out; + } + /* At this stage, offset + bytes_read should be aligned to blocksize. */ + assert(IS_ALIGNED(offset + bytes_read, blocksize)); + + /* And we should have at least 0 or more aligned bytes to read. */ + assert(round_down(offset + len, blocksize) >= + round_up(offset, blocksize)); + real_len = round_down(offset + len, blocksize) - real_start; + + /* Read the aligned part. */ + if (real_len) { + ret = info->read(filename, buf + total_read, real_start, real_len, + &bytes_read); + if (ret < 0) { + log_err("** failed to read %s off %llu len %llu, %d **\n", + filename, real_start, real_len, ret); + goto out; + } + total_read += bytes_read; + } + + /* A short read happened. */ + if (bytes_read < real_len) { + log_debug( + "** %s short read, off %llu len %llu actual read %llu **\n", + filename, real_start, real_len, bytes_read); + goto out; + }
+ /* Read the tailing unaligned part. */ + if (!IS_ALIGNED(offset + len, blocksize)) { + /* + * Reset the block buffer, as it may contain some data from + * the leading block. + */ + memset(block_buffer, 0, blocksize); + ret = info->read(filename, block_buffer, + round_down(offset + len, blocksize), blocksize, + &bytes_read); + if (ret < 0) { + log_err("** Failed to read %s at offset %llu, %d **\n", + filename, round_down(offset + len, blocksize), + ret); + goto out; + } + bytes_read = (offset + len) % blocksize; + memcpy(buf + total_read, block_buffer, bytes_read); + total_read += bytes_read; + } +out: + unmap_sysmem(buf); + free(block_buffer); + fs_close(); + *actread = total_read; return ret; }
diff --git a/include/btrfs.h b/include/btrfs.h index a7605e158970..bba71ec02893 100644 --- a/include/btrfs.h +++ b/include/btrfs.h @@ -17,6 +17,7 @@ int btrfs_ls(const char *); int btrfs_exists(const char *); int btrfs_size(const char *, loff_t *); int btrfs_read(const char *, void *, loff_t, loff_t, loff_t *); +int btrfs_get_blocksize(const char *); void btrfs_close(void); int btrfs_uuid(char *); void btrfs_list_subvols(void);

Since _fs_read() is handling the unaligned read internally, ext4 driver only need to handle block aligned read.
Unfortunately I'm not familiar with ext4 and its driver, thus not confident enough to cleanup all the unaligned read related code.
So here we will have some dead code, and any help to clean them up is appreciated.
Cc: Tom Rini trini@konsulko.com Signed-off-by: Qu Wenruo wqu@suse.com --- fs/ext4/ext4fs.c | 11 +++++++++++ fs/fs.c | 2 +- include/ext4fs.h | 1 + 3 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/ext4fs.c b/fs/ext4/ext4fs.c index 4c89152ce4ad..be2680994d8b 100644 --- a/fs/ext4/ext4fs.c +++ b/fs/ext4/ext4fs.c @@ -71,6 +71,10 @@ int ext4fs_read_file(struct ext2fs_node *node, loff_t pos,
ext_cache_init(&cache);
+ /* Higher layer has ensured to pass block aligned range here. */ + assert(IS_ALIGNED(pos, blocksize)); + assert(IS_ALIGNED(len, blocksize)); + /* Adjust len so it we can't read past the end of the file. */ if (len + pos > filesize) len = (filesize - pos); @@ -183,6 +187,13 @@ int ext4fs_read_file(struct ext2fs_node *node, loff_t pos, return 0; }
+int ext4fs_get_blocksize(const char *filename) +{ + struct ext_filesystem *fs = get_fs(); + + return fs->blksz; +} + int ext4fs_ls(const char *dirname) { struct ext2fs_node *dirnode = NULL; diff --git a/fs/fs.c b/fs/fs.c index 30696ac6c1a3..e69a0968bb6d 100644 --- a/fs/fs.c +++ b/fs/fs.c @@ -236,7 +236,7 @@ static struct fstype_info fstypes[] = { .exists = ext4fs_exists, .size = ext4fs_size, .read = ext4_read_file, - .get_blocksize = fs_get_blocksize_unsupported, + .get_blocksize = ext4fs_get_blocksize, #ifdef CONFIG_CMD_EXT4_WRITE .write = ext4_write_file, .ln = ext4fs_create_link, diff --git a/include/ext4fs.h b/include/ext4fs.h index cb5d9cc0a5c0..cc40cfedd954 100644 --- a/include/ext4fs.h +++ b/include/ext4fs.h @@ -146,6 +146,7 @@ int ext4fs_create_link(const char *target, const char *fname); struct ext_filesystem *get_fs(void); int ext4fs_open(const char *filename, loff_t *len); int ext4fs_read(char *buf, loff_t offset, loff_t len, loff_t *actread); +int ext4fs_get_blocksize(const char *filename); int ext4fs_mount(unsigned part_length); void ext4fs_close(void); void ext4fs_reinit_global(void);

Just implement fat_get_blocksize() for fat, so that fat_read_file() always get a block aligned read range.
Unfortunately I'm not experienced enough to cleanup the fat code, thus further cleanup is appreciated.
Cc: Tom Rini trini@konsulko.com Signed-off-by: Qu Wenruo wqu@suse.com --- fs/fat/fat.c | 13 +++++++++++++ fs/fs.c | 2 +- include/fat.h | 1 + 3 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/fs/fat/fat.c b/fs/fat/fat.c index dcceccbcee0a..e13035e8e6d1 100644 --- a/fs/fat/fat.c +++ b/fs/fat/fat.c @@ -1299,6 +1299,19 @@ int fat_read_file(const char *filename, void *buf, loff_t offset, loff_t len, return ret; }
+int fat_get_blocksize(const char *filename) +{ + fsdata fsdata = {0}; + int ret; + + ret = get_fs_info(&fsdata); + if (ret) + return ret; + + free(fsdata.fatbuf); + return fsdata.sect_size; +} + typedef struct { struct fs_dir_stream parent; struct fs_dirent dirent; diff --git a/fs/fs.c b/fs/fs.c index e69a0968bb6d..7e4ead9b790b 100644 --- a/fs/fs.c +++ b/fs/fs.c @@ -207,7 +207,7 @@ static struct fstype_info fstypes[] = { .exists = fat_exists, .size = fat_size, .read = fat_read_file, - .get_blocksize = fs_get_blocksize_unsupported, + .get_blocksize = fat_get_blocksize, #if CONFIG_IS_ENABLED(FAT_WRITE) .write = file_fat_write, .unlink = fat_unlink, diff --git a/include/fat.h b/include/fat.h index a9756fb4cd1b..c03a2bebecef 100644 --- a/include/fat.h +++ b/include/fat.h @@ -201,6 +201,7 @@ int file_fat_detectfs(void); int fat_exists(const char *filename); int fat_size(const char *filename, loff_t *size); int file_fat_read(const char *filename, void *buffer, int maxsize); +int fat_get_blocksize(const char *filename); int fat_set_blk_dev(struct blk_desc *rbdd, struct disk_partition *info); int fat_register_device(struct blk_desc *dev_desc, int part_no);

This is to make sandboxfs to report blocksize it supports for _fs_read() to handle unaligned read.
Unlike all other fses, sandboxfs can handle unaligned read/write without any problem since it's calling read()/write(), which doesn't bother the blocksize at all.
This change is mostly to make testing of _fs_read() much easier.
Cc: Simon Glass sjg@chromium.org Signed-off-by: Qu Wenruo wqu@suse.com --- arch/sandbox/cpu/os.c | 11 +++++++++++ fs/fs.c | 2 +- fs/sandbox/sandboxfs.c | 14 ++++++++++++++ include/os.h | 8 ++++++++ include/sandboxfs.h | 1 + 5 files changed, 35 insertions(+), 1 deletion(-)
diff --git a/arch/sandbox/cpu/os.c b/arch/sandbox/cpu/os.c index 5ea54179176c..6c29f29bdd9b 100644 --- a/arch/sandbox/cpu/os.c +++ b/arch/sandbox/cpu/os.c @@ -46,6 +46,17 @@ ssize_t os_read(int fd, void *buf, size_t count) return read(fd, buf, count); }
+ssize_t os_get_blocksize(int fd) +{ + struct stat stat = {0}; + int ret; + + ret = fstat(fd, &stat); + if (ret < 0) + return -errno; + return stat.st_blksize; +} + ssize_t os_write(int fd, const void *buf, size_t count) { return write(fd, buf, count); diff --git a/fs/fs.c b/fs/fs.c index 7e4ead9b790b..337d5711c28c 100644 --- a/fs/fs.c +++ b/fs/fs.c @@ -261,7 +261,7 @@ static struct fstype_info fstypes[] = { .exists = sandbox_fs_exists, .size = sandbox_fs_size, .read = fs_read_sandbox, - .get_blocksize = fs_get_blocksize_unsupported, + .get_blocksize = sandbox_fs_get_blocksize, .write = fs_write_sandbox, .uuid = fs_uuid_unsupported, .opendir = fs_opendir_unsupported, diff --git a/fs/sandbox/sandboxfs.c b/fs/sandbox/sandboxfs.c index 4ae41d5b4db1..130fee088621 100644 --- a/fs/sandbox/sandboxfs.c +++ b/fs/sandbox/sandboxfs.c @@ -55,6 +55,20 @@ int sandbox_fs_read_at(const char *filename, loff_t pos, void *buffer, return ret; }
+int sandbox_fs_get_blocksize(const char *filename) +{ + int fd; + int ret; + + fd = os_open(filename, OS_O_RDONLY); + if (fd < 0) + return fd; + + ret = os_get_blocksize(fd); + os_close(fd); + return ret; +} + int sandbox_fs_write_at(const char *filename, loff_t pos, void *buffer, loff_t towrite, loff_t *actwrite) { diff --git a/include/os.h b/include/os.h index 10e198cf503e..a864d9ca39b2 100644 --- a/include/os.h +++ b/include/os.h @@ -26,6 +26,14 @@ struct sandbox_state; */ ssize_t os_read(int fd, void *buf, size_t count);
+/** + * Get the optimial blocksize through stat() call. + * + * @fd: File descriptor as returned by os_open() + * Return: >=0 for the blocksize. <0 for error. + */ +ssize_t os_get_blocksize(int fd); + /** * Access to the OS write() system call * diff --git a/include/sandboxfs.h b/include/sandboxfs.h index 783dd5c88a73..6937068f7b82 100644 --- a/include/sandboxfs.h +++ b/include/sandboxfs.h @@ -32,6 +32,7 @@ void sandbox_fs_close(void); int sandbox_fs_ls(const char *dirname); int sandbox_fs_exists(const char *filename); int sandbox_fs_size(const char *filename, loff_t *size); +int sandbox_fs_get_blocksize(const char *filename); int fs_read_sandbox(const char *filename, void *buf, loff_t offset, loff_t len, loff_t *actread); int fs_write_sandbox(const char *filename, void *buf, loff_t offset,

On Tue, 28 Jun 2022 at 01:28, Qu Wenruo wqu@suse.com wrote:
This is to make sandboxfs to report blocksize it supports for _fs_read() to handle unaligned read.
Unlike all other fses, sandboxfs can handle unaligned read/write without any problem since it's calling read()/write(), which doesn't bother the blocksize at all.
This change is mostly to make testing of _fs_read() much easier.
Cc: Simon Glass sjg@chromium.org Signed-off-by: Qu Wenruo wqu@suse.com
arch/sandbox/cpu/os.c | 11 +++++++++++ fs/fs.c | 2 +- fs/sandbox/sandboxfs.c | 14 ++++++++++++++ include/os.h | 8 ++++++++ include/sandboxfs.h | 1 + 5 files changed, 35 insertions(+), 1 deletion(-)
Reviewed-by: Simon Glass sjg@chromium.org
with a comment as requested below
diff --git a/arch/sandbox/cpu/os.c b/arch/sandbox/cpu/os.c index 5ea54179176c..6c29f29bdd9b 100644 --- a/arch/sandbox/cpu/os.c +++ b/arch/sandbox/cpu/os.c @@ -46,6 +46,17 @@ ssize_t os_read(int fd, void *buf, size_t count) return read(fd, buf, count); }
+ssize_t os_get_blocksize(int fd) +{
struct stat stat = {0};
int ret;
ret = fstat(fd, &stat);
if (ret < 0)
return -errno;
return stat.st_blksize;
+}
ssize_t os_write(int fd, const void *buf, size_t count) { return write(fd, buf, count); diff --git a/fs/fs.c b/fs/fs.c index 7e4ead9b790b..337d5711c28c 100644 --- a/fs/fs.c +++ b/fs/fs.c @@ -261,7 +261,7 @@ static struct fstype_info fstypes[] = { .exists = sandbox_fs_exists, .size = sandbox_fs_size, .read = fs_read_sandbox,
.get_blocksize = fs_get_blocksize_unsupported,
.get_blocksize = sandbox_fs_get_blocksize, .write = fs_write_sandbox, .uuid = fs_uuid_unsupported, .opendir = fs_opendir_unsupported,
diff --git a/fs/sandbox/sandboxfs.c b/fs/sandbox/sandboxfs.c index 4ae41d5b4db1..130fee088621 100644 --- a/fs/sandbox/sandboxfs.c +++ b/fs/sandbox/sandboxfs.c @@ -55,6 +55,20 @@ int sandbox_fs_read_at(const char *filename, loff_t pos, void *buffer, return ret; }
+int sandbox_fs_get_blocksize(const char *filename) +{
int fd;
int ret;
fd = os_open(filename, OS_O_RDONLY);
if (fd < 0)
return fd;
ret = os_get_blocksize(fd);
os_close(fd);
return ret;
+}
int sandbox_fs_write_at(const char *filename, loff_t pos, void *buffer, loff_t towrite, loff_t *actwrite) { diff --git a/include/os.h b/include/os.h index 10e198cf503e..a864d9ca39b2 100644 --- a/include/os.h +++ b/include/os.h @@ -26,6 +26,14 @@ struct sandbox_state; */ ssize_t os_read(int fd, void *buf, size_t count);
+/**
- Get the optimial blocksize through stat() call.
- @fd: File descriptor as returned by os_open()
- Return: >=0 for the blocksize. <0 for error.
- */
+ssize_t os_get_blocksize(int fd);
/**
- Access to the OS write() system call
diff --git a/include/sandboxfs.h b/include/sandboxfs.h index 783dd5c88a73..6937068f7b82 100644 --- a/include/sandboxfs.h +++ b/include/sandboxfs.h @@ -32,6 +32,7 @@ void sandbox_fs_close(void); int sandbox_fs_ls(const char *dirname); int sandbox_fs_exists(const char *filename); int sandbox_fs_size(const char *filename, loff_t *size); +int sandbox_fs_get_blocksize(const char *filename);
Please add a full comment.
int fs_read_sandbox(const char *filename, void *buf, loff_t offset, loff_t len, loff_t *actread); int fs_write_sandbox(const char *filename, void *buf, loff_t offset, -- 2.36.1
Regards, Simon

On 2022/6/30 18:06, Simon Glass wrote:
On Tue, 28 Jun 2022 at 01:28, Qu Wenruo wqu@suse.com wrote:
This is to make sandboxfs to report blocksize it supports for _fs_read() to handle unaligned read.
Unlike all other fses, sandboxfs can handle unaligned read/write without any problem since it's calling read()/write(), which doesn't bother the blocksize at all.
This change is mostly to make testing of _fs_read() much easier.
Cc: Simon Glass sjg@chromium.org Signed-off-by: Qu Wenruo wqu@suse.com
arch/sandbox/cpu/os.c | 11 +++++++++++ fs/fs.c | 2 +- fs/sandbox/sandboxfs.c | 14 ++++++++++++++ include/os.h | 8 ++++++++ include/sandboxfs.h | 1 + 5 files changed, 35 insertions(+), 1 deletion(-)
Reviewed-by: Simon Glass sjg@chromium.org
with a comment as requested below
diff --git a/arch/sandbox/cpu/os.c b/arch/sandbox/cpu/os.c index 5ea54179176c..6c29f29bdd9b 100644 --- a/arch/sandbox/cpu/os.c +++ b/arch/sandbox/cpu/os.c @@ -46,6 +46,17 @@ ssize_t os_read(int fd, void *buf, size_t count) return read(fd, buf, count); }
+ssize_t os_get_blocksize(int fd) +{
struct stat stat = {0};
int ret;
ret = fstat(fd, &stat);
if (ret < 0)
return -errno;
return stat.st_blksize;
+}
- ssize_t os_write(int fd, const void *buf, size_t count) { return write(fd, buf, count);
diff --git a/fs/fs.c b/fs/fs.c index 7e4ead9b790b..337d5711c28c 100644 --- a/fs/fs.c +++ b/fs/fs.c @@ -261,7 +261,7 @@ static struct fstype_info fstypes[] = { .exists = sandbox_fs_exists, .size = sandbox_fs_size, .read = fs_read_sandbox,
.get_blocksize = fs_get_blocksize_unsupported,
.get_blocksize = sandbox_fs_get_blocksize, .write = fs_write_sandbox, .uuid = fs_uuid_unsupported, .opendir = fs_opendir_unsupported,
diff --git a/fs/sandbox/sandboxfs.c b/fs/sandbox/sandboxfs.c index 4ae41d5b4db1..130fee088621 100644 --- a/fs/sandbox/sandboxfs.c +++ b/fs/sandbox/sandboxfs.c @@ -55,6 +55,20 @@ int sandbox_fs_read_at(const char *filename, loff_t pos, void *buffer, return ret; }
+int sandbox_fs_get_blocksize(const char *filename) +{
int fd;
int ret;
fd = os_open(filename, OS_O_RDONLY);
if (fd < 0)
return fd;
ret = os_get_blocksize(fd);
os_close(fd);
return ret;
+}
- int sandbox_fs_write_at(const char *filename, loff_t pos, void *buffer, loff_t towrite, loff_t *actwrite) {
diff --git a/include/os.h b/include/os.h index 10e198cf503e..a864d9ca39b2 100644 --- a/include/os.h +++ b/include/os.h @@ -26,6 +26,14 @@ struct sandbox_state; */ ssize_t os_read(int fd, void *buf, size_t count);
+/**
- Get the optimial blocksize through stat() call.
- @fd: File descriptor as returned by os_open()
- Return: >=0 for the blocksize. <0 for error.
- */
+ssize_t os_get_blocksize(int fd);
- /**
- Access to the OS write() system call
diff --git a/include/sandboxfs.h b/include/sandboxfs.h index 783dd5c88a73..6937068f7b82 100644 --- a/include/sandboxfs.h +++ b/include/sandboxfs.h @@ -32,6 +32,7 @@ void sandbox_fs_close(void); int sandbox_fs_ls(const char *dirname); int sandbox_fs_exists(const char *filename); int sandbox_fs_size(const char *filename, loff_t *size); +int sandbox_fs_get_blocksize(const char *filename);
Please add a full comment.
This is already removed in the formal version: https://patchwork.kernel.org/project/linux-btrfs/list/?series=654990
As sandbox is just calling host OS read() to handle IO, thus it doesn't need to bother the alignment at all.
Thanks, Qu
int fs_read_sandbox(const char *filename, void *buf, loff_t offset, loff_t len, loff_t *actread); int fs_write_sandbox(const char *filename, void *buf, loff_t offset, -- 2.36.1
Regards, Simon

Currently ubifs doesn't support unaligned read offset, thanks to the recent _fs_read() work to handle unaligned read, we only need to implement ubifs_get_blocksize() to take advantage of it.
Now ubifs can do unaligned read without any problem.
Signed-off-by: Qu Wenruo wqu@suse.com --- fs/fs.c | 2 +- fs/ubifs/ubifs.c | 13 ++++++++----- include/ubifs_uboot.h | 1 + 3 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/fs/fs.c b/fs/fs.c index 337d5711c28c..41943e3a95db 100644 --- a/fs/fs.c +++ b/fs/fs.c @@ -302,7 +302,7 @@ static struct fstype_info fstypes[] = { .exists = ubifs_exists, .size = ubifs_size, .read = ubifs_read, - .get_blocksize = fs_get_blocksize_unsupported, + .get_blocksize = ubifs_get_blocksize, .write = fs_write_unsupported, .uuid = fs_uuid_unsupported, .opendir = fs_opendir_unsupported, diff --git a/fs/ubifs/ubifs.c b/fs/ubifs/ubifs.c index d3026e310168..a8ab556dd376 100644 --- a/fs/ubifs/ubifs.c +++ b/fs/ubifs/ubifs.c @@ -846,11 +846,9 @@ int ubifs_read(const char *filename, void *buf, loff_t offset,
*actread = 0;
- if (offset & (PAGE_SIZE - 1)) { - printf("ubifs: Error offset must be a multiple of %d\n", - PAGE_SIZE); - return -1; - } + /* Higher layer should ensure it always pass page aligned range. */ + assert(IS_ALIGNED(offset, PAGE_SIZE)); + assert(IS_ALIGNED(size, PAGE_SIZE));
c->ubi = ubi_open_volume(c->vi.ubi_num, c->vi.vol_id, UBI_READONLY); /* ubifs_findfile will resolve symlinks, so we know that we get @@ -920,6 +918,11 @@ out: return err; }
+int ubifs_get_blocksize(const char *filename) +{ + return PAGE_SIZE; +} + void ubifs_close(void) { } diff --git a/include/ubifs_uboot.h b/include/ubifs_uboot.h index b025779d59ff..bcd21715314a 100644 --- a/include/ubifs_uboot.h +++ b/include/ubifs_uboot.h @@ -29,6 +29,7 @@ int ubifs_exists(const char *filename); int ubifs_size(const char *filename, loff_t *size); int ubifs_read(const char *filename, void *buf, loff_t offset, loff_t size, loff_t *actread); +int ubifs_get_blocksize(const char *filename); void ubifs_close(void);
#endif /* __UBIFS_UBOOT_H__ */

I'm not an expert on erofs, but my quick glance didn't expose any special handling on unaligned range, thus I think the U-boot erofs driver doesn't really support unaligned read range.
This patch will add erofs_get_blocksize() so erofs can benefit from the generic unaligned read support.
Cc: Huang Jianan jnhuang95@gmail.com Cc: linux-erofs@lists.ozlabs.org Signed-off-by: Qu Wenruo wqu@suse.com --- fs/erofs/internal.h | 1 + fs/erofs/super.c | 6 ++++++ fs/fs.c | 2 +- include/erofs.h | 1 + 4 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 4af7c91560cc..d368a6481bf1 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -83,6 +83,7 @@ struct erofs_sb_info { u16 available_compr_algs; u16 lz4_max_distance; u32 checksum; + u32 blocksize; u16 extra_devices; union { u16 devt_slotoff; /* used for mkfs */ diff --git a/fs/erofs/super.c b/fs/erofs/super.c index 4cca322b9ead..df01d2e719a7 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -99,7 +99,13 @@ int erofs_read_superblock(void)
sbi.build_time = le64_to_cpu(dsb->build_time); sbi.build_time_nsec = le32_to_cpu(dsb->build_time_nsec); + sbi.blocksize = 1 << blkszbits;
memcpy(&sbi.uuid, dsb->uuid, sizeof(dsb->uuid)); return erofs_init_devices(&sbi, dsb); } + +int erofs_get_blocksize(const char *filename) +{ + return sbi.blocksize; +} diff --git a/fs/fs.c b/fs/fs.c index 41943e3a95db..0ef5fbdda5d6 100644 --- a/fs/fs.c +++ b/fs/fs.c @@ -366,7 +366,7 @@ static struct fstype_info fstypes[] = { .readdir = erofs_readdir, .ls = fs_ls_generic, .read = erofs_read, - .get_blocksize = fs_get_blocksize_unsupported, + .get_blocksize = erofs_get_blocksize, .size = erofs_size, .close = erofs_close, .closedir = erofs_closedir, diff --git a/include/erofs.h b/include/erofs.h index 1fbe82bf72cb..18bd6807c538 100644 --- a/include/erofs.h +++ b/include/erofs.h @@ -10,6 +10,7 @@ int erofs_probe(struct blk_desc *fs_dev_desc, struct disk_partition *fs_partition); int erofs_read(const char *filename, void *buf, loff_t offset, loff_t len, loff_t *actread); +int erofs_get_blocksize(const char *filename); int erofs_size(const char *filename, loff_t *size); int erofs_exists(const char *filename); void erofs_close(void);

On 6/28/22 3:28 AM, Qu Wenruo wrote:
[BACKGROUND] Unlike FUSE/Kernel which always pass aligned read range, U-boot fs code just pass the request range to underlying fses.
Under most case, this works fine, as U-boot only really needs to read the whole file (aka, 0 for both offset and len, len will be later determined using file size).
But if some advanced user/script wants to extract kernel/initramfs from combined image, we may need to do unaligned read in that case.
[ADVANTAGE] This patchset will handle unaligned read range in _fs_read():
Get blocksize of the underlying fs
Read the leading block contianing the unaligned range The full block will be stored in a local buffer, then only copy the bytes in the unaligned range into the destination buffer.
If the first block covers the whole range, we just call it aday.
Read the aligned range if there is any
Read the tailing block containing the unaligned range And copy the covered range into the destination.
[DISADVANTAGE] There are mainly two problems:
Extra memory allocation for every _fs_read() call For the leading and tailing block.
Extra path resolving All those supported fs will have to do extra path resolving up to 2 times (one for the leading block, one for the tailing block). This may slow down the read.
[SUPPORTED FSES]
- Btrfs (manually tested*)
- Ext4 (manually tested)
- FAT (manually tested)
- Erofs
- sandboxfs
- ubifs
*: Failed to get the test cases run, thus have to go sandbox mode, and attach an image with target fs, load the target file (with unaligned range) and compare the result using md5sum.
For EXT4/FAT, they may need extra cleanup, as their existing unaligned range handling is no longer needed anymore, cleaning them up should free more code lines than the added one.
Just not confident enough to modify them all by myself.
[UNSUPPORTED FSES]
Squashfs They don't support non-zero offset, thus it can not handle the tailing block reading. Need extra help to add block aligned offset support.
Semihostfs It's using hardcoded trap to do system calls, not sure how it would work for stat() call.
There are no alignment requirements for semihosted FSs. So you can pass in an unaligned offset and it will work fine. This is because typically the host will call read() and the host OS will do the aligning.
--Sean

On Tue, Jun 28, 2022 at 03:28:00PM +0800, Qu Wenruo wrote:
[BACKGROUND] Unlike FUSE/Kernel which always pass aligned read range, U-boot fs code just pass the request range to underlying fses.
Under most case, this works fine, as U-boot only really needs to read the whole file (aka, 0 for both offset and len, len will be later determined using file size).
But if some advanced user/script wants to extract kernel/initramfs from combined image, we may need to do unaligned read in that case.
[ADVANTAGE] This patchset will handle unaligned read range in _fs_read():
Get blocksize of the underlying fs
Read the leading block contianing the unaligned range The full block will be stored in a local buffer, then only copy the bytes in the unaligned range into the destination buffer.
If the first block covers the whole range, we just call it aday.
Read the aligned range if there is any
Read the tailing block containing the unaligned range And copy the covered range into the destination.
[DISADVANTAGE] There are mainly two problems:
Extra memory allocation for every _fs_read() call For the leading and tailing block.
Extra path resolving All those supported fs will have to do extra path resolving up to 2 times (one for the leading block, one for the tailing block). This may slow down the read.
This conceptually seems like a good thing. Can you please post some before/after times of reading large images from the supported filesystems?

On 2022/6/28 22:17, Tom Rini wrote:
On Tue, Jun 28, 2022 at 03:28:00PM +0800, Qu Wenruo wrote:
[BACKGROUND] Unlike FUSE/Kernel which always pass aligned read range, U-boot fs code just pass the request range to underlying fses.
Under most case, this works fine, as U-boot only really needs to read the whole file (aka, 0 for both offset and len, len will be later determined using file size).
But if some advanced user/script wants to extract kernel/initramfs from combined image, we may need to do unaligned read in that case.
[ADVANTAGE] This patchset will handle unaligned read range in _fs_read():
Get blocksize of the underlying fs
Read the leading block contianing the unaligned range The full block will be stored in a local buffer, then only copy the bytes in the unaligned range into the destination buffer.
If the first block covers the whole range, we just call it aday.
Read the aligned range if there is any
Read the tailing block containing the unaligned range And copy the covered range into the destination.
[DISADVANTAGE] There are mainly two problems:
Extra memory allocation for every _fs_read() call For the leading and tailing block.
Extra path resolving All those supported fs will have to do extra path resolving up to 2 times (one for the leading block, one for the tailing block). This may slow down the read.
This conceptually seems like a good thing. Can you please post some before/after times of reading large images from the supported filesystems?
One thing to mention is, this change doesn't really bother large file read.
As the patchset is splitting a large read into 3 parts:
1) Leading block 2) Aligned blocks, aka the main part of a large file 3) Tailing block
Most time should still be spent on part 2), not much different than the old code. Part 1) and Part 3) are at most 2 blocks (aka, 2 * 4KiB for most modern large enough fses).
So I doubt it would make any difference for large file read.
Furthermore, as pointed out by Huang Jianan, currently the patchset can not handle read on soft link correctly, thus I'd update the series to do the split into even less parts:
1) Leading block For the unaligned initial block
2) Aligned blocks until the end The tailing block should still starts at a block aligned position, thus most filesystems is already handling them correctly. (Just a min(end, blockend) is enough for most cases already).
Anyway, I'll try to craft some benchmarking for file reads using sandbox. But please don't expect much (or any) difference in that case.
Thanks, Qu

On Wed, Jun 29, 2022 at 09:40:58AM +0800, Qu Wenruo wrote:
On 2022/6/28 22:17, Tom Rini wrote:
On Tue, Jun 28, 2022 at 03:28:00PM +0800, Qu Wenruo wrote:
[BACKGROUND] Unlike FUSE/Kernel which always pass aligned read range, U-boot fs code just pass the request range to underlying fses.
Under most case, this works fine, as U-boot only really needs to read the whole file (aka, 0 for both offset and len, len will be later determined using file size).
But if some advanced user/script wants to extract kernel/initramfs from combined image, we may need to do unaligned read in that case.
[ADVANTAGE] This patchset will handle unaligned read range in _fs_read():
Get blocksize of the underlying fs
Read the leading block contianing the unaligned range The full block will be stored in a local buffer, then only copy the bytes in the unaligned range into the destination buffer.
If the first block covers the whole range, we just call it aday.
Read the aligned range if there is any
Read the tailing block containing the unaligned range And copy the covered range into the destination.
[DISADVANTAGE] There are mainly two problems:
Extra memory allocation for every _fs_read() call For the leading and tailing block.
Extra path resolving All those supported fs will have to do extra path resolving up to 2 times (one for the leading block, one for the tailing block). This may slow down the read.
This conceptually seems like a good thing. Can you please post some before/after times of reading large images from the supported filesystems?
One thing to mention is, this change doesn't really bother large file read.
As the patchset is splitting a large read into 3 parts:
- Leading block
- Aligned blocks, aka the main part of a large file
- Tailing block
Most time should still be spent on part 2), not much different than the old code. Part 1) and Part 3) are at most 2 blocks (aka, 2 * 4KiB for most modern large enough fses).
So I doubt it would make any difference for large file read.
Furthermore, as pointed out by Huang Jianan, currently the patchset can not handle read on soft link correctly, thus I'd update the series to do the split into even less parts:
Leading block For the unaligned initial block
Aligned blocks until the end The tailing block should still starts at a block aligned position, thus most filesystems is already handling them correctly. (Just a min(end, blockend) is enough for most cases already).
Anyway, I'll try to craft some benchmarking for file reads using sandbox. But please don't expect much (or any) difference in that case.
The rework sounds good. And doing it without any real impact to performance either way is good.
participants (6)
-
Huang Jianan
-
Qu Wenruo
-
Qu Wenruo
-
Sean Anderson
-
Simon Glass
-
Tom Rini