ext4: invalid extent block on imx7

Hi all,
=> ls mmc 0:1 /usr/lib/linux-image-4.9.11-1.3.0-dirty
CACHE: Misaligned operation at range [bdfff998, bdfffd98]
CACHE: Misaligned operation at range [bdfff998, bdfffd98]
CACHE: Misaligned operation at range [bdfff998, bdfffd98]
CACHE: Misaligned operation at range [bdfff998, bdfffd98]
invalid extent block
I'm using master (50be9f0e1ccc) on the MCIMX7SABRE, defconfig.
What could this be? The filesystem is fine from Linux POV.
Thanks,
Jan
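The CACHE warnings are printed when a cache maintenance operation is asked to work on a range whose start or end is not a multiple of the cache line size. As a quick check of the reported range, here is a small Python sketch. The 64-byte line size is an assumption (it is not stated anywhere in this thread), and the helper name is made up for illustration.

```python
CACHE_LINE = 64  # assumed cache line size in bytes; not stated in the thread


def misalignment(start, end, line=CACHE_LINE):
    """Return how far start and end lie past the previous line boundary."""
    return start % line, end % line


# Range taken from the warning printed by the ls command.
start, end = 0xBDFFF998, 0xBDFFFD98
length = end - start
start_off, end_off = misalignment(start, end)

print(f"length = {length:#x} ({length} bytes)")
print(f"start offset into cache line = {start_off:#x}")
print(f"end offset into cache line   = {end_off:#x}")
```

Under that assumption the buffer is exactly 1 KiB long, so its size is a whole number of cache lines; it is the start address that sits 0x18 bytes past a line boundary.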

On Mon, Mar 16, 2020 at 08:09:53PM +0100, Jan Kiszka wrote:
> I'm using master (50be9f0e1ccc) on the MCIMX7SABRE, defconfig.
> What could this be? The filesystem is fine from Linux POV.
My first guess would be to use tune2fs -l and see whether any newish features are enabled that we need some sort of check-and-reject for.

On 20.03.20 19:21, Tom Rini wrote:
> My first guess would be to use tune2fs -l and see whether any newish features are enabled that we need some sort of check-and-reject for.
Here are the reported feature flags:
has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Anything too fancy in here? But the method of creating this filesystem does not deviate from many other setups we have for U-Boot (on other boards).
Thanks, Jan

On Wed, Mar 25, 2020 at 07:32:30AM +0100, Jan Kiszka wrote:
> Here are the reported feature flags:
> has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Of those, only metadata_csum means that you can't write to that image, but you're just trying to read and that should be fine. Can you go back in time a little and see if this problem persists or if it's been introduced of late? Or recreate it on other platforms/SoCs? Thanks!
> Anything too fancy in here? But the method of creating this filesystem does not deviate from many other setups we have for U-Boot (on other boards).
Yes, but for some time now e2fsprogs has been introducing new default features that require compatibility checks.
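A check-and-reject step of the kind suggested earlier in this thread could look roughly like the sketch below. It is written in Python for brevity and is not U-Boot code; the function name and the `WRITE_UNSUPPORTED` set are hypothetical, and the set contains only what this thread states (metadata_csum prevents writing).

```python
# Feature list as reported by tune2fs -l earlier in this thread.
FEATURES = (
    "has_journal ext_attr resize_inode dir_index filetype extent 64bit "
    "flex_bg sparse_super large_file huge_file dir_nlink extra_isize "
    "metadata_csum"
).split()

# Hypothetical set for illustration: only metadata_csum is named in the
# thread as blocking writes; a real check would need a complete list.
WRITE_UNSUPPORTED = {"metadata_csum"}


def check_features(features, unsupported=WRITE_UNSUPPORTED):
    """Return the features that would force the filesystem to be read-only."""
    return [name for name in features if name in unsupported]


blocked = check_features(FEATURES)
print("write blocked by:", ", ".join(blocked) if blocked else "nothing")
```

Reading would still be allowed in this model; only write access is refused when the returned list is non-empty.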

On 25.03.20 16:00, Tom Rini wrote:
> Of those, only metadata_csum means that you can't write to that image, but you're just trying to read and that should be fine. Can you go back in time a little and see if this problem persists or if it's been introduced of late? Or recreate it on other platforms/SoCs? Thanks!
Bisected: this is a regression from d5aee659f217 ("fs: ext4: cache extent data"). Reverting this commit on top of master resolves the issue.
Any idea what could be wrong? What I noticed is that the extent has a zeroed magic when things go wrong, so maybe it is falsely considered to be cached?
Jan
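For readers unfamiliar with bisecting, the idea is a binary search over the commit history for the first commit that shows the failure. The sketch below is a simplified model that assumes a linear history; the intermediate commit names are placeholders, and only d5aee659f217 and 50be9f0e1ccc come from this thread.

```python
def first_bad(commits, is_bad):
    """Binary search for the first commit for which is_bad() is true.

    Assumes commits[0] is good, commits[-1] is bad, and that every commit
    after the first bad one is bad as well.
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Placeholder history, oldest first; "good-base", "c1", ... are made up.
history = ["good-base", "c1", "c2", "d5aee659f217", "c4", "c5", "50be9f0e1ccc"]
bad_from = history.index("d5aee659f217")

culprit = first_bad(history, lambda c: history.index(c) >= bad_from)
print("first bad commit:", culprit)
```

In practice the `is_bad` test is a build and boot of each candidate followed by the failing `ls` command.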

On 3/25/20 1:11 PM, Jan Kiszka wrote:
> Bisected: this is a regression from d5aee659f217 ("fs: ext4: cache extent data"). Reverting this commit on top of master resolves the issue.
> Any idea what could be wrong? What I noticed is that the extent has a zeroed magic when things go wrong, so maybe it is falsely considered to be cached?
This is puzzling. I took another look at that patch and I don't see anything wrong. My guesses would be:
- Some unrelated memory corruption bug was exposed simply because this patch uses dynamic memory or the stack slightly differently than before.
- Something writes to the cached block, whereas the cache code assumes the buffer is read-only.
The cache metadata exists on the stack and so only lasts for the duration of read_allocated_block() or ext4fs_read_file(), so there's no issue with re-using the cache across different devices, or persisting across an ext4 write operation or anything like that.
Is this easy to reproduce? Is there a small disk image that shows the problem?
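The lifetime argument above can be pictured with a toy model. This is a sketch only, not the U-Boot code: the class and function names are hypothetical, and the single-entry cache is an assumption made to keep the example small. The point it shows is that the cache object is created inside the read call and discarded when that call returns.

```python
class CountingDevice:
    """Fake block device that counts how often a block is actually read."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.reads = 0

    def read(self, block_nr):
        self.reads += 1
        return self.blocks[block_nr]


class ExtentCache:
    """Single-entry cache: remembers only the most recently read block."""

    def __init__(self):
        self.block_nr = None
        self.data = None


def read_extent_block(device, block_nr, cache):
    """Return the block, touching the device only on a cache miss."""
    if cache.block_nr != block_nr:
        cache.data = device.read(block_nr)
        cache.block_nr = block_nr
    return cache.data


def read_file(device, block_numbers):
    cache = ExtentCache()  # lives only for the duration of this call
    return [read_extent_block(device, nr, cache) for nr in block_numbers]


dev = CountingDevice({7: b"extent-7", 9: b"extent-9"})
first = read_file(dev, [7, 7, 7, 9])
reads_after_first = dev.reads
read_file(dev, [7])  # new call, new cache: block 7 is read again
reads_after_second = dev.reads
print(reads_after_first, reads_after_second)
```

Because nothing survives between calls in this model, stale data from another device or from before a write cannot be served from the cache.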

On 25.03.20 21:01, Stephen Warren wrote:
> This is puzzling. I took another look at that patch and I don't see anything wrong. My guesses would be:
> - Some unrelated memory corruption bug was exposed simply because this patch uses dynamic memory or the stack slightly differently than before.
> - Something writes to the cached block, whereas the cache code assumes the buffer is read-only.
Found it: an alignment issue, apparently surfaced by your change when switching from zalloc (which does cacheline? alignment) to malloc. Is this sensitivity maybe SoC specific?
Jan
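To illustrate the difference described in the last message: an aligning allocator returns addresses that are multiples of the cache line size, while a plain allocator only guarantees a smaller alignment. The sketch below models this with integer arithmetic. The 64-byte line size and the 8-byte plain alignment are assumptions for illustration, and `align_up` is a hypothetical helper, not a U-Boot function.

```python
CACHE_LINE = 64    # assumed cache line size in bytes
MALLOC_ALIGN = 8   # assumed minimum alignment of a plain allocation


def align_up(addr, alignment):
    """Round addr up to the next multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)


raw = 0xBDFFF998                     # buffer start from the CACHE warning
aligned = align_up(raw, CACHE_LINE)  # where an aligning allocator could start

print(f"raw     = {raw:#x}, offset in line = {raw % CACHE_LINE}")
print(f"aligned = {aligned:#x}, offset in line = {aligned % CACHE_LINE}")
print(f"raw satisfies plain alignment: {raw % MALLOC_ALIGN == 0}")
```

Under these assumptions the address from the warning is a valid plain allocation but not a valid cache-line-aligned one, which matches the explanation given in the thread.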
participants (3)
- Jan Kiszka
- Stephen Warren
- Tom Rini