[PATCH] arm64: Fix map_range() not splitting mapped blocks

The implementation of map_range() creates the requested mapping by walking the page tables, iterating over multiple PTEs and/or descending into existing table mappings as needed. When doing so, it assumes any pre-existing valid PTE to be a table mapping. This assumption is wrong if the platform code attempts to successively map two overlapping ranges where the latter intersects a block mapping created for the former.
As a result, map_range() treats the existing block mapping as a table mapping and descends into it, i.e. it starts interpreting the previously-mapped range as an array of PTEs, writing to them and potentially even descending further (extra fun with MMIO ranges!).
Instead, pass any valid non-table mapping to split_block(), which ensures that it actually was a block mapping (calls panic() otherwise) before splitting it.
Fixes: 41e2787f5ec4 ("arm64: Reduce add_map() complexity")
Signed-off-by: Pierre-Clément Tosi ptosi@google.com
---
 arch/arm/cpu/armv8/cache_v8.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/arch/arm/cpu/armv8/cache_v8.c b/arch/arm/cpu/armv8/cache_v8.c
index 697334086f..57d06f0575 100644
--- a/arch/arm/cpu/armv8/cache_v8.c
+++ b/arch/arm/cpu/armv8/cache_v8.c
@@ -326,6 +326,8 @@ static void map_range(u64 virt, u64 phys, u64 size, int level,
 		/* Going one level down */
 		if (pte_type(&table[i]) == PTE_TYPE_FAULT)
 			set_pte_table(&table[i], create_table());
+		else if (pte_type(&table[i]) != PTE_TYPE_TABLE)
+			split_block(&table[i], level);
 
 		next_table = (u64 *)(table[i] & GENMASK_ULL(47, PAGE_SHIFT));
 		next_size = min(map_size - (virt & (map_size - 1)), size);
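To make the failure mode concrete, here is a toy, host-runnable model of the descriptor check the patch adds. It is a sketch, not U-Boot's code: toy_create_table(), table_at(), toy_split_block(), toy_descend(), the static table pool and the use of a table's pool index as its "address" are all hypothetical. Only the 4 KiB-granule descriptor encoding follows the architecture (bits [1:0]: bit 0 clear means invalid, 0b01 is a block, 0b11 is a table, or a page at level 3). Splitting fills a child table with entries that keep the block's attributes and step the output address by the child block size, then points the parent at the new table.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy VMSAv8-64 model, 4 KiB granule: the descriptor type is in bits [1:0]. */
#define TYPE_MASK  0x3ULL
#define TYPE_FAULT 0x0ULL
#define TYPE_BLOCK 0x1ULL
#define TYPE_TABLE 0x3ULL /* at level 3 this encoding means "page" */
#define ENTRIES    512
#define ADDR_MASK  0x0000FFFFFFFFF000ULL /* next-level address, bits [47:12] */
#define POOL_SIZE  8

/* Hypothetical table pool: table n lives at toy "address" n << 12. */
static uint64_t pool[POOL_SIZE][ENTRIES];
static unsigned pool_used;

/* VA bits resolved below a given level (level 3 -> 12, level 1 -> 30). */
static unsigned level_shift(int level) { return 12 + 9 * (3 - level); }

/* Stand-in for create_table(): hands out a zeroed (all-fault) table. */
static uint64_t toy_create_table(void)
{
    assert(pool_used < POOL_SIZE && "out of toy tables");
    return (uint64_t)pool_used++ << 12;
}

static uint64_t *table_at(uint64_t desc)
{
    return pool[(desc & ADDR_MASK) >> 12];
}

/* Stand-in for split_block(): replace a block with an equivalent table. */
static void toy_split_block(uint64_t *pte, int level)
{
    uint64_t old = *pte;

    if ((old & TYPE_MASK) != TYPE_BLOCK) {
        fprintf(stderr, "PTE %#llx is not a block\n", (unsigned long long)old);
        abort(); /* the commit message says the real code calls panic() here */
    }

    uint64_t addr = toy_create_table();
    uint64_t *child = table_at(addr);

    /* Same attributes, output address advanced by one child-sized step each. */
    for (uint64_t i = 0; i < ENTRIES; i++) {
        child[i] = old | (i << level_shift(level + 1));
        if (level + 1 == 3)
            child[i] |= TYPE_TABLE; /* level-3 leaves use the page encoding */
    }

    *pte = addr | TYPE_TABLE; /* publish only once the child is complete */
}

/* The decision the patch adds before "going one level down". */
static uint64_t *toy_descend(uint64_t *pte, int level)
{
    if ((*pte & TYPE_MASK) == TYPE_FAULT)
        *pte = toy_create_table() | TYPE_TABLE;
    else if ((*pte & TYPE_MASK) != TYPE_TABLE)
        toy_split_block(pte, level); /* before the fix: treated as a table */
    return table_at(*pte);
}
```

With a 1 GiB block at level 1 covering 0x40000000, descending into that range now yields a level-2 table whose 512 entries reproduce the original mapping in 2 MiB steps, instead of the block's output address being reinterpreted as a pointer to a table.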

Hi Pierre,
On Mon, Mar 18, 2024 at 4:35 PM Pierre-Clément Tosi ptosi@google.com wrote:
[...]
This fixes the boot regression on colibri-imx8x.
Thanks a lot for your fix!
Tested-by: Fabio Estevam festevam@gmail.com

On Mon, Mar 18, 2024 at 04:46:55PM -0300, Fabio Estevam wrote:
[...]
Tested-by: Hiago De Franco hiago.franco@toradex.com # Toradex Verdin AM62
On Sat, Mar 23, 2024 at 10:34:54AM -0400, Tom Rini wrote:
On Fri, Mar 22, 2024 at 04:33:03PM -0300, Fabio Estevam wrote:
On Fri, Mar 22, 2024 at 4:31 PM Fabio Estevam festevam@gmail.com wrote:
As Pierre's explanation addresses Marc's concern, do you think this can go to 2024.01 to fix the boot regression on imx8qxp/8qm?
I meant 2024.04, sorry.
How much testing has this seen outside of imx?
Tom, I tested with the AM62 TI arm processor (Toradex Verdin AM62) and it works fine.
U-Boot 2024.04-rc4-00001-g5db2e36c8e97 (Mar 25 2024 - 17:28:20 -0300)
SoC: AM62X SR1.0 HS-FS
DRAM: 1 GiB
Core: 138 devices, 29 uclasses, devicetree: separate
MMC: mmc@fa10000: 0, mmc@fa00000: 1
Loading Environment from MMC... OK
In: serial@2800000
Out: serial@2800000
Err: serial@2800000
Model: Toradex 0074 Verdin AM62 Dual 1GB IT V1.1A
Serial#: 15133548
Carrier: Toradex Dahlia V1.1C, Serial# 10952631
I've added my Tested-by as well.
Best Regards,
Hiago.

Hi Tom,
On Mon, Mar 25, 2024 at 5:37 PM Hiago De Franco hiagofranco@gmail.com wrote:
How much testing has this seen outside of imx?
Tom, I tested with the AM62 TI arm processor (Toradex Verdin AM62) and it works fine.
Do you think this one can be applied to next?
Then we would have time for more testing until 2024.07.

Hi Tom,
On Sat, Mar 30, 2024 at 5:03 PM Fabio Estevam festevam@gmail.com wrote:
[...]
Can this go in now?
Thanks

On Mon, 18 Mar 2024 19:35:49 +0000, Pierre-Clément Tosi ptosi@google.com wrote:
[...]
diff --git a/arch/arm/cpu/armv8/cache_v8.c b/arch/arm/cpu/armv8/cache_v8.c
index 697334086f..57d06f0575 100644
--- a/arch/arm/cpu/armv8/cache_v8.c
+++ b/arch/arm/cpu/armv8/cache_v8.c
@@ -326,6 +326,8 @@ static void map_range(u64 virt, u64 phys, u64 size, int level,
 		/* Going one level down */
 		if (pte_type(&table[i]) == PTE_TYPE_FAULT)
 			set_pte_table(&table[i], create_table());
+		else if (pte_type(&table[i]) != PTE_TYPE_TABLE)
+			split_block(&table[i], level);
 
 		next_table = (u64 *)(table[i] & GENMASK_ULL(47, PAGE_SHIFT));
 		next_size = min(map_size - (virt & (map_size - 1)), size);
This seems pretty reasonable, thanks for looking into this. However, I can't help but notice that this is done without any BBM (break-before-make) and without any TLBI (TLB invalidation) either.
Are we guaranteed that the updated page tables are not live at the point of update?
Thanks,
M.
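Marc's concern is about ordering: replacing a valid block descriptor with a table descriptor while the MMU may be walking those tables calls for break-before-make (as described in the Arm Architecture Reference Manual), not a single in-place store. The sketch below only models that ordering on the host; enum op, struct trace, direct_update(), bbm_update() and follows_bbm() are hypothetical names. On real hardware each OP_DSB would be a DSB and OP_TLBI_VA a TLBI by VA appropriate to the current exception level; here they are just recorded in a trace so the order can be checked.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of what a PTE update would issue to the CPU. */
enum op { OP_STORE_INVALID, OP_DSB, OP_TLBI_VA, OP_STORE_NEW };

struct trace {
    enum op ops[16];
    size_t n;
};

static void emit(struct trace *t, enum op o)
{
    if (t->n < sizeof(t->ops) / sizeof(t->ops[0]))
        t->ops[t->n++] = o;
}

/* A plain in-place overwrite: no BBM and no TLBI, as raised in the review. */
static void direct_update(uint64_t *pte, uint64_t new_desc, struct trace *t)
{
    *pte = new_desc;
    emit(t, OP_STORE_NEW);
}

/* Break-before-make, simplified: break, invalidate, then make. */
static void bbm_update(uint64_t *pte, uint64_t new_desc, struct trace *t)
{
    *pte = 0;                 /* break: the walker now sees a fault entry */
    emit(t, OP_STORE_INVALID);
    emit(t, OP_DSB);          /* make the invalid entry visible */
    emit(t, OP_TLBI_VA);      /* drop TLB entries cached from the old descriptor */
    emit(t, OP_DSB);          /* wait for the invalidation to complete */
    *pte = new_desc;          /* make: install the replacement */
    emit(t, OP_STORE_NEW);
    emit(t, OP_DSB);
}

/* Does the trace break and flush the old entry before storing the new one? */
static int follows_bbm(const struct trace *t)
{
    static const enum op want[] = {
        OP_STORE_INVALID, OP_DSB, OP_TLBI_VA, OP_DSB, OP_STORE_NEW,
    };
    const size_t n_want = sizeof(want) / sizeof(want[0]);
    size_t k = 0;

    for (size_t i = 0; i < t->n && k < n_want; i++) {
        if (t->ops[i] == OP_STORE_NEW && k < n_want - 1)
            return 0; /* "made" before the old entry was broken and flushed */
        if (t->ops[i] == want[k])
            k++;
    }
    return k == n_want;
}
```

Both helpers leave the same final descriptor in memory; only the bbm_update() trace passes follows_bbm(). When the tables are not live (MMU off, or tables not installed), the plain store is sufficient, which is what the rest of the thread establishes.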

Hi Marc,
On Tue, Mar 19, 2024 at 09:43:03AM +0000, Marc Zyngier wrote:
This seems pretty reasonable, thanks for looking into this. However, I can't help but notice that this is done without any BBM, and no TLBI either.
Are we guaranteed that the updated page tables are not live at the point of update?
Here, "live" would mean gd->arch.tlb_addr pointing to the page tables the MMU is currently using while setup_pgtables() runs.
In arch/arm/cpu/armv8, setup_all_pgtables() runs with SCTLR_ELx.M unset.
In arch/arm/cpu/armv8/fsl-layerscape, setup_pgtables() is called twice:
- early_mmu_setup() calls it with SCTLR_ELx.M unset;
- final_mmu_setup() overwrites gd->arch.tlb_addr before calling it iff CFG_SYS_MEM_RESERVE_SECURE is defined, i.e. if CONFIG_SYS_SOC="fsl-layerscape" so that <asm/arch-fsl-layerscape/config.h> gets auto-included through <include/config.h>.
So can CONFIG_FSL_LAYERSCAPE be set while CONFIG_SYS_SOC != "fsl-layerscape"?
I suppose Fabio and Stefano can answer this and/or help with ensuring that setup_pgtables() is never called on live PTs.
Thanks,
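Pierre's argument reduces to two conditions: the tables written by setup_pgtables() can only be live if the stage 1 MMU is enabled (SCTLR_ELx.M, bit 0) and the active translation table base register points at them. The guard below is a hypothetical helper, not existing U-Boot code; it assumes a 48-bit physical address space and simply masks off the ASID and CnP fields of the TTBR value. Real code would have to read SCTLR_ELx and TTBR0_ELx for the current exception level.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCTLR_M         (1ULL << 0)           /* SCTLR_ELx.M: stage 1 MMU enable */
#define TTBR_BADDR_MASK 0x0000FFFFFFFFFFFEULL /* assumes 48-bit PA; drops CnP/ASID */

/*
 * Hypothetical guard: the tables at tlb_addr are "live" only if the MMU is
 * on for this EL and the active TTBR points at them.
 */
static bool pgtables_live(uint64_t sctlr, uint64_t ttbr, uint64_t tlb_addr)
{
    if (!(sctlr & SCTLR_M))
        return false; /* MMU off: this regime performs no stage 1 walks */
    return (ttbr & TTBR_BADDR_MASK) == tlb_addr;
}
```

In the early_mmu_setup() case the first check fails because the MMU is off; in the final_mmu_setup() case the second fails because gd->arch.tlb_addr has been moved away from the installed tables before setup_pgtables() runs.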

Hi Pierre,
On Tue, Mar 19, 2024 at 8:39 AM Pierre-Clément Tosi ptosi@google.com wrote:
This means gd->arch.tlb_addr pointing to the live PTs during setup_pgtables().
In arch/arm/cpu/armv8, setup_all_pgtables() runs with SCTLR_ELx.M unset.
In arch/arm/cpu/armv8/fsl-layerscape, setup_pgtables() is called twice:
- early_mmu_setup() calls it with SCTLR_ELx.M unset;
- final_mmu_setup() overwrites gd->arch.tlb_addr before calling it iff CFG_SYS_MEM_RESERVE_SECURE is defined i.e. if CONFIG_SYS_SOC="fsl-layerscape" so that <asm/arch-fsl-layerscape/config.h> gets auto-included through <include/config.h>.
So can CONFIG_FSL_LAYERSCAPE be set while CONFIG_SYS_SOC != "fsl-layerscape"?
No, this cannot happen.
Only the following Layerscape SoCs select CONFIG_FSL_LAYERSCAPE in arch/arm/cpu/armv8/fsl-layerscape/Kconfig: LS1012A, LS1028A, LS1043A, LS1046A, LS1088A, LS2080A, LX2162A and LX2160A
I saw the original boot problem with the i.MX8QX.
The i.MX8QX is part of the i.MX family, not the Layerscape family.
I suppose Fabio and Stefano can answer this and/or help with ensuring that setup_pgtables() is never called on live PTs.
Let me know if you need any clarification.
Thanks,
Fabio Estevam

Hi Fabio,
On Tue, Mar 19, 2024 at 09:13:12AM -0300, Fabio Estevam wrote:
[...]
So can CONFIG_FSL_LAYERSCAPE be set while CONFIG_SYS_SOC != "fsl-layerscape"?
No, this cannot happen.
Thanks for confirming.
For clarity, it might then make sense to drop that #ifdef in final_mmu_setup().
Only the following Layerscape SoCs select CONFIG_FSL_LAYERSCAPE in arch/arm/cpu/armv8/fsl-layerscape/Kconfig: LS1012A, LS1028A, LS1043A, LS1046A, LS1088A, LS2080A, LX2162A and LX2160A
I saw the original boot problem with the i.MX8QX.
The i.MX8QX is part of the i.MX family, not the Layerscape family.
Sure.
To be clear, the concern here was that split_block() doesn't perform what the CPU architecture requires when modifying page tables that the MMU is using, and the question therefore was: can setup_pgtables() be called on such live PTs?
For most AArch64 U-Boot ports (including the i.MX family), the answer is trivial because they use the arch code, i.e. setup_all_pgtables(). However, as fsl-layerscape re-implements mmu_setup(), it had to be looked at separately, hence my question, which you answered above.
HTH,

Hi Tom,
On Tue, Mar 19, 2024 at 9:39 AM Pierre-Clément Tosi ptosi@google.com wrote:
For most AArch64 U-Boot ports (including the i.MX family), the answer is trivial because they use the arch code i.e. setup_all_pgtables(). However, as fsl-layerscape re-implements mmu_setup(), it had to be looked at separately, hence my question, which you answered above.
As Pierre's explanation addresses Marc's concern, do you think this can go to 2024.01 to fix the boot regression on imx8qxp/8qm?
Thanks

On Fri, Mar 22, 2024 at 4:31 PM Fabio Estevam festevam@gmail.com wrote:
As Pierre's explanation addresses Marc's concern, do you think this can go to 2024.01 to fix the boot regression on imx8qxp/8qm?
I meant 2024.04, sorry.

On Fri, Mar 22, 2024 at 04:33:03PM -0300, Fabio Estevam wrote:
On Fri, Mar 22, 2024 at 4:31 PM Fabio Estevam festevam@gmail.com wrote:
As Pierre's explanation addresses Marc's concern, do you think this can go to 2024.01 to fix the boot regression on imx8qxp/8qm?
I meant 2024.04, sorry.
How much testing has this seen outside of imx?

On Tue, 19 Mar 2024 12:39:26 +0000, Pierre-Clément Tosi ptosi@google.com wrote:
[...]
Thanks for the details.
With that,
Reviewed-by: Marc Zyngier maz@kernel.org
M.

On Mon, 18 Mar 2024 19:35:49 +0000, Pierre-Clément Tosi wrote:
The implementation of map_range() creates the requested mapping by walking the page tables, iterating over multiple PTEs and/or descending into existing table mappings as needed. When doing so, it assumes any pre-existing valid PTE to be a table mapping. This assumption is wrong if the platform code attempts to successively map two overlapping ranges where the latter intersects a block mapping created for the former.
[...]
Applied to u-boot/master, thanks!
participants (5)
- Fabio Estevam
- Hiago De Franco
- Marc Zyngier
- Pierre-Clément Tosi
- Tom Rini