[U-Boot] [PATCH v4 00/11] arm64: Unify MMU code

Howdy,
Currently, MMU support and page table handling on arm64 are a mess. Each board does its own little thing, and the generic code is very limited and effectively unused.
This patch set tries to clean that up. After this series is applied, all boards except for the FSL Layerscape ones are converted to the new generic page table logic and have icache+dcache enabled.
The new code always uses a 4k page granule. It dynamically allocates 1G or 2M block mappings for ranges where they fit. When a dcache attribute request comes in that requires finer granularity than the existing mapping provides, the affected blocks are automatically split into smaller pages.
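For illustration, this is roughly what a board-provided map looks like after this series (hypothetical address layout; struct mm_region and the PTE_BLOCK_* attributes are the ones introduced by the series):

static struct mm_region example_mem_map[] = {
	{
		/* 2 GiB of DRAM, covered by two 1G block PTEs */
		.base = 0x00000000UL,
		.size = 0x80000000UL,
		.attrs = PTE_BLOCK_MEMTYPE(MT_NORMAL) |
			 PTE_BLOCK_INNER_SHARE
	}, {
		/* 2 MiB of MMIO, small enough to force a 2M block PTE */
		.base = 0x80000000UL,
		.size = 0x00200000UL,
		.attrs = PTE_BLOCK_MEMTYPE(MT_DEVICE_NGNRNE) |
			 PTE_BLOCK_NON_SHARE
	}, {
		/* List terminator */
		0,
	}
};

If a driver later asks for, say, a 4k region inside the DRAM range to become uncached, the covering 1G block gets split into a table of 2M blocks and then 4k pages as needed.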
I have tested and verified that the code works on HiKey (bare metal), vexpress64 (Foundation Model) and zynqmp (QEMU). The TX1 target is untested, but given the simplicity of its maps I doubt it'll break. ThunderX should in theory also work, but I haven't tested it. I would be very happy if people with access to those systems could give the patch set a try.
With this we're a big step closer to a good baseline for EFI payload support, since we can now simply require that all boards always have dcache enabled.
I would also be incredibly happy if some Freescale people could look at their MMU code and try to unify it with the now cleaned-up generic code. I don't think we're far off here.
Alex
v1 -> v2:
- Fix comment for create_table()
- Rework page table size calculation
- Move mmu tables into board files
- New patch: thunderx: Move mmu table into board file
v2 -> v3:
- Move PGTABLE_SIZE out of the ASSEMBLY section
- Run pte splitting code in separate page table
- Replace is_level in count_required_pts with more obvious pte_type
- New patch: arm64: Disable TTBR1 maps in EL1
v3 -> v4:
- Fix Lv3 PTEs that are always of type PTE_TYPE_LEVEL (fixes zynqmp)
Alexander Graf (11):
  thunderx: Calculate TCR dynamically
  arm64: Disable TTBR1 maps in EL1
  arm64: Make full va map code more dynamic
  thunderx: Move mmu table into board file
  zynqmp: Replace home grown mmu code with generic table approach
  tegra: Replace home grown mmu code with generic table approach
  vexpress64: Add MMU tables
  dwmmc: Increase retry timeout
  hikey: Add MMU tables
  arm64: Remove non-full-va map code
  arm64: Only allow dcache disabled in SPL builds
 arch/arm/cpu/armv8/cache.S                     |  54 +++
 arch/arm/cpu/armv8/cache_v8.c                  | 551 +++++++++++++++++++------
 arch/arm/cpu/armv8/fsl-layerscape/cpu.c        |  37 +-
 arch/arm/cpu/armv8/zynqmp/cpu.c                | 217 +++-------
 arch/arm/include/asm/arch-fsl-layerscape/cpu.h |  94 ++---
 arch/arm/include/asm/armv8/mmu.h               | 123 ++----
 arch/arm/include/asm/global_data.h             |   7 +-
 arch/arm/include/asm/system.h                  |  12 +-
 arch/arm/mach-tegra/arm64-mmu.c                | 132 +-----
 board/armltd/vexpress64/vexpress64.c           |  21 +
 board/cavium/thunderx/thunderx.c               |  24 ++
 board/hisilicon/hikey/hikey.c                  |  21 +
 doc/README.arm64                               |  20 -
 drivers/mmc/dw_mmc.c                           |   2 +-
 include/configs/hikey.h                        |   4 +-
 include/configs/thunderx_88xx.h                |  30 --
 include/configs/vexpress_aemv8a.h              |   5 +-
 17 files changed, 738 insertions(+), 616 deletions(-)

Based on the memory map we can determine a number of previously hard-coded TCR fields, such as the maximum VA and PA we want to support. Calculate them dynamically to reduce the chance of pitfalls.
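As a worked example: with the ThunderX map used in this series, the highest entry ends at 0x840000000000 + 0x40000000000 = 0x880000000000. That is above 1ULL << 44 but below 1ULL << 48, so the code below picks ips = 5 (48-bit physical addresses) and va_bits = 48, and TCR_T0SZ(48) programs T0SZ = 64 - 48 = 16.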
Signed-off-by: Alexander Graf agraf@suse.de
---
 arch/arm/cpu/armv8/cache_v8.c    | 59 +++++++++++++++++++++++++++++++++++++++-
 arch/arm/include/asm/armv8/mmu.h |  6 +---
 include/configs/thunderx_88xx.h  |  3 --
 3 files changed, 59 insertions(+), 9 deletions(-)
diff --git a/arch/arm/cpu/armv8/cache_v8.c b/arch/arm/cpu/armv8/cache_v8.c
index 71f0020..9229532 100644
--- a/arch/arm/cpu/armv8/cache_v8.c
+++ b/arch/arm/cpu/armv8/cache_v8.c
@@ -38,6 +38,58 @@ static struct mm_region mem_map[] = CONFIG_SYS_MEM_MAP;
 #define PTL1_ENTRIES CONFIG_SYS_PTL1_ENTRIES
 #define PTL2_ENTRIES CONFIG_SYS_PTL2_ENTRIES
 
+static u64 get_tcr(int el, u64 *pips, u64 *pva_bits)
+{
+	u64 max_addr = 0;
+	u64 ips, va_bits;
+	u64 tcr;
+	int i;
+
+	/* Find the largest address we need to support */
+	for (i = 0; i < ARRAY_SIZE(mem_map); i++)
+		max_addr = max(max_addr, mem_map[i].base + mem_map[i].size);
+
+	/* Calculate the maximum physical (and thus virtual) address */
+	if (max_addr > (1ULL << 44)) {
+		ips = 5;
+		va_bits = 48;
+	} else if (max_addr > (1ULL << 42)) {
+		ips = 4;
+		va_bits = 44;
+	} else if (max_addr > (1ULL << 40)) {
+		ips = 3;
+		va_bits = 42;
+	} else if (max_addr > (1ULL << 36)) {
+		ips = 2;
+		va_bits = 40;
+	} else if (max_addr > (1ULL << 32)) {
+		ips = 1;
+		va_bits = 36;
+	} else {
+		ips = 0;
+		va_bits = 32;
+	}
+
+	if (el == 1) {
+		tcr = TCR_EL1_RSVD | (ips << 32);
+	} else if (el == 2) {
+		tcr = TCR_EL2_RSVD | (ips << 16);
+	} else {
+		tcr = TCR_EL3_RSVD | (ips << 16);
+	}
+
+	/* PTWs cacheable, inner/outer WBWA and inner shareable */
+	tcr |= TCR_TG0_64K | TCR_SHARED_INNER | TCR_ORGN_WBWA | TCR_IRGN_WBWA;
+	tcr |= TCR_T0SZ(VA_BITS);
+
+	if (pips)
+		*pips = ips;
+	if (pva_bits)
+		*pva_bits = va_bits;
+
+	return tcr;
+}
+
 static void setup_pgtables(void)
 {
 	int l1_e, l2_e;
@@ -110,6 +162,10 @@ __weak void mmu_setup(void)
 	/* Set up page tables only on BSP */
 	if (coreid == BSP_COREID)
 		setup_pgtables();
+
+	el = current_el();
+	set_ttbr_tcr_mair(el, gd->arch.tlb_addr, get_tcr(el, NULL, NULL),
+			  MEMORY_ATTRIBUTES);
 #else
 	/* Setup an identity-mapping for all spaces */
 	for (i = 0; i < (PGTABLE_SIZE >> 3); i++) {
@@ -128,7 +184,6 @@ __weak void mmu_setup(void)
 		}
 	}
 
-#endif
 	/* load TTBR0 */
 	el = current_el();
 	if (el == 1) {
@@ -144,6 +199,8 @@ __weak void mmu_setup(void)
 				  TCR_EL3_RSVD | TCR_FLAGS | TCR_EL3_IPS_BITS,
 				  MEMORY_ATTRIBUTES);
 	}
+#endif
+
 	/* enable the mmu */
 	set_sctlr(get_sctlr() | CR_M);
 }
diff --git a/arch/arm/include/asm/armv8/mmu.h b/arch/arm/include/asm/armv8/mmu.h
index 897f010..39ff745 100644
--- a/arch/arm/include/asm/armv8/mmu.h
+++ b/arch/arm/include/asm/armv8/mmu.h
@@ -159,11 +159,6 @@
 #define TCR_EL1_IPS_BITS	(UL(3) << 32)	/* 42 bits physical address */
 #define TCR_EL2_IPS_BITS	(3 << 16)	/* 42 bits physical address */
 #define TCR_EL3_IPS_BITS	(3 << 16)	/* 42 bits physical address */
-#else
-#define TCR_EL1_IPS_BITS	CONFIG_SYS_TCR_EL1_IPS_BITS
-#define TCR_EL2_IPS_BITS	CONFIG_SYS_TCR_EL2_IPS_BITS
-#define TCR_EL3_IPS_BITS	CONFIG_SYS_TCR_EL3_IPS_BITS
-#endif
 
 /* PTWs cacheable, inner/outer WBWA and inner shareable */
 #define TCR_FLAGS		(TCR_TG0_64K |		\
@@ -171,6 +166,7 @@
 				 TCR_ORGN_WBWA |	\
 				 TCR_IRGN_WBWA |	\
 				 TCR_T0SZ(VA_BITS))
+#endif
 
 #define TCR_EL1_RSVD		(1 << 31)
 #define TCR_EL2_RSVD		(1 << 31 | 1 << 23)
diff --git a/include/configs/thunderx_88xx.h b/include/configs/thunderx_88xx.h
index cece4dd..b9f93ad 100644
--- a/include/configs/thunderx_88xx.h
+++ b/include/configs/thunderx_88xx.h
@@ -50,9 +50,6 @@
 #define CONFIG_SYS_PGTABLE_SIZE		\
 	((CONFIG_SYS_PTL1_ENTRIES +	\
 	  CONFIG_SYS_MEM_MAP_SIZE * CONFIG_SYS_PTL2_ENTRIES) * 8)
-#define CONFIG_SYS_TCR_EL1_IPS_BITS	(5UL << 32)
-#define CONFIG_SYS_TCR_EL2_IPS_BITS	(5 << 16)
-#define CONFIG_SYS_TCR_EL3_IPS_BITS	(5 << 16)
 
 /* Link Definitions */
 #define CONFIG_SYS_TEXT_BASE		0x00500000

Hi,
On 25 February 2016 at 17:49, Alexander Graf agraf@suse.de wrote:
Based on the memory map we can determine a number of previously hard-coded TCR fields, such as the maximum VA and PA we want to support. Calculate them dynamically to reduce the chance of pitfalls.
Signed-off-by: Alexander Graf agraf@suse.de
 arch/arm/cpu/armv8/cache_v8.c    | 59 +++++++++++++++++++++++++++++++++++++++-
 arch/arm/include/asm/armv8/mmu.h |  6 +---
 include/configs/thunderx_88xx.h  |  3 --
 3 files changed, 59 insertions(+), 9 deletions(-)
This approach (extern variable) looks good to me.
Reviewed-by: Simon Glass sjg@chromium.org
I didn't know ARRAY_SIZE() worked on external variables, so I learned something too.
Regards, Simon

Am 26.02.2016 um 03:51 schrieb Simon Glass sjg@chromium.org:
Hi,
On 25 February 2016 at 17:49, Alexander Graf agraf@suse.de wrote:
Based on the memory map we can determine a number of previously hard-coded TCR fields, such as the maximum VA and PA we want to support. Calculate them dynamically to reduce the chance of pitfalls.
Signed-off-by: Alexander Graf agraf@suse.de
 arch/arm/cpu/armv8/cache_v8.c    | 59 +++++++++++++++++++++++++++++++++++++++-
 arch/arm/include/asm/armv8/mmu.h |  6 +---
 include/configs/thunderx_88xx.h  |  3 --
 3 files changed, 59 insertions(+), 9 deletions(-)
This approach (extern variable) looks good to me.
Reviewed-by: Simon Glass sjg@chromium.org
I didn't know ARRAY_SIZE() worked on external variables, so I learned something too.
It doesn't. Could you please point me to a case where I'm still using ARRAY_SIZE in v4? I thought I had moved everything over to null-terminated entries.
Alex

Hi Alexander,
On 26 February 2016 at 01:58, Alexander Graf agraf@suse.de wrote:
Am 26.02.2016 um 03:51 schrieb Simon Glass sjg@chromium.org:
Hi,
On 25 February 2016 at 17:49, Alexander Graf agraf@suse.de wrote:
Based on the memory map we can determine a number of previously hard-coded TCR fields, such as the maximum VA and PA we want to support. Calculate them dynamically to reduce the chance of pitfalls.
Signed-off-by: Alexander Graf agraf@suse.de
 arch/arm/cpu/armv8/cache_v8.c    | 59 +++++++++++++++++++++++++++++++++++++++-
 arch/arm/include/asm/armv8/mmu.h |  6 +---
 include/configs/thunderx_88xx.h  |  3 --
 3 files changed, 59 insertions(+), 9 deletions(-)
This approach (extern variable) looks good to me.
Reviewed-by: Simon Glass sjg@chromium.org
I didn't know ARRAY_SIZE() worked on external variables, so I learned something too.
It doesn't. Could you please point me to a case where I'm still using ARRAY_SIZE in v4? I thought I had moved everything over to null-terminated entries.
I imagined some gcc linker sizeof() magic. Oh well.
See near the top of get_tcr() - the patch on this thread.
Regards, Simon

On 02/26/2016 04:39 PM, Simon Glass wrote:
Hi Alexander,
On 26 February 2016 at 01:58, Alexander Graf agraf@suse.de wrote:
Am 26.02.2016 um 03:51 schrieb Simon Glass sjg@chromium.org:
Hi,
On 25 February 2016 at 17:49, Alexander Graf agraf@suse.de wrote:
Based on the memory map we can determine a number of previously hard-coded TCR fields, such as the maximum VA and PA we want to support. Calculate them dynamically to reduce the chance of pitfalls.
Signed-off-by: Alexander Graf agraf@suse.de
 arch/arm/cpu/armv8/cache_v8.c    | 59 +++++++++++++++++++++++++++++++++++++++-
 arch/arm/include/asm/armv8/mmu.h |  6 +---
 include/configs/thunderx_88xx.h  |  3 --
 3 files changed, 59 insertions(+), 9 deletions(-)
This approach (extern variable) looks good to me.
Reviewed-by: Simon Glass sjg@chromium.org
I didn't know ARRAY_SIZE() worked on external variables, so I learned something too.
It doesn't. Could you please point me to a case where I'm still using ARRAY_SIZE in v4? I thought I had moved everything over to null-terminated entries.
I imagined some gcc linker sizeof() magic. Oh well.
See near the top of get_tcr() - the patch on this thread.
In 1/11, mem_map is still a locally defined array that gets initialized from a constant array, so that one is supported by ARRAY_SIZE.
Later in "thunderx: Move mmu table into board file" I move the mem_map variable into board files and change all ARRAY_SIZE instances to loops that loop until they hit an empty entry. This is mostly to make review and bisectability easier, since that way only few things change at a time.
Alex

Hi Alex,
On 26 February 2016 at 08:44, Alexander Graf agraf@suse.de wrote:
On 02/26/2016 04:39 PM, Simon Glass wrote:
Hi Alexander,
On 26 February 2016 at 01:58, Alexander Graf agraf@suse.de wrote:
Am 26.02.2016 um 03:51 schrieb Simon Glass sjg@chromium.org:
Hi,
On 25 February 2016 at 17:49, Alexander Graf agraf@suse.de wrote:
Based on the memory map we can determine a number of previously hard-coded TCR fields, such as the maximum VA and PA we want to support. Calculate them dynamically to reduce the chance of pitfalls.
Signed-off-by: Alexander Graf agraf@suse.de
 arch/arm/cpu/armv8/cache_v8.c    | 59 +++++++++++++++++++++++++++++++++++++++-
 arch/arm/include/asm/armv8/mmu.h |  6 +---
 include/configs/thunderx_88xx.h  |  3 --
 3 files changed, 59 insertions(+), 9 deletions(-)
This approach (extern variable) looks good to me.
Reviewed-by: Simon Glass sjg@chromium.org
I didn't know ARRAY_SIZE() worked on external variables, so I learned something too.
It doesn't. Could you please point me to a case where I'm still using ARRAY_SIZE in v4? I thought I had moved everything over to null-terminated entries.
I imagined some gcc linker sizeof() magic. Oh well.
See near the top of get_tcr() - the patch on this thread.
In 1/11, mem_map is still a locally defined array that gets initialized from a constant array, so that one is supported by ARRAY_SIZE.
Later in "thunderx: Move mmu table into board file" I move the mem_map variable into board files and change all ARRAY_SIZE instances to loops that loop until they hit an empty entry. This is mostly to make review and bisectability easier, since that way only few things change at a time.
OK, makes sense. Sorry I haven't reviewed it very closely, but you seem to have got plenty of input.
Regards, Simon

When running in EL1, AArch64 knows two page table maps: one for addresses that start with all zeros (TTBR0) and one for addresses that start with all ones (TTBR1).
In U-Boot we don't care about the high map, so just disable it to ensure we don't walk an invalid page table by accident.
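For illustration, a simplified sketch of the hardware's table selection (uses_ttbr1() is a hypothetical helper, not U-Boot code; address tagging is ignored):

static inline int uses_ttbr1(u64 va, int va_bits)
{
	/* Bits [63:va_bits] all ones -> high map, walked via TTBR1 */
	return (va >> va_bits) == ((1ULL << (64 - va_bits)) - 1);
}

With TCR_EPD1_DISABLE set, such an access faults right away instead of walking whatever stale table TTBR1 happens to point at.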
Reported-by: Mark Rutland mark.rutland@arm.com
Signed-off-by: Alexander Graf agraf@suse.de
---
 arch/arm/cpu/armv8/cache_v8.c    | 2 +-
 arch/arm/include/asm/armv8/mmu.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm/cpu/armv8/cache_v8.c b/arch/arm/cpu/armv8/cache_v8.c
index 9229532..d92f2d1 100644
--- a/arch/arm/cpu/armv8/cache_v8.c
+++ b/arch/arm/cpu/armv8/cache_v8.c
@@ -71,7 +71,7 @@ static u64 get_tcr(int el, u64 *pips, u64 *pva_bits)
 	}
 
 	if (el == 1) {
-		tcr = TCR_EL1_RSVD | (ips << 32);
+		tcr = TCR_EL1_RSVD | (ips << 32) | TCR_EPD1_DISABLE;
 	} else if (el == 2) {
 		tcr = TCR_EL2_RSVD | (ips << 16);
 	} else {
diff --git a/arch/arm/include/asm/armv8/mmu.h b/arch/arm/include/asm/armv8/mmu.h
index 39ff745..0080ae6 100644
--- a/arch/arm/include/asm/armv8/mmu.h
+++ b/arch/arm/include/asm/armv8/mmu.h
@@ -154,6 +154,7 @@
 #define TCR_TG0_4K		(0 << 14)
 #define TCR_TG0_64K		(1 << 14)
 #define TCR_TG0_16K		(2 << 14)
+#define TCR_EPD1_DISABLE	(1 << 23)
 
 #ifndef CONFIG_SYS_FULL_VA
 #define TCR_EL1_IPS_BITS	(UL(3) << 32)	/* 42 bits physical address */

The idea of generating our page tables from an array of memory ranges is very sound. However, instead of hard-coding the creation of up to 2 levels of 64k granule page tables, we really should just create normal 4k page tables that allow us to set caching attributes at the 2M or 4k level later on.
So this patch moves the full_va mapping code to a 4k page size and makes it fully flexible, dynamically creating as many levels as necessary for a map (including dynamic 1G/2M block mappings). It also adds support for dynamically splitting a large map into smaller ones when some code wants to set dcache attributes.
With all this in place, there is very little reason left to create your own page tables in board-specific files.
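To make the table walk below easier to follow, here is the address decomposition it relies on (level2shift() appears verbatim in the patch; pte_index() is just the index computation from find_pte(), factored out for illustration):

/* The page offset is 12 bits wide, every level translates 9 more bits */
static int level2shift(int level)
{
	return 12 + 9 * (3 - level);
}

/* Index into the level-<level> page table for a given virtual address */
static u64 pte_index(u64 addr, int level)
{
	return (addr >> level2shift(level)) & 0x1FF;
}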
Signed-off-by: Alexander Graf agraf@suse.de
---
v1 -> v2:
- Fix comment for create_table()
- Rework page table size calculation
- s/DPRINTF/debug/g
- Improve panic messages
v2 -> v3:
- Move PGTABLE_SIZE out of the ASSEMBLY section
- Run pte splitting code in separate page table
- Replace is_level in count_required_pts with more obvious pte_type
v3 -> v4:
- Lv3 PTEs are always of type table
---
 arch/arm/cpu/armv8/cache.S         |  54 +++++
 arch/arm/cpu/armv8/cache_v8.c      | 453 ++++++++++++++++++++++++++++++++-----
 arch/arm/include/asm/armv8/mmu.h   |  68 +++---
 arch/arm/include/asm/global_data.h |   5 +-
 arch/arm/include/asm/system.h      |  14 +-
 include/configs/thunderx_88xx.h    |  14 +-
 6 files changed, 500 insertions(+), 108 deletions(-)
diff --git a/arch/arm/cpu/armv8/cache.S b/arch/arm/cpu/armv8/cache.S index ab8c089..a9f4fec 100644 --- a/arch/arm/cpu/armv8/cache.S +++ b/arch/arm/cpu/armv8/cache.S @@ -10,6 +10,7 @@ #include <asm-offsets.h> #include <config.h> #include <asm/macro.h> +#include <asm/system.h> #include <linux/linkage.h>
/* @@ -160,3 +161,56 @@ ENTRY(__asm_flush_l3_cache) ret ENDPROC(__asm_flush_l3_cache) .weak __asm_flush_l3_cache + +/* + * void __asm_switch_ttbr(ulong new_ttbr) + * + * Safely switches to a new page table. + */ +ENTRY(__asm_switch_ttbr) + /* x2 = SCTLR (alive throghout the function) */ + switch_el x4, 3f, 2f, 1f +3: mrs x2, sctlr_el3 + b 0f +2: mrs x2, sctlr_el2 + b 0f +1: mrs x2, sctlr_el1 +0: + + /* Unset CR_M | CR_C | CR_I from SCTLR to disable all caches */ + movn x1, #(CR_M | CR_C | CR_I) + and x1, x2, x1 + switch_el x4, 3f, 2f, 1f +3: msr sctlr_el3, x1 + b 0f +2: msr sctlr_el2, x1 + b 0f +1: msr sctlr_el1, x1 +0: isb + + /* This call only clobbers x30 (lr) and x9 (unused) */ + mov x3, x30 + bl __asm_invalidate_tlb_all + + /* From here on we're running safely with caches disabled */ + + /* Set TTBR to our first argument */ + switch_el x4, 3f, 2f, 1f +3: msr ttbr0_el3, x0 + b 0f +2: msr ttbr0_el2, x0 + b 0f +1: msr ttbr0_el1, x0 +0: isb + + /* Restore original SCTLR and thus enable caches again */ + switch_el x4, 3f, 2f, 1f +3: msr sctlr_el3, x2 + b 0f +2: msr sctlr_el2, x2 + b 0f +1: msr sctlr_el1, x2 +0: isb + + ret x3 +ENDPROC(__asm_switch_ttbr) diff --git a/arch/arm/cpu/armv8/cache_v8.c b/arch/arm/cpu/armv8/cache_v8.c index d92f2d1..73628c9 100644 --- a/arch/arm/cpu/armv8/cache_v8.c +++ b/arch/arm/cpu/armv8/cache_v8.c @@ -2,6 +2,9 @@ * (C) Copyright 2013 * David Feng fenghua@phytium.com.cn * + * (C) Copyright 2016 + * Alexander Graf agraf@suse.de + * * SPDX-License-Identifier: GPL-2.0+ */
@@ -13,31 +16,28 @@ DECLARE_GLOBAL_DATA_PTR;
#ifndef CONFIG_SYS_DCACHE_OFF
-#ifdef CONFIG_SYS_FULL_VA -static void set_ptl1_entry(u64 index, u64 ptl2_entry) -{ - u64 *pgd = (u64 *)gd->arch.tlb_addr; - u64 value; - - value = ptl2_entry | PTL1_TYPE_TABLE; - pgd[index] = value; -} - -static void set_ptl2_block(u64 ptl1, u64 bfn, u64 address, u64 memory_attrs) -{ - u64 *pmd = (u64 *)ptl1; - u64 value; - - value = address | PTL2_TYPE_BLOCK | PTL2_BLOCK_AF; - value |= memory_attrs; - pmd[bfn] = value; -} +/* + * With 4k page granule, a virtual address is split into 4 lookup parts + * spanning 9 bits each: + * + * _______________________________________________ + * | | | | | | | + * | 0 | Lv0 | Lv1 | Lv2 | Lv3 | off | + * |_______|_______|_______|_______|_______|_______| + * 63-48 47-39 38-30 29-21 20-12 11-00 + * + * mask page size + * + * Lv0: FF8000000000 -- + * Lv1: 7FC0000000 1G + * Lv2: 3FE00000 2M + * Lv3: 1FF000 4K + * off: FFF + */
+#ifdef CONFIG_SYS_FULL_VA static struct mm_region mem_map[] = CONFIG_SYS_MEM_MAP;
-#define PTL1_ENTRIES CONFIG_SYS_PTL1_ENTRIES -#define PTL2_ENTRIES CONFIG_SYS_PTL2_ENTRIES - static u64 get_tcr(int el, u64 *pips, u64 *pva_bits) { u64 max_addr = 0; @@ -79,8 +79,8 @@ static u64 get_tcr(int el, u64 *pips, u64 *pva_bits) }
/* PTWs cacheable, inner/outer WBWA and inner shareable */ - tcr |= TCR_TG0_64K | TCR_SHARED_INNER | TCR_ORGN_WBWA | TCR_IRGN_WBWA; - tcr |= TCR_T0SZ(VA_BITS); + tcr |= TCR_TG0_4K | TCR_SHARED_INNER | TCR_ORGN_WBWA | TCR_IRGN_WBWA; + tcr |= TCR_T0SZ(va_bits);
if (pips) *pips = ips; @@ -90,39 +90,302 @@ static u64 get_tcr(int el, u64 *pips, u64 *pva_bits) return tcr; }
-static void setup_pgtables(void) +#define MAX_PTE_ENTRIES 512 + +static int pte_type(u64 *pte) +{ + return *pte & PTE_TYPE_MASK; +} + +/* Returns the LSB number for a PTE on level <level> */ +static int level2shift(int level) +{ + /* Page is 12 bits wide, every level translates 9 bits */ + return (12 + 9 * (3 - level)); +} + +static u64 *find_pte(u64 addr, int level) { - int l1_e, l2_e; - unsigned long pmd = 0; - unsigned long address; - - /* Setup the PMD pointers */ - for (l1_e = 0; l1_e < CONFIG_SYS_MEM_MAP_SIZE; l1_e++) { - gd->arch.pmd_addr[l1_e] = gd->arch.tlb_addr + - PTL1_ENTRIES * sizeof(u64); - gd->arch.pmd_addr[l1_e] += PTL2_ENTRIES * sizeof(u64) * l1_e; - gd->arch.pmd_addr[l1_e] = ALIGN(gd->arch.pmd_addr[l1_e], - 0x10000UL); + int start_level = 0; + u64 *pte; + u64 idx; + u64 va_bits; + int i; + + debug("addr=%llx level=%d\n", addr, level); + + get_tcr(0, NULL, &va_bits); + if (va_bits < 39) + start_level = 1; + + if (level < start_level) + return NULL; + + /* Walk through all page table levels to find our PTE */ + pte = (u64*)gd->arch.tlb_addr; + for (i = start_level; i < 4; i++) { + idx = (addr >> level2shift(i)) & 0x1FF; + pte += idx; + debug("idx=%llx PTE %p at level %d: %llx\n", idx, pte, i, *pte); + + /* Found it */ + if (i == level) + return pte; + /* PTE is no table (either invalid or block), can't traverse */ + if (pte_type(pte) != PTE_TYPE_TABLE) + return NULL; + /* Off to the next level */ + pte = (u64*)(*pte & 0x0000fffffffff000ULL); }
- /* Setup the page tables */ - for (l1_e = 0; l1_e < PTL1_ENTRIES; l1_e++) { - if (mem_map[pmd].base == - (uintptr_t)l1_e << PTL2_BITS) { - set_ptl1_entry(l1_e, gd->arch.pmd_addr[pmd]); - - for (l2_e = 0; l2_e < PTL2_ENTRIES; l2_e++) { - address = mem_map[pmd].base - + (uintptr_t)l2_e * BLOCK_SIZE; - set_ptl2_block(gd->arch.pmd_addr[pmd], l2_e, - address, mem_map[pmd].attrs); + /* Should never reach here */ + return NULL; +} + +/* Returns and creates a new full table (512 entries) */ +static u64 *create_table(void) +{ + u64 *new_table = (u64*)gd->arch.tlb_fillptr; + u64 pt_len = MAX_PTE_ENTRIES * sizeof(u64); + + /* Allocate MAX_PTE_ENTRIES pte entries */ + gd->arch.tlb_fillptr += pt_len; + + if (gd->arch.tlb_fillptr - gd->arch.tlb_addr > gd->arch.tlb_size) + panic("Insufficient RAM for page table: 0x%lx > 0x%lx. " + "Please increase the size in get_page_table_size()", + gd->arch.tlb_fillptr - gd->arch.tlb_addr, + gd->arch.tlb_size); + + /* Mark all entries as invalid */ + memset(new_table, 0, pt_len); + + return new_table; +} + +static void set_pte_table(u64 *pte, u64 *table) +{ + /* Point *pte to the new table */ + debug("Setting %p to addr=%p\n", pte, table); + *pte = PTE_TYPE_TABLE | (ulong)table; +} + +/* Add one mm_region map entry to the page tables */ +static void add_map(struct mm_region *map) +{ + u64 *pte; + u64 addr = map->base; + u64 size = map->size; + u64 attrs = map->attrs | PTE_TYPE_BLOCK | PTE_BLOCK_AF; + u64 blocksize; + int level; + u64 *new_table; + + while (size) { + pte = find_pte(addr, 0); + if (pte && (pte_type(pte) == PTE_TYPE_FAULT)) { + debug("Creating table for addr 0x%llx\n", addr); + new_table = create_table(); + set_pte_table(pte, new_table); + } + + for (level = 1; level < 4; level++) { + pte = find_pte(addr, level); + blocksize = 1ULL << level2shift(level); + debug("Checking if pte fits for addr=%llx size=%llx " + "blocksize=%llx\n", addr, size, blocksize); + if (size >= blocksize && !(addr & (blocksize - 1))) { + /* Page fits, create block PTE */ + debug("Setting PTE %p to block addr=%llx\n", + pte, addr); + *pte = addr | attrs; + addr += blocksize; + size -= blocksize; + break; + } else if ((pte_type(pte) == PTE_TYPE_FAULT)) { + /* Page doesn't fit, create subpages */ + debug("Creating subtable for addr 0x%llx " + "blksize=%llx\n", addr, blocksize); + new_table = create_table(); + set_pte_table(pte, new_table); } + } + } +} + +/* Splits a block PTE into table with subpages spanning the old block */ +static void split_block(u64 *pte, int level) +{ + u64 old_pte = *pte; + u64 *new_table; + u64 i = 0; + /* level describes the parent level, we need the child ones */ + int levelshift = level2shift(level + 1); + + if (pte_type(pte) != PTE_TYPE_BLOCK) + panic("PTE %p (%llx) is not a block. Some driver code wants to " + "modify dcache settings for an range not covered in " + "mem_map.", pte, old_pte); + + new_table = create_table(); + debug("Splitting pte %p (%llx) into %p\n", pte, old_pte, new_table); + + for (i = 0; i < MAX_PTE_ENTRIES; i++) { + new_table[i] = old_pte | (i << levelshift); + + /* Level 3 block PTEs have the table type */ + if ((level + 1) == 3) + new_table[i] |= PTE_TYPE_TABLE; + + debug("Setting new_table[%lld] = %llx\n", i, new_table[i]); + } + + /* Set the new table into effect */ + set_pte_table(pte, new_table); +} + +enum pte_type { + PTE_INVAL, + PTE_BLOCK, + PTE_LEVEL, +};
- pmd++; - } else { - set_ptl1_entry(l1_e, 0); +/* + * This is a recursively called function to count the number of + * page tables we need to cover a particular PTE range. If you + * call this with level = -1 you basically get the full 48 bit + * coverage. + */ +static int count_required_pts(u64 addr, int level, u64 maxaddr) +{ + int levelshift = level2shift(level); + u64 levelsize = 1ULL << levelshift; + u64 levelmask = levelsize - 1; + u64 levelend = addr + levelsize; + int r = 0; + int i; + enum pte_type pte_type = PTE_INVAL; + + for (i = 0; i < ARRAY_SIZE(mem_map); i++) { + struct mm_region *map = &mem_map[i]; + u64 start = map->base; + u64 end = start + map->size; + + /* Check if the PTE would overlap with the map */ + if (max(addr, start) <= min(levelend, end)) { + start = max(addr, start); + end = min(levelend, end); + + /* We need a sub-pt for this level */ + if ((start & levelmask) || (end & levelmask)) { + pte_type = PTE_LEVEL; + break; + } + + /* Lv0 can not do block PTEs, so do levels here too */ + if (level <= 0) { + pte_type = PTE_LEVEL; + break; + } + + /* PTE is active, but fits into a block */ + pte_type = PTE_BLOCK; } } + + /* + * Block PTEs at this level are already covered by the parent page + * table, so we only need to count sub page tables. + */ + if (pte_type == PTE_LEVEL) { + int sublevel = level + 1; + u64 sublevelsize = 1ULL << level2shift(sublevel); + + /* Account for the new sub page table ... */ + r = 1; + + /* ... and for all child page tables that one might have */ + for (i = 0; i < MAX_PTE_ENTRIES; i++) { + r += count_required_pts(addr, sublevel, maxaddr); + addr += sublevelsize; + + if (addr >= maxaddr) { + /* + * We reached the end of address space, no need + * to look any further. + */ + break; + } + } + } + + return r; +} + +/* Returns the estimated required size of all page tables */ +u64 get_page_table_size(void) +{ + u64 one_pt = MAX_PTE_ENTRIES * sizeof(u64); + u64 size = 0; + u64 va_bits; + int start_level = 0; + + get_tcr(0, NULL, &va_bits); + if (va_bits < 39) + start_level = 1; + + /* Account for all page tables we would need to cover our memory map */ + size = one_pt * count_required_pts(0, start_level - 1, 1ULL << va_bits); + + /* + * We need to duplicate our page table once to have an emergency pt to + * resort to when splitting page tables later on + */ + size *= 2; + + /* + * We may need to split page tables later on if dcache settings change, + * so reserve up to 4 (random pick) page tables for that. + */ + size += one_pt * 4; + + return size; +} + +static void setup_pgtables(void) +{ + int i; + + /* + * Allocate the first level we're on with invalidate entries. + * If the starting level is 0 (va_bits >= 39), then this is our + * Lv0 page table, otherwise it's the entry Lv1 page table. + */ + create_table(); + + /* Now add all MMU table entries one after another to the table */ + for (i = 0; i < ARRAY_SIZE(mem_map); i++) + add_map(&mem_map[i]); + + /* Create the same thing once more for our emergency page table */ + create_table(); +} + +static void setup_all_pgtables(void) +{ + u64 tlb_addr = gd->arch.tlb_addr; + + /* Reset the fill ptr */ + gd->arch.tlb_fillptr = tlb_addr; + + /* Create normal system page tables */ + setup_pgtables(); + + /* Create emergency page tables */ + gd->arch.tlb_addr = gd->arch.tlb_fillptr; + setup_pgtables(); + gd->arch.tlb_emerg = gd->arch.tlb_addr; + gd->arch.tlb_addr = tlb_addr; }
#else @@ -157,11 +420,9 @@ __weak void mmu_setup(void) int el;
#ifdef CONFIG_SYS_FULL_VA - unsigned long coreid = read_mpidr() & CONFIG_COREID_MASK; - - /* Set up page tables only on BSP */ - if (coreid == BSP_COREID) - setup_pgtables(); + /* Set up page tables only once */ + if (!gd->arch.tlb_fillptr) + setup_all_pgtables();
el = current_el(); set_ttbr_tcr_mair(el, gd->arch.tlb_addr, get_tcr(el, NULL, NULL), @@ -311,6 +572,88 @@ void mmu_set_region_dcache_behaviour(phys_addr_t start, size_t size, flush_dcache_range(start, end); asm volatile("dsb sy"); } +#else +static bool is_aligned(u64 addr, u64 size, u64 align) +{ + return !(addr & (align - 1)) && !(size & (align - 1)); +} + +static u64 set_one_region(u64 start, u64 size, u64 attrs, int level) +{ + int levelshift = level2shift(level); + u64 levelsize = 1ULL << levelshift; + u64 *pte = find_pte(start, level); + + /* Can we can just modify the current level block PTE? */ + if (is_aligned(start, size, levelsize)) { + *pte &= ~PMD_ATTRINDX_MASK; + *pte |= attrs; + debug("Set attrs=%llx pte=%p level=%d\n", attrs, pte, level); + + return levelsize; + } + + /* Unaligned or doesn't fit, maybe split block into table */ + debug("addr=%llx level=%d pte=%p (%llx)\n", start, level, pte, *pte); + + /* Maybe we need to split the block into a table */ + if (pte_type(pte) == PTE_TYPE_BLOCK) + split_block(pte, level); + + /* And then double-check it became a table or already is one */ + if (pte_type(pte) != PTE_TYPE_TABLE) + panic("PTE %p (%llx) for addr=%llx should be a table", + pte, *pte, start); + + /* Roll on to the next page table level */ + return 0; +} + +void mmu_set_region_dcache_behaviour(phys_addr_t start, size_t size, + enum dcache_option option) +{ + u64 attrs = PMD_ATTRINDX(option); + u64 real_start = start; + u64 real_size = size; + + debug("start=%lx size=%lx\n", (ulong)start, (ulong)size); + + /* + * We can not modify page tables that we're currently running on, + * so we first need to switch to the "emergency" page tables where + * we can safely modify our primary page tables and then switch back + */ + __asm_switch_ttbr(gd->arch.tlb_emerg); + + /* + * Loop through the address range until we find a page granule that fits + * our alignment constraints, then set it to the new cache attributes + */ + while (size > 0) { + int level; + u64 r; + + for (level = 1; level < 4; level++) { + r = set_one_region(start, size, attrs, level); + if (r) { + /* PTE successfully replaced */ + size -= r; + start += r; + break; + } + } + + } + + /* We're done modifying page tables, switch back to our primary ones */ + __asm_switch_ttbr(gd->arch.tlb_addr); + + /* + * Make sure there's nothing stale in dcache for a region that might + * have caches off now + */ + flush_dcache_range(real_start, real_start + real_size); +} #endif
#else /* CONFIG_SYS_DCACHE_OFF */ diff --git a/arch/arm/include/asm/armv8/mmu.h b/arch/arm/include/asm/armv8/mmu.h index 0080ae6..1c490dc 100644 --- a/arch/arm/include/asm/armv8/mmu.h +++ b/arch/arm/include/asm/armv8/mmu.h @@ -26,15 +26,9 @@ #define VA_BITS (42) /* 42 bits virtual address */ #else #define VA_BITS CONFIG_SYS_VA_BITS -#define PTL2_BITS CONFIG_SYS_PTL2_BITS +#define PTE_BLOCK_BITS CONFIG_SYS_PTL2_BITS #endif
-/* PAGE_SHIFT determines the page size */ -#undef PAGE_SIZE -#define PAGE_SHIFT 16 -#define PAGE_SIZE (1 << PAGE_SHIFT) -#define PAGE_MASK (~(PAGE_SIZE-1)) - /* * block/section address mask and size definitions. */ @@ -42,10 +36,21 @@ #define SECTION_SHIFT 29 #define SECTION_SIZE (UL(1) << SECTION_SHIFT) #define SECTION_MASK (~(SECTION_SIZE-1)) + +/* PAGE_SHIFT determines the page size */ +#undef PAGE_SIZE +#define PAGE_SHIFT 16 +#define PAGE_SIZE (1 << PAGE_SHIFT) +#define PAGE_MASK (~(PAGE_SIZE-1)) + #else -#define BLOCK_SHIFT CONFIG_SYS_BLOCK_SHIFT -#define BLOCK_SIZE (UL(1) << BLOCK_SHIFT) -#define BLOCK_MASK (~(BLOCK_SIZE-1)) + +/* PAGE_SHIFT determines the page size */ +#undef PAGE_SIZE +#define PAGE_SHIFT 12 +#define PAGE_SIZE (1 << PAGE_SHIFT) +#define PAGE_MASK (~(PAGE_SIZE-1)) + #endif
/***************************************************************/ @@ -71,39 +76,28 @@ */
#ifdef CONFIG_SYS_FULL_VA -/* - * Level 1 descriptor (PGD). - */
-#define PTL1_TYPE_MASK (3 << 0) -#define PTL1_TYPE_TABLE (3 << 0) - -#define PTL1_TABLE_PXN (1UL << 59) -#define PTL1_TABLE_XN (1UL << 60) -#define PTL1_TABLE_AP (1UL << 61) -#define PTL1_TABLE_NS (1UL << 63) - - -/* - * Level 2 descriptor (PMD). - */ +#define PTE_TYPE_MASK (3 << 0) +#define PTE_TYPE_FAULT (0 << 0) +#define PTE_TYPE_TABLE (3 << 0) +#define PTE_TYPE_BLOCK (1 << 0)
-#define PTL2_TYPE_MASK (3 << 0) -#define PTL2_TYPE_FAULT (0 << 0) -#define PTL2_TYPE_TABLE (3 << 0) -#define PTL2_TYPE_BLOCK (1 << 0) +#define PTE_TABLE_PXN (1UL << 59) +#define PTE_TABLE_XN (1UL << 60) +#define PTE_TABLE_AP (1UL << 61) +#define PTE_TABLE_NS (1UL << 63)
/* * Block */ -#define PTL2_MEMTYPE(x) ((x) << 2) -#define PTL2_BLOCK_NON_SHARE (0 << 8) -#define PTL2_BLOCK_OUTER_SHARE (2 << 8) -#define PTL2_BLOCK_INNER_SHARE (3 << 8) -#define PTL2_BLOCK_AF (1 << 10) -#define PTL2_BLOCK_NG (1 << 11) -#define PTL2_BLOCK_PXN (UL(1) << 53) -#define PTL2_BLOCK_UXN (UL(1) << 54) +#define PTE_BLOCK_MEMTYPE(x) ((x) << 2) +#define PTE_BLOCK_NON_SHARE (0 << 8) +#define PTE_BLOCK_OUTER_SHARE (2 << 8) +#define PTE_BLOCK_INNER_SHARE (3 << 8) +#define PTE_BLOCK_AF (1 << 10) +#define PTE_BLOCK_NG (1 << 11) +#define PTE_BLOCK_PXN (UL(1) << 53) +#define PTE_BLOCK_UXN (UL(1) << 54)
#else /* diff --git a/arch/arm/include/asm/global_data.h b/arch/arm/include/asm/global_data.h index dcfa098..259daa1 100644 --- a/arch/arm/include/asm/global_data.h +++ b/arch/arm/include/asm/global_data.h @@ -38,10 +38,11 @@ struct arch_global_data { unsigned long long timer_reset_value; #if !(defined(CONFIG_SYS_ICACHE_OFF) && defined(CONFIG_SYS_DCACHE_OFF)) unsigned long tlb_addr; + unsigned long tlb_size; #if defined(CONFIG_SYS_FULL_VA) - unsigned long pmd_addr[CONFIG_SYS_PTL1_ENTRIES]; + unsigned long tlb_fillptr; + unsigned long tlb_emerg; #endif - unsigned long tlb_size; #endif
#ifdef CONFIG_OMAP_COMMON diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h index 026e7ef..9b1cbf2 100644 --- a/arch/arm/include/asm/system.h +++ b/arch/arm/include/asm/system.h @@ -17,18 +17,19 @@ #define CR_WXN (1 << 19) /* Write Permision Imply XN */ #define CR_EE (1 << 25) /* Exception (Big) Endian */
-#ifndef CONFIG_SYS_FULL_VA -#define PGTABLE_SIZE (0x10000) -#else -#define PGTABLE_SIZE CONFIG_SYS_PGTABLE_SIZE -#endif - /* 2MB granularity */ #define MMU_SECTION_SHIFT 21 #define MMU_SECTION_SIZE (1 << MMU_SECTION_SHIFT)
#ifndef __ASSEMBLY__
+#ifndef CONFIG_SYS_FULL_VA +#define PGTABLE_SIZE (0x10000) +#else +u64 get_page_table_size(void); +#define PGTABLE_SIZE get_page_table_size() +#endif + enum dcache_option { DCACHE_OFF = 0x3, }; @@ -97,6 +98,7 @@ void __asm_flush_dcache_range(u64 start, u64 end); void __asm_invalidate_tlb_all(void); void __asm_invalidate_icache_all(void); int __asm_flush_l3_cache(void); +void __asm_switch_ttbr(u64 new_ttbr);
void armv8_switch_to_el2(void); void armv8_switch_to_el1(void); diff --git a/include/configs/thunderx_88xx.h b/include/configs/thunderx_88xx.h index b9f93ad..20b25f7 100644 --- a/include/configs/thunderx_88xx.h +++ b/include/configs/thunderx_88xx.h @@ -22,21 +22,19 @@
#define MEM_BASE 0x00500000
-#define CONFIG_COREID_MASK 0xffffff - #define CONFIG_SYS_FULL_VA
#define CONFIG_SYS_LOWMEM_BASE MEM_BASE
#define CONFIG_SYS_MEM_MAP {{0x000000000000UL, 0x40000000000UL, \ - PTL2_MEMTYPE(MT_NORMAL) | \ - PTL2_BLOCK_NON_SHARE}, \ + PTE_BLOCK_MEMTYPE(MT_NORMAL) | \ + PTE_BLOCK_NON_SHARE}, \ {0x800000000000UL, 0x40000000000UL, \ - PTL2_MEMTYPE(MT_DEVICE_NGNRNE) | \ - PTL2_BLOCK_NON_SHARE}, \ + PTE_BLOCK_MEMTYPE(MT_DEVICE_NGNRNE) | \ + PTE_BLOCK_NON_SHARE}, \ {0x840000000000UL, 0x40000000000UL, \ - PTL2_MEMTYPE(MT_DEVICE_NGNRNE) | \ - PTL2_BLOCK_NON_SHARE}, \ + PTE_BLOCK_MEMTYPE(MT_DEVICE_NGNRNE) | \ + PTE_BLOCK_NON_SHARE}, \ }
#define CONFIG_SYS_MEM_MAP_SIZE 3

The MMU range table can vary depending on things we may only find out at runtime. While the very simple ThunderX variant does not change, other boards' tables will, so move the definition from a static entry in a header file into the board file.
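The board arrays are terminated by an all-zero entry, so the generic code can walk them without a length constant; a minimal sketch of that convention (mem_map_entries() is a hypothetical helper, but the loop condition matches the one used in the patch):

static int mem_map_entries(struct mm_region *map)
{
	int i;

	/* An entry with neither size nor attrs set terminates the list */
	for (i = 0; map[i].size || map[i].attrs; i++)
		;

	return i;
}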
Signed-off-by: Alexander Graf agraf@suse.de
---
 arch/arm/cpu/armv8/cache_v8.c    |  8 +++-----
 arch/arm/include/asm/armv8/mmu.h |  2 ++
 board/cavium/thunderx/thunderx.c | 24 ++++++++++++++++++++++++
 include/configs/thunderx_88xx.h  | 11 -----------
 4 files changed, 29 insertions(+), 16 deletions(-)
diff --git a/arch/arm/cpu/armv8/cache_v8.c b/arch/arm/cpu/armv8/cache_v8.c
index 73628c9..55c6f2f 100644
--- a/arch/arm/cpu/armv8/cache_v8.c
+++ b/arch/arm/cpu/armv8/cache_v8.c
@@ -36,8 +36,6 @@ DECLARE_GLOBAL_DATA_PTR;
  */
 
 #ifdef CONFIG_SYS_FULL_VA
-static struct mm_region mem_map[] = CONFIG_SYS_MEM_MAP;
-
 static u64 get_tcr(int el, u64 *pips, u64 *pva_bits)
 {
 	u64 max_addr = 0;
@@ -46,7 +44,7 @@ static u64 get_tcr(int el, u64 *pips, u64 *pva_bits)
 	int i;
 
 	/* Find the largest address we need to support */
-	for (i = 0; i < ARRAY_SIZE(mem_map); i++)
+	for (i = 0; mem_map[i].size || mem_map[i].attrs; i++)
 		max_addr = max(max_addr, mem_map[i].base + mem_map[i].size);
 
 	/* Calculate the maximum physical (and thus virtual) address */
@@ -266,7 +264,7 @@ static int count_required_pts(u64 addr, int level, u64 maxaddr)
 	int i;
 	enum pte_type pte_type = PTE_INVAL;
 
-	for (i = 0; i < ARRAY_SIZE(mem_map); i++) {
+	for (i = 0; mem_map[i].size || mem_map[i].attrs; i++) {
 		struct mm_region *map = &mem_map[i];
 		u64 start = map->base;
 		u64 end = start + map->size;
@@ -364,7 +362,7 @@ static void setup_pgtables(void)
 	create_table();
 
 	/* Now add all MMU table entries one after another to the table */
-	for (i = 0; i < ARRAY_SIZE(mem_map); i++)
+	for (i = 0; mem_map[i].size || mem_map[i].attrs; i++)
 		add_map(&mem_map[i]);
 
 	/* Create the same thing once more for our emergency page table */
diff --git a/arch/arm/include/asm/armv8/mmu.h b/arch/arm/include/asm/armv8/mmu.h
index 1c490dc..06126c8 100644
--- a/arch/arm/include/asm/armv8/mmu.h
+++ b/arch/arm/include/asm/armv8/mmu.h
@@ -203,6 +203,8 @@ struct mm_region {
 	u64 size;
 	u64 attrs;
 };
+
+extern struct mm_region *mem_map;
 #endif
 
 #endif /* _ASM_ARMV8_MMU_H_ */
diff --git a/board/cavium/thunderx/thunderx.c b/board/cavium/thunderx/thunderx.c
index b926767..9131a38 100644
--- a/board/cavium/thunderx/thunderx.c
+++ b/board/cavium/thunderx/thunderx.c
@@ -10,6 +10,7 @@
 #include <linux/compiler.h>
 
 #include <cavium/atf.h>
+#include <asm/armv8/mmu.h>
 
 #if !CONFIG_IS_ENABLED(OF_CONTROL)
 #include <dm/platdata.h>
@@ -42,6 +43,29 @@ U_BOOT_DEVICE(thunderx_serial1) = {
 
 DECLARE_GLOBAL_DATA_PTR;
 
+static struct mm_region thunderx_mem_map[] = {
+	{
+		.base = 0x000000000000UL,
+		.size = 0x40000000000UL,
+		.attrs = PTE_BLOCK_MEMTYPE(MT_NORMAL) | PTE_BLOCK_NON_SHARE,
+	}, {
+		.base = 0x800000000000UL,
+		.size = 0x40000000000UL,
+		.attrs = PTE_BLOCK_MEMTYPE(MT_DEVICE_NGNRNE) |
+			 PTE_BLOCK_NON_SHARE,
+	}, {
+		.base = 0x840000000000UL,
+		.size = 0x40000000000UL,
+		.attrs = PTE_BLOCK_MEMTYPE(MT_DEVICE_NGNRNE) |
+			 PTE_BLOCK_NON_SHARE,
+	}, {
+		/* List terminator */
+		0,
+	}
+};
+
+struct mm_region *mem_map = thunderx_mem_map;
+
 int board_init(void)
 {
 	return 0;
diff --git a/include/configs/thunderx_88xx.h b/include/configs/thunderx_88xx.h
index 20b25f7..64e4616 100644
--- a/include/configs/thunderx_88xx.h
+++ b/include/configs/thunderx_88xx.h
@@ -26,17 +26,6 @@
 
 #define CONFIG_SYS_LOWMEM_BASE		MEM_BASE
 
-#define CONFIG_SYS_MEM_MAP	{{0x000000000000UL, 0x40000000000UL,	\
-				  PTE_BLOCK_MEMTYPE(MT_NORMAL) |	\
-				  PTE_BLOCK_NON_SHARE},			\
-				 {0x800000000000UL, 0x40000000000UL,	\
-				  PTE_BLOCK_MEMTYPE(MT_DEVICE_NGNRNE) |	\
-				  PTE_BLOCK_NON_SHARE},			\
-				 {0x840000000000UL, 0x40000000000UL,	\
-				  PTE_BLOCK_MEMTYPE(MT_DEVICE_NGNRNE) |	\
-				  PTE_BLOCK_NON_SHARE},			\
-				}
-
 #define CONFIG_SYS_MEM_MAP_SIZE	3
 
 #define CONFIG_SYS_VA_BITS		48

Now that we have nice table-driven page table creation code that gives us everything we need, move zynqmp over to it.
Signed-off-by: Alexander Graf agraf@suse.de
---
v1 -> v2:
- Move mmu tables into board file
---
 arch/arm/cpu/armv8/zynqmp/cpu.c | 217 +++++++++-------------------------------
 include/configs/xilinx_zynqmp.h |   2 +
 2 files changed, 50 insertions(+), 169 deletions(-)
diff --git a/arch/arm/cpu/armv8/zynqmp/cpu.c b/arch/arm/cpu/armv8/zynqmp/cpu.c index c71f291..5dd3cd8 100644 --- a/arch/arm/cpu/armv8/zynqmp/cpu.c +++ b/arch/arm/cpu/armv8/zynqmp/cpu.c @@ -8,6 +8,7 @@ #include <common.h> #include <asm/arch/hardware.h> #include <asm/arch/sys_proto.h> +#include <asm/armv8/mmu.h> #include <asm/io.h>
#define ZYNQ_SILICON_VER_MASK 0xF000 @@ -15,6 +16,53 @@
DECLARE_GLOBAL_DATA_PTR;
+static struct mm_region zynqmp_mem_map[] = { + { + .base = 0x0UL, + .size = 0x80000000UL, + .attrs = PTE_BLOCK_MEMTYPE(MT_NORMAL) | + PTE_BLOCK_INNER_SHARE + }, { + .base = 0x80000000UL, + .size = 0x70000000UL, + .attrs = PTE_BLOCK_MEMTYPE(MT_DEVICE_NGNRNE) | + PTE_BLOCK_NON_SHARE | + PTE_BLOCK_PXN | PTE_BLOCK_UXN + }, { + .base = 0xf8000000UL, + .size = 0x07e00000UL, + .attrs = PTE_BLOCK_MEMTYPE(MT_DEVICE_NGNRNE) | + PTE_BLOCK_NON_SHARE | + PTE_BLOCK_PXN | PTE_BLOCK_UXN + }, { + .base = 0xffe00000UL, + .size = 0x00200000UL, + .attrs = PTE_BLOCK_MEMTYPE(MT_NORMAL) | + PTE_BLOCK_INNER_SHARE + }, { + .base = 0x400000000UL, + .size = 0x200000000UL, + .attrs = PTE_BLOCK_MEMTYPE(MT_DEVICE_NGNRNE) | + PTE_BLOCK_NON_SHARE | + PTE_BLOCK_PXN | PTE_BLOCK_UXN + }, { + .base = 0x600000000UL, + .size = 0x800000000UL, + .attrs = PTE_BLOCK_MEMTYPE(MT_NORMAL) | + PTE_BLOCK_INNER_SHARE + }, { + .base = 0xe00000000UL, + .size = 0xf200000000UL, + .attrs = PTE_BLOCK_MEMTYPE(MT_DEVICE_NGNRNE) | + PTE_BLOCK_NON_SHARE | + PTE_BLOCK_PXN | PTE_BLOCK_UXN + }, { + /* List terminator */ + 0, + } +}; +struct mm_region *mem_map = zynqmp_mem_map; + static unsigned int zynqmp_get_silicon_version_secure(void) { u32 ver; @@ -44,172 +92,3 @@ unsigned int zynqmp_get_silicon_version(void)
return ZYNQMP_CSU_VERSION_SILICON; } - -#ifndef CONFIG_SYS_DCACHE_OFF -#include <asm/armv8/mmu.h> - -#define SECTION_SHIFT_L1 30UL -#define SECTION_SHIFT_L2 21UL -#define BLOCK_SIZE_L0 0x8000000000UL -#define BLOCK_SIZE_L1 (1 << SECTION_SHIFT_L1) -#define BLOCK_SIZE_L2 (1 << SECTION_SHIFT_L2) - -#define TCR_TG1_4K (1 << 31) -#define TCR_EPD1_DISABLE (1 << 23) -#define ZYNQMO_VA_BITS 40 -#define ZYNQMP_TCR TCR_TG1_4K | \ - TCR_EPD1_DISABLE | \ - TCR_SHARED_OUTER | \ - TCR_SHARED_INNER | \ - TCR_IRGN_WBWA | \ - TCR_ORGN_WBWA | \ - TCR_T0SZ(ZYNQMO_VA_BITS) - -#define MEMORY_ATTR PMD_SECT_AF | PMD_SECT_INNER_SHARE | \ - PMD_ATTRINDX(MT_NORMAL) | \ - PMD_TYPE_SECT -#define DEVICE_ATTR PMD_SECT_AF | PMD_SECT_PXN | \ - PMD_SECT_UXN | PMD_ATTRINDX(MT_DEVICE_NGNRNE) | \ - PMD_TYPE_SECT - -/* 4K size is required to place 512 entries in each level */ -#define TLB_TABLE_SIZE 0x1000 - -struct attr_tbl { - u32 num; - u64 attr; -}; - -static struct attr_tbl attr_tbll1t0[4] = { {16, 0x0}, - {8, DEVICE_ATTR}, - {32, MEMORY_ATTR}, - {456, DEVICE_ATTR} - }; -static struct attr_tbl attr_tbll2t3[4] = { {0x180, DEVICE_ATTR}, - {0x40, 0x0}, - {0x3F, DEVICE_ATTR}, - {0x1, MEMORY_ATTR} - }; - -/* - * This mmu table looks as below - * Level 0 table contains two entries to 512GB sizes. One is Level1 Table 0 - * and other Level1 Table1. - * Level1 Table0 contains entries for each 1GB from 0 to 511GB. - * Level1 Table1 contains entries for each 1GB from 512GB to 1TB. - * Level2 Table0, Level2 Table1, Level2 Table2 and Level2 Table3 contains - * entries for each 2MB starting from 0GB, 1GB, 2GB and 3GB respectively. - */ -static void zynqmp_mmu_setup(void) -{ - int el; - u32 index_attr; - u64 i, section_l1t0, section_l1t1; - u64 section_l2t0, section_l2t1, section_l2t2, section_l2t3; - u64 *level0_table = (u64 *)gd->arch.tlb_addr; - u64 *level1_table_0 = (u64 *)(gd->arch.tlb_addr + TLB_TABLE_SIZE); - u64 *level1_table_1 = (u64 *)(gd->arch.tlb_addr + (2 * TLB_TABLE_SIZE)); - u64 *level2_table_0 = (u64 *)(gd->arch.tlb_addr + (3 * TLB_TABLE_SIZE)); - u64 *level2_table_1 = (u64 *)(gd->arch.tlb_addr + (4 * TLB_TABLE_SIZE)); - u64 *level2_table_2 = (u64 *)(gd->arch.tlb_addr + (5 * TLB_TABLE_SIZE)); - u64 *level2_table_3 = (u64 *)(gd->arch.tlb_addr + (6 * TLB_TABLE_SIZE)); - - level0_table[0] = - (u64)level1_table_0 | PMD_TYPE_TABLE; - level0_table[1] = - (u64)level1_table_1 | PMD_TYPE_TABLE; - - /* - * set level 1 table 0, covering 0 to 512GB - * set level 1 table 1, covering 512GB to 1TB - */ - section_l1t0 = 0; - section_l1t1 = BLOCK_SIZE_L0; - - index_attr = 0; - for (i = 0; i < 512; i++) { - level1_table_0[i] = section_l1t0; - level1_table_0[i] |= attr_tbll1t0[index_attr].attr; - attr_tbll1t0[index_attr].num--; - if (attr_tbll1t0[index_attr].num == 0) - index_attr++; - level1_table_1[i] = section_l1t1; - level1_table_1[i] |= DEVICE_ATTR; - section_l1t0 += BLOCK_SIZE_L1; - section_l1t1 += BLOCK_SIZE_L1; - } - - level1_table_0[0] = - (u64)level2_table_0 | PMD_TYPE_TABLE; - level1_table_0[1] = - (u64)level2_table_1 | PMD_TYPE_TABLE; - level1_table_0[2] = - (u64)level2_table_2 | PMD_TYPE_TABLE; - level1_table_0[3] = - (u64)level2_table_3 | PMD_TYPE_TABLE; - - section_l2t0 = 0; - section_l2t1 = section_l2t0 + BLOCK_SIZE_L1; /* 1GB */ - section_l2t2 = section_l2t1 + BLOCK_SIZE_L1; /* 2GB */ - section_l2t3 = section_l2t2 + BLOCK_SIZE_L1; /* 3GB */ - - index_attr = 0; - - for (i = 0; i < 512; i++) { - level2_table_0[i] = section_l2t0 | MEMORY_ATTR; - level2_table_1[i] = section_l2t1 | MEMORY_ATTR; - level2_table_2[i] = 
section_l2t2 | DEVICE_ATTR; - level2_table_3[i] = section_l2t3 | - attr_tbll2t3[index_attr].attr; - attr_tbll2t3[index_attr].num--; - if (attr_tbll2t3[index_attr].num == 0) - index_attr++; - section_l2t0 += BLOCK_SIZE_L2; - section_l2t1 += BLOCK_SIZE_L2; - section_l2t2 += BLOCK_SIZE_L2; - section_l2t3 += BLOCK_SIZE_L2; - } - - /* flush new MMU table */ - flush_dcache_range(gd->arch.tlb_addr, - gd->arch.tlb_addr + gd->arch.tlb_size); - - /* point TTBR to the new table */ - el = current_el(); - set_ttbr_tcr_mair(el, gd->arch.tlb_addr, - ZYNQMP_TCR, MEMORY_ATTRIBUTES); - - set_sctlr(get_sctlr() | CR_M); -} - -int arch_cpu_init(void) -{ - icache_enable(); - __asm_invalidate_dcache_all(); - __asm_invalidate_tlb_all(); - return 0; -} - -/* - * This function is called from lib/board.c. - * It recreates MMU table in main memory. MMU and d-cache are enabled earlier. - * There is no need to disable d-cache for this operation. - */ -void enable_caches(void) -{ - /* The data cache is not active unless the mmu is enabled */ - if (!(get_sctlr() & CR_M)) { - invalidate_dcache_all(); - __asm_invalidate_tlb_all(); - zynqmp_mmu_setup(); - } - puts("Enabling Caches...\n"); - - set_sctlr(get_sctlr() | CR_C); -} - -u64 *arch_get_page_table(void) -{ - return (u64 *)(gd->arch.tlb_addr + 0x3000); -} -#endif diff --git a/include/configs/xilinx_zynqmp.h b/include/configs/xilinx_zynqmp.h index da868b8..08f430c 100644 --- a/include/configs/xilinx_zynqmp.h +++ b/include/configs/xilinx_zynqmp.h @@ -29,6 +29,8 @@ #define CONFIG_SYS_MEMTEST_START CONFIG_SYS_SDRAM_BASE #define CONFIG_SYS_MEMTEST_END CONFIG_SYS_SDRAM_SIZE
+#define CONFIG_SYS_FULL_VA + /* Have release address at the end of 256MB for now */ #define CPU_RELEASE_ADDR 0xFFFFFF0

Now that we have nice table-driven page table creation code that gives us everything we need, move tegra over to it.
Signed-off-by: Alexander Graf agraf@suse.de
---
v1 -> v2:
- Move mmu tables into .c file
---
 arch/arm/mach-tegra/arm64-mmu.c   | 132 +++++---------------------------------
 include/configs/tegra210-common.h |   2 +
 2 files changed, 19 insertions(+), 115 deletions(-)
diff --git a/arch/arm/mach-tegra/arm64-mmu.c b/arch/arm/mach-tegra/arm64-mmu.c index c227652..501c4f0 100644 --- a/arch/arm/mach-tegra/arm64-mmu.c +++ b/arch/arm/mach-tegra/arm64-mmu.c @@ -12,120 +12,22 @@ #include <asm/system.h> #include <asm/armv8/mmu.h>
-DECLARE_GLOBAL_DATA_PTR; - -#define SECTION_SHIFT_L1 30UL -#define SECTION_SHIFT_L2 21UL -#define BLOCK_SIZE_L0 0x8000000000UL -#define BLOCK_SIZE_L1 (1 << SECTION_SHIFT_L1) -#define BLOCK_SIZE_L2 (1 << SECTION_SHIFT_L2) - -#define TCR_TG1_4K (1 << 31) -#define TCR_EPD1_DISABLE (1 << 23) -#define TEGRA_VA_BITS 40 -#define TEGRA_TCR TCR_TG1_4K | \ - TCR_EPD1_DISABLE | \ - TCR_SHARED_OUTER | \ - TCR_SHARED_INNER | \ - TCR_IRGN_WBWA | \ - TCR_ORGN_WBWA | \ - TCR_T0SZ(TEGRA_VA_BITS) - -#define MEMORY_ATTR PMD_SECT_AF | PMD_SECT_INNER_SHARE | \ - PMD_ATTRINDX(MT_NORMAL) | \ - PMD_TYPE_SECT -#define DEVICE_ATTR PMD_SECT_AF | PMD_SECT_PXN | \ - PMD_SECT_UXN | PMD_ATTRINDX(MT_DEVICE_NGNRNE) | \ - PMD_TYPE_SECT - -/* 4K size is required to place 512 entries in each level */ -#define TLB_TABLE_SIZE 0x1000 - -/* - * This mmu table looks as below - * Level 0 table contains two entries to 512GB sizes. One is Level1 Table 0 - * and other Level1 Table1. - * Level1 Table0 contains entries for each 1GB from 0 to 511GB. - * Level1 Table1 contains entries for each 1GB from 512GB to 1TB. - * Level2 Table0, Level2 Table1, Level2 Table2 and Level2 Table3 contains - * entries for each 2MB starting from 0GB, 1GB, 2GB and 3GB respectively. - */ -void mmu_setup(void) -{ - int el; - u64 i, section_l1t0, section_l1t1; - u64 section_l2t0, section_l2t1, section_l2t2, section_l2t3; - u64 *level0_table = (u64 *)gd->arch.tlb_addr; - u64 *level1_table_0 = (u64 *)(gd->arch.tlb_addr + TLB_TABLE_SIZE); - u64 *level1_table_1 = (u64 *)(gd->arch.tlb_addr + (2 * TLB_TABLE_SIZE)); - u64 *level2_table_0 = (u64 *)(gd->arch.tlb_addr + (3 * TLB_TABLE_SIZE)); - u64 *level2_table_1 = (u64 *)(gd->arch.tlb_addr + (4 * TLB_TABLE_SIZE)); - u64 *level2_table_2 = (u64 *)(gd->arch.tlb_addr + (5 * TLB_TABLE_SIZE)); - u64 *level2_table_3 = (u64 *)(gd->arch.tlb_addr + (6 * TLB_TABLE_SIZE)); - - /* Invalidate all table entries */ - memset(level0_table, 0, PGTABLE_SIZE); - - level0_table[0] = - (u64)level1_table_0 | PMD_TYPE_TABLE; - level0_table[1] = - (u64)level1_table_1 | PMD_TYPE_TABLE; - - /* - * set level 1 table 0, covering 0 to 512GB - * set level 1 table 1, covering 512GB to 1TB - */ - section_l1t0 = 0; - section_l1t1 = BLOCK_SIZE_L0; - - for (i = 0; i < 512; i++) { - level1_table_0[i] = section_l1t0; - if (i >= 4) - level1_table_0[i] |= MEMORY_ATTR; - level1_table_1[i] = section_l1t1; - level1_table_1[i] |= MEMORY_ATTR; - section_l1t0 += BLOCK_SIZE_L1; - section_l1t1 += BLOCK_SIZE_L1; +static struct mm_region tegra_mem_map[] = { + { + .base = 0x0UL, + .size = 0x80000000UL, + .attrs = PTE_BLOCK_MEMTYPE(MT_DEVICE_NGNRNE) | + PTE_BLOCK_NON_SHARE | + PTE_BLOCK_PXN | PTE_BLOCK_UXN + }, { + .base = 0x80000000UL, + .size = 0xff80000000UL, + .attrs = PTE_BLOCK_MEMTYPE(MT_NORMAL) | + PTE_BLOCK_INNER_SHARE + }, { + /* List terminator */ + 0, } +};
- level1_table_0[0] = - (u64)level2_table_0 | PMD_TYPE_TABLE; - level1_table_0[1] = - (u64)level2_table_1 | PMD_TYPE_TABLE; - level1_table_0[2] = - (u64)level2_table_2 | PMD_TYPE_TABLE; - level1_table_0[3] = - (u64)level2_table_3 | PMD_TYPE_TABLE; - - section_l2t0 = 0; - section_l2t1 = section_l2t0 + BLOCK_SIZE_L1; /* 1GB */ - section_l2t2 = section_l2t1 + BLOCK_SIZE_L1; /* 2GB */ - section_l2t3 = section_l2t2 + BLOCK_SIZE_L1; /* 3GB */ - - for (i = 0; i < 512; i++) { - level2_table_0[i] = section_l2t0 | DEVICE_ATTR; - level2_table_1[i] = section_l2t1 | DEVICE_ATTR; - level2_table_2[i] = section_l2t2 | MEMORY_ATTR; - level2_table_3[i] = section_l2t3 | MEMORY_ATTR; - section_l2t0 += BLOCK_SIZE_L2; - section_l2t1 += BLOCK_SIZE_L2; - section_l2t2 += BLOCK_SIZE_L2; - section_l2t3 += BLOCK_SIZE_L2; - } - - /* flush new MMU table */ - flush_dcache_range(gd->arch.tlb_addr, - gd->arch.tlb_addr + gd->arch.tlb_size); - - /* point TTBR to the new table */ - el = current_el(); - set_ttbr_tcr_mair(el, gd->arch.tlb_addr, - TEGRA_TCR, MEMORY_ATTRIBUTES); - - set_sctlr(get_sctlr() | CR_M); -} - -u64 *arch_get_page_table(void) -{ - return (u64 *)(gd->arch.tlb_addr + (3 * TLB_TABLE_SIZE)); -} +struct mm_region *mem_map = tegra_mem_map; diff --git a/include/configs/tegra210-common.h b/include/configs/tegra210-common.h index 8f35a7b..2a6e317 100644 --- a/include/configs/tegra210-common.h +++ b/include/configs/tegra210-common.h @@ -13,6 +13,8 @@ /* Cortex-A57 uses a cache line size of 64 bytes */ #define CONFIG_SYS_CACHELINE_SIZE 64
+#define CONFIG_SYS_FULL_VA + /* * NS16550 Configuration */

There's no good excuse for running with caches disabled on AArch64, so let's just move the vexpress64 target to enable the MMU and run with caches on.
Signed-off-by: Alexander Graf agraf@suse.de
---
v1 -> v2:
- Move tables to .c file
---
 board/armltd/vexpress64/vexpress64.c | 21 +++++++++++++++++++++
 include/configs/vexpress_aemv8a.h    |  6 +++---
 2 files changed, 24 insertions(+), 3 deletions(-)
diff --git a/board/armltd/vexpress64/vexpress64.c b/board/armltd/vexpress64/vexpress64.c
index 6efc8c1..973b579 100644
--- a/board/armltd/vexpress64/vexpress64.c
+++ b/board/armltd/vexpress64/vexpress64.c
@@ -14,6 +14,7 @@
 #include <dm/platdata.h>
 #include <dm/platform_data/serial_pl01x.h>
 #include "pcie.h"
+#include <asm/armv8/mmu.h>
 
 DECLARE_GLOBAL_DATA_PTR;
 
@@ -28,6 +29,26 @@ U_BOOT_DEVICE(vexpress_serials) = {
 	.platdata = &serial_platdata,
 };
 
+static struct mm_region vexpress64_mem_map[] = {
+	{
+		.base = 0x0UL,
+		.size = 0x80000000UL,
+		.attrs = PTE_BLOCK_MEMTYPE(MT_DEVICE_NGNRNE) |
+			 PTE_BLOCK_NON_SHARE |
+			 PTE_BLOCK_PXN | PTE_BLOCK_UXN
+	}, {
+		.base = 0x80000000UL,
+		.size = 0xff80000000UL,
+		.attrs = PTE_BLOCK_MEMTYPE(MT_NORMAL) |
+			 PTE_BLOCK_INNER_SHARE
+	}, {
+		/* List terminator */
+		0,
+	}
+};
+
+struct mm_region *mem_map = vexpress64_mem_map;
+
 /* This function gets replaced by platforms supporting PCIe.
  * The replacement function, eg. on Juno, initialises the PCIe bus.
  */
diff --git a/include/configs/vexpress_aemv8a.h b/include/configs/vexpress_aemv8a.h
index 133041b..ddb9848 100644
--- a/include/configs/vexpress_aemv8a.h
+++ b/include/configs/vexpress_aemv8a.h
@@ -19,9 +19,9 @@
 
 #define CONFIG_SUPPORT_RAW_INITRD
 
-/* Cache Definitions */
-#define CONFIG_SYS_DCACHE_OFF
-#define CONFIG_SYS_ICACHE_OFF
+/* MMU Definitions */
+#define CONFIG_SYS_CACHELINE_SIZE	64
+#define CONFIG_SYS_FULL_VA
 
 #define CONFIG_IDENT_STRING		" vexpress_aemv8a"
 #define CONFIG_BOOTP_VCI_STRING		"U-Boot.armv8.vexpress_aemv8a"

When enabling dcache on HiKey, we run into MMC command timeouts because our retry loop is now faster than the eMMC (or an external SD card) can answer.
Increase the retry count to the same value as the timeout used for status reports.
The real fix is obviously to not base this whole thing on a cycle counter but on real wall time, but that would be slightly more intrusive.
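For reference, a minimal sketch of what that wall-time variant could look like (illustrative only, not part of this patch; wait_for_cdone() is a hypothetical helper, while get_timer(), dwmci_readl(), DWMCI_RINTSTS and DWMCI_INTMSK_CDONE are the existing U-Boot/dw_mmc interfaces):

static int wait_for_cdone(struct dwmci_host *host, ulong timeout_ms)
{
	ulong start = get_timer(0);

	/* Bound the busy loop by elapsed time, not by iteration count */
	while (get_timer(start) < timeout_ms) {
		if (dwmci_readl(host, DWMCI_RINTSTS) & DWMCI_INTMSK_CDONE)
			return 0;
	}

	return -ETIMEDOUT;
}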
Signed-off-by: Alexander Graf agraf@suse.de
---
 drivers/mmc/dw_mmc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mmc/dw_mmc.c b/drivers/mmc/dw_mmc.c
index 909e3ca..7329f40 100644
--- a/drivers/mmc/dw_mmc.c
+++ b/drivers/mmc/dw_mmc.c
@@ -189,7 +189,7 @@ static int dwmci_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd,
 			data ? DIV_ROUND_UP(data->blocks, 8) : 0);
 	int ret = 0, flags = 0, i;
 	unsigned int timeout = 100000;
-	u32 retry = 10000;
+	u32 retry = 100000;
 	u32 mask, ctrl;
 	ulong start = get_timer(0);
 	struct bounce_buffer bbstate;

The HiKey runs with dcache disabled today. There really should be no reason not to use caches on AArch64, so let's add MMU definitions and enable the dcache.
Signed-off-by: Alexander Graf agraf@suse.de
---
v1 -> v2:
- Move tables to .c file
---
 board/hisilicon/hikey/hikey.c | 21 +++++++++++++++++++++
 include/configs/hikey.h       |  5 +++--
 2 files changed, 24 insertions(+), 2 deletions(-)
diff --git a/board/hisilicon/hikey/hikey.c b/board/hisilicon/hikey/hikey.c
index c4ae40b..1edc807 100644
--- a/board/hisilicon/hikey/hikey.c
+++ b/board/hisilicon/hikey/hikey.c
@@ -19,6 +19,7 @@
 #include <asm/arch/periph.h>
 #include <asm/arch/pinmux.h>
 #include <asm/arch/hi6220.h>
+#include <asm/armv8/mmu.h>
 
 /*TODO drop this table in favour of device tree */
 static const struct hikey_gpio_platdata hi6220_gpio[] = {
@@ -87,6 +88,26 @@ U_BOOT_DEVICE(hikey_seriala) = {
 	.platdata = &serial_platdata,
 };
 
+static struct mm_region hikey_mem_map[] = {
+	{
+		.base = 0x0UL,
+		.size = 0x80000000UL,
+		.attrs = PTE_BLOCK_MEMTYPE(MT_NORMAL) |
+			 PTE_BLOCK_INNER_SHARE
+	}, {
+		.base = 0x80000000UL,
+		.size = 0x80000000UL,
+		.attrs = PTE_BLOCK_MEMTYPE(MT_DEVICE_NGNRNE) |
+			 PTE_BLOCK_NON_SHARE |
+			 PTE_BLOCK_PXN | PTE_BLOCK_UXN
+	}, {
+		/* List terminator */
+		0,
+	}
+};
+
+struct mm_region *mem_map = hikey_mem_map;
+
 #ifdef CONFIG_BOARD_EARLY_INIT_F
 int board_uart_init(void)
 {
diff --git a/include/configs/hikey.h b/include/configs/hikey.h
index 796861e..d33dcef 100644
--- a/include/configs/hikey.h
+++ b/include/configs/hikey.h
@@ -21,8 +21,9 @@
 
 #define CONFIG_SUPPORT_RAW_INITRD
 
-/* Cache Definitions */
-#define CONFIG_SYS_DCACHE_OFF
+/* MMU Definitions */
+#define CONFIG_SYS_CACHELINE_SIZE	64
+#define CONFIG_SYS_FULL_VA
 
 #define CONFIG_IDENT_STRING		"hikey"

By now the code that only supports a single page table level with 64k page size and a 42-bit address space is no longer used by any board in tree, so we can safely remove it.
To clean up the code, move the Layerscape MMU code over to the new defines, removing redundant field definitions.
Signed-off-by: Alexander Graf agraf@suse.de
---
v1 -> v2:
- Add layerscape empty mmu table in .c file --- arch/arm/cpu/armv8/cache_v8.c | 90 ------------------------ arch/arm/cpu/armv8/fsl-layerscape/cpu.c | 37 ++++++++-- arch/arm/include/asm/arch-fsl-layerscape/cpu.h | 94 +++++++++++++------------- arch/arm/include/asm/armv8/mmu.h | 66 +----------------- arch/arm/include/asm/global_data.h | 2 +- arch/arm/include/asm/system.h | 12 ++-- doc/README.arm64 | 20 ------ include/configs/hikey.h | 1 - include/configs/tegra210-common.h | 2 - include/configs/thunderx_88xx.h | 14 ---- include/configs/vexpress_aemv8a.h | 1 - include/configs/xilinx_zynqmp.h | 2 - 12 files changed, 85 insertions(+), 256 deletions(-)
diff --git a/arch/arm/cpu/armv8/cache_v8.c b/arch/arm/cpu/armv8/cache_v8.c index 55c6f2f..df15e00 100644 --- a/arch/arm/cpu/armv8/cache_v8.c +++ b/arch/arm/cpu/armv8/cache_v8.c @@ -35,7 +35,6 @@ DECLARE_GLOBAL_DATA_PTR; * off: FFF */
-#ifdef CONFIG_SYS_FULL_VA static u64 get_tcr(int el, u64 *pips, u64 *pva_bits) { u64 max_addr = 0; @@ -386,38 +385,11 @@ static void setup_all_pgtables(void) gd->arch.tlb_addr = tlb_addr; }
-#else - -inline void set_pgtable_section(u64 *page_table, u64 index, u64 section, - u64 memory_type, u64 attribute) -{ - u64 value; - - value = section | PMD_TYPE_SECT | PMD_SECT_AF; - value |= PMD_ATTRINDX(memory_type); - value |= attribute; - page_table[index] = value; -} - -inline void set_pgtable_table(u64 *page_table, u64 index, u64 *table_addr) -{ - u64 value; - - value = (u64)table_addr | PMD_TYPE_TABLE; - page_table[index] = value; -} -#endif - /* to activate the MMU we need to set up virtual memory */ __weak void mmu_setup(void) { -#ifndef CONFIG_SYS_FULL_VA - bd_t *bd = gd->bd; - u64 *page_table = (u64 *)gd->arch.tlb_addr, i, j; -#endif int el;
-#ifdef CONFIG_SYS_FULL_VA /* Set up page tables only once */ if (!gd->arch.tlb_fillptr) setup_all_pgtables(); @@ -425,40 +397,6 @@ __weak void mmu_setup(void) el = current_el(); set_ttbr_tcr_mair(el, gd->arch.tlb_addr, get_tcr(el, NULL, NULL), MEMORY_ATTRIBUTES); -#else - /* Setup an identity-mapping for all spaces */ - for (i = 0; i < (PGTABLE_SIZE >> 3); i++) { - set_pgtable_section(page_table, i, i << SECTION_SHIFT, - MT_DEVICE_NGNRNE, PMD_SECT_NON_SHARE); - } - - /* Setup an identity-mapping for all RAM space */ - for (i = 0; i < CONFIG_NR_DRAM_BANKS; i++) { - ulong start = bd->bi_dram[i].start; - ulong end = bd->bi_dram[i].start + bd->bi_dram[i].size; - for (j = start >> SECTION_SHIFT; - j < end >> SECTION_SHIFT; j++) { - set_pgtable_section(page_table, j, j << SECTION_SHIFT, - MT_NORMAL, PMD_SECT_NON_SHARE); - } - } - - /* load TTBR0 */ - el = current_el(); - if (el == 1) { - set_ttbr_tcr_mair(el, gd->arch.tlb_addr, - TCR_EL1_RSVD | TCR_FLAGS | TCR_EL1_IPS_BITS, - MEMORY_ATTRIBUTES); - } else if (el == 2) { - set_ttbr_tcr_mair(el, gd->arch.tlb_addr, - TCR_EL2_RSVD | TCR_FLAGS | TCR_EL2_IPS_BITS, - MEMORY_ATTRIBUTES); - } else { - set_ttbr_tcr_mair(el, gd->arch.tlb_addr, - TCR_EL3_RSVD | TCR_FLAGS | TCR_EL3_IPS_BITS, - MEMORY_ATTRIBUTES); - } -#endif
/* enable the mmu */ set_sctlr(get_sctlr() | CR_M); @@ -544,33 +482,6 @@ u64 *__weak arch_get_page_table(void) { return NULL; }
-#ifndef CONFIG_SYS_FULL_VA -void mmu_set_region_dcache_behaviour(phys_addr_t start, size_t size, - enum dcache_option option) -{ - u64 *page_table = arch_get_page_table(); - u64 upto, end; - - if (page_table == NULL) - return; - - end = ALIGN(start + size, (1 << MMU_SECTION_SHIFT)) >> - MMU_SECTION_SHIFT; - start = start >> MMU_SECTION_SHIFT; - for (upto = start; upto < end; upto++) { - page_table[upto] &= ~PMD_ATTRINDX_MASK; - page_table[upto] |= PMD_ATTRINDX(option); - } - asm volatile("dsb sy"); - __asm_invalidate_tlb_all(); - asm volatile("dsb sy"); - asm volatile("isb"); - start = start << MMU_SECTION_SHIFT; - end = end << MMU_SECTION_SHIFT; - flush_dcache_range(start, end); - asm volatile("dsb sy"); -} -#else static bool is_aligned(u64 addr, u64 size, u64 align) { return !(addr & (align - 1)) && !(size & (align - 1)); @@ -652,7 +563,6 @@ void mmu_set_region_dcache_behaviour(phys_addr_t start, size_t size, */ flush_dcache_range(real_start, real_start + real_size); } -#endif
#else /* CONFIG_SYS_DCACHE_OFF */
diff --git a/arch/arm/cpu/armv8/fsl-layerscape/cpu.c b/arch/arm/cpu/armv8/fsl-layerscape/cpu.c index 6ea28ed..7404bd9 100644 --- a/arch/arm/cpu/armv8/fsl-layerscape/cpu.c +++ b/arch/arm/cpu/armv8/fsl-layerscape/cpu.c @@ -26,6 +26,14 @@
DECLARE_GLOBAL_DATA_PTR;
+static struct mm_region layerscape_mem_map[] = { + { + /* List terminator */ + 0, + } +}; +struct mm_region *mem_map = layerscape_mem_map; + void cpu_name(char *name) { struct ccsr_gur __iomem *gur = (void *)(CONFIG_SYS_FSL_GUTS_ADDR); @@ -48,6 +56,25 @@ void cpu_name(char *name) }
#ifndef CONFIG_SYS_DCACHE_OFF +static void set_pgtable_section(u64 *page_table, u64 index, u64 section, + u64 memory_type, u64 attribute) +{ + u64 value; + + value = section | PTE_TYPE_BLOCK | PTE_BLOCK_AF; + value |= PMD_ATTRINDX(memory_type); + value |= attribute; + page_table[index] = value; +} + +static void set_pgtable_table(u64 *page_table, u64 index, u64 *table_addr) +{ + u64 value; + + value = (u64)table_addr | PTE_TYPE_TABLE; + page_table[index] = value; +} + /* * Set the block entries according to the information of the table. */ @@ -114,10 +141,10 @@ static int find_table(const struct sys_mmu_table *list,
temp_base -= block_size;
- if ((level_table[index - 1] & PMD_TYPE_MASK) == - PMD_TYPE_TABLE) { + if ((level_table[index - 1] & PTE_TYPE_MASK) == + PTE_TYPE_TABLE) { level_table = (u64 *)(level_table[index - 1] & - ~PMD_TYPE_MASK); + ~PTE_TYPE_MASK); level++; continue; } else { @@ -220,7 +247,7 @@ static inline int final_secure_ddr(u64 *level0_table, struct table_info table = {}; struct sys_mmu_table ddr_entry = { 0, 0, BLOCK_SIZE_L1, MT_NORMAL, - PMD_SECT_OUTER_SHARE | PMD_SECT_NS + PTE_BLOCK_OUTER_SHARE | PTE_BLOCK_NS }; u64 index;
@@ -243,7 +270,7 @@ static inline int final_secure_ddr(u64 *level0_table, ddr_entry.virt_addr = phys_addr; ddr_entry.phys_addr = phys_addr; ddr_entry.size = CONFIG_SYS_MEM_RESERVE_SECURE; - ddr_entry.attribute = PMD_SECT_OUTER_SHARE; + ddr_entry.attribute = PTE_BLOCK_OUTER_SHARE; ret = find_table(&ddr_entry, &table, level0_table); if (ret) { printf("MMU error: could not find secure ddr table\n"); diff --git a/arch/arm/include/asm/arch-fsl-layerscape/cpu.h b/arch/arm/include/asm/arch-fsl-layerscape/cpu.h index 15ade84..93bbda3 100644 --- a/arch/arm/include/asm/arch-fsl-layerscape/cpu.h +++ b/arch/arm/include/asm/arch-fsl-layerscape/cpu.h @@ -117,48 +117,48 @@ static const struct sys_mmu_table early_mmu_table[] = { #ifdef CONFIG_FSL_LSCH3 { CONFIG_SYS_FSL_CCSR_BASE, CONFIG_SYS_FSL_CCSR_BASE, CONFIG_SYS_FSL_CCSR_SIZE, MT_DEVICE_NGNRNE, - PMD_SECT_NON_SHARE | PMD_SECT_PXN | PMD_SECT_UXN }, + PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { CONFIG_SYS_FSL_OCRAM_BASE, CONFIG_SYS_FSL_OCRAM_BASE, - CONFIG_SYS_FSL_OCRAM_SIZE, MT_NORMAL, PMD_SECT_NON_SHARE }, + CONFIG_SYS_FSL_OCRAM_SIZE, MT_NORMAL, PTE_BLOCK_NON_SHARE }, /* For IFC Region #1, only the first 4MB is cache-enabled */ { CONFIG_SYS_FSL_IFC_BASE1, CONFIG_SYS_FSL_IFC_BASE1, - CONFIG_SYS_FSL_IFC_SIZE1_1, MT_NORMAL, PMD_SECT_NON_SHARE }, + CONFIG_SYS_FSL_IFC_SIZE1_1, MT_NORMAL, PTE_BLOCK_NON_SHARE }, { CONFIG_SYS_FSL_IFC_BASE1 + CONFIG_SYS_FSL_IFC_SIZE1_1, CONFIG_SYS_FSL_IFC_BASE1 + CONFIG_SYS_FSL_IFC_SIZE1_1, CONFIG_SYS_FSL_IFC_SIZE1 - CONFIG_SYS_FSL_IFC_SIZE1_1, - MT_DEVICE_NGNRNE, PMD_SECT_NON_SHARE }, + MT_DEVICE_NGNRNE, PTE_BLOCK_NON_SHARE }, { CONFIG_SYS_FLASH_BASE, CONFIG_SYS_FSL_IFC_BASE1, - CONFIG_SYS_FSL_IFC_SIZE1, MT_DEVICE_NGNRNE, PMD_SECT_NON_SHARE }, + CONFIG_SYS_FSL_IFC_SIZE1, MT_DEVICE_NGNRNE, PTE_BLOCK_NON_SHARE }, { CONFIG_SYS_FSL_DRAM_BASE1, CONFIG_SYS_FSL_DRAM_BASE1, CONFIG_SYS_FSL_DRAM_SIZE1, MT_NORMAL, - PMD_SECT_OUTER_SHARE | PMD_SECT_NS }, + PTE_BLOCK_OUTER_SHARE | PTE_BLOCK_NS }, /* Map IFC region #2 up to CONFIG_SYS_FLASH_BASE for NAND boot */ { CONFIG_SYS_FSL_IFC_BASE2, CONFIG_SYS_FSL_IFC_BASE2, CONFIG_SYS_FLASH_BASE - CONFIG_SYS_FSL_IFC_BASE2, - MT_DEVICE_NGNRNE, PMD_SECT_NON_SHARE }, + MT_DEVICE_NGNRNE, PTE_BLOCK_NON_SHARE }, { CONFIG_SYS_FSL_DCSR_BASE, CONFIG_SYS_FSL_DCSR_BASE, CONFIG_SYS_FSL_DCSR_SIZE, MT_DEVICE_NGNRNE, - PMD_SECT_NON_SHARE | PMD_SECT_PXN | PMD_SECT_UXN }, + PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { CONFIG_SYS_FSL_DRAM_BASE2, CONFIG_SYS_FSL_DRAM_BASE2, CONFIG_SYS_FSL_DRAM_SIZE2, MT_NORMAL, - PMD_SECT_OUTER_SHARE | PMD_SECT_NS }, + PTE_BLOCK_OUTER_SHARE | PTE_BLOCK_NS }, #elif defined(CONFIG_FSL_LSCH2) { CONFIG_SYS_FSL_CCSR_BASE, CONFIG_SYS_FSL_CCSR_BASE, CONFIG_SYS_FSL_CCSR_SIZE, MT_DEVICE_NGNRNE, - PMD_SECT_NON_SHARE | PMD_SECT_PXN | PMD_SECT_UXN }, + PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { CONFIG_SYS_FSL_OCRAM_BASE, CONFIG_SYS_FSL_OCRAM_BASE, - CONFIG_SYS_FSL_OCRAM_SIZE, MT_NORMAL, PMD_SECT_NON_SHARE }, + CONFIG_SYS_FSL_OCRAM_SIZE, MT_NORMAL, PTE_BLOCK_NON_SHARE }, { CONFIG_SYS_FSL_DCSR_BASE, CONFIG_SYS_FSL_DCSR_BASE, CONFIG_SYS_FSL_DCSR_SIZE, MT_DEVICE_NGNRNE, - PMD_SECT_NON_SHARE | PMD_SECT_PXN | PMD_SECT_UXN }, + PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { CONFIG_SYS_FSL_QSPI_BASE, CONFIG_SYS_FSL_QSPI_BASE, - CONFIG_SYS_FSL_QSPI_SIZE, MT_DEVICE_NGNRNE, PMD_SECT_NON_SHARE }, + CONFIG_SYS_FSL_QSPI_SIZE, MT_DEVICE_NGNRNE, PTE_BLOCK_NON_SHARE }, { CONFIG_SYS_FSL_IFC_BASE, CONFIG_SYS_FSL_IFC_BASE, - CONFIG_SYS_FSL_IFC_SIZE, MT_DEVICE_NGNRNE, PMD_SECT_NON_SHARE }, + CONFIG_SYS_FSL_IFC_SIZE, MT_DEVICE_NGNRNE, PTE_BLOCK_NON_SHARE },
{ CONFIG_SYS_FSL_DRAM_BASE1, CONFIG_SYS_FSL_DRAM_BASE1, - CONFIG_SYS_FSL_DRAM_SIZE1, MT_NORMAL, PMD_SECT_OUTER_SHARE }, + CONFIG_SYS_FSL_DRAM_SIZE1, MT_NORMAL, PTE_BLOCK_OUTER_SHARE }, { CONFIG_SYS_FSL_DRAM_BASE2, CONFIG_SYS_FSL_DRAM_BASE2, - CONFIG_SYS_FSL_DRAM_SIZE2, MT_NORMAL, PMD_SECT_OUTER_SHARE }, + CONFIG_SYS_FSL_DRAM_SIZE2, MT_NORMAL, PTE_BLOCK_OUTER_SHARE }, #endif };
@@ -166,96 +166,96 @@ static const struct sys_mmu_table final_mmu_table[] = { #ifdef CONFIG_FSL_LSCH3 { CONFIG_SYS_FSL_CCSR_BASE, CONFIG_SYS_FSL_CCSR_BASE, CONFIG_SYS_FSL_CCSR_SIZE, MT_DEVICE_NGNRNE, - PMD_SECT_NON_SHARE | PMD_SECT_PXN | PMD_SECT_UXN }, + PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { CONFIG_SYS_FSL_OCRAM_BASE, CONFIG_SYS_FSL_OCRAM_BASE, - CONFIG_SYS_FSL_OCRAM_SIZE, MT_NORMAL, PMD_SECT_NON_SHARE }, + CONFIG_SYS_FSL_OCRAM_SIZE, MT_NORMAL, PTE_BLOCK_NON_SHARE }, { CONFIG_SYS_FSL_DRAM_BASE1, CONFIG_SYS_FSL_DRAM_BASE1, CONFIG_SYS_FSL_DRAM_SIZE1, MT_NORMAL, - PMD_SECT_OUTER_SHARE | PMD_SECT_NS }, + PTE_BLOCK_OUTER_SHARE | PTE_BLOCK_NS }, { CONFIG_SYS_FSL_QSPI_BASE2, CONFIG_SYS_FSL_QSPI_BASE2, CONFIG_SYS_FSL_QSPI_SIZE2, MT_DEVICE_NGNRNE, - PMD_SECT_NON_SHARE | PMD_SECT_PXN | PMD_SECT_UXN }, + PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { CONFIG_SYS_FSL_IFC_BASE2, CONFIG_SYS_FSL_IFC_BASE2, - CONFIG_SYS_FSL_IFC_SIZE2, MT_DEVICE_NGNRNE, PMD_SECT_NON_SHARE }, + CONFIG_SYS_FSL_IFC_SIZE2, MT_DEVICE_NGNRNE, PTE_BLOCK_NON_SHARE }, { CONFIG_SYS_FSL_DCSR_BASE, CONFIG_SYS_FSL_DCSR_BASE, CONFIG_SYS_FSL_DCSR_SIZE, MT_DEVICE_NGNRNE, - PMD_SECT_NON_SHARE | PMD_SECT_PXN | PMD_SECT_UXN }, + PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { CONFIG_SYS_FSL_MC_BASE, CONFIG_SYS_FSL_MC_BASE, CONFIG_SYS_FSL_MC_SIZE, MT_DEVICE_NGNRNE, - PMD_SECT_NON_SHARE | PMD_SECT_PXN | PMD_SECT_UXN }, + PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { CONFIG_SYS_FSL_NI_BASE, CONFIG_SYS_FSL_NI_BASE, CONFIG_SYS_FSL_NI_SIZE, MT_DEVICE_NGNRNE, - PMD_SECT_NON_SHARE | PMD_SECT_PXN | PMD_SECT_UXN }, + PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, /* For QBMAN portal, only the first 64MB is cache-enabled */ { CONFIG_SYS_FSL_QBMAN_BASE, CONFIG_SYS_FSL_QBMAN_BASE, CONFIG_SYS_FSL_QBMAN_SIZE_1, MT_NORMAL, - PMD_SECT_NON_SHARE | PMD_SECT_PXN | PMD_SECT_UXN | PMD_SECT_NS }, + PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN | PTE_BLOCK_NS }, { CONFIG_SYS_FSL_QBMAN_BASE + CONFIG_SYS_FSL_QBMAN_SIZE_1, CONFIG_SYS_FSL_QBMAN_BASE + CONFIG_SYS_FSL_QBMAN_SIZE_1, CONFIG_SYS_FSL_QBMAN_SIZE - CONFIG_SYS_FSL_QBMAN_SIZE_1, - MT_DEVICE_NGNRNE, PMD_SECT_NON_SHARE | PMD_SECT_PXN | PMD_SECT_UXN }, + MT_DEVICE_NGNRNE, PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { CONFIG_SYS_PCIE1_PHYS_ADDR, CONFIG_SYS_PCIE1_PHYS_ADDR, CONFIG_SYS_PCIE1_PHYS_SIZE, MT_DEVICE_NGNRNE, - PMD_SECT_NON_SHARE | PMD_SECT_PXN | PMD_SECT_UXN }, + PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { CONFIG_SYS_PCIE2_PHYS_ADDR, CONFIG_SYS_PCIE2_PHYS_ADDR, CONFIG_SYS_PCIE2_PHYS_SIZE, MT_DEVICE_NGNRNE, - PMD_SECT_NON_SHARE | PMD_SECT_PXN | PMD_SECT_UXN }, + PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { CONFIG_SYS_PCIE3_PHYS_ADDR, CONFIG_SYS_PCIE3_PHYS_ADDR, CONFIG_SYS_PCIE3_PHYS_SIZE, MT_DEVICE_NGNRNE, - PMD_SECT_NON_SHARE | PMD_SECT_PXN | PMD_SECT_UXN }, + PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, #if defined(CONFIG_LS2080A) || defined(CONFIG_LS2085A) { CONFIG_SYS_PCIE4_PHYS_ADDR, CONFIG_SYS_PCIE4_PHYS_ADDR, CONFIG_SYS_PCIE4_PHYS_SIZE, MT_DEVICE_NGNRNE, - PMD_SECT_NON_SHARE | PMD_SECT_PXN | PMD_SECT_UXN }, + PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, #endif { CONFIG_SYS_FSL_WRIOP1_BASE, CONFIG_SYS_FSL_WRIOP1_BASE, CONFIG_SYS_FSL_WRIOP1_SIZE, MT_DEVICE_NGNRNE, - PMD_SECT_NON_SHARE | PMD_SECT_PXN | PMD_SECT_UXN }, + PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { CONFIG_SYS_FSL_AIOP1_BASE, CONFIG_SYS_FSL_AIOP1_BASE, CONFIG_SYS_FSL_AIOP1_SIZE, MT_DEVICE_NGNRNE, - PMD_SECT_NON_SHARE | PMD_SECT_PXN | PMD_SECT_UXN }, + PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN },
{ CONFIG_SYS_FSL_PEBUF_BASE, CONFIG_SYS_FSL_PEBUF_BASE, CONFIG_SYS_FSL_PEBUF_SIZE, MT_DEVICE_NGNRNE, - PMD_SECT_NON_SHARE | PMD_SECT_PXN | PMD_SECT_UXN }, + PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { CONFIG_SYS_FSL_DRAM_BASE2, CONFIG_SYS_FSL_DRAM_BASE2, CONFIG_SYS_FSL_DRAM_SIZE2, MT_NORMAL, - PMD_SECT_OUTER_SHARE | PMD_SECT_NS }, + PTE_BLOCK_OUTER_SHARE | PTE_BLOCK_NS }, #elif defined(CONFIG_FSL_LSCH2) { CONFIG_SYS_FSL_BOOTROM_BASE, CONFIG_SYS_FSL_BOOTROM_BASE, CONFIG_SYS_FSL_BOOTROM_SIZE, MT_DEVICE_NGNRNE, - PMD_SECT_NON_SHARE | PMD_SECT_PXN | PMD_SECT_UXN }, + PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { CONFIG_SYS_FSL_CCSR_BASE, CONFIG_SYS_FSL_CCSR_BASE, CONFIG_SYS_FSL_CCSR_SIZE, MT_DEVICE_NGNRNE, - PMD_SECT_NON_SHARE | PMD_SECT_PXN | PMD_SECT_UXN }, + PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { CONFIG_SYS_FSL_OCRAM_BASE, CONFIG_SYS_FSL_OCRAM_BASE, - CONFIG_SYS_FSL_OCRAM_SIZE, MT_NORMAL, PMD_SECT_NON_SHARE }, + CONFIG_SYS_FSL_OCRAM_SIZE, MT_NORMAL, PTE_BLOCK_NON_SHARE }, { CONFIG_SYS_FSL_DCSR_BASE, CONFIG_SYS_FSL_DCSR_BASE, CONFIG_SYS_FSL_DCSR_SIZE, MT_DEVICE_NGNRNE, - PMD_SECT_NON_SHARE | PMD_SECT_PXN | PMD_SECT_UXN }, + PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { CONFIG_SYS_FSL_QSPI_BASE, CONFIG_SYS_FSL_QSPI_BASE, CONFIG_SYS_FSL_QSPI_SIZE, MT_DEVICE_NGNRNE, - PMD_SECT_NON_SHARE | PMD_SECT_PXN | PMD_SECT_UXN }, + PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { CONFIG_SYS_FSL_IFC_BASE, CONFIG_SYS_FSL_IFC_BASE, - CONFIG_SYS_FSL_IFC_SIZE, MT_DEVICE_NGNRNE, PMD_SECT_NON_SHARE }, + CONFIG_SYS_FSL_IFC_SIZE, MT_DEVICE_NGNRNE, PTE_BLOCK_NON_SHARE }, { CONFIG_SYS_FSL_DRAM_BASE1, CONFIG_SYS_FSL_DRAM_BASE1, CONFIG_SYS_FSL_DRAM_SIZE1, MT_NORMAL, - PMD_SECT_OUTER_SHARE | PMD_SECT_NS }, + PTE_BLOCK_OUTER_SHARE | PTE_BLOCK_NS }, { CONFIG_SYS_FSL_QBMAN_BASE, CONFIG_SYS_FSL_QBMAN_BASE, CONFIG_SYS_FSL_QBMAN_SIZE, MT_DEVICE_NGNRNE, - PMD_SECT_NON_SHARE | PMD_SECT_PXN | PMD_SECT_UXN }, + PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { CONFIG_SYS_FSL_DRAM_BASE2, CONFIG_SYS_FSL_DRAM_BASE2, - CONFIG_SYS_FSL_DRAM_SIZE2, MT_NORMAL, PMD_SECT_OUTER_SHARE }, + CONFIG_SYS_FSL_DRAM_SIZE2, MT_NORMAL, PTE_BLOCK_OUTER_SHARE }, { CONFIG_SYS_PCIE1_PHYS_ADDR, CONFIG_SYS_PCIE1_PHYS_ADDR, CONFIG_SYS_PCIE1_PHYS_SIZE, MT_DEVICE_NGNRNE, - PMD_SECT_NON_SHARE | PMD_SECT_PXN | PMD_SECT_UXN }, + PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { CONFIG_SYS_PCIE2_PHYS_ADDR, CONFIG_SYS_PCIE2_PHYS_ADDR, CONFIG_SYS_PCIE2_PHYS_SIZE, MT_DEVICE_NGNRNE, - PMD_SECT_NON_SHARE | PMD_SECT_PXN | PMD_SECT_UXN }, + PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { CONFIG_SYS_PCIE3_PHYS_ADDR, CONFIG_SYS_PCIE3_PHYS_ADDR, CONFIG_SYS_PCIE3_PHYS_SIZE, MT_DEVICE_NGNRNE, - PMD_SECT_NON_SHARE | PMD_SECT_PXN | PMD_SECT_UXN }, + PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { CONFIG_SYS_FSL_DRAM_BASE3, CONFIG_SYS_FSL_DRAM_BASE3, - CONFIG_SYS_FSL_DRAM_SIZE3, MT_NORMAL, PMD_SECT_OUTER_SHARE }, + CONFIG_SYS_FSL_DRAM_SIZE3, MT_NORMAL, PTE_BLOCK_OUTER_SHARE }, #endif }; #endif diff --git a/arch/arm/include/asm/armv8/mmu.h b/arch/arm/include/asm/armv8/mmu.h index 06126c8..0d08ed3 100644 --- a/arch/arm/include/asm/armv8/mmu.h +++ b/arch/arm/include/asm/armv8/mmu.h @@ -22,28 +22,12 @@ * calculated specifically. */
-#ifndef CONFIG_SYS_FULL_VA -#define VA_BITS (42) /* 42 bits virtual address */ -#else #define VA_BITS CONFIG_SYS_VA_BITS #define PTE_BLOCK_BITS CONFIG_SYS_PTL2_BITS -#endif
/* * block/section address mask and size definitions. */ -#ifndef CONFIG_SYS_FULL_VA -#define SECTION_SHIFT 29 -#define SECTION_SIZE (UL(1) << SECTION_SHIFT) -#define SECTION_MASK (~(SECTION_SIZE-1)) - -/* PAGE_SHIFT determines the page size */ -#undef PAGE_SIZE -#define PAGE_SHIFT 16 -#define PAGE_SIZE (1 << PAGE_SHIFT) -#define PAGE_MASK (~(PAGE_SIZE-1)) - -#else
/* PAGE_SHIFT determines the page size */ #undef PAGE_SIZE @@ -51,8 +35,6 @@ #define PAGE_SIZE (1 << PAGE_SHIFT) #define PAGE_MASK (~(PAGE_SIZE-1))
-#endif - /***************************************************************/
/* @@ -75,8 +57,6 @@ * */
-#ifdef CONFIG_SYS_FULL_VA - #define PTE_TYPE_MASK (3 << 0) #define PTE_TYPE_FAULT (0 << 0) #define PTE_TYPE_TABLE (3 << 0) @@ -91,6 +71,7 @@ * Block */ #define PTE_BLOCK_MEMTYPE(x) ((x) << 2) +#define PTE_BLOCK_NS (1 << 5) #define PTE_BLOCK_NON_SHARE (0 << 8) #define PTE_BLOCK_OUTER_SHARE (2 << 8) #define PTE_BLOCK_INNER_SHARE (3 << 8) @@ -99,29 +80,6 @@ #define PTE_BLOCK_PXN (UL(1) << 53) #define PTE_BLOCK_UXN (UL(1) << 54)
-#else -/* - * Level 2 descriptor (PMD). - */ -#define PMD_TYPE_MASK (3 << 0) -#define PMD_TYPE_FAULT (0 << 0) -#define PMD_TYPE_TABLE (3 << 0) -#define PMD_TYPE_SECT (1 << 0) - -/* - * Section - */ -#define PMD_SECT_NS (1 << 5) -#define PMD_SECT_NON_SHARE (0 << 8) -#define PMD_SECT_OUTER_SHARE (2 << 8) -#define PMD_SECT_INNER_SHARE (3 << 8) -#define PMD_SECT_AF (1 << 10) -#define PMD_SECT_NG (1 << 11) -#define PMD_SECT_PXN (UL(1) << 53) -#define PMD_SECT_UXN (UL(1) << 54) - -#endif - /* * AttrIndx[2:0] */ @@ -150,33 +108,11 @@ #define TCR_TG0_16K (2 << 14) #define TCR_EPD1_DISABLE (1 << 23)
-#ifndef CONFIG_SYS_FULL_VA -#define TCR_EL1_IPS_BITS (UL(3) << 32) /* 42 bits physical address */ -#define TCR_EL2_IPS_BITS (3 << 16) /* 42 bits physical address */ -#define TCR_EL3_IPS_BITS (3 << 16) /* 42 bits physical address */ - -/* PTWs cacheable, inner/outer WBWA and inner shareable */ -#define TCR_FLAGS (TCR_TG0_64K | \ - TCR_SHARED_INNER | \ - TCR_ORGN_WBWA | \ - TCR_IRGN_WBWA | \ - TCR_T0SZ(VA_BITS)) -#endif - #define TCR_EL1_RSVD (1 << 31) #define TCR_EL2_RSVD (1 << 31 | 1 << 23) #define TCR_EL3_RSVD (1 << 31 | 1 << 23)
#ifndef __ASSEMBLY__ -#ifndef CONFIG_SYS_FULL_VA - -void set_pgtable_section(u64 *page_table, u64 index, - u64 section, u64 memory_type, - u64 attribute); -void set_pgtable_table(u64 *page_table, u64 index, - u64 *table_addr); - -#endif static inline void set_ttbr_tcr_mair(int el, u64 table, u64 tcr, u64 attr) { asm volatile("dsb sy"); diff --git a/arch/arm/include/asm/global_data.h b/arch/arm/include/asm/global_data.h index 259daa1..77d2653 100644 --- a/arch/arm/include/asm/global_data.h +++ b/arch/arm/include/asm/global_data.h @@ -39,7 +39,7 @@ struct arch_global_data { #if !(defined(CONFIG_SYS_ICACHE_OFF) && defined(CONFIG_SYS_DCACHE_OFF)) unsigned long tlb_addr; unsigned long tlb_size; -#if defined(CONFIG_SYS_FULL_VA) +#if defined(CONFIG_ARM64) unsigned long tlb_fillptr; unsigned long tlb_emerg; #endif diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h index 9b1cbf2..ac1173d 100644 --- a/arch/arm/include/asm/system.h +++ b/arch/arm/include/asm/system.h @@ -17,18 +17,14 @@ #define CR_WXN (1 << 19) /* Write Permision Imply XN */ #define CR_EE (1 << 25) /* Exception (Big) Endian */
-/* 2MB granularity */ -#define MMU_SECTION_SHIFT 21 -#define MMU_SECTION_SIZE (1 << MMU_SECTION_SHIFT) - #ifndef __ASSEMBLY__
-#ifndef CONFIG_SYS_FULL_VA -#define PGTABLE_SIZE (0x10000) -#else u64 get_page_table_size(void); #define PGTABLE_SIZE get_page_table_size() -#endif + +/* 2MB granularity */ +#define MMU_SECTION_SHIFT 21 +#define MMU_SECTION_SIZE (1 << MMU_SECTION_SHIFT)
enum dcache_option { DCACHE_OFF = 0x3, diff --git a/doc/README.arm64 b/doc/README.arm64 index de669cb..f658fa2 100644 --- a/doc/README.arm64 +++ b/doc/README.arm64 @@ -36,26 +36,6 @@ Notes 6. CONFIG_ARM64 instead of CONFIG_ARMV8 is used to distinguish aarch64 and aarch32 specific codes.
-7. CONFIG_SYS_FULL_VA is used to enable 2-level page tables. For cores - supporting 64k pages it allows usage of full 48+ virtual/physical addresses - - Enabling this option requires the following ones to be defined: - - CONFIG_SYS_MEM_MAP - an array of 'struct mm_region' describing the - system memory map (start, length, attributes) - - CONFIG_SYS_MEM_MAP_SIZE - number of entries in CONFIG_SYS_MEM_MAP - - CONFIG_SYS_PTL1_ENTRIES - number of 1st level page table entries - - CONFIG_SYS_PTL2_ENTRIES - number of 1nd level page table entries - for the largest CONFIG_SYS_MEM_MAP entry - - CONFIG_COREID_MASK - the mask value used to get the core from the - MPIDR_EL1 register - - CONFIG_SYS_PTL2_BITS - number of bits addressed by the 2nd level - page tables - - CONFIG_SYS_BLOCK_SHIFT - number of bits addressed by a single block - entry from L2 page tables - - CONFIG_SYS_PGTABLE_SIZE - total size of the page table - - CONFIG_SYS_TCR_EL{1,2,3}_IPS_BITS - the IPS field of the TCR_EL{1,2,3} - -
Contributor diff --git a/include/configs/hikey.h b/include/configs/hikey.h index d33dcef..2d9ace9 100644 --- a/include/configs/hikey.h +++ b/include/configs/hikey.h @@ -23,7 +23,6 @@
/* MMU Definitions */ #define CONFIG_SYS_CACHELINE_SIZE 64 -#define CONFIG_SYS_FULL_VA
#define CONFIG_IDENT_STRING "hikey"
diff --git a/include/configs/tegra210-common.h b/include/configs/tegra210-common.h index 2a6e317..8f35a7b 100644 --- a/include/configs/tegra210-common.h +++ b/include/configs/tegra210-common.h @@ -13,8 +13,6 @@ /* Cortex-A57 uses a cache line size of 64 bytes */ #define CONFIG_SYS_CACHELINE_SIZE 64
-#define CONFIG_SYS_FULL_VA - /* * NS16550 Configuration */ diff --git a/include/configs/thunderx_88xx.h b/include/configs/thunderx_88xx.h index 64e4616..736d0a5 100644 --- a/include/configs/thunderx_88xx.h +++ b/include/configs/thunderx_88xx.h @@ -22,22 +22,8 @@
#define MEM_BASE 0x00500000
-#define CONFIG_SYS_FULL_VA - #define CONFIG_SYS_LOWMEM_BASE MEM_BASE
-#define CONFIG_SYS_MEM_MAP_SIZE 3 - -#define CONFIG_SYS_VA_BITS 48 -#define CONFIG_SYS_PTL2_BITS 42 -#define CONFIG_SYS_BLOCK_SHIFT 29 -#define CONFIG_SYS_PTL1_ENTRIES 64 -#define CONFIG_SYS_PTL2_ENTRIES 8192 - -#define CONFIG_SYS_PGTABLE_SIZE \ - ((CONFIG_SYS_PTL1_ENTRIES + \ - CONFIG_SYS_MEM_MAP_SIZE * CONFIG_SYS_PTL2_ENTRIES) * 8) - /* Link Definitions */ #define CONFIG_SYS_TEXT_BASE 0x00500000 #define CONFIG_SYS_INIT_SP_ADDR (CONFIG_SYS_SDRAM_BASE + 0x7fff0) diff --git a/include/configs/vexpress_aemv8a.h b/include/configs/vexpress_aemv8a.h index ddb9848..2949170 100644 --- a/include/configs/vexpress_aemv8a.h +++ b/include/configs/vexpress_aemv8a.h @@ -21,7 +21,6 @@
/* MMU Definitions */ #define CONFIG_SYS_CACHELINE_SIZE 64 -#define CONFIG_SYS_FULL_VA
#define CONFIG_IDENT_STRING " vexpress_aemv8a" #define CONFIG_BOOTP_VCI_STRING "U-Boot.armv8.vexpress_aemv8a" diff --git a/include/configs/xilinx_zynqmp.h b/include/configs/xilinx_zynqmp.h index 08f430c..da868b8 100644 --- a/include/configs/xilinx_zynqmp.h +++ b/include/configs/xilinx_zynqmp.h @@ -29,8 +29,6 @@ #define CONFIG_SYS_MEMTEST_START CONFIG_SYS_SDRAM_BASE #define CONFIG_SYS_MEMTEST_END CONFIG_SYS_SDRAM_SIZE
-#define CONFIG_SYS_FULL_VA - /* Have release address at the end of 256MB for now */ #define CPU_RELEASE_ADDR 0xFFFFFF0
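With the old non-full-va path removed, all arm64 boards now share a single mmu_set_region_dcache_behaviour() implementation. A hypothetical usage example (the address and size are made up; DCACHE_OFF comes from enum dcache_option above):

/* Mark a 2 MB buffer at a made-up address as uncached */
mmu_set_region_dcache_behaviour(0x90000000, 2 * 1024 * 1024, DCACHE_OFF);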

Now that we have an easy way to describe memory regions and enable the MMU, there really shouldn't be anything holding people back from running with caches enabled on AArch64. To make sure people notice early when they are missing out on the caching fun, give them a compile error.
Signed-off-by: Alexander Graf agraf@suse.de --- arch/arm/cpu/armv8/cache_v8.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/arch/arm/cpu/armv8/cache_v8.c b/arch/arm/cpu/armv8/cache_v8.c index df15e00..d1bd06b 100644 --- a/arch/arm/cpu/armv8/cache_v8.c +++ b/arch/arm/cpu/armv8/cache_v8.c @@ -566,6 +566,15 @@ void mmu_set_region_dcache_behaviour(phys_addr_t start, size_t size,
#else /* CONFIG_SYS_DCACHE_OFF */
+/* + * For SPL builds, we may want to not have dcache enabled. Any real U-Boot + * running however really wants to have dcache and the MMU active. Check that + * everything is sane and give the developer a hint if it isn't. + */ +#ifndef CONFIG_SPL_BUILD +#error Please describe your MMU layout in CONFIG_SYS_MEM_MAP and enable dcache. +#endif + void invalidate_dcache_all(void) { }

On 02/25/2016 05:49 PM, Alexander Graf wrote:
Tested-by: Stephen Warren swarren@nvidia.com (On p2371-0000 and p2371-2180)