[PATCH] armv8: MMU: Mark code memory Executable, any other Non-Executable

From: Marek Bykowski marek.bykowski@gmail.com
If the location the ARM CPU is accessing is executable (translation table descriptor Execute-Never attribute bit cleared) then the ARM CPU fetches a number of instructions from that location all at the same time. For example, Cortex-A57 can source up to 128 bits per fetch depending on alignment.
If the CPU mispredicts to the Execute-Never region, it creates the memory fault but it actually never uses the instructions mispredicted. The CPU branches away elsewhere. So, as long as we program the MMU correctly these mispredictions will only affect the performance.
However if we fail programming so and the instruction fetch logic goes mispredict to non-instruction memory it may eventually perturb it, eg. corrupt the FIFO, or the control registers, load the unified cache the data side memory system hits into subsequently.
U-Boot adheres into attributing the device regions to Execute-Never but it actually fails doing so for data regions. Data as well as Device Regions should be Execute-Never.
This patch enables attributing data memory regions to Non-Executable, and code region to Executable, additionally to Read-Only. Read-Only ensures the code region is only Readable resulting in Instruction Abort, Permission Fault exception on a write.
To use the updated attributes the DDR memory of interest should be Non-Executable, it is by default Read-Write Access Permission as well with an example as follows:
static struct mm_region axxia_mem_map[] = { { .virt = 0x0UL, /* DDR */ .phys = 0x0UL, .size = 0x400000000ULL, .attrs = PTE_BLOCK_MEMTYPE(MT_NORMAL) | PTE_BLOCK_INNER_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { .virt = AXXIA_CCN_512_BASE, .phys = AXXIA_CCN_512_BASE, .size = AXXIA_CCN_512_SIZE, .attrs = PTE_BLOCK_MEMTYPE(MT_DEVICE_NGNRNE) | PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { .virt = AXI_MMAP_BASE, .phys = AXI_MMAP_BASE, .size = AXI_MMAP_SIZE, .attrs = PTE_BLOCK_MEMTYPE(MT_DEVICE_NGNRNE) | PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { .virt = AXI_PERIPH_BASE, .phys = AXI_PERIPH_BASE, .size = AXI_PERIPH_SIZE, .attrs = PTE_BLOCK_MEMTYPE(MT_DEVICE_NGNRNE) | PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { /* List terminator */ 0, } };
add_text_map() routine in cache_v8.c locates then the code region in it and modifies its attributes to Executable, Read-Only. The HW Debugger views the Memory Map set then as:
EL2N:0x00000000-0x3FD35FFF NP:0x00000000-0x3FD35FFF Normal RW C S XN EL2N:0x3FD36000-0x3FD86FFF NP:0x3FD36000-0x3FD86FFF Normal RO C S EL2N:0x3FD87000-0x3FFFFFFFF NP:0x3FD87000-0x3FFFFFFFF Normal RW C S XN EL2N:0x400000000-0x3FFFFFFFFF <unmapped> EL2N:0x4000000000-0x403FFFFFFF NP:0x4000000000-0x403FFFFFFF Device-nGnRnE RW NC XN EL2N:0x4040000000-0x7FFFFFFFFF <unmapped> EL2N:0x8000000000-0x803FFFFFFF NP:0x8000000000-0x803FFFFFFF Device-nGnRnE RW NC XN EL2N:0x8040000000-0x807FFFFFFF <unmapped> EL2N:0x8080000000-0x80BFFFFFFF NP:0x8080000000-0x80BFFFFFFF Device-nGnRnE RW NC XN EL2N:0x80C0000000-0xFFFFFFFFFF <unmapped>
where: C, NC is Cacheable, Non-Cacheable respectively, S, NS - Shareable and Non-Shareable, XN - Execute-Never, if XN isn't present it is Executable.
May be worth to add that Shareability is of no matter to Non-Cacheable as it concerns only the Cacheable Regions.
Signed-off-by: Marek Bykowski marek.bykowski@gmail.com --- arch/arm/cpu/armv8/cache_v8.c | 21 +++++++++++++++++++++ arch/arm/cpu/armv8/u-boot.lds | 6 ++++-- arch/arm/include/asm/armv8/mmu.h | 6 ++++++ arch/arm/lib/sections.c | 1 + include/asm-generic/sections.h | 1 + 5 files changed, 33 insertions(+), 2 deletions(-)
diff --git a/arch/arm/cpu/armv8/cache_v8.c b/arch/arm/cpu/armv8/cache_v8.c index c1a08fb4ac..cb25fc0c8a 100644 --- a/arch/arm/cpu/armv8/cache_v8.c +++ b/arch/arm/cpu/armv8/cache_v8.c @@ -11,6 +11,7 @@ #include <cpu_func.h> #include <asm/system.h> #include <asm/armv8/mmu.h> +#include <asm/sections.h>
DECLARE_GLOBAL_DATA_PTR;
@@ -361,6 +362,24 @@ __weak u64 get_page_table_size(void) return size; }
+__weak void add_text_map(void) +{ + if (!(gd->flags & GD_FLG_RELOC)) + return; + + /* Text is always XN=0, read-only region. */ + struct mm_region text = { + .virt = (unsigned long)__image_copy_start, + .phys = (unsigned long)__image_copy_start, + .size = (unsigned long)__text_end - (unsigned long)__image_copy_start, + .attrs = PTE_BLOCK_MEMTYPE(MT_NORMAL) | + PTE_BLOCK_INNER_SHARE | + PTE_BLOCK_AP_RO + }; + + add_map(&text); +} + void setup_pgtables(void) { int i; @@ -378,6 +397,8 @@ void setup_pgtables(void) /* Now add all MMU table entries one after another to the table */ for (i = 0; mem_map[i].size || mem_map[i].attrs; i++) add_map(&mem_map[i]); + + add_text_map(); }
static void setup_all_pgtables(void) diff --git a/arch/arm/cpu/armv8/u-boot.lds b/arch/arm/cpu/armv8/u-boot.lds index 2554980595..20f98ae74d 100644 --- a/arch/arm/cpu/armv8/u-boot.lds +++ b/arch/arm/cpu/armv8/u-boot.lds @@ -20,8 +20,8 @@ SECTIONS #endif . = 0x00000000;
- . = ALIGN(8); - .text : + /* Align .text to the page size */ + .text ALIGN(0x1000): { *(.__image_copy_start) CPUDIR/start.o (.text*) @@ -39,6 +39,8 @@ SECTIONS .text_rest : { *(.text*) + . = NEXT(0x1000); + KEEP(*(.__text_end)) }
#ifdef CONFIG_ARMV8_PSCI diff --git a/arch/arm/include/asm/armv8/mmu.h b/arch/arm/include/asm/armv8/mmu.h index 4a573208df..4bd15c7e08 100644 --- a/arch/arm/include/asm/armv8/mmu.h +++ b/arch/arm/include/asm/armv8/mmu.h @@ -58,6 +58,12 @@ */ #define PTE_BLOCK_MEMTYPE(x) ((x) << 2) #define PTE_BLOCK_NS (1 << 5) +/* + * AP[1] bit is ignored by hardware and is + * treated as if it was One in EL2/EL3 + */ +#define PTE_BLOCK_AP_RO (1 << 7) +#define PTE_BLOCK_AP_RW (0 << 7) #define PTE_BLOCK_NON_SHARE (0 << 8) #define PTE_BLOCK_OUTER_SHARE (2 << 8) #define PTE_BLOCK_INNER_SHARE (3 << 8) diff --git a/arch/arm/lib/sections.c b/arch/arm/lib/sections.c index 3bb2d8468c..a76f2d3049 100644 --- a/arch/arm/lib/sections.c +++ b/arch/arm/lib/sections.c @@ -18,6 +18,7 @@ * aliasing warnings. */
+char __text_end[0] __attribute__((section(".__text_end"))); char __bss_start[0] __attribute__((section(".__bss_start"))); char __bss_end[0] __attribute__((section(".__bss_end"))); char __image_copy_start[0] __attribute__((section(".__image_copy_start"))); diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h index 17a31ec788..1487e16432 100644 --- a/include/asm-generic/sections.h +++ b/include/asm-generic/sections.h @@ -70,6 +70,7 @@ extern void _start(void); */ #ifdef CONFIG_ARM
+extern char __text_end[]; extern char __bss_start[]; extern char __bss_end[]; extern char __image_copy_start[];

From: Marek Bykowski marek.bykowski@gmail.com
If the location the ARM CPU is accessing is executable (translation table descriptor Execute-Never attribute bit cleared) then the ARM CPU fetches a number of instructions from that location all at the same time. For example, Cortex-A57 can source up to 128 bits per fetch depending on alignment.
If the CPU mispredicts to the Execute-Never region, it creates the memory fault but it actually never uses the instructions mispredicted. The CPU branches away elsewhere. So, as long as we program the MMU correctly these mispredictions will only affect the performance.
However if we fail programming so and the instruction fetch logic goes mispredict to non-instruction memory it may eventually perturb it, eg. corrupt the FIFO, or the control registers, load the unified cache the data side memory system hits into subsequently.
U-Boot adheres into attributing the device regions to Execute-Never but it actually fails doing so for data regions. Data as well as Device Regions should be Execute-Never.
This patch enables attributing data memory regions to Non-Executable, and code region to Executable, additionally to Read-Only. Read-Only ensures the code region is only Readable resulting in Instruction Abort, Permission Fault exception on a write.
To use the updated attributes the DDR memory of interest should be Non-Executable, it is by default Read-Write Access Permission as well with an example as follows:
static struct mm_region axxia_mem_map[] = { { .virt = 0x0UL, /* DDR */ .phys = 0x0UL, .size = 0x400000000ULL, .attrs = PTE_BLOCK_MEMTYPE(MT_NORMAL) | PTE_BLOCK_INNER_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { .virt = AXXIA_CCN_512_BASE, .phys = AXXIA_CCN_512_BASE, .size = AXXIA_CCN_512_SIZE, .attrs = PTE_BLOCK_MEMTYPE(MT_DEVICE_NGNRNE) | PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { .virt = AXI_MMAP_BASE, .phys = AXI_MMAP_BASE, .size = AXI_MMAP_SIZE, .attrs = PTE_BLOCK_MEMTYPE(MT_DEVICE_NGNRNE) | PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { .virt = AXI_PERIPH_BASE, .phys = AXI_PERIPH_BASE, .size = AXI_PERIPH_SIZE, .attrs = PTE_BLOCK_MEMTYPE(MT_DEVICE_NGNRNE) | PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { /* List terminator */ 0, } };
add_text_map() routine in cache_v8.c locates then the code region in it and modifies its attributes to Executable, Read-Only. The HW Debugger views the Memory Map set then as:
EL2N:0x00000000-0x3FD35FFF NP:0x00000000-0x3FD35FFF Normal RW C S XN EL2N:0x3FD36000-0x3FD86FFF NP:0x3FD36000-0x3FD86FFF Normal RO C S EL2N:0x3FD87000-0x3FFFFFFFF NP:0x3FD87000-0x3FFFFFFFF Normal RW C S XN EL2N:0x400000000-0x3FFFFFFFFF <unmapped> EL2N:0x4000000000-0x403FFFFFFF NP:0x4000000000-0x403FFFFFFF Device-nGnRnE RW NC XN EL2N:0x4040000000-0x7FFFFFFFFF <unmapped> EL2N:0x8000000000-0x803FFFFFFF NP:0x8000000000-0x803FFFFFFF Device-nGnRnE RW NC XN EL2N:0x8040000000-0x807FFFFFFF <unmapped> EL2N:0x8080000000-0x80BFFFFFFF NP:0x8080000000-0x80BFFFFFFF Device-nGnRnE RW NC XN EL2N:0x80C0000000-0xFFFFFFFFFF <unmapped>
where: C, NC is Cacheable, Non-Cacheable respectively, S, NS - Shareable and Non-Shareable, XN - Execute-Never, if XN isn't present it is Executable.
Shareability of the Device is always treated as Outer Shareable, regardless of the attributes, therefore the Shareability for the Device Regions is not mentioned here.
Signed-off-by: Marek Bykowski marek.bykowski@gmail.com --- Patch v1 -> v2: I have only changed the description of the commit as incorrectly address the Shareability domain there.
arch/arm/cpu/armv8/cache_v8.c | 21 +++++++++++++++++++++ arch/arm/cpu/armv8/u-boot.lds | 6 ++++-- arch/arm/include/asm/armv8/mmu.h | 6 ++++++ arch/arm/lib/sections.c | 1 + include/asm-generic/sections.h | 1 + 5 files changed, 33 insertions(+), 2 deletions(-)
diff --git a/arch/arm/cpu/armv8/cache_v8.c b/arch/arm/cpu/armv8/cache_v8.c index c1a08fb4ac..cb25fc0c8a 100644 --- a/arch/arm/cpu/armv8/cache_v8.c +++ b/arch/arm/cpu/armv8/cache_v8.c @@ -11,6 +11,7 @@ #include <cpu_func.h> #include <asm/system.h> #include <asm/armv8/mmu.h> +#include <asm/sections.h>
DECLARE_GLOBAL_DATA_PTR;
@@ -361,6 +362,24 @@ __weak u64 get_page_table_size(void) return size; }
+__weak void add_text_map(void) +{ + if (!(gd->flags & GD_FLG_RELOC)) + return; + + /* Text is always XN=0, read-only region. */ + struct mm_region text = { + .virt = (unsigned long)__image_copy_start, + .phys = (unsigned long)__image_copy_start, + .size = (unsigned long)__text_end - (unsigned long)__image_copy_start, + .attrs = PTE_BLOCK_MEMTYPE(MT_NORMAL) | + PTE_BLOCK_INNER_SHARE | + PTE_BLOCK_AP_RO + }; + + add_map(&text); +} + void setup_pgtables(void) { int i; @@ -378,6 +397,8 @@ void setup_pgtables(void) /* Now add all MMU table entries one after another to the table */ for (i = 0; mem_map[i].size || mem_map[i].attrs; i++) add_map(&mem_map[i]); + + add_text_map(); }
static void setup_all_pgtables(void) diff --git a/arch/arm/cpu/armv8/u-boot.lds b/arch/arm/cpu/armv8/u-boot.lds index 2554980595..20f98ae74d 100644 --- a/arch/arm/cpu/armv8/u-boot.lds +++ b/arch/arm/cpu/armv8/u-boot.lds @@ -20,8 +20,8 @@ SECTIONS #endif . = 0x00000000;
- . = ALIGN(8); - .text : + /* Align .text to the page size */ + .text ALIGN(0x1000): { *(.__image_copy_start) CPUDIR/start.o (.text*) @@ -39,6 +39,8 @@ SECTIONS .text_rest : { *(.text*) + . = NEXT(0x1000); + KEEP(*(.__text_end)) }
#ifdef CONFIG_ARMV8_PSCI diff --git a/arch/arm/include/asm/armv8/mmu.h b/arch/arm/include/asm/armv8/mmu.h index 4a573208df..4bd15c7e08 100644 --- a/arch/arm/include/asm/armv8/mmu.h +++ b/arch/arm/include/asm/armv8/mmu.h @@ -58,6 +58,12 @@ */ #define PTE_BLOCK_MEMTYPE(x) ((x) << 2) #define PTE_BLOCK_NS (1 << 5) +/* + * AP[1] bit is ignored by hardware and is + * treated as if it was One in EL2/EL3 + */ +#define PTE_BLOCK_AP_RO (1 << 7) +#define PTE_BLOCK_AP_RW (0 << 7) #define PTE_BLOCK_NON_SHARE (0 << 8) #define PTE_BLOCK_OUTER_SHARE (2 << 8) #define PTE_BLOCK_INNER_SHARE (3 << 8) diff --git a/arch/arm/lib/sections.c b/arch/arm/lib/sections.c index 3bb2d8468c..a76f2d3049 100644 --- a/arch/arm/lib/sections.c +++ b/arch/arm/lib/sections.c @@ -18,6 +18,7 @@ * aliasing warnings. */
+char __text_end[0] __attribute__((section(".__text_end"))); char __bss_start[0] __attribute__((section(".__bss_start"))); char __bss_end[0] __attribute__((section(".__bss_end"))); char __image_copy_start[0] __attribute__((section(".__image_copy_start"))); diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h index 17a31ec788..1487e16432 100644 --- a/include/asm-generic/sections.h +++ b/include/asm-generic/sections.h @@ -70,6 +70,7 @@ extern void _start(void); */ #ifdef CONFIG_ARM
+extern char __text_end[]; extern char __bss_start[]; extern char __bss_end[]; extern char __image_copy_start[];

On Fri, Jun 19, 2020 at 02:53:32PM +0200, Marek Bykowski wrote:
From: Marek Bykowski marek.bykowski@gmail.com
However if we fail programming so and the instruction fetch logic goes mispredict to non-instruction memory it may eventually perturb it, eg. corrupt the FIFO, or the control registers, load the unified cache the data side memory system hits into subsequently.
U-Boot adheres into attributing the device regions to Execute-Never but it actually fails doing so for data regions. Data as well as Device Regions should be Execute-Never.
I kind of missed to say that 'not having' this patch may result in an fault/faults on the systems with Cortex-A57. On our system the System Memory Controller gets violated and raises an error interrupt. The fault that it gets received is out-of-order type, that is an attempt to read from an out-of-range address with details below:
out_of_range_addr 0x702200200 -> faulting address out_of_range_length 0x40 -> burst size out_of_range_type 0x5 -> stands for a wrapped read command out_of_range_source_id 0x0 -> source indicates it is coming from CPU0
Burst size 64 bytes (a cache line size) and the wrapped read may suggest the data side memory system hit into the cache loaded by the mispredited instruction fetch.
Re-programming the MMU as per-patch makes the errors disappear.
Marek

On Fri, Jun 19, 2020 at 02:53:32PM +0200, marek.bykowski@gmail.com wrote:
From: Marek Bykowski marek.bykowski@gmail.com
If the location the ARM CPU is accessing is executable (translation table descriptor Execute-Never attribute bit cleared) then the ARM CPU fetches a number of instructions from that location all at the same time. For example, Cortex-A57 can source up to 128 bits per fetch depending on alignment.
If the CPU mispredicts to the Execute-Never region, it creates the memory fault but it actually never uses the instructions mispredicted. The CPU branches away elsewhere. So, as long as we program the MMU correctly these mispredictions will only affect the performance.
However if we fail programming so and the instruction fetch logic goes mispredict to non-instruction memory it may eventually perturb it, eg. corrupt the FIFO, or the control registers, load the unified cache the data side memory system hits into subsequently.
U-Boot adheres into attributing the device regions to Execute-Never but it actually fails doing so for data regions. Data as well as Device Regions should be Execute-Never.
This patch enables attributing data memory regions to Non-Executable, and code region to Executable, additionally to Read-Only. Read-Only ensures the code region is only Readable resulting in Instruction Abort, Permission Fault exception on a write.
To use the updated attributes the DDR memory of interest should be Non-Executable, it is by default Read-Write Access Permission as well with an example as follows:
static struct mm_region axxia_mem_map[] = { { .virt = 0x0UL, /* DDR */ .phys = 0x0UL, .size = 0x400000000ULL, .attrs = PTE_BLOCK_MEMTYPE(MT_NORMAL) | PTE_BLOCK_INNER_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { .virt = AXXIA_CCN_512_BASE, .phys = AXXIA_CCN_512_BASE, .size = AXXIA_CCN_512_SIZE, .attrs = PTE_BLOCK_MEMTYPE(MT_DEVICE_NGNRNE) | PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { .virt = AXI_MMAP_BASE, .phys = AXI_MMAP_BASE, .size = AXI_MMAP_SIZE, .attrs = PTE_BLOCK_MEMTYPE(MT_DEVICE_NGNRNE) | PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { .virt = AXI_PERIPH_BASE, .phys = AXI_PERIPH_BASE, .size = AXI_PERIPH_SIZE, .attrs = PTE_BLOCK_MEMTYPE(MT_DEVICE_NGNRNE) | PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { /* List terminator */ 0, } };
add_text_map() routine in cache_v8.c locates then the code region in it and modifies its attributes to Executable, Read-Only. The HW Debugger views the Memory Map set then as:
EL2N:0x00000000-0x3FD35FFF NP:0x00000000-0x3FD35FFF Normal RW C S XN EL2N:0x3FD36000-0x3FD86FFF NP:0x3FD36000-0x3FD86FFF Normal RO C S EL2N:0x3FD87000-0x3FFFFFFFF NP:0x3FD87000-0x3FFFFFFFF Normal RW C S XN EL2N:0x400000000-0x3FFFFFFFFF <unmapped> EL2N:0x4000000000-0x403FFFFFFF NP:0x4000000000-0x403FFFFFFF Device-nGnRnE RW NC XN EL2N:0x4040000000-0x7FFFFFFFFF <unmapped> EL2N:0x8000000000-0x803FFFFFFF NP:0x8000000000-0x803FFFFFFF Device-nGnRnE RW NC XN EL2N:0x8040000000-0x807FFFFFFF <unmapped> EL2N:0x8080000000-0x80BFFFFFFF NP:0x8080000000-0x80BFFFFFFF Device-nGnRnE RW NC XN EL2N:0x80C0000000-0xFFFFFFFFFF <unmapped>
where: C, NC is Cacheable, Non-Cacheable respectively, S, NS - Shareable and Non-Shareable, XN - Execute-Never, if XN isn't present it is Executable.
Shareability of the Device is always treated as Outer Shareable, regardless of the attributes, therefore the Shareability for the Device Regions is not mentioned here.
Signed-off-by: Marek Bykowski marek.bykowski@gmail.com
Patch v1 -> v2: I have only changed the description of the commit as incorrectly address the Shareability domain there.
arch/arm/cpu/armv8/cache_v8.c | 21 +++++++++++++++++++++ arch/arm/cpu/armv8/u-boot.lds | 6 ++++-- arch/arm/include/asm/armv8/mmu.h | 6 ++++++ arch/arm/lib/sections.c | 1 + include/asm-generic/sections.h | 1 + 5 files changed, 33 insertions(+), 2 deletions(-)
diff --git a/arch/arm/cpu/armv8/cache_v8.c b/arch/arm/cpu/armv8/cache_v8.c index c1a08fb4ac..cb25fc0c8a 100644 --- a/arch/arm/cpu/armv8/cache_v8.c +++ b/arch/arm/cpu/armv8/cache_v8.c @@ -11,6 +11,7 @@ #include <cpu_func.h> #include <asm/system.h> #include <asm/armv8/mmu.h> +#include <asm/sections.h>
DECLARE_GLOBAL_DATA_PTR;
@@ -361,6 +362,24 @@ __weak u64 get_page_table_size(void) return size; }
+__weak void add_text_map(void) +{
- if (!(gd->flags & GD_FLG_RELOC))
return;
- /* Text is always XN=0, read-only region. */
- struct mm_region text = {
.virt = (unsigned long)__image_copy_start,
.phys = (unsigned long)__image_copy_start,
.size = (unsigned long)__text_end - (unsigned long)__image_copy_start,
.attrs = PTE_BLOCK_MEMTYPE(MT_NORMAL) |
PTE_BLOCK_INNER_SHARE |
PTE_BLOCK_AP_RO
- };
- add_map(&text);
+}
void setup_pgtables(void) { int i; @@ -378,6 +397,8 @@ void setup_pgtables(void) /* Now add all MMU table entries one after another to the table */ for (i = 0; mem_map[i].size || mem_map[i].attrs; i++) add_map(&mem_map[i]);
- add_text_map();
}
static void setup_all_pgtables(void) diff --git a/arch/arm/cpu/armv8/u-boot.lds b/arch/arm/cpu/armv8/u-boot.lds index 2554980595..20f98ae74d 100644 --- a/arch/arm/cpu/armv8/u-boot.lds +++ b/arch/arm/cpu/armv8/u-boot.lds @@ -20,8 +20,8 @@ SECTIONS #endif . = 0x00000000;
- . = ALIGN(8);
- .text :
- /* Align .text to the page size */
- .text ALIGN(0x1000): { *(.__image_copy_start) CPUDIR/start.o (.text*)
@@ -39,6 +39,8 @@ SECTIONS .text_rest : { *(.text*)
. = NEXT(0x1000);
}KEEP(*(.__text_end))
#ifdef CONFIG_ARMV8_PSCI diff --git a/arch/arm/include/asm/armv8/mmu.h b/arch/arm/include/asm/armv8/mmu.h index 4a573208df..4bd15c7e08 100644 --- a/arch/arm/include/asm/armv8/mmu.h +++ b/arch/arm/include/asm/armv8/mmu.h @@ -58,6 +58,12 @@ */ #define PTE_BLOCK_MEMTYPE(x) ((x) << 2) #define PTE_BLOCK_NS (1 << 5) +/*
- AP[1] bit is ignored by hardware and is
- treated as if it was One in EL2/EL3
- */
+#define PTE_BLOCK_AP_RO (1 << 7) +#define PTE_BLOCK_AP_RW (0 << 7) #define PTE_BLOCK_NON_SHARE (0 << 8) #define PTE_BLOCK_OUTER_SHARE (2 << 8) #define PTE_BLOCK_INNER_SHARE (3 << 8) diff --git a/arch/arm/lib/sections.c b/arch/arm/lib/sections.c index 3bb2d8468c..a76f2d3049 100644 --- a/arch/arm/lib/sections.c +++ b/arch/arm/lib/sections.c @@ -18,6 +18,7 @@
- aliasing warnings.
*/
+char __text_end[0] __attribute__((section(".__text_end"))); char __bss_start[0] __attribute__((section(".__bss_start"))); char __bss_end[0] __attribute__((section(".__bss_end"))); char __image_copy_start[0] __attribute__((section(".__image_copy_start"))); diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h index 17a31ec788..1487e16432 100644 --- a/include/asm-generic/sections.h +++ b/include/asm-generic/sections.h @@ -70,6 +70,7 @@ extern void _start(void); */ #ifdef CONFIG_ARM
+extern char __text_end[]; extern char __bss_start[]; extern char __bss_end[]; extern char __image_copy_start[];
This causes failures on qemu_arm64: https://gitlab.denx.de/u-boot/u-boot/-/jobs/126529 and xilinx_versal_virt: https://gitlab.denx.de/u-boot/u-boot/-/jobs/126530

I'm on holidays right now, seeing the logs in my mobile I cannot infer much.
What's the memory map after the patch? Intentional? I mean the code/instruction region is read-only and executable and else read-write and non-executable?
May it happen that the tests try to write to instruction, read-only region? If failures then than it is on purpose.
If anything else needing a grater attention I'm sorry but it will have to probably wait until I'm back from holidays, about two weeks from now.
But thanks for looking through it. Our customers is keen on seeing the change in the Upstream.
I'm pretty sure we can work it out as all of the ARM SW suits, namely arm-trusted-firmware and ARM Linux kernel are using the notion of executable/non-executable and read-only/read-write.
Marek
On Sat, Jul 18, 2020, 5:34 PM Tom Rini trini@konsulko.com wrote:
On Fri, Jun 19, 2020 at 02:53:32PM +0200, marek.bykowski@gmail.com wrote:
From: Marek Bykowski marek.bykowski@gmail.com
If the location the ARM CPU is accessing is executable (translation table descriptor Execute-Never attribute bit cleared) then the ARM CPU fetches a number of instructions from that location all at the same time. For example, Cortex-A57 can source up to 128 bits per fetch depending on alignment.
If the CPU mispredicts to the Execute-Never region, it creates the memory fault but it actually never uses the instructions mispredicted. The CPU branches away elsewhere. So, as long as we program the MMU correctly these mispredictions will only affect the performance.
However if we fail programming so and the instruction fetch logic goes mispredict to non-instruction memory it may eventually perturb it, eg. corrupt the FIFO, or the control registers, load the unified cache the data side memory system hits into subsequently.
U-Boot adheres into attributing the device regions to Execute-Never but it actually fails doing so for data regions. Data as well as Device
Regions
should be Execute-Never.
This patch enables attributing data memory regions to Non-Executable, and code region to Executable, additionally to Read-Only. Read-Only
ensures
the code region is only Readable resulting in Instruction Abort,
Permission
Fault exception on a write.
To use the updated attributes the DDR memory of interest should be Non-Executable, it is by default Read-Write Access Permission as well with an example as follows:
static struct mm_region axxia_mem_map[] = { { .virt = 0x0UL, /* DDR */ .phys = 0x0UL, .size = 0x400000000ULL, .attrs = PTE_BLOCK_MEMTYPE(MT_NORMAL) | PTE_BLOCK_INNER_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { .virt = AXXIA_CCN_512_BASE, .phys = AXXIA_CCN_512_BASE, .size = AXXIA_CCN_512_SIZE, .attrs = PTE_BLOCK_MEMTYPE(MT_DEVICE_NGNRNE) | PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { .virt = AXI_MMAP_BASE, .phys = AXI_MMAP_BASE, .size = AXI_MMAP_SIZE, .attrs = PTE_BLOCK_MEMTYPE(MT_DEVICE_NGNRNE) | PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { .virt = AXI_PERIPH_BASE, .phys = AXI_PERIPH_BASE, .size = AXI_PERIPH_SIZE, .attrs = PTE_BLOCK_MEMTYPE(MT_DEVICE_NGNRNE) | PTE_BLOCK_NON_SHARE | PTE_BLOCK_PXN | PTE_BLOCK_UXN }, { /* List terminator */ 0, } };
add_text_map() routine in cache_v8.c locates then the code region in it
and
modifies its attributes to Executable, Read-Only. The HW Debugger views the Memory Map set then as:
EL2N:0x00000000-0x3FD35FFF NP:0x00000000-0x3FD35FFF Normal RW C S XN EL2N:0x3FD36000-0x3FD86FFF NP:0x3FD36000-0x3FD86FFF Normal RO C S EL2N:0x3FD87000-0x3FFFFFFFF NP:0x3FD87000-0x3FFFFFFFF Normal RW C S XN EL2N:0x400000000-0x3FFFFFFFFF <unmapped> EL2N:0x4000000000-0x403FFFFFFF NP:0x4000000000-0x403FFFFFFF
Device-nGnRnE RW NC XN
EL2N:0x4040000000-0x7FFFFFFFFF <unmapped> EL2N:0x8000000000-0x803FFFFFFF NP:0x8000000000-0x803FFFFFFF
Device-nGnRnE RW NC XN
EL2N:0x8040000000-0x807FFFFFFF <unmapped> EL2N:0x8080000000-0x80BFFFFFFF NP:0x8080000000-0x80BFFFFFFF
Device-nGnRnE RW NC XN
EL2N:0x80C0000000-0xFFFFFFFFFF <unmapped>
where: C, NC is Cacheable, Non-Cacheable respectively, S, NS - Shareable and Non-Shareable, XN - Execute-Never, if XN isn't present it is Executable.
Shareability of the Device is always treated as Outer Shareable, regardless of the attributes, therefore the Shareability for the Device Regions is not mentioned here.
Signed-off-by: Marek Bykowski marek.bykowski@gmail.com
Patch v1 -> v2: I have only changed the description of the commit as incorrectly address the Shareability domain there.
arch/arm/cpu/armv8/cache_v8.c | 21 +++++++++++++++++++++ arch/arm/cpu/armv8/u-boot.lds | 6 ++++-- arch/arm/include/asm/armv8/mmu.h | 6 ++++++ arch/arm/lib/sections.c | 1 + include/asm-generic/sections.h | 1 + 5 files changed, 33 insertions(+), 2 deletions(-)
diff --git a/arch/arm/cpu/armv8/cache_v8.c
b/arch/arm/cpu/armv8/cache_v8.c
index c1a08fb4ac..cb25fc0c8a 100644 --- a/arch/arm/cpu/armv8/cache_v8.c +++ b/arch/arm/cpu/armv8/cache_v8.c @@ -11,6 +11,7 @@ #include <cpu_func.h> #include <asm/system.h> #include <asm/armv8/mmu.h> +#include <asm/sections.h>
DECLARE_GLOBAL_DATA_PTR;
@@ -361,6 +362,24 @@ __weak u64 get_page_table_size(void) return size; }
+__weak void add_text_map(void) +{
if (!(gd->flags & GD_FLG_RELOC))
return;
/* Text is always XN=0, read-only region. */
struct mm_region text = {
.virt = (unsigned long)__image_copy_start,
.phys = (unsigned long)__image_copy_start,
.size = (unsigned long)__text_end - (unsigned
long)__image_copy_start,
.attrs = PTE_BLOCK_MEMTYPE(MT_NORMAL) |
PTE_BLOCK_INNER_SHARE |
PTE_BLOCK_AP_RO
};
add_map(&text);
+}
void setup_pgtables(void) { int i; @@ -378,6 +397,8 @@ void setup_pgtables(void) /* Now add all MMU table entries one after another to the table */ for (i = 0; mem_map[i].size || mem_map[i].attrs; i++) add_map(&mem_map[i]);
add_text_map();
}
static void setup_all_pgtables(void) diff --git a/arch/arm/cpu/armv8/u-boot.lds
b/arch/arm/cpu/armv8/u-boot.lds
index 2554980595..20f98ae74d 100644 --- a/arch/arm/cpu/armv8/u-boot.lds +++ b/arch/arm/cpu/armv8/u-boot.lds @@ -20,8 +20,8 @@ SECTIONS #endif . = 0x00000000;
. = ALIGN(8);
.text :
/* Align .text to the page size */
.text ALIGN(0x1000): { *(.__image_copy_start) CPUDIR/start.o (.text*)
@@ -39,6 +39,8 @@ SECTIONS .text_rest : { *(.text*)
. = NEXT(0x1000);
KEEP(*(.__text_end)) }
#ifdef CONFIG_ARMV8_PSCI diff --git a/arch/arm/include/asm/armv8/mmu.h
b/arch/arm/include/asm/armv8/mmu.h
index 4a573208df..4bd15c7e08 100644 --- a/arch/arm/include/asm/armv8/mmu.h +++ b/arch/arm/include/asm/armv8/mmu.h @@ -58,6 +58,12 @@ */ #define PTE_BLOCK_MEMTYPE(x) ((x) << 2) #define PTE_BLOCK_NS (1 << 5) +/*
- AP[1] bit is ignored by hardware and is
- treated as if it was One in EL2/EL3
- */
+#define PTE_BLOCK_AP_RO (1 << 7) +#define PTE_BLOCK_AP_RW (0 << 7) #define PTE_BLOCK_NON_SHARE (0 << 8) #define PTE_BLOCK_OUTER_SHARE (2 << 8) #define PTE_BLOCK_INNER_SHARE (3 << 8) diff --git a/arch/arm/lib/sections.c b/arch/arm/lib/sections.c index 3bb2d8468c..a76f2d3049 100644 --- a/arch/arm/lib/sections.c +++ b/arch/arm/lib/sections.c @@ -18,6 +18,7 @@
- aliasing warnings.
*/
+char __text_end[0] __attribute__((section(".__text_end"))); char __bss_start[0] __attribute__((section(".__bss_start"))); char __bss_end[0] __attribute__((section(".__bss_end"))); char __image_copy_start[0]
__attribute__((section(".__image_copy_start")));
diff --git a/include/asm-generic/sections.h
b/include/asm-generic/sections.h
index 17a31ec788..1487e16432 100644 --- a/include/asm-generic/sections.h +++ b/include/asm-generic/sections.h @@ -70,6 +70,7 @@ extern void _start(void); */ #ifdef CONFIG_ARM
+extern char __text_end[]; extern char __bss_start[]; extern char __bss_end[]; extern char __image_copy_start[];
This causes failures on qemu_arm64: https://gitlab.denx.de/u-boot/u-boot/-/jobs/126529 and xilinx_versal_virt: https://gitlab.denx.de/u-boot/u-boot/-/jobs/126530
-- Tom

On Tue, Jul 21, 2020 at 02:33:03AM +0200, Marek wrote:
I'm on holidays right now, seeing the logs in my mobile I cannot infer much.
What's the memory map after the patch? Intentional? I mean the code/instruction region is read-only and executable and else read-write and non-executable?
May it happen that the tests try to write to instruction, read-only region? If failures then than it is on purpose.
If anything else needing a grater attention I'm sorry but it will have to probably wait until I'm back from holidays, about two weeks from now.
But thanks for looking through it. Our customers is keen on seeing the change in the Upstream.
I'm pretty sure we can work it out as all of the ARM SW suits, namely arm-trusted-firmware and ARM Linux kernel are using the notion of executable/non-executable and read-only/read-write.
I think this will have to wait for you to return from holidays and investigate more. All I know is that the QEMU models are working as expected prior to the patch and fail with it.

I think this will have to wait for you to return from holidays and investigate more. All I know is that the QEMU models are working as expected prior to the patch and fail with it.
-- Tom
From the esr (Exception Syndrome Register) = 0x9600004F:
- EC (Exception Class) -> it is an exception from data abort - from ISS (Instruction specific syndrome) -> WnR field set = Abort caused by Write (not Read), Data Fault Status Code = Permission fault, third level
It looks like it results as a Write attempted to a read-only region. Are you sure the tests are devised/designed so that they are not trying to write to a read-only region?
@Tom Probably I could send you a patch with debugging in that will print out all the necessary information before reaching a prompt (or after a command). Would you agree to apply and run it?
What I'm really looking for is the address map with the attributes. Sth around it: EL2N:0x00000000-0x3FD35FFF NP:0x00000000-0x3FD35FFF Normal RW C S XN
Marek

On Wed, Aug 19, 2020 at 03:17:24PM +0200, Marek Bykowski wrote:
I think this will have to wait for you to return from holidays and investigate more. All I know is that the QEMU models are working as expected prior to the patch and fail with it.
-- Tom
From the esr (Exception Syndrome Register) = 0x9600004F:
- EC (Exception Class) -> it is an exception from data abort
- from ISS (Instruction specific syndrome) -> WnR field set = Abort caused by Write (not Read), Data Fault Status Code = Permission fault, third level
It looks like it results as a Write attempted to a read-only region. Are you sure the tests are devised/designed so that they are not trying to write to a read-only region?
@Tom Probably I could send you a patch with debugging in that will print out all the necessary information before reaching a prompt (or after a command). Would you agree to apply and run it?
What I'm really looking for is the address map with the attributes. Sth around it: EL2N:0x00000000-0x3FD35FFF NP:0x00000000-0x3FD35FFF Normal RW C S XN
If I recall this failure correctly, it's that qemu fails to boot, with this applied, and that's where things blow up as far as the testing goes. Maybe there's something wrong in board/emulation/qemu-arm/qemu-arm.c ?
participants (4)
-
Marek
-
Marek Bykowski
-
marek.bykowski@gmail.com
-
Tom Rini