[PATCH 0/5] Fixes for running U-Boot under QEMU/KVM

This series fixes a number of issues in the QEMU/mach-virt port of U-Boot that prevent it from executing correctly under virtualization (as opposed to TCG emulation).
As the Linux EFI subsystem maintainer, I am looking to increase test coverage for the EFI-related changes that are under development for Linux, and one of the things I plan to do is start using U-Boot as test firmware for boot testing. This can be done under TCG emulation, but given how loosely TCG implements the architecture, it is better to test under virtualization as well.
With these changes applied, U-Boot can boot Linux in EFI mode under KVM.
Cc: Tom Rini <trini@konsulko.com>
Cc: Sughosh Ganu <sughosh.ganu@linaro.org>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>

Ard Biesheuvel (5):
  arm: enable allocate-on-read for LPAE's DCACHE_WRITEBACK
  arm: qemu: enable LPAE on 32-bit
  arm: qemu: implement enable_caches()
  arm: qemu: disable the EFI workaround for older GRUB
  arm: qemu: override flash accessors to use virtualizable instructions

 arch/arm/include/asm/system.h       |  2 +-
 board/emulation/qemu-arm/qemu-arm.c | 62 ++++++++++++++++++++
 configs/qemu_arm_defconfig          |  2 +
 include/configs/qemu-arm.h          |  1 +
 4 files changed, 66 insertions(+), 1 deletion(-)

The LPAE version of DCACHE_WRITEBACK is currently defined as no-allocate for both reads and writes, which deviates from the non-LPAE definition, and mostly defeats the purpose of enabling the caches in the first place.
So align LPAE with !LPAE, and enable allocate-on-read.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm/include/asm/system.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h
index 7a40b56acdca..21b26557d28b 100644
--- a/arch/arm/include/asm/system.h
+++ b/arch/arm/include/asm/system.h
@@ -445,7 +445,7 @@ static inline void set_dacr(unsigned int val)
  * Memory types
  */
 #define MEMORY_ATTRIBUTES	((0x00 << (0 * 8)) | (0x88 << (1 * 8)) | \
-				 (0xcc << (2 * 8)) | (0xff << (3 * 8)))
+				 (0xee << (2 * 8)) | (0xff << (3 * 8)))

 /* options available for data cache on each page */
 enum dcache_option {
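To make the one-nibble change above easier to follow, here is a minimal sketch that decodes one 4-bit half of a Normal-memory MAIR attribute byte. It assumes the non-transient encodings 0b10RW (write-through) and 0b11RW (write-back), where R and W are the read-allocate and write-allocate hints. The type and function names are hypothetical and are not part of U-Boot.

```c
#include <assert.h>
#include <stdint.h>

/* Cache policy carried by one nibble of a MAIR attribute byte. */
enum mair_policy {
	MAIR_OTHER,		/* device, non-cacheable or transient */
	MAIR_WRITE_THROUGH,	/* 0b10RW */
	MAIR_WRITE_BACK,	/* 0b11RW */
};

struct mair_nibble {
	enum mair_policy policy;
	int read_alloc;		/* R hint, bit 1 of the nibble */
	int write_alloc;	/* W hint, bit 0 of the nibble */
};

/* Hypothetical helper: decode the inner or outer half of an attribute. */
static struct mair_nibble mair_decode_nibble(uint8_t nibble)
{
	struct mair_nibble d;

	assert(nibble <= 0xf);
	d.read_alloc = (nibble >> 1) & 1;
	d.write_alloc = nibble & 1;

	switch (nibble >> 2) {
	case 2:
		d.policy = MAIR_WRITE_THROUGH;
		break;
	case 3:
		d.policy = MAIR_WRITE_BACK;
		break;
	default:
		d.policy = MAIR_OTHER;
		break;
	}
	return d;
}
```

Under these assumptions, 0xc decodes as write-back with neither allocation hint set, while 0xe decodes as write-back with read-allocate set, which is the difference between the old and new values of field 2.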

On 6/6/20 7:15 PM, Ard Biesheuvel wrote:
The LPAE version of DCACHE_WRITEBACK is currently defined as no-allocate for both reads and writes, which deviates from the non-LPAE definition, and mostly defeats the purpose of enabling the caches in the first place.
So align LPAE with !LPAE, and enable allocate-on-read.
Hello Ard,
thanks for analyzing why booting Linux on QEMU fails in some scenarios.
Do you know where in U-Boot the value for !LPAE is defined?
Signed-off-by: Ard Biesheuvel ardb@kernel.org
 arch/arm/include/asm/system.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h
index 7a40b56acdca..21b26557d28b 100644
--- a/arch/arm/include/asm/system.h
+++ b/arch/arm/include/asm/system.h
@@ -445,7 +445,7 @@ static inline void set_dacr(unsigned int val)
  * Memory types
  */
To me the lines below look like black magic.
In the comment above the definition, please, add a reference explaining where the values are defined and a comment explaining why the actual values are chosen.
Maybe this could be a starting point for the description:
"This constant is used to define memory attribute encodings in a Long-descriptor format translation table entry for stage 1 translations. It is used to set the Memory Attribute Indirection Registers MAIR and HMAIR. For details see [1,2].
[1] MAIR0, Memory Attribute Indirection Register 0
https://developer.arm.com/docs/ddi0595/b/aarch32-system-registers/mair0/a/DD...
[2] HMAIR0, Hyp Memory Attribute Indirection Register 0
https://developer.arm.com/docs/ddi0595/b/aarch32-system-registers/hmair0
"
Best regards
Heinrich
 #define MEMORY_ATTRIBUTES ((0x00 << (0 * 8)) | (0x88 << (1 * 8)) | \
-                          (0xcc << (2 * 8)) | (0xff << (3 * 8)))
+                          (0xee << (2 * 8)) | (0xff << (3 * 8)))

 /* options available for data cache on each page */
 enum dcache_option {

On Sat, 6 Jun 2020 at 22:14, Heinrich Schuchardt xypron.glpk@gmx.de wrote:
On 6/6/20 7:15 PM, Ard Biesheuvel wrote:
The LPAE version of DCACHE_WRITEBACK is currently defined as no-allocate for both reads and writes, which deviates from the non-LPAE definition, and mostly defeats the purpose of enabling the caches in the first place.
So align LPAE with !LPAE, and enable allocate-on-read.
Hello Ard,
thanks for analyzing why booting Linux on QEMU fails in some scenarios.
Do you know where in U-Boot the value for !LPAE is defined?
Non-LPAE ARMV7A has (in arch/arm/include/asm/system.h)
DCACHE_WRITETHROUGH = DCACHE_OFF | TTB_SECT_C_MASK,
DCACHE_WRITEBACK = DCACHE_WRITETHROUGH | TTB_SECT_B_MASK,
DCACHE_WRITEALLOC = DCACHE_WRITEBACK | TTB_SECT_TEX(1),
and so DCACHE_WRITEBACK has the C and B bits set in the block descriptor, and the TEX field set to 0x0.
G5.7.2 in the ARM ARM (DDI0487E.a) describes this as
Outer and Inner Write-Back, Read-Allocate No Write-Allocate
DCACHE_WRITEALLOC has the C and B bits set in the block descriptor, and the TEX field set to 0x1, which is described as
Outer and Inner Write-Back, Read-Allocate Write-Allocate
Signed-off-by: Ard Biesheuvel ardb@kernel.org
 arch/arm/include/asm/system.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h
index 7a40b56acdca..21b26557d28b 100644
--- a/arch/arm/include/asm/system.h
+++ b/arch/arm/include/asm/system.h
@@ -445,7 +445,7 @@ static inline void set_dacr(unsigned int val)
  * Memory types
  */
To me the lines below look like black magic.
In the comment above the definition, please, add a reference explaining where the values are defined and a comment explaining why the actual values are chosen.
Maybe this could be a starting point for the description:
"This constant is used to define memory attribute encodings in a Long-descriptor format translation table entry for stage 1 translations. It is used to set the Memory Attribute Indirection Registers MAIR and HMAIR. For details see [1,2].
[1] MAIR0, Memory Attribute Indirection Register 0
https://developer.arm.com/docs/ddi0595/b/aarch32-system-registers/mair0/a/DD...
[2] HMAIR0, Hyp Memory Attribute Indirection Register 0
https://developer.arm.com/docs/ddi0595/b/aarch32-system-registers/hmair0
"
Better refer to the ARM ARM for the A profile here (not R). [DDI0487E.a]
So the memory types are indexed: four fields of MAIR are populated with the four chosen memory types:
[0] Device-nGnRnE
[1] Outer and Inner Write-Through, Read-Allocate No Write-Allocate
[2] Outer and Inner Write-Back, Read-Allocate No Write-Allocate
[3] Outer and Inner Write-Back, Read-Allocate Write-Allocate
and the enum just selects one of these fields:
DCACHE_OFF = TTB_SECT | TTB_SECT_MAIR(0) | TTB_SECT_XN_MASK,
DCACHE_WRITETHROUGH = TTB_SECT | TTB_SECT_MAIR(1),
DCACHE_WRITEBACK = TTB_SECT | TTB_SECT_MAIR(2),
DCACHE_WRITEALLOC = TTB_SECT | TTB_SECT_MAIR(3),
BTW it seems DCACHE_WRITETHROUGH is also incorrect: this should be 0xaa for read-allocate as well.
 #define MEMORY_ATTRIBUTES ((0x00 << (0 * 8)) | (0x88 << (1 * 8)) | \
-                          (0xcc << (2 * 8)) | (0xff << (3 * 8)))
+                          (0xee << (2 * 8)) | (0xff << (3 * 8)))

 /* options available for data cache on each page */
 enum dcache_option {
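The indexing described in this reply can be illustrated with a small sketch: four 8-bit attributes are packed into one 32-bit MAIR0 value, and a page table entry then refers to one of them by index. The helper names below are hypothetical. Only the byte values 0x00, 0x88, 0xee and 0xff come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Place an 8-bit attribute into field idx (0..3) of a 32-bit MAIR0 value. */
static uint32_t mair_field(uint8_t attr, unsigned int idx)
{
	assert(idx < 4);
	return (uint32_t)attr << (idx * 8);
}

/* Return the attribute byte that index idx selects. */
static uint8_t mair_lookup(uint32_t mair, unsigned int idx)
{
	assert(idx < 4);
	return (mair >> (idx * 8)) & 0xff;
}

/* The value of MEMORY_ATTRIBUTES after this patch: field 2 is 0xee. */
static uint32_t mair_after_patch(void)
{
	return mair_field(0x00, 0) | mair_field(0x88, 1) |
	       mair_field(0xee, 2) | mair_field(0xff, 3);
}
```

With this layout, a descriptor using index 2 (DCACHE_WRITEBACK) picks up 0xee. The 0xaa value suggested above for DCACHE_WRITETHROUGH would go into field 1 in the same way, but that change is not part of this patch.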

On 6/7/20 1:17 AM, Ard Biesheuvel wrote:
On Sat, 6 Jun 2020 at 22:14, Heinrich Schuchardt xypron.glpk@gmx.de wrote:
On 6/6/20 7:15 PM, Ard Biesheuvel wrote:
The LPAE version of DCACHE_WRITEBACK is currently defined as no-allocate for both reads and writes, which deviates from the non-LPAE definition, and mostly defeats the purpose of enabling the caches in the first place.
So align LPAE with !LPAE, and enable allocate-on-read.
Hello Ard,
thanks for analyzing why booting Linux on QEMU fails in some scenarios.
Do you know where in U-Boot the value for !LPAE is defined?
Non-LPAE ARMV7A has (in arch/arm/include/asm/system.h)
DCACHE_WRITETHROUGH = DCACHE_OFF | TTB_SECT_C_MASK,
DCACHE_WRITEBACK = DCACHE_WRITETHROUGH | TTB_SECT_B_MASK,
DCACHE_WRITEALLOC = DCACHE_WRITEBACK | TTB_SECT_TEX(1),
and so DCACHE_WRITEBACK has the C and B bits set in the block descriptor, and the TEX field set to 0x0.
G5.7.2 in the ARM ARM (DDI0487E.a) describes this as
Outer and Inner Write-Back, Read-Allocate No Write-Allocate
DCACHE_WRITEALLOC has the C and B bits set in the block descriptor, and the TEX field set to 0x1, which is described as
Outer and Inner Write-Back, Read-Allocate Write-Allocate
Signed-off-by: Ard Biesheuvel ardb@kernel.org
 arch/arm/include/asm/system.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h
index 7a40b56acdca..21b26557d28b 100644
--- a/arch/arm/include/asm/system.h
+++ b/arch/arm/include/asm/system.h
@@ -445,7 +445,7 @@ static inline void set_dacr(unsigned int val)
  * Memory types
  */
To me the lines below look like black magic.
In the comment above the definition, please, add a reference explaining where the values are defined and a comment explaining why the actual values are chosen.
Maybe this could be a starting point for the description:
"This constant is used to define memory attribute encodings in a Long-descriptor format translation table entry for stage 1 translations. It is used to set the Memory Attribute Indirection Registers MAIR and HMAIR. For details see [1,2].
[1] MAIR0, Memory Attribute Indirection Register 0
https://developer.arm.com/docs/ddi0595/b/aarch32-system-registers/mair0/a/DD...
[2] HMAIR0, Hyp Memory Attribute Indirection Register 0
https://developer.arm.com/docs/ddi0595/b/aarch32-system-registers/hmair0
"
Better refer to the ARM ARM for the A profile here (not R). [DDI0487E.a]
So the memory types are indexed: four fields of MAIR are populated with the four chosen memory types:
[0] Device-nGnRnE
[1] Outer and Inner Write-Through, Read-Allocate No Write-Allocate
[2] Outer and Inner Write-Back, Read-Allocate No Write-Allocate
[3] Outer and Inner Write-Back, Read-Allocate Write-Allocate
and the enum just selects one of these fields:
DCACHE_OFF = TTB_SECT | TTB_SECT_MAIR(0) | TTB_SECT_XN_MASK,
DCACHE_WRITETHROUGH = TTB_SECT | TTB_SECT_MAIR(1),
DCACHE_WRITEBACK = TTB_SECT | TTB_SECT_MAIR(2),
DCACHE_WRITEALLOC = TTB_SECT | TTB_SECT_MAIR(3),
BTW it seems DCACHE_WRITETHROUGH is also incorrect: this should be 0xaa for read-allocate as well.
This description would give a good starting point for other developers when supplied as a comment for MEMORY_ATTRIBUTES.
Best regards
Heinrich
 #define MEMORY_ATTRIBUTES ((0x00 << (0 * 8)) | (0x88 << (1 * 8)) | \
-                          (0xcc << (2 * 8)) | (0xff << (3 * 8)))
+                          (0xee << (2 * 8)) | (0xff << (3 * 8)))

 /* options available for data cache on each page */
 enum dcache_option {

QEMU's mach-virt machine only supports selecting CPU models that implement the virtualization extensions, and are therefore guaranteed to support LPAE as well.
Initially, QEMU would not allow emulating these CPUs running in HYP mode (or EL2, for AArch64), but today, it also contains a complete implementation of the virtualization extensions themselves.
This means we could be running U-Boot in HYP mode, in which case the LPAE long descriptor page table format is the only format that is supported. If we are not running in HYP mode, we can use either.
So let's enable CONFIG_ARMV7_LPAE for qemu_arm_defconfig so that we get the best support for running with the MMU and caches enabled at any privilege level.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 configs/qemu_arm_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/configs/qemu_arm_defconfig b/configs/qemu_arm_defconfig
index a8473988bd76..75bdce7708c7 100644
--- a/configs/qemu_arm_defconfig
+++ b/configs/qemu_arm_defconfig
@@ -1,5 +1,6 @@
 CONFIG_ARM=y
 CONFIG_ARM_SMCCC=y
+CONFIG_ARMV7_LPAE=y
 CONFIG_ARCH_QEMU=y
 CONFIG_ENV_SIZE=0x40000
 CONFIG_ENV_SECT_SIZE=0x40000

On 6/6/20 7:15 PM, Ard Biesheuvel wrote:
QEMU's mach-virt machine only supports selecting CPU models that implement the virtualization extensions, and are therefore guaranteed to support LPAE as well.
I wonder why qemu-system-arm -machine virt -cpu help lists cortex-a9 (which is not LPAE enabled).
But when I try to use it I get
qemu-system-arm: mach-virt: CPU type cortex-a9-arm-cpu not supported
This looks like a missing feature in QEMU.
The default CPU for machine=virt is arm,cortex-a15.
Acked-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Initially, QEMU would not allow emulating these CPUs running in HYP mode (or EL2, for AArch64), but today, it also contains a complete implementation of the virtualization extensions themselves.
This means we could be running U-Boot in HYP mode, in which case the LPAE long descriptor page table format is the only format that is supported. If we are not running in HYP mode, we can use either.
So let's enable CONFIG_ARMV7_LPAE for qemu_arm_defconfig so that we get the best support for running with the MMU and caches enabled at any privilege level.
Signed-off-by: Ard Biesheuvel ardb@kernel.org
 configs/qemu_arm_defconfig | 1 +
 1 file changed, 1 insertion(+)
diff --git a/configs/qemu_arm_defconfig b/configs/qemu_arm_defconfig
index a8473988bd76..75bdce7708c7 100644
--- a/configs/qemu_arm_defconfig
+++ b/configs/qemu_arm_defconfig
@@ -1,5 +1,6 @@
 CONFIG_ARM=y
 CONFIG_ARM_SMCCC=y
+CONFIG_ARMV7_LPAE=y
 CONFIG_ARCH_QEMU=y
 CONFIG_ENV_SIZE=0x40000
 CONFIG_ENV_SECT_SIZE=0x40000

On 6/6/20 10:32 PM, Heinrich Schuchardt wrote:
On 6/6/20 7:15 PM, Ard Biesheuvel wrote:
QEMU's mach-virt machine only supports selecting CPU models that implement the virtualization extensions, and are therefore guaranteed to support LPAE as well.
I wonder why qemu-system-arm -machine virt -cpu help lists cortex-a9 (which is not LPAE enabled).
But when I try to use it I get
qemu-system-arm: mach-virt: CPU type cortex-a9-arm-cpu not supported
This looks like a missing feature in QEMU.
The default CPU for machine=virt is arm,cortex-a15.
Acked-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Initially, QEMU would not allow emulating these CPUs running in HYP mode (or EL2, for AArch64), but today, it also contains a complete implementation of the virtualization extensions themselves.
This means we could be running U-Boot in HYP mode, in which case the LPAE long descriptor page table format is the only format that is supported. If we are not running in HYP mode, we can use either.
So let's enable CONFIG_ARMV7_LPAE for qemu_arm_defconfig so that we get the best support for running with the MMU and caches enabled at any privilege level.
Signed-off-by: Ard Biesheuvel ardb@kernel.org
You forgot to Cc the maintainer of QEMU ARM 'VIRT' BOARD. We have scripts/get_maintainer.pl to find the maintainers.
Cc: Tuomas Tynkkynen tuomas.tynkkynen@iki.fi
Best regards
Heinrich
 configs/qemu_arm_defconfig | 1 +
 1 file changed, 1 insertion(+)
diff --git a/configs/qemu_arm_defconfig b/configs/qemu_arm_defconfig
index a8473988bd76..75bdce7708c7 100644
--- a/configs/qemu_arm_defconfig
+++ b/configs/qemu_arm_defconfig
@@ -1,5 +1,6 @@
 CONFIG_ARM=y
 CONFIG_ARM_SMCCC=y
+CONFIG_ARMV7_LPAE=y
 CONFIG_ARCH_QEMU=y
 CONFIG_ENV_SIZE=0x40000
 CONFIG_ENV_SECT_SIZE=0x40000

On Sat, 6 Jun 2020 at 22:49, Heinrich Schuchardt xypron.glpk@gmx.de wrote:
On 6/6/20 10:32 PM, Heinrich Schuchardt wrote:
On 6/6/20 7:15 PM, Ard Biesheuvel wrote:
QEMU's mach-virt machine only supports selecting CPU models that implement the virtualization extensions, and are therefore guaranteed to support LPAE as well.
I wonder why qemu-system-arm -machine virt -cpu help lists cortex-a9 (which is not LPAE enabled).
But when I try to use it I get
qemu-system-arm: mach-virt: CPU type cortex-a9-arm-cpu not supported
This looks like a missing feature in QEMU.
This is not a missing feature. The virt board uses PSCI for powerdown and reset, and to bring up secondary cores. PSCI requires the HVC instruction, which is only available if the virt extensions are implemented.
So emulating CPUs without the virt extensions would require a replacement for PSCI to be implemented as well, which seems rather pointless to me.
The default CPU for machine=virt is arm,cortex-a15.
Acked-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Thanks.
Initially, QEMU would not allow emulating these CPUs running in HYP mode (or EL2, for AArch64), but today, it also contains a complete implementation of the virtualization extensions themselves.
This means we could be running U-Boot in HYP mode, in which case the LPAE long descriptor page table format is the only format that is supported. If we are not running in HYP mode, we can use either.
So let's enable CONFIG_ARMV7_LPAE for qemu_arm_defconfig so that we get the best support for running with the MMU and caches enabled at any privilege level.
Signed-off-by: Ard Biesheuvel ardb@kernel.org
You forgot to Cc the maintainer of QEMU ARM 'VIRT' BOARD. We have scripts/get_maintainer.pl to find the maintainers.
Cc: Tuomas Tynkkynen tuomas.tynkkynen@iki.fi
Apologies. I will cc Tuomas for v2.

On June 7, 2020 8:59:00 AM UTC, Ard Biesheuvel ardb@kernel.org wrote:
On Sat, 6 Jun 2020 at 22:49, Heinrich Schuchardt xypron.glpk@gmx.de wrote:
On 6/6/20 10:32 PM, Heinrich Schuchardt wrote:
On 6/6/20 7:15 PM, Ard Biesheuvel wrote:
QEMU's mach-virt machine only supports selecting CPU models that implement the virtualization extensions, and are therefore guaranteed to support LPAE as well.
I wonder why qemu-system-arm -machine virt -cpu help lists cortex-a9 (which is not LPAE enabled).
But when I try to use it I get
qemu-system-arm: mach-virt: CPU type cortex-a9-arm-cpu not supported
This looks like a missing feature in QEMU.
This is not a missing feature. The virt board uses PSCI for powerdown and reset, and to bring up secondary cores. PSCI requires the HVC instruction, which is only available if the virt extensions are implemented.
By missing feature I meant -cpu help output should be filtered according to the -machine value if provided.
So emulating CPUs without the virt extensions would require a replacement for PSCI to be implemented as well, which seems rather pointless to me.
The default CPU for machine=virt is arm,cortex-a15.
Acked-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Thanks.
Initially, QEMU would not allow emulating these CPUs running in HYP mode (or EL2, for AArch64), but today, it also contains a complete implementation of the virtualization extensions themselves.
This means we could be running U-Boot in HYP mode, in which case the LPAE long descriptor page table format is the only format that is supported. If we are not running in HYP mode, we can use either.
So let's enable CONFIG_ARMV7_LPAE for qemu_arm_defconfig so that we get the best support for running with the MMU and caches enabled at any privilege level.
Signed-off-by: Ard Biesheuvel ardb@kernel.org
You forgot to Cc the maintainer of QEMU ARM 'VIRT' BOARD. We have scripts/get_maintainer.pl to find the maintainers.
Cc: Tuomas Tynkkynen tuomas.tynkkynen@iki.fi
Apologies. I will cc Tuomas for v2.

On Sun, 7 Jun 2020 at 13:03, Heinrich Schuchardt xypron.glpk@gmx.de wrote:
On June 7, 2020 8:59:00 AM UTC, Ard Biesheuvel ardb@kernel.org wrote:
On Sat, 6 Jun 2020 at 22:49, Heinrich Schuchardt xypron.glpk@gmx.de wrote:
On 6/6/20 10:32 PM, Heinrich Schuchardt wrote:
On 6/6/20 7:15 PM, Ard Biesheuvel wrote:
QEMU's mach-virt machine only supports selecting CPU models that implement the virtualization extensions, and are therefore guaranteed to support LPAE as well.
I wonder why qemu-system-arm -machine virt -cpu help lists cortex-a9 (which is not LPAE enabled).
But when I try to use it I get
qemu-system-arm: mach-virt: CPU type cortex-a9-arm-cpu not supported
This looks like a missing feature in QEMU.
This is not a missing feature. The virt board uses PSCI for powerdown and reset, and to bring up secondary cores. PSCI requires the HVC instruction, which is only available if the virt extensions are implemented.
By missing feature I meant -cpu help output should be filtered according to the -machine value if provided.
Ah fair enough. Yes, that would be useful.
Unfortunately, ARM does not permit me to contribute to QEMU, so hopefully someone else can take this on.

Add an override for enable_caches to enable the I and D caches, along with the cached 1:1 mapping of all of DRAM. This is needed for running U-Boot under virtualization with QEMU/kvm.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 board/emulation/qemu-arm/qemu-arm.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/board/emulation/qemu-arm/qemu-arm.c b/board/emulation/qemu-arm/qemu-arm.c
index 69e8ef46f1f5..1b0d543b93c1 100644
--- a/board/emulation/qemu-arm/qemu-arm.c
+++ b/board/emulation/qemu-arm/qemu-arm.c
@@ -4,6 +4,7 @@
  */

 #include <common.h>
+#include <cpu_func.h>
 #include <dm.h>
 #include <fdtdec.h>
 #include <init.h>
@@ -94,6 +95,12 @@ void *board_fdt_blob_setup(void)
 	return (void *)CONFIG_SYS_SDRAM_BASE;
 }

+void enable_caches(void)
+{
+	icache_enable();
+	dcache_enable();
+}
+
 #if defined(CONFIG_EFI_RNG_PROTOCOL)
 #include <efi_loader.h>
 #include <efi_rng.h>

On 6/6/20 7:15 PM, Ard Biesheuvel wrote:
Add an override for enable_caches to enable the I and D caches, along with the cached 1:1 mapping of all of DRAM. This is needed for running U-Boot under virtualization with QEMU/kvm.
Signed-off-by: Ard Biesheuvel ardb@kernel.org
 board/emulation/qemu-arm/qemu-arm.c | 7 +++++++
 1 file changed, 7 insertions(+)
diff --git a/board/emulation/qemu-arm/qemu-arm.c b/board/emulation/qemu-arm/qemu-arm.c
index 69e8ef46f1f5..1b0d543b93c1 100644
--- a/board/emulation/qemu-arm/qemu-arm.c
+++ b/board/emulation/qemu-arm/qemu-arm.c
@@ -4,6 +4,7 @@
  */

 #include <common.h>
+#include <cpu_func.h>
 #include <dm.h>
 #include <fdtdec.h>
 #include <init.h>
@@ -94,6 +95,12 @@ void *board_fdt_blob_setup(void)
 	return (void *)CONFIG_SYS_SDRAM_BASE;
 }

+void enable_caches(void)
+{
+	icache_enable();
+	dcache_enable();
+}
For other ARM architectures I have seen:
int arch_cpu_init(void)
{
	icache_enable();
	return 0;
}

void enable_caches(void)
{
	dcache_enable();
}
Some boards have
if (!icache_status())
	icache_enable();
others
#if !CONFIG_IS_ENABLED(SYS_ICACHE_OFF)
	icache_enable();
#endif
Tom, could you please advise what the correct way to do this is?
Best regards
Heinrich
#if defined(CONFIG_EFI_RNG_PROTOCOL) #include <efi_loader.h> #include <efi_rng.h>

On Sat, Jun 06, 2020 at 10:50:59PM +0200, Heinrich Schuchardt wrote:
On 6/6/20 7:15 PM, Ard Biesheuvel wrote:
Add an override for enable_caches to enable the I and D caches, along with the cached 1:1 mapping of all of DRAM. This is needed for running U-Boot under virtualization with QEMU/kvm.
Signed-off-by: Ard Biesheuvel ardb@kernel.org
 board/emulation/qemu-arm/qemu-arm.c | 7 +++++++
 1 file changed, 7 insertions(+)
diff --git a/board/emulation/qemu-arm/qemu-arm.c b/board/emulation/qemu-arm/qemu-arm.c
index 69e8ef46f1f5..1b0d543b93c1 100644
--- a/board/emulation/qemu-arm/qemu-arm.c
+++ b/board/emulation/qemu-arm/qemu-arm.c
@@ -4,6 +4,7 @@
  */

 #include <common.h>
+#include <cpu_func.h>
 #include <dm.h>
 #include <fdtdec.h>
 #include <init.h>
@@ -94,6 +95,12 @@ void *board_fdt_blob_setup(void)
 	return (void *)CONFIG_SYS_SDRAM_BASE;
 }

+void enable_caches(void)
+{
+	icache_enable();
+	dcache_enable();
+}
For other ARM architectures I have seen:
int arch_cpu_init(void)
{
	icache_enable();
	return 0;
}

void enable_caches(void)
{
	dcache_enable();
}
Some boards have
if (!icache_status())
	icache_enable();
others
#if !CONFIG_IS_ENABLED(SYS_ICACHE_OFF)
	icache_enable();
#endif
Tom, could you please advise what the correct way to do this is?
Off-hand? Dealing with it per SoC/mach/board as we do today. Sometimes we need a dcache_disable() before the dcache_enable() as well to clear out previous mappings. So dropping this in here is probably the best choice for qemu-arm. Thanks!
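The variants mentioned in this subthread can be combined into one illustrative sketch. It is not the code from the patch: the cache helpers are replaced by host-side mocks so the example runs anywhere, and the flush_stale_mappings parameter is a hypothetical knob for the "dcache_disable() before dcache_enable()" case described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Host-side stand-ins for the cache helpers; state is tracked in flags. */
static bool mock_icache_on, mock_dcache_on;
static int mock_dcache_enable_calls;

static int icache_status(void) { return mock_icache_on; }
static void icache_enable(void) { mock_icache_on = true; }
static void dcache_disable(void) { mock_dcache_on = false; }
static void dcache_enable(void)
{
	mock_dcache_on = true;
	mock_dcache_enable_calls++;
}

/* One possible board-level override, written against the mocks above. */
static void enable_caches_sketch(bool flush_stale_mappings)
{
	/* Skip the I-cache call if an earlier init stage already did it. */
	if (!icache_status())
		icache_enable();

	/* Some platforms need to drop previous mappings first. */
	if (flush_stale_mappings)
		dcache_disable();

	dcache_enable();
}
```

Whether the status check or the extra disable is needed depends on the SoC or board, which is why the decision is left per platform.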

The QEMU/mach-virt targeted port of U-Boot currently only runs on QEMU under TCG emulation, which does not model the caches at all, and so no users can exist that are relying on the GRUB hack for EFI boot.
We will shortly enable support for running under KVM, but the GRUB hack (which disables all caches without doing cache cleaning by VA during ExitBootServices()) is likely to cause more problems than it solves, given that KVM hosts require correct maintenance if they incorporate non-architected system caches.
So let's disable the GRUB hack by default on the QEMU/mach-virt port.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 configs/qemu_arm_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/configs/qemu_arm_defconfig b/configs/qemu_arm_defconfig
index 75bdce7708c7..1d2b4437cb07 100644
--- a/configs/qemu_arm_defconfig
+++ b/configs/qemu_arm_defconfig
@@ -47,3 +47,4 @@ CONFIG_USB=y
 CONFIG_DM_USB=y
 CONFIG_USB_EHCI_HCD=y
 CONFIG_USB_EHCI_PCI=y
+# CONFIG_EFI_GRUB_ARM32_WORKAROUND is not set

On 6/6/20 7:15 PM, Ard Biesheuvel wrote:
The QEMU/mach-virt targeted port of u-boot currently only runs on QEMU under TCG emulation, which does not model the caches at all, and so no users can exist that are relying on the GRUB hack for EFI boot.
We will shortly enable support for running under KVM, but the GRUB hack (which disables all caches without doing cache cleaning by VA during ExitBootServices()) is likely to cause more problems than it solves, given that KVM hosts require correct maintenance if they incorporate non-architected system caches.
So let's disable the GRUB hack by default on the QEMU/mach-virt port.
Signed-off-by: Ard Biesheuvel ardb@kernel.org
This patch could be merged with 2/5. You are changing the same defconfig.
Reviewed-by: Heinrich Schuchardt xypron.glpk@gmx.de
 configs/qemu_arm_defconfig | 1 +
 1 file changed, 1 insertion(+)
diff --git a/configs/qemu_arm_defconfig b/configs/qemu_arm_defconfig
index 75bdce7708c7..1d2b4437cb07 100644
--- a/configs/qemu_arm_defconfig
+++ b/configs/qemu_arm_defconfig
@@ -47,3 +47,4 @@ CONFIG_USB=y
 CONFIG_DM_USB=y
 CONFIG_USB_EHCI_HCD=y
 CONFIG_USB_EHCI_PCI=y
+# CONFIG_EFI_GRUB_ARM32_WORKAROUND is not set

Some instructions in the ARM ISA have multiple output registers, such as ldrd/ldp (load pair), where two registers are loaded from memory, but also ldr with indexing, where the memory base register is incremented as well when the value is loaded to the destination register.
MMIO emulation under KVM is based on using the architecturally defined syndrome information that is provided when an exception is taken to the hypervisor. This syndrome information describes whether the instruction that triggered the exception is a load or a store, what the faulting address was, and which register was the destination register.
This syndrome information can only describe one destination register, and when the trapping instruction is one with multiple outputs, KVM throws an error like
kvm [615929]: Data abort outside memslots with no valid syndrome info
on the host and kills the QEMU process with the following error:
U-Boot 2020.07-rc3-00208-g88bd5b179360-dirty (Jun 06 2020 - 11:59:22 +0200)

DRAM:  1 GiB
Flash: error: kvm run failed Function not implemented
R00=00000001 R01=00000040 R02=7ee0ce20 R03=00000000
R04=7ffd9eec R05=00000004 R06=7ffda3f8 R07=00000055
R08=7ffd9eec R09=7ef0ded0 R10=7ee0ce20 R11=00000000
R12=00000004 R13=7ee0cdf8 R14=00000000 R15=7ff72d08
PSR=200001d3 --C- A svc32
QEMU: Terminated
This means that, in order to run U-Boot in QEMU under KVM, we need to avoid such instructions when accessing emulated devices. For the flash in particular, which is a hybrid between a ROM (backed by a memslot) when in array mode, and an emulated MMIO device (when in write mode), we need to take care to only use instructions that KVM can deal with when they trap.
So override the flash accessors that are used when running on QEMU. Note that the write accessors are included for completeness, but the read accessors are the ones that need this special care.
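As a rough illustration of the syndrome information discussed above, the following sketch decodes the fields a hypervisor would look at for a trapped data abort. It assumes the AArch64 ESR_EL2 ISS layout (ISV in bit 24, SAS in bits 23:22, SRT in bits 20:16, WnR in bit 6). The struct and function names are hypothetical and are not taken from KVM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct mmio_access {
	bool valid;		/* ISV: the fields below describe the access */
	bool is_write;		/* WnR: write, not read */
	unsigned int size;	/* access size in bytes, derived from SAS */
	unsigned int reg;	/* SRT: the single transfer register */
};

/* Decode the instruction-specific syndrome of a data abort. */
static struct mmio_access decode_data_abort_iss(uint32_t iss)
{
	struct mmio_access a = { 0 };

	a.valid = (iss >> 24) & 1;
	if (!a.valid)
		return a;	/* nothing else can be trusted */

	a.is_write = (iss >> 6) & 1;
	a.size = 1u << ((iss >> 22) & 3);
	a.reg = (iss >> 16) & 0x1f;
	return a;
}
```

There is room for only one register number, so an instruction that updates two registers cannot be described this way. In that case the valid bit is left clear and the hypervisor has nothing to emulate from.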
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 board/emulation/qemu-arm/qemu-arm.c | 55 ++++++++++++++++++++
 include/configs/qemu-arm.h          |  1 +
 2 files changed, 56 insertions(+)

diff --git a/board/emulation/qemu-arm/qemu-arm.c b/board/emulation/qemu-arm/qemu-arm.c
index 1b0d543b93c1..32e18fd8b985 100644
--- a/board/emulation/qemu-arm/qemu-arm.c
+++ b/board/emulation/qemu-arm/qemu-arm.c
@@ -142,3 +142,58 @@ efi_status_t platform_get_rng_device(struct udevice **dev)
 	return EFI_SUCCESS;
 }
 #endif /* CONFIG_EFI_RNG_PROTOCOL */
+
+#ifdef CONFIG_ARM64
+#define __W	"w"
+#else
+#define __W
+#endif
+
+void flash_write8(u8 value, void *addr)
+{
+	asm("strb %" __W "1, %0" : "=m"(*(u8 *)addr) : "r"(value));
+}
+
+void flash_write16(u16 value, void *addr)
+{
+	asm("strh %" __W "1, %0" : "=m"(*(u16 *)addr) : "r"(value));
+}
+
+void flash_write32(u32 value, void *addr)
+{
+	asm("str %" __W "1, %0" : "=m"(*(u32 *)addr) : "r"(value));
+}
+
+void flash_write64(u64 value, void *addr)
+{
+	BUG(); /* FLASH_CFI_64BIT is not implemented by QEMU */
+}
+
+u8 flash_read8(void *addr)
+{
+	u8 ret;
+
+	asm("ldrb %" __W "0, %1" : "=r"(ret) : "m"(*(u8 *)addr));
+	return ret;
+}
+
+u16 flash_read16(void *addr)
+{
+	u16 ret;
+
+	asm("ldrh %" __W "0, %1" : "=r"(ret) : "m"(*(u16 *)addr));
+	return ret;
+}
+
+u32 flash_read32(void *addr)
+{
+	u32 ret;
+
+	asm("ldr %" __W "0, %1" : "=r"(ret) : "m"(*(u32 *)addr));
+	return ret;
+}
+
+u64 flash_read64(void *addr)
+{
+	BUG(); /* FLASH_CFI_64BIT is not implemented by QEMU */
+}
diff --git a/include/configs/qemu-arm.h b/include/configs/qemu-arm.h
index 1ef75a87836b..bc8b7c5c1238 100644
--- a/include/configs/qemu-arm.h
+++ b/include/configs/qemu-arm.h
@@ -53,5 +53,6 @@
 #define CONFIG_SYS_MAX_FLASH_BANKS	2
 #endif
 #define CONFIG_SYS_MAX_FLASH_SECT	256 /* Sector: 256K, Bank: 64M */
+#define CONFIG_CFI_FLASH_USE_WEAK_ACCESSORS

 #endif /* __CONFIG_H */
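For readers who want to experiment outside of U-Boot, here is a standalone variation on the read accessor idea. It differs from the patch in one deliberate way: the address is passed in a register and the addressing mode is spelled out as plain [base], so the compiler has no freedom to pick a writeback form. The function name is hypothetical, and the non-ARM branch is only a fallback so the sketch also builds on a development host.

```c
#include <assert.h>
#include <stdint.h>

/* Read 32 bits with a load that has exactly one destination register. */
static inline uint32_t mmio_read32_single_reg(const volatile void *addr)
{
	uint32_t ret;

	/* The access is expected to be naturally aligned. */
	assert(((uintptr_t)addr & 3) == 0);

#if defined(__aarch64__)
	__asm__ __volatile__("ldr %w0, [%1]" : "=r"(ret) : "r"(addr) : "memory");
#elif defined(__arm__)
	__asm__ __volatile__("ldr %0, [%1]" : "=r"(ret) : "r"(addr) : "memory");
#else
	/* Host fallback: an ordinary volatile load, for testing only. */
	ret = *(const volatile uint32_t *)addr;
#endif
	return ret;
}
```

The "memory" clobber is a conservative choice that keeps the compiler from reordering other memory accesses around the load. The patch itself uses an "m" operand instead.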

On 6/6/20 7:15 PM, Ard Biesheuvel wrote:
Some instructions in the ARM ISA have multiple output registers, such as ldrd/ldp (load pair), where two registers are loaded from memory, but also ldr with indexing, where the memory base register is incremented as well when the value is loaded to the destination register.
MMIO emulation under KVM is based on using the architecturally defined syndrome information that is provided when an exception is taken to the hypervisor. This syndrome information describes whether the instruction that triggered the exception is a load or a store, what the faulting address was, and which register was the destination register.
This syndrome information can only describe one destination register, and when the trapping instruction is one with multiple outputs, KVM throws an error like
kvm [615929]: Data abort outside memslots with no valid syndrome info
on the host and kills the QEMU process with the following error:
U-Boot 2020.07-rc3-00208-g88bd5b179360-dirty (Jun 06 2020 - 11:59:22 +0200)

DRAM:  1 GiB
Flash: error: kvm run failed Function not implemented
R00=00000001 R01=00000040 R02=7ee0ce20 R03=00000000
R04=7ffd9eec R05=00000004 R06=7ffda3f8 R07=00000055
R08=7ffd9eec R09=7ef0ded0 R10=7ee0ce20 R11=00000000
R12=00000004 R13=7ee0cdf8 R14=00000000 R15=7ff72d08
PSR=200001d3 --C- A svc32
QEMU: Terminated
This means that, in order to run U-Boot in QEMU under KVM, we need to avoid such instructions when accessing emulated devices. For the flash in particular, which is a hybrid between a ROM (backed by a memslot) when in array mode, and an emulated MMIO device (when in write mode), we need to take care to only use instructions that KVM can deal with when they trap.
So override the flash accessors that are used when running on QEMU. Note that the write accessors are included for completeness, but the read accessors are the ones that need this special care.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 board/emulation/qemu-arm/qemu-arm.c | 55 ++++++++++++++++++++
 include/configs/qemu-arm.h          |  1 +
 2 files changed, 56 insertions(+)

diff --git a/board/emulation/qemu-arm/qemu-arm.c b/board/emulation/qemu-arm/qemu-arm.c
index 1b0d543b93c1..32e18fd8b985 100644
--- a/board/emulation/qemu-arm/qemu-arm.c
+++ b/board/emulation/qemu-arm/qemu-arm.c
@@ -142,3 +142,58 @@ efi_status_t platform_get_rng_device(struct udevice **dev)
 	return EFI_SUCCESS;
 }
 #endif /* CONFIG_EFI_RNG_PROTOCOL */
+
+#ifdef CONFIG_ARM64
+#define __W "w"
+#else
+#define __W
+#endif
+
+void flash_write8(u8 value, void *addr)
+{
+	asm("strb %" __W "1, %0" : "=m"(*(u8 *)addr) : "r"(value));
+}
+
+void flash_write16(u16 value, void *addr)
+{
+	asm("strh %" __W "1, %0" : "=m"(*(u16 *)addr) : "r"(value));
+}
+
+void flash_write32(u32 value, void *addr)
+{
+	asm("str %" __W "1, %0" : "=m"(*(u32 *)addr) : "r"(value));
+}
+
+void flash_write64(u64 value, void *addr)
+{
+	BUG(); /* FLASH_CFI_64BIT is not implemented by QEMU */
Why should this BUG() on aarch64? Why is panicking in U-Boot better than crashing in QEMU? Why can't this be realized as two 32bit writes?
This seems to be wrong: drivers/mtd/cfi_flash.c:179: /* No architectures currently implement __raw_readq() */
I find definitions for all architectures.
So shouldn't you correct cfi_flash.c and override __arch_getq() and __arch_putq() instead?
+}
+
+u8 flash_read8(void *addr)
+{
+	u8 ret;
+
+	asm("ldrb %" __W "0, %1" : "=r"(ret) : "m"(*(u8 *)addr));
+	return ret;
+}
+
+u16 flash_read16(void *addr)
+{
+	u16 ret;
+
+	asm("ldrh %" __W "0, %1" : "=r"(ret) : "m"(*(u16 *)addr));
+	return ret;
+}
+
+u32 flash_read32(void *addr)
+{
+	u32 ret;
+
+	asm("ldr %" __W "0, %1" : "=r"(ret) : "m"(*(u32 *)addr));
+	return ret;
+}
+
+u64 flash_read64(void *addr)
+{
+	BUG(); /* FLASH_CFI_64BIT is not implemented by QEMU */
Same here.
Best regards
Heinrich
+}
diff --git a/include/configs/qemu-arm.h b/include/configs/qemu-arm.h
index 1ef75a87836b..bc8b7c5c1238 100644
--- a/include/configs/qemu-arm.h
+++ b/include/configs/qemu-arm.h
@@ -53,5 +53,6 @@
 #define CONFIG_SYS_MAX_FLASH_BANKS 2
 #endif
 #define CONFIG_SYS_MAX_FLASH_SECT 256 /* Sector: 256K, Bank: 64M */
+#define CONFIG_CFI_FLASH_USE_WEAK_ACCESSORS
 
 #endif /* __CONFIG_H */

On Sat, 6 Jun 2020 at 23:08, Heinrich Schuchardt <xypron.glpk@gmx.de> wrote:
On 6/6/20 7:15 PM, Ard Biesheuvel wrote:
Some instructions in the ARM ISA have multiple output registers, such as ldrd/ldp (load pair), where two registers are loaded from memory, but also ldr with indexing, where the memory base register is incremented as well when the value is loaded to the destination register.
MMIO emulation under KVM is based on using the architecturally defined syndrome information that is provided when an exception is taken to the hypervisor. This syndrome information describes whether the instruction that triggered the exception is a load or a store, what the faulting address was, and which register was the destination register.
This syndrome information can only describe one destination register, and when the trapping instruction is one with multiple outputs, KVM throws an error like
kvm [615929]: Data abort outside memslots with no valid syndrome info
on the host and kills the QEMU process with the following error:
U-Boot 2020.07-rc3-00208-g88bd5b179360-dirty (Jun 06 2020 - 11:59:22 +0200)

DRAM:  1 GiB
Flash: error: kvm run failed Function not implemented
R00=00000001 R01=00000040 R02=7ee0ce20 R03=00000000
R04=7ffd9eec R05=00000004 R06=7ffda3f8 R07=00000055
R08=7ffd9eec R09=7ef0ded0 R10=7ee0ce20 R11=00000000
R12=00000004 R13=7ee0cdf8 R14=00000000 R15=7ff72d08
PSR=200001d3 --C- A svc32
QEMU: Terminated
This means that, in order to run U-Boot in QEMU under KVM, we need to avoid such instructions when accessing emulated devices. For the flash in particular, which is a hybrid between a ROM (backed by a memslot) when in array mode, and an emulated MMIO device (when in write mode), we need to take care to only use instructions that KVM can deal with when they trap.
So override the flash accessors that are used when running on QEMU. Note that the write accessors are included for completeness, but the read accessors are the ones that need this special care.
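As a rough sketch of what "only use instructions that KVM can deal with" means for a caller (not part of the patch; all names here are hypothetical): the load itself is written out as a single-register instruction, and the address arithmetic stays in C, so a pointer increment can never be folded into the load as a writeback. On non-Arm hosts a plain volatile load stands in so the sketch still builds.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical single-register 32-bit load. On Arm the instruction is
 * spelled out so the compiler cannot pick a pair or writeback form;
 * elsewhere a plain volatile load keeps the sketch buildable.
 */
static inline uint32_t sketch_read32(const volatile void *addr)
{
	uint32_t ret;

#if defined(__aarch64__)
	__asm__ volatile("ldr %w0, [%1]" : "=r"(ret) : "r"(addr) : "memory");
#elif defined(__arm__)
	__asm__ volatile("ldr %0, [%1]" : "=r"(ret) : "r"(addr) : "memory");
#else
	ret = *(const volatile uint32_t *)addr;
#endif
	return ret;
}

/*
 * Copy 'count' words out of a device window. Each address is computed
 * in C and handed to the accessor, so the increment is a separate
 * instruction rather than part of the load.
 */
void sketch_read_words(uint32_t *dst, const volatile void *src, size_t count)
{
	const volatile uint8_t *p = src;
	size_t i;

	for (i = 0; i < count; i++)
		dst[i] = sketch_read32(p + i * sizeof(uint32_t));
}
```

The patch itself uses an "m" memory operand instead of an explicit base register; the explicit `[%1]` form here is just one conservative way to pin the addressing mode.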
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 board/emulation/qemu-arm/qemu-arm.c | 55 ++++++++++++++++++++
 include/configs/qemu-arm.h          |  1 +
 2 files changed, 56 insertions(+)

diff --git a/board/emulation/qemu-arm/qemu-arm.c b/board/emulation/qemu-arm/qemu-arm.c
index 1b0d543b93c1..32e18fd8b985 100644
--- a/board/emulation/qemu-arm/qemu-arm.c
+++ b/board/emulation/qemu-arm/qemu-arm.c
@@ -142,3 +142,58 @@ efi_status_t platform_get_rng_device(struct udevice **dev)
 	return EFI_SUCCESS;
 }
 #endif /* CONFIG_EFI_RNG_PROTOCOL */
+
+#ifdef CONFIG_ARM64
+#define __W "w"
+#else
+#define __W
+#endif
+
+void flash_write8(u8 value, void *addr)
+{
+	asm("strb %" __W "1, %0" : "=m"(*(u8 *)addr) : "r"(value));
+}
+
+void flash_write16(u16 value, void *addr)
+{
+	asm("strh %" __W "1, %0" : "=m"(*(u16 *)addr) : "r"(value));
+}
+
+void flash_write32(u32 value, void *addr)
+{
+	asm("str %" __W "1, %0" : "=m"(*(u32 *)addr) : "r"(value));
+}
+
+void flash_write64(u64 value, void *addr)
+{
+	BUG(); /* FLASH_CFI_64BIT is not implemented by QEMU */
Why should this BUG() on aarch64?
QEMU's CFI emulation does not implement 8 byte width, so there is no point in implementing this accessor, as it will never be called anyway.
The BUG() is there to ensure that *if* QEMU ever does get support for 8 byte wide CFI flash, we notice immediately, rather than having to debug weird failures.
Why is panicking in U-Boot better than crashing in QEMU?
Because U-Boot crashes in the guest, while QEMU crashes in the host.
Why can't this be realized as two 32bit writes?
It could be, but there is no point: QEMU will never exercise this code path anyway.
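For reference, a minimal sketch of the two-32-bit-write alternative raised in the question (hypothetical, not part of the patch, and never exercised under QEMU as noted above). It assumes that writing the low word to the lower address first is acceptable; whether a real 64-bit-wide CFI device would tolerate a split write is not established in this thread. The function names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical single-register 32-bit store, with a portable fallback. */
static inline void sketch_write32(uint32_t value, volatile void *addr)
{
#if defined(__aarch64__)
	__asm__ volatile("str %w0, [%1]" : : "r"(value), "r"(addr) : "memory");
#elif defined(__arm__)
	__asm__ volatile("str %0, [%1]" : : "r"(value), "r"(addr) : "memory");
#else
	*(volatile uint32_t *)addr = value;
#endif
}

/*
 * Split one 64-bit write into two 32-bit stores.
 * Assumption: low word at the lower address, written first.
 */
void sketch_flash_write64(uint64_t value, volatile void *addr)
{
	volatile uint8_t *p = addr;

	sketch_write32((uint32_t)value, p);
	sketch_write32((uint32_t)(value >> 32), p + 4);
}
```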
This seems to be wrong: drivers/mtd/cfi_flash.c:179: /* No architectures currently implement __raw_readq() */
I find definitions for all architectures.
So shouldn't you correct cfi_flash.c and override __arch_getq() and __arch_putq() instead?
If anyone wants to fix 8 byte CFI, they are welcome to do so, but it is a separate issue.
+}
+
+u8 flash_read8(void *addr)
+{
+	u8 ret;
+
+	asm("ldrb %" __W "0, %1" : "=r"(ret) : "m"(*(u8 *)addr));
+	return ret;
+}
+
+u16 flash_read16(void *addr)
+{
+	u16 ret;
+
+	asm("ldrh %" __W "0, %1" : "=r"(ret) : "m"(*(u16 *)addr));
+	return ret;
+}
+
+u32 flash_read32(void *addr)
+{
+	u32 ret;
+
+	asm("ldr %" __W "0, %1" : "=r"(ret) : "m"(*(u32 *)addr));
+	return ret;
+}
+
+u64 flash_read64(void *addr)
+{
+	BUG(); /* FLASH_CFI_64BIT is not implemented by QEMU */
Same here.
Best regards
Heinrich
+}
diff --git a/include/configs/qemu-arm.h b/include/configs/qemu-arm.h
index 1ef75a87836b..bc8b7c5c1238 100644
--- a/include/configs/qemu-arm.h
+++ b/include/configs/qemu-arm.h
@@ -53,5 +53,6 @@
 #define CONFIG_SYS_MAX_FLASH_BANKS 2
 #endif
 #define CONFIG_SYS_MAX_FLASH_SECT 256 /* Sector: 256K, Bank: 64M */
+#define CONFIG_CFI_FLASH_USE_WEAK_ACCESSORS
 
 #endif /* __CONFIG_H */