[PATCH 0/3] arm: smh: Fix and improve semihosting traps

While playing around with QEMU's semihosting implementation, I stared at our semihosting trap code, and found some issues, that this mini-series fixes.
Please have a look!
Cheers, Andre
Andre Przywara (3): arm: smh: specify Thumb trap instruction arm: smh: Make semihosting trap calls more robust arm: smh: Allow semihosting trap calls to be inlined
arch/arm/lib/semihosting.c | 44 +++++++++++++++++++++++++++----------- 1 file changed, 31 insertions(+), 13 deletions(-)

The ARM semihosting interface uses different trap instructions for different architectures and instruction sets. So far we were using AArch64 and ARMv7-M, and had an untested v7-A entry. The latter does not work when building for Thumb, as can be verified by using qemu_arm_defconfig, then enabling SEMIHOSTING and SYS_THUMB_BUILD: ========== {standard input}:35: Error: invalid swi expression {standard input}:35: Error: value of 1193046 too large for field of 2 bytes at 0 ==========
Fix this by providing the recommended instruction[1] for Thumb, and using the ARM instruction only when not building for Thumb. This also removes some comment, as QEMU for ARM allows to now test this case. Also use the opportunity to clean up the inline assembly, and just define the actual trap instruction inside #ifdef's, to improve readability.
[1] https://developer.arm.com/documentation/dui0471/g/Semihosting/The-semihostin...
Signed-off-by: Andre Przywara andre.przywara@arm.com --- arch/arm/lib/semihosting.c | 29 ++++++++++++++++------------- 1 file changed, 16 insertions(+), 13 deletions(-)
diff --git a/arch/arm/lib/semihosting.c b/arch/arm/lib/semihosting.c index 01d652a6b83..acc6b1be4f3 100644 --- a/arch/arm/lib/semihosting.c +++ b/arch/arm/lib/semihosting.c @@ -4,11 +4,6 @@ * Copyright 2014 Broadcom Corporation */
-/* - * This code has been tested on arm64/aarch64 fastmodel only. An untested - * placeholder exists for armv7 architectures, but since they are commonly - * available in silicon now, fastmodel usage makes less sense for them. - */ #include <common.h> #include <log.h> #include <semihosting.h> @@ -25,20 +20,28 @@ #define SYSFLEN 0x0C #define SYSERRNO 0x13
+#if defined(CONFIG_ARM64) + #define SMH_TRAP "hlt #0xf000" +#elif defined(CONFIG_CPU_V7M) + #define SMH_TRAP "bkpt #0xAB" +#elif defined(CONFIG_SYS_THUMB_BUILD) + #define SMH_TRAP "svc #0xab" +#else + #define SMH_TRAP "svc #0x123456" +#endif + /* * Call the handler */ static noinline long smh_trap(unsigned int sysnum, void *addr) { register long result asm("r0"); -#if defined(CONFIG_ARM64) - asm volatile ("hlt #0xf000" : "=r" (result) : "0"(sysnum), "r"(addr) : "memory"); -#elif defined(CONFIG_CPU_V7M) - asm volatile ("bkpt #0xAB" : "=r" (result) : "0"(sysnum), "r"(addr) : "memory"); -#else - /* Note - untested placeholder */ - asm volatile ("svc #0x123456" : "=r" (result) : "0"(sysnum), "r"(addr) : "memory"); -#endif + + asm volatile (SMH_TRAP + : "=r" (result) + : "0"(sysnum), "r"(addr) + : "memory"); + return result; }

On Wed, Oct 05, 2022 at 05:38:47PM +0100, Andre Przywara wrote:
The ARM semihosting interface uses different trap instructions for different architectures and instruction sets. So far we were using AArch64 and ARMv7-M, and had an untested v7-A entry. The latter does not work when building for Thumb, as can be verified by using qemu_arm_defconfig, then enabling SEMIHOSTING and SYS_THUMB_BUILD: ========== {standard input}:35: Error: invalid swi expression {standard input}:35: Error: value of 1193046 too large for field of 2 bytes at 0 ==========
Fix this by providing the recommended instruction[1] for Thumb, and using the ARM instruction only when not building for Thumb. This also removes some comment, as QEMU for ARM allows to now test this case. Also use the opportunity to clean up the inline assembly, and just define the actual trap instruction inside #ifdef's, to improve readability.
[1] https://developer.arm.com/documentation/dui0471/g/Semihosting/The-semihostin...
Signed-off-by: Andre Przywara andre.przywara@arm.com
Applied to u-boot/master, thanks!

Commit f4b540e25c5c("arm: smh: Fix uninitialized parameters with newer GCCs") added a memory clobber to the semihosting inline assembly trap calls, to avoid too eager GCC optimisation: when passing a pointer, newer compilers couldn't be bothered to actually fill in the structure that it pointed to, as this data would seemingly never be used (at least from the compiler's point of view). But instead of the memory clobber we need to tell the compiler that we are passing an *array* instead of some generic pointer, this forces the compiler to actually populate the data structure. This involves some rather hideous cast, which is best hidden in a macro.
But regardless of that, we actually need the memory clobber, but for two different reasons: explain them in comments.
Signed-off-by: Andre Przywara andre.przywara@arm.com --- arch/arm/lib/semihosting.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/arch/arm/lib/semihosting.c b/arch/arm/lib/semihosting.c index acc6b1be4f3..3dee7d51b36 100644 --- a/arch/arm/lib/semihosting.c +++ b/arch/arm/lib/semihosting.c @@ -20,6 +20,12 @@ #define SYSFLEN 0x0C #define SYSERRNO 0x13
+/* + * Macro to force the compiler to *populate* memory (for an array or struct) + * before passing the pointer to an inline assembly call. + */ +#define USE_PTR(ptr) *(const char (*)[]) (ptr) + #if defined(CONFIG_ARM64) #define SMH_TRAP "hlt #0xf000" #elif defined(CONFIG_CPU_V7M) @@ -37,9 +43,17 @@ static noinline long smh_trap(unsigned int sysnum, void *addr) { register long result asm("r0");
+ /* + * We need a memory clobber (aka compiler barrier) for two reasons: + * - The compiler needs to populate any data structures pointed to + * by "addr" *before* the trap instruction is called. + * - At least the SYSREAD function puts the result into memory pointed + * to by "addr", so the compiler must not use a cached version of + * the previous content, after the call has finished. + */ asm volatile (SMH_TRAP : "=r" (result) - : "0"(sysnum), "r"(addr) + : "0"(sysnum), "r"(USE_PTR(addr)) : "memory");
return result;

On Wed, Oct 05, 2022 at 05:38:48PM +0100, Andre Przywara wrote:
Commit f4b540e25c5c("arm: smh: Fix uninitialized parameters with newer GCCs") added a memory clobber to the semihosting inline assembly trap calls, to avoid too eager GCC optimisation: when passing a pointer, newer compilers couldn't be bothered to actually fill in the structure that it pointed to, as this data would seemingly never be used (at least from the compiler's point of view). But instead of the memory clobber we need to tell the compiler that we are passing an *array* instead of some generic pointer, this forces the compiler to actually populate the data structure. This involves some rather hideous cast, which is best hidden in a macro.
But regardless of that, we actually need the memory clobber, but for two different reasons: explain them in comments.
Signed-off-by: Andre Przywara andre.przywara@arm.com
Applied to u-boot/master, thanks!

Currently our semihosting trap function is somewhat fragile: we rely on the current compiler behaviour to assign the second inline assembly argument to the next free register (r1/x1), which happens to be the "addr" argument to the smh_trap() function (per the calling convention). I guess this is also the reason for the noinline attribute.
Make it explicit what we want: the "addr" argument needs to go into r1, so we add another register variable. This allows to drop the "noinline" attribute, so now the compiler beautifully inlines just the trap instruction directly into the calling function.
Signed-off-by: Andre Przywara andre.przywara@arm.com --- arch/arm/lib/semihosting.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/arm/lib/semihosting.c b/arch/arm/lib/semihosting.c index 3dee7d51b36..939c0f75132 100644 --- a/arch/arm/lib/semihosting.c +++ b/arch/arm/lib/semihosting.c @@ -39,9 +39,10 @@ /* * Call the handler */ -static noinline long smh_trap(unsigned int sysnum, void *addr) +static long smh_trap(unsigned int sysnum, void *addr) { register long result asm("r0"); + register void *_addr asm("r1") = addr;
/* * We need a memory clobber (aka compiler barrier) for two reasons: @@ -53,7 +54,7 @@ static noinline long smh_trap(unsigned int sysnum, void *addr) */ asm volatile (SMH_TRAP : "=r" (result) - : "0"(sysnum), "r"(USE_PTR(addr)) + : "0"(sysnum), "r"(USE_PTR(_addr)) : "memory");
return result;

On Wed, Oct 05, 2022 at 05:38:49PM +0100, Andre Przywara wrote:
Currently our semihosting trap function is somewhat fragile: we rely on the current compiler behaviour to assign the second inline assembly argument to the next free register (r1/x1), which happens to be the "addr" argument to the smh_trap() function (per the calling convention). I guess this is also the reason for the noinline attribute.
Make it explicit what we want: the "addr" argument needs to go into r1, so we add another register variable. This allows to drop the "noinline" attribute, so now the compiler beautifully inlines just the trap instruction directly into the calling function.
Signed-off-by: Andre Przywara andre.przywara@arm.com
Applied to u-boot/master, thanks!

On 10/5/22 12:38, Andre Przywara wrote:
While playing around with QEMU's semihosting implementation, I stared at our semihosting trap code, and found some issues, that this mini-series fixes.
Please have a look!
Cheers, Andre
Andre Przywara (3): arm: smh: specify Thumb trap instruction arm: smh: Make semihosting trap calls more robust arm: smh: Allow semihosting trap calls to be inlined
arch/arm/lib/semihosting.c | 44 +++++++++++++++++++++++++++----------- 1 file changed, 31 insertions(+), 13 deletions(-)
This all LGTM
Although I wonder if we should just move smh_trap into an assembly file. I think that would apply most of these optimization barriers.
Your series will also confict with Kautuk's series [1]. I think the bugs you described above will apply to RISC-V as well.
--Sean
[1] https://lore.kernel.org/all/20220923070320.617623-1-kconsul@ventanamicro.com...

On Wed, 5 Oct 2022 19:41:01 -0400 Sean Anderson sean.anderson@seco.com wrote:
Hi Sean,
On 10/5/22 12:38, Andre Przywara wrote:
While playing around with QEMU's semihosting implementation, I stared at our semihosting trap code, and found some issues, that this mini-series fixes.
Please have a look!
Cheers, Andre
Andre Przywara (3): arm: smh: specify Thumb trap instruction arm: smh: Make semihosting trap calls more robust arm: smh: Allow semihosting trap calls to be inlined
arch/arm/lib/semihosting.c | 44 +++++++++++++++++++++++++++----------- 1 file changed, 31 insertions(+), 13 deletions(-)
This all LGTM
Although I wonder if we should just move smh_trap into an assembly file. I think that would apply most of these optimization barriers.
I was already thinking about that, since it's indeed the most clean and robust solution. But then I didn't want to throw away the hours of research I put into this, and be it for future reference in the mailing list archive ;-) And I really like how the compiler just puts the "hlt" directly into the generated code. But since it's not performance critical, a function is just fine.
Your series will also confict with Kautuk's series [1]. I think the bugs you described above will apply to RISC-V as well.
Ah, this now makes the case for an assembly file even bigger, since we now have only the trap function in a separate file anyway.
I will send something on top of Kautuk's series.
Many thanks for the heads up!
Cheers, Andre
--Sean
[1] https://lore.kernel.org/all/20220923070320.617623-1-kconsul@ventanamicro.com...
participants (3)
-
Andre Przywara
-
Sean Anderson
-
Tom Rini