[PATCH 0/4] Fixes for RISC-V U-Boot SPL / OpenSBI boot flow

Rick's recent patch series, which adds support for U-Boot SPL to the Andes platform, brought several problems of the current U-Boot SPL boot flow on RISC-V to light. Discussion on the relevant parts starts at [1].
The problem showed itself in the form of code corruption. At start, OpenSBI relocates itself to its link address. This allows it to be loaded independently of the link address. In the case that the link address ranges of U-Boot SPL and OpenSBI overlap, code corruption occurs if the relocation starts while some harts are still running U-Boot SPL. This series prevents this problem by specifying the hart that performs the relocation and then making sure that it is the last hart to enter OpenSBI, allowing relocation to be completed safely. A recent version of OpenSBI is required for the changes to work.
This patch series resolves the problems associated with the use case of overlapping link address ranges. However, it is still recommended to select non-overlapping ranges for U-Boot SPL and OpenSBI.
[1]: https://lists.denx.de/pipermail/u-boot/2019-November/389385.html
Lukas Auer (4): spl: opensbi: specify main hart as preferred boot hart riscv: add functions for reading the IPI status riscv: add option to wait for ack from secondary harts in smp functions spl: opensbi: wait for ack from secondary harts before entering OpenSBI
arch/riscv/cpu/start.S | 2 ++ arch/riscv/include/asm/smp.h | 3 ++- arch/riscv/lib/andes_plic.c | 9 ++++++++ arch/riscv/lib/bootm.c | 2 +- arch/riscv/lib/sbi_ipi.c | 11 +++++++++ arch/riscv/lib/sifive_clint.c | 9 ++++++++ arch/riscv/lib/smp.c | 43 +++++++++++++++++++++++++++-------- arch/riscv/lib/spl.c | 2 +- common/spl/spl_opensbi.c | 13 ++++++++++- include/opensbi.h | 18 ++++++++++++++- 10 files changed, 98 insertions(+), 14 deletions(-)

OpenSBI uses a relocation lottery to determine the hart to relocate OpenSBI to its link address. In the U-Boot SPL boot flow, the main hart schedules the secondary harts to enter OpenSBI before doing so itself. One of the secondary harts will therefore always be the winner of the relocation lottery. This is problematic if the link address ranges of OpenSBI and U-Boot SPL overlap. OpenSBI will be relocated and therefore overwrite U-Boot SPL while some harts may still run it, leading to code corruption.
Avoid this problem by specifying the main hart as the preferred boot hart to perform the OpenSBI relocation. The main hart will be the last hart to enter OpenSBI, relocation can therefore occur safely.
The boot hart field was added to version 2 of the OpenSBI FW_DYNAMIC info structure. The header file include/opensbi.h is synchronized with include/sbi/fw_dynamic.h from the OpenSBI project to update the info structure. The header file is recent as of commit 7a13beb21326 ("firmware: Add preferred boot HART field in struct fw_dynamic_info").
Reported-by: Rick Chen rick@andestech.com Suggested-by: Anup Patel Anup.Patel@wdc.com Signed-off-by: Lukas Auer lukas.auer@aisec.fraunhofer.de ---
common/spl/spl_opensbi.c | 1 + include/opensbi.h | 18 +++++++++++++++++- 2 files changed, 18 insertions(+), 1 deletion(-)
diff --git a/common/spl/spl_opensbi.c b/common/spl/spl_opensbi.c index a6b4480ed2..79ee7edcf9 100644 --- a/common/spl/spl_opensbi.c +++ b/common/spl/spl_opensbi.c @@ -69,6 +69,7 @@ void spl_invoke_opensbi(struct spl_image_info *spl_image) opensbi_info.next_addr = uboot_entry; opensbi_info.next_mode = FW_DYNAMIC_INFO_NEXT_MODE_S; opensbi_info.options = SBI_SCRATCH_NO_BOOT_PRINTS; + opensbi_info.boot_hart = gd->arch.boot_hart;
opensbi_entry = (void (*)(ulong, ulong, ulong))spl_image->entry_point; invalidate_icache_all(); diff --git a/include/opensbi.h b/include/opensbi.h index 9f1d62e7dd..d812cc8ccd 100644 --- a/include/opensbi.h +++ b/include/opensbi.h @@ -11,7 +11,7 @@ #define FW_DYNAMIC_INFO_MAGIC_VALUE 0x4942534f
/** Maximum supported info version */ -#define FW_DYNAMIC_INFO_VERSION 0x1 +#define FW_DYNAMIC_INFO_VERSION 0x2
/** Possible next mode values */ #define FW_DYNAMIC_INFO_NEXT_MODE_U 0x0 @@ -35,6 +35,22 @@ struct fw_dynamic_info { unsigned long next_mode; /** Options for OpenSBI library */ unsigned long options; + /** + * Preferred boot HART id + * + * It is possible that the previous booting stage uses same link + * address as the FW_DYNAMIC firmware. In this case, the relocation + * lottery mechanism can potentially overwrite the previous booting + * stage while other HARTs are still running in the previous booting + * stage leading to boot-time crash. To avoid this boot-time crash, + * the previous booting stage can specify last HART that will jump + * to the FW_DYNAMIC firmware as the preferred boot HART. + * + * To avoid specifying a preferred boot HART, the previous booting + * stage can set it to -1UL which will force the FW_DYNAMIC firmware + * to use the relocation lottery mechanism. + */ + unsigned long boot_hart; } __packed;
#endif

Add the function riscv_get_ipi() for reading the pending status of IPIs. The supported controllers are Andes' Platform Level Interrupt Controller (PLIC), the Supervisor Binary Interface (SBI), and SiFive's Core Local Interruptor (CLINT).
Signed-off-by: Lukas Auer lukas.auer@aisec.fraunhofer.de --- I do not have access to the datasheet of the Andes PLIC. The riscv_clear_ipi() implementation seems to read the IPI status from the claim register before writing back the results to clear them. Based on this, I also used the claim register. Rick, please let me know if that is ok or if I should use the pending register instead.
arch/riscv/lib/andes_plic.c | 9 +++++++++ arch/riscv/lib/sbi_ipi.c | 11 +++++++++++ arch/riscv/lib/sifive_clint.c | 9 +++++++++ arch/riscv/lib/smp.c | 12 ++++++++++++ 4 files changed, 41 insertions(+)
diff --git a/arch/riscv/lib/andes_plic.c b/arch/riscv/lib/andes_plic.c index 28568e4e2b..731ac3a148 100644 --- a/arch/riscv/lib/andes_plic.c +++ b/arch/riscv/lib/andes_plic.c @@ -114,6 +114,15 @@ int riscv_clear_ipi(int hart) return 0; }
+int riscv_get_ipi(int hart, int *pending) +{ + PLIC_BASE_GET(); + + *pending = !!readl((void __iomem *)CLAIM_REG(gd->arch.plic, hart)); + + return 0; +} + static const struct udevice_id andes_plic_ids[] = { { .compatible = "riscv,plic1", .data = RISCV_SYSCON_PLIC }, { } diff --git a/arch/riscv/lib/sbi_ipi.c b/arch/riscv/lib/sbi_ipi.c index 170346da68..9a698ce74e 100644 --- a/arch/riscv/lib/sbi_ipi.c +++ b/arch/riscv/lib/sbi_ipi.c @@ -23,3 +23,14 @@ int riscv_clear_ipi(int hart)
return 0; } + +int riscv_get_ipi(int hart, int *pending) +{ + /* + * The SBI does not support reading the IPI status. We always return 0 + * to indicate that no IPI is pending. + */ + *pending = 0; + + return 0; +} diff --git a/arch/riscv/lib/sifive_clint.c b/arch/riscv/lib/sifive_clint.c index d24e0d585b..d7899d16d7 100644 --- a/arch/riscv/lib/sifive_clint.c +++ b/arch/riscv/lib/sifive_clint.c @@ -71,6 +71,15 @@ int riscv_clear_ipi(int hart) return 0; }
+int riscv_get_ipi(int hart, int *pending) +{ + CLINT_BASE_GET(); + + *pending = readl((void __iomem *)MSIP_REG(gd->arch.clint, hart)); + + return 0; +} + static const struct udevice_id sifive_clint_ids[] = { { .compatible = "riscv,clint0", .data = RISCV_SYSCON_CLINT }, { } diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index cc66f15567..6ff0de4b74 100644 --- a/arch/riscv/lib/smp.c +++ b/arch/riscv/lib/smp.c @@ -31,6 +31,18 @@ extern int riscv_send_ipi(int hart); */ extern int riscv_clear_ipi(int hart);
+/** + * riscv_get_ipi() - Get status of inter-processor interrupt (IPI) + * + * Platform code must provide this function. + * + * @hart: Hart ID of hart to be checked + * @pending: Pointer to variable with result of the check, + * 1 if IPI is pending, 0 otherwise + * @return 0 if OK, -ve on error + */ +extern int riscv_get_ipi(int hart, int *pending); + static int send_ipi_many(struct ipi_data *ipi) { ofnode node, cpus;

Add a wait option to smp_call_function() to wait for the secondary harts to acknowledge the call-function request. The request is considered to be acknowledged once each secondary hart has cleared the corresponding IPI.
As part of the call-function request, the secondary harts invalidate the instruction cache after clearing the IPI. This adds a delay between acknowledgment (clear IPI) and fulfillment (call function) of the request. We want to use the acknowledgment to be able to judge when the request has been completed. Remove the delay by clearing the IPI after cache invalidation and just before calling the function from the request.
Signed-off-by: Lukas Auer lukas.auer@aisec.fraunhofer.de ---
arch/riscv/cpu/start.S | 2 ++ arch/riscv/include/asm/smp.h | 3 ++- arch/riscv/lib/bootm.c | 2 +- arch/riscv/lib/smp.c | 31 ++++++++++++++++++++++--------- arch/riscv/lib/spl.c | 2 +- common/spl/spl_opensbi.c | 2 +- 6 files changed, 29 insertions(+), 13 deletions(-)
diff --git a/arch/riscv/cpu/start.S b/arch/riscv/cpu/start.S index 0a2ce6d691..60631638dd 100644 --- a/arch/riscv/cpu/start.S +++ b/arch/riscv/cpu/start.S @@ -197,6 +197,7 @@ spl_secondary_hart_stack_gd_setup: la a0, secondary_hart_relocate mv a1, s0 mv a2, s0 + mv a3, zero jal smp_call_function
/* hang if relocation of secondary harts has failed */ @@ -337,6 +338,7 @@ relocate_secondary_harts:
mv a1, s2 mv a2, s3 + mv a3, zero jal smp_call_function
/* hang if relocation of secondary harts has failed */ diff --git a/arch/riscv/include/asm/smp.h b/arch/riscv/include/asm/smp.h index bc863fdbaf..74de92ed13 100644 --- a/arch/riscv/include/asm/smp.h +++ b/arch/riscv/include/asm/smp.h @@ -46,8 +46,9 @@ void handle_ipi(ulong hart); * @addr: Address of function * @arg0: First argument of function * @arg1: Second argument of function + * @wait: Wait for harts to acknowledge request * @return 0 if OK, -ve on error */ -int smp_call_function(ulong addr, ulong arg0, ulong arg1); +int smp_call_function(ulong addr, ulong arg0, ulong arg1, int wait);
#endif diff --git a/arch/riscv/lib/bootm.c b/arch/riscv/lib/bootm.c index efbd3e23e7..e96137a50c 100644 --- a/arch/riscv/lib/bootm.c +++ b/arch/riscv/lib/bootm.c @@ -99,7 +99,7 @@ static void boot_jump_linux(bootm_headers_t *images, int flag) if (IMAGE_ENABLE_OF_LIBFDT && images->ft_len) { #ifdef CONFIG_SMP ret = smp_call_function(images->ep, - (ulong)images->ft_addr, 0); + (ulong)images->ft_addr, 0, 0); if (ret) hang(); #endif diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index 6ff0de4b74..d575e904c2 100644 --- a/arch/riscv/lib/smp.c +++ b/arch/riscv/lib/smp.c @@ -43,11 +43,11 @@ extern int riscv_clear_ipi(int hart); */ extern int riscv_get_ipi(int hart, int *pending);
-static int send_ipi_many(struct ipi_data *ipi) +static int send_ipi_many(struct ipi_data *ipi, int wait) { ofnode node, cpus; u32 reg; - int ret; + int ret, pending;
cpus = ofnode_path("/cpus"); if (!ofnode_valid(cpus)) { @@ -90,6 +90,15 @@ static int send_ipi_many(struct ipi_data *ipi) pr_err("Cannot send IPI to hart %d\n", reg); return ret; } + + if (wait) { + pending = 1; + while (pending) { + ret = riscv_get_ipi(reg, &pending); + if (ret) + return ret; + } + } }
return 0; @@ -103,21 +112,25 @@ void handle_ipi(ulong hart) if (hart >= CONFIG_NR_CPUS) return;
+ __smp_mb(); + + smp_function = (void (*)(ulong, ulong, ulong))gd->arch.ipi[hart].addr; + invalidate_icache_all(); + + /* + * Clear the IPI to acknowledge the request before jumping to the + * requested function. + */ ret = riscv_clear_ipi(hart); if (ret) { pr_err("Cannot clear IPI of hart %ld\n", hart); return; }
- __smp_mb(); - - smp_function = (void (*)(ulong, ulong, ulong))gd->arch.ipi[hart].addr; - invalidate_icache_all(); - smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1); }
-int smp_call_function(ulong addr, ulong arg0, ulong arg1) +int smp_call_function(ulong addr, ulong arg0, ulong arg1, int wait) { int ret = 0; struct ipi_data ipi; @@ -126,7 +139,7 @@ int smp_call_function(ulong addr, ulong arg0, ulong arg1) ipi.arg0 = arg0; ipi.arg1 = arg1;
- ret = send_ipi_many(&ipi); + ret = send_ipi_many(&ipi, wait);
return ret; } diff --git a/arch/riscv/lib/spl.c b/arch/riscv/lib/spl.c index bea8695987..fd3c2efc5b 100644 --- a/arch/riscv/lib/spl.c +++ b/arch/riscv/lib/spl.c @@ -40,7 +40,7 @@ void __noreturn jump_to_image_no_args(struct spl_image_info *spl_image)
debug("image entry point: 0x%lX\n", spl_image->entry_point); #ifdef CONFIG_SMP - ret = smp_call_function(spl_image->entry_point, (ulong)fdt_blob, 0); + ret = smp_call_function(spl_image->entry_point, (ulong)fdt_blob, 0, 0); if (ret) hang(); #endif diff --git a/common/spl/spl_opensbi.c b/common/spl/spl_opensbi.c index 79ee7edcf9..91a411a3db 100644 --- a/common/spl/spl_opensbi.c +++ b/common/spl/spl_opensbi.c @@ -77,7 +77,7 @@ void spl_invoke_opensbi(struct spl_image_info *spl_image) #ifdef CONFIG_SMP ret = smp_call_function((ulong)spl_image->entry_point, (ulong)spl_image->fdt_addr, - (ulong)&opensbi_info); + (ulong)&opensbi_info, 0); if (ret) hang(); #endif

At the start, OpenSBI relocates itself to its link address. If the link address ranges of U-Boot SPL and OpenSBI overlap, the relocation can lead to code corruption if a hart is still running U-Boot SPL during relocation. To avoid this problem, the main hart is specified as the preferred boot hart to perform the relocation. This fixes the code corruption problems based on the assumption that since the main hart schedules the secondary harts to enter OpenSBI, it will be the last to enter OpenSBI. However it was reported that this assumption is not always correct.
To make sure the assumption always holds true, wait for all secondary harts to acknowledge the call-function request before entering OpenSBI on the main hart.
Reported-by: Rick Chen rick@andestech.com Signed-off-by: Lukas Auer lukas.auer@aisec.fraunhofer.de ---
common/spl/spl_opensbi.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/common/spl/spl_opensbi.c b/common/spl/spl_opensbi.c index 91a411a3db..5ea59d423f 100644 --- a/common/spl/spl_opensbi.c +++ b/common/spl/spl_opensbi.c @@ -75,9 +75,19 @@ void spl_invoke_opensbi(struct spl_image_info *spl_image) invalidate_icache_all();
#ifdef CONFIG_SMP + /* + * Start OpenSBI on all secondary harts and wait for acknowledgment. + * + * OpenSBI first relocates itself to its link address. This is done by + * the main hart. To make sure no hart is still running U-Boot SPL + * during relocation, we wait for all secondary harts to acknowledge + * the call-function request before entering OpenSBI on the main hart. + * Otherwise, code corruption can occur if the link address ranges of + * U-Boot SPL and OpenSBI overlap. + */ ret = smp_call_function((ulong)spl_image->entry_point, (ulong)spl_image->fdt_addr, - (ulong)&opensbi_info, 0); + (ulong)&opensbi_info, 1); if (ret) hang(); #endif
participants (1)
-
Lukas Auer