[RFC PATCH 0/2] Coroutines

This series introduces a simple coroutines framework and uses it to speed up the efi_init_obj_list() function. It is just an example to trigger discussions; hopefully other places in U-Boot can benefit from a similar treatment. Suggestions are welcome.
I came up with this idea after analyzing some profiling data (ftrace) taken during the execution of the "printenv -e" command on a Kria KV260 board. When the command is first invoked, efi_init_obj_list() is called which takes a significant amount of time (2.872 seconds). This time can be split into 0.570 s for efi_disks_register(), 0.811 s for efi_tcg2_register() and 1.418 s for efi_tcg2_do_initial_measurement(). All the other child functions are much quicker. Another interesting observation is that a large part of the time is actually spent waiting in __udelay(). More precisely: - For efi_disks_register(): 421 ms / 570 ms = 73.8 % spent in __udelay() - For efi_tcg2_register(): 805 ms / 811 ms = 99.1 % spent in __udelay() - For efi_tcg2_do_initial_measurement(): 1395.025 ms / 1418.372 ms = 98.3 % spent in __udelay()
Given the above data, it was reasonable to think that a nice speedup could be obtained if these functions could somehow be run in parallel. efi_tcg2_do_initial_measurement() unfortunately needs to be excluded because it depends on the first two. But efi_disks_register() and efi_tcg2_register() are clearly independant and are therefore good candidates for concurrent execution.
So, I considered two options: - Spin up a secondary core to take care of one function while the other one runs on the main core - Introduce some kind of software scheduling I quickly ruled out the first one for several reasons: initializing a secondary core is typically quite hardware-specific, it would not scale well if more functions than available cores would need to be run in parallel, it would make debugging harder etc. Software scheduling however can be accomplished quite easily, especially since we don't need to consider preemptive multitasking. Coroutines [1] for example can perfectly do the job. They provide a way to save and restore the execution context (registers and stack). Here is how it look like:
static void do_some_work(int v) { int i; for (i = 0; i < 5; i++) { printf("%d", v); /* Save context and resume main thread */ co_yield(); } }
static void co1(void *arg) { do_some_work(1); /* Mark coroutine as "done" and resume main thread */ co_exit(); }
static void co2(void *arg) { do_some_work(2); co_exit(); }
void main_thread(void) { struct co *co1, *co2;
co1 = co_create(co1, ...); co2 = co_create(co2, ...);
do { printf("A"); if (!co1->done) { /* Invoke or resume first coroutine */ co_resume(co1); } printf("B"); if (!cor21->done) { /* Invoke or resume second coroutine */ co_resume(co2); } } while (!(co1->done && co2->done));
/* At this point, co1 and co2 have both called co_exit() */ }
The above example would print: A1B2A1B2A1B2A1B2A1B2.
- The first commit introduces the coroutine framework, loosely based on libaco [2]. The code was simplified and reformatted to better suit U-Boot. - The second commit modifies efi_init_obj_list() in order to turn efi_disks_register() and efi_tcg2_register() into coroutines when COROUTINES in enabled.
[1] https://en.wikipedia.org/wiki/Coroutine [2] https://github.com/hnes/libaco/
Jerome Forissier (2): Introduce coroutines framework efi_loader: optimize efi_init_obj_list() with coroutines
arch/arm/cpu/armv8/Makefile | 1 + arch/arm/cpu/armv8/co_switch.S | 35 +++++++ arch/x86/cpu/i386/Makefile | 1 + arch/x86/cpu/i386/co_switch.S | 26 +++++ arch/x86/cpu/x86_64/Makefile | 2 + arch/x86/cpu/x86_64/co_switch.S | 26 +++++ include/coroutines.h | 151 +++++++++++++++++++++++++++ lib/Kconfig | 10 ++ lib/Makefile | 2 + lib/coroutines.c | 176 ++++++++++++++++++++++++++++++++ lib/efi_loader/efi_setup.c | 113 +++++++++++++++++++- lib/time.c | 14 ++- 12 files changed, 552 insertions(+), 5 deletions(-) create mode 100644 arch/arm/cpu/armv8/co_switch.S create mode 100644 arch/x86/cpu/i386/co_switch.S create mode 100644 arch/x86/cpu/x86_64/co_switch.S create mode 100644 include/coroutines.h create mode 100644 lib/coroutines.c

Adds the COROUTINES Kconfig symbol which introduces a new internal API for coroutines support. As explained in the Kconfig file, this is meant to provide some kind of cooperative multi-tasking with the goal to improve performance by overlapping lengthy operations.
The API as well as the implementation is very much inspired from libaco [1]. The reference implementation is simplified to remove all things not needed in U-Boot, the coding style is updated, and the aco_ prefix is replaced by co_.
I believe the stack handling could be simplified: the stack of the main coroutine could probably probably be used by the secondary coroutines instead of allocating a new stack dynamically.
Only i386, x86_64 and aarch64 are supported at the moment. Other architectures need to provide a _co_switch() function in assembly.
Only aarch64 has been tested.
[1] https://github.com/hnes/libaco/
Signed-off-by: Jerome Forissier jerome.forissier@linaro.org --- arch/arm/cpu/armv8/Makefile | 1 + arch/arm/cpu/armv8/co_switch.S | 35 +++++++ arch/x86/cpu/i386/Makefile | 1 + arch/x86/cpu/i386/co_switch.S | 26 +++++ arch/x86/cpu/x86_64/Makefile | 2 + arch/x86/cpu/x86_64/co_switch.S | 26 +++++ include/coroutines.h | 151 +++++++++++++++++++++++++++ lib/Kconfig | 10 ++ lib/Makefile | 2 + lib/coroutines.c | 176 ++++++++++++++++++++++++++++++++ 10 files changed, 430 insertions(+) create mode 100644 arch/arm/cpu/armv8/co_switch.S create mode 100644 arch/x86/cpu/i386/co_switch.S create mode 100644 arch/x86/cpu/x86_64/co_switch.S create mode 100644 include/coroutines.h create mode 100644 lib/coroutines.c
diff --git a/arch/arm/cpu/armv8/Makefile b/arch/arm/cpu/armv8/Makefile index 2e71ff2dc97..6d07b6aa9f9 100644 --- a/arch/arm/cpu/armv8/Makefile +++ b/arch/arm/cpu/armv8/Makefile @@ -46,3 +46,4 @@ obj-$(CONFIG_TARGET_BCMNS3) += bcmns3/ obj-$(CONFIG_XEN) += xen/ obj-$(CONFIG_ARMV8_CE_SHA1) += sha1_ce_glue.o sha1_ce_core.o obj-$(CONFIG_ARMV8_CE_SHA256) += sha256_ce_glue.o sha256_ce_core.o +obj-$(CONFIG_COROUTINES) += co_switch.o diff --git a/arch/arm/cpu/armv8/co_switch.S b/arch/arm/cpu/armv8/co_switch.S new file mode 100644 index 00000000000..2fa6c52935a --- /dev/null +++ b/arch/arm/cpu/armv8/co_switch.S @@ -0,0 +1,35 @@ +/* void _co_switch(struct uco *from_co, struct uco *to_co); */ +.text +.globl _co_switch +.type _co_switch, @function +_co_switch: + // x0: from_co + // x1: to_co + // from_co and to_co layout: { pc, sp, x19-x29 } + + // Save context to from_co (x0) + // AAPCS64 says "A subroutine invocation must preserve the contents of the + // registers r19-r29 and SP" + adr x2, 1f // pc we should use to resume after this function + mov x3, sp + stp x2, x3, [x0, #0] // pc, sp + stp x19, x20, [x0, #16] + stp x21, x22, [x0, #32] + stp x23, x24, [x0, #48] + stp x25, x26, [x0, #64] + stp x27, x28, [x0, #80] + stp x29, x30, [x0, #96] + + // Load new context from to_co (x1) + ldp x2, x3, [x1, #0] // pc, sp + ldp x19, x20, [x1, #16] + ldp x21, x22, [x1, #32] + ldp x23, x24, [x1, #48] + ldp x25, x26, [x1, #64] + ldp x27, x28, [x1, #80] + ldp x29, x30, [x1, #96] + mov sp, x3 + br x2 + +1: // Return to the caller + ret diff --git a/arch/x86/cpu/i386/Makefile b/arch/x86/cpu/i386/Makefile index 18e152074a7..7066904a41e 100644 --- a/arch/x86/cpu/i386/Makefile +++ b/arch/x86/cpu/i386/Makefile @@ -9,3 +9,4 @@ ifndef CONFIG_TPL_BUILD obj-y += interrupt.o endif obj-y += setjmp.o +obj-$(CONFIG_COROUTINES) += co_switch.o diff --git a/arch/x86/cpu/i386/co_switch.S b/arch/x86/cpu/i386/co_switch.S new file mode 100644 index 00000000000..ec4d2d778f7 --- /dev/null +++ b/arch/x86/cpu/i386/co_switch.S @@ -0,0 +1,26 @@ +/* void _co_switch(struct uco *from_co, struct uco *to_co); */ +.text +.globl _co_switch +.type _co_switch, @function +.intel_syntax noprefix +_co_switch: + mov eax,DWORD PTR [esp+0x4] // from_co + mov edx,DWORD PTR [esp] // retaddr + lea ecx,[esp+0x4] // esp + mov DWORD PTR [eax+0x8],ebp //<ebp + mov DWORD PTR [eax+0x4],ecx //<esp + mov DWORD PTR [eax+0x0],edx //<retaddr + mov DWORD PTR [eax+0xc],edi //<edi + mov ecx,DWORD PTR [esp+0x8] // to_co + mov DWORD PTR [eax+0x10],esi //<esi + mov DWORD PTR [eax+0x14],ebx //<ebx + mov edx,DWORD PTR [ecx+0x4] //>esp + mov ebp,DWORD PTR [ecx+0x8] //>ebp + mov eax,DWORD PTR [ecx+0x0] //>retaddr + mov edi,DWORD PTR [ecx+0xc] //>edi + mov esi,DWORD PTR [ecx+0x10] //>esi + mov ebx,DWORD PTR [ecx+0x14] //>ebx + xor ecx,ecx + mov esp,edx + xor edx,edx + jmp eax diff --git a/arch/x86/cpu/x86_64/Makefile b/arch/x86/cpu/x86_64/Makefile index e929563b2c1..862f522242c 100644 --- a/arch/x86/cpu/x86_64/Makefile +++ b/arch/x86/cpu/x86_64/Makefile @@ -8,3 +8,5 @@ obj-y += cpu.o interrupts.o setjmp.o ifndef CONFIG_EFI obj-y += misc.o endif + +obj-$(CONFIG_COUROUTINES) += co_switch.o diff --git a/arch/x86/cpu/x86_64/co_switch.S b/arch/x86/cpu/x86_64/co_switch.S new file mode 100644 index 00000000000..ec928f2d1f7 --- /dev/null +++ b/arch/x86/cpu/x86_64/co_switch.S @@ -0,0 +1,26 @@ +/* void _co_switch(struct uco *from_co, struct uco *to_co); */ +.text +.globl _co_switch +.type _co_switch, @function +.intel_syntax noprefix +_co_switch: + mov rdx,QWORD PTR [rsp] // retaddr + lea rcx,[rsp+0x8] // rsp + mov QWORD PTR [rdi+0x0], r12 + mov QWORD PTR [rdi+0x8], r13 + mov QWORD PTR [rdi+0x10],r14 + mov QWORD PTR [rdi+0x18],r15 + mov QWORD PTR [rdi+0x20],rdx // retaddr + mov QWORD PTR [rdi+0x28],rcx // rsp + mov QWORD PTR [rdi+0x30],rbx + mov QWORD PTR [rdi+0x38],rbp + mov r12,QWORD PTR [rsi+0x0] + mov r13,QWORD PTR [rsi+0x8] + mov r14,QWORD PTR [rsi+0x10] + mov r15,QWORD PTR [rsi+0x18] + mov rax,QWORD PTR [rsi+0x20] // retaddr + mov rcx,QWORD PTR [rsi+0x28] // rsp + mov rbx,QWORD PTR [rsi+0x30] + mov rbp,QWORD PTR [rsi+0x38] + mov rsp,rcx + jmp rax diff --git a/include/coroutines.h b/include/coroutines.h new file mode 100644 index 00000000000..2e2fb1170a3 --- /dev/null +++ b/include/coroutines.h @@ -0,0 +1,151 @@ +/* SPDX-License-Identifier: Apache-2.0 */ +/* + * Copyright 2018 Sen Han 00hnes@gmail.com + * Copyright 2025 Linaro Limited + */ + +#ifndef _COROUTINES_H_ +#define _COROUTINES_H_ + +#ifndef CONFIG_COROUTINES + +static inline void co_yield(void) {} +static inline void co_exit(void) {} + +#else + +#ifdef __UBOOT__ +#include <log.h> +#else +#include <assert.h> +#endif +#include <limits.h> +#include <stdbool.h> +#include <stdint.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <time.h> + +#ifdef __i386__ +#define UCO_REG_IDX_RETADDR 0 +#define UCO_REG_IDX_SP 1 +#define UCO_REG_IDX_BP 2 +#elif __x86_64__ +#define UCO_REG_IDX_RETADDR 4 +#define UCO_REG_IDX_SP 5 +#define UCO_REG_IDX_BP 7 +#elif __aarch64__ +#define UCO_REG_IDX_RETADDR 0 +#define UCO_REG_IDX_SP 1 +#else +#error Architecture no supported +#endif + +struct co_save_stack { + void* ptr; + size_t sz; + size_t valid_sz; + size_t max_cpsz; /* max copy size in bytes */ +}; + +struct co_stack { + void *ptr; + size_t sz; + void *align_highptr; + void *align_retptr; + size_t align_validsz; + size_t align_limit; + struct co *owner; + void *real_ptr; + size_t real_sz; +}; + +struct co { + /* + * CPU registers state (callee-savec plus SP, PC) + */ +#ifdef __i386__ + void* reg[6]; +#elif __x86_64__ + void* reg[8]; +#elif __aarch64__ + void *reg[14]; // pc, sp, x19-x29, x30 (lr) +#endif + struct co *main_co; + void *arg; + bool done; + + void (*fp)(void); + + struct co_save_stack save_stack; + struct co_stack *stack; +}; + +#if defined(__i386__) || defined(__x86_64__) +#define UCO_THREAD __thread +#else +#define UCO_THREAD +#endif + +extern UCO_THREAD struct co *current_co; + +static inline struct co *co_get_co(void) +{ + return current_co; +} + +static inline void *co_get_arg(void) +{ + return co_get_co()->arg; +} + +struct co_stack *co_stack_new(size_t sz); + +void co_stack_destroy(struct co_stack *s); + +struct co *co_create(struct co *main_co, + struct co_stack *stack, + size_t save_stack_sz, void (*fp)(void), + void *arg); + +void co_resume(struct co *resume_co); + +void co_destroy(struct co *co); + +void *_co_switch(struct co *from_co, struct co *to_co); + +static inline void _co_yield_to_main_co(struct co *yield_co) +{ + assert(yield_co); + assert(yield_co->main_co); + _co_switch(yield_co, yield_co->main_co); +} + +static inline void co_yield(void) +{ + if (current_co) + _co_yield_to_main_co(current_co); +} + +static inline bool co_is_main_co(struct co *co) +{ + return !co->main_co; +} + +static inline void co_exit(void) +{ + struct co *co = co_get_co(); + + if (!co) + return; + co->done = true; + assert(co->stack->owner == co); + co->stack->owner = NULL; + co->stack->align_validsz = 0; + _co_yield_to_main_co(co); + assert(false); +} + +#endif /* CONFIG_COROUTINES */ +#endif /* _COROUTINES_H_ */ diff --git a/lib/Kconfig b/lib/Kconfig index 8f1a96d98c4..b6c1380b927 100644 --- a/lib/Kconfig +++ b/lib/Kconfig @@ -1226,6 +1226,16 @@ config PHANDLE_CHECK_SEQ enable this config option to distinguish them using phandles in fdtdec_get_alias_seq() function.
+config COROUTINES + bool "Enable coroutine support" + help + Coroutines allow to implement a simple form of cooperative + multi-tasking. The main thread of execution registers one or + more functions as coroutine entry points, then it schedules one + of them. At any point the scheduled coroutine may yield, that is, + suspend its execution and return back to the main thread. At this + point another coroutine may be scheduled and so on until all the + registered coroutines are done. endmenu
source "lib/fwu_updates/Kconfig" diff --git a/lib/Makefile b/lib/Makefile index 5cb3278d2ef..7b809151f5a 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -159,6 +159,8 @@ obj-$(CONFIG_LIB_ELF) += elf.o
obj-$(CONFIG_$(PHASE_)SEMIHOSTING) += semihosting.o
+obj-$(CONFIG_COROUTINES) += coroutines.o + # # Build a fast OID lookup registry from include/linux/oid_registry.h # diff --git a/lib/coroutines.c b/lib/coroutines.c new file mode 100644 index 00000000000..b23c2004a06 --- /dev/null +++ b/lib/coroutines.c @@ -0,0 +1,176 @@ +// SPDX-License-Identifier: Apache-2.0 + +// Copyright 2018 Sen Han 00hnes@gmail.com +// Copyright 2025 Linaro Limited + +#include <coroutines.h> +#include <stdio.h> +#include <stdint.h> + + +/* Current co-routine */ +UCO_THREAD struct co *current_co; + +struct co_stack *co_stack_new(size_t sz) +{ + struct co_stack *p = calloc(1, sizeof(*p)); + uintptr_t u_p; + + if (!p) + return NULL; + + if (sz < 4096) + sz = 4096; + + p->sz = sz; + p->ptr = malloc(sz); + if (!p->ptr) { + free(p); + return NULL; + } + + p->owner = NULL; + u_p = (uintptr_t)(p->sz - (sizeof(void*) << 1) + (uintptr_t)p->ptr); + u_p = (u_p >> 4) << 4; + p->align_highptr = (void*)u_p; + p->align_retptr = (void*)(u_p - sizeof(void*)); + assert(p->sz > (16 + (sizeof(void*) << 1) + sizeof(void*))); + p->align_limit = p->sz - 16 - (sizeof(void*) << 1); + + return p; +} + +void co_stack_destroy(struct co_stack *s){ + if (!s) + return; + free(s->ptr); + free(s); +} + +struct co *co_create(struct co *main_co, + struct co_stack *stack, + size_t save_stack_sz, + void (*fp)(void), void *arg) +{ + struct co *p = malloc(sizeof(*p)); + assert(p); + memset(p, 0, sizeof(*p)); + + if (main_co) { + assert(stack); + p->stack = stack; +#ifdef __i386__ + // POSIX.1-2008 (IEEE Std 1003.1-2008) - General Information - Data Types - Pointer Types + // http://pubs.opengroup.org/onlinepubs/9699919799.2008edition/functions/V2_cha... + p->reg[UCO_REG_IDX_RETADDR] = (void*)fp; + // push retaddr + p->reg[UCO_REG_IDX_SP] = p->stack->align_retptr; +#elif __x86_64__ + p->reg[UCO_REG_IDX_RETADDR] = (void*)fp; + p->reg[UCO_REG_IDX_SP] = p->stack->align_retptr; +#elif __aarch64__ + p->reg[UCO_REG_IDX_RETADDR] = (void *)fp; + // FIXME setting to align_retptr causes a crash + p->reg[UCO_REG_IDX_SP] = p->stack->align_highptr; +#endif + p->main_co = main_co; + p->arg = arg; + p->fp = fp; + if (!save_stack_sz) + save_stack_sz = 64; + p->save_stack.ptr = malloc(save_stack_sz); + assert(p->save_stack.ptr); + p->save_stack.sz = save_stack_sz; + p->save_stack.valid_sz = 0; + } else { + p->main_co = NULL; + p->arg = arg; + p->fp = fp; + p->stack = NULL; + p->save_stack.ptr = NULL; + } + return p; +} + +static void grab_stack(struct co *resume_co) +{ + struct co *owner_co = resume_co->stack->owner; + + if (owner_co) { + assert(owner_co->stack == resume_co->stack); + assert((uintptr_t)(owner_co->stack->align_retptr) >= + (uintptr_t)(owner_co->reg[UCO_REG_IDX_SP])); + assert((uintptr_t)owner_co->stack->align_highptr - + (uintptr_t)owner_co->stack->align_limit + <= (uintptr_t)owner_co->reg[UCO_REG_IDX_SP]); + owner_co->save_stack.valid_sz = + (uintptr_t)owner_co->stack->align_retptr - + (uintptr_t)owner_co->reg[UCO_REG_IDX_SP]; + if (owner_co->save_stack.sz < owner_co->save_stack.valid_sz) { + free(owner_co->save_stack.ptr); + owner_co->save_stack.ptr = NULL; + do { + owner_co->save_stack.sz <<= 1; + assert(owner_co->save_stack.sz > 0); + } while (owner_co->save_stack.sz < + owner_co->save_stack.valid_sz); + owner_co->save_stack.ptr = + malloc(owner_co->save_stack.sz); + assert(owner_co->save_stack.ptr); + } + if (owner_co->save_stack.valid_sz > 0) + memcpy(owner_co->save_stack.ptr, + owner_co->reg[UCO_REG_IDX_SP], + owner_co->save_stack.valid_sz); + if (owner_co->save_stack.valid_sz > + owner_co->save_stack.max_cpsz) + owner_co->save_stack.max_cpsz = + owner_co->save_stack.valid_sz; + owner_co->stack->owner = NULL; + owner_co->stack->align_validsz = 0; + } + assert(!resume_co->stack->owner); + assert(resume_co->save_stack.valid_sz <= + resume_co->stack->align_limit - sizeof(void *)); + if (resume_co->save_stack.valid_sz > 0) + memcpy((void*) + (uintptr_t)(resume_co->stack->align_retptr) - + resume_co->save_stack.valid_sz, + resume_co->save_stack.ptr, + resume_co->save_stack.valid_sz); + if (resume_co->save_stack.valid_sz > resume_co->save_stack.max_cpsz) + resume_co->save_stack.max_cpsz = resume_co->save_stack.valid_sz; + resume_co->stack->align_validsz = + resume_co->save_stack.valid_sz + sizeof(void *); + resume_co->stack->owner = resume_co; +} + +void co_resume(struct co *resume_co) +{ + assert(resume_co && resume_co->main_co && !resume_co->done); + + if (resume_co->stack->owner != resume_co) + grab_stack(resume_co); + + current_co = resume_co; + _co_switch(resume_co->main_co, resume_co); + current_co = resume_co->main_co; +} + +void co_destroy(struct co *co){ + if (!co) + return; + + if(co_is_main_co(co)){ + free(co); + current_co = NULL; + } else { + if(co->stack->owner == co){ + co->stack->owner = NULL; + co->stack->align_validsz = 0; + } + free(co->save_stack.ptr); + co->save_stack.ptr = NULL; + free(co); + } +}

On 1/20/25 14:50, Jerome Forissier wrote:
Adds the COROUTINES Kconfig symbol which introduces a new internal API for coroutines support. As explained in the Kconfig file, this is meant to provide some kind of cooperative multi-tasking with the goal to improve performance by overlapping lengthy operations.
The API as well as the implementation is very much inspired from libaco [1]. The reference implementation is simplified to remove all things not needed in U-Boot, the coding style is updated, and the aco_ prefix is replaced by co_.
I believe the stack handling could be simplified: the stack of the main coroutine could probably probably be used by the secondary coroutines instead of allocating a new stack dynamically.
Only i386, x86_64 and aarch64 are supported at the moment. Other architectures need to provide a _co_switch() function in assembly.
Only aarch64 has been tested.
I can't see the reason why to keep x86 around if it is not tested.
Licenses should be cleared for all files.
Also I think this mostly should target full u-boot and not really TPL/SPL as of today.
I have applied this patch to measure time difference on kv260 and kd240 and pretty much the time is the same.
diff --git a/lib/efi_loader/efi_setup.c b/lib/efi_loader/efi_setup.c index 94160f4bd86c..18d2260b8c11 100644 --- a/lib/efi_loader/efi_setup.c +++ b/lib/efi_loader/efi_setup.c @@ -258,6 +258,7 @@ extern int udelay_yield; efi_status_t efi_init_obj_list(void) { efi_status_t ret = EFI_SUCCESS; + ulong t0, t1; #if CONFIG_IS_ENABLED(COROUTINES) struct co_stack *stk = NULL; struct co *main_co = NULL; @@ -272,6 +273,7 @@ efi_status_t efi_init_obj_list(void) /* Set up console modes */ efi_setup_console_size();
+ t0 = timer_get_us(); #if CONFIG_IS_ENABLED(COROUTINES) main_co = co_create(NULL, NULL, 0, NULL, NULL); if (!main_co) { @@ -333,6 +335,9 @@ efi_status_t efi_init_obj_list(void) efi_obj_list_initialized = ret; } #endif + t1 = timer_get_us(); + + printf("time counted %ld\n", t1 - t0);
/* Initialize variable services */ ret = efi_init_variables();
Please correct me if you have measured it differently.
Thanks, Michal

Hi Michal,
On 1/21/25 12:44, Michal Simek wrote:
On 1/20/25 14:50, Jerome Forissier wrote:
Adds the COROUTINES Kconfig symbol which introduces a new internal API for coroutines support. As explained in the Kconfig file, this is meant to provide some kind of cooperative multi-tasking with the goal to improve performance by overlapping lengthy operations.
The API as well as the implementation is very much inspired from libaco [1]. The reference implementation is simplified to remove all things not needed in U-Boot, the coding style is updated, and the aco_ prefix is replaced by co_.
I believe the stack handling could be simplified: the stack of the main coroutine could probably probably be used by the secondary coroutines instead of allocating a new stack dynamically.
Only i386, x86_64 and aarch64 are supported at the moment. Other architectures need to provide a _co_switch() function in assembly.
Only aarch64 has been tested.
I can't see the reason why to keep x86 around if it is not tested.
OK, I'll drop x86 in v2.
Licenses should be cleared for all files.
I will add "SPDX-License-Identifier: Apache-2.0 OR GPL-2.0-or-later" to the coroutines C implementation (include/coroutines.h and lib/coroutines.c) since it originates from libaco which is Apache-2.0 then was modified by me for U-Boot. I will add GPL-2.0-or-later to arch/arm/cpu/armv8/co_switch.S since it is a new file written by me.
Also I think this mostly should target full u-boot and not really TPL/SPL as of today.
I agree, but why do you think it targets xPL? efi_init_obj_list() is called as a result of "printenv -e" in the main U-Boot for example. I mentioned in the cover letter that I think coroutines could be beneficial in other places too.
I have applied this patch to measure time difference on kv260 and kd240 and pretty much the time is the same.
diff --git a/lib/efi_loader/efi_setup.c b/lib/efi_loader/efi_setup.c index 94160f4bd86c..18d2260b8c11 100644 --- a/lib/efi_loader/efi_setup.c +++ b/lib/efi_loader/efi_setup.c @@ -258,6 +258,7 @@ extern int udelay_yield; efi_status_t efi_init_obj_list(void) { efi_status_t ret = EFI_SUCCESS; + ulong t0, t1; #if CONFIG_IS_ENABLED(COROUTINES) struct co_stack *stk = NULL; struct co *main_co = NULL; @@ -272,6 +273,7 @@ efi_status_t efi_init_obj_list(void) /* Set up console modes */ efi_setup_console_size();
+ t0 = timer_get_us(); #if CONFIG_IS_ENABLED(COROUTINES) main_co = co_create(NULL, NULL, 0, NULL, NULL); if (!main_co) { @@ -333,6 +335,9 @@ efi_status_t efi_init_obj_list(void) efi_obj_list_initialized = ret; } #endif + t1 = timer_get_us();
+ printf("time counted %ld\n", t1 - t0);
/* Initialize variable services */ ret = efi_init_variables();
Please correct me if you have measured it differently.
That's fine. Do you have a SD card inserted in your board? If not, the efi_disks_register() call is very quick and therefore it makes little difference if it's run sequentially or in parallel with efi_tcg2_register().
Thanks,

On 1/21/25 14:15, Jerome Forissier wrote:
Hi Michal,
On 1/21/25 12:44, Michal Simek wrote:
On 1/20/25 14:50, Jerome Forissier wrote:
Adds the COROUTINES Kconfig symbol which introduces a new internal API for coroutines support. As explained in the Kconfig file, this is meant to provide some kind of cooperative multi-tasking with the goal to improve performance by overlapping lengthy operations.
The API as well as the implementation is very much inspired from libaco [1]. The reference implementation is simplified to remove all things not needed in U-Boot, the coding style is updated, and the aco_ prefix is replaced by co_.
I believe the stack handling could be simplified: the stack of the main coroutine could probably probably be used by the secondary coroutines instead of allocating a new stack dynamically.
Only i386, x86_64 and aarch64 are supported at the moment. Other architectures need to provide a _co_switch() function in assembly.
Only aarch64 has been tested.
I can't see the reason why to keep x86 around if it is not tested.
OK, I'll drop x86 in v2.
Licenses should be cleared for all files.
I will add "SPDX-License-Identifier: Apache-2.0 OR GPL-2.0-or-later" to the coroutines C implementation (include/coroutines.h and lib/coroutines.c) since it originates from libaco which is Apache-2.0 then was modified by me for U-Boot. I will add GPL-2.0-or-later to arch/arm/cpu/armv8/co_switch.S since it is a new file written by me.
Also I think this mostly should target full u-boot and not really TPL/SPL as of today.
I agree, but why do you think it targets xPL? efi_init_obj_list() is called as a result of "printenv -e" in the main U-Boot for example. I mentioned in the cover letter that I think coroutines could be beneficial in other places too.
It is build for SPL too if you look at files which are touched. I think you should define separate symbols for SPL/TPL.
I have applied this patch to measure time difference on kv260 and kd240 and pretty much the time is the same.
diff --git a/lib/efi_loader/efi_setup.c b/lib/efi_loader/efi_setup.c index 94160f4bd86c..18d2260b8c11 100644 --- a/lib/efi_loader/efi_setup.c +++ b/lib/efi_loader/efi_setup.c @@ -258,6 +258,7 @@ extern int udelay_yield; efi_status_t efi_init_obj_list(void) { efi_status_t ret = EFI_SUCCESS; + ulong t0, t1; #if CONFIG_IS_ENABLED(COROUTINES) struct co_stack *stk = NULL; struct co *main_co = NULL; @@ -272,6 +273,7 @@ efi_status_t efi_init_obj_list(void) /* Set up console modes */ efi_setup_console_size();
+ t0 = timer_get_us(); #if CONFIG_IS_ENABLED(COROUTINES) main_co = co_create(NULL, NULL, 0, NULL, NULL); if (!main_co) { @@ -333,6 +335,9 @@ efi_status_t efi_init_obj_list(void) efi_obj_list_initialized = ret; } #endif + t1 = timer_get_us();
+ printf("time counted %ld\n", t1 - t0);
/* Initialize variable services */ ret = efi_init_variables();
Please correct me if you have measured it differently.
That's fine. Do you have a SD card inserted in your board?
yes I have. I do use xilinx_zynqmp_kria_defconfig only.
If not, the efi_disks_register() call is very quick and therefore it makes little difference if it's run sequentially or in parallel with efi_tcg2_register().
I think in my case where distro boot runs things are initialized already that's why diff is not there.
Thanks, Michal

When COROUTINES is enabled, schedule efi_disks_register() and efi_tcg2_register() as two coroutines in efi_init_obj_list() instead of invoking them sequentially. The voluntary yield point is introduced inside udelay() which is called frequently as a result of each function polling the hardware. This allows the two coroutines to make progress simultaneously and reduce the wall clock time required by efi_init_obj_list().
Tested on Kria KV260 with a microSD card inserted with the "printenv -e" command. With COROUTINES disabled, efi_init_obj_list() completes in 2821 ms on average (2825, 2822, 2817). With COROUTINES enabled, it takes 2265 ms (2262, 2260, 2272). That is a reduction of 556 ms which is not bad at all considering that measured separately, efi_tcg2_register() takes ~825 ms and efi_disks_register() needs ~600 ms, so assuming they would overlap perfectly one can expect a 600 ms improvement at best.
The code size penalty for this improvement is 1340 bytes.
Signed-off-by: Jerome Forissier jerome.forissier@linaro.org --- lib/efi_loader/efi_setup.c | 113 +++++++++++++++++++++++++++++++++++-- lib/time.c | 14 ++++- 2 files changed, 122 insertions(+), 5 deletions(-)
diff --git a/lib/efi_loader/efi_setup.c b/lib/efi_loader/efi_setup.c index aa59bc7779d..94160f4bd86 100644 --- a/lib/efi_loader/efi_setup.c +++ b/lib/efi_loader/efi_setup.c @@ -7,10 +7,12 @@
#define LOG_CATEGORY LOGC_EFI
+#include <coroutines.h> #include <efi_loader.h> #include <efi_variable.h> #include <log.h> #include <asm-generic/unaligned.h> +#include <stdlib.h>
#define OBJ_LIST_NOT_INITIALIZED 1
@@ -208,6 +210,46 @@ out: return -1; }
+#if CONFIG_IS_ENABLED(COROUTINES) + +static void efi_disks_register_co(void) +{ + efi_status_t ret; + + if (efi_obj_list_initialized != OBJ_LIST_NOT_INITIALIZED) + goto out; + + /* + * Probe block devices to find the ESP. + * efi_disks_register() must be called before efi_init_variables(). + */ + ret = efi_disks_register(); + if (ret != EFI_SUCCESS) + efi_obj_list_initialized = ret; +out: + co_exit(); +} + +static void efi_tcg2_register_co(void) +{ + efi_status_t ret = EFI_SUCCESS; + + if (efi_obj_list_initialized != OBJ_LIST_NOT_INITIALIZED) + goto out; + + if (IS_ENABLED(CONFIG_EFI_TCG2_PROTOCOL)) { + ret = efi_tcg2_register(); + if (ret != EFI_SUCCESS) + efi_obj_list_initialized = ret; + } +out: + co_exit(); +} + +extern int udelay_yield; + +#endif /* COROUTINES */ + /** * efi_init_obj_list() - Initialize and populate EFI object list * @@ -216,6 +258,12 @@ out: efi_status_t efi_init_obj_list(void) { efi_status_t ret = EFI_SUCCESS; +#if CONFIG_IS_ENABLED(COROUTINES) + struct co_stack *stk = NULL; + struct co *main_co = NULL; + struct co *co1 = NULL; + struct co *co2 = NULL; +#endif
/* Initialize once only */ if (efi_obj_list_initialized != OBJ_LIST_NOT_INITIALIZED) @@ -224,6 +272,53 @@ efi_status_t efi_init_obj_list(void) /* Set up console modes */ efi_setup_console_size();
+#if CONFIG_IS_ENABLED(COROUTINES) + main_co = co_create(NULL, NULL, 0, NULL, NULL); + if (!main_co) { + ret = EFI_OUT_OF_RESOURCES; + goto out; + } + + stk = co_stack_new(8192); + if (!stk) { + ret = EFI_OUT_OF_RESOURCES; + goto out; + } + + co1 = co_create(main_co, stk, 0, efi_disks_register_co, NULL); + if (!co1) { + ret = EFI_OUT_OF_RESOURCES; + goto out; + } + + co2 = co_create(main_co, stk, 0, efi_tcg2_register_co, NULL); + if (!co2) { + ret = EFI_OUT_OF_RESOURCES; + goto out; + } + + udelay_yield = 0xCAFEDECA; + do { + if (!co1->done) + co_resume(co1); + if (!co2->done) + co_resume(co2); + } while (!(co1->done && co2->done)); + udelay_yield = 0; + + co_stack_destroy(stk); + co_destroy(main_co); + co_destroy(co1); + co_destroy(co2); + stk = NULL; + main_co = co1 = co2 = NULL; + + if (efi_obj_list_initialized != OBJ_LIST_NOT_INITIALIZED) { + /* Some kind of error was saved by a coroutine */ + ret = efi_obj_list_initialized; + goto out; + } +#else /* * Probe block devices to find the ESP. * efi_disks_register() must be called before efi_init_variables(). @@ -232,6 +327,13 @@ efi_status_t efi_init_obj_list(void) if (ret != EFI_SUCCESS) goto out;
+ if (IS_ENABLED(CONFIG_EFI_TCG2_PROTOCOL)) { + ret = efi_tcg2_register(); + if (ret != EFI_SUCCESS) + efi_obj_list_initialized = ret; + } +#endif + /* Initialize variable services */ ret = efi_init_variables(); if (ret != EFI_SUCCESS) @@ -272,10 +374,6 @@ efi_status_t efi_init_obj_list(void) }
if (IS_ENABLED(CONFIG_EFI_TCG2_PROTOCOL)) { - ret = efi_tcg2_register(); - if (ret != EFI_SUCCESS) - goto out; - ret = efi_tcg2_do_initial_measurement(); if (ret == EFI_SECURITY_VIOLATION) goto out; @@ -350,6 +448,13 @@ efi_status_t efi_init_obj_list(void) !IS_ENABLED(CONFIG_EFI_CAPSULE_ON_DISK_EARLY)) ret = efi_launch_capsules(); out: +#if CONFIG_IS_ENABLED(COROUTINES) + co_stack_destroy(stk); + co_destroy(main_co); + co_destroy(co1); + co_destroy(co2); efi_obj_list_initialized = ret; +#endif + return ret; } diff --git a/lib/time.c b/lib/time.c index d88edafb196..c11288102fe 100644 --- a/lib/time.c +++ b/lib/time.c @@ -17,6 +17,7 @@ #include <asm/global_data.h> #include <asm/io.h> #include <linux/delay.h> +#include <coroutines.h>
#ifndef CFG_WD_PERIOD # define CFG_WD_PERIOD (10 * 1000 * 1000) /* 10 seconds default */ @@ -190,6 +191,8 @@ void __weak __udelay(unsigned long usec)
/* ------------------------------------------------------------------------- */
+int udelay_yield; + void udelay(unsigned long usec) { ulong kv; @@ -197,7 +200,16 @@ void udelay(unsigned long usec) do { schedule(); kv = usec > CFG_WD_PERIOD ? CFG_WD_PERIOD : usec; - __udelay(kv); + if (CONFIG_IS_ENABLED(COROUTINES) && + udelay_yield == 0xCAFEDECA) { + ulong t0 = timer_get_us(); + do { + co_yield(); + __udelay(10); + } while (timer_get_us() < t0 + kv); + } else { + __udelay(kv); + } usec -= kv; } while(usec); }
participants (2)
-
Jerome Forissier
-
Michal Simek