[PATCH 00/19] spl: Support a relocating jump between phases (VBE part F)

This series includes a way to deal with multiple XPL phases being built to run from the same region of SRAM. This is useful because it may not be possible to fit all the different phases in different parts of the SRAM. Also it is a pain to have to build them with different values for CONFIG_TEXT_BASE such that they can be loaded at different addresses.
The mechanism is to copy some relocation code to the top of memory, then load the next phase below that. The first phase then jumps to the relocation code, which copies or decompresses the next phase, overwriting the first phase in the process. Finally, the relocation code jumps to the start of the next phase.
In this way the maximum amount of space can be used, particularly if the next phase is compressed.
For this to work, some code in U-Boot must be placed in a 'rcode' (relocation code) section. This ensures it can be copied as a block, thus reducing the amount of the first-stage code which needs to survive the relocation process.
For now there is a qemu-arm64 test for this feature, which checks that the basic mechanism is sound. Further work will likely expand this test to include VPL and LZ4 compression, which so far have only been used on real hardware.
Overall, without SPL_RELOC_LOADER enabled, this series provides a very small code-size benefit due to it dropping some unneeded symbols.
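The sequence described above can be modelled on a plain byte array. The following is a toy simulation, not code from this series: `simulate_reloc_jump()`, `SRAM_SIZE`, `RCODE_SIZE` and `RCODE_MARK` are made-up names, the "relocation code" is only a marker region, and decompression is replaced by a plain copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum {
	SRAM_SIZE  = 256,	/* size of the pretend SRAM, in bytes */
	RCODE_SIZE = 16,	/* size of the relocation-code block */
	RCODE_MARK = 0xc0,	/* filler byte standing in for that code */
};

/*
 * Model the three steps on a byte array. The first phase occupies
 * sram[0..first_len). Returns 0 on success, -1 if the next phase cannot
 * be staged between the first phase and the relocation code.
 */
int simulate_reloc_jump(unsigned char *sram, size_t first_len,
			const unsigned char *next, size_t next_len)
{
	size_t rcode_base = SRAM_SIZE - RCODE_SIZE;
	size_t stage_base;

	assert(sram && next);

	/* the staging area lies between the first phase and the rcode */
	if (first_len > rcode_base || next_len > rcode_base - first_len)
		return -1;

	/* 1: copy the relocation code to the top of memory */
	memset(sram + rcode_base, RCODE_MARK, RCODE_SIZE);

	/* 2: load the next phase just below it */
	stage_base = rcode_base - next_len;
	memcpy(sram + stage_base, next, next_len);

	/*
	 * 3: the relocation code moves the image to the load address,
	 * overwriting the first phase (the regions may overlap)
	 */
	memmove(sram, sram + stage_base, next_len);

	return 0;
}
```

With a 120-byte first phase and a 100-byte next phase this succeeds; a 130-byte next phase is rejected because it would not fit in the 120 bytes left between the first phase and the marker region.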
Simon Glass (19):
  spl: Reduce the size of the bl_len field
  spl: Provide a way of indicating the phase to load
  spl: Avoid including hash algorithms which are not wanted
  spl: Provide a way to mark code needed for relocation
  lib: Mark crc8 as relocation code
  lib: Mark lz4 as relocation code
  lib: Mark memcpy() and memmove() as relocation code
  spl: Add a type for the jumper function
  spl: Add support for a relocating jump to the next phase
  spl: Plumb in the relocating loader
  spl: Support jumping to VPL from TPL
  spl: Record the correct name of the next phase
  spl: Show how to fill in the size of the next image
  spl: Add debugging in spl_set_header_raw_uboot()
  arm: qemu: Allow SPL and TPL
  arm: Add a new qemu_arm64_tpl board
  arm: Provide an rcode section in ARMv8 link script
  arm: qemu_arm64_tpl: Enable the relocating loader
  CI: Add new test for reloc loader

 .azure-pipelines.yml                  |   3 +
 .gitlab-ci.yml                        |   6 +
 MAINTAINERS                           |   6 +
 arch/arm/Kconfig                      |   2 +
 arch/arm/cpu/armv8/u-boot-spl.lds     |   8 ++
 arch/arm/dts/qemu-arm64.dts           |  17 +++
 arch/arm/mach-qemu/Kconfig            |  20 +++
 board/emulation/qemu-arm/Kconfig      |   2 +-
 board/emulation/qemu-arm/MAINTAINERS  |   5 +
 board/emulation/qemu-arm/Makefile     |   1 +
 board/emulation/qemu-arm/qemu-arm.env |   4 +
 board/emulation/qemu-arm/xpl.c        |  50 +++++++
 common/spl/Kconfig                    |   9 ++
 common/spl/Kconfig.tpl                |   9 ++
 common/spl/Kconfig.vpl                |   8 ++
 common/spl/Makefile                   |   1 +
 common/spl/spl.c                      |  45 +++++--
 common/spl/spl_reloc.c                | 182 ++++++++++++++++++++++++++
 configs/qemu_arm64_tpl_defconfig      |  83 ++++++++++++
 doc/develop/spl.rst                   |  35 +++++
 include/asm-generic/sections.h        |  16 +++
 include/spl.h                         |  57 +++++++-
 lib/Makefile                          |   6 +-
 lib/crc8.c                            |   5 +-
 lib/lz4.c                             |  27 ++--
 lib/lz4_wrapper.c                     |   2 +-
 lib/string.c                          |   5 +-
 test/py/tests/test_reloc.py           |  21 +++
 28 files changed, 604 insertions(+), 31 deletions(-)
 create mode 100644 board/emulation/qemu-arm/xpl.c
 create mode 100644 common/spl/spl_reloc.c
 create mode 100644 configs/qemu_arm64_tpl_defconfig
 create mode 100644 test/py/tests/test_reloc.py

This is a block length, so typically 512 bytes. Reduce the size to 16 bits to save space, before more fields are added in future work.
Signed-off-by: Simon Glass sjg@chromium.org ---
include/spl.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/spl.h b/include/spl.h index d90eed956af..1cfaa08ed6a 100644 --- a/include/spl.h +++ b/include/spl.h @@ -309,7 +309,7 @@ struct spl_load_info { spl_load_reader read; void *priv; #if IS_ENABLED(CONFIG_SPL_LOAD_BLOCK) - int bl_len; + u16 bl_len; #endif };

On Wed, Sep 25, 2024 at 02:55:27PM +0200, Simon Glass wrote:
This is a block length, so typically 512 bytes. Reduce the size to 16 bits to save space, before more fields are added in future work.
Signed-off-by: Simon Glass sjg@chromium.org
include/spl.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
I don't think this works as intended:
02: spl: Reduce the size of the bl_len field
       arm: (for 1/1 boards) spl/u-boot-spl:all +8.0 spl/u-boot-spl:text +8.0
            chiliboard     : spl/u-boot-spl:all +8 spl/u-boot-spl:text +8
               spl-u-boot-spl: add: 0/0, grow: 2/0 bytes: 8/0 (8)
                 function                                   old     new   delta
                 spl_nand_load_image                         60      64      +4
                 spl_mmc_load                               320     324      +4

Hi Tom,
On Thu, 26 Sept 2024 at 06:06, Tom Rini trini@konsulko.com wrote:
On Wed, Sep 25, 2024 at 02:55:27PM +0200, Simon Glass wrote:
This is a block length, so typically 512 bytes. Reduce the size to 16 bits to save space, before more fields are added in future work.
Signed-off-by: Simon Glass sjg@chromium.org
include/spl.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
I don't think this works as intended:
02: spl: Reduce the size of the bl_len field
       arm: (for 1/1 boards) spl/u-boot-spl:all +8.0 spl/u-boot-spl:text +8.0
            chiliboard     : spl/u-boot-spl:all +8 spl/u-boot-spl:text +8
               spl-u-boot-spl: add: 0/0, grow: 2/0 bytes: 8/0 (8)
                 function                                   old     new   delta
                 spl_nand_load_image                         60      64      +4
                 spl_mmc_load                               320     324      +4
Yes, unfortunately some ARM devices need more code to access a 16-bit value. Perhaps this optimisation is not worth it?
Regards, Simon

On Thu, Sep 26, 2024 at 11:36:17PM +0200, Simon Glass wrote:
Hi Tom,
On Thu, 26 Sept 2024 at 06:06, Tom Rini trini@konsulko.com wrote:
On Wed, Sep 25, 2024 at 02:55:27PM +0200, Simon Glass wrote:
This is a block length, so typically 512 bytes. Reduce the size to 16 bits to save space, before more fields are added in future work.
Signed-off-by: Simon Glass sjg@chromium.org
include/spl.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
I don't think this works as intended:
02: spl: Reduce the size of the bl_len field
       arm: (for 1/1 boards) spl/u-boot-spl:all +8.0 spl/u-boot-spl:text +8.0
            chiliboard     : spl/u-boot-spl:all +8 spl/u-boot-spl:text +8
               spl-u-boot-spl: add: 0/0, grow: 2/0 bytes: 8/0 (8)
                 function                                   old     new   delta
                 spl_nand_load_image                         60      64      +4
                 spl_mmc_load                               320     324      +4
Yes, unfortunately some ARM devices need more code to access a 16-bit value. Perhaps this optimisation is not worth it?
Seems not worth it, yeah.
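A side observation that is not made in the thread itself: on common ABIs, narrowing only the last field of a struct may not shrink the struct at all, since tail padding rounds it back up to pointer alignment. The sketch below uses two hypothetical stand-in structs (`load_info_int`, `load_info_u16`), not the real `struct spl_load_info`, to check this on whatever machine compiles it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* stand-in with an int block length, as before the patch */
struct load_info_int {
	void *read;
	void *priv;
	int bl_len;
};

/* stand-in with a 16-bit block length, as after the patch */
struct load_info_u16 {
	void *read;
	void *priv;
	uint16_t bl_len;
};

/*
 * Bytes saved in the struct itself by the narrower field. The struct is
 * padded to the alignment of its pointers, so this is expected to be 0
 * on typical 32-bit and 64-bit ABIs until another small field is packed
 * next to bl_len.
 */
size_t bytes_saved_by_u16(void)
{
	assert(sizeof(uint16_t) <= sizeof(int));

	return sizeof(struct load_info_int) - sizeof(struct load_info_u16);
}
```

If that holds for the target, any saving only appears once further small fields share the padding, while the extra instructions for the 16-bit access are paid straight away.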

Provide a field in struct spl_load_info to indicate the phase of the image which should be loaded. This is needed by VBE, which can load images in various phases.
Set the phase to none by default.
Signed-off-by: Simon Glass sjg@chromium.org ---
include/spl.h | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+)
diff --git a/include/spl.h b/include/spl.h index 1cfaa08ed6a..113d50152a1 100644 --- a/include/spl.h +++ b/include/spl.h @@ -14,6 +14,7 @@ #include <asm/global_data.h> #include <asm/spl.h> #include <handoff.h> +#include <image.h> #include <mmc.h>
struct blk_desc; @@ -304,6 +305,7 @@ typedef ulong (*spl_load_reader)(struct spl_load_info *load, ulong sector, * @read: Function to call to read from the device * @priv: Private data for the device * @bl_len: Block length for reading in bytes + * @phase: Image phase to load */ struct spl_load_info { spl_load_reader read; @@ -311,6 +313,9 @@ struct spl_load_info { #if IS_ENABLED(CONFIG_SPL_LOAD_BLOCK) u16 bl_len; #endif +#if CONFIG_IS_ENABLED(BOOTMETH_VBE) + u8 phase; +#endif };
static inline int spl_get_bl_len(struct spl_load_info *info) @@ -332,6 +337,23 @@ static inline void spl_set_bl_len(struct spl_load_info *info, int bl_len) #endif }
+static inline void spl_set_phase(struct spl_load_info *info, + enum image_phase_t phase) +{ +#if CONFIG_IS_ENABLED(BOOTMETH_VBE) + info->phase = phase; +#endif +} + +static inline enum image_phase_t spl_get_phase(struct spl_load_info *info) +{ +#if CONFIG_IS_ENABLED(BOOTMETH_VBE) + return info->phase; +#else + return IH_PHASE_NONE; +#endif +} + /** * spl_load_init() - Set up a new spl_load_info structure */ @@ -342,6 +364,7 @@ static inline void spl_load_init(struct spl_load_info *load, load->read = h_read; load->priv = priv; spl_set_bl_len(load, bl_len); + spl_set_phase(load, IH_PHASE_NONE); }
/*
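The accessor pattern in this patch (a field that only exists when a config option is enabled, hidden behind inline get/set helpers) can be tried outside U-Boot. This is a minimal sketch under stated assumptions: `HAVE_PHASE` stands in for CONFIG_IS_ENABLED(BOOTMETH_VBE), and `struct load_info`, `enum phase`, `load_set_phase()` and `load_get_phase()` are illustrative names, not the real ones.

```c
#include <assert.h>
#include <stdint.h>

/* stand-in for CONFIG_IS_ENABLED(BOOTMETH_VBE); build with 0 to drop the field */
#ifndef HAVE_PHASE
#define HAVE_PHASE 1
#endif

enum phase { PHASE_NONE, PHASE_TPL, PHASE_VPL, PHASE_SPL };

struct load_info {
	void *priv;
#if HAVE_PHASE
	uint8_t phase;	/* only present when the feature is enabled */
#endif
};

/* store the phase, or do nothing when the field is compiled out */
void load_set_phase(struct load_info *info, enum phase phase)
{
	assert(info);
#if HAVE_PHASE
	info->phase = phase;
#else
	(void)phase;
#endif
}

/* read the phase back, falling back to PHASE_NONE when compiled out */
enum phase load_get_phase(const struct load_info *info)
{
	assert(info);
#if HAVE_PHASE
	return info->phase;
#else
	return PHASE_NONE;
#endif
}
```

Callers never need their own #if: with the option off, the setter is a no-op and the getter returns the default.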

Update the build rule so that hash algorithms are only included in an SPL build if they are requested. This helps to reduce code size.
Signed-off-by: Simon Glass sjg@chromium.org ---
lib/Makefile | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/lib/Makefile b/lib/Makefile index 9478257e634..c29d03d4c6e 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -76,9 +76,9 @@ obj-$(CONFIG_ECDSA) += ecdsa/ obj-$(CONFIG_$(SPL_)RSA) += rsa/ obj-$(CONFIG_HASH) += hash-checksum.o obj-$(CONFIG_BLAKE2) += blake2/blake2b.o -obj-$(CONFIG_$(SPL_)SHA1) += sha1.o -obj-$(CONFIG_$(SPL_)SHA256) += sha256.o -obj-$(CONFIG_$(SPL_)SHA512) += sha512.o +obj-$(CONFIG_$(SPL_TPL_)SHA1) += sha1.o +obj-$(CONFIG_$(SPL_TPL_)SHA256) += sha256.o +obj-$(CONFIG_$(SPL_TPL_)SHA512) += sha512.o obj-$(CONFIG_CRYPT_PW) += crypt/ obj-$(CONFIG_$(SPL_)ASN1_DECODER) += asn1_decoder.o

Add a linker symbol which can be used to mark relocation code, so it can be collected by the linker and copied into a suitable place and executed when needed.
Signed-off-by: Simon Glass sjg@chromium.org ---
include/asm-generic/sections.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h index b6bca53db10..3fd5c772a1a 100644 --- a/include/asm-generic/sections.h +++ b/include/asm-generic/sections.h @@ -67,6 +67,9 @@ extern char __text_start[]; /* This marks the text region which must be relocated */ extern char __image_copy_start[], __image_copy_end[];
+/* This marks the rcode region used for SPL relocation */ +extern char _rcode_start[], _rcode_end[]; + extern char __bss_end[]; extern char __rel_dyn_start[], __rel_dyn_end[]; extern char _image_binary_end[]; @@ -77,4 +80,17 @@ extern char _image_binary_end[]; */ extern void _start(void);
+#ifndef USE_HOSTCC +#if CONFIG_IS_ENABLED(RELOC_LOADER) +#define __rcode __section(".text.rcode") +#define __rdata __section(".text.rdata") +#else +#define __rcode +#define __rdata +#endif +#else +#define __rcode +#define __rdata +#endif + #endif /* _ASM_GENERIC_SECTIONS_H_ */
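To see how a section attribute lets the linker gather marked code into one contiguous block, here is a small host-side sketch. It assumes GCC or Clang targeting ELF with GNU ld or lld, where a section whose name is a valid C identifier gets `__start_<name>` and `__stop_<name>` symbols automatically. U-Boot instead declares `_rcode_start` and `_rcode_end` for its own link script to provide, so the names below (`my_rcode`, `rcode_add()`, `rcode_size()`, `addr_in_rcode()`) are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* place a function in the "my_rcode" section and keep it from being dropped */
#define my_rcode __attribute__((section("my_rcode"), used, noinline))

/* provided by the linker for sections named like C identifiers */
extern char __start_my_rcode[], __stop_my_rcode[];

my_rcode int rcode_add(int a, int b)
{
	return a + b;
}

my_rcode int rcode_double(int x)
{
	return rcode_add(x, x);
}

/* size of the block that a relocating loader would have to copy */
size_t rcode_size(void)
{
	return (size_t)(__stop_my_rcode - __start_my_rcode);
}

/* true if the address lies inside the collected block */
int addr_in_rcode(uintptr_t addr)
{
	assert(__stop_my_rcode >= __start_my_rcode);

	return addr >= (uintptr_t)__start_my_rcode &&
	       addr < (uintptr_t)__stop_my_rcode;
}
```

The sketch only measures the block. Actually copying it elsewhere and running it there additionally requires position-independent code and executable memory at the destination.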

On Wed, Sep 25, 2024 at 02:55:30PM +0200, Simon Glass wrote:
Add a linker symbol which can be used to mark relocation code, so it can be collected by the linker and copied into a suitable place and executed when needed.
Signed-off-by: Simon Glass sjg@chromium.org
include/asm-generic/sections.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
This patch fails to build on sandbox, unless there is an implicit dependency on the other series which introduces the needed #include.
https://source.denx.de/u-boot/u-boot/-/jobs/906194

Hi Tom,
On Thu, 26 Sept 2024 at 06:06, Tom Rini trini@konsulko.com wrote:
On Wed, Sep 25, 2024 at 02:55:30PM +0200, Simon Glass wrote:
Add a linker symbol which can be used to mark relocation code, so it can be collected by the linker and copied into a suitable place and executed when needed.
Signed-off-by: Simon Glass sjg@chromium.org
include/asm-generic/sections.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
This patch fails to build on sandbox, unless there is an implicit dependency on the other series which introduces the needed #include.
Yes it is in part D:
https://patchwork.ozlabs.org/project/uboot/patch/20240920072444.134997-2-sjg...
Regards, Simon

Mark the crc8 code as needed by relocation. This is used as a simple check against corruption of the code when copying.
Signed-off-by: Simon Glass sjg@chromium.org ---
lib/crc8.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/lib/crc8.c b/lib/crc8.c index 811e19917b4..bbb229c3892 100644 --- a/lib/crc8.c +++ b/lib/crc8.c @@ -6,11 +6,12 @@ #ifdef USE_HOSTCC #include <arpa/inet.h> #endif +#include <asm/sections.h> #include <u-boot/crc.h>
#define POLY (0x1070U << 3)
-static unsigned char _crc8(unsigned short data) +__rcode static unsigned char _crc8(unsigned short data) { int i;
@@ -23,7 +24,7 @@ static unsigned char _crc8(unsigned short data) return (unsigned char)(data >> 8); }
-unsigned int crc8(unsigned int crc, const unsigned char *vptr, int len) +__rcode unsigned int crc8(unsigned int crc, const unsigned char *vptr, int len) { int i;
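For reference, a CRC-8 of the kind used as a corruption check can be written in a few lines. This sketch is a generic bitwise CRC-8 with polynomial x^8 + x^2 + x + 1 (0x07), initial value 0 and no reflection, which is what POLY (0x1070U << 3) appears to encode when aligned to the top of a 16-bit value. It is not a copy of lib/crc8.c, and `crc8_update()` is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Feed len bytes into a running CRC-8 (poly 0x07, MSB first).
 * Pass 0 as crc for the first call and the previous result afterwards.
 */
uint8_t crc8_update(uint8_t crc, const uint8_t *data, size_t len)
{
	size_t i;
	int bit;

	assert(data || !len);

	for (i = 0; i < len; i++) {
		crc ^= data[i];
		for (bit = 0; bit < 8; bit++) {
			if (crc & 0x80)
				crc = (uint8_t)((crc << 1) ^ 0x07);
			else
				crc = (uint8_t)(crc << 1);
		}
	}

	return crc;
}
```

Any single-bit change in the data alters the result, which is enough for a basic check that a copy arrived intact. With only 8 bits it is a sanity check, not an integrity guarantee.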

Mark the lz4 decompression code as needed by relocation. This is used to decompress the next-phase image.
Drop the 'safe' versions from SPL as they are not needed. Change the static arrays to local ones, to avoid link errors when trying to access the data.
Signed-off-by: Simon Glass sjg@chromium.org ---
lib/lz4.c | 27 +++++++++++++++------------ lib/lz4_wrapper.c | 2 +- 2 files changed, 16 insertions(+), 13 deletions(-)
diff --git a/lib/lz4.c b/lib/lz4.c index 63955a0b178..f93d74535e4 100644 --- a/lib/lz4.c +++ b/lib/lz4.c @@ -33,15 +33,16 @@ #include <linux/bug.h> #include <asm/unaligned.h> #include <u-boot/lz4.h> +#include <asm/sections.h>
#define FORCE_INLINE inline __attribute__((always_inline))
-static FORCE_INLINE u16 LZ4_readLE16(const void *src) +__rcode static FORCE_INLINE u16 LZ4_readLE16(const void *src) { return get_unaligned_le16(src); }
-static FORCE_INLINE void LZ4_copy8(void *dst, const void *src) +__rcode static FORCE_INLINE void LZ4_copy8(void *dst, const void *src) { put_unaligned(get_unaligned((const u64 *)src), (u64 *)dst); } @@ -53,7 +54,7 @@ typedef int32_t S32; typedef uint64_t U64; typedef uintptr_t uptrval;
-static FORCE_INLINE void LZ4_write32(void *memPtr, U32 value) +__rcode static FORCE_INLINE void LZ4_write32(void *memPtr, U32 value) { put_unaligned(value, (U32 *)memPtr); } @@ -63,7 +64,7 @@ static FORCE_INLINE void LZ4_write32(void *memPtr, U32 value) **************************************/
/* customized version of memcpy, which may overwrite up to 7 bytes beyond dstEnd */ -static void LZ4_wildCopy(void* dstPtr, const void* srcPtr, void* dstEnd) +__rcode static void LZ4_wildCopy(void *dstPtr, const void *srcPtr, void *dstEnd) { BYTE* d = (BYTE*)dstPtr; const BYTE* s = (const BYTE*)srcPtr; @@ -118,8 +119,7 @@ typedef enum { decode_full_block = 0, partial_decode = 1 } earlyEnd_directive; * Note that it is important for performance that this function really get inlined, * in order to remove useless branches during compilation optimization. */ -static FORCE_INLINE int LZ4_decompress_generic( - const char * const src, +__rcode static FORCE_INLINE int LZ4_decompress_generic(const char * const src, char * const dst, int srcSize, /* @@ -141,6 +141,8 @@ static FORCE_INLINE int LZ4_decompress_generic( const size_t dictSize ) { + const unsigned int inc32table[8] = {0, 1, 2, 1, 0, 4, 4, 4}; + const int dec64table[8] = {0, 0, 0, -1, -4, 1, 2, 3}; const BYTE *ip = (const BYTE *) src; const BYTE * const iend = ip + srcSize;
@@ -149,8 +151,6 @@ static FORCE_INLINE int LZ4_decompress_generic( BYTE *cpy;
const BYTE * const dictEnd = (const BYTE *)dictStart + dictSize; - static const unsigned int inc32table[8] = {0, 1, 2, 1, 0, 4, 4, 4}; - static const int dec64table[8] = {0, 0, 0, -1, -4, 1, 2, 3};
const int safeDecode = (endOnInput == endOnInputSize); const int checkOffset = ((safeDecode) && (dictSize < (int)(64 * KB))); @@ -514,8 +514,9 @@ _output_error: return (int) (-(((const char *)ip) - src)) - 1; }
-int LZ4_decompress_safe(const char *source, char *dest, - int compressedSize, int maxDecompressedSize) +#ifndef CONFIG_SPL_BUILD +__rcode int LZ4_decompress_safe(const char *source, char *dest, + int compressedSize, int maxDecompressedSize) { return LZ4_decompress_generic(source, dest, compressedSize, maxDecompressedSize, @@ -523,11 +524,13 @@ int LZ4_decompress_safe(const char *source, char *dest, noDict, (BYTE *)dest, NULL, 0); }
-int LZ4_decompress_safe_partial(const char *src, char *dst, - int compressedSize, int targetOutputSize, int dstCapacity) +__rcode int LZ4_decompress_safe_partial(const char *src, char *dst, + int compressedSize, + int targetOutputSize, int dstCapacity) { dstCapacity = min(targetOutputSize, dstCapacity); return LZ4_decompress_generic(src, dst, compressedSize, dstCapacity, endOnInputSize, partial_decode, noDict, (BYTE *)dst, NULL, 0); } +#endif diff --git a/lib/lz4_wrapper.c b/lib/lz4_wrapper.c index 4d48e7b0e8b..b1204511170 100644 --- a/lib/lz4_wrapper.c +++ b/lib/lz4_wrapper.c @@ -15,7 +15,7 @@
#define LZ4F_BLOCKUNCOMPRESSED_FLAG 0x80000000U
-int ulz4fn(const void *src, size_t srcn, void *dst, size_t *dstn) +__rcode int ulz4fn(const void *src, size_t srcn, void *dst, size_t *dstn) { const void *end = dst + *dstn; const void *in = src;
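The relocation code later in this series decides between decompressing and plain copying by comparing the first four bytes of the image with the LZ4 frame magic. A minimal sketch of that check follows. The magic value 0x184D2204 is the one defined by the LZ4 frame format; `read_le32()` and `image_is_lz4_frame()` are illustrative helpers rather than U-Boot functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LZ4F_MAGIC	0x184d2204u	/* LZ4 frame-format magic number */

/* read a little-endian 32-bit value from a possibly unaligned address */
uint32_t read_le32(const void *p)
{
	const uint8_t *b = p;

	return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
	       (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* true if the buffer starts with an LZ4 frame header */
int image_is_lz4_frame(const void *buf, size_t len)
{
	assert(buf || !len);

	return len >= 4 && read_le32(buf) == LZ4F_MAGIC;
}
```

Anything that does not start with the magic is treated as an uncompressed image and copied as-is.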

On Wed, Sep 25, 2024 at 02:55:32PM +0200, Simon Glass wrote:
Mark the lz4 decompression code as needed by relocation. This is used to decompress the next-phase image.
Drop the 'safe' versions from SPL as they are not needed. Change the static arrays to local ones, to avoid link errors when trying to access the data.
Signed-off-by: Simon Glass sjg@chromium.org
lib/lz4.c | 27 +++++++++++++++------------
Some of the restructuring here leads to size growth on a number of platforms:
   puma-rk3399    : all +132 bss +16 rodata -64 text +180
      u-boot: add: 0/-2, grow: 3/0 bytes: 180/-64 (116)
        function                                   old     new   delta
        ulz4fn                                    1396    1456     +60
        LZ4_decompress_safe_partial               1056    1116     +60
        LZ4_decompress_safe                       1020    1080     +60
        static.inc32table                           32       -     -32
        static.dec64table                           32       -     -32

Mark these functions as needed by relocation. This is used to copy data while relocating the next-phase image.
Signed-off-by: Simon Glass sjg@chromium.org ---
lib/string.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/lib/string.c b/lib/string.c index feae9519f2f..0e0900de8bf 100644 --- a/lib/string.c +++ b/lib/string.c @@ -21,6 +21,7 @@ #include <linux/string.h> #include <linux/ctype.h> #include <malloc.h> +#include <asm/sections.h>
/** * strncasecmp - Case insensitive, length-limited string comparison @@ -559,7 +560,7 @@ __used void * memset(void * s,int c,size_t count) * You should not use this function to access IO space, use memcpy_toio() * or memcpy_fromio() instead. */ -__used void * memcpy(void *dest, const void *src, size_t count) +__rcode __used void *memcpy(void *dest, const void *src, size_t count) { unsigned long *dl = (unsigned long *)dest, *sl = (unsigned long *)src; char *d8, *s8; @@ -593,7 +594,7 @@ __used void * memcpy(void *dest, const void *src, size_t count) * * Unlike memcpy(), memmove() copes with overlapping areas. */ -__used void * memmove(void * dest,const void *src,size_t count) +__rcode __used void *memmove(void *dest, const void *src, size_t count) { char *tmp, *s;
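Since the relocated image may be copied over memory that overlaps its staging buffer, the direction of the copy matters. The sketch below shows the usual overlap-safe technique behind memmove(): copy forwards when the destination is below the source, backwards otherwise. `move_bytes()` is an illustrative name and this is not the code in lib/string.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* copy count bytes from src to dest, coping with overlapping regions */
void *move_bytes(void *dest, const void *src, size_t count)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	assert((d && s) || !count);

	if ((uintptr_t)d <= (uintptr_t)s) {
		/* dest is lower: a forward copy never clobbers unread bytes */
		while (count--)
			*d++ = *s++;
	} else {
		/* dest is higher: start from the end and work backwards */
		d += count;
		s += count;
		while (count--)
			*--d = *--s;
	}

	return dest;
}
```

For example, moving the first four bytes of "abcdefgh" two places up gives "ababcdgh", where a naive forward copy would have produced "abababgh".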

This function will be used by the relocating jumper too, so add a typedef to the header file to avoid mismatches.
Signed-off-by: Simon Glass sjg@chromium.org ---
common/spl/spl.c | 3 +-- include/spl.h | 3 +++ 2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/common/spl/spl.c b/common/spl/spl.c index 75938aa63cc..52f7900b431 100644 --- a/common/spl/spl.c +++ b/common/spl/spl.c @@ -672,8 +672,7 @@ void board_init_r(gd_t *dummy1, ulong dummy2) BOOT_DEVICE_NONE, BOOT_DEVICE_NONE, }; - typedef void __noreturn (*jump_to_image_t)(struct spl_image_info *); - jump_to_image_t jump_to_image = &jump_to_image_no_args; + spl_jump_to_image_t jump_to_image = &jump_to_image_no_args; struct spl_image_info spl_image; int ret, os;
diff --git a/include/spl.h b/include/spl.h index 113d50152a1..f73e5f5209c 100644 --- a/include/spl.h +++ b/include/spl.h @@ -274,6 +274,9 @@ struct spl_image_info { #endif };
+/* function to jump to an image from SPL */ +typedef void __noreturn (*spl_jump_to_image_t)(struct spl_image_info *); + static inline void *spl_image_fdt_addr(struct spl_image_info *info) { #if CONFIG_IS_ENABLED(LOAD_FIT) || CONFIG_IS_ENABLED(LOAD_FIT_FULL)
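The benefit of a single shared typedef is that every caller and every implementation is checked against the same signature. A small sketch of the idea, with made-up names (`struct image_info`, `jump_fn_t`, `jump_plain()`, `do_jump()`): unlike spl_jump_to_image_t, the jumper here returns, purely so that the example can be exercised on a host.

```c
#include <assert.h>
#include <stddef.h>

struct image_info {
	unsigned long entry_point;
};

/* one shared type for "jump to this image", analogous to spl_jump_to_image_t */
typedef void (*jump_fn_t)(struct image_info *);

/* records where the last "jump" went, for demonstration only */
unsigned long last_entry;

/* a jumper that matches the shared type */
void jump_plain(struct image_info *img)
{
	last_entry = img->entry_point;
}

/* any code that accepts a jumper uses the typedef rather than its own copy */
int do_jump(struct image_info *img, jump_fn_t jump)
{
	assert(img);

	if (!jump)
		return -1;
	jump(img);

	return 0;
}
```

If the signature ever changes, the compiler flags every mismatching jumper, which is the kind of mismatch the commit message wants to avoid.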

When one XPL phase wants to jump to the next, the next phase must be loaded into its required address. This means that the TEXT_BASE for the two phases must be different and there cannot be any memory overlap between the phases. It also can mean that phases need to be moved around to accommodate any size growth.
Having two XPL phases in SRAM at the same time can be tricky if SRAM is limited, which it often is. It would be better if the second phase could be loaded somewhere else, then decompressed into place over the top of the first phase.
Introduce a relocating jump for XPL to support this. This selects a suitable place to load the (typically compressed) next phase, copies some decompression code out of the first phase, then jumps to this code to decompress and start the next phase.
This feature makes it much easier to support Verified Boot for Embedded (VBE) on RK3399 boards, which have 192KB of SRAM.
Add some documentation as well.
Signed-off-by: Simon Glass sjg@chromium.org ---
MAINTAINERS | 6 ++ common/spl/Kconfig | 9 ++ common/spl/Kconfig.tpl | 9 ++ common/spl/Kconfig.vpl | 8 ++ common/spl/Makefile | 1 + common/spl/spl_reloc.c | 182 +++++++++++++++++++++++++++++++++++++++++ doc/develop/spl.rst | 35 ++++++++ include/spl.h | 29 +++++++ 8 files changed, 279 insertions(+) create mode 100644 common/spl/spl_reloc.c
diff --git a/MAINTAINERS b/MAINTAINERS index 7ab39d91a55..1e59d9f6452 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1591,6 +1591,12 @@ F: include/spi_flash.h F: include/linux/mtd/cfi.h F: include/linux/mtd/spi-nor.h
+SPL RELOC +M: Simon Glass sjg@chromium.org +S: Maintained +T: git https://source.denx.de/u-boot/custodians/u-boot-dm.git +F: common/spl/spl_reloc.c + SPMI M: Mateusz Kulikowski mateusz.kulikowski@gmail.com S: Maintained diff --git a/common/spl/Kconfig b/common/spl/Kconfig index 3c44e329d62..01345a30637 100644 --- a/common/spl/Kconfig +++ b/common/spl/Kconfig @@ -969,6 +969,15 @@ config SPL_NAND_IDENT help SPL uses the chip ID list to identify the NAND flash.
+config SPL_RELOC_LOADER + bool "Allow relocating the next phase" + select SPL_CRC8 + help + In some cases multiple U-Boot phases need to run in SRAM, typically + at the same address. Enable this to support loading the next phase + to temporary memory, then copying it into place afterwards, then + jumping to it. + config SPL_UBI bool "Support UBI" help diff --git a/common/spl/Kconfig.tpl b/common/spl/Kconfig.tpl index 92d4d43ec87..03fcf024cdf 100644 --- a/common/spl/Kconfig.tpl +++ b/common/spl/Kconfig.tpl @@ -268,6 +268,15 @@ config TPL_RAM_DEVICE be already in memory when TPL takes over, e.g. loaded by the boot ROM.
+config TPL_RELOC_LOADER + bool "Allow relocating the next phase" + select TPL_CRC8 + help + In some cases multiple U-Boot phases need to run in SRAM, typically + at the same address. Enable this to support loading the next phase + to temporary memory, then copying it into place afterwards, then + jumping to it. + config TPL_RTC bool "Support RTC drivers" help diff --git a/common/spl/Kconfig.vpl b/common/spl/Kconfig.vpl index eb57dfabea5..97dfc630152 100644 --- a/common/spl/Kconfig.vpl +++ b/common/spl/Kconfig.vpl @@ -181,6 +181,14 @@ config VPL_PCI necessary driver support. This enables the drivers in drivers/pci as part of a VPL build.
+config VPL_RELOC_LOADER + bool "Allow relocating the next phase" + help + In some cases multiple U-Boot phases need to run in SRAM, typically + at the same address. Enable this to support loading the next phase + to temporary memory, then copying it into place afterwards, then + jumping to it. + config VPL_RTC bool "Support RTC drivers" help diff --git a/common/spl/Makefile b/common/spl/Makefile index 137b18428bd..a3bf1214739 100644 --- a/common/spl/Makefile +++ b/common/spl/Makefile @@ -12,6 +12,7 @@ obj-$(CONFIG_$(SPL_TPL_)BOOTROM_SUPPORT) += spl_bootrom.o obj-$(CONFIG_$(SPL_TPL_)LOAD_FIT) += spl_fit.o obj-$(CONFIG_$(SPL_TPL_)BLK_FS) += spl_blk_fs.o obj-$(CONFIG_$(SPL_TPL_)LEGACY_IMAGE_FORMAT) += spl_legacy.o +obj-$(CONFIG_$(SPL_TPL_)RELOC_LOADER) += spl_reloc.o obj-$(CONFIG_$(SPL_TPL_)NOR_SUPPORT) += spl_nor.o obj-$(CONFIG_$(SPL_TPL_)XIP_SUPPORT) += spl_xip.o obj-$(CONFIG_$(SPL_TPL_)YMODEM_SUPPORT) += spl_ymodem.o diff --git a/common/spl/spl_reloc.c b/common/spl/spl_reloc.c new file mode 100644 index 00000000000..96ad3d2b4f2 --- /dev/null +++ b/common/spl/spl_reloc.c @@ -0,0 +1,182 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Copyright 2024 Google LLC + * Written by Simon Glass sjg@chromium.org + */ + +#define LOG_CATEGORY LOGC_BOOT + +#include <display_options.h> +#include <gzip.h> +#include <image.h> +#include <log.h> +#include <mapmem.h> +#include <spl.h> +#include <asm/global_data.h> +#include <asm/io.h> +#include <asm/sections.h> +#include <asm/unaligned.h> +#include <linux/types.h> +#include <lzma/LzmaTypes.h> +#include <lzma/LzmaDec.h> +#include <lzma/LzmaTools.h> +#include <u-boot/crc.h> +#include <u-boot/lz4.h> + +DECLARE_GLOBAL_DATA_PTR; + +/* provide a way to jump straight into the relocation code, for debugging */ +#define DEBUG_JUMP 0 + +enum { + /* margin to allow for stack growth */ + RELOC_STACK_MARGIN = 0x800, + + /* align base address for DMA controllers which require it */ + BASE_ALIGN = 0x200, + + STACK_PROT_VALUE = 0x51ce4697, +}; + 
+typedef int (*rcode_func)(struct spl_image_info *image); + +static int setup_layout(struct spl_image_info *image, ulong *addrp) +{ + uint rcode_size, fdt_size; + ulong limit, rcode_base; + int buf_size, margin; + char *rcode_buf; + uint need_size; + ulong base; + + limit = ALIGN(map_to_sysmem(&limit) - RELOC_STACK_MARGIN, 8); + image->stack_prot = map_sysmem(limit, sizeof(uint)); + *image->stack_prot = STACK_PROT_VALUE; + + fdt_size = fdt_totalsize(gd->fdt_blob); + base = ALIGN(map_to_sysmem(gd->fdt_blob) + fdt_size + BASE_ALIGN - 1, + BASE_ALIGN); + + rcode_size = _rcode_end - _rcode_start; + rcode_base = limit - rcode_size; + buf_size = rcode_base - base; + need_size = image->size + image->fdt_size; + margin = buf_size - need_size; + printf("spl_reloc %s->%s: margin%s%lx limit %lx fdt_size %x base %lx avail %x image %x fdt %x need %x\n", + spl_phase_name(spl_phase()), spl_phase_name(spl_next_phase()), + margin >= 0 ? " " : " -", abs(margin), limit, fdt_size, base, + buf_size, image->size, image->fdt_size, need_size); + if (margin < 0) { + log_err("Image size %x but buffer is only %x\n", need_size, + buf_size); + return -ENOSPC; + } + + rcode_buf = map_sysmem(rcode_base, rcode_size); + log_debug("_rcode_start %p: %x -- func %p %x\n", _rcode_start, + *(uint *)_rcode_start, setup_layout, *(uint *)setup_layout); + + image->reloc_offset = rcode_buf - _rcode_start; + log_debug("_rcode start %lx base %lx size %x offset %lx\n", + (ulong)map_to_sysmem(_rcode_start), rcode_base, rcode_size, + image->reloc_offset); + + memcpy(rcode_buf, _rcode_start, rcode_size); + + image->buf = map_sysmem(base, need_size); + image->fdt_buf = image->buf + image->size; + image->rcode_buf = rcode_buf; + *addrp = base; + + return 0; +} + +int spl_reloc_prepare(struct spl_image_info *image, ulong *addrp) +{ + int ret; + + ret = setup_layout(image, addrp); + if (ret) + return ret; + + return 0; +} + +typedef void __noreturn (*image_entry_noargs_t)(uint crc, uint unc_len); + +/* this is the 
relocation + jump code that is copied to the top of memory */ +__rcode int rcode_reloc_and_jump(struct spl_image_info *image) +{ + image_entry_noargs_t entry = (image_entry_noargs_t)image->entry_point; + u32 *dst; + ulong image_len; + size_t unc_len; + int ret, crc; + uint magic; + + dst = map_sysmem(image->load_addr, image->size); + unc_len = (void *)image->rcode_buf - (void *)dst; + image_len = image->size; + if (*image->stack_prot != STACK_PROT_VALUE) + return -EFAULT; + magic = get_unaligned_le32(image->buf); + if (CONFIG_IS_ENABLED(LZ4) && magic == LZ4F_MAGIC) { + log_debug("lz4\n"); + ret = ulz4fn(image->buf, image_len, dst, &unc_len); + if (ret) + return ret; + } else { + u32 *src, *end, *ptr; + + log_debug("uncomp"); + unc_len = image->size; + for (src = image->buf, end = (void *)src + image->size, + ptr = dst; src < end;) + *ptr++ = *src++; + } + if (*image->stack_prot != STACK_PROT_VALUE) + return -EFAULT; + + /* copy in the FDT if needed */ + if (image->fdt_size) + memcpy(image->fdt_start, image->fdt_buf, image->fdt_size); + + crc = crc8(0, (u8 *)dst, unc_len); + + /* jump to the entry point */ + entry(crc, unc_len); +} + +int spl_reloc_jump(struct spl_image_info *image, spl_jump_to_image_t jump) +{ + rcode_func loader; + int ret; + + log_debug("malloc usage %x bytes (%d KB of %d KB)\n", gd->malloc_ptr, + gd->malloc_ptr / 1024, CONFIG_VAL(SYS_MALLOC_F_LEN) / 1024); + + if (*image->stack_prot != STACK_PROT_VALUE) { + /* did you call spl_reloc_prepare() ? 
*/ + log_err("stack busted, cannot continue\n"); + return -EFAULT; + } + loader = (rcode_func)(void *)rcode_reloc_and_jump + image->reloc_offset; + log_debug("Jumping via %p to %lx - image %p size %x load %lx\n", loader, + image->entry_point, image, image->size, image->load_addr); + + log_debug("unc_len %lx\n", + image->rcode_buf - map_sysmem(image->load_addr, image->size)); + if (DEBUG_JUMP) { + rcode_reloc_and_jump(image); + } else { + /* + * Must disable LOG_DEBUG since the decompressor cannot call + * log functions, printf(), etc. + */ + _Static_assert(DEBUG_JUMP || !_DEBUG, + "Cannot have debug output from decompressor"); + ret = loader(image); + } + + return -EFAULT; +} diff --git a/doc/develop/spl.rst b/doc/develop/spl.rst index 4bb48e6b7b3..4c406f2b45d 100644 --- a/doc/develop/spl.rst +++ b/doc/develop/spl.rst @@ -203,3 +203,38 @@ end of RAM as per the bloblists received, before carrying out further reservations or updating the relocation address. For e.g, U-boot proper uses function "setup_relocaddr_from_bloblist" to parse the bloblists passed from previous stage and skip the memory reserved from previous stage accordingly. + + +Relocating loader +----------------- + +When one xPL phase wants to jump to the next, the next phase must be loaded into +its required address. This means that the TEXT_BASE for the two phases must be +different and there cannot be any memory overlap between the phases. It also can +mean that phases need to be moved around to accommodate any size growth. + +Having two xPL phases in SRAM at the same time can be tricky if SRAM is limited, +which it often is. It would be better if the second phase could be loaded +somewhere else, then decompressed into place over the top of the first phase. + +The relocating loader (CONFIG_SPL_RELOC_LOADER) provides this feature. 
It selects +a suitable place to load the (typically compressed) next phase, copies some +decompression code out of the first phase, then jumps to this code to decompress +and start the next phase. + +This feature makes it much easier to support Verified Boot for Embedded (VBE) on +RK3399 boards, for example, which have 192KB of SRAM. + +To use this feature: + +#. Enable xPL_RELOC_LOADER for the phase which wants to use it. It will then be + used to load the next phase +#. Create an SPL_LOAD_IMAGE_METHOD() function to perform the load. Insert a call + to spl_reloc_prepare, passing the image information within + ``struct spl_image_info`` (``size`` and ``fdt_size``). This will return + the address of the temporary place to which the image should be loaded +#. Load the image to that address +#. Set the required ``load_addr`` and ``entry_point`` +#. Return 0 from the SPL_LOAD_IMAGE_METHOD() function, indicating success +#. The common SPL code will then copy / decompress your image to the provided + ``load_addr`` and then jump to it at the ``entry_point`` address diff --git a/include/spl.h b/include/spl.h index f73e5f5209c..ecc6a2728f3 100644 --- a/include/spl.h +++ b/include/spl.h @@ -272,6 +272,15 @@ struct spl_image_info { ulong dcrc_length; ulong dcrc; #endif +#if CONFIG_IS_ENABLED(RELOC_LOADER) + void *buf; + void *fdt_buf; + void *fdt_start; + void *rcode_buf; + uint *stack_prot; + ulong reloc_offset; + u32 fdt_size; +#endif };
/* function to jump to an image from SPL */ @@ -357,6 +366,22 @@ static inline enum image_phase_t spl_get_phase(struct spl_load_info *info) #endif }
+static inline void spl_set_fdt_size(struct spl_image_info *img, uint fdt_size) +{ +#if CONFIG_IS_ENABLED(RELOC_LOADER) + img->fdt_size = fdt_size; +#endif +} + +static inline uint spl_get_fdt_size(struct spl_image_info *img) +{ +#if CONFIG_IS_ENABLED(RELOC_LOADER) + return img->fdt_size; +#else + return 0; +#endif +} + /** * spl_load_init() - Set up a new spl_load_info structure */ @@ -1127,4 +1152,8 @@ int spl_write_upl_handoff(struct spl_image_info *spl_image); */ void spl_upl_init(void);
+int spl_reloc_prepare(struct spl_image_info *image, ulong *addrp); + +int spl_reloc_jump(struct spl_image_info *image, spl_jump_to_image_t func); + #endif

The relocating loader is fairly easy to use. The SPL loader sets up some fields in the spl_image_info struct and calls spl_reloc_prepare(). When SPL is ready to do the jump it must call spl_reloc_jump() instead of jump_to_image().
Add this logic.
Signed-off-by: Simon Glass <sjg@chromium.org>
---
 common/spl/spl.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/common/spl/spl.c b/common/spl/spl.c
index 52f7900b431..d01e9861f88 100644
--- a/common/spl/spl.c
+++ b/common/spl/spl.c
@@ -829,6 +829,18 @@ void board_init_r(gd_t *dummy1, ulong dummy2)
 	}

 	spl_board_prepare_for_boot();
+
+	if (CONFIG_IS_ENABLED(RELOC_LOADER)) {
+		int ret;
+
+		ret = spl_reloc_jump(&spl_image, jump_to_image);
+		if (ret) {
+			if (spl_phase() == PHASE_VPL)
+				printf("jump failed %d\n", ret);
+			hang();
+		}
+	}
+
 	jump_to_image(&spl_image);
 }
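For readers new to the mechanism, the prepare / load / jump sequence can be pictured with a toy model in plain C. This is only a sketch under stated assumptions, not the U-Boot implementation: `struct toy_image`, `toy_reloc_prepare()`, `toy_reloc_and_jump()` and the sizes are all hypothetical stand-ins, SRAM is just a byte array, and the final "jump" is reduced to the copy that overwrites the first phase.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SRAM_SIZE	256u	/* toy SRAM size, chosen for illustration */
#define RCODE_SIZE	32u	/* pretend size of the relocation code */

/* Hypothetical stand-in for a few fields of struct spl_image_info */
struct toy_image {
	size_t size;		/* bytes in the next phase */
	size_t load_addr;	/* final offset of the next phase in SRAM */
	size_t buf;		/* staging offset chosen by toy_reloc_prepare() */
	size_t rcode_buf;	/* offset where the relocation code is parked */
};

/* Pick a staging area just below the relocation code at the top of SRAM */
static int toy_reloc_prepare(struct toy_image *img, size_t *addrp)
{
	if (img->size > SRAM_SIZE - RCODE_SIZE)
		return -1;	/* no room for image plus relocation code */
	img->rcode_buf = SRAM_SIZE - RCODE_SIZE;
	img->buf = img->rcode_buf - img->size;
	*addrp = img->buf;

	return 0;
}

/* Model of the relocation step: move the staged image over the old phase */
static void toy_reloc_and_jump(uint8_t *sram, const struct toy_image *img)
{
	memmove(sram + img->load_addr, sram + img->buf, img->size);
}

/* Returns 0 if the next phase ends up intact at load_addr */
int toy_reloc_demo(void)
{
	uint8_t sram[SRAM_SIZE];
	struct toy_image img = { .size = 64, .load_addr = 0 };
	size_t addr, i;

	memset(sram, 0xaa, sizeof(sram));	/* "first phase" fills SRAM */
	if (toy_reloc_prepare(&img, &addr))
		return -1;
	for (i = 0; i < img.size; i++)		/* "load" the next phase */
		sram[addr + i] = (uint8_t)i;
	toy_reloc_and_jump(sram, &img);
	for (i = 0; i < img.size; i++)
		if (sram[img.load_addr + i] != (uint8_t)i)
			return -1;

	return 0;
}
```

Calling `toy_reloc_demo()` from a `main()` exercises the whole sequence. The model uses memmove() because the staging area and the final location may overlap; the real loader may decompress instead of copying.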

Use spl_get_image_pos() to obtain the image position to jump to. Add the symbols used for VPL so that the correct image can be loaded.
Use the functions provided for accessing these symbols and add a few comments too.
Signed-off-by: Simon Glass <sjg@chromium.org>
---
 common/spl/spl.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/common/spl/spl.c b/common/spl/spl.c
index d01e9861f88..623e486c210 100644
--- a/common/spl/spl.c
+++ b/common/spl/spl.c
@@ -50,15 +50,19 @@ u32 *boot_params_ptr = NULL;

 #if CONFIG_IS_ENABLED(BINMAN_UBOOT_SYMBOLS)
 /* See spl.h for information about this */
+#if defined(CONFIG_SPL_BUILD) && !defined(CONFIG_TPL_BUILD) && !defined(CONFIG_VPL_BUILD)
 binman_sym_declare(ulong, u_boot_any, image_pos);
 binman_sym_declare(ulong, u_boot_any, size);
+#endif

-#ifdef CONFIG_TPL
+#if defined(CONFIG_TPL)
+/* TPL jumps straight to SPL */
 binman_sym_declare(ulong, u_boot_spl_any, image_pos);
 binman_sym_declare(ulong, u_boot_spl_any, size);
 #endif

 #ifdef CONFIG_VPL
+/* TPL jumps to VPL */
 binman_sym_declare(ulong, u_boot_vpl_any, image_pos);
 binman_sym_declare(ulong, u_boot_vpl_any, size);
 #endif
@@ -179,9 +183,15 @@ ulong spl_get_image_pos(void)
 	if (spl_next_phase() == PHASE_VPL)
 		return binman_sym(ulong, u_boot_vpl_any, image_pos);
 #endif
-	return spl_next_phase() == PHASE_SPL ?
-		binman_sym(ulong, u_boot_spl_any, image_pos) :
-		binman_sym(ulong, u_boot_any, image_pos);
+#if defined(CONFIG_TPL) && !defined(CONFIG_VPL)
+	if (spl_next_phase() == PHASE_SPL)
+		return binman_sym(ulong, u_boot_spl_any, image_pos);
+#endif
+#if defined(CONFIG_SPL_BUILD) && !defined(CONFIG_TPL_BUILD) && !defined(CONFIG_VPL_BUILD)
+	return binman_sym(ulong, u_boot_any, image_pos);
+#endif
+
+	return BINMAN_SYM_MISSING;
 }

 ulong spl_get_image_size(void)
@@ -263,8 +273,8 @@ void spl_set_header_raw_uboot(struct spl_image_info *spl_image)
 	 */
 	if (u_boot_pos && u_boot_pos != BINMAN_SYM_MISSING) {
 		/* Binman does not support separated entry addresses */
-		spl_image->entry_point = u_boot_pos;
-		spl_image->load_addr = u_boot_pos;
+		spl_image->entry_point = spl_get_image_text_base();
+		spl_image->load_addr = spl_get_image_text_base();
 	} else {
 		spl_image->entry_point = CONFIG_SYS_UBOOT_START;
 		spl_image->load_addr = CONFIG_TEXT_BASE;
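The selection order in the new spl_get_image_pos() may be easier to follow as a small standalone model. This is a sketch, not the real code: the preprocessor conditions are replaced by runtime flags, and `struct toy_syms`, `enum toy_phase`, `toy_get_image_pos()` and the `SYM_MISSING` value are hypothetical names standing in for the binman symbols and BINMAN_SYM_MISSING.

```c
#include <stdbool.h>

/* Assumed sentinel, standing in for BINMAN_SYM_MISSING */
#define SYM_MISSING	(~0UL)

/* Hypothetical phase identifiers for the model */
enum toy_phase { TOY_PHASE_VPL, TOY_PHASE_SPL, TOY_PHASE_BOARD_R };

/* Which symbol sets this build declared; models the #if conditions */
struct toy_syms {
	bool has_vpl;		/* models CONFIG_VPL */
	bool has_spl;		/* models CONFIG_TPL without CONFIG_VPL */
	bool has_u_boot;	/* models a plain SPL build */
	unsigned long vpl_pos;
	unsigned long spl_pos;
	unsigned long u_boot_pos;
};

/* Return the position of the next image, or SYM_MISSING if none applies */
unsigned long toy_get_image_pos(const struct toy_syms *s, enum toy_phase next)
{
	if (s->has_vpl && next == TOY_PHASE_VPL)
		return s->vpl_pos;
	if (s->has_spl && next == TOY_PHASE_SPL)
		return s->spl_pos;
	if (s->has_u_boot)
		return s->u_boot_pos;

	return SYM_MISSING;
}
```

In the patch itself these checks are resolved at build time, so a phase only carries the symbols it can actually use; the final return covers builds where none of the conditions hold.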

On Wed, Sep 25, 2024 at 02:55:37PM +0200, Simon Glass wrote:
diff --git a/common/spl/spl.c b/common/spl/spl.c
index d01e9861f88..623e486c210 100644
--- a/common/spl/spl.c
+++ b/common/spl/spl.c
@@ -50,15 +50,19 @@ u32 *boot_params_ptr = NULL;

 #if CONFIG_IS_ENABLED(BINMAN_UBOOT_SYMBOLS)
 /* See spl.h for information about this */
+#if defined(CONFIG_SPL_BUILD) && !defined(CONFIG_TPL_BUILD) && !defined(CONFIG_VPL_BUILD)
 binman_sym_declare(ulong, u_boot_any, image_pos);
 binman_sym_declare(ulong, u_boot_any, size);
+#endif

-#ifdef CONFIG_TPL
+#if defined(CONFIG_TPL)
+/* TPL jumps straight to SPL */
 binman_sym_declare(ulong, u_boot_spl_any, image_pos);
 binman_sym_declare(ulong, u_boot_spl_any, size);
 #endif

 #ifdef CONFIG_VPL
+/* TPL jumps to VPL */
 binman_sym_declare(ulong, u_boot_vpl_any, image_pos);
 binman_sym_declare(ulong, u_boot_vpl_any, size);
 #endif
So I see on a64-olinuxino and others a size reduction here, as those symbols aren't included now. Do we have something in the tooling to ensure that we aren't now referencing / dereferencing invalid links?

Hi Tom,
On Thu, 26 Sept 2024 at 06:07, Tom Rini <trini@konsulko.com> wrote:
So I see on a64-olinuxino and others a size reduction here, as those symbols aren't included now. Do we have something in the tooling to ensure that we aren't now referencing / dereferencing invalid links?
Yes I noticed that on some other boards. I dug into it a bit and decided that the symbol was being declared when it didn't need to be. If the symbol were used but not declared, we get an error.
Regards, Simon

On Thu, Sep 26, 2024 at 11:33:52PM +0200, Simon Glass wrote:
Yes I noticed that on some other boards. I dug into it a bit and decided that the symbol was being declared when it didn't need to be. If the symbol were used but not declared, we get an error.
OK, thanks.

This is only "U-Boot" when in SPL. For earlier phases it should use the correct value, so update this.
Signed-off-by: Simon Glass <sjg@chromium.org>
---
 common/spl/spl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/common/spl/spl.c b/common/spl/spl.c
index 623e486c210..2466b98f5a8 100644
--- a/common/spl/spl.c
+++ b/common/spl/spl.c
@@ -280,7 +280,7 @@ void spl_set_header_raw_uboot(struct spl_image_info *spl_image)
 		spl_image->load_addr = CONFIG_TEXT_BASE;
 	}
 	spl_image->os = IH_OS_U_BOOT;
-	spl_image->name = "U-Boot";
+	spl_image->name = spl_phase_name(spl_next_phase());
 }

 __weak int spl_parse_board_header(struct spl_image_info *spl_image,

Binman provides the exact size of the SPL image being loaded, so show how to fill this in. The code is left commented out, since enabling it would increase code size.
Signed-off-by: Simon Glass <sjg@chromium.org>
---
 common/spl/spl.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/common/spl/spl.c b/common/spl/spl.c
index 2466b98f5a8..878036210c4 100644
--- a/common/spl/spl.c
+++ b/common/spl/spl.c
@@ -275,6 +275,7 @@ void spl_set_header_raw_uboot(struct spl_image_info *spl_image)
 		/* Binman does not support separated entry addresses */
 		spl_image->entry_point = spl_get_image_text_base();
 		spl_image->load_addr = spl_get_image_text_base();
+		/* if needed: spl_image->size = spl_get_image_size(); */
 	} else {
 		spl_image->entry_point = CONFIG_SYS_UBOOT_START;
 		spl_image->load_addr = CONFIG_TEXT_BASE;

Add some debugging here so it is easier to see what is going on.
Signed-off-by: Simon Glass <sjg@chromium.org>
---
 common/spl/spl.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/common/spl/spl.c b/common/spl/spl.c
index 878036210c4..75fa1a854d9 100644
--- a/common/spl/spl.c
+++ b/common/spl/spl.c
@@ -276,12 +276,17 @@ void spl_set_header_raw_uboot(struct spl_image_info *spl_image)
 		spl_image->entry_point = spl_get_image_text_base();
 		spl_image->load_addr = spl_get_image_text_base();
 		/* if needed: spl_image->size = spl_get_image_size(); */
+		log_debug("Next load addr %lx\n", spl_image->load_addr);
 	} else {
 		spl_image->entry_point = CONFIG_SYS_UBOOT_START;
 		spl_image->load_addr = CONFIG_TEXT_BASE;
+		log_debug("Default load addr %x (u_boot_pos=%lx)\n",
+			  CONFIG_TEXT_BASE, u_boot_pos);
 	}
 	spl_image->os = IH_OS_U_BOOT;
 	spl_image->name = spl_phase_name(spl_next_phase());
+	log_debug("Next phase: %s at %lx size %lx\n", spl_image->name,
+		  spl_image->load_addr, (ulong)spl_image->size);
 }

 __weak int spl_parse_board_header(struct spl_image_info *spl_image,

Indicate that SPL and TPL can be supported on the QEMU boards, so that a new 'TPL' board can be added.
Signed-off-by: Simon Glass <sjg@chromium.org>
---
 arch/arm/Kconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 656f588a97c..dfc735237aa 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1043,6 +1043,8 @@ config ARCH_QEMU
 	select DM_SERIAL
 	select OF_CONTROL
 	select PL01X_SERIAL
+	select SUPPORT_SPL
+	select SUPPORT_TPL
 	imply CMD_DM
 	imply DM_RNG
 	imply DM_RTC

We want to be able to test the relocating XPL loader. Add a new build for ARM QEMU which supports booting from TPL into SPL.
This builds an image containing TPL, SPL and U-Boot proper. To run it:
   qemu-system-aarch64 -machine virt -nographic -cpu cortex-a57 \
       -bios image.bin

Signed-off-by: Simon Glass <sjg@chromium.org>
---
 arch/arm/dts/qemu-arm64.dts           | 18 ++++++
 arch/arm/mach-qemu/Kconfig            | 20 +++++++
 board/emulation/qemu-arm/Kconfig      |  2 +-
 board/emulation/qemu-arm/MAINTAINERS  |  5 ++
 board/emulation/qemu-arm/Makefile     |  1 +
 board/emulation/qemu-arm/qemu-arm.env |  4 ++
 board/emulation/qemu-arm/xpl.c        | 35 ++++++++++++
 configs/qemu_arm64_tpl_defconfig      | 82 +++++++++++++++++++++++++++
 8 files changed, 166 insertions(+), 1 deletion(-)
 create mode 100644 board/emulation/qemu-arm/xpl.c
 create mode 100644 configs/qemu_arm64_tpl_defconfig

diff --git a/arch/arm/dts/qemu-arm64.dts b/arch/arm/dts/qemu-arm64.dts
index 096b3910728..de943642e76 100644
--- a/arch/arm/dts/qemu-arm64.dts
+++ b/arch/arm/dts/qemu-arm64.dts
@@ -8,4 +8,22 @@
 /dts-v1/;

 / {
+#ifdef CONFIG_BINMAN
+	binman {
+		u-boot-tpl {
+		};
+
+		u-boot-spl {
+			symbols-base = <0>;
+			offset = <CONFIG_SPL_TEXT_BASE>;
+		};
+
+		u-boot {
+			offset = <CONFIG_TEXT_BASE>;
+		};
+
+		fdtmap {
+		};
+	};
+#endif
 };
diff --git a/arch/arm/mach-qemu/Kconfig b/arch/arm/mach-qemu/Kconfig
index 186c3582ebf..8ade9028af6 100644
--- a/arch/arm/mach-qemu/Kconfig
+++ b/arch/arm/mach-qemu/Kconfig
@@ -25,6 +25,26 @@ config TARGET_QEMU_ARM_64BIT
 	select ARM64
 	select BOARD_LATE_INIT

+config TARGET_QEMU_ARM_64BIT_TPL
+	bool "ARMv8, 64bit, with TPL"
+	select ARM64
+	select BOARD_LATE_INIT
+	select SPL
+	select TPL
+	select SPL_LIBCOMMON_SUPPORT
+	select TPL_LIBCOMMON_SUPPORT
+	select SPL_LIBGENERIC_SUPPORT
+	select TPL_LIBGENERIC_SUPPORT
+	select SPL_OF_CONTROL
+	select TPL_OF_CONTROL
+	select SPL_DM
+	select TPL_DM
+	select BINMAN
+	select SPL_SERIAL
+	select TPL_SERIAL
+	imply SPL_FRAMEWORK_BOARD_INIT_F
+	imply RPL_FRAMEWORK_BOARD_INIT_F
+
 endchoice

 endif
diff --git a/board/emulation/qemu-arm/Kconfig b/board/emulation/qemu-arm/Kconfig
index e21c135e86f..32c71bb0421 100644
--- a/board/emulation/qemu-arm/Kconfig
+++ b/board/emulation/qemu-arm/Kconfig
@@ -1,4 +1,4 @@
-if TARGET_QEMU_ARM_32BIT || TARGET_QEMU_ARM_64BIT
+if TARGET_QEMU_ARM_32BIT || TARGET_QEMU_ARM_64BIT || TARGET_QEMU_ARM_64BIT_TPL

 config TEXT_BASE
 	default 0x00000000
diff --git a/board/emulation/qemu-arm/MAINTAINERS b/board/emulation/qemu-arm/MAINTAINERS
index 5154262f29e..559409c18bb 100644
--- a/board/emulation/qemu-arm/MAINTAINERS
+++ b/board/emulation/qemu-arm/MAINTAINERS
@@ -6,3 +6,8 @@ F: board/emulation/common/
 F: include/configs/qemu-arm.h
 F: configs/qemu_arm_defconfig
 F: configs/qemu_arm64_defconfig
+
+QEMU ARM 'VIRT' TPL BOARD
+M: Simon Glass <sjg@chromium.org>
+S: Maintained
+F: configs/qemu_arm64_tpl_defconfig
diff --git a/board/emulation/qemu-arm/Makefile b/board/emulation/qemu-arm/Makefile
index a22d1237ff4..ef40943052c 100644
--- a/board/emulation/qemu-arm/Makefile
+++ b/board/emulation/qemu-arm/Makefile
@@ -1,3 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0+

 obj-y += qemu-arm.o
+obj-$(CONFIG_SPL_BUILD) += xpl.o
diff --git a/board/emulation/qemu-arm/qemu-arm.env b/board/emulation/qemu-arm/qemu-arm.env
index fb4adef281e..0190db82e4e 100644
--- a/board/emulation/qemu-arm/qemu-arm.env
+++ b/board/emulation/qemu-arm/qemu-arm.env
@@ -13,3 +13,7 @@ pxefile_addr_r=0x40300000
 kernel_addr_r=0x40400000
 ramdisk_addr_r=0x44000000
 boot_targets=qfw usb scsi virtio nvme dhcp
+
+#ifdef CONFIG_TARGET_QEMU_ARM_64BIT_TPL
+board_name="qemu-arm64_tpl"
+#endif
diff --git a/board/emulation/qemu-arm/xpl.c b/board/emulation/qemu-arm/xpl.c
new file mode 100644
index 00000000000..ec59e0ff327
--- /dev/null
+++ b/board/emulation/qemu-arm/xpl.c
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright 2024 Google LLC
+ * Written by Simon Glass <sjg@chromium.org>
+ */
+
+#define LOG_DEBUG
+#define LOG_CATEGORY LOGC_BOOT
+
+#include <mapmem.h>
+#include <spl.h>
+
+unsigned int spl_boot_device(void)
+{
+	return BOOT_DEVICE_BOARD;
+}
+
+static int binman_load_image(struct spl_image_info *img,
+			     struct spl_boot_device *bootdev)
+{
+	ulong base = spl_get_image_pos();
+	ulong size = spl_get_image_size();
+
+	log_debug("Booting from address %lx size %lx\n", base, size);
+	img->name = spl_phase_name(spl_next_phase());
+	img->load_addr = base;
+	img->entry_point = base;
+
+	return 0;
+}
+SPL_LOAD_IMAGE_METHOD("binman", 0, BOOT_DEVICE_BOARD, binman_load_image);
+
+void reset_cpu(void)
+{
+}
diff --git a/configs/qemu_arm64_tpl_defconfig b/configs/qemu_arm64_tpl_defconfig
new file mode 100644
index 00000000000..f09e357e5cc
--- /dev/null
+++ b/configs/qemu_arm64_tpl_defconfig
@@ -0,0 +1,82 @@
+CONFIG_ARM=y
+CONFIG_POSITION_INDEPENDENT=y
+CONFIG_ARCH_QEMU=y
+CONFIG_TEXT_BASE=0x13000
+CONFIG_SYS_MALLOC_LEN=0x1000000
+CONFIG_NR_DRAM_BANKS=1
+CONFIG_HAS_CUSTOM_SYS_INIT_SP_ADDR=y
+CONFIG_CUSTOM_SYS_INIT_SP_ADDR=0x40200000
+CONFIG_ENV_SIZE=0x40000
+CONFIG_ENV_SECT_SIZE=0x40000
+CONFIG_DEFAULT_DEVICE_TREE="qemu-arm64"
+CONFIG_SPL_TEXT_BASE=0x6000
+CONFIG_TARGET_QEMU_ARM_64BIT_TPL=y
+CONFIG_SPL_BSS_MAX_SIZE=0x10000
+CONFIG_DEBUG_UART_BASE=0x9000000
+CONFIG_DEBUG_UART_CLOCK=0
+CONFIG_ARMV8_CRYPTO=y
+CONFIG_SYS_LOAD_ADDR=0x40200000
+CONFIG_ENV_ADDR=0x4000000
+CONFIG_TPL_MAX_SIZE=0x10000
+CONFIG_PCI=y
+CONFIG_DEBUG_UART=y
+CONFIG_AHCI=y
+CONFIG_EFI_HTTP_BOOT=y
+CONFIG_FIT=y
+CONFIG_FIT_SIGNATURE=y
+CONFIG_FIT_VERBOSE=y
+CONFIG_FIT_BEST_MATCH=y
+CONFIG_BOOTSTD_FULL=y
+CONFIG_LEGACY_IMAGE_FORMAT=y
+CONFIG_USE_PREBOOT=y
+# CONFIG_PRE_CONSOLE_BUFFER is not set
+# CONFIG_DISPLAY_CPUINFO is not set
+# CONFIG_DISPLAY_BOARDINFO is not set
+CONFIG_PCI_INIT_R=y
+CONFIG_SPL_MAX_SIZE=0x10000
+# CONFIG_SPL_SEPARATE_BSS is not set
+CONFIG_CMD_SMBIOS=y
+CONFIG_CMD_BOOTZ=y
+CONFIG_CMD_BOOTEFI_SELFTEST=y
+CONFIG_CMD_NVEDIT_EFI=y
+CONFIG_CMD_DFU=y
+CONFIG_CMD_MTD=y
+CONFIG_CMD_PCI=y
+CONFIG_CMD_EFIDEBUG=y
+CONFIG_CMD_TPM=y
+CONFIG_CMD_MTDPARTS=y
+CONFIG_ENV_IS_IN_FLASH=y
+CONFIG_SCSI_AHCI=y
+CONFIG_AHCI_PCI=y
+CONFIG_DFU_TFTP=y
+CONFIG_DFU_MTD=y
+CONFIG_DFU_RAM=y
+# CONFIG_MMC is not set
+CONFIG_MTD=y
+CONFIG_DM_MTD=y
+CONFIG_MTD_NOR_FLASH=y
+CONFIG_FLASH_SHOW_PROGRESS=0
+CONFIG_CFI_FLASH=y
+CONFIG_CFI_FLASH_USE_WEAK_ACCESSORS=y
+CONFIG_SYS_FLASH_USE_BUFFER_WRITE=y
+CONFIG_FLASH_CFI_MTD=y
+CONFIG_SYS_FLASH_CFI=y
+CONFIG_SYS_MAX_FLASH_SECT=256
+CONFIG_SYS_MAX_FLASH_BANKS=2
+CONFIG_SYS_MAX_FLASH_BANKS_DETECT=y
+CONFIG_E1000=y
+CONFIG_NVME_PCI=y
+CONFIG_PCIE_ECAM_GENERIC=y
+CONFIG_SCSI=y
+CONFIG_DEBUG_UART_PL011=y
+CONFIG_DEBUG_UART_SHIFT=2
+CONFIG_SYSRESET=y
+CONFIG_SYSRESET_CMD_POWEROFF=y
+CONFIG_SYSRESET_PSCI=y
+CONFIG_TPM2_MMIO=y
+CONFIG_USB_EHCI_HCD=y
+CONFIG_USB_EHCI_PCI=y
+# CONFIG_BINMAN_FDT is not set
+CONFIG_SEMIHOSTING=y
+CONFIG_TPM=y
+CONFIG_TPL_LZ4=y

Collect the relocation code in one place so that it can be used by the SPL relocating-loader.
Signed-off-by: Simon Glass <sjg@chromium.org>
---
 arch/arm/cpu/armv8/u-boot-spl.lds | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/arm/cpu/armv8/u-boot-spl.lds b/arch/arm/cpu/armv8/u-boot-spl.lds
index fed69644b55..a296af39320 100644
--- a/arch/arm/cpu/armv8/u-boot-spl.lds
+++ b/arch/arm/cpu/armv8/u-boot-spl.lds
@@ -27,8 +27,16 @@ SECTIONS
 	.text : {
 		. = ALIGN(8);
 		CPUDIR/start.o (.text*)
+
+		/* put relocation code all together */
+		_rcode_start = .;
+		*(.text.rcode)
+		*(.text.rdata)
+		_rcode_end = .;
+
 		*(.text*)
 	} >.sram
+	_rcode_size = _rcode_end - _rcode_start;

 	.rodata : {
 		. = ALIGN(8);

Move SPL to start in the RAM region, using the relocating loader to copy it there and run from there.
Add a simple test to make sure this works as expected.
Signed-off-by: Simon Glass <sjg@chromium.org>
---
 arch/arm/dts/qemu-arm64.dts      |  1 -
 board/emulation/qemu-arm/xpl.c   | 15 +++++++++++++++
 configs/qemu_arm64_tpl_defconfig |  3 ++-
 test/py/tests/test_reloc.py      | 21 +++++++++++++++++++++
 4 files changed, 38 insertions(+), 2 deletions(-)
 create mode 100644 test/py/tests/test_reloc.py

diff --git a/arch/arm/dts/qemu-arm64.dts b/arch/arm/dts/qemu-arm64.dts
index de943642e76..098992f5da8 100644
--- a/arch/arm/dts/qemu-arm64.dts
+++ b/arch/arm/dts/qemu-arm64.dts
@@ -15,7 +15,6 @@

 		u-boot-spl {
 			symbols-base = <0>;
-			offset = <CONFIG_SPL_TEXT_BASE>;
 		};

 		u-boot {
diff --git a/board/emulation/qemu-arm/xpl.c b/board/emulation/qemu-arm/xpl.c
index ec59e0ff327..aa002ebcc90 100644
--- a/board/emulation/qemu-arm/xpl.c
+++ b/board/emulation/qemu-arm/xpl.c
@@ -20,12 +20,27 @@ static int binman_load_image(struct spl_image_info *img,
 {
 	ulong base = spl_get_image_pos();
 	ulong size = spl_get_image_size();
+	ulong spl_load_addr;
+	int ret;

 	log_debug("Booting from address %lx size %lx\n", base, size);
 	img->name = spl_phase_name(spl_next_phase());
 	img->load_addr = base;
 	img->entry_point = base;

+	if (CONFIG_IS_ENABLED(RELOC_LOADER)) {
+		img->size = size;
+		spl_set_fdt_size(img, 0);
+		ret = spl_reloc_prepare(img, &spl_load_addr);
+		if (ret)
+			return log_msg_ret("rel", ret);
+		log_debug("Loading to %lx\n", spl_load_addr);
+		memcpy(map_sysmem(spl_load_addr, size),
+		       map_sysmem(base, size), size);
+		img->load_addr = CONFIG_SPL_TEXT_BASE;
+		img->entry_point = CONFIG_SPL_TEXT_BASE;
+	}
+
 	return 0;
 }
 SPL_LOAD_IMAGE_METHOD("binman", 0, BOOT_DEVICE_BOARD, binman_load_image);
diff --git a/configs/qemu_arm64_tpl_defconfig b/configs/qemu_arm64_tpl_defconfig
index f09e357e5cc..5f506d2b553 100644
--- a/configs/qemu_arm64_tpl_defconfig
+++ b/configs/qemu_arm64_tpl_defconfig
@@ -9,7 +9,7 @@ CONFIG_CUSTOM_SYS_INIT_SP_ADDR=0x40200000
 CONFIG_ENV_SIZE=0x40000
 CONFIG_ENV_SECT_SIZE=0x40000
 CONFIG_DEFAULT_DEVICE_TREE="qemu-arm64"
-CONFIG_SPL_TEXT_BASE=0x6000
+CONFIG_SPL_TEXT_BASE=0x40100000
 CONFIG_TARGET_QEMU_ARM_64BIT_TPL=y
 CONFIG_SPL_BSS_MAX_SIZE=0x10000
 CONFIG_DEBUG_UART_BASE=0x9000000
@@ -35,6 +35,7 @@ CONFIG_USE_PREBOOT=y
 CONFIG_PCI_INIT_R=y
 CONFIG_SPL_MAX_SIZE=0x10000
 # CONFIG_SPL_SEPARATE_BSS is not set
+CONFIG_TPL_RELOC_LOADER=y
 CONFIG_CMD_SMBIOS=y
 CONFIG_CMD_BOOTZ=y
 CONFIG_CMD_BOOTEFI_SELFTEST=y
diff --git a/test/py/tests/test_reloc.py b/test/py/tests/test_reloc.py
new file mode 100644
index 00000000000..82913194fba
--- /dev/null
+++ b/test/py/tests/test_reloc.py
@@ -0,0 +1,21 @@
+# SPDX-License-Identifier: GPL-2.0
+# Copyright 2024 Google LLC
+# Written by Simon Glass <sjg@chromium.org>
+
+import pytest
+
+@pytest.mark.buildconfigspec('target_qemu_arm_64bit_tpl')
+def test_reloc_loader(u_boot_console):
+    try:
+        cons = u_boot_console
+        # x = cons.restart_uboot()
+        output = cons.get_spawn_output().replace('\r', '')
+        assert 'spl_reloc TPL->SPL' in output
+        assert 'Loading to 40100200' in output
+
+        # Sanity check that it is picking up the correct environment
+        board_name = cons.run_command('print board_name')
+        assert board_name == 'board_name="qemu-arm64_tpl"'
+    finally:
+        # Restart afterward to get the normal U-Boot back
+        u_boot_console.restart_uboot()

Add this to CI. This relies on a u-boot-test-hooks update.
Signed-off-by: Simon Glass <sjg@chromium.org>
---
 .azure-pipelines.yml | 3 +++
 .gitlab-ci.yml       | 6 ++++++
 2 files changed, 9 insertions(+)

diff --git a/.azure-pipelines.yml b/.azure-pipelines.yml
index 93111eb6127..51c4346ce64 100644
--- a/.azure-pipelines.yml
+++ b/.azure-pipelines.yml
@@ -415,6 +415,9 @@ stages:
         qemu_arm64:
           TEST_PY_BD: "qemu_arm64"
           TEST_PY_TEST_SPEC: "not sleep"
+        qemu_arm64_tpl:
+          TEST_PY_BD: "qemu_arm64_tpl"
+          TEST_PY_TEST_SPEC: "test_reloc_loader"
         qemu_m68k:
           TEST_PY_BD: "M5208EVBE"
           TEST_PY_ID: "--id qemu"
diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index 7d621031b85..8020853b73c 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -344,6 +344,12 @@ qemu_arm64 test.py:
     TEST_PY_TEST_SPEC: "not sleep"
   <<: *buildman_and_testpy_dfn

+qemu_arm64 tpl_test.py:
+  variables:
+    TEST_PY_BD: "qemu_arm64_tpl"
+    TEST_PY_TEST_SPEC: "test_reloc_loader"
+  <<: *buildman_and_testpy_dfn
+
 qemu_m68k test.py:
   variables:
     TEST_PY_BD: "M5208EVBE"