[PATCH v3 0/5] Add pmem node for preserving distro ISO's

When installing a distro via EFI HTTP boot some OS installers expect the .iso image to be preserved and treat it as a "CDROM" to install packages.
This is problematic in EFI, since U-Boot mounts the image, starts the installer, and eventually calls ExitBootServices. At that point the image U-Boot mounted disappears. Some distros don't care and download the missing packages from a web archive, while others halt the installation complaining they can't find certain packages.
If the firmware uses ACPI, this is supported by using NFIT which provides NVDIMM ramdisks to the OS and preserves the image. We don't have anything in place for Device Trees though. Since DT supports persistent memory nodes (pmem) use those and preserve the .iso for installers.
The issue can be reproduced by attempting an EFI HTTP boot with Ubuntu live server ISO, or a Rocky Linux ISO. The installation would fail with the failure to locate certain packages.
The earlier version had aligned the addition of pmem nodes with the EFI HTTP boot feature. This has been changed, based on a review comment from Heinrich to instead be done by looping through the memory based blkmamp devices.
Changes since V2: * Fix a checkpatch error for putting a blank line after a function * Use blkmap device based scanning for adding the pmem nodes
Ilias Apalodimas (2): efi_loader: add a function to remove memory from the EFI map efi_loader: preserve installer images in pmem
Masahisa Kojima (1): fdt: add support for adding pmem nodes
Sughosh Ganu (2): blkmap: store type of blkmap device in corresponding structure blkmap: add pmem nodes for blkmap mem mapping devices
boot/fdt_support.c | 40 ++++++++++++- boot/image-fdt.c | 9 +++ cmd/blkmap.c | 16 ++++-- drivers/block/blkmap.c | 90 +++-------------------------- drivers/block/blkmap_helper.c | 47 +++++++++++++++- include/blkmap.h | 103 +++++++++++++++++++++++++++++++++- include/efi_loader.h | 11 ++-- include/fdt_support.h | 13 +++++ lib/efi_loader/efi_bootmgr.c | 22 ++++++-- lib/efi_loader/efi_memory.c | 51 ++++++++++++----- lib/lmb.c | 4 +- test/dm/blkmap.c | 16 +++--- 12 files changed, 302 insertions(+), 120 deletions(-)

From: Masahisa Kojima kojima.masahisa@socionext.com
One of the problems OS installers face, when running in EFI, is that the mounted ISO after calling ExitBootServices goes away. For some distros this is a problem since they rely on finding some core packages before continuing the installation. Distros have works around this -- e.g Fedora has a special kernel command line parameter called inst.stage2 [0].
ACPI has NFIT and NVDIMM support to provide ramdisks to the OS, but we don't have anything in place for DTs. Linux and device trees have support for persistent memory devices. So add a function that can inject a pmem node in a DT, so we can use it when launhing OS installers with EFI.
[0] https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/ins...
Signed-off-by: Masahisa Kojima kojima.masahisa@socionext.com Signed-off-by: Sughosh Ganu sughosh.ganu@linaro.org --- Changes since V2: * Fix a checkpatch error by putting a blank line after a function
boot/fdt_support.c | 40 +++++++++++++++++++++++++++++++++++++++- include/fdt_support.h | 13 +++++++++++++ 2 files changed, 52 insertions(+), 1 deletion(-)
diff --git a/boot/fdt_support.c b/boot/fdt_support.c index 49efeec3681..613685b80eb 100644 --- a/boot/fdt_support.c +++ b/boot/fdt_support.c @@ -18,6 +18,7 @@ #include <dm/ofnode.h> #include <linux/ctype.h> #include <linux/types.h> +#include <linux/sizes.h> #include <asm/global_data.h> #include <asm/unaligned.h> #include <linux/libfdt.h> @@ -464,7 +465,6 @@ void do_fixup_by_compat_u32(void *fdt, const char *compat, do_fixup_by_compat(fdt, compat, prop, &tmp, 4, create); }
-#ifdef CONFIG_ARCH_FIXUP_FDT_MEMORY /* * fdt_pack_reg - pack address and size array into the "reg"-suitable stream */ @@ -493,6 +493,7 @@ static int fdt_pack_reg(const void *fdt, void *buf, u64 *address, u64 *size, return p - (char *)buf; }
+#ifdef CONFIG_ARCH_FIXUP_FDT_MEMORY #if CONFIG_NR_DRAM_BANKS > 4 #define MEMORY_BANKS_MAX CONFIG_NR_DRAM_BANKS #else @@ -2222,3 +2223,40 @@ int fdt_valid(struct fdt_header **blobp) } return 1; } + +int fdt_fixup_pmem_region(void *blob, ulong addr, u32 size) +{ + u64 pmem_start[2] = { 0 }; + u64 pmem_size[2] = { 0 }; + char pmem_node[32] = {0}; + int nodeoffset, len; + int err; + u8 tmp[4 * 16]; /* Up to 64-bit address + 64-bit size */ + + if (!IS_ALIGNED(addr, SZ_2M) || !IS_ALIGNED(addr + size, SZ_2M)) { + printf("Start and end address needs at 2MB alignment\n"); + return -1; + } + snprintf(pmem_node, sizeof(pmem_node), "pmem@%lx", addr); + nodeoffset = fdt_find_or_add_subnode(blob, 0, pmem_node); + if (nodeoffset < 0) + return nodeoffset; + + err = fdt_setprop_string(blob, nodeoffset, "compatible", "pmem-region"); + if (err) + return err; + err = fdt_setprop_empty(blob, nodeoffset, "volatile"); + if (err) + return err; + pmem_start[0] = addr; + pmem_size[0] = size; + len = fdt_pack_reg(blob, tmp, pmem_start, pmem_size, 1); + err = fdt_setprop(blob, nodeoffset, "reg", tmp, len); + if (err < 0) { + printf("WARNING: could not set pmem %s %s.\n", "reg", + fdt_strerror(err)); + return err; + } + + return 0; +} diff --git a/include/fdt_support.h b/include/fdt_support.h index f0ad2e6b365..aea24df828f 100644 --- a/include/fdt_support.h +++ b/include/fdt_support.h @@ -507,4 +507,17 @@ void fdt_fixup_pstore(void *blob); */ int fdt_kaslrseed(void *blob, bool overwrite);
+/** + * fdt_fixup_pmem_region() - add a pmem node on the device tree + * + * This functions injects a pmem node to the device tree. Usually + * used with EFI installers to preserve installer images + * + * @blob: device tree provided by caller + * @addr: start address of the pmem node + * @size: size of the memory of the pmem node + * Return: 0 on success or < 0 on failure + */ +int fdt_fixup_pmem_region(void *blob, ulong addr, u32 size); + #endif /* ifndef __FDT_SUPPORT_H */

On 20.01.25 11:50, Sughosh Ganu wrote:
From: Masahisa Kojima kojima.masahisa@socionext.com
One of the problems OS installers face, when running in EFI, is that the mounted ISO after calling ExitBootServices goes away. For some distros this is a problem since they rely on finding some core packages before continuing the installation. Distros have works around this -- e.g Fedora has a special kernel command line parameter called inst.stage2 [0].
ACPI has NFIT and NVDIMM support to provide ramdisks to the OS, but we don't have anything in place for DTs. Linux and device trees have support for persistent memory devices. So add a function that can inject a pmem node in a DT, so we can use it when launhing OS installers with EFI.
We also have ACPI support in U-Boot. In a future patch series I guess we could add the generation of said ACPI tables.
[0] https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/ins...
Signed-off-by: Masahisa Kojima kojima.masahisa@socionext.com Signed-off-by: Sughosh Ganu sughosh.ganu@linaro.org
Changes since V2:
Fix a checkpatch error by putting a blank line after a function
boot/fdt_support.c | 40 +++++++++++++++++++++++++++++++++++++++- include/fdt_support.h | 13 +++++++++++++ 2 files changed, 52 insertions(+), 1 deletion(-)
diff --git a/boot/fdt_support.c b/boot/fdt_support.c index 49efeec3681..613685b80eb 100644 --- a/boot/fdt_support.c +++ b/boot/fdt_support.c @@ -18,6 +18,7 @@ #include <dm/ofnode.h> #include <linux/ctype.h> #include <linux/types.h> +#include <linux/sizes.h> #include <asm/global_data.h> #include <asm/unaligned.h> #include <linux/libfdt.h> @@ -464,7 +465,6 @@ void do_fixup_by_compat_u32(void *fdt, const char *compat, do_fixup_by_compat(fdt, compat, prop, &tmp, 4, create); }
-#ifdef CONFIG_ARCH_FIXUP_FDT_MEMORY> /*
- fdt_pack_reg - pack address and size array into the "reg"-suitable stream
*/ @@ -493,6 +493,7 @@ static int fdt_pack_reg(const void *fdt, void *buf, u64 *address, u64 *size, return p - (char *)buf; }
+#ifdef CONFIG_ARCH_FIXUP_FDT_MEMORY
Do we need this #ifdef at all? We tend to leave it to the linker to eliminate functions.
#if CONFIG_NR_DRAM_BANKS > 4 #define MEMORY_BANKS_MAX CONFIG_NR_DRAM_BANKS #else @@ -2222,3 +2223,40 @@ int fdt_valid(struct fdt_header **blobp) } return 1; }
+int fdt_fixup_pmem_region(void *blob, ulong addr, u32 size) +{
- u64 pmem_start[2] = { 0 };
- u64 pmem_size[2] = { 0 };
- char pmem_node[32] = {0};
- int nodeoffset, len;
- int err;
- u8 tmp[4 * 16]; /* Up to 64-bit address + 64-bit size */
- if (!IS_ALIGNED(addr, SZ_2M) || !IS_ALIGNED(addr + size, SZ_2M)) {
printf("Start and end address needs at 2MB alignment\n");
Nits:
%s/needs at 2MB alignment/must be 2 MiB aligned/
return -1;
- }
- snprintf(pmem_node, sizeof(pmem_node), "pmem@%lx", addr);
- nodeoffset = fdt_find_or_add_subnode(blob, 0, pmem_node);
- if (nodeoffset < 0)
return nodeoffset;
- err = fdt_setprop_string(blob, nodeoffset, "compatible", "pmem-region");
- if (err)
return err;
- err = fdt_setprop_empty(blob, nodeoffset, "volatile");
- if (err)
return err;
- pmem_start[0] = addr;
- pmem_size[0] = size;
- len = fdt_pack_reg(blob, tmp, pmem_start, pmem_size, 1);
- err = fdt_setprop(blob, nodeoffset, "reg", tmp, len);
- if (err < 0) {
printf("WARNING: could not set pmem %s %s.\n", "reg",
fdt_strerror(err));
return err;
- }
- return 0;
+} diff --git a/include/fdt_support.h b/include/fdt_support.h index f0ad2e6b365..aea24df828f 100644 --- a/include/fdt_support.h +++ b/include/fdt_support.h @@ -507,4 +507,17 @@ void fdt_fixup_pstore(void *blob); */ int fdt_kaslrseed(void *blob, bool overwrite);
+/**
- fdt_fixup_pmem_region() - add a pmem node on the device tree
- This functions injects a pmem node to the device tree. Usually
- used with EFI installers to preserve installer images
If I understand the code correctly, the function adds or updates a pmem device-tree node for the given start address.
Otherwise looks good to me.
Best regards
Heinrich
- @blob: device tree provided by caller
- @addr: start address of the pmem node
- @size: size of the memory of the pmem node
- Return: 0 on success or < 0 on failure
- */
+int fdt_fixup_pmem_region(void *blob, ulong addr, u32 size);
- #endif /* ifndef __FDT_SUPPORT_H */

On Mon, 20 Jan 2025 at 18:01, Heinrich Schuchardt xypron.glpk@gmx.de wrote:
On 20.01.25 11:50, Sughosh Ganu wrote:
From: Masahisa Kojima kojima.masahisa@socionext.com
One of the problems OS installers face, when running in EFI, is that the mounted ISO after calling ExitBootServices goes away. For some distros this is a problem since they rely on finding some core packages before continuing the installation. Distros have works around this -- e.g Fedora has a special kernel command line parameter called inst.stage2 [0].
ACPI has NFIT and NVDIMM support to provide ramdisks to the OS, but we don't have anything in place for DTs. Linux and device trees have support for persistent memory devices. So add a function that can inject a pmem node in a DT, so we can use it when launhing OS installers with EFI.
We also have ACPI support in U-Boot. In a future patch series I guess we could add the generation of said ACPI tables.
[0] https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/ins...
Signed-off-by: Masahisa Kojima kojima.masahisa@socionext.com Signed-off-by: Sughosh Ganu sughosh.ganu@linaro.org
Changes since V2:
Fix a checkpatch error by putting a blank line after a function
boot/fdt_support.c | 40 +++++++++++++++++++++++++++++++++++++++- include/fdt_support.h | 13 +++++++++++++ 2 files changed, 52 insertions(+), 1 deletion(-)
diff --git a/boot/fdt_support.c b/boot/fdt_support.c index 49efeec3681..613685b80eb 100644 --- a/boot/fdt_support.c +++ b/boot/fdt_support.c @@ -18,6 +18,7 @@ #include <dm/ofnode.h> #include <linux/ctype.h> #include <linux/types.h> +#include <linux/sizes.h> #include <asm/global_data.h> #include <asm/unaligned.h> #include <linux/libfdt.h> @@ -464,7 +465,6 @@ void do_fixup_by_compat_u32(void *fdt, const char *compat, do_fixup_by_compat(fdt, compat, prop, &tmp, 4, create); }
-#ifdef CONFIG_ARCH_FIXUP_FDT_MEMORY> /*
- fdt_pack_reg - pack address and size array into the "reg"-suitable stream
*/ @@ -493,6 +493,7 @@ static int fdt_pack_reg(const void *fdt, void *buf, u64 *address, u64 *size, return p - (char *)buf; }
+#ifdef CONFIG_ARCH_FIXUP_FDT_MEMORY
Do we need this #ifdef at all? We tend to leave it to the linker to eliminate functions.
This patch was authored by Kojima-san. I will have to check if we need to keep the ifdef.
#if CONFIG_NR_DRAM_BANKS > 4 #define MEMORY_BANKS_MAX CONFIG_NR_DRAM_BANKS #else @@ -2222,3 +2223,40 @@ int fdt_valid(struct fdt_header **blobp) } return 1; }
+int fdt_fixup_pmem_region(void *blob, ulong addr, u32 size) +{
u64 pmem_start[2] = { 0 };
u64 pmem_size[2] = { 0 };
char pmem_node[32] = {0};
int nodeoffset, len;
int err;
u8 tmp[4 * 16]; /* Up to 64-bit address + 64-bit size */
if (!IS_ALIGNED(addr, SZ_2M) || !IS_ALIGNED(addr + size, SZ_2M)) {
printf("Start and end address needs at 2MB alignment\n");
Nits:
%s/needs at 2MB alignment/must be 2 MiB aligned/
Will change
return -1;
}
snprintf(pmem_node, sizeof(pmem_node), "pmem@%lx", addr);
nodeoffset = fdt_find_or_add_subnode(blob, 0, pmem_node);
if (nodeoffset < 0)
return nodeoffset;
err = fdt_setprop_string(blob, nodeoffset, "compatible", "pmem-region");
if (err)
return err;
err = fdt_setprop_empty(blob, nodeoffset, "volatile");
if (err)
return err;
pmem_start[0] = addr;
pmem_size[0] = size;
len = fdt_pack_reg(blob, tmp, pmem_start, pmem_size, 1);
err = fdt_setprop(blob, nodeoffset, "reg", tmp, len);
if (err < 0) {
printf("WARNING: could not set pmem %s %s.\n", "reg",
fdt_strerror(err));
return err;
}
return 0;
+} diff --git a/include/fdt_support.h b/include/fdt_support.h index f0ad2e6b365..aea24df828f 100644 --- a/include/fdt_support.h +++ b/include/fdt_support.h @@ -507,4 +507,17 @@ void fdt_fixup_pstore(void *blob); */ int fdt_kaslrseed(void *blob, bool overwrite);
+/**
- fdt_fixup_pmem_region() - add a pmem node on the device tree
- This functions injects a pmem node to the device tree. Usually
- used with EFI installers to preserve installer images
If I understand the code correctly, the function adds or updates a pmem device-tree node for the given start address.
Will reword the comment.
-sughosh
Otherwise looks good to me.
Best regards
Heinrich
- @blob: device tree provided by caller
- @addr: start address of the pmem node
- @size: size of the memory of the pmem node
- Return: 0 on success or < 0 on failure
- */
+int fdt_fixup_pmem_region(void *blob, ulong addr, u32 size);
- #endif /* ifndef __FDT_SUPPORT_H */

On Mon, 20 Jan 2025 at 19:32, Sughosh Ganu sughosh.ganu@linaro.org wrote:
On Mon, 20 Jan 2025 at 18:01, Heinrich Schuchardt xypron.glpk@gmx.de wrote:
On 20.01.25 11:50, Sughosh Ganu wrote:
From: Masahisa Kojima kojima.masahisa@socionext.com
One of the problems OS installers face, when running in EFI, is that the mounted ISO after calling ExitBootServices goes away. For some distros this is a problem since they rely on finding some core packages before continuing the installation. Distros have works around this -- e.g Fedora has a special kernel command line parameter called inst.stage2 [0].
ACPI has NFIT and NVDIMM support to provide ramdisks to the OS, but we don't have anything in place for DTs. Linux and device trees have support for persistent memory devices. So add a function that can inject a pmem node in a DT, so we can use it when launhing OS installers with EFI.
We also have ACPI support in U-Boot. In a future patch series I guess we could add the generation of said ACPI tables.
[0] https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/ins...
Signed-off-by: Masahisa Kojima kojima.masahisa@socionext.com Signed-off-by: Sughosh Ganu sughosh.ganu@linaro.org
Changes since V2:
Fix a checkpatch error by putting a blank line after a function
boot/fdt_support.c | 40 +++++++++++++++++++++++++++++++++++++++- include/fdt_support.h | 13 +++++++++++++ 2 files changed, 52 insertions(+), 1 deletion(-)
diff --git a/boot/fdt_support.c b/boot/fdt_support.c index 49efeec3681..613685b80eb 100644 --- a/boot/fdt_support.c +++ b/boot/fdt_support.c @@ -18,6 +18,7 @@ #include <dm/ofnode.h> #include <linux/ctype.h> #include <linux/types.h> +#include <linux/sizes.h> #include <asm/global_data.h> #include <asm/unaligned.h> #include <linux/libfdt.h> @@ -464,7 +465,6 @@ void do_fixup_by_compat_u32(void *fdt, const char *compat, do_fixup_by_compat(fdt, compat, prop, &tmp, 4, create); }
-#ifdef CONFIG_ARCH_FIXUP_FDT_MEMORY> /*
- fdt_pack_reg - pack address and size array into the "reg"-suitable stream
*/ @@ -493,6 +493,7 @@ static int fdt_pack_reg(const void *fdt, void *buf, u64 *address, u64 *size, return p - (char *)buf; }
+#ifdef CONFIG_ARCH_FIXUP_FDT_MEMORY
Do we need this #ifdef at all? We tend to leave it to the linker to eliminate functions.
This patch was authored by Kojima-san. I will have to check if we need to keep the ifdef.
I tried removing the ifdef from the said location. However, this causes a size increase on platforms where this config symbol is not enabled -- I tried this with the xilinx_zynqmp_virt_defconfig. This could be because the fdt_fixup_memory() function is included on all platforms as it gets called from cmd/fdt.c. The fdt_fixup_memory() function calls fdt_fixup_memory_banks() which is protected by this ifdef. Removing the ifdef then results in the fdt_fixup_memory_banks() getting included, instead of the empty stub resulting in the size increase.
-sughosh
#if CONFIG_NR_DRAM_BANKS > 4 #define MEMORY_BANKS_MAX CONFIG_NR_DRAM_BANKS #else @@ -2222,3 +2223,40 @@ int fdt_valid(struct fdt_header **blobp) } return 1; }
+int fdt_fixup_pmem_region(void *blob, ulong addr, u32 size) +{
u64 pmem_start[2] = { 0 };
u64 pmem_size[2] = { 0 };
char pmem_node[32] = {0};
int nodeoffset, len;
int err;
u8 tmp[4 * 16]; /* Up to 64-bit address + 64-bit size */
if (!IS_ALIGNED(addr, SZ_2M) || !IS_ALIGNED(addr + size, SZ_2M)) {
printf("Start and end address needs at 2MB alignment\n");
Nits:
%s/needs at 2MB alignment/must be 2 MiB aligned/
Will change
return -1;
}
snprintf(pmem_node, sizeof(pmem_node), "pmem@%lx", addr);
nodeoffset = fdt_find_or_add_subnode(blob, 0, pmem_node);
if (nodeoffset < 0)
return nodeoffset;
err = fdt_setprop_string(blob, nodeoffset, "compatible", "pmem-region");
if (err)
return err;
err = fdt_setprop_empty(blob, nodeoffset, "volatile");
if (err)
return err;
pmem_start[0] = addr;
pmem_size[0] = size;
len = fdt_pack_reg(blob, tmp, pmem_start, pmem_size, 1);
err = fdt_setprop(blob, nodeoffset, "reg", tmp, len);
if (err < 0) {
printf("WARNING: could not set pmem %s %s.\n", "reg",
fdt_strerror(err));
return err;
}
return 0;
+} diff --git a/include/fdt_support.h b/include/fdt_support.h index f0ad2e6b365..aea24df828f 100644 --- a/include/fdt_support.h +++ b/include/fdt_support.h @@ -507,4 +507,17 @@ void fdt_fixup_pstore(void *blob); */ int fdt_kaslrseed(void *blob, bool overwrite);
+/**
- fdt_fixup_pmem_region() - add a pmem node on the device tree
- This functions injects a pmem node to the device tree. Usually
- used with EFI installers to preserve installer images
If I understand the code correctly, the function adds or updates a pmem device-tree node for the given start address.
Will reword the comment.
-sughosh
Otherwise looks good to me.
Best regards
Heinrich
- @blob: device tree provided by caller
- @addr: start address of the pmem node
- @size: size of the memory of the pmem node
- Return: 0 on success or < 0 on failure
- */
+int fdt_fixup_pmem_region(void *blob, ulong addr, u32 size);
- #endif /* ifndef __FDT_SUPPORT_H */

From: Ilias Apalodimas ilias.apalodimas@linaro.org
With upcoming changes supporting pmem nodes, we need to remove the pmem area from the EFI memory map. Add a function to do that.
Signed-off-by: Ilias Apalodimas ilias.apalodimas@linaro.org Signed-off-by: Sughosh Ganu sughosh.ganu@linaro.org --- Changes since V2: None
include/efi_loader.h | 11 +++++--- lib/efi_loader/efi_memory.c | 51 +++++++++++++++++++++++++++---------- lib/lmb.c | 4 +-- 3 files changed, 47 insertions(+), 19 deletions(-)
diff --git a/include/efi_loader.h b/include/efi_loader.h index 0d858c1e12e..944794e1637 100644 --- a/include/efi_loader.h +++ b/include/efi_loader.h @@ -831,7 +831,7 @@ efi_status_t efi_get_memory_map(efi_uintn_t *memory_map_size, efi_status_t efi_add_memory_map(u64 start, u64 size, int memory_type);
/** - * efi_add_memory_map_pg() - add pages to the memory map + * efi_update_memory_map() - update the memory map by adding/removing pages * * @start: start address, must be a multiple of * EFI_PAGE_SIZE @@ -839,11 +839,14 @@ efi_status_t efi_add_memory_map(u64 start, u64 size, int memory_type); * @memory_type: type of memory added * @overlap_conventional: region may only overlap free(conventional) * memory + * @remove: remove memory map * Return: status code */ -efi_status_t efi_add_memory_map_pg(u64 start, u64 pages, - int memory_type, - bool overlap_conventional); +efi_status_t efi_update_memory_map(u64 start, u64 pages, int memory_type, + bool overlap_conventional, bool remove); + +/* Remove memory from the EFI memory map */ +efi_status_t efi_remove_memory_map(u64 start, u64 size, int memory_type);
/* Called by board init to initialize the EFI drivers */ efi_status_t efi_driver_init(void); diff --git a/lib/efi_loader/efi_memory.c b/lib/efi_loader/efi_memory.c index 1212772471e..2362995abf6 100644 --- a/lib/efi_loader/efi_memory.c +++ b/lib/efi_loader/efi_memory.c @@ -258,7 +258,7 @@ static s64 efi_mem_carve_out(struct efi_mem_list *map, }
/** - * efi_add_memory_map_pg() - add pages to the memory map + * efi_update_memory_map() - update the memory map by adding/removing pages * * @start: start address, must be a multiple of * EFI_PAGE_SIZE @@ -266,11 +266,11 @@ static s64 efi_mem_carve_out(struct efi_mem_list *map, * @memory_type: type of memory added * @overlap_conventional: region may only overlap free(conventional) * memory + * @remove: remove memory map * Return: status code */ -efi_status_t efi_add_memory_map_pg(u64 start, u64 pages, - int memory_type, - bool overlap_conventional) +efi_status_t efi_update_memory_map(u64 start, u64 pages, int memory_type, + bool overlap_conventional, bool remove) { struct efi_mem_list *lmem; struct efi_mem_list *newlist; @@ -278,9 +278,9 @@ efi_status_t efi_add_memory_map_pg(u64 start, u64 pages, uint64_t carved_pages = 0; struct efi_event *evt;
- EFI_PRINT("%s: 0x%llx 0x%llx %d %s\n", __func__, + EFI_PRINT("%s: 0x%llx 0x%llx %d %s %s\n", __func__, start, pages, memory_type, overlap_conventional ? - "yes" : "no"); + "yes" : "no", remove ? "remove" : "add");
if (memory_type >= EFI_MAX_MEMORY_TYPE) return EFI_INVALID_PARAMETER; @@ -363,7 +363,10 @@ efi_status_t efi_add_memory_map_pg(u64 start, u64 pages, }
/* Add our new map */ - list_add_tail(&newlist->link, &efi_mem); + if (!remove) + list_add_tail(&newlist->link, &efi_mem); + else + free(newlist);
/* And make sure memory is listed in descending order */ efi_mem_sort(); @@ -400,7 +403,29 @@ efi_status_t efi_add_memory_map(u64 start, u64 size, int memory_type) pages = efi_size_in_pages(size + (start & EFI_PAGE_MASK)); start &= ~EFI_PAGE_MASK;
- return efi_add_memory_map_pg(start, pages, memory_type, false); + return efi_update_memory_map(start, pages, memory_type, false, false); +} + +/** + * efi_remove_memory_map() - remove memory area to the memory map + * + * @start: start address of the memory area + * @size: length in bytes of the memory area + * @memory_type: type of memory removed + * + * Return: status code + * + * This function automatically aligns the start and size of the memory area + * to EFI_PAGE_SIZE. + */ +efi_status_t efi_remove_memory_map(u64 start, u64 size, int memory_type) +{ + u64 pages; + + pages = efi_size_in_pages(size + (start & EFI_PAGE_MASK)); + start &= ~EFI_PAGE_MASK; + + return efi_update_memory_map(start, pages, memory_type, false, true); }
/** @@ -501,7 +526,7 @@ efi_status_t efi_allocate_pages(enum efi_allocate_type type,
efi_addr = (u64)(uintptr_t)map_sysmem(addr, 0); /* Reserve that map in our memory maps */ - ret = efi_add_memory_map_pg(efi_addr, pages, memory_type, true); + ret = efi_update_memory_map(efi_addr, pages, memory_type, true, false); if (ret != EFI_SUCCESS) { /* Map would overlap, bail out */ lmb_free_flags(addr, (u64)pages << EFI_PAGE_SHIFT, flags); @@ -822,8 +847,8 @@ static void add_u_boot_and_runtime(void) uboot_stack_size) & ~EFI_PAGE_MASK; uboot_pages = ((uintptr_t)map_sysmem(gd->ram_top - 1, 0) - uboot_start + EFI_PAGE_MASK) >> EFI_PAGE_SHIFT; - efi_add_memory_map_pg(uboot_start, uboot_pages, EFI_BOOT_SERVICES_CODE, - false); + efi_update_memory_map(uboot_start, uboot_pages, EFI_BOOT_SERVICES_CODE, + false, false); #if defined(__aarch64__) /* * Runtime Services must be 64KiB aligned according to the @@ -841,8 +866,8 @@ static void add_u_boot_and_runtime(void) runtime_end = (uintptr_t)__efi_runtime_stop; runtime_end = (runtime_end + runtime_mask) & ~runtime_mask; runtime_pages = (runtime_end - runtime_start) >> EFI_PAGE_SHIFT; - efi_add_memory_map_pg(runtime_start, runtime_pages, - EFI_RUNTIME_SERVICES_CODE, false); + efi_update_memory_map(runtime_start, runtime_pages, + EFI_RUNTIME_SERVICES_CODE, false, false); }
int efi_memory_init(void) diff --git a/lib/lmb.c b/lib/lmb.c index 7ca44591e1d..d5aea1ed8fe 100644 --- a/lib/lmb.c +++ b/lib/lmb.c @@ -455,11 +455,11 @@ static int lmb_map_update_notify(phys_addr_t addr, phys_size_t size, u8 op, pages = efi_size_in_pages(size + (efi_addr & EFI_PAGE_MASK)); efi_addr &= ~EFI_PAGE_MASK;
- status = efi_add_memory_map_pg(efi_addr, pages, + status = efi_update_memory_map(efi_addr, pages, op == MAP_OP_RESERVE ? EFI_BOOT_SERVICES_DATA : EFI_CONVENTIONAL_MEMORY, - false); + false, false); if (status != EFI_SUCCESS) { log_err("%s: LMB Map notify failure %lu\n", __func__, status & ~EFI_ERROR_MASK);

On 20.01.25 11:50, Sughosh Ganu wrote:
From: Ilias Apalodimas ilias.apalodimas@linaro.org
With upcoming changes supporting pmem nodes, we need to remove the pmem area from the EFI memory map. Add a function to do that.
Signed-off-by: Ilias Apalodimas ilias.apalodimas@linaro.org Signed-off-by: Sughosh Ganu sughosh.ganu@linaro.org
Reviewed-by: Heinrich Schuchardt xypron.glpk@gmx.de
Changes since V2: None
include/efi_loader.h | 11 +++++--- lib/efi_loader/efi_memory.c | 51 +++++++++++++++++++++++++++---------- lib/lmb.c | 4 +-- 3 files changed, 47 insertions(+), 19 deletions(-)
diff --git a/include/efi_loader.h b/include/efi_loader.h index 0d858c1e12e..944794e1637 100644 --- a/include/efi_loader.h +++ b/include/efi_loader.h @@ -831,7 +831,7 @@ efi_status_t efi_get_memory_map(efi_uintn_t *memory_map_size, efi_status_t efi_add_memory_map(u64 start, u64 size, int memory_type);
/**
- efi_add_memory_map_pg() - add pages to the memory map
- efi_update_memory_map() - update the memory map by adding/removing pages
- @start: start address, must be a multiple of
EFI_PAGE_SIZE
@@ -839,11 +839,14 @@ efi_status_t efi_add_memory_map(u64 start, u64 size, int memory_type);
- @memory_type: type of memory added
- @overlap_conventional: region may only overlap free(conventional)
memory
*/
- @remove: remove memory map
- Return: status code
-efi_status_t efi_add_memory_map_pg(u64 start, u64 pages,
int memory_type,
bool overlap_conventional);
+efi_status_t efi_update_memory_map(u64 start, u64 pages, int memory_type,
bool overlap_conventional, bool remove);
+/* Remove memory from the EFI memory map */ +efi_status_t efi_remove_memory_map(u64 start, u64 size, int memory_type);
/* Called by board init to initialize the EFI drivers */ efi_status_t efi_driver_init(void); diff --git a/lib/efi_loader/efi_memory.c b/lib/efi_loader/efi_memory.c index 1212772471e..2362995abf6 100644 --- a/lib/efi_loader/efi_memory.c +++ b/lib/efi_loader/efi_memory.c @@ -258,7 +258,7 @@ static s64 efi_mem_carve_out(struct efi_mem_list *map, }
/**
- efi_add_memory_map_pg() - add pages to the memory map
- efi_update_memory_map() - update the memory map by adding/removing pages
- @start: start address, must be a multiple of
EFI_PAGE_SIZE
@@ -266,11 +266,11 @@ static s64 efi_mem_carve_out(struct efi_mem_list *map,
- @memory_type: type of memory added
- @overlap_conventional: region may only overlap free(conventional)
memory
*/
- @remove: remove memory map
- Return: status code
-efi_status_t efi_add_memory_map_pg(u64 start, u64 pages,
int memory_type,
bool overlap_conventional)
+efi_status_t efi_update_memory_map(u64 start, u64 pages, int memory_type,
{ struct efi_mem_list *lmem; struct efi_mem_list *newlist;bool overlap_conventional, bool remove)
@@ -278,9 +278,9 @@ efi_status_t efi_add_memory_map_pg(u64 start, u64 pages, uint64_t carved_pages = 0; struct efi_event *evt;
- EFI_PRINT("%s: 0x%llx 0x%llx %d %s\n", __func__,
- EFI_PRINT("%s: 0x%llx 0x%llx %d %s %s\n", __func__, start, pages, memory_type, overlap_conventional ?
"yes" : "no");
"yes" : "no", remove ? "remove" : "add");
if (memory_type >= EFI_MAX_MEMORY_TYPE) return EFI_INVALID_PARAMETER;
@@ -363,7 +363,10 @@ efi_status_t efi_add_memory_map_pg(u64 start, u64 pages, }
/* Add our new map */
list_add_tail(&newlist->link, &efi_mem);
if (!remove)
list_add_tail(&newlist->link, &efi_mem);
else
free(newlist);
/* And make sure memory is listed in descending order */ efi_mem_sort();
@@ -400,7 +403,29 @@ efi_status_t efi_add_memory_map(u64 start, u64 size, int memory_type) pages = efi_size_in_pages(size + (start & EFI_PAGE_MASK)); start &= ~EFI_PAGE_MASK;
- return efi_add_memory_map_pg(start, pages, memory_type, false);
- return efi_update_memory_map(start, pages, memory_type, false, false);
+}
+/**
- efi_remove_memory_map() - remove memory area to the memory map
- @start: start address of the memory area
- @size: length in bytes of the memory area
- @memory_type: type of memory removed
- Return: status code
- This function automatically aligns the start and size of the memory area
- to EFI_PAGE_SIZE.
- */
+efi_status_t efi_remove_memory_map(u64 start, u64 size, int memory_type) +{
u64 pages;
pages = efi_size_in_pages(size + (start & EFI_PAGE_MASK));
start &= ~EFI_PAGE_MASK;
return efi_update_memory_map(start, pages, memory_type, false, true); }
/**
@@ -501,7 +526,7 @@ efi_status_t efi_allocate_pages(enum efi_allocate_type type,
efi_addr = (u64)(uintptr_t)map_sysmem(addr, 0); /* Reserve that map in our memory maps */
- ret = efi_add_memory_map_pg(efi_addr, pages, memory_type, true);
- ret = efi_update_memory_map(efi_addr, pages, memory_type, true, false); if (ret != EFI_SUCCESS) { /* Map would overlap, bail out */ lmb_free_flags(addr, (u64)pages << EFI_PAGE_SHIFT, flags);
@@ -822,8 +847,8 @@ static void add_u_boot_and_runtime(void) uboot_stack_size) & ~EFI_PAGE_MASK; uboot_pages = ((uintptr_t)map_sysmem(gd->ram_top - 1, 0) - uboot_start + EFI_PAGE_MASK) >> EFI_PAGE_SHIFT;
- efi_add_memory_map_pg(uboot_start, uboot_pages, EFI_BOOT_SERVICES_CODE,
false);
- efi_update_memory_map(uboot_start, uboot_pages, EFI_BOOT_SERVICES_CODE,
#if defined(__aarch64__) /*false, false);
- Runtime Services must be 64KiB aligned according to the
@@ -841,8 +866,8 @@ static void add_u_boot_and_runtime(void) runtime_end = (uintptr_t)__efi_runtime_stop; runtime_end = (runtime_end + runtime_mask) & ~runtime_mask; runtime_pages = (runtime_end - runtime_start) >> EFI_PAGE_SHIFT;
- efi_add_memory_map_pg(runtime_start, runtime_pages,
EFI_RUNTIME_SERVICES_CODE, false);
efi_update_memory_map(runtime_start, runtime_pages,
EFI_RUNTIME_SERVICES_CODE, false, false);
}
int efi_memory_init(void)
diff --git a/lib/lmb.c b/lib/lmb.c index 7ca44591e1d..d5aea1ed8fe 100644 --- a/lib/lmb.c +++ b/lib/lmb.c @@ -455,11 +455,11 @@ static int lmb_map_update_notify(phys_addr_t addr, phys_size_t size, u8 op, pages = efi_size_in_pages(size + (efi_addr & EFI_PAGE_MASK)); efi_addr &= ~EFI_PAGE_MASK;
- status = efi_add_memory_map_pg(efi_addr, pages,
- status = efi_update_memory_map(efi_addr, pages, op == MAP_OP_RESERVE ? EFI_BOOT_SERVICES_DATA : EFI_CONVENTIONAL_MEMORY,
false);
if (status != EFI_SUCCESS) { log_err("%s: LMB Map notify failure %lu\n", __func__, status & ~EFI_ERROR_MASK);false, false);

From: Ilias Apalodimas ilias.apalodimas@linaro.org
One of the problems OS installers face, when running in EFI, is that the mounted ISO after calling ExitBootServices goes away. For some distros this is a problem since they rely on finding some core packages before continuing the installation. Distros have works around this -- e.g Fedora has a special kernel command line parameter called inst.stage2 [0].
ACPI has NFIT and NVDIMM support to provide ramdisks to the OS, but we don't have anything in place for DTs. Linux and device trees have support for persistent memory devices.
It's worth noting that for linux to instantiate the /dev/pmemX device, the memory described in the pmem node has to be omitted from the EFI memory map we hand over to the OS if ZONE_DEVICES and SPARSEMEM is enabled. With those enabled the pmem driver ends up calling devm_memremap_pages() instead of devm_memremap(). The latter works whether the memory is omitted or marked as reserved, but mapping pages only works if the memory is omitted.
On top of that, depending on how the kernel is configured, that memory area must be page aligned or 2MB aligned. PowerPC is an exception here and requires 16MB alignment, but since we don't have EFI support for it, limit the alignment to 2MB.
Ensure that the ISO image is 2MB aligned and remove the region occupied by the image from the EFI memory map.
Signed-off-by: Ilias Apalodimas ilias.apalodimas@linaro.org Signed-off-by: Sughosh Ganu sughosh.ganu@linaro.org Reviewed-by: Heinrich Schuchardt xypron.glpk@gmx.de --- Changes since V2: None
lib/efi_loader/efi_bootmgr.c | 22 +++++++++++++++++----- 1 file changed, 17 insertions(+), 5 deletions(-)
diff --git a/lib/efi_loader/efi_bootmgr.c b/lib/efi_loader/efi_bootmgr.c index c6124c590d9..081eff057f4 100644 --- a/lib/efi_loader/efi_bootmgr.c +++ b/lib/efi_loader/efi_bootmgr.c @@ -18,6 +18,8 @@ #include <efi_loader.h> #include <efi_variable.h> #include <asm/unaligned.h> +#include <linux/kernel.h> +#include <linux/sizes.h>
static const struct efi_boot_services *bs; static const struct efi_runtime_services *rs; @@ -362,13 +364,16 @@ static efi_status_t prepare_loaded_image(u16 *label, ulong addr, ulong size, }
/* - * TODO: expose the ramdisk to OS. - * Need to pass the ramdisk information by the architecture-specific - * methods such as 'pmem' device-tree node. + * Linux supports 'pmem' which allows OS installers to find, reclaim + * the mounted images and continue the installation since the contents + * of the pmem region are treated as local media. + * + * The memory regions used for it needs to be carved out of the EFI + * memory map. */ - ret = efi_add_memory_map(addr, size, EFI_RESERVED_MEMORY_TYPE); + ret = efi_remove_memory_map(addr, size, EFI_CONVENTIONAL_MEMORY); if (ret != EFI_SUCCESS) { - log_err("Memory reservation failed\n"); + log_err("Failed to reserve memory\n"); goto err; }
@@ -490,6 +495,13 @@ static efi_status_t try_load_from_uri_path(struct efi_device_path_uri *uridp, ret = EFI_INVALID_PARAMETER; goto err; } + /* + * Depending on the kernel configuration, pmem memory area must be page + * aligned or 2MB aligned. PowerPC is an exception here and requires + * 16MB alignment, but since we don't have EFI support for it, limit + * the alignment to 2MB. + */ + image_size = ALIGN(image_size, SZ_2M);
/* * If the file extension is ".iso" or ".img", mount it and try to load

Add information about the type of blkmap device in the blkmap structure. Currently, the blkmap device is used for mapping to either a memory based block device, or another block device (linear mapping). Put information in the blkmap structure to identify if it is associated with a memory or linear mapped device. Which can then be used to take specific action based on the type of blkmap device.
Signed-off-by: Sughosh Ganu sughosh.ganu@linaro.org --- Changes since V2: New patch
cmd/blkmap.c | 16 ++++++++++++---- drivers/block/blkmap.c | 10 +++++++++- drivers/block/blkmap_helper.c | 2 +- include/blkmap.h | 12 +++++++++++- test/dm/blkmap.c | 16 ++++++++-------- 5 files changed, 41 insertions(+), 15 deletions(-)
diff --git a/cmd/blkmap.c b/cmd/blkmap.c index 164f80f1387..1bf0747ab16 100644 --- a/cmd/blkmap.c +++ b/cmd/blkmap.c @@ -119,15 +119,23 @@ static int do_blkmap_map(struct cmd_tbl *cmdtp, int flag, static int do_blkmap_create(struct cmd_tbl *cmdtp, int flag, int argc, char *const argv[]) { + enum blkmap_type type; const char *label; int err;
- if (argc != 2) + if (argc != 3) return CMD_RET_USAGE;
label = argv[1];
- err = blkmap_create(label, NULL); + if (!strcmp(argv[2], "linear")) + type = BLKMAP_LINEAR; + else if (!strcmp(argv[2], "mem")) + type = BLKMAP_MEM; + else + return CMD_RET_USAGE; + + err = blkmap_create(label, NULL, type); if (err) { printf("Unable to create "%s": %d\n", label, err); return CMD_RET_FAILURE; @@ -218,7 +226,7 @@ U_BOOT_CMD_WITH_SUBCMDS( "blkmap read <addr> <blk#> <cnt>\n" "blkmap write <addr> <blk#> <cnt>\n" "blkmap get <label> dev [<var>] - store device number in variable\n" - "blkmap create <label> - create device\n" + "blkmap create <label> <type> - create device(linear/mem)\n" "blkmap destroy <label> - destroy device\n" "blkmap map <label> <blk#> <cnt> linear <interface> <dev> <blk#> - device mapping\n" "blkmap map <label> <blk#> <cnt> mem <addr> - memory mapping\n", @@ -228,6 +236,6 @@ U_BOOT_CMD_WITH_SUBCMDS( U_BOOT_SUBCMD_MKENT(read, 5, 1, do_blkmap_common), U_BOOT_SUBCMD_MKENT(write, 5, 1, do_blkmap_common), U_BOOT_SUBCMD_MKENT(get, 5, 1, do_blkmap_get), - U_BOOT_SUBCMD_MKENT(create, 2, 1, do_blkmap_create), + U_BOOT_SUBCMD_MKENT(create, 3, 1, do_blkmap_create), U_BOOT_SUBCMD_MKENT(destroy, 2, 1, do_blkmap_destroy), U_BOOT_SUBCMD_MKENT(map, 32, 1, do_blkmap_map)); diff --git a/drivers/block/blkmap.c b/drivers/block/blkmap.c index 34eed1380dc..a817345b6bc 100644 --- a/drivers/block/blkmap.c +++ b/drivers/block/blkmap.c @@ -153,6 +153,9 @@ int blkmap_map_linear(struct udevice *dev, lbaint_t blknr, lbaint_t blkcnt, struct blk_desc *bd, *lbd; int err;
+ if (bm->type != BLKMAP_LINEAR) + return log_msg_ret("Invalid blkmap type", -EINVAL); + bd = dev_get_uclass_plat(bm->blk); lbd = dev_get_uclass_plat(lblk); if (lbd->blksz != bd->blksz) { @@ -240,6 +243,9 @@ int __blkmap_map_mem(struct udevice *dev, lbaint_t blknr, lbaint_t blkcnt, struct blkmap_mem *bmm; int err;
+ if (bm->type != BLKMAP_MEM) + return log_msg_ret("Invalid blkmap type", -EINVAL); + bmm = malloc(sizeof(*bmm)); if (!bmm) return -ENOMEM; @@ -435,7 +441,8 @@ struct udevice *blkmap_from_label(const char *label) return NULL; }
-int blkmap_create(const char *label, struct udevice **devp) +int blkmap_create(const char *label, struct udevice **devp, + enum blkmap_type type) { char *hname, *hlabel; struct udevice *dev; @@ -472,6 +479,7 @@ int blkmap_create(const char *label, struct udevice **devp) device_set_name_alloced(dev); bm = dev_get_plat(dev); bm->label = hlabel; + bm->type = type;
if (devp) *devp = dev; diff --git a/drivers/block/blkmap_helper.c b/drivers/block/blkmap_helper.c index bfba14110d2..56cbe57d4aa 100644 --- a/drivers/block/blkmap_helper.c +++ b/drivers/block/blkmap_helper.c @@ -19,7 +19,7 @@ int blkmap_create_ramdisk(const char *label, ulong image_addr, ulong image_size, struct blk_desc *desc; struct udevice *bm_dev;
- ret = blkmap_create(label, &bm_dev); + ret = blkmap_create(label, &bm_dev, BLKMAP_MEM); if (ret) { log_err("failed to create blkmap\n"); return ret; diff --git a/include/blkmap.h b/include/blkmap.h index d53095437fa..21169c30af1 100644 --- a/include/blkmap.h +++ b/include/blkmap.h @@ -9,6 +9,12 @@
#include <dm/lists.h>
+/* Type of blkmap device, Linear or Memory */ +enum blkmap_type { + BLKMAP_LINEAR = 1, + BLKMAP_MEM, +}; + /** * struct blkmap - Block map * @@ -16,11 +22,13 @@ * * @label: Human readable name of this blkmap * @blk: Underlying block device + * @type: Type of blkmap device * @slices: List of slices associated with this blkmap */ struct blkmap { char *label; struct udevice *blk; + enum blkmap_type type; struct list_head slices; };
@@ -78,9 +86,11 @@ struct udevice *blkmap_from_label(const char *label); * * @label: Label of the new blkmap * @devp: If not NULL, updated with the address of the resulting device + * @type: Type of blkmap device to create * Returns: 0 on success, negative error code on failure */ -int blkmap_create(const char *label, struct udevice **devp); +int blkmap_create(const char *label, struct udevice **devp, + enum blkmap_type type);
/** * blkmap_destroy() - Destroy blkmap diff --git a/test/dm/blkmap.c b/test/dm/blkmap.c index a6a0b4d4e20..06816cb4b54 100644 --- a/test/dm/blkmap.c +++ b/test/dm/blkmap.c @@ -56,7 +56,7 @@ static int dm_test_blkmap_read(struct unit_test_state *uts) struct udevice *dev, *blk; const struct mapping *m;
- ut_assertok(blkmap_create("rdtest", &dev)); + ut_assertok(blkmap_create("rdtest", &dev, BLKMAP_MEM)); ut_assertok(blk_get_from_parent(dev, &blk));
/* Generate an ordered and an unordered pattern in memory */ @@ -85,7 +85,7 @@ static int dm_test_blkmap_write(struct unit_test_state *uts) struct udevice *dev, *blk; const struct mapping *m;
- ut_assertok(blkmap_create("wrtest", &dev)); + ut_assertok(blkmap_create("wrtest", &dev, BLKMAP_MEM)); ut_assertok(blk_get_from_parent(dev, &blk));
/* Generate an ordered and an unordered pattern in memory */ @@ -114,7 +114,7 @@ static int dm_test_blkmap_slicing(struct unit_test_state *uts) { struct udevice *dev;
- ut_assertok(blkmap_create("slicetest", &dev)); + ut_assertok(blkmap_create("slicetest", &dev, BLKMAP_MEM));
ut_assertok(blkmap_map_mem(dev, 8, 8, NULL));
@@ -140,19 +140,19 @@ static int dm_test_blkmap_creation(struct unit_test_state *uts) { struct udevice *first, *second;
- ut_assertok(blkmap_create("first", &first)); + ut_assertok(blkmap_create("first", &first, BLKMAP_LINEAR));
/* Can't have two "first"s */ - ut_asserteq(-EBUSY, blkmap_create("first", &second)); + ut_asserteq(-EBUSY, blkmap_create("first", &second, BLKMAP_LINEAR));
/* But "second" should be fine */ - ut_assertok(blkmap_create("second", &second)); + ut_assertok(blkmap_create("second", &second, BLKMAP_LINEAR));
/* Once "first" is destroyed, we should be able to create it * again */ ut_assertok(blkmap_destroy(first)); - ut_assertok(blkmap_create("first", &first)); + ut_assertok(blkmap_create("first", &first, BLKMAP_LINEAR));
ut_assertok(blkmap_destroy(first)); ut_assertok(blkmap_destroy(second)); @@ -168,7 +168,7 @@ static int dm_test_cmd_blkmap(struct unit_test_state *uts) ut_assertok(run_command("blkmap info", 0)); ut_assert_console_end();
- ut_assertok(run_command("blkmap create ramdisk", 0)); + ut_assertok(run_command("blkmap create ramdisk mem", 0)); ut_assert_nextline("Created "ramdisk""); ut_assert_console_end();

On mån, jan 20, 2025 at 16:20, Sughosh Ganu sughosh.ganu@linaro.org wrote:
Add information about the type of blkmap device in the blkmap structure. Currently, the blkmap device is used for mapping to either a memory based block device, or another block device (linear mapping). Put information in the blkmap structure to identify if it is associated with a memory or linear mapped device. Which can then be used to take specific action based on the type of blkmap device.
Is this restriction really necessary? Why should it not be possible to setup a block map like this:
myblkmap: .--------. .-----. | slice0 +------> RAM | :--------: '-----' .-------. | slice1 +------------------> eMMC0 | :--------: .-------. '-------' | slice2 +------> eMMC1 | '........' '-------'
Linux's "device mapper", after which blkmaps are modeled, works in this way. I.e. a blkmap is just a collection of slices, and it is up to each slice how its data is provided, meaning that the user is free to compose their virtual block device in whatever way they need.
Looking at the pmem patch that follows this one, I am not able to find anything that would motivate restricting the functionality either.

On Mon, 20 Jan 2025 at 17:55, Tobias Waldekranz tobias@waldekranz.com wrote:
On mån, jan 20, 2025 at 16:20, Sughosh Ganu sughosh.ganu@linaro.org wrote:
Add information about the type of blkmap device in the blkmap structure. Currently, the blkmap device is used for mapping to either a memory based block device, or another block device (linear mapping). Put information in the blkmap structure to identify if it is associated with a memory or linear mapped device. Which can then be used to take specific action based on the type of blkmap device.
Is this restriction really necessary? Why should it not be possible to setup a block map like this:
myblkmap: .--------. .-----. | slice0 +------> RAM | :--------: '-----' .-------. | slice1 +------------------> eMMC0 | :--------: .-------. '-------' | slice2 +------> eMMC1 | '........' '-------'
Linux's "device mapper", after which blkmaps are modeled, works in this way. I.e. a blkmap is just a collection of slices, and it is up to each slice how its data is provided, meaning that the user is free to compose their virtual block device in whatever way they need.
The blkmap structure, the way it is designed, is pointing to the underlying block device. How can a single blkmap then be associated with slices of different types? Would that not contravene with the idea of a block device associating with a blkmap?
Looking at the pmem patch that follows this one, I am not able to find anything that would motivate restricting the functionality either.
The subsequent patch is adding the persistent memory node to the device-tree. The pmem node that is to be added is the memory mapped blkmap device. The logic does check for the type of the blkmap device and then proceeds to add the pmem node only for the memory mapped blkmaps.
-sughosh

On mån, jan 20, 2025 at 19:30, Sughosh Ganu sughosh.ganu@linaro.org wrote:
On Mon, 20 Jan 2025 at 17:55, Tobias Waldekranz tobias@waldekranz.com wrote:
On mån, jan 20, 2025 at 16:20, Sughosh Ganu sughosh.ganu@linaro.org wrote:
Add information about the type of blkmap device in the blkmap structure. Currently, the blkmap device is used for mapping to either a memory based block device, or another block device (linear mapping). Put information in the blkmap structure to identify if it is associated with a memory or linear mapped device. Which can then be used to take specific action based on the type of blkmap device.
Is this restriction really necessary? Why should it not be possible to setup a block map like this:
myblkmap: .--------. .-----. | slice0 +------> RAM | :--------: '-----' .-------. | slice1 +------------------> eMMC0 | :--------: .-------. '-------' | slice2 +------> eMMC1 | '........' '-------'
Linux's "device mapper", after which blkmaps are modeled, works in this way. I.e. a blkmap is just a collection of slices, and it is up to each slice how its data is provided, meaning that the user is free to compose their virtual block device in whatever way they need.
The blkmap structure, the way it is designed, is pointing to the underlying block device. How can a single blkmap then be associated
The `struct udevice *blk` from `struct blkmap` is a reference to the block device which represents the block map itself ("myblkmap" in the picture above), not any lower device.
with slices of different types? Would that not contravene with the idea of a block device associating with a blkmap?
For slices which are linear mappings (and are thus backed by some other underlying block device), their reference to that lower device ("eMMC0" and "eMMC1" above) is stored in the `struct udevice *blk` member of `struct blkmap_linear`.
Slices which are backed by memory does not have any reference to a lower device, but merely a pointer to the start of the mapping - `void *addr` in `struct blkmap_mem`.
The overarching idea is that the block map does not have to know anything about the implementation of how any individual slice chooses to provide its data. It only knows about their sizes and offsets. Based on that information, it simply routes incoming read/write requests to the correct slice.
Looking at the pmem patch that follows this one, I am not able to find anything that would motivate restricting the functionality either.
The subsequent patch is adding the persistent memory node to the device-tree. The pmem node that is to be added is the memory mapped blkmap device. The logic does check for the type of the blkmap device and then proceeds to add the pmem node only for the memory mapped blkmaps.
Sorry I am confused. Why do you need a block map device to add the pmem node to the device tree?

On Mon, 20 Jan 2025 at 20:06, Tobias Waldekranz tobias@waldekranz.com wrote:
On mån, jan 20, 2025 at 19:30, Sughosh Ganu sughosh.ganu@linaro.org wrote:
On Mon, 20 Jan 2025 at 17:55, Tobias Waldekranz tobias@waldekranz.com wrote:
On mån, jan 20, 2025 at 16:20, Sughosh Ganu sughosh.ganu@linaro.org wrote:
Add information about the type of blkmap device in the blkmap structure. Currently, the blkmap device is used for mapping to either a memory based block device, or another block device (linear mapping). Put information in the blkmap structure to identify if it is associated with a memory or linear mapped device. Which can then be used to take specific action based on the type of blkmap device.
Is this restriction really necessary? Why should it not be possible to setup a block map like this:
myblkmap: .--------. .-----. | slice0 +------> RAM | :--------: '-----' .-------. | slice1 +------------------> eMMC0 | :--------: .-------. '-------' | slice2 +------> eMMC1 | '........' '-------'
Linux's "device mapper", after which blkmaps are modeled, works in this way. I.e. a blkmap is just a collection of slices, and it is up to each slice how its data is provided, meaning that the user is free to compose their virtual block device in whatever way they need.
The blkmap structure, the way it is designed, is pointing to the underlying block device. How can a single blkmap then be associated
The `struct udevice *blk` from `struct blkmap` is a reference to the block device which represents the block map itself ("myblkmap" in the picture above), not any lower device.
Okay. I got confused with the comment associated with that member, which says, "Underlying block device". This I interpreted to be the block device that is associated with the blkmap structure.
with slices of different types? Would that not contravene with the idea of a block device associating with a blkmap?
For slices which are linear mappings (and are thus backed by some other underlying block device), their reference to that lower device ("eMMC0" and "eMMC1" above) is stored in the `struct udevice *blk` member of `struct blkmap_linear`.
Okay. But then, the computation of the blocksize seems to be happening at the blkmap device level, which again implies having the same set of slices associated with the blkmap. Any reason why the blksize is not taken from the block device associated with that slice? That would make it clear that the slice mapping type is independent from the parent blkmap device.
Slices which are backed by memory does not have any reference to a lower device, but merely a pointer to the start of the mapping - `void *addr` in `struct blkmap_mem`.
The overarching idea is that the block map does not have to know anything about the implementation of how any individual slice chooses to provide its data. It only knows about their sizes and offsets. Based on that information, it simply routes incoming read/write requests to the correct slice.
Okay. I think, for my solution, I will just need to move type identification to the slice, instead of the blkmap device.
Looking at the pmem patch that follows this one, I am not able to find anything that would motivate restricting the functionality either.
The subsequent patch is adding the persistent memory node to the device-tree. The pmem node that is to be added is the memory mapped blkmap device. The logic does check for the type of the blkmap device and then proceeds to add the pmem node only for the memory mapped blkmaps.
Sorry I am confused. Why do you need a block map device to add the pmem node to the device tree?
This is needed to include the RAM based block device information in the device-tree as pmem node. The OS installer then uses this pmem device as the block device which contains the installation packages, and proceeds with the OS installation.
-sughosh

On mån, jan 20, 2025 at 21:10, Sughosh Ganu sughosh.ganu@linaro.org wrote:
On Mon, 20 Jan 2025 at 20:06, Tobias Waldekranz tobias@waldekranz.com wrote:
On mån, jan 20, 2025 at 19:30, Sughosh Ganu sughosh.ganu@linaro.org wrote:
On Mon, 20 Jan 2025 at 17:55, Tobias Waldekranz tobias@waldekranz.com wrote:
On mån, jan 20, 2025 at 16:20, Sughosh Ganu sughosh.ganu@linaro.org wrote:
Add information about the type of blkmap device in the blkmap structure. Currently, the blkmap device is used for mapping to either a memory based block device, or another block device (linear mapping). Put information in the blkmap structure to identify if it is associated with a memory or linear mapped device. Which can then be used to take specific action based on the type of blkmap device.
Is this restriction really necessary? Why should it not be possible to setup a block map like this:
myblkmap: .--------. .-----. | slice0 +------> RAM | :--------: '-----' .-------. | slice1 +------------------> eMMC0 | :--------: .-------. '-------' | slice2 +------> eMMC1 | '........' '-------'
Linux's "device mapper", after which blkmaps are modeled, works in this way. I.e. a blkmap is just a collection of slices, and it is up to each slice how its data is provided, meaning that the user is free to compose their virtual block device in whatever way they need.
The blkmap structure, the way it is designed, is pointing to the underlying block device. How can a single blkmap then be associated
The `struct udevice *blk` from `struct blkmap` is a reference to the block device which represents the block map itself ("myblkmap" in the picture above), not any lower device.
Okay. I got confused with the comment associated with that member, which says, "Underlying block device". This I interpreted to be the block device that is associated with the blkmap structure.
Yeah I agree that it could be made clearer :)
with slices of different types? Would that not contravene with the idea of a block device associating with a blkmap?
For slices which are linear mappings (and are thus backed by some other underlying block device), their reference to that lower device ("eMMC0" and "eMMC1" above) is stored in the `struct udevice *blk` member of `struct blkmap_linear`.
Okay. But then, the computation of the blocksize seems to be happening at the blkmap device level, which again implies having the same set of slices associated with the blkmap. Any reason why the blksize is not taken from the block device associated with that slice? That would make it clear that the slice mapping type is independent from the parent blkmap device.
In the original series, only linear mappings to devices which used block sizes of 512 was supported, precisely because otherwise you need to do proper translation to work in all cases.
I tried to argue this point on the list back then: https://lore.kernel.org/u-boot/875y3wohrt.fsf@waldekranz.com/ but I did not get my point across and the restriction was lifted anyway.
Slices which are backed by memory does not have any reference to a lower device, but merely a pointer to the start of the mapping - `void *addr` in `struct blkmap_mem`.
The overarching idea is that the block map does not have to know anything about the implementation of how any individual slice chooses to provide its data. It only knows about their sizes and offsets. Based on that information, it simply routes incoming read/write requests to the correct slice.
Okay. I think, for my solution, I will just need to move type identification to the slice, instead of the blkmap device.
Looking at the pmem patch that follows this one, I am not able to find anything that would motivate restricting the functionality either.
The subsequent patch is adding the persistent memory node to the device-tree. The pmem node that is to be added is the memory mapped blkmap device. The logic does check for the type of the blkmap device and then proceeds to add the pmem node only for the memory mapped blkmaps.
Sorry I am confused. Why do you need a block map device to add the pmem node to the device tree?
This is needed to include the RAM based block device information in the device-tree as pmem node. The OS installer then uses this pmem device as the block device which contains the installation packages, and proceeds with the OS installation.
But even if the user has not setup a blkmap, don't you want to inject the pmem node in the DT anyway? All you need is the size and offset of the blob right? Is that not available from `image_setup_libfdt()`?

On Mon, 20 Jan 2025 at 21:50, Tobias Waldekranz tobias@waldekranz.com wrote:
On mån, jan 20, 2025 at 21:10, Sughosh Ganu sughosh.ganu@linaro.org wrote:
On Mon, 20 Jan 2025 at 20:06, Tobias Waldekranz tobias@waldekranz.com wrote:
On mån, jan 20, 2025 at 19:30, Sughosh Ganu sughosh.ganu@linaro.org wrote:
On Mon, 20 Jan 2025 at 17:55, Tobias Waldekranz tobias@waldekranz.com wrote:
On mån, jan 20, 2025 at 16:20, Sughosh Ganu sughosh.ganu@linaro.org wrote:
Add information about the type of blkmap device in the blkmap structure. Currently, the blkmap device is used for mapping to either a memory based block device, or another block device (linear mapping). Put information in the blkmap structure to identify if it is associated with a memory or linear mapped device. Which can then be used to take specific action based on the type of blkmap device.
Is this restriction really necessary? Why should it not be possible to setup a block map like this:
myblkmap: .--------. .-----. | slice0 +------> RAM | :--------: '-----' .-------. | slice1 +------------------> eMMC0 | :--------: .-------. '-------' | slice2 +------> eMMC1 | '........' '-------'
Linux's "device mapper", after which blkmaps are modeled, works in this way. I.e. a blkmap is just a collection of slices, and it is up to each slice how its data is provided, meaning that the user is free to compose their virtual block device in whatever way they need.
The blkmap structure, the way it is designed, is pointing to the underlying block device. How can a single blkmap then be associated
The `struct udevice *blk` from `struct blkmap` is a reference to the block device which represents the block map itself ("myblkmap" in the picture above), not any lower device.
Okay. I got confused with the comment associated with that member, which says, "Underlying block device". This I interpreted to be the block device that is associated with the blkmap structure.
Yeah I agree that it could be made clearer :)
with slices of different types? Would that not contravene with the idea of a block device associating with a blkmap?
For slices which are linear mappings (and are thus backed by some other underlying block device), their reference to that lower device ("eMMC0" and "eMMC1" above) is stored in the `struct udevice *blk` member of `struct blkmap_linear`.
Okay. But then, the computation of the blocksize seems to be happening at the blkmap device level, which again implies having the same set of slices associated with the blkmap. Any reason why the blksize is not taken from the block device associated with that slice? That would make it clear that the slice mapping type is independent from the parent blkmap device.
In the original series, only linear mappings to devices which used block sizes of 512 was supported, precisely because otherwise you need to do proper translation to work in all cases.
I tried to argue this point on the list back then: https://lore.kernel.org/u-boot/875y3wohrt.fsf@waldekranz.com/ but I did not get my point across and the restriction was lifted anyway.
Slices which are backed by memory does not have any reference to a lower device, but merely a pointer to the start of the mapping - `void *addr` in `struct blkmap_mem`.
The overarching idea is that the block map does not have to know anything about the implementation of how any individual slice chooses to provide its data. It only knows about their sizes and offsets. Based on that information, it simply routes incoming read/write requests to the correct slice.
Okay. I think, for my solution, I will just need to move type identification to the slice, instead of the blkmap device.
Looking at the pmem patch that follows this one, I am not able to find anything that would motivate restricting the functionality either.
The subsequent patch is adding the persistent memory node to the device-tree. The pmem node that is to be added is the memory mapped blkmap device. The logic does check for the type of the blkmap device and then proceeds to add the pmem node only for the memory mapped blkmaps.
Sorry I am confused. Why do you need a block map device to add the pmem node to the device tree?
This is needed to include the RAM based block device information in the device-tree as pmem node. The OS installer then uses this pmem device as the block device which contains the installation packages, and proceeds with the OS installation.
But even if the user has not setup a blkmap, don't you want to inject the pmem node in the DT anyway? All you need is the size and offset of the blob right? Is that not available from `image_setup_libfdt()`?
Not sure if I am getting your point. The image_setup_libfdt() function is fixing up the devicetree before it gets passed on to the OS. In my subsequent patch, the image_setup_libfdt() is calling the blkmap helper function (added in that patch) to check for any memory mapped blkmap devices, and add corresponding pmem nodes for those. The current case where this is happening in U-Boot is as part of EFI_HTTP_BOOT, where if an ISO or an img file has been obtained over the network, it gets mounted as a blkmap device, and that gets notified to the OS. So, if this happens to be an install image, the kernel knows about it through the pmem node in the DT.
If you are referring to a scenario where the memory based block device does not get set up as a blkmap device, that will anyways require explicit intervention of the user to add the pmem node, because otherwise there is no way to find out existence of such a device. This can then be done through an explicit command. But the EFI_HTTP_BOOT use-case does require scanning for the memory base blkmap devices.
-sughosh

On tis, jan 21, 2025 at 00:55, Sughosh Ganu sughosh.ganu@linaro.org wrote:
On Mon, 20 Jan 2025 at 21:50, Tobias Waldekranz tobias@waldekranz.com wrote:
On mån, jan 20, 2025 at 21:10, Sughosh Ganu sughosh.ganu@linaro.org wrote:
On Mon, 20 Jan 2025 at 20:06, Tobias Waldekranz tobias@waldekranz.com wrote:
On mån, jan 20, 2025 at 19:30, Sughosh Ganu sughosh.ganu@linaro.org wrote:
On Mon, 20 Jan 2025 at 17:55, Tobias Waldekranz tobias@waldekranz.com wrote:
On mån, jan 20, 2025 at 16:20, Sughosh Ganu sughosh.ganu@linaro.org wrote: > Add information about the type of blkmap device in the blkmap > structure. Currently, the blkmap device is used for mapping to either > a memory based block device, or another block device (linear > mapping). Put information in the blkmap structure to identify if it is > associated with a memory or linear mapped device. Which can then be > used to take specific action based on the type of blkmap device.
Is this restriction really necessary? Why should it not be possible to setup a block map like this:
myblkmap: .--------. .-----. | slice0 +------> RAM | :--------: '-----' .-------. | slice1 +------------------> eMMC0 | :--------: .-------. '-------' | slice2 +------> eMMC1 | '........' '-------'
Linux's "device mapper", after which blkmaps are modeled, works in this way. I.e. a blkmap is just a collection of slices, and it is up to each slice how its data is provided, meaning that the user is free to compose their virtual block device in whatever way they need.
The blkmap structure, the way it is designed, is pointing to the underlying block device. How can a single blkmap then be associated
The `struct udevice *blk` from `struct blkmap` is a reference to the block device which represents the block map itself ("myblkmap" in the picture above), not any lower device.
Okay. I got confused with the comment associated with that member, which says, "Underlying block device". This I interpreted to be the block device that is associated with the blkmap structure.
Yeah I agree that it could be made clearer :)
with slices of different types? Would that not contravene with the idea of a block device associating with a blkmap?
For slices which are linear mappings (and are thus backed by some other underlying block device), their reference to that lower device ("eMMC0" and "eMMC1" above) is stored in the `struct udevice *blk` member of `struct blkmap_linear`.
Okay. But then, the computation of the blocksize seems to be happening at the blkmap device level, which again implies having the same set of slices associated with the blkmap. Any reason why the blksize is not taken from the block device associated with that slice? That would make it clear that the slice mapping type is independent from the parent blkmap device.
In the original series, only linear mappings to devices which used block sizes of 512 was supported, precisely because otherwise you need to do proper translation to work in all cases.
I tried to argue this point on the list back then: https://lore.kernel.org/u-boot/875y3wohrt.fsf@waldekranz.com/ but I did not get my point across and the restriction was lifted anyway.
Slices which are backed by memory does not have any reference to a lower device, but merely a pointer to the start of the mapping - `void *addr` in `struct blkmap_mem`.
The overarching idea is that the block map does not have to know anything about the implementation of how any individual slice chooses to provide its data. It only knows about their sizes and offsets. Based on that information, it simply routes incoming read/write requests to the correct slice.
Okay. I think, for my solution, I will just need to move type identification to the slice, instead of the blkmap device.
Looking at the pmem patch that follows this one, I am not able to find anything that would motivate restricting the functionality either.
The subsequent patch is adding the persistent memory node to the device-tree. The pmem node that is to be added is the memory mapped blkmap device. The logic does check for the type of the blkmap device and then proceeds to add the pmem node only for the memory mapped blkmaps.
Sorry I am confused. Why do you need a block map device to add the pmem node to the device tree?
This is needed to include the RAM based block device information in the device-tree as pmem node. The OS installer then uses this pmem device as the block device which contains the installation packages, and proceeds with the OS installation.
But even if the user has not setup a blkmap, don't you want to inject the pmem node in the DT anyway? All you need is the size and offset of the blob right? Is that not available from `image_setup_libfdt()`?
Not sure if I am getting your point. The image_setup_libfdt() function is fixing up the devicetree before it gets passed on to the OS. In my subsequent patch, the image_setup_libfdt() is calling the blkmap helper function (added in that patch) to check for any memory mapped blkmap devices, and add corresponding pmem nodes for those. The current case where this is happening in U-Boot is as part of EFI_HTTP_BOOT, where if an ISO or an img file has been obtained over the network, it gets mounted as a blkmap device, and that gets notified to the OS. So, if this happens to be an install image, the kernel knows about it through the pmem node in the DT.
Alright, but then it seems like the implementation of EFI_HTTP_BOOT is the one with enough context to known when to add the pmem node, no? Could it not use the event subsystem to register a fixup? I.e. when it creates the block map, it also ought to know that the image must be protected. I imagine something like (pseudo code, but you get the idea):
+struct efi_pmem { + void *base; + size_t size; +} + +efi_add_pmem(void *_pmem, struct event *event) +{ + const struct event_ft_fixup *fixup = &event->data.ft_fixup; + struct efi_pmem pmem = _pmem; + + fdt_fixup_pmem_region(fixup->tree, pmem->addr, pmem->size); + free(pmem); +}
efi_http_boot() { ... bm = blkmap_ramdisk_create(imgbase, imgsize); + if (needs_protection) { + struct efi_pmem *pmem = xmalloc(sizeof(*pmem)); + pmem->base = imgbase; + pmem->size = imgsize; + event_register("efi_add_pmem", EVT_FT_FIXUP, efi_add_pmem, pmem); + } ... }
If you are referring to a scenario where the memory based block device does not get set up as a blkmap device, that will anyways require explicit intervention of the user to add the pmem node, because otherwise there is no way to find out existence of such a device. This can then be done through an explicit command. But the EFI_HTTP_BOOT use-case does require scanning for the memory base blkmap devices.
With the approach above, I think we could get out of marking every configured blkmap as reserved (which might not be what the user wants), and instead let the creator of the device decide on whether this is needed or not.

On Tue, 21 Jan 2025 at 03:06, Tobias Waldekranz tobias@waldekranz.com wrote:
On tis, jan 21, 2025 at 00:55, Sughosh Ganu sughosh.ganu@linaro.org wrote:
On Mon, 20 Jan 2025 at 21:50, Tobias Waldekranz tobias@waldekranz.com wrote:
On mån, jan 20, 2025 at 21:10, Sughosh Ganu sughosh.ganu@linaro.org wrote:
On Mon, 20 Jan 2025 at 20:06, Tobias Waldekranz tobias@waldekranz.com wrote:
On mån, jan 20, 2025 at 19:30, Sughosh Ganu sughosh.ganu@linaro.org wrote:
On Mon, 20 Jan 2025 at 17:55, Tobias Waldekranz tobias@waldekranz.com wrote: > > On mån, jan 20, 2025 at 16:20, Sughosh Ganu sughosh.ganu@linaro.org wrote: > > Add information about the type of blkmap device in the blkmap > > structure. Currently, the blkmap device is used for mapping to either > > a memory based block device, or another block device (linear > > mapping). Put information in the blkmap structure to identify if it is > > associated with a memory or linear mapped device. Which can then be > > used to take specific action based on the type of blkmap device. > > Is this restriction really necessary? Why should it not be possible to > setup a block map like this: > > myblkmap: > .--------. .-----. > | slice0 +------> RAM | > :--------: '-----' .-------. > | slice1 +------------------> eMMC0 | > :--------: .-------. '-------' > | slice2 +------> eMMC1 | > '........' '-------' > > Linux's "device mapper", after which blkmaps are modeled, works in this > way. I.e. a blkmap is just a collection of slices, and it is up to each > slice how its data is provided, meaning that the user is free to compose > their virtual block device in whatever way they need.
The blkmap structure, the way it is designed, is pointing to the underlying block device. How can a single blkmap then be associated
The `struct udevice *blk` from `struct blkmap` is a reference to the block device which represents the block map itself ("myblkmap" in the picture above), not any lower device.
Okay. I got confused with the comment associated with that member, which says, "Underlying block device". This I interpreted to be the block device that is associated with the blkmap structure.
Yeah I agree that it could be made clearer :)
with slices of different types? Would that not contravene with the idea of a block device associating with a blkmap?
For slices which are linear mappings (and are thus backed by some other underlying block device), their reference to that lower device ("eMMC0" and "eMMC1" above) is stored in the `struct udevice *blk` member of `struct blkmap_linear`.
Okay. But then, the computation of the blocksize seems to be happening at the blkmap device level, which again implies having the same set of slices associated with the blkmap. Any reason why the blksize is not taken from the block device associated with that slice? That would make it clear that the slice mapping type is independent from the parent blkmap device.
In the original series, only linear mappings to devices which used block sizes of 512 was supported, precisely because otherwise you need to do proper translation to work in all cases.
I tried to argue this point on the list back then: https://lore.kernel.org/u-boot/875y3wohrt.fsf@waldekranz.com/ but I did not get my point across and the restriction was lifted anyway.
Slices which are backed by memory does not have any reference to a lower device, but merely a pointer to the start of the mapping - `void *addr` in `struct blkmap_mem`.
The overarching idea is that the block map does not have to know anything about the implementation of how any individual slice chooses to provide its data. It only knows about their sizes and offsets. Based on that information, it simply routes incoming read/write requests to the correct slice.
Okay. I think, for my solution, I will just need to move type identification to the slice, instead of the blkmap device.
> > Looking at the pmem patch that follows this one, I am not able to find > anything that would motivate restricting the functionality either.
The subsequent patch is adding the persistent memory node to the device-tree. The pmem node that is to be added is the memory mapped blkmap device. The logic does check for the type of the blkmap device and then proceeds to add the pmem node only for the memory mapped blkmaps.
Sorry I am confused. Why do you need a block map device to add the pmem node to the device tree?
This is needed to include the RAM based block device information in the device-tree as pmem node. The OS installer then uses this pmem device as the block device which contains the installation packages, and proceeds with the OS installation.
But even if the user has not setup a blkmap, don't you want to inject the pmem node in the DT anyway? All you need is the size and offset of the blob right? Is that not available from `image_setup_libfdt()`?
Not sure if I am getting your point. The image_setup_libfdt() function is fixing up the devicetree before it gets passed on to the OS. In my subsequent patch, the image_setup_libfdt() is calling the blkmap helper function (added in that patch) to check for any memory mapped blkmap devices, and add corresponding pmem nodes for those. The current case where this is happening in U-Boot is as part of EFI_HTTP_BOOT, where if an ISO or an img file has been obtained over the network, it gets mounted as a blkmap device, and that gets notified to the OS. So, if this happens to be an install image, the kernel knows about it through the pmem node in the DT.
Alright, but then it seems like the implementation of EFI_HTTP_BOOT is the one with enough context to known when to add the pmem node, no? Could it not use the event subsystem to register a fixup? I.e. when it creates the block map, it also ought to know that the image must be protected. I imagine something like (pseudo code, but you get the idea):
+struct efi_pmem {
- void *base;
- size_t size;
+}
+efi_add_pmem(void *_pmem, struct event *event) +{
- const struct event_ft_fixup *fixup = &event->data.ft_fixup;
- struct efi_pmem pmem = _pmem;
- fdt_fixup_pmem_region(fixup->tree, pmem->addr, pmem->size);
- free(pmem);
+}
efi_http_boot() { ... bm = blkmap_ramdisk_create(imgbase, imgsize);
- if (needs_protection) {
struct efi_pmem *pmem = xmalloc(sizeof(*pmem));
pmem->base = imgbase;
pmem->size = imgsize;
event_register("efi_add_pmem", EVT_FT_FIXUP, efi_add_pmem, pmem);
- } ...
}
The earlier version was written on similar lines, where the helper function was getting the image address and size from the efi_http_boot context structure. There was a review comment asking me to decouple the solution from EFI_HTTP_BOOT, and add the pmem nodes after scanning the blkmap devices. This would make it a little more generic, and not tie it closely with EFI_HTTP_BOOT.
Also, one issue with using the event based framework is that the user might choose to pass a different devicetree to the OS from the one on which the fixup was done. Calling the helper function from image_setup_libfdt() ensures that the fixup happens on the DT that gets passed on to the OS.
If you are referring to a scenario where the memory based block device does not get set up as a blkmap device, that will anyways require explicit intervention of the user to add the pmem node, because otherwise there is no way to find out existence of such a device. This can then be done through an explicit command. But the EFI_HTTP_BOOT use-case does require scanning for the memory base blkmap devices.
With the approach above, I think we could get out of marking every configured blkmap as reserved (which might not be what the user wants), and instead let the creator of the device decide on whether this is needed or not.
I am not sure what you mean by marking the blkmap as reserved, but these changes are simply marking the blkmap devices in the DT before booting the OS. The kernel can then parse these pmem regions to check for the availability of an installation.
-sughosh

Hi Sughosh
On Mon, 20 Jan 2025 at 12:51, Sughosh Ganu sughosh.ganu@linaro.org wrote:
Add information about the type of blkmap device in the blkmap structure. Currently, the blkmap device is used for mapping to either a memory based block device, or another block device (linear mapping). Put information in the blkmap structure to identify if it is associated with a memory or linear mapped device. Which can then be used to take specific action based on the type of blkmap device.
I haven't followed up all the discussions yet, but I do have a question. Are blkmap devices now unconditionally added in a pmem node? (if they are backed by memory)
Thanks /Ilias
Signed-off-by: Sughosh Ganu sughosh.ganu@linaro.org
Changes since V2: New patch
cmd/blkmap.c | 16 ++++++++++++---- drivers/block/blkmap.c | 10 +++++++++- drivers/block/blkmap_helper.c | 2 +- include/blkmap.h | 12 +++++++++++- test/dm/blkmap.c | 16 ++++++++-------- 5 files changed, 41 insertions(+), 15 deletions(-)
diff --git a/cmd/blkmap.c b/cmd/blkmap.c index 164f80f1387..1bf0747ab16 100644 --- a/cmd/blkmap.c +++ b/cmd/blkmap.c @@ -119,15 +119,23 @@ static int do_blkmap_map(struct cmd_tbl *cmdtp, int flag, static int do_blkmap_create(struct cmd_tbl *cmdtp, int flag, int argc, char *const argv[]) {
enum blkmap_type type; const char *label; int err;
if (argc != 2)
if (argc != 3) return CMD_RET_USAGE; label = argv[1];
err = blkmap_create(label, NULL);
if (!strcmp(argv[2], "linear"))
type = BLKMAP_LINEAR;
else if (!strcmp(argv[2], "mem"))
type = BLKMAP_MEM;
else
return CMD_RET_USAGE;
err = blkmap_create(label, NULL, type); if (err) { printf("Unable to create \"%s\": %d\n", label, err); return CMD_RET_FAILURE;
@@ -218,7 +226,7 @@ U_BOOT_CMD_WITH_SUBCMDS( "blkmap read <addr> <blk#> <cnt>\n" "blkmap write <addr> <blk#> <cnt>\n" "blkmap get <label> dev [<var>] - store device number in variable\n"
"blkmap create <label> - create device\n"
"blkmap create <label> <type> - create device(linear/mem)\n" "blkmap destroy <label> - destroy device\n" "blkmap map <label> <blk#> <cnt> linear <interface> <dev> <blk#> - device mapping\n" "blkmap map <label> <blk#> <cnt> mem <addr> - memory mapping\n",
@@ -228,6 +236,6 @@ U_BOOT_CMD_WITH_SUBCMDS( U_BOOT_SUBCMD_MKENT(read, 5, 1, do_blkmap_common), U_BOOT_SUBCMD_MKENT(write, 5, 1, do_blkmap_common), U_BOOT_SUBCMD_MKENT(get, 5, 1, do_blkmap_get),
U_BOOT_SUBCMD_MKENT(create, 2, 1, do_blkmap_create),
U_BOOT_SUBCMD_MKENT(create, 3, 1, do_blkmap_create), U_BOOT_SUBCMD_MKENT(destroy, 2, 1, do_blkmap_destroy), U_BOOT_SUBCMD_MKENT(map, 32, 1, do_blkmap_map));
diff --git a/drivers/block/blkmap.c b/drivers/block/blkmap.c index 34eed1380dc..a817345b6bc 100644 --- a/drivers/block/blkmap.c +++ b/drivers/block/blkmap.c @@ -153,6 +153,9 @@ int blkmap_map_linear(struct udevice *dev, lbaint_t blknr, lbaint_t blkcnt, struct blk_desc *bd, *lbd; int err;
if (bm->type != BLKMAP_LINEAR)
return log_msg_ret("Invalid blkmap type", -EINVAL);
bd = dev_get_uclass_plat(bm->blk); lbd = dev_get_uclass_plat(lblk); if (lbd->blksz != bd->blksz) {
@@ -240,6 +243,9 @@ int __blkmap_map_mem(struct udevice *dev, lbaint_t blknr, lbaint_t blkcnt, struct blkmap_mem *bmm; int err;
if (bm->type != BLKMAP_MEM)
return log_msg_ret("Invalid blkmap type", -EINVAL);
bmm = malloc(sizeof(*bmm)); if (!bmm) return -ENOMEM;
@@ -435,7 +441,8 @@ struct udevice *blkmap_from_label(const char *label) return NULL; }
-int blkmap_create(const char *label, struct udevice **devp) +int blkmap_create(const char *label, struct udevice **devp,
enum blkmap_type type)
{ char *hname, *hlabel; struct udevice *dev; @@ -472,6 +479,7 @@ int blkmap_create(const char *label, struct udevice **devp) device_set_name_alloced(dev); bm = dev_get_plat(dev); bm->label = hlabel;
bm->type = type; if (devp) *devp = dev;
diff --git a/drivers/block/blkmap_helper.c b/drivers/block/blkmap_helper.c index bfba14110d2..56cbe57d4aa 100644 --- a/drivers/block/blkmap_helper.c +++ b/drivers/block/blkmap_helper.c @@ -19,7 +19,7 @@ int blkmap_create_ramdisk(const char *label, ulong image_addr, ulong image_size, struct blk_desc *desc; struct udevice *bm_dev;
ret = blkmap_create(label, &bm_dev);
ret = blkmap_create(label, &bm_dev, BLKMAP_MEM); if (ret) { log_err("failed to create blkmap\n"); return ret;
diff --git a/include/blkmap.h b/include/blkmap.h index d53095437fa..21169c30af1 100644 --- a/include/blkmap.h +++ b/include/blkmap.h @@ -9,6 +9,12 @@
#include <dm/lists.h>
+/* Type of blkmap device, Linear or Memory */ +enum blkmap_type {
BLKMAP_LINEAR = 1,
BLKMAP_MEM,
+};
/**
- struct blkmap - Block map
@@ -16,11 +22,13 @@
- @label: Human readable name of this blkmap
- @blk: Underlying block device
*/
- @type: Type of blkmap device
- @slices: List of slices associated with this blkmap
struct blkmap { char *label; struct udevice *blk;
enum blkmap_type type; struct list_head slices;
};
@@ -78,9 +86,11 @@ struct udevice *blkmap_from_label(const char *label);
- @label: Label of the new blkmap
- @devp: If not NULL, updated with the address of the resulting device
*/
- @type: Type of blkmap device to create
- Returns: 0 on success, negative error code on failure
-int blkmap_create(const char *label, struct udevice **devp); +int blkmap_create(const char *label, struct udevice **devp,
enum blkmap_type type);
/**
- blkmap_destroy() - Destroy blkmap
diff --git a/test/dm/blkmap.c b/test/dm/blkmap.c index a6a0b4d4e20..06816cb4b54 100644 --- a/test/dm/blkmap.c +++ b/test/dm/blkmap.c @@ -56,7 +56,7 @@ static int dm_test_blkmap_read(struct unit_test_state *uts) struct udevice *dev, *blk; const struct mapping *m;
ut_assertok(blkmap_create("rdtest", &dev));
ut_assertok(blkmap_create("rdtest", &dev, BLKMAP_MEM)); ut_assertok(blk_get_from_parent(dev, &blk)); /* Generate an ordered and an unordered pattern in memory */
@@ -85,7 +85,7 @@ static int dm_test_blkmap_write(struct unit_test_state *uts) struct udevice *dev, *blk; const struct mapping *m;
ut_assertok(blkmap_create("wrtest", &dev));
ut_assertok(blkmap_create("wrtest", &dev, BLKMAP_MEM)); ut_assertok(blk_get_from_parent(dev, &blk)); /* Generate an ordered and an unordered pattern in memory */
@@ -114,7 +114,7 @@ static int dm_test_blkmap_slicing(struct unit_test_state *uts) { struct udevice *dev;
ut_assertok(blkmap_create("slicetest", &dev));
ut_assertok(blkmap_create("slicetest", &dev, BLKMAP_MEM)); ut_assertok(blkmap_map_mem(dev, 8, 8, NULL));
@@ -140,19 +140,19 @@ static int dm_test_blkmap_creation(struct unit_test_state *uts) { struct udevice *first, *second;
ut_assertok(blkmap_create("first", &first));
ut_assertok(blkmap_create("first", &first, BLKMAP_LINEAR)); /* Can't have two "first"s */
ut_asserteq(-EBUSY, blkmap_create("first", &second));
ut_asserteq(-EBUSY, blkmap_create("first", &second, BLKMAP_LINEAR)); /* But "second" should be fine */
ut_assertok(blkmap_create("second", &second));
ut_assertok(blkmap_create("second", &second, BLKMAP_LINEAR)); /* Once "first" is destroyed, we should be able to create it * again */ ut_assertok(blkmap_destroy(first));
ut_assertok(blkmap_create("first", &first));
ut_assertok(blkmap_create("first", &first, BLKMAP_LINEAR)); ut_assertok(blkmap_destroy(first)); ut_assertok(blkmap_destroy(second));
@@ -168,7 +168,7 @@ static int dm_test_cmd_blkmap(struct unit_test_state *uts) ut_assertok(run_command("blkmap info", 0)); ut_assert_console_end();
ut_assertok(run_command("blkmap create ramdisk", 0));
ut_assertok(run_command("blkmap create ramdisk mem", 0)); ut_assert_nextline("Created \"ramdisk\""); ut_assert_console_end();
-- 2.34.1

On Tue, 21 Jan 2025 at 21:25, Ilias Apalodimas ilias.apalodimas@linaro.org wrote:
Hi Sughosh
On Mon, 20 Jan 2025 at 12:51, Sughosh Ganu sughosh.ganu@linaro.org wrote:
Add information about the type of blkmap device in the blkmap structure. Currently, the blkmap device is used for mapping to either a memory based block device, or another block device (linear mapping). Put information in the blkmap structure to identify if it is associated with a memory or linear mapped device. Which can then be used to take specific action based on the type of blkmap device.
I haven't followed up all the discussions yet, but I do have a question. Are blkmap devices now unconditionally added in a pmem node? (if they are backed by memory)
Yes, all memory backed blkmap devices are being added as pmem nodes.
-sughosh
Thanks /Ilias
Signed-off-by: Sughosh Ganu sughosh.ganu@linaro.org
Changes since V2: New patch
cmd/blkmap.c | 16 ++++++++++++---- drivers/block/blkmap.c | 10 +++++++++- drivers/block/blkmap_helper.c | 2 +- include/blkmap.h | 12 +++++++++++- test/dm/blkmap.c | 16 ++++++++-------- 5 files changed, 41 insertions(+), 15 deletions(-)
diff --git a/cmd/blkmap.c b/cmd/blkmap.c index 164f80f1387..1bf0747ab16 100644 --- a/cmd/blkmap.c +++ b/cmd/blkmap.c @@ -119,15 +119,23 @@ static int do_blkmap_map(struct cmd_tbl *cmdtp, int flag, static int do_blkmap_create(struct cmd_tbl *cmdtp, int flag, int argc, char *const argv[]) {
enum blkmap_type type; const char *label; int err;
if (argc != 2)
if (argc != 3) return CMD_RET_USAGE; label = argv[1];
err = blkmap_create(label, NULL);
if (!strcmp(argv[2], "linear"))
type = BLKMAP_LINEAR;
else if (!strcmp(argv[2], "mem"))
type = BLKMAP_MEM;
else
return CMD_RET_USAGE;
err = blkmap_create(label, NULL, type); if (err) { printf("Unable to create \"%s\": %d\n", label, err); return CMD_RET_FAILURE;
@@ -218,7 +226,7 @@ U_BOOT_CMD_WITH_SUBCMDS( "blkmap read <addr> <blk#> <cnt>\n" "blkmap write <addr> <blk#> <cnt>\n" "blkmap get <label> dev [<var>] - store device number in variable\n"
"blkmap create <label> - create device\n"
"blkmap create <label> <type> - create device(linear/mem)\n" "blkmap destroy <label> - destroy device\n" "blkmap map <label> <blk#> <cnt> linear <interface> <dev> <blk#> - device mapping\n" "blkmap map <label> <blk#> <cnt> mem <addr> - memory mapping\n",
@@ -228,6 +236,6 @@ U_BOOT_CMD_WITH_SUBCMDS( U_BOOT_SUBCMD_MKENT(read, 5, 1, do_blkmap_common), U_BOOT_SUBCMD_MKENT(write, 5, 1, do_blkmap_common), U_BOOT_SUBCMD_MKENT(get, 5, 1, do_blkmap_get),
U_BOOT_SUBCMD_MKENT(create, 2, 1, do_blkmap_create),
U_BOOT_SUBCMD_MKENT(create, 3, 1, do_blkmap_create), U_BOOT_SUBCMD_MKENT(destroy, 2, 1, do_blkmap_destroy), U_BOOT_SUBCMD_MKENT(map, 32, 1, do_blkmap_map));
diff --git a/drivers/block/blkmap.c b/drivers/block/blkmap.c index 34eed1380dc..a817345b6bc 100644 --- a/drivers/block/blkmap.c +++ b/drivers/block/blkmap.c @@ -153,6 +153,9 @@ int blkmap_map_linear(struct udevice *dev, lbaint_t blknr, lbaint_t blkcnt, struct blk_desc *bd, *lbd; int err;
if (bm->type != BLKMAP_LINEAR)
return log_msg_ret("Invalid blkmap type", -EINVAL);
bd = dev_get_uclass_plat(bm->blk); lbd = dev_get_uclass_plat(lblk); if (lbd->blksz != bd->blksz) {
@@ -240,6 +243,9 @@ int __blkmap_map_mem(struct udevice *dev, lbaint_t blknr, lbaint_t blkcnt, struct blkmap_mem *bmm; int err;
if (bm->type != BLKMAP_MEM)
return log_msg_ret("Invalid blkmap type", -EINVAL);
bmm = malloc(sizeof(*bmm)); if (!bmm) return -ENOMEM;
@@ -435,7 +441,8 @@ struct udevice *blkmap_from_label(const char *label) return NULL; }
-int blkmap_create(const char *label, struct udevice **devp) +int blkmap_create(const char *label, struct udevice **devp,
enum blkmap_type type)
{ char *hname, *hlabel; struct udevice *dev; @@ -472,6 +479,7 @@ int blkmap_create(const char *label, struct udevice **devp) device_set_name_alloced(dev); bm = dev_get_plat(dev); bm->label = hlabel;
bm->type = type; if (devp) *devp = dev;
diff --git a/drivers/block/blkmap_helper.c b/drivers/block/blkmap_helper.c index bfba14110d2..56cbe57d4aa 100644 --- a/drivers/block/blkmap_helper.c +++ b/drivers/block/blkmap_helper.c @@ -19,7 +19,7 @@ int blkmap_create_ramdisk(const char *label, ulong image_addr, ulong image_size, struct blk_desc *desc; struct udevice *bm_dev;
ret = blkmap_create(label, &bm_dev);
ret = blkmap_create(label, &bm_dev, BLKMAP_MEM); if (ret) { log_err("failed to create blkmap\n"); return ret;
diff --git a/include/blkmap.h b/include/blkmap.h index d53095437fa..21169c30af1 100644 --- a/include/blkmap.h +++ b/include/blkmap.h @@ -9,6 +9,12 @@
#include <dm/lists.h>
+/* Type of blkmap device, Linear or Memory */ +enum blkmap_type {
BLKMAP_LINEAR = 1,
BLKMAP_MEM,
+};
/**
- struct blkmap - Block map
@@ -16,11 +22,13 @@
- @label: Human readable name of this blkmap
- @blk: Underlying block device
*/
- @type: Type of blkmap device
- @slices: List of slices associated with this blkmap
struct blkmap { char *label; struct udevice *blk;
enum blkmap_type type; struct list_head slices;
};
@@ -78,9 +86,11 @@ struct udevice *blkmap_from_label(const char *label);
- @label: Label of the new blkmap
- @devp: If not NULL, updated with the address of the resulting device
*/
- @type: Type of blkmap device to create
- Returns: 0 on success, negative error code on failure
-int blkmap_create(const char *label, struct udevice **devp); +int blkmap_create(const char *label, struct udevice **devp,
enum blkmap_type type);
/**
- blkmap_destroy() - Destroy blkmap
diff --git a/test/dm/blkmap.c b/test/dm/blkmap.c index a6a0b4d4e20..06816cb4b54 100644 --- a/test/dm/blkmap.c +++ b/test/dm/blkmap.c @@ -56,7 +56,7 @@ static int dm_test_blkmap_read(struct unit_test_state *uts) struct udevice *dev, *blk; const struct mapping *m;
ut_assertok(blkmap_create("rdtest", &dev));
ut_assertok(blkmap_create("rdtest", &dev, BLKMAP_MEM)); ut_assertok(blk_get_from_parent(dev, &blk)); /* Generate an ordered and an unordered pattern in memory */
@@ -85,7 +85,7 @@ static int dm_test_blkmap_write(struct unit_test_state *uts) struct udevice *dev, *blk; const struct mapping *m;
ut_assertok(blkmap_create("wrtest", &dev));
ut_assertok(blkmap_create("wrtest", &dev, BLKMAP_MEM)); ut_assertok(blk_get_from_parent(dev, &blk)); /* Generate an ordered and an unordered pattern in memory */
@@ -114,7 +114,7 @@ static int dm_test_blkmap_slicing(struct unit_test_state *uts) { struct udevice *dev;
ut_assertok(blkmap_create("slicetest", &dev));
ut_assertok(blkmap_create("slicetest", &dev, BLKMAP_MEM)); ut_assertok(blkmap_map_mem(dev, 8, 8, NULL));
@@ -140,19 +140,19 @@ static int dm_test_blkmap_creation(struct unit_test_state *uts) { struct udevice *first, *second;
ut_assertok(blkmap_create("first", &first));
ut_assertok(blkmap_create("first", &first, BLKMAP_LINEAR)); /* Can't have two "first"s */
ut_asserteq(-EBUSY, blkmap_create("first", &second));
ut_asserteq(-EBUSY, blkmap_create("first", &second, BLKMAP_LINEAR)); /* But "second" should be fine */
ut_assertok(blkmap_create("second", &second));
ut_assertok(blkmap_create("second", &second, BLKMAP_LINEAR)); /* Once "first" is destroyed, we should be able to create it * again */ ut_assertok(blkmap_destroy(first));
ut_assertok(blkmap_create("first", &first));
ut_assertok(blkmap_create("first", &first, BLKMAP_LINEAR)); ut_assertok(blkmap_destroy(first)); ut_assertok(blkmap_destroy(second));
@@ -168,7 +168,7 @@ static int dm_test_cmd_blkmap(struct unit_test_state *uts) ut_assertok(run_command("blkmap info", 0)); ut_assert_console_end();
ut_assertok(run_command("blkmap create ramdisk", 0));
ut_assertok(run_command("blkmap create ramdisk mem", 0)); ut_assert_nextline("Created \"ramdisk\""); ut_assert_console_end();
-- 2.34.1

On Tue, 21 Jan 2025 at 18:02, Sughosh Ganu sughosh.ganu@linaro.org wrote:
On Tue, 21 Jan 2025 at 21:25, Ilias Apalodimas ilias.apalodimas@linaro.org wrote:
Hi Sughosh
On Mon, 20 Jan 2025 at 12:51, Sughosh Ganu sughosh.ganu@linaro.org wrote:
Add information about the type of blkmap device in the blkmap structure. Currently, the blkmap device is used for mapping to either a memory based block device, or another block device (linear mapping). Put information in the blkmap structure to identify if it is associated with a memory or linear mapped device. Which can then be used to take specific action based on the type of blkmap device.
I haven't followed up all the discussions yet, but I do have a question. Are blkmap devices now unconditionally added in a pmem node? (if they are backed by memory)
Yes, all memory backed blkmap devices are being added as pmem nodes.
Alright, I don't think we want that. We want a boolean to the mapping function, that cmd tools can conditionally set. Not all cases want to preserve memory for the OS. The EFI part can set that flag to true.
I need to do some more testing, but a quick test in QEMU had this: vda: vda1 nd_pmem namespace0.0: unable to guarantee persistence of writes nd_pmem namespace0.0: could not reserve region [mem 0x40200000-0x547fffff] nd_pmem namespace0.0: probe with driver nd_pmem failed with error -16
AFAICT the kernel behaves differently depending on its config and the only reliable way to make this work, is to remove the mapping from the EFI memory map. But that further complicates things because we can't have the blkmap functions randomly call EFI functions....
Regards /Ilias
-sughosh
Thanks /Ilias
Signed-off-by: Sughosh Ganu sughosh.ganu@linaro.org
Changes since V2: New patch
cmd/blkmap.c | 16 ++++++++++++---- drivers/block/blkmap.c | 10 +++++++++- drivers/block/blkmap_helper.c | 2 +- include/blkmap.h | 12 +++++++++++- test/dm/blkmap.c | 16 ++++++++-------- 5 files changed, 41 insertions(+), 15 deletions(-)
diff --git a/cmd/blkmap.c b/cmd/blkmap.c index 164f80f1387..1bf0747ab16 100644 --- a/cmd/blkmap.c +++ b/cmd/blkmap.c @@ -119,15 +119,23 @@ static int do_blkmap_map(struct cmd_tbl *cmdtp, int flag, static int do_blkmap_create(struct cmd_tbl *cmdtp, int flag, int argc, char *const argv[]) {
enum blkmap_type type; const char *label; int err;
if (argc != 2)
if (argc != 3) return CMD_RET_USAGE; label = argv[1];
err = blkmap_create(label, NULL);
if (!strcmp(argv[2], "linear"))
type = BLKMAP_LINEAR;
else if (!strcmp(argv[2], "mem"))
type = BLKMAP_MEM;
else
return CMD_RET_USAGE;
err = blkmap_create(label, NULL, type); if (err) { printf("Unable to create \"%s\": %d\n", label, err); return CMD_RET_FAILURE;
@@ -218,7 +226,7 @@ U_BOOT_CMD_WITH_SUBCMDS( "blkmap read <addr> <blk#> <cnt>\n" "blkmap write <addr> <blk#> <cnt>\n" "blkmap get <label> dev [<var>] - store device number in variable\n"
"blkmap create <label> - create device\n"
"blkmap create <label> <type> - create device(linear/mem)\n" "blkmap destroy <label> - destroy device\n" "blkmap map <label> <blk#> <cnt> linear <interface> <dev> <blk#> - device mapping\n" "blkmap map <label> <blk#> <cnt> mem <addr> - memory mapping\n",
@@ -228,6 +236,6 @@ U_BOOT_CMD_WITH_SUBCMDS( U_BOOT_SUBCMD_MKENT(read, 5, 1, do_blkmap_common), U_BOOT_SUBCMD_MKENT(write, 5, 1, do_blkmap_common), U_BOOT_SUBCMD_MKENT(get, 5, 1, do_blkmap_get),
U_BOOT_SUBCMD_MKENT(create, 2, 1, do_blkmap_create),
U_BOOT_SUBCMD_MKENT(create, 3, 1, do_blkmap_create), U_BOOT_SUBCMD_MKENT(destroy, 2, 1, do_blkmap_destroy), U_BOOT_SUBCMD_MKENT(map, 32, 1, do_blkmap_map));
diff --git a/drivers/block/blkmap.c b/drivers/block/blkmap.c index 34eed1380dc..a817345b6bc 100644 --- a/drivers/block/blkmap.c +++ b/drivers/block/blkmap.c @@ -153,6 +153,9 @@ int blkmap_map_linear(struct udevice *dev, lbaint_t blknr, lbaint_t blkcnt, struct blk_desc *bd, *lbd; int err;
if (bm->type != BLKMAP_LINEAR)
return log_msg_ret("Invalid blkmap type", -EINVAL);
bd = dev_get_uclass_plat(bm->blk); lbd = dev_get_uclass_plat(lblk); if (lbd->blksz != bd->blksz) {
@@ -240,6 +243,9 @@ int __blkmap_map_mem(struct udevice *dev, lbaint_t blknr, lbaint_t blkcnt, struct blkmap_mem *bmm; int err;
if (bm->type != BLKMAP_MEM)
return log_msg_ret("Invalid blkmap type", -EINVAL);
bmm = malloc(sizeof(*bmm)); if (!bmm) return -ENOMEM;
@@ -435,7 +441,8 @@ struct udevice *blkmap_from_label(const char *label) return NULL; }
-int blkmap_create(const char *label, struct udevice **devp) +int blkmap_create(const char *label, struct udevice **devp,
enum blkmap_type type)
{ char *hname, *hlabel; struct udevice *dev; @@ -472,6 +479,7 @@ int blkmap_create(const char *label, struct udevice **devp) device_set_name_alloced(dev); bm = dev_get_plat(dev); bm->label = hlabel;
bm->type = type; if (devp) *devp = dev;
diff --git a/drivers/block/blkmap_helper.c b/drivers/block/blkmap_helper.c index bfba14110d2..56cbe57d4aa 100644 --- a/drivers/block/blkmap_helper.c +++ b/drivers/block/blkmap_helper.c @@ -19,7 +19,7 @@ int blkmap_create_ramdisk(const char *label, ulong image_addr, ulong image_size, struct blk_desc *desc; struct udevice *bm_dev;
ret = blkmap_create(label, &bm_dev);
ret = blkmap_create(label, &bm_dev, BLKMAP_MEM); if (ret) { log_err("failed to create blkmap\n"); return ret;
diff --git a/include/blkmap.h b/include/blkmap.h index d53095437fa..21169c30af1 100644 --- a/include/blkmap.h +++ b/include/blkmap.h @@ -9,6 +9,12 @@
#include <dm/lists.h>
+/* Type of blkmap device, Linear or Memory */ +enum blkmap_type {
BLKMAP_LINEAR = 1,
BLKMAP_MEM,
+};
/**
- struct blkmap - Block map
@@ -16,11 +22,13 @@
- @label: Human readable name of this blkmap
- @blk: Underlying block device
*/
- @type: Type of blkmap device
- @slices: List of slices associated with this blkmap
struct blkmap { char *label; struct udevice *blk;
enum blkmap_type type; struct list_head slices;
};
@@ -78,9 +86,11 @@ struct udevice *blkmap_from_label(const char *label);
- @label: Label of the new blkmap
- @devp: If not NULL, updated with the address of the resulting device
*/
- @type: Type of blkmap device to create
- Returns: 0 on success, negative error code on failure
-int blkmap_create(const char *label, struct udevice **devp); +int blkmap_create(const char *label, struct udevice **devp,
enum blkmap_type type);
/**
- blkmap_destroy() - Destroy blkmap
diff --git a/test/dm/blkmap.c b/test/dm/blkmap.c index a6a0b4d4e20..06816cb4b54 100644 --- a/test/dm/blkmap.c +++ b/test/dm/blkmap.c @@ -56,7 +56,7 @@ static int dm_test_blkmap_read(struct unit_test_state *uts) struct udevice *dev, *blk; const struct mapping *m;
ut_assertok(blkmap_create("rdtest", &dev));
ut_assertok(blkmap_create("rdtest", &dev, BLKMAP_MEM)); ut_assertok(blk_get_from_parent(dev, &blk)); /* Generate an ordered and an unordered pattern in memory */
@@ -85,7 +85,7 @@ static int dm_test_blkmap_write(struct unit_test_state *uts) struct udevice *dev, *blk; const struct mapping *m;
ut_assertok(blkmap_create("wrtest", &dev));
ut_assertok(blkmap_create("wrtest", &dev, BLKMAP_MEM)); ut_assertok(blk_get_from_parent(dev, &blk)); /* Generate an ordered and an unordered pattern in memory */
@@ -114,7 +114,7 @@ static int dm_test_blkmap_slicing(struct unit_test_state *uts) { struct udevice *dev;
ut_assertok(blkmap_create("slicetest", &dev));
ut_assertok(blkmap_create("slicetest", &dev, BLKMAP_MEM)); ut_assertok(blkmap_map_mem(dev, 8, 8, NULL));
@@ -140,19 +140,19 @@ static int dm_test_blkmap_creation(struct unit_test_state *uts) { struct udevice *first, *second;
ut_assertok(blkmap_create("first", &first));
ut_assertok(blkmap_create("first", &first, BLKMAP_LINEAR)); /* Can't have two "first"s */
ut_asserteq(-EBUSY, blkmap_create("first", &second));
ut_asserteq(-EBUSY, blkmap_create("first", &second, BLKMAP_LINEAR)); /* But "second" should be fine */
ut_assertok(blkmap_create("second", &second));
ut_assertok(blkmap_create("second", &second, BLKMAP_LINEAR)); /* Once "first" is destroyed, we should be able to create it * again */ ut_assertok(blkmap_destroy(first));
ut_assertok(blkmap_create("first", &first));
ut_assertok(blkmap_create("first", &first, BLKMAP_LINEAR)); ut_assertok(blkmap_destroy(first)); ut_assertok(blkmap_destroy(second));
@@ -168,7 +168,7 @@ static int dm_test_cmd_blkmap(struct unit_test_state *uts) ut_assertok(run_command("blkmap info", 0)); ut_assert_console_end();
ut_assertok(run_command("blkmap create ramdisk", 0));
ut_assertok(run_command("blkmap create ramdisk mem", 0)); ut_assert_nextline("Created \"ramdisk\""); ut_assert_console_end();
-- 2.34.1

On Wed, 22 Jan 2025 at 03:18, Ilias Apalodimas ilias.apalodimas@linaro.org wrote:
On Tue, 21 Jan 2025 at 18:02, Sughosh Ganu sughosh.ganu@linaro.org wrote:
On Tue, 21 Jan 2025 at 21:25, Ilias Apalodimas ilias.apalodimas@linaro.org wrote:
Hi Sughosh
On Mon, 20 Jan 2025 at 12:51, Sughosh Ganu sughosh.ganu@linaro.org wrote:
Add information about the type of blkmap device in the blkmap structure. Currently, the blkmap device is used for mapping to either a memory based block device, or another block device (linear mapping). Put information in the blkmap structure to identify if it is associated with a memory or linear mapped device. Which can then be used to take specific action based on the type of blkmap device.
I haven't followed up all the discussions yet, but I do have a question. Are blkmap devices now unconditionally added in a pmem node? (if they are backed by memory)
Yes, all memory backed blkmap devices are being added as pmem nodes.
Alright, I don't think we want that. We want a boolean to the mapping function, that cmd tools can conditionally set. Not all cases want to preserve memory for the OS. The EFI part can set that flag to true.
I need to do some more testing, but a quick test in QEMU had this: vda: vda1 nd_pmem namespace0.0: unable to guarantee persistence of writes nd_pmem namespace0.0: could not reserve region [mem 0x40200000-0x547fffff] nd_pmem namespace0.0: probe with driver nd_pmem failed with error -16
I will check this, but this seems to be the case of the memory range not having been removed from the EFI memory map? In that case, like you mention below, I don't think that the blkmap code can ensure that. And then, it would be better to add the pmem node from the caller of the blkmap device addition (in this case the efi_http_boot code), rather than doing this from the blkmap driver.
-sughosh
AFAICT the kernel behaves differently depending on its config and the only reliable way to make this work, is to remove the mapping from the EFI memory map. But that further complicates things because we can't have the blkmap functions randomly call EFI functions....
Regards /Ilias
-sughosh
Thanks /Ilias
Signed-off-by: Sughosh Ganu sughosh.ganu@linaro.org
Changes since V2: New patch
cmd/blkmap.c | 16 ++++++++++++---- drivers/block/blkmap.c | 10 +++++++++- drivers/block/blkmap_helper.c | 2 +- include/blkmap.h | 12 +++++++++++- test/dm/blkmap.c | 16 ++++++++-------- 5 files changed, 41 insertions(+), 15 deletions(-)
diff --git a/cmd/blkmap.c b/cmd/blkmap.c index 164f80f1387..1bf0747ab16 100644 --- a/cmd/blkmap.c +++ b/cmd/blkmap.c @@ -119,15 +119,23 @@ static int do_blkmap_map(struct cmd_tbl *cmdtp, int flag, static int do_blkmap_create(struct cmd_tbl *cmdtp, int flag, int argc, char *const argv[]) {
enum blkmap_type type; const char *label; int err;
if (argc != 2)
if (argc != 3) return CMD_RET_USAGE; label = argv[1];
err = blkmap_create(label, NULL);
if (!strcmp(argv[2], "linear"))
type = BLKMAP_LINEAR;
else if (!strcmp(argv[2], "mem"))
type = BLKMAP_MEM;
else
return CMD_RET_USAGE;
err = blkmap_create(label, NULL, type); if (err) { printf("Unable to create \"%s\": %d\n", label, err); return CMD_RET_FAILURE;
@@ -218,7 +226,7 @@ U_BOOT_CMD_WITH_SUBCMDS( "blkmap read <addr> <blk#> <cnt>\n" "blkmap write <addr> <blk#> <cnt>\n" "blkmap get <label> dev [<var>] - store device number in variable\n"
"blkmap create <label> - create device\n"
"blkmap create <label> <type> - create device(linear/mem)\n" "blkmap destroy <label> - destroy device\n" "blkmap map <label> <blk#> <cnt> linear <interface> <dev> <blk#> - device mapping\n" "blkmap map <label> <blk#> <cnt> mem <addr> - memory mapping\n",
@@ -228,6 +236,6 @@ U_BOOT_CMD_WITH_SUBCMDS( U_BOOT_SUBCMD_MKENT(read, 5, 1, do_blkmap_common), U_BOOT_SUBCMD_MKENT(write, 5, 1, do_blkmap_common), U_BOOT_SUBCMD_MKENT(get, 5, 1, do_blkmap_get),
U_BOOT_SUBCMD_MKENT(create, 2, 1, do_blkmap_create),
U_BOOT_SUBCMD_MKENT(create, 3, 1, do_blkmap_create), U_BOOT_SUBCMD_MKENT(destroy, 2, 1, do_blkmap_destroy), U_BOOT_SUBCMD_MKENT(map, 32, 1, do_blkmap_map));
diff --git a/drivers/block/blkmap.c b/drivers/block/blkmap.c index 34eed1380dc..a817345b6bc 100644 --- a/drivers/block/blkmap.c +++ b/drivers/block/blkmap.c @@ -153,6 +153,9 @@ int blkmap_map_linear(struct udevice *dev, lbaint_t blknr, lbaint_t blkcnt, struct blk_desc *bd, *lbd; int err;
if (bm->type != BLKMAP_LINEAR)
return log_msg_ret("Invalid blkmap type", -EINVAL);
bd = dev_get_uclass_plat(bm->blk); lbd = dev_get_uclass_plat(lblk); if (lbd->blksz != bd->blksz) {
@@ -240,6 +243,9 @@ int __blkmap_map_mem(struct udevice *dev, lbaint_t blknr, lbaint_t blkcnt, struct blkmap_mem *bmm; int err;
if (bm->type != BLKMAP_MEM)
return log_msg_ret("Invalid blkmap type", -EINVAL);
bmm = malloc(sizeof(*bmm)); if (!bmm) return -ENOMEM;
@@ -435,7 +441,8 @@ struct udevice *blkmap_from_label(const char *label) return NULL; }
-int blkmap_create(const char *label, struct udevice **devp) +int blkmap_create(const char *label, struct udevice **devp,
enum blkmap_type type)
{ char *hname, *hlabel; struct udevice *dev; @@ -472,6 +479,7 @@ int blkmap_create(const char *label, struct udevice **devp) device_set_name_alloced(dev); bm = dev_get_plat(dev); bm->label = hlabel;
bm->type = type; if (devp) *devp = dev;
diff --git a/drivers/block/blkmap_helper.c b/drivers/block/blkmap_helper.c index bfba14110d2..56cbe57d4aa 100644 --- a/drivers/block/blkmap_helper.c +++ b/drivers/block/blkmap_helper.c @@ -19,7 +19,7 @@ int blkmap_create_ramdisk(const char *label, ulong image_addr, ulong image_size, struct blk_desc *desc; struct udevice *bm_dev;
ret = blkmap_create(label, &bm_dev);
ret = blkmap_create(label, &bm_dev, BLKMAP_MEM); if (ret) { log_err("failed to create blkmap\n"); return ret;
diff --git a/include/blkmap.h b/include/blkmap.h index d53095437fa..21169c30af1 100644 --- a/include/blkmap.h +++ b/include/blkmap.h @@ -9,6 +9,12 @@
#include <dm/lists.h>
+/* Type of blkmap device, Linear or Memory */ +enum blkmap_type {
BLKMAP_LINEAR = 1,
BLKMAP_MEM,
+};
/**
- struct blkmap - Block map
@@ -16,11 +22,13 @@
- @label: Human readable name of this blkmap
- @blk: Underlying block device
*/
- @type: Type of blkmap device
- @slices: List of slices associated with this blkmap
struct blkmap { char *label; struct udevice *blk;
enum blkmap_type type; struct list_head slices;
};
@@ -78,9 +86,11 @@ struct udevice *blkmap_from_label(const char *label);
- @label: Label of the new blkmap
- @devp: If not NULL, updated with the address of the resulting device
*/
- @type: Type of blkmap device to create
- Returns: 0 on success, negative error code on failure
-int blkmap_create(const char *label, struct udevice **devp); +int blkmap_create(const char *label, struct udevice **devp,
enum blkmap_type type);
/**
- blkmap_destroy() - Destroy blkmap
diff --git a/test/dm/blkmap.c b/test/dm/blkmap.c index a6a0b4d4e20..06816cb4b54 100644 --- a/test/dm/blkmap.c +++ b/test/dm/blkmap.c @@ -56,7 +56,7 @@ static int dm_test_blkmap_read(struct unit_test_state *uts) struct udevice *dev, *blk; const struct mapping *m;
ut_assertok(blkmap_create("rdtest", &dev));
ut_assertok(blkmap_create("rdtest", &dev, BLKMAP_MEM)); ut_assertok(blk_get_from_parent(dev, &blk)); /* Generate an ordered and an unordered pattern in memory */
@@ -85,7 +85,7 @@ static int dm_test_blkmap_write(struct unit_test_state *uts) struct udevice *dev, *blk; const struct mapping *m;
ut_assertok(blkmap_create("wrtest", &dev));
ut_assertok(blkmap_create("wrtest", &dev, BLKMAP_MEM)); ut_assertok(blk_get_from_parent(dev, &blk)); /* Generate an ordered and an unordered pattern in memory */
@@ -114,7 +114,7 @@ static int dm_test_blkmap_slicing(struct unit_test_state *uts) { struct udevice *dev;
ut_assertok(blkmap_create("slicetest", &dev));
ut_assertok(blkmap_create("slicetest", &dev, BLKMAP_MEM)); ut_assertok(blkmap_map_mem(dev, 8, 8, NULL));
@@ -140,19 +140,19 @@ static int dm_test_blkmap_creation(struct unit_test_state *uts) { struct udevice *first, *second;
ut_assertok(blkmap_create("first", &first));
ut_assertok(blkmap_create("first", &first, BLKMAP_LINEAR)); /* Can't have two "first"s */
ut_asserteq(-EBUSY, blkmap_create("first", &second));
ut_asserteq(-EBUSY, blkmap_create("first", &second, BLKMAP_LINEAR)); /* But "second" should be fine */
ut_assertok(blkmap_create("second", &second));
ut_assertok(blkmap_create("second", &second, BLKMAP_LINEAR)); /* Once "first" is destroyed, we should be able to create it * again */ ut_assertok(blkmap_destroy(first));
ut_assertok(blkmap_create("first", &first));
ut_assertok(blkmap_create("first", &first, BLKMAP_LINEAR)); ut_assertok(blkmap_destroy(first)); ut_assertok(blkmap_destroy(second));
@@ -168,7 +168,7 @@ static int dm_test_cmd_blkmap(struct unit_test_state *uts) ut_assertok(run_command("blkmap info", 0)); ut_assert_console_end();
ut_assertok(run_command("blkmap create ramdisk", 0));
ut_assertok(run_command("blkmap create ramdisk mem", 0)); ut_assert_nextline("Created \"ramdisk\""); ut_assert_console_end();
-- 2.34.1

The EFI HTTP boot puts the ISO installer image at some location in memory which needs to be added to the devicetree as persistent memory (pmem) node. The OS installer then gets information about the presence of this ISO image through the pmem node and proceeds with the installation.
In U-Boot, this ISO image gets mounted as a memory mapped blkmap device. Add a helper function which iterates through all such memory mapped blkmap devices, and calls the FDT fixup function to add the pmem node. Invoke this helper function as part of the DT fixup which happens before booting the OS.
Signed-off-by: Sughosh Ganu sughosh.ganu@linaro.org --- Changes since V2: New patch
boot/image-fdt.c | 9 ++++ drivers/block/blkmap.c | 80 ------------------------------ drivers/block/blkmap_helper.c | 45 +++++++++++++++++ include/blkmap.h | 91 +++++++++++++++++++++++++++++++++++ 4 files changed, 145 insertions(+), 80 deletions(-)
diff --git a/boot/image-fdt.c b/boot/image-fdt.c index 9d1598b1a93..9af00f406bb 100644 --- a/boot/image-fdt.c +++ b/boot/image-fdt.c @@ -8,6 +8,7 @@ * Wolfgang Denk, DENX Software Engineering, wd@denx.de. */
+#include <blkmap.h> #include <command.h> #include <fdt_support.h> #include <fdtdec.h> @@ -649,6 +650,14 @@ int image_setup_libfdt(struct bootm_headers *images, void *blob, bool lmb) if (!ft_verify_fdt(blob)) goto err;
+ if (CONFIG_IS_ENABLED(BLKMAP)) { + fdt_ret = blkmap_fdt_pmem_setup(blob); + if (fdt_ret) { + log_err("pmem node fixup failed\n"); + goto err; + } + } + /* after here we are using a livetree */ if (!of_live_active() && CONFIG_IS_ENABLED(EVENT)) { struct event_ft_fixup fixup; diff --git a/drivers/block/blkmap.c b/drivers/block/blkmap.c index a817345b6bc..f4a89277173 100644 --- a/drivers/block/blkmap.c +++ b/drivers/block/blkmap.c @@ -14,57 +14,6 @@ #include <dm/lists.h> #include <dm/root.h>
-struct blkmap; - -/** - * struct blkmap_slice - Region mapped to a blkmap - * - * Common data for a region mapped to a blkmap, specialized by each - * map type. - * - * @node: List node used to associate this slice with a blkmap - * @blknr: Start block number of the mapping - * @blkcnt: Number of blocks covered by this mapping - */ -struct blkmap_slice { - struct list_head node; - - lbaint_t blknr; - lbaint_t blkcnt; - - /** - * @read: - Read from slice - * - * @read.bm: Blkmap to which this slice belongs - * @read.bms: This slice - * @read.blknr: Start block number to read from - * @read.blkcnt: Number of blocks to read - * @read.buffer: Buffer to store read data to - */ - ulong (*read)(struct blkmap *bm, struct blkmap_slice *bms, - lbaint_t blknr, lbaint_t blkcnt, void *buffer); - - /** - * @write: - Write to slice - * - * @write.bm: Blkmap to which this slice belongs - * @write.bms: This slice - * @write.blknr: Start block number to write to - * @write.blkcnt: Number of blocks to write - * @write.buffer: Data to be written - */ - ulong (*write)(struct blkmap *bm, struct blkmap_slice *bms, - lbaint_t blknr, lbaint_t blkcnt, const void *buffer); - - /** - * @destroy: - Tear down slice - * - * @read.bm: Blkmap to which this slice belongs - * @read.bms: This slice - */ - void (*destroy)(struct blkmap *bm, struct blkmap_slice *bms); -}; - static bool blkmap_slice_contains(struct blkmap_slice *bms, lbaint_t blknr) { return (blknr >= bms->blknr) && (blknr < (bms->blknr + bms->blkcnt)); @@ -114,20 +63,6 @@ static int blkmap_slice_add(struct blkmap *bm, struct blkmap_slice *new) return 0; }
-/** - * struct blkmap_linear - Linear mapping to other block device - * - * @slice: Common map data - * @blk: Target block device of this mapping - * @blknr: Start block number of the target device - */ -struct blkmap_linear { - struct blkmap_slice slice; - - struct udevice *blk; - lbaint_t blknr; -}; - static ulong blkmap_linear_read(struct blkmap *bm, struct blkmap_slice *bms, lbaint_t blknr, lbaint_t blkcnt, void *buffer) { @@ -188,21 +123,6 @@ int blkmap_map_linear(struct udevice *dev, lbaint_t blknr, lbaint_t blkcnt, return err; }
-/** - * struct blkmap_mem - Memory mapping - * - * @slice: Common map data - * @addr: Target memory region of this mapping - * @remapped: True if @addr is backed by a physical to virtual memory - * mapping that must be torn down at the end of this mapping's - * lifetime. - */ -struct blkmap_mem { - struct blkmap_slice slice; - void *addr; - bool remapped; -}; - static ulong blkmap_mem_read(struct blkmap *bm, struct blkmap_slice *bms, lbaint_t blknr, lbaint_t blkcnt, void *buffer) { diff --git a/drivers/block/blkmap_helper.c b/drivers/block/blkmap_helper.c index 56cbe57d4aa..c91a4410d9c 100644 --- a/drivers/block/blkmap_helper.c +++ b/drivers/block/blkmap_helper.c @@ -7,8 +7,11 @@
#include <blk.h> #include <blkmap.h> +#include <fdt_support.h> #include <dm/device.h> #include <dm/device-internal.h> +#include <dm/uclass.h> +#include <linux/kernel.h>
int blkmap_create_ramdisk(const char *label, ulong image_addr, ulong image_size, struct udevice **devp) @@ -51,3 +54,45 @@ err:
return ret; } + +static int blkmap_add_pmem_node(void *fdt, struct blkmap *bm) +{ + int ret; + u32 size; + ulong addr; + struct blkmap_mem *bmm; + struct blkmap_slice *bms; + struct blk_desc *bd = dev_get_uclass_plat(bm->blk); + + list_for_each_entry(bms, &bm->slices, node) { + bmm = container_of(bms, struct blkmap_mem, slice); + + addr = (ulong)(uintptr_t)bmm->addr; + size = (u32)bms->blkcnt << bd->log2blksz; + + ret = fdt_fixup_pmem_region(fdt, addr, size); + if (ret) + return ret; + } + + return 0; +} + +int blkmap_fdt_pmem_setup(void *fdt) +{ + int ret; + struct udevice *dev; + struct uclass *uc; + struct blkmap *bm; + + uclass_id_foreach_dev(UCLASS_BLKMAP, dev, uc) { + bm = dev_get_plat(dev); + if (bm->type == BLKMAP_MEM) { + ret = blkmap_add_pmem_node(fdt, bm); + if (ret) + return ret; + } + } + + return 0; +} diff --git a/include/blkmap.h b/include/blkmap.h index 21169c30af1..4fe8ec2767c 100644 --- a/include/blkmap.h +++ b/include/blkmap.h @@ -7,6 +7,7 @@ #ifndef _BLKMAP_H #define _BLKMAP_H
+#include <blk.h> #include <dm/lists.h>
/* Type of blkmap device, Linear or Memory */ @@ -32,6 +33,84 @@ struct blkmap { struct list_head slices; };
+/** + * struct blkmap_slice - Region mapped to a blkmap + * + * Common data for a region mapped to a blkmap, specialized by each + * map type. + * + * @node: List node used to associate this slice with a blkmap + * @blknr: Start block number of the mapping + * @blkcnt: Number of blocks covered by this mapping + */ +struct blkmap_slice { + struct list_head node; + + lbaint_t blknr; + lbaint_t blkcnt; + + /** + * @read: - Read from slice + * + * @read.bm: Blkmap to which this slice belongs + * @read.bms: This slice + * @read.blknr: Start block number to read from + * @read.blkcnt: Number of blocks to read + * @read.buffer: Buffer to store read data to + */ + ulong (*read)(struct blkmap *bm, struct blkmap_slice *bms, + lbaint_t blknr, lbaint_t blkcnt, void *buffer); + + /** + * @write: - Write to slice + * + * @write.bm: Blkmap to which this slice belongs + * @write.bms: This slice + * @write.blknr: Start block number to write to + * @write.blkcnt: Number of blocks to write + * @write.buffer: Data to be written + */ + ulong (*write)(struct blkmap *bm, struct blkmap_slice *bms, + lbaint_t blknr, lbaint_t blkcnt, const void *buffer); + + /** + * @destroy: - Tear down slice + * + * @read.bm: Blkmap to which this slice belongs + * @read.bms: This slice + */ + void (*destroy)(struct blkmap *bm, struct blkmap_slice *bms); +}; + +/** + * struct blkmap_mem - Memory mapping + * + * @slice: Common map data + * @addr: Target memory region of this mapping + * @remapped: True if @addr is backed by a physical to virtual memory + * mapping that must be torn down at the end of this mapping's + * lifetime. + */ +struct blkmap_mem { + struct blkmap_slice slice; + void *addr; + bool remapped; +}; + +/** + * struct blkmap_linear - Linear mapping to other block device + * + * @slice: Common map data + * @blk: Target block device of this mapping + * @blknr: Start block number of the target device + */ +struct blkmap_linear { + struct blkmap_slice slice; + + struct udevice *blk; + lbaint_t blknr; +}; + /** * blkmap_map_linear() - Map region of other block device * @@ -112,4 +191,16 @@ int blkmap_destroy(struct udevice *dev); int blkmap_create_ramdisk(const char *label, ulong image_addr, ulong image_size, struct udevice **devp);
+/** + * blkmap_fdt_pmem_setup() - Add pmem nodes to the devicetree + * @fdt: Devicetree to add the pmem nodes to + * + * Iterate through all the blkmap devices, look for BLKMAP_MEM devices, + * and add pmem nodes corresponding to the blkmap slice to the + * devicetree. + * + * Returns: 0 on success, negative error on failure + */ +int blkmap_fdt_pmem_setup(void *fdt); + #endif /* _BLKMAP_H */

Heinrich, Tobias
There's a slight problem that I forgot when commenting v2.
Heinrich's idea of plugging this into blkmap is eventually the right thing to do.
However, when I started coding this I only added the pmem memory as 'reserved' in the DT hoping that would work. Unfortunately, this depends on a kernel config option. I've managed to track down the problem here[0], but I haven't found time to test it properly and send it upstream. So for this feature to work reliably we *need* to remove the memory map we hand over to the OS.
Since using EFI memmap function into the blkmap code makes no sense, can we perhaps merge v2 (or a variant of it), which only targets EFI, with an explanation of *why* while I try to sort out the kernel issue?
[0] https://git.linaro.org/people/ilias.apalodimas/net-next.git/commit/?h=main
Thanks /Ilias
On Mon, 20 Jan 2025 at 12:51, Sughosh Ganu sughosh.ganu@linaro.org wrote:
When installing a distro via EFI HTTP boot some OS installers expect the .iso image to be preserved and treat it as a "CDROM" to install packages.
This is problematic in EFI, since U-Boot mounts the image, starts the installer, and eventually calls ExitBootServices. At that point the image U-Boot mounted disappears. Some distros don't care and download the missing packages from a web archive, while others halt the installation complaining they can't find certain packages.
If the firmware uses ACPI, this is supported by using NFIT which provides NVDIMM ramdisks to the OS and preserves the image. We don't have anything in place for Device Trees though. Since DT supports persistent memory nodes (pmem) use those and preserve the .iso for installers.
The issue can be reproduced by attempting an EFI HTTP boot with Ubuntu live server ISO, or a Rocky Linux ISO. The installation would fail with the failure to locate certain packages.
The earlier version had aligned the addition of pmem nodes with the EFI HTTP boot feature. This has been changed, based on a review comment from Heinrich to instead be done by looping through the memory based blkmamp devices.
Changes since V2:
- Fix a checkpatch error for putting a blank line after a function
- Use blkmap device based scanning for adding the pmem nodes
Ilias Apalodimas (2): efi_loader: add a function to remove memory from the EFI map efi_loader: preserve installer images in pmem
Masahisa Kojima (1): fdt: add support for adding pmem nodes
Sughosh Ganu (2): blkmap: store type of blkmap device in corresponding structure blkmap: add pmem nodes for blkmap mem mapping devices
boot/fdt_support.c | 40 ++++++++++++- boot/image-fdt.c | 9 +++ cmd/blkmap.c | 16 ++++-- drivers/block/blkmap.c | 90 +++-------------------------- drivers/block/blkmap_helper.c | 47 +++++++++++++++- include/blkmap.h | 103 +++++++++++++++++++++++++++++++++- include/efi_loader.h | 11 ++-- include/fdt_support.h | 13 +++++ lib/efi_loader/efi_bootmgr.c | 22 ++++++-- lib/efi_loader/efi_memory.c | 51 ++++++++++++----- lib/lmb.c | 4 +- test/dm/blkmap.c | 16 +++--- 12 files changed, 302 insertions(+), 120 deletions(-)
-- 2.34.1

On Fri, Jan 24, 2025 at 01:39:45PM +0200, Ilias Apalodimas wrote:
Heinrich, Tobias
There's a slight problem that I forgot when commenting v2.
Heinrich's idea of plugging this into blkmap is eventually the right thing to do.
However, when I started coding this I only added the pmem memory as 'reserved' in the DT hoping that would work. Unfortunately, this depends on a kernel config option. I've managed to track down the problem here[0], but I haven't found time to test it properly and send it upstream. So for this feature to work reliably we *need* to remove the memory map we hand over to the OS.
Since using EFI memmap function into the blkmap code makes no sense, can we perhaps merge v2 (or a variant of it), which only targets EFI, with an explanation of *why* while I try to sort out the kernel issue?
[0] https://git.linaro.org/people/ilias.apalodimas/net-next.git/commit/?h=main
I think this is a reasonable path forward, FWIW.

On fre, jan 24, 2025 at 13:39, Ilias Apalodimas ilias.apalodimas@linaro.org wrote:
Heinrich, Tobias
There's a slight problem that I forgot when commenting v2.
Heinrich's idea of plugging this into blkmap is eventually the right thing to do.
However, when I started coding this I only added the pmem memory as 'reserved' in the DT hoping that would work. Unfortunately, this depends on a kernel config option. I've managed to track down the problem here[0], but I haven't found time to test it properly and send it upstream. So for this feature to work reliably we *need* to remove the memory map we hand over to the OS.
Since using EFI memmap function into the blkmap code makes no sense, can we perhaps merge v2 (or a variant of it), which only targets EFI, with an explanation of *why* while I try to sort out the kernel issue?
I was not a part of the first two iterations of this series, but my view is basically this:
Adding some flag to memory backed slices of block maps, that the fdt-fixup code can use to know whether a pmem node should be injected or not, is completely fine by me.
What I am opposed to is adding restrictions on how block maps can be composed, i.e., limiting a block map to only contain either linear or memory mappings.

On Fri, 24 Jan 2025 at 17:10, Ilias Apalodimas ilias.apalodimas@linaro.org wrote:
Heinrich, Tobias
There's a slight problem that I forgot when commenting v2.
Heinrich's idea of plugging this into blkmap is eventually the right thing to do.
However, when I started coding this I only added the pmem memory as 'reserved' in the DT hoping that would work. Unfortunately, this depends on a kernel config option. I've managed to track down the problem here[0], but I haven't found time to test it properly and send it upstream. So for this feature to work reliably we *need* to remove the memory map we hand over to the OS.
Since using EFI memmap function into the blkmap code makes no sense, can we perhaps merge v2 (or a variant of it), which only targets EFI, with an explanation of *why* while I try to sort out the kernel issue?
If it has been decided to go with the approach taken in V2, I will send an updated series which incorporates the other nits that Heinrich had on the patches. Those were handled in this version.
-sughosh
[0] https://git.linaro.org/people/ilias.apalodimas/net-next.git/commit/?h=main
Thanks /Ilias
On Mon, 20 Jan 2025 at 12:51, Sughosh Ganu sughosh.ganu@linaro.org wrote:
When installing a distro via EFI HTTP boot some OS installers expect the .iso image to be preserved and treat it as a "CDROM" to install packages.
This is problematic in EFI, since U-Boot mounts the image, starts the installer, and eventually calls ExitBootServices. At that point the image U-Boot mounted disappears. Some distros don't care and download the missing packages from a web archive, while others halt the installation complaining they can't find certain packages.
If the firmware uses ACPI, this is supported by using NFIT which provides NVDIMM ramdisks to the OS and preserves the image. We don't have anything in place for Device Trees though. Since DT supports persistent memory nodes (pmem) use those and preserve the .iso for installers.
The issue can be reproduced by attempting an EFI HTTP boot with Ubuntu live server ISO, or a Rocky Linux ISO. The installation would fail with the failure to locate certain packages.
The earlier version had aligned the addition of pmem nodes with the EFI HTTP boot feature. This has been changed, based on a review comment from Heinrich to instead be done by looping through the memory based blkmamp devices.
Changes since V2:
- Fix a checkpatch error for putting a blank line after a function
- Use blkmap device based scanning for adding the pmem nodes
Ilias Apalodimas (2): efi_loader: add a function to remove memory from the EFI map efi_loader: preserve installer images in pmem
Masahisa Kojima (1): fdt: add support for adding pmem nodes
Sughosh Ganu (2): blkmap: store type of blkmap device in corresponding structure blkmap: add pmem nodes for blkmap mem mapping devices
boot/fdt_support.c | 40 ++++++++++++- boot/image-fdt.c | 9 +++ cmd/blkmap.c | 16 ++++-- drivers/block/blkmap.c | 90 +++-------------------------- drivers/block/blkmap_helper.c | 47 +++++++++++++++- include/blkmap.h | 103 +++++++++++++++++++++++++++++++++- include/efi_loader.h | 11 ++-- include/fdt_support.h | 13 +++++ lib/efi_loader/efi_bootmgr.c | 22 ++++++-- lib/efi_loader/efi_memory.c | 51 ++++++++++++----- lib/lmb.c | 4 +- test/dm/blkmap.c | 16 +++--- 12 files changed, 302 insertions(+), 120 deletions(-)
-- 2.34.1

Hi Sughosh,
On Mon, 27 Jan 2025 at 08:47, Sughosh Ganu sughosh.ganu@linaro.org wrote:
On Fri, 24 Jan 2025 at 17:10, Ilias Apalodimas ilias.apalodimas@linaro.org wrote:
Heinrich, Tobias
There's a slight problem that I forgot when commenting v2.
Heinrich's idea of plugging this into blkmap is eventually the right thing to do.
However, when I started coding this I only added the pmem memory as 'reserved' in the DT hoping that would work. Unfortunately, this depends on a kernel config option. I've managed to track down the problem here[0], but I haven't found time to test it properly and send it upstream. So for this feature to work reliably we *need* to remove the memory map we hand over to the OS.
Since using EFI memmap function into the blkmap code makes no sense, can we perhaps merge v2 (or a variant of it), which only targets EFI, with an explanation of *why* while I try to sort out the kernel issue?
If it has been decided to go with the approach taken in V2, I will send an updated series which incorporates the other nits that Heinrich had on the patches. Those were handled in this version.
Yea I think that's the sane thing we can do for now, since this depends on being able to remove the memory for the map we present to the OS.
Just don't limit it to http only.
Cheers /Ilias
-sughosh
[0] https://git.linaro.org/people/ilias.apalodimas/net-next.git/commit/?h=main
Thanks /Ilias
On Mon, 20 Jan 2025 at 12:51, Sughosh Ganu sughosh.ganu@linaro.org wrote:
When installing a distro via EFI HTTP boot some OS installers expect the .iso image to be preserved and treat it as a "CDROM" to install packages.
This is problematic in EFI, since U-Boot mounts the image, starts the installer, and eventually calls ExitBootServices. At that point the image U-Boot mounted disappears. Some distros don't care and download the missing packages from a web archive, while others halt the installation complaining they can't find certain packages.
If the firmware uses ACPI, this is supported by using NFIT which provides NVDIMM ramdisks to the OS and preserves the image. We don't have anything in place for Device Trees though. Since DT supports persistent memory nodes (pmem) use those and preserve the .iso for installers.
The issue can be reproduced by attempting an EFI HTTP boot with Ubuntu live server ISO, or a Rocky Linux ISO. The installation would fail with the failure to locate certain packages.
The earlier version had aligned the addition of pmem nodes with the EFI HTTP boot feature. This has been changed, based on a review comment from Heinrich to instead be done by looping through the memory based blkmamp devices.
Changes since V2:
- Fix a checkpatch error for putting a blank line after a function
- Use blkmap device based scanning for adding the pmem nodes
Ilias Apalodimas (2): efi_loader: add a function to remove memory from the EFI map efi_loader: preserve installer images in pmem
Masahisa Kojima (1): fdt: add support for adding pmem nodes
Sughosh Ganu (2): blkmap: store type of blkmap device in corresponding structure blkmap: add pmem nodes for blkmap mem mapping devices
boot/fdt_support.c | 40 ++++++++++++- boot/image-fdt.c | 9 +++ cmd/blkmap.c | 16 ++++-- drivers/block/blkmap.c | 90 +++-------------------------- drivers/block/blkmap_helper.c | 47 +++++++++++++++- include/blkmap.h | 103 +++++++++++++++++++++++++++++++++- include/efi_loader.h | 11 ++-- include/fdt_support.h | 13 +++++ lib/efi_loader/efi_bootmgr.c | 22 ++++++-- lib/efi_loader/efi_memory.c | 51 ++++++++++++----- lib/lmb.c | 4 +- test/dm/blkmap.c | 16 +++--- 12 files changed, 302 insertions(+), 120 deletions(-)
-- 2.34.1
participants (5)
-
Heinrich Schuchardt
-
Ilias Apalodimas
-
Sughosh Ganu
-
Tobias Waldekranz
-
Tom Rini