[U-Boot] [PATCH 1/3] spl: ram: Do not memcpy() identical buffers

If the source and destination buffer address is identical, there is no need to memcpy() the content. Skip the memcpy() in such a case.
Signed-off-by: Marek Vasut marex@denx.de Cc: Michal Simek michal.simek@xilinx.com Cc: Tom Rini trini@konsulko.com --- common/spl/spl_ram.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/common/spl/spl_ram.c b/common/spl/spl_ram.c index 954e91a004..2ef33f717e 100644 --- a/common/spl/spl_ram.c +++ b/common/spl/spl_ram.c @@ -24,7 +24,8 @@ static ulong spl_ram_load_read(struct spl_load_info *load, ulong sector, { debug("%s: sector %lx, count %lx, buf %lx\n", __func__, sector, count, (ulong)buf); - memcpy(buf, (void *)(CONFIG_SPL_LOAD_FIT_ADDRESS + sector), count); + if (buf != CONFIG_SPL_LOAD_FIT_ADDRESS + sector) + memcpy(buf, (void *)(CONFIG_SPL_LOAD_FIT_ADDRESS + sector), count); return count; }

Add code to load simple fitImage from the kernel volume, the args volume is unnecessary.
Signed-off-by: Marek Vasut marex@denx.de Cc: Ladislav Michl ladis@linux-mips.org Cc: Tom Rini trini@konsulko.com --- common/spl/spl_ubi.c | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/common/spl/spl_ubi.c b/common/spl/spl_ubi.c index 67e5fadd7c..6addd46ece 100644 --- a/common/spl/spl_ubi.c +++ b/common/spl/spl_ubi.c @@ -11,6 +11,14 @@ #include <ubispl.h> #include <spl.h>
+static ulong spl_ubi_load_read(struct spl_load_info *load, ulong sector, + ulong count, void *buf) +{ + if (buf != (void *)CONFIG_SPL_LOAD_FIT_ADDRESS + sector) + memcpy(buf, (void *)(CONFIG_SYS_LOAD_ADDR + sector), count); + return count; +} + int spl_ubi_load_image(struct spl_image_info *spl_image, struct spl_boot_device *bootdev) { @@ -54,7 +62,19 @@ int spl_ubi_load_image(struct spl_image_info *spl_image, ret = ubispl_load_volumes(&info, volumes, 2); if (!ret) { header = (struct image_header *)volumes[0].load_addr; - spl_parse_image_header(spl_image, header); + + if (IS_ENABLED(CONFIG_SPL_LOAD_FIT) && + image_get_magic(header) == FDT_MAGIC) { + struct spl_load_info load; + + debug("Found FIT\n"); + load.bl_len = 1; + load.read = spl_ubi_load_read; + spl_load_simple_fit(spl_image, &load, 0, header); + } else { + spl_parse_image_header(spl_image, header); + } + puts("Linux loaded.\n"); goto out; }

On Tue, May 28, 2019 at 03:49:14AM +0200, Marek Vasut wrote:
Add code to load simple fitImage from the kernel volume, the args volume is unnecessary.
Signed-off-by: Marek Vasut marex@denx.de Cc: Ladislav Michl ladis@linux-mips.org Cc: Tom Rini trini@konsulko.com
common/spl/spl_ubi.c | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/common/spl/spl_ubi.c b/common/spl/spl_ubi.c index 67e5fadd7c..6addd46ece 100644 --- a/common/spl/spl_ubi.c +++ b/common/spl/spl_ubi.c @@ -11,6 +11,14 @@ #include <ubispl.h> #include <spl.h>
+static ulong spl_ubi_load_read(struct spl_load_info *load, ulong sector,
ulong count, void *buf)
+{
- if (buf != (void *)CONFIG_SPL_LOAD_FIT_ADDRESS + sector)
memcpy(buf, (void *)(CONFIG_SYS_LOAD_ADDR + sector), count);
- return count;
+}
int spl_ubi_load_image(struct spl_image_info *spl_image, struct spl_boot_device *bootdev) { @@ -54,7 +62,19 @@ int spl_ubi_load_image(struct spl_image_info *spl_image, ret = ubispl_load_volumes(&info, volumes, 2); if (!ret) { header = (struct image_header *)volumes[0].load_addr;
spl_parse_image_header(spl_image, header);
if (IS_ENABLED(CONFIG_SPL_LOAD_FIT) &&
image_get_magic(header) == FDT_MAGIC) {
struct spl_load_info load;
debug("Found FIT\n");
load.bl_len = 1;
load.read = spl_ubi_load_read;
spl_load_simple_fit(spl_image, &load, 0, header);
} else {
spl_parse_image_header(spl_image, header);
}
}puts("Linux loaded.\n"); goto out;
This breaks building on igep00x0 and am335x_igep003x, sorry.

If SPL OS_BOOT is enabled and CONFIG_SYS_SPL_ARGS_ADDR is defined, place the kernel DTB at that location, so the rest of the OS_BOOT machinery in SPL has the DTB blob at the correct location.
Signed-off-by: Marek Vasut marex@denx.de Cc: Tom Rini trini@konsulko.com --- common/spl/spl_fit.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/common/spl/spl_fit.c b/common/spl/spl_fit.c index 87ecf0bb9e..e3312b3477 100644 --- a/common/spl/spl_fit.c +++ b/common/spl/spl_fit.c @@ -291,7 +291,11 @@ static int spl_fit_append_fdt(struct spl_image_info *spl_image, * Read the device tree and place it after the image. * Align the destination address to ARCH_DMA_MINALIGN. */ +#if defined(CONFIG_SPL_OS_BOOT) && defined(CONFIG_SYS_SPL_ARGS_ADDR) + image_info.load_addr = CONFIG_SYS_SPL_ARGS_ADDR; +#else image_info.load_addr = spl_image->load_addr + spl_image->size; +#endif ret = spl_load_fit_image(info, sector, fit, base_offset, node, &image_info);
@@ -472,7 +476,7 @@ int spl_load_simple_fit(struct spl_image_info *spl_image, * Booting a next-stage U-Boot may require us to append the FDT. * We allow this to fail, as the U-Boot image might embed its FDT. */ - if (spl_image->os == IH_OS_U_BOOT) + if (spl_image->os == IH_OS_U_BOOT || spl_image->os == IH_OS_LINUX) spl_fit_append_fdt(spl_image, info, sector, fit, images, base_offset);

On Tue, May 28, 2019 at 03:49:15AM +0200, Marek Vasut wrote:
If SPL OS_BOOT is enabled and CONFIG_SYS_SPL_ARGS_ADDR is defined, place the kernel DTB at that location, so the rest of the OS_BOOT machinery in SPL has the DTB blob at the correct location.
Signed-off-by: Marek Vasut marex@denx.de Cc: Tom Rini trini@konsulko.com
common/spl/spl_fit.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/common/spl/spl_fit.c b/common/spl/spl_fit.c index 87ecf0bb9e..e3312b3477 100644 --- a/common/spl/spl_fit.c +++ b/common/spl/spl_fit.c @@ -291,7 +291,11 @@ static int spl_fit_append_fdt(struct spl_image_info *spl_image, * Read the device tree and place it after the image. * Align the destination address to ARCH_DMA_MINALIGN. */ +#if defined(CONFIG_SPL_OS_BOOT) && defined(CONFIG_SYS_SPL_ARGS_ADDR)
- image_info.load_addr = CONFIG_SYS_SPL_ARGS_ADDR;
+#else image_info.load_addr = spl_image->load_addr + spl_image->size; +#endif ret = spl_load_fit_image(info, sector, fit, base_offset, node, &image_info);
@@ -472,7 +476,7 @@ int spl_load_simple_fit(struct spl_image_info *spl_image, * Booting a next-stage U-Boot may require us to append the FDT. * We allow this to fail, as the U-Boot image might embed its FDT. */
- if (spl_image->os == IH_OS_U_BOOT)
- if (spl_image->os == IH_OS_U_BOOT || spl_image->os == IH_OS_LINUX) spl_fit_append_fdt(spl_image, info, sector, fit, images, base_offset);
As-is and I haven't tried to debug this yet, it breaks the U-Boot booting case for am335x_evm.

On Tue, May 28, 2019 at 03:49:13AM +0200, Marek Vasut wrote:
If the source and destination buffer address is identical, there is no need to memcpy() the content. Skip the memcpy() in such a case.
Signed-off-by: Marek Vasut marex@denx.de Cc: Michal Simek michal.simek@xilinx.com Cc: Tom Rini trini@konsulko.com
Shouldn't memcpy catch that itself?

On 5/28/19 4:06 AM, Tom Rini wrote:
On Tue, May 28, 2019 at 03:49:13AM +0200, Marek Vasut wrote:
If the source and destination buffer address is identical, there is no need to memcpy() the content. Skip the memcpy() in such a case.
Signed-off-by: Marek Vasut marex@denx.de Cc: Michal Simek michal.simek@xilinx.com Cc: Tom Rini trini@konsulko.com
Shouldn't memcpy catch that itself?
memcpy(3) says The memcpy() function copies n bytes from memory area src to memory area dest. The memory areas must not overlap. Use memmove(3) if the memory areas do overlap.

On Tue, May 28, 2019 at 04:07:44AM +0200, Marek Vasut wrote:
On 5/28/19 4:06 AM, Tom Rini wrote:
On Tue, May 28, 2019 at 03:49:13AM +0200, Marek Vasut wrote:
If the source and destination buffer address is identical, there is no need to memcpy() the content. Skip the memcpy() in such a case.
Signed-off-by: Marek Vasut marex@denx.de Cc: Michal Simek michal.simek@xilinx.com Cc: Tom Rini trini@konsulko.com
Shouldn't memcpy catch that itself?
memcpy(3) says The memcpy() function copies n bytes from memory area src to memory area dest. The memory areas must not overlap. Use memmove(3) if the memory areas do overlap.
OK, and shouldn't memcpy optimize that case? Does it usually?

On 5/28/19 4:42 AM, Tom Rini wrote:
On Tue, May 28, 2019 at 04:07:44AM +0200, Marek Vasut wrote:
On 5/28/19 4:06 AM, Tom Rini wrote:
On Tue, May 28, 2019 at 03:49:13AM +0200, Marek Vasut wrote:
If the source and destination buffer address is identical, there is no need to memcpy() the content. Skip the memcpy() in such a case.
Signed-off-by: Marek Vasut marex@denx.de Cc: Michal Simek michal.simek@xilinx.com Cc: Tom Rini trini@konsulko.com
Shouldn't memcpy catch that itself?
memcpy(3) says The memcpy() function copies n bytes from memory area src to memory area dest. The memory areas must not overlap. Use memmove(3) if the memory areas do overlap.
OK, and shouldn't memcpy optimize that case? Does it usually?
As the manpage says "The memory areas must not overlap." , I would expect it does not have to ?

On Tue, May 28, 2019 at 04:44:52AM +0200, Marek Vasut wrote:
On 5/28/19 4:42 AM, Tom Rini wrote:
On Tue, May 28, 2019 at 04:07:44AM +0200, Marek Vasut wrote:
On 5/28/19 4:06 AM, Tom Rini wrote:
On Tue, May 28, 2019 at 03:49:13AM +0200, Marek Vasut wrote:
If the source and destination buffer address is identical, there is no need to memcpy() the content. Skip the memcpy() in such a case.
Signed-off-by: Marek Vasut marex@denx.de Cc: Michal Simek michal.simek@xilinx.com Cc: Tom Rini trini@konsulko.com
Shouldn't memcpy catch that itself?
memcpy(3) says The memcpy() function copies n bytes from memory area src to memory area dest. The memory areas must not overlap. Use memmove(3) if the memory areas do overlap.
OK, and shouldn't memcpy optimize that case? Does it usually?
As the manpage says "The memory areas must not overlap." , I would expect it does not have to ?
I guess I'm not being clear enough, sorry. Go look at how this is implemented in a few places please and report back to us. Someone else, or many someone else, have probably already figured out if optimizing this case in general, in memcpy, is a good idea or not. Thanks!

On 5/28/19 5:04 AM, Tom Rini wrote:
On Tue, May 28, 2019 at 04:44:52AM +0200, Marek Vasut wrote:
On 5/28/19 4:42 AM, Tom Rini wrote:
On Tue, May 28, 2019 at 04:07:44AM +0200, Marek Vasut wrote:
On 5/28/19 4:06 AM, Tom Rini wrote:
On Tue, May 28, 2019 at 03:49:13AM +0200, Marek Vasut wrote:
If the source and destination buffer address is identical, there is no need to memcpy() the content. Skip the memcpy() in such a case.
Signed-off-by: Marek Vasut marex@denx.de Cc: Michal Simek michal.simek@xilinx.com Cc: Tom Rini trini@konsulko.com
Shouldn't memcpy catch that itself?
memcpy(3) says The memcpy() function copies n bytes from memory area src to memory area dest. The memory areas must not overlap. Use memmove(3) if the memory areas do overlap.
OK, and shouldn't memcpy optimize that case? Does it usually?
As the manpage says "The memory areas must not overlap." , I would expect it does not have to ?
I guess I'm not being clear enough, sorry. Go look at how this is implemented in a few places please and report back to us. Someone else, or many someone else, have probably already figured out if optimizing this case in general, in memcpy, is a good idea or not. Thanks!
If even [1] says the behavior is undefined, then what does it matter whether some implementation added an optimization for such undefined behavior? We cannot depend on it, can we ?
[1] https://pubs.opengroup.org/onlinepubs/009695399/functions/memcpy.html

On Tue, May 28, 2019 at 05:24:34AM +0200, Marek Vasut wrote:
On 5/28/19 5:04 AM, Tom Rini wrote:
On Tue, May 28, 2019 at 04:44:52AM +0200, Marek Vasut wrote:
On 5/28/19 4:42 AM, Tom Rini wrote:
On Tue, May 28, 2019 at 04:07:44AM +0200, Marek Vasut wrote:
On 5/28/19 4:06 AM, Tom Rini wrote:
On Tue, May 28, 2019 at 03:49:13AM +0200, Marek Vasut wrote:
> If the source and destination buffer address is identical, there is > no need to memcpy() the content. Skip the memcpy() in such a case. > > Signed-off-by: Marek Vasut marex@denx.de > Cc: Michal Simek michal.simek@xilinx.com > Cc: Tom Rini trini@konsulko.com
Shouldn't memcpy catch that itself?
memcpy(3) says The memcpy() function copies n bytes from memory area src to memory area dest. The memory areas must not overlap. Use memmove(3) if the memory areas do overlap.
OK, and shouldn't memcpy optimize that case? Does it usually?
As the manpage says "The memory areas must not overlap." , I would expect it does not have to ?
I guess I'm not being clear enough, sorry. Go look at how this is implemented in a few places please and report back to us. Someone else, or many someone else, have probably already figured out if optimizing this case in general, in memcpy, is a good idea or not. Thanks!
If even [1] says the behavior is undefined, then what does it matter whether some implementation added an optimization for such undefined behavior? We cannot depend on it, can we ?
[1] https://pubs.opengroup.org/onlinepubs/009695399/functions/memcpy.html
Yes, yes it would be worth seeing if other groups that have looked into the performance impact here have also decided to optimize this undefined behavior or not, thanks.

On 5/28/2019 4:19 AM, Tom Rini wrote:
On Tue, May 28, 2019 at 05:24:34AM +0200, Marek Vasut wrote:
On 5/28/19 5:04 AM, Tom Rini wrote:
On Tue, May 28, 2019 at 04:44:52AM +0200, Marek Vasut wrote:
On 5/28/19 4:42 AM, Tom Rini wrote:
On Tue, May 28, 2019 at 04:07:44AM +0200, Marek Vasut wrote:
On 5/28/19 4:06 AM, Tom Rini wrote: > On Tue, May 28, 2019 at 03:49:13AM +0200, Marek Vasut wrote: > >> If the source and destination buffer address is identical, there is >> no need to memcpy() the content. Skip the memcpy() in such a case. >> >> Signed-off-by: Marek Vasut marex@denx.de >> Cc: Michal Simek michal.simek@xilinx.com >> Cc: Tom Rini trini@konsulko.com > Shouldn't memcpy catch that itself? > memcpy(3) says The memcpy() function copies n bytes from memory area src to memory area dest. The memory areas must not overlap. Use memmove(3) if the memory areas do overlap.
OK, and shouldn't memcpy optimize that case? Does it usually?
As the manpage says "The memory areas must not overlap." , I would expect it does not have to ?
I guess I'm not being clear enough, sorry. Go look at how this is implemented in a few places please and report back to us. Someone else, or many someone else, have probably already figured out if optimizing this case in general, in memcpy, is a good idea or not. Thanks!
If even [1] says the behavior is undefined, then what does it matter whether some implementation added an optimization for such undefined behavior? We cannot depend on it, can we ?
[1] https://pubs.opengroup.org/onlinepubs/009695399/functions/memcpy.html
Yes, yes it would be worth seeing if other groups that have looked into the performance impact here have also decided to optimize this undefined behavior or not, thanks.
I don't think this is an optimization question, really. Calling memcpy with overlapping areas is an error. The result is explicitly undefined. It may well be that all the existing implementations effectively do nothing, either by explicit check or by actually copying the data over itself. However, to rely on that behavior is asking for trouble down the line. Undefined behavior is exactly that. Don't do it.
U-Boot mailing list U-Boot@lists.denx.de https://lists.denx.de/listinfo/u-boot

On 5/28/19 1:19 PM, Tom Rini wrote:
On Tue, May 28, 2019 at 05:24:34AM +0200, Marek Vasut wrote:
On 5/28/19 5:04 AM, Tom Rini wrote:
On Tue, May 28, 2019 at 04:44:52AM +0200, Marek Vasut wrote:
On 5/28/19 4:42 AM, Tom Rini wrote:
On Tue, May 28, 2019 at 04:07:44AM +0200, Marek Vasut wrote:
On 5/28/19 4:06 AM, Tom Rini wrote: > On Tue, May 28, 2019 at 03:49:13AM +0200, Marek Vasut wrote: > >> If the source and destination buffer address is identical, there is >> no need to memcpy() the content. Skip the memcpy() in such a case. >> >> Signed-off-by: Marek Vasut marex@denx.de >> Cc: Michal Simek michal.simek@xilinx.com >> Cc: Tom Rini trini@konsulko.com > > Shouldn't memcpy catch that itself? > memcpy(3) says The memcpy() function copies n bytes from memory area src to memory area dest. The memory areas must not overlap. Use memmove(3) if the memory areas do overlap.
OK, and shouldn't memcpy optimize that case? Does it usually?
As the manpage says "The memory areas must not overlap." , I would expect it does not have to ?
I guess I'm not being clear enough, sorry. Go look at how this is implemented in a few places please and report back to us. Someone else, or many someone else, have probably already figured out if optimizing this case in general, in memcpy, is a good idea or not. Thanks!
If even [1] says the behavior is undefined, then what does it matter whether some implementation added an optimization for such undefined behavior? We cannot depend on it, can we ?
[1] https://pubs.opengroup.org/onlinepubs/009695399/functions/memcpy.html
Yes, yes it would be worth seeing if other groups that have looked into the performance impact here have also decided to optimize this undefined behavior or not, thanks.
I will just drop this patch, since U-Boot memcpy() implementation does this check. But let me be very clear here, that check is part of undefined behavior (!) and I don't think it's the right thing to do in memcpy() itself.
participants (3)
-
J. William Campbell
-
Marek Vasut
-
Tom Rini