[U-Boot] [PATCH v2 0/8] arm: a few steps to reduce the boot time

This patchset reduces the boot time for the ARM architecture, Exynos boards, and boards with DFU enabled.
For the tested Trats2 and Odroid X2 devices, this was done in four steps:
1. Enable the arch memcpy and memset.
2. Enable the arch memset for the .bss clear.
3. Make the .bss section as small as possible by removing the static dfu buffer (32MiB in .bss on Trats2) and allocating it with malloc instead.
4. Skip zeroing the memory reserved for malloc at malloc init. For Trats2 this was 80MiB of memory.
The .bss section grows with the number and size of static variables. It is cleared before the jump to the relocated U-Boot, and this was done word by word. To reduce the time for this step, we can enable the arch memset, which uses multiple ARM registers.
For configs with DFU enabled, the dfu buffer lives in this section and takes at least 8MB (32MB for Trats2). This is a lot of data that a standard boot does not need, so this buffer should be dynamically allocated.
The next issue was the memory reserved for malloc, which was zeroed at malloc init in one of the early init calls. Some boards have more than one MiB of reserved malloc memory, and zeroing it adds another boot delay. Now the memory is zeroed only on a calloc call.
So it was all about unnecessary operations on 'big' data.
Przemyslaw Marczak (8):
  exynos: config: enable arch memcpy and arch memset
  arm: relocation: clear .bss section with arch memset if defined
  dfu: mmc: file buffer: remove static allocation
  dlmalloc: add option for skip memset in malloc init
  README: add info about skip memset at malloc init
  kconfig: malloc: add option for skip memset at malloc init
  trats2: defconfig: enable expert and skip memset at malloc init
  odroid: defconfig: enable expert and skip malloc memset

 Kconfig                         | 26 +++++++++++++++++++-------
 README                          |  7 +++++++
 arch/arm/lib/crt0.S             | 10 +++++++++-
 common/dlmalloc.c               | 10 +++++++---
 configs/odroid_defconfig        |  2 ++
 configs/trats2_defconfig        |  2 ++
 drivers/dfu/dfu_mmc.c           | 25 ++++++++++++++++++++++---
 include/configs/exynos-common.h |  3 +++
 8 files changed, 71 insertions(+), 14 deletions(-)
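The difference between the word-by-word .bss clear and a memset() call can be sketched in plain C. This is only an illustration under assumptions: the function names and the 64-word buffers are hypothetical stand-ins, and the host memset() is not the ARM arch memset from the series. It shows that both routines produce the same result, so any gain comes purely from how the stores are issued.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One 32-bit store per iteration, like the clbss_l loop in crt0.S. */
static void clear_words(uint32_t *start, const uint32_t *end)
{
	while (start < end)
		*start++ = 0;
}

/* Same region cleared by a single memset() call; the length is in bytes. */
static void clear_memset(uint32_t *start, const uint32_t *end)
{
	assert(start <= end);
	memset(start, 0, (size_t)(end - start) * sizeof(*start));
}

/* Returns 1 when both routines leave identical, all-zero buffers. */
static int clears_agree(void)
{
	uint32_t a[64], b[64];	/* hypothetical stand-ins for a .bss region */

	memset(a, 0xa5, sizeof(a));	/* dirty both buffers first */
	memset(b, 0xa5, sizeof(b));
	clear_words(a, a + 64);
	clear_memset(b, b + 64);
	return memcmp(a, b, sizeof(a)) == 0 && a[0] == 0 && a[63] == 0;
}
```

Calling clears_agree() from a test harness is enough to check the equivalence; measuring the speed difference would need the real target, since the numbers in this thread were taken with the cache disabled.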

This commit enables the following configs:
- CONFIG_USE_ARCH_MEMCPY
- CONFIG_USE_ARCH_MEMSET
This increases the performance of memcpy/memset and also reduces the boot time.
This was tested on Trats2. A quick test with trace. Boot time from start to main_loop() entry:
- ~1527ms - before this change (arch memset enabled for .bss clear)
- ~1384ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
Cc: Minkyu Kang mk7.kang@samsung.com
Cc: Akshay Saraswat akshay.s@samsung.com
Cc: Simon Glass sjg@chromium.org
Cc: Sjoerd Simons sjoerd.simons@collabora.co.uk
---
 include/configs/exynos-common.h | 3 +++
 1 file changed, 3 insertions(+)
diff --git a/include/configs/exynos-common.h b/include/configs/exynos-common.h
index 1f3ee55..5c14c40 100644
--- a/include/configs/exynos-common.h
+++ b/include/configs/exynos-common.h
@@ -30,6 +30,9 @@
 #define CONFIG_SKIP_LOWLEVEL_INIT
 #define CONFIG_BOARD_EARLY_INIT_F
 
+#define CONFIG_USE_ARCH_MEMCPY
+#define CONFIG_USE_ARCH_MEMSET
+
 /* Keep L2 Cache Disabled */
 #define CONFIG_CMD_CACHE

On 16 February 2015 at 08:13, Przemyslaw Marczak p.marczak@samsung.com wrote:
This commit enables the following configs:
- CONFIG_USE_ARCH_MEMCPY
- CONFIG_USE_ARCH_MEMSET
This increases the performance of memcpy/memset and also reduces the boot time.
This was tested on Trats2. A quick test with trace. Boot time from start to main_loop() entry:
- ~1527ms - before this change (arch memset enabled for .bss clear)
- ~1384ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com Cc: Minkyu Kang mk7.kang@samsung.com Cc: Akshay Saraswat akshay.s@samsung.com Cc: Simon Glass sjg@chromium.org Cc: Sjoerd Simons sjoerd.simons@collabora.co.uk
include/configs/exynos-common.h | 3 +++ 1 file changed, 3 insertions(+)
Reviewed-by: Simon Glass sjg@chromium.org
BTW in case you are interested, in the Chromium U-Boot tree (chromeos-v2013.06 branch) we have exynos support for turning on the cache in SPL and leaving it on through to the end of U-Boot. It runs two SPLs and two U-Boots (with verified boot and kernel verification) in a total of about 750ms. This shipped last year with Pit and Pi (Samsung Chromebook 2).
Might be some interesting patches there...
Regards, Simon

Hello,
On 02/18/2015 05:23 AM, Simon Glass wrote:
Reviewed-by: Simon Glass sjg@chromium.org
BTW in case you are interested, in the Chromium U-Boot tree (chromeos-v2013.06 branch) we have exynos support for turning on the cache in SPL and leaving it on through to the end of U-Boot. It runs two SPLs and two U-Boots (with verified boot and kernel verification) in a total of about 750ms. This shipped last year with Pit and Pi (Samsung Chromebook 2).
Might be some interesting patches there...
Regards, Simon
This is very interesting. Some time ago I ran some tests with the cache on/off for s-boot (bl1/bl2 for Trats2). Enabling the cache improves the performance incredibly. Since it is easy to break the Trats2, such changes in the s-boot make no sense. But in the future it could be easy to modify the bl2 for Odroid.
Best regards,

For the ARM architecture, enabling CONFIG_USE_ARCH_MEMSET/MEMCPY greatly increases the memset/memcpy performance. This is possible thanks to the ARM multiple-register instructions.
Unfortunately the relocation is done without the cache enabled, so it takes some time, but zeroing the BSS memory takes much longer, especially for configs with big static buffers.
A quick test confirms that using the arch memcpy for relocation brings no significant boot time improvement. The same test confirms that enabling the arch memset for zeroing the BSS reduces the boot time.
So this patch enables the arch memset for zeroing the BSS after the relocation process. For ARM boards, this can be enabled in board configs by defining 'CONFIG_USE_ARCH_MEMSET'.
This was tested on Trats2. A quick test with trace. Boot time from start to main_loop() entry:
- ~1384ms - before this change
- ~888ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
Cc: Albert Aribaud albert.u.boot@aribaud.net
Cc: Tom Rini trini@ti.com
---
 arch/arm/lib/crt0.S | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/arch/arm/lib/crt0.S b/arch/arm/lib/crt0.S
index 22df3e5..fab3d2c 100644
--- a/arch/arm/lib/crt0.S
+++ b/arch/arm/lib/crt0.S
@@ -115,14 +115,22 @@ here:
 	bl	c_runtime_cpu_setup	/* we still call old routine here */
 
 	ldr	r0, =__bss_start	/* this is auto-relocated! */
-	ldr	r1, =__bss_end		/* this is auto-relocated! */
 
+#ifdef CONFIG_USE_ARCH_MEMSET
+	ldr	r3, =__bss_end		/* this is auto-relocated! */
+	mov	r1, #0x00000000		/* prepare zero to clear BSS */
+
+	subs	r2, r3, r0		/* r2 = memset len */
+	bl	memset
+#else
+	ldr	r1, =__bss_end		/* this is auto-relocated! */
 	mov	r2, #0x00000000		/* prepare zero to clear BSS */
 
 clbss_l:cmp	r0, r1			/* while not at end of BSS */
 	strlo	r2, [r0]		/* clear 32-bit BSS word */
 	addlo	r0, r0, #4		/* move to next */
 	blo	clbss_l
+#endif
 
 	bl coloured_LED_init
 	bl red_led_on

Hello,
On 02/16/2015 04:13 PM, Przemyslaw Marczak wrote:
For the ARM architecture, enabling CONFIG_USE_ARCH_MEMSET/MEMCPY greatly increases the memset/memcpy performance. This is possible thanks to the ARM multiple-register instructions.
Unfortunately the relocation is done without the cache enabled, so it takes some time, but zeroing the BSS memory takes much longer, especially for configs with big static buffers.
A quick test confirms that using the arch memcpy for relocation brings no significant boot time improvement. The same test confirms that enabling the arch memset for zeroing the BSS reduces the boot time.
So this patch enables the arch memset for zeroing the BSS after the relocation process. For ARM boards, this can be enabled in board configs by defining 'CONFIG_USE_ARCH_MEMSET'.
This was tested on Trats2. A quick test with trace. Boot time from start to main_loop() entry:
- ~1384ms - before this change
- ~888ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com Cc: Albert Aribaud albert.u.boot@aribaud.net Cc: Tom Rini trini@ti.com
arch/arm/lib/crt0.S | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/arch/arm/lib/crt0.S b/arch/arm/lib/crt0.S
index 22df3e5..fab3d2c 100644
--- a/arch/arm/lib/crt0.S
+++ b/arch/arm/lib/crt0.S
@@ -115,14 +115,22 @@ here:
 	bl	c_runtime_cpu_setup	/* we still call old routine here */
 
 	ldr	r0, =__bss_start	/* this is auto-relocated! */
-	ldr	r1, =__bss_end		/* this is auto-relocated! */
 
+#ifdef CONFIG_USE_ARCH_MEMSET
+	ldr	r3, =__bss_end		/* this is auto-relocated! */
+	mov	r1, #0x00000000		/* prepare zero to clear BSS */
+
+	subs	r2, r3, r0		/* r2 = memset len */
+	bl	memset
+#else
+	ldr	r1, =__bss_end		/* this is auto-relocated! */
 	mov	r2, #0x00000000		/* prepare zero to clear BSS */
 
 clbss_l:cmp	r0, r1			/* while not at end of BSS */
 	strlo	r2, [r0]		/* clear 32-bit BSS word */
 	addlo	r0, r0, #4		/* move to next */
 	blo	clbss_l
+#endif
 
 	bl coloured_LED_init
 	bl red_led_on
This commit is left unchanged. After boot time tests using an oscilloscope and the clock cycle counter, I didn't notice a time difference of more than one ms. In this case I think that inserting duplicated code here makes no sense.
Best regards,

Hi Przemyslaw,
On 16 February 2015 at 08:21, Przemyslaw Marczak p.marczak@samsung.com wrote:
Hello,
This commit is left unchanged. After boot time tests using an oscilloscope and the clock cycle counter, I didn't notice a time difference of more than one ms. In this case I think that inserting duplicated code here makes no sense.
I don't understand this comment, sorry.
Regards, Simon

Hello,
On 02/18/2015 05:32 AM, Simon Glass wrote:
Hi Przemyslaw,
On 16 February 2015 at 08:21, Przemyslaw Marczak p.marczak@samsung.com wrote:
This commit is left unchanged. After boot time tests using an oscilloscope and the clock cycle counter, I didn't notice a time difference of more than one ms. In this case I think that inserting duplicated code here makes no sense.
I don't understand this comment, sorry.
Regards, Simon
Sorry for the misleading message. When I sent this patch set, I forgot to add the message-id of the previous thread as "in-reply-to".
There was a discussion about inserting the memory zeroing routines as asm here, instead of using the 'memset' call. But my tests showed no difference in performance. So in this case it's better to use the common lib, and this commit is the same as it was in the first version.
(I missed the changelog)
Best regards,

On 18 February 2015 at 05:31, Przemyslaw Marczak p.marczak@samsung.com wrote:
Hello,
Sorry for the misleading message. When I sent this patch set, I forgot to add the message-id of the previous thread as "in-reply-to".
There was a discussion about inserting the memory zeroing routines as asm here, instead of using the 'memset' call. But my tests showed no difference in performance. So in this case it's better to use the common lib, and this commit is the same as it was in the first version.
(I missed the changelog)
I see, thanks.
Reviewed-by: Simon Glass sjg@chromium.org

For writing files, the DFU implementation requires a file buffer at least as long as the file. Big files therefore require an equally big buffer.
Previously the file buffer was allocated as a static variable, so it was part of the U-Boot .bss section. A 32MiB buffer means 32MiB of additional space required for this section.
The .bss needs to be cleared after the relocation. This introduces an additional boot delay at every start, but the dfu feature is usually not needed for a standard boot, so the buffer should be allocated only when required.
This patch removes the static allocation of this buffer and allocates it with memalign on the first call of:
- dfu_fill_entity_mmc()
The buffer is freed on the dfu_free_entity() call.
This was tested on Trats2. A quick test with trace. Boot time from start to main_loop() entry:
- ~888ms - before this change (arch memset enabled for .bss clear)
- ~464ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
Cc: Lukasz Majewski l.majewski@samsung.com
Cc: Stephen Warren swarren@nvidia.com
Cc: Pantelis Antoniou panto@antoniou-consulting.com
Cc: Tom Rini trini@ti.com
Cc: Marek Vasut marek.vasut@gmail.com
---
 drivers/dfu/dfu_mmc.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)
diff --git a/drivers/dfu/dfu_mmc.c b/drivers/dfu/dfu_mmc.c
index 62d72fe..fd865e1 100644
--- a/drivers/dfu/dfu_mmc.c
+++ b/drivers/dfu/dfu_mmc.c
@@ -16,8 +16,7 @@
 #include <fat.h>
 #include <mmc.h>
 
-static unsigned char __aligned(CONFIG_SYS_CACHELINE_SIZE)
-				dfu_file_buf[CONFIG_SYS_DFU_MAX_FILE_SIZE];
+static unsigned char *dfu_file_buf;
 static long dfu_file_buf_len;
 
 static int mmc_access_part(struct dfu_entity *dfu, struct mmc *mmc, int part)
@@ -211,7 +210,7 @@ int dfu_flush_medium_mmc(struct dfu_entity *dfu)
 
 	if (dfu->layout != DFU_RAW_ADDR) {
 		/* Do stuff here. */
-		ret = mmc_file_op(DFU_OP_WRITE, dfu, &dfu_file_buf,
+		ret = mmc_file_op(DFU_OP_WRITE, dfu, dfu_file_buf,
 				&dfu_file_buf_len);
 
 		/* Now that we're done */
@@ -263,6 +262,14 @@ int dfu_read_medium_mmc(struct dfu_entity *dfu, u64 offset, void *buf,
 	return ret;
 }
 
+void dfu_free_entity_mmc(struct dfu_entity *dfu)
+{
+	if (dfu_file_buf) {
+		free(dfu_file_buf);
+		dfu_file_buf = NULL;
+	}
+}
+
 /*
  * @param s Parameter string containing space-separated arguments:
  *	1st:
@@ -370,6 +377,18 @@ int dfu_fill_entity_mmc(struct dfu_entity *dfu, char *devstr, char *s)
 	dfu->write_medium = dfu_write_medium_mmc;
 	dfu->flush_medium = dfu_flush_medium_mmc;
 	dfu->inited = 0;
+	dfu->free_entity = dfu_free_entity_mmc;
+
+	/* Check if file buffer is ready */
+	if (!dfu_file_buf) {
+		dfu_file_buf = memalign(CONFIG_SYS_CACHELINE_SIZE,
+					CONFIG_SYS_DFU_MAX_FILE_SIZE);
+		if (!dfu_file_buf) {
+			error("Could not memalign 0x%x bytes",
+			      CONFIG_SYS_DFU_MAX_FILE_SIZE);
+			return -ENOMEM;
+		}
+	}
 
 	return 0;
 }
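The allocate-on-first-use pattern from this patch can be tried outside U-Boot with a small hosted C sketch. The names file_buf_get/file_buf_put and the BUF_ALIGN/BUF_SIZE values are hypothetical, and C11 aligned_alloc() stands in for U-Boot's memalign(); the point is only that .bss now holds a pointer instead of the whole buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BUF_ALIGN 64u			/* hypothetical cache-line size */
#define BUF_SIZE  (1024u * 1024u)	/* hypothetical maximum file size */

/* Only this pointer lives in .bss, not a BUF_SIZE array. */
static unsigned char *file_buf;

/* Allocate on first use; later calls return the same buffer. */
static unsigned char *file_buf_get(void)
{
	/* aligned_alloc() requires the size to be a multiple of the alignment */
	assert(BUF_SIZE % BUF_ALIGN == 0);

	if (!file_buf)
		file_buf = aligned_alloc(BUF_ALIGN, BUF_SIZE);

	assert(!file_buf || (uintptr_t)file_buf % BUF_ALIGN == 0);
	return file_buf;	/* NULL if the allocation failed */
}

/* Release the buffer and reset the pointer so it can be allocated again. */
static void file_buf_put(void)
{
	free(file_buf);
	file_buf = NULL;
}
```

Resetting the pointer to NULL after free() mirrors dfu_free_entity_mmc() in the diff above and is what makes a later file_buf_get() call allocate a fresh buffer.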

On 16 February 2015 at 08:13, Przemyslaw Marczak p.marczak@samsung.com wrote:
For writing files, the DFU implementation requires a file buffer at least as long as the file. Big files therefore require an equally big buffer.
Previously the file buffer was allocated as a static variable, so it was part of the U-Boot .bss section. A 32MiB buffer means 32MiB of additional space required for this section.
The .bss needs to be cleared after the relocation. This introduces an additional boot delay at every start, but the dfu feature is usually not needed for a standard boot, so the buffer should be allocated only when required.
This patch removes the static allocation of this buffer and allocates it with memalign on the first call of:
- dfu_fill_entity_mmc()
The buffer is freed on the dfu_free_entity() call.
This was tested on Trats2. A quick test with trace. Boot time from start to main_loop() entry:
- ~888ms - before this change (arch memset enabled for .bss clear)
- ~464ms - after this change
Wow.
Reviewed-by: Simon Glass sjg@chromium.org

This commit introduces a new config: CONFIG_SYS_MALLOC_INIT_SKIP_ZEROING.
Before this change, the whole memory area reserved for malloc was set to zero in mem_malloc_init(). When the reserved malloc memory exceeds a few MiB, this can slow down the boot process.
So enabling this config is an option to reduce the boot time.
Note: After enabling this option, only calloc() will return a pointer to a zeroed memory area. Previously, without this option, a pointer into an untouched region of malloc memory pointed to zero-filled memory. This means that code with malloc() calls should be reexamined.
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
---
 common/dlmalloc.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/common/dlmalloc.c b/common/dlmalloc.c
index 6453ee9..63f68ed 100644
--- a/common/dlmalloc.c
+++ b/common/dlmalloc.c
@@ -1535,9 +1535,9 @@ void mem_malloc_init(ulong start, ulong size)
 
 	debug("using memory %#lx-%#lx for malloc()\n", mem_malloc_start,
 	      mem_malloc_end);
-
-	memset((void *)mem_malloc_start, 0, size);
-
+#ifndef CONFIG_SYS_MALLOC_INIT_SKIP_ZEROING
+	memset((void *)mem_malloc_start, 0x0, size);
+#endif
 	malloc_bin_reloc();
 }
 
@@ -2948,10 +2948,12 @@ Void_t* cALLOc(n, elem_size) size_t n; size_t elem_size;
 
 
   /* check if expand_top called, in which case don't need to clear */
+#ifndef CONFIG_SYS_MALLOC_INIT_SKIP_ZEROING
 #if MORECORE_CLEARS
   mchunkptr oldtop = top;
   INTERNAL_SIZE_T oldtopsize = chunksize(top);
 #endif
+#endif
   Void_t* mem = mALLOc (sz);
 
   if ((long)n < 0) return NULL;
@@ -2977,6 +2979,7 @@ Void_t* cALLOc(n, elem_size) size_t n; size_t elem_size;
 
     csz = chunksize(p);
 
+#ifndef CONFIG_SYS_MALLOC_INIT_SKIP_ZEROING
 #if MORECORE_CLEARS
     if (p == oldtop && csz > oldtopsize)
     {
@@ -2984,6 +2987,7 @@ Void_t* cALLOc(n, elem_size) size_t n; size_t elem_size;
       csz = oldtopsize;
     }
 #endif
+#endif
 
     MALLOC_ZERO(mem, csz - SIZE_SZ);
     return mem;
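The behaviour change described in the note can be illustrated with a toy allocator. This is a sketch under assumptions, not dlmalloc: the pool, its size, and the pool_* names are hypothetical, and a simple bump allocator replaces the real bin logic. It shows why callers that relied on zero-filled malloc() memory need calloc() once init-time zeroing is skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_SIZE 256u		/* hypothetical pool size */

static unsigned char pool[POOL_SIZE];
static size_t pool_used;

/* With skip_zeroing set, stale pool contents stay in place at init. */
static void pool_init(int skip_zeroing)
{
	pool_used = 0;
	if (!skip_zeroing)
		memset(pool, 0, sizeof(pool));
}

/* malloc()-style: hands out the next bytes without touching them. */
static void *pool_alloc(size_t n)
{
	void *p;

	assert(pool_used <= POOL_SIZE);
	if (n > POOL_SIZE - pool_used)
		return NULL;
	p = pool + pool_used;
	pool_used += n;
	return p;
}

/* calloc()-style: always clears the bytes it hands out. */
static void *pool_calloc(size_t n)
{
	void *p = pool_alloc(n);

	if (p)
		memset(p, 0, n);
	return p;
}
```

If the pool is filled with a non-zero pattern before pool_init(1), pool_alloc() returns that stale pattern while pool_calloc() still returns zeros, which is the case the note asks reviewers of malloc() callers to look for.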

Hi Przemyslaw,
On 16 February 2015 at 08:13, Przemyslaw Marczak p.marczak@samsung.com wrote:
This commit introduces a new config: CONFIG_SYS_MALLOC_INIT_SKIP_ZEROING.
Before this change, the whole memory area reserved for malloc was set to zero in mem_malloc_init(). When the reserved malloc memory exceeds a few MiB, this can slow down the boot process.
So enabling this config is an option to reduce the boot time.
Note: After enabling this option, only calloc() will return a pointer to a zeroed memory area. Previously, without this option, a pointer into an untouched region of malloc memory pointed to zero-filled memory. This means that code with malloc() calls should be reexamined.
Can this go in Kconfig somewhere?
Regards, Simon

Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
---
 README | 7 +++++++
 1 file changed, 7 insertions(+)
diff --git a/README b/README
index fefa71c..8673640 100644
--- a/README
+++ b/README
@@ -3989,6 +3989,13 @@ Configuration Settings:
 - CONFIG_SYS_MALLOC_LEN:
 		Size of DRAM reserved for malloc() use.
 
+- CONFIG_SYS_MALLOC_INIT_SKIP_ZEROING:
+		Do not set to zero the reserved DRAM area when init malloc.
+		For very big CONFIG_SYS_MALLOC_LEN(more than one MB), this will
+		reduce the boot time.
+		Before enabling this, please check if malloc calls, maybe
+		should be replaced by calloc - if expects zeroed memory.
+
 - CONFIG_SYS_MALLOC_F_LEN
 		Size of the malloc() pool for use before relocation. If
 		this is defined, then a very simple malloc() implementation

Hi Przemyslaw,
On 16 February 2015 at 08:13, Przemyslaw Marczak p.marczak@samsung.com wrote:
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
README | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/README b/README index fefa71c..8673640 100644 --- a/README +++ b/README @@ -3989,6 +3989,13 @@ Configuration Settings:
- CONFIG_SYS_MALLOC_LEN: Size of DRAM reserved for malloc() use.
+- CONFIG_SYS_MALLOC_INIT_SKIP_ZEROING:
Do not set to zero the reserved DRAM area when init malloc.
For very big CONFIG_SYS_MALLOC_LEN(more than one MB), this will
reduce the boot time.
Before enabling this, please check if malloc calls, maybe
should be replaced by calloc - if expects zeroed memory.
I think if you put this in Kconfig you can put this help there.
- CONFIG_SYS_MALLOC_F_LEN Size of the malloc() pool for use before relocation. If this is defined, then a very simple malloc() implementation
(this one is in Kconfig now too)
Regards, Simon

Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
---
 Kconfig | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/Kconfig b/Kconfig
index 4157da3..e08e44a 100644
--- a/Kconfig
+++ b/Kconfig
@@ -57,13 +57,25 @@ config CC_OPTIMIZE_FOR_SIZE
 	  This option is enabled by default for U-Boot.
 
 menuconfig EXPERT
-        bool "Configure standard U-Boot features (expert users)"
-        help
-          This option allows certain base U-Boot options and settings
-          to be disabled or tweaked. This is for specialized
-          environments which can tolerate a "non-standard" U-Boot.
-          Only use this if you really know what you are doing.
-
+	bool "Configure standard U-Boot features (expert users)"
+	help
+	  This option allows certain base U-Boot options and settings
+	  to be disabled or tweaked. This is for specialized
+	  environments which can tolerate a "non-standard" U-Boot.
+	  Only use this if you really know what you are doing.
+
+if EXPERT
+	config SYS_MALLOC_INIT_SKIP_ZEROING
+	bool "Skip memset at malloc init (reduce boot time)"
+	help
+	  This avoids zeroing memory reserved for malloc at malloc init.
+	  Significant boot time reduction is visible for configs in which
+	  CONFIG_SYS_MALLOC_LEN value, has more than few MiB.
+	  Useful for bzip2, bmp logo.
+	  Warning:
+	  When enable, make sure that calloc() is used when zeroed
+	  memory is needed.
+endif
 endmenu # General setup
 
 menu "Boot images"

Hi Przemyslaw,
On 16 February 2015 at 08:13, Przemyslaw Marczak p.marczak@samsung.com wrote:
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
Kconfig | 26 +++++++++++++++++++------- 1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/Kconfig b/Kconfig index 4157da3..e08e44a 100644 --- a/Kconfig +++ b/Kconfig @@ -57,13 +57,25 @@ config CC_OPTIMIZE_FOR_SIZE This option is enabled by default for U-Boot.
Ah, you have done this. Then I think you can merge this patch with the dlmalloc patch and drop the README one.
Regards, Simon

Hi Simon,
On 02/18/2015 05:32 AM, Simon Glass wrote:
Hi Przemyslaw,
Ah, you have done this. Then I think you can merge this patch with the dlmalloc patch and drop the README one.
Shouldn't we keep both the README and the Kconfig help? Kconfig is just a configuration tool, while the README is documentation. Sometimes it is faster to find something in the text than in the config.
Best regards,

+Masahiro
Hi,
On 18 February 2015 at 05:40, Przemyslaw Marczak p.marczak@samsung.com wrote:
Hi Simon,
On 02/18/2015 05:32 AM, Simon Glass wrote:
Hi Przemyslaw,
On 16 February 2015 at 08:13, Przemyslaw Marczak p.marczak@samsung.com wrote:
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
Kconfig | 26 +++++++++++++++++++------- 1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/Kconfig b/Kconfig index 4157da3..e08e44a 100644 --- a/Kconfig +++ b/Kconfig @@ -57,13 +57,25 @@ config CC_OPTIMIZE_FOR_SIZE This option is enabled by default for U-Boot.
Ah, you have done this. Then I think you can merge this patch with the dlmalloc patch and drop the README one.
Shouldn't we keep both, README and Kconfig help? Kconfig is just a configuration tool, README is a documentation. Sometimes it could be faster to find something in the text instead of config.
Agreed, but isn't it going to be a pain to add it in both places and keep it in sync? Maybe we could create a script which creates a README.kconfig containing all the options and help.
menuconfig EXPERT
bool "Configure standard U-Boot features (expert users)"
help
This option allows certain base U-Boot options and settings
to be disabled or tweaked. This is for specialized
environments which can tolerate a "non-standard" U-Boot.
Only use this if you really know what you are doing.
bool "Configure standard U-Boot features (expert users)"
help
This option allows certain base U-Boot options and settings
to be disabled or tweaked. This is for specialized
environments which can tolerate a "non-standard" U-Boot.
Only use this if you really know what you are doing.
+if EXPERT
config SYS_MALLOC_INIT_SKIP_ZEROING
bool "Skip memset at malloc init (reduce boot time)"
help
This avoids zeroing memory reserved for malloc at malloc init.
Significant boot time reduction is visible for configs in which
CONFIG_SYS_MALLOC_LEN value, has more than few MiB.
Useful for bzip2, bmp logo.
Warning:
When enable, make sure that calloc() is used when zeroed
memory is needed.
+endif endmenu # General setup
menu "Boot images"
Regards, Simon

On Thu, 19 Feb 2015 11:59:07 -0700 Simon Glass sjg@chromium.org wrote:
+Masahiro
Hi,
On 18 February 2015 at 05:40, Przemyslaw Marczak p.marczak@samsung.com wrote:
Hi Simon,
On 02/18/2015 05:32 AM, Simon Glass wrote:
Hi Przemyslaw,
On 16 February 2015 at 08:13, Przemyslaw Marczak p.marczak@samsung.com wrote:
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
Kconfig | 26 +++++++++++++++++++------- 1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/Kconfig b/Kconfig index 4157da3..e08e44a 100644 --- a/Kconfig +++ b/Kconfig @@ -57,13 +57,25 @@ config CC_OPTIMIZE_FOR_SIZE This option is enabled by default for U-Boot.
Ah, you have done this. Then I think you can merge this patch with the dlmalloc patch and drop the README one.
Shouldn't we keep both, README and Kconfig help? Kconfig is just a configuration tool, README is a documentation. Sometimes it could be faster to find something in the text instead of config.
Agreed, but isn't it going to be a pain to add it in both places and keep it in sync? Maybe we could create a script which creates a README.kconfig containing all the options and help.
I agree with Simon.
README is useful when we need a wider and more general explanation about a feature, for ex. doc/driver-model/README.txt
Per-config explanation should be documented in Kconfig only.
Another reason I prefer Kconfig help is: When we remove a CONFIG from Kconfig, the documentation in a separate README might be left over.
BTW, I did not know that U-Boot filled all the malloc space with zero.
I guess our consensus is that malloc() returns uninitialized memory.
So, I am happy with this patch. (Perhaps, could it be enabled by default? )
Best Regards Masahiro Yamada
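To illustrate the malloc()/calloc() point above, here is a minimal sketch in plain C. It is not U-Boot code; the struct and the function names are made up for this example. A caller that needs zeroed state asks for it explicitly, so it keeps working whether or not the heap was cleared at init.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical descriptor, used only for this illustration. */
struct dev_state {
	int inited;
	void *priv;
	char name[16];
};

/* calloc() zeroes the object no matter how the heap was initialised. */
struct dev_state *dev_state_new(void)
{
	return calloc(1, sizeof(struct dev_state));
}

/* Equivalent alternative: malloc() plus an explicit memset(). */
struct dev_state *dev_state_new_memset(void)
{
	struct dev_state *s = malloc(sizeof(*s));

	if (s)
		memset(s, 0, sizeof(*s));
	return s;
}

/* Sanity check a freshly created state; aborts if a field is not zero. */
void dev_state_check_fresh(const struct dev_state *s)
{
	assert(s != NULL);
	assert(s->inited == 0);
	assert(s->priv == NULL);
	assert(s->name[0] == '\0');
}
```

Code that calls plain malloc() and then reads a field before writing it is the kind of caller that would need re-examining once the init-time zeroing is skipped.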

Hello,
On 02/20/2015 08:32 AM, Masahiro Yamada wrote:
On Thu, 19 Feb 2015 11:59:07 -0700 Simon Glass sjg@chromium.org wrote:
+Masahiro
Hi,
On 18 February 2015 at 05:40, Przemyslaw Marczak p.marczak@samsung.com wrote:
Hi Simon,
On 02/18/2015 05:32 AM, Simon Glass wrote:
Hi Przemyslaw,
On 16 February 2015 at 08:13, Przemyslaw Marczak p.marczak@samsung.com wrote:
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
Kconfig | 26 +++++++++++++++++++------- 1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/Kconfig b/Kconfig index 4157da3..e08e44a 100644 --- a/Kconfig +++ b/Kconfig @@ -57,13 +57,25 @@ config CC_OPTIMIZE_FOR_SIZE This option is enabled by default for U-Boot.
Ah, you have done this. Then I think you can merge this patch with the dlmalloc patch and drop the README one.
Shouldn't we keep both, README and Kconfig help? Kconfig is just a configuration tool, README is a documentation. Sometimes it could be faster to find something in the text instead of config.
Agreed, but isn't it going to be a pain to add it in both places and keep it in sync? Maybe we could create a script which creates a README.kconfig containing all the options and help.
I agree with Simon.
README is useful when we need a wider and more general explanation about a feature, for ex. doc/driver-model/README.txt
Per-config explanation should be documented in Kconfig only.
Another reason I prefer Kconfig help is: When we remove a CONFIG from Kconfig, the documentation in a separate README might be left over.
Yes, but keeping it in sync is the maintainer's role. It's no problem for me, I can remove the README entry. I just think that it may be useful for people who are just starting to have fun with U-Boot.
BTW, I did not know that U-Boot filled all the malloc space with zero.
Same here, but checking the execution time of some functions showed that something was taking too long.
I guess our consensus is that malloc() returns uninitialized memory.
Yes, as it should.
So, I am happy with this patch. (Perhaps, could it be enabled by default? )
This could potentially break something. Let's leave maintainers free to enable this after testing. This is why I added such a config option.
Best Regards Masahiro Yamada
So as Simon and Masahiro wish, I will remove the README entry for this.
Best regards,

Reduce the boot time of Trats2 by disabling the memset at malloc init.
This was tested on Trats2. A quick test with trace. Boot time from start to main_loop() entry:
- ~464ms - before this change (arch memset enabled for .bss clear)
- ~341ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
---
 configs/trats2_defconfig | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/configs/trats2_defconfig b/configs/trats2_defconfig
index 1b98b73..21cc37a 100644
--- a/configs/trats2_defconfig
+++ b/configs/trats2_defconfig
@@ -3,3 +3,5 @@ CONFIG_ARCH_EXYNOS=y
 CONFIG_TARGET_TRATS2=y
 CONFIG_OF_CONTROL=y
 CONFIG_DEFAULT_DEVICE_TREE="exynos4412-trats2"
+CONFIG_EXPERT=y
+CONFIG_SYS_MALLOC_INIT_SKIP_ZEROING=y

On 16 February 2015 at 08:13, Przemyslaw Marczak p.marczak@samsung.com wrote:
Reduce the boot time of Trats2 by disabling the memset at malloc init.
This was tested on Trats2. A quick test with trace. Boot time from start to main_loop() entry:
- ~464ms - before this change (arch memset enabled for .bss clear)
- ~341ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
configs/trats2_defconfig | 2 ++ 1 file changed, 2 insertions(+)
Reviewed-by: Simon Glass sjg@chromium.org

Reduce the boot time of Odroid X2/U3 by disabling the memset at malloc init.
This was tested on Odroid X2. A quick test with checking gpio pin state using the oscilloscope. Boot time from start to bootcmd (change gpio state by memory write command):
- ~228ms - before this change (arch memset enabled for .bss clear)
- ~100ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
---
 configs/odroid_defconfig | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/configs/odroid_defconfig b/configs/odroid_defconfig
index a842837..24af164 100644
--- a/configs/odroid_defconfig
+++ b/configs/odroid_defconfig
@@ -3,3 +3,5 @@ CONFIG_ARCH_EXYNOS=y
 CONFIG_TARGET_ODROID=y
 CONFIG_OF_CONTROL=y
 CONFIG_DEFAULT_DEVICE_TREE="exynos4412-odroid"
+CONFIG_EXPERT=y
+CONFIG_SYS_MALLOC_INIT_SKIP_ZEROING=y

On 16 February 2015 at 08:13, Przemyslaw Marczak p.marczak@samsung.com wrote:
Reduce the boot time of Odroid X2/U3 by disabling the memset at malloc init.
This was tested on Odroid X2. A quick test with checking gpio pin state using the oscilloscope. Boot time from start to bootcmd (change gpio state by memory write command):
- ~228ms - before this change (arch memset enabled for .bss clear)
- ~100ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
Reviewed-by: Simon Glass sjg@chromium.org

Hello Simon,
On 02/18/2015 05:32 AM, Simon Glass wrote:
On 16 February 2015 at 08:13, Przemyslaw Marczak p.marczak@samsung.com wrote:
Reduce the boot time of Odroid X2/U3 by disabling the memset at malloc init.
This was tested on Odroid X2. A quick test with checking gpio pin state using the oscilloscope. Boot time from start to bootcmd (change gpio state by memory write command):
- ~228ms - before this change (arch memset enabled for .bss clear)
- ~100ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
Reviewed-by: Simon Glass sjg@chromium.org
Thanks for the review.
Best regards,

On 02/16/2015 08:13 AM, Przemyslaw Marczak wrote:
This patchset reduces the boot time for ARM architecture, Exynos boards, and boards with DFU enabled.
I tested this series on NVIDIA's Jetson TK1 board. It doesn't seem to introduce any new issues, but I did find a couple have crept in recently:
I'm running the following in U-Boot:
setenv dfu_alt_info "/dfu_test.bin ext4 0 1;/dfudummy.bin ext4 0 1"
dfu 0 mmc 0
1)
Whenever any file is uploaded through DFU, I see:
#File System is consistent
file found deleting
update journal finished
File System is consistent
update journal finished
18425346722729591336 bytes written in 4070 ms (3.9 EiB/s)
Notice that the byte count is way off (that's from a 4KB file). The byte count is always the same invalid number. I'm not sure if this message comes from the ext4 or DFU code.
2)
Both the 4096 and 1048576 DFU tests fail in the read-back stage, with the following appearing on the U-Boot console:
#File System is consistent
file found deleting
update journal finished
File System is consistent
update journal finished
18425346722734241092 bytes written in 4070 ms (3.9 EiB/s)
DOWNLOAD ... OK
Ctrl+C to exit ...
#File System is consistent
file found deleting
update journal finished
File System is consistent
update journal finished
18425346722734241092 bytes written in 4091 ms (3.9 EiB/s)
DOWNLOAD ... OK
Ctrl+C to exit ...
4096 bytes read in 127 ms (31.3 KiB/s)
#
UPLOAD ... done
Ctrl+C to exit ...
4096 bytes read in 126 ms (31.3 KiB/s)
#dfu_read: Wrong sequence number! [1] [3]
The host-side DFU test log ends with:
Opening DFU USB device... ID 0955:701a
Warning: Assuming DFU version 1.0
Run-time device DFU version 0100
Found DFU: [0955:701a] devnum=0, cfg=1, intf=0, alt=0, name="/dfu_test.bin"
Claiming USB DFU Interface...
Setting Alternate Setting #0 ...
Determining device status: state = dfuIDLE, status = 0
dfuIDLE, continuing
DFU mode device DFU version 0110
Device returned transfer size 4096
bytes_per_hash=4096
Copying data from DFU device to PC
Starting upload: [###dfu_upload: libusb_control_msg returned -9

On 02/17/2015 02:43 PM, Stephen Warren wrote:
On 02/16/2015 08:13 AM, Przemyslaw Marczak wrote:
This patchset reduces the boot time for ARM architecture, Exynos boards, and boards with DFU enabled.
I tested this series on NVIDIA's Jetson TK1 board. It doesn't seem to introduce any new issues, but I did find a couple have crept in recently:
I'm running the following in U-Boot:
setenv dfu_alt_info "/dfu_test.bin ext4 0 1;/dfudummy.bin ext4 0 1"
dfu 0 mmc 0
Whenever any file is uploaded through DFU, I see:
#File System is consistent
file found deleting
update journal finished
File System is consistent
update journal finished
18425346722729591336 bytes written in 4070 ms (3.9 EiB/s)
Notice that the byte count is way off (that's from a 4KB file). The byte count is always the same invalid number. I'm not sure if this message comes from the ext4 or DFU code.
For the record in this thread, this is fixed by: [PATCH] fs: ext4 write: return file len on success

Hello Stephen,
On 02/17/2015 11:39 PM, Stephen Warren wrote:
On 02/17/2015 02:43 PM, Stephen Warren wrote:
On 02/16/2015 08:13 AM, Przemyslaw Marczak wrote:
This patchset reduces the boot time for ARM architecture, Exynos boards, and boards with DFU enabled.
I tested this series on NVIDIA's Jetson TK1 board. It doesn't seem to introduce any new issues, but I did find a couple have crept in recently:
I'm running the following in U-Boot:
setenv dfu_alt_info "/dfu_test.bin ext4 0 1;/dfudummy.bin ext4 0 1"
dfu 0 mmc 0
Whenever any file is uploaded through DFU, I see:
#File System is consistent
file found deleting
update journal finished
File System is consistent
update journal finished
18425346722729591336 bytes written in 4070 ms (3.9 EiB/s)
Notice that the byte count is way off (that's from a 4KB file). The byte count is always the same invalid number. I'm not sure if this message comes from the ext4 or DFU code.
For the record in this thread, this is fixed by: [PATCH] fs: ext4 write: return file len on success
Thank you for testing. I should add the ext4 fix to this patchset, so it will be linked in the next version.
Best regards,

This patchset reduces the boot time for ARM architecture, Exynos boards, and (ARM) boards with DFU enabled.
For tested Trats2 and Odroid X2 devices, this was done in four steps.
1. Enable the arch memcpy and memset - ARCH specific
2. Enable arch memset for .bss clear - ARCH specific
3. Make the .bss section as small as possible (board specific): remove the static dfu buffer (32MiB in .bss on Trats2) and use malloc instead
4. Skip zeroing the memory reserved for malloc at malloc init. For Trats2 it was 80MiB of memory.
The .bss section grows with the number and size of static variables. It is cleared before the jump to the relocated U-Boot, and this was done word by word. To reduce the time for this step, we can enable the arch memset, which uses the ARM multiple register instructions.
For configs with DFU enabled, this section contains the dfu buffer, which is at least 8MB (32MB for Trats2). That is a lot of data which is not needed for a standard boot, so this buffer should be allocated dynamically.
So in the end, it was all about unnecessary operations on 'big' data.
Przemyslaw Marczak (6):
  exynos: config: enable arch memcpy and arch memset
  arm: relocation: clear .bss section with arch memset if defined
  dfu: mmc: file buffer: remove static allocation
  dlmalloc: add option for skip memset in malloc init
  trats2: defconfig: enable expert and skip memset at malloc init
  odroid: defconfig: enable expert and skip memset at malloc init

 Kconfig                         | 26 +++++++++++++++++++-------
 arch/arm/lib/crt0.S             | 10 +++++++++-
 common/dlmalloc.c               | 10 +++++++---
 configs/odroid_defconfig        |  2 ++
 configs/trats2_defconfig        |  2 ++
 drivers/dfu/dfu_mmc.c           | 25 ++++++++++++++++++++++---
 include/configs/exynos-common.h |  3 +++
 7 files changed, 64 insertions(+), 14 deletions(-)

This commit enables the following configs:
- CONFIG_USE_ARCH_MEMCPY
- CONFIG_USE_ARCH_MEMSET
This increases the performance of memcpy/memset and also reduces the boot time.
This was tested on Trats2. A quick test with trace. Boot time from start to main_loop() entry:
- ~1527ms - before this change (arch memset enabled for .bss clear)
- ~1384ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
Reviewed-by: Simon Glass sjg@chromium.org
Cc: Minkyu Kang mk7.kang@samsung.com
Cc: Akshay Saraswat akshay.s@samsung.com
Cc: Simon Glass sjg@chromium.org
Cc: Sjoerd Simons sjoerd.simons@collabora.co.uk
---
Changes V3
- none
---
 include/configs/exynos-common.h | 3 +++
 1 file changed, 3 insertions(+)
diff --git a/include/configs/exynos-common.h b/include/configs/exynos-common.h
index 59676ae..87f8db0 100644
--- a/include/configs/exynos-common.h
+++ b/include/configs/exynos-common.h
@@ -24,6 +24,9 @@
 #define CONFIG_SKIP_LOWLEVEL_INIT
 #define CONFIG_BOARD_EARLY_INIT_F
 
+#define CONFIG_USE_ARCH_MEMCPY
+#define CONFIG_USE_ARCH_MEMSET
+
 /* Keep L2 Cache Disabled */
 #define CONFIG_CMD_CACHE

For the ARM architecture, enabling CONFIG_USE_ARCH_MEMSET/MEMCPY greatly increases the memset/memcpy performance. This is possible thanks to the ARM multiple register instructions.
Unfortunately the relocation is done without the cache enabled, so it takes some time, but zeroing the BSS memory takes much longer, especially for configs with big static buffers.
A quick test confirms that using the arch memcpy for relocation gives no significant boot time improvement. The same test confirms that using the arch memset for zeroing the BSS reduces the boot time.
So this patch enables the arch memset for zeroing the BSS after the relocation process. For ARM boards, this can be enabled in board configs by defining: 'CONFIG_USE_ARCH_MEMSET'.
This was tested on Trats2. A quick test with trace. Boot time from start to main_loop() entry:
- ~1384ms - before this change
- ~888ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
Reviewed-by: Simon Glass sjg@chromium.org
Cc: Albert Aribaud albert.u.boot@aribaud.net
Cc: Tom Rini trini@ti.com
---
Changes V3
- none
---
 arch/arm/lib/crt0.S | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/arch/arm/lib/crt0.S b/arch/arm/lib/crt0.S
index 22df3e5..fab3d2c 100644
--- a/arch/arm/lib/crt0.S
+++ b/arch/arm/lib/crt0.S
@@ -115,14 +115,22 @@ here:
 	bl	c_runtime_cpu_setup	/* we still call old routine here */
 
 	ldr	r0, =__bss_start	/* this is auto-relocated! */
-	ldr	r1, =__bss_end		/* this is auto-relocated! */
 
+#ifdef CONFIG_USE_ARCH_MEMSET
+	ldr	r3, =__bss_end		/* this is auto-relocated! */
+	mov	r1, #0x00000000		/* prepare zero to clear BSS */
+
+	subs	r2, r3, r0		/* r2 = memset len */
+	bl	memset
+#else
+	ldr	r1, =__bss_end		/* this is auto-relocated! */
 	mov	r2, #0x00000000		/* prepare zero to clear BSS */
 
 clbss_l:cmp	r0, r1			/* while not at end of BSS */
 	strlo	r2, [r0]		/* clear 32-bit BSS word */
 	addlo	r0, r0, #4		/* move to next */
 	blo	clbss_l
+#endif
 
 	bl coloured_LED_init
 	bl red_led_on
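For readers less used to ARM assembly, the two branches of the patch above can be sketched in portable C. This is only an illustration of the logic, not code from U-Boot: the function names are invented here, and the speed difference on real hardware comes from the arch memset implementation, which this sketch does not reproduce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Word-by-word clear, mirroring the clbss_l loop in the #else branch:
 * store one zero word, advance by 4 bytes, repeat until the end. */
void clear_words(uint32_t *start, uint32_t *end)
{
	assert(start <= end);
	while (start < end)
		*start++ = 0;
}

/* memset()-based clear, mirroring the CONFIG_USE_ARCH_MEMSET branch:
 * r0 = start, r1 = 0 (fill value), r2 = end - start (length). */
void clear_memset(void *start, void *end)
{
	size_t len;

	assert((char *)start <= (char *)end);
	len = (size_t)((char *)end - (char *)start);
	memset(start, 0, len);
}
```

Both functions leave the region in the same state; the patch only changes how the zeros get written.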

For writing files, the DFU implementation requires a file buffer at least as long as the file. Big files therefore require an equally big buffer.
Previously the file buffer was allocated as a static variable, so it was a part of the U-Boot .bss section. A 32MiB buffer means 32MiB of additional space required for this section.
The .bss needs to be cleared after the relocation. This introduces an additional boot delay at every start, but usually the dfu feature is not needed for a standard boot, so the buffer should be allocated only when required.
This patch removes the static allocation of this buffer and allocates it with memalign on the first call of dfu_fill_entity_mmc(). The buffer is freed on the dfu_free_entity() call.
This was tested on Trats2. A quick test with trace. Boot time from start to main_loop() entry:
- ~888ms - before this change (arch memset enabled for .bss clear)
- ~464ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
Reviewed-by: Simon Glass sjg@chromium.org
Cc: Lukasz Majewski l.majewski@samsung.com
Cc: Stephen Warren swarren@nvidia.com
Cc: Pantelis Antoniou panto@antoniou-consulting.com
Cc: Tom Rini trini@ti.com
Cc: Marek Vasut marek.vasut@gmail.com
---
Changes V3
- none
---
 drivers/dfu/dfu_mmc.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)
diff --git a/drivers/dfu/dfu_mmc.c b/drivers/dfu/dfu_mmc.c
index 62d72fe..fd865e1 100644
--- a/drivers/dfu/dfu_mmc.c
+++ b/drivers/dfu/dfu_mmc.c
@@ -16,8 +16,7 @@
 #include <fat.h>
 #include <mmc.h>
 
-static unsigned char __aligned(CONFIG_SYS_CACHELINE_SIZE)
-				dfu_file_buf[CONFIG_SYS_DFU_MAX_FILE_SIZE];
+static unsigned char *dfu_file_buf;
 static long dfu_file_buf_len;
 
 static int mmc_access_part(struct dfu_entity *dfu, struct mmc *mmc, int part)
@@ -211,7 +210,7 @@ int dfu_flush_medium_mmc(struct dfu_entity *dfu)
 
 	if (dfu->layout != DFU_RAW_ADDR) {
 		/* Do stuff here. */
-		ret = mmc_file_op(DFU_OP_WRITE, dfu, &dfu_file_buf,
+		ret = mmc_file_op(DFU_OP_WRITE, dfu, dfu_file_buf,
 				&dfu_file_buf_len);
 
 		/* Now that we're done */
@@ -263,6 +262,14 @@ int dfu_read_medium_mmc(struct dfu_entity *dfu, u64 offset, void *buf,
 	return ret;
 }
 
+void dfu_free_entity_mmc(struct dfu_entity *dfu)
+{
+	if (dfu_file_buf) {
+		free(dfu_file_buf);
+		dfu_file_buf = NULL;
+	}
+}
+
 /*
  * @param s Parameter string containing space-separated arguments:
  *	1st:
@@ -370,6 +377,18 @@ int dfu_fill_entity_mmc(struct dfu_entity *dfu, char *devstr, char *s)
 	dfu->write_medium = dfu_write_medium_mmc;
 	dfu->flush_medium = dfu_flush_medium_mmc;
 	dfu->inited = 0;
+	dfu->free_entity = dfu_free_entity_mmc;
+
+	/* Check if file buffer is ready */
+	if (!dfu_file_buf) {
+		dfu_file_buf = memalign(CONFIG_SYS_CACHELINE_SIZE,
+					CONFIG_SYS_DFU_MAX_FILE_SIZE);
+		if (!dfu_file_buf) {
+			error("Could not memalign 0x%x bytes",
+			      CONFIG_SYS_DFU_MAX_FILE_SIZE);
+			return -ENOMEM;
+		}
+	}
 
 	return 0;
 }
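The allocate-on-first-use pattern from this patch can be tried outside U-Boot with a small hosted C sketch. The assumptions are: C11 aligned_alloc() stands in for U-Boot's memalign(), BUF_ALIGN and BUF_SIZE are placeholder values for CONFIG_SYS_CACHELINE_SIZE and CONFIG_SYS_DFU_MAX_FILE_SIZE, and the function names are invented for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define BUF_ALIGN 64u              /* placeholder for the cache line size */
#define BUF_SIZE  (1024u * 1024u)  /* placeholder for the max file size */

/* A pointer in .bss costs a few bytes; the old static array cost BUF_SIZE. */
static unsigned char *file_buf;

/* Allocate the buffer on first use; later calls reuse it. */
int file_buf_get(void)
{
	if (!file_buf) {
		/* BUF_SIZE is a multiple of BUF_ALIGN, as C11 requires. */
		file_buf = aligned_alloc(BUF_ALIGN, BUF_SIZE);
		if (!file_buf)
			return -ENOMEM;
		assert(((uintptr_t)file_buf % BUF_ALIGN) == 0);
	}
	return 0;
}

/* Release the buffer and reset the pointer so it can be allocated again. */
void file_buf_put(void)
{
	free(file_buf);
	file_buf = NULL;
}

/* Helper for callers and tests: reports whether the buffer exists. */
int file_buf_is_allocated(void)
{
	return file_buf != NULL;
}
```

Resetting the pointer to NULL after free() is what makes a second get/put cycle safe, which matches the check-then-allocate logic in dfu_fill_entity_mmc() above.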

This commit introduces new config: CONFIG_SYS_MALLOC_INIT_SKIP_ZEROING.
Before this change, the whole memory area reserved for malloc was set to zero in mem_malloc_init(). When the malloc reserved memory exceeds a few MiB, the boot process can slow down.
So enabling this config is an option to reduce the boot time.
This option can be enabled by Kconfig.
Note: After enabling this option, only calloc() will return a pointer to a zeroed memory area. Previously, without this option, any untouched part of the malloc memory region was filled with zeros. This means that code with malloc() calls should be re-examined.
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
Reviewed-by: Simon Glass sjg@chromium.org
---
Changes v3:
- squash the commit with the Kconfig option
---
 Kconfig           | 26 +++++++++++++++++++-------
 common/dlmalloc.c | 10 +++++++---
 2 files changed, 26 insertions(+), 10 deletions(-)
diff --git a/Kconfig b/Kconfig
index 75bab7f..87d4daf 100644
--- a/Kconfig
+++ b/Kconfig
@@ -76,13 +76,25 @@ config SYS_MALLOC_F_LEN
 	  initial serial device and any others that are needed.
 
 menuconfig EXPERT
-        bool "Configure standard U-Boot features (expert users)"
-        help
-          This option allows certain base U-Boot options and settings
-          to be disabled or tweaked. This is for specialized
-          environments which can tolerate a "non-standard" U-Boot.
-          Only use this if you really know what you are doing.
-
+	bool "Configure standard U-Boot features (expert users)"
+	help
+	  This option allows certain base U-Boot options and settings
+	  to be disabled or tweaked. This is for specialized
+	  environments which can tolerate a "non-standard" U-Boot.
+	  Only use this if you really know what you are doing.
+
+if EXPERT
+	config SYS_MALLOC_INIT_SKIP_ZEROING
+	bool "Skip memset at malloc init (reduce boot time)"
+	help
+	 This avoids zeroing memory reserved for malloc at malloc init.
+	 Significant boot time reduction is visible for configs in which
+	 CONFIG_SYS_MALLOC_LEN value, has more than few MiB.
+	 Useful for bzip2, bmp logo.
+	 Warning:
+	 When enabling this, please check if malloc calls, maybe
+	 should be replaced by calloc - if expects zeroed memory.
+endif
 endmenu		# General setup
 
 menu "Boot images"
diff --git a/common/dlmalloc.c b/common/dlmalloc.c
index 6453ee9..63f68ed 100644
--- a/common/dlmalloc.c
+++ b/common/dlmalloc.c
@@ -1535,9 +1535,9 @@ void mem_malloc_init(ulong start, ulong size)
 
 	debug("using memory %#lx-%#lx for malloc()\n", mem_malloc_start,
 	      mem_malloc_end);
-
-	memset((void *)mem_malloc_start, 0, size);
-
+#ifndef CONFIG_SYS_MALLOC_INIT_SKIP_ZEROING
+	memset((void *)mem_malloc_start, 0x0, size);
+#endif
 	malloc_bin_reloc();
 }
 
@@ -2948,10 +2948,12 @@ Void_t* cALLOc(n, elem_size) size_t n; size_t elem_size;
 
   /* check if expand_top called, in which case don't need to clear */
+#ifndef CONFIG_SYS_MALLOC_INIT_SKIP_ZEROING
 #if MORECORE_CLEARS
   mchunkptr oldtop = top;
   INTERNAL_SIZE_T oldtopsize = chunksize(top);
 #endif
+#endif
   Void_t* mem = mALLOc (sz);
 
   if ((long)n < 0) return NULL;
@@ -2977,6 +2979,7 @@ Void_t* cALLOc(n, elem_size) size_t n; size_t elem_size;
 
     csz = chunksize(p);
 
+#ifndef CONFIG_SYS_MALLOC_INIT_SKIP_ZEROING
 #if MORECORE_CLEARS
     if (p == oldtop && csz > oldtopsize)
     {
@@ -2984,6 +2987,7 @@ Void_t* cALLOc(n, elem_size) size_t n; size_t elem_size;
       csz = oldtopsize;
     }
 #endif
+#endif
 
     MALLOC_ZERO(mem, csz - SIZE_SZ);
     return mem;
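The effect of the option can be seen on a toy allocator. The sketch below is not dlmalloc: it is a hypothetical bump allocator (arena_init, arena_malloc and arena_calloc are invented names, there is no free and no real chunk management), written only to show why calloc() has to clear unconditionally once the init-time memset becomes optional.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static unsigned char *arena_start;
static size_t arena_size, arena_used;

/* Set up the arena; the memset is compiled out when the skip option is set. */
void arena_init(void *start, size_t size)
{
	assert(start != NULL);
	arena_start = start;
	arena_size = size;
	arena_used = 0;
#ifndef CONFIG_SYS_MALLOC_INIT_SKIP_ZEROING
	memset(arena_start, 0, size);
#endif
}

/* Hand out the next chunk, rounded up to 16 bytes; contents are unspecified. */
void *arena_malloc(size_t n)
{
	size_t need = (n + 15) & ~(size_t)15;
	void *p;

	if (need < n || need > arena_size - arena_used)
		return NULL;
	p = arena_start + arena_used;
	arena_used += need;
	return p;
}

/* Always clears the returned chunk; never relies on the init-time memset. */
void *arena_calloc(size_t n, size_t elem)
{
	void *p;

	if (elem && n > (size_t)-1 / elem)
		return NULL;	/* n * elem would overflow */
	p = arena_malloc(n * elem);
	if (p)
		memset(p, 0, n * elem);
	return p;
}
```

This corresponds to the cALLOc() hunks above: with the option set, the MORECORE_CLEARS shortcut that trusts fresh memory to be zero is disabled, so MALLOC_ZERO runs over the whole chunk.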

Hi Przemyslaw,
On Fri, 20 Feb 2015 12:06:17 +0100 Przemyslaw Marczak p.marczak@samsung.com wrote:
This commit introduces new config: CONFIG_SYS_MALLOC_INIT_SKIP_ZEROING.
Before this change, the all amount of memory reserved for the malloc, was set to zero in mem_malloc_init(). When the malloc reserved memory exceeds few MiB, then the boot process can slow down.
So enabling this config, is an option to reduce the boot time.
This option can be enabled by Kconfig.
Note: After enable this option, only calloc() will return the pointer to zeroed memory area. Previously, without this option, the memory pointed to untouched malloc memory region, was filled with zeros. So it means, that code with malloc() calls should be reexamined.
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com Reviewed-by: Simon Glass sjg@chromium.org
Changes v3:
- squash the commit with the Kconfig option
Kconfig | 26 +++++++++++++++++++------- common/dlmalloc.c | 10 +++++++--- 2 files changed, 26 insertions(+), 10 deletions(-)
diff --git a/Kconfig b/Kconfig index 75bab7f..87d4daf 100644 --- a/Kconfig +++ b/Kconfig @@ -76,13 +76,25 @@ config SYS_MALLOC_F_LEN initial serial device and any others that are needed.
menuconfig EXPERT
bool "Configure standard U-Boot features (expert users)"
help
This option allows certain base U-Boot options and settings
to be disabled or tweaked. This is for specialized
environments which can tolerate a "non-standard" U-Boot.
Only use this if you really know what you are doing.
+ bool "Configure standard U-Boot features (expert users)"
+ help
This option allows certain base U-Boot options and settings
to be disabled or tweaked. This is for specialized
environments which can tolerate a "non-standard" U-Boot.
Only use this if you really know what you are doing.
+if EXPERT
+ config SYS_MALLOC_INIT_SKIP_ZEROING
+ bool "Skip memset at malloc init (reduce boot time)"
+ help
This avoids zeroing memory reserved for malloc at malloc init.
Significant boot time reduction is visible for configs in which
CONFIG_SYS_MALLOC_LEN value, has more than few MiB.
Useful for bzip2, bmp logo.
Warning:
When enabling this, please check if malloc calls, maybe
should be replaced by calloc - if expects zeroed memory.
+endif endmenu # General setup
menu "Boot images" diff --git a/common/dlmalloc.c b/common/dlmalloc.c index 6453ee9..63f68ed 100644 --- a/common/dlmalloc.c +++ b/common/dlmalloc.c @@ -1535,9 +1535,9 @@ void mem_malloc_init(ulong start, ulong size)
debug("using memory %#lx-%#lx for malloc()\n", mem_malloc_start, mem_malloc_end);
- memset((void *)mem_malloc_start, 0, size);
+#ifndef CONFIG_SYS_MALLOC_INIT_SKIP_ZEROING
+ memset((void *)mem_malloc_start, 0x0, size);
+#endif malloc_bin_reloc(); }
@@ -2948,10 +2948,12 @@ Void_t* cALLOc(n, elem_size) size_t n; size_t elem_size;
/* check if expand_top called, in which case don't need to clear */ +#ifndef CONFIG_SYS_MALLOC_INIT_SKIP_ZEROING #if MORECORE_CLEARS mchunkptr oldtop = top; INTERNAL_SIZE_T oldtopsize = chunksize(top); #endif +#endif Void_t* mem = mALLOc (sz);
if ((long)n < 0) return NULL; @@ -2977,6 +2979,7 @@ Void_t* cALLOc(n, elem_size) size_t n; size_t elem_size;
csz = chunksize(p);
+#ifndef CONFIG_SYS_MALLOC_INIT_SKIP_ZEROING #if MORECORE_CLEARS if (p == oldtop && csz > oldtopsize) { @@ -2984,6 +2987,7 @@ Void_t* cALLOc(n, elem_size) size_t n; size_t elem_size; csz = oldtopsize; } #endif +#endif
MALLOC_ZERO(mem, csz - SIZE_SZ); return mem;
You are adding only "ifndef" conditionals.
IMHO, Isn't "#ifdef CONFIG_SYS_MALLOC_INIT_ZEROING" better? (Generally speaking, CONFIG options that disable a feature are not preferable.)
You also need to add "default y" to the Kconfig and add "# CONFIG_SYS_MALLOC_INIT_ZEROING is not set" to your defconfig.
Best Regards Masahiro Yamada

Hello,
On 02/20/2015 01:52 PM, Masahiro Yamada wrote:
Hi Przemyslaw,
On Fri, 20 Feb 2015 12:06:17 +0100 Przemyslaw Marczak p.marczak@samsung.com wrote:
This commit introduces new config: CONFIG_SYS_MALLOC_INIT_SKIP_ZEROING.
Before this change, the all amount of memory reserved for the malloc, was set to zero in mem_malloc_init(). When the malloc reserved memory exceeds few MiB, then the boot process can slow down.
So enabling this config, is an option to reduce the boot time.
This option can be enabled by Kconfig.
Note: After enable this option, only calloc() will return the pointer to zeroed memory area. Previously, without this option, the memory pointed to untouched malloc memory region, was filled with zeros. So it means, that code with malloc() calls should be reexamined.
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com Reviewed-by: Simon Glass sjg@chromium.org
Changes v3:
- squash the commit with the Kconfig option
Kconfig | 26 +++++++++++++++++++------- common/dlmalloc.c | 10 +++++++--- 2 files changed, 26 insertions(+), 10 deletions(-)
diff --git a/Kconfig b/Kconfig index 75bab7f..87d4daf 100644 --- a/Kconfig +++ b/Kconfig @@ -76,13 +76,25 @@ config SYS_MALLOC_F_LEN initial serial device and any others that are needed.
menuconfig EXPERT
bool "Configure standard U-Boot features (expert users)"
help
This option allows certain base U-Boot options and settings
to be disabled or tweaked. This is for specialized
environments which can tolerate a "non-standard" U-Boot.
Only use this if you really know what you are doing.
+ bool "Configure standard U-Boot features (expert users)"
+ help
This option allows certain base U-Boot options and settings
to be disabled or tweaked. This is for specialized
environments which can tolerate a "non-standard" U-Boot.
Only use this if you really know what you are doing.
+if EXPERT
+ config SYS_MALLOC_INIT_SKIP_ZEROING
+ bool "Skip memset at malloc init (reduce boot time)"
+ help
This avoids zeroing memory reserved for malloc at malloc init.
Significant boot time reduction is visible for configs in which
CONFIG_SYS_MALLOC_LEN value, has more than few MiB.
Useful for bzip2, bmp logo.
Warning:
When enabling this, please check if malloc calls, maybe
should be replaced by calloc - if expects zeroed memory.
+endif endmenu # General setup
menu "Boot images" diff --git a/common/dlmalloc.c b/common/dlmalloc.c index 6453ee9..63f68ed 100644 --- a/common/dlmalloc.c +++ b/common/dlmalloc.c @@ -1535,9 +1535,9 @@ void mem_malloc_init(ulong start, ulong size)
debug("using memory %#lx-%#lx for malloc()\n", mem_malloc_start, mem_malloc_end);
- memset((void *)mem_malloc_start, 0, size);
+#ifndef CONFIG_SYS_MALLOC_INIT_SKIP_ZEROING
+ memset((void *)mem_malloc_start, 0x0, size);
+#endif malloc_bin_reloc(); }
@@ -2948,10 +2948,12 @@ Void_t* cALLOc(n, elem_size) size_t n; size_t elem_size;
/* check if expand_top called, in which case don't need to clear */
+#ifndef CONFIG_SYS_MALLOC_INIT_SKIP_ZEROING #if MORECORE_CLEARS mchunkptr oldtop = top; INTERNAL_SIZE_T oldtopsize = chunksize(top); #endif +#endif Void_t* mem = mALLOc (sz);
if ((long)n < 0) return NULL;
@@ -2977,6 +2979,7 @@ Void_t* cALLOc(n, elem_size) size_t n; size_t elem_size;
csz = chunksize(p);
+#ifndef CONFIG_SYS_MALLOC_INIT_SKIP_ZEROING #if MORECORE_CLEARS if (p == oldtop && csz > oldtopsize) { @@ -2984,6 +2987,7 @@ Void_t* cALLOc(n, elem_size) size_t n; size_t elem_size; csz = oldtopsize; } #endif +#endif
MALLOC_ZERO(mem, csz - SIZE_SZ); return mem;
You are adding only "ifndef" conditionals.
IMHO, isn't "#ifdef CONFIG_SYS_MALLOC_INIT_ZEROING" better? (Generally speaking, CONFIG options that disable a feature are not preferable.)
You also need to add "default y" to the Kconfig and add "# CONFIG_SYS_MALLOC_INIT_ZEROING is not set" to your defconfig.
Best Regards Masahiro Yamada
Ok, will change this to positive.
Best regards,
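A minimal host-side sketch of the positive-logic variant discussed in this exchange. It assumes the option name CONFIG_SYS_MALLOC_INIT_ZEROING from the review comment; demo_arena and the demo_* functions are hypothetical stand-ins for illustration and are not U-Boot code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Assumption for this sketch: the option is enabled, as it would be with
 * "default y" in Kconfig. A defconfig line
 * "# CONFIG_SYS_MALLOC_INIT_ZEROING is not set" would leave it undefined.
 */
#define CONFIG_SYS_MALLOC_INIT_ZEROING

/* Hypothetical stand-in for the memory reserved for malloc. */
static unsigned char demo_arena[64];

/* Returns 1 if the arena was cleared at init time, 0 if it was left as-is. */
int demo_malloc_init(unsigned char *start, size_t size)
{
	assert(start != NULL);
#ifdef CONFIG_SYS_MALLOC_INIT_ZEROING
	memset(start, 0, size);	/* positive option: feature is on when set */
	return 1;
#else
	(void)size;
	return 0;		/* option not set: skip the clear */
#endif
}

/* Dirty the arena, run the init, and count the bytes that are still non-zero. */
size_t demo_count_dirty_after_init(void)
{
	size_t i, dirty = 0;

	memset(demo_arena, 0xA5, sizeof(demo_arena));
	demo_malloc_init(demo_arena, sizeof(demo_arena));
	for (i = 0; i < sizeof(demo_arena); i++)
		dirty += (demo_arena[i] != 0);
	return dirty;
}
```

With the define in place the arena comes back fully cleared; removing the define models a defconfig that turns the option off, and the pattern bytes survive the init.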

Reduce the boot time of Trats2 by disabling the memset at malloc init.
This was tested on Trats2. A quick test with trace.
Boot time from start to main_loop() entry:
- ~464ms - before this change (arch memset enabled for .bss clear)
- ~341ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
Reviewed-by: Simon Glass sjg@chromium.org
---
Changes V3
- none
---
 configs/trats2_defconfig | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/configs/trats2_defconfig b/configs/trats2_defconfig
index 1b98b73..21cc37a 100644
--- a/configs/trats2_defconfig
+++ b/configs/trats2_defconfig
@@ -3,3 +3,5 @@ CONFIG_ARCH_EXYNOS=y
 CONFIG_TARGET_TRATS2=y
 CONFIG_OF_CONTROL=y
 CONFIG_DEFAULT_DEVICE_TREE="exynos4412-trats2"
+CONFIG_EXPERT=y
+CONFIG_SYS_MALLOC_INIT_SKIP_ZEROING=y

Reduce the boot time of Odroid X2/U3 by disabling the memset at malloc init.
This was tested on Odroid X2. A quick test with checking gpio pin state using the oscilloscope.
Boot time from start to bootcmd (change gpio state by memory write command):
- ~228ms - before this change (arch memset enabled for .bss clear)
- ~100ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
Reviewed-by: Simon Glass sjg@chromium.org
---
Changes V3
- update commit head
---
 configs/odroid_defconfig | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/configs/odroid_defconfig b/configs/odroid_defconfig
index a842837..24af164 100644
--- a/configs/odroid_defconfig
+++ b/configs/odroid_defconfig
@@ -3,3 +3,5 @@ CONFIG_ARCH_EXYNOS=y
 CONFIG_TARGET_ODROID=y
 CONFIG_OF_CONTROL=y
 CONFIG_DEFAULT_DEVICE_TREE="exynos4412-odroid"
+CONFIG_EXPERT=y
+CONFIG_SYS_MALLOC_INIT_SKIP_ZEROING=y

This patchset reduces the boot time for ARM architecture, Exynos boards, and (ARM) boards with DFU enabled.
For tested Trats2 and Odroid X2 devices, this was done in four steps.
1. Enable the arch memcpy and memset - ARCH specific
2. Enable arch memset for .bss clear - ARCH specific
3. Make the .bss section as small as possible - board specific:
   - remove the static dfu buffer (32MiB in .bss on Trats2) and use malloc
4. Skip zeroing the memory reserved for malloc at malloc init. For Trats2 it was 80MiB of memory.
The .bss section grows if we have a lot of static variables. This section is cleared before the jump to the relocated U-Boot, and this was done word by word. To reduce the time for this step, we can enable the arch memset, which uses multiple ARM registers.
For configs with DFU enabled, this section contains the dfu buffer, which is at least 8MB (32MB for Trats2). This is a lot of unused data that is not required for a standard boot, so this buffer should be dynamically allocated.
So it was all about unnecessary operations on 'big' data.
Przemyslaw Marczak (6):
  exynos: config: enable arch memcpy and arch memset
  arm: relocation: clear .bss section with arch memset if defined
  dfu: mmc: file buffer: remove static allocation
  dlmalloc: do memset in malloc init as new default config
  trats2: defconfig: disable memset at malloc init
  odroid: defconfig: disable memset at malloc init
 Kconfig                         | 32 +++++++++++++++++++++++++-------
 arch/arm/lib/crt0.S             | 10 +++++++++-
 common/dlmalloc.c               | 10 +++++++---
 configs/odroid_defconfig        |  1 +
 configs/trats2_defconfig        |  1 +
 drivers/dfu/dfu_mmc.c           | 25 ++++++++++++++++++++++---
 include/configs/exynos-common.h |  3 +++
 7 files changed, 68 insertions(+), 14 deletions(-)

This commit enables the following configs:
- CONFIG_USE_ARCH_MEMCPY
- CONFIG_USE_ARCH_MEMSET
This increases the performance of memcpy/memset and also reduces the boot time.
This was tested on Trats2. A quick test with trace.
Boot time from start to main_loop() entry:
- ~1527ms - before this change (arch memset enabled for .bss clear)
- ~1384ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
Cc: Minkyu Kang mk7.kang@samsung.com
Cc: Akshay Saraswat akshay.s@samsung.com
Cc: Simon Glass sjg@chromium.org
Cc: Sjoerd Simons sjoerd.simons@collabora.co.uk
---
Changes V3
- none
Changes V4
- none
---
 include/configs/exynos-common.h | 3 +++
 1 file changed, 3 insertions(+)
diff --git a/include/configs/exynos-common.h b/include/configs/exynos-common.h
index 59676ae..87f8db0 100644
--- a/include/configs/exynos-common.h
+++ b/include/configs/exynos-common.h
@@ -24,6 +24,9 @@
 #define CONFIG_SKIP_LOWLEVEL_INIT
 #define CONFIG_BOARD_EARLY_INIT_F

+#define CONFIG_USE_ARCH_MEMCPY
+#define CONFIG_USE_ARCH_MEMSET
+
 /* Keep L2 Cache Disabled */
 #define CONFIG_CMD_CACHE

On the ARM architecture, enabling CONFIG_USE_ARCH_MEMSET/MEMCPY greatly increases memset/memcpy performance. This is possible thanks to the ARM multiple-register instructions.
Unfortunately, the relocation is done without the cache enabled, so it takes some time, but zeroing the BSS memory takes much longer, especially for configs with big static buffers.
A quick test confirms that using the arch memcpy for relocation gives no significant boot time improvement. The same test confirms that using the arch memset for zeroing the BSS reduces the boot time.
So this patch enables the arch memset for zeroing the BSS after the relocation process. For ARM boards, this can be enabled in board configs by defining 'CONFIG_USE_ARCH_MEMSET'.
This was tested on Trats2. A quick test with trace.
Boot time from start to main_loop() entry:
- ~1384ms - before this change
- ~888ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
Cc: Albert Aribaud albert.u.boot@aribaud.net
Cc: Tom Rini trini@ti.com
---
Changes V3
- none
Changes V4
- none
---
 arch/arm/lib/crt0.S | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/arch/arm/lib/crt0.S b/arch/arm/lib/crt0.S
index 22df3e5..fab3d2c 100644
--- a/arch/arm/lib/crt0.S
+++ b/arch/arm/lib/crt0.S
@@ -115,14 +115,22 @@ here:
 	bl	c_runtime_cpu_setup	/* we still call old routine here */

 	ldr	r0, =__bss_start	/* this is auto-relocated! */
-	ldr	r1, =__bss_end		/* this is auto-relocated! */

+#ifdef CONFIG_USE_ARCH_MEMSET
+	ldr	r3, =__bss_end		/* this is auto-relocated! */
+	mov	r1, #0x00000000		/* prepare zero to clear BSS */
+
+	subs	r2, r3, r0		/* r2 = memset len */
+	bl	memset
+#else
+	ldr	r1, =__bss_end		/* this is auto-relocated! */
 	mov	r2, #0x00000000		/* prepare zero to clear BSS */

 clbss_l:cmp	r0, r1			/* while not at end of BSS */
 	strlo	r2, [r0]		/* clear 32-bit BSS word */
 	addlo	r0, r0, #4		/* move to next */
 	blo	clbss_l
+#endif

 	bl coloured_LED_init
 	bl red_led_on
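For readers less familiar with ARM assembly, the two BSS-clearing paths in this patch can be sketched in host-side C. This is only an illustration of the logic; fake_bss, clear_words, clear_memset and nonzero_after are hypothetical names, and the real code runs in crt0.S on the symbols __bss_start and __bss_end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Word-by-word clear, mirroring the clbss_l loop: one 32-bit store per pass. */
void clear_words(uint32_t *start, uint32_t *end)
{
	while (start < end)
		*start++ = 0;
}

/* Single memset call, mirroring the CONFIG_USE_ARCH_MEMSET path. */
void clear_memset(uint32_t *start, uint32_t *end)
{
	/* length in bytes = end - start, like "subs r2, r3, r0" */
	memset(start, 0, (size_t)((char *)end - (char *)start));
}

/* Hypothetical stand-in for the region between __bss_start and __bss_end. */
static uint32_t fake_bss[256];

/* Fill the fake region with a pattern, clear it, and count non-zero words. */
size_t nonzero_after(void (*clear)(uint32_t *, uint32_t *))
{
	size_t i, left = 0;

	assert(clear != NULL);
	memset(fake_bss, 0xFF, sizeof(fake_bss));
	clear(fake_bss, fake_bss + 256);
	for (i = 0; i < 256; i++)
		left += (fake_bss[i] != 0);
	return left;
}
```

Both functions leave the region in the same state; the difference reported in the patch is only in how fast the architecture-specific memset gets there.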

For writing files, the DFU implementation requires a file buffer at least as large as the file. Big files therefore require an equally big buffer.
Previously the file buffer was allocated as a static variable, so it was part of the U-Boot .bss section. A 32MiB buffer adds 32MiB to this section.
The .bss needs to be cleared after the relocation. This introduces an additional boot delay at every start, but the dfu feature is usually not required for a standard boot, so the buffer should be allocated only when required.
This patch removes the static allocation of this buffer and allocates it with memalign on the first call of:
- dfu_fill_entity_mmc()
and the buffer is freed on the dfu_free_entity() call.
This was tested on Trats2. A quick test with trace.
Boot time from start to main_loop() entry:
- ~888ms - before this change (arch memset enabled for .bss clear)
- ~464ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
Cc: Lukasz Majewski l.majewski@samsung.com
Cc: Stephen Warren swarren@nvidia.com
Cc: Pantelis Antoniou panto@antoniou-consulting.com
Cc: Tom Rini trini@ti.com
Cc: Marek Vasut marek.vasut@gmail.com
---
Changes V3
- none
Changes V4
- none
---
 drivers/dfu/dfu_mmc.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)
diff --git a/drivers/dfu/dfu_mmc.c b/drivers/dfu/dfu_mmc.c
index 62d72fe..fd865e1 100644
--- a/drivers/dfu/dfu_mmc.c
+++ b/drivers/dfu/dfu_mmc.c
@@ -16,8 +16,7 @@
 #include <fat.h>
 #include <mmc.h>

-static unsigned char __aligned(CONFIG_SYS_CACHELINE_SIZE)
-				dfu_file_buf[CONFIG_SYS_DFU_MAX_FILE_SIZE];
+static unsigned char *dfu_file_buf;
 static long dfu_file_buf_len;

 static int mmc_access_part(struct dfu_entity *dfu, struct mmc *mmc, int part)
@@ -211,7 +210,7 @@ int dfu_flush_medium_mmc(struct dfu_entity *dfu)

 	if (dfu->layout != DFU_RAW_ADDR) {
 		/* Do stuff here. */
-		ret = mmc_file_op(DFU_OP_WRITE, dfu, &dfu_file_buf,
+		ret = mmc_file_op(DFU_OP_WRITE, dfu, dfu_file_buf,
 				&dfu_file_buf_len);

 		/* Now that we're done */
@@ -263,6 +262,14 @@ int dfu_read_medium_mmc(struct dfu_entity *dfu, u64 offset, void *buf,
 	return ret;
 }

+void dfu_free_entity_mmc(struct dfu_entity *dfu)
+{
+	if (dfu_file_buf) {
+		free(dfu_file_buf);
+		dfu_file_buf = NULL;
+	}
+}
+
 /*
  * @param s Parameter string containing space-separated arguments:
  *	1st:
@@ -370,6 +377,18 @@ int dfu_fill_entity_mmc(struct dfu_entity *dfu, char *devstr, char *s)
 	dfu->write_medium = dfu_write_medium_mmc;
 	dfu->flush_medium = dfu_flush_medium_mmc;
 	dfu->inited = 0;
+	dfu->free_entity = dfu_free_entity_mmc;
+
+	/* Check if file buffer is ready */
+	if (!dfu_file_buf) {
+		dfu_file_buf = memalign(CONFIG_SYS_CACHELINE_SIZE,
+					CONFIG_SYS_DFU_MAX_FILE_SIZE);
+		if (!dfu_file_buf) {
+			error("Could not memalign 0x%x bytes",
+			      CONFIG_SYS_DFU_MAX_FILE_SIZE);
+			return -ENOMEM;
+		}
+	}

 	return 0;
 }
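The allocate-on-first-use pattern from this patch can be tried on a host with a small standalone sketch. It assumes C11 aligned_alloc() in place of U-Boot's memalign(); the DEMO_* constants and demo_* functions are hypothetical and only mirror the shape of dfu_fill_entity_mmc() and dfu_free_entity_mmc().

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_ALIGN	64u		/* hypothetical cache line size */
#define DEMO_BUF_SIZE	(4u * 1024u)	/* hypothetical maximum file size */

static unsigned char *demo_buf;	/* NULL until the buffer is first needed */

/* Allocate the buffer on first use only; returns 0 on success, -1 on failure. */
int demo_fill_entity(void)
{
	/* aligned_alloc() expects a size that is a multiple of the alignment */
	assert(DEMO_BUF_SIZE % DEMO_ALIGN == 0);

	if (!demo_buf) {
		demo_buf = aligned_alloc(DEMO_ALIGN, DEMO_BUF_SIZE);
		if (!demo_buf)
			return -1;
	}
	return 0;
}

/* Release the buffer and reset the pointer so it can be allocated again. */
void demo_free_entity(void)
{
	free(demo_buf);		/* free(NULL) is a no-op */
	demo_buf = NULL;
}

/* Reports whether the buffer is currently allocated. */
int demo_buf_ready(void)
{
	return demo_buf != NULL;
}
```

Because the pointer starts out as NULL and is reset on free, the buffer no longer takes space in .bss and costs nothing on boots that never enter DFU.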

This commit introduces a new config: CONFIG_SYS_MALLOC_INIT_DO_ZEROING.
This config is an expert option and is enabled by default.
By default, all of the memory reserved for malloc is set to zero in mem_malloc_init(). When the reserved malloc memory exceeds a few MiB, the boot process can slow down.
So disabling this config is an expert option to reduce the boot time, and it can be disabled via Kconfig.
Note: After disabling this option, only calloc() returns a pointer to a zeroed memory area. Previously, without this option, untouched regions of the malloc memory were filled with zeros, so the first malloc() calls also returned zeroed memory. This means that code with malloc() calls should be re-examined.
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
---
Changes v3:
- squash the commit with the Kconfig option
Changes v4:
- adjust commit message for the new config
---
 Kconfig           | 32 +++++++++++++++++++++++++-------
 common/dlmalloc.c | 10 +++++++---
 2 files changed, 32 insertions(+), 10 deletions(-)
diff --git a/Kconfig b/Kconfig
index 75bab7f..9ea99b5 100644
--- a/Kconfig
+++ b/Kconfig
@@ -76,13 +76,31 @@ config SYS_MALLOC_F_LEN
 	  initial serial device and any others that are needed.

 menuconfig EXPERT
-        bool "Configure standard U-Boot features (expert users)"
-        help
-          This option allows certain base U-Boot options and settings
-          to be disabled or tweaked. This is for specialized
-          environments which can tolerate a "non-standard" U-Boot.
-          Only use this if you really know what you are doing.
-
+	bool "Configure standard U-Boot features (expert users)"
+	default y
+	help
+	  This option allows certain base U-Boot options and settings
+	  to be disabled or tweaked. This is for specialized
+	  environments which can tolerate a "non-standard" U-Boot.
+	  Only use this if you really know what you are doing.
+
+if EXPERT
+	config SYS_MALLOC_INIT_DO_ZEROING
+	bool "Init with zeros the memory reserved for malloc (slow)"
+	default y
+	help
+	  This setting is enabled by default. The reserved malloc
+	  memory is initialized with zeros, so first malloc calls
+	  will return the pointer to the zeroed memory. But this
+	  slows the boot time.
+
+	  It is recommended to disable it, when CONFIG_SYS_MALLOC_LEN
+	  value, has more than few MiB, e.g. when uses bzip2 or bmp logo.
+	  Then the boot time can be significantly reduced.
+	  Warning:
+	  When disabling this, please check if malloc calls, maybe
+	  should be replaced by calloc - if expects zeroed memory.
+endif
 endmenu		# General setup

 menu "Boot images"
diff --git a/common/dlmalloc.c b/common/dlmalloc.c
index 6453ee9..21e103b 100644
--- a/common/dlmalloc.c
+++ b/common/dlmalloc.c
@@ -1535,9 +1535,9 @@ void mem_malloc_init(ulong start, ulong size)

 	debug("using memory %#lx-%#lx for malloc()\n", mem_malloc_start,
 	      mem_malloc_end);
-
-	memset((void *)mem_malloc_start, 0, size);
-
+#ifdef CONFIG_SYS_MALLOC_INIT_DO_ZEROING
+	memset((void *)mem_malloc_start, 0x0, size);
+#endif
 	malloc_bin_reloc();
 }

@@ -2948,10 +2948,12 @@ Void_t* cALLOc(n, elem_size) size_t n; size_t elem_size;

   /* check if expand_top called, in which case don't need to clear */
+#ifdef CONFIG_SYS_MALLOC_INIT_DO_ZEROING
 #if MORECORE_CLEARS
   mchunkptr oldtop = top;
   INTERNAL_SIZE_T oldtopsize = chunksize(top);
 #endif
+#endif
   Void_t* mem = mALLOc (sz);

   if ((long)n < 0) return NULL;
@@ -2977,6 +2979,7 @@ Void_t* cALLOc(n, elem_size) size_t n; size_t elem_size;

     csz = chunksize(p);

+#ifdef CONFIG_SYS_MALLOC_INIT_DO_ZEROING
 #if MORECORE_CLEARS
     if (p == oldtop && csz > oldtopsize) {
@@ -2984,6 +2987,7 @@ Void_t* cALLOc(n, elem_size) size_t n; size_t elem_size;
       csz = oldtopsize;
     }
 #endif
+#endif

     MALLOC_ZERO(mem, csz - SIZE_SZ);
     return mem;
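The warning in the help text (check whether malloc callers expect zeroed memory) comes down to two safe patterns, sketched here in plain C. struct demo_node and both constructors are hypothetical examples, not code from U-Boot.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_node {
	struct demo_node *next;
	int refcount;
};

/* Relies on zeroed fields, so it asks for them explicitly with calloc(). */
struct demo_node *demo_node_new_zeroed(void)
{
	return calloc(1, sizeof(struct demo_node));
}

/* Uses malloc() and initialises every field itself; no zeroing is assumed. */
struct demo_node *demo_node_new_init(int refcount)
{
	struct demo_node *n;

	assert(refcount >= 0);
	n = malloc(sizeof(*n));
	if (!n)
		return NULL;
	n->next = NULL;
	n->refcount = refcount;
	return n;
}
```

Code that calls malloc() and then reads a field it never wrote is the case to look for when the init-time memset is disabled.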

Hi Przemyslaw,
On 23 February 2015 at 10:16, Przemyslaw Marczak p.marczak@samsung.com wrote:
This commit introduces new config: CONFIG_SYS_MALLOC_INIT_DO_ZEROING.
Minor nit: CONFIG_SYS_MALLOC_CLEAR_ON_INIT might be better.
This config is an expert option and is enabled by default.
By default, all of the memory reserved for malloc is set to zero in mem_malloc_init(). When the reserved malloc memory exceeds a few MiB, the boot process can slow down.
So disabling this config is an expert option to reduce the boot time, and it can be disabled via Kconfig.
Note: After disabling this option, only calloc() returns a pointer to a zeroed memory area. Previously, without this option, untouched regions of the malloc memory were filled with zeros, so the first malloc() calls also returned zeroed memory. This means that code with malloc() calls should be re-examined.
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
Reviewed-by: Simon Glass sjg@chromium.org
Changes v3:
- squash the commit with the Kconfig option
Changes v4:
- adjust commit message for the new config
 Kconfig           | 32 +++++++++++++++++++++++++-------
 common/dlmalloc.c | 10 +++++++---
 2 files changed, 32 insertions(+), 10 deletions(-)
diff --git a/Kconfig b/Kconfig
index 75bab7f..9ea99b5 100644
--- a/Kconfig
+++ b/Kconfig
@@ -76,13 +76,31 @@ config SYS_MALLOC_F_LEN
 	  initial serial device and any others that are needed.

 menuconfig EXPERT
-        bool "Configure standard U-Boot features (expert users)"
-        help
-          This option allows certain base U-Boot options and settings
-          to be disabled or tweaked. This is for specialized
-          environments which can tolerate a "non-standard" U-Boot.
-          Only use this if you really know what you are doing.
-
+	bool "Configure standard U-Boot features (expert users)"
+	default y
+	help
+	  This option allows certain base U-Boot options and settings
+	  to be disabled or tweaked. This is for specialized
+	  environments which can tolerate a "non-standard" U-Boot.
+	  Only use this if you really know what you are doing.
+
+if EXPERT
+	config SYS_MALLOC_INIT_DO_ZEROING
+	bool "Init with zeros the memory reserved for malloc (slow)"
+	default y
+	help
+	  This setting is enabled by default. The reserved malloc
+	  memory is initialized with zeros, so first malloc calls
+	  will return the pointer to the zeroed memory. But this
+	  slows the boot time.
+
+	  It is recommended to disable it, when CONFIG_SYS_MALLOC_LEN
+	  value, has more than few MiB, e.g. when uses bzip2 or bmp logo.
+	  Then the boot time can be significantly reduced.
+	  Warning:
+	  When disabling this, please check if malloc calls, maybe
+	  should be replaced by calloc - if expects zeroed memory.
+endif
 endmenu		# General setup

 menu "Boot images"
diff --git a/common/dlmalloc.c b/common/dlmalloc.c
index 6453ee9..21e103b 100644
--- a/common/dlmalloc.c
+++ b/common/dlmalloc.c
@@ -1535,9 +1535,9 @@ void mem_malloc_init(ulong start, ulong size)

 	debug("using memory %#lx-%#lx for malloc()\n", mem_malloc_start,
 	      mem_malloc_end);
-
-	memset((void *)mem_malloc_start, 0, size);
-
+#ifdef CONFIG_SYS_MALLOC_INIT_DO_ZEROING
+	memset((void *)mem_malloc_start, 0x0, size);
+#endif
 	malloc_bin_reloc();
 }

@@ -2948,10 +2948,12 @@ Void_t* cALLOc(n, elem_size) size_t n; size_t elem_size;

   /* check if expand_top called, in which case don't need to clear */
+#ifdef CONFIG_SYS_MALLOC_INIT_DO_ZEROING
 #if MORECORE_CLEARS
   mchunkptr oldtop = top;
   INTERNAL_SIZE_T oldtopsize = chunksize(top);
 #endif
+#endif
   Void_t* mem = mALLOc (sz);

   if ((long)n < 0) return NULL;
@@ -2977,6 +2979,7 @@ Void_t* cALLOc(n, elem_size) size_t n; size_t elem_size;

     csz = chunksize(p);

+#ifdef CONFIG_SYS_MALLOC_INIT_DO_ZEROING
 #if MORECORE_CLEARS
     if (p == oldtop && csz > oldtopsize) {
@@ -2984,6 +2987,7 @@ Void_t* cALLOc(n, elem_size) size_t n; size_t elem_size;
       csz = oldtopsize;
     }
 #endif
+#endif

     MALLOC_ZERO(mem, csz - SIZE_SZ);
     return mem;
-- 1.9.1

Hello Simon,
On 02/23/2015 06:38 PM, Simon Glass wrote:
Hi Przemyslaw,
On 23 February 2015 at 10:16, Przemyslaw Marczak p.marczak@samsung.com wrote:
This commit introduces new config: CONFIG_SYS_MALLOC_INIT_DO_ZEROING.
Minor nit: CONFIG_SYS_MALLOC_CLEAR_ON_INIT might be better.
...snip...
The config name is updated, and I got an XU3 for testing, so I also disabled the memset for this board in V5.
Best regards,

Reduce the boot time of Trats2 by disabling the memset at malloc init.
This was tested on Trats2. A quick test with trace.
Boot time from start to main_loop() entry:
- ~464ms - before this change (arch memset enabled for .bss clear)
- ~341ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
---
Changes V3
- none
Changes V4
- trats2_defconfig: remove CONFIG_EXPERT
- trats2_defconfig: disable CONFIG_SYS_MALLOC_INIT_DO_ZEROING
---
 configs/trats2_defconfig | 1 +
 1 file changed, 1 insertion(+)
diff --git a/configs/trats2_defconfig b/configs/trats2_defconfig
index 1b98b73..e6d2062 100644
--- a/configs/trats2_defconfig
+++ b/configs/trats2_defconfig
@@ -3,3 +3,4 @@ CONFIG_ARCH_EXYNOS=y
 CONFIG_TARGET_TRATS2=y
 CONFIG_OF_CONTROL=y
 CONFIG_DEFAULT_DEVICE_TREE="exynos4412-trats2"
+# CONFIG_SYS_MALLOC_INIT_DO_ZEROING is not set
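The Trats2 figures quoted across the patches of this series can be put side by side with a short calculation. The sketch below only restates the numbers reported in the commit messages (1527, 1384, 888, 464 and 341 ms); the array and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Boot time to main_loop() on Trats2 in ms, as reported in the commit messages. */
static const int trats2_ms[] = { 1527, 1384, 888, 464, 341 };
#define TRATS2_POINTS (sizeof(trats2_ms) / sizeof(trats2_ms[0]))

/* Time saved by step i (1-based): difference between consecutive readings. */
int saved_by_step(size_t i)
{
	assert(i >= 1 && i < TRATS2_POINTS);
	return trats2_ms[i - 1] - trats2_ms[i];
}

/* Total saving from the first reading to the last one. */
int saved_total(void)
{
	return trats2_ms[0] - trats2_ms[TRATS2_POINTS - 1];
}
```

By this arithmetic the four steps save about 143, 496, 424 and 123 ms, roughly 1186 ms in total, with the arch memset for .bss and the DFU buffer change contributing the most.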

Reduce the boot time of Odroid X2/U3 by disabling the memset at malloc init.
This was tested on Odroid X2. A quick test with checking gpio pin state using the oscilloscope.
Boot time from start to bootcmd (change gpio state by memory write command):
- ~228ms - before this change (arch memset enabled for .bss clear)
- ~100ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
---
Changes V3
- update commit head
Changes V4
- odroid_defconfig: remove CONFIG_EXPERT
- odroid_defconfig: disable CONFIG_SYS_MALLOC_INIT_DO_ZEROING
---
 configs/odroid_defconfig | 1 +
 1 file changed, 1 insertion(+)
diff --git a/configs/odroid_defconfig b/configs/odroid_defconfig
index a842837..592232f 100644
--- a/configs/odroid_defconfig
+++ b/configs/odroid_defconfig
@@ -3,3 +3,4 @@ CONFIG_ARCH_EXYNOS=y
 CONFIG_TARGET_ODROID=y
 CONFIG_OF_CONTROL=y
 CONFIG_DEFAULT_DEVICE_TREE="exynos4412-odroid"
+# CONFIG_SYS_MALLOC_INIT_DO_ZEROING is not set

This patchset reduces the boot time for ARM architecture, Exynos boards, and (ARM) boards with DFU enabled.
For tested Trats2 and Odroid X2/XU3 devices, this was done in four steps.
1. Enable the arch memcpy and memset - ARCH specific
2. Enable arch memset for .bss clear - ARCH specific
3. Make the .bss section as small as possible - board specific:
   - remove the static dfu buffer (32MiB in .bss on Trats2) and use malloc
4. Skip zeroing the memory reserved for malloc at malloc init. For Trats2 it was 80MiB of memory.
The .bss section grows if we have a lot of static variables. This section is cleared before the jump to the relocated U-Boot, and this was done word by word. To reduce the time for this step, we can enable the arch memset, which uses multiple ARM registers.
For configs with DFU enabled, this section contains the dfu buffer, which is at least 8MB (32MB for Trats2). This is a lot of unused data that is not required for a standard boot, so this buffer should be dynamically allocated.
So it was all about unnecessary operations on 'big' data.
Przemyslaw Marczak (7):
  exynos: config: enable arch memcpy and arch memset
  arm: relocation: clear .bss section with arch memset if defined
  dfu: mmc: file buffer: remove static allocation
  dlmalloc: do memset in malloc init as new default config
  trats2: defconfig: disable memset at malloc init
  odroid: defconfig: disable memset at malloc init
  odroid-xu3: defconfig: disable memset at malloc init
 Kconfig                         | 32 +++++++++++++++++++++++++-------
 arch/arm/lib/crt0.S             | 10 +++++++++-
 common/dlmalloc.c               | 10 +++++++---
 configs/odroid-xu3_defconfig    |  2 ++
 configs/odroid_defconfig        |  1 +
 configs/trats2_defconfig        |  1 +
 drivers/dfu/dfu_mmc.c           | 25 ++++++++++++++++++++++---
 include/configs/exynos-common.h |  3 +++
 8 files changed, 70 insertions(+), 14 deletions(-)

This commit enables the following configs:
- CONFIG_USE_ARCH_MEMCPY
- CONFIG_USE_ARCH_MEMSET
This increases the performance of memcpy/memset and also reduces the boot time.
This was tested on Trats2. A quick test with trace.
Boot time from start to main_loop() entry:
- ~1527ms - before this change (arch memset enabled for .bss clear)
- ~1384ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
Reviewed-by: Simon Glass sjg@chromium.org
Cc: Minkyu Kang mk7.kang@samsung.com
Cc: Akshay Saraswat akshay.s@samsung.com
Cc: Simon Glass sjg@chromium.org
Cc: Sjoerd Simons sjoerd.simons@collabora.co.uk
---
Changes V3, V4, V5
- none
---
 include/configs/exynos-common.h | 3 +++
 1 file changed, 3 insertions(+)
diff --git a/include/configs/exynos-common.h b/include/configs/exynos-common.h
index 59676ae..87f8db0 100644
--- a/include/configs/exynos-common.h
+++ b/include/configs/exynos-common.h
@@ -24,6 +24,9 @@
 #define CONFIG_SKIP_LOWLEVEL_INIT
 #define CONFIG_BOARD_EARLY_INIT_F

+#define CONFIG_USE_ARCH_MEMCPY
+#define CONFIG_USE_ARCH_MEMSET
+
 /* Keep L2 Cache Disabled */
 #define CONFIG_CMD_CACHE

Hi Przemyslaw,
This commit enables the following configs:
- CONFIG_USE_ARCH_MEMCPY
- CONFIG_USE_ARCH_MEMSET
This increases the performance of memcpy/memset and also reduces the boot time.
This was tested on Trats2. A quick test with trace. Boot time from start to main_loop() entry:
- ~1527ms - before this change (arch memset enabled for .bss clear)
- ~1384ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com Reviewed-by: Simon Glass sjg@chromium.org Cc: Minkyu Kang mk7.kang@samsung.com Cc: Akshay Saraswat akshay.s@samsung.com Cc: Simon Glass sjg@chromium.org Cc: Sjoerd Simons sjoerd.simons@collabora.co.uk
Changes V3, V4, V5
- none
 include/configs/exynos-common.h | 3 +++
 1 file changed, 3 insertions(+)
diff --git a/include/configs/exynos-common.h b/include/configs/exynos-common.h
index 59676ae..87f8db0 100644
--- a/include/configs/exynos-common.h
+++ b/include/configs/exynos-common.h
@@ -24,6 +24,9 @@
 #define CONFIG_SKIP_LOWLEVEL_INIT
 #define CONFIG_BOARD_EARLY_INIT_F

+#define CONFIG_USE_ARCH_MEMCPY
+#define CONFIG_USE_ARCH_MEMSET
+
 /* Keep L2 Cache Disabled */
 #define CONFIG_CMD_CACHE
Acked-by: Lukasz Majewski l.majewski@samsung.com

On the ARM architecture, enabling CONFIG_USE_ARCH_MEMSET/MEMCPY greatly increases memset/memcpy performance. This is possible thanks to the ARM multiple-register instructions.
Unfortunately, the relocation is done without the cache enabled, so it takes some time, but zeroing the BSS memory takes much longer, especially for configs with big static buffers.
A quick test confirms that using the arch memcpy for relocation gives no significant boot time improvement. The same test confirms that using the arch memset for zeroing the BSS reduces the boot time.
So this patch enables the arch memset for zeroing the BSS after the relocation process. For ARM boards, this can be enabled in board configs by defining 'CONFIG_USE_ARCH_MEMSET'.
This was tested on Trats2. A quick test with trace.
Boot time from start to main_loop() entry:
- ~1384ms - before this change
- ~888ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
Reviewed-by: Simon Glass sjg@chromium.org
Cc: Albert Aribaud albert.u.boot@aribaud.net
Cc: Tom Rini trini@ti.com
---
Changes V3, V4, V5
- none
---
 arch/arm/lib/crt0.S | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/arch/arm/lib/crt0.S b/arch/arm/lib/crt0.S
index 22df3e5..fab3d2c 100644
--- a/arch/arm/lib/crt0.S
+++ b/arch/arm/lib/crt0.S
@@ -115,14 +115,22 @@ here:
 	bl	c_runtime_cpu_setup	/* we still call old routine here */

 	ldr	r0, =__bss_start	/* this is auto-relocated! */
-	ldr	r1, =__bss_end		/* this is auto-relocated! */

+#ifdef CONFIG_USE_ARCH_MEMSET
+	ldr	r3, =__bss_end		/* this is auto-relocated! */
+	mov	r1, #0x00000000		/* prepare zero to clear BSS */
+
+	subs	r2, r3, r0		/* r2 = memset len */
+	bl	memset
+#else
+	ldr	r1, =__bss_end		/* this is auto-relocated! */
 	mov	r2, #0x00000000		/* prepare zero to clear BSS */

 clbss_l:cmp	r0, r1			/* while not at end of BSS */
 	strlo	r2, [r0]		/* clear 32-bit BSS word */
 	addlo	r0, r0, #4		/* move to next */
 	blo	clbss_l
+#endif

 	bl coloured_LED_init
 	bl red_led_on

Hi Przemyslaw,
On the ARM architecture, enabling CONFIG_USE_ARCH_MEMSET/MEMCPY greatly increases memset/memcpy performance. This is possible thanks to the ARM multiple-register instructions.
Unfortunately, the relocation is done without the cache enabled, so it takes some time, but zeroing the BSS memory takes much longer, especially for configs with big static buffers.
A quick test confirms that using the arch memcpy for relocation gives no significant boot time improvement. The same test confirms that using the arch memset for zeroing the BSS reduces the boot time.
So this patch enables the arch memset for zeroing the BSS after the relocation process. For ARM boards, this can be enabled in board configs by defining 'CONFIG_USE_ARCH_MEMSET'.
This was tested on Trats2. A quick test with trace. Boot time from start to main_loop() entry:
- ~1384ms - before this change
- ~888ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com Reviewed-by: Simon Glass sjg@chromium.org Cc: Albert Aribaud albert.u.boot@aribaud.net Cc: Tom Rini trini@ti.com
Changes V3, V4, V5
- none
 arch/arm/lib/crt0.S | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/arch/arm/lib/crt0.S b/arch/arm/lib/crt0.S
index 22df3e5..fab3d2c 100644
--- a/arch/arm/lib/crt0.S
+++ b/arch/arm/lib/crt0.S
@@ -115,14 +115,22 @@ here:
 	bl	c_runtime_cpu_setup	/* we still call old routine here */

 	ldr	r0, =__bss_start	/* this is auto-relocated! */
-	ldr	r1, =__bss_end		/* this is auto-relocated! */

+#ifdef CONFIG_USE_ARCH_MEMSET
+	ldr	r3, =__bss_end		/* this is auto-relocated! */
+	mov	r1, #0x00000000		/* prepare zero to clear BSS */
+
+	subs	r2, r3, r0		/* r2 = memset len */
+	bl	memset
+#else
+	ldr	r1, =__bss_end		/* this is auto-relocated! */
 	mov	r2, #0x00000000		/* prepare zero to clear BSS */

 clbss_l:cmp	r0, r1			/* while not at end of BSS */
 	strlo	r2, [r0]		/* clear 32-bit BSS word */
 	addlo	r0, r0, #4		/* move to next */
 	blo	clbss_l
+#endif

 	bl coloured_LED_init
 	bl red_led_on
Acked-by: Lukasz Majewski l.majewski@samsung.com

For writing files, the DFU implementation requires a file buffer at least as large as the file. Big files therefore require an equally big buffer.
Previously the file buffer was allocated as a static variable, so it was part of the U-Boot .bss section. A 32MiB buffer adds 32MiB to this section.
The .bss needs to be cleared after the relocation. This introduces an additional boot delay at every start, but the dfu feature is usually not required for a standard boot, so the buffer should be allocated only when required.
This patch removes the static allocation of this buffer and allocates it with memalign on the first call of:
- dfu_fill_entity_mmc()
and the buffer is freed on the dfu_free_entity() call.
This was tested on Trats2. A quick test with trace.
Boot time from start to main_loop() entry:
- ~888ms - before this change (arch memset enabled for .bss clear)
- ~464ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
Reviewed-by: Simon Glass sjg@chromium.org
Cc: Lukasz Majewski l.majewski@samsung.com
Cc: Stephen Warren swarren@nvidia.com
Cc: Pantelis Antoniou panto@antoniou-consulting.com
Cc: Tom Rini trini@ti.com
Cc: Marek Vasut marek.vasut@gmail.com
---
Changes V3, V4, V5
- none
---
 drivers/dfu/dfu_mmc.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)
diff --git a/drivers/dfu/dfu_mmc.c b/drivers/dfu/dfu_mmc.c
index 62d72fe..fd865e1 100644
--- a/drivers/dfu/dfu_mmc.c
+++ b/drivers/dfu/dfu_mmc.c
@@ -16,8 +16,7 @@
 #include <fat.h>
 #include <mmc.h>

-static unsigned char __aligned(CONFIG_SYS_CACHELINE_SIZE)
-				dfu_file_buf[CONFIG_SYS_DFU_MAX_FILE_SIZE];
+static unsigned char *dfu_file_buf;
 static long dfu_file_buf_len;

 static int mmc_access_part(struct dfu_entity *dfu, struct mmc *mmc, int part)
@@ -211,7 +210,7 @@ int dfu_flush_medium_mmc(struct dfu_entity *dfu)

 	if (dfu->layout != DFU_RAW_ADDR) {
 		/* Do stuff here. */
-		ret = mmc_file_op(DFU_OP_WRITE, dfu, &dfu_file_buf,
+		ret = mmc_file_op(DFU_OP_WRITE, dfu, dfu_file_buf,
 				&dfu_file_buf_len);

 		/* Now that we're done */
@@ -263,6 +262,14 @@ int dfu_read_medium_mmc(struct dfu_entity *dfu, u64 offset, void *buf,
 	return ret;
 }

+void dfu_free_entity_mmc(struct dfu_entity *dfu)
+{
+	if (dfu_file_buf) {
+		free(dfu_file_buf);
+		dfu_file_buf = NULL;
+	}
+}
+
 /*
  * @param s Parameter string containing space-separated arguments:
  *	1st:
@@ -370,6 +377,18 @@ int dfu_fill_entity_mmc(struct dfu_entity *dfu, char *devstr, char *s)
 	dfu->write_medium = dfu_write_medium_mmc;
 	dfu->flush_medium = dfu_flush_medium_mmc;
 	dfu->inited = 0;
+	dfu->free_entity = dfu_free_entity_mmc;
+
+	/* Check if file buffer is ready */
+	if (!dfu_file_buf) {
+		dfu_file_buf = memalign(CONFIG_SYS_CACHELINE_SIZE,
+					CONFIG_SYS_DFU_MAX_FILE_SIZE);
+		if (!dfu_file_buf) {
+			error("Could not memalign 0x%x bytes",
+			      CONFIG_SYS_DFU_MAX_FILE_SIZE);
+			return -ENOMEM;
+		}
+	}

 	return 0;
 }

Hi Przemyslaw,
For writing files, the DFU implementation requires a file buffer at least as large as the file. Big files therefore require an equally big buffer.
Previously the file buffer was allocated as a static variable, so it was part of the U-Boot .bss section. A 32MiB buffer adds 32MiB to this section.
The .bss needs to be cleared after the relocation. This introduces an additional boot delay at every start, but the dfu feature is usually not required for a standard boot, so the buffer should be allocated only when required.
This patch removes the static allocation of this buffer and allocates it with memalign on the first call of:
- dfu_fill_entity_mmc()
and the buffer is freed on the dfu_free_entity() call.
This was tested on Trats2. A quick test with trace. Boot time from start to main_loop() entry:
- ~888ms - before this change (arch memset enabled for .bss clear)
- ~464ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com Reviewed-by: Simon Glass sjg@chromium.org Cc: Lukasz Majewski l.majewski@samsung.com Cc: Stephen Warren swarren@nvidia.com Cc: Pantelis Antoniou panto@antoniou-consulting.com Cc: Tom Rini trini@ti.com Cc: Marek Vasut marek.vasut@gmail.com
Changes V3, V4, V5
- none
 drivers/dfu/dfu_mmc.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/drivers/dfu/dfu_mmc.c b/drivers/dfu/dfu_mmc.c
index 62d72fe..fd865e1 100644
--- a/drivers/dfu/dfu_mmc.c
+++ b/drivers/dfu/dfu_mmc.c
@@ -16,8 +16,7 @@
 #include <fat.h>
 #include <mmc.h>

-static unsigned char __aligned(CONFIG_SYS_CACHELINE_SIZE)
-                dfu_file_buf[CONFIG_SYS_DFU_MAX_FILE_SIZE];
+static unsigned char *dfu_file_buf;
 static long dfu_file_buf_len;

 static int mmc_access_part(struct dfu_entity *dfu, struct mmc *mmc, int part)
@@ -211,7 +210,7 @@ int dfu_flush_medium_mmc(struct dfu_entity *dfu)

     if (dfu->layout != DFU_RAW_ADDR) {
         /* Do stuff here. */
-        ret = mmc_file_op(DFU_OP_WRITE, dfu, &dfu_file_buf,
+        ret = mmc_file_op(DFU_OP_WRITE, dfu, dfu_file_buf,
                 &dfu_file_buf_len);

         /* Now that we're done */
@@ -263,6 +262,14 @@ int dfu_read_medium_mmc(struct dfu_entity *dfu, u64 offset, void *buf,
     return ret;
 }

+void dfu_free_entity_mmc(struct dfu_entity *dfu)
+{
+    if (dfu_file_buf) {
+        free(dfu_file_buf);
+        dfu_file_buf = NULL;
+    }
+}
+
 /*
  * @param s Parameter string containing space-separated arguments:
  *    1st:
@@ -370,6 +377,18 @@ int dfu_fill_entity_mmc(struct dfu_entity *dfu, char *devstr, char *s)
     dfu->write_medium = dfu_write_medium_mmc;
     dfu->flush_medium = dfu_flush_medium_mmc;
     dfu->inited = 0;
+    dfu->free_entity = dfu_free_entity_mmc;
+
+    /* Check if file buffer is ready */
+    if (!dfu_file_buf) {
+        dfu_file_buf = memalign(CONFIG_SYS_CACHELINE_SIZE,
+                    CONFIG_SYS_DFU_MAX_FILE_SIZE);
+        if (!dfu_file_buf) {
+            error("Could not memalign 0x%x bytes",
+                  CONFIG_SYS_DFU_MAX_FILE_SIZE);
+            return -ENOMEM;
+        }
+    }

     return 0;
 }
Acked-by: Lukasz Majewski l.majewski@samsung.com

This commit introduces a new config option: CONFIG_SYS_MALLOC_CLEAR_ON_INIT.
It is an expert option and is enabled by default.
By default, all of the memory reserved for malloc is set to zero in mem_malloc_init(). When the reserved malloc memory exceeds a few MiB, this can slow down the boot process.
Disabling this option through Kconfig is therefore an expert way to reduce the boot time.
Note: with this option disabled, only calloc() returns a pointer to a zeroed memory area. Previously, memory handed out from a not-yet-used part of the malloc region was already filled with zeros. Code that calls malloc() should therefore be re-examined.
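What re-examining malloc() calls means in practice: any code that reads a freshly allocated object before writing to it was relying on the pool being pre-zeroed. Below is a minimal sketch in standard C. The struct dev_state and the three constructor names are hypothetical, and the example uses the host C library, not U-Boot's dlmalloc.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical object whose fields are expected to start out as zero. */
struct dev_state {
    int  inited;
    long bytes_written;
};

/* Fragile: the fields are only zero if the allocator happens to hand out
 * zeroed memory, which is no longer guaranteed once the pool is not
 * cleared at init. */
struct dev_state *dev_state_new_fragile(void)
{
    return malloc(sizeof(struct dev_state));
}

/* Robust: calloc() zeroes the object no matter how the pool was set up. */
struct dev_state *dev_state_new(void)
{
    return calloc(1, sizeof(struct dev_state));
}

/* Alternative when malloc() has to stay: clear the object explicitly. */
struct dev_state *dev_state_new_memset(void)
{
    struct dev_state *s = malloc(sizeof(*s));

    if (s)
        memset(s, 0, sizeof(*s));
    return s;
}
```

Either of the last two forms keeps working whether or not the option is enabled; only the first depends on it.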
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com Reviewed-by: Simon Glass sjg@chromium.org
---
Changes v3:
- squash the commit with the Kconfig option
Changes v4:
- adjust commit message for the new config
Changes v5:
- change config name to CONFIG_SYS_MALLOC_CLEAR_ON_INIT
---
 Kconfig           | 32 +++++++++++++++++++++++++-------
 common/dlmalloc.c | 10 +++++++---
 2 files changed, 32 insertions(+), 10 deletions(-)

diff --git a/Kconfig b/Kconfig
index 75bab7f..d6c75d5 100644
--- a/Kconfig
+++ b/Kconfig
@@ -76,13 +76,31 @@ config SYS_MALLOC_F_LEN
       initial serial device and any others that are needed.

 menuconfig EXPERT
-        bool "Configure standard U-Boot features (expert users)"
-        help
-          This option allows certain base U-Boot options and settings
-          to be disabled or tweaked. This is for specialized
-          environments which can tolerate a "non-standard" U-Boot.
-          Only use this if you really know what you are doing.
-
+    bool "Configure standard U-Boot features (expert users)"
+    default y
+    help
+      This option allows certain base U-Boot options and settings
+      to be disabled or tweaked. This is for specialized
+      environments which can tolerate a "non-standard" U-Boot.
+      Only use this if you really know what you are doing.
+
+if EXPERT
+    config SYS_MALLOC_CLEAR_ON_INIT
+    bool "Init with zeros the memory reserved for malloc (slow)"
+    default y
+    help
+      This setting is enabled by default. The reserved malloc
+      memory is initialized with zeros, so first malloc calls
+      will return the pointer to the zeroed memory. But this
+      slows the boot time.
+
+      It is recommended to disable it, when CONFIG_SYS_MALLOC_LEN
+      value, has more than few MiB, e.g. when uses bzip2 or bmp logo.
+      Then the boot time can be significantly reduced.
+      Warning:
+      When disabling this, please check if malloc calls, maybe
+      should be replaced by calloc - if expects zeroed memory.
+endif
 endmenu        # General setup

 menu "Boot images"
diff --git a/common/dlmalloc.c b/common/dlmalloc.c
index 6453ee9..b2ce063 100644
--- a/common/dlmalloc.c
+++ b/common/dlmalloc.c
@@ -1535,9 +1535,9 @@ void mem_malloc_init(ulong start, ulong size)

     debug("using memory %#lx-%#lx for malloc()\n", mem_malloc_start,
           mem_malloc_end);
-
-    memset((void *)mem_malloc_start, 0, size);
-
+#ifdef CONFIG_SYS_MALLOC_CLEAR_ON_INIT
+    memset((void *)mem_malloc_start, 0x0, size);
+#endif
     malloc_bin_reloc();
 }

@@ -2948,10 +2948,12 @@ Void_t* cALLOc(n, elem_size) size_t n; size_t elem_size;

   /* check if expand_top called, in which case don't need to clear */
+#ifdef CONFIG_SYS_MALLOC_CLEAR_ON_INIT
 #if MORECORE_CLEARS
   mchunkptr oldtop = top;
   INTERNAL_SIZE_T oldtopsize = chunksize(top);
 #endif
+#endif
   Void_t* mem = mALLOc (sz);

   if ((long)n < 0) return NULL;
@@ -2977,6 +2979,7 @@ Void_t* cALLOc(n, elem_size) size_t n; size_t elem_size;

     csz = chunksize(p);

+#ifdef CONFIG_SYS_MALLOC_CLEAR_ON_INIT
 #if MORECORE_CLEARS
     if (p == oldtop && csz > oldtopsize) {
@@ -2984,6 +2987,7 @@ Void_t* cALLOc(n, elem_size) size_t n; size_t elem_size;
       csz = oldtopsize;
     }
 #endif
+#endif

     MALLOC_ZERO(mem, csz - SIZE_SZ);
     return mem;

Hi Przemyslaw,
This commit introduces a new config option: CONFIG_SYS_MALLOC_CLEAR_ON_INIT.
It is an expert option and is enabled by default.
By default, all of the memory reserved for malloc is set to zero in mem_malloc_init(). When the reserved malloc memory exceeds a few MiB, this can slow down the boot process.
Disabling this option through Kconfig is therefore an expert way to reduce the boot time.
Note: with this option disabled, only calloc() returns a pointer to a zeroed memory area. Previously, memory handed out from a not-yet-used part of the malloc region was already filled with zeros. Code that calls malloc() should therefore be re-examined.
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com Reviewed-by: Simon Glass sjg@chromium.org
Changes v3:
- squash the commit with the Kconfig option
Changes v4:
- adjust commit message for the new config
Changes v5:
- change config name to CONFIG_SYS_MALLOC_CLEAR_ON_INIT
 Kconfig           | 32 +++++++++++++++++++++++++-------
 common/dlmalloc.c | 10 +++++++---
 2 files changed, 32 insertions(+), 10 deletions(-)

diff --git a/Kconfig b/Kconfig
index 75bab7f..d6c75d5 100644
--- a/Kconfig
+++ b/Kconfig
@@ -76,13 +76,31 @@ config SYS_MALLOC_F_LEN
       initial serial device and any others that are needed.

 menuconfig EXPERT
-        bool "Configure standard U-Boot features (expert users)"
-        help
-          This option allows certain base U-Boot options and settings
-          to be disabled or tweaked. This is for specialized
-          environments which can tolerate a "non-standard" U-Boot.
-          Only use this if you really know what you are doing.
-
+    bool "Configure standard U-Boot features (expert users)"
+    default y
+    help
+      This option allows certain base U-Boot options and settings
+      to be disabled or tweaked. This is for specialized
+      environments which can tolerate a "non-standard" U-Boot.
+      Only use this if you really know what you are doing.
+
+if EXPERT
+    config SYS_MALLOC_CLEAR_ON_INIT
+    bool "Init with zeros the memory reserved for malloc (slow)"
+    default y
+    help
+      This setting is enabled by default. The reserved malloc
+      memory is initialized with zeros, so first malloc calls
+      will return the pointer to the zeroed memory. But this
+      slows the boot time.
+
+      It is recommended to disable it, when CONFIG_SYS_MALLOC_LEN
+      value, has more than few MiB, e.g. when uses bzip2 or bmp logo.
+      Then the boot time can be significantly reduced.
+      Warning:
+      When disabling this, please check if malloc calls, maybe
+      should be replaced by calloc - if expects zeroed memory.
+endif
 endmenu        # General setup

 menu "Boot images"
diff --git a/common/dlmalloc.c b/common/dlmalloc.c
index 6453ee9..b2ce063 100644
--- a/common/dlmalloc.c
+++ b/common/dlmalloc.c
@@ -1535,9 +1535,9 @@ void mem_malloc_init(ulong start, ulong size)

     debug("using memory %#lx-%#lx for malloc()\n", mem_malloc_start,
           mem_malloc_end);
-
-    memset((void *)mem_malloc_start, 0, size);
-
+#ifdef CONFIG_SYS_MALLOC_CLEAR_ON_INIT
+    memset((void *)mem_malloc_start, 0x0, size);
+#endif
     malloc_bin_reloc();
 }

@@ -2948,10 +2948,12 @@ Void_t* cALLOc(n, elem_size) size_t n; size_t elem_size;

   /* check if expand_top called, in which case don't need to clear */
+#ifdef CONFIG_SYS_MALLOC_CLEAR_ON_INIT
 #if MORECORE_CLEARS
   mchunkptr oldtop = top;
   INTERNAL_SIZE_T oldtopsize = chunksize(top);
 #endif
+#endif
   Void_t* mem = mALLOc (sz);

   if ((long)n < 0) return NULL;
@@ -2977,6 +2979,7 @@ Void_t* cALLOc(n, elem_size) size_t n; size_t elem_size;

     csz = chunksize(p);

+#ifdef CONFIG_SYS_MALLOC_CLEAR_ON_INIT
 #if MORECORE_CLEARS
     if (p == oldtop && csz > oldtopsize) {
@@ -2984,6 +2987,7 @@ Void_t* cALLOc(n, elem_size) size_t n; size_t elem_size;
       csz = oldtopsize;
     }
 #endif
+#endif

     MALLOC_ZERO(mem, csz - SIZE_SZ);
     return mem;
Acked-by: Lukasz Majewski l.majewski@samsung.com

Reduce the boot time of Trats2 by disabling the memset at malloc init.
This was tested on Trats2. A quick test with trace. Boot time from start to main_loop() entry:
- ~464ms - before this change (arch memset enabled for .bss clear)
- ~341ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
Reviewed-by: Simon Glass sjg@chromium.org
---
Changes V3
- none
Changes V4
- trats2_defconfig: remove CONFIG_EXPERT
- trats2_defconfig: disable CONFIG_SYS_MALLOC_INIT_DO_ZEROING
Changes v5:
- update disabled config name to CONFIG_SYS_MALLOC_CLEAR_ON_INIT
---
 configs/trats2_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/configs/trats2_defconfig b/configs/trats2_defconfig
index 1b98b73..9359706 100644
--- a/configs/trats2_defconfig
+++ b/configs/trats2_defconfig
@@ -3,3 +3,4 @@ CONFIG_ARCH_EXYNOS=y
 CONFIG_TARGET_TRATS2=y
 CONFIG_OF_CONTROL=y
 CONFIG_DEFAULT_DEVICE_TREE="exynos4412-trats2"
+# CONFIG_SYS_MALLOC_CLEAR_ON_INIT is not set

Hi Przemyslaw,
Reduce the boot time of Trats2 by disabling the memset at malloc init.
This was tested on Trats2. A quick test with trace. Boot time from start to main_loop() entry:
- ~464ms - before this change (arch memset enabled for .bss clear)
- ~341ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com Reviewed-by: Simon Glass sjg@chromium.org
Changes V3
- none
Changes V4
- trats2_defconfig: remove CONFIG_EXPERT
- trats2_defconfig: disable CONFIG_SYS_MALLOC_INIT_DO_ZEROING
Changes v5:
- update disabled config name to CONFIG_SYS_MALLOC_CLEAR_ON_INIT
 configs/trats2_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/configs/trats2_defconfig b/configs/trats2_defconfig
index 1b98b73..9359706 100644
--- a/configs/trats2_defconfig
+++ b/configs/trats2_defconfig
@@ -3,3 +3,4 @@ CONFIG_ARCH_EXYNOS=y
 CONFIG_TARGET_TRATS2=y
 CONFIG_OF_CONTROL=y
 CONFIG_DEFAULT_DEVICE_TREE="exynos4412-trats2"
+# CONFIG_SYS_MALLOC_CLEAR_ON_INIT is not set
Acked-by: Lukasz Majewski l.majewski@samsung.com

Reduce the boot time of Odroid X2/U3 by disabling the memset at malloc init.
This was tested on Odroid X2. A quick test checking the GPIO pin state with an oscilloscope. Boot time from start to bootcmd (the GPIO state is changed by a memory write command):
- ~228ms - before this change (arch memset enabled for .bss clear)
- ~100ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
Reviewed-by: Simon Glass sjg@chromium.org
---
Changes V3
- update commit head
Changes V4
- odroid_defconfig: remove CONFIG_EXPERT
- odroid_defconfig: disable CONFIG_SYS_MALLOC_INIT_DO_ZEROING
Changes v5
- update disabled config name to CONFIG_SYS_MALLOC_CLEAR_ON_INIT
---
 configs/odroid_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/configs/odroid_defconfig b/configs/odroid_defconfig
index a842837..aac9f5a 100644
--- a/configs/odroid_defconfig
+++ b/configs/odroid_defconfig
@@ -3,3 +3,4 @@ CONFIG_ARCH_EXYNOS=y
 CONFIG_TARGET_ODROID=y
 CONFIG_OF_CONTROL=y
 CONFIG_DEFAULT_DEVICE_TREE="exynos4412-odroid"
+# CONFIG_SYS_MALLOC_CLEAR_ON_INIT is not set

Hi Przemyslaw,
Reduce the boot time of Odroid X2/U3 by disabling the memset at malloc init.
This was tested on Odroid X2. A quick test checking the GPIO pin state with an oscilloscope. Boot time from start to bootcmd (the GPIO state is changed by a memory write command):
- ~228ms - before this change (arch memset enabled for .bss clear)
- ~100ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com Reviewed-by: Simon Glass sjg@chromium.org
Changes V3
- update commit head
Changes V4
- odroid_defconfig: remove CONFIG_EXPERT
- odroid_defconfig: disable CONFIG_SYS_MALLOC_INIT_DO_ZEROING
Changes v5
- update disabled config name to CONFIG_SYS_MALLOC_CLEAR_ON_INIT
 configs/odroid_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/configs/odroid_defconfig b/configs/odroid_defconfig
index a842837..aac9f5a 100644
--- a/configs/odroid_defconfig
+++ b/configs/odroid_defconfig
@@ -3,3 +3,4 @@ CONFIG_ARCH_EXYNOS=y
 CONFIG_TARGET_ODROID=y
 CONFIG_OF_CONTROL=y
 CONFIG_DEFAULT_DEVICE_TREE="exynos4412-odroid"
+# CONFIG_SYS_MALLOC_CLEAR_ON_INIT is not set
Acked-by: Lukasz Majewski l.majewski@samsung.com

Reduce the boot time of Odroid XU3 by disabling the memset at malloc init.
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
---
Changes v5
- new commit
---
 configs/odroid-xu3_defconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/configs/odroid-xu3_defconfig b/configs/odroid-xu3_defconfig
index 74aa0cf..0fb4623 100644
--- a/configs/odroid-xu3_defconfig
+++ b/configs/odroid-xu3_defconfig
@@ -2,3 +2,5 @@ CONFIG_ARM=y
 CONFIG_ARCH_EXYNOS=y
 CONFIG_TARGET_ODROID_XU3=y
 CONFIG_DEFAULT_DEVICE_TREE="exynos5422-odroidxu3"
+# CONFIG_SYS_MALLOC_CLEAR_ON_INIT is not set
+

Hi Przemyslaw,
Reduce the boot time of Odroid XU3 by disabling the memset at malloc init.
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
Changes v5
- new commit
 configs/odroid-xu3_defconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/configs/odroid-xu3_defconfig b/configs/odroid-xu3_defconfig
index 74aa0cf..0fb4623 100644
--- a/configs/odroid-xu3_defconfig
+++ b/configs/odroid-xu3_defconfig
@@ -2,3 +2,5 @@ CONFIG_ARM=y
 CONFIG_ARCH_EXYNOS=y
 CONFIG_TARGET_ODROID_XU3=y
 CONFIG_DEFAULT_DEVICE_TREE="exynos5422-odroidxu3"
+# CONFIG_SYS_MALLOC_CLEAR_ON_INIT is not set
+
Acked-by: Lukasz Majewski l.majewski@samsung.com

Hello Tom,
On 02/24/2015 11:38 AM, Przemyslaw Marczak wrote:
This patchset reduces the boot time for ARM architecture, Exynos boards, and (ARM) boards with DFU enabled.
For the tested Trats2 and Odroid X2/XU3 devices, this was done in four steps:
1. Enable the arch memcpy and memset - ARCH specific
2. Enable arch memset for .bss clear - ARCH specific
3. Make the .bss section as small as possible (board specific):
- remove the static dfu buffer (32MiB in .bss - Trats2) and use malloc
4. Skip zeroing the memory reserved for malloc at malloc init. For Trats2 it was 80MiB of memory.
The .bss section grows with the number of static variables. It is cleared before the jump to the relocated U-Boot, and this was done word by word. To reduce the time for this step, we can enable the arch memset, which uses multiple ARM registers.
For configs with DFU enabled, this section contains the dfu buffer, which is at least 8MB (32MB for Trats2). This is a lot of useless data that a standard boot does not need, so this buffer should be dynamically allocated.
So it was really all about unnecessary operations on 'big' data.
Przemyslaw Marczak (7):
  exynos: config: enable arch memcpy and arch memset
  arm: relocation: clear .bss section with arch memset if defined
  dfu: mmc: file buffer: remove static allocation
  dlmalloc: do memset in malloc init as new default config
  trats2: defconfig: disable memset at malloc init
  odroid: defconfig: disable memset at malloc init
  odroid-xu3: defconfig: disable memset at malloc init

 Kconfig                         | 32 +++++++++++++++++++++++++-------
 arch/arm/lib/crt0.S             | 10 +++++++++-
 common/dlmalloc.c               | 10 +++++++---
 configs/odroid-xu3_defconfig    |  2 ++
 configs/odroid_defconfig        |  1 +
 configs/trats2_defconfig        |  1 +
 drivers/dfu/dfu_mmc.c           | 25 ++++++++++++++++++++++---
 include/configs/exynos-common.h |  3 +++
 8 files changed, 70 insertions(+), 14 deletions(-)
If you merge this patchset, please also take this patch:
http://patchwork.ozlabs.org/patch/440623/
Best regards,

This patchset reduces the boot time for ARM architecture, Exynos boards, and (ARM) boards with DFU enabled.
For the tested Trats2 and Odroid X2/XU3 devices, this was done in four steps:
1. Enable the arch memcpy and memset - ARCH specific
2. Enable arch memset for .bss clear - ARCH specific
3. Make the .bss section as small as possible (board specific):
- remove the static dfu buffer (32MiB in .bss - Trats2) and use malloc
4. Skip zeroing the memory reserved for malloc at malloc init. For Trats2 it was 80MiB of memory.
The .bss section grows with the number of static variables. It is cleared before the jump to the relocated U-Boot, and this was done word by word. To reduce the time for this step, we can enable the arch memset, which uses multiple ARM registers.
For configs with DFU enabled, this section contains the dfu buffer, which is at least 8MB (32MB for Trats2). This is a lot of useless data that a standard boot does not need, so this buffer should be dynamically allocated.
Changes v6:
This version fixes a merge conflict with the latest master, which is "Prepare v2015.04-rc3". It also extends the malloc pool for some board configs.
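Why the pool has to grow on some boards: once the DFU file buffer comes from malloc instead of .bss, the malloc pool must be able to hold it on top of everything else the board allocates. The sketch below uses made-up sizes, and DFU_MAX_FILE_SIZE, BASE_MALLOC_LEN, MALLOC_LEN and pool_fits() are hypothetical stand-ins; the real values live in the board config headers touched by this series.

```c
#include <stddef.h>

/* Hypothetical sizes, for illustration only. */
#define DFU_MAX_FILE_SIZE (32u * 1024u * 1024u) /* buffer now taken from malloc */
#define BASE_MALLOC_LEN   (4u * 1024u * 1024u)  /* what the board needed before */
#define MALLOC_LEN        (BASE_MALLOC_LEN + DFU_MAX_FILE_SIZE)

/* Returns 1 if a pool of pool_len bytes can hold the DFU buffer in addition
 * to other_use bytes of unrelated allocations, 0 otherwise. Allocator
 * overhead and fragmentation are ignored in this sketch. */
int pool_fits(size_t pool_len, size_t other_use)
{
    return pool_len >= other_use &&
           pool_len - other_use >= DFU_MAX_FILE_SIZE;
}
```

With these numbers the unchanged 4MiB pool cannot satisfy the 32MiB request, while the pool enlarged by the buffer size can.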
Przemyslaw Marczak (10):
  exynos: config: enable arch memcpy and arch memset
  arm: relocation: clear .bss section with arch memset if defined
  dfu: mmc: file buffer: remove static allocation
  dlmalloc: do memset in malloc init as new default config
  trats2: defconfig: disable memset at malloc init
  odroid: defconfig: disable memset at malloc init
  odroid-xu3: defconfig: disable memset at malloc init
  zynq-common: increase malloc pool len by dfu mmc file buffer size
  ti-armv7-common: increase malloc pool len by dfu mmc file buffer size
  tegra-common: increase malloc pool len by dfu mmc file buffer size

 Kconfig                           | 32 +++++++++++++++++++++++++-------
 arch/arm/lib/crt0.S               | 10 +++++++++-
 common/dlmalloc.c                 | 10 +++++++---
 configs/odroid-xu3_defconfig      |  2 ++
 configs/odroid_defconfig          |  2 +-
 configs/trats2_defconfig          |  1 +
 drivers/dfu/dfu_mmc.c             | 25 ++++++++++++++++++++++---
 include/configs/exynos-common.h   |  3 +++
 include/configs/tegra-common.h    |  5 +++++
 include/configs/ti_armv7_common.h |  6 +++++-
 include/configs/zynq-common.h     |  2 +-
 11 files changed, 81 insertions(+), 17 deletions(-)

This commit enables the following configs:
- CONFIG_USE_ARCH_MEMCPY
- CONFIG_USE_ARCH_MEMSET
This increases the performance of memcpy/memset and also reduces the boot time.
This was tested on Trats2. A quick test with trace. Boot time from start to main_loop() entry:
- ~1527ms - before this change (arch memset enabled for .bss clear)
- ~1384ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
Reviewed-by: Simon Glass sjg@chromium.org
Cc: Minkyu Kang mk7.kang@samsung.com
Cc: Akshay Saraswat akshay.s@samsung.com
Cc: Simon Glass sjg@chromium.org
Cc: Sjoerd Simons sjoerd.simons@collabora.co.uk
---
Changes V3, V4, V5, V6
- none
---
 include/configs/exynos-common.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/configs/exynos-common.h b/include/configs/exynos-common.h
index 59676ae..87f8db0 100644
--- a/include/configs/exynos-common.h
+++ b/include/configs/exynos-common.h
@@ -24,6 +24,9 @@
 #define CONFIG_SKIP_LOWLEVEL_INIT
 #define CONFIG_BOARD_EARLY_INIT_F

+#define CONFIG_USE_ARCH_MEMCPY
+#define CONFIG_USE_ARCH_MEMSET
+
 /* Keep L2 Cache Disabled */
 #define CONFIG_CMD_CACHE

On Wed, Mar 04, 2015 at 02:01:21PM +0100, Przemyslaw Marczak wrote:
This commit enables the following configs:
- CONFIG_USE_ARCH_MEMCPY
- CONFIG_USE_ARCH_MEMSET
This increases the performance of memcpy/memset and also reduces the boot time.
This was tested on Trats2. A quick test with trace. Boot time from start to main_loop() entry:
- ~1527ms - before this change (arch memset enabled for .bss clear)
- ~1384ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com Reviewed-by: Simon Glass sjg@chromium.org Cc: Minkyu Kang mk7.kang@samsung.com Cc: Akshay Saraswat akshay.s@samsung.com Cc: Simon Glass sjg@chromium.org Cc: Sjoerd Simons sjoerd.simons@collabora.co.uk
Applied to u-boot/master, thanks!

On ARM, enabling CONFIG_USE_ARCH_MEMSET/MEMCPY greatly increases memset/memcpy performance. This is possible thanks to the ARM multiple-register instructions.
Unfortunately, relocation is done with the cache disabled, so it takes some time, but zeroing the BSS memory takes much longer, especially for configs with big static buffers.
A quick test shows that using the arch memcpy for relocation gives no significant boot time improvement. The same test shows that using the arch memset for zeroing the BSS does reduce the boot time.
So this patch enables the arch memset for zeroing the BSS after relocation. For ARM boards, this can be enabled in the board config by defining 'CONFIG_USE_ARCH_MEMSET'.
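In C terms, the two code paths in this patch do the following. This is only an illustration of the logic, with hypothetical function names; the real code is ARM assembly in crt0.S, and the speedup comes from the architecture-optimised memset, not from anything visible in this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Fallback path: clear one 32-bit word per iteration, as the clbss_l
 * loop does. */
void clear_bss_words(uint32_t *start, uint32_t *end)
{
    while (start < end)
        *start++ = 0;
}

/* CONFIG_USE_ARCH_MEMSET path: compute the length in bytes once and make
 * a single memset() call. */
void clear_bss_memset(uint32_t *start, uint32_t *end)
{
    memset(start, 0, (size_t)((char *)end - (char *)start));
}
```

Both functions leave the range in the same state; only the number of store instructions needed to get there differs.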
This was tested on Trats2. A quick test with trace. Boot time from start to main_loop() entry:
- ~1384ms - before this change
- ~888ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
Reviewed-by: Simon Glass sjg@chromium.org
Cc: Albert Aribaud albert.u.boot@aribaud.net
Cc: Tom Rini trini@konsulko.com
---
Changes V3, V4, V5, V6
- none
---
 arch/arm/lib/crt0.S | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/arm/lib/crt0.S b/arch/arm/lib/crt0.S
index 22df3e5..fab3d2c 100644
--- a/arch/arm/lib/crt0.S
+++ b/arch/arm/lib/crt0.S
@@ -115,14 +115,22 @@ here:
     bl c_runtime_cpu_setup /* we still call old routine here */

     ldr r0, =__bss_start /* this is auto-relocated! */
-    ldr r1, =__bss_end /* this is auto-relocated! */

+#ifdef CONFIG_USE_ARCH_MEMSET
+    ldr r3, =__bss_end /* this is auto-relocated! */
+    mov r1, #0x00000000 /* prepare zero to clear BSS */
+
+    subs r2, r3, r0 /* r2 = memset len */
+    bl memset
+#else
+    ldr r1, =__bss_end /* this is auto-relocated! */
     mov r2, #0x00000000 /* prepare zero to clear BSS */

 clbss_l:cmp r0, r1 /* while not at end of BSS */
     strlo r2, [r0] /* clear 32-bit BSS word */
     addlo r0, r0, #4 /* move to next */
     blo clbss_l
+#endif

     bl coloured_LED_init
     bl red_led_on

On Wed, Mar 04, 2015 at 02:01:22PM +0100, Przemyslaw Marczak wrote:
On ARM, enabling CONFIG_USE_ARCH_MEMSET/MEMCPY greatly increases memset/memcpy performance. This is possible thanks to the ARM multiple-register instructions.
Unfortunately, relocation is done with the cache disabled, so it takes some time, but zeroing the BSS memory takes much longer, especially for configs with big static buffers.
A quick test shows that using the arch memcpy for relocation gives no significant boot time improvement. The same test shows that using the arch memset for zeroing the BSS does reduce the boot time.
So this patch enables the arch memset for zeroing the BSS after relocation. For ARM boards, this can be enabled in the board config by defining 'CONFIG_USE_ARCH_MEMSET'.
This was tested on Trats2. A quick test with trace. Boot time from start to main_loop() entry:
- ~1384ms - before this change
- ~888ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com Reviewed-by: Simon Glass sjg@chromium.org Cc: Albert Aribaud albert.u.boot@aribaud.net Cc: Tom Rini trini@konsulko.com
Applied to u-boot/master, thanks!

To write a file, the DFU implementation needs a file buffer at least as large as the file itself, so big files need an equally big buffer.
Previously the file buffer was a static variable and therefore part of the U-Boot .bss section. A 32MiB buffer adds 32MiB to that section.
The .bss section must be cleared after relocation, which adds a boot delay on every start. The DFU feature is usually not needed during a standard boot, so the buffer should be allocated only when required.
This patch removes the static allocation and allocates the buffer with memalign() on the first call to dfu_fill_entity_mmc(). The buffer is freed on the dfu_free_entity() call.
This was tested on Trats2. A quick test with trace. Boot time from start to main_loop() entry:
- ~888ms - before this change (arch memset enabled for .bss clear)
- ~464ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com
Reviewed-by: Simon Glass sjg@chromium.org
Cc: Lukasz Majewski l.majewski@samsung.com
Cc: Stephen Warren swarren@nvidia.com
Cc: Pantelis Antoniou panto@antoniou-consulting.com
Cc: Tom Rini trini@konsulko.com
Cc: Marek Vasut marek.vasut@gmail.com
---
Changes V3, V4, V5, V6
- none
---
 drivers/dfu/dfu_mmc.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/drivers/dfu/dfu_mmc.c b/drivers/dfu/dfu_mmc.c
index 62d72fe..fd865e1 100644
--- a/drivers/dfu/dfu_mmc.c
+++ b/drivers/dfu/dfu_mmc.c
@@ -16,8 +16,7 @@
 #include <fat.h>
 #include <mmc.h>

-static unsigned char __aligned(CONFIG_SYS_CACHELINE_SIZE)
-                dfu_file_buf[CONFIG_SYS_DFU_MAX_FILE_SIZE];
+static unsigned char *dfu_file_buf;
 static long dfu_file_buf_len;

 static int mmc_access_part(struct dfu_entity *dfu, struct mmc *mmc, int part)
@@ -211,7 +210,7 @@ int dfu_flush_medium_mmc(struct dfu_entity *dfu)

     if (dfu->layout != DFU_RAW_ADDR) {
         /* Do stuff here. */
-        ret = mmc_file_op(DFU_OP_WRITE, dfu, &dfu_file_buf,
+        ret = mmc_file_op(DFU_OP_WRITE, dfu, dfu_file_buf,
                 &dfu_file_buf_len);

         /* Now that we're done */
@@ -263,6 +262,14 @@ int dfu_read_medium_mmc(struct dfu_entity *dfu, u64 offset, void *buf,
     return ret;
 }

+void dfu_free_entity_mmc(struct dfu_entity *dfu)
+{
+    if (dfu_file_buf) {
+        free(dfu_file_buf);
+        dfu_file_buf = NULL;
+    }
+}
+
 /*
  * @param s Parameter string containing space-separated arguments:
  *    1st:
@@ -370,6 +377,18 @@ int dfu_fill_entity_mmc(struct dfu_entity *dfu, char *devstr, char *s)
     dfu->write_medium = dfu_write_medium_mmc;
     dfu->flush_medium = dfu_flush_medium_mmc;
     dfu->inited = 0;
+    dfu->free_entity = dfu_free_entity_mmc;
+
+    /* Check if file buffer is ready */
+    if (!dfu_file_buf) {
+        dfu_file_buf = memalign(CONFIG_SYS_CACHELINE_SIZE,
+                    CONFIG_SYS_DFU_MAX_FILE_SIZE);
+        if (!dfu_file_buf) {
+            error("Could not memalign 0x%x bytes",
+                  CONFIG_SYS_DFU_MAX_FILE_SIZE);
+            return -ENOMEM;
+        }
+    }

     return 0;
 }

On Wed, Mar 04, 2015 at 02:01:23PM +0100, Przemyslaw Marczak wrote:
To write a file, the DFU implementation needs a file buffer at least as large as the file itself, so big files need an equally big buffer.
Previously the file buffer was a static variable and therefore part of the U-Boot .bss section. A 32MiB buffer adds 32MiB to that section.
The .bss section must be cleared after relocation, which adds a boot delay on every start. The DFU feature is usually not needed during a standard boot, so the buffer should be allocated only when required.
This patch removes the static allocation and allocates the buffer with memalign() on the first call to dfu_fill_entity_mmc(). The buffer is freed on the dfu_free_entity() call.
This was tested on Trats2. A quick test with trace. Boot time from start to main_loop() entry:
- ~888ms - before this change (arch memset enabled for .bss clear)
- ~464ms - after this change
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com Reviewed-by: Simon Glass sjg@chromium.org Cc: Lukasz Majewski l.majewski@samsung.com Cc: Stephen Warren swarren@nvidia.com Cc: Pantelis Antoniou panto@antoniou-consulting.com Cc: Tom Rini trini@konsulko.com Cc: Marek Vasut marek.vasut@gmail.com
Applied to u-boot/master, thanks!

This commit introduces a new config option: CONFIG_SYS_MALLOC_CLEAR_ON_INIT.
It is an expert option and is enabled by default.
By default, all of the memory reserved for malloc is set to zero in mem_malloc_init(). When the reserved malloc memory exceeds a few MiB, this can slow down the boot process.
Disabling this option through Kconfig is therefore an expert way to reduce the boot time.
Note: with this option disabled, only calloc() returns a pointer to a zeroed memory area. Previously, memory handed out from a not-yet-used part of the malloc region was already filled with zeros. Code that calls malloc() should therefore be re-examined.
Signed-off-by: Przemyslaw Marczak p.marczak@samsung.com Reviewed-by: Simon Glass sjg@chromium.org
---
Changes v3:
- squash the commit with the Kconfig option
Changes v4:
- adjust commit message for the new config
Changes v5:
- change config name to CONFIG_SYS_MALLOC_CLEAR_ON_INIT
Changes v6:
- none
---
 Kconfig           | 32 +++++++++++++++++++++++++-------
 common/dlmalloc.c | 10 +++++++---
 2 files changed, 32 insertions(+), 10 deletions(-)

diff --git a/Kconfig b/Kconfig
index 91a0618..4cbbfc2 100644
--- a/Kconfig
+++ b/Kconfig
@@ -72,13 +72,31 @@ config SYS_MALLOC_F_LEN
       initial serial device and any others that are needed.

 menuconfig EXPERT
-        bool "Configure standard U-Boot features (expert users)"
-        help
-          This option allows certain base U-Boot options and settings
-          to be disabled or tweaked. This is for specialized
-          environments which can tolerate a "non-standard" U-Boot.
-          Only use this if you really know what you are doing.
-
+    bool "Configure standard U-Boot features (expert users)"
+    default y
+    help
+      This option allows certain base U-Boot options and settings
+      to be disabled or tweaked. This is for specialized
+      environments which can tolerate a "non-standard" U-Boot.
+      Only use this if you really know what you are doing.
+
+if EXPERT
+    config SYS_MALLOC_CLEAR_ON_INIT
+    bool "Init with zeros the memory reserved for malloc (slow)"
+    default y
+    help
+      This setting is enabled by default. The reserved malloc
+      memory is initialized with zeros, so first malloc calls
+      will return the pointer to the zeroed memory. But this
+      slows the boot time.
+
+      It is recommended to disable it, when CONFIG_SYS_MALLOC_LEN
+      value, has more than few MiB, e.g. when uses bzip2 or bmp logo.
+      Then the boot time can be significantly reduced.
+      Warning:
+      When disabling this, please check if malloc calls, maybe
+      should be replaced by calloc - if expects zeroed memory.
+endif
 endmenu        # General setup

 menu "Boot images"
diff --git a/common/dlmalloc.c b/common/dlmalloc.c
index 6453ee9..b2ce063 100644
--- a/common/dlmalloc.c
+++ b/common/dlmalloc.c
@@ -1535,9 +1535,9 @@ void mem_malloc_init(ulong start, ulong size)

     debug("using memory %#lx-%#lx for malloc()\n", mem_malloc_start,
           mem_malloc_end);
-
-    memset((void *)mem_malloc_start, 0, size);
-
+#ifdef CONFIG_SYS_MALLOC_CLEAR_ON_INIT
+    memset((void *)mem_malloc_start, 0x0, size);
+#endif
     malloc_bin_reloc();
 }

@@ -2948,10 +2948,12 @@ Void_t* cALLOc(n, elem_size) size_t n; size_t elem_size;

   /* check if expand_top called, in which case don't need to clear */
+#ifdef CONFIG_SYS_MALLOC_CLEAR_ON_INIT
 #if MORECORE_CLEARS
   mchunkptr oldtop = top;
   INTERNAL_SIZE_T oldtopsize = chunksize(top);
 #endif
+#endif
   Void_t* mem = mALLOc (sz);

   if ((long)n < 0) return NULL;
@@ -2977,6 +2979,7 @@ Void_t* cALLOc(n, elem_size) size_t n; size_t elem_size;

     csz = chunksize(p);

+#ifdef CONFIG_SYS_MALLOC_CLEAR_ON_INIT
 #if MORECORE_CLEARS
     if (p == oldtop && csz > oldtopsize) {
@@ -2984,6 +2987,7 @@ Void_t* cALLOc(n, elem_size) size_t n; size_t elem_size;
       csz = oldtopsize;
     }
 #endif
+#endif

     MALLOC_ZERO(mem, csz - SIZE_SZ);
     return mem;

On Wed, Mar 04, 2015 at 02:01:24PM +0100, Przemyslaw Marczak wrote:
This commit introduces a new config option: CONFIG_SYS_MALLOC_CLEAR_ON_INIT.

This config is an expert option and is enabled by default.

By default, all of the memory reserved for malloc is set to zero in mem_malloc_init(). When the reserved malloc memory exceeds a few MiB, this can slow down the boot process.

Disabling this config is therefore an expert option for reducing the boot time; it can be disabled through Kconfig.

Note: After disabling this option, only calloc() returns a pointer to a zeroed memory area. Previously, before this option existed, the untouched regions of the malloc memory were filled with zeros, so malloc() could return zeroed memory as well. This means that code with malloc() calls should be re-examined.

Signed-off-by: Przemyslaw Marczak <p.marczak@samsung.com>
Reviewed-by: Simon Glass <sjg@chromium.org>
Applied to u-boot/master, thanks!
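
The note in the commit message above (only calloc() is guaranteed to return zeroed memory once the option is disabled) can be illustrated with a small hosted-C sketch. This is not code from the patch: struct demo_state and both helper functions are hypothetical names chosen for illustration, and the sketch only shows the two usual ways of not depending on a pre-cleared pool.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical driver state, used only for illustration. */
struct demo_state {
	int initialized;
	char name[16];
	void *priv;
};

/*
 * Option 1: calloc() zeroes the chunk itself, so the result does not
 * depend on whether the malloc pool was cleared at init time.
 */
struct demo_state *demo_state_new(void)
{
	return calloc(1, sizeof(struct demo_state));
}

/*
 * Option 2: keep malloc() but clear the object explicitly. Without the
 * memset() the contents must be treated as indeterminate.
 */
struct demo_state *demo_state_new_memset(void)
{
	struct demo_state *s = malloc(sizeof(*s));

	if (s)
		memset(s, 0, sizeof(*s));
	return s;
}
```

Code that previously called malloc() and then read a field without writing it first is the kind of caller that would need one of these two patterns after CONFIG_SYS_MALLOC_CLEAR_ON_INIT is disabled.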

Reduce the boot time of Trats2 by disabling the memset at malloc init.
This was tested on Trats2. A quick test with trace. Boot time from start to main_loop() entry:
- ~464ms - before this change (arch memset enabled for .bss clear)
- ~341ms - after this change

Signed-off-by: Przemyslaw Marczak <p.marczak@samsung.com>
Reviewed-by: Simon Glass <sjg@chromium.org>
---
Changes V3
- none

Changes V4
- trats2_defconfig: remove CONFIG_EXPERT
- trats2_defconfig: disable CONFIG_SYS_MALLOC_INIT_DO_ZEROING

Changes v5:
- update disabled config name to CONFIG_SYS_MALLOC_CLEAR_ON_INIT

Changes v6:
- none
---
 configs/trats2_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/configs/trats2_defconfig b/configs/trats2_defconfig
index 1b98b73..9359706 100644
--- a/configs/trats2_defconfig
+++ b/configs/trats2_defconfig
@@ -3,3 +3,4 @@ CONFIG_ARCH_EXYNOS=y
 CONFIG_TARGET_TRATS2=y
 CONFIG_OF_CONTROL=y
 CONFIG_DEFAULT_DEVICE_TREE="exynos4412-trats2"
+# CONFIG_SYS_MALLOC_CLEAR_ON_INIT is not set

On Wed, Mar 04, 2015 at 02:01:25PM +0100, Przemyslaw Marczak wrote:
Reduce the boot time of Trats2 by disabling the memset at malloc init.
This was tested on Trats2. A quick test with trace. Boot time from start to main_loop() entry:
- ~464ms - before this change (arch memset enabled for .bss clear)
- ~341ms - after this change
Signed-off-by: Przemyslaw Marczak <p.marczak@samsung.com>
Reviewed-by: Simon Glass <sjg@chromium.org>
Applied to u-boot/master, thanks!

Reduce the boot time of Odroid X2/U3 by disabling the memset at malloc init.
This was tested on Odroid X2. A quick test with checking gpio pin state using the oscilloscope. Boot time from start to bootcmd (change gpio state by memory write command):
- ~228ms - before this change (arch memset enabled for .bss clear)
- ~100ms - after this change

Signed-off-by: Przemyslaw Marczak <p.marczak@samsung.com>
Reviewed-by: Simon Glass <sjg@chromium.org>
---
Changes V3
- update commit head

Changes V4
- odroid_defconfig: remove CONFIG_EXPERT
- odroid_defconfig: disable CONFIG_SYS_MALLOC_INIT_DO_ZEROING

Changes v5
- update disabled config name to CONFIG_SYS_MALLOC_CLEAR_ON_INIT

Changes v6:
- fix merge conflict with master
---
 configs/odroid_defconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configs/odroid_defconfig b/configs/odroid_defconfig
index 816a3fa..cfb29e0 100644
--- a/configs/odroid_defconfig
+++ b/configs/odroid_defconfig
@@ -2,6 +2,6 @@ CONFIG_ARM=y
 CONFIG_ARCH_EXYNOS=y
 CONFIG_TARGET_ODROID=y
 CONFIG_OF_CONTROL=y
-CONFIG_DEFAULT_DEVICE_TREE="exynos4412-odroid"
 CONFIG_DM_I2C=y
 CONFIG_DM_I2C_COMPAT=y
+# CONFIG_SYS_MALLOC_CLEAR_ON_INIT is not set

On Wed, Mar 04, 2015 at 02:01:26PM +0100, Przemyslaw Marczak wrote:
Reduce the boot time of Odroid X2/U3 by disabling the memset at malloc init.
This was tested on Odroid X2. A quick test with checking gpio pin state using the oscilloscope. Boot time from start to bootcmd (change gpio state by memory write command):
- ~228ms - before this change (arch memset enabled for .bss clear)
- ~100ms - after this change
Signed-off-by: Przemyslaw Marczak <p.marczak@samsung.com>
Reviewed-by: Simon Glass <sjg@chromium.org>
Applied to u-boot/master, thanks!
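
To put the two measurements quoted in this thread side by side, the saving can be worked out with a few lines of C. This is only an arithmetic sketch; the helper names are made up for illustration, and the inputs are the approximate ("~") figures from the commit messages, so the results are approximate too.

```c
#include <assert.h>

/* Milliseconds saved between two boot-time measurements. */
unsigned int saved_ms(unsigned int before_ms, unsigned int after_ms)
{
	assert(before_ms >= after_ms);
	return before_ms - after_ms;
}

/* Saving as a whole percentage of the original time, rounded down. */
unsigned int saved_percent(unsigned int before_ms, unsigned int after_ms)
{
	assert(before_ms > 0);
	return saved_ms(before_ms, after_ms) * 100u / before_ms;
}
```

With the reported numbers, Trats2 (464 ms to 341 ms) saves about 123 ms, roughly 26%, and Odroid X2 (228 ms to 100 ms) saves about 128 ms, roughly 56%. The two boards were measured with different methods and end points (trace up to main_loop() versus an oscilloscope up to bootcmd), so the percentages are not directly comparable.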

Reduce the boot time of Odroid XU3 by disabling the memset at malloc init.
Signed-off-by: Przemyslaw Marczak <p.marczak@samsung.com>
---
Changes v5
- new commit

Changes v6
- none
---
 configs/odroid-xu3_defconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/configs/odroid-xu3_defconfig b/configs/odroid-xu3_defconfig
index 74aa0cf..0fb4623 100644
--- a/configs/odroid-xu3_defconfig
+++ b/configs/odroid-xu3_defconfig
@@ -2,3 +2,5 @@ CONFIG_ARM=y
 CONFIG_ARCH_EXYNOS=y
 CONFIG_TARGET_ODROID_XU3=y
 CONFIG_DEFAULT_DEVICE_TREE="exynos5422-odroidxu3"
+# CONFIG_SYS_MALLOC_CLEAR_ON_INIT is not set
+

On Wed, Mar 04, 2015 at 02:01:27PM +0100, Przemyslaw Marczak wrote:
Reduce the boot time of Odroid XU3 by disabling the memset at malloc init.
Signed-off-by: Przemyslaw Marczak <p.marczak@samsung.com>
Applied to u-boot/master, thanks!

The DFU MMC file buffer, which used to be static, is now allocated by memalign(), so the malloc pool length should also be increased.

Signed-off-by: Przemyslaw Marczak <p.marczak@samsung.com>
Cc: Jagannadha Sutradharudu Teki <jaganna@xilinx.com>
Cc: Michal Simek <monstr@monstr.eu>
---
Changes v6:
- new commit
---
 include/configs/zynq-common.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/configs/zynq-common.h b/include/configs/zynq-common.h
index 864528a..485babd 100644
--- a/include/configs/zynq-common.h
+++ b/include/configs/zynq-common.h
@@ -255,7 +255,7 @@
 #define CONFIG_SYS_MEMTEST_START	CONFIG_SYS_SDRAM_BASE
 #define CONFIG_SYS_MEMTEST_END		(CONFIG_SYS_SDRAM_BASE + 0x1000)

-#define CONFIG_SYS_MALLOC_LEN		0xC00000
+#define CONFIG_SYS_MALLOC_LEN		0x1400000
 #define CONFIG_SYS_INIT_RAM_ADDR	CONFIG_SYS_SDRAM_BASE
 #define CONFIG_SYS_INIT_RAM_SIZE	CONFIG_SYS_MALLOC_LEN
 #define CONFIG_SYS_INIT_SP_ADDR		(CONFIG_SYS_INIT_RAM_ADDR + \

On 03/04/2015 02:01 PM, Przemyslaw Marczak wrote:
The DFU MMC file buffer, which used to be static, is now allocated by memalign(), so the malloc pool length should also be increased.

Signed-off-by: Przemyslaw Marczak <p.marczak@samsung.com>
Cc: Jagannadha Sutradharudu Teki <jaganna@xilinx.com>
Cc: Michal Simek <monstr@monstr.eu>

Changes v6:
- new commit

 include/configs/zynq-common.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/configs/zynq-common.h b/include/configs/zynq-common.h
index 864528a..485babd 100644
--- a/include/configs/zynq-common.h
+++ b/include/configs/zynq-common.h
@@ -255,7 +255,7 @@
 #define CONFIG_SYS_MEMTEST_START	CONFIG_SYS_SDRAM_BASE
 #define CONFIG_SYS_MEMTEST_END		(CONFIG_SYS_SDRAM_BASE + 0x1000)

-#define CONFIG_SYS_MALLOC_LEN		0xC00000
+#define CONFIG_SYS_MALLOC_LEN		0x1400000
 #define CONFIG_SYS_INIT_RAM_ADDR	CONFIG_SYS_SDRAM_BASE
 #define CONFIG_SYS_INIT_RAM_SIZE	CONFIG_SYS_MALLOC_LEN
 #define CONFIG_SYS_INIT_SP_ADDR		(CONFIG_SYS_INIT_RAM_ADDR + \
Acked-by: Michal Simek michal.simek@xilinx.com
Thanks, Michal

On Wed, Mar 04, 2015 at 02:01:28PM +0100, Przemyslaw Marczak wrote:
The DFU MMC file buffer, which used to be static, is now allocated by memalign(), so the malloc pool length should also be increased.

Signed-off-by: Przemyslaw Marczak <p.marczak@samsung.com>
Cc: Jagannadha Sutradharudu Teki <jaganna@xilinx.com>
Cc: Michal Simek <monstr@monstr.eu>
Acked-by: Michal Simek <michal.simek@xilinx.com>
Applied to u-boot/master, thanks!
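
As a quick check of the Zynq change, the old and new values of CONFIG_SYS_MALLOC_LEN can be converted to MiB. The sketch below uses stand-in macro names (DEMO_OLD_MALLOC_LEN, DEMO_NEW_MALLOC_LEN) rather than the real config symbols. That the 8 MiB increase corresponds to the DFU buffer size is an assumption based on the cover letter's "at least 8MB" remark, not something stated in this patch.

```c
#include <assert.h>

/* Values taken from the zynq-common.h hunk; the macro names are stand-ins. */
#define DEMO_OLD_MALLOC_LEN	0xC00000UL
#define DEMO_NEW_MALLOC_LEN	0x1400000UL
#define DEMO_MIB		(1UL << 20)

/* Convert a byte count to whole MiB. */
unsigned long demo_to_mib(unsigned long bytes)
{
	return bytes / DEMO_MIB;
}

/* Growth of the malloc pool introduced by the patch, in MiB. */
unsigned long demo_malloc_len_delta_mib(void)
{
	return demo_to_mib(DEMO_NEW_MALLOC_LEN - DEMO_OLD_MALLOC_LEN);
}
```

The pool grows from 12 MiB to 20 MiB, that is by 8 MiB.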

The DFU MMC file buffer, which used to be static, is now allocated by memalign(), so the malloc pool length should also be increased.

Signed-off-by: Przemyslaw Marczak <p.marczak@samsung.com>
Cc: Tom Rini <trini@konsulko.com>
---
Changes v6:
- new commit
---
 include/configs/ti_armv7_common.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/configs/ti_armv7_common.h b/include/configs/ti_armv7_common.h
index 2bd1164..f832ab3 100644
--- a/include/configs/ti_armv7_common.h
+++ b/include/configs/ti_armv7_common.h
@@ -127,7 +127,11 @@
  * we are on so we do not need to rely on the command prompt. We set a
  * console baudrate of 115200 and use the default baud rate table.
  */
-#define CONFIG_SYS_MALLOC_LEN		(16 << 20)
+#ifdef CONFIG_DFU_MMC
+#define CONFIG_SYS_MALLOC_LEN	((16 << 20) + CONFIG_SYS_DFU_DATA_BUF_SIZE)
+#else
+#define CONFIG_SYS_MALLOC_LEN	(16 << 20)
+#endif
 #define CONFIG_SYS_HUSH_PARSER
 #define CONFIG_SYS_PROMPT		"U-Boot# "
 #define CONFIG_SYS_CONSOLE_INFO_QUIET

On Wed, Mar 04, 2015 at 02:01:29PM +0100, Przemyslaw Marczak wrote:
The DFU MMC file buffer, which used to be static, is now allocated by memalign(), so the malloc pool length should also be increased.

Signed-off-by: Przemyslaw Marczak <p.marczak@samsung.com>
Cc: Tom Rini <trini@konsulko.com>
Applied to u-boot/master, thanks!

The DFU MMC file buffer, which used to be static, is now allocated by memalign(), so the malloc pool length should also be increased.

Signed-off-by: Przemyslaw Marczak <p.marczak@samsung.com>
Cc: Tom Warren <twarren.nvidia@gmail.com>
---
Changes v6:
- new commit
---
 include/configs/tegra-common.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/configs/tegra-common.h b/include/configs/tegra-common.h
index 005fc6a..3e86f6c 100644
--- a/include/configs/tegra-common.h
+++ b/include/configs/tegra-common.h
@@ -36,7 +36,12 @@
 /*
  * Size of malloc() pool
  */
+#ifdef CONFIG_DFU_MMC
+#define CONFIG_SYS_MALLOC_LEN		((4 << 20) + \
+					CONFIG_SYS_DFU_DATA_BUF_SIZE)
+#else
 #define CONFIG_SYS_MALLOC_LEN		(4 << 20)	/* 4MB */
+#endif

 #define CONFIG_SYS_NONCACHED_MEMORY	(1 << 20)	/* 1 MiB */

On Wed, Mar 04, 2015 at 02:01:30PM +0100, Przemyslaw Marczak wrote:
The DFU MMC file buffer, which used to be static, is now allocated by memalign(), so the malloc pool length should also be increased.

Signed-off-by: Przemyslaw Marczak <p.marczak@samsung.com>
Cc: Tom Warren <twarren.nvidia@gmail.com>
Applied to u-boot/master, thanks!
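
The reason these board configs need a larger malloc pool is that the DFU buffer moved from .bss into the heap. A rough hosted-C sketch of that pattern follows. It is not the code from drivers/dfu/dfu_mmc.c: all names are hypothetical, the 8 MiB size and 64-byte alignment are illustrative assumptions, and C11 aligned_alloc() stands in for the memalign() call mentioned in the commit message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_BUF_ALIGN	64u		/* assumed alignment, illustration only */
#define DEMO_BUF_SIZE	(8u << 20)	/* assumed 8 MiB buffer */

/*
 * Only a pointer lives in .bss now; a static array of DEMO_BUF_SIZE
 * bytes would have to be cleared on every boot, used or not.
 */
static unsigned char *demo_buf;

/* Allocate the buffer on first use and hand back the same one afterwards. */
unsigned char *demo_get_buf(void)
{
	if (!demo_buf)
		demo_buf = aligned_alloc(DEMO_BUF_ALIGN, DEMO_BUF_SIZE);
	return demo_buf;
}

/* Release the buffer so that the heap space can be reused. */
void demo_free_buf(void)
{
	free(demo_buf);
	demo_buf = NULL;
}

/* Helper for checking the alignment of a returned pointer. */
int demo_buf_is_aligned(const void *p)
{
	return ((uintptr_t)p % DEMO_BUF_ALIGN) == 0;
}
```

Because the allocation now comes out of the malloc pool, the pool has to be at least as large as the buffer plus whatever else the board allocates, which is what the CONFIG_SYS_MALLOC_LEN changes in these patches account for.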
participants (7)
- Lukasz Majewski
- Masahiro Yamada
- Michal Simek
- Przemyslaw Marczak
- Simon Glass
- Stephen Warren
- Tom Rini