[PATCH v3 1/2] Makefile: Allow LTO to be disabled for a build

LTO (Link-Time Optimisation) is a very useful feature which can significantly reduce the size of U-Boot binaries. So far it has been made available for selected ARM boards and sandbox.
However, incremental builds are much slower when LTO is used. For example, an incremental build of sandbox takes 2.1 seconds on my machine, but 6.7 seconds with LTO enabled.
Add a NO_LTO parameter to the build, similar to NO_SDL, so it can be disabled during development if needed, for faster builds.
Add some documentation about LTO while we are here.
Signed-off-by: Simon Glass <sjg@chromium.org>
---

Changes in v3:
- Rework to operate like the NO_SDL flag

 Makefile                           | 17 ++++++++++++-----
 arch/arm/config.mk                 |  4 ++--
 arch/arm/include/asm/global_data.h |  2 +-
 doc/build/gcc.rst                  | 17 +++++++++++++++++
 scripts/Makefile.spl               |  2 +-
 5 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/Makefile b/Makefile
index ff25f929748..348e2130c47 100644
--- a/Makefile
+++ b/Makefile
@@ -643,6 +643,13 @@ export CFLAGS_EFI # Compiler flags to add when building EFI app
 export CFLAGS_NON_EFI # Compiler flags to remove when building EFI app
 export EFI_TARGET # binutils target if EFI is natively supported

+export LTO_ENABLE
+
+# This is y if LTO is enabled for this build. See NO_LTO=1 to disable LTO
+ifeq ($(NO_LTO),)
+LTO_ENABLE=$(if $(CONFIG_LTO),y)
+endif
+
 # If board code explicitly specified LDSCRIPT or CONFIG_SYS_LDSCRIPT, use
 # that (or fail if absent). Otherwise, search for a linker script in a
 # standard location.
@@ -702,16 +709,16 @@ endif
 LTO_CFLAGS :=
 LTO_FINAL_LDFLAGS :=
 export LTO_CFLAGS LTO_FINAL_LDFLAGS
-ifdef CONFIG_LTO
+ifeq ($(LTO_ENABLE),y)
 	ifeq ($(cc-name),clang)
-		LTO_CFLAGS += -flto
+		LTO_CFLAGS += -DLTO_ENABLE -flto
 		LTO_FINAL_LDFLAGS += -flto

 		AR = $(shell $(CC) -print-prog-name=llvm-ar)
 		NM = $(shell $(CC) -print-prog-name=llvm-nm)
 	else
 		NPROC := $(shell nproc 2>/dev/null || echo 1)
-		LTO_CFLAGS += -flto=$(NPROC)
+		LTO_CFLAGS += -DLTO_ENABLE -flto=$(NPROC)
 		LTO_FINAL_LDFLAGS += -fuse-linker-plugin -flto=$(NPROC)

 		# use plugin aware tools
@@ -1760,7 +1767,7 @@ ARCH_POSTLINK := $(wildcard $(srctree)/arch/$(ARCH)/Makefile.postlink)

 # Generate linker list symbols references to force compiler to not optimize
 # them away when compiling with LTO
-ifdef CONFIG_LTO
+ifeq ($(LTO_ENABLE),y)
 u-boot-keep-syms-lto := keep-syms-lto.o
 u-boot-keep-syms-lto_c := $(patsubst %.o,%.c,$(u-boot-keep-syms-lto))

@@ -1782,7 +1789,7 @@ endif

 # Rule to link u-boot
 # May be overridden by arch/$(ARCH)/config.mk
-ifdef CONFIG_LTO
+ifeq ($(LTO_ENABLE),y)
 quiet_cmd_u-boot__ ?= LTO $@
       cmd_u-boot__ ?= \
 		$(CC) -nostdlib -nostartfiles \
diff --git a/arch/arm/config.mk b/arch/arm/config.mk
index b3548ce2439..2065438d053 100644
--- a/arch/arm/config.mk
+++ b/arch/arm/config.mk
@@ -15,11 +15,11 @@ CFLAGS_NON_EFI := -fno-pic -ffixed-r9 -ffunction-sections -fdata-sections \
 		     -fstack-protector-strong
 CFLAGS_EFI := -fpic -fshort-wchar

-ifneq ($(CONFIG_LTO)$(CONFIG_USE_PRIVATE_LIBGCC),yy)
+ifneq ($(LTO_ENABLE)$(CONFIG_USE_PRIVATE_LIBGCC),yy)
 LDFLAGS_FINAL += --gc-sections
 endif

-ifndef CONFIG_LTO
+ifneq ($(LTO_ENABLE),y)
 PLATFORM_RELFLAGS += -ffunction-sections -fdata-sections
 endif

diff --git a/arch/arm/include/asm/global_data.h b/arch/arm/include/asm/global_data.h
index 6ee2a767615..cd6112dfcda 100644
--- a/arch/arm/include/asm/global_data.h
+++ b/arch/arm/include/asm/global_data.h
@@ -101,7 +101,7 @@ struct arch_global_data {

 #include <asm-generic/global_data.h>

-#if defined(__clang__) || defined(CONFIG_LTO)
+#if defined(__clang__) || defined(LTO_ENABLE)

 #define DECLARE_GLOBAL_DATA_PTR
 #define gd	get_gd()
diff --git a/doc/build/gcc.rst b/doc/build/gcc.rst
index ee544ad87ee..a71f860a487 100644
--- a/doc/build/gcc.rst
+++ b/doc/build/gcc.rst
@@ -152,6 +152,23 @@ of dtc is new enough. It also makes sure that pylibfdt is present, if needed
 Note that the :doc:`tools` are always built with the included version of libfdt
 so it is not possible to build U-Boot tools with a system libfdt, at present.

+Link-time optimisation (LTO)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+U-Boot supports link-time optimisation which can reduce the size of the final
+U-Boot binaries, particularly with SPL.
+
+At present this can be enabled by ARM boards by adding `CONFIG_LTO=y` into the
+defconfig file. Other architectures are not supported. LTO is enabled by default
+for sandbox.
+
+This does incur a link-time penalty of several seconds. For faster incremental
+builds during development, you can disable it by setting `NO_LTO` to `1`.
+
+.. code-block:: bash
+
+   NO_LTO=1 make
+
 Other build targets
 ~~~~~~~~~~~~~~~~~~~
diff --git a/scripts/Makefile.spl b/scripts/Makefile.spl
index 1cfb8115e31..415451431b9 100644
--- a/scripts/Makefile.spl
+++ b/scripts/Makefile.spl
@@ -488,7 +488,7 @@ endif

 # Rule to link u-boot-spl
 # May be overridden by arch/$(ARCH)/config.mk
-ifdef CONFIG_LTO
+ifeq ($(LTO_ENABLE),y)
 quiet_cmd_u-boot-spl ?= LTO $@
       cmd_u-boot-spl ?= \
 		( \

Check that sandbox builds and runs tests OK with LTO disabled.
Signed-off-by: Simon Glass <sjg@chromium.org>
---

(no changes since v1)

 .azure-pipelines.yml | 4 ++++
 .gitlab-ci.yml       | 7 +++++++
 2 files changed, 11 insertions(+)

diff --git a/.azure-pipelines.yml b/.azure-pipelines.yml
index bc2b437bd99..e542a45dfe0 100644
--- a/.azure-pipelines.yml
+++ b/.azure-pipelines.yml
@@ -243,6 +243,9 @@ stages:
         sandbox_clang:
           TEST_PY_BD: "sandbox"
           OVERRIDE: "-O clang-13"
+        sandbox_nolto:
+          TEST_PY_BD: "sandbox"
+          BUILD_ENV: "NO_LTO=1"
         sandbox_spl:
           TEST_PY_BD: "sandbox_spl"
           TEST_PY_TEST_SPEC: "test_ofplatdata or test_handoff or test_spl"
@@ -354,6 +357,7 @@ stages:
           export TEST_PY_ID="${TEST_PY_ID}"
           export TEST_PY_TEST_SPEC="${TEST_PY_TEST_SPEC}"
           export OVERRIDE="${OVERRIDE}"
+          export BUILD_ENV="${BUILD_ENV}"
           EOF
           cat << "EOF" >> test.sh
           # the below corresponds to .gitlab-ci.yml "before_script"
diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index f9cd4175079..80b5d29f953 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -33,6 +33,7 @@ stages:
   script:
     # If we've been asked to use clang only do one configuration.
     - export UBOOT_TRAVIS_BUILD_DIR=/tmp/${TEST_PY_BD}
+    - echo BUILD_ENV ${BUILD_ENV}
     - tools/buildman/buildman -o ${UBOOT_TRAVIS_BUILD_DIR} -w -E -W -e
         --board ${TEST_PY_BD} ${OVERRIDE}
     - cp ~/grub_x86.efi $UBOOT_TRAVIS_BUILD_DIR/
@@ -248,6 +249,12 @@ sandbox with clang test.py:
     OVERRIDE: "-O clang-13"
   <<: *buildman_and_testpy_dfn

+sandbox without LTO test.py:
+  variables:
+    TEST_PY_BD: "sandbox"
+    BUILD_ENV: "NO_LTO=1"
+  <<: *buildman_and_testpy_dfn
+
 sandbox_spl test.py:
   variables:
     TEST_PY_BD: "sandbox_spl"

Hi Tom,
On Wed, 3 Aug 2022 at 12:13, Simon Glass <sjg@chromium.org> wrote:
> Check that sandbox builds and runs tests OK with LTO disabled.
Another little snippet - it seems that CI performance for the world builds has dropped 20% - 30% with LTO enabled. With tui I used to see just over 1 build a second; now it is running at about 0.78 and kaki has gone from close to 0.5 to 0.29.
I wonder if, in CI, we should just build a selection of boards with LTO, switching the world builds to non-LTO? The problem is that we may hit code size limits in SPL.
Regards, Simon

On Sun, Aug 07, 2022 at 09:48:01AM -0600, Simon Glass wrote:
> Another little snippet - it seems that CI performance for the world builds has dropped 20% - 30% with LTO enabled. With tui I used to see just over 1 build a second; now it is running at about 0.78 and kaki has gone from close to 0.5 to 0.29.
>
> I wonder if, in CI, we should just build a selection of boards with LTO, switching the world builds to non-LTO? The problem is that we may hit code size limits in SPL.
I'm not sure how you're measuring this. We don't enable LTO by default anywhere other than sandbox, it's opt-in for a few platforms still.

Hi Tom,
On Mon, 8 Aug 2022 at 10:09, Tom Rini <trini@konsulko.com> wrote:
> I'm not sure how you're measuring this. We don't enable LTO by default anywhere other than sandbox, it's opt-in for a few platforms still.
I just assumed that is why, since it slows incremental builds down so much. But perhaps something else is going on? I can try to bisect it, I suppose, if I take tui offline for a bit.
Regards, Simon

On Tue, Aug 09, 2022 at 01:51:13PM -0600, Simon Glass wrote:
> I just assumed that is why, since it slows incremental builds down so much. But perhaps something else is going on? I can try to bisect it, I suppose, if I take tui offline for a bit.
If you think there's something consistent here, yeah, it'd be good to track it down. For me, complete runs take anywhere from 45min to 1h20min, with a few longer outliers.

On Wed, Aug 03, 2022 at 12:13:09PM -0600, Simon Glass wrote:
> Check that sandbox builds and runs tests OK with LTO disabled.
Applied to u-boot/next, thanks!

Hi Tom,
On Wed, 3 Aug 2022 at 12:13, Simon Glass <sjg@chromium.org> wrote:
> LTO (Link-Time Optimisation) is a very useful feature which can significantly reduce the size of U-Boot binaries. So far it has been made available for selected ARM boards and sandbox.
Any word on this patch and the next one, please?
Regards, Simon

On Wed, Aug 03, 2022 at 12:13:08PM -0600, Simon Glass wrote:
> LTO (Link-Time Optimisation) is a very useful feature which can significantly reduce the size of U-Boot binaries. So far it has been made available for selected ARM boards and sandbox.
Applied to u-boot/next, thanks!
participants (2)
- Simon Glass
- Tom Rini