[U-Boot] [PATCH] initcall: Move to inline function

The board_r init function was complaining that we are looping through an array, calling all our tiny init stubs sequentially via indirect function calls (which can't be speculated, so they are slow).
The solution to that is pretty easy though. All we need to do is inline the function that loops through the functions and the compiler will automatically convert almost all indirect calls into direct inlined code.
With this patch, the overall code size drops (by 40 bytes on riscv64) and boot time should become measurably faster for every target.
Signed-off-by: Alexander Graf agraf@suse.de
---
 common/board_r.c   |  5 +----
 include/initcall.h | 35 ++++++++++++++++++++++++++++++++++-
 lib/Makefile       |  1 -
 lib/initcall.c     | 39 ---------------------------------------
 4 files changed, 35 insertions(+), 45 deletions(-)
 delete mode 100644 lib/initcall.c
diff --git a/common/board_r.c b/common/board_r.c
index 5f3d27aa9f..472987d5d5 100644
--- a/common/board_r.c
+++ b/common/board_r.c
@@ -633,10 +633,7 @@ static int run_main_loop(void)
 }
 
 /*
- * Over time we hope to remove these functions with code fragments and
- * stub functions, and instead call the relevant function directly.
- *
- * We also hope to remove most of the driver-related init and do it if/when
+ * We hope to remove most of the driver-related init and do it if/when
  * the driver is later used.
  *
  * TODO: perhaps reset the watchdog in the initcall function after each call?
diff --git a/include/initcall.h b/include/initcall.h
index 01f3f2833f..3ac01aa2cd 100644
--- a/include/initcall.h
+++ b/include/initcall.h
@@ -8,6 +8,39 @@
 
 typedef int (*init_fnc_t)(void);
 
-int initcall_run_list(const init_fnc_t init_sequence[]);
+#include <common.h>
+#include <initcall.h>
+#include <efi.h>
+
+DECLARE_GLOBAL_DATA_PTR;
+
+static inline int initcall_run_list(const init_fnc_t init_sequence[])
+{
+	const init_fnc_t *init_fnc_ptr;
+
+	for (init_fnc_ptr = init_sequence; *init_fnc_ptr; ++init_fnc_ptr) {
+		unsigned long reloc_ofs = 0;
+		int ret;
+
+		if (gd->flags & GD_FLG_RELOC)
+			reloc_ofs = gd->reloc_off;
+#ifdef CONFIG_EFI_APP
+		reloc_ofs = (unsigned long)image_base;
+#endif
+		debug("initcall: %p", (char *)*init_fnc_ptr - reloc_ofs);
+		if (gd->flags & GD_FLG_RELOC)
+			debug(" (relocated to %p)\n", (char *)*init_fnc_ptr);
+		else
+			debug("\n");
+		ret = (*init_fnc_ptr)();
+		if (ret) {
+			printf("initcall sequence %p failed at call %p (err=%d)\n",
+			       init_sequence,
+			       (char *)*init_fnc_ptr - reloc_ofs, ret);
+			return -1;
+		}
+	}
+	return 0;
+}
 
 #endif
diff --git a/lib/Makefile b/lib/Makefile
index 61d7ff0678..47829bfed5 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -35,7 +35,6 @@ obj-$(CONFIG_TEST_FDTDEC) += fdtdec_test.o
 obj-$(CONFIG_GZIP_COMPRESSED) += gzip.o
 obj-$(CONFIG_GENERATE_SMBIOS_TABLE) += smbios.o
 obj-$(CONFIG_IMAGE_SPARSE) += image-sparse.o
-obj-y += initcall.o
 obj-y += ldiv.o
 obj-$(CONFIG_MD5) += md5.o
 obj-y += net_utils.o
diff --git a/lib/initcall.c b/lib/initcall.c
deleted file mode 100644
index 8f1dac68e4..0000000000
--- a/lib/initcall.c
+++ /dev/null
@@ -1,39 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0+
-/*
- * Copyright (c) 2013 The Chromium OS Authors.
- */
-
-#include <common.h>
-#include <initcall.h>
-#include <efi.h>
-
-DECLARE_GLOBAL_DATA_PTR;
-
-int initcall_run_list(const init_fnc_t init_sequence[])
-{
-	const init_fnc_t *init_fnc_ptr;
-
-	for (init_fnc_ptr = init_sequence; *init_fnc_ptr; ++init_fnc_ptr) {
-		unsigned long reloc_ofs = 0;
-		int ret;
-
-		if (gd->flags & GD_FLG_RELOC)
-			reloc_ofs = gd->reloc_off;
-#ifdef CONFIG_EFI_APP
-		reloc_ofs = (unsigned long)image_base;
-#endif
-		debug("initcall: %p", (char *)*init_fnc_ptr - reloc_ofs);
-		if (gd->flags & GD_FLG_RELOC)
-			debug(" (relocated to %p)\n", (char *)*init_fnc_ptr);
-		else
-			debug("\n");
-		ret = (*init_fnc_ptr)();
-		if (ret) {
-			printf("initcall sequence %p failed at call %p (err=%d)\n",
-			       init_sequence,
-			       (char *)*init_fnc_ptr - reloc_ofs, ret);
-			return -1;
-		}
-	}
-	return 0;
-}
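For readers following the patch, here is a minimal, self-contained sketch of the pattern it relies on: a `static inline` runner walking a const, NULL-terminated table of `init_fnc_t`-style pointers. The stubs (`init_a`, `init_b`, `init_fail`), the `run_init_list()` helper and the call log are hypothetical, and the relocation offsets and `debug()` output of the real `initcall_run_list()` are left out. Whether the indirect calls actually turn into direct ones depends on the compiler and optimization level; what the sketch does mirror is the behaviour: stop at the first non-zero return, report it, and return -1.

```c
#include <stddef.h>
#include <stdio.h>

/* Same shape as U-Boot's init_fnc_t: 0 means success. */
typedef int (*init_fnc_t)(void);

/* Hypothetical init stubs that record the order in which they ran. */
static int init_log[8];
static int init_count;

static int init_a(void) { init_log[init_count++] = 1; return 0; }
static int init_b(void) { init_log[init_count++] = 2; return 0; }
static int init_fail(void) { init_log[init_count++] = 3; return -5; }

/*
 * Simplified stand-in for the header's static inline initcall_run_list().
 * Since the body is visible wherever it is called and the table is const,
 * an optimizing compiler may replace each (*fn)() with a direct call.
 */
static inline int run_init_list(const init_fnc_t seq[])
{
	const init_fnc_t *fn;

	for (fn = seq; *fn; ++fn) {
		int ret = (*fn)();

		if (ret) {
			/* Report the index of the failing entry, then stop. */
			printf("init sequence failed at entry %d (err=%d)\n",
			       (int)(fn - seq), ret);
			return -1;
		}
	}
	return 0;
}

/* NULL-terminated tables, as the runner expects. */
static const init_fnc_t ok_sequence[] = { init_a, init_b, NULL };
static const init_fnc_t bad_sequence[] = { init_a, init_fail, init_b, NULL };
```

Calling `run_init_list(bad_sequence)` runs `init_a` and `init_fail`, prints a failure for entry 1 with err=-5, and returns -1 without ever reaching the trailing `init_b`.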

Hi Alex,
On Thu, 31 Jan 2019 at 08:06, Alexander Graf agraf@suse.de wrote:
> The board_r init function was complaining that we are looping through an array, calling all our tiny init stubs sequentially via indirect function calls (which can't be speculated, so they are slow).
Is this a compiler warning? Could you let me know what this is?
> The solution to that is pretty easy though. All we need to do is inline the function that loops through the functions and the compiler will automatically convert almost all indirect calls into direct inlined code.
You mean it calls the functions one after the other without a function-table array?
> With this patch, the overall code size drops (by 40 bytes on riscv64) and boot time should become measurably faster for every target.
Regards, Simon

On 02.02.2019 at 15:13, Simon Glass sjg@chromium.org wrote:
> Hi Alex,
> On Thu, 31 Jan 2019 at 08:06, Alexander Graf agraf@suse.de wrote:
>> The board_r init function was complaining that we are looping through an array, calling all our tiny init stubs sequentially via indirect function calls (which can't be speculated, so they are slow).
> Is this a compiler warning? Could you let me know what this is?
It's the code comment I'm removing with this patch :).
>> The solution to that is pretty easy though. All we need to do is inline the function that loops through the functions and the compiler will automatically convert almost all indirect calls into direct inlined code.
> You mean it calls the functions one after the other without a function-table array?
Exactly. Magical, eh? It even inlines them!
Alex
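Simon's question and Alex's answer above are about what the optimizer does with the inlined loop. As a rough illustration (hypothetical stub names; the `initr_` prefix only echoes board_r.c's naming, and this is not real compiler output), the table-driven `run_table(seq_abc)` below behaves the same as the hand-written `run_unrolled()`. The latter is approximately the shape an optimizer may reduce the former to once the loop is inlined and the table contents are known: a straight sequence of direct calls with no array walk. Actual generated code depends on compiler, target and flags.

```c
#include <stddef.h>

typedef int (*init_fnc_t)(void);

/* Hypothetical stubs; each appends a letter so call order can be checked. */
static char trace[8];
static int pos;

static int initr_a(void) { trace[pos++] = 'a'; return 0; }
static int initr_b(void) { trace[pos++] = 'b'; return 0; }
static int initr_c(void) { trace[pos++] = 'c'; return 0; }

static const init_fnc_t seq_abc[] = { initr_a, initr_b, initr_c, NULL };

/* Table-driven form, as written in the source. */
static inline int run_table(const init_fnc_t seq[])
{
	const init_fnc_t *fn;

	for (fn = seq; *fn; ++fn)
		if ((*fn)())
			return -1;
	return 0;
}

/*
 * Hand-written equivalent of run_table(seq_abc): the loop is fully
 * unrolled and each entry becomes a direct call, which the compiler
 * is then free to inline as well.
 */
static int run_unrolled(void)
{
	if (initr_a())
		return -1;
	if (initr_b())
		return -1;
	if (initr_c())
		return -1;
	return 0;
}
```

Both functions call `initr_a`, `initr_b` and `initr_c` in that order and return 0; the difference is only in how much the compiler has to look up at run time.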

Hi Alex,
On Sat, 2 Feb 2019 at 09:07, Alexander Graf agraf@suse.de wrote:
>> You mean it calls the functions one after the other without a function-table array?
> Exactly. Magical, eh? It even inlines them!
Yes it is surprising. I am also surprised that it reduces code size, but I suppose that is why it does it. Presumably the inlining is what does that.
But what happens if we #define DEBUG?
Regards, Simon

On 08.02.2019 at 05:11, Simon Glass sjg@chromium.org wrote:
> Yes it is surprising. I am also surprised that it reduces code size, but I suppose that is why it does it. Presumably the inlining is what does that.
Yes, of course. With separate object files, the compiler can not inline anything at all, because it does not know how the function pointers get used.
The alternative to this *might* be LTO, which we could think about as well. It should help reduce indirection and code size overall. But I don't know how well gold works with the linker scripts we have.
> But what happens if we #define DEBUG?
Define it where? ;)
Alex
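Alex's point about separate object files can be approximated in a single file. In the hypothetical sketch below, `run_visible()` passes a table the compiler can see, so after inlining it knows every call target. `run_opaque()` reads the same table address through a `volatile` pointer, which hides the value from the optimizer much as compiling the loop into a separate `lib/initcall.o` did, so those calls generally have to stay indirect. Both behave identically; only what the compiler can prove differs, and the codegen remarks are expectations for an optimizing build rather than guarantees. All names are illustrative.

```c
#include <stddef.h>

typedef int (*init_fnc_t)(void);

/* Hypothetical stubs counting how often they are called. */
static int hits;

static int init_x(void) { hits++; return 0; }
static int init_y(void) { hits++; return 0; }

static const init_fnc_t seq_xy[] = { init_x, init_y, NULL };

/* One runner for both cases; only the caller's visibility differs. */
static inline int run_list(const init_fnc_t seq[])
{
	const init_fnc_t *fn;

	for (fn = seq; *fn; ++fn)
		if ((*fn)())
			return -1;
	return 0;
}

/* Table known at compile time: calls can be resolved to init_x/init_y. */
static int run_visible(void)
{
	return run_list(seq_xy);
}

/*
 * Approximates the old out-of-line lib/initcall.c: the compiler must
 * reload the volatile pointer and cannot assume which table it holds,
 * so it cannot know the call targets either.
 */
static int run_opaque(void)
{
	const init_fnc_t *volatile hidden = seq_xy;

	return run_list(hidden);
}
```

LTO, as Alex suggests, would attack the same problem from the linker side by giving the optimizer visibility across object files instead of moving code into headers.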

Hi Alex,
On Fri, 8 Feb 2019 at 01:58, Alexander Graf agraf@suse.de wrote:
> Yes, of course. With separate object files, the compiler can not inline anything at all, because it does not know how the function pointers get used.
> The alternative to this *might* be LTO, which we could think about as well. It should help reduce indirection and code size overall. But I don't know how well gold works with the linker scripts we have.
Hmm don't we have that? We should.
>> But what happens if we #define DEBUG?
> Define it where? ;)
Yes exactly. At present you can turn this on globally by putting it at the top of initcall.c. I think it is worth a comment on how to enable debugging.
Does the code size blow out horribly if debugging is enabled?
(and hello from Munich!)
Regards, Simon
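On Simon's request for a note on enabling debugging: with the loop moved into a header, its `debug()` calls are compiled as part of each file that calls `initcall_run_list()`, so a `#define DEBUG` would presumably have to go at the top of that file (for example `common/board_r.c`), before its includes, rather than into the now-deleted `lib/initcall.c`. The sketch below models this with a simplified, hypothetical `debug()` macro plus a `debug_lines` counter; U-Boot's real macro differs in detail. It also illustrates that each calling file gets its own inlined copy of the loop.

```c
/* Assumption: in U-Boot this would sit at the top of the calling .c file. */
#define DEBUG

#include <stddef.h>
#include <stdio.h>

#ifdef DEBUG
#define DEBUG_ENABLED 1
#else
#define DEBUG_ENABLED 0
#endif

/* Counts emitted debug lines so the effect can be checked. */
static int debug_lines;

/*
 * Hypothetical, simplified debug(): always compiled, so the format string
 * is still type-checked, but dead code when DEBUG_ENABLED is 0.
 */
#define debug(...)                                   \
	do {                                         \
		if (DEBUG_ENABLED) {                 \
			debug_lines++;               \
			printf(__VA_ARGS__);         \
		}                                    \
	} while (0)

typedef int (*init_fnc_t)(void);

static int init_ok(void) { return 0; }

/* Inline runner: its debug() follows this file's DEBUG setting. */
static inline int run_list_dbg(const init_fnc_t seq[])
{
	const init_fnc_t *fn;

	for (fn = seq; *fn; ++fn) {
		debug("initcall: entry %d\n", (int)(fn - seq));
		if ((*fn)())
			return -1;
	}
	return 0;
}

static const init_fnc_t dbg_sequence[] = { init_ok, init_ok, NULL };
```

With `DEBUG` defined, `run_list_dbg(dbg_sequence)` prints one line per entry; removing the `#define DEBUG` line silences both messages and leaves `debug_lines` at 0.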

On Fri, Feb 08, 2019 at 08:58:18AM +0100, Alexander Graf wrote:
> Yes, of course. With separate object files, the compiler can not inline anything at all, because it does not know how the function pointers get used.
> The alternative to this *might* be LTO, which we could think about as well. It should help reduce indirection and code size overall. But I don't know how well gold works with the linker scripts we have.
I don't object to LTO but there's a LOT of groundwork before it's an option. I think in addition to switching to gcc for ld, looking over an old git stash from when I tried this last, we need to globally switch to the "clang" method of keeping track of gd rather than how we do it today.

On 10.02.2019 at 14:16, Tom Rini trini@konsulko.com wrote:
> I don't object to LTO but there's a LOT of groundwork before it's an option. I think in addition to switching to gcc for ld, looking over an old git stash from when I tried this last, we need to globally switch to the "clang" method of keeping track of gd rather than how we do it today.
Sounds like x86_64 could be an easy target for experimentation then? :)
Alex
> -- Tom

On Sun, Feb 10, 2019 at 02:48:32PM +0100, Alexander Graf wrote:
>> I don't object to LTO but there's a LOT of groundwork before it's an option. I think in addition to switching to gcc for ld, looking over an old git stash from when I tried this last, we need to globally switch to the "clang" method of keeping track of gd rather than how we do it today.
> Sounds like x86_64 could be an easy target for experimentation then? :)
Could be? I tried ARM for some reason first.

On Thu, Jan 31, 2019 at 04:06:23PM +0100, Alexander Graf wrote:
Applied to u-boot/master, thanks!
Participants (3): Alexander Graf, Simon Glass, Tom Rini