
On Wed, Mar 03, 2021 at 05:41:57PM +0100, Marek Behun wrote:
On Wed, 3 Mar 2021 11:11:59 -0500 Tom Rini trini@konsulko.com wrote:
On Wed, Mar 03, 2021 at 05:11:59AM +0100, Marek Behún wrote:
Hello,
I have managed to add support for building U-Boot with LTO (with GCC) in a rather sane way (in LOC changed).
This series and its follow-ups will also be available at https://github.com/elkablo/u-boot branch lto.
I have tested these builds on Turris Omnia, Turris MOX and on Nokia N900 (via the test/nokia_rx51_test.sh script). For other tests I have created a pull request on GitHub to trigger CI (https://github.com/u-boot/u-boot/pull/57). For some reason it is waiting now; maybe Azure is not working or something.
As we're on the free tier with Azure, it sometimes just queues us up for a long time; this job finally started running recently.
My tests on Omnia and MOX show that U-Boot boots successfully, and basic commands seem to work. But of course something broken due to LTO may be found later.
So for all of you that are interested and have an ARM board, please test this on your boards by enabling the CONFIG_LTO option. Also please report code size reductions. (Chris Packham reports an error related to jobserver, so if `make -jN` produces an error, please try without the `-jN` flag.)
I have only tested with gcc-10. There are still some warnings printed, such as "bfd plugin: invalid symbol type found", but these don't seem to matter. I will look into this later.
Here are some results showing how much the code size was reduced. Note that the SPL binary seems to gain a larger reduction (15.4 % on average) than the main binary (4.5 % on average).
I guess this is because of how drivers are written. The optimizer cannot know which code paths won't be used, since it does not see the device tree. Maybe this could be somehow integrated with Simon's work on OF_PLATDATA_INST in the future, to make the compiler optimize out unused code paths in drivers by understanding the device tree.
                    u-boot.bin           u-boot-spl.bin
clearfog            4.34 %  19.0 KB      13.55 %  16.8 KB
controlcenterdc     4.79 %  24.2 KB      16.27 %  21.9 KB
db-88f6820-amc      4.23 %  25.0 KB      16.17 %  22.9 KB
db-88f6820-gp       4.42 %  22.1 KB      17.00 %  23.8 KB
helios4             4.32 %  18.9 KB      13.70 %  16.8 KB
nokia_rx51          6.11 %  16.5 KB      -
turris_mox          4.17 %  31.8 KB      -
turris_omnia        4.32 %  30.2 KB      14.91 %  16.6 KB
x530                3.93 %  30.0 KB      16.26 %  23.4 KB
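For anyone who wants to cross-check the quoted averages or add their own board's numbers, here is a minimal Python sketch. It is not part of the series; the dictionary and helper names are made up for illustration, and the percentages are simply copied from the results above, with None standing in where no SPL figure is listed.

```python
# Percent size reductions copied from the results above:
# board -> (u-boot.bin %, u-boot-spl.bin % or None if not listed).
# The names "reductions" and "mean_of" are illustrative only.
reductions = {
    "clearfog":        (4.34, 13.55),
    "controlcenterdc": (4.79, 16.27),
    "db-88f6820-amc":  (4.23, 16.17),
    "db-88f6820-gp":   (4.42, 17.00),
    "helios4":         (4.32, 13.70),
    "nokia_rx51":      (6.11, None),
    "turris_mox":      (4.17, None),
    "turris_omnia":    (4.32, 14.91),
    "x530":            (3.93, 16.26),
}


def mean_of(values):
    """Arithmetic mean of the entries that are not None."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present)


main_avg = mean_of(main for main, _ in reductions.values())
spl_avg = mean_of(spl for _, spl in reductions.values())

print(f"u-boot.bin:     {main_avg:.1f} % on average")
print(f"u-boot-spl.bin: {spl_avg:.1f} % on average")
```

Boards without an SPL figure are skipped for the SPL average rather than counted as zero, which is what reproduces the 15.4 % quoted above.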
Marek
Thanks for starting on this! It's been on my list for a long time, especially since it does give overall better reduction than function/data-sections/discard. It does seem like clang fails to build with this series. One thing I want to try locally, and I'll fire off the results once I do it, is moving to LTO by default for ARM.
Yes, it seems clang is the last thing I need to look at. I did not even try, really; my first priority was gcc. I will look into this tomorrow.
All in all I am happy with this since it seems to be running for several different boards without issue.
If you want to enable LTO by default for ARM, we probably need to determine which gcc version should be the minimum for this, because older gcc versions may have problems with LTO. What is the current minimal version of gcc for U-Boot?
So, as I start testing things locally with two additional changes (1. LTO by default, 2. no -ffunction-sections/-fdata-sections with LTO), we see: https://gist.github.com/trini/350ab850c42293563228b8d68a1bb89a as the detailed size reduction. This also shows that with LTO we want to turn off -ffunction-sections etc., as it's no longer useful.
Marek