
On Thursday 09 February 2012 05:14 PM, Wolfgang Denk wrote:
Dear Aneesh V,
In message <4F33614D.8020904@ti.com> you wrote:
What exactly are you talking about here that "was adding a considerable delay" - the memory copy ? Are you really sure about that?
I didn't measure it part by part, but removing relocation gave a noticeable speed-up. This platform is several orders of magnitude slower than the real silicon, so that should not be surprising.
Could you please start using exact terminology, so we understand what you actually refer to? Did you really remove the _relocation_, i.e. link for a static address, or did you just skip the memory copy? Note that the latter should be a no-op anyway if you just load the image to the resulting target address.
I defeated relocation by passing relocate_code() the same address that the image is linked to. I patched arch/arm/lib/board.c for this and fixed up relocate_code() to handle this special case correctly. So, relocate_code() now does only the .bss init.
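For readers following the thread, the special case described above can be sketched roughly as follows. This is only an illustration in plain C, not the actual patch and not U-Boot's relocate_code(), which is architecture-specific code on ARM. The struct image_layout type and the relocate_sketch() function are hypothetical names invented for this example, and the address fix-ups that a real relocation performs are left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of an image; not a real U-Boot structure. */
struct image_layout {
    uint8_t *link_addr;   /* address the image is linked to and runs from */
    size_t   image_size;  /* bytes of text + data that a relocation would copy */
    size_t   bss_offset;  /* offset of .bss from the start of the image */
    size_t   bss_size;    /* bytes of .bss to clear */
};

/*
 * Hypothetical stand-in for the behaviour discussed in this thread.
 * Returns 1 if the image was copied, 0 if the copy was skipped.
 */
static int relocate_sketch(const struct image_layout *img, uint8_t *dest)
{
    int copied = 0;

    if (dest != img->link_addr) {
        /* Normal case: move the image to its new home. A real
         * relocation would also fix up absolute addresses here. */
        memmove(dest, img->link_addr, img->image_size);
        copied = 1;
    }

    /* In both cases .bss must still be cleared at the final location. */
    memset(dest + img->bss_offset, 0, img->bss_size);
    return copied;
}
```

Calling relocate_sketch() with dest equal to link_addr leaves the image bytes untouched and only clears .bss, which is the "does only .bss init" behaviour mentioned above.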
Maybe I should stop arguing now and wait until that framework is a reality.
I am very much convinced that you are tracking down a red herring. It does not really matter if you run the code on real silicon or in an emulation - the relative times will always be the same. Without any detailed timing analysis I simply do not believe that you have really found a hot spot. You focus on it because you found out that it exists and you think it was "not needed" in your configuration - without spending time on real optimization.
Please note that our bootloaders and kernel are customized and scaled down for this environment. For instance, u-boot doesn't load the kernel from the network or from a memory device. The kernel is preloaded into the modeled memory, so u-boot is only used to jump to the kernel. As such, the u-boot run time is now dominated more by pure software work such as relocation, and the argument about relative timing doesn't quite apply.
This is a fundamentally broken approach, and it will remain broken even if new concepts get implemented that may make it easier to skip certain steps of the initialization.
If you are concerned about boot time optimization, you _must_ start with timing measurements. You know where premature optimization leads, don't you?
As I mentioned earlier, boot time is not my key concern. Even on an emulation platform I will probably try SPL Linux boot next time. My key concerns are the other aspects I mentioned, namely avoidable complexity and problems with the debugger.
br, Aneesh