
On 03/06/2017 10:36 PM, Marek Vasut wrote:
On 03/07/2017 05:31 AM, york sun wrote:
On 03/06/2017 07:59 PM, Marek Vasut wrote:
On 03/06/2017 06:02 PM, York Sun wrote:
Early MMU improves performance especially on emulators. However, the early MMU is left enabled after the first stage of SPL boot. Instead of flushing D-cache and dealing with re-enabling MMU for the second stage U-Boot, disabling it for SPL build simplifies the process. The performance penalty is unnoticeable on the real hardware. As of now, SPL boot is not supported by existing emulators. So this should have no impact on emulators.
Signed-off-by: York Sun <york.sun@nxp.com>
This looks stupid. Why don't you just keep it enabled between SPL and U-Boot ? That'd be much more logical and sensible ...
Marek,
We are probably the only ones using early MMU to speed up execution on emulators. I only noticed recently that the cache/MMU wasn't enabled in SPL for most ARMv8 Layerscape SoCs, except LS2080A (which is where we started). During recent debugging, I learned that the break-before-make process should be strictly followed when changing the MMU.
I don't think I understand this part.
Marek,
Let me explain in detail. When we started to port U-Boot to the very first ARMv8 Layerscape SoC, we used a hardware emulator (we still use emulators today) which runs very slowly, around 800 kHz. It takes a long time to see the first message on the console. So we enabled MMU and cache very early to dramatically improve the booting speed. We are talking about booting up U-Boot in minutes vs 20~30 minutes. It improves productivity a lot. Since we are still using emulators for new SoCs, I think we should keep this feature.
That being said, the early MMU table is small and simple, residing in internal SRAM. It is necessary to create a new, complete table after U-Boot relocates to DDR. Switching the MMU table requires flushing the cache, then disabling and re-enabling the MMU at some point to avoid a race condition. I only recently learned that this break-before-make rule should be strictly enforced.
For SPL boot, if the MMU/cache is enabled in the SPL build, the normal boot part still goes through creating an early MMU table, the same steps as booting from NOR flash. We have to deal with changing the early MMU table with the same setup. This is troublesome and unnecessary.
The normal boot part surely creates MMU tables. If the SPL also creates an MMU table and enables the cache, then we need to flush the cache and disable the MMU before the normal boot part begins to run. Instead of dealing with all these steps just for LS2080A, I figure it would be a lot easier to drop it as I proposed in this patch.
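To make the ordering concrete, here is a minimal sketch of one reading of the sequence described above: flush the cache, disable the MMU, drop stale translations, install the new table, and only then turn the MMU back on. It is a plain C simulation, not U-Boot code. The `sim_cpu` structure, the `sim_switch_table()` function, and the step names are all hypothetical and exist only to show the order of operations.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the CPU state; not a U-Boot structure. */
struct sim_cpu {
	bool mmu_on;        /* MMU and D-cache enabled */
	bool dcache_dirty;  /* dirty lines not yet written back */
	bool tlb_stale;     /* TLB may hold entries from the old table */
	const void *table;  /* active translation table base */
	char log[64];       /* order of the steps taken, for inspection */
};

/* Append a step name to the log; the log is sized for the steps below. */
static void step(struct sim_cpu *cpu, const char *name)
{
	strncat(cpu->log, name, sizeof(cpu->log) - strlen(cpu->log) - 1);
}

/* Hypothetical helper: returns 0 on success, -1 if new_table is NULL. */
int sim_switch_table(struct sim_cpu *cpu, const void *new_table)
{
	if (new_table == NULL)
		return -1;

	if (cpu->mmu_on) {
		/* Break: write back dirty data, then turn translation off. */
		cpu->dcache_dirty = false;
		step(cpu, "flush,");
		cpu->mmu_on = false;
		step(cpu, "mmu_off,");
	}

	/* Drop any cached translations that came from the old table. */
	cpu->tlb_stale = false;
	step(cpu, "tlb_inv,");

	/* Make: install the new table only while translation is off. */
	cpu->table = new_table;
	step(cpu, "set_table,");
	cpu->mmu_on = true;
	step(cpu, "mmu_on");
	return 0;
}
```

Starting from a zero-initialized `sim_cpu` with `mmu_on` set, a single call to `sim_switch_table()` leaves the step order in `log`. With the SPL MMU dropped as proposed in the patch, the second stage would enter with the MMU already off, so the flush and disable steps are skipped.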
But then this is a step back and makes the whole platform slower, right?
Yes, but not noticeably on the real hardware. Most of the time spent in the SPL goes to initializing DDR. It can take seconds with ECC enabled. Dropping the MMU/cache does slow things down a little bit, but not a lot. For Layerscape SoCs, the SPL image is loaded into internal SRAM, which runs pretty fast.
Besides, LS2080A is the only SoC with MMU/cache enabled for SPL. We have many other SoCs in the same Layerscape family that do not enable MMU/cache for SPL, and we didn't notice a performance impact. It is actually cleaner to make all of them consistent.
York