[U-Boot] ARMv7 MMU shareability issue (was: [PATCH] arm: Replace test for CONFIG_ARMV7 with CONFIG_CPU_V7)

9 Dec 2015


Hello Marek,

On Tue,  8 Dec 2015 14:43:42 +0100, Marek Vasut <marex@denx.de> wrote:
> ...
> arch/arm/lib/cache-cp15.c checks for CONFIG_ARMV7 and, if this macro is
> set, configures the TTBR0 register. This register must be configured for
> the cache on ARMv7 to operate correctly.
>
> The problem is that no one actually sets the CONFIG_ARMV7 macro and thus
> TTBR0 is not configured at all. On SoCFPGA, this produces all sorts of minor
> issues which are hard to replicate; for example, certain USB sticks are not
> detected or QSPI NOR sometimes fails to write pages completely.
>
> The solution is to replace the CONFIG_ARMV7 test with a CONFIG_CPU_V7 one.
> This is correct because the code which added the test(s) for CONFIG_ARMV7
> was added shortly after CONFIG_ARMV7 was replaced by CONFIG_CPU_V7, and this
> code was not adjusted to reflect that change.
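
The failure mode described in the patch can be illustrated in a few lines: an `#ifdef` on a macro that no configuration defines compiles silently to its `#else` branch, with no warning. The sketch below is a hypothetical, simplified illustration, not the actual cache-cp15.c code; the names `ttbr0_value`, `ttbr0_for_board` and `enum ttbr0_rgn` are made up. It assumes the ARMv7 short-descriptor TTBR0 layout with the Multiprocessing Extensions (RGN in bits [4:3], IRGN split across bits 6 and 0) and a 16 KB-aligned first-level table (TTBCR.N = 0).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration only; not U-Boot's actual code.
 * Cacheability encodings for TTBR0.RGN and TTBR0.IRGN (ARMv7
 * short-descriptor format, Multiprocessing Extensions assumed). */
enum ttbr0_rgn {
	RGN_NC   = 0, /* Normal memory, non-cacheable */
	RGN_WBWA = 1, /* Write-Back, Write-Allocate */
	RGN_WT   = 2, /* Write-Through */
	RGN_WB   = 3, /* Write-Back, no Write-Allocate */
};

/* Compose a TTBR0 value so that hardware table walks use the given
 * cacheability for both the outer (RGN) and inner (IRGN) attributes. */
static uint32_t ttbr0_value(uint32_t table_base, enum ttbr0_rgn rgn)
{
	uint32_t reg;

	/* With TTBCR.N == 0 the first-level table must be 16 KB aligned. */
	assert((table_base & 0x3FFFu) == 0);
	reg = table_base;
	reg |= (uint32_t)rgn << 3;          /* RGN[1:0] -> bits [4:3] */
	reg |= ((uint32_t)rgn & 1u) << 6;   /* IRGN[0]  -> bit 6 */
	reg |= ((uint32_t)rgn >> 1) & 1u;   /* IRGN[1]  -> bit 0 */
	return reg;
}

/* The guard pattern at the heart of the bug. Before the patch the guard
 * named CONFIG_ARMV7, which nothing defines, so the attribute-setting
 * branch was always compiled out. A host build defines neither macro,
 * so here too the #else branch returns the bare table address. */
static uint32_t ttbr0_for_board(uint32_t table_base)
{
#ifdef CONFIG_CPU_V7
	return ttbr0_value(table_base, RGN_WBWA);
#else
	return table_base;
#endif
}
```

For example, `ttbr0_value(0x80000000, RGN_WBWA)` returns 0x80000048 (RGN and IRGN both Write-Back Write-Allocate), while `ttbr0_for_board(0x80000000)` built without CONFIG_CPU_V7 returns 0x80000000 unchanged.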

Note:
As discussed with Marek on IRC, this patch enables what are supposed to
be the correct MMU settings for ARMv7. This causes a sharp Ethernet
performance drop (40%) and also a severe general memory access
performance hit (a 4 MB copy is almost instantaneous without the
patch and takes 2-3 seconds with it).

I would like to either fix the performance or come up with an
explanation for it before I pick this patch.

Marek's analysis shows the only MMU-related effect of the patch is that
the S (shareable) bit gets set in first-level MMU table entries.
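
For reference, in the ARMv7 short-descriptor format a first-level 1 MB section entry carries the S bit at bit 16, so setting it flips exactly one bit per entry. The sketch below is hypothetical (the macro and function names are made up, and it is not U-Boot's own table-building code). It assumes a flat mapping, domain 0, full access, and Normal Write-Back Write-Allocate memory (TEX=0b001, C=1, B=1, with TEX remap disabled).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers; bit positions per the ARMv7 short-descriptor
 * first-level section format. */
#define SECT_TYPE      (2u << 0)   /* bits [1:0] = 0b10: 1 MB section */
#define SECT_B         (1u << 2)   /* B (bufferable) */
#define SECT_C         (1u << 3)   /* C (cacheable) */
#define SECT_DOMAIN(d) (((uint32_t)(d) & 0xFu) << 5)
#define SECT_AP_RW     (3u << 10)  /* AP[1:0] = 0b11: full access */
#define SECT_TEX(t)    (((uint32_t)(t) & 0x7u) << 12)
#define SECT_S         (1u << 16)  /* S: shareable */
#define SECT_SHIFT     20          /* 1 MB per section */

/* Build the entry for one flat-mapped (VA == PA) 1 MB section as
 * Normal memory, Write-Back Write-Allocate, optionally shareable. */
static uint32_t section_entry(uint32_t section, int shareable)
{
	uint32_t e;

	assert(section < 4096u);  /* 4096 sections cover 4 GB */
	e = (section << SECT_SHIFT) | SECT_TYPE | SECT_DOMAIN(0) |
	    SECT_AP_RW | SECT_TEX(1) | SECT_C | SECT_B;
	if (shareable)
		e |= SECT_S;
	return e;
}
```

Under these assumptions, `section_entry(0x800, 0)` gives 0x80001C0E and `section_entry(0x800, 1)` gives 0x80011C0E; only bit 16 differs.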

The S bit makes an MMU region shareable with other IPs within the
silicon (DMA engines, other cores, possibly even a piece of bus or
interconnect). For instance, it will cause memory writes within the
region to propagate to other IPs (which [must] also have defined this
region as shareable) so that these IPs know the region has been
written to and can update their cache state accordingly.

With the S bit clear, USB and QSPI fail to work on Marek's board,
probably because writes do not propagate properly between the core and
these IPs; with the S bit set, USB and QSPI work but cache performance
is severely reduced.

My hypothesis right now is that some other IP(s) hamper(s) the
propagation of shareable region writes; for instance, some IP is off or
in the wrong state and does not respond properly, causing a stall.

At first I suspected the second core, which was off during the tests,
but Marek tried with that core on and it did not improve performance.

We can keep looking for IPs that might affect shareable accesses
and try to turn them on or off more or less at random, but that is
time-consuming, and I'm not even sure I'm on the right track.

I would welcome suggestions if anyone with more experience in MMU
shareability than us -- or with brilliant insight :) -- has any.

Best regards,
-- 
Albert.

Albert ARIBAUD