
Hello Marek,
On Thu, 10 Dec 2015 04:53:01 +0100, Marek Vasut marex@denx.de wrote:
On Thursday, December 10, 2015 at 03:27:08 AM, Tom Rini wrote:
On Wed, Dec 09, 2015 at 03:09:13PM +0100, Marek Vasut wrote:
On Wednesday, December 09, 2015 at 02:02:03 PM, Stefan Roese wrote:
Hi All!
On 09.12.2015 09:09, Albert ARIBAUD wrote:
On Tue, 8 Dec 2015 14:43:42 +0100, Marek Vasut marex@denx.de wrote:
The arch/arm/lib/cache-cp15.c checks for CONFIG_ARMV7 and if this macro is set, it configures TTBR0 register. This register must be configured for the cache on ARMv7 to operate correctly.
The problem is that noone actually sets the CONFIG_ARMV7 macro and thus the TTBR0 is not configured at all. On SoCFPGA, this produces all sorts of minor issues which are hard to replicate, for example certain USB sticks are not detected or QSPI NOR sometimes fails to write pages completely.
The solution is to replace CONFIG_ARMV7 test with CONFIG_CPU_V7 one. This is correct because the code which added the test(s) for CONFIG_ARMV7 was added shortly after CONFIG_ARMV7 was replaced by CONFIG_CPU_V7 and this code was not adjusted correctly to reflect that change.
Note:
As discussed with Marek on IRC, this patch enables what is supposed to be the correct MMU settings for ARMv7, which causes a sharp Ethernet performance drop (40%) but also a strong general memory access performance hit (a copy of 4 MB is almost instantaneous without the patch and takes 2-3 seconds with it).
I've tested Marek's patch on my current Armada XP platform (also ARMv7). And also noticed a performance drop by about 30-40%. The dhrystone command can be used for this testing as well (CONFIG_CMD_DHRYSTONE).
So this is definitely a NAK from me to this current patch.
This is a wrong approach, this patch just fixes another patch which was broken from the getgo and the TTBR0 reg was not programmed correctly due to that. This patch must be applied, but what we need to decide is whether we need _another_ patch which removes the S-bit from the pagetable or how to deal with that part.
No, we don't have to apply this patch. The original patch here (97840b5) was likely not run-time tested when submitted as it was using the then-defunct CONFIG_ARMV7 symbol, and was likely a bugfix in an internal tree that was pushed upstream (which is on the whole, good).
I find it surprising that such a patch, which modifies common code even, was not scrutinized enough.
So in sum U-Boot has always been as broken in some specific regard before that patch as it is after that patch. So we have time to see what we need to do about enabling this feature correctly and even ensuring that we don't happen to say break things once Linux is up too.
In that case, I am looking forward to better suggestions.
I still disagree that this patch should not be applied, it corrects code which was broken. The slowdown caused by the original patch is a separate issue and should be corrected by a separate patch. If these two patches get applied at once, that is fine.
This patch has several effects:
- it selects proper ARMv7 translation table level 1 bit definitions; - it provides proper ARMv7 definitions for WT/WB/WA; - it selects proper ARMv7 settings for TTBR0.
All these are correct as per the docs I have (although I may have missed something during the readings (and cross-readings with Marek) of these last hours/days.
Now, one specific effect goes against performance, and it is the setting of bit S in all TT entries. This bit makes the corresponding region shareable, but for all I know, in U-Boot we don't have more than one core accessing the same memory or registers sets so -- at least for the major part of its execution -- there is no reason for any region to be shareable.
/That/ effect I certainly don't want.
The rest I am fine with.
So, if we apply this patch minus the inclusion of the S bit in the definition of DCACHE_OFF (and hence of all DCACHE_* enum members after it), then we get code that is more correct from a theoretical standpoint, and does not degrade performance (which is a hint, admittedly weak, to me that it is not incorrect from a practical standpoint.
It does not fix the USB and QSPI issues in the socfpga case, but I am not sure these are caused by the S bit being zero.
Best regards, Marek Vasut
Amicalement,