[U-Boot] [PATCH] mx6: Set shared override bit in PL310 AUX_CTRL register

From: Fabio Estevam fabio.estevam@freescale.com
Having bit 22 cleared in the PL310 Auxiliary Control register (shared attribute override enable) has the side effect of transforming Normal Shared Non-cacheable reads into Cacheable no-allocate reads.
Coherent DMA buffers in Linux always have a Cacheable alias via the kernel linear mapping and the processor can speculatively load cache lines into the PL310 controller. With bit 22 cleared, Non-cacheable reads would unexpectedly hit such cache lines leading to buffer corruption.
This was inspired by a patch from Catalin Marinas [1] and also from recent discussions in the linux-arm-kernel list [2] where Russell King and Rob Herring suggested that bootloaders should initialize the cache.
[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2010-November/031810.h... [2] https://lkml.org/lkml/2015/2/20/199
Signed-off-by: Fabio Estevam fabio.estevam@freescale.com --- arch/arm/cpu/armv7/mx6/soc.c | 8 ++++++++ arch/arm/include/asm/pl310.h | 2 ++ 2 files changed, 10 insertions(+)
diff --git a/arch/arm/cpu/armv7/mx6/soc.c b/arch/arm/cpu/armv7/mx6/soc.c index ef02972..5aab305 100644 --- a/arch/arm/cpu/armv7/mx6/soc.c +++ b/arch/arm/cpu/armv7/mx6/soc.c @@ -506,6 +506,14 @@ void v7_outer_cache_enable(void) struct pl310_regs *const pl310 = (struct pl310_regs *)L2_PL310_BASE; unsigned int val;
+ + /* + * Set bit 22 in the auxiliary control register. If this bit + * is cleared, PL310 treats Normal Shared Non-cacheable + * accesses as Cacheable no-allocate. + */ + setbits_le32(&pl310->pl310_aux_ctrl, L310_SHARED_ATT_OVERRIDE_ENABLE); + #if defined CONFIG_MX6SL struct iomuxc *iomux = (struct iomuxc *)IOMUXC_BASE_ADDR; val = readl(&iomux->gpr[11]); diff --git a/arch/arm/include/asm/pl310.h b/arch/arm/include/asm/pl310.h index ddc245b..de7650e 100644 --- a/arch/arm/include/asm/pl310.h +++ b/arch/arm/include/asm/pl310.h @@ -16,6 +16,8 @@ #define L2X0_STNDBY_MODE_EN (1 << 0) #define L2X0_CTRL_EN 1
+#define L310_SHARED_ATT_OVERRIDE_ENABLE (1 << 22) + struct pl310_regs { u32 pl310_cache_id; u32 pl310_cache_type;

On Wed, Mar 11, 2015 at 05:12:12PM -0300, Fabio Estevam wrote:
From: Fabio Estevam fabio.estevam@freescale.com
Having bit 22 cleared in the PL310 Auxiliary Control register (shared attribute override enable) has the side effect of transforming Normal Shared Non-cacheable reads into Cacheable no-allocate reads.
Coherent DMA buffers in Linux always have a Cacheable alias via the kernel linear mapping and the processor can speculatively load cache lines into the PL310 controller.
No, this is wrong. They do not. CMA remaps pages to be non-cacheable rather than the old technique where the above statement was true.
There's some corner cases which make that less effective than it once was, and as I've already said, those need to be fixed. The reason that these were missed is because all the ARM CMA work bypassed me - CMA on ARM has had zero review from the point of view of the ARM architecture, so it's not surprising it gets stuff like this wrong.
Once that's fixed, setting bit 22 is not necessary.

On Wed, Mar 11, 2015 at 10:04 PM, Russell King - ARM Linux linux@arm.linux.org.uk wrote:
No, this is wrong. They do not. CMA remaps pages to be non-cacheable rather than the old technique where the above statement was true.
There's some corner cases which make that less effective than it once was, and as I've already said, those need to be fixed. The reason that these were missed is because all the ARM CMA work bypassed me - CMA on ARM has had zero review from the point of view of the ARM architecture, so it's not surprising it gets stuff like this wrong.
Once that's fixed, setting bit 22 is not necessary.
Understood. Thanks for the clarification.

On Thu, Mar 12, 2015 at 01:04:31AM +0000, Russell King - ARM Linux wrote:
On Wed, Mar 11, 2015 at 05:12:12PM -0300, Fabio Estevam wrote:
From: Fabio Estevam fabio.estevam@freescale.com
Having bit 22 cleared in the PL310 Auxiliary Control register (shared attribute override enable) has the side effect of transforming Normal Shared Non-cacheable reads into Cacheable no-allocate reads.
Coherent DMA buffers in Linux always have a Cacheable alias via the kernel linear mapping and the processor can speculatively load cache lines into the PL310 controller.
No, this is wrong. They do not. CMA remaps pages to be non-cacheable rather than the old technique where the above statement was true.
There's some corner cases which make that less effective than it once was, and as I've already said, those need to be fixed. The reason that these were missed is because all the ARM CMA work bypassed me - CMA on ARM has had zero review from the point of view of the ARM architecture, so it's not surprising it gets stuff like this wrong.
Once that's fixed, setting bit 22 is not necessary.
And I strongly disagree with your statement. Seriously, there are so many assumption about when this bit will no longer be required like the platform always using CMA, having fixed the CMA code in Linux. That's a boot loader patch and even though it's used mostly (only) to load Linux, we should not make this assumption.
Most importantly, not setting this bit makes your SoC non-compliant with the ARM ARM clarifications on mismatched aliases. It was a hardware mistake to have it cleared out of reset but just let firmware or boot-loaders deal with it.
(there are some very specific use-cases for this bit that the hw guys had in mind but none of them apply to Linux)

On Wed, Mar 11, 2015 at 08:12:12PM +0000, Fabio Estevam wrote:
From: Fabio Estevam fabio.estevam@freescale.com
Having bit 22 cleared in the PL310 Auxiliary Control register (shared attribute override enable) has the side effect of transforming Normal Shared Non-cacheable reads into Cacheable no-allocate reads.
Coherent DMA buffers in Linux always have a Cacheable alias via the kernel linear mapping and the processor can speculatively load cache lines into the PL310 controller. With bit 22 cleared, Non-cacheable reads would unexpectedly hit such cache lines leading to buffer corruption.
This was inspired by a patch from Catalin Marinas [1] and also from recent discussions in the linux-arm-kernel list [2] where Russell King and Rob Herring suggested that bootloaders should initialize the cache.
[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2010-November/031810.h... [2] https://lkml.org/lkml/2015/2/20/199
Signed-off-by: Fabio Estevam fabio.estevam@freescale.com
Acked-by: Catalin Marinas catalin.marinas@arm.com

On Wed, Mar 11, 2015 at 05:12:12PM -0300, Fabio Estevam wrote:
From: Fabio Estevam fabio.estevam@freescale.com
Having bit 22 cleared in the PL310 Auxiliary Control register (shared attribute override enable) has the side effect of transforming Normal Shared Non-cacheable reads into Cacheable no-allocate reads.
Coherent DMA buffers in Linux always have a Cacheable alias via the kernel linear mapping and the processor can speculatively load cache lines into the PL310 controller. With bit 22 cleared, Non-cacheable reads would unexpectedly hit such cache lines leading to buffer corruption.
This was inspired by a patch from Catalin Marinas [1] and also from recent discussions in the linux-arm-kernel list [2] where Russell King and Rob Herring suggested that bootloaders should initialize the cache.
[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2010-November/031810.h... [2] https://lkml.org/lkml/2015/2/20/199
Signed-off-by: Fabio Estevam fabio.estevam@freescale.com
arch/arm/cpu/armv7/mx6/soc.c | 8 ++++++++ arch/arm/include/asm/pl310.h | 2 ++ 2 files changed, 10 insertions(+)
diff --git a/arch/arm/cpu/armv7/mx6/soc.c b/arch/arm/cpu/armv7/mx6/soc.c index ef02972..5aab305 100644 --- a/arch/arm/cpu/armv7/mx6/soc.c +++ b/arch/arm/cpu/armv7/mx6/soc.c @@ -506,6 +506,14 @@ void v7_outer_cache_enable(void) struct pl310_regs *const pl310 = (struct pl310_regs *)L2_PL310_BASE; unsigned int val;
- /*
* Set bit 22 in the auxiliary control register. If this bit
* is cleared, PL310 treats Normal Shared Non-cacheable
* accesses as Cacheable no-allocate.
*/
- setbits_le32(&pl310->pl310_aux_ctrl, L310_SHARED_ATT_OVERRIDE_ENABLE);
#if defined CONFIG_MX6SL struct iomuxc *iomux = (struct iomuxc *)IOMUXC_BASE_ADDR; val = readl(&iomux->gpr[11]);
We should put this somewhere a bit more common that other A9 cores can also call into like OMAP4, SoCFPGA and maybe zynq later (based on a quick git grep pl310).

On Thu, Mar 12, 2015 at 10:41 AM, Tom Rini trini@konsulko.com wrote:
We should put this somewhere a bit more common that other A9 cores can also call into like OMAP4, SoCFPGA and maybe zynq later (based on a quick git grep pl310).
I thought about it as well, but I didn't find a suitable common place for putting it.
Suggestions? Thanks

Tom/Nishanth,
On Thu, Mar 12, 2015 at 10:57 AM, Fabio Estevam festevam@gmail.com wrote:
On Thu, Mar 12, 2015 at 10:41 AM, Tom Rini trini@konsulko.com wrote:
We should put this somewhere a bit more common that other A9 cores can also call into like OMAP4, SoCFPGA and maybe zynq later (based on a quick git grep pl310).
I thought about it as well, but I didn't find a suitable common place for putting it.
Suggestions? Thanks
What about this?
diff --git a/arch/arm/cpu/armv7/cache_v7.c b/arch/arm/cpu/armv7/cache_v7.c index 0f9d837..e3335f2 100644 --- a/arch/arm/cpu/armv7/cache_v7.c +++ b/arch/arm/cpu/armv7/cache_v7.c @@ -8,6 +8,7 @@ #include <linux/types.h> #include <common.h> #include <asm/armv7.h> +#include <asm/pl310.h> #include <asm/utils.h>
#define ARMV7_DCACHE_INVAL_ALL 1 @@ -274,8 +275,25 @@ void flush_dcache_range(unsigned long start, unsigned long stop) v7_outer_cache_flush_range(start, stop); }
+#ifdef CONFIG_SYS_L2_PL310 +static void pl310_set_override(void) +{ + struct pl310_regs *const pl310 = (struct pl310_regs *)L2_PL310_BASE; + + /* + * Set bit 22 in the auxiliary control register. If this bit + * is cleared, PL310 treats Normal Shared Non-cacheable + * accesses as Cacheable no-allocate. + */ + setbits_le32(&pl310->pl310_aux_ctrl, L310_SHARED_ATT_OVERRIDE_ENABLE); +} +#endif + void arm_init_before_mmu(void) { +#ifdef CONFIG_SYS_L2_PL310 + pl310_set_override(); +#endif v7_outer_cache_enable(); invalidate_dcache_all(); v7_inval_tlb(); diff --git a/arch/arm/include/asm/pl310.h b/arch/arm/include/asm/pl310.h index ddc245b..de7650e 100644 --- a/arch/arm/include/asm/pl310.h +++ b/arch/arm/include/asm/pl310.h @@ -16,6 +16,8 @@ #define L2X0_STNDBY_MODE_EN (1 << 0) #define L2X0_CTRL_EN 1
+#define L310_SHARED_ATT_OVERRIDE_ENABLE (1 << 22) + struct pl310_regs { u32 pl310_cache_id; u32 pl310_cache_type;

On 03/12/2015 09:25 AM, Fabio Estevam wrote:
Tom/Nishanth,
On Thu, Mar 12, 2015 at 10:57 AM, Fabio Estevam festevam@gmail.com wrote:
On Thu, Mar 12, 2015 at 10:41 AM, Tom Rini trini@konsulko.com wrote:
We should put this somewhere a bit more common that other A9 cores can also call into like OMAP4, SoCFPGA and maybe zynq later (based on a quick git grep pl310).
I thought about it as well, but I didn't find a suitable common place for putting it.
Suggestions? Thanks
What about this?
diff --git a/arch/arm/cpu/armv7/cache_v7.c b/arch/arm/cpu/armv7/cache_v7.c index 0f9d837..e3335f2 100644 --- a/arch/arm/cpu/armv7/cache_v7.c +++ b/arch/arm/cpu/armv7/cache_v7.c @@ -8,6 +8,7 @@ #include <linux/types.h> #include <common.h> #include <asm/armv7.h> +#include <asm/pl310.h> #include <asm/utils.h>
#define ARMV7_DCACHE_INVAL_ALL 1 @@ -274,8 +275,25 @@ void flush_dcache_range(unsigned long start, unsigned long stop) v7_outer_cache_flush_range(start, stop); }
+#ifdef CONFIG_SYS_L2_PL310 +static void pl310_set_override(void) +{
- struct pl310_regs *const pl310 = (struct pl310_regs *)L2_PL310_BASE;
- /*
* Set bit 22 in the auxiliary control register. If this bit
* is cleared, PL310 treats Normal Shared Non-cacheable
* accesses as Cacheable no-allocate.
*/
- setbits_le32(&pl310->pl310_aux_ctrl, L310_SHARED_ATT_OVERRIDE_ENABLE);
I dont think this works for OMAP4 (which also uses A9, PL310) - we use an smc #0 with service 0x109 (I have to reconfirm) to set l2 aux_ctrl.
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/ar...
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/ar...
we might want to ensure that: a) The "setting" part of things get into a weak function with default function that may be SoC dependent if needed b) there be revision checks as needed to add this. c) each configuration be adequately isolated perhaps?
+} +#endif
void arm_init_before_mmu(void) { +#ifdef CONFIG_SYS_L2_PL310
- pl310_set_override();
+#endif v7_outer_cache_enable(); invalidate_dcache_all(); v7_inval_tlb(); diff --git a/arch/arm/include/asm/pl310.h b/arch/arm/include/asm/pl310.h index ddc245b..de7650e 100644 --- a/arch/arm/include/asm/pl310.h +++ b/arch/arm/include/asm/pl310.h @@ -16,6 +16,8 @@ #define L2X0_STNDBY_MODE_EN (1 << 0) #define L2X0_CTRL_EN 1
+#define L310_SHARED_ATT_OVERRIDE_ENABLE (1 << 22)
struct pl310_regs { u32 pl310_cache_id; u32 pl310_cache_type;

On Thu, Mar 12, 2015 at 11:43 AM, Nishanth Menon nm@ti.com wrote:
I dont think this works for OMAP4 (which also uses A9, PL310) - we use an smc #0 with service 0x109 (I have to reconfirm) to set l2 aux_ctrl.
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/ar...
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/ar...
we might want to ensure that: a) The "setting" part of things get into a weak function with default function that may be SoC dependent if needed b) there be revision checks as needed to add this. c) each configuration be adequately isolated perhaps?
Adding a common support for this seems to be far more complex then I expected.
Maybe someone could come up with a proper common solution for this. If not, then we should go with the mx6 specific patch for the time being.
Regards,
Fabio Estevam

On Thu, Mar 12, 2015 at 10:15 AM, Fabio Estevam festevam@gmail.com wrote:
On Thu, Mar 12, 2015 at 11:43 AM, Nishanth Menon nm@ti.com wrote:
I dont think this works for OMAP4 (which also uses A9, PL310) - we use an smc #0 with service 0x109 (I have to reconfirm) to set l2 aux_ctrl.
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/ar...
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/ar...
we might want to ensure that: a) The "setting" part of things get into a weak function with default function that may be SoC dependent if needed b) there be revision checks as needed to add this. c) each configuration be adequately isolated perhaps?
Adding a common support for this seems to be far more complex then I expected.
Maybe someone could come up with a proper common solution for this. If not, then we should go with the mx6 specific patch for the time being.
Maybe the following can help? (reposting) http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/214436

Hi Nishanth,
On Thu, Mar 12, 2015 at 12:34 PM, Nishanth Menon nm@ti.com wrote:
Maybe the following can help? (reposting) http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/214436
It seems you are in a better position to provide such generic solution :-)
Do you plan to work on it?
Regards,
Fabio Estevam

On 21:35-20150409, Fabio Estevam wrote:
Hi Nishanth,
On Thu, Mar 12, 2015 at 12:34 PM, Nishanth Menon nm@ti.com wrote:
Maybe the following can help? (reposting) http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/214436
It seems you are in a better position to provide such generic solution :-)
Do you plan to work on it?
I had'nt originally planned on working on it, but then with OMAP4(pl310 base 0x48242000) and AM437x(pl310 base 0x48242000) both using A9, I am a little curious as well.. and probably can give it a shot.. but, I'd probably need a lil more info.
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0246e/Beifcid... bit 22 is Shared attribute override enable
I quickly tried to get my hands on PL310 erratum doc and the best I could get my hands on to was Document Revision: 14.1 (10-Apr-2012) - not really the latest and for whatever reason, I can find this in there :( - Does anyone have the erratum number for this? I can try and get the latest erratum doc once normal human hours start back at work..
Anyways.. here is a build tested diff as a proposal.. quickly wrote up in the last 30 mins or so and only build tested(OMAP4 only).. wont bet my car on ir... but anyways.. since folks wanted an illustration..
Here are the good parts.. for imx, you probably wont have to override the weak function pl310_write_aux_ctrl and you will function happily, I can override it for AM437x and OMAP4 based on the secure call needed and we will both flow the same path in the logic. Ofcourse, currently pl310.c is not being built for SPL.. we could fix that ofcourse.. I did not attempt to do that in this diff - just hacked around it..
bad part is as follows: unlike the kernel where the default is expected to be write_sec, we assume direct non-secure access will probably work.. but then, that should be properly evaluated.
If you guys want to take the following approach to it's logical conclusion, please feel free.. I wont probably get time to dig at this given my current work obligations :( I can however help patch up/test OMAP4/AM437x as needed (I think I can make some time out for that..)
btw, something nice to do would be to extend this with proper abstraction so that we can do something like the write_sec that the kernel does today - instead of register specific access.. but on the flip side, register specific functions lets us skip case like code to handle the various register accesses.. anyways... we can always figure that out if needed...
Hope this helps.. diff --git a/arch/arm/cpu/armv7/omap4/hwinit.c b/arch/arm/cpu/armv7/omap4/hwinit.c index 9792761d40a0..b1841a060b66 100644 --- a/arch/arm/cpu/armv7/omap4/hwinit.c +++ b/arch/arm/cpu/armv7/omap4/hwinit.c @@ -19,6 +19,7 @@ #include <asm/emif.h> #include <asm/arch/gpio.h> #include <asm/omap_common.h> +#include <asm/pl310.h>
DECLARE_GLOBAL_DATA_PTR;
@@ -35,6 +36,13 @@ static const struct gpio_bank gpio_bank_44xx[6] = {
const struct gpio_bank *const omap_gpio_bank = gpio_bank_44xx;
+#ifndef CONFIG_SPL_BUILD +void pl310_write_aux_ctrl(u32 aux_ctrl) +{ + omap_smc1(OMAP4_SERVICE_PL310_AUXCTRL_REG_SET, aux_ctrl); +} +#endif + #ifdef CONFIG_SPL_BUILD /* * Some tuning of IOs for optimal power and performance diff --git a/arch/arm/cpu/armv7/start.S b/arch/arm/cpu/armv7/start.S index 5ed0f45a2661..4608c0503463 100644 --- a/arch/arm/cpu/armv7/start.S +++ b/arch/arm/cpu/armv7/start.S @@ -227,6 +227,10 @@ skip_errata_430973: skip_errata_621766: #endif
+#ifdef CONFIG_SYS_L2_PL310 + bl pl310_erratum_implement +#endif /* CONFIG_SYS_L2_PL310 */ + mov pc, r5 @ back to my caller ENDPROC(cpu_init_cp15)
diff --git a/arch/arm/include/asm/arch-omap4/sys_proto.h b/arch/arm/include/asm/arch-omap4/sys_proto.h index f30f86539130..db0fca3b3f7f 100644 --- a/arch/arm/include/asm/arch-omap4/sys_proto.h +++ b/arch/arm/include/asm/arch-omap4/sys_proto.h @@ -58,5 +58,6 @@ void force_emif_self_refresh(void); void setup_warmreset_time(void);
#define OMAP4_SERVICE_PL310_CONTROL_REG_SET 0x102 +#define OMAP4_SERVICE_PL310_AUXCTRL_REG_SET 0x109
#endif diff --git a/arch/arm/include/asm/armv7.h b/arch/arm/include/asm/armv7.h index 58d8b161215a..8ee14934ede7 100644 --- a/arch/arm/include/asm/armv7.h +++ b/arch/arm/include/asm/armv7.h @@ -142,6 +142,7 @@ void v7_arch_cp15_set_l2aux_ctrl(u32 l2auxctrl, u32 cpu_midr, u32 cpu_rev); void v7_arch_cp15_set_acr(u32 acr, u32 cpu_midr, u32 cpu_rev_comb, u32 cpu_variant, u32 cpu_rev); +void pl310_erratum_implement(void); #endif /* ! __ASSEMBLY__ */
#endif diff --git a/arch/arm/include/asm/pl310.h b/arch/arm/include/asm/pl310.h index ddc245bfd559..d0955d0ae762 100644 --- a/arch/arm/include/asm/pl310.h +++ b/arch/arm/include/asm/pl310.h @@ -12,6 +12,7 @@
/* Register bit fields */ #define PL310_AUX_CTRL_ASSOCIATIVITY_MASK (1 << 16) +#define PL310_AUX_CTRL_SHARED_ATTRIB_OVERRIDE_EN (1 << 22) #define L2X0_DYNAMIC_CLK_GATING_EN (1 << 1) #define L2X0_STNDBY_MODE_EN (1 << 0) #define L2X0_CTRL_EN 1 @@ -74,5 +75,6 @@ void pl310_inval_all(void); void pl310_clean_inval_all(void); void pl310_inval_range(u32 start, u32 end); void pl310_clean_inval_range(u32 start, u32 end); +void pl310_write_aux_ctrl(u32 aux_ctrl);
#endif diff --git a/arch/arm/lib/cache-pl310.c b/arch/arm/lib/cache-pl310.c index 1ad1f8aea085..d0cc7d8295c8 100644 --- a/arch/arm/lib/cache-pl310.c +++ b/arch/arm/lib/cache-pl310.c @@ -14,6 +14,21 @@
struct pl310_regs *const pl310 = (struct pl310_regs *)CONFIG_SYS_PL310_BASE;
+/* Override in platform code based on secure access needs */ +void __weak pl310_write_aux_ctrl(u32 aux_ctrl) +{ + writel(aux_ctrl, &pl310->pl310_aux_ctrl); +} + +void pl310_erratum_implement(void) +{ + u32 __maybe_unused reg = 0; +#ifdef CONFIG_ARM_PL310_ERRATA_xyz + reg = readl(&pl310->pl310_aux_ctrl); + pl310_write_aux_ctrl(reg | PL310_AUX_CTRL_SHARED_ATTRIB_OVERRIDE_EN); +#endif +} + static void pl310_cache_sync(void) { writel(0, &pl310->pl310_cache_sync); diff --git a/include/configs/ti_omap4_common.h b/include/configs/ti_omap4_common.h index 1c93aab1a7d7..765ceef000aa 100644 --- a/include/configs/ti_omap4_common.h +++ b/include/configs/ti_omap4_common.h @@ -23,9 +23,10 @@
#define CONFIG_SYS_THUMB_BUILD
-#ifndef CONFIG_SYS_L2CACHE_OFF +#if !defined(CONFIG_SYS_L2CACHE_OFF) && !defined(CONFIG_SPL_BUILD) #define CONFIG_SYS_L2_PL310 1 #define CONFIG_SYS_PL310_BASE 0x48242000 +#define CONFIG_ARM_PL310_ERRATA_xyz #endif #define CONFIG_SYS_CACHELINE_SIZE 32

On 03/11/2015 03:12 PM, Fabio Estevam wrote:
From: Fabio Estevam fabio.estevam@freescale.com
Having bit 22 cleared in the PL310 Auxiliary Control register (shared attribute override enable) has the side effect of transforming Normal Shared Non-cacheable reads into Cacheable no-allocate reads.
Coherent DMA buffers in Linux always have a Cacheable alias via the kernel linear mapping and the processor can speculatively load cache lines into the PL310 controller. With bit 22 cleared, Non-cacheable reads would unexpectedly hit such cache lines leading to buffer corruption.
This was inspired by a patch from Catalin Marinas [1] and also from recent discussions in the linux-arm-kernel list [2] where Russell King and Rob Herring suggested that bootloaders should initialize the cache.
[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2010-November/031810.h... [2] https://lkml.org/lkml/2015/2/20/199
Signed-off-by: Fabio Estevam fabio.estevam@freescale.com
arch/arm/cpu/armv7/mx6/soc.c | 8 ++++++++ arch/arm/include/asm/pl310.h | 2 ++ 2 files changed, 10 insertions(+)
diff --git a/arch/arm/cpu/armv7/mx6/soc.c b/arch/arm/cpu/armv7/mx6/soc.c index ef02972..5aab305 100644 --- a/arch/arm/cpu/armv7/mx6/soc.c +++ b/arch/arm/cpu/armv7/mx6/soc.c @@ -506,6 +506,14 @@ void v7_outer_cache_enable(void) struct pl310_regs *const pl310 = (struct pl310_regs *)L2_PL310_BASE; unsigned int val;
- /*
* Set bit 22 in the auxiliary control register. If this bit
* is cleared, PL310 treats Normal Shared Non-cacheable
* accesses as Cacheable no-allocate.
*/
- setbits_le32(&pl310->pl310_aux_ctrl, L310_SHARED_ATT_OVERRIDE_ENABLE);
#if defined CONFIG_MX6SL struct iomuxc *iomux = (struct iomuxc *)IOMUXC_BASE_ADDR; val = readl(&iomux->gpr[11]); diff --git a/arch/arm/include/asm/pl310.h b/arch/arm/include/asm/pl310.h index ddc245b..de7650e 100644 --- a/arch/arm/include/asm/pl310.h +++ b/arch/arm/include/asm/pl310.h @@ -16,6 +16,8 @@ #define L2X0_STNDBY_MODE_EN (1 << 0) #define L2X0_CTRL_EN 1
+#define L310_SHARED_ATT_OVERRIDE_ENABLE (1 << 22)
struct pl310_regs { u32 pl310_cache_id; u32 pl310_cache_type;
is it possible for us to centralize the pl310 logic - that'd let us reuse generic logic cross SoCs without having to duplicate bits like these over and over. at least A9 based TI SoCs could potentially benefit out of this.
The only problem was to deal with actual PL310 configuration path which could be SoC dependent, but then, we could implement weak functions that allow us to override the same. I tried to do something like that for CP15 errata[1]
just my 2 cents.
[1] http://article.gmane.org/gmane.comp.boot-loaders.u-boot/214436

Hi Fabio,
On 11/03/2015 21:12, Fabio Estevam wrote:
From: Fabio Estevam fabio.estevam@freescale.com
Having bit 22 cleared in the PL310 Auxiliary Control register (shared attribute override enable) has the side effect of transforming Normal Shared Non-cacheable reads into Cacheable no-allocate reads.
Coherent DMA buffers in Linux always have a Cacheable alias via the kernel linear mapping and the processor can speculatively load cache lines into the PL310 controller. With bit 22 cleared, Non-cacheable reads would unexpectedly hit such cache lines leading to buffer corruption.
This was inspired by a patch from Catalin Marinas [1] and also from recent discussions in the linux-arm-kernel list [2] where Russell King and Rob Herring suggested that bootloaders should initialize the cache.
[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2010-November/031810.h... [2] https://lkml.org/lkml/2015/2/20/199
Signed-off-by: Fabio Estevam fabio.estevam@freescale.com
arch/arm/cpu/armv7/mx6/soc.c | 8 ++++++++ arch/arm/include/asm/pl310.h | 2 ++ 2 files changed, 10 insertions(+)
diff --git a/arch/arm/cpu/armv7/mx6/soc.c b/arch/arm/cpu/armv7/mx6/soc.c index ef02972..5aab305 100644 --- a/arch/arm/cpu/armv7/mx6/soc.c +++ b/arch/arm/cpu/armv7/mx6/soc.c @@ -506,6 +506,14 @@ void v7_outer_cache_enable(void) struct pl310_regs *const pl310 = (struct pl310_regs *)L2_PL310_BASE; unsigned int val;
- /*
* Set bit 22 in the auxiliary control register. If this bit
* is cleared, PL310 treats Normal Shared Non-cacheable
* accesses as Cacheable no-allocate.
*/
- setbits_le32(&pl310->pl310_aux_ctrl, L310_SHARED_ATT_OVERRIDE_ENABLE);
#if defined CONFIG_MX6SL struct iomuxc *iomux = (struct iomuxc *)IOMUXC_BASE_ADDR; val = readl(&iomux->gpr[11]); diff --git a/arch/arm/include/asm/pl310.h b/arch/arm/include/asm/pl310.h index ddc245b..de7650e 100644 --- a/arch/arm/include/asm/pl310.h +++ b/arch/arm/include/asm/pl310.h @@ -16,6 +16,8 @@ #define L2X0_STNDBY_MODE_EN (1 << 0) #define L2X0_CTRL_EN 1
+#define L310_SHARED_ATT_OVERRIDE_ENABLE (1 << 22)
struct pl310_regs { u32 pl310_cache_id; u32 pl310_cache_type;
It looks like from the discussion and the following threads that a general solution cannot be easy found. I agree with you to apply it at least for i.MX6, and let's see if in the future we can factorize it for other SOCs.
Best regards, Stefano Babic

Hi Stefano,
On Fri, May 15, 2015 at 10:35 AM, Stefano Babic sbabic@denx.de wrote:
It looks like from the discussion and the following threads that a general solution cannot be easy found. I agree with you to apply it at least for i.MX6, and let's see if in the future we can factorize it for other SOCs.
That sounds good! Thanks
participants (6)
-
Catalin Marinas
-
Fabio Estevam
-
Nishanth Menon
-
Russell King - ARM Linux
-
Stefano Babic
-
Tom Rini