[U-Boot] [PATCH 1/2] armv8: add hooks for all cache-wide operations

From: Stephen Warren swarren@nvidia.com
SoC-specific logic may be required for all forms of cache-wide operations; invalidate and flush of both dcache and icache (note that only 3 of the 4 possible combinations make sense, since the icache never contains dirty lines). This patch adds an optional hook for all implemented cache-wide operations, and renames the one existing hook to better represent exactly which operation it is implementing. A dummy no-op implementation of each hook is provided. These dummy implementations are moved into C code, since there's no need to implement them in assembly.
Signed-off-by: Stephen Warren swarren@nvidia.com --- arch/arm/cpu/armv8/cache.S | 6 ------ arch/arm/cpu/armv8/cache_v8.c | 23 ++++++++++++++++++++--- arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S | 4 ++-- arch/arm/include/asm/system.h | 5 ++++- arch/arm/mach-tegra/tegra186/cache.c | 2 +- 5 files changed, 27 insertions(+), 13 deletions(-)
diff --git a/arch/arm/cpu/armv8/cache.S b/arch/arm/cpu/armv8/cache.S index 46f25e63f01d..23fa914dc556 100644 --- a/arch/arm/cpu/armv8/cache.S +++ b/arch/arm/cpu/armv8/cache.S @@ -150,12 +150,6 @@ ENTRY(__asm_invalidate_icache_all) ret ENDPROC(__asm_invalidate_icache_all)
-ENTRY(__asm_flush_l3_cache) - mov x0, #0 /* return status as success */ - ret -ENDPROC(__asm_flush_l3_cache) - .weak __asm_flush_l3_cache - /* * void __asm_switch_ttbr(ulong new_ttbr) * diff --git a/arch/arm/cpu/armv8/cache_v8.c b/arch/arm/cpu/armv8/cache_v8.c index cd3f6c10ae12..d2965e9878a0 100644 --- a/arch/arm/cpu/armv8/cache_v8.c +++ b/arch/arm/cpu/armv8/cache_v8.c @@ -415,25 +415,36 @@ __weak void mmu_setup(void) set_sctlr(get_sctlr() | CR_M); }
+__weak int invalidate_dcache_all_l3(void) +{ + return 0; +} + /* * Performs a invalidation of the entire data cache at all levels */ void invalidate_dcache_all(void) { __asm_invalidate_dcache_all(); + invalidate_dcache_all_l3(); +} + +__weak int flush_dcache_all_l3(void) +{ + return 0; }
/* * Performs a clean & invalidation of the entire data cache at all levels. * This function needs to be inline to avoid using stack. - * __asm_flush_l3_cache return status of timeout + * flush_dcache_all_l3 return status of timeout */ inline void flush_dcache_all(void) { int ret;
__asm_flush_dcache_all(); - ret = __asm_flush_l3_cache(); + ret = flush_dcache_all_l3(); if (ret) debug("flushing dcache returns 0x%x\n", ret); else @@ -623,7 +634,7 @@ void mmu_set_region_dcache_behaviour(phys_addr_t start, size_t size,
void icache_enable(void) { - __asm_invalidate_icache_all(); + invalidate_icache_all(); set_sctlr(get_sctlr() | CR_I); }
@@ -637,9 +648,15 @@ int icache_status(void) return (get_sctlr() & CR_I) != 0; }
+__weak int invalidate_icache_all_l3(void) +{ + return 0; +} + void invalidate_icache_all(void) { __asm_invalidate_icache_all(); + invalidate_icache_all_l3(); }
#else /* CONFIG_SYS_ICACHE_OFF */ diff --git a/arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S b/arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S index 5d0b7a45c354..4e4ef8b7a6df 100644 --- a/arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S +++ b/arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S @@ -245,7 +245,7 @@ hnf_set_pstate:
ret
-ENTRY(__asm_flush_l3_cache) +ENTRY(flush_dcache_all_l3) /* * Return status in x0 * success 0 @@ -275,7 +275,7 @@ ENTRY(__asm_flush_l3_cache) mov x0, x8 mov lr, x29 ret -ENDPROC(__asm_flush_l3_cache) +ENDPROC(flush_dcache_all_l3) #endif
#ifdef CONFIG_MP diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h index c18e1e3a10ee..095f3742ce60 100644 --- a/arch/arm/include/asm/system.h +++ b/arch/arm/include/asm/system.h @@ -93,9 +93,12 @@ void __asm_invalidate_dcache_all(void); void __asm_flush_dcache_range(u64 start, u64 end); void __asm_invalidate_tlb_all(void); void __asm_invalidate_icache_all(void); -int __asm_flush_l3_cache(void); void __asm_switch_ttbr(u64 new_ttbr);
+int invalidate_dcache_all_l3(void); +int flush_dcache_all_l3(void); +int invalidate_icache_all_l3(void); + void armv8_switch_to_el2(void); void armv8_switch_to_el1(void); void gic_init(void); diff --git a/arch/arm/mach-tegra/tegra186/cache.c b/arch/arm/mach-tegra/tegra186/cache.c index adaed8968eb9..fb0b1142e49b 100644 --- a/arch/arm/mach-tegra/tegra186/cache.c +++ b/arch/arm/mach-tegra/tegra186/cache.c @@ -10,7 +10,7 @@ #define SMC_SIP_INVOKE_MCE 0x82FFFF00 #define MCE_SMC_ROC_FLUSH_CACHE 11
-int __asm_flush_l3_cache(void) +int flush_dcache_all_l3(void) { struct pt_regs regs = {0};

From: Stephen Warren swarren@nvidia.com
An SMC call is required for all cache-wide operations on Tegra186. This patch implements the two missing hooks now that U-Boot supports them, and fixes the mapping of "hook name" to SMC call code.
Signed-off-by: Stephen Warren swarren@nvidia.com --- arch/arm/mach-tegra/tegra186/cache.c | 23 ++++++++++++++++++++--- 1 file changed, 20 insertions(+), 3 deletions(-)
diff --git a/arch/arm/mach-tegra/tegra186/cache.c b/arch/arm/mach-tegra/tegra186/cache.c index fb0b1142e49b..3ab60b3b85b2 100644 --- a/arch/arm/mach-tegra/tegra186/cache.c +++ b/arch/arm/mach-tegra/tegra186/cache.c @@ -8,16 +8,33 @@ #include <asm/system.h>
#define SMC_SIP_INVOKE_MCE 0x82FFFF00 -#define MCE_SMC_ROC_FLUSH_CACHE 11 +#define MCE_SMC_ROC_FLUSH_CACHE 11 +#define MCE_SMC_ROC_FLUSH_CACHE_ONLY 14 +#define MCE_SMC_ROC_CLEAN_CACHE_ONLY 15
-int flush_dcache_all_l3(void) +static int smc_sip_invoke_mce(int cmd) { struct pt_regs regs = {0};
isb();
- regs.regs[0] = SMC_SIP_INVOKE_MCE | MCE_SMC_ROC_FLUSH_CACHE; + regs.regs[0] = SMC_SIP_INVOKE_MCE | cmd; smc_call(®s);
return 0; } + +int invalidate_dcache_all_l3(void) +{ + return smc_sip_invoke_mce(MCE_SMC_ROC_FLUSH_CACHE_ONLY); +} + +int flush_dcache_all_l3(void) +{ + return smc_sip_invoke_mce(MCE_SMC_ROC_CLEAN_CACHE_ONLY); +} + +int invalidate_icache_all_l3(void) +{ + return smc_sip_invoke_mce(MCE_SMC_ROC_FLUSH_CACHE); +}

On 10/17/2016 04:35 PM, Stephen Warren wrote:
From: Stephen Warren swarren@nvidia.com
SoC-specific logic may be required for all forms of cache-wide operations; invalidate and flush of both dcache and icache (note that only 3 of the 4 possible combinations make sense, since the icache never contains dirty lines). This patch adds an optional hook for all implemented cache-wide operations, and renames the one existing hook to better represent exactly which operation it is implementing. A dummy no-op implementation of each hook is provided. These dummy implementations are moved into C code, since there's no need to implement them in assembly.
Stephen,
Moving this function to C may pose an issue. I had a debug a couple of years ago that calling a C function put the stack into cache after flushing L1/L2. That's why I used asm function to flush L3.
York

On 10/18/2016 09:28 AM, york sun wrote:
On 10/17/2016 04:35 PM, Stephen Warren wrote:
From: Stephen Warren swarren@nvidia.com
SoC-specific logic may be required for all forms of cache-wide operations; invalidate and flush of both dcache and icache (note that only 3 of the 4 possible combinations make sense, since the icache never contains dirty lines). This patch adds an optional hook for all implemented cache-wide operations, and renames the one existing hook to better represent exactly which operation it is implementing. A dummy no-op implementation of each hook is provided. These dummy implementations are moved into C code, since there's no need to implement them in assembly.
Stephen,
Moving this function to C may pose an issue. I had a debug a couple of years ago that calling a C function put the stack into cache after flushing L1/L2. That's why I used asm function to flush L3.
Assuming the stack is located in cachable memory, the CPU is free (per the definition of the ARM architecture) to pull it into the cache at any time the cache is enabled (and perhaps even when it isn't enabled, at the very least for the icache on ARMv8 if not other cases too). Implementation in C vs. assembly has absolutely no effect here. I guess your statement assumes that C functions will write data to the stack and assembly functions never will. There's no strict 1:1 correlation between those two things; assembly code can touch the stack just like C code. If there's an assumption it won't, it needs to be documented in the header defining these hook functions.
I assume you're specifically talking about dirtying the dcache between the point when dcache flushing starts and the point when the dcache is disabled? If so, flush_dcache_all() itself would have to be manually coded in assembly to avoid using the stack, as would dcache_disable() and set_sctlr(). I think this is why dcache_disable() currently disables the dcache first (thus preventing it acquiring new dirty data) and then flushes the dcache afterwards (thus guaranteeing that all dirty data is flushed with no race condition). This implies that your change to swap the order of those two functions isn't correct. I'm pretty sure I'm correct in saying that the dcache can hit even if it's disabled, hence disabling the dcache while it contains dirty data won't lead to issues?

On 10/18/2016 11:14 AM, Stephen Warren wrote:
On 10/18/2016 09:28 AM, york sun wrote:
On 10/17/2016 04:35 PM, Stephen Warren wrote:
From: Stephen Warren swarren@nvidia.com
SoC-specific logic may be required for all forms of cache-wide operations; invalidate and flush of both dcache and icache (note that only 3 of the 4 possible combinations make sense, since the icache never contains dirty lines). This patch adds an optional hook for all implemented cache-wide operations, and renames the one existing hook to better represent exactly which operation it is implementing. A dummy no-op implementation of each hook is provided. These dummy implementations are moved into C code, since there's no need to implement them in assembly.
Stephen,
Moving this function to C may pose an issue. I had a debug a couple of years ago that calling a C function put the stack into cache after flushing L1/L2. That's why I used asm function to flush L3.
Assuming the stack is located in cachable memory, the CPU is free (per the definition of the ARM architecture) to pull it into the cache at any time the cache is enabled (and perhaps even when it isn't enabled, at the very least for the icache on ARMv8 if not other cases too). Implementation in C vs. assembly has absolutely no effect here. I guess your statement assumes that C functions will write data to the stack and assembly functions never will. There's no strict 1:1 correlation between those two things; assembly code can touch the stack just like C code. If there's an assumption it won't, it needs to be documented in the header defining these hook functions.
I assume you're specifically talking about dirtying the dcache between the point when dcache flushing starts and the point when the dcache is disabled? If so, flush_dcache_all() itself would have to be manually coded in assembly to avoid using the stack, as would dcache_disable() and set_sctlr(). I think this is why dcache_disable() currently disables the dcache first (thus preventing it acquiring new dirty data) and then flushes the dcache afterwards (thus guaranteeing that all dirty data is flushed with no race condition). This implies that your change to swap the order of those two functions isn't correct. I'm pretty sure I'm
I wonder if David can shed some light on the original order of calls to disable dcache.
correct in saying that the dcache can hit even if it's disabled, hence disabling the dcache while it contains dirty data won't lead to issues?
My earlier debug was based on the original order of calls. I found I had to avoid using the stack before flushing L3. Now with the changed order, I haven't tested. But I can image the stack will be dirty and flushing L3 may or may not push the data into main memory (depending on the L3 implementation whether inclusive or not).
You said you are sure dcache can hit even if it is disabled. Can you explain more? My test shows as soon as the d-cache is disabled, the core cannot get the data in dirty cache.
York

On 10/18/2016 12:40 PM, york sun wrote:
On 10/18/2016 11:14 AM, Stephen Warren wrote:
On 10/18/2016 09:28 AM, york sun wrote:
On 10/17/2016 04:35 PM, Stephen Warren wrote:
From: Stephen Warren swarren@nvidia.com
SoC-specific logic may be required for all forms of cache-wide operations; invalidate and flush of both dcache and icache (note that only 3 of the 4 possible combinations make sense, since the icache never contains dirty lines). This patch adds an optional hook for all implemented cache-wide operations, and renames the one existing hook to better represent exactly which operation it is implementing. A dummy no-op implementation of each hook is provided. These dummy implementations are moved into C code, since there's no need to implement them in assembly.
Stephen,
Moving this function to C may pose an issue. I had a debug a couple of years ago that calling a C function put the stack into cache after flushing L1/L2. That's why I used asm function to flush L3.
Assuming the stack is located in cachable memory, the CPU is free (per the definition of the ARM architecture) to pull it into the cache at any time the cache is enabled (and perhaps even when it isn't enabled, at the very least for the icache on ARMv8 if not other cases too). Implementation in C vs. assembly has absolutely no effect here. I guess your statement assumes that C functions will write data to the stack and assembly functions never will. There's no strict 1:1 correlation between those two things; assembly code can touch the stack just like C code. If there's an assumption it won't, it needs to be documented in the header defining these hook functions.
I assume you're specifically talking about dirtying the dcache between the point when dcache flushing starts and the point when the dcache is disabled? If so, flush_dcache_all() itself would have to be manually coded in assembly to avoid using the stack, as would dcache_disable() and set_sctlr(). I think this is why dcache_disable() currently disables the dcache first (thus preventing it acquiring new dirty data) and then flushes the dcache afterwards (thus guaranteeing that all dirty data is flushed with no race condition). This implies that your change to swap the order of those two functions isn't correct. I'm pretty sure I'm
I wonder if David can shed some light on the original order of calls to disable dcache.
correct in saying that the dcache can hit even if it's disabled, hence disabling the dcache while it contains dirty data won't lead to issues?
My earlier debug was based on the original order of calls. I found I had to avoid using the stack before flushing L3. Now with the changed order, I haven't tested. But I can image the stack will be dirty and flushing L3 may or may not push the data into main memory (depending on the L3 implementation whether inclusive or not).
You said you are sure dcache can hit even if it is disabled. Can you explain more? My test shows as soon as the d-cache is disabled, the core cannot get the data in dirty cache.
By "hit" here, I mean that even with the dcache disabled, when the CPU performs a read access, if the dcache contains a copy of that data, it can return it rather than requiring it to be fetched from DRAM.
Yes, with the dcache disabled, I would not expect any writes to allocate new lines in the cache (although presumably writes would update any lines already there, in a write-though sense).
At least, I'm pretty sure this is all true. It seems the only way to allow switching from cache-on to cache-off state without losing dirty data.

On 10/18/2016 02:01 PM, Stephen Warren wrote:
On 10/18/2016 12:40 PM, york sun wrote:
On 10/18/2016 11:14 AM, Stephen Warren wrote:
On 10/18/2016 09:28 AM, york sun wrote:
On 10/17/2016 04:35 PM, Stephen Warren wrote:
From: Stephen Warren swarren@nvidia.com
SoC-specific logic may be required for all forms of cache-wide operations; invalidate and flush of both dcache and icache (note that only 3 of the 4 possible combinations make sense, since the icache never contains dirty lines). This patch adds an optional hook for all implemented cache-wide operations, and renames the one existing hook to better represent exactly which operation it is implementing. A dummy no-op implementation of each hook is provided. These dummy implementations are moved into C code, since there's no need to implement them in assembly.
Stephen,
Moving this function to C may pose an issue. I had a debug a couple of years ago that calling a C function put the stack into cache after flushing L1/L2. That's why I used asm function to flush L3.
Assuming the stack is located in cachable memory, the CPU is free (per the definition of the ARM architecture) to pull it into the cache at any time the cache is enabled (and perhaps even when it isn't enabled, at the very least for the icache on ARMv8 if not other cases too). Implementation in C vs. assembly has absolutely no effect here. I guess your statement assumes that C functions will write data to the stack and assembly functions never will. There's no strict 1:1 correlation between those two things; assembly code can touch the stack just like C code. If there's an assumption it won't, it needs to be documented in the header defining these hook functions.
I assume you're specifically talking about dirtying the dcache between the point when dcache flushing starts and the point when the dcache is disabled? If so, flush_dcache_all() itself would have to be manually coded in assembly to avoid using the stack, as would dcache_disable() and set_sctlr(). I think this is why dcache_disable() currently disables the dcache first (thus preventing it acquiring new dirty data) and then flushes the dcache afterwards (thus guaranteeing that all dirty data is flushed with no race condition). This implies that your change to swap the order of those two functions isn't correct. I'm pretty sure I'm
I wonder if David can shed some light on the original order of calls to disable dcache.
correct in saying that the dcache can hit even if it's disabled, hence disabling the dcache while it contains dirty data won't lead to issues?
My earlier debug was based on the original order of calls. I found I had to avoid using the stack before flushing L3. Now with the changed order, I haven't tested. But I can image the stack will be dirty and flushing L3 may or may not push the data into main memory (depending on the L3 implementation whether inclusive or not).
You said you are sure dcache can hit even if it is disabled. Can you explain more? My test shows as soon as the d-cache is disabled, the core cannot get the data in dirty cache.
By "hit" here, I mean that even with the dcache disabled, when the CPU performs a read access, if the dcache contains a copy of that data, it can return it rather than requiring it to be fetched from DRAM.
Yes, with the dcache disabled, I would not expect any writes to allocate new lines in the cache (although presumably writes would update any lines already there, in a write-though sense).
At least, I'm pretty sure this is all true. It seems the only way to allow switching from cache-on to cache-off state without losing dirty data.
I believe my test showed otherwise. As soon as the dcache is disabled, the core cannot get the dirty cached data if the dcache was flushed by way/set for L1 and L2 (L3 wasn't flushed). That's why I proposed to change the order to flush first.
York

On 10/18/2016 03:28 PM, york sun wrote:
On 10/18/2016 02:01 PM, Stephen Warren wrote:
On 10/18/2016 12:40 PM, york sun wrote:
On 10/18/2016 11:14 AM, Stephen Warren wrote:
On 10/18/2016 09:28 AM, york sun wrote:
On 10/17/2016 04:35 PM, Stephen Warren wrote:
From: Stephen Warren swarren@nvidia.com
SoC-specific logic may be required for all forms of cache-wide operations; invalidate and flush of both dcache and icache (note that only 3 of the 4 possible combinations make sense, since the icache never contains dirty lines). This patch adds an optional hook for all implemented cache-wide operations, and renames the one existing hook to better represent exactly which operation it is implementing. A dummy no-op implementation of each hook is provided. These dummy implementations are moved into C code, since there's no need to implement them in assembly.
Stephen,
Moving this function to C may pose an issue. I had a debug a couple of years ago that calling a C function put the stack into cache after flushing L1/L2. That's why I used asm function to flush L3.
Assuming the stack is located in cachable memory, the CPU is free (per the definition of the ARM architecture) to pull it into the cache at any time the cache is enabled (and perhaps even when it isn't enabled, at the very least for the icache on ARMv8 if not other cases too). Implementation in C vs. assembly has absolutely no effect here. I guess your statement assumes that C functions will write data to the stack and assembly functions never will. There's no strict 1:1 correlation between those two things; assembly code can touch the stack just like C code. If there's an assumption it won't, it needs to be documented in the header defining these hook functions.
I assume you're specifically talking about dirtying the dcache between the point when dcache flushing starts and the point when the dcache is disabled? If so, flush_dcache_all() itself would have to be manually coded in assembly to avoid using the stack, as would dcache_disable() and set_sctlr(). I think this is why dcache_disable() currently disables the dcache first (thus preventing it acquiring new dirty data) and then flushes the dcache afterwards (thus guaranteeing that all dirty data is flushed with no race condition). This implies that your change to swap the order of those two functions isn't correct. I'm pretty sure I'm
I wonder if David can shed some light on the original order of calls to disable dcache.
correct in saying that the dcache can hit even if it's disabled, hence disabling the dcache while it contains dirty data won't lead to issues?
My earlier debug was based on the original order of calls. I found I had to avoid using the stack before flushing L3. Now with the changed order, I haven't tested. But I can image the stack will be dirty and flushing L3 may or may not push the data into main memory (depending on the L3 implementation whether inclusive or not).
You said you are sure dcache can hit even if it is disabled. Can you explain more? My test shows as soon as the d-cache is disabled, the core cannot get the data in dirty cache.
By "hit" here, I mean that even with the dcache disabled, when the CPU performs a read access, if the dcache contains a copy of that data, it can return it rather than requiring it to be fetched from DRAM.
Yes, with the dcache disabled, I would not expect any writes to allocate new lines in the cache (although presumably writes would update any lines already there, in a write-though sense).
At least, I'm pretty sure this is all true. It seems the only way to allow switching from cache-on to cache-off state without losing dirty data.
I believe my test showed otherwise. As soon as the dcache is disabled, the core cannot get the dirty cached data if the dcache was flushed by way/set for L1 and L2 (L3 wasn't flushed). That's why I proposed to change the order to flush first.
It looks like what I was saying is true for ARMv7 but not for ARMv8 (or perhaps it's not architectural but implementation-defined). For example, the Cortex A15 TRM says:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0438i/BABHEJF...
When you disable the cache, all Write-Back Cacheable requests still look up the L1 cache. If there is a cache hit, the cache is read or updated in the same way as if the cache is enabled. This enables Cacheable memory to remain fully coherent while the cache is disabled.
However, the Cortex A72 TRM says:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0488h/way1382...
When you disable the cache, all Write-Back Cacheable requests do not look up the L1 data cache. L1 cache still services the snoops from the L2 cache.
So yes, we should flush the entire cache first, then disable it.
This does indeed imply that we shouldn't write data between starting to flush the dcache and disabling the cache. However, that seems too restrictive, especially in the face of caches beyond L1/L2 which might need quite some code to flush out. I'll try to investigate that aspect some more.

Hi Stephen,
On 17 October 2016 at 15:35, Stephen Warren swarren@wwwdotorg.org wrote:
From: Stephen Warren swarren@nvidia.com
SoC-specific logic may be required for all forms of cache-wide operations; invalidate and flush of both dcache and icache (note that only 3 of the 4 possible combinations make sense, since the icache never contains dirty lines). This patch adds an optional hook for all implemented cache-wide operations, and renames the one existing hook to better represent exactly which operation it is implementing. A dummy no-op implementation of each hook is provided. These dummy implementations are moved into C code, since there's no need to implement them in assembly.
Signed-off-by: Stephen Warren swarren@nvidia.com
arch/arm/cpu/armv8/cache.S | 6 ------ arch/arm/cpu/armv8/cache_v8.c | 23 ++++++++++++++++++++--- arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S | 4 ++-- arch/arm/include/asm/system.h | 5 ++++- arch/arm/mach-tegra/tegra186/cache.c | 2 +- 5 files changed, 27 insertions(+), 13 deletions(-)
I think we should have a proper interface for this stuff rather than weak functions. Maybe we need a linker-list approach, or a cache uclass?
Regards, Simon

On 10/18/2016 10:23 AM, Simon Glass wrote:
Hi Stephen,
On 17 October 2016 at 15:35, Stephen Warren swarren@wwwdotorg.org wrote:
From: Stephen Warren swarren@nvidia.com
SoC-specific logic may be required for all forms of cache-wide operations; invalidate and flush of both dcache and icache (note that only 3 of the 4 possible combinations make sense, since the icache never contains dirty lines). This patch adds an optional hook for all implemented cache-wide operations, and renames the one existing hook to better represent exactly which operation it is implementing. A dummy no-op implementation of each hook is provided. These dummy implementations are moved into C code, since there's no need to implement them in assembly.
Signed-off-by: Stephen Warren swarren@nvidia.com
arch/arm/cpu/armv8/cache.S | 6 ------ arch/arm/cpu/armv8/cache_v8.c | 23 ++++++++++++++++++++--- arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S | 4 ++-- arch/arm/include/asm/system.h | 5 ++++- arch/arm/mach-tegra/tegra186/cache.c | 2 +- 5 files changed, 27 insertions(+), 13 deletions(-)
I think we should have a proper interface for this stuff rather than weak functions. Maybe we need a linker-list approach, or a cache uclass?
What's improper about this interface? Presumably we could argue that no function in the entire of U-Boot be called by symbol name, which doesn't seem useful.
I'm not sure exactly what you envisage by a linker-list approach. Can you provide some background? I understand how the linker can construct list of objects/implementations/..., but that doesn't seem useful here since there's a known-ahead-of-time single implementation of these functions in a single build of U-Boot.
A cache uclass seems like /massive/ overkill, especially since I'd expect these very low-level functions to be required well before any higher level code like DM/classes/... to be available.

Hi Stephen,
On 18 October 2016 at 12:58, Stephen Warren swarren@wwwdotorg.org wrote:
On 10/18/2016 10:23 AM, Simon Glass wrote:
Hi Stephen,
On 17 October 2016 at 15:35, Stephen Warren swarren@wwwdotorg.org wrote:
From: Stephen Warren swarren@nvidia.com
SoC-specific logic may be required for all forms of cache-wide operations; invalidate and flush of both dcache and icache (note that only 3 of the 4 possible combinations make sense, since the icache never contains dirty lines). This patch adds an optional hook for all implemented cache-wide operations, and renames the one existing hook to better represent exactly which operation it is implementing. A dummy no-op implementation of each hook is provided. These dummy implementations are moved into C code, since there's no need to implement them in assembly.
Signed-off-by: Stephen Warren swarren@nvidia.com
arch/arm/cpu/armv8/cache.S | 6 ------ arch/arm/cpu/armv8/cache_v8.c | 23 ++++++++++++++++++++--- arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S | 4 ++-- arch/arm/include/asm/system.h | 5 ++++- arch/arm/mach-tegra/tegra186/cache.c | 2 +- 5 files changed, 27 insertions(+), 13 deletions(-)
I think we should have a proper interface for this stuff rather than weak functions. Maybe we need a linker-list approach, or a cache uclass?
What's improper about this interface? Presumably we could argue that no function in the entire of U-Boot be called by symbol name, which doesn't seem useful.
I'm not sure exactly what you envisage by a linker-list approach. Can you provide some background? I understand how the linker can construct list of objects/implementations/..., but that doesn't seem useful here since there's a known-ahead-of-time single implementation of these functions in a single build of U-Boot.
Your own commit messages says that this is SoC-specific. I'm suggesting that we define an interface (which I think you have already done with your header file additions), and allow SoCs to implement it via a linker list.
IMO the cache code in U-Boot is starting to get a bit creaky.
A cache uclass seems like /massive/ overkill, especially since I'd expect these very low-level functions to be required well before any higher level code like DM/classes/... to be available.
DM is available very early. But it's not clear from your patch when this code is actually called.
Regards, Simon

On 10/18/2016 01:03 PM, Simon Glass wrote:
Hi Stephen,
On 18 October 2016 at 12:58, Stephen Warren swarren@wwwdotorg.org wrote:
On 10/18/2016 10:23 AM, Simon Glass wrote:
Hi Stephen,
On 17 October 2016 at 15:35, Stephen Warren swarren@wwwdotorg.org wrote:
From: Stephen Warren swarren@nvidia.com
SoC-specific logic may be required for all forms of cache-wide operations; invalidate and flush of both dcache and icache (note that only 3 of the 4 possible combinations make sense, since the icache never contains dirty lines). This patch adds an optional hook for all implemented cache-wide operations, and renames the one existing hook to better represent exactly which operation it is implementing. A dummy no-op implementation of each hook is provided. These dummy implementations are moved into C code, since there's no need to implement them in assembly.
Signed-off-by: Stephen Warren swarren@nvidia.com
arch/arm/cpu/armv8/cache.S | 6 ------ arch/arm/cpu/armv8/cache_v8.c | 23 ++++++++++++++++++++--- arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S | 4 ++-- arch/arm/include/asm/system.h | 5 ++++- arch/arm/mach-tegra/tegra186/cache.c | 2 +- 5 files changed, 27 insertions(+), 13 deletions(-)
I think we should have a proper interface for this stuff rather than weak functions. Maybe we need a linker-list approach, or a cache uclass?
What's improper about this interface? Presumably we could argue that no function in the entire of U-Boot be called by symbol name, which doesn't seem useful.
I'm not sure exactly what you envisage by a linker-list approach. Can you provide some background? I understand how the linker can construct list of objects/implementations/..., but that doesn't seem useful here since there's a known-ahead-of-time single implementation of these functions in a single build of U-Boot.
Your own commit messages says that this is SoC-specific. I'm suggesting that we define an interface (which I think you have already done with your header file additions), and allow SoCs to implement it via a linker list.
IMO the cache code in U-Boot is starting to get a bit creaky.
A cache uclass seems like /massive/ overkill, especially since I'd expect these very low-level functions to be required well before any higher level code like DM/classes/... to be available.
DM is available very early. But it's not clear from your patch when this code is actually called.
I believe that weak functions are a perfectly acceptable approach here.
Yes, the implementation of these functions is SoC-specific. The Makefiles will pull in the appropriate implementation for that SoC whenever U-Boot is built, just like every other board- or SoC-specific function in the entire of U-Boot.
There's no need for linker lists since there is only ever one implementation.

Hi Stephen,
On 18 October 2016 at 13:10, Stephen Warren swarren@wwwdotorg.org wrote:
On 10/18/2016 01:03 PM, Simon Glass wrote:
Hi Stephen,
On 18 October 2016 at 12:58, Stephen Warren swarren@wwwdotorg.org wrote:
On 10/18/2016 10:23 AM, Simon Glass wrote:
Hi Stephen,
On 17 October 2016 at 15:35, Stephen Warren swarren@wwwdotorg.org wrote:
From: Stephen Warren swarren@nvidia.com
SoC-specific logic may be required for all forms of cache-wide operations; invalidate and flush of both dcache and icache (note that only 3 of the 4 possible combinations make sense, since the icache never contains dirty lines). This patch adds an optional hook for all implemented cache-wide operations, and renames the one existing hook to better represent exactly which operation it is implementing. A dummy no-op implementation of each hook is provided. These dummy implementations are moved into C code, since there's no need to implement them in assembly.
Signed-off-by: Stephen Warren swarren@nvidia.com
arch/arm/cpu/armv8/cache.S | 6 ------ arch/arm/cpu/armv8/cache_v8.c | 23 ++++++++++++++++++++--- arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S | 4 ++-- arch/arm/include/asm/system.h | 5 ++++- arch/arm/mach-tegra/tegra186/cache.c | 2 +- 5 files changed, 27 insertions(+), 13 deletions(-)
I think we should have a proper interface for this stuff rather than weak functions. Maybe we need a linker-list approach, or a cache uclass?
What's improper about this interface? Presumably we could argue that no function in the entire of U-Boot be called by symbol name, which doesn't seem useful.
I'm not sure exactly what you envisage by a linker-list approach. Can you provide some background? I understand how the linker can construct list of objects/implementations/..., but that doesn't seem useful here since there's a known-ahead-of-time single implementation of these functions in a single build of U-Boot.
Your own commit messages says that this is SoC-specific. I'm suggesting that we define an interface (which I think you have already done with your header file additions), and allow SoCs to implement it via a linker list.
IMO the cache code in U-Boot is starting to get a bit creaky.
A cache uclass seems like /massive/ overkill, especially since I'd expect these very low-level functions to be required well before any higher level code like DM/classes/... to be available.
DM is available very early. But it's not clear from your patch when this code is actually called.
I believe that weak functions are a perfectly acceptable approach here.
Yes, the implementation of these functions is SoC-specific. The Makefiles will pull in the appropriate implementation for that SoC whenever U-Boot is built, just like every other board- or SoC-specific function in the entire of U-Boot.
There's no need for linker lists since there is only ever one implementation.
If there is only ever one implementation, why do you need weak functions? Just call them directly. I think in fact you mean that there can be no implementation (but perhaps an empty one), or one implementation. You are effectively using multiple weak functions to provide default code. I think it would be better if this were explicit.
I still think that the cache functions could do with a rethink.
Regards, Simon

On 10/18/2016 01:56 PM, Simon Glass wrote:
Hi Stephen,
On 18 October 2016 at 13:10, Stephen Warren swarren@wwwdotorg.org wrote:
On 10/18/2016 01:03 PM, Simon Glass wrote:
Hi Stephen,
On 18 October 2016 at 12:58, Stephen Warren swarren@wwwdotorg.org wrote:
On 10/18/2016 10:23 AM, Simon Glass wrote:
Hi Stephen,
On 17 October 2016 at 15:35, Stephen Warren swarren@wwwdotorg.org wrote:
From: Stephen Warren swarren@nvidia.com
SoC-specific logic may be required for all forms of cache-wide operations; invalidate and flush of both dcache and icache (note that only 3 of the 4 possible combinations make sense, since the icache never contains dirty lines). This patch adds an optional hook for all implemented cache-wide operations, and renames the one existing hook to better represent exactly which operation it is implementing. A dummy no-op implementation of each hook is provided. These dummy implementations are moved into C code, since there's no need to implement them in assembly.
Signed-off-by: Stephen Warren swarren@nvidia.com
arch/arm/cpu/armv8/cache.S | 6 ------ arch/arm/cpu/armv8/cache_v8.c | 23 ++++++++++++++++++++--- arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S | 4 ++-- arch/arm/include/asm/system.h | 5 ++++- arch/arm/mach-tegra/tegra186/cache.c | 2 +- 5 files changed, 27 insertions(+), 13 deletions(-)
I think we should have a proper interface for this stuff rather than weak functions. Maybe we need a linker-list approach, or a cache uclass?
What's improper about this interface? Presumably we could argue that no function in the entire of U-Boot be called by symbol name, which doesn't seem useful.
I'm not sure exactly what you envisage by a linker-list approach. Can you provide some background? I understand how the linker can construct list of objects/implementations/..., but that doesn't seem useful here since there's a known-ahead-of-time single implementation of these functions in a single build of U-Boot.
Your own commit messages says that this is SoC-specific. I'm suggesting that we define an interface (which I think you have already done with your header file additions), and allow SoCs to implement it via a linker list.
IMO the cache code in U-Boot is starting to get a bit creaky.
A cache uclass seems like /massive/ overkill, especially since I'd expect these very low-level functions to be required well before any higher level code like DM/classes/... to be available.
DM is available very early. But it's not clear from your patch when this code is actually called.
I believe that weak functions are a perfectly acceptable approach here.
Yes, the implementation of these functions is SoC-specific. The Makefiles will pull in the appropriate implementation for that SoC whenever U-Boot is built, just like every other board- or SoC-specific function in the entire of U-Boot.
There's no need for linker lists since there is only ever one implementation.
If there is only ever one implementation, why do you need weak functions?
As I explicitly stated above, each SoC can have a different implementation, yet only a single implementation is ever needed for a particular U-Boot build.
Just call them directly.
The code is doing that, both before and after my patch.
I think in fact you mean that there can be no implementation (but perhaps an empty one), or one implementation. You are effectively using multiple weak functions to provide default code. I think it would be better if this were explicit.
I still think that the cache functions could do with a rethink.
In my opinion, this patch doesn't change the code structure at all. There is already an interface between the core (L1/L2) cache management code and the SoC-specific cache management code. That interface already uses a weak function to provide the default no-op implementation. This patch doesn't change any of that. All this patch does is fix that existing interface to cover all 3 combinations of dcache_flush, dcache_invalidate, and icache_invalidate, rather than just one of those combinations. It's more of a bug-fix than anything else.
If you want to rework this interface sometime, be my guest. However, I don't think it's fair to require that someone who simply wants to fix the existing code (in a way that is orthogonal to your desired interface refactoring) do that refactoring first, rather than doing it yourself.

Hi Stephen,
On 18 October 2016 at 16:54, Stephen Warren swarren@wwwdotorg.org wrote:
On 10/18/2016 01:56 PM, Simon Glass wrote:
Hi Stephen,
On 18 October 2016 at 13:10, Stephen Warren swarren@wwwdotorg.org wrote:
On 10/18/2016 01:03 PM, Simon Glass wrote:
Hi Stephen,
On 18 October 2016 at 12:58, Stephen Warren swarren@wwwdotorg.org wrote:
On 10/18/2016 10:23 AM, Simon Glass wrote:
Hi Stephen,
On 17 October 2016 at 15:35, Stephen Warren swarren@wwwdotorg.org wrote: > > > > From: Stephen Warren swarren@nvidia.com > > SoC-specific logic may be required for all forms of cache-wide > operations; invalidate and flush of both dcache and icache (note that > only 3 of the 4 possible combinations make sense, since the icache > never > contains dirty lines). This patch adds an optional hook for all > implemented cache-wide operations, and renames the one existing hook > to > better represent exactly which operation it is implementing. A dummy > no-op implementation of each hook is provided. These dummy > implementations are moved into C code, since there's no need to > implement them in assembly. > > Signed-off-by: Stephen Warren swarren@nvidia.com > --- > arch/arm/cpu/armv8/cache.S | 6 ------ > arch/arm/cpu/armv8/cache_v8.c | 23 > ++++++++++++++++++++--- > arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S | 4 ++-- > arch/arm/include/asm/system.h | 5 ++++- > arch/arm/mach-tegra/tegra186/cache.c | 2 +- > 5 files changed, 27 insertions(+), 13 deletions(-) >
I think we should have a proper interface for this stuff rather than weak functions. Maybe we need a linker-list approach, or a cache uclass?
What's improper about this interface? Presumably we could argue that no function in the entire of U-Boot be called by symbol name, which doesn't seem useful.
I'm not sure exactly what you envisage by a linker-list approach. Can you provide some background? I understand how the linker can construct list of objects/implementations/..., but that doesn't seem useful here since there's a known-ahead-of-time single implementation of these functions in a single build of U-Boot.
Your own commit messages says that this is SoC-specific. I'm suggesting that we define an interface (which I think you have already done with your header file additions), and allow SoCs to implement it via a linker list.
IMO the cache code in U-Boot is starting to get a bit creaky.
A cache uclass seems like /massive/ overkill, especially since I'd expect these very low-level functions to be required well before any higher level code like DM/classes/... to be available.
DM is available very early. But it's not clear from your patch when this code is actually called.
I believe that weak functions are a perfectly acceptable approach here.
Yes, the implementation of these functions is SoC-specific. The Makefiles will pull in the appropriate implementation for that SoC whenever U-Boot is built, just like every other board- or SoC-specific function in the entire of U-Boot.
There's no need for linker lists since there is only ever one implementation.
If there is only ever one implementation, why do you need weak functions?
As I explicitly stated above, each SoC can have a different implementation, yet only a single implementation is ever needed for a particular U-Boot build.
Just call them directly.
The code is doing that, both before and after my patch.
I mean call them without needing weak functions.
I think in fact you mean that there can be no implementation (but perhaps an empty one), or one implementation. You are effectively using multiple weak functions to provide default code. I think it would be better if this were explicit.
I still think that the cache functions could do with a rethink.
In my opinion, this patch doesn't change the code structure at all. There is already an interface between the core (L1/L2) cache management code and the SoC-specific cache management code. That interface already uses a weak function to provide the default no-op implementation. This patch doesn't change any of that. All this patch does is fix that existing interface to cover all 3 combinations of dcache_flush, dcache_invalidate, and icache_invalidate, rather than just one of those combinations. It's more of a bug-fix than anything else.
Yes I see that.
If you want to rework this interface sometime, be my guest. However, I don't think it's fair to require that someone who simply wants to fix the existing code (in a way that is orthogonal to your desired interface refactoring) do that refactoring first, rather than doing it yourself.
I understand what you are saying, but isn't that how open source software works? Believe me, I have done my fair share of refactoring :-)
At least can you look at not making it any harder to fix up later? The more we pile onto this interface, the harder it will be later. We should aim to make ARMv8 really nice as it is the new thing.
Regards, Simon

On 10/18/2016 05:08 PM, Simon Glass wrote:
Hi Stephen,
On 18 October 2016 at 16:54, Stephen Warren swarren@wwwdotorg.org wrote:
On 10/18/2016 01:56 PM, Simon Glass wrote:
Hi Stephen,
On 18 October 2016 at 13:10, Stephen Warren swarren@wwwdotorg.org wrote:
On 10/18/2016 01:03 PM, Simon Glass wrote:
Hi Stephen,
On 18 October 2016 at 12:58, Stephen Warren swarren@wwwdotorg.org wrote:
On 10/18/2016 10:23 AM, Simon Glass wrote: > > > > Hi Stephen, > > On 17 October 2016 at 15:35, Stephen Warren swarren@wwwdotorg.org > wrote: >> >> >> >> From: Stephen Warren swarren@nvidia.com >> >> SoC-specific logic may be required for all forms of cache-wide >> operations; invalidate and flush of both dcache and icache (note that >> only 3 of the 4 possible combinations make sense, since the icache >> never >> contains dirty lines). This patch adds an optional hook for all >> implemented cache-wide operations, and renames the one existing hook >> to >> better represent exactly which operation it is implementing. A dummy >> no-op implementation of each hook is provided. These dummy >> implementations are moved into C code, since there's no need to >> implement them in assembly. >> >> Signed-off-by: Stephen Warren swarren@nvidia.com >> --- >> arch/arm/cpu/armv8/cache.S | 6 ------ >> arch/arm/cpu/armv8/cache_v8.c | 23 >> ++++++++++++++++++++--- >> arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S | 4 ++-- >> arch/arm/include/asm/system.h | 5 ++++- >> arch/arm/mach-tegra/tegra186/cache.c | 2 +- >> 5 files changed, 27 insertions(+), 13 deletions(-) >> > > I think we should have a proper interface for this stuff rather than > weak functions. Maybe we need a linker-list approach, or a cache > uclass?
What's improper about this interface? Presumably we could argue that no function in the entire of U-Boot be called by symbol name, which doesn't seem useful.
I'm not sure exactly what you envisage by a linker-list approach. Can you provide some background? I understand how the linker can construct list of objects/implementations/..., but that doesn't seem useful here since there's a known-ahead-of-time single implementation of these functions in a single build of U-Boot.
Your own commit messages says that this is SoC-specific. I'm suggesting that we define an interface (which I think you have already done with your header file additions), and allow SoCs to implement it via a linker list.
IMO the cache code in U-Boot is starting to get a bit creaky.
A cache uclass seems like /massive/ overkill, especially since I'd expect these very low-level functions to be required well before any higher level code like DM/classes/... to be available.
DM is available very early. But it's not clear from your patch when this code is actually called.
I believe that weak functions are a perfectly acceptable approach here.
Yes, the implementation of these functions is SoC-specific. The Makefiles will pull in the appropriate implementation for that SoC whenever U-Boot is built, just like every other board- or SoC-specific function in the entire of U-Boot.
There's no need for linker lists since there is only ever one implementation.
If there is only ever one implementation, why do you need weak functions?
As I explicitly stated above, each SoC can have a different implementation, yet only a single implementation is ever needed for a particular U-Boot build.
Just call them directly.
The code is doing that, both before and after my patch.
I mean call them without needing weak functions.
I think in fact you mean that there can be no implementation (but perhaps an empty one), or one implementation. You are effectively using multiple weak functions to provide default code. I think it would be better if this were explicit.
I still think that the cache functions could do with a rethink.
In my opinion, this patch doesn't change the code structure at all. There is already an interface between the core (L1/L2) cache management code and the SoC-specific cache management code. That interface already uses a weak function to provide the default no-op implementation. This patch doesn't change any of that. All this patch does is fix that existing interface to cover all 3 combinations of dcache_flush, dcache_invalidate, and icache_invalidate, rather than just one of those combinations. It's more of a bug-fix than anything else.
Yes I see that.
If you want to rework this interface sometime, be my guest. However, I don't think it's fair to require that someone who simply wants to fix the existing code (in a way that is orthogonal to your desired interface refactoring) do that refactoring first, rather than doing it yourself.
I understand what you are saying, but isn't that how open source software works? Believe me, I have done my fair share of refactoring :-)
At least can you look at not making it any harder to fix up later? The more we pile onto this interface, the harder it will be later. We should aim to make ARMv8 really nice as it is the new thing.
I believe moving one or three functions into any replacement scheme will have an identical level of difficulty. So, I believe the patch already satisfies the "no harder" requirement.

Hi Stephen,
On 18 October 2016 at 17:33, Stephen Warren swarren@wwwdotorg.org wrote:
On 10/18/2016 05:08 PM, Simon Glass wrote:
Hi Stephen,
On 18 October 2016 at 16:54, Stephen Warren swarren@wwwdotorg.org wrote:
On 10/18/2016 01:56 PM, Simon Glass wrote:
Hi Stephen,
On 18 October 2016 at 13:10, Stephen Warren swarren@wwwdotorg.org wrote:
On 10/18/2016 01:03 PM, Simon Glass wrote:
Hi Stephen,
On 18 October 2016 at 12:58, Stephen Warren swarren@wwwdotorg.org wrote: > > > > On 10/18/2016 10:23 AM, Simon Glass wrote: >> >> >> >> >> Hi Stephen, >> >> On 17 October 2016 at 15:35, Stephen Warren swarren@wwwdotorg.org >> wrote: >>> >>> >>> >>> >>> From: Stephen Warren swarren@nvidia.com >>> >>> SoC-specific logic may be required for all forms of cache-wide >>> operations; invalidate and flush of both dcache and icache (note >>> that >>> only 3 of the 4 possible combinations make sense, since the icache >>> never >>> contains dirty lines). This patch adds an optional hook for all >>> implemented cache-wide operations, and renames the one existing >>> hook >>> to >>> better represent exactly which operation it is implementing. A >>> dummy >>> no-op implementation of each hook is provided. These dummy >>> implementations are moved into C code, since there's no need to >>> implement them in assembly. >>> >>> Signed-off-by: Stephen Warren swarren@nvidia.com >>> --- >>> arch/arm/cpu/armv8/cache.S | 6 ------ >>> arch/arm/cpu/armv8/cache_v8.c | 23 >>> ++++++++++++++++++++--- >>> arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S | 4 ++-- >>> arch/arm/include/asm/system.h | 5 ++++- >>> arch/arm/mach-tegra/tegra186/cache.c | 2 +- >>> 5 files changed, 27 insertions(+), 13 deletions(-) >>> >> >> I think we should have a proper interface for this stuff rather than >> weak functions. Maybe we need a linker-list approach, or a cache >> uclass? > > > > > > What's improper about this interface? Presumably we could argue that > no > function in the entire of U-Boot be called by symbol name, which > doesn't > seem useful. > > I'm not sure exactly what you envisage by a linker-list approach. Can > you > provide some background? I understand how the linker can construct > list > of > objects/implementations/..., but that doesn't seem useful here since > there's > a known-ahead-of-time single implementation of these functions in a > single > build of U-Boot.
Your own commit messages says that this is SoC-specific. I'm suggesting that we define an interface (which I think you have already done with your header file additions), and allow SoCs to implement it via a linker list.
IMO the cache code in U-Boot is starting to get a bit creaky.
> A cache uclass seems like /massive/ overkill, especially since I'd > expect > these very low-level functions to be required well before any higher > level > code like DM/classes/... to be available.
DM is available very early. But it's not clear from your patch when this code is actually called.
I believe that weak functions are a perfectly acceptable approach here.
Yes, the implementation of these functions is SoC-specific. The Makefiles will pull in the appropriate implementation for that SoC whenever U-Boot is built, just like every other board- or SoC-specific function in the entire of U-Boot.
There's no need for linker lists since there is only ever one implementation.
If there is only ever one implementation, why do you need weak functions?
As I explicitly stated above, each SoC can have a different implementation, yet only a single implementation is ever needed for a particular U-Boot build.
Just call them directly.
The code is doing that, both before and after my patch.
I mean call them without needing weak functions.
I think in fact you mean that there can be no implementation (but perhaps an empty one), or one implementation. You are effectively using multiple weak functions to provide default code. I think it would be better if this were explicit.
I still think that the cache functions could do with a rethink.
In my opinion, this patch doesn't change the code structure at all. There is already an interface between the core (L1/L2) cache management code and the SoC-specific cache management code. That interface already uses a weak function to provide the default no-op implementation. This patch doesn't change any of that. All this patch does is fix that existing interface to cover all 3 combinations of dcache_flush, dcache_invalidate, and icache_invalidate, rather than just one of those combinations. It's more of a bug-fix than anything else.
Yes I see that.
If you want to rework this interface sometime, be my guest. However, I don't think it's fair to require that someone who simply wants to fix the existing code (in a way that is orthogonal to your desired interface refactoring) do that refactoring first, rather than doing it yourself.
I understand what you are saying, but isn't that how open source software works? Believe me, I have done my fair share of refactoring :-)
At least can you look at not making it any harder to fix up later? The more we pile onto this interface, the harder it will be later. We should aim to make ARMv8 really nice as it is the new thing.
I believe moving one or three functions into any replacement scheme will have an identical level of difficulty. So, I believe the patch already satisfies the "no harder" requirement.
Well it seems your mind is already made up!
Regards, Simon

On 10/18/2016 08:41 PM, Simon Glass wrote:
Hi Stephen,
On 18 October 2016 at 17:33, Stephen Warren swarren@wwwdotorg.org wrote:
On 10/18/2016 05:08 PM, Simon Glass wrote:
Hi Stephen,
On 18 October 2016 at 16:54, Stephen Warren swarren@wwwdotorg.org wrote:
On 10/18/2016 01:56 PM, Simon Glass wrote:
Hi Stephen,
On 18 October 2016 at 13:10, Stephen Warren swarren@wwwdotorg.org wrote:
On 10/18/2016 01:03 PM, Simon Glass wrote: > > > > Hi Stephen, > > On 18 October 2016 at 12:58, Stephen Warren swarren@wwwdotorg.org > wrote: >> >> >> >> On 10/18/2016 10:23 AM, Simon Glass wrote: >>> >>> >>> >>> >>> Hi Stephen, >>> >>> On 17 October 2016 at 15:35, Stephen Warren swarren@wwwdotorg.org >>> wrote: >>>> >>>> >>>> >>>> >>>> From: Stephen Warren swarren@nvidia.com >>>> >>>> SoC-specific logic may be required for all forms of cache-wide >>>> operations; invalidate and flush of both dcache and icache (note >>>> that >>>> only 3 of the 4 possible combinations make sense, since the icache >>>> never >>>> contains dirty lines). This patch adds an optional hook for all >>>> implemented cache-wide operations, and renames the one existing >>>> hook >>>> to >>>> better represent exactly which operation it is implementing. A >>>> dummy >>>> no-op implementation of each hook is provided. These dummy >>>> implementations are moved into C code, since there's no need to >>>> implement them in assembly. >>>> >>>> Signed-off-by: Stephen Warren swarren@nvidia.com >>>> --- >>>> arch/arm/cpu/armv8/cache.S | 6 ------ >>>> arch/arm/cpu/armv8/cache_v8.c | 23 >>>> ++++++++++++++++++++--- >>>> arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S | 4 ++-- >>>> arch/arm/include/asm/system.h | 5 ++++- >>>> arch/arm/mach-tegra/tegra186/cache.c | 2 +- >>>> 5 files changed, 27 insertions(+), 13 deletions(-) >>>> >>> >>> I think we should have a proper interface for this stuff rather than >>> weak functions. Maybe we need a linker-list approach, or a cache >>> uclass? >> >> >> >> >> >> What's improper about this interface? Presumably we could argue that >> no >> function in the entire of U-Boot be called by symbol name, which >> doesn't >> seem useful. >> >> I'm not sure exactly what you envisage by a linker-list approach. Can >> you >> provide some background? I understand how the linker can construct >> list >> of >> objects/implementations/..., but that doesn't seem useful here since >> there's >> a known-ahead-of-time single implementation of these functions in a >> single >> build of U-Boot. > > > > > Your own commit messages says that this is SoC-specific. I'm > suggesting that we define an interface (which I think you have already > done with your header file additions), and allow SoCs to implement it > via a linker list. > > IMO the cache code in U-Boot is starting to get a bit creaky. > >> A cache uclass seems like /massive/ overkill, especially since I'd >> expect >> these very low-level functions to be required well before any higher >> level >> code like DM/classes/... to be available. > > > > > DM is available very early. But it's not clear from your patch when > this code is actually called.
I believe that weak functions are a perfectly acceptable approach here.
Yes, the implementation of these functions is SoC-specific. The Makefiles will pull in the appropriate implementation for that SoC whenever U-Boot is built, just like every other board- or SoC-specific function in the entire of U-Boot.
There's no need for linker lists since there is only ever one implementation.
If there is only ever one implementation, why do you need weak functions?
As I explicitly stated above, each SoC can have a different implementation, yet only a single implementation is ever needed for a particular U-Boot build.
Just call them directly.
The code is doing that, both before and after my patch.
I mean call them without needing weak functions.
I think in fact you mean that there can be no implementation (but perhaps an empty one), or one implementation. You are effectively using multiple weak functions to provide default code. I think it would be better if this were explicit.
I still think that the cache functions could do with a rethink.
In my opinion, this patch doesn't change the code structure at all. There is already an interface between the core (L1/L2) cache management code and the SoC-specific cache management code. That interface already uses a weak function to provide the default no-op implementation. This patch doesn't change any of that. All this patch does is fix that existing interface to cover all 3 combinations of dcache_flush, dcache_invalidate, and icache_invalidate, rather than just one of those combinations. It's more of a bug-fix than anything else.
Yes I see that.
If you want to rework this interface sometime, be my guest. However, I don't think it's fair to require that someone who simply wants to fix the existing code (in a way that is orthogonal to your desired interface refactoring) do that refactoring first, rather than doing it yourself.
I understand what you are saying, but isn't that how open source software works? Believe me, I have done my fair share of refactoring :-)
At least can you look at not making it any harder to fix up later? The more we pile onto this interface, the harder it will be later. We should aim to make ARMv8 really nice as it is the new thing.
I believe moving one or three functions into any replacement scheme will have an identical level of difficulty. So, I believe the patch already satisfies the "no harder" requirement.
Well it seems your mind is already made up!
Well, I don't believe there are any viable or reasonable alternatives. I'm not prolonging this thread because I find it enjoyable, but because of the lack of alternatives to what this patch does.
Doing this via DM doesn't simplify anything or make it more maintainable; it's just using DM for the sake of it. DM is great when it makes life simpler and code more maintainable, but we use it because of those benefits, not just for the sake of it. Using DM adds complexity, and so there has to be a benefit of doing so. I don't believe there is here.
Doing this by linker scripts simply doesn't make sense:
Any given U-Boot binary will only contain a single implementation of these functions, so there's no need for any kind of runtime lookup, and if there was a runtime lookup, what would be the key and where would it come from? I believe we'd still have some compile-time-seledted function/data to drive the runtime lookup process, and so we'd be in exactly the same situation as we already are, just with more complexity added on top.
Using linker scripts to switch between implementations at compile time is much more complex than just letting the linker link stuff together directly. That's what it's good at. Using linker scripts would just add extra complexity without any benefit. We'd still end up with a single implementation in the binary, yet to call it code would have to indirect through the linker-defined table, rather than simply calling the same code by symbol name. Uggh!
Note that per the discussions in other forks of this thread, it's likely we'll need to code all these cache operations in assembly to avoid accessing DRAM while turning the cache on/off. This implies to me that we'll need to keep all the cache code extremely simple and direct, without any form of runtime or compile time indirection.

On Wed, Oct 19, 2016 at 09:36:52AM -0600, Stephen Warren wrote:
On 10/18/2016 08:41 PM, Simon Glass wrote:
Hi Stephen,
On 18 October 2016 at 17:33, Stephen Warren swarren@wwwdotorg.org wrote:
On 10/18/2016 05:08 PM, Simon Glass wrote:
Hi Stephen,
On 18 October 2016 at 16:54, Stephen Warren swarren@wwwdotorg.org wrote:
On 10/18/2016 01:56 PM, Simon Glass wrote:
Hi Stephen,
On 18 October 2016 at 13:10, Stephen Warren swarren@wwwdotorg.org wrote: > > >On 10/18/2016 01:03 PM, Simon Glass wrote: >> >> >> >>Hi Stephen, >> >>On 18 October 2016 at 12:58, Stephen Warren swarren@wwwdotorg.org >>wrote: >>> >>> >>> >>>On 10/18/2016 10:23 AM, Simon Glass wrote: >>>> >>>> >>>> >>>> >>>>Hi Stephen, >>>> >>>>On 17 October 2016 at 15:35, Stephen Warren swarren@wwwdotorg.org >>>>wrote: >>>>> >>>>> >>>>> >>>>> >>>>>From: Stephen Warren swarren@nvidia.com >>>>> >>>>>SoC-specific logic may be required for all forms of cache-wide >>>>>operations; invalidate and flush of both dcache and icache (note >>>>>that >>>>>only 3 of the 4 possible combinations make sense, since the icache >>>>>never >>>>>contains dirty lines). This patch adds an optional hook for all >>>>>implemented cache-wide operations, and renames the one existing >>>>>hook >>>>>to >>>>>better represent exactly which operation it is implementing. A >>>>>dummy >>>>>no-op implementation of each hook is provided. These dummy >>>>>implementations are moved into C code, since there's no need to >>>>>implement them in assembly. >>>>> >>>>>Signed-off-by: Stephen Warren swarren@nvidia.com >>>>>--- >>>>> arch/arm/cpu/armv8/cache.S | 6 ------ >>>>> arch/arm/cpu/armv8/cache_v8.c | 23 >>>>>++++++++++++++++++++--- >>>>> arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S | 4 ++-- >>>>> arch/arm/include/asm/system.h | 5 ++++- >>>>> arch/arm/mach-tegra/tegra186/cache.c | 2 +- >>>>> 5 files changed, 27 insertions(+), 13 deletions(-) >>>>> >>>> >>>>I think we should have a proper interface for this stuff rather than >>>>weak functions. Maybe we need a linker-list approach, or a cache >>>>uclass? >>> >>> >>> >>> >>> >>>What's improper about this interface? Presumably we could argue that >>>no >>>function in the entire of U-Boot be called by symbol name, which >>>doesn't >>>seem useful. >>> >>>I'm not sure exactly what you envisage by a linker-list approach. Can >>>you >>>provide some background? I understand how the linker can construct >>>list >>>of >>>objects/implementations/..., but that doesn't seem useful here since >>>there's >>>a known-ahead-of-time single implementation of these functions in a >>>single >>>build of U-Boot. >> >> >> >> >>Your own commit messages says that this is SoC-specific. I'm >>suggesting that we define an interface (which I think you have already >>done with your header file additions), and allow SoCs to implement it >>via a linker list. >> >>IMO the cache code in U-Boot is starting to get a bit creaky. >> >>>A cache uclass seems like /massive/ overkill, especially since I'd >>>expect >>>these very low-level functions to be required well before any higher >>>level >>>code like DM/classes/... to be available. >> >> >> >> >>DM is available very early. But it's not clear from your patch when >>this code is actually called. > > > > >I believe that weak functions are a perfectly acceptable approach here. > >Yes, the implementation of these functions is SoC-specific. The >Makefiles >will pull in the appropriate implementation for that SoC whenever >U-Boot >is >built, just like every other board- or SoC-specific function in the >entire >of U-Boot. > >There's no need for linker lists since there is only ever one >implementation.
If there is only ever one implementation, why do you need weak functions?
As I explicitly stated above, each SoC can have a different implementation, yet only a single implementation is ever needed for a particular U-Boot build.
Just call them directly.
The code is doing that, both before and after my patch.
I mean call them without needing weak functions.
I think in fact you mean that there can be no implementation (but perhaps an empty one), or one implementation. You are effectively using multiple weak functions to provide default code. I think it would be better if this were explicit.
I still think that the cache functions could do with a rethink.
In my opinion, this patch doesn't change the code structure at all. There is already an interface between the core (L1/L2) cache management code and the SoC-specific cache management code. That interface already uses a weak function to provide the default no-op implementation. This patch doesn't change any of that. All this patch does is fix that existing interface to cover all 3 combinations of dcache_flush, dcache_invalidate, and icache_invalidate, rather than just one of those combinations. It's more of a bug-fix than anything else.
Yes I see that.
If you want to rework this interface sometime, be my guest. However, I don't think it's fair to require that someone who simply wants to fix the existing code (in a way that is orthogonal to your desired interface refactoring) do that refactoring first, rather than doing it yourself.
I understand what you are saying, but isn't that how open source software works? Believe me, I have done my fair share of refactoring :-)
At least can you look at not making it any harder to fix up later? The more we pile onto this interface, the harder it will be later. We should aim to make ARMv8 really nice as it is the new thing.
I believe moving one or three functions into any replacement scheme will have an identical level of difficulty. So, I believe the patch already satisfies the "no harder" requirement.
Well it seems your mind is already made up!
Well, I don't believe there are any viable or reasonable alternatives. I'm not prolonging this thread because I find it enjoyable, but because of the lack of alternatives to what this patch does.
Doing this via DM doesn't simplify anything or make it more maintainable; it's just using DM for the sake of it. DM is great when it makes life simpler and code more maintainable, but we use it because of those benefits, not just for the sake of it. Using DM adds complexity, and so there has to be a benefit of doing so. I don't believe there is here.
Doing this by linker scripts simply doesn't make sense:
Any given U-Boot binary will only contain a single implementation of these functions, so there's no need for any kind of runtime lookup, and if there was a runtime lookup, what would be the key and where would it come from? I believe we'd still have some compile-time-seledted function/data to drive the runtime lookup process, and so we'd be in exactly the same situation as we already are, just with more complexity added on top.
Using linker scripts to switch between implementations at compile time is much more complex than just letting the linker link stuff together directly. That's what it's good at. Using linker scripts would just add extra complexity without any benefit. We'd still end up with a single implementation in the binary, yet to call it code would have to indirect through the linker-defined table, rather than simply calling the same code by symbol name. Uggh!
Note that per the discussions in other forks of this thread, it's likely we'll need to code all these cache operations in assembly to avoid accessing DRAM while turning the cache on/off. This implies to me that we'll need to keep all the cache code extremely simple and direct, without any form of runtime or compile time indirection.
For the record, until / unless we move to the point where we can run different SoCs within a single binary, doing this via weak functions seems to me to be the correct abstraction. If we know that some SoCs will be able to nop these, that is. If all SoCs will need something, we should simply omit the default functions. Thanks!

Hi Tom,
On 19 October 2016 at 09:39, Tom Rini trini@konsulko.com wrote:
On Wed, Oct 19, 2016 at 09:36:52AM -0600, Stephen Warren wrote:
On 10/18/2016 08:41 PM, Simon Glass wrote:
Hi Stephen,
On 18 October 2016 at 17:33, Stephen Warren swarren@wwwdotorg.org wrote:
On 10/18/2016 05:08 PM, Simon Glass wrote:
Hi Stephen,
On 18 October 2016 at 16:54, Stephen Warren swarren@wwwdotorg.org wrote:
On 10/18/2016 01:56 PM, Simon Glass wrote: > > >Hi Stephen, > >On 18 October 2016 at 13:10, Stephen Warren swarren@wwwdotorg.org >wrote: >> >> >>On 10/18/2016 01:03 PM, Simon Glass wrote: >>> >>> >>> >>>Hi Stephen, >>> >>>On 18 October 2016 at 12:58, Stephen Warren swarren@wwwdotorg.org >>>wrote: >>>> >>>> >>>> >>>>On 10/18/2016 10:23 AM, Simon Glass wrote: >>>>> >>>>> >>>>> >>>>> >>>>>Hi Stephen, >>>>> >>>>>On 17 October 2016 at 15:35, Stephen Warren swarren@wwwdotorg.org >>>>>wrote: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>From: Stephen Warren swarren@nvidia.com >>>>>> >>>>>>SoC-specific logic may be required for all forms of cache-wide >>>>>>operations; invalidate and flush of both dcache and icache (note >>>>>>that >>>>>>only 3 of the 4 possible combinations make sense, since the icache >>>>>>never >>>>>>contains dirty lines). This patch adds an optional hook for all >>>>>>implemented cache-wide operations, and renames the one existing >>>>>>hook >>>>>>to >>>>>>better represent exactly which operation it is implementing. A >>>>>>dummy >>>>>>no-op implementation of each hook is provided. These dummy >>>>>>implementations are moved into C code, since there's no need to >>>>>>implement them in assembly. >>>>>> >>>>>>Signed-off-by: Stephen Warren swarren@nvidia.com >>>>>>--- >>>>>> arch/arm/cpu/armv8/cache.S | 6 ------ >>>>>> arch/arm/cpu/armv8/cache_v8.c | 23 >>>>>>++++++++++++++++++++--- >>>>>> arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S | 4 ++-- >>>>>> arch/arm/include/asm/system.h | 5 ++++- >>>>>> arch/arm/mach-tegra/tegra186/cache.c | 2 +- >>>>>> 5 files changed, 27 insertions(+), 13 deletions(-) >>>>>> >>>>> >>>>>I think we should have a proper interface for this stuff rather than >>>>>weak functions. Maybe we need a linker-list approach, or a cache >>>>>uclass? >>>> >>>> >>>> >>>> >>>> >>>>What's improper about this interface? Presumably we could argue that >>>>no >>>>function in the entire of U-Boot be called by symbol name, which >>>>doesn't >>>>seem useful. >>>> >>>>I'm not sure exactly what you envisage by a linker-list approach. Can >>>>you >>>>provide some background? I understand how the linker can construct >>>>list >>>>of >>>>objects/implementations/..., but that doesn't seem useful here since >>>>there's >>>>a known-ahead-of-time single implementation of these functions in a >>>>single >>>>build of U-Boot. >>> >>> >>> >>> >>>Your own commit messages says that this is SoC-specific. I'm >>>suggesting that we define an interface (which I think you have already >>>done with your header file additions), and allow SoCs to implement it >>>via a linker list. >>> >>>IMO the cache code in U-Boot is starting to get a bit creaky. >>> >>>>A cache uclass seems like /massive/ overkill, especially since I'd >>>>expect >>>>these very low-level functions to be required well before any higher >>>>level >>>>code like DM/classes/... to be available. >>> >>> >>> >>> >>>DM is available very early. But it's not clear from your patch when >>>this code is actually called. >> >> >> >> >>I believe that weak functions are a perfectly acceptable approach here. >> >>Yes, the implementation of these functions is SoC-specific. The >>Makefiles >>will pull in the appropriate implementation for that SoC whenever >>U-Boot >>is >>built, just like every other board- or SoC-specific function in the >>entire >>of U-Boot. >> >>There's no need for linker lists since there is only ever one >>implementation. > > > >If there is only ever one implementation, why do you need weak >functions?
As I explicitly stated above, each SoC can have a different implementation, yet only a single implementation is ever needed for a particular U-Boot build.
>Just call them directly.
The code is doing that, both before and after my patch.
I mean call them without needing weak functions.
>I think in fact you mean that >there can be no implementation (but perhaps an empty one), or one >implementation. You are effectively using multiple weak functions to >provide default code. I think it would be better if this were >explicit. > >I still think that the cache functions could do with a rethink.
In my opinion, this patch doesn't change the code structure at all. There is already an interface between the core (L1/L2) cache management code and the SoC-specific cache management code. That interface already uses a weak function to provide the default no-op implementation. This patch doesn't change any of that. All this patch does is fix that existing interface to cover all 3 combinations of dcache_flush, dcache_invalidate, and icache_invalidate, rather than just one of those combinations. It's more of a bug-fix than anything else.
Yes I see that.
If you want to rework this interface sometime, be my guest. However, I don't think it's fair to require that someone who simply wants to fix the existing code (in a way that is orthogonal to your desired interface refactoring) do that refactoring first, rather than doing it yourself.
I understand what you are saying, but isn't that how open source software works? Believe me, I have done my fair share of refactoring :-)
At least can you look at not making it any harder to fix up later? The more we pile onto this interface, the harder it will be later. We should aim to make ARMv8 really nice as it is the new thing.
I believe moving one or three functions into any replacement scheme will have an identical level of difficulty. So, I believe the patch already satisfies the "no harder" requirement.
Well it seems your mind is already made up!
Well, I don't believe there are any viable or reasonable alternatives. I'm not prolonging this thread because I find it enjoyable, but because of the lack of alternatives to what this patch does.
Doing this via DM doesn't simplify anything or make it more maintainable; it's just using DM for the sake of it. DM is great when it makes life simpler and code more maintainable, but we use it because of those benefits, not just for the sake of it. Using DM adds complexity, and so there has to be a benefit of doing so. I don't believe there is here.
Doing this by linker scripts simply doesn't make sense:
Any given U-Boot binary will only contain a single implementation of these functions, so there's no need for any kind of runtime lookup, and if there was a runtime lookup, what would be the key and where would it come from? I believe we'd still have some compile-time-seledted function/data to drive the runtime lookup process, and so we'd be in exactly the same situation as we already are, just with more complexity added on top.
Using linker scripts to switch between implementations at compile time is much more complex than just letting the linker link stuff together directly. That's what it's good at. Using linker scripts would just add extra complexity without any benefit. We'd still end up with a single implementation in the binary, yet to call it code would have to indirect through the linker-defined table, rather than simply calling the same code by symbol name. Uggh!
Note that per the discussions in other forks of this thread, it's likely we'll need to code all these cache operations in assembly to avoid accessing DRAM while turning the cache on/off. This implies to me that we'll need to keep all the cache code extremely simple and direct, without any form of runtime or compile time indirection.
For the record, until / unless we move to the point where we can run different SoCs within a single binary, doing this via weak functions seems to me to be the correct abstraction. If we know that some SoCs will be able to nop these, that is. If all SoCs will need something, we should simply omit the default functions. Thanks!
Is that a goal? I can see it would be useful to be able to build for multiple SoCs (i.e. avoid having to worry about what board you have) but we are a way off from that :-)
Regards, Simon

On Wed, Oct 19, 2016 at 09:59:02AM -0600, Simon Glass wrote:
Hi Tom,
On 19 October 2016 at 09:39, Tom Rini trini@konsulko.com wrote:
On Wed, Oct 19, 2016 at 09:36:52AM -0600, Stephen Warren wrote:
On 10/18/2016 08:41 PM, Simon Glass wrote:
Hi Stephen,
On 18 October 2016 at 17:33, Stephen Warren swarren@wwwdotorg.org wrote:
On 10/18/2016 05:08 PM, Simon Glass wrote:
Hi Stephen,
On 18 October 2016 at 16:54, Stephen Warren swarren@wwwdotorg.org wrote: > >On 10/18/2016 01:56 PM, Simon Glass wrote: >> >> >>Hi Stephen, >> >>On 18 October 2016 at 13:10, Stephen Warren swarren@wwwdotorg.org >>wrote: >>> >>> >>>On 10/18/2016 01:03 PM, Simon Glass wrote: >>>> >>>> >>>> >>>>Hi Stephen, >>>> >>>>On 18 October 2016 at 12:58, Stephen Warren swarren@wwwdotorg.org >>>>wrote: >>>>> >>>>> >>>>> >>>>>On 10/18/2016 10:23 AM, Simon Glass wrote: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>Hi Stephen, >>>>>> >>>>>>On 17 October 2016 at 15:35, Stephen Warren swarren@wwwdotorg.org >>>>>>wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>From: Stephen Warren swarren@nvidia.com >>>>>>> >>>>>>>SoC-specific logic may be required for all forms of cache-wide >>>>>>>operations; invalidate and flush of both dcache and icache (note >>>>>>>that >>>>>>>only 3 of the 4 possible combinations make sense, since the icache >>>>>>>never >>>>>>>contains dirty lines). This patch adds an optional hook for all >>>>>>>implemented cache-wide operations, and renames the one existing >>>>>>>hook >>>>>>>to >>>>>>>better represent exactly which operation it is implementing. A >>>>>>>dummy >>>>>>>no-op implementation of each hook is provided. These dummy >>>>>>>implementations are moved into C code, since there's no need to >>>>>>>implement them in assembly. >>>>>>> >>>>>>>Signed-off-by: Stephen Warren swarren@nvidia.com >>>>>>>--- >>>>>>> arch/arm/cpu/armv8/cache.S | 6 ------ >>>>>>> arch/arm/cpu/armv8/cache_v8.c | 23 >>>>>>>++++++++++++++++++++--- >>>>>>> arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S | 4 ++-- >>>>>>> arch/arm/include/asm/system.h | 5 ++++- >>>>>>> arch/arm/mach-tegra/tegra186/cache.c | 2 +- >>>>>>> 5 files changed, 27 insertions(+), 13 deletions(-) >>>>>>> >>>>>> >>>>>>I think we should have a proper interface for this stuff rather than >>>>>>weak functions. Maybe we need a linker-list approach, or a cache >>>>>>uclass? >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>What's improper about this interface? Presumably we could argue that >>>>>no >>>>>function in the entire of U-Boot be called by symbol name, which >>>>>doesn't >>>>>seem useful. >>>>> >>>>>I'm not sure exactly what you envisage by a linker-list approach. Can >>>>>you >>>>>provide some background? I understand how the linker can construct >>>>>list >>>>>of >>>>>objects/implementations/..., but that doesn't seem useful here since >>>>>there's >>>>>a known-ahead-of-time single implementation of these functions in a >>>>>single >>>>>build of U-Boot. >>>> >>>> >>>> >>>> >>>>Your own commit messages says that this is SoC-specific. I'm >>>>suggesting that we define an interface (which I think you have already >>>>done with your header file additions), and allow SoCs to implement it >>>>via a linker list. >>>> >>>>IMO the cache code in U-Boot is starting to get a bit creaky. >>>> >>>>>A cache uclass seems like /massive/ overkill, especially since I'd >>>>>expect >>>>>these very low-level functions to be required well before any higher >>>>>level >>>>>code like DM/classes/... to be available. >>>> >>>> >>>> >>>> >>>>DM is available very early. But it's not clear from your patch when >>>>this code is actually called. >>> >>> >>> >>> >>>I believe that weak functions are a perfectly acceptable approach here. >>> >>>Yes, the implementation of these functions is SoC-specific. The >>>Makefiles >>>will pull in the appropriate implementation for that SoC whenever >>>U-Boot >>>is >>>built, just like every other board- or SoC-specific function in the >>>entire >>>of U-Boot. >>> >>>There's no need for linker lists since there is only ever one >>>implementation. >> >> >> >>If there is only ever one implementation, why do you need weak >>functions? > > > >As I explicitly stated above, each SoC can have a different >implementation, >yet only a single implementation is ever needed for a particular U-Boot >build. > >>Just call them directly. > > > >The code is doing that, both before and after my patch.
I mean call them without needing weak functions.
> >>I think in fact you mean that >>there can be no implementation (but perhaps an empty one), or one >>implementation. You are effectively using multiple weak functions to >>provide default code. I think it would be better if this were >>explicit. >> >>I still think that the cache functions could do with a rethink. > > > >In my opinion, this patch doesn't change the code structure at all. There >is >already an interface between the core (L1/L2) cache management code and >the >SoC-specific cache management code. That interface already uses a weak >function to provide the default no-op implementation. This patch doesn't >change any of that. All this patch does is fix that existing interface to >cover all 3 combinations of dcache_flush, dcache_invalidate, and >icache_invalidate, rather than just one of those combinations. It's more >of >a bug-fix than anything else.
Yes I see that.
> >If you want to rework this interface sometime, be my guest. However, I >don't >think it's fair to require that someone who simply wants to fix the >existing >code (in a way that is orthogonal to your desired interface refactoring) >do >that refactoring first, rather than doing it yourself.
I understand what you are saying, but isn't that how open source software works? Believe me, I have done my fair share of refactoring :-)
At least can you look at not making it any harder to fix up later? The more we pile onto this interface, the harder it will be later. We should aim to make ARMv8 really nice as it is the new thing.
I believe moving one or three functions into any replacement scheme will have an identical level of difficulty. So, I believe the patch already satisfies the "no harder" requirement.
Well it seems your mind is already made up!
Well, I don't believe there are any viable or reasonable alternatives. I'm not prolonging this thread because I find it enjoyable, but because of the lack of alternatives to what this patch does.
Doing this via DM doesn't simplify anything or make it more maintainable; it's just using DM for the sake of it. DM is great when it makes life simpler and code more maintainable, but we use it because of those benefits, not just for the sake of it. Using DM adds complexity, and so there has to be a benefit of doing so. I don't believe there is here.
Doing this by linker scripts simply doesn't make sense:
Any given U-Boot binary will only contain a single implementation of these functions, so there's no need for any kind of runtime lookup, and if there was a runtime lookup, what would be the key and where would it come from? I believe we'd still have some compile-time-seledted function/data to drive the runtime lookup process, and so we'd be in exactly the same situation as we already are, just with more complexity added on top.
Using linker scripts to switch between implementations at compile time is much more complex than just letting the linker link stuff together directly. That's what it's good at. Using linker scripts would just add extra complexity without any benefit. We'd still end up with a single implementation in the binary, yet to call it code would have to indirect through the linker-defined table, rather than simply calling the same code by symbol name. Uggh!
Note that per the discussions in other forks of this thread, it's likely we'll need to code all these cache operations in assembly to avoid accessing DRAM while turning the cache on/off. This implies to me that we'll need to keep all the cache code extremely simple and direct, without any form of runtime or compile time indirection.
For the record, until / unless we move to the point where we can run different SoCs within a single binary, doing this via weak functions seems to me to be the correct abstraction. If we know that some SoCs will be able to nop these, that is. If all SoCs will need something, we should simply omit the default functions. Thanks!
Is that a goal? I can see it would be useful to be able to build for multiple SoCs (i.e. avoid having to worry about what board you have) but we are a way off from that :-)
We're well off from seeing about the reality of that, yes :)

On 10/19/2016 09:39 AM, Tom Rini wrote:
On Wed, Oct 19, 2016 at 09:36:52AM -0600, Stephen Warren wrote:
On 10/18/2016 08:41 PM, Simon Glass wrote:
Hi Stephen,
On 18 October 2016 at 17:33, Stephen Warren swarren@wwwdotorg.org wrote:
On 10/18/2016 05:08 PM, Simon Glass wrote:
Hi Stephen,
On 18 October 2016 at 16:54, Stephen Warren swarren@wwwdotorg.org wrote:
On 10/18/2016 01:56 PM, Simon Glass wrote: > > > Hi Stephen, > > On 18 October 2016 at 13:10, Stephen Warren swarren@wwwdotorg.org > wrote: >> >> >> On 10/18/2016 01:03 PM, Simon Glass wrote: >>> >>> >>> >>> Hi Stephen, >>> >>> On 18 October 2016 at 12:58, Stephen Warren swarren@wwwdotorg.org >>> wrote: >>>> >>>> >>>> >>>> On 10/18/2016 10:23 AM, Simon Glass wrote: >>>>> >>>>> >>>>> >>>>> >>>>> Hi Stephen, >>>>> >>>>> On 17 October 2016 at 15:35, Stephen Warren swarren@wwwdotorg.org >>>>> wrote: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> From: Stephen Warren swarren@nvidia.com >>>>>> >>>>>> SoC-specific logic may be required for all forms of cache-wide >>>>>> operations; invalidate and flush of both dcache and icache (note >>>>>> that >>>>>> only 3 of the 4 possible combinations make sense, since the icache >>>>>> never >>>>>> contains dirty lines). This patch adds an optional hook for all >>>>>> implemented cache-wide operations, and renames the one existing >>>>>> hook >>>>>> to >>>>>> better represent exactly which operation it is implementing. A >>>>>> dummy >>>>>> no-op implementation of each hook is provided. These dummy >>>>>> implementations are moved into C code, since there's no need to >>>>>> implement them in assembly. >>>>>> >>>>>> Signed-off-by: Stephen Warren swarren@nvidia.com >>>>>> --- >>>>>> arch/arm/cpu/armv8/cache.S | 6 ------ >>>>>> arch/arm/cpu/armv8/cache_v8.c | 23 >>>>>> ++++++++++++++++++++--- >>>>>> arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S | 4 ++-- >>>>>> arch/arm/include/asm/system.h | 5 ++++- >>>>>> arch/arm/mach-tegra/tegra186/cache.c | 2 +- >>>>>> 5 files changed, 27 insertions(+), 13 deletions(-) >>>>>> >>>>> >>>>> I think we should have a proper interface for this stuff rather than >>>>> weak functions. Maybe we need a linker-list approach, or a cache >>>>> uclass? >>>> >>>> >>>> >>>> >>>> >>>> What's improper about this interface? Presumably we could argue that >>>> no >>>> function in the entire of U-Boot be called by symbol name, which >>>> doesn't >>>> seem useful. >>>> >>>> I'm not sure exactly what you envisage by a linker-list approach. Can >>>> you >>>> provide some background? I understand how the linker can construct >>>> list >>>> of >>>> objects/implementations/..., but that doesn't seem useful here since >>>> there's >>>> a known-ahead-of-time single implementation of these functions in a >>>> single >>>> build of U-Boot. >>> >>> >>> >>> >>> Your own commit messages says that this is SoC-specific. I'm >>> suggesting that we define an interface (which I think you have already >>> done with your header file additions), and allow SoCs to implement it >>> via a linker list. >>> >>> IMO the cache code in U-Boot is starting to get a bit creaky. >>> >>>> A cache uclass seems like /massive/ overkill, especially since I'd >>>> expect >>>> these very low-level functions to be required well before any higher >>>> level >>>> code like DM/classes/... to be available. >>> >>> >>> >>> >>> DM is available very early. But it's not clear from your patch when >>> this code is actually called. >> >> >> >> >> I believe that weak functions are a perfectly acceptable approach here. >> >> Yes, the implementation of these functions is SoC-specific. The >> Makefiles >> will pull in the appropriate implementation for that SoC whenever >> U-Boot >> is >> built, just like every other board- or SoC-specific function in the >> entire >> of U-Boot. >> >> There's no need for linker lists since there is only ever one >> implementation. > > > > If there is only ever one implementation, why do you need weak > functions?
As I explicitly stated above, each SoC can have a different implementation, yet only a single implementation is ever needed for a particular U-Boot build.
> Just call them directly.
The code is doing that, both before and after my patch.
I mean call them without needing weak functions.
> I think in fact you mean that > there can be no implementation (but perhaps an empty one), or one > implementation. You are effectively using multiple weak functions to > provide default code. I think it would be better if this were > explicit. > > I still think that the cache functions could do with a rethink.
In my opinion, this patch doesn't change the code structure at all. There is already an interface between the core (L1/L2) cache management code and the SoC-specific cache management code. That interface already uses a weak function to provide the default no-op implementation. This patch doesn't change any of that. All this patch does is fix that existing interface to cover all 3 combinations of dcache_flush, dcache_invalidate, and icache_invalidate, rather than just one of those combinations. It's more of a bug-fix than anything else.
Yes I see that.
If you want to rework this interface sometime, be my guest. However, I don't think it's fair to require that someone who simply wants to fix the existing code (in a way that is orthogonal to your desired interface refactoring) do that refactoring first, rather than doing it yourself.
I understand what you are saying, but isn't that how open source software works? Believe me, I have done my fair share of refactoring :-)
At least can you look at not making it any harder to fix up later? The more we pile onto this interface, the harder it will be later. We should aim to make ARMv8 really nice as it is the new thing.
I believe moving one or three functions into any replacement scheme will have an identical level of difficulty. So, I believe the patch already satisfies the "no harder" requirement.
Well it seems your mind is already made up!
Well, I don't believe there are any viable or reasonable alternatives. I'm not prolonging this thread because I find it enjoyable, but because of the lack of alternatives to what this patch does.
Doing this via DM doesn't simplify anything or make it more maintainable; it's just using DM for the sake of it. DM is great when it makes life simpler and code more maintainable, but we use it because of those benefits, not just for the sake of it. Using DM adds complexity, and so there has to be a benefit of doing so. I don't believe there is here.
Doing this by linker scripts simply doesn't make sense:
Any given U-Boot binary will only contain a single implementation of these functions, so there's no need for any kind of runtime lookup, and if there was a runtime lookup, what would be the key and where would it come from? I believe we'd still have some compile-time-seledted function/data to drive the runtime lookup process, and so we'd be in exactly the same situation as we already are, just with more complexity added on top.
Using linker scripts to switch between implementations at compile time is much more complex than just letting the linker link stuff together directly. That's what it's good at. Using linker scripts would just add extra complexity without any benefit. We'd still end up with a single implementation in the binary, yet to call it code would have to indirect through the linker-defined table, rather than simply calling the same code by symbol name. Uggh!
Note that per the discussions in other forks of this thread, it's likely we'll need to code all these cache operations in assembly to avoid accessing DRAM while turning the cache on/off. This implies to me that we'll need to keep all the cache code extremely simple and direct, without any form of runtime or compile time indirection.
For the record, until / unless we move to the point where we can run different SoCs within a single binary, doing this via weak functions seems to me to be the correct abstraction. If we know that some SoCs will be able to nop these, that is. If all SoCs will need something, we should simply omit the default functions. Thanks!
I believe there can be[1] SoCs where the architectural by-set/way instructions are entirely sufficient to flush the dcache. In those cases, the default no-op implementation of the hook is appropriate.
[1] and likely are; not every existing ARMv8 U-Boot port implements the current __asm_flush_l3_cache hook, and hopefully they're all correct and working.
participants (4)
-
Simon Glass
-
Stephen Warren
-
Tom Rini
-
york sun