[U-Boot] [PATCH v2] ARM926ejs: Add routines to invalidate D-Cache

After a DMA operation, we need to maintain D-Cache coherency. We need to clean the cache (write back the dirty lines) and then invalidate it as well (hence the CPU will fetch the data written by the DMA controller from RAM).
Tested on AT91SAM9261EK with Peripheral DMA controller.
Signed-off-by: Hong Xu <hong.xu@atmel.com>
Tested-by: Elen Song <elen.song@atmel.com>
CC: Albert Aribaud <albert.aribaud@free.fr>
CC: Heiko Schocher <hs@denx.de>
---
Changes since v1
~ Per Albert's suggestion, add invalidate_dcache_range originally defined in include/common.h
 arch/arm/lib/cache.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 46 insertions(+), 0 deletions(-)
diff --git a/arch/arm/lib/cache.c b/arch/arm/lib/cache.c
index 92b61a2..0436443 100644
--- a/arch/arm/lib/cache.c
+++ b/arch/arm/lib/cache.c
@@ -53,3 +53,49 @@ void __flush_dcache_all(void)
 }
 void flush_dcache_all(void)
 	__attribute__((weak, alias("__flush_dcache_all")));
+
+void __invalidate_dcache_range(unsigned long start, unsigned long stop)
+{
+	int cache_line_len;
+	unsigned long mva;
+
+#ifdef CONFIG_ARM926EJS
+
+#ifdef CONFIG_SYS_CACHELINE_SIZE
+	cache_line_len = CONFIG_SYS_CACHELINE_SIZE;
+#else
+	cache_line_len = 32;
+#endif
+	/*
+	 * If start and stop are not aligned to cache-line,
+	 * the adjacent lines will be cleaned
+	 */
+	if ((start & (cache_line_len - 1)) != 0)
+		asm("mcr p15, 0, %0, c7, c10, 1" : : "r" (start));
+	if ((stop & (cache_line_len - 1)) != 0)
+		asm("mcr p15, 0, %0, c7, c10, 1" : : "r" (stop));
+
+	mva = start & ~(cache_line_len - 1);
+	while (mva < stop) {
+		asm("mcr p15, 0, %0, c7, c6, 1" : : "r"(mva));
+		mva += cache_line_len;
+	}
+
+	/* Drain the WB */
+	asm("mcr p15, 0, %0, c7, c10, 4" : : "r" (0));
+#endif
+
+	return;
+}
+void invalidate_dcache_range(unsigned long start, unsigned long stop)
+	__attribute__((weak, alias("__invalidate_dcache_range")));
+
+void __invalidate_dcache_all(void)
+{
+#ifdef CONFIG_ARM926EJS
+	asm("mcr p15, 0, %0, c7, c6, 0" : : "r" (0));
+#endif
+	return;
+}
+void invalidate_dcache_all(void)
+	__attribute__((weak, alias("__invalidate_dcache_all")));

Dear Hong Xu,
After a DMA operation, we need to maintain D-Cache coherency. We need to clean the cache (write back the dirty lines) and then invalidate it as well (hence the CPU will fetch the data written by the DMA controller from RAM).
Tested on AT91SAM9261EK with Peripheral DMA controller.
Signed-off-by: Hong Xu <hong.xu@atmel.com>
Tested-by: Elen Song <elen.song@atmel.com>
CC: Albert Aribaud <albert.aribaud@free.fr>
CC: Heiko Schocher <hs@denx.de>
Changes since v1 ~ Per Albert's suggestion, add invalidate_dcache_range originally defined in include/common.h
 arch/arm/lib/cache.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 46 insertions(+), 0 deletions(-)
diff --git a/arch/arm/lib/cache.c b/arch/arm/lib/cache.c
index 92b61a2..0436443 100644
--- a/arch/arm/lib/cache.c
+++ b/arch/arm/lib/cache.c
@@ -53,3 +53,49 @@ void __flush_dcache_all(void)
 }
 void flush_dcache_all(void)
 	__attribute__((weak, alias("__flush_dcache_all")));
+void __invalidate_dcache_range(unsigned long start, unsigned long stop)
+{
+	int cache_line_len;
+	unsigned long mva;
+
+#ifdef CONFIG_ARM926EJS
+
+#ifdef CONFIG_SYS_CACHELINE_SIZE
+	cache_line_len = CONFIG_SYS_CACHELINE_SIZE;
+#else
+	cache_line_len = 32;
+#endif
+	/*
+	 * If start and stop are not aligned to cache-line,
+	 * the adjacent lines will be cleaned
+	 */
+	if ((start & (cache_line_len - 1)) != 0)
+		asm("mcr p15, 0, %0, c7, c10, 1" : : "r" (start));
+	if ((stop & (cache_line_len - 1)) != 0)
+		asm("mcr p15, 0, %0, c7, c10, 1" : : "r" (stop));
Why so complicated? How about:

	/* round down to the start of the cache line */
	start &= ~(cache_line_len - 1);
	/* round up to the end of the cache line */
	/* note: if, as I assume, the range to be invalidated is [start, stop) -
	 * that means stop is the first address not to be invalidated - the next
	 * statement is not necessary */
	stop |= (cache_line_len - 1);

	while (start < stop) {
		asm("mcr p15, 0, %0, c7, c6, 1" : : "r"(start));
		start += cache_line_len;
	}

	/* Drain the WB */
	asm("mcr p15, 0, %0, c7, c10, 4" : : "r" (0));
+#endif
Best Regards, Reinhard

Hi Reinhard,
On 08/05/2011 01:11 PM, Reinhard Meyer wrote:
Dear Hong Xu,
After a DMA operation, we need to maintain D-Cache coherency. We need to clean the cache (write back the dirty lines) and then invalidate it as well (hence the CPU will fetch the data written by the DMA controller from RAM).
Tested on AT91SAM9261EK with Peripheral DMA controller.
Signed-off-by: Hong Xu <hong.xu@atmel.com>
Tested-by: Elen Song <elen.song@atmel.com>
CC: Albert Aribaud <albert.aribaud@free.fr>
CC: Heiko Schocher <hs@denx.de>
Changes since v1 ~ Per Albert's suggestion, add invalidate_dcache_range originally defined in include/common.h
 arch/arm/lib/cache.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 46 insertions(+), 0 deletions(-)
diff --git a/arch/arm/lib/cache.c b/arch/arm/lib/cache.c
index 92b61a2..0436443 100644
--- a/arch/arm/lib/cache.c
+++ b/arch/arm/lib/cache.c
@@ -53,3 +53,49 @@ void __flush_dcache_all(void)
 }
 void flush_dcache_all(void)
 	__attribute__((weak, alias("__flush_dcache_all")));
+void __invalidate_dcache_range(unsigned long start, unsigned long stop)
+{
+	int cache_line_len;
+	unsigned long mva;
+
+#ifdef CONFIG_ARM926EJS
+
+#ifdef CONFIG_SYS_CACHELINE_SIZE
+	cache_line_len = CONFIG_SYS_CACHELINE_SIZE;
+#else
+	cache_line_len = 32;
+#endif
+	/*
+	 * If start and stop are not aligned to cache-line,
+	 * the adjacent lines will be cleaned
+	 */
+	if ((start & (cache_line_len - 1)) != 0)
+		asm("mcr p15, 0, %0, c7, c10, 1" : : "r" (start));
+	if ((stop & (cache_line_len - 1)) != 0)
+		asm("mcr p15, 0, %0, c7, c10, 1" : : "r" (stop));
Why so complicated?
In the case of a misaligned (start, stop), this cleans the cache line, i.e. writes back the potentially dirty lines, before the subsequent invalidation.
BR, Eric
How about:

	/* round down to the start of the cache line */
	start &= ~(cache_line_len - 1);
	/* round up to the end of the cache line */
	/* note: if, as I assume, the range to be invalidated is [start, stop) -
	 * that means stop is the first address not to be invalidated - the next
	 * statement is not necessary */
	stop |= (cache_line_len - 1);

	while (start < stop) {
		asm("mcr p15, 0, %0, c7, c6, 1" : : "r"(start));
		start += cache_line_len;
	}

	/* Drain the WB */
	asm("mcr p15, 0, %0, c7, c10, 4" : : "r" (0));
+#endif
Best Regards, Reinhard

Hi Hong Xu,
On 05/08/2011 08:17, Hong Xu wrote:
Why so complicated?
In the case of a misaligned (start, stop), this cleans the cache line, i.e. writes back the potentially dirty lines, before the subsequent invalidation.
How do you know the dirty data should be flushed rather than invalidated?
Amicalement,

Hi Hong Xu,
On 05/08/2011 06:44, Hong Xu wrote:
After a DMA operation, we need to maintain D-Cache coherency. We need to clean the cache (write back the dirty lines) and then invalidate it as well (hence the CPU will fetch the data written by the DMA controller from RAM).
Tested on AT91SAM9261EK with Peripheral DMA controller.
Signed-off-by: Hong Xu <hong.xu@atmel.com>
Tested-by: Elen Song <elen.song@atmel.com>
CC: Albert Aribaud <albert.aribaud@free.fr>
CC: Heiko Schocher <hs@denx.de>
Changes since v1 ~ Per Albert's suggestion, add invalidate_dcache_range originally defined in include/common.h
 arch/arm/lib/cache.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 46 insertions(+), 0 deletions(-)
diff --git a/arch/arm/lib/cache.c b/arch/arm/lib/cache.c
index 92b61a2..0436443 100644
--- a/arch/arm/lib/cache.c
+++ b/arch/arm/lib/cache.c
@@ -53,3 +53,49 @@ void __flush_dcache_all(void)
 }
 void flush_dcache_all(void)
 	__attribute__((weak, alias("__flush_dcache_all")));
+void __invalidate_dcache_range(unsigned long start, unsigned long stop)
+{
+	int cache_line_len;
+	unsigned long mva;
+
+#ifdef CONFIG_ARM926EJS
+
+#ifdef CONFIG_SYS_CACHELINE_SIZE
+	cache_line_len = CONFIG_SYS_CACHELINE_SIZE;
+#else
+	cache_line_len = 32;
Document magic number 32 -- for instance, indicate ARM architecture spec paragraph reference in a comment above it (and possibly emit a compile-time and/or run-time warning) -- or bail out with a compile error if CONFIG_SYS_CACHELINE_SIZE is not defined.
+#endif
+	/*
+	 * If start and stop are not aligned to cache-line,
+	 * the adjacent lines will be cleaned
+	 */
+	if ((start & (cache_line_len - 1)) != 0)
+		asm("mcr p15, 0, %0, c7, c10, 1" : : "r" (start));
+	if ((stop & (cache_line_len - 1)) != 0)
+		asm("mcr p15, 0, %0, c7, c10, 1" : : "r" (stop));
+
+	mva = start & ~(cache_line_len - 1);
+	while (mva < stop) {
+		asm("mcr p15, 0, %0, c7, c6, 1" : : "r"(mva));
+		mva += cache_line_len;
+	}
I, like Reinhard, prefer aligning start and stop and then looping through a single invalidate mcr.
But I also want to make sure logs tell us anything weird with caches, and since unaligned start or stop invalidation could lead to trashing third party data, I would like the code to emit a run-time warning if that happens, like this:
	if ((start & (cache_line_len - 1)) != 0) {
		printf("WARNING: aligning start %x on start of cache line\n", start);
		start &= ~(cache_line_len - 1);
	}
	if ((stop & (cache_line_len - 1)) != (cache_line_len - 1)) {
		printf("WARNING: aligning stop %x on end of cache line\n", stop);
		stop |= (cache_line_len - 1);
	}
That will guarantee that unaligned cache invalidates will be detectable, and/or that will entice developers to align their starts and stops.
+
+	/* Drain the WB */
+	asm("mcr p15, 0, %0, c7, c10, 4" : : "r" (0));
+#endif
+
+	return;
+}
+void invalidate_dcache_range(unsigned long start, unsigned long stop)
+	__attribute__((weak, alias("__invalidate_dcache_range")));
+
+void __invalidate_dcache_all(void)
+{
+#ifdef CONFIG_ARM926EJS
+	asm("mcr p15, 0, %0, c7, c6, 0" : : "r" (0));
+#endif
+	return;
+}
+void invalidate_dcache_all(void)
+	__attribute__((weak, alias("__invalidate_dcache_all")));
Amicalement,

Hi Albert,
On 08/05/2011 02:13 PM, Albert ARIBAUD wrote:
Hi Hong Xu,
On 05/08/2011 06:44, Hong Xu wrote:
After a DMA operation, we need to maintain D-Cache coherency. We need to clean the cache (write back the dirty lines) and then invalidate it as well (hence the CPU will fetch the data written by the DMA controller from RAM).
Tested on AT91SAM9261EK with Peripheral DMA controller.
Signed-off-by: Hong Xu <hong.xu@atmel.com>
Tested-by: Elen Song <elen.song@atmel.com>
CC: Albert Aribaud <albert.aribaud@free.fr>
CC: Heiko Schocher <hs@denx.de>
[...]
+#ifdef CONFIG_SYS_CACHELINE_SIZE
+	cache_line_len = CONFIG_SYS_CACHELINE_SIZE;
+#else
+	cache_line_len = 32;
Document magic number 32 -- for instance, indicate ARM architecture spec paragraph reference in a comment above it (and possibly emit a compile-time and/or run-time warning) -- or bail out with a compile error if CONFIG_SYS_CACHELINE_SIZE is not defined.
The ARM926EJ-S Technical Reference Manual (r0p4/r0p5) only defines one cache line size, "32". From my understanding, a standard ARM926EJ-S implementation shall use a 32-byte cache line size. Of course this can be overridden by CONFIG_SYS_CACHELINE_SIZE, but shall we force it to be defined?
Frankly I have not found a proper place to define this (i.e. to avoid the bare 32 here), for example as a macro.
+#endif
+	/*
+	 * If start and stop are not aligned to cache-line,
+	 * the adjacent lines will be cleaned
+	 */
+	if ((start & (cache_line_len - 1)) != 0)
+		asm("mcr p15, 0, %0, c7, c10, 1" : : "r" (start));
+	if ((stop & (cache_line_len - 1)) != 0)
+		asm("mcr p15, 0, %0, c7, c10, 1" : : "r" (stop));
+
+	mva = start & ~(cache_line_len - 1);
+	while (mva < stop) {
+		asm("mcr p15, 0, %0, c7, c6, 1" : : "r"(mva));
+		mva += cache_line_len;
+	}
I, like Reinhard, prefer aligning start and stop and then looping through a single invalidate mcr.
But I also want to make sure logs tell us anything weird with caches, and since unaligned start or stop invalidation could lead to trashing third party data, I would like the code to emit a run-time warning if that happens, like this:
	if ((start & (cache_line_len - 1)) != 0) {
		printf("WARNING: aligning start %x on start of cache line\n", start);
		start &= ~(cache_line_len - 1);
	}
	if ((stop & (cache_line_len - 1)) != (cache_line_len - 1)) {
		printf("WARNING: aligning stop %x on end of cache line\n", stop);
		stop |= (cache_line_len - 1);
	}
I've tried to deal with the case where (start, stop) is not aligned. If misalignment happens, the adjacent lines will be cleaned before invalidating. And in my view it's impossible for a driver to always pass an aligned address to invalidate_dcache_range.
To answer your question in another email
How do you know the dirty data should be flushed rather than
invalidated?
"Dirty" means the cache line has been changed by the CPU but has not yet been written into memory or the WB. If we invalidate it, data will be lost. In most cases, I do not see a situation where the dirty data shall not be written into memory.
BR, Eric
[...]
Amicalement,

On 05/08/2011 08:38, Hong Xu wrote:
Hi Albert,
On 08/05/2011 02:13 PM, Albert ARIBAUD wrote:
Hi Hong Xu,
On 05/08/2011 06:44, Hong Xu wrote:
After a DMA operation, we need to maintain D-Cache coherency. We need to clean the cache (write back the dirty lines) and then invalidate it as well (hence the CPU will fetch the data written by the DMA controller from RAM).
Tested on AT91SAM9261EK with Peripheral DMA controller.
Signed-off-by: Hong Xu <hong.xu@atmel.com>
Tested-by: Elen Song <elen.song@atmel.com>
CC: Albert Aribaud <albert.aribaud@free.fr>
CC: Heiko Schocher <hs@denx.de>
[...]
+#ifdef CONFIG_SYS_CACHELINE_SIZE
+	cache_line_len = CONFIG_SYS_CACHELINE_SIZE;
+#else
+	cache_line_len = 32;
Document magic number 32 -- for instance, indicate ARM architecture spec paragraph reference in a comment above it (and possibly emit a compile-time and/or run-time warning) -- or bail out with a compile error if CONFIG_SYS_CACHELINE_SIZE is not defined.
The ARM926EJ-S Technical Reference Manual (r0p4/r0p5) only defines one cache line size, "32". From my understanding, a standard ARM926EJ-S implementation shall use a 32-byte cache line size. Of course this can be overridden by CONFIG_SYS_CACHELINE_SIZE, but shall we force it to be defined?
Frankly I have not found a proper place to define this (i.e. to avoid the bare 32 here), for example as a macro.
As I said, I am fine with a simple comment above that '32' that says where this value comes from.
+#endif
+	/*
+	 * If start and stop are not aligned to cache-line,
+	 * the adjacent lines will be cleaned
+	 */
+	if ((start & (cache_line_len - 1)) != 0)
+		asm("mcr p15, 0, %0, c7, c10, 1" : : "r" (start));
+	if ((stop & (cache_line_len - 1)) != 0)
+		asm("mcr p15, 0, %0, c7, c10, 1" : : "r" (stop));
+
+	mva = start & ~(cache_line_len - 1);
+	while (mva < stop) {
+		asm("mcr p15, 0, %0, c7, c6, 1" : : "r"(mva));
+		mva += cache_line_len;
+	}
I, like Reinhard, prefer aligning start and stop and then looping through a single invalidate mcr.
But I also want to make sure logs tell us anything weird with caches, and since unaligned start or stop invalidation could lead to trashing third party data, I would like the code to emit a run-time warning if that happens, like this:
	if ((start & (cache_line_len - 1)) != 0) {
		printf("WARNING: aligning start %x on start of cache line\n", start);
		start &= ~(cache_line_len - 1);
	}
	if ((stop & (cache_line_len - 1)) != (cache_line_len - 1)) {
		printf("WARNING: aligning stop %x on end of cache line\n", stop);
		stop |= (cache_line_len - 1);
	}
I've tried to deal with the case where (start, stop) is not aligned. If misalignment happens, the adjacent lines will be cleaned before invalidating. And in my view it's impossible for a driver to always pass an aligned address to invalidate_dcache_range.
Why would it be impossible? If the buffer is statically allocated it can use __attribute__((aligned)), and if it is dynamically allocated, it can be over-allocated to make sure it starts and ends at cache line boundaries.
To answer your question in another email
How do you know the dirty data should be flushed rather than
invalidated?
"Dirty" means the cache line has been changed by the CPU but has not yet been written into memory or the WB. If we invalidate it, data will be lost. In most cases, I do not see a situation where the dirty data shall not be written into memory.
The problem is the cases that fall outside of 'most'. This kind of issue tends to have effects that, when unwanted, are extremely difficult to link to their cause and makes for long and painful debugging.
BR, Eric
Amicalement,

On 08/05/2011 02:46 PM, Albert ARIBAUD wrote:
On 05/08/2011 08:38, Hong Xu wrote:
Hi Albert,
On 08/05/2011 02:13 PM, Albert ARIBAUD wrote:
Hi Hong Xu,
On 05/08/2011 06:44, Hong Xu wrote:
After a DMA operation, we need to maintain D-Cache coherency. We need to clean the cache (write back the dirty lines) and then invalidate it as well (hence the CPU will fetch the data written by the DMA controller from RAM).
Tested on AT91SAM9261EK with Peripheral DMA controller.
Signed-off-by: Hong Xu <hong.xu@atmel.com>
Tested-by: Elen Song <elen.song@atmel.com>
CC: Albert Aribaud <albert.aribaud@free.fr>
CC: Heiko Schocher <hs@denx.de>
[...]
+#ifdef CONFIG_SYS_CACHELINE_SIZE
+	cache_line_len = CONFIG_SYS_CACHELINE_SIZE;
+#else
+	cache_line_len = 32;
Document magic number 32 -- for instance, indicate ARM architecture spec paragraph reference in a comment above it (and possibly emit a compile-time and/or run-time warning) -- or bail out with a compile error if CONFIG_SYS_CACHELINE_SIZE is not defined.
The ARM926EJ-S Technical Reference Manual (r0p4/r0p5) only defines one cache line size, "32". From my understanding, a standard ARM926EJ-S implementation shall use a 32-byte cache line size. Of course this can be overridden by CONFIG_SYS_CACHELINE_SIZE, but shall we force it to be defined?
Frankly I have not found a proper place to define this (i.e. to avoid the bare 32 here), for example as a macro.
As I said, I am fine with a simple comment above that '32' that says where this value comes from.
OK
+#endif
+	/*
+	 * If start and stop are not aligned to cache-line,
+	 * the adjacent lines will be cleaned
+	 */
+	if ((start & (cache_line_len - 1)) != 0)
+		asm("mcr p15, 0, %0, c7, c10, 1" : : "r" (start));
+	if ((stop & (cache_line_len - 1)) != 0)
+		asm("mcr p15, 0, %0, c7, c10, 1" : : "r" (stop));
+
+	mva = start & ~(cache_line_len - 1);
+	while (mva < stop) {
+		asm("mcr p15, 0, %0, c7, c6, 1" : : "r"(mva));
+		mva += cache_line_len;
+	}
I, like Reinhard, prefer aligning start and stop and then looping through a single invalidate mcr.
But I also want to make sure logs tell us anything weird with caches, and since unaligned start or stop invalidation could lead to trashing third party data, I would like the code to emit a run-time warning if that happens, like this:
	if ((start & (cache_line_len - 1)) != 0) {
		printf("WARNING: aligning start %x on start of cache line\n", start);
		start &= ~(cache_line_len - 1);
	}
	if ((stop & (cache_line_len - 1)) != (cache_line_len - 1)) {
		printf("WARNING: aligning stop %x on end of cache line\n", stop);
		stop |= (cache_line_len - 1);
	}
I've tried to deal with the case where (start, stop) is not aligned. If misalignment happens, the adjacent lines will be cleaned before invalidating. And in my view it's impossible for a driver to always pass an aligned address to invalidate_dcache_range.
Why would it be impossible? If the buffer is statically allocated it can use __attribute__((aligned)), and if it is dynamically allocated, it can be over-allocated to make sure it starts and ends at cache line boundaries.
Technically it's possible, as you mentioned. But shall we add such a restriction? Did this API originally come with such an alignment requirement?
To answer your question in another email
How do you know the dirty data should be flushed rather than
invalidated?
"Dirty" means the cache line has been changed by the CPU but has not yet been written into memory or the WB. If we invalidate it, data will be lost. In most cases, I do not see a situation where the dirty data shall not be written into memory.
The problem is the cases that fall outside of 'most'. This kind of issue tends to have effects that, when unwanted, are extremely difficult to link to their cause and makes for long and painful debugging.
Well, "most" here may be a little bit confusing. Assuming a write-through cache, how can you write data to memory and then take it back? I think you have to undo it explicitly.
How about keeping the cleaning of the cache when misalignment happens, and printing a WARNING?
BR, Eric
BR, Eric
Amicalement,

Hi Hong, Albert,
On Friday 05 August 2011 12:16 PM, Albert ARIBAUD wrote:
On 05/08/2011 08:38, Hong Xu wrote:
Hi Albert,
I've tried to deal with the case where (start, stop) is not aligned. If misalignment happens, the adjacent lines will be cleaned before invalidating. And in my view it's impossible for a driver to always pass an aligned address to invalidate_dcache_range.
Why would it be impossible? If the buffer is statically allocated it can use __attribute__((aligned)), and if it is dynamically allocated, it can be over-allocated to make sure it starts and ends at cache line boundaries.
To answer your question in another email
How do you know the dirty data should be flushed rather than
invalidated?
"Dirty" means the cache line has been changed by the CPU but has not yet been written into memory or the WB. If we invalidate it, data will be lost. In most cases, I do not see a situation where the dirty data shall not be written into memory.
The problem is the cases that fall outside of 'most'. This kind of issue tends to have effects that, when unwanted, are extremely difficult to link to their cause and makes for long and painful debugging.
IMHO, Hong's approach is correct. If the buffer that is invalidated is not aligned to cache-line, one cache-line at the respective boundary may have to be flushed to make sure the invalidation doesn't affect somebody else's memory.
The solution is for drivers to ensure that any buffer that needs to be invalidated is aligned to cache-line boundary at both ends. The above approach puts this onus on the driver. I have documented the alignment requirement in my recent patch series for fixing arm cache problems.
best regards, Aneesh

Hi Aneesh,
On 05/08/2011 09:10, Aneesh V wrote:
Hi Hong, Albert,
On Friday 05 August 2011 12:16 PM, Albert ARIBAUD wrote:
On 05/08/2011 08:38, Hong Xu wrote:
Hi Albert,
I've tried to deal with the case where (start, stop) is not aligned. If misalignment happens, the adjacent lines will be cleaned before invalidating. And in my view it's impossible for a driver to always pass an aligned address to invalidate_dcache_range.
Why would it be impossible? If the buffer is statically allocated it can use __attribute__((aligned)), and if it is dynamically allocated, it can be over-allocated to make sure it starts and ends at cache line boundaries.
To answer your question in another email
How do you know the dirty data should be flushed rather than
invalidated?
"Dirty" means the cache line has been changed by the CPU but has not yet been written into memory or the WB. If we invalidate it, data will be lost. In most cases, I do not see a situation where the dirty data shall not be written into memory.
The problem is the cases that fall outside of 'most'. This kind of issue tends to have effects that, when unwanted, are extremely difficult to link to their cause and makes for long and painful debugging.
IMHO, Hong's approach is correct. If the buffer that is invalidated is not aligned to cache-line, one cache-line at the respective boundary may have to be flushed to make sure the invalidation doesn't affect somebody else's memory.
The solution is for drivers to ensure that any buffer that needs to be invalidated is aligned to cache-line boundary at both ends. The above approach puts this onus on the driver. I have documented the alignment requirement in my recent patch series for fixing arm cache problems.
AIUI, you describe what I requested, so I agree with it. :)
And if there is an alignment requirement, then there is no need any more to flush lines when an unaligned start/stop is passed, because it would only be useful for calls that do not conform to the requirement.
best regards, Aneesh
Amicalement,

Hi Albert,
On Friday 05 August 2011 02:50 PM, Albert ARIBAUD wrote:
Hi Aneesh,
On 05/08/2011 09:10, Aneesh V wrote:
Hi Hong, Albert,
On Friday 05 August 2011 12:16 PM, Albert ARIBAUD wrote:
On 05/08/2011 08:38, Hong Xu wrote:
Hi Albert,
I've tried to deal with the case where (start, stop) is not aligned. If misalignment happens, the adjacent lines will be cleaned before invalidating. And in my view it's impossible for a driver to always pass an aligned address to invalidate_dcache_range.
Why would it be impossible? If the buffer is statically allocated it can use __attribute__((aligned)), and if it is dynamically allocated, it can be over-allocated to make sure it starts and ends at cache line boundaries.
To answer your question in another email
How do you know the dirty data should be flushed rather than
invalidated?
"Dirty" means the cache line has been changed by the CPU but has not yet been written into memory or the WB. If we invalidate it, data will be lost. In most cases, I do not see a situation where the dirty data shall not be written into memory.
The problem is the cases that fall outside of 'most'. This kind of issue tends to have effects that, when unwanted, are extremely difficult to link to their cause and makes for long and painful debugging.
IMHO, Hong's approach is correct. If the buffer that is invalidated is not aligned to cache-line, one cache-line at the respective boundary may have to be flushed to make sure the invalidation doesn't affect somebody else's memory.
The solution is for drivers to ensure that any buffer that needs to be invalidated is aligned to cache-line boundary at both ends. The above approach puts this onus on the driver. I have documented the alignment requirement in my recent patch series for fixing arm cache problems.
AIUI, you describe what I requested, so I agree with it. :)
And if there is an alignment requirement, then there is no need any more to flush lines when an unaligned start/stop is passed, because it would only be useful for calls that do not conform to the requirement.
Yes, but when the alignment requirement is not met by a driver that does invalidation:
1. Flushing will affect the driver itself.
2. Not flushing may affect some other code through no fault of its own.
(1) is clearly better in my opinion as it puts the penalty back on the offending driver and not on some other code.
So, the flushing solution is a good way to make the drivers behave!
best regards, Aneesh

Hi Aneesh,
On 08/05/2011 03:10 PM, Aneesh V wrote:
Hi Hong, Albert,
On Friday 05 August 2011 12:16 PM, Albert ARIBAUD wrote:
On 05/08/2011 08:38, Hong Xu wrote:
Hi Albert,
I've tried to deal with the case where (start, stop) is not aligned. If misalignment happens, the adjacent lines will be cleaned before invalidating. And in my view it's impossible for a driver to always pass an aligned address to invalidate_dcache_range.
Why would it be impossible? If the buffer is statically allocated it can use __attribute__((aligned)), and if it is dynamically allocated, it can be over-allocated to make sure it starts and ends at cache line boundaries.
To answer your question in another email
How do you know the dirty data should be flushed rather than
invalidated?
"Dirty" means the cache line has been changed by the CPU but has not yet been written into memory or the WB. If we invalidate it, data will be lost. In most cases, I do not see a situation where the dirty data shall not be written into memory.
The problem is the cases that fall outside of 'most'. This kind of issue tends to have effects that, when unwanted, are extremely difficult to link to their cause and makes for long and painful debugging.
IMHO, Hong's approach is correct. If the buffer that is invalidated is not aligned to cache-line, one cache-line at the respective boundary may have to be flushed to make sure the invalidation doesn't affect somebody else's memory.
The solution is for drivers to ensure that any buffer that needs to be invalidated is aligned to cache-line boundary at both ends. The above approach puts this onus on the driver. I have documented the alignment requirement in my recent patch series for fixing arm cache problems.
I have not noticed the patch series. ;-) If we put the alignment burden on the driver, I'm afraid many drivers which make use of something like a DMA controller would have to modify their code heavily. This does not sound good. :)
BR, Eric
best regards, Aneesh

Hi Eric,
On Friday 05 August 2011 04:03 PM, Hong Xu wrote:
Hi Aneesh,
[snip ..]
IMHO, Hong's approach is correct. If the buffer that is invalidated is not aligned to cache-line, one cache-line at the respective boundary may have to be flushed to make sure the invalidation doesn't affect somebody else's memory.
The solution is for drivers to ensure that any buffer that needs to be invalidated is aligned to cache-line boundary at both ends. The above approach puts this onus on the driver. I have documented the alignment requirement in my recent patch series for fixing arm cache problems.
I have not noticed the patch series. ;-) If we put the alignment burden on the driver, I'm afraid many drivers which make use of something like a DMA controller would have to modify their code heavily. This does not sound good. :)
We have a fundamental problem when it comes to invalidating an un-aligned buffer. Either you flush the boundary lines and corrupt your buffer at boundaries OR you invalidate without flushing and corrupt memory around your buffer. Both are not good! The only real solution is to have aligned buffers, if you want to have D-cache enabled and do DMA at the same time.
br, Aneesh

Hi Aneesh,
On 05/08/2011 12:47, Aneesh V wrote:
Hi Eric,
On Friday 05 August 2011 04:03 PM, Hong Xu wrote:
Hi Aneesh,
[snip ..]
IMHO, Hong's approach is correct. If the buffer that is invalidated is not aligned to cache-line, one cache-line at the respective boundary may have to be flushed to make sure the invalidation doesn't affect somebody else's memory.
The solution is for drivers to ensure that any buffer that needs to be invalidated is aligned to cache-line boundary at both ends. The above approach puts this onus on the driver. I have documented the alignment requirement in my recent patch series for fixing arm cache problems.
I have not noticed the patch series. ;-) If we put the alignment burden on the driver, I'm afraid many drivers which make use of something like a DMA controller would have to modify their code heavily. This does not sound good. :)
We have a fundamental problem when it comes to invalidating an un-aligned buffer. Either you flush the boundary lines and corrupt your buffer at boundaries OR you invalidate without flushing and corrupt memory around your buffer. Both are not good! The only real solution is to have aligned buffers, if you want to have D-cache enabled and do DMA at the same time.
Plus, there should not be *heavy* modifications; DMA engines tend to use essentially two types of memory-resident objects: data buffers and buffer descriptors. There's only a small handful of places in the driver code to look at to find where these objects are allocated and how.
So I stand by my opinion: since the cache invalidation routine should only be called with cache-aligned objects, there is no requirement to flush the first (resp. last) cache line in case of unaligned start (resp.stop), and I don't want cache operations performed when they are not required.
br, Aneesh
Amicalement,

Dear Albert, Aneesh, Eric,
We have a fundamental problem when it comes to invalidating an un-aligned buffer. Either you flush the boundary lines and corrupt your buffer at boundaries OR you invalidate without flushing and corrupt memory around your buffer. Both are not good! The only real solution is to have aligned buffers, if you want to have D-cache enabled and do DMA at the same time.
Plus, there should not be *heavy* modifications; DMA engines tend to use essentially two types of memory-resident objects: data buffers and buffer descriptors. There's only a small handful of places in the driver code to look at to find where these objects are allocated and how.
So I stand by my opinion: since the cache invalidation routine should only be called with cache-aligned objects, there is no requirement to flush the first (resp. last) cache line in case of unaligned start (resp. stop), and I don't want cache operations performed when they are not required.
After considering all issues, any driver that does flush OR invalidate a cache line that it does not fully "own" is prone to cause problems.
When flushing: some DMA might just have put data into the partial line. When invalidating: some software might have written data whose write-back has not yet occurred.
So both flush AND invalidate functions should check for this event and emit a proper warning on the console.
My 2.7 cents... Reinhard

Hi Reinhard,
On 05/08/2011 13:23, Reinhard Meyer wrote:
Dear Albert, Aneesh, Eric,
We have a fundamental problem when it comes to invalidating an un-aligned buffer. Either you flush the boundary lines and corrupt your buffer at boundaries OR you invalidate without flushing and corrupt memory around your buffer. Both are not good! The only real solution is to have aligned buffers, if you want to have D-cache enabled and do DMA at the same time.
Plus, there should not be *heavy* modifications; DMA engines tend to use essentially two types of memory-resident objects: data buffers and buffer descriptors. There's only a small handful of places in the driver code to look at to find where these objects are allocated and how.
So I stand by my opinion: since the cache invalidation routine should only be called with cache-aligned objects, there is no requirement to flush the first (resp. last) cache line in case of unaligned start (resp. stop), and I don't want cache operations performed when they are not required.
After considering all issues, any driver that does flush OR invalidate a cache line that it does not fully "own" is prone to cause problems.
When flushing: some DMA might just have put data into the partial line. When invalidating: some software might have written data whose write-back has not yet occurred.
So both flush AND invalidate functions should check for this event and emit a proper warning on the console.
Fully agreed.
My 2.7 cents... Reinhard
Amicalement,

Hi Albert,
On Friday 05 August 2011 04:33 PM, Albert ARIBAUD wrote:
Hi Aneesh,
On 05/08/2011 12:47, Aneesh V wrote:
Hi Eric,
On Friday 05 August 2011 04:03 PM, Hong Xu wrote:
Hi Aneesh,
[snip ..]
IMHO, Hong's approach is correct. If the buffer that is invalidated is not aligned to cache-line, one cache-line at the respective boundary may have to be flushed to make sure the invalidation doesn't affect somebody else's memory.
The solution is for drivers to ensure that any buffer that needs to be invalidated is aligned to cache-line boundary at both ends. The above approach puts this onus on the driver. I have documented the alignment requirement in my recent patch series for fixing arm cache problems.
I had not noticed the patch series. ;-) If we put the alignment burden on the driver, I'm afraid many drivers which make use of something like a DMA controller will have to modify their code heavily. This does not sound good. :)
We have a fundamental problem when it comes to invalidating an un-aligned buffer. Either you flush the boundary lines and corrupt your buffer at boundaries OR you invalidate without flushing and corrupt memory around your buffer. Both are not good! The only real solution is to have aligned buffers, if you want to have D-cache enabled and do DMA at the same time.
Plus, there should not be *heavy* modifications; DMA engines tend to use essentially two types of memory-resident objects: data buffers and buffer descriptors. There's only a small handful of places in the driver code to look at to find where these objects are allocated and how.
So I stand by my opinion: since the cache invalidation routine should only be called with cache-aligned objects, there is no requirement to flush the first (resp. last) cache line in case of unaligned start (resp. stop), and I don't want cache operations performed when they are not required.
IMHO, flushing is better, because the person who commits the mistake of invalidating the un-aligned buffer is the one who is affected and is likely to fix the issue soon. If we didn't flush, the resulting corruption will cause totally random errors that will be hard to debug. Doing an extra flush operation for a maximum of 2 lines doesn't cost us anything. This is the approach followed by the kernel too.
br, Aneesh

(BTW: responders to this thread please stop using my @free.fr address. I just noticed the big pile of U-Boot related messages that went to an account which I do not use for U-Boot any more)
On 05/08/2011 13:51, Aneesh V wrote:
Hi Albert,
On Friday 05 August 2011 04:33 PM, Albert ARIBAUD wrote:
Hi Aneesh,
On 05/08/2011 12:47, Aneesh V wrote:
Hi Eric,
On Friday 05 August 2011 04:03 PM, Hong Xu wrote:
Hi Aneesh,
[snip ..]
IMHO, Hong's approach is correct. If the buffer that is invalidated is not aligned to cache-line, one cache-line at the respective boundary may have to be flushed to make sure the invalidation doesn't affect somebody else's memory.
The solution is for drivers to ensure that any buffer that needs to be invalidated is aligned to cache-line boundary at both ends. The above approach puts this onus on the driver. I have documented the alignment requirement in my recent patch series for fixing arm cache problems.
I had not noticed the patch series. ;-) If we put the alignment burden on the driver, I'm afraid many drivers which make use of something like a DMA controller will have to modify their code heavily. This does not sound good. :)
We have a fundamental problem when it comes to invalidating an un-aligned buffer. Either you flush the boundary lines and corrupt your buffer at boundaries OR you invalidate without flushing and corrupt memory around your buffer. Both are not good! The only real solution is to have aligned buffers, if you want to have D-cache enabled and do DMA at the same time.
Plus, there should not be *heavy* modifications; DMA engines tend to use essentially two types of memory-resident objects: data buffers and buffer descriptors. There's only a small handful of places in the driver code to look at to find where these objects are allocated and how.
So I stand by my opinion: since the cache invalidation routine should only be called with cache-aligned objects, there is no requirement to flush the first (resp. last) cache line in case of unaligned start (resp. stop), and I don't want cache operations performed when they are not required.
IMHO, flushing is better, because the person who commits the mistake of invalidating the un-aligned buffer is the one who is affected and is likely to fix the issue soon. If we didn't flush, the resulting corruption will cause totally random errors that will be hard to debug. Doing an extra flush operation for a maximum of 2 lines doesn't cost us anything. This is the approach followed by the kernel too.
As pointed out by Reinhard, flushing while invalidating is only almost good, and not required at all if alignment requirements are followed.
Especially, I don't buy the argument that "the person who commits the mistake of invalidating the un-aligned buffer is the one who is affected and is likely to fix the issue soon". The issue might not appear right after the call to flush is added; it might appear much later, after several reorganizations of the ordering of data in RAM, and affect some completely unrelated person doing something completely unrelated.
OTOH, aligning buffers on cache boundaries removes the need to flush within invalidates, and will ensure no other data is at any risk.
Between an implementation that "should cause no issue" and an implementation that "cannot cause issues", I definitely favor the latter: so that's a no on my side to any flushing while invalidating a range.
br, Aneesh
Amicalement,

Hi Albert,
On Friday 05 August 2011 06:47 PM, Albert ARIBAUD wrote:
(BTW: responders to this thread please stop using my @free.fr address. I just noticed the big pile of U-Boot related messages that went to an account which I do not use for U-Boot any more)
On 05/08/2011 13:51, Aneesh V wrote:
Hi Albert,
On Friday 05 August 2011 04:33 PM, Albert ARIBAUD wrote:
Hi Aneesh,
On 05/08/2011 12:47, Aneesh V wrote:
Hi Eric,
On Friday 05 August 2011 04:03 PM, Hong Xu wrote:
Hi Aneesh,
[snip ..]
IMHO, Hong's approach is correct. If the buffer that is invalidated is not aligned to cache-line, one cache-line at the respective boundary may have to be flushed to make sure the invalidation doesn't affect somebody else's memory.
The solution is for drivers to ensure that any buffer that needs to be invalidated is aligned to cache-line boundary at both ends. The above approach puts this onus on the driver. I have documented the alignment requirement in my recent patch series for fixing arm cache problems.
I had not noticed the patch series. ;-) If we put the alignment burden on the driver, I'm afraid many drivers which make use of something like a DMA controller will have to modify their code heavily. This does not sound good. :)
We have a fundamental problem when it comes to invalidating an un-aligned buffer. Either you flush the boundary lines and corrupt your buffer at boundaries OR you invalidate without flushing and corrupt memory around your buffer. Both are not good! The only real solution is to have aligned buffers, if you want to have D-cache enabled and do DMA at the same time.
Plus, there should not be *heavy* modifications; DMA engines tend to use essentially two types of memory-resident objects: data buffers and buffer descriptors. There's only a small handful of places in the driver code to look at to find where these objects are allocated and how.
So I stand by my opinion: since the cache invalidation routine should only be called with cache-aligned objects, there is no requirement to flush the first (resp. last) cache line in case of unaligned start (resp. stop), and I don't want cache operations performed when they are not required.
IMHO, flushing is better, because the person who commits the mistake of invalidating the un-aligned buffer is the one who is affected and is likely to fix the issue soon. If we didn't flush, the resulting corruption will cause totally random errors that will be hard to debug. Doing an extra flush operation for a maximum of 2 lines doesn't cost us anything. This is the approach followed by the kernel too.
As pointed out by Reinhard, flushing while invalidating is only almost good, and not required at all if alignment requirements are followed.
I don't dispute that having buffers aligned is the ideal scenario. The question is about error-handling the situation when this requirement is not met.
Especially, I don't buy the argument that "the person who commits the mistake of invalidating the un-aligned buffer is the one who is affected and is likely to fix the issue soon". The issue might not appear right after the call to flush is added; it might appear much later, after several reorganizations of the ordering of data in RAM, and affect some completely unrelated person doing something completely unrelated.
I don't get how a flush can affect an un-related person unless the surrounding buffer also happens to be a DMA receive buffer. Please note that flush is something that can happen to any dirty line as part of the cache line replacement done by hardware when new lines are brought into the cache, it's not such a dangerous thing for normal memory locations.
OTOH, aligning buffers on cache boundaries removes the need to flush within invalidates, and will ensure no other data is at any risk.
If the buffers are aligned, the flush operation will not get executed. So, what is the risk?
Between an implementation that "should cause no issue" and an implementation that "cannot cause issues", I definitely favor the latter: so that's a no on my side to any flushing while invalidating a range.
Let's look at the different cases:
Let A be an un-aligned DMA receive buffer. Let B be a buffer lying next to A belonging to another program.
Case I - A is aligned:
1. Invalidate with flush - no issue
2. Invalidate without flush - no issue

Case II - A is un-aligned, B is a DMA receive buffer:
1. Invalidate with flush - corrupts B if DMA is on-going.
2. Invalidate without flush - no issue.

Case III - A is un-aligned, B is normal memory:
1. Invalidate with flush - A corrupted, no issue for B.
2. Invalidate without flush - no issue for A, B is corrupted.
Case I doesn't cause any issue either way. Case II would be rather rare. If it happens, it's more likely that B belongs to the same driver owning A. Case III is the more common error case and you can see that Invalidate with Flush is better because it doesn't affect B.
best regards, Aneesh

Hi Aneesh,
(cutting quotation for readability)
Le 05/08/2011 16:59, Aneesh V a écrit :
Hi Albert,
I don't dispute that having buffers aligned is the ideal scenario. The question is about error-handling the situation when this requirement is not met.
I understand what you're trying to achieve in this respect, that is, make the code as correct as it can be under unspecified conditions. I believe we are differing in how we construe 'as correct as it can be': you want to make the implementation of the called function 'as correct as it can be' at the expense of introducing a non-intuitive behavior (flush while invalidating), while I prefer the overall system to be as correct as it can be by 'doing exactly what it says on the tin', i.e. invalidating only.
I believe that the added intelligence of trying to perform as resiliently as possible is perfectly understandable, desirable and achievable in an operating system, but that in a bootloader, which has tighter constraints of size and speed notably, adding intelligence just to supplement lack of conformance in the code should be avoided.
Especially, I don't buy the argument that "the person who commits the mistake of invalidating the un-aligned buffer is the one who is affected and is likely to fix the issue soon". The issue might not appear right after the call to flush is added; it might appear much later, after several reorganizations of the ordering of data in RAM, and affect some completely unrelated person doing something completely unrelated.
I don't get how a flush can affect an un-related person unless the surrounding buffer also happens to be a DMA receive buffer.
There you have it: it can. :)
More seriously: both leaving out the flush and doing it have issues when unaligned buffers are involved. Of these two imperfect solutions, I prefer the one that performs more efficiently in nominal cases.
Please note that flush is something that can happen to any dirty line as part of the cache line replacement done by hardware when new lines are brought into the cache, it's not such a dangerous thing for normal memory locations.
For normal memory indeed, it does not matter. But as soon as you start using -- needing -- flushes and invalidates, that means we're not dealing with normal memory any more, but with memory shared with other devices, DMAs or CPUs -- but that is true for doing flushes at the ends of an invalidate region... and for not doing them. Both solutions are simply imperfect due to the simple fact that the cache line just might contain parts of *two* different DMA buffers.
If the buffers are aligned, the flush operation will not get executed. So, what is the risk?
If the operation is not performed in the nominal case, then let's just not do it -- things have already gone wrong anyway.
Between an implementation that "should cause no issue" and an implementation that "cannot cause issues", I definitely favor the latter: so that's a no on my side to any flushing while invalidating a range.
Let's look at the different cases:
Let A be an un-aligned DMA receive buffer. Let B be a buffer lying next to A belonging to another program.
Case I - A is aligned
- Invalidate with flush - no issue
- Invalidate without flush - no issue
Case II - A is un-aligned, B is DMA receive buffer:
- Invalidate with flush - corrupts B if DMA is on-going.
- Invalidate without flush - no issue.
Case III - A is un-aligned, B is normal memory.
- Invalidate with flush - A corrupted, no issue for B.
- Invalidate without flush - no issue for A, B is corrupted.
Case I doesn't cause any issue either way. Case II would be rather rare. If it happens, it's more likely that B belongs to the same driver owning A. Case III is the more common error case and you can see that Invalidate with Flush is better because it doesn't affect B.
I do realize situations which will result in issues are rare. But I do realize that adding the extra logic of flushing is overkill compared to what the driver designer should have done, i.e. aligning some buffers, and that it will not fix all cases.
Also, I don't believe (any more...) in words like 'rare', 'likely' or 'probably'. I've gone through some pains myself about 'rare' conditions which actually were not that rare and cost me days, if not weeks on... rare... occasions.
So let's not try to solve out-of-spec conditions with hypotheses about 'probable' situations it would or would not handle -- we're out of specs, we don't handle things any more. Let's just do 'what it says on the tin', and since the function says it invalidates, let's just invalidate. The code will be simpler, cleaner, more understandable for nominal case, and will noisily signal out-of-spec cases and possibly misbehave then, but any code would, so that's ok.
And let's fix alignment issues where they belong, i.e. in the drivers' buffer alignment. :)
best regards, Aneesh
Amicalement,

Hi Albert,
On Sunday 07 August 2011 12:25 PM, Albert ARIBAUD wrote:
Hi Aneesh,
(cutting quotation for readability)
Le 05/08/2011 16:59, Aneesh V a écrit :
Hi Albert,
I don't dispute that having buffers aligned is the ideal scenario. The question is about error-handling the situation when this requirement is not met.
I understand what you're trying to achieve in this respect, that is, make the code as correct as it can be under unspecified conditions. I believe we are differing in how we construe 'as correct as it can be': you want to make the implementation of the called function 'as correct as it can be' at the expense of introducing a non-intuitive behavior (flush while invalidating), while I prefer the overall system to be as correct as it can be by 'doing exactly what it says on the tin', i.e. invalidating only.
I understand your point of view now. I shall update my cache fix series to invalidate only the aligned part of the buffer and to print a big warning when the buffer is not aligned.
best regards, Aneesh

Le 08/08/2011 10:24, Aneesh V a écrit :
Hi Albert,
On Sunday 07 August 2011 12:25 PM, Albert ARIBAUD wrote:
Hi Aneesh,
(cutting quotation for readability)
Le 05/08/2011 16:59, Aneesh V a écrit :
Hi Albert,
I don't dispute that having buffers aligned is the ideal scenario. The question is about error-handling the situation when this requirement is not met.
I understand what you're trying to achieve in this respect, that is, make the code as correct as it can be under unspecified conditions. I believe we are differing in how we construe 'as correct as it can be': you want to make the implementation of the called function 'as correct as it can be' at the expense of introducing a non-intuitive behavior (flush while invalidating), while I prefer the overall system to be as correct as it can be by 'doing exactly what it says on the tin', i.e. invalidating only.
I understand your point of view now. I shall update my cache fix series to invalidate only the aligned part of the buffer and to print a big warning when the buffer is not aligned.
Thanks Aneesh.
Another point I raised with Hong Xu's patch: for range-limited operations, in case of a misalignment, why not try to *reduce* the range to aligned addresses rather than *expand* it? Moving start up to the next cache line boundary, and moving stop down, would still leave an imperfect situation (can't help it anyway), but it would not affect third-party data any more, only the data which the cache range operation was supposed to affect.
What do you (and others) think?
best regards, Aneesh
Amicalement,

Hi Albert,
On Monday 08 August 2011 03:09 PM, Albert ARIBAUD wrote:
Le 08/08/2011 10:24, Aneesh V a écrit :
Hi Albert,
On Sunday 07 August 2011 12:25 PM, Albert ARIBAUD wrote:
Hi Aneesh,
(cutting quotation for readability)
Le 05/08/2011 16:59, Aneesh V a écrit :
Hi Albert,
I don't dispute that having buffers aligned is the ideal scenario. The question is about error-handling the situation when this requirement is not met.
I understand what you're trying to achieve in this respect, that is, make the code as correct as it can be under unspecified conditions. I believe we are differing in how we construe 'as correct as it can be': you want to make the implementation of the called function 'as correct as it can be' at the expense of introducing a non-intuitive behavior (flush while invalidating), while I prefer the overall system to be as correct as it can be by 'doing exactly what it says on the tin', i.e. invalidating only.
I understand your point of view now. I shall update my cache fix series to invalidate only the aligned part of the buffer and to print a big warning when the buffer is not aligned.
Thanks Aneesh.
Another point I raised with Hong Xu's patch: for range-limited operations, in case of a misalignment, why not try to *reduce* the range to aligned addresses rather than *expand* it? Moving start up to the next cache line boundary, and moving stop down, would still leave an imperfect situation (can't help it anyway), but it would not affect third-party data any more, only the data which the cache range operation was supposed to affect.
What do you (and others) think?
I agree. Indeed this is what I meant when I wrote this above: "invalidate *only the aligned part* of the buffer"
best regards, Aneesh

Dear Albert, Aneesh, Hong,
There seem to be functions of type
xxx(start, end) and xxx(start, size).
Can't it be somehow decided to use only one variant in all cases (flush, invalidate)?
On a personal taste, I'd prefer (start, size) :)
Best Regards, Reinhard

Hi Reinhard,
On Monday 08 August 2011 03:29 PM, Reinhard Meyer wrote:
Dear Albert, Aneesh, Hong,
There seem to be functions of type
xxx(start, end) and xxx(start, size).
Can't it be somehow decided to use only one variant in all cases (flush, invalidate)?
The u-boot standard seems to be xxx(start, end) where the operation will be done on the range [start, end). This is what I figured out by looking at the prototypes and existing implementations when I did the armv7 work and I have stuck to this standard.
Hong also seems to be following the same standard.
If there is no objection, I shall add this definition to the README I am adding.
best regards, Aneesh

Hi Aneesh,
On Monday 08 August 2011 03:29 PM, Reinhard Meyer wrote:
Dear Albert, Aneesh, Hong,
There seem to be functions of type
xxx(start, end) and xxx(start, size).
Can't it be somehow decided to use only one variant in all cases (flush, invalidate)?
The u-boot standard seems to be xxx(start, end) where the operation will be done on the range [start, end). This is what I figured out by looking at the prototypes and existing implementations when I did the armv7 work and I have stuck to this standard.
Hong also seems to be following the same standard.
If there is no objection, I shall add this definition to the README I am adding.
Maybe it's arch-specific; I just saw this in another thread, that's why I asked:
+++ b/arch/mips/cpu/mips32/cpu.c
@@ -52,6 +52,11 @@ int do_reset(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[])
void flush_cache(ulong start_addr, ulong size)
Best Regards, Reinhard

Hi Reinhard,
On Monday 08 August 2011 03:55 PM, Reinhard Meyer wrote:
Hi Aneesh,
On Monday 08 August 2011 03:29 PM, Reinhard Meyer wrote:
Dear Albert, Aneesh, Hong,
There seem to be functions of type
xxx(start, end) and xxx(start, size).
Can't it be somehow decided to use only one variant in all cases (flush, invalidate)?
The u-boot standard seems to be xxx(start, end) where the operation will be done on the range [start, end). This is what I figured out by looking at the prototypes and existing implementations when I did the armv7 work and I have stuck to this standard.
Hong also seems to be following the same standard.
If there is no objection, I shall add this definition to the README I am adding.
Maybe it's arch-specific; I just saw this in another thread, that's why I asked:
+++ b/arch/mips/cpu/mips32/cpu.c
@@ -52,6 +52,11 @@ int do_reset(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[])
void flush_cache(ulong start_addr, ulong size)
I think the confusion about flush_cache() is because in include/common.h the prototype for flush_cache() doesn't have names for the parameters. It's like this:
void flush_cache (unsigned long, unsigned long);
However, the invalidate and flush range functions are clearly defined:
void flush_dcache_range(unsigned long start, unsigned long stop);
void invalidate_dcache_range(unsigned long start, unsigned long stop);
I don't know what to do about flush_cache() now that it seems to have conflicting implementations.
best regards, Aneesh

Hi Aneesh,
On 08/08/2011 12:27, Aneesh V wrote:
Hi Reinhard,
On Monday 08 August 2011 03:55 PM, Reinhard Meyer wrote:
Hi Aneesh,
On Monday 08 August 2011 03:29 PM, Reinhard Meyer wrote:
Dear Albert, Aneesh, Hong,
There seem to be functions of type
xxx(start, end) and xxx(start, size).
Can't it be somehow decided to use only one variant in all cases (flush, invalidate)?
The u-boot standard seems to be xxx(start, end) where the operation will be done on the range [start, end). This is what I figured out by looking at the prototypes and existing implementations when I did the armv7 work and I have stuck to this standard.
Hong also seems to be following the same standard.
If there is no objection, I shall add this definition to the README I am adding.
Maybe it's arch-specific; I just saw this in another thread, that's why I asked:
+++ b/arch/mips/cpu/mips32/cpu.c
@@ -52,6 +52,11 @@ int do_reset(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[])
void flush_cache(ulong start_addr, ulong size)
I think the confusion about flush_cache() is because in include/common.h the prototype for flush_cache() doesn't have names for the parameters. It's like this:
void flush_cache (unsigned long, unsigned long);
However, the invalidate and flush range functions are clearly defined:
void flush_dcache_range(unsigned long start, unsigned long stop);
void invalidate_dcache_range(unsigned long start, unsigned long stop);
I don't know what to do about flush_cache() now that it seems to have conflicting implementations.
My opinion is that global cache functions should not have arguments at all and should act on the full cache. Note that drivers should only use range versions, not global versions; the only places I see global cache actions occurring are initialization when invalidating the whole cache just before enabling it, and control transfer to an OS where the whole cache should be flushed, invalidated and then disabled before jumping to the OS.
So here, if any arch has cache functions that are non-global but don't have "range" in their names, that should be changed.
Also, any prototype that has only parameter types should have parameter names added (a cosmetic patch is welcome for anything of this kind in ARM).
best regards, Aneesh
Amicalement,

On 8/5/2011 4:51 AM, Aneesh V wrote:
Hi Albert,
On Friday 05 August 2011 04:33 PM, Albert ARIBAUD wrote:
Hi Aneesh,
On 05/08/2011 12:47, Aneesh V wrote:
Hi Eric,
On Friday 05 August 2011 04:03 PM, Hong Xu wrote:
Hi Aneesh,
[snip ..]
IMHO, Hong's approach is correct. If the buffer that is invalidated is not aligned to cache-line, one cache-line at the respective boundary may have to be flushed to make sure the invalidation doesn't affect somebody else's memory.
The solution is for drivers to ensure that any buffer that needs to be invalidated is aligned to cache-line boundary at both ends. The above approach puts this onus on the driver. I have documented the alignment requirement in my recent patch series for fixing arm cache problems.
I had not noticed the patch series. ;-) If we put the alignment burden on the driver, I'm afraid many drivers which make use of something like a DMA controller will have to modify their code heavily. This does not sound good. :)
We have a fundamental problem when it comes to invalidating an un-aligned buffer. Either you flush the boundary lines and corrupt your buffer at boundaries OR you invalidate without flushing and corrupt memory around your buffer. Both are not good! The only real solution is to have aligned buffers, if you want to have D-cache enabled and do DMA at the same time.
Plus, there should not be *heavy* modifications; DMA engines tend to use essentially two types of memory-resident objects: data buffers and buffer descriptors. There's only a small handful of places in the driver code to look at to find where these objects are allocated and how.
So I stand by my opinion: since the cache invalidation routine should only be called with cache-aligned objects, there is no requirement to flush the first (resp. last) cache line in case of unaligned start (resp. stop), and I don't want cache operations performed when they are not required.
IMHO, flushing is better, because the person who commits the mistake of invalidating the un-aligned buffer is the one who is affected and is likely to fix the issue soon. If we didn't flush, the resulting corruption will cause totally random errors that will be hard to debug. Doing an extra flush operation for a maximum of 2 lines doesn't cost us anything. This is the approach followed by the kernel too.
Hi All, I am interested in this last statement in particular, that Linux allows non-cache aligned buffers for DMA. In a previous discussion series, we demonstrated why it was IMPOSSIBLE for a non-cache aligned DMA buffer to work in conjunction with a "write back" type cache. To be clear, by write-back, we mean a cache system that has a single dirty bit per cache line, and that if there are stores to any addresses in that line, the ENTIRE LINE will be written back into memory, not just the changed data.

I also seem to recall that the ARM V7 caches were defined as write back, but I am not an ARM person so I don't know for sure what kind of cache we are talking about here. If it is write-back, there is only one possible solution that always works. Write-through will work with un-aligned buffers if the correct flushes and invalidates are present. In that case, buffer alignment is not so important. However, if the same driver is to be used in both cases, it must use cache-aligned buffers only.
Best Regards, Bill Campbell
br, Aneesh

Hi Bill,
Le 06/08/2011 01:04, J. William Campbell a écrit :
Hi All, I am interested in this last statement in particular, that Linux allows non-cache aligned buffers for DMA. In a previous discussion series, we demonstrated why it was IMPOSSIBLE for a non-cache aligned DMA buffer to work in conjunction with a "write back" type cache. To be clear, by write-back, we mean a cache system that has a single dirty bit per cache line, and that if there are stores to any addresses in that line, the ENTIRE LINE will be written back into memory, not just the changed data.

I also seem to recall that the ARM V7 caches were defined as write back, but I am not an ARM person so I don't know for sure what kind of cache we are talking about here. If it is write-back, there is only one possible solution that always works. Write-through will work with un-aligned buffers if the correct flushes and invalidates are present. In that case, buffer alignment is not so important. However, if the same driver is to be used in both cases, it must use cache-aligned buffers only.
The types of caches in ARM vary greatly; they can be configurable at runtime, with the ability to have per-region settings in some implementations, so we could use either write-through or write-back, even in a single run.
I don't like however the idea of solving the issue discussed here by adding constraints on the type of caching, because it has other effects than helping work in an out-of-spec case: it lowers DMA buffer write throughput, it adds complexity in cache configuration on architectures which can handle cache regions, and on those which cannot, it causes a general RAM write performance hit -- all this for an out-of-spec case which is not really complicated to turn into an in-spec case.
Best Regards, Bill Campbell
Amicalement,
participants (5): Albert ARIBAUD, Aneesh V, Hong Xu, J. William Campbell, Reinhard Meyer