Re: [U-Boot] [Resend RFC PATCH 1/2] armv8: Fix dcache disable function

On 10/26/2016 02:02 PM, york.sun@nxp.com wrote:
I came back from my testing and I have more questions than answers.
For _this_ patch, I proposed flushing the caches before disabling them, noting that once the dcache is disabled, the stale data in dirty cache lines is not visible to the core. My argument was that if we flush L1/L2, the data could end up in L3 (I don't know for sure). If I want to skip flushing L3, I have to fix it.
During this discussion, I thought I had made a mistake by flushing L1/L2 by set/way first, then flushing by VA. Actually I didn't. I flushed by VA first.
In today's test, the baseline (working in the sense of booting Linux) is
PATCH 1/2 armv8: Fix dcache disable function
PATCH 2/2 armv8: Fix flush_dcache_all function
With these two patches, I flush the stack up to the top of U-Boot by VA, followed by a flush by set/way. L3 is not flushed. Then the d-cache is disabled. I know this is not a real "flush all" procedure. With this modified procedure, I can continue to boot Linux.
If I revert patch 1, i.e. disable the dcache before flushing, I can see that the data is not visible from the core (debugging with a JTAG tool). My hope was that the stale data would be flushed to main memory when flushed by VA. That's not the case. The main memory doesn't have the correct data. So my new question is, why doesn't flushing by VA flush the data to main memory? Do I need to flush the cache while the cache is enabled?
Guys,
I think I found the root cause of my data loss.
The current code disables the D-cache and MMU before flushing the cache. I think the problem is turning off the MMU. The MMU should stay on while we flush the D-cache; we can turn it off after the flush. Once I make this change, I can see the correct data in memory after flushing (by VA).
Do you agree we should leave MMU on during flushing?
York
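A minimal sketch of the ordering proposed above (clean by VA while the MMU and D-cache are still enabled, and only then clear the enable bits). It is not the patch itself: clean_range_by_va(), dcache_disable_sketch(), the line_op_t hook and count_line() are hypothetical names, the hook stands in for a "dc civac" instruction so the code can run on a host, and the SCTLR value is passed in and returned rather than written to the system register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCTLR_M (1u << 0) /* MMU enable bit in SCTLR_ELx */
#define SCTLR_C (1u << 2) /* data cache enable bit in SCTLR_ELx */

/* Hypothetical per-line hook; on hardware it would issue "dc civac, <va>". */
typedef void (*line_op_t)(uintptr_t va, void *ctx);

/* Host-side stand-in for the hook: it only counts the lines visited. */
static void count_line(uintptr_t va, void *ctx)
{
	(void)va;
	++*(size_t *)ctx;
}

/* Visit every cache line that overlaps [start, end). */
static void clean_range_by_va(uintptr_t start, uintptr_t end,
			      size_t line, line_op_t op, void *ctx)
{
	assert(line != 0 && (line & (line - 1)) == 0); /* power of two */

	for (uintptr_t va = start & ~(uintptr_t)(line - 1); va < end; va += line)
		op(va, ctx);
}

/*
 * Returns the SCTLR value to write back. The clean happens first, while
 * the M and C bits are still set, so the VA->PA translation used by the
 * maintenance operations is the one the data was cached under.
 */
static uint32_t dcache_disable_sketch(uint32_t sctlr, uintptr_t start,
				      uintptr_t end, size_t line,
				      line_op_t op, void *ctx)
{
	if (!(sctlr & SCTLR_C))
		return sctlr; /* D-cache already off, nothing to do */

	clean_range_by_va(start, end, line, op, ctx);
	/* Real code needs a "dsb sy" here before SCTLR is modified. */
	return sctlr & ~(SCTLR_C | SCTLR_M);
}
```

On a target, the returned value would be written to SCTLR_ELx followed by an "isb"; the range would be whatever region must survive the switch (the stack up to the top of U-Boot in the test described above).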

On 10/28/2016 11:38 AM, york sun wrote:
Do you agree we should leave MMU on during flushing?
If you're "flushing" by VA, then I'm not surprised since the MMU is what defines the VA->PA mapping, and perhaps you have some physically tagged caches.
However, I believe U-Boot mainline currently "flushes" by set/way, which I wouldn't expect MMU status to influence at all.

On 10/28/2016 10:57 AM, Stephen Warren wrote:
If you're "flushing" by VA, then I'm not surprised since the MMU is what defines the VA->PA mapping, and perhaps you have some physically tagged caches.
However, I believe U-Boot mainline currently "flushes" by set/way, which I wouldn't expect MMU status to influence at all.
Flushing by set/way (only) is what I am trying to change. It would be better if we don't have to flush L3. Do you agree?
York

On 10/28/2016 12:17 PM, york sun wrote:
Flushing by set/way (only) is what I am trying to change. It would be better if we don't have to flush L3. Do you agree?
It depends on whether the L3 is before or after the Point of Coherency. If it's before, then it needs to be cleaned. If it's after, then I believe it's irrelevant and can be skipped. I don't believe there's any other factor that will allow/prevent you from skipping operations on your L3; there's no wiggle-room or leeway.
Related, consider the following from the Linux kernel's Documentation/arm64/booting.txt:
- Caches, MMUs
The MMU must be off. Instruction cache may be on or off. The address range corresponding to the loaded kernel image must be cleaned to the PoC.
(That only applies to the kernel image specifically, but doing the same for the entire cache content seems reasonable, perhaps even required for other reasons?)

On 10/28/2016 11:32 AM, Stephen Warren wrote:
It depends on whether the L3 is before or after the Point of Coherency. If it's before, then it needs to be cleaned. If it's after, then I believe it's irrelevant and can be skipped. I don't believe there's any other factor that will allow/prevent you from skipping operations on your L3; there's no wiggle-room or leeway.
As Mark pointed out, my L3 is before the PoC. Flushing by set/way only cleans the L1/L2 caches. Unless I flush L3 or flush by VA, my stack is corrupted.
Related, consider the following from the Linux kernel's Documentation/arm64/booting.txt:
- Caches, MMUs
The MMU must be off. Instruction cache may be on or off. The address range corresponding to the loaded kernel image must be cleaned to the PoC.
(That only applies to the kernel image specifically, but doing the same for the entire cache content seems reasonable, perhaps even required for other reasons?)
Booting Linux is not an issue here. The kernel image is flushed by VA.
I am struggling with dcache_disable(), which implies that the entire dcache is flushed. I don't have a reasonable way to flush everything if I want to skip L3. I tried benchmarking a flush by VA covering my entire 16GB of memory. It took 30+ seconds. On the other hand, flushing by set/way and flushing L3 together took 7 ms. If I only flush the U-Boot stack in this function, it runs really fast, but that defeats the purpose of flushing the entire cache.
I thought of parsing each set/way to find the address of each cache line (I don't know how to do that yet), but the tag only contains the physical address, not the VA.
The ARM document shows example code to clean the entire data or unified cache to the PoC, very similar to the code we have in U-Boot armv8/cache.S. Unless there are other cache maintenance instructions I am not aware of, I don't see how to flush to the PoC by set/way.
At this point, I don't see a reasonable way to implement flush all dcache without flushing L3.
York
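To put the 30+ second figure in perspective, here is a rough back-of-the-envelope calculation. It assumes 64-byte cache lines and reads "16GB" as 16 GiB; neither is stated in the thread, and lines_in() and ns_per_line() are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Number of by-VA maintenance operations needed to cover a region. */
static uint64_t lines_in(uint64_t region_bytes, uint64_t line_bytes)
{
	assert(line_bytes != 0);
	return (region_bytes + line_bytes - 1) / line_bytes;
}

/* Average cost of one operation, in nanoseconds. */
static double ns_per_line(double total_seconds, uint64_t lines)
{
	assert(lines != 0);
	return total_seconds * 1e9 / (double)lines;
}
```

Under those assumptions, lines_in(16ULL << 30, 64) is 2^28 = 268,435,456 operations, and ns_per_line(30.0, ...) comes out at roughly 112 ns each. The cost is dominated by the sheer number of lines rather than by any single operation, which is why covering all of memory by VA cannot compete with a set/way walk whose length depends only on the cache size.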

On Fri, Oct 28, 2016 at 09:35:37PM +0000, york sun wrote:
I am struggling with dcache_disable(), which implies that the entire dcache is flushed. I don't have a reasonable way to flush everything if I want to skip L3. I tried benchmarking a flush by VA covering my entire 16GB of memory. It took 30+ seconds. On the other hand, flushing by set/way and flushing L3 together took 7 ms. If I only flush the U-Boot stack in this function, it runs really fast, but that defeats the purpose of flushing the entire cache.
I thought of parsing each set/way to find the address of each cache line (I don't know how to do that yet), but the tag only contains the physical address, not the VA.
With the MMU off, translation is an idmap (i.e. VA == PA), so if you have physical addresses, you can use those directly.
That said, the presence and implementation of any mechanism to read addresses from the cache is IMPLEMENTATION DEFINED, so this will not be portable.
The ARM document shows example code to clean entire data or unified cache to PoC, very similar to the code we have in U-Boot armv8/cache.S.
Do you mean the "Example code for cache maintenance instructions"?
In recent versions of the ARM ARM there's a large note explaining why this only works in very restricted scenarios (and cannot be used to affect system caches such as your L3).
In the latest ARM ARM ("ARM DDI 0487A.k"), see page D3-1710.
Unless there are other cache maintenance instructions I am not aware of, I don't see how to flush to the PoC by set/way.
Architecturally, Set/Way operations are not guaranteed to affect all caches prior to the PoC, and may require other IMPLEMENTATION DEFINED maintenance (e.g. MMIO control of system-level caches).
Thanks, Mark.
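For readers unfamiliar with set/way maintenance, the sketch below shows how a set/way walk derives its operands, assuming the 32-bit CCSIDR_EL1 layout (LineSize in bits [2:0], Associativity in [12:3], NumSets in [27:13]) and the DC CISW operand layout from the ARM ARM. decode_ccsidr(), setway_operand() and struct cache_geom are hypothetical names, and the code only computes values on a host; it issues no cache instructions.

```c
#include <assert.h>
#include <stdint.h>

struct cache_geom {
	uint32_t line_shift; /* log2(line length in bytes) */
	uint32_t ways;
	uint32_t sets;
};

/* Decode the geometry fields of a 32-bit CCSIDR_EL1 value. */
static struct cache_geom decode_ccsidr(uint32_t ccsidr)
{
	struct cache_geom g;

	g.line_shift = (ccsidr & 0x7) + 4;
	g.ways = ((ccsidr >> 3) & 0x3ff) + 1;
	g.sets = ((ccsidr >> 13) & 0x7fff) + 1;
	return g;
}

/* Smallest n such that 2^n >= x, for x >= 1. */
static uint32_t ceil_log2(uint32_t x)
{
	uint32_t n = 0;

	while ((1u << n) < x)
		n++;
	return n;
}

/*
 * Build the register operand for "dc cisw": way in the top bits, set
 * above the line offset, level (cache level minus 1) in bits [3:1].
 */
static uint64_t setway_operand(struct cache_geom g, uint32_t level,
			       uint32_t way, uint32_t set)
{
	uint32_t a = ceil_log2(g.ways);
	uint64_t v;

	assert(way < g.ways && set < g.sets && level < 7);
	v = ((uint64_t)set << g.line_shift) | ((uint64_t)level << 1);
	if (a) /* a direct-mapped cache has no way field to fill */
		v |= (uint64_t)way << (32 - a);
	return v;
}
```

For an illustrative CCSIDR value of 0x700FE01A this decodes to 128 sets, 4 ways and 64-byte lines (32 KiB), so a full walk of that level is only 512 operations. The walk can only enumerate caches that the core itself reports through these registers, which fits the point above that a system-level cache such as the L3 may need separate IMPLEMENTATION DEFINED maintenance.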

On 11/07/2016 06:12 AM, Mark Rutland wrote:
Architecturally, Set/Way operations are not guaranteed to affect all caches prior to the PoC, and may require other IMPLEMENTATION DEFINED maintenance (e.g. MMIO control of system-level caches).
At this point, seeking alternative ways to clean the entire cache without flushing L3 seems unproductive. I am going to stop here. Thanks for the discussion.
York

On Fri, Oct 28, 2016 at 12:32:36PM -0600, Stephen Warren wrote:
Related, consider the following from the Linux kernel's Documentation/arm64/booting.txt:
- Caches, MMUs
The MMU must be off. Instruction cache may be on or off. The address range corresponding to the loaded kernel image must be cleaned to the PoC.
(That only applies to the kernel image specifically, but doing the same for the entire cache content seems reasonable, perhaps even required for other reasons?)
It's certainly preferable.
The wording is somewhat poor too, and needs some fixing up.
If anything has been allocated into the cache which may conflict with later use with Normal Inner-Shareable Inner-WB Outer-WB mappings, this needs to be (Cleaned+)Invalidated from the caches.
Thanks, Mark.
participants (3): Mark Rutland, Stephen Warren, york sun