[U-Boot] [PATCH 0/4] USB and cache related fixes

Hey all,
In commit b8adb12 the cache flushing behavior was changed for the EHCI stack. This change showed a few different problems on TI platforms (where our cacheline size is 64 not 32). First, the dcache_off call that ehci-omap had been doing was now not happening soon enough to paper over the cache issues. This call is removed in patch 1. Second, when we have dcache support compiled in but turned off via 'dcache off' the cache routines spam the console about alignment issues when a cache flush is attempted. This is a problem in that it makes operations extremely slow (as we're spending all our time spitting messages to console). The second patch makes the flush routines return when the dcache is off. The last two patches deal with the same problem, for EHCI and for MUSB. The USB spec says that 32 bytes is the minimum alignment but we need larger alignment when the cache is larger. Note that we can't use MAX() here as gcc doesn't allow that expansion inside of align(..).
Tested on omap3_beagle (which was previously broken) and a MAKEALL -a arm looks good too.

This has never been completely sufficient and now happens too late to paper over the cache coherency problems with the current USB stack.
Signed-off-by: Tom Rini trini@ti.com --- drivers/usb/host/ehci-omap.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/usb/host/ehci-omap.c b/drivers/usb/host/ehci-omap.c index 1ed7710..292673b 100644 --- a/drivers/usb/host/ehci-omap.c +++ b/drivers/usb/host/ehci-omap.c @@ -246,7 +246,6 @@ int omap_ehci_hcd_init(struct omap_usbhs_board_data *usbhs_pdata) if (is_ehci_phy_mode(usbhs_pdata->port_mode[i])) omap_ehci_soft_phy_reset(i);
- dcache_disable(); hccr = (struct ehci_hccr *)(OMAP_EHCI_BASE); hcor = (struct ehci_hcor *)(OMAP_EHCI_BASE + 0x10);

Dear Tom Rini,
This has never been completely sufficient and now happens too late to paper over the cache coherency problems with the current USB stack.
Poor USB maintainer isn't CCed :'-(
Signed-off-by: Tom Rini trini@ti.com
But this is always a good thing to see.
Acked-by: Marek Vasut marex@denx.de
drivers/usb/host/ehci-omap.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/usb/host/ehci-omap.c b/drivers/usb/host/ehci-omap.c index 1ed7710..292673b 100644 --- a/drivers/usb/host/ehci-omap.c +++ b/drivers/usb/host/ehci-omap.c @@ -246,7 +246,6 @@ int omap_ehci_hcd_init(struct omap_usbhs_board_data *usbhs_pdata) if (is_ehci_phy_mode(usbhs_pdata->port_mode[i])) omap_ehci_soft_phy_reset(i);
- dcache_disable(); hccr = (struct ehci_hccr *)(OMAP_EHCI_BASE); hcor = (struct ehci_hcor *)(OMAP_EHCI_BASE + 0x10);
Best regards, Marek Vasut

On 06/14/2012 02:57 PM, Marek Vasut wrote:
Dear Tom Rini,
This has never been completely sufficient and now happens too late to paper over the cache coherency problems with the current USB stack.
Poor USB maintainer isn't CCed :'-(
Whoops, forgot. I don't know why I thought it was you :)
Signed-off-by: Tom Rini trini@ti.com
But this is always a good thing to see.
Acked-by: Marek Vasut marex@denx.de
drivers/usb/host/ehci-omap.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/usb/host/ehci-omap.c b/drivers/usb/host/ehci-omap.c index 1ed7710..292673b 100644 --- a/drivers/usb/host/ehci-omap.c +++ b/drivers/usb/host/ehci-omap.c @@ -246,7 +246,6 @@ int omap_ehci_hcd_init(struct omap_usbhs_board_data *usbhs_pdata) if (is_ehci_phy_mode(usbhs_pdata->port_mode[i])) omap_ehci_soft_phy_reset(i);
- dcache_disable(); hccr = (struct ehci_hccr *)(OMAP_EHCI_BASE); hcor = (struct ehci_hcor *)(OMAP_EHCI_BASE + 0x10);
Best regards, Marek Vasut

If we are built with D-CACHE enabled but have run 'dcache off' and then attempt to flush unaligned regions we spam the console with problems that aren't true (as the cache was off).
Signed-off-by: Tom Rini trini@ti.com --- arch/arm/cpu/armv7/cache_v7.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/arch/arm/cpu/armv7/cache_v7.c b/arch/arm/cpu/armv7/cache_v7.c index 1b4e808..1c0f5b0 100644 --- a/arch/arm/cpu/armv7/cache_v7.c +++ b/arch/arm/cpu/armv7/cache_v7.c @@ -250,6 +250,9 @@ static void v7_inval_tlb(void)
void invalidate_dcache_all(void) { + if (!dcache_status()) + return; + v7_maint_dcache_all(ARMV7_DCACHE_INVAL_ALL);
v7_outer_cache_inval_all(); @@ -261,6 +264,9 @@ void invalidate_dcache_all(void) */ void flush_dcache_all(void) { + if (!dcache_status()) + return; + v7_maint_dcache_all(ARMV7_DCACHE_CLEAN_INVAL_ALL);
v7_outer_cache_flush_all(); @@ -272,6 +278,8 @@ void flush_dcache_all(void) */ void invalidate_dcache_range(unsigned long start, unsigned long stop) { + if (!dcache_status()) + return;
v7_dcache_maint_range(start, stop, ARMV7_DCACHE_INVAL_RANGE);
@@ -285,6 +293,9 @@ void invalidate_dcache_range(unsigned long start, unsigned long stop) */ void flush_dcache_range(unsigned long start, unsigned long stop) { + if (!dcache_status()) + return; + v7_dcache_maint_range(start, stop, ARMV7_DCACHE_CLEAN_INVAL_RANGE);
v7_outer_cache_flush_range(start, stop);

Dear Tom Rini,
If we are built with D-CACHE enabled but have run 'dcache off' and then attempt to flush unaligned regions we spam the console with problems that aren't true (as the cache was off).
Signed-off-by: Tom Rini trini@ti.com
arch/arm/cpu/armv7/cache_v7.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/arch/arm/cpu/armv7/cache_v7.c b/arch/arm/cpu/armv7/cache_v7.c index 1b4e808..1c0f5b0 100644 --- a/arch/arm/cpu/armv7/cache_v7.c +++ b/arch/arm/cpu/armv7/cache_v7.c @@ -250,6 +250,9 @@ static void v7_inval_tlb(void)
void invalidate_dcache_all(void) {
- if (!dcache_status())
return;
Will this get optimized out of the dcache is disabled altogether in uboot config?
btw this is 20% cooler in 10 seconds flat! https://plus.google.com/102150693225130002912/posts/9gntjh57dXt
v7_maint_dcache_all(ARMV7_DCACHE_INVAL_ALL);
v7_outer_cache_inval_all();
[...]
Best regards, Marek Vasut

On 06/14/2012 03:00 PM, Marek Vasut wrote:
Dear Tom Rini,
If we are built with D-CACHE enabled but have run 'dcache off' and then attempt to flush unaligned regions we spam the console with problems that aren't true (as the cache was off).
Signed-off-by: Tom Rini trini@ti.com
arch/arm/cpu/armv7/cache_v7.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/arch/arm/cpu/armv7/cache_v7.c b/arch/arm/cpu/armv7/cache_v7.c index 1b4e808..1c0f5b0 100644 --- a/arch/arm/cpu/armv7/cache_v7.c +++ b/arch/arm/cpu/armv7/cache_v7.c @@ -250,6 +250,9 @@ static void v7_inval_tlb(void)
void invalidate_dcache_all(void) {
- if (!dcache_status())
return;
Will this get optimized out of the dcache is disabled altogether in uboot config?
That's another side of #ifs and that has empty functions.
btw this is 20% cooler in 10 seconds flat! https://plus.google.com/102150693225130002912/posts/9gntjh57dXt
Ha.

Dear Tom Rini,
On 06/14/2012 03:00 PM, Marek Vasut wrote:
Dear Tom Rini,
If we are built with D-CACHE enabled but have run 'dcache off' and then attempt to flush unaligned regions we spam the console with problems that aren't true (as the cache was off).
Signed-off-by: Tom Rini trini@ti.com
arch/arm/cpu/armv7/cache_v7.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/arch/arm/cpu/armv7/cache_v7.c b/arch/arm/cpu/armv7/cache_v7.c index 1b4e808..1c0f5b0 100644 --- a/arch/arm/cpu/armv7/cache_v7.c +++ b/arch/arm/cpu/armv7/cache_v7.c @@ -250,6 +250,9 @@ static void v7_inval_tlb(void)
void invalidate_dcache_all(void) {
- if (!dcache_status())
return;
Will this get optimized out of the dcache is disabled altogether in uboot config?
That's another side of #ifs and that has empty functions.
Ok, that's good :-)
btw this is 20% cooler in 10 seconds flat! https://plus.google.com/102150693225130002912/posts/9gntjh57dXt
Ha.
Best regards, Marek Vasut

Hi Tom,
On Fri, Jun 15, 2012 at 12:31 AM, Tom Rini trini@ti.com wrote:
If we are built with D-CACHE enabled but have run 'dcache off' and then attempt to flush unaligned regions we spam the console with problems that aren't true (as the cache was off).
Today we do cache maintenance operations after the dcache is turned off. One example is before jumping to kernel, we try to invalidate the caches, in cache turned off state. So with this patch those maintenance calls will do nothing, which is not correct.
If it is a problem with unaligned regions, then that is the only thing to be fixed right ?. Just trying to understand why this change is required ?
Thanks, Sricharan
Signed-off-by: Tom Rini trini@ti.com
arch/arm/cpu/armv7/cache_v7.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/arch/arm/cpu/armv7/cache_v7.c b/arch/arm/cpu/armv7/cache_v7.c index 1b4e808..1c0f5b0 100644 --- a/arch/arm/cpu/armv7/cache_v7.c +++ b/arch/arm/cpu/armv7/cache_v7.c @@ -250,6 +250,9 @@ static void v7_inval_tlb(void)
void invalidate_dcache_all(void) {
- if (!dcache_status())
- return;
v7_maint_dcache_all(ARMV7_DCACHE_INVAL_ALL);
v7_outer_cache_inval_all(); @@ -261,6 +264,9 @@ void invalidate_dcache_all(void) */ void flush_dcache_all(void) {
- if (!dcache_status())
- return;
v7_maint_dcache_all(ARMV7_DCACHE_CLEAN_INVAL_ALL);
v7_outer_cache_flush_all(); @@ -272,6 +278,8 @@ void flush_dcache_all(void) */ void invalidate_dcache_range(unsigned long start, unsigned long stop) {
- if (!dcache_status())
- return;
v7_dcache_maint_range(start, stop, ARMV7_DCACHE_INVAL_RANGE);
@@ -285,6 +293,9 @@ void invalidate_dcache_range(unsigned long start, unsigned long stop) */ void flush_dcache_range(unsigned long start, unsigned long stop) {
- if (!dcache_status())
- return;
v7_dcache_maint_range(start, stop, ARMV7_DCACHE_CLEAN_INVAL_RANGE);
v7_outer_cache_flush_range(start, stop);

On 06/14/2012 10:48 PM, R, Sricharan wrote:
Hi Tom,
On Fri, Jun 15, 2012 at 12:31 AM, Tom Rini trini@ti.com wrote:
If we are built with D-CACHE enabled but have run 'dcache off' and then attempt to flush unaligned regions we spam the console with problems that aren't true (as the cache was off).
Today we do cache maintenance operations after the dcache is turned off. One example is before jumping to kernel, we try to invalidate the caches, in cache turned off state. So with this patch those maintenance calls will do nothing, which is not correct.
Ah yes, But, shouldn't we be doing these same operations as part of turning the cache off?
If it is a problem with unaligned regions, then that is the only thing to be fixed right ?. Just trying to understand why this change is required ?
The problem is that within the USB/network/filesystem stacks we have a lot of not cache safe alignments apparently. Without this every '#' of a tftp gives a screen full of error printfs. So tftp'ing a kernel takes minutes, not seconds, to complete.

Dear Tom Rini,
On 06/14/2012 10:48 PM, R, Sricharan wrote:
Hi Tom,
On Fri, Jun 15, 2012 at 12:31 AM, Tom Rini trini@ti.com wrote:
If we are built with D-CACHE enabled but have run 'dcache off' and then attempt to flush unaligned regions we spam the console with problems that aren't true (as the cache was off).
Today we do cache maintenance operations after the dcache is turned off. One example is before jumping to kernel, we try to invalidate the caches, in cache turned off state. So with this patch those maintenance calls will do nothing, which is not correct.
Ah yes, But, shouldn't we be doing these same operations as part of turning the cache off?
If it is a problem with unaligned regions, then that is the only
thing to be fixed
right ?. Just trying to understand why this change is required ?
The problem is that within the USB/network/filesystem stacks we have a lot of not cache safe alignments apparently. Without this every '#' of a tftp gives a screen full of error printfs. So tftp'ing a kernel takes minutes, not seconds, to complete.
I think we should augment uboot to disallow tftp, fatload etc. to cache- unaligned address when caches are on ... this'd squash away the need for any shitty bounce buffers right away. Tom, will you be able to implement it, pretty please?
Of course, this doesn't squash the need for fixing FS code etc.
Best regards, Marek Vasut

On 06/15/2012 07:25 AM, Marek Vasut wrote:
Dear Tom Rini,
On 06/14/2012 10:48 PM, R, Sricharan wrote:
Hi Tom,
On Fri, Jun 15, 2012 at 12:31 AM, Tom Rini trini@ti.com wrote:
If we are built with D-CACHE enabled but have run 'dcache off' and then attempt to flush unaligned regions we spam the console with problems that aren't true (as the cache was off).
Today we do cache maintenance operations after the dcache is turned off. One example is before jumping to kernel, we try to invalidate the caches, in cache turned off state. So with this patch those maintenance calls will do nothing, which is not correct.
Ah yes, But, shouldn't we be doing these same operations as part of turning the cache off?
If it is a problem with unaligned regions, then that is the only
thing to be fixed
right ?. Just trying to understand why this change is required ?
The problem is that within the USB/network/filesystem stacks we have a lot of not cache safe alignments apparently. Without this every '#' of a tftp gives a screen full of error printfs. So tftp'ing a kernel takes minutes, not seconds, to complete.
I think we should augment uboot to disallow tftp, fatload etc. to cache- unaligned address when caches are on ... this'd squash away the need for any shitty bounce buffers right away. Tom, will you be able to implement it, pretty please?
But that's not the problem. The problem is all of the stuff going on within the USB/networking stack.

Dear Tom Rini,
On 06/15/2012 07:25 AM, Marek Vasut wrote:
Dear Tom Rini,
On 06/14/2012 10:48 PM, R, Sricharan wrote:
Hi Tom,
On Fri, Jun 15, 2012 at 12:31 AM, Tom Rini trini@ti.com wrote:
If we are built with D-CACHE enabled but have run 'dcache off' and then attempt to flush unaligned regions we spam the console with problems that aren't true (as the cache was off).
Today we do cache maintenance operations after the dcache is turned off. One example is before jumping to kernel, we try to invalidate the caches, in cache turned off state. So with this patch those maintenance calls will do nothing, which is not correct.
Ah yes, But, shouldn't we be doing these same operations as part of turning the cache off?
If it is a problem with unaligned regions, then that is the only
thing to be fixed
right ?. Just trying to understand why this change is required ?
The problem is that within the USB/network/filesystem stacks we have a lot of not cache safe alignments apparently. Without this every '#' of a tftp gives a screen full of error printfs. So tftp'ing a kernel takes minutes, not seconds, to complete.
I think we should augment uboot to disallow tftp, fatload etc. to cache- unaligned address when caches are on ... this'd squash away the need for any shitty bounce buffers right away. Tom, will you be able to implement it, pretty please?
But that's not the problem. The problem is all of the stuff going on within the USB/networking stack.
Correct. This will shield you from the users, I think it's worth to implement like that asap anyway. Or do we want a bounce buffer? I doubt that.
The other part is to fix the internals of u-boot. This will be the harder part.
Best regards, Marek Vasut

Hi,
On Fri, Jun 15, 2012 at 12:31 AM, Tom Rini trini@ti.com wrote:
If we are built with D-CACHE enabled but have run 'dcache off' and then attempt to flush unaligned regions we spam the console with problems that aren't true (as the cache was off).
Today we do cache maintenance operations after the dcache is turned off. One example is before jumping to kernel, we try to invalidate the caches, in cache turned off state. So with this patch those maintenance calls will do nothing, which is not correct.
Ah yes, But, shouldn't we be doing these same operations as part of turning the cache off?
The problem is that while turning of dcaches, we flush it, and turn cache and MMU off. But these operations are not happening automatically in a single call. So there is a chance of valid entries present in cache even after it is OFF.
So we invalidate it before jumping to kernel.
In fact our procedure of turning off the caches and MMU should have a relook, to really ensure things are safe. I will check on this more.
If it is a problem with unaligned regions, then that is the only thing to be fixed right ?. Just trying to understand why this change is required ?
The problem is that within the USB/network/filesystem stacks we have a lot of not cache safe alignments apparently. Without this every '#' of a tftp gives a screen full of error printfs. So tftp'ing a kernel takes minutes, not seconds, to complete.
Ok,but when buffers are not aligned isn't that you are going to have coherency problems ?
Thanks, Sricharan

On 06/15/2012 07:48 AM, R, Sricharan wrote:
Hi,
On Fri, Jun 15, 2012 at 12:31 AM, Tom Rini trini@ti.com wrote:
If we are built with D-CACHE enabled but have run 'dcache off' and then attempt to flush unaligned regions we spam the console with problems that aren't true (as the cache was off).
Today we do cache maintenance operations after the dcache is turned off. One example is before jumping to kernel, we try to invalidate the caches, in cache turned off state. So with this patch those maintenance calls will do nothing, which is not correct.
Ah yes, But, shouldn't we be doing these same operations as part of turning the cache off?
The problem is that while turning of dcaches, we flush it, and turn cache and MMU off. But these operations are not happening automatically in a single call. So there is a chance of valid entries present in cache even after it is OFF.
So we invalidate it before jumping to kernel.
In fact our procedure of turning off the caches and MMU should have a relook, to really ensure things are safe. I will check on this more.
OK, thanks.
If it is a problem with unaligned regions, then that is the only thing to be fixed right ?. Just trying to understand why this change is required ?
The problem is that within the USB/network/filesystem stacks we have a lot of not cache safe alignments apparently. Without this every '#' of a tftp gives a screen full of error printfs. So tftp'ing a kernel takes minutes, not seconds, to complete.
Ok,but when buffers are not aligned isn't that you are going to have coherency problems ?
Exactly. USB requires dcache to be turned off today. With it turned off it spams these messages making tftp to an aligned address take minutes because the console is busy printing all of these messages about flushes it didn't need to do. The alternative is to build omap4/omap5/etc with dcache disabled or USB disabled.
I think the answer here is we need to make sure that when we switch the cache off (dcache_off()) we're always doing the kind of flushing that we had been doing for the kernel.

Hi, [snip..]
If it is a problem with unaligned regions, then that is the only thing to be fixed right ?. Just trying to understand why this change is required ?
The problem is that within the USB/network/filesystem stacks we have a lot of not cache safe alignments apparently. Without this every '#' of a tftp gives a screen full of error printfs. So tftp'ing a kernel takes minutes, not seconds, to complete.
Ok,but when buffers are not aligned isn't that you are going to have coherency problems ?
Exactly. USB requires dcache to be turned off today. With it turned off it spams these messages making tftp to an aligned address take minutes because the console is busy printing all of these messages about flushes it didn't need to do. The alternative is to build omap4/omap5/etc with dcache disabled or USB disabled.
Ok, but who is calling cache maintenance in your case after turning off caches ?
I think the answer here is we need to make sure that when we switch the cache off (dcache_off()) we're always doing the kind of flushing that we had been doing for the kernel.
Yes, and also unaligned buffers will always be a problem. There should be a way to address that as well.
Thanks, Sricharan

On 06/15/2012 08:18 AM, R, Sricharan wrote:
Hi, [snip..]
If it is a problem with unaligned regions, then that is the only thing to be fixed right ?. Just trying to understand why this change is required ?
The problem is that within the USB/network/filesystem stacks we have a lot of not cache safe alignments apparently. Without this every '#' of a tftp gives a screen full of error printfs. So tftp'ing a kernel takes minutes, not seconds, to complete.
Ok,but when buffers are not aligned isn't that you are going to have coherency problems ?
Exactly. USB requires dcache to be turned off today. With it turned off it spams these messages making tftp to an aligned address take minutes because the console is busy printing all of these messages about flushes it didn't need to do. The alternative is to build omap4/omap5/etc with dcache disabled or USB disabled.
Ok, but who is calling cache maintenance in your case after
turning off caches ?
The USB stack currently.

Hi Tom,
On Fri, Jun 15, 2012 at 8:50 PM, Tom Rini trini@ti.com wrote:
On 06/15/2012 08:18 AM, R, Sricharan wrote:
Hi, [snip..]
If it is a problem with unaligned regions, then that is the only thing to be fixed right ?. Just trying to understand why this change is required ?
The problem is that within the USB/network/filesystem stacks we have a lot of not cache safe alignments apparently. Without this every '#' of a tftp gives a screen full of error printfs. So tftp'ing a kernel takes minutes, not seconds, to complete.
Ok,but when buffers are not aligned isn't that you are going to have coherency problems ?
Exactly. USB requires dcache to be turned off today. With it turned off it spams these messages making tftp to an aligned address take minutes because the console is busy printing all of these messages about flushes it didn't need to do. The alternative is to build omap4/omap5/etc with dcache disabled or USB disabled.
Ok, but who is calling cache maintenance in your case after turning off caches ?
The USB stack currently.
Ok, in that case if 1) Alignment issue has to be fixed USB stack, (or)
2) Better to turn off caches when say USB-EHCI is enabled ? (or)
3) Turn of caches until cache library is fixed so that we will not require any maintenance after turning it off. ( I will work on this some time, this week)
I am not sure which one to do, but definitely we should not prevent it from executing as in the above patch. This may lead to other issues.
Thanks, Sricharan

On 06/18/2012 07:13 AM, R, Sricharan wrote:
Hi Tom,
On Fri, Jun 15, 2012 at 8:50 PM, Tom Rini trini@ti.com wrote:
On 06/15/2012 08:18 AM, R, Sricharan wrote:
Hi, [snip..]
> If it is a problem with unaligned regions, then that is the only > thing to be fixed > right ?. Just trying to understand why this change is required ?
The problem is that within the USB/network/filesystem stacks we have a lot of not cache safe alignments apparently. Without this every '#' of a tftp gives a screen full of error printfs. So tftp'ing a kernel takes minutes, not seconds, to complete.
Ok,but when buffers are not aligned isn't that you are going to have coherency problems ?
Exactly. USB requires dcache to be turned off today. With it turned off it spams these messages making tftp to an aligned address take minutes because the console is busy printing all of these messages about flushes it didn't need to do. The alternative is to build omap4/omap5/etc with dcache disabled or USB disabled.
Ok, but who is calling cache maintenance in your case after
turning off caches ?
The USB stack currently.
Ok, in that case if
- Alignment issue has to be fixed USB stack, (or)
This is the correct, long term solution.
- Better to turn off caches when say USB-EHCI is enabled ?
Doing that is what showed the issue to start with. Running 'dcache off' doesn't turn off attempted cache flushing.
(or)
- Turn of caches until cache library is fixed so that we will not require any maintenance after turning it off. ( I will work on this some time, this week)
This is also a good thing. It just needs to make sure attempted flushes don't spam the console, when cache is off :)
I am not sure which one to do, but definitely we should not prevent it from executing as in the above patch. This may lead to other issues.
OK. I'll see how badly boot time goes up when dcache is off to start with on a few boards.

Hi Sricharan,
On 06/15/2012 07:48 AM, R, Sricharan wrote:
Hi,
On Fri, Jun 15, 2012 at 12:31 AM, Tom Rinitrini@ti.com wrote:
If we are built with D-CACHE enabled but have run 'dcache off' and then attempt to flush unaligned regions we spam the console with problems that aren't true (as the cache was off).
Today we do cache maintenance operations after the dcache is turned off. One example is before jumping to kernel, we try to invalidate the caches, in cache turned off state. So with this patch those maintenance calls will do nothing, which is not correct.
Ah yes, But, shouldn't we be doing these same operations as part of turning the cache off?
The problem is that while turning of dcaches, we flush it, and turn cache and MMU off. But these operations are not happening automatically in a single call. So there is a chance of valid entries present in cache even after it is OFF.
I think this is what we need to fix. Otherwise, Tom's change looks good to me. How about an invalidate in dcache_disable() or something like that?
br, Aneesh

Hi, [snip..]
On 06/15/2012 07:48 AM, R, Sricharan wrote:
Hi,
On Fri, Jun 15, 2012 at 12:31 AM, Tom Rinitrini@ti.com wrote:
If we are built with D-CACHE enabled but have run 'dcache off' and
then
attempt to flush unaligned regions we spam the console with
problems
that aren't true (as the cache was off).
Today we do cache maintenance operations after the dcache is
turned off.
One example is before jumping to kernel, we try to invalidate
the caches,
in cache turned off state. So with this patch those maintenance
calls will
do nothing, which is not correct.
Ah yes, But, shouldn't we be doing these same operations as part of turning the cache off?
The problem is that while turning of dcaches, we flush it, and
turn
cache and MMU off. But these operations are not happening automatically in a single call. So there is a chance of valid entries present in cache even after it is OFF.
I think this is what we need to fix. Otherwise, Tom's change looks good to me. How about an invalidate in dcache_disable() or something like that?
Correct. So if the cleaning up sequence is fine, then ok. So there should be no need to check if caches are OFF inside the maintenance functions. Kernel does not looks like doing such checks. The real issue looks like we should have had assembly level functions to do the cleanup routines for flush-invalidate-turn OFF caches/MMU atomically.
Thanks, Sricharan

Hi Aneesh,
On Thu, Jun 21, 2012 at 2:55 PM, Sricharan R r.sricharan@ti.com wrote:
Hi, [snip..]
On 06/15/2012 07:48 AM, R, Sricharan wrote:
Hi,
On Fri, Jun 15, 2012 at 12:31 AM, Tom Rinitrini@ti.com wrote:
If we are built with D-CACHE enabled but have run 'dcache off' and
then
attempt to flush unaligned regions we spam the console with
problems
that aren't true (as the cache was off).
Today we do cache maintenance operations after the dcache is
turned off.
One example is before jumping to kernel, we try to invalidate
the caches,
in cache turned off state. So with this patch those maintenance
calls will
do nothing, which is not correct.
Ah yes, But, shouldn't we be doing these same operations as part of turning the cache off?
The problem is that while turning of dcaches, we flush it, and
turn
cache and MMU off. But these operations are not happening automatically in a single call. So there is a chance of valid entries present in cache even after it is OFF.
I think this is what we need to fix. Otherwise, Tom's change looks good to me. How about an invalidate in dcache_disable() or something like that?
Correct. So if the cleaning up sequence is fine, then ok. So there should be no need to check if caches are OFF inside the maintenance functions. Kernel does not looks like doing such checks. The real issue looks like we should have had assembly level functions to do the cleanup routines for flush-invalidate-turn OFF caches/MM atomically.
Actually, with something like speculation in the behind, only way of ensuring that we do not have any valid entries is to do a invalidate all, after turn of caches/mmu. So this means that we will have a maintenance after turning off.
Thanks, Sricharan

Sricharan,
On 06/21/2012 08:23 AM, R, Sricharan wrote:
Hi Aneesh,
On Thu, Jun 21, 2012 at 2:55 PM, Sricharan Rr.sricharan@ti.com wrote:
Hi, [snip..]
On 06/15/2012 07:48 AM, R, Sricharan wrote:
Hi,
On Fri, Jun 15, 2012 at 12:31 AM, Tom Rinitrini@ti.com wrote: > If we are built with D-CACHE enabled but have run 'dcache off' and
then
> attempt to flush unaligned regions we spam the console with
problems
> that aren't true (as the cache was off). > Today we do cache maintenance operations after the dcache is
turned off.
One example is before jumping to kernel, we try to invalidate
the caches,
in cache turned off state. So with this patch those maintenance
calls will
do nothing, which is not correct.
Ah yes, But, shouldn't we be doing these same operations as part of turning the cache off?
The problem is that while turning of dcaches, we flush it, and
turn
cache and MMU off. But these operations are not happening automatically in a single call. So there is a chance of valid entries present in cache even after it is OFF.
I think this is what we need to fix. Otherwise, Tom's change looks good to me. How about an invalidate in dcache_disable() or something like that?
Correct. So if the cleaning up sequence is fine, then ok. So there should be no need to check if caches are OFF inside the maintenance functions. Kernel does not looks like doing such checks. The real issue looks like we should have had assembly level functions to do the cleanup routines for flush-invalidate-turn OFF caches/MM atomically.
Actually, with something like speculation in the behind, only way of ensuring that we do not have any valid entries is to do a invalidate all, after turn of caches/mmu. So this means that we will have a maintenance after turning off.
Good point. But we need that invalidate just one last time after the disable, right? How about making the cache_status a variable rather than reading the C bit. And then disable_cache() shall be like this:
1. Flush all 2. Disable C bit 3. Invalidate all 4. cache_status = disabled
Maybe, not the best solution. But I can't think of anything better. Please note that after this point there won't be any valid entries in the cache per ARMv7 Architecture reference manual(section B2.2.2):
"If the cache is disabled, it is guaranteed that no new allocation of memory locations into the cache will occur."
br, Aneeesh

Aneesh, [snip..]
>> If we are built with D-CACHE enabled but have run 'dcache off' and
then
>> >> attempt to flush unaligned regions we spam the console with
problems
>> >> that aren't true (as the cache was off). >> > Today we do cache maintenance operations after the dcache is
turned off.
> > One example is before jumping to kernel, we try to invalidate
the caches,
> > in cache turned off state. So with this patch those maintenance
calls will
> > do nothing, which is not correct.
Ah yes, But, shouldn't we be doing these same operations as part of turning the cache off?
The problem is that while turning of dcaches, we flush it, and
turn
cache and MMU off. But these operations are not happening automatically in a single call. So there is a chance of valid entries present in cache even after it is OFF.
I think this is what we need to fix. Otherwise, Tom's change looks good to me. How about an invalidate in dcache_disable() or something like that?
Correct. So if the cleaning up sequence is fine, then ok. So there should be no need to check if caches are OFF inside the maintenance functions. Kernel does not looks like doing such checks. The real issue looks like we should have had assembly level functions to do the cleanup routines for flush-invalidate-turn OFF caches/MM atomically.
Actually, with something like speculation in the behind, only way of ensuring that we do not have any valid entries is to do a invalidate all, after turn of caches/mmu. So this means that we will have a maintenance after turning off.
Good point. But we need that invalidate just one last time after the disable, right? How about making the cache_status a variable rather than reading the C bit. And then disable_cache() shall be like this:
- Flush all
- Disable C bit
- Invalidate all
- cache_status = disabled
Maybe, not the best solution. But I can't think of anything better. Please note that after this point there won't be any valid entries in the cache per ARMv7 Architecture reference manual(section B2.2.2):
"If the cache is disabled, it is guaranteed that no new allocation of memory locations into the cache will occur."
I agree on this sequence. But the only thing is, is it good to turn off the caches runtime, because one driver cant do aligned access ? Will this then not become a custom, say any other driver which sees problem with cache, will then simply go ahead disable and proceed. Is that right ?. yes, having enabled caches we are missing things like UNCACHED allocations, IO etc, but that is going to make boot loaders more heavy, may be.
In the end, I am not sure which is the right decision to take here :)
Thanks, Sricharan

Hi Sricharan,
On 06/21/2012 02:25 AM, Sricharan R wrote:
Hi, [snip..]
On 06/15/2012 07:48 AM, R, Sricharan wrote:
Hi,
On Fri, Jun 15, 2012 at 12:31 AM, Tom Rinitrini@ti.com wrote:
If we are built with D-CACHE enabled but have run 'dcache off' and
then
attempt to flush unaligned regions we spam the console with
problems
that aren't true (as the cache was off).
Today we do cache maintenance operations after the dcache is
turned off.
One example is before jumping to kernel, we try to invalidate
the caches,
in cache turned off state. So with this patch those maintenance
calls will
do nothing, which is not correct.
Ah yes, But, shouldn't we be doing these same operations as part of turning the cache off?
The problem is that while turning of dcaches, we flush it, and
turn
cache and MMU off. But these operations are not happening automatically in a single call. So there is a chance of valid entries present in cache even after it is OFF.
I think this is what we need to fix. Otherwise, Tom's change looks good to me. How about an invalidate in dcache_disable() or something like that?
Correct. So if the cleaning up sequence is fine, then ok. So there should be no need to check if caches are OFF inside the maintenance functions. Kernel does not looks like doing such checks. The real issue looks like we should have had assembly level functions to do the cleanup routines for flush-invalidate-turn OFF caches/MMU atomically.
Sorry for the late reply. Yes, kernel doesn't do such checks. But kernel's implementation is different from u-boot implementation. In kernel the invalidate routine flushes unaligned lines in the end which, after a lot of discussions[1], we decided to avoid in u-boot. Instead we printed noisy error messages to alert the user. Now these messages will appear even if we are calling the maintenance functions after the disable and that's what Tom is trying to avoid.
br, Aneeesh
[1] http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/105113

The USB spec says that 32 bytes is the minimum required alignment. However on some platforms we have a larger minimum requirement for cache coherency. In those cases, use that value rather than the USB spec minimum.
Cc: Marek Vasut marex@denx.de Signed-off-by: Tom Rini trini@ti.com --- drivers/usb/host/ehci-hcd.c | 23 +++++++++++++++++------ 1 file changed, 17 insertions(+), 6 deletions(-)
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c index 04300be..45725f5 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -29,12 +29,23 @@
#include "ehci.h"
-int rootdev; -struct ehci_hccr *hccr; /* R/O registers, not need for volatile */ -volatile struct ehci_hcor *hcor; +/* + * The EHCI spec says that we must align to at least 32 bytes. However, + * some platforms require larger alignment. + */ +#if ARCH_DMA_MINALIGN > 32 +#define USB_DMA_MINALIGN ARCH_DMA_MINALIGN +#else +#define USB_DMA_MINALIGN 32 +#endif + +int rootdev __attribute__((aligned(USB_DMA_MINALIGN))); +/* R/O registers, not need for volatile */ +struct ehci_hccr *hccr __attribute__((aligned(USB_DMA_MINALIGN))); +volatile struct ehci_hcor *hcor __attribute__((aligned(USB_DMA_MINALIGN)));
static uint16_t portreset; -static struct QH qh_list __attribute__((aligned(32))); +static struct QH qh_list __attribute__((aligned(USB_DMA_MINALIGN)));
static struct descriptor { struct usb_hub_descriptor hub; @@ -207,8 +218,8 @@ static int ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, int length, struct devrequest *req) { - static struct QH qh __attribute__((aligned(32))); - static struct qTD qtd[3] __attribute__((aligned (32))); + static struct QH qh __attribute__((aligned(USB_DMA_MINALIGN))); + static struct qTD qtd[3] __attribute__((aligned(USB_DMA_MINALIGN))); int qtd_counter = 0;
volatile struct qTD *vtd;

Dear Tom Rini,
The USB spec says that 32 bytes is the minimum required alignment. However on some platforms we have a larger minimum requirement for cache coherency. In those cases, use that value rather than the USB spec minimum.
Cc: Marek Vasut marex@denx.de Signed-off-by: Tom Rini trini@ti.com
drivers/usb/host/ehci-hcd.c | 23 +++++++++++++++++------ 1 file changed, 17 insertions(+), 6 deletions(-)
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c index 04300be..45725f5 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -29,12 +29,23 @@
#include "ehci.h"
-int rootdev; -struct ehci_hccr *hccr; /* R/O registers, not need for volatile */ -volatile struct ehci_hcor *hcor; +/*
- The EHCI spec says that we must align to at least 32 bytes. However,
- some platforms require larger alignment.
- */
+#if ARCH_DMA_MINALIGN > 32 +#define USB_DMA_MINALIGN ARCH_DMA_MINALIGN +#else +#define USB_DMA_MINALIGN 32 +#endif
Don't we have some common header for these?
+int rootdev __attribute__((aligned(USB_DMA_MINALIGN))); +/* R/O registers, not need for volatile */ +struct ehci_hccr *hccr __attribute__((aligned(USB_DMA_MINALIGN))); +volatile struct ehci_hcor *hcor __attribute__((aligned(USB_DMA_MINALIGN)));
static uint16_t portreset; -static struct QH qh_list __attribute__((aligned(32))); +static struct QH qh_list __attribute__((aligned(USB_DMA_MINALIGN)));
static struct descriptor { struct usb_hub_descriptor hub; @@ -207,8 +218,8 @@ static int ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, int length, struct devrequest *req) {
- static struct QH qh __attribute__((aligned(32)));
- static struct qTD qtd[3] __attribute__((aligned (32)));
static struct QH qh __attribute__((aligned(USB_DMA_MINALIGN)));
static struct qTD qtd[3] __attribute__((aligned(USB_DMA_MINALIGN))); int qtd_counter = 0;
volatile struct qTD *vtd;
Best regards, Marek Vasut

On 06/14/2012 12:29 PM, Marek Vasut wrote:
Dear Tom Rini,
The USB spec says that 32 bytes is the minimum required alignment. However on some platforms we have a larger minimum requirement for cache coherency. In those cases, use that value rather than the USB spec minimum.
Cc: Marek Vasut marex@denx.de Signed-off-by: Tom Rini trini@ti.com
drivers/usb/host/ehci-hcd.c | 23 +++++++++++++++++------ 1 file changed, 17 insertions(+), 6 deletions(-)
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c index 04300be..45725f5 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -29,12 +29,23 @@
#include "ehci.h"
-int rootdev; -struct ehci_hccr *hccr; /* R/O registers, not need for volatile */ -volatile struct ehci_hcor *hcor; +/*
- The EHCI spec says that we must align to at least 32 bytes. However,
- some platforms require larger alignment.
- */
+#if ARCH_DMA_MINALIGN > 32 +#define USB_DMA_MINALIGN ARCH_DMA_MINALIGN +#else +#define USB_DMA_MINALIGN 32 +#endif
Don't we have some common header for these?
For ECHI and musb? I did not spot one unless we go all the way up to common.h or similar.

Dear Tom Rini,
On 06/14/2012 12:29 PM, Marek Vasut wrote:
Dear Tom Rini,
The USB spec says that 32 bytes is the minimum required alignment. However on some platforms we have a larger minimum requirement for cache coherency. In those cases, use that value rather than the USB spec minimum.
Cc: Marek Vasut marex@denx.de Signed-off-by: Tom Rini trini@ti.com
drivers/usb/host/ehci-hcd.c | 23 +++++++++++++++++------ 1 file changed, 17 insertions(+), 6 deletions(-)
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c index 04300be..45725f5 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -29,12 +29,23 @@
#include "ehci.h"
-int rootdev; -struct ehci_hccr *hccr; /* R/O registers, not need for volatile */ -volatile struct ehci_hcor *hcor; +/*
- The EHCI spec says that we must align to at least 32 bytes.
However, + * some platforms require larger alignment.
- */
+#if ARCH_DMA_MINALIGN > 32 +#define USB_DMA_MINALIGN ARCH_DMA_MINALIGN +#else +#define USB_DMA_MINALIGN 32 +#endif
Don't we have some common header for these?
For ECHI and musb? I did not spot one unless we go all the way up to common.h or similar.
Ok, that's crappy :-/
Don't we have ehci.h or usb.h? Is musb ehci or not?
Best regards, Marek Vasut

On 06/14/2012 12:41 PM, Marek Vasut wrote:
Dear Tom Rini,
On 06/14/2012 12:29 PM, Marek Vasut wrote:
Dear Tom Rini,
The USB spec says that 32 bytes is the minimum required alignment. However on some platforms we have a larger minimum requirement for cache coherency. In those cases, use that value rather than the USB spec minimum.
Cc: Marek Vasut marex@denx.de Signed-off-by: Tom Rini trini@ti.com
drivers/usb/host/ehci-hcd.c | 23 +++++++++++++++++------ 1 file changed, 17 insertions(+), 6 deletions(-)
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c index 04300be..45725f5 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -29,12 +29,23 @@
#include "ehci.h"
-int rootdev; -struct ehci_hccr *hccr; /* R/O registers, not need for volatile */ -volatile struct ehci_hcor *hcor; +/*
- The EHCI spec says that we must align to at least 32 bytes.
However, + * some platforms require larger alignment.
- */
+#if ARCH_DMA_MINALIGN > 32 +#define USB_DMA_MINALIGN ARCH_DMA_MINALIGN +#else +#define USB_DMA_MINALIGN 32 +#endif
Don't we have some common header for these?
For ECHI and musb? I did not spot one unless we go all the way up to common.h or similar.
Ok, that's crappy :-/
Don't we have ehci.h or usb.h? Is musb ehci or not?
MUSB is Mentor USB. But, good spotting, include/usb.h is in both ehci-hcd.c (and other places) and musb_core.h, moving there.

Dear Tom Rini,
On 06/14/2012 12:41 PM, Marek Vasut wrote:
Dear Tom Rini,
On 06/14/2012 12:29 PM, Marek Vasut wrote:
Dear Tom Rini,
The USB spec says that 32 bytes is the minimum required alignment. However on some platforms we have a larger minimum requirement for cache coherency. In those cases, use that value rather than the USB spec minimum.
Cc: Marek Vasut marex@denx.de Signed-off-by: Tom Rini trini@ti.com
drivers/usb/host/ehci-hcd.c | 23 +++++++++++++++++------ 1 file changed, 17 insertions(+), 6 deletions(-)
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c index 04300be..45725f5 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -29,12 +29,23 @@
#include "ehci.h"
-int rootdev; -struct ehci_hccr *hccr; /* R/O registers, not need for volatile */ -volatile struct ehci_hcor *hcor; +/*
- The EHCI spec says that we must align to at least 32 bytes.
However, + * some platforms require larger alignment.
- */
+#if ARCH_DMA_MINALIGN > 32 +#define USB_DMA_MINALIGN ARCH_DMA_MINALIGN +#else +#define USB_DMA_MINALIGN 32 +#endif
Don't we have some common header for these?
For ECHI and musb? I did not spot one unless we go all the way up to common.h or similar.
Ok, that's crappy :-/
Don't we have ehci.h or usb.h? Is musb ehci or not?
MUSB is Mentor USB. But, good spotting, include/usb.h is in both ehci-hcd.c (and other places) and musb_core.h, moving there.
Heh, my clairvoiance can now do git-grep-alike actions without touching the repo ... I have to praise myself :-)
Best regards, Marek Vasut

The USB spec says that 32 bytes is the minimum required alignment. However on some platforms we have a larger minimum requirement for cache coherency. In those cases, use that value rather than the USB spec minimum.
Cc: Marek Vasut marex@denx.de Signed-off-by: Tom Rini trini@ti.com --- drivers/usb/musb/musb_core.h | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/musb/musb_core.h b/drivers/usb/musb/musb_core.h index a8adcce..bd1ad82 100644 --- a/drivers/usb/musb/musb_core.h +++ b/drivers/usb/musb/musb_core.h @@ -74,6 +74,16 @@ struct musb_epN_regs { u8 fifosize; };
+/* + * The EHCI spec says that we must align to at least 32 bytes. However, + * some platforms require larger alignment. + */ +#if ARCH_DMA_MINALIGN > 32 +#define USB_DMA_MINALIGN ARCH_DMA_MINALIGN +#else +#define USB_DMA_MINALIGN 32 +#endif + /* Mentor USB core register overlay structure */ #ifndef musb_regs struct musb_regs { @@ -145,7 +155,7 @@ struct musb_regs { struct musb_epN_regs epN; } ep[16];
-} __attribute__((packed, aligned(32))); +} __attribute__((packed, aligned(USB_DMA_MINALIGN))); #endif
/*

Hey all,
In commit b8adb12 the cache flushing behavior was changed for the EHCI stack. This change showed a few different problems on TI platforms (where our cacheline size is 64 not 32). First, the dcache_off call that ehci-omap had been doing was now not happening soon enough to paper over the cache issues. This call is removed in patch 1. Second, when we have dcache support compiled in but turned off via 'dcache off' the cache routines spam the console about alignment issues when a cache flush is attempted. This is a problem in that it makes operations extremely slow (as we're spending all our time spitting messages to console). The second patch makes the flush routines return when the dcache is off. The last patch deal with the same problem, for EHCI and for MUSB. The USB spec says that 32 bytes is the minimum alignment but we need larger alignment when the cache is larger. Note that we can't use MAX() here as gcc doesn't allow that expansion inside of align(..).
Tested on omap3_beagle (which was previously broken) and a MAKEALL -a arm looks good too.
Changes in v2: - Condense last two patches into one that puts the test into <usb.h>

This has never been completely sufficient and now happens too late to paper over the cache coherency problems with the current USB stack.
Signed-off-by: Tom Rini trini@ti.com --- drivers/usb/host/ehci-omap.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/usb/host/ehci-omap.c b/drivers/usb/host/ehci-omap.c index 1ed7710..292673b 100644 --- a/drivers/usb/host/ehci-omap.c +++ b/drivers/usb/host/ehci-omap.c @@ -246,7 +246,6 @@ int omap_ehci_hcd_init(struct omap_usbhs_board_data *usbhs_pdata) if (is_ehci_phy_mode(usbhs_pdata->port_mode[i])) omap_ehci_soft_phy_reset(i);
- dcache_disable(); hccr = (struct ehci_hccr *)(OMAP_EHCI_BASE); hcor = (struct ehci_hcor *)(OMAP_EHCI_BASE + 0x10);

If we are built with D-CACHE enabled but have run 'dcache off' and then attempt to flush unaligned regions we spam the console with problems that aren't true (as the cache was off).
Signed-off-by: Tom Rini trini@ti.com --- arch/arm/cpu/armv7/cache_v7.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/arch/arm/cpu/armv7/cache_v7.c b/arch/arm/cpu/armv7/cache_v7.c index 1b4e808..1c0f5b0 100644 --- a/arch/arm/cpu/armv7/cache_v7.c +++ b/arch/arm/cpu/armv7/cache_v7.c @@ -250,6 +250,9 @@ static void v7_inval_tlb(void)
void invalidate_dcache_all(void) { + if (!dcache_status()) + return; + v7_maint_dcache_all(ARMV7_DCACHE_INVAL_ALL);
v7_outer_cache_inval_all(); @@ -261,6 +264,9 @@ void invalidate_dcache_all(void) */ void flush_dcache_all(void) { + if (!dcache_status()) + return; + v7_maint_dcache_all(ARMV7_DCACHE_CLEAN_INVAL_ALL);
v7_outer_cache_flush_all(); @@ -272,6 +278,8 @@ void flush_dcache_all(void) */ void invalidate_dcache_range(unsigned long start, unsigned long stop) { + if (!dcache_status()) + return;
v7_dcache_maint_range(start, stop, ARMV7_DCACHE_INVAL_RANGE);
@@ -285,6 +293,9 @@ void invalidate_dcache_range(unsigned long start, unsigned long stop) */ void flush_dcache_range(unsigned long start, unsigned long stop) { + if (!dcache_status()) + return; + v7_dcache_maint_range(start, stop, ARMV7_DCACHE_CLEAN_INVAL_RANGE);
v7_outer_cache_flush_range(start, stop);

The USB spec says that 32 bytes is the minimum required alignment. However on some platforms we have a larger minimum requirement for cache coherency. In those cases, use that value rather than the USB spec minimum. We add a cpp check to <usb.h> to define USB_DMA_MINALIGN and make use of it in ehci-hcd.c and musb_core.h. We cannot use MAX() here as we are not allowed to have tests inside of align(...).
Cc: Marek Vasut marex@denx.de Signed-off-by: Tom Rini trini@ti.com
-- Changes in v2: - Move test to <usb.h>, expand comment. --- drivers/usb/host/ehci-hcd.c | 13 +++++++------ drivers/usb/musb/musb_core.h | 2 +- include/usb.h | 10 ++++++++++ 3 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c index 04300be..5a86117 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -29,12 +29,13 @@
#include "ehci.h"
-int rootdev; -struct ehci_hccr *hccr; /* R/O registers, not need for volatile */ -volatile struct ehci_hcor *hcor; +int rootdev __attribute__((aligned(USB_DMA_MINALIGN))); +/* R/O registers, not need for volatile */ +struct ehci_hccr *hccr __attribute__((aligned(USB_DMA_MINALIGN))); +volatile struct ehci_hcor *hcor __attribute__((aligned(USB_DMA_MINALIGN)));
static uint16_t portreset; -static struct QH qh_list __attribute__((aligned(32))); +static struct QH qh_list __attribute__((aligned(USB_DMA_MINALIGN)));
static struct descriptor { struct usb_hub_descriptor hub; @@ -207,8 +208,8 @@ static int ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, int length, struct devrequest *req) { - static struct QH qh __attribute__((aligned(32))); - static struct qTD qtd[3] __attribute__((aligned (32))); + static struct QH qh __attribute__((aligned(USB_DMA_MINALIGN))); + static struct qTD qtd[3] __attribute__((aligned(USB_DMA_MINALIGN))); int qtd_counter = 0;
volatile struct qTD *vtd; diff --git a/drivers/usb/musb/musb_core.h b/drivers/usb/musb/musb_core.h index a8adcce..e914369 100644 --- a/drivers/usb/musb/musb_core.h +++ b/drivers/usb/musb/musb_core.h @@ -145,7 +145,7 @@ struct musb_regs { struct musb_epN_regs epN; } ep[16];
-} __attribute__((packed, aligned(32))); +} __attribute__((packed, aligned(USB_DMA_MINALIGN))); #endif
/* diff --git a/include/usb.h b/include/usb.h index 6da91e7..ba3d169 100644 --- a/include/usb.h +++ b/include/usb.h @@ -29,6 +29,16 @@ #include <usb_defs.h> #include <usbdescriptors.h>
+/* + * The EHCI spec says that we must align to at least 32 bytes. However, + * some platforms require larger alignment. + */ +#if ARCH_DMA_MINALIGN > 32 +#define USB_DMA_MINALIGN ARCH_DMA_MINALIGN +#else +#define USB_DMA_MINALIGN 32 +#endif + /* Everything is aribtrary */ #define USB_ALTSETTINGALLOC 4 #define USB_MAXALTSETTING 128 /* Hard limit */

Hey all,
In commit b8adb12 the cache flushing behavior was changed for the EHCI stack. This change showed a few different problems on TI platforms (where our cacheline size is 64 not 32). First, the dcache_off call that ehci-omap had been doing was now not happening soon enough to paper over the cache issues. This call is removed in patch 1. The second patch deal with the same problem in EHCI and MUSB. The USB spec says that 32 bytes is the minimum alignment but we need larger alignment when the cache is larger. Note that we can't use MAX() here as gcc doesn't allow that expansion inside of align(..). Patches 3 to 6 disable dcache support at build time on all platforms that enable CONFIG_USB_EHCI_OMAP because performing a run-time 'dcache off' operation leaves USB unusable due to the unaligned flushes that are still attempted. Run-time testing on an omap3_beagle shows a very slight (less than half a second, checked with grabserial) increase in load time for a 3.5-rc3 kernel image from SD card. Note that otherwise a tftp load takes minutes to complete rather than seconds due to all of the console spam.
Tested on omap3_beagle (which was previously broken) and a MAKEALL -a arm looks good too.
Changes in v3: - Drop the cache_v7.c change and instead make all CONFIG_USB_EHCI_OMAP boards disable DCACHE at build time.
Changes in v2: - Condense last two patches into one that puts the test into <usb.h>

This has never been completely sufficient and now happens too late to paper over the cache coherency problems with the current USB stack.
Cc: Marek Vasut marex@denx.de Signed-off-by: Tom Rini trini@ti.com --- drivers/usb/host/ehci-omap.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/usb/host/ehci-omap.c b/drivers/usb/host/ehci-omap.c index 1ed7710..292673b 100644 --- a/drivers/usb/host/ehci-omap.c +++ b/drivers/usb/host/ehci-omap.c @@ -246,7 +246,6 @@ int omap_ehci_hcd_init(struct omap_usbhs_board_data *usbhs_pdata) if (is_ehci_phy_mode(usbhs_pdata->port_mode[i])) omap_ehci_soft_phy_reset(i);
- dcache_disable(); hccr = (struct ehci_hccr *)(OMAP_EHCI_BASE); hcor = (struct ehci_hcor *)(OMAP_EHCI_BASE + 0x10);

The USB spec says that 32 bytes is the minimum required alignment. However on some platforms we have a larger minimum requirement for cache coherency. In those cases, use that value rather than the USB spec minimum. We add a cpp check to <usb.h> to define USB_DMA_MINALIGN and make use of it in ehci-hcd.c and musb_core.h. We cannot use MAX() here as we are not allowed to have tests inside of align(...).
Cc: Marek Vasut marex@denx.de Signed-off-by: Tom Rini trini@ti.com
-- Changes in v2: - Move test to <usb.h>, expand comment. --- drivers/usb/host/ehci-hcd.c | 13 +++++++------ drivers/usb/musb/musb_core.h | 2 +- include/usb.h | 10 ++++++++++ 3 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c index 04300be..5a86117 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -29,12 +29,13 @@
#include "ehci.h"
-int rootdev; -struct ehci_hccr *hccr; /* R/O registers, not need for volatile */ -volatile struct ehci_hcor *hcor; +int rootdev __attribute__((aligned(USB_DMA_MINALIGN))); +/* R/O registers, not need for volatile */ +struct ehci_hccr *hccr __attribute__((aligned(USB_DMA_MINALIGN))); +volatile struct ehci_hcor *hcor __attribute__((aligned(USB_DMA_MINALIGN)));
static uint16_t portreset; -static struct QH qh_list __attribute__((aligned(32))); +static struct QH qh_list __attribute__((aligned(USB_DMA_MINALIGN)));
static struct descriptor { struct usb_hub_descriptor hub; @@ -207,8 +208,8 @@ static int ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, int length, struct devrequest *req) { - static struct QH qh __attribute__((aligned(32))); - static struct qTD qtd[3] __attribute__((aligned (32))); + static struct QH qh __attribute__((aligned(USB_DMA_MINALIGN))); + static struct qTD qtd[3] __attribute__((aligned(USB_DMA_MINALIGN))); int qtd_counter = 0;
volatile struct qTD *vtd; diff --git a/drivers/usb/musb/musb_core.h b/drivers/usb/musb/musb_core.h index a8adcce..e914369 100644 --- a/drivers/usb/musb/musb_core.h +++ b/drivers/usb/musb/musb_core.h @@ -145,7 +145,7 @@ struct musb_regs { struct musb_epN_regs epN; } ep[16];
-} __attribute__((packed, aligned(32))); +} __attribute__((packed, aligned(USB_DMA_MINALIGN))); #endif
/* diff --git a/include/usb.h b/include/usb.h index 6da91e7..ba3d169 100644 --- a/include/usb.h +++ b/include/usb.h @@ -29,6 +29,16 @@ #include <usb_defs.h> #include <usbdescriptors.h>
+/* + * The EHCI spec says that we must align to at least 32 bytes. However, + * some platforms require larger alignment. + */ +#if ARCH_DMA_MINALIGN > 32 +#define USB_DMA_MINALIGN ARCH_DMA_MINALIGN +#else +#define USB_DMA_MINALIGN 32 +#endif + /* Everything is aribtrary */ #define USB_ALTSETTINGALLOC 4 #define USB_MAXALTSETTING 128 /* Hard limit */

Dear Tom Rini,
The USB spec says that 32 bytes is the minimum required alignment. However on some platforms we have a larger minimum requirement for cache coherency. In those cases, use that value rather than the USB spec minimum. We add a cpp check to <usb.h> to define USB_DMA_MINALIGN and make use of it in ehci-hcd.c and musb_core.h. We cannot use MAX() here as we are not allowed to have tests inside of align(...).
Cc: Marek Vasut marex@denx.de Signed-off-by: Tom Rini trini@ti.com
-- Changes in v2:
- Move test to <usb.h>, expand comment.
drivers/usb/host/ehci-hcd.c | 13 +++++++------ drivers/usb/musb/musb_core.h | 2 +- include/usb.h | 10 ++++++++++ 3 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c index 04300be..5a86117 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -29,12 +29,13 @@
#include "ehci.h"
-int rootdev; -struct ehci_hccr *hccr; /* R/O registers, not need for volatile */ -volatile struct ehci_hcor *hcor; +int rootdev __attribute__((aligned(USB_DMA_MINALIGN))); +/* R/O registers, not need for volatile */ +struct ehci_hccr *hccr __attribute__((aligned(USB_DMA_MINALIGN))); +volatile struct ehci_hcor *hcor __attribute__((aligned(USB_DMA_MINALIGN)));
^ these need to be aligned too?
static uint16_t portreset; -static struct QH qh_list __attribute__((aligned(32))); +static struct QH qh_list __attribute__((aligned(USB_DMA_MINALIGN)));
static struct descriptor { struct usb_hub_descriptor hub; @@ -207,8 +208,8 @@ static int ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, int length, struct devrequest *req) {
- static struct QH qh __attribute__((aligned(32)));
- static struct qTD qtd[3] __attribute__((aligned (32)));
static struct QH qh __attribute__((aligned(USB_DMA_MINALIGN)));
static struct qTD qtd[3] __attribute__((aligned(USB_DMA_MINALIGN))); int qtd_counter = 0;
volatile struct qTD *vtd;
diff --git a/drivers/usb/musb/musb_core.h b/drivers/usb/musb/musb_core.h index a8adcce..e914369 100644 --- a/drivers/usb/musb/musb_core.h +++ b/drivers/usb/musb/musb_core.h @@ -145,7 +145,7 @@ struct musb_regs { struct musb_epN_regs epN; } ep[16];
-} __attribute__((packed, aligned(32))); +} __attribute__((packed, aligned(USB_DMA_MINALIGN))); #endif
/* diff --git a/include/usb.h b/include/usb.h index 6da91e7..ba3d169 100644 --- a/include/usb.h +++ b/include/usb.h @@ -29,6 +29,16 @@ #include <usb_defs.h> #include <usbdescriptors.h>
+/*
- The EHCI spec says that we must align to at least 32 bytes. However,
- some platforms require larger alignment.
- */
+#if ARCH_DMA_MINALIGN > 32 +#define USB_DMA_MINALIGN ARCH_DMA_MINALIGN +#else +#define USB_DMA_MINALIGN 32 +#endif
/* Everything is aribtrary */ #define USB_ALTSETTINGALLOC 4 #define USB_MAXALTSETTING 128 /* Hard limit */
Best regards, Marek Vasut

On Wed, Jun 20, 2012 at 09:00:45PM +0200, Marek Vasut wrote:
Dear Tom Rini,
The USB spec says that 32 bytes is the minimum required alignment. However on some platforms we have a larger minimum requirement for cache coherency. In those cases, use that value rather than the USB spec minimum. We add a cpp check to <usb.h> to define USB_DMA_MINALIGN and make use of it in ehci-hcd.c and musb_core.h. We cannot use MAX() here as we are not allowed to have tests inside of align(...).
Cc: Marek Vasut marex@denx.de Signed-off-by: Tom Rini trini@ti.com
-- Changes in v2:
- Move test to <usb.h>, expand comment.
drivers/usb/host/ehci-hcd.c | 13 +++++++------ drivers/usb/musb/musb_core.h | 2 +- include/usb.h | 10 ++++++++++ 3 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c index 04300be..5a86117 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -29,12 +29,13 @@
#include "ehci.h"
-int rootdev; -struct ehci_hccr *hccr; /* R/O registers, not need for volatile */ -volatile struct ehci_hcor *hcor; +int rootdev __attribute__((aligned(USB_DMA_MINALIGN))); +/* R/O registers, not need for volatile */ +struct ehci_hccr *hccr __attribute__((aligned(USB_DMA_MINALIGN))); +volatile struct ehci_hcor *hcor __attribute__((aligned(USB_DMA_MINALIGN)));
^ these need to be aligned too?
Yes, these were the first and easy to spot ones in the cache flush messages, I would swear.

Dear Tom Rini,
On Wed, Jun 20, 2012 at 09:00:45PM +0200, Marek Vasut wrote:
Dear Tom Rini,
The USB spec says that 32 bytes is the minimum required alignment. However on some platforms we have a larger minimum requirement for cache coherency. In those cases, use that value rather than the USB spec minimum. We add a cpp check to <usb.h> to define USB_DMA_MINALIGN and make use of it in ehci-hcd.c and musb_core.h. We cannot use MAX() here as we are not allowed to have tests inside of align(...).
Cc: Marek Vasut marex@denx.de Signed-off-by: Tom Rini trini@ti.com
-- Changes in v2:
- Move test to <usb.h>, expand comment.
drivers/usb/host/ehci-hcd.c | 13 +++++++------ drivers/usb/musb/musb_core.h | 2 +- include/usb.h | 10 ++++++++++ 3 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c index 04300be..5a86117 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -29,12 +29,13 @@
#include "ehci.h"
-int rootdev; -struct ehci_hccr *hccr; /* R/O registers, not need for volatile */ -volatile struct ehci_hcor *hcor; +int rootdev __attribute__((aligned(USB_DMA_MINALIGN))); +/* R/O registers, not need for volatile */ +struct ehci_hccr *hccr __attribute__((aligned(USB_DMA_MINALIGN))); +volatile struct ehci_hcor *hcor __attribute__((aligned(USB_DMA_MINALIGN)));
^ these need to be aligned too?
Yes, these were the first and easy to spot ones in the cache flush messages, I would swear.
hcor and hccr are register locations though ... so it's pretty weird.
Best regards, Marek Vasut

On Wed, Jun 20, 2012 at 11:15:26PM +0200, Marek Vasut wrote:
Dear Tom Rini,
On Wed, Jun 20, 2012 at 09:00:45PM +0200, Marek Vasut wrote:
Dear Tom Rini,
The USB spec says that 32 bytes is the minimum required alignment. However on some platforms we have a larger minimum requirement for cache coherency. In those cases, use that value rather than the USB spec minimum. We add a cpp check to <usb.h> to define USB_DMA_MINALIGN and make use of it in ehci-hcd.c and musb_core.h. We cannot use MAX() here as we are not allowed to have tests inside of align(...).
Cc: Marek Vasut marex@denx.de Signed-off-by: Tom Rini trini@ti.com
-- Changes in v2:
- Move test to <usb.h>, expand comment.
drivers/usb/host/ehci-hcd.c | 13 +++++++------ drivers/usb/musb/musb_core.h | 2 +- include/usb.h | 10 ++++++++++ 3 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c index 04300be..5a86117 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -29,12 +29,13 @@
#include "ehci.h"
-int rootdev; -struct ehci_hccr *hccr; /* R/O registers, not need for volatile */ -volatile struct ehci_hcor *hcor; +int rootdev __attribute__((aligned(USB_DMA_MINALIGN))); +/* R/O registers, not need for volatile */ +struct ehci_hccr *hccr __attribute__((aligned(USB_DMA_MINALIGN))); +volatile struct ehci_hcor *hcor __attribute__((aligned(USB_DMA_MINALIGN)));
^ these need to be aligned too?
Yes, these were the first and easy to spot ones in the cache flush messages, I would swear.
hcor and hccr are register locations though ... so it's pretty weird.
OK, now that I look harder at the messages, doing that is only masking a problem. At issue (from u-boot.map): .bss 0x801586c0 0x1c0 drivers/usb/host/libusb_host.o 0x80158860 hcor And a message of: ERROR: v7_dcache_inval_range - stop address is not aligned - 0x9ffbc860 Using 'bdinfo' to see reloc offset gives us hcor, but _end_ not start means it's something within the anonymous part of the bss and padding hcor out just masks the real problem. I'll resubmit a v4 shortly. Thanks!

Dear Tom Rini,
On Wed, Jun 20, 2012 at 11:15:26PM +0200, Marek Vasut wrote:
Dear Tom Rini,
On Wed, Jun 20, 2012 at 09:00:45PM +0200, Marek Vasut wrote:
Dear Tom Rini,
The USB spec says that 32 bytes is the minimum required alignment. However on some platforms we have a larger minimum requirement for cache coherency. In those cases, use that value rather than the USB spec minimum. We add a cpp check to <usb.h> to define USB_DMA_MINALIGN and make use of it in ehci-hcd.c and musb_core.h. We cannot use MAX() here as we are not allowed to have tests inside of align(...).
Cc: Marek Vasut marex@denx.de Signed-off-by: Tom Rini trini@ti.com
-- Changes in v2:
- Move test to <usb.h>, expand comment.
drivers/usb/host/ehci-hcd.c | 13 +++++++------ drivers/usb/musb/musb_core.h | 2 +- include/usb.h | 10 ++++++++++ 3 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c index 04300be..5a86117 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -29,12 +29,13 @@
#include "ehci.h"
-int rootdev; -struct ehci_hccr *hccr; /* R/O registers, not need for volatile
*/
-volatile struct ehci_hcor *hcor; +int rootdev __attribute__((aligned(USB_DMA_MINALIGN))); +/* R/O registers, not need for volatile */ +struct ehci_hccr *hccr __attribute__((aligned(USB_DMA_MINALIGN))); +volatile struct ehci_hcor *hcor __attribute__((aligned(USB_DMA_MINALIGN)));
^ these need to be aligned too?
Yes, these were the first and easy to spot ones in the cache flush messages, I would swear.
hcor and hccr are register locations though ... so it's pretty weird.
OK, now that I look harder at the messages, doing that is only masking a problem. At issue (from u-boot.map): .bss 0x801586c0 0x1c0 drivers/usb/host/libusb_host.o 0x80158860 hcor And a message of: ERROR: v7_dcache_inval_range - stop address is not aligned - 0x9ffbc860 Using 'bdinfo' to see reloc offset gives us hcor, but _end_ not start means it's something within the anonymous part of the bss and padding hcor out just masks the real problem. I'll resubmit a v4 shortly. Thanks!
You're welcome, Add my Acked-by: Marek Vasut marex@denx.de and push through omap since it's part of a bigger patchset.
Best regards, Marek Vasut

USB EHCI and DCACHE are not compatible, so disable DCACHE support at build-time as run-time disable is insufficient for USB use.
Cc: Ilya Yanok yanok@emcraft.com Signed-off-by: Tom Rini trini@ti.com --- include/configs/mcx.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/configs/mcx.h b/include/configs/mcx.h index f6a83a8..0b29b08 100644 --- a/include/configs/mcx.h +++ b/include/configs/mcx.h @@ -110,9 +110,9 @@ #define CONFIG_OMAP3_GPIO_5 #define CONFIG_USB_EHCI #define CONFIG_USB_EHCI_OMAP +#define CONFIG_SYS_DCACHE_OFF /* USB_EHCI is unusable with DCACHE support */ #define CONFIG_USB_ULPI #define CONFIG_USB_ULPI_VIEWPORT_OMAP -/*#define CONFIG_EHCI_DCACHE*/ /* leave it disabled for now */ #define CONFIG_OMAP_EHCI_PHY1_RESET_GPIO 154 #define CONFIG_OMAP_EHCI_PHY2_RESET_GPIO 152 #define CONFIG_SYS_USB_EHCI_MAX_ROOT_PORTS 3

USB EHCI and DCACHE are not compatible, so disable DCACHE support at build-time as run-time disable is insufficient for USB use.
Signed-off-by: Tom Rini trini@ti.com --- include/configs/omap3_beagle.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/configs/omap3_beagle.h b/include/configs/omap3_beagle.h index b891ee4..79560d7 100644 --- a/include/configs/omap3_beagle.h +++ b/include/configs/omap3_beagle.h @@ -131,7 +131,7 @@ #define CONFIG_USB_EHCI
#define CONFIG_USB_EHCI_OMAP -/*#define CONFIG_EHCI_DCACHE*/ /* leave it disabled for now */ +#define CONFIG_SYS_DCACHE_OFF /* USB_EHCI is unusable with DCACHE support */ #define CONFIG_OMAP_EHCI_PHY1_RESET_GPIO 147
#define CONFIG_USB_ULPI

USB EHCI and DCACHE are not compatible, so disable DCACHE support at build-time as run-time disable is insufficient for USB use.
Cc: Sricharan R r.sricharan@ti.com Signed-off-by: Tom Rini trini@ti.com --- include/configs/omap4_panda.h | 1 + 1 file changed, 1 insertion(+)
diff --git a/include/configs/omap4_panda.h b/include/configs/omap4_panda.h index b4756be..468cf7a 100644 --- a/include/configs/omap4_panda.h +++ b/include/configs/omap4_panda.h @@ -38,6 +38,7 @@ #define CONFIG_USB_HOST #define CONFIG_USB_EHCI #define CONFIG_USB_EHCI_OMAP +#define CONFIG_SYS_DCACHE_OFF /* USB_EHCI is unusable with DCACHE support */ #define CONFIG_USB_STORAGE #define CONFIG_SYS_USB_EHCI_MAX_ROOT_PORTS 3

USB EHCI and DCACHE are not compatible, so disable DCACHE support at build-time as run-time disable is insufficient for USB use.
Cc: Stefano Babic sbabic@denx.de Signed-off-by: Tom Rini trini@ti.com --- include/configs/tam3517-common.h | 1 + 1 file changed, 1 insertion(+)
diff --git a/include/configs/tam3517-common.h b/include/configs/tam3517-common.h index 3fc2c44..b9c96b9 100644 --- a/include/configs/tam3517-common.h +++ b/include/configs/tam3517-common.h @@ -100,6 +100,7 @@ #define CONFIG_OMAP3_GPIO_5 #define CONFIG_USB_EHCI #define CONFIG_USB_EHCI_OMAP +#define CONFIG_SYS_DCACHE_OFF /* USB_EHCI is unusable with DCACHE support */ #define CONFIG_USB_ULPI #define CONFIG_USB_ULPI_VIEWPORT_OMAP #define CONFIG_OMAP_EHCI_PHY1_RESET_GPIO 25

Hey all,
In commit b8adb12 the cache flushing behavior was changed for the EHCI stack. This change showed a few different problems on TI platforms (where our cacheline size is 64 not 32). First, the dcache_off call that ehci-omap had been doing was now not happening soon enough to paper over the cache issues. This call is removed in patch 1. The second patch deal with the same problem in EHCI and MUSB. The USB spec says that 32 bytes is the minimum alignment but we need larger alignment when the cache is larger. Note that we can't use MAX() here as gcc doesn't allow that expansion inside of align(..). Patches 3 to 6 disable dcache support at build time on all platforms that enable CONFIG_USB_EHCI_OMAP because performing a run-time 'dcache off' operation leaves USB unusable due to the unaligned flushes that are still attempted. Run-time testing on an omap3_beagle shows a very slight (less than half a second, checked with grabserial) increase in load time for a 3.5-rc3 kernel image from SD card. Note that otherwise a tftp load takes minutes to complete rather than seconds due to all of the console spam.
Tested on omap3_beagle (which was previously broken) and a MAKEALL -a arm looks good too.
Changes in v4: - Drop alignment on rootdev/hccr/hcor as a double-check showed that was masking an alignment problem elsewhere (Marek Vasut)
Changes in v3: - Drop the cache_v7.c change and instead make all CONFIG_USB_EHCI_OMAP boards disable DCACHE at build time.
Changes in v2: - Condense last two patches into one that puts the test into <usb.h>

This has never been completely sufficient and now happens too late to paper over the cache coherency problems with the current USB stack.
Cc: Marek Vasut marex@denx.de Signed-off-by: Tom Rini trini@ti.com --- drivers/usb/host/ehci-omap.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/usb/host/ehci-omap.c b/drivers/usb/host/ehci-omap.c index 1ed7710..292673b 100644 --- a/drivers/usb/host/ehci-omap.c +++ b/drivers/usb/host/ehci-omap.c @@ -246,7 +246,6 @@ int omap_ehci_hcd_init(struct omap_usbhs_board_data *usbhs_pdata) if (is_ehci_phy_mode(usbhs_pdata->port_mode[i])) omap_ehci_soft_phy_reset(i);
- dcache_disable(); hccr = (struct ehci_hccr *)(OMAP_EHCI_BASE); hcor = (struct ehci_hcor *)(OMAP_EHCI_BASE + 0x10);

The USB spec says that 32 bytes is the minimum required alignment. However on some platforms we have a larger minimum requirement for cache coherency. In those cases, use that value rather than the USB spec minimum. We add a cpp check to <usb.h> to define USB_DMA_MINALIGN and make use of it in ehci-hcd.c and musb_core.h. We cannot use MAX() here as we are not allowed to have tests inside of align(...).
Cc: Marek Vasut marex@denx.de Signed-off-by: Tom Rini trini@ti.com
-- Changes in v4: - Re-checking shows we do not need to add alignment to rootdev/hcor/hccr
Changes in v2: - Move test to <usb.h>, expand comment. --- drivers/usb/host/ehci-hcd.c | 6 +++--- drivers/usb/musb/musb_core.h | 2 +- include/usb.h | 10 ++++++++++ 3 files changed, 14 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c index 04300be..d34c675 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -34,7 +34,7 @@ struct ehci_hccr *hccr; /* R/O registers, not need for volatile */ volatile struct ehci_hcor *hcor;
static uint16_t portreset; -static struct QH qh_list __attribute__((aligned(32))); +static struct QH qh_list __attribute__((aligned(USB_DMA_MINALIGN)));
static struct descriptor { struct usb_hub_descriptor hub; @@ -207,8 +207,8 @@ static int ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, int length, struct devrequest *req) { - static struct QH qh __attribute__((aligned(32))); - static struct qTD qtd[3] __attribute__((aligned (32))); + static struct QH qh __attribute__((aligned(USB_DMA_MINALIGN))); + static struct qTD qtd[3] __attribute__((aligned(USB_DMA_MINALIGN))); int qtd_counter = 0;
volatile struct qTD *vtd; diff --git a/drivers/usb/musb/musb_core.h b/drivers/usb/musb/musb_core.h index a8adcce..e914369 100644 --- a/drivers/usb/musb/musb_core.h +++ b/drivers/usb/musb/musb_core.h @@ -145,7 +145,7 @@ struct musb_regs { struct musb_epN_regs epN; } ep[16];
-} __attribute__((packed, aligned(32))); +} __attribute__((packed, aligned(USB_DMA_MINALIGN))); #endif
/* diff --git a/include/usb.h b/include/usb.h index 6da91e7..ba3d169 100644 --- a/include/usb.h +++ b/include/usb.h @@ -29,6 +29,16 @@ #include <usb_defs.h> #include <usbdescriptors.h>
+/* + * The EHCI spec says that we must align to at least 32 bytes. However, + * some platforms require larger alignment. + */ +#if ARCH_DMA_MINALIGN > 32 +#define USB_DMA_MINALIGN ARCH_DMA_MINALIGN +#else +#define USB_DMA_MINALIGN 32 +#endif + /* Everything is aribtrary */ #define USB_ALTSETTINGALLOC 4 #define USB_MAXALTSETTING 128 /* Hard limit */

USB EHCI and DCACHE are not compatible, so disable DCACHE support at build-time as run-time disable is insufficient for USB use.
Cc: Ilya Yanok yanok@emcraft.com Signed-off-by: Tom Rini trini@ti.com --- include/configs/mcx.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/configs/mcx.h b/include/configs/mcx.h index f6a83a8..0b29b08 100644 --- a/include/configs/mcx.h +++ b/include/configs/mcx.h @@ -110,9 +110,9 @@ #define CONFIG_OMAP3_GPIO_5 #define CONFIG_USB_EHCI #define CONFIG_USB_EHCI_OMAP +#define CONFIG_SYS_DCACHE_OFF /* USB_EHCI is unusable with DCACHE support */ #define CONFIG_USB_ULPI #define CONFIG_USB_ULPI_VIEWPORT_OMAP -/*#define CONFIG_EHCI_DCACHE*/ /* leave it disabled for now */ #define CONFIG_OMAP_EHCI_PHY1_RESET_GPIO 154 #define CONFIG_OMAP_EHCI_PHY2_RESET_GPIO 152 #define CONFIG_SYS_USB_EHCI_MAX_ROOT_PORTS 3

Hi,
21.06.2012 02:14, Tom Rini wrote:
USB EHCI and DCACHE are not compatible, so disable DCACHE support at build-time as run-time disable is insufficient for USB use.
Sorry for missing this discussion. I think compile-time disabling of the cache is too brutal. ehci-hcd cache handling is broken anyway: doing unaligned flushes/invalidates is a bug, and we know for sure that upper layers don't care about alignment (and I bet ehci-hcd does this even for its internal buffers). So what's the point in all this cache handling in ehci-hcd? It's not going to work anyway and just produces problems. So I suggest to just disable all this stuff until generic code will be fixed. Alternatively we can do bounce-buffering inside driver.
Regards, Ilya.

Dear Ilya Yanok,
Hi,
21.06.2012 02:14, Tom Rini wrote:
USB EHCI and DCACHE are not compatible, so disable DCACHE support at build-time as run-time disable is insufficient for USB use.
Sorry for missing this discussion. I think compile-time disabling of the cache is too brutal. ehci-hcd cache handling is broken anyway: doing unaligned flushes/invalidates is a bug, and we know for sure that upper layers don't care about alignment (and I bet ehci-hcd does this even for its internal buffers). So what's the point in all this cache handling in ehci-hcd? It's not going to work anyway and just produces problems. So I suggest to just disable all this stuff until generic code will be fixed. Alternatively we can do bounce-buffering inside driver.
We should rather introduce generic bounce buffer. But the upper layers are getting fixed recently so we should be getting there.
Regards, Ilya.
Best regards, Marek Vasut

Dear Marek,
28.06.2012 02:48, Marek Vasut wrote:
Sorry for missing this discussion. I think compile-time disabling of the cache is too brutal. ehci-hcd cache handling is broken anyway: doing unaligned flushes/invalidates is a bug, and we know for sure that upper layers don't care about alignment (and I bet ehci-hcd does this even for its internal buffers). So what's the point in all this cache handling in ehci-hcd? It's not going to work anyway and just produces problems. So I suggest to just disable all this stuff until generic code will be fixed. Alternatively we can do bounce-buffering inside driver.
We should rather introduce generic bounce buffer. But the upper layers are getting fixed recently so we should be getting there.
Really? Don't forget my old patch [1] then ;) Still I think we should rip off all the cache stuff from ehci-hcd until all patches for upper layers are included. Again, this stuff doesn't do proper things now anyway and USB won't work with dcache enabled.
BTW, I think this was under #ifdef CONFIG_EHCI_DCACHE last time looked at it. Was this changed by your commit? I think that's the source of the problem this series tries to address: you've taken buggy code out of #ifdef ;) I think it's better to just put it back until upper layers won't be fixed.
Regards, Ilya.
[1] http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/114235

Dear Ilya Yanok,
Dear Marek,
28.06.2012 02:48, Marek Vasut wrote:
Sorry for missing this discussion. I think compile-time disabling of the cache is too brutal. ehci-hcd cache handling is broken anyway: doing unaligned flushes/invalidates is a bug, and we know for sure that upper layers don't care about alignment (and I bet ehci-hcd does this even for its internal buffers). So what's the point in all this cache handling in ehci-hcd? It's not going to work anyway and just produces problems. So I suggest to just disable all this stuff until generic code will be fixed. Alternatively we can do bounce-buffering inside driver.
We should rather introduce generic bounce buffer. But the upper layers are getting fixed recently so we should be getting there.
Really? Don't forget my old patch [1] then ;) Still I think we should rip off all the cache stuff from ehci-hcd until all patches for upper layers are included. Again, this stuff doesn't do proper things now anyway and USB won't work with dcache enabled.
Have you tested? I enabled dcache on m28 and tried asix ethernet (needed a patch) and loading from ext2 and vfat (worked).
BTW, I think this was under #ifdef CONFIG_EHCI_DCACHE last time looked at it. Was this changed by your commit? I think that's the source of the problem this series tries to address: you've taken buggy code out of #ifdef ;) I think it's better to just put it back until upper layers won't be fixed.
Regards, Ilya.
[1] http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/114235
Best regards, Marek Vasut

Dear Marek,
28.06.2012 18:37, Marek Vasut wrote:
Really? Don't forget my old patch [1] then ;) Still I think we should rip off all the cache stuff from ehci-hcd until all patches for upper layers are included. Again, this stuff doesn't do proper things now anyway and USB won't work with dcache enabled.
Have you tested? I enabled dcache on m28 and tried asix ethernet (needed a
Surely. (but that probably was an AM3517 with 64 byte cache line)
patch) and loading from ext2 and vfat (worked).
This is just a coincedence ;)
Regards, Ilya.

Dear Ilya Yanok,
Dear Marek,
28.06.2012 18:37, Marek Vasut wrote:
Really? Don't forget my old patch [1] then ;) Still I think we should rip off all the cache stuff from ehci-hcd until all patches for upper layers are included. Again, this stuff doesn't do proper things now anyway and USB won't work with dcache enabled.
Have you tested? I enabled dcache on m28 and tried asix ethernet (needed a
Surely. (but that probably was an AM3517 with 64 byte cache line)
m28 is imx28 with 32byte cacheline
patch) and loading from ext2 and vfat (worked).
This is just a coincedence ;)
Not really, did you check mainline uboot and the fixes in those?
Regards, Ilya.
Best regards, Marek Vasut

Dear Marek,
28.06.2012 19:41, Marek Vasut wrote:
Surely. (but that probably was an AM3517 with 64 byte cache line)
m28 is imx28 with 32byte cacheline
You are lucky then. But some systems have bigger cacheline, right?
patch) and loading from ext2 and vfat (worked).
This is just a coincedence ;)
Not really, did you check mainline uboot and the fixes in those?
Surely I did. Let's take a look:
do { /* Invalidate dcache */ invalidate_dcache_range((uint32_t)&qh_list, (uint32_t)&qh_list + sizeof(struct QH)); invalidate_dcache_range((uint32_t)&qh, (uint32_t)&qh + sizeof(struct QH)); invalidate_dcache_range((uint32_t)qtd, (uint32_t)qtd + sizeof(qtd));
These guys are 32-byte aligned, right? So the assumption is that cacheline is not greater than 32. Sounds like a bug to me (though could be fixed rather easily with USB_MINALIGN introduced by Tom).
token = hc32_to_cpu(vtd->qt_token); if (!(token & 0x80)) break; WATCHDOG_RESET(); } while (get_timer(ts) < timeout); /* Invalidate the memory area occupied by buffer */ invalidate_dcache_range(((uint32_t)buffer & ~31), ((uint32_t)buffer & ~31) + roundup(length, 32));
That's worse. First of all, rounding is wrong: consider buffer=31, length=32. But that's easy to fix too. The bad part is that with writeback cache you can't just enlarge the range to cover the buffer and fit alignment requirements -- this can potentially destroy some changes that are in the same cache line but not yet stored to RAM.
Regards, Ilya.

Dear Ilya Yanok,
Dear Marek,
28.06.2012 19:41, Marek Vasut wrote:
Surely. (but that probably was an AM3517 with 64 byte cache line)
m28 is imx28 with 32byte cacheline
You are lucky then. But some systems have bigger cacheline, right?
Yes, and I just got a perfect system for the job.
patch) and loading from ext2 and vfat (worked).
This is just a coincedence ;)
Not really, did you check mainline uboot and the fixes in those?
Surely I did. Let's take a look:
do { /* Invalidate dcache */ invalidate_dcache_range((uint32_t)&qh_list, (uint32_t)&qh_list + sizeof(struct QH)); invalidate_dcache_range((uint32_t)&qh, (uint32_t)&qh + sizeof(struct QH)); invalidate_dcache_range((uint32_t)qtd, (uint32_t)qtd + sizeof(qtd));
These guys are 32-byte aligned, right? So the assumption is that cacheline is not greater than 32. Sounds like a bug to me (though could be fixed rather easily with USB_MINALIGN introduced by Tom).
Oooh, good catch indeed. Actually, look at the structures, even they have some weird alignment crap in them. Maybe that can also be dropped and the alignment shall be fixed properly. I'll have to research this more.
token = hc32_to_cpu(vtd->qt_token); if (!(token & 0x80)) break; WATCHDOG_RESET(); } while (get_timer(ts) < timeout); /* Invalidate the memory area occupied by buffer */ invalidate_dcache_range(((uint32_t)buffer & ~31), ((uint32_t)buffer & ~31) + roundup(length, 32));
That's worse. First of all, rounding is wrong: consider buffer=31, length=32. But that's easy to fix too. The bad part is that with writeback cache you can't just enlarge the range to cover the buffer and fit alignment requirements -- this can potentially destroy some changes that are in the same cache line but not yet stored to RAM.
Correct, so we have multiple bugs in here that need to be squashed ASAP.
Ilya, can you possibly submit a patch for this?
Regards, Ilya.
Best regards, Marek Vasut

Dear Marek,
30.06.2012 23:27, Marek Vasut wrote:
do { /* Invalidate dcache */ invalidate_dcache_range((uint32_t)&qh_list, (uint32_t)&qh_list + sizeof(struct QH)); invalidate_dcache_range((uint32_t)&qh, (uint32_t)&qh + sizeof(struct QH)); invalidate_dcache_range((uint32_t)qtd, (uint32_t)qtd + sizeof(qtd));
These guys are 32-byte aligned, right? So the assumption is that cacheline is not greater than 32. Sounds like a bug to me (though could be fixed rather easily with USB_MINALIGN introduced by Tom).
Oooh, good catch indeed. Actually, look at the structures, even they have some weird alignment crap in them. Maybe that can also be dropped and the alignment shall be fixed properly. I'll have to research this more.
I guess that's because they have to be both address and size aligned.
token = hc32_to_cpu(vtd->qt_token); if (!(token& 0x80)) break; WATCHDOG_RESET(); } while (get_timer(ts)< timeout); /* Invalidate the memory area occupied by buffer */ invalidate_dcache_range(((uint32_t)buffer& ~31), ((uint32_t)buffer& ~31) + roundup(length, 32));
That's worse. First of all, rounding is wrong: consider buffer=31, length=32. But that's easy to fix too. The bad part is that with writeback cache you can't just enlarge the range to cover the buffer and fit alignment requirements -- this can potentially destroy some changes that are in the same cache line but not yet stored to RAM.
Correct, so we have multiple bugs in here that need to be squashed ASAP.
Ilya, can you possibly submit a patch for this?
Hm.. Fixing alignment for qh, qd and stuff is easy, I can surely submit a patch for this. But I don't really know what to do this the last one: if we are at this point there is no correct way out... We can increase the range to cover the buffer or decrease it to be inside buffer -- but both options are incorrect. Actually we should return error when unaligned buffer is submitted in the first place... But this will likely break a lot of systems... Probably put this check under if (dcache_enabled())?
Regards, Ilya.

Dear Ilya Yanok,
Dear Marek,
30.06.2012 23:27, Marek Vasut wrote:
do { /* Invalidate dcache */ invalidate_dcache_range((uint32_t)&qh_list, (uint32_t)&qh_list + sizeof(struct QH)); invalidate_dcache_range((uint32_t)&qh, (uint32_t)&qh + sizeof(struct QH)); invalidate_dcache_range((uint32_t)qtd, (uint32_t)qtd + sizeof(qtd));
These guys are 32-byte aligned, right? So the assumption is that cacheline is not greater than 32. Sounds like a bug to me (though could be fixed rather easily with USB_MINALIGN introduced by Tom).
Oooh, good catch indeed. Actually, look at the structures, even they have some weird alignment crap in them. Maybe that can also be dropped and the alignment shall be fixed properly. I'll have to research this more.
I guess that's because they have to be both address and size aligned.
token = hc32_to_cpu(vtd->qt_token); if (!(token& 0x80)) break; WATCHDOG_RESET(); } while (get_timer(ts)< timeout); /* Invalidate the memory area occupied by buffer */ invalidate_dcache_range(((uint32_t)buffer& ~31), ((uint32_t)buffer& ~31) + roundup(length, 32));
That's worse. First of all, rounding is wrong: consider buffer=31, length=32. But that's easy to fix too. The bad part is that with writeback cache you can't just enlarge the range to cover the buffer and fit alignment requirements -- this can potentially destroy some changes that are in the same cache line but not yet stored to RAM.
Correct, so we have multiple bugs in here that need to be squashed ASAP.
Ilya, can you possibly submit a patch for this?
Hm.. Fixing alignment for qh, qd and stuff is easy, I can surely submit a patch for this.
Well ... partial patch is still better than none.
But I don't really know what to do this the last one: if we are at this point there is no correct way out... We can increase the range to cover the buffer or decrease it to be inside buffer -- but both options are incorrect. Actually we should return error when unaligned buffer is submitted in the first place... But this will likely break a lot of systems... Probably put this check under if (dcache_enabled())?
Bounce buffer I guess ... with warning that you'll hit performance penalty becaue you're dumb and doing unaligned transfer. And for internal uboot stuff, we can align it.
Regards, Ilya.
Best regards, Marek Vasut

On 06/28/2012 07:37 AM, Marek Vasut wrote:
Dear Ilya Yanok,
Dear Marek,
28.06.2012 02:48, Marek Vasut wrote:
Sorry for missing this discussion. I think compile-time disabling of the cache is too brutal. ehci-hcd cache handling is broken anyway: doing unaligned flushes/invalidates is a bug, and we know for sure that upper layers don't care about alignment (and I bet ehci-hcd does this even for its internal buffers). So what's the point in all this cache handling in ehci-hcd? It's not going to work anyway and just produces problems. So I suggest to just disable all this stuff until generic code will be fixed. Alternatively we can do bounce-buffering inside driver.
We should rather introduce generic bounce buffer. But the upper layers are getting fixed recently so we should be getting there.
Really? Don't forget my old patch [1] then ;) Still I think we should rip off all the cache stuff from ehci-hcd until all patches for upper layers are included. Again, this stuff doesn't do proper things now anyway and USB won't work with dcache enabled.
Have you tested? I enabled dcache on m28 and tried asix ethernet (needed a patch) and loading from ext2 and vfat (worked).
So then we have more places that accidentially aligned to 32bytes since this does not work on TI parts which require 64byte alignment.

Dear Tom Rini,
On 06/28/2012 07:37 AM, Marek Vasut wrote:
Dear Ilya Yanok,
Dear Marek,
28.06.2012 02:48, Marek Vasut wrote:
Sorry for missing this discussion. I think compile-time disabling of the cache is too brutal. ehci-hcd cache handling is broken anyway: doing unaligned flushes/invalidates is a bug, and we know for sure that upper layers don't care about alignment (and I bet ehci-hcd does this even for its internal buffers). So what's the point in all this cache handling in ehci-hcd? It's not going to work anyway and just produces problems. So I suggest to just disable all this stuff until generic code will be fixed. Alternatively we can do bounce-buffering inside driver.
We should rather introduce generic bounce buffer. But the upper layers are getting fixed recently so we should be getting there.
Really? Don't forget my old patch [1] then ;) Still I think we should rip off all the cache stuff from ehci-hcd until all patches for upper layers are included. Again, this stuff doesn't do proper things now anyway and USB won't work with dcache enabled.
Have you tested? I enabled dcache on m28 and tried asix ethernet (needed a patch) and loading from ext2 and vfat (worked).
So then we have more places that accidentially aligned to 32bytes since this does not work on TI parts which require 64byte alignment.
Oh, this is very good it's broken. People actually started whining. Now we have to wait until they start identifying the problematic places and fixing them.
Best regards, Marek Vasut

On Fri, Jun 29, 2012 at 12:01:58AM +0200, Marek Vasut wrote:
Dear Tom Rini,
On 06/28/2012 07:37 AM, Marek Vasut wrote:
Dear Ilya Yanok,
Dear Marek,
28.06.2012 02:48, Marek Vasut wrote:
Sorry for missing this discussion. I think compile-time disabling of the cache is too brutal. ehci-hcd cache handling is broken anyway: doing unaligned flushes/invalidates is a bug, and we know for sure that upper layers don't care about alignment (and I bet ehci-hcd does this even for its internal buffers). So what's the point in all this cache handling in ehci-hcd? It's not going to work anyway and just produces problems. So I suggest to just disable all this stuff until generic code will be fixed. Alternatively we can do bounce-buffering inside driver.
We should rather introduce generic bounce buffer. But the upper layers are getting fixed recently so we should be getting there.
Really? Don't forget my old patch [1] then ;) Still I think we should rip off all the cache stuff from ehci-hcd until all patches for upper layers are included. Again, this stuff doesn't do proper things now anyway and USB won't work with dcache enabled.
Have you tested? I enabled dcache on m28 and tried asix ethernet (needed a patch) and loading from ext2 and vfat (worked).
So then we have more places that accidentially aligned to 32bytes since this does not work on TI parts which require 64byte alignment.
Oh, this is very good it's broken. People actually started whining. Now we have to wait until they start identifying the problematic places and fixing them.
Uh-hunh. So I guess for v2012.07 we'll build-time disable dcache for beagle and omap3_evm and leave it on for mcx and see who has time and hardware to fix things for v2012.10.

Dear Tom Rini,
On Fri, Jun 29, 2012 at 12:01:58AM +0200, Marek Vasut wrote:
Dear Tom Rini,
On 06/28/2012 07:37 AM, Marek Vasut wrote:
Dear Ilya Yanok,
Dear Marek,
28.06.2012 02:48, Marek Vasut wrote:
> Sorry for missing this discussion. I think compile-time disabling > of the cache is too brutal. > ehci-hcd cache handling is broken anyway: doing unaligned > flushes/invalidates is a bug, and we know for sure that upper > layers don't care about alignment (and I bet ehci-hcd does this > even for its internal buffers). So what's the point in all this > cache handling in ehci-hcd? It's not going to work anyway and > just produces problems. So I suggest to just disable all this > stuff until generic code will be fixed. Alternatively we can do > bounce-buffering inside driver.
We should rather introduce generic bounce buffer. But the upper layers are getting fixed recently so we should be getting there.
Really? Don't forget my old patch [1] then ;) Still I think we should rip off all the cache stuff from ehci-hcd until all patches for upper layers are included. Again, this stuff doesn't do proper things now anyway and USB won't work with dcache enabled.
Have you tested? I enabled dcache on m28 and tried asix ethernet (needed a patch) and loading from ext2 and vfat (worked).
So then we have more places that accidentially aligned to 32bytes since this does not work on TI parts which require 64byte alignment.
Oh, this is very good it's broken. People actually started whining. Now we have to wait until they start identifying the problematic places and fixing them.
Uh-hunh. So I guess for v2012.07 we'll build-time disable dcache for beagle and omap3_evm
Didn't you fix the issues?
and leave it on for mcx and see who has time and hardware to fix things for v2012.10.
Or we fix it for mcx too until .07 is out ?
Best regards, Marek Vasut

On 06/28/2012 03:36 PM, Marek Vasut wrote:
Dear Tom Rini,
On Fri, Jun 29, 2012 at 12:01:58AM +0200, Marek Vasut wrote:
Dear Tom Rini,
On 06/28/2012 07:37 AM, Marek Vasut wrote:
Dear Ilya Yanok,
Dear Marek,
28.06.2012 02:48, Marek Vasut wrote: >> Sorry for missing this discussion. I think compile-time disabling >> of the cache is too brutal. >> ehci-hcd cache handling is broken anyway: doing unaligned >> flushes/invalidates is a bug, and we know for sure that upper >> layers don't care about alignment (and I bet ehci-hcd does this >> even for its internal buffers). So what's the point in all this >> cache handling in ehci-hcd? It's not going to work anyway and >> just produces problems. So I suggest to just disable all this >> stuff until generic code will be fixed. Alternatively we can do >> bounce-buffering inside driver. > > We should rather introduce generic bounce buffer. But the upper > layers are getting fixed recently so we should be getting there.
Really? Don't forget my old patch [1] then ;) Still I think we should rip off all the cache stuff from ehci-hcd until all patches for upper layers are included. Again, this stuff doesn't do proper things now anyway and USB won't work with dcache enabled.
Have you tested? I enabled dcache on m28 and tried asix ethernet (needed a patch) and loading from ext2 and vfat (worked).
So then we have more places that accidentially aligned to 32bytes since this does not work on TI parts which require 64byte alignment.
Oh, this is very good it's broken. People actually started whining. Now we have to wait until they start identifying the problematic places and fixing them.
Uh-hunh. So I guess for v2012.07 we'll build-time disable dcache for beagle and omap3_evm
Didn't you fix the issues?
and leave it on for mcx and see who has time and hardware to fix things for v2012.10.
Or we fix it for mcx too until .07 is out ?
To clarify for everyone, the first part of this series fixes some alignment issues for things that were not starting address aligned. There still exist end-address alignment issues within ehci-hcd. The time I have for this problem right now boils down to disable dcache for these boards so that USB is still functional.

Dear Tom Rini,
On 06/28/2012 03:36 PM, Marek Vasut wrote:
Dear Tom Rini,
On Fri, Jun 29, 2012 at 12:01:58AM +0200, Marek Vasut wrote:
Dear Tom Rini,
On 06/28/2012 07:37 AM, Marek Vasut wrote:
Dear Ilya Yanok,
> Dear Marek, > > 28.06.2012 02:48, Marek Vasut wrote: >>> Sorry for missing this discussion. I think compile-time disabling >>> of the cache is too brutal. >>> ehci-hcd cache handling is broken anyway: doing unaligned >>> flushes/invalidates is a bug, and we know for sure that upper >>> layers don't care about alignment (and I bet ehci-hcd does this >>> even for its internal buffers). So what's the point in all this >>> cache handling in ehci-hcd? It's not going to work anyway and >>> just produces problems. So I suggest to just disable all this >>> stuff until generic code will be fixed. Alternatively we can do >>> bounce-buffering inside driver. >> >> We should rather introduce generic bounce buffer. But the upper >> layers are getting fixed recently so we should be getting there. > > Really? Don't forget my old patch [1] then ;) > Still I think we should rip off all the cache stuff from ehci-hcd > until all patches for upper layers are included. Again, this stuff > doesn't do proper things now anyway and USB won't work with dcache > enabled.
Have you tested? I enabled dcache on m28 and tried asix ethernet (needed a patch) and loading from ext2 and vfat (worked).
So then we have more places that accidentially aligned to 32bytes since this does not work on TI parts which require 64byte alignment.
Oh, this is very good it's broken. People actually started whining. Now we have to wait until they start identifying the problematic places and fixing them.
Uh-hunh. So I guess for v2012.07 we'll build-time disable dcache for beagle and omap3_evm
Didn't you fix the issues?
and leave it on for mcx and see who has time and hardware to fix things for v2012.10.
Or we fix it for mcx too until .07 is out ?
To clarify for everyone, the first part of this series fixes some alignment issues for things that were not starting address aligned. There still exist end-address alignment issues within ehci-hcd. The time I have for this problem right now boils down to disable dcache for these boards so that USB is still functional.
To clarify it even further -- it always worked just by sheer coincidence ...
Best regards, Marek Vasut

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 06/28/2012 05:54 PM, Marek Vasut wrote:
Dear Tom Rini,
On 06/28/2012 03:36 PM, Marek Vasut wrote:
Dear Tom Rini,
On Fri, Jun 29, 2012 at 12:01:58AM +0200, Marek Vasut wrote:
Dear Tom Rini,
On 06/28/2012 07:37 AM, Marek Vasut wrote: > Dear Ilya Yanok, > >> Dear Marek, >> >> 28.06.2012 02:48, Marek Vasut wrote: >>>> Sorry for missing this discussion. I think >>>> compile-time disabling of the cache is too >>>> brutal. ehci-hcd cache handling is broken >>>> anyway: doing unaligned flushes/invalidates is a >>>> bug, and we know for sure that upper layers don't >>>> care about alignment (and I bet ehci-hcd does >>>> this even for its internal buffers). So what's >>>> the point in all this cache handling in >>>> ehci-hcd? It's not going to work anyway and just >>>> produces problems. So I suggest to just disable >>>> all this stuff until generic code will be fixed. >>>> Alternatively we can do bounce-buffering inside >>>> driver. >>> >>> We should rather introduce generic bounce buffer. >>> But the upper layers are getting fixed recently so >>> we should be getting there. >> >> Really? Don't forget my old patch [1] then ;) Still >> I think we should rip off all the cache stuff from >> ehci-hcd until all patches for upper layers are >> included. Again, this stuff doesn't do proper things >> now anyway and USB won't work with dcache enabled. > > Have you tested? I enabled dcache on m28 and tried > asix ethernet (needed a patch) and loading from ext2 > and vfat (worked).
So then we have more places that accidentially aligned to 32bytes since this does not work on TI parts which require 64byte alignment.
Oh, this is very good it's broken. People actually started whining. Now we have to wait until they start identifying the problematic places and fixing them.
Uh-hunh. So I guess for v2012.07 we'll build-time disable dcache for beagle and omap3_evm
Didn't you fix the issues?
and leave it on for mcx and see who has time and hardware to fix things for v2012.10.
Or we fix it for mcx too until .07 is out ?
To clarify for everyone, the first part of this series fixes some alignment issues for things that were not starting address aligned. There still exist end-address alignment issues within ehci-hcd. The time I have for this problem right now boils down to disable dcache for these boards so that USB is still functional.
To clarify it even further -- it always worked just by sheer coincidence ...
No, it didn't. It used to work in a timely manner, with dcache disabled but support enabled at build time. Now it works, in an unusably slow manner, with dcache disabled but support enabled at build time. It continues to work in a timely manner with dcache support disabled at build time. On any platform with >32byte alignment requirements for cache flushing.
- -- Tom

Dear Marek,
29.06.2012 04:54, Marek Vasut wrote:
To clarify for everyone, the first part of this series fixes some alignment issues for things that were not starting address aligned. There still exist end-address alignment issues within ehci-hcd. The time I have for this problem right now boils down to disable dcache for these boards so that USB is still functional.
To clarify it even further -- it always worked just by sheer coincidence ...
Not exactly. It never worked (at least on my systems) with D-Cache enabled. But at least we had a choice of run-time disabled dcache. With the recent changes we have to disable cache support at compile time.
Regards, Ilya.

Dear Ilya Yanok,
Dear Marek,
29.06.2012 04:54, Marek Vasut wrote:
To clarify for everyone, the first part of this series fixes some alignment issues for things that were not starting address aligned. There still exist end-address alignment issues within ehci-hcd. The time I have for this problem right now boils down to disable dcache for these boards so that USB is still functional.
To clarify it even further -- it always worked just by sheer coincidence ...
Not exactly. It never worked (at least on my systems) with D-Cache enabled. But at least we had a choice of run-time disabled dcache. With the recent changes we have to disable cache support at compile time.
I see what you're after. But do you consider runtime disabling the cache is the way to go or it's a way of hiding bugs?
Regards, Ilya.
Best regards, Marek Vasut

Dear Marek,
30.06.2012 23:28, Marek Vasut wrote:
Not exactly. It never worked (at least on my systems) with D-Cache enabled. But at least we had a choice of run-time disabled dcache. With the recent changes we have to disable cache support at compile time.
I see what you're after. But do you consider runtime disabling the cache is the way to go or it's a way of hiding bugs?
Both ;) And now we are going to hide even more bugs with compile-time disabling :(
Regards, Ilya.

On 07/03/2012 01:13 PM, Ilya Yanok wrote:
Dear Marek,
30.06.2012 23:28, Marek Vasut wrote:
Not exactly. It never worked (at least on my systems) with D-Cache enabled. But at least we had a choice of run-time disabled dcache. With the recent changes we have to disable cache support at compile time.
I see what you're after. But do you consider runtime disabling the cache is the way to go or it's a way of hiding bugs?
Both ;) And now we are going to hide even more bugs with compile-time disabling :(
Does someone wish to argue we should disable USB support instead on these platforms? I don't see anyone arguing "I have time to fix this for v2012.07".

Hi Tom,
04.07.2012 00:43, Tom Rini wrote:
On 07/03/2012 01:13 PM, Ilya Yanok wrote:
Dear Marek,
30.06.2012 23:28, Marek Vasut wrote:
Not exactly. It never worked (at least on my systems) with D-Cache enabled. But at least we had a choice of run-time disabled dcache. With the recent changes we have to disable cache support at compile time.
I see what you're after. But do you consider runtime disabling the cache is the way to go or it's a way of hiding bugs?
Both ;) And now we are going to hide even more bugs with compile-time disabling :(
Does someone wish to argue we should disable USB support instead on these platforms? I don't see anyone arguing "I have time to fix this for v2012.07".
I just looked at the code more carefully and it seems that most of the upper layers are in much better shape than I thought. So I think we should just extend your 2/6 patch to fix both address and size for structs QH and qtd and don't mess with buffer at all: if we got unaligned buffer -- it's definetely upper layer bug so we should produce some noise in this case. As I said upper layers seems to be in good shape so hopefully there won't be too much noise.
Hm, probably we should put buffer invalidation under if(dcache_enabled()) to leave run-time cache disabling as rescue option for broken upper-layer code..
I'm working on the patch now and hopefully will post it this night.
Regards, Ilya.

Dear Ilya Yanok,
Hi Tom,
04.07.2012 00:43, Tom Rini wrote:
On 07/03/2012 01:13 PM, Ilya Yanok wrote:
Dear Marek,
30.06.2012 23:28, Marek Vasut wrote:
Not exactly. It never worked (at least on my systems) with D-Cache enabled. But at least we had a choice of run-time disabled dcache. With the recent changes we have to disable cache support at compile time.
I see what you're after. But do you consider runtime disabling the cache is the way to go or it's a way of hiding bugs?
Both ;) And now we are going to hide even more bugs with compile-time disabling :(
Does someone wish to argue we should disable USB support instead on these platforms? I don't see anyone arguing "I have time to fix this for v2012.07".
I just looked at the code more carefully and it seems that most of the upper layers are in much better shape than I thought. So I think we should just extend your 2/6 patch to fix both address and size for structs QH and qtd and don't mess with buffer at all: if we got unaligned buffer -- it's definetely upper layer bug so we should produce some noise in this case. As I said upper layers seems to be in good shape so hopefully there won't be too much noise.
Hm, probably we should put buffer invalidation under if(dcache_enabled()) to leave run-time cache disabling as rescue option for broken upper-layer code..
I'm working on the patch now and hopefully will post it this night.
Ilya, thank you for saving my back ;-)
And thank you for investing your time into this.
Regards, Ilya.
Best regards, Marek Vasut

Hi All,
04.07.2012 04:14, Marek Vasut wrote:
Ilya, thank you for saving my back ;-)
And thank you for investing your time into this.
You are welcome ;)
Just posted the patch. No dealing with unaligned buffers from upper layers for now but at least fatload with aligned address works fine.
Regards, Ilya.

Dear Tom Rini,
On 07/03/2012 01:13 PM, Ilya Yanok wrote:
Dear Marek,
30.06.2012 23:28, Marek Vasut wrote:
Not exactly. It never worked (at least on my systems) with D-Cache enabled. But at least we had a choice of run-time disabled dcache. With the recent changes we have to disable cache support at compile time.
I see what you're after. But do you consider runtime disabling the cache is the way to go or it's a way of hiding bugs?
Both ;) And now we are going to hide even more bugs with compile-time disabling :(
Does someone wish to argue we should disable USB support instead on these platforms? I don't see anyone arguing "I have time to fix this for v2012.07".
I don't have time to fix this, I'm hacking 200% already.
Best regards, Marek Vasut

USB EHCI and DCACHE are not compatible, so disable DCACHE support at build-time as run-time disable is insufficient for USB use.
Signed-off-by: Tom Rini trini@ti.com --- include/configs/omap3_beagle.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/configs/omap3_beagle.h b/include/configs/omap3_beagle.h index b891ee4..79560d7 100644 --- a/include/configs/omap3_beagle.h +++ b/include/configs/omap3_beagle.h @@ -131,7 +131,7 @@ #define CONFIG_USB_EHCI
#define CONFIG_USB_EHCI_OMAP -/*#define CONFIG_EHCI_DCACHE*/ /* leave it disabled for now */ +#define CONFIG_SYS_DCACHE_OFF /* USB_EHCI is unusable with DCACHE support */ #define CONFIG_OMAP_EHCI_PHY1_RESET_GPIO 147
#define CONFIG_USB_ULPI

USB EHCI and DCACHE are not compatible, so disable DCACHE support at build-time as run-time disable is insufficient for USB use.
Cc: Sricharan R r.sricharan@ti.com Signed-off-by: Tom Rini trini@ti.com --- include/configs/omap4_panda.h | 1 + 1 file changed, 1 insertion(+)
diff --git a/include/configs/omap4_panda.h b/include/configs/omap4_panda.h index b4756be..468cf7a 100644 --- a/include/configs/omap4_panda.h +++ b/include/configs/omap4_panda.h @@ -38,6 +38,7 @@ #define CONFIG_USB_HOST #define CONFIG_USB_EHCI #define CONFIG_USB_EHCI_OMAP +#define CONFIG_SYS_DCACHE_OFF /* USB_EHCI is unusable with DCACHE support */ #define CONFIG_USB_STORAGE #define CONFIG_SYS_USB_EHCI_MAX_ROOT_PORTS 3

Hi Tom, [snip..]
#define CONFIG_USB_HOST #define CONFIG_USB_EHCI #define CONFIG_USB_EHCI_OMAP +#define CONFIG_SYS_DCACHE_OFF /* USB_EHCI is unusable with
DCACHE
support */ #define CONFIG_USB_STORAGE #define CONFIG_SYS_USB_EHCI_MAX_ROOT_PORTS 3
While, this looks like to be the safest option till we fix both the cache issue and alignment issue in USB stack.
Thanks, Sricharan

Hi, On Thu, Jun 21, 2012 at 4:49 PM, Sricharan R r.sricharan@ti.com wrote:
Hi Tom, [snip..]
#define CONFIG_USB_HOST #define CONFIG_USB_EHCI #define CONFIG_USB_EHCI_OMAP +#define CONFIG_SYS_DCACHE_OFF /* USB_EHCI is unusable with
DCACHE
support */ #define CONFIG_USB_STORAGE #define CONFIG_SYS_USB_EHCI_MAX_ROOT_PORTS 3
While, this looks like to be the safest option till we fix both the cache issue and alignment issue in USB stack.
Acked-by: R Sricharan r.sricharan@ti.com
Thanks, Sricharan

USB EHCI and DCACHE are not compatible, so disable DCACHE support at build-time as run-time disable is insufficient for USB use.
Cc: Stefano Babic sbabic@denx.de Signed-off-by: Tom Rini trini@ti.com --- include/configs/tam3517-common.h | 1 + 1 file changed, 1 insertion(+)
diff --git a/include/configs/tam3517-common.h b/include/configs/tam3517-common.h index 3fc2c44..b9c96b9 100644 --- a/include/configs/tam3517-common.h +++ b/include/configs/tam3517-common.h @@ -100,6 +100,7 @@ #define CONFIG_OMAP3_GPIO_5 #define CONFIG_USB_EHCI #define CONFIG_USB_EHCI_OMAP +#define CONFIG_SYS_DCACHE_OFF /* USB_EHCI is unusable with DCACHE support */ #define CONFIG_USB_ULPI #define CONFIG_USB_ULPI_VIEWPORT_OMAP #define CONFIG_OMAP_EHCI_PHY1_RESET_GPIO 25

Dear Tom Rini,
Hey all,
In commit b8adb12 the cache flushing behavior was changed for the EHCI stack. This change showed a few different problems on TI platforms (where our cacheline size is 64 not 32).
Good thing, it made a bug surface ;-)
First, the dcache_off call that ehci-omap had been doing was now not happening soon enough to paper over the cache issues.
Hm, is the dcache_off() call implemented properly so nothing is lost when you shut off the cache btw?
This call is removed in patch 1. Second, when we have dcache support compiled in but turned off via 'dcache off' the cache routines spam the console about alignment issues when a cache flush is attempted. This is a problem in that it makes operations extremely slow (as we're spending all our time spitting messages to console). The second patch makes the flush routines return when the dcache is off. The last two patches deal with the same problem, for EHCI and for MUSB. The USB spec says that 32 bytes is the minimum alignment but we need larger alignment when the cache is larger. Note that we can't use MAX() here as gcc doesn't allow that expansion inside of align(..).
Tested on omap3_beagle (which was previously broken) and a MAKEALL -a arm looks good too.
Good job Tom, thanks for spending time on fixing this!
Best regards, Marek Vasut

On 06/14/2012 03:02 PM, Marek Vasut wrote:
Dear Tom Rini,
Hey all,
In commit b8adb12 the cache flushing behavior was changed for the EHCI stack. This change showed a few different problems on TI platforms (where our cacheline size is 64 not 32).
Good thing, it made a bug surface ;-)
First, the dcache_off call that ehci-omap had been doing was now not happening soon enough to paper over the cache issues.
Hm, is the dcache_off() call implemented properly so nothing is lost when you shut off the cache btw?
As best I can tell, yes. It will do a dcache_flush_all() and then set the correct bits.

Dear Tom Rini,
On 06/14/2012 03:02 PM, Marek Vasut wrote:
Dear Tom Rini,
Hey all,
In commit b8adb12 the cache flushing behavior was changed for the EHCI stack. This change showed a few different problems on TI platforms (where our cacheline size is 64 not 32).
Good thing, it made a bug surface ;-)
First, the dcache_off call that ehci-omap had been doing was now not happening soon enough to paper over the cache issues.
Hm, is the dcache_off() call implemented properly so nothing is lost when you shut off the cache btw?
As best I can tell, yes. It will do a dcache_flush_all() and then set the correct bits.
Ok, good
Best regards, Marek Vasut
participants (7)
-
Aneesh V
-
Ilya Yanok
-
Marek Vasut
-
Marek Vasut
-
R, Sricharan
-
Sricharan R
-
Tom Rini