[U-Boot] [PATCH V2 0/8] Fixes to EHCI cache handling & smsc95xx

This is my second version of long-dicussed series initially posted by Tom, then by me and then by Marek ;)
This inlcudes proper alignment handling inside EHCI HCD driver, a little bit more careful error handling and cacheline alignment fix for smsc95xx USB Ethernet driver.
I noticed no problems with mcx board but with TI beagle xM I get "EHCI timed out on TD" errors from time to time. Probably that's because I'm testing with different set of devices. Anyway I can see the same errors on beagle xM without these series applied (with D-Cache completely disabled) so I don't think the problem was introduced by these patches.
Ilya Yanok (5): ehci-hcd: fix external buffer cache handling usb: pass cache-aligned buffer to usb_get_descriptor() usb: check return value of submit_{control,bulk}_msg ehci-hcd: change debug() to printf() in case of errors smsc95xx: align buffers to cache line size
Marek Vasut (1): common.h: Introduce DEFINE_CACHE_ALIGN_BUFFER
Tom Rini (2): ehci-omap: Do not call dcache_off from omap_ehci_hcd_init ehci-hcd.c, musb_core, usb.h: Add USB_DMA_MINALIGN define for cache alignment
common/usb.c | 12 ++++-- drivers/usb/eth/smsc95xx.c | 13 ++++-- drivers/usb/host/ehci-hcd.c | 93 +++++++++++++++++++++++------------------- drivers/usb/host/ehci-omap.c | 1 - drivers/usb/musb/musb_core.h | 2 +- include/common.h | 21 ++++++++-- include/usb.h | 10 +++++ 7 files changed, 97 insertions(+), 55 deletions(-)

From: Tom Rini trini@ti.com
This has never been completely sufficient and now happens too late to paper over the cache coherency problems with the current USB stack.
Cc: Marek Vasut marex@denx.de Signed-off-by: Tom Rini trini@ti.com Signed-off-by: Ilya Yanok ilya.yanok@cogentembedded.com --- No changes from initial Tom's version.
drivers/usb/host/ehci-omap.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/usb/host/ehci-omap.c b/drivers/usb/host/ehci-omap.c index 1ed7710..292673b 100644 --- a/drivers/usb/host/ehci-omap.c +++ b/drivers/usb/host/ehci-omap.c @@ -246,7 +246,6 @@ int omap_ehci_hcd_init(struct omap_usbhs_board_data *usbhs_pdata) if (is_ehci_phy_mode(usbhs_pdata->port_mode[i])) omap_ehci_soft_phy_reset(i);
- dcache_disable(); hccr = (struct ehci_hccr *)(OMAP_EHCI_BASE); hcor = (struct ehci_hcor *)(OMAP_EHCI_BASE + 0x10);

From: Marek Vasut marex@denx.de
This is the out-of-function-scope counterpart of ALLOC_CACHE_ALIGN_BUFFER.
Signed-off-by: Marek Vasut marex@denx.de Cc: Tom Rini trini@ti.com [ilya.yanok]: added missing <linux/compiler.h> include and {DEFINE,ALLOC}_ALIGN_BUFFER macros allowing explicit alignment specification. Signed-off-by: Ilya Yanok ilya.yanok@cogentembedded.com --- Changes from Marek's version: - added #include <linux/compiler.h> (needed for __aligned()) - added macros with additional argument to allow explicit alignment specification
include/common.h | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/include/common.h b/include/common.h index d1dd65a..be9c278 100644 --- a/include/common.h +++ b/include/common.h @@ -39,6 +39,7 @@ typedef volatile unsigned char vu_char; #include <linux/bitops.h> #include <linux/types.h> #include <linux/string.h> +#include <linux/compiler.h> #include <asm/ptrace.h> #include <stdarg.h> #if defined(CONFIG_PCI) && (defined(CONFIG_4xx) && !defined(CONFIG_AP1000)) @@ -944,11 +945,25 @@ int cpu_release(int nr, int argc, char * const argv[]); * of a function scoped static buffer. It can not be used to create a cache * line aligned global buffer. */ +#define ALLOC_ALIGN_BUFFER(type, name, size, align) \ + char __##name[ROUND(size * sizeof(type), align) + (align - 1)]; \ + \ + type *name = (type *) ALIGN((uintptr_t)__##name, align) #define ALLOC_CACHE_ALIGN_BUFFER(type, name, size) \ - char __##name[ROUND(size * sizeof(type), ARCH_DMA_MINALIGN) + \ - ARCH_DMA_MINALIGN - 1]; \ + ALLOC_ALIGN_BUFFER(type, name, size, ARCH_DMA_MINALIGN) + +/* + * DEFINE_CACHE_ALIGN_BUFFER() is similar to ALLOC_CACHE_ALIGN_BUFFER, but it's + * purpose is to allow allocating aligned buffers outside of function scope. + * Usage of this macro shall be avoided or used with extreme care! + */ +#define DEFINE_ALIGN_BUFFER(type, name, size, align) \ + static char __##name[roundup(size * sizeof(type), align)] \ + __aligned(align); \ \ - type *name = (type *) ALIGN((uintptr_t)__##name, ARCH_DMA_MINALIGN) + static type *name = (type *)__##name +#define DEFINE_CACHE_ALIGN_BUFFER(type, name, size) \ + DEFINE_ALIGN_BUFFER(type, name, size, ARCH_DMA_MINALIGN)
/* Pull in stuff for the build system */ #ifdef DO_DEPS_ONLY

From: Tom Rini trini@ti.com
The USB spec says that 32 bytes is the minimum required alignment. However on some platforms we have a larger minimum requirement for cache coherency. In those cases, use that value rather than the USB spec minimum. We add a cpp check to <usb.h> to define USB_DMA_MINALIGN and make use of it in ehci-hcd.c and musb_core.h. We cannot use MAX() here as we are not allowed to have tests inside of align(...).
Signed-off-by: Tom Rini trini@ti.com [marek.vasut]: introduce some crazy macro voodoo Signed-off-by: Marek Vasut marex@denx.de [ilya.yanok]: moved external buffer fixes to separate patch, we use {ALLOC,DEFINE}_ALIGN_BUFFER macros with alignment of USB_DMA_MINALIGN for qh_list, qh and qtd structures to make sure they are proper aligned for both controller and cache operations. For some unclear reason qh structure should remain static. Signed-off-by: Ilya Yanok ilya.yanok@cogentembedded.com --- Changes from Marek's version: - external buffer related things moved to separate patch - use static arrays for qh structure as it seems that current code relies on it's address being constant.
drivers/usb/host/ehci-hcd.c | 64 ++++++++++++++++++++++-------------------- drivers/usb/musb/musb_core.h | 2 +- include/usb.h | 10 +++++++ 3 files changed, 45 insertions(+), 31 deletions(-)
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c index 04300be..59039f4 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -34,7 +34,10 @@ struct ehci_hccr *hccr; /* R/O registers, not need for volatile */ volatile struct ehci_hcor *hcor;
static uint16_t portreset; -static struct QH qh_list __attribute__((aligned(32))); +DEFINE_ALIGN_BUFFER(struct QH, qh_list, 1, USB_DMA_MINALIGN); + +#define ALIGN_END_ADDR(type, ptr, size) \ + ((uint32_t)(ptr) + roundup((size) * sizeof(type), USB_DMA_MINALIGN))
static struct descriptor { struct usb_hub_descriptor hub; @@ -207,8 +210,9 @@ static int ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, int length, struct devrequest *req) { - static struct QH qh __attribute__((aligned(32))); - static struct qTD qtd[3] __attribute__((aligned (32))); + /* for some reason this doesn't work with non-static qh */ + DEFINE_ALIGN_BUFFER(struct QH, qh, 1, USB_DMA_MINALIGN); + ALLOC_ALIGN_BUFFER(struct qTD, qtd, 3, USB_DMA_MINALIGN); int qtd_counter = 0;
volatile struct qTD *vtd; @@ -229,8 +233,8 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, le16_to_cpu(req->value), le16_to_cpu(req->value), le16_to_cpu(req->index));
- memset(&qh, 0, sizeof(struct QH)); - memset(qtd, 0, sizeof(qtd)); + memset(qh, 0, sizeof(struct QH)); + memset(qtd, 0, 3 * sizeof(*qtd));
toggle = usb_gettoggle(dev, usb_pipeendpoint(pipe), usb_pipeout(pipe));
@@ -244,7 +248,7 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, * qh_overlay.qt_next ...... 13-10 H * - qh_overlay.qt_altnext */ - qh.qh_link = cpu_to_hc32((uint32_t)&qh_list | QH_LINK_TYPE_QH); + qh->qh_link = cpu_to_hc32((uint32_t)qh_list | QH_LINK_TYPE_QH); c = (usb_pipespeed(pipe) != USB_SPEED_HIGH && usb_pipeendpoint(pipe) == 0) ? 1 : 0; endpt = (8 << 28) | @@ -255,14 +259,14 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, (usb_pipespeed(pipe) << 12) | (usb_pipeendpoint(pipe) << 8) | (0 << 7) | (usb_pipedevice(pipe) << 0); - qh.qh_endpt1 = cpu_to_hc32(endpt); + qh->qh_endpt1 = cpu_to_hc32(endpt); endpt = (1 << 30) | (dev->portnr << 23) | (dev->parent->devnum << 16) | (0 << 8) | (0 << 0); - qh.qh_endpt2 = cpu_to_hc32(endpt); - qh.qh_overlay.qt_next = cpu_to_hc32(QT_NEXT_TERMINATE); + qh->qh_endpt2 = cpu_to_hc32(endpt); + qh->qh_overlay.qt_next = cpu_to_hc32(QT_NEXT_TERMINATE);
- tdp = &qh.qh_overlay.qt_next; + tdp = &qh->qh_overlay.qt_next;
if (req != NULL) { /* @@ -340,13 +344,13 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, tdp = &qtd[qtd_counter++].qt_next; }
- qh_list.qh_link = cpu_to_hc32((uint32_t)&qh | QH_LINK_TYPE_QH); + qh_list->qh_link = cpu_to_hc32((uint32_t)qh | QH_LINK_TYPE_QH);
/* Flush dcache */ - flush_dcache_range((uint32_t)&qh_list, - (uint32_t)&qh_list + sizeof(struct QH)); - flush_dcache_range((uint32_t)&qh, (uint32_t)&qh + sizeof(struct QH)); - flush_dcache_range((uint32_t)qtd, (uint32_t)qtd + sizeof(qtd)); + flush_dcache_range((uint32_t)qh_list, + ALIGN_END_ADDR(struct QH, qh_list, 1)); + flush_dcache_range((uint32_t)qh, ALIGN_END_ADDR(struct QH, qh, 1)); + flush_dcache_range((uint32_t)qtd, ALIGN_END_ADDR(struct qTD, qtd, 3));
usbsts = ehci_readl(&hcor->or_usbsts); ehci_writel(&hcor->or_usbsts, (usbsts & 0x3f)); @@ -369,12 +373,12 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, timeout = USB_TIMEOUT_MS(pipe); do { /* Invalidate dcache */ - invalidate_dcache_range((uint32_t)&qh_list, - (uint32_t)&qh_list + sizeof(struct QH)); - invalidate_dcache_range((uint32_t)&qh, - (uint32_t)&qh + sizeof(struct QH)); + invalidate_dcache_range((uint32_t)qh_list, + ALIGN_END_ADDR(struct QH, qh_list, 1)); + invalidate_dcache_range((uint32_t)qh, + ALIGN_END_ADDR(struct QH, qh, 1)); invalidate_dcache_range((uint32_t)qtd, - (uint32_t)qtd + sizeof(qtd)); + ALIGN_END_ADDR(struct qTD, qtd, 3));
token = hc32_to_cpu(vtd->qt_token); if (!(token & 0x80)) @@ -403,9 +407,9 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, goto fail; }
- qh_list.qh_link = cpu_to_hc32((uint32_t)&qh_list | QH_LINK_TYPE_QH); + qh_list->qh_link = cpu_to_hc32((uint32_t)qh_list | QH_LINK_TYPE_QH);
- token = hc32_to_cpu(qh.qh_overlay.qt_token); + token = hc32_to_cpu(qh->qh_overlay.qt_token); if (!(token & 0x80)) { debug("TOKEN=%#x\n", token); switch (token & 0xfc) { @@ -733,16 +737,16 @@ int usb_lowlevel_init(void) #endif
/* Set head of reclaim list */ - memset(&qh_list, 0, sizeof(qh_list)); - qh_list.qh_link = cpu_to_hc32((uint32_t)&qh_list | QH_LINK_TYPE_QH); - qh_list.qh_endpt1 = cpu_to_hc32((1 << 15) | (USB_SPEED_HIGH << 12)); - qh_list.qh_curtd = cpu_to_hc32(QT_NEXT_TERMINATE); - qh_list.qh_overlay.qt_next = cpu_to_hc32(QT_NEXT_TERMINATE); - qh_list.qh_overlay.qt_altnext = cpu_to_hc32(QT_NEXT_TERMINATE); - qh_list.qh_overlay.qt_token = cpu_to_hc32(0x40); + memset(qh_list, 0, sizeof(*qh_list)); + qh_list->qh_link = cpu_to_hc32((uint32_t)qh_list | QH_LINK_TYPE_QH); + qh_list->qh_endpt1 = cpu_to_hc32((1 << 15) | (USB_SPEED_HIGH << 12)); + qh_list->qh_curtd = cpu_to_hc32(QT_NEXT_TERMINATE); + qh_list->qh_overlay.qt_next = cpu_to_hc32(QT_NEXT_TERMINATE); + qh_list->qh_overlay.qt_altnext = cpu_to_hc32(QT_NEXT_TERMINATE); + qh_list->qh_overlay.qt_token = cpu_to_hc32(0x40);
/* Set async. queue head pointer. */ - ehci_writel(&hcor->or_asynclistaddr, (uint32_t)&qh_list); + ehci_writel(&hcor->or_asynclistaddr, (uint32_t)qh_list);
reg = ehci_readl(&hccr->cr_hcsparams); descriptor.hub.bNbrPorts = HCS_N_PORTS(reg); diff --git a/drivers/usb/musb/musb_core.h b/drivers/usb/musb/musb_core.h index a8adcce..e914369 100644 --- a/drivers/usb/musb/musb_core.h +++ b/drivers/usb/musb/musb_core.h @@ -145,7 +145,7 @@ struct musb_regs { struct musb_epN_regs epN; } ep[16];
-} __attribute__((packed, aligned(32))); +} __attribute__((packed, aligned(USB_DMA_MINALIGN))); #endif
/* diff --git a/include/usb.h b/include/usb.h index 6da91e7..ba3d169 100644 --- a/include/usb.h +++ b/include/usb.h @@ -29,6 +29,16 @@ #include <usb_defs.h> #include <usbdescriptors.h>
+/* + * The EHCI spec says that we must align to at least 32 bytes. However, + * some platforms require larger alignment. + */ +#if ARCH_DMA_MINALIGN > 32 +#define USB_DMA_MINALIGN ARCH_DMA_MINALIGN +#else +#define USB_DMA_MINALIGN 32 +#endif + /* Everything is aribtrary */ #define USB_ALTSETTINGALLOC 4 #define USB_MAXALTSETTING 128 /* Hard limit */

Dear Ilya Yanok,
[...]
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c index 04300be..59039f4 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -34,7 +34,10 @@ struct ehci_hccr *hccr; /* R/O registers, not need for volatile */ volatile struct ehci_hcor *hcor;
static uint16_t portreset; -static struct QH qh_list __attribute__((aligned(32))); +DEFINE_ALIGN_BUFFER(struct QH, qh_list, 1, USB_DMA_MINALIGN);
+#define ALIGN_END_ADDR(type, ptr, size) \
- ((uint32_t)(ptr) + roundup((size) * sizeof(type), USB_DMA_MINALIGN))
static struct descriptor { struct usb_hub_descriptor hub; @@ -207,8 +210,9 @@ static int ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, int length, struct devrequest *req) {
- static struct QH qh __attribute__((aligned(32)));
- static struct qTD qtd[3] __attribute__((aligned (32)));
- /* for some reason this doesn't work with non-static qh */
- DEFINE_ALIGN_BUFFER(struct QH, qh, 1, USB_DMA_MINALIGN);
For some reason this doesn't work, but for obvious reason this is NAK. The rest is OK, but this is not. Let's investigate.
ALLOC_ALIGN_BUFFER(struct qTD, qtd, 3, USB_DMA_MINALIGN); int qtd_counter = 0;
volatile struct qTD *vtd;
[...]
I'll poke into this hopefully later today, but if you can try finding something until then, that'd be great :)

Dear Marek,
On Sun, Jul 15, 2012 at 6:59 PM, Marek Vasut marex@denx.de wrote:
index 04300be..59039f4 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -34,7 +34,10 @@ struct ehci_hccr *hccr; /* R/O registers, not need
for
volatile */ volatile struct ehci_hcor *hcor;
static uint16_t portreset; -static struct QH qh_list __attribute__((aligned(32))); +DEFINE_ALIGN_BUFFER(struct QH, qh_list, 1, USB_DMA_MINALIGN);
+#define ALIGN_END_ADDR(type, ptr, size) \
((uint32_t)(ptr) + roundup((size) * sizeof(type),
USB_DMA_MINALIGN))
static struct descriptor { struct usb_hub_descriptor hub; @@ -207,8 +210,9 @@ static int ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, int length, struct devrequest *req) {
static struct QH qh __attribute__((aligned(32)));
static struct qTD qtd[3] __attribute__((aligned (32)));
/* for some reason this doesn't work with non-static qh */
DEFINE_ALIGN_BUFFER(struct QH, qh, 1, USB_DMA_MINALIGN);
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c
For some reason this doesn't work, but for obvious reason this is NAK. The rest is OK, but this is not. Let's investigate.
But this was static since the initial EHCI support introduction... the patch just preserves the previous behavior. I agree we need to find the reason it doesn't work with on-stack qh and fix it but it's another problem, not the one these series try to address.
ALLOC_ALIGN_BUFFER(struct qTD, qtd, 3, USB_DMA_MINALIGN); int qtd_counter = 0; volatile struct qTD *vtd;
[...]
I'll poke into this hopefully later today, but if you can try finding something until then, that'd be great :)
Sorry, I've already lost too much time with this and I won't return to this in the near future.
Regards, Ilya.

Dear Ilya Yanok,
Dear Marek,
On Sun, Jul 15, 2012 at 6:59 PM, Marek Vasut marex@denx.de wrote:
index 04300be..59039f4 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -34,7 +34,10 @@ struct ehci_hccr *hccr; /* R/O registers, not need
for
volatile */ volatile struct ehci_hcor *hcor;
static uint16_t portreset;
-static struct QH qh_list __attribute__((aligned(32))); +DEFINE_ALIGN_BUFFER(struct QH, qh_list, 1, USB_DMA_MINALIGN);
+#define ALIGN_END_ADDR(type, ptr, size) \
((uint32_t)(ptr) + roundup((size) * sizeof(type),
USB_DMA_MINALIGN))
static struct descriptor {
struct usb_hub_descriptor hub;
@@ -207,8 +210,9 @@ static int
ehci_submit_async(struct usb_device *dev, unsigned long pipe, void
*buffer, int length, struct devrequest *req)
{
static struct QH qh __attribute__((aligned(32)));
static struct qTD qtd[3] __attribute__((aligned (32)));
/* for some reason this doesn't work with non-static qh */
DEFINE_ALIGN_BUFFER(struct QH, qh, 1, USB_DMA_MINALIGN);
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c
For some reason this doesn't work, but for obvious reason this is NAK. The rest is OK, but this is not. Let's investigate.
But this was static since the initial EHCI support introduction... the patch just preserves the previous behavior. I agree we need to find the reason it doesn't work with on-stack qh and fix it but it's another problem, not the one these series try to address.
But this is like covering bugs ... the rest of the patches are OK, it's just this little detail that annoys me.
ALLOC_ALIGN_BUFFER(struct qTD, qtd, 3, USB_DMA_MINALIGN); int qtd_counter = 0; volatile struct qTD *vtd;
[...]
I'll poke into this hopefully later today, but if you can try finding something until then, that'd be great :)
Sorry, I've already lost too much time with this and I won't return to this in the near future.
Dunno how much this might motivate you, but this is about my thought now:
http://farm8.staticflickr.com/7138/7456661744_331a4f3535.jpg
I'll try taking over from here though. Thanks for the set and for all your investment into this!
Regards, Ilya.
Best regards, Marek Vasut

Dear Marek,
ok, I finally managed to fix it. Will post the patches in a few seconds.
Regards, Ilya.

Move or_asynclistaddr programming to ehci_submit_async() function to make sure queue head is properly programmed before every transfer. This solves the problem with changing qh address.
Also remove unneeded qh_list->qh_link reprogramming at the end of transfer.
Signed-off-by: Ilya Yanok ilya.yanok@cogentembedded.com --- drivers/usb/host/ehci-hcd.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c index 04300be..3be2a77 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -348,6 +348,9 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, flush_dcache_range((uint32_t)&qh, (uint32_t)&qh + sizeof(struct QH)); flush_dcache_range((uint32_t)qtd, (uint32_t)qtd + sizeof(qtd));
+ /* Set async. queue head pointer. */ + ehci_writel(&hcor->or_asynclistaddr, (uint32_t)&qh_list); + usbsts = ehci_readl(&hcor->or_usbsts); ehci_writel(&hcor->or_usbsts, (usbsts & 0x3f));
@@ -403,8 +406,6 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, goto fail; }
- qh_list.qh_link = cpu_to_hc32((uint32_t)&qh_list | QH_LINK_TYPE_QH); - token = hc32_to_cpu(qh.qh_overlay.qt_token); if (!(token & 0x80)) { debug("TOKEN=%#x\n", token); @@ -741,9 +742,6 @@ int usb_lowlevel_init(void) qh_list.qh_overlay.qt_altnext = cpu_to_hc32(QT_NEXT_TERMINATE); qh_list.qh_overlay.qt_token = cpu_to_hc32(0x40);
- /* Set async. queue head pointer. */ - ehci_writel(&hcor->or_asynclistaddr, (uint32_t)&qh_list); - reg = ehci_readl(&hccr->cr_hcsparams); descriptor.hub.bNbrPorts = HCS_N_PORTS(reg); printf("Register %x NbrPorts %d\n", reg, descriptor.hub.bNbrPorts);

From: Tom Rini trini@ti.com
The USB spec says that 32 bytes is the minimum required alignment. However on some platforms we have a larger minimum requirement for cache coherency. In those cases, use that value rather than the USB spec minimum. We add a cpp check to <usb.h> to define USB_DMA_MINALIGN and make use of it in ehci-hcd.c and musb_core.h. We cannot use MAX() here as we are not allowed to have tests inside of align(...).
Signed-off-by: Tom Rini trini@ti.com [marek.vasut]: introduce some crazy macro voodoo Signed-off-by: Marek Vasut marex@denx.de [ilya.yanok]: moved external buffer fixes to separate patch, we use {ALLOC,DEFINE}_ALIGN_BUFFER macros with alignment of USB_DMA_MINALIGN for qh_list, qh and qtd structures to make sure they are proper aligned for both controller and cache operations. Signed-off-by: Ilya Yanok ilya.yanok@cogentembedded.com --- Changes from V2: - qh struct made non-static
drivers/usb/host/ehci-hcd.c | 61 ++++++++++++++++++++++-------------------- drivers/usb/musb/musb_core.h | 2 +- include/usb.h | 10 +++++++ 3 files changed, 43 insertions(+), 30 deletions(-)
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c index 3be2a77..2b46662 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -34,7 +34,10 @@ struct ehci_hccr *hccr; /* R/O registers, not need for volatile */ volatile struct ehci_hcor *hcor;
static uint16_t portreset; -static struct QH qh_list __attribute__((aligned(32))); +DEFINE_ALIGN_BUFFER(struct QH, qh_list, 1, USB_DMA_MINALIGN); + +#define ALIGN_END_ADDR(type, ptr, size) \ + ((uint32_t)(ptr) + roundup((size) * sizeof(type), USB_DMA_MINALIGN))
static struct descriptor { struct usb_hub_descriptor hub; @@ -207,8 +210,8 @@ static int ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, int length, struct devrequest *req) { - static struct QH qh __attribute__((aligned(32))); - static struct qTD qtd[3] __attribute__((aligned (32))); + ALLOC_ALIGN_BUFFER(struct QH, qh, 1, USB_DMA_MINALIGN); + ALLOC_ALIGN_BUFFER(struct qTD, qtd, 3, USB_DMA_MINALIGN); int qtd_counter = 0;
volatile struct qTD *vtd; @@ -229,8 +232,8 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, le16_to_cpu(req->value), le16_to_cpu(req->value), le16_to_cpu(req->index));
- memset(&qh, 0, sizeof(struct QH)); - memset(qtd, 0, sizeof(qtd)); + memset(qh, 0, sizeof(struct QH)); + memset(qtd, 0, 3 * sizeof(*qtd));
toggle = usb_gettoggle(dev, usb_pipeendpoint(pipe), usb_pipeout(pipe));
@@ -244,7 +247,7 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, * qh_overlay.qt_next ...... 13-10 H * - qh_overlay.qt_altnext */ - qh.qh_link = cpu_to_hc32((uint32_t)&qh_list | QH_LINK_TYPE_QH); + qh->qh_link = cpu_to_hc32((uint32_t)qh_list | QH_LINK_TYPE_QH); c = (usb_pipespeed(pipe) != USB_SPEED_HIGH && usb_pipeendpoint(pipe) == 0) ? 1 : 0; endpt = (8 << 28) | @@ -255,14 +258,14 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, (usb_pipespeed(pipe) << 12) | (usb_pipeendpoint(pipe) << 8) | (0 << 7) | (usb_pipedevice(pipe) << 0); - qh.qh_endpt1 = cpu_to_hc32(endpt); + qh->qh_endpt1 = cpu_to_hc32(endpt); endpt = (1 << 30) | (dev->portnr << 23) | (dev->parent->devnum << 16) | (0 << 8) | (0 << 0); - qh.qh_endpt2 = cpu_to_hc32(endpt); - qh.qh_overlay.qt_next = cpu_to_hc32(QT_NEXT_TERMINATE); + qh->qh_endpt2 = cpu_to_hc32(endpt); + qh->qh_overlay.qt_next = cpu_to_hc32(QT_NEXT_TERMINATE);
- tdp = &qh.qh_overlay.qt_next; + tdp = &qh->qh_overlay.qt_next;
if (req != NULL) { /* @@ -340,16 +343,16 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, tdp = &qtd[qtd_counter++].qt_next; }
- qh_list.qh_link = cpu_to_hc32((uint32_t)&qh | QH_LINK_TYPE_QH); + qh_list->qh_link = cpu_to_hc32((uint32_t)qh | QH_LINK_TYPE_QH);
/* Flush dcache */ - flush_dcache_range((uint32_t)&qh_list, - (uint32_t)&qh_list + sizeof(struct QH)); - flush_dcache_range((uint32_t)&qh, (uint32_t)&qh + sizeof(struct QH)); - flush_dcache_range((uint32_t)qtd, (uint32_t)qtd + sizeof(qtd)); + flush_dcache_range((uint32_t)qh_list, + ALIGN_END_ADDR(struct QH, qh_list, 1)); + flush_dcache_range((uint32_t)qh, ALIGN_END_ADDR(struct QH, qh, 1)); + flush_dcache_range((uint32_t)qtd, ALIGN_END_ADDR(struct qTD, qtd, 3));
/* Set async. queue head pointer. */ - ehci_writel(&hcor->or_asynclistaddr, (uint32_t)&qh_list); + ehci_writel(&hcor->or_asynclistaddr, (uint32_t)qh_list);
usbsts = ehci_readl(&hcor->or_usbsts); ehci_writel(&hcor->or_usbsts, (usbsts & 0x3f)); @@ -372,12 +375,12 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, timeout = USB_TIMEOUT_MS(pipe); do { /* Invalidate dcache */ - invalidate_dcache_range((uint32_t)&qh_list, - (uint32_t)&qh_list + sizeof(struct QH)); - invalidate_dcache_range((uint32_t)&qh, - (uint32_t)&qh + sizeof(struct QH)); + invalidate_dcache_range((uint32_t)qh_list, + ALIGN_END_ADDR(struct QH, qh_list, 1)); + invalidate_dcache_range((uint32_t)qh, + ALIGN_END_ADDR(struct QH, qh, 1)); invalidate_dcache_range((uint32_t)qtd, - (uint32_t)qtd + sizeof(qtd)); + ALIGN_END_ADDR(struct qTD, qtd, 3));
token = hc32_to_cpu(vtd->qt_token); if (!(token & 0x80)) @@ -406,7 +409,7 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, goto fail; }
- token = hc32_to_cpu(qh.qh_overlay.qt_token); + token = hc32_to_cpu(qh->qh_overlay.qt_token); if (!(token & 0x80)) { debug("TOKEN=%#x\n", token); switch (token & 0xfc) { @@ -734,13 +737,13 @@ int usb_lowlevel_init(void) #endif
/* Set head of reclaim list */ - memset(&qh_list, 0, sizeof(qh_list)); - qh_list.qh_link = cpu_to_hc32((uint32_t)&qh_list | QH_LINK_TYPE_QH); - qh_list.qh_endpt1 = cpu_to_hc32((1 << 15) | (USB_SPEED_HIGH << 12)); - qh_list.qh_curtd = cpu_to_hc32(QT_NEXT_TERMINATE); - qh_list.qh_overlay.qt_next = cpu_to_hc32(QT_NEXT_TERMINATE); - qh_list.qh_overlay.qt_altnext = cpu_to_hc32(QT_NEXT_TERMINATE); - qh_list.qh_overlay.qt_token = cpu_to_hc32(0x40); + memset(qh_list, 0, sizeof(*qh_list)); + qh_list->qh_link = cpu_to_hc32((uint32_t)qh_list | QH_LINK_TYPE_QH); + qh_list->qh_endpt1 = cpu_to_hc32((1 << 15) | (USB_SPEED_HIGH << 12)); + qh_list->qh_curtd = cpu_to_hc32(QT_NEXT_TERMINATE); + qh_list->qh_overlay.qt_next = cpu_to_hc32(QT_NEXT_TERMINATE); + qh_list->qh_overlay.qt_altnext = cpu_to_hc32(QT_NEXT_TERMINATE); + qh_list->qh_overlay.qt_token = cpu_to_hc32(0x40);
reg = ehci_readl(&hccr->cr_hcsparams); descriptor.hub.bNbrPorts = HCS_N_PORTS(reg); diff --git a/drivers/usb/musb/musb_core.h b/drivers/usb/musb/musb_core.h index a8adcce..e914369 100644 --- a/drivers/usb/musb/musb_core.h +++ b/drivers/usb/musb/musb_core.h @@ -145,7 +145,7 @@ struct musb_regs { struct musb_epN_regs epN; } ep[16];
-} __attribute__((packed, aligned(32))); +} __attribute__((packed, aligned(USB_DMA_MINALIGN))); #endif
/* diff --git a/include/usb.h b/include/usb.h index 6da91e7..ba3d169 100644 --- a/include/usb.h +++ b/include/usb.h @@ -29,6 +29,16 @@ #include <usb_defs.h> #include <usbdescriptors.h>
+/* + * The EHCI spec says that we must align to at least 32 bytes. However, + * some platforms require larger alignment. + */ +#if ARCH_DMA_MINALIGN > 32 +#define USB_DMA_MINALIGN ARCH_DMA_MINALIGN +#else +#define USB_DMA_MINALIGN 32 +#endif + /* Everything is aribtrary */ #define USB_ALTSETTINGALLOC 4 #define USB_MAXALTSETTING 128 /* Hard limit */

Dear Ilya Yanok,
Dear Marek,
ok, I finally managed to fix it. Will post the patches in a few seconds.
So the link I sent you was true afterall ? :)
What was the problem? I'll pick the patches for -next though, unless we're 100% definitelly sure they won't break anything else.
Regards, Ilya.
Best regards, Marek Vasut

Dear Marek,
On Mon, Jul 16, 2012 at 2:47 PM, Marek Vasut marex@denx.de wrote:
ok, I finally managed to fix it. Will post the patches in a few seconds.
So the link I sent you was true afterall ? :)
Not exactly ;) Actually I've already switched to another task but when I went to bed one idea came to my mind and I decided to try it.
What was the problem? I'll pick the patches for -next though, unless we're
100% definitelly sure they won't break anything else.
It looks like the controller could stop at the second QH structure (the one we allocated on stack) on the end of transfer. So on the next function call it started with the incorrect QH if address of qh was changed.
Regards, Ilya.

Dear Ilya Yanok,
Dear Marek,
ok, I finally managed to fix it. Will post the patches in a few seconds.
Regards, Ilya.
I applied and pushed it to u-boot-usb. Can you please give it a proper test now and check if nothing is missing?
Thanks Ilya, Tom, for working on this!
Best regards, Marek Vasut

Buffer coming from upper layers should be cacheline aligned/padded to perform safe cache operations. For now we don't do bounce buffering so getting unaligned buffer is an upper layer error. We can't check if the buffer is properly padded with current interface so just assume it is (consider changing with in the future). The following changes are done:
1. Remove useless length alignment check. We get actual transfer length not the size of the underlying buffer so it's perfectly valid for it to be unaligned. 2. Move flush_dcache_range() out of while loop or it will flush too much. 3. Don't try to fix buffer address before calling invalidate: if it's unaligned it's an error anyway so let cache subsystem cry about that. 4. Fix end buffer address to be cacheline aligned assuming upper layer reserved enough space. This is potentially dangerous operation so upper layers should be careful about that.
Signed-off-by: Ilya Yanok ilya.yanok@cogentembedded.com --- drivers/usb/host/ehci-hcd.c | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-)
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c index 59039f4..a6cd5e3 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -175,18 +175,15 @@ static int ehci_td_buffer(struct qTD *td, void *buf, size_t sz) { uint32_t delta, next; uint32_t addr = (uint32_t)buf; - size_t rsz = roundup(sz, 32); int idx;
- if (sz != rsz) - debug("EHCI-HCD: Misaligned buffer size (%08x)\n", sz); - - if (addr & 31) + if (addr != ALIGN(addr, ARCH_DMA_MINALIGN)) debug("EHCI-HCD: Misaligned buffer address (%p)\n", buf);
+ flush_dcache_range(addr, ALIGN(addr + sz, ARCH_DMA_MINALIGN)); + idx = 0; while (idx < 5) { - flush_dcache_range(addr, addr + rsz); td->qt_buffer[idx] = cpu_to_hc32(addr); td->qt_buffer_hi[idx] = 0; next = (addr + 4096) & ~4095; @@ -386,9 +383,17 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, WATCHDOG_RESET(); } while (get_timer(ts) < timeout);
- /* Invalidate the memory area occupied by buffer */ - invalidate_dcache_range(((uint32_t)buffer & ~31), - ((uint32_t)buffer & ~31) + roundup(length, 32)); + /* + * Invalidate the memory area occupied by buffer + * Don't try to fix the buffer alignment, if it isn't properly + * aligned it's upper layer's fault so let invalidate_dcache_range() + * vow about it. But we have to fix the length as it's actual + * transfer length and can be unaligned. This is potentially + * dangerous operation, it's responsibility of the calling + * code to make sure enough space is reserved. + */ + invalidate_dcache_range((uint32_t)buffer, + ALIGN((uint32_t)buffer + length, ARCH_DMA_MINALIGN));
/* Check that the TD processing happened */ if (token & 0x80) {

Dear Ilya Yanok,
Buffer coming from upper layers should be cacheline aligned/padded to perform safe cache operations. For now we don't do bounce buffering so getting unaligned buffer is an upper layer error. We can't check if the buffer is properly padded with current interface so just assume it is (consider changing with in the future). The following changes are done:
- Remove useless length alignment check. We get actual transfer
length not the size of the underlying buffer so it's perfectly valid for it to be unaligned. 2. Move flush_dcache_range() out of while loop or it will flush too much. 3. Don't try to fix buffer address before calling invalidate: if it's unaligned it's an error anyway so let cache subsystem cry about that. 4. Fix end buffer address to be cacheline aligned assuming upper layer reserved enough space. This is potentially dangerous operation so upper layers should be careful about that.
Signed-off-by: Ilya Yanok ilya.yanok@cogentembedded.com
drivers/usb/host/ehci-hcd.c | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-)
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c index 59039f4..a6cd5e3 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -175,18 +175,15 @@ static int ehci_td_buffer(struct qTD *td, void *buf, size_t sz) { uint32_t delta, next; uint32_t addr = (uint32_t)buf;
size_t rsz = roundup(sz, 32); int idx;
if (sz != rsz)
debug("EHCI-HCD: Misaligned buffer size (%08x)\n", sz);
Shall we not also test if addr + sz is aligned?
- if (addr & 31)
- if (addr != ALIGN(addr, ARCH_DMA_MINALIGN)) debug("EHCI-HCD: Misaligned buffer address (%p)\n", buf);
[...]
Best regards, Marek Vasut

usb_get_descriptor passes it's buffer argument directly to usb_control_msg() so it has to be properly aligned/padded.
Signed-off-by: Ilya Yanok ilya.yanok@cogentembedded.com --- common/usb.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/common/usb.c b/common/usb.c index c80155c..46f4741 100644 --- a/common/usb.c +++ b/common/usb.c @@ -799,12 +799,13 @@ int usb_new_device(struct usb_device *dev) dev->epmaxpacketin[0] = 8; dev->epmaxpacketout[0] = 8;
- err = usb_get_descriptor(dev, USB_DT_DEVICE, 0, &dev->descriptor, 8); + err = usb_get_descriptor(dev, USB_DT_DEVICE, 0, tmpbuf, 8); if (err < 8) { printf("\n USB device not responding, " \ "giving up (status=%lX)\n", dev->status); return 1; } + memcpy(&dev->descriptor, tmpbuf, 8); #else /* This is a Windows scheme of initialization sequence, with double * reset of the device (Linux uses the same sequence) @@ -893,7 +894,7 @@ int usb_new_device(struct usb_device *dev) tmp = sizeof(dev->descriptor);
err = usb_get_descriptor(dev, USB_DT_DEVICE, 0, - &dev->descriptor, sizeof(dev->descriptor)); + tmpbuf, sizeof(dev->descriptor)); if (err < tmp) { if (err < 0) printf("unable to get device descriptor (error=%d)\n", @@ -903,6 +904,7 @@ int usb_new_device(struct usb_device *dev) "(expected %i, got %i)\n", tmp, err); return 1; } + memcpy(&dev->descriptor, tmpbuf, sizeof(dev->descriptor)); /* correct le values */ le16_to_cpus(&dev->descriptor.bcdUSB); le16_to_cpus(&dev->descriptor.idVendor);

Return values of submit_{control,bulk}_msg() functions should be checked to detect possible error.
Signed-off-by: Ilya Yanok ilya.yanok@cogentembedded.com --- common/usb.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/common/usb.c b/common/usb.c index 46f4741..1b40228 100644 --- a/common/usb.c +++ b/common/usb.c @@ -188,7 +188,8 @@ int usb_control_msg(struct usb_device *dev, unsigned int pipe, request, requesttype, value, index, size); dev->status = USB_ST_NOT_PROC; /*not yet processed */
- submit_control_msg(dev, pipe, data, size, setup_packet); + if (submit_control_msg(dev, pipe, data, size, setup_packet) < 0) + return -1; if (timeout == 0) return (int)size;
@@ -220,7 +221,8 @@ int usb_bulk_msg(struct usb_device *dev, unsigned int pipe, if (len < 0) return -1; dev->status = USB_ST_NOT_PROC; /*not yet processed */ - submit_bulk_msg(dev, pipe, data, len); + if (submit_bulk_msg(dev, pipe, data, len) < 0) + return -1; while (timeout--) { if (!((volatile unsigned long)dev->status & USB_ST_NOT_PROC)) break;

Printing message could be useful if something goes really wrong.
Signed-off-by: Ilya Yanok ilya.yanok@cogentembedded.com --- drivers/usb/host/ehci-hcd.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c index a6cd5e3..52df7fa 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -196,7 +196,7 @@ static int ehci_td_buffer(struct qTD *td, void *buf, size_t sz) }
if (idx == 5) { - debug("out of buffer pointers (%u bytes left)\n", sz); + printf("out of buffer pointers (%u bytes left)\n", sz); return -1; }
@@ -282,7 +282,7 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, (0 << 15) | (0 << 12) | (3 << 10) | (2 << 8) | (0x80 << 0); qtd[qtd_counter].qt_token = cpu_to_hc32(token); if (ehci_td_buffer(&qtd[qtd_counter], req, sizeof(*req)) != 0) { - debug("unable construct SETUP td\n"); + printf("unable construct SETUP td\n"); goto fail; } /* Update previous qTD! */ @@ -311,7 +311,7 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, ((usb_pipein(pipe) ? 1 : 0) << 8) | (0x80 << 0); qtd[qtd_counter].qt_token = cpu_to_hc32(token); if (ehci_td_buffer(&qtd[qtd_counter], buffer, length) != 0) { - debug("unable construct DATA td\n"); + printf("unable construct DATA td\n"); goto fail; } /* Update previous qTD! */

Align buffers passed to the USB code to cache line size so they can be DMAed safely.
Signed-off-by: Ilya Yanok ilya.yanok@cogentembedded.com --- drivers/usb/eth/smsc95xx.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/eth/smsc95xx.c b/drivers/usb/eth/smsc95xx.c index c7aebea..c62a8c1 100644 --- a/drivers/usb/eth/smsc95xx.c +++ b/drivers/usb/eth/smsc95xx.c @@ -153,13 +153,15 @@ static int curr_eth_dev; /* index for name of next device detected */ static int smsc95xx_write_reg(struct ueth_data *dev, u32 index, u32 data) { int len; + ALLOC_CACHE_ALIGN_BUFFER(u32, tmpbuf, 1);
cpu_to_le32s(&data); + tmpbuf[0] = data;
len = usb_control_msg(dev->pusb_dev, usb_sndctrlpipe(dev->pusb_dev, 0), USB_VENDOR_REQUEST_WRITE_REGISTER, USB_DIR_OUT | USB_TYPE_VENDOR | USB_RECIP_DEVICE, - 00, index, &data, sizeof(data), USB_CTRL_SET_TIMEOUT); + 00, index, tmpbuf, sizeof(data), USB_CTRL_SET_TIMEOUT); if (len != sizeof(data)) { debug("smsc95xx_write_reg failed: index=%d, data=%d, len=%d", index, data, len); @@ -171,11 +173,13 @@ static int smsc95xx_write_reg(struct ueth_data *dev, u32 index, u32 data) static int smsc95xx_read_reg(struct ueth_data *dev, u32 index, u32 *data) { int len; + ALLOC_CACHE_ALIGN_BUFFER(u32, tmpbuf, 1);
len = usb_control_msg(dev->pusb_dev, usb_rcvctrlpipe(dev->pusb_dev, 0), USB_VENDOR_REQUEST_READ_REGISTER, USB_DIR_IN | USB_TYPE_VENDOR | USB_RECIP_DEVICE, - 00, index, data, sizeof(data), USB_CTRL_GET_TIMEOUT); + 00, index, tmpbuf, sizeof(data), USB_CTRL_GET_TIMEOUT); + *data = tmpbuf[0]; if (len != sizeof(data)) { debug("smsc95xx_read_reg failed: index=%d, len=%d", index, len); @@ -664,7 +668,8 @@ static int smsc95xx_send(struct eth_device *eth, void* packet, int length) int actual_len; u32 tx_cmd_a; u32 tx_cmd_b; - unsigned char msg[PKTSIZE + sizeof(tx_cmd_a) + sizeof(tx_cmd_b)]; + ALLOC_CACHE_ALIGN_BUFFER(unsigned char, msg, + PKTSIZE + sizeof(tx_cmd_a) + sizeof(tx_cmd_b));
debug("** %s(), len %d, buf %#x\n", __func__, length, (int)msg); if (length > PKTSIZE) @@ -695,7 +700,7 @@ static int smsc95xx_send(struct eth_device *eth, void* packet, int length) static int smsc95xx_recv(struct eth_device *eth) { struct ueth_data *dev = (struct ueth_data *)eth->priv; - static unsigned char recv_buf[AX_RX_URB_SIZE]; + DEFINE_CACHE_ALIGN_BUFFER(unsigned char, recv_buf, AX_RX_URB_SIZE); unsigned char *buf_ptr; int err; int actual_len;
participants (3)
-
Ilya Yanok
-
Marek Vasut
-
Marek Vasut