[U-Boot] [PATCH 0/6] usb: ci_udc: fixes and cleanups

From: Stephen Warren swarren@nvidia.com
This is a series of small fixes and cleanups either required by those fixes, or enabled now that the fixes are made.
I hope that either patch 1 or 4 might fix the issues Jörg is seeing, but I'm not sure that will happen. The other patches shouldn't change any behaviour.
Stephen Warren (6): usb: ci_udc: fix ci_flush_{qh,qtd} calls in ci_udc_probe() usb: ci_udc: don't assume QTDs are adjacent when transmitting ZLPs usb: ci_udc: lift ilist size calculations to global scope usb: ci_udc: fix items array size/stride calculation usb: ci_udc: remove controller.items array usb: ci_udc: don't memalign() struct ci_req allocations
drivers/usb/gadget/ci_udc.c | 62 ++++++++++++++++++++++----------------------- drivers/usb/gadget/ci_udc.h | 1 - 2 files changed, 30 insertions(+), 33 deletions(-)

From: Stephen Warren swarren@nvidia.com
ci_udc_probe() initializes a pair of QHs and QTDs for each EP. After each pair has been initialized, the pair is cache-flushed. The conversion from QH/QTD index [0..2*NUM_END_POINTS) to EP index [0..NUM_ENDPOINTS] is incorrect; it simply subtracts 1 (which yields the QH/QTD index of the first entry in the pair) rather than dividing by two (which scales the range). Fix this.
On my system, this avoids cache debug prints due to requests to flush unaligned ranges. This is caused because the flush calls happen before the items[] array entries are initialized for all but EP0.
Signed-off-by: Stephen Warren swarren@nvidia.com --- drivers/usb/gadget/ci_udc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/gadget/ci_udc.c b/drivers/usb/gadget/ci_udc.c index a6433e8d2d8d..8ba604841c47 100644 --- a/drivers/usb/gadget/ci_udc.c +++ b/drivers/usb/gadget/ci_udc.c @@ -834,8 +834,8 @@ static int ci_udc_probe(void) controller.items[i] = (struct ept_queue_item *)imem;
if (i & 1) { - ci_flush_qh(i - 1); - ci_flush_qtd(i - 1); + ci_flush_qh(i / 2); + ci_flush_qtd(i / 2); } }

From: Stephen Warren swarren@nvidia.com
Fix ci_ep_submit_next_request()'s ZLP transmission code to explicitly call ci_get_qtd() to find the address of the other QTD to use. This will allow us to correctly align each QTD individually in the future, which may involve leaving a gap between the QTDs.
Signed-off-by: Stephen Warren swarren@nvidia.com --- drivers/usb/gadget/ci_udc.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/usb/gadget/ci_udc.c b/drivers/usb/gadget/ci_udc.c index 8ba604841c47..4115cd5dab0f 100644 --- a/drivers/usb/gadget/ci_udc.c +++ b/drivers/usb/gadget/ci_udc.c @@ -404,10 +404,11 @@ static void ci_ep_submit_next_request(struct ci_ep *ci_ep) * only 1 is used at a time since either an IN or an OUT but * not both is queued. For an IN transaction, item currently * points at the second of these items, so we know that we - * can use (item - 1) to transmit the extra zero-length packet + * can use the other to transmit the extra zero-length packet. */ - item->next = (unsigned)(item - 1); - item--; + struct ept_queue_item *other_item = ci_get_qtd(num, 0); + item->next = (unsigned)other_item; + item = other_item; item->info = INFO_ACTIVE; }

From: Stephen Warren swarren@nvidia.com
This will allow functions other than ci_udc_probe() to make use of the constants in a future change.
This in turn requires converting the const int variables to #defines, since the initialization of one global const int can't depend on the value of another const int; the compiler thinks it's non-constant if that dependency exists.
Signed-off-by: Stephen Warren swarren@nvidia.com --- drivers/usb/gadget/ci_udc.c | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-)
diff --git a/drivers/usb/gadget/ci_udc.c b/drivers/usb/gadget/ci_udc.c index 4115cd5dab0f..3a114cf11e17 100644 --- a/drivers/usb/gadget/ci_udc.c +++ b/drivers/usb/gadget/ci_udc.c @@ -34,6 +34,17 @@ #error This driver can not work on systems with caches longer than 128b #endif
+/* + * Each qTD item must be 32-byte aligned, each qTD touple must be + * cacheline aligned. There are two qTD items for each endpoint and + * only one of them is used for the endpoint at time, so we can group + * them together. + */ +#define ILIST_ALIGN roundup(ARCH_DMA_MINALIGN, 32) +#define ILIST_ENT_RAW_SZ (2 * sizeof(struct ept_queue_item)) +#define ILIST_ENT_SZ roundup(ILIST_ENT_RAW_SZ, ARCH_DMA_MINALIGN) +#define ILIST_SZ (NUM_ENDPOINTS * ILIST_ENT_SZ) + #ifndef DEBUG #define DBG(x...) do {} while (0) #else @@ -786,29 +797,18 @@ static int ci_udc_probe(void) const int eplist_raw_sz = num * sizeof(struct ept_queue_head); const int eplist_sz = roundup(eplist_raw_sz, ARCH_DMA_MINALIGN);
- const int ilist_align = roundup(ARCH_DMA_MINALIGN, 32); - const int ilist_ent_raw_sz = 2 * sizeof(struct ept_queue_item); - const int ilist_ent_sz = roundup(ilist_ent_raw_sz, ARCH_DMA_MINALIGN); - const int ilist_sz = NUM_ENDPOINTS * ilist_ent_sz; - /* The QH list must be aligned to 4096 bytes. */ controller.epts = memalign(eplist_align, eplist_sz); if (!controller.epts) return -ENOMEM; memset(controller.epts, 0, eplist_sz);
- /* - * Each qTD item must be 32-byte aligned, each qTD touple must be - * cacheline aligned. There are two qTD items for each endpoint and - * only one of them is used for the endpoint at time, so we can group - * them together. - */ - controller.items_mem = memalign(ilist_align, ilist_sz); + controller.items_mem = memalign(ILIST_ALIGN, ILIST_SZ); if (!controller.items_mem) { free(controller.epts); return -ENOMEM; } - memset(controller.items_mem, 0, ilist_sz); + memset(controller.items_mem, 0, ILIST_SZ);
for (i = 0; i < 2 * NUM_ENDPOINTS; i++) { /* @@ -828,7 +828,7 @@ static int ci_udc_probe(void) head->next = TERMINATE; head->info = 0;
- imem = controller.items_mem + ((i >> 1) * ilist_ent_sz); + imem = controller.items_mem + ((i >> 1) * ILIST_ENT_SZ); if (i & 1) imem += sizeof(struct ept_queue_item);

From: Stephen Warren swarren@nvidia.com
2 QTDs are allocated for each EP. The current allocation scheme aligns the first QTD in each pair, but simply adds the struct size to calculate the second QTD's address. This will result in a non-cache-aligned addresss IF the system's ARCH_DMA_MINALIGN is not 32 bytes (i.e. the size of struct ept_queue_item).
Similarly, the original ilist_ent_sz calculation aligned the value to ARCH_DMA_MINALIGN but didn't take the USB HW's 32-byte alignment requirement into account. This doesn't cause a practical issue unless ARCH_DMA_MINALIGN < 32 (which I suspect is quite unlikely), but we may as well fix the code to be explicit, so it's obviously completely correct.
The new value of ILIST_ENT_SZ takes all alignment requirements into account, so we can simplify ci_{flush,invalidate}_qtd() by simply using that macro rather than calling roundup().
Similarly, the calculation of controller.items[i] can be simplified, since each QTD is evenly spaced at its individual alignment requirement, rather than each pair being aligned, and entries within the pair being spaced apart only by structure size.
Signed-off-by: Stephen Warren swarren@nvidia.com --- drivers/usb/gadget/ci_udc.c | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-)
diff --git a/drivers/usb/gadget/ci_udc.c b/drivers/usb/gadget/ci_udc.c index 3a114cf11e17..abaf8985503f 100644 --- a/drivers/usb/gadget/ci_udc.c +++ b/drivers/usb/gadget/ci_udc.c @@ -35,15 +35,20 @@ #endif
/* - * Each qTD item must be 32-byte aligned, each qTD touple must be - * cacheline aligned. There are two qTD items for each endpoint and - * only one of them is used for the endpoint at time, so we can group - * them together. + * Every QTD must be individually aligned, since we can program any + * QTD's address into HW. Cache flushing requires ARCH_DMA_MINALIGN, + * and the USB HW requires 32-byte alignment. Align to both: */ #define ILIST_ALIGN roundup(ARCH_DMA_MINALIGN, 32) -#define ILIST_ENT_RAW_SZ (2 * sizeof(struct ept_queue_item)) -#define ILIST_ENT_SZ roundup(ILIST_ENT_RAW_SZ, ARCH_DMA_MINALIGN) -#define ILIST_SZ (NUM_ENDPOINTS * ILIST_ENT_SZ) +/* Each QTD is this size */ +#define ILIST_ENT_RAW_SZ sizeof(struct ept_queue_item) +/* + * Align the size of the QTD too, so we can add this value to each + * QTD's address to get another aligned address. + */ +#define ILIST_ENT_SZ roundup(ILIST_ENT_RAW_SZ, ILIST_ALIGN) +/* For each endpoint, we need 2 QTDs, one for each of IN and OUT */ +#define ILIST_SZ (NUM_ENDPOINTS * 2 * ILIST_ENT_SZ)
#ifndef DEBUG #define DBG(x...) do {} while (0) @@ -184,8 +189,7 @@ static void ci_flush_qtd(int ep_num) { struct ept_queue_item *item = ci_get_qtd(ep_num, 0); const uint32_t start = (uint32_t)item; - const uint32_t end_raw = start + 2 * sizeof(*item); - const uint32_t end = roundup(end_raw, ARCH_DMA_MINALIGN); + const uint32_t end = start + 2 * ILIST_ENT_SZ;
flush_dcache_range(start, end); } @@ -200,8 +204,7 @@ static void ci_invalidate_qtd(int ep_num) { struct ept_queue_item *item = ci_get_qtd(ep_num, 0); const uint32_t start = (uint32_t)item; - const uint32_t end_raw = start + 2 * sizeof(*item); - const uint32_t end = roundup(end_raw, ARCH_DMA_MINALIGN); + const uint32_t end = start + 2 * ILIST_ENT_SZ;
invalidate_dcache_range(start, end); } @@ -828,10 +831,7 @@ static int ci_udc_probe(void) head->next = TERMINATE; head->info = 0;
- imem = controller.items_mem + ((i >> 1) * ILIST_ENT_SZ); - if (i & 1) - imem += sizeof(struct ept_queue_item); - + imem = controller.items_mem + (i * ILIST_ENT_SZ); controller.items[i] = (struct ept_queue_item *)imem;
if (i & 1) {

From: Stephen Warren swarren@nvidia.com
There's no need to store an array of QTD pointers in the controller. Since the calculation is so simple, just have ci_get_qtd() perform it at run-time, rather than pre-calculating everything.
Signed-off-by: Stephen Warren swarren@nvidia.com --- drivers/usb/gadget/ci_udc.c | 8 +++----- drivers/usb/gadget/ci_udc.h | 1 - 2 files changed, 3 insertions(+), 6 deletions(-)
diff --git a/drivers/usb/gadget/ci_udc.c b/drivers/usb/gadget/ci_udc.c index abaf8985503f..89138675c32f 100644 --- a/drivers/usb/gadget/ci_udc.c +++ b/drivers/usb/gadget/ci_udc.c @@ -146,7 +146,9 @@ static struct ept_queue_head *ci_get_qh(int ep_num, int dir_in) */ static struct ept_queue_item *ci_get_qtd(int ep_num, int dir_in) { - return controller.items[(ep_num * 2) + dir_in]; + int index = (ep_num * 2) + dir_in; + uint8_t *imem = controller.items_mem + (index * ILIST_ENT_SZ); + return (struct ept_queue_item *)imem; }
/** @@ -790,7 +792,6 @@ static int ci_pullup(struct usb_gadget *gadget, int is_on) static int ci_udc_probe(void) { struct ept_queue_head *head; - uint8_t *imem; int i;
const int num = 2 * NUM_ENDPOINTS; @@ -831,9 +832,6 @@ static int ci_udc_probe(void) head->next = TERMINATE; head->info = 0;
- imem = controller.items_mem + (i * ILIST_ENT_SZ); - controller.items[i] = (struct ept_queue_item *)imem; - if (i & 1) { ci_flush_qh(i / 2); ci_flush_qtd(i / 2); diff --git a/drivers/usb/gadget/ci_udc.h b/drivers/usb/gadget/ci_udc.h index c2144021e675..346164a64132 100644 --- a/drivers/usb/gadget/ci_udc.h +++ b/drivers/usb/gadget/ci_udc.h @@ -102,7 +102,6 @@ struct ci_drv { struct usb_gadget_driver *driver; struct ehci_ctrl *ctrl; struct ept_queue_head *epts; - struct ept_queue_item *items[2 * NUM_ENDPOINTS]; uint8_t *items_mem; struct ci_ep ep[NUM_ENDPOINTS]; };

From: Stephen Warren swarren@nvidia.com
struct ci_req is a purely software structure, and needs no specific memory alignment. Hence, allocate it with calloc() rather than memalign(). The use of memalign() was left-over from when struct ci_req was going to hold the aligned bounce buffer, but this is now dynamically allocated.
Signed-off-by: Stephen Warren swarren@nvidia.com --- drivers/usb/gadget/ci_udc.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/usb/gadget/ci_udc.c b/drivers/usb/gadget/ci_udc.c index 89138675c32f..b8c36523eb1d 100644 --- a/drivers/usb/gadget/ci_udc.c +++ b/drivers/usb/gadget/ci_udc.c @@ -222,12 +222,11 @@ ci_ep_alloc_request(struct usb_ep *ep, unsigned int gfp_flags) if (num == 0 && controller.ep0_req) return &controller.ep0_req->req;
- ci_req = memalign(ARCH_DMA_MINALIGN, sizeof(*ci_req)); + ci_req = calloc(1, sizeof(*ci_req)); if (!ci_req) return NULL;
INIT_LIST_HEAD(&ci_req->queue); - ci_req->b_buf = 0;
if (num == 0) controller.ep0_req = ci_req;

On 07/01/2014 07:41 PM, Stephen Warren wrote:
From: Stephen Warren swarren@nvidia.com
This is a series of small fixes and cleanups either required by those fixes, or enabled now that the fixes are made.
I hope that either patch 1 or 4 might fix the issues Jörg is seeing, but I'm not sure that will happen. The other patches shouldn't change any behaviour.
Stephen Warren (6): usb: ci_udc: fix ci_flush_{qh,qtd} calls in ci_udc_probe() usb: ci_udc: don't assume QTDs are adjacent when transmitting ZLPs usb: ci_udc: lift ilist size calculations to global scope usb: ci_udc: fix items array size/stride calculation usb: ci_udc: remove controller.items array usb: ci_udc: don't memalign() struct ci_req allocations
drivers/usb/gadget/ci_udc.c | 62 ++++++++++++++++++++++----------------------- drivers/usb/gadget/ci_udc.h | 1 - 2 files changed, 30 insertions(+), 33 deletions(-)
Good news! The last patch usb: ci_udc: don't memalign() struct ci_req allocations removes the timeout error after starting the fourth run of tftp in a row.
This is how I tested. Checked out u-boot-usb/master branch. Applied the necessary patches to support our board. Applied the patches step after step. After applying a patch reset the board and run tftp from console until an error occured, which is always the fourth run. This is the case until applying patch usb: ci_udc: don't memalign() struct ci_req allocations, which throws no timeout error within running tftp about 60 times in a row.

On 07/01/2014 03:25 PM, Jörg Krause wrote:
On 07/01/2014 07:41 PM, Stephen Warren wrote:
From: Stephen Warren swarren@nvidia.com
This is a series of small fixes and cleanups either required by those fixes, or enabled now that the fixes are made.
I hope that either patch 1 or 4 might fix the issues Jörg is seeing, but I'm not sure that will happen. The other patches shouldn't change any behaviour.
Stephen Warren (6): usb: ci_udc: fix ci_flush_{qh,qtd} calls in ci_udc_probe() usb: ci_udc: don't assume QTDs are adjacent when transmitting ZLPs usb: ci_udc: lift ilist size calculations to global scope usb: ci_udc: fix items array size/stride calculation usb: ci_udc: remove controller.items array usb: ci_udc: don't memalign() struct ci_req allocations
drivers/usb/gadget/ci_udc.c | 62 ++++++++++++++++++++++----------------------- drivers/usb/gadget/ci_udc.h | 1 - 2 files changed, 30 insertions(+), 33 deletions(-)
Good news! The last patch usb: ci_udc: don't memalign() struct ci_req allocations removes the timeout error after starting the fourth run of tftp in a row.
This is how I tested. Checked out u-boot-usb/master branch. Applied the necessary patches to support our board. Applied the patches step after step. After applying a patch reset the board and run tftp from console until an error occured, which is always the fourth run.
How many times did you boot the board for each patch? I'm more interested in how often the first TFTP after a reset/power-on passes or fails, so it would be nice to boot the board many times to see what happens to the first TFTP invocation too. This is especially true since you'd indicated before that the problem was (at least sometimes) visible on the first TFTP invocation, and this "it fails the fourth time" symptom is something completely new.
This is the case until applying patch usb: ci_udc: don't memalign() struct ci_req allocations, which throws no timeout error within running tftp about 60 times in a row.
That's quite odd. That patch definitely should not affect behaviour (and indeed I only sent it as an after-thought). If it does, then the only explanation I can think of is that the malloc heap's alignment changed, which just happens to hide the bug you're seeing. No doubt, there is still some lingering cache-flushing bug or similar...
BTW, did you fix the U-Boot header files in your board support patches, so that everything correctly knows that the cache line size is 32? I think it's mandatory to fix that issue before testing the USB code.

On 07/01/2014 03:31 PM, Stephen Warren wrote:
On 07/01/2014 03:25 PM, Jörg Krause wrote:
On 07/01/2014 07:41 PM, Stephen Warren wrote:
From: Stephen Warren swarren@nvidia.com
This is a series of small fixes and cleanups either required by those fixes, or enabled now that the fixes are made.
I hope that either patch 1 or 4 might fix the issues Jörg is seeing, but I'm not sure that will happen. The other patches shouldn't change any behaviour.
Stephen Warren (6): usb: ci_udc: fix ci_flush_{qh,qtd} calls in ci_udc_probe() usb: ci_udc: don't assume QTDs are adjacent when transmitting ZLPs usb: ci_udc: lift ilist size calculations to global scope usb: ci_udc: fix items array size/stride calculation usb: ci_udc: remove controller.items array usb: ci_udc: don't memalign() struct ci_req allocations
drivers/usb/gadget/ci_udc.c | 62 ++++++++++++++++++++++----------------------- drivers/usb/gadget/ci_udc.h | 1 - 2 files changed, 30 insertions(+), 33 deletions(-)
Good news! The last patch usb: ci_udc: don't memalign() struct ci_req allocations removes the timeout error after starting the fourth run of tftp in a row.
This is how I tested. Checked out u-boot-usb/master branch. Applied the necessary patches to support our board. Applied the patches step after step. After applying a patch reset the board and run tftp from console until an error occured, which is always the fourth run.
How many times did you boot the board for each patch? I'm more interested in how often the first TFTP after a reset/power-on passes or fails, so it would be nice to boot the board many times to see what happens to the first TFTP invocation too. This is especially true since you'd indicated before that the problem was (at least sometimes) visible on the first TFTP invocation, and this "it fails the fourth time" symptom is something completely new.
This is the case until applying patch usb: ci_udc: don't memalign() struct ci_req allocations, which throws no timeout error within running tftp about 60 times in a row.
That's quite odd. That patch definitely should not affect behaviour (and indeed I only sent it as an after-thought). If it does, then the only explanation I can think of is that the malloc heap's alignment changed, which just happens to hide the bug you're seeing. No doubt, there is still some lingering cache-flushing bug or similar...
BTW, did you fix the U-Boot header files in your board support patches, so that everything correctly knows that the cache line size is 32? I think it's mandatory to fix that issue before testing the USB code.
Can you please try one more thing:
Go back to u-boot-usb/master plus just your board support patches. Apply the following and test that:
diff --git a/drivers/usb/gadget/ci_udc.c b/drivers/usb/gadget/ci_udc.c index a6433e8d2d8d..13be1b73d3d1 100644 --- a/drivers/usb/gadget/ci_udc.c +++ b/drivers/usb/gadget/ci_udc.c @@ -209,9 +209,9 @@ ci_ep_alloc_request(struct usb_ep *ep, unsigned int gfp_flags) ci_req = memalign(ARCH_DMA_MINALIGN, sizeof(*ci_req)); if (!ci_req) return NULL;
memset(ci_req, 0, sizeof(*ci_req)); INIT_LIST_HEAD(&ci_req->queue);
ci_req->b_buf = 0; if (num == 0) controller.ep0_req = ci_req;
Thanks.

On 07/01/2014 11:36 PM, Stephen Warren wrote:
[snip] Can you please try one more thing:
Go back to u-boot-usb/master plus just your board support patches. Apply the following and test that:
diff --git a/drivers/usb/gadget/ci_udc.c b/drivers/usb/gadget/ci_udc.c index a6433e8d2d8d..13be1b73d3d1 100644 --- a/drivers/usb/gadget/ci_udc.c +++ b/drivers/usb/gadget/ci_udc.c @@ -209,9 +209,9 @@ ci_ep_alloc_request(struct usb_ep *ep, unsigned int gfp_flags) ci_req = memalign(ARCH_DMA_MINALIGN, sizeof(*ci_req)); if (!ci_req) return NULL;
memset(ci_req, 0, sizeof(*ci_req)); INIT_LIST_HEAD(&ci_req->queue);
ci_req->b_buf = 0; if (num == 0) controller.ep0_req = ci_req;
Thanks.
Applied and tested with printf in cache.c enabled. ttp runs more than three times in row without a timeout error.
=> reset resetting ... HTLLCLLC
U-Boot 2014.07-rc3-g88eca85-dirty (Jul 02 2014 - 00:39:20)
CPU: Freescale i.MX28 rev1.2 at 454 MHz BOOT: NAND, 3V3 DRAM: 64 MiB NAND: 128 MiB In: serial Out: serial Err: serial Net: usb_ether [PRIME] Hit any key to stop autoboot: 0 => tftp imx28-airlino.dtb using ci_udc, OUT ep- IN ep- STATUS ep- MAC 00:19:b8:00:00:02 HOST MAC 00:19:b8:00:00:01 high speed config #1: 2 mA, Ethernet Gadget, using CDC Ethernet USB network up! Using usb_ether device TFTP from server 10.0.0.1; our IP address is 10.0.0.2 Filename 'imx28-airlino.dtb'. Load address: 0x40008000 Loading: ## 2.1 MiB/s done Bytes transferred = 18003 (4653 hex) CACHE: Misaligned operation at range [40008000, 4000c653] => CACHE: Misaligned operation at range [43fd0b08, 43fd0b60] CACHE: Misaligned operation at range [43fd0b14, 43fd0b60] using ci_udc, OUT ep- IN ep- STATUS ep- MAC 00:19:b8:00:00:02 HOST MAC 00:19:b8:00:00:01 high speed config #1: 2 mA, Ethernet Gadget, using CDC Ethernet USB network up! Using usb_ether device TFTP from server 10.0.0.1; our IP address is 10.0.0.2 Filename 'imx28-airlino.dtb'. Load address: 0x40008000 Loading: ## 3.4 MiB/s done Bytes transferred = 18003 (4653 hex) CACHE: Misaligned operation at range [40008000, 4000c653] => CACHE: Misaligned operation at range [43fd0b08, 43fd0b60] CACHE: Misaligned operation at range [43fd0b14, 43fd0b60] using ci_udc, OUT ep- IN ep- STATUS ep- MAC 00:19:b8:00:00:02 HOST MAC 00:19:b8:00:00:01 high speed config #1: 2 mA, Ethernet Gadget, using CDC Ethernet USB network up! Using usb_ether device TFTP from server 10.0.0.1; our IP address is 10.0.0.2 Filename 'imx28-airlino.dtb'. Load address: 0x40008000 Loading: ## 4.3 MiB/s done Bytes transferred = 18003 (4653 hex) CACHE: Misaligned operation at range [40008000, 4000c653] => CACHE: Misaligned operation at range [43fd0b08, 43fd0b60] CACHE: Misaligned operation at range [43fd0b14, 43fd0b60] using ci_udc, OUT ep- IN ep- STATUS ep- MAC 00:19:b8:00:00:02 HOST MAC 00:19:b8:00:00:01 high speed config #1: 2 mA, Ethernet Gadget, using CDC Ethernet USB network up! Using usb_ether device TFTP from server 10.0.0.1; our IP address is 10.0.0.2 Filename 'imx28-airlino.dtb'. Load address: 0x40008000 Loading: ## 5.7 MiB/s done Bytes transferred = 18003 (4653 hex) CACHE: Misaligned operation at range [40008000, 4000c653] => CACHE: Misaligned operation at range [43fd0b08, 43fd0b60] CACHE: Misaligned operation at range [43fd0b14, 43fd0b60] using ci_udc, OUT ep- IN ep- STATUS ep- MAC 00:19:b8:00:00:02 HOST MAC 00:19:b8:00:00:01 high speed config #1: 2 mA, Ethernet Gadget, using CDC Ethernet USB network up! Using usb_ether device TFTP from server 10.0.0.1; our IP address is 10.0.0.2 Filename 'imx28-airlino.dtb'. Load address: 0x40008000 Loading: ## 4.3 MiB/s done Bytes transferred = 18003 (4653 hex) CACHE: Misaligned operation at range [40008000, 4000c653]

On Wednesday, July 02, 2014 at 12:42:40 AM, Jörg Krause wrote:
On 07/01/2014 11:36 PM, Stephen Warren wrote:
[snip] Can you please try one more thing:
Go back to u-boot-usb/master plus just your board support patches. Apply
the following and test that:
diff --git a/drivers/usb/gadget/ci_udc.c b/drivers/usb/gadget/ci_udc.c index a6433e8d2d8d..13be1b73d3d1 100644 --- a/drivers/usb/gadget/ci_udc.c +++ b/drivers/usb/gadget/ci_udc.c @@ -209,9 +209,9 @@ ci_ep_alloc_request(struct usb_ep *ep, unsigned int gfp_flags)
ci_req = memalign(ARCH_DMA_MINALIGN, sizeof(*ci_req)); if (!ci_req) return NULL;
memset(ci_req, 0, sizeof(*ci_req)); INIT_LIST_HEAD(&ci_req->queue);
ci_req->b_buf = 0; if (num == 0) controller.ep0_req = ci_req;
Thanks.
Applied and tested with printf in cache.c enabled. ttp runs more than three times in row without a timeout error.
It might really be worth if you ran git log --oneline and showed us exactly which patches were applied for each particular test. I am really getting lost sometimes ...
=> reset resetting ... HTLLCLLC U-Boot 2014.07-rc3-g88eca85-dirty (Jul 02 2014 - 00:39:20) CPU: Freescale i.MX28 rev1.2 at 454 MHz BOOT: NAND, 3V3 DRAM: 64 MiB NAND: 128 MiB In: serial Out: serial Err: serial Net: usb_ether [PRIME] Hit any key to stop autoboot: 0 => tftp imx28-airlino.dtb using ci_udc, OUT ep- IN ep- STATUS ep- MAC 00:19:b8:00:00:02 HOST MAC 00:19:b8:00:00:01 high speed config #1: 2 mA, Ethernet Gadget, using CDC Ethernet USB network up! Using usb_ether device TFTP from server 10.0.0.1; our IP address is 10.0.0.2 Filename 'imx28-airlino.dtb'. Load address: 0x40008000 Loading: ## 2.1 MiB/s done Bytes transferred = 18003 (4653 hex) CACHE: Misaligned operation at range [40008000, 4000c653]
This is a bug somewhere.
=> CACHE: Misaligned operation at range [43fd0b08, 43fd0b60]
This is a bug.
CACHE: Misaligned operation at range [43fd0b14, 43fd0b60]
This is a bug.
It would be nice to figure out where these unaligned accesses come from.

On 07/01/2014 04:42 PM, Jörg Krause wrote:
On 07/01/2014 11:36 PM, Stephen Warren wrote:
[snip] Can you please try one more thing:
Go back to u-boot-usb/master plus just your board support patches. Apply the following and test that:
diff --git a/drivers/usb/gadget/ci_udc.c b/drivers/usb/gadget/ci_udc.c index a6433e8d2d8d..13be1b73d3d1 100644 --- a/drivers/usb/gadget/ci_udc.c +++ b/drivers/usb/gadget/ci_udc.c @@ -209,9 +209,9 @@ ci_ep_alloc_request(struct usb_ep *ep, unsigned int gfp_flags) ci_req = memalign(ARCH_DMA_MINALIGN, sizeof(*ci_req)); if (!ci_req) return NULL;
memset(ci_req, 0, sizeof(*ci_req)); INIT_LIST_HEAD(&ci_req->queue);
ci_req->b_buf = 0; if (num == 0) controller.ep0_req = ci_req;
Thanks.
Applied and tested with printf in cache.c enabled. ttp runs more than three times in row without a timeout error.
Excellent, I guess that does make sense now.
An equivalent of that change is included in patch 6/6, so that explains why you saw it fix your problem.
s3c_udc also needs the change above, so I'll go send a patch for that.
Thanks for testing, and it's great that we got to the bottom of this.

On 07/01/2014 11:31 PM, Stephen Warren wrote:
On 07/01/2014 03:25 PM, Jörg Krause wrote:
On 07/01/2014 07:41 PM, Stephen Warren wrote:
From: Stephen Warren swarren@nvidia.com
This is a series of small fixes and cleanups either required by those fixes, or enabled now that the fixes are made.
I hope that either patch 1 or 4 might fix the issues Jörg is seeing, but I'm not sure that will happen. The other patches shouldn't change any behaviour.
Stephen Warren (6): usb: ci_udc: fix ci_flush_{qh,qtd} calls in ci_udc_probe() usb: ci_udc: don't assume QTDs are adjacent when transmitting ZLPs usb: ci_udc: lift ilist size calculations to global scope usb: ci_udc: fix items array size/stride calculation usb: ci_udc: remove controller.items array usb: ci_udc: don't memalign() struct ci_req allocations
drivers/usb/gadget/ci_udc.c | 62 ++++++++++++++++++++++----------------------- drivers/usb/gadget/ci_udc.h | 1 - 2 files changed, 30 insertions(+), 33 deletions(-)
Good news! The last patch usb: ci_udc: don't memalign() struct ci_req allocations removes the timeout error after starting the fourth run of tftp in a row.
This is how I tested. Checked out u-boot-usb/master branch. Applied the necessary patches to support our board. Applied the patches step after step. After applying a patch reset the board and run tftp from console until an error occured, which is always the fourth run.
How many times did you boot the board for each patch? I'm more interested in how often the first TFTP after a reset/power-on passes or fails, so it would be nice to boot the board many times to see what happens to the first TFTP invocation too. This is especially true since you'd indicated before that the problem was (at least sometimes) visible on the first TFTP invocation, and this "it fails the fourth time" symptom is something completely new.
The problem with the problems on the first run was before applying the patch usb: ci_udc: fix interaction with CONFIG_USB_ETH_CDC, which was not upstream on u-boot-imx branch at the moment of writing the error report. After applying the patch and setting the cachline size to 32, the first three runs of tftp works fine, but always the fourth time it fails.
This is the case until applying patch usb: ci_udc: don't memalign() struct ci_req allocations, which throws no timeout error within running tftp about 60 times in a row.
That's quite odd. That patch definitely should not affect behaviour (and indeed I only sent it as an after-thought). If it does, then the only explanation I can think of is that the malloc heap's alignment changed, which just happens to hide the bug you're seeing. No doubt, there is still some lingering cache-flushing bug or similar...
BTW, did you fix the U-Boot header files in your board support patches, so that everything correctly knows that the cache line size is 32? I think it's mandatory to fix that issue before testing the USB code.
Yes, I did this: #define CONFIG_SYS_CACHELINE_SIZE 32
participants (3)
-
Jörg Krause
-
Marek Vasut
-
Stephen Warren