[RFT PATCH v1 0/5] usb: Improve robustness of ehci-hcd controller operation

This patch set is rather a request for testing (and a starting point for the discussion), as it may improve the robustness of USB with some pendrives - and yes sacrifice some performance for reliability. The previous version of this patch: https://patchwork.ozlabs.org/patch/1244928/ fixed issue for some network USB adapters and improved stability on TI boards. This patch also provides very detailed explanation of the problem in the commit message.
With the async support patch applied ( SHA1: 02b0e1a36c5bc20174299312556ec4e266872bd6 ), the qhtoken variable has value 0x00 when token shows errors. As a result the error handling path is not executed. This looks like some missing/broken cache flushing - for easier bisecting this patch has been reverted for now
Test setup: =========== Data: VFAT on the pendrive 16MiB FitImage on USB pen drive (to be read in U-Boot) 400 MiB recovery image (but not read)
Pen drives: ----------- 1. ID 0930:6544 Toshiba Corp. TransMemory-Mini / Kingston DataTraveler 2.0 Stick 2. ID 21c4:8005 GOODRAM 8GB
HW: --- IMX6Q -> TPC70 board
Procedure: ---------- Boot U-Boot, execute: loadusb=usb start; fatload usb 0 ${loadaddr} ${upd_image} (Then the image has been crafted to OOPs and WDT after 4 seconds causes reset). When we fail - the "normal" execution path is followed and we boot up till prompt.
Results: ========
1. Current mainline - SHA1: 63b2ef407de2f0997deef3b54b7e7ab9c7a7cb27 - The USB error (EHCI timed out on TD - token=0x80008d80) is present after ~10 minutes - pendrive [1]
- Error after a few minutes (1,2) - pendrive [2]
2. When USB async support is reverted (commit SHA1: 02b0e1a36c5bc20174299312556ec4e266872bd6 [*]) - Error is detected after ~1h - pendrive [1]
3. With this patchset applied - 12h of testing - no error - pendrive [1]
With the pendrive [2] I do observe that very often it is not recognized at all. Even more strange - there is a difference in the reliability of being recognized between identical pendrives (used one vs. just unboxed one).
Lukasz Majewski (5): Revert "usb: ehci-hcd: Keep async schedule running" usb: Handle XACTERR error in DATA phase of USB storage usb: Add some delay to wait for slow USB devices to be operational usb: Provide code to handle spinup of USB usb devices (mostly HDDs) usb: Handle QT_TOKEN_STATUS_XACTERR error when sending data
common/usb.c | 10 ++++- common/usb_storage.c | 46 ++++++++++++++++++++++ drivers/usb/host/ehci-hcd.c | 78 ++++++++++++++++++++++++++----------- include/usb.h | 1 + include/usb_defs.h | 1 + 5 files changed, 112 insertions(+), 24 deletions(-)

This reverts commit 02b0e1a36c5bc20174299312556ec4e266872bd6.
Signed-off-by: Lukasz Majewski lukma@denx.de ---
drivers/usb/host/ehci-hcd.c | 51 ++++++++++++++++++++++--------------- 1 file changed, 30 insertions(+), 21 deletions(-)
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c index 1cc02052f5..0a77111f80 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -309,7 +309,7 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, volatile struct qTD *vtd; unsigned long ts; uint32_t *tdp; - uint32_t endpt, maxpacket, token, usbsts, qhtoken; + uint32_t endpt, maxpacket, token, usbsts; uint32_t c, toggle; uint32_t cmd; int timeout; @@ -553,21 +553,22 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, flush_dcache_range((unsigned long)qtd, ALIGN_END_ADDR(struct qTD, qtd, qtd_count));
+ /* Set async. queue head pointer. */ + ehci_writel(&ctrl->hcor->or_asynclistaddr, virt_to_phys(&ctrl->qh_list)); + usbsts = ehci_readl(&ctrl->hcor->or_usbsts); ehci_writel(&ctrl->hcor->or_usbsts, (usbsts & 0x3f));
/* Enable async. schedule. */ cmd = ehci_readl(&ctrl->hcor->or_usbcmd); - if (!(cmd & CMD_ASE)) { - cmd |= CMD_ASE; - ehci_writel(&ctrl->hcor->or_usbcmd, cmd); + cmd |= CMD_ASE; + ehci_writel(&ctrl->hcor->or_usbcmd, cmd);
- ret = handshake((uint32_t *)&ctrl->hcor->or_usbsts, STS_ASS, STS_ASS, - 100 * 1000); - if (ret < 0) { - printf("EHCI fail timeout STS_ASS set\n"); - goto fail; - } + ret = handshake((uint32_t *)&ctrl->hcor->or_usbsts, STS_ASS, STS_ASS, + 100 * 1000); + if (ret < 0) { + printf("EHCI fail timeout STS_ASS set\n"); + goto fail; }
/* Wait for TDs to be processed. */ @@ -588,11 +589,6 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, break; WATCHDOG_RESET(); } while (get_timer(ts) < timeout); - qhtoken = hc32_to_cpu(qh->qh_overlay.qt_token); - - ctrl->qh_list.qh_link = cpu_to_hc32(virt_to_phys(&ctrl->qh_list) | QH_LINK_TYPE_QH); - flush_dcache_range((unsigned long)&ctrl->qh_list, - ALIGN_END_ADDR(struct QH, &ctrl->qh_list, 1));
/* * Invalidate the memory area occupied by buffer @@ -611,12 +607,25 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, if (QT_TOKEN_GET_STATUS(token) & QT_TOKEN_STATUS_ACTIVE) printf("EHCI timed out on TD - token=%#x\n", token);
- if (!(QT_TOKEN_GET_STATUS(qhtoken) & QT_TOKEN_STATUS_ACTIVE)) { - debug("TOKEN=%#x\n", qhtoken); - switch (QT_TOKEN_GET_STATUS(qhtoken) & + /* Disable async schedule. */ + cmd = ehci_readl(&ctrl->hcor->or_usbcmd); + cmd &= ~CMD_ASE; + ehci_writel(&ctrl->hcor->or_usbcmd, cmd); + + ret = handshake((uint32_t *)&ctrl->hcor->or_usbsts, STS_ASS, 0, + 100 * 1000); + if (ret < 0) { + printf("EHCI fail timeout STS_ASS reset\n"); + goto fail; + } + + token = hc32_to_cpu(qh->qh_overlay.qt_token); + if (!(QT_TOKEN_GET_STATUS(token) & QT_TOKEN_STATUS_ACTIVE)) { + debug("TOKEN=%#x\n", token); + switch (QT_TOKEN_GET_STATUS(token) & ~(QT_TOKEN_STATUS_SPLITXSTATE | QT_TOKEN_STATUS_PERR)) { case 0: - toggle = QT_TOKEN_GET_DT(qhtoken); + toggle = QT_TOKEN_GET_DT(token); usb_settoggle(dev, usb_pipeendpoint(pipe), usb_pipeout(pipe), toggle); dev->status = 0; @@ -634,11 +643,11 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, break; default: dev->status = USB_ST_CRC_ERR; - if (QT_TOKEN_GET_STATUS(qhtoken) & QT_TOKEN_STATUS_HALTED) + if (QT_TOKEN_GET_STATUS(token) & QT_TOKEN_STATUS_HALTED) dev->status |= USB_ST_STALLED; break; } - dev->act_len = length - QT_TOKEN_GET_TOTALBYTES(qhtoken); + dev->act_len = length - QT_TOKEN_GET_TOTALBYTES(token); } else { dev->act_len = 0; #ifndef CONFIG_USB_EHCI_FARADAY

On 3/22/20 2:00 PM, Lukasz Majewski wrote:
This reverts commit 02b0e1a36c5bc20174299312556ec4e266872bd6.
Signed-off-by: Lukasz Majewski lukma@denx.de
This patch lacks any and all explanation why this is being reverted. The patch you are reverting here explains why it was added and what real issues it was fixing, so instead of reverting it, if there is an issue with that patch, you should identify the issue and fix it.

Hi Marek,
On 3/22/20 2:00 PM, Lukasz Majewski wrote:
This reverts commit 02b0e1a36c5bc20174299312556ec4e266872bd6.
Signed-off-by: Lukasz Majewski lukma@denx.de
This patch lacks any and all explanation why this is being reverted. The patch you are reverting here explains why it was added and what real issues it was fixing, so instead of reverting it, if there is an issue with that patch, you should identify the issue and fix it.
Marek, have you received the cover letter for this patch series?
In the cover letter I've written the rationale for reverting this patch.
In short - qhtoken has value of 0x0, when the token variable shows errors. As a result the error handling is broken. Could you comment on those arguments?
Moreover, I've explicitly stated that this is a Request For Testing like patch series with a detailed report of testing procedure (for my use case) for the USB in U-Boot (as Tom has tested the patch with some ETH dongles).
Best regards,
Lukasz Majewski
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de

On 3/23/20 7:57 AM, Lukasz Majewski wrote:
Hi Marek,
Hi,
On 3/22/20 2:00 PM, Lukasz Majewski wrote:
This reverts commit 02b0e1a36c5bc20174299312556ec4e266872bd6.
Signed-off-by: Lukasz Majewski lukma@denx.de
This patch lacks any and all explanation why this is being reverted. The patch you are reverting here explains why it was added and what real issues it was fixing, so instead of reverting it, if there is an issue with that patch, you should identify the issue and fix it.
Marek, have you received the cover letter for this patch series?
In the cover letter I've written the rationale for reverting this patch.
That should have been explained in this patch description.
In short - qhtoken has value of 0x0, when the token variable shows errors. As a result the error handling is broken. Could you comment on those arguments?
Maybe you are referencing/reading the wrong token ?
You should probably figure out why this doesn't work first and then add fixes on top.
Moreover, I've explicitly stated that this is a Request For Testing like patch series with a detailed report of testing procedure (for my use case) for the USB in U-Boot (as Tom has tested the patch with some ETH dongles).
I was still unable to replicate the ethernet device failure.

Hi Marek,
On 3/23/20 7:57 AM, Lukasz Majewski wrote:
Hi Marek,
Hi,
On 3/22/20 2:00 PM, Lukasz Majewski wrote:
This reverts commit 02b0e1a36c5bc20174299312556ec4e266872bd6.
Signed-off-by: Lukasz Majewski lukma@denx.de
This patch lacks any and all explanation why this is being reverted. The patch you are reverting here explains why it was added and what real issues it was fixing, so instead of reverting it, if there is an issue with that patch, you should identify the issue and fix it.
Marek, have you received the cover letter for this patch series?
In the cover letter I've written the rationale for reverting this patch.
That should have been explained in this patch description.
In short - qhtoken has value of 0x0, when the token variable shows errors. As a result the error handling is broken. Could you comment on those arguments?
Maybe you are referencing/reading the wrong token ?
I'm printing the token which is used afterwards for reacting on possible errors.
You should probably figure out why this doesn't work first and then add fixes on top.
Haven't you seen such problem during code development on your setup when developing this patch?
Moreover, I've explicitly stated that this is a Request For Testing like patch series with a detailed report of testing procedure (for my use case) for the USB in U-Boot (as Tom has tested the patch with some ETH dongles).
I was still unable to replicate the ethernet device failure.
Which boards and SoCs do you used for your test setup?
For me the issue is visible on i.MX53 and i.MX6Q.
Best regards,
Lukasz Majewski
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de

On 3/23/20 1:41 PM, Lukasz Majewski wrote:
Hi Marek,
Hi,
On 3/23/20 7:57 AM, Lukasz Majewski wrote:
Hi Marek,
Hi,
On 3/22/20 2:00 PM, Lukasz Majewski wrote:
This reverts commit 02b0e1a36c5bc20174299312556ec4e266872bd6.
Signed-off-by: Lukasz Majewski lukma@denx.de
This patch lacks any and all explanation why this is being reverted. The patch you are reverting here explains why it was added and what real issues it was fixing, so instead of reverting it, if there is an issue with that patch, you should identify the issue and fix it.
Marek, have you received the cover letter for this patch series?
In the cover letter I've written the rationale for reverting this patch.
That should have been explained in this patch description.
In short - qhtoken has value of 0x0, when the token variable shows errors. As a result the error handling is broken. Could you comment on those arguments?
Maybe you are referencing/reading the wrong token ?
I'm printing the token which is used afterwards for reacting on possible errors.
You should probably figure out why this doesn't work first and then add fixes on top.
Haven't you seen such problem during code development on your setup when developing this patch?
During the development of the patch, I don't remember, sorry. I most certainly saw various failure modes, however those should not be present mainline.
I tested this patch with the problematic USB sticks on R-Car Gen3 and with SMSC95xx USB ethernet adapter last weekend and I didn't see a problem.

Hi Marek,
On 3/23/20 1:41 PM, Lukasz Majewski wrote:
Hi Marek,
Hi,
On 3/23/20 7:57 AM, Lukasz Majewski wrote:
Hi Marek,
Hi,
On 3/22/20 2:00 PM, Lukasz Majewski wrote:
This reverts commit 02b0e1a36c5bc20174299312556ec4e266872bd6.
Signed-off-by: Lukasz Majewski lukma@denx.de
This patch lacks any and all explanation why this is being reverted. The patch you are reverting here explains why it was added and what real issues it was fixing, so instead of reverting it, if there is an issue with that patch, you should identify the issue and fix it.
Marek, have you received the cover letter for this patch series?
In the cover letter I've written the rationale for reverting this patch.
That should have been explained in this patch description.
In short - qhtoken has value of 0x0, when the token variable shows errors. As a result the error handling is broken. Could you comment on those arguments?
Maybe you are referencing/reading the wrong token ?
I'm printing the token which is used afterwards for reacting on possible errors.
You should probably figure out why this doesn't work first and then add fixes on top.
Haven't you seen such problem during code development on your setup when developing this patch?
During the development of the patch, I don't remember, sorry. I most certainly saw various failure modes, however those should not be present mainline.
The issue is that the qhtoken is not updated at all.
Maybe you remember - is Linux using async setup by default (as introduced in SHA1: 02b0e1a36c5bc20174299312556ec4e266872bd6) ?
I tested this patch with the problematic USB sticks on R-Car Gen3 and with SMSC95xx USB ethernet adapter last weekend and I didn't see a problem.
Ok.
For i.MX6Q: The SHA1: 02b0e1a36c5bc20174299312556ec4e266872bd6 patch causes the iMX6Q to fail after a few minutes of testing. General in i.MX6Q the usb is NOT robust at all.
For i.MX53: With patch 02b0e1a36c5bc20174299312556ec4e266872bd6 applied it also breaks after a few minutes.
With this patch series applied it works for 2 days now without any issue.
Best regards,
Lukasz Majewski
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de

On 3/24/20 8:06 AM, Lukasz Majewski wrote:
Hi Marek,
Hi,
[...]
You should probably figure out why this doesn't work first and then add fixes on top.
Haven't you seen such problem during code development on your setup when developing this patch?
During the development of the patch, I don't remember, sorry. I most certainly saw various failure modes, however those should not be present mainline.
The issue is that the qhtoken is not updated at all.
Maybe you remember - is Linux using async setup by default (as introduced in SHA1: 02b0e1a36c5bc20174299312556ec4e266872bd6) ?
If I recall correctly, it is using async schedule for bulk transfers. But the code is available, so you can double-check that.
I tested this patch with the problematic USB sticks on R-Car Gen3 and with SMSC95xx USB ethernet adapter last weekend and I didn't see a problem.
Ok.
For i.MX6Q: The SHA1: 02b0e1a36c5bc20174299312556ec4e266872bd6 patch causes the iMX6Q to fail after a few minutes of testing. General in i.MX6Q the usb is NOT robust at all.
For i.MX53: With patch 02b0e1a36c5bc20174299312556ec4e266872bd6 applied it also breaks after a few minutes.
So on CI HDRC , there is some difference in behavior. That is what you need to find and fix then.
With this patch series applied it works for 2 days now without any issue.
Except performance is totally degraded and there is still no clear explanation _why_ any of these patches are needed and/or whether doing write to a block device with these patches may cause data corruption.

Hi Marek,
On 3/24/20 8:06 AM, Lukasz Majewski wrote:
Hi Marek,
Hi,
[...]
You should probably figure out why this doesn't work first and then add fixes on top.
Haven't you seen such problem during code development on your setup when developing this patch?
During the development of the patch, I don't remember, sorry. I most certainly saw various failure modes, however those should not be present mainline.
The issue is that the qhtoken is not updated at all.
Maybe you remember - is Linux using async setup by default (as introduced in SHA1: 02b0e1a36c5bc20174299312556ec4e266872bd6) ?
If I recall correctly, it is using async schedule for bulk transfers. But the code is available, so you can double-check that.
I tested this patch with the problematic USB sticks on R-Car Gen3 and with SMSC95xx USB ethernet adapter last weekend and I didn't see a problem.
Ok.
For i.MX6Q: The SHA1: 02b0e1a36c5bc20174299312556ec4e266872bd6 patch causes the iMX6Q to fail after a few minutes of testing. General in i.MX6Q the usb is NOT robust at all.
For i.MX53: With patch 02b0e1a36c5bc20174299312556ec4e266872bd6 applied it also breaks after a few minutes.
So on CI HDRC , there is some difference in behavior. That is what you need to find and fix then.
The conclusion is that some boards/implementations are broken.
With this patch series applied it works for 2 days now without any issue.
Except performance is totally degraded
So we do have _very_ fast USB which breaks after a few minutes of constant testing (with procedure stated earlier).
and there is still no clear explanation _why_ any of these patches are needed
Haven't I explicitly explained in previous mails why XACTARR error shall be handled? Nor the original thread did it? Wasn't the cover-letter verbose enough?
and/or whether doing write to a block device with these patches may cause data corruption.
So I will ask differently - what _may_ happen when the "TD - token=XXXX" error shows up and the board hangs? Wouldn't we risk some unwanted storage corruption?
Best regards,
Lukasz Majewski
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de

On 3/24/20 7:11 PM, Lukasz Majewski wrote:
Hi Marek,
Hi,
On 3/24/20 8:06 AM, Lukasz Majewski wrote:
Hi Marek,
Hi,
[...]
You should probably figure out why this doesn't work first and then add fixes on top.
Haven't you seen such problem during code development on your setup when developing this patch?
During the development of the patch, I don't remember, sorry. I most certainly saw various failure modes, however those should not be present mainline.
The issue is that the qhtoken is not updated at all.
Maybe you remember - is Linux using async setup by default (as introduced in SHA1: 02b0e1a36c5bc20174299312556ec4e266872bd6) ?
If I recall correctly, it is using async schedule for bulk transfers. But the code is available, so you can double-check that.
I tested this patch with the problematic USB sticks on R-Car Gen3 and with SMSC95xx USB ethernet adapter last weekend and I didn't see a problem.
Ok.
For i.MX6Q: The SHA1: 02b0e1a36c5bc20174299312556ec4e266872bd6 patch causes the iMX6Q to fail after a few minutes of testing. General in i.MX6Q the usb is NOT robust at all.
For i.MX53: With patch 02b0e1a36c5bc20174299312556ec4e266872bd6 applied it also breaks after a few minutes.
So on CI HDRC , there is some difference in behavior. That is what you need to find and fix then.
The conclusion is that some boards/implementations are broken.
At least systems with CI HDRC.
With this patch series applied it works for 2 days now without any issue.
Except performance is totally degraded
So we do have _very_ fast USB which breaks after a few minutes of constant testing (with procedure stated earlier).
No, we have a controller which manifests some problem and that problem needs to be identified and fixed. Whether it's in the stack or in the controller driver is to be seen.
and there is still no clear explanation _why_ any of these patches are needed
Haven't I explicitly explained in previous mails why XACTARR error shall be handled? Nor the original thread did it? Wasn't the cover-letter verbose enough?
Regarding XACTERR, I agree it should be handled somehow.
However, I don't think handling XACTERR is what fixes your problems with the USB sticks, is it ?
Also, it is still unclear whether handling XACTERR exactly the same as STALL is the right thing to do. Is it ? I think it's not.
and/or whether doing write to a block device with these patches may cause data corruption.
So I will ask differently - what _may_ happen when the "TD - token=XXXX" error shows up and the board hangs? Wouldn't we risk some unwanted storage corruption?
The behavior is undefined.

This change brings support for handling XACTERR error during DATA phase of USB BBB (bulk) transmission.
The fix is to clear stall'ed endpoint and reset recovery on the USB mass storage class.
This code is the port to newest U-Boot of the fix from - "rayvt" (from [1]).
Links: [1] - https://forum.doozan.com/read.php?3,35295,35295#msg-35295 [2] - https://www.dropbox.com/s/nrkrd1no63viuu8/uboot-bodhi-2016.05-timeoutTD.patc...
Signed-off-by: Lukasz Majewski lukma@denx.de [Unfortunately, the original patch [2] did not contain S-o-B from the original author - "rayvt"] ---
common/usb_storage.c | 10 ++++++++++ include/usb_defs.h | 1 + 2 files changed, 11 insertions(+)
diff --git a/common/usb_storage.c b/common/usb_storage.c index 097b6729c1..92e1e54d1c 100644 --- a/common/usb_storage.c +++ b/common/usb_storage.c @@ -740,6 +740,16 @@ static int usb_stor_BBB_transport(struct scsi_cmd *srb, struct us_data *us)
result = usb_bulk_msg(us->pusb_dev, pipe, srb->pdata, srb->datalen, &data_actlen, USB_CNTL_TIMEOUT * 5); + + /* special handling of XACTERR in DATA phase */ + if (result < 0 && (us->pusb_dev->status & USB_ST_XACTERR)) { + debug("XACTERR in data phase - clr, reset, and return fail.\n"); + usb_stor_BBB_clear_endpt_stall(us, dir_in ? + us->ep_in : us->ep_out); + usb_stor_BBB_reset(us); + return USB_STOR_TRANSPORT_FAILED; + } + /* special handling of STALL in DATA phase */ if ((result < 0) && (us->pusb_dev->status & USB_ST_STALLED)) { debug("DATA:stall\n"); diff --git a/include/usb_defs.h b/include/usb_defs.h index 6dd2c997f9..572f6ab296 100644 --- a/include/usb_defs.h +++ b/include/usb_defs.h @@ -197,6 +197,7 @@ #define USB_ST_NAK_REC 0x10 /* NAK Received*/ #define USB_ST_CRC_ERR 0x20 /* CRC/timeout Error */ #define USB_ST_BIT_ERR 0x40 /* Bitstuff error */ +#define USB_ST_XACTERR 0x80 /* XACTERR error */ #define USB_ST_NOT_PROC 0x80000000L /* Not yet processed */

On 3/22/20 2:00 PM, Lukasz Majewski wrote:
This change brings support for handling XACTERR error during DATA phase of USB BBB (bulk) transmission.
The fix is to clear stall'ed endpoint and reset recovery on the USB mass storage class.
This code is the port to newest U-Boot of the fix from - "rayvt" (from [1]).
Links: [1] - https://forum.doozan.com/read.php?3,35295,35295#msg-35295 [2] - https://www.dropbox.com/s/nrkrd1no63viuu8/uboot-bodhi-2016.05-timeoutTD.patc...
Signed-off-by: Lukasz Majewski lukma@denx.de [Unfortunately, the original patch [2] did not contain S-o-B from the original author - "rayvt"]
common/usb_storage.c | 10 ++++++++++ include/usb_defs.h | 1 + 2 files changed, 11 insertions(+)
diff --git a/common/usb_storage.c b/common/usb_storage.c index 097b6729c1..92e1e54d1c 100644 --- a/common/usb_storage.c +++ b/common/usb_storage.c @@ -740,6 +740,16 @@ static int usb_stor_BBB_transport(struct scsi_cmd *srb, struct us_data *us)
result = usb_bulk_msg(us->pusb_dev, pipe, srb->pdata, srb->datalen, &data_actlen, USB_CNTL_TIMEOUT * 5);
- /* special handling of XACTERR in DATA phase */
- if (result < 0 && (us->pusb_dev->status & USB_ST_XACTERR)) {
debug("XACTERR in data phase - clr, reset, and return fail.\n");
usb_stor_BBB_clear_endpt_stall(us, dir_in ?
us->ep_in : us->ep_out);
usb_stor_BBB_reset(us);
return USB_STOR_TRANSPORT_FAILED;
- }
Can resetting the endpoint result in data corruption of some sort here ? Especially if this happens on filesystem write for example ?

Hi Marek,
On 3/22/20 2:00 PM, Lukasz Majewski wrote:
This change brings support for handling XACTERR error during DATA phase of USB BBB (bulk) transmission.
The fix is to clear stall'ed endpoint and reset recovery on the USB mass storage class.
This code is the port to newest U-Boot of the fix from - "rayvt" (from [1]).
Links: [1] - https://forum.doozan.com/read.php?3,35295,35295#msg-35295 [2] - https://www.dropbox.com/s/nrkrd1no63viuu8/uboot-bodhi-2016.05-timeoutTD.patc...
Signed-off-by: Lukasz Majewski lukma@denx.de [Unfortunately, the original patch [2] did not contain S-o-B from the original author - "rayvt"]
common/usb_storage.c | 10 ++++++++++ include/usb_defs.h | 1 + 2 files changed, 11 insertions(+)
diff --git a/common/usb_storage.c b/common/usb_storage.c index 097b6729c1..92e1e54d1c 100644 --- a/common/usb_storage.c +++ b/common/usb_storage.c @@ -740,6 +740,16 @@ static int usb_stor_BBB_transport(struct scsi_cmd *srb, struct us_data *us) result = usb_bulk_msg(us->pusb_dev, pipe, srb->pdata, srb->datalen, &data_actlen, USB_CNTL_TIMEOUT * 5);
- /* special handling of XACTERR in DATA phase */
- if (result < 0 && (us->pusb_dev->status & USB_ST_XACTERR))
{
debug("XACTERR in data phase - clr, reset, and
return fail.\n");
usb_stor_BBB_clear_endpt_stall(us, dir_in ?
us->ep_in :
us->ep_out);
usb_stor_BBB_reset(us);
return USB_STOR_TRANSPORT_FAILED;
- }
Can resetting the endpoint result in data corruption of some sort here ?
Please correct me if I'm wrong, but this code is executed when we do receive data, not writing them. Those operations shall (and are?) orthogonal.
Especially if this happens on filesystem write for example ?
Do we write data here?
Best regards,
Lukasz Majewski
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de

On 3/23/20 8:00 AM, Lukasz Majewski wrote:
Hi Marek,
On 3/22/20 2:00 PM, Lukasz Majewski wrote:
This change brings support for handling XACTERR error during DATA phase of USB BBB (bulk) transmission.
The fix is to clear stall'ed endpoint and reset recovery on the USB mass storage class.
This code is the port to newest U-Boot of the fix from - "rayvt" (from [1]).
Links: [1] - https://forum.doozan.com/read.php?3,35295,35295#msg-35295 [2] - https://www.dropbox.com/s/nrkrd1no63viuu8/uboot-bodhi-2016.05-timeoutTD.patc...
Signed-off-by: Lukasz Majewski lukma@denx.de [Unfortunately, the original patch [2] did not contain S-o-B from the original author - "rayvt"]
common/usb_storage.c | 10 ++++++++++ include/usb_defs.h | 1 + 2 files changed, 11 insertions(+)
diff --git a/common/usb_storage.c b/common/usb_storage.c index 097b6729c1..92e1e54d1c 100644 --- a/common/usb_storage.c +++ b/common/usb_storage.c @@ -740,6 +740,16 @@ static int usb_stor_BBB_transport(struct scsi_cmd *srb, struct us_data *us) result = usb_bulk_msg(us->pusb_dev, pipe, srb->pdata, srb->datalen, &data_actlen, USB_CNTL_TIMEOUT * 5);
- /* special handling of XACTERR in DATA phase */
- if (result < 0 && (us->pusb_dev->status & USB_ST_XACTERR))
{
debug("XACTERR in data phase - clr, reset, and
return fail.\n");
usb_stor_BBB_clear_endpt_stall(us, dir_in ?
us->ep_in :
us->ep_out);
usb_stor_BBB_reset(us);
return USB_STOR_TRANSPORT_FAILED;
- }
Can resetting the endpoint result in data corruption of some sort here ?
Please correct me if I'm wrong, but this code is executed when we do receive data, not writing them. Those operations shall (and are?) orthogonal.
Especially if this happens on filesystem write for example ?
Do we write data here?
I only did a very quick look into the code, but there I see
1082 static int usb_write_10(struct scsi_cmd *srb, struct us_data *ss, ... 1096 return ss->transport(srb, ss);
1338 case US_PR_BULK: 1339 debug("Bulk/Bulk/Bulk\n"); 1340 ss->transport = usb_stor_BBB_transport;
So I would say, the answer is -- yes.

Hi Marek,
On 3/23/20 8:00 AM, Lukasz Majewski wrote:
Hi Marek,
On 3/22/20 2:00 PM, Lukasz Majewski wrote:
This change brings support for handling XACTERR error during DATA phase of USB BBB (bulk) transmission.
The fix is to clear stall'ed endpoint and reset recovery on the USB mass storage class.
This code is the port to newest U-Boot of the fix from - "rayvt" (from [1]).
Links: [1] - https://forum.doozan.com/read.php?3,35295,35295#msg-35295 [2] - https://www.dropbox.com/s/nrkrd1no63viuu8/uboot-bodhi-2016.05-timeoutTD.patc...
Signed-off-by: Lukasz Majewski lukma@denx.de [Unfortunately, the original patch [2] did not contain S-o-B from the original author - "rayvt"]
common/usb_storage.c | 10 ++++++++++ include/usb_defs.h | 1 + 2 files changed, 11 insertions(+)
diff --git a/common/usb_storage.c b/common/usb_storage.c index 097b6729c1..92e1e54d1c 100644 --- a/common/usb_storage.c +++ b/common/usb_storage.c @@ -740,6 +740,16 @@ static int usb_stor_BBB_transport(struct scsi_cmd *srb, struct us_data *us) result = usb_bulk_msg(us->pusb_dev, pipe, srb->pdata, srb->datalen, &data_actlen, USB_CNTL_TIMEOUT * 5);
- /* special handling of XACTERR in DATA phase */
- if (result < 0 && (us->pusb_dev->status &
USB_ST_XACTERR)) {
debug("XACTERR in data phase - clr, reset, and
return fail.\n");
usb_stor_BBB_clear_endpt_stall(us, dir_in ?
us->ep_in :
us->ep_out);
usb_stor_BBB_reset(us);
return USB_STOR_TRANSPORT_FAILED;
- }
Can resetting the endpoint result in data corruption of some sort here ?
Please correct me if I'm wrong, but this code is executed when we do receive data, not writing them. Those operations shall (and are?) orthogonal.
Especially if this happens on filesystem write for example ?
Do we write data here?
I only did a very quick look into the code, but there I see
1082 static int usb_write_10(struct scsi_cmd *srb, struct us_data *ss, ... 1096 return ss->transport(srb, ss);
1338 case US_PR_BULK: 1339 debug("Bulk/Bulk/Bulk\n"); 1340 ss->transport = usb_stor_BBB_transport;
So I would say, the answer is -- yes.
A few lines down (usb_storage.c @ L757) you have the USB_ST_STALLED status handled in the same way (it also aborts after a single retry).
Best regards,
Lukasz Majewski
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de

On 3/23/20 2:03 PM, Lukasz Majewski wrote:
Hi Marek,
Hi,
On 3/23/20 8:00 AM, Lukasz Majewski wrote:
Hi Marek,
On 3/22/20 2:00 PM, Lukasz Majewski wrote:
This change brings support for handling XACTERR error during DATA phase of USB BBB (bulk) transmission.
The fix is to clear stall'ed endpoint and reset recovery on the USB mass storage class.
This code is the port to newest U-Boot of the fix from - "rayvt" (from [1]).
Links: [1] - https://forum.doozan.com/read.php?3,35295,35295#msg-35295 [2] - https://www.dropbox.com/s/nrkrd1no63viuu8/uboot-bodhi-2016.05-timeoutTD.patc...
Signed-off-by: Lukasz Majewski lukma@denx.de [Unfortunately, the original patch [2] did not contain S-o-B from the original author - "rayvt"]
common/usb_storage.c | 10 ++++++++++ include/usb_defs.h | 1 + 2 files changed, 11 insertions(+)
diff --git a/common/usb_storage.c b/common/usb_storage.c index 097b6729c1..92e1e54d1c 100644 --- a/common/usb_storage.c +++ b/common/usb_storage.c @@ -740,6 +740,16 @@ static int usb_stor_BBB_transport(struct scsi_cmd *srb, struct us_data *us) result = usb_bulk_msg(us->pusb_dev, pipe, srb->pdata, srb->datalen, &data_actlen, USB_CNTL_TIMEOUT * 5);
- /* special handling of XACTERR in DATA phase */
- if (result < 0 && (us->pusb_dev->status &
USB_ST_XACTERR)) {
debug("XACTERR in data phase - clr, reset, and
return fail.\n");
usb_stor_BBB_clear_endpt_stall(us, dir_in ?
us->ep_in :
us->ep_out);
usb_stor_BBB_reset(us);
return USB_STOR_TRANSPORT_FAILED;
- }
Can resetting the endpoint result in data corruption of some sort here ?
Please correct me if I'm wrong, but this code is executed when we do receive data, not writing them. Those operations shall (and are?) orthogonal.
Especially if this happens on filesystem write for example ?
Do we write data here?
I only did a very quick look into the code, but there I see
1082 static int usb_write_10(struct scsi_cmd *srb, struct us_data *ss, ... 1096 return ss->transport(srb, ss);
1338 case US_PR_BULK: 1339 debug("Bulk/Bulk/Bulk\n"); 1340 ss->transport = usb_stor_BBB_transport;
So I would say, the answer is -- yes.
A few lines down (usb_storage.c @ L757) you have the USB_ST_STALLED status handled in the same way (it also aborts after a single retry).
But stall isn't xfer error. Which makes me wonder, why is the xfer error here handled the same way as a stall, shouldn't the handling be different ?
Also, this doesn't answer the question whether such handling can cause data corruption during write.

This change provides some extra time for some slow (or degraded) USB devices to become fully operational.
This code is the port to newest U-Boot of the fix from - "rayvt" (from [1]).
Links: [1] - https://forum.doozan.com/read.php?3,35295,35295#msg-35295 [2] - https://www.dropbox.com/s/nrkrd1no63viuu8/uboot-bodhi-2016.05-timeoutTD.patc...
Signed-off-by: Lukasz Majewski lukma@denx.de [Unfortunately, the original patch [2] did not contain S-o-B from the original author - "rayvt"] ---
common/usb.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/common/usb.c b/common/usb.c index 349e838f1d..305482b5bb 100644 --- a/common/usb.c +++ b/common/usb.c @@ -925,14 +925,20 @@ static int get_descriptor_len(struct usb_device *dev, int len, int expect_len) __maybe_unused struct usb_device_descriptor *desc; ALLOC_CACHE_ALIGN_BUFFER(unsigned char, tmpbuf, USB_BUFSIZ); int err; + int retry = 5;
desc = (struct usb_device_descriptor *)tmpbuf;
+again: err = usb_get_descriptor(dev, USB_DT_DEVICE, 0, desc, len); if (err < expect_len) { if (err < 0) { - printf("unable to get device descriptor (error=%d)\n", - err); + printf("unable to get device descriptor (error=%d) retry: %d\n", + err, retry); + mdelay(50); + if (--retry >= 0) + /* Some drives are just slow to wake up. */ + goto again; return err; } else { printf("USB device descriptor short read (expected %i, got %i)\n",

On 3/22/20 2:00 PM, Lukasz Majewski wrote:
This change provides some extra time for some slow (or degraded) USB devices to become fully operational.
This code is the port to newest U-Boot of the fix from - "rayvt" (from [1]).
Links: [1] - https://forum.doozan.com/read.php?3,35295,35295#msg-35295 [2] - https://www.dropbox.com/s/nrkrd1no63viuu8/uboot-bodhi-2016.05-timeoutTD.patc...
Signed-off-by: Lukasz Majewski lukma@denx.de [Unfortunately, the original patch [2] did not contain S-o-B from the original author - "rayvt"]
common/usb.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/common/usb.c b/common/usb.c index 349e838f1d..305482b5bb 100644 --- a/common/usb.c +++ b/common/usb.c @@ -925,14 +925,20 @@ static int get_descriptor_len(struct usb_device *dev, int len, int expect_len) __maybe_unused struct usb_device_descriptor *desc; ALLOC_CACHE_ALIGN_BUFFER(unsigned char, tmpbuf, USB_BUFSIZ); int err;
int retry = 5;
desc = (struct usb_device_descriptor *)tmpbuf;
+again: err = usb_get_descriptor(dev, USB_DT_DEVICE, 0, desc, len); if (err < expect_len) { if (err < 0) {
printf("unable to get device descriptor (error=%d)\n",
err);
printf("unable to get device descriptor (error=%d) retry: %d\n",
err, retry);
mdelay(50);
Why 50 mSec and not some other value, like 100 mSec ?

Hi Marek,
On 3/22/20 2:00 PM, Lukasz Majewski wrote:
This change provides some extra time for some slow (or degraded) USB devices to become fully operational.
This code is the port to newest U-Boot of the fix from - "rayvt" (from [1]).
Links: [1] - https://forum.doozan.com/read.php?3,35295,35295#msg-35295 [2] - https://www.dropbox.com/s/nrkrd1no63viuu8/uboot-bodhi-2016.05-timeoutTD.patc...
Signed-off-by: Lukasz Majewski lukma@denx.de [Unfortunately, the original patch [2] did not contain S-o-B from the original author - "rayvt"]
common/usb.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/common/usb.c b/common/usb.c index 349e838f1d..305482b5bb 100644 --- a/common/usb.c +++ b/common/usb.c @@ -925,14 +925,20 @@ static int get_descriptor_len(struct usb_device *dev, int len, int expect_len) __maybe_unused struct usb_device_descriptor *desc; ALLOC_CACHE_ALIGN_BUFFER(unsigned char, tmpbuf, USB_BUFSIZ); int err;
int retry = 5;
desc = (struct usb_device_descriptor *)tmpbuf;
+again: err = usb_get_descriptor(dev, USB_DT_DEVICE, 0, desc, len); if (err < expect_len) { if (err < 0) {
printf("unable to get device descriptor
(error=%d)\n",
err);
printf("unable to get device descriptor
(error=%d) retry: %d\n",
err, retry);
mdelay(50);
Why 50 mSec and not some other value, like 100 mSec ?
I think that this value (50 ms) was took from Linux in some point and with the retry set to 5 was the ported heuristics.
If you ask why exactly there is 50 ms - I cannot say, as I've just ported the patch.
Best regards,
Lukasz Majewski
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de

On 3/23/20 8:04 AM, Lukasz Majewski wrote:
Hi Marek,
On 3/22/20 2:00 PM, Lukasz Majewski wrote:
This change provides some extra time for some slow (or degraded) USB devices to become fully operational.
This code is the port to newest U-Boot of the fix from - "rayvt" (from [1]).
Links: [1] - https://forum.doozan.com/read.php?3,35295,35295#msg-35295 [2] - https://www.dropbox.com/s/nrkrd1no63viuu8/uboot-bodhi-2016.05-timeoutTD.patc...
Signed-off-by: Lukasz Majewski lukma@denx.de [Unfortunately, the original patch [2] did not contain S-o-B from the original author - "rayvt"]
common/usb.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/common/usb.c b/common/usb.c index 349e838f1d..305482b5bb 100644 --- a/common/usb.c +++ b/common/usb.c @@ -925,14 +925,20 @@ static int get_descriptor_len(struct usb_device *dev, int len, int expect_len) __maybe_unused struct usb_device_descriptor *desc; ALLOC_CACHE_ALIGN_BUFFER(unsigned char, tmpbuf, USB_BUFSIZ); int err;
int retry = 5;
desc = (struct usb_device_descriptor *)tmpbuf;
+again: err = usb_get_descriptor(dev, USB_DT_DEVICE, 0, desc, len); if (err < expect_len) { if (err < 0) {
printf("unable to get device descriptor
(error=%d)\n",
err);
printf("unable to get device descriptor
(error=%d) retry: %d\n",
err, retry);
mdelay(50);
Why 50 mSec and not some other value, like 100 mSec ?
I think that this value (50 ms) was took from Linux in some point and with the retry set to 5 was the ported heuristics.
If you ask why exactly there is 50 ms - I cannot say, as I've just ported the patch.
I see, then please research this. The USB stack has enough ad-hoc random values in it already, no need to add new ones.

After this change USB mass storage devices - mostly HDDs connected via USB - will gain handling of extra time needed for their spin up.
This operation is realized with issuing SCSI start/stop unit (0x1B) command.
This code is the port to newest U-Boot of the fix from - "rayvt" (from [1]).
Links: [1] - https://forum.doozan.com/read.php?3,35295,35295#msg-35295 [2] - https://www.dropbox.com/s/nrkrd1no63viuu8/uboot-bodhi-2016.05-timeoutTD.patc...
Signed-off-by: Lukasz Majewski lukma@denx.de [Unfortunately, the original patch [2] did not contain S-o-B from the original author - "rayvt"] ---
common/usb_storage.c | 36 ++++++++++++++++++++++++++++++++++++ include/usb.h | 1 + 2 files changed, 37 insertions(+)
diff --git a/common/usb_storage.c b/common/usb_storage.c index 92e1e54d1c..3c2324fa1a 100644 --- a/common/usb_storage.c +++ b/common/usb_storage.c @@ -729,6 +729,7 @@ static int usb_stor_BBB_transport(struct scsi_cmd *srb, struct us_data *us) pipeout = usb_sndbulkpipe(us->pusb_dev, us->ep_out); /* DATA phase + error handling */ data_actlen = 0; + mdelay(10); /* Like linux does. */ /* no data, go immediately to the STATUS phase */ if (srb->datalen == 0) goto st; @@ -1023,9 +1024,32 @@ static int usb_request_sense(struct scsi_cmd *srb, struct us_data *ss) return 0; }
+/* + * This spins up the disk and also consumes the time that the + * disk takes to become active and ready to read data. + * Some drives (like Western Digital) can take more than 5 seconds. + * The delay occurs on the 1st data read from the disk. + * Extending the timeout here works better than handling the timeout + * as an error on a "real" read. + */ +static int usb_spinup(struct scsi_cmd *srb, struct us_data *ss) +{ + memset(&srb->cmd[0], 0, 12); + srb->cmd[0] = SCSI_START_STP; + srb->cmd[1] = srb->lun << 5; + srb->cmd[4] = 1; /* Start spinup. */ + srb->datalen = 0; + srb->cmdlen = 6; + ss->pusb_dev->extra_timeout = 9876; + ss->transport(srb, ss); + ss->pusb_dev->extra_timeout = 0; + return 0; +} + static int usb_test_unit_ready(struct scsi_cmd *srb, struct us_data *ss) { int retries = 10; + int gave_extra_time = 0;
do { memset(&srb->cmd[0], 0, 12); @@ -1048,6 +1072,17 @@ static int usb_test_unit_ready(struct scsi_cmd *srb, struct us_data *ss) if ((srb->sense_buf[2] == 0x02) && (srb->sense_buf[12] == 0x3a)) return -1; + /* + * If the status is "Not Ready - becoming ready", give it + * more time. Linux issues a spinup command (once) and gives + * it 100 seconds. + */ + if (srb->sense_buf[2] == 0x02 && + srb->sense_buf[12] == 0x04 && + gave_extra_time == 0) { + retries = 100; /* Allow 10 seconds. */ + gave_extra_time = retries; + } mdelay(100); } while (retries--);
@@ -1502,6 +1537,7 @@ int usb_stor_get_info(struct usb_device *dev, struct us_data *ss, dev_desc->log2blksz = LOG2(dev_desc->blksz); dev_desc->type = perq; debug(" address %d\n", dev_desc->target); + usb_spinup(pccb, ss);
return 1; } diff --git a/include/usb.h b/include/usb.h index efb67ea33f..5b0f5ce5d6 100644 --- a/include/usb.h +++ b/include/usb.h @@ -140,6 +140,7 @@ struct usb_device { int act_len; /* transferred bytes */ int maxchild; /* Number of ports if hub */ int portnr; /* Port number, 1=first */ + int extra_timeout; /* Add to timeout in ehci_submit_async or spinup */ #if !CONFIG_IS_ENABLED(DM_USB) /* parent hub, or NULL if this is the root hub */ struct usb_device *parent;

On 3/22/20 2:00 PM, Lukasz Majewski wrote: [...]
diff --git a/common/usb_storage.c b/common/usb_storage.c index 92e1e54d1c..3c2324fa1a 100644 --- a/common/usb_storage.c +++ b/common/usb_storage.c @@ -729,6 +729,7 @@ static int usb_stor_BBB_transport(struct scsi_cmd *srb, struct us_data *us) pipeout = usb_sndbulkpipe(us->pusb_dev, us->ep_out); /* DATA phase + error handling */ data_actlen = 0;
- mdelay(10); /* Like linux does. */
Does this add delay to every single transfer ? That would mean a massive slowdown if you use short data transfers, which is needed for old/limited USB sticks.
/* no data, go immediately to the STATUS phase */ if (srb->datalen == 0) goto st; @@ -1023,9 +1024,32 @@ static int usb_request_sense(struct scsi_cmd *srb, struct us_data *ss) return 0; }
+/*
- This spins up the disk and also consumes the time that the
- disk takes to become active and ready to read data.
- Some drives (like Western Digital) can take more than 5 seconds.
- The delay occurs on the 1st data read from the disk.
- Extending the timeout here works better than handling the timeout
- as an error on a "real" read.
- */
+static int usb_spinup(struct scsi_cmd *srb, struct us_data *ss) +{
- memset(&srb->cmd[0], 0, 12);
- srb->cmd[0] = SCSI_START_STP;
- srb->cmd[1] = srb->lun << 5;
- srb->cmd[4] = 1; /* Start spinup. */
- srb->datalen = 0;
- srb->cmdlen = 6;
- ss->pusb_dev->extra_timeout = 9876;
What is this magic number ?
- ss->transport(srb, ss);
- ss->pusb_dev->extra_timeout = 0;
- return 0;
+}
[...]
diff --git a/include/usb.h b/include/usb.h index efb67ea33f..5b0f5ce5d6 100644 --- a/include/usb.h +++ b/include/usb.h @@ -140,6 +140,7 @@ struct usb_device { int act_len; /* transferred bytes */ int maxchild; /* Number of ports if hub */ int portnr; /* Port number, 1=first */
- int extra_timeout; /* Add to timeout in ehci_submit_async or spinup */
Does this work with xhci too ?

Hi Marek,
On 3/22/20 2:00 PM, Lukasz Majewski wrote: [...]
diff --git a/common/usb_storage.c b/common/usb_storage.c index 92e1e54d1c..3c2324fa1a 100644 --- a/common/usb_storage.c +++ b/common/usb_storage.c @@ -729,6 +729,7 @@ static int usb_stor_BBB_transport(struct scsi_cmd *srb, struct us_data *us) pipeout = usb_sndbulkpipe(us->pusb_dev, us->ep_out); /* DATA phase + error handling */ data_actlen = 0;
- mdelay(10); /* Like linux does. */
Does this add delay to every single transfer ?
It brings the slowdown, yes.
However, without it I very often see the error that the USB address couldn't be assigned.
That would mean a massive slowdown if you use short data transfers, which is needed for old/limited USB sticks.
/* no data, go immediately to the STATUS phase */ if (srb->datalen == 0) goto st; @@ -1023,9 +1024,32 @@ static int usb_request_sense(struct scsi_cmd *srb, struct us_data *ss) return 0; }
+/*
- This spins up the disk and also consumes the time that the
- disk takes to become active and ready to read data.
- Some drives (like Western Digital) can take more than 5 seconds.
- The delay occurs on the 1st data read from the disk.
- Extending the timeout here works better than handling the
timeout
- as an error on a "real" read.
- */
+static int usb_spinup(struct scsi_cmd *srb, struct us_data *ss) +{
- memset(&srb->cmd[0], 0, 12);
- srb->cmd[0] = SCSI_START_STP;
- srb->cmd[1] = srb->lun << 5;
- srb->cmd[4] = 1; /* Start spinup. */
- srb->datalen = 0;
- srb->cmdlen = 6;
- ss->pusb_dev->extra_timeout = 9876;
What is this magic number ?
This number is added to the timeout up to which ehci controller waits for EHCI TD to be sent.
Why there is 9876 - I do guess that it was took from Linux in some point in time.
- ss->transport(srb, ss);
- ss->pusb_dev->extra_timeout = 0;
- return 0;
+}
[...]
diff --git a/include/usb.h b/include/usb.h index efb67ea33f..5b0f5ce5d6 100644 --- a/include/usb.h +++ b/include/usb.h @@ -140,6 +140,7 @@ struct usb_device { int act_len; /* transferred bytes */ int maxchild; /* Number of ports if hub */ int portnr; /* Port number, 1=first */
- int extra_timeout; /* Add to timeout in ehci_submit_async
or spinup */
Does this work with xhci too ?
Yes, it is used in patch 5/5.
Best regards,
Lukasz Majewski
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de

On 3/23/20 8:53 AM, Lukasz Majewski wrote:
Hi Marek,
Hi,
On 3/22/20 2:00 PM, Lukasz Majewski wrote: [...]
diff --git a/common/usb_storage.c b/common/usb_storage.c index 92e1e54d1c..3c2324fa1a 100644 --- a/common/usb_storage.c +++ b/common/usb_storage.c @@ -729,6 +729,7 @@ static int usb_stor_BBB_transport(struct scsi_cmd *srb, struct us_data *us) pipeout = usb_sndbulkpipe(us->pusb_dev, us->ep_out); /* DATA phase + error handling */ data_actlen = 0;
- mdelay(10); /* Like linux does. */
Does this add delay to every single transfer ?
It brings the slowdown, yes.
However, without it I very often see the error that the USB address couldn't be assigned.
Seems like this is hiding some real error then.
If I do basic math, then I reach a conclusion that the comment is bogus. Look, assume the transfer itself takes 0 time, then if you have 10 mS delays between transfers, you can do 100 transfer per second. If one transfer is 240 blocks * 512 bytes , then you are limited to 12.2 MB/s. And I am positive USB 2.0 sticks in Linux can transfer faster than that.
That would mean a massive slowdown if you use short data transfers, which is needed for old/limited USB sticks.
/* no data, go immediately to the STATUS phase */ if (srb->datalen == 0) goto st; @@ -1023,9 +1024,32 @@ static int usb_request_sense(struct scsi_cmd *srb, struct us_data *ss) return 0; }
+/*
- This spins up the disk and also consumes the time that the
- disk takes to become active and ready to read data.
- Some drives (like Western Digital) can take more than 5 seconds.
- The delay occurs on the 1st data read from the disk.
- Extending the timeout here works better than handling the
timeout
- as an error on a "real" read.
- */
+static int usb_spinup(struct scsi_cmd *srb, struct us_data *ss) +{
- memset(&srb->cmd[0], 0, 12);
- srb->cmd[0] = SCSI_START_STP;
- srb->cmd[1] = srb->lun << 5;
- srb->cmd[4] = 1; /* Start spinup. */
- srb->datalen = 0;
- srb->cmdlen = 6;
- ss->pusb_dev->extra_timeout = 9876;
What is this magic number ?
This number is added to the timeout up to which ehci controller waits for EHCI TD to be sent.
This is generic code and has to work with OHCI/UHCI/xHCI too.
Why there is 9876 - I do guess that it was took from Linux in some point in time.
Please, research it.
- ss->transport(srb, ss);
- ss->pusb_dev->extra_timeout = 0;
- return 0;
+}
[...]
diff --git a/include/usb.h b/include/usb.h index efb67ea33f..5b0f5ce5d6 100644 --- a/include/usb.h +++ b/include/usb.h @@ -140,6 +140,7 @@ struct usb_device { int act_len; /* transferred bytes */ int maxchild; /* Number of ports if hub */ int portnr; /* Port number, 1=first */
- int extra_timeout; /* Add to timeout in ehci_submit_async
or spinup */
Does this work with xhci too ?
Yes, it is used in patch 5/5.
Does xhci need it ?

Hi Marek,
On 3/23/20 8:53 AM, Lukasz Majewski wrote:
Hi Marek,
Hi,
On 3/22/20 2:00 PM, Lukasz Majewski wrote: [...]
diff --git a/common/usb_storage.c b/common/usb_storage.c index 92e1e54d1c..3c2324fa1a 100644 --- a/common/usb_storage.c +++ b/common/usb_storage.c @@ -729,6 +729,7 @@ static int usb_stor_BBB_transport(struct scsi_cmd *srb, struct us_data *us) pipeout = usb_sndbulkpipe(us->pusb_dev, us->ep_out); /* DATA phase + error handling */ data_actlen = 0;
- mdelay(10); /* Like linux does. */
Does this add delay to every single transfer ?
It brings the slowdown, yes.
However, without it I very often see the error that the USB address couldn't be assigned.
Seems like this is hiding some real error then.
If I do basic math, then I reach a conclusion that the comment is bogus. Look, assume the transfer itself takes 0 time, then if you have 10 mS delays between transfers, you can do 100 transfer per second. If one transfer is 240 blocks * 512 bytes , then you are limited to 12.2 MB/s. And I am positive USB 2.0 sticks in Linux can transfer faster than that.
Theoretically USB 2.0 can have up to 60MiB/s transfer rate. The 12MiB/s is for USB 1.x.
I cannot say why this delay helps with recognition of some devices. I also start wondering why I do see some strange problems in U-Boot (like not assigning address, timeouts), as those errors are not present on Linux.
That would mean a massive slowdown if you use short data transfers, which is needed for old/limited USB sticks.
/* no data, go immediately to the STATUS phase */ if (srb->datalen == 0) goto st; @@ -1023,9 +1024,32 @@ static int usb_request_sense(struct scsi_cmd *srb, struct us_data *ss) return 0; }
+/*
- This spins up the disk and also consumes the time that the
- disk takes to become active and ready to read data.
- Some drives (like Western Digital) can take more than 5
seconds.
- The delay occurs on the 1st data read from the disk.
- Extending the timeout here works better than handling the
timeout
- as an error on a "real" read.
- */
+static int usb_spinup(struct scsi_cmd *srb, struct us_data *ss) +{
- memset(&srb->cmd[0], 0, 12);
- srb->cmd[0] = SCSI_START_STP;
- srb->cmd[1] = srb->lun << 5;
- srb->cmd[4] = 1; /* Start spinup. */
- srb->datalen = 0;
- srb->cmdlen = 6;
- ss->pusb_dev->extra_timeout = 9876;
What is this magic number ?
This number is added to the timeout up to which ehci controller waits for EHCI TD to be sent.
This is generic code and has to work with OHCI/UHCI/xHCI too.
Why there is 9876 - I do guess that it was took from Linux in some point in time.
Please, research it.
- ss->transport(srb, ss);
- ss->pusb_dev->extra_timeout = 0;
- return 0;
+}
[...]
diff --git a/include/usb.h b/include/usb.h index efb67ea33f..5b0f5ce5d6 100644 --- a/include/usb.h +++ b/include/usb.h @@ -140,6 +140,7 @@ struct usb_device { int act_len; /* transferred bytes */ int maxchild; /* Number of ports if hub */ int portnr; /* Port number, 1=first */
- int extra_timeout; /* Add to timeout in ehci_submit_async
or spinup */
Does this work with xhci too ?
Yes, it is used in patch 5/5.
Does xhci need it ?
It is hard to say. The original fix was for EHCI.
Best regards,
Lukasz Majewski
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de

On 3/23/20 1:54 PM, Lukasz Majewski wrote:
Hi Marek,
Hi,
On 3/23/20 8:53 AM, Lukasz Majewski wrote:
Hi Marek,
Hi,
On 3/22/20 2:00 PM, Lukasz Majewski wrote: [...]
diff --git a/common/usb_storage.c b/common/usb_storage.c index 92e1e54d1c..3c2324fa1a 100644 --- a/common/usb_storage.c +++ b/common/usb_storage.c @@ -729,6 +729,7 @@ static int usb_stor_BBB_transport(struct scsi_cmd *srb, struct us_data *us) pipeout = usb_sndbulkpipe(us->pusb_dev, us->ep_out); /* DATA phase + error handling */ data_actlen = 0;
- mdelay(10); /* Like linux does. */
Does this add delay to every single transfer ?
It brings the slowdown, yes.
However, without it I very often see the error that the USB address couldn't be assigned.
Seems like this is hiding some real error then.
If I do basic math, then I reach a conclusion that the comment is bogus. Look, assume the transfer itself takes 0 time, then if you have 10 mS delays between transfers, you can do 100 transfer per second. If one transfer is 240 blocks * 512 bytes , then you are limited to 12.2 MB/s. And I am positive USB 2.0 sticks in Linux can transfer faster than that.
Theoretically USB 2.0 can have up to 60MiB/s transfer rate. The 12MiB/s is for USB 1.x.
If you add such a massive 10 mS delay between each and every single read, then the performance will be much lower, see the simple calculation above.
I cannot say why this delay helps with recognition of some devices. I also start wondering why I do see some strange problems in U-Boot (like not assigning address, timeouts), as those errors are not present on Linux.
Maybe something to investigate in more depth ?
[...]
diff --git a/include/usb.h b/include/usb.h index efb67ea33f..5b0f5ce5d6 100644 --- a/include/usb.h +++ b/include/usb.h @@ -140,6 +140,7 @@ struct usb_device { int act_len; /* transferred bytes */ int maxchild; /* Number of ports if hub */ int portnr; /* Port number, 1=first */
- int extra_timeout; /* Add to timeout in ehci_submit_async
or spinup */
Does this work with xhci too ?
Yes, it is used in patch 5/5.
Does xhci need it ?
It is hard to say. The original fix was for EHCI.
Please check. This needs to be fixed properly instead of just papering over the problem and hacking it up somehow to make it somehow work.

This code adds check if QT_TOKEN_STATUS_XACTERR error occurred. When it is detected the token is reconfigured and transmission is retried.
This code is the port to newest U-Boot of the fix from - "rayvt" (from [1]).
Links: [1] - https://forum.doozan.com/read.php?3,35295,35295#msg-35295 [2] - https://www.dropbox.com/s/nrkrd1no63viuu8/uboot-bodhi-2016.05-timeoutTD.patc...
Signed-off-by: Lukasz Majewski lukma@denx.de [Unfortunately, the original patch [2] did not contain S-o-B from the original author - "rayvt"] ---
drivers/usb/host/ehci-hcd.c | 27 ++++++++++++++++++++++++++- 1 file changed, 26 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c index 0a77111f80..45eda7ad24 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -315,6 +315,7 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, int timeout; int ret = 0; struct ehci_ctrl *ctrl = ehci_get_ctrl(dev); + int trynum;
debug("dev=%p, pipe=%lx, buffer=%p, length=%d, req=%p\n", dev, pipe, buffer, length, req); @@ -560,6 +561,10 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, ehci_writel(&ctrl->hcor->or_usbsts, (usbsts & 0x3f));
/* Enable async. schedule. */ + trynum = 1; /* No more than 2 tries, in case of XACTERR. */ +retry_xacterr:; + vtd = &qtd[qtd_counter - 1]; + cmd = ehci_readl(&ctrl->hcor->or_usbcmd); cmd |= CMD_ASE; ehci_writel(&ctrl->hcor->or_usbcmd, cmd); @@ -573,8 +578,8 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer,
/* Wait for TDs to be processed. */ ts = get_timer(0); - vtd = &qtd[qtd_counter - 1]; timeout = USB_TIMEOUT_MS(pipe); + timeout += dev->extra_timeout; do { /* Invalidate dcache */ invalidate_dcache_range((unsigned long)&ctrl->qh_list, @@ -589,6 +594,7 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, break; WATCHDOG_RESET(); } while (get_timer(ts) < timeout); + debug("took %4lu ms of %4d\n", get_timer(ts), timeout);
/* * Invalidate the memory area occupied by buffer @@ -622,6 +628,25 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, token = hc32_to_cpu(qh->qh_overlay.qt_token); if (!(QT_TOKEN_GET_STATUS(token) & QT_TOKEN_STATUS_ACTIVE)) { debug("TOKEN=%#x\n", token); + if (token & QT_TOKEN_STATUS_XACTERR) { + if (--trynum >= 0) { + /* + * It is necessary to do this, otherwise the + * disk is clagged. + */ + debug("reset the TD and redo, because of XACTERR\n"); + token &= ~QT_TOKEN_STATUS_HALTED; + token |= QT_TOKEN_STATUS_ACTIVE | + QT_TOKEN_CERR(2); + vtd->qt_token = cpu_to_hc32(token); + qh->qh_overlay.qt_token = cpu_to_hc32(token); + goto retry_xacterr; + } + dev->status = USB_ST_XACTERR; + dev->act_len = length - QT_TOKEN_GET_TOTALBYTES(token); + goto fail; + } + switch (QT_TOKEN_GET_STATUS(token) & ~(QT_TOKEN_STATUS_SPLITXSTATE | QT_TOKEN_STATUS_PERR)) { case 0:

On 3/22/20 2:00 PM, Lukasz Majewski wrote:
This code adds check if QT_TOKEN_STATUS_XACTERR error occurred. When it is detected the token is reconfigured and transmission is retried.
This code is the port to newest U-Boot of the fix from - "rayvt" (from [1]).
Links: [1] - https://forum.doozan.com/read.php?3,35295,35295#msg-35295 [2] - https://www.dropbox.com/s/nrkrd1no63viuu8/uboot-bodhi-2016.05-timeoutTD.patc...
Signed-off-by: Lukasz Majewski lukma@denx.de [Unfortunately, the original patch [2] did not contain S-o-B from the original author - "rayvt"]
drivers/usb/host/ehci-hcd.c | 27 ++++++++++++++++++++++++++- 1 file changed, 26 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c index 0a77111f80..45eda7ad24 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -315,6 +315,7 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, int timeout; int ret = 0; struct ehci_ctrl *ctrl = ehci_get_ctrl(dev);
int trynum;
debug("dev=%p, pipe=%lx, buffer=%p, length=%d, req=%p\n", dev, pipe, buffer, length, req);
@@ -560,6 +561,10 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, ehci_writel(&ctrl->hcor->or_usbsts, (usbsts & 0x3f));
/* Enable async. schedule. */
- trynum = 1; /* No more than 2 tries, in case of XACTERR. */
+retry_xacterr:;
- vtd = &qtd[qtd_counter - 1];
- cmd = ehci_readl(&ctrl->hcor->or_usbcmd); cmd |= CMD_ASE; ehci_writel(&ctrl->hcor->or_usbcmd, cmd);
Are you sure always retrying the transfer if you get XACTERR is the right thing to do, even if you get that e.g. on filesystem writes ?

Hi Marek,
On 3/22/20 2:00 PM, Lukasz Majewski wrote:
This code adds check if QT_TOKEN_STATUS_XACTERR error occurred. When it is detected the token is reconfigured and transmission is retried.
This code is the port to newest U-Boot of the fix from - "rayvt" (from [1]).
Links: [1] - https://forum.doozan.com/read.php?3,35295,35295#msg-35295 [2] - https://www.dropbox.com/s/nrkrd1no63viuu8/uboot-bodhi-2016.05-timeoutTD.patc...
Signed-off-by: Lukasz Majewski lukma@denx.de [Unfortunately, the original patch [2] did not contain S-o-B from the original author - "rayvt"]
drivers/usb/host/ehci-hcd.c | 27 ++++++++++++++++++++++++++- 1 file changed, 26 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c index 0a77111f80..45eda7ad24 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -315,6 +315,7 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, int timeout; int ret = 0; struct ehci_ctrl *ctrl = ehci_get_ctrl(dev);
int trynum;
debug("dev=%p, pipe=%lx, buffer=%p, length=%d, req=%p\n",
dev, pipe, buffer, length, req); @@ -560,6 +561,10 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, ehci_writel(&ctrl->hcor->or_usbsts, (usbsts & 0x3f)); /* Enable async. schedule. */
- trynum = 1; /* No more than 2 tries, in case of
XACTERR. */ +retry_xacterr:;
- vtd = &qtd[qtd_counter - 1];
- cmd = ehci_readl(&ctrl->hcor->or_usbcmd); cmd |= CMD_ASE; ehci_writel(&ctrl->hcor->or_usbcmd, cmd);
Are you sure always retrying the transfer if you get XACTERR is the right thing to do, even if you get that e.g. on filesystem writes ?
Please correct me if I'm wrong, but this function - ehci_submit_async - doesn't work with filesystem. It just setups proper EHCI descriptor and checks if it was sent with or without errors.
When the XACTERR happens, proper flag is cleared and the transmission is retried.
Best regards,
Lukasz Majewski
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de

On 3/23/20 8:18 AM, Lukasz Majewski wrote:
Hi Marek,
Hi,
On 3/22/20 2:00 PM, Lukasz Majewski wrote:
This code adds check if QT_TOKEN_STATUS_XACTERR error occurred. When it is detected the token is reconfigured and transmission is retried.
This code is the port to newest U-Boot of the fix from - "rayvt" (from [1]).
Links: [1] - https://forum.doozan.com/read.php?3,35295,35295#msg-35295 [2] - https://www.dropbox.com/s/nrkrd1no63viuu8/uboot-bodhi-2016.05-timeoutTD.patc...
Signed-off-by: Lukasz Majewski lukma@denx.de [Unfortunately, the original patch [2] did not contain S-o-B from the original author - "rayvt"]
drivers/usb/host/ehci-hcd.c | 27 ++++++++++++++++++++++++++- 1 file changed, 26 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c index 0a77111f80..45eda7ad24 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -315,6 +315,7 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, int timeout; int ret = 0; struct ehci_ctrl *ctrl = ehci_get_ctrl(dev);
int trynum;
debug("dev=%p, pipe=%lx, buffer=%p, length=%d, req=%p\n",
dev, pipe, buffer, length, req); @@ -560,6 +561,10 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, ehci_writel(&ctrl->hcor->or_usbsts, (usbsts & 0x3f)); /* Enable async. schedule. */
- trynum = 1; /* No more than 2 tries, in case of
XACTERR. */ +retry_xacterr:;
- vtd = &qtd[qtd_counter - 1];
- cmd = ehci_readl(&ctrl->hcor->or_usbcmd); cmd |= CMD_ASE; ehci_writel(&ctrl->hcor->or_usbcmd, cmd);
Are you sure always retrying the transfer if you get XACTERR is the right thing to do, even if you get that e.g. on filesystem writes ?
Please correct me if I'm wrong, but this function - ehci_submit_async
- doesn't work with filesystem. It just setups proper EHCI descriptor and checks if it was sent with or without errors.
If you're writing into a block device, this function is used. If you get XACTERR during the transfer and you retry, is there a chance the data get actually written to the device ?
When the XACTERR happens, proper flag is cleared and the transmission is retried.
What happens to the payload if it's a host-to-device transfer, do the data get discarded or do they get written to the storage ?

Hi Marek,
On 3/23/20 8:18 AM, Lukasz Majewski wrote:
Hi Marek,
Hi,
On 3/22/20 2:00 PM, Lukasz Majewski wrote:
This code adds check if QT_TOKEN_STATUS_XACTERR error occurred. When it is detected the token is reconfigured and transmission is retried.
This code is the port to newest U-Boot of the fix from - "rayvt" (from [1]).
Links: [1] - https://forum.doozan.com/read.php?3,35295,35295#msg-35295 [2] - https://www.dropbox.com/s/nrkrd1no63viuu8/uboot-bodhi-2016.05-timeoutTD.patc...
Signed-off-by: Lukasz Majewski lukma@denx.de [Unfortunately, the original patch [2] did not contain S-o-B from the original author - "rayvt"]
drivers/usb/host/ehci-hcd.c | 27 ++++++++++++++++++++++++++- 1 file changed, 26 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c index 0a77111f80..45eda7ad24 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -315,6 +315,7 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, int timeout; int ret = 0; struct ehci_ctrl *ctrl = ehci_get_ctrl(dev);
int trynum;
debug("dev=%p, pipe=%lx, buffer=%p, length=%d, req=%p\n",
dev, pipe, buffer, length, req); @@ -560,6 +561,10 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, ehci_writel(&ctrl->hcor->or_usbsts, (usbsts & 0x3f)); /* Enable async. schedule. */
- trynum = 1; /* No more than 2 tries, in case of
XACTERR. */ +retry_xacterr:;
- vtd = &qtd[qtd_counter - 1];
- cmd = ehci_readl(&ctrl->hcor->or_usbcmd); cmd |= CMD_ASE; ehci_writel(&ctrl->hcor->or_usbcmd, cmd);
Are you sure always retrying the transfer if you get XACTERR is the right thing to do, even if you get that e.g. on filesystem writes ?
Please correct me if I'm wrong, but this function - ehci_submit_async
- doesn't work with filesystem. It just setups proper EHCI
descriptor and checks if it was sent with or without errors.
If you're writing into a block device, this function is used. If you get XACTERR during the transfer and you retry, is there a chance the data get actually written to the device ?
As fair as I can tell this code snippet doesn't touch file systems.
When the XACTERR happens, proper flag is cleared and the transmission is retried.
What happens to the payload if it's a host-to-device transfer, do the data get discarded or do they get written to the storage ?
As state above - the resent shall only result is data being to some in-memory buffer.
Best regards,
Lukasz Majewski
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de

On 3/23/20 1:58 PM, Lukasz Majewski wrote:
Hi Marek,
Hi,
On 3/23/20 8:18 AM, Lukasz Majewski wrote:
Hi Marek,
Hi,
On 3/22/20 2:00 PM, Lukasz Majewski wrote:
This code adds check if QT_TOKEN_STATUS_XACTERR error occurred. When it is detected the token is reconfigured and transmission is retried.
This code is the port to newest U-Boot of the fix from - "rayvt" (from [1]).
Links: [1] - https://forum.doozan.com/read.php?3,35295,35295#msg-35295 [2] - https://www.dropbox.com/s/nrkrd1no63viuu8/uboot-bodhi-2016.05-timeoutTD.patc...
Signed-off-by: Lukasz Majewski lukma@denx.de [Unfortunately, the original patch [2] did not contain S-o-B from the original author - "rayvt"]
drivers/usb/host/ehci-hcd.c | 27 ++++++++++++++++++++++++++- 1 file changed, 26 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c index 0a77111f80..45eda7ad24 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -315,6 +315,7 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, int timeout; int ret = 0; struct ehci_ctrl *ctrl = ehci_get_ctrl(dev);
int trynum;
debug("dev=%p, pipe=%lx, buffer=%p, length=%d, req=%p\n",
dev, pipe, buffer, length, req); @@ -560,6 +561,10 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, ehci_writel(&ctrl->hcor->or_usbsts, (usbsts & 0x3f)); /* Enable async. schedule. */
- trynum = 1; /* No more than 2 tries, in case of
XACTERR. */ +retry_xacterr:;
- vtd = &qtd[qtd_counter - 1];
- cmd = ehci_readl(&ctrl->hcor->or_usbcmd); cmd |= CMD_ASE; ehci_writel(&ctrl->hcor->or_usbcmd, cmd);
Are you sure always retrying the transfer if you get XACTERR is the right thing to do, even if you get that e.g. on filesystem writes ?
Please correct me if I'm wrong, but this function - ehci_submit_async
- doesn't work with filesystem. It just setups proper EHCI
descriptor and checks if it was sent with or without errors.
If you're writing into a block device, this function is used. If you get XACTERR during the transfer and you retry, is there a chance the data get actually written to the device ?
As fair as I can tell this code snippet doesn't touch file systems.
It could be any sort of write into block storage, it does not have to be filesystem. The filesystem was just an example of write type we trigger more often.
When the XACTERR happens, proper flag is cleared and the transmission is retried.
What happens to the payload if it's a host-to-device transfer, do the data get discarded or do they get written to the storage ?
As state above - the resent shall only result is data being to some in-memory buffer.
If you trigger a write from device memory to remote storage, then it is the other way around.

On Sun, Mar 22, 2020 at 02:00:26PM +0100, Lukasz Majewski wrote:
This patch set is rather a request for testing (and a starting point for the discussion), as it may improve the robustness of USB with some pendrives - and yes sacrifice some performance for reliability. The previous version of this patch: https://patchwork.ozlabs.org/patch/1244928/ fixed issue for some network USB adapters and improved stability on TI boards. This patch also provides very detailed explanation of the problem in the commit message.
With the async support patch applied ( SHA1: 02b0e1a36c5bc20174299312556ec4e266872bd6 ), the qhtoken variable has value 0x00 when token shows errors. As a result the error handling path is not executed. This looks like some missing/broken cache flushing - for easier bisecting this patch has been reverted for now
Note that while the original patch returned USB ethernet on the Beagleboard to a functional state with this series applied it's back to non-functional.

Hi Tom,
On Sun, Mar 22, 2020 at 02:00:26PM +0100, Lukasz Majewski wrote:
This patch set is rather a request for testing (and a starting point for the discussion), as it may improve the robustness of USB with some pendrives - and yes sacrifice some performance for reliability. The previous version of this patch: https://patchwork.ozlabs.org/patch/1244928/ fixed issue for some network USB adapters and improved stability on TI boards. This patch also provides very detailed explanation of the problem in the commit message.
With the async support patch applied ( SHA1: 02b0e1a36c5bc20174299312556ec4e266872bd6 ), the qhtoken variable has value 0x00 when token shows errors. As a result the error handling path is not executed. This looks like some missing/broken cache flushing - for easier bisecting this patch has been reverted for now
Note that while the original patch returned USB ethernet on the Beagleboard to a functional state with this series applied it's back to non-functional.
Thanks for testing.
The _only_ difference between the first version of this patch and this one is the lack of dynamic reduction of transfer size for the latter. The former reduces the transfer size to 64 blocks (instead of default 240). And with 64 blocks it retries two times the transmission.
Best regards,
Lukasz Majewski
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
participants (3)
-
Lukasz Majewski
-
Marek Vasut
-
Tom Rini