[PATCH] usb: ehci-hcd: Add IAA handshake for removing async QH

According to EHCI spec, software needs to do handshake with HC for safely removing QH from async list. This handshake is implemented by setting IAAD (Interrupt on Async Advance Doorbell) bit in USB_USBCMD register and poll the IAA (Interrupt on Async Advance bit) in the USB_USBSTS to ensure the HC has released all on-chip state that may potentially reference one of the data structures just removed.
Current codes only check active status of the last QTD, but this can't ensure the QH is released from HC. We can meet unrecoverable "EHCI timed out on TD" errors when running SCT tests on USB disk. The USB_ASYNCLISTADDR register is changed to a invalid address when the issue happens. It is fixed after adding the IAA handshake.
Signed-off-by: Ye Li ye.li@nxp.com --- drivers/usb/host/ehci-hcd.c | 25 +++++++++++++++++++++++++ drivers/usb/host/ehci.h | 1 + 2 files changed, 26 insertions(+)
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c index 8933f60..b52a903 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -346,6 +346,28 @@ static int ehci_disable_async(struct ehci_ctrl *ctrl) return ret; }
+static int ehci_iaa_cycle(struct ehci_ctrl *ctrl) +{ + u32 cmd, status; + int ret; + + /* Enable Interrupt on Async Advance Doorbell. */ + cmd = ehci_readl(&ctrl->hcor->or_usbcmd); + cmd |= CMD_IAAD; + ehci_writel(&ctrl->hcor->or_usbcmd, cmd); + + ret = handshake((uint32_t *)&ctrl->hcor->or_usbsts, STS_IAA, STS_IAA, + 10 * 1000); /* 10ms timeout */ + if (ret < 0) + printf("EHCI fail timeout STS_IAA set\n"); + + status = ehci_readl(&ctrl->hcor->or_usbsts); + if ((status & STS_IAA)) + ehci_writel(&ctrl->hcor->or_usbsts, STS_IAA); + + return ret; +} + static int ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, int length, struct devrequest *req) @@ -631,6 +653,9 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer, flush_dcache_range((unsigned long)&ctrl->qh_list, ALIGN_END_ADDR(struct QH, &ctrl->qh_list, 1));
+ /* Set IAAD, poll IAA */ + ehci_iaa_cycle(ctrl); + /* * Invalidate the memory area occupied by buffer * Don't try to fix the buffer alignment, if it isn't properly diff --git a/drivers/usb/host/ehci.h b/drivers/usb/host/ehci.h index 8e07554..e9e6f2a5 100644 --- a/drivers/usb/host/ehci.h +++ b/drivers/usb/host/ehci.h @@ -44,6 +44,7 @@ struct ehci_hcor { #define STS_ASS (1 << 15) #define STS_PSS (1 << 14) #define STS_HALT (1 << 12) +#define STS_IAA (1 << 5) uint32_t or_usbintr; #define INTR_UE (1 << 0) /* USB interrupt enable */ #define INTR_UEE (1 << 1) /* USB error interrupt enable */

On 3/8/21 4:35 AM, Ye Li wrote: [...]
+static int ehci_iaa_cycle(struct ehci_ctrl *ctrl) +{
- u32 cmd, status;
- int ret;
- /* Enable Interrupt on Async Advance Doorbell. */
- cmd = ehci_readl(&ctrl->hcor->or_usbcmd);
- cmd |= CMD_IAAD;
- ehci_writel(&ctrl->hcor->or_usbcmd, cmd);
- ret = handshake((uint32_t *)&ctrl->hcor->or_usbsts, STS_IAA, STS_IAA,
Is the (uint32_t *) cast really needed ?
10 * 1000); /* 10ms timeout */
- if (ret < 0)
printf("EHCI fail timeout STS_IAA set\n");
Shouldn't there be some abort ^ if ret < 0 ? Also, add the return value into the printf, it is useful for debugging.
- status = ehci_readl(&ctrl->hcor->or_usbsts);
- if ((status & STS_IAA))
Drop the double parenthesis here, one set of () is enough.
ehci_writel(&ctrl->hcor->or_usbsts, STS_IAA);
- return ret;
+}
Is this a bugfix for this release or is this for next release ? We're in rc3 already, so I would suggest to be careful.

Hi Marek,
On Mon, 2021-03-08 at 09:50 +0100, Marek Vasut wrote:
Caution: EXT Email
On 3/8/21 4:35 AM, Ye Li wrote: [...]
+static int ehci_iaa_cycle(struct ehci_ctrl *ctrl) +{ + u32 cmd, status; + int ret;
+ /* Enable Interrupt on Async Advance Doorbell. */ + cmd = ehci_readl(&ctrl->hcor->or_usbcmd); + cmd |= CMD_IAAD; + ehci_writel(&ctrl->hcor->or_usbcmd, cmd);
+ ret = handshake((uint32_t *)&ctrl->hcor->or_usbsts, STS_IAA, STS_IAA,
Is the (uint32_t *) cast really needed ?
I will remove it in v2.
+ 10 * 1000); /* 10ms timeout */ + if (ret < 0) + printf("EHCI fail timeout STS_IAA set\n");
Shouldn't there be some abort ^ if ret < 0 ? Also, add the return value into the printf, it is useful for debugging.
I don't think need to abort anything when ret < 0. The ret value is very simple in handshake, when ret < 0, it must be -1. Actually the codes refer the ehci_enable_async for the usage of handshake.
+ status = ehci_readl(&ctrl->hcor->or_usbsts); + if ((status & STS_IAA))
Drop the double parenthesis here, one set of () is enough.
I will remove it v2.
+ ehci_writel(&ctrl->hcor->or_usbsts, STS_IAA);
+ return ret; +}
Is this a bugfix for this release or is this for next release ? We're in rc3 already, so I would suggest to be careful.
It is ok to add the fix for next release. From our tests, the issue is not very easy to reproduce by just using usb read/write. In SCT, generally we need to run about 1 hour to see the issue.
Best regards, Ye Li

On 3/9/21 4:18 AM, Ye Li wrote:
Hi Marek,
Hi,
[...]
+ 10 * 1000); /* 10ms timeout */ + if (ret < 0) + printf("EHCI fail timeout STS_IAA set\n");
Shouldn't there be some abort ^ if ret < 0 ? Also, add the return value into the printf, it is useful for debugging.
I don't think need to abort anything when ret < 0. The ret value is very simple in handshake, when ret < 0, it must be -1. Actually the codes refer the ehci_enable_async for the usage of handshake.
If the handshake timed out, that indicates a problem and that problem shouldn't be ignored, but rather propagated, no ? Maybe the entire ehci transfer should be aborted ?
[...]
+ ehci_writel(&ctrl->hcor->or_usbsts, STS_IAA);
+ return ret; +}
Is this a bugfix for this release or is this for next release ? We're in rc3 already, so I would suggest to be careful.
It is ok to add the fix for next release. From our tests, the issue is not very easy to reproduce by just using usb read/write. In SCT, generally we need to run about 1 hour to see the issue.
Can you add details on how to reproduce the issue into the commit message ? That would be highly helpful.

Hi Marek,
On Tue, 2021-03-09 at 09:08 +0100, Marek Vasut wrote:
Caution: EXT Email
On 3/9/21 4:18 AM, Ye Li wrote:
Hi Marek,
Hi,
[...]
+ 10 * 1000); /* 10ms timeout */ + if (ret < 0) + printf("EHCI fail timeout STS_IAA set\n");
Shouldn't there be some abort ^ if ret < 0 ? Also, add the return value into the printf, it is useful for debugging.
I don't think need to abort anything when ret < 0. The ret value is very simple in handshake, when ret < 0, it must be -1. Actually the codes refer the ehci_enable_async for the usage of handshake.
If the handshake timed out, that indicates a problem and that problem shouldn't be ignored, but rather propagated, no ? Maybe the entire ehci transfer should be aborted ?
I supposed the problem means QH does not advance, so next submit of async transfer will have QTD timeout. But it is ok to abort transfer as the error handling.
[...]
+ ehci_writel(&ctrl->hcor->or_usbsts, STS_IAA);
+ return ret; +}
Is this a bugfix for this release or is this for next release ? We're in rc3 already, so I would suggest to be careful.
It is ok to add the fix for next release. From our tests, the issue is not very easy to reproduce by just using usb read/write. In SCT, generally we need to run about 1 hour to see the issue.
Can you add details on how to reproduce the issue into the commit message ? That would be highly helpful.
ok. I will add more.
Best regards, Ye Li

On 3/9/21 10:32 AM, Ye Li wrote:
Hi Marek,
Hi,
[...]
+ 10 * 1000); /* 10ms timeout */ + if (ret < 0) + printf("EHCI fail timeout STS_IAA set\n");
Shouldn't there be some abort ^ if ret < 0 ? Also, add the return value into the printf, it is useful for debugging.
I don't think need to abort anything when ret < 0. The ret value is very simple in handshake, when ret < 0, it must be -1. Actually the codes refer the ehci_enable_async for the usage of handshake.
If the handshake timed out, that indicates a problem and that problem shouldn't be ignored, but rather propagated, no ? Maybe the entire ehci transfer should be aborted ?
I supposed the problem means QH does not advance, so next submit of async transfer will have QTD timeout. But it is ok to abort transfer as the error handling.
If this is an error which would always cause subsequent error, abort here already.
+ ehci_writel(&ctrl->hcor->or_usbsts, STS_IAA);
+ return ret; +}
Is this a bugfix for this release or is this for next release ? We're in rc3 already, so I would suggest to be careful.
It is ok to add the fix for next release. From our tests, the issue is not very easy to reproduce by just using usb read/write. In SCT, generally we need to run about 1 hour to see the issue.
Can you add details on how to reproduce the issue into the commit message ? That would be highly helpful.
ok. I will add more.
Thank you
participants (2)
-
Marek Vasut
-
Ye Li