Re: [U-Boot] ehci-hcd: Allow cleanups to happen on an EHCI timeout.

Hi Simon,
Sorry for the delayed response to your email. I was trying to prepare a proper reply that collated information about similar fixes for the EHCI timeout, but never got around to completing it. Please excuse me, thanks.
I'm following up on your feedback for the patch titled: ehci-hcd: Allow cleanups to happen on an EHCI timeout. [1]
On Sat, Jun 25, 2011 at 12:28 PM, Joel A Fernandes <agnel.joel@gmail.com> wrote:
With this, the EHCI seems to "recover" from a timeout. This is particularly observable if you were to ping the wrong IP Address and then ping the correct one or if there was a temporary failure during tftp sessions.
All it takes is one timeout to disable it. If you have a noisy network (lot of traffic), even if the traffic is not for the board, the timeouts don't occur.
Signed-off-by: Joel A Fernandes <agnel.joel@gmail.com>
Robert, Could you see if this patch solves the issue you're seeing without increasing the timeout? Simon, Could this be a fix for a similar issue you were seeing with asix?
Yes this is better (ASIX can recover from a timeout), but I am concerned that it carries on without reporting an error. Is that right?
Should we disable async schedule and then return an error?
Actually, with this patch, the error message is still displayed before disabling async schedule.
With respect to using a bulk USB stick (some of which take 3s or more to respond to a submit) this doesn't make any difference for me. It seems to take a long time to respond the first time, so the 5s timeout seems prudent.
Since this sorts out the network side we can probably skip that patch.
Are you suggesting we revert the patch you had submitted [2], instead of deleting the goto line as done in my patch? I think it would be better if we left [2] in and allowed the async disable to happen after a timeout.
There are other patches reported to fix the issue, such as [3] and [4], but I think they are more like workarounds that only delay the timeout itself. A timeout that occurs for any other reason, such as too many USB devices connected to the hub, can still trigger the problem, and skipping the async schedule disable and the other code after the timeout seems to make EHCI unrecoverable.
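To illustrate the control-flow point above, here is a minimal, self-contained C sketch. It is not the U-Boot ehci-hcd code: `mock_ehci`, `mock_submit`, `mock_wait_for_completion` and `MOCK_POLL_LIMIT` are hypothetical names invented for this example, and returning -1 on a timeout is an assumption (the thread only confirms that an error message is printed). The sketch only contrasts leaving early on a timeout with falling through to the cleanup.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for controller state; not the U-Boot structures. */
struct mock_ehci {
	bool async_enabled;    /* true while the async schedule is running */
	int  polls_until_done; /* polls needed before completion; < 0 means never */
};

/* Stand-in for the real timeout, expressed as a number of polls. */
enum { MOCK_POLL_LIMIT = 5 };

/* Poll for completion; returns false if the poll limit is reached first. */
static bool mock_wait_for_completion(const struct mock_ehci *hc)
{
	for (int i = 0; i < MOCK_POLL_LIMIT; i++) {
		if (hc->polls_until_done >= 0 && i >= hc->polls_until_done)
			return true;
	}
	return false;
}

/*
 * Submit one transfer.
 * skip_cleanup_on_timeout == true models jumping straight to the failure
 * exit on a timeout; false models letting the cleanup run after a timeout.
 * The -1 return value on timeout is an assumption made for this sketch.
 */
static int mock_submit(struct mock_ehci *hc, bool skip_cleanup_on_timeout)
{
	int ret = 0;

	hc->async_enabled = true; /* start the async schedule */

	if (!mock_wait_for_completion(hc)) {
		printf("mock EHCI: timed out\n"); /* the warning is still shown */
		ret = -1;
		if (skip_cleanup_on_timeout)
			return ret; /* early exit: schedule is left enabled */
	}

	hc->async_enabled = false; /* cleanup: stop the async schedule */
	return ret;
}
```

With the early exit, a timed-out transfer leaves `async_enabled` set, so the next submission starts from a state the code never cleaned up. Without it, the schedule is always stopped before returning, which matches the "recovers from a timeout" behaviour described in this thread.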
Looking forward to your suggestions, Thanks,
-Joel
[1] http://patchwork.ozlabs.org/patch/102041/
[2] http://lists.denx.de/pipermail/u-boot/2011-February/087043.html
[3] http://patchwork.ozlabs.org/patch/100367/
[4] https://github.com/kylemanna/u-boot/commit/a2211d841b5c5b67e1f336a7d8d1e6580...

Hi Joel,
On Fri, Aug 12, 2011 at 4:19 PM, Joel A Fernandes <agnel.joel@gmail.com> wrote:
[Adding Simon to CC]
Hi Simon,
Sorry for the delayed response to your email. I was just trying to prepare a proper response to your email with collating information about similar fixes to the EHCI timeout but never got to completing it. Please excuse me, thanks.
No hurry!
I'm following up on your feedback for the patch titled: ehci-hcd: Allow cleanups to happen on an EHCI timeout. [1]
On Sat, Jun 25, 2011 at 12:28 PM, Joel A Fernandes <agnel.joel@gmail.com> wrote:
With this, the EHCI seems to "recover" from a timeout. This is particularly observable if you were to ping the wrong IP Address and then ping the correct one or if there was a temporary failure during tftp sessions.
All it takes is one timeout to disable it. If you have a noisy network (lot of traffic), even if the traffic is not for the board, the timeouts don't occur.
Signed-off-by: Joel A Fernandes <agnel.joel@gmail.com>
Robert, Could you see if this patch solves the issue you're seeing without increasing the timeout? Simon, Could this be a fix for a similar issue you were seeing with asix?
Yes this is better (ASIX can recover from a timeout), but I am concerned that it carries on without reporting an error. Is that right?
Should we disable async schedule and then return an error?
Actually, with this patch, the error message is still displayed before disabling async schedule.
OK I see, fine. Really it is closer to the original behavior, but with the printf() to warn the user.
With respect to using a bulk USB stick (some of which take 3s or more to respond to a submit) this doesn't make any difference for me. It seems to take a long time to respond the first time, so the 5s timeout seems prudent.
Since this sorts out the network side we can probably skip that patch.
Are you suggesting we revert the patch you had submitted [2], instead of deleting the goto line as done in my patch? I think it would be better if we left [2] in and allowed the async disable to happen after a timeout like I'm doing.
There are other patches that are reported to fix the issue such as [3] and [4], but I think they are more like workarounds and delay the occurrence of the event of a timeout itself. A timeout which would occur for any other reason such as too many USB devices connected to the hub can trigger the problem, and, not running async schedule and the other code after the timeout seems to make EHCI unrecoverable.
My feeling was that the time was more a function of the device that is plugged in than the USB port/peripheral. Perhaps someone will find a device which needs a 10s timeout, so I agree just increasing it is not really the solution.
I found that once the device timed out it needed a reset to work - just resubmitting the urb didn't work for me. Maybe I had some other problem.
Anyway I think your patch looks good, thank you.
Regards, Simon
Looking forward to your suggestions, Thanks,
-Joel
[1] http://patchwork.ozlabs.org/patch/102041/
[2] http://lists.denx.de/pipermail/u-boot/2011-February/087043.html
[3] http://patchwork.ozlabs.org/patch/100367/
[4] https://github.com/kylemanna/u-boot/commit/a2211d841b5c5b67e1f336a7d8d1e6580...

Hi Simon,
Thanks a lot for reviewing the issue.
With respect to using a bulk USB stick (some of which take 3s or more to respond to a submit) this doesn't make any difference for me. It seems to take a long time to respond the first time, so the 5s timeout seems prudent.
Since this sorts out the network side we can probably skip that patch.
Are you suggesting we revert the patch you had submitted [2], instead of deleting the goto line as done in my patch? I think it would be better if we left [2] in and allowed the async disable to happen after a timeout like I'm doing.
There are other patches that are reported to fix the issue such as [3] and [4], but I think they are more like workarounds and delay the occurrence of the event of a timeout itself. A timeout which would occur for any other reason such as too many USB devices connected to the hub can trigger the problem, and, not running async schedule and the other code after the timeout seems to make EHCI unrecoverable.
My feeling was that the time was more a function of the device that is plugged in than the USB port/peripheral. Perhaps someone will find a device which needs a 10s timeout, so I agree just increasing it is not really the solution.
I found that once the device timed out it needed a reset to work - just resubmitting the urb didn't work for me. Maybe I had some other problem.
Anyway I think your patch looks good, thank you.
Could I add your Acked-by to the submission as well?
thanks,
Joel

On Wed, Aug 17, 2011 at 4:47 PM, Joel A Fernandes <agnel.joel@gmail.com> wrote:
Hi Simon,
Thanks a lot for reviewing the issue.
With respect to using a bulk USB stick (some of which take 3s or more to respond to a submit) this doesn't make any difference for me. It seems to take a long time to respond the first time, so the 5s timeout seems prudent.
Since this sorts out the network side we can probably skip that patch.
Are you suggesting we revert the patch you had submitted [2], instead of deleting the goto line as done in my patch? I think it would be better if we left [2] in and allowed the async disable to happen after a timeout like I'm doing.
There are other patches that are reported to fix the issue such as [3] and [4], but I think they are more like workarounds and delay the occurrence of the event of a timeout itself. A timeout which would occur for any other reason such as too many USB devices connected to the hub can trigger the problem, and, not running async schedule and the other code after the timeout seems to make EHCI unrecoverable.
My feeling was that the time was more a function of the device that is plugged in than the USB port/peripheral. Perhaps someone will find a device which needs a 10s timeout, so I agree just increasing it is not really the solution.
I found that once the device timed out it needed a reset to work - just resubmitting the urb didn't work for me. Maybe I had some other problem.
Anyway I think your patch looks good, thank you.
Could I add your Acked-by to the submission as well?
Hi Joel,
Yes.
Acked-by: Simon Glass <sglass@chromium.org>
Regards, Simon
thanks,
Joel