
On Tue, Jun 26, 2018 at 4:55 PM, Alexander Graf agraf@suse.de wrote:
On 06/26/2018 02:47 PM, Peter Robinson wrote:
This bug arises from the combination of the dwc2 USB controller and the lan78xx USB Ethernet controller, which is the pairing used on the Raspberry Pi 3 Model B+.
When the host attempts to receive a packet, but a packet has not arrived, the lan78xx controller responds by setting BIR (Bulk-In Empty Response) to NAK. Unfortunately, this hangs the USB controller and requires the USB controller to be reset.
The fix proposed is to have the lan78xx controller respond by setting BIR to ZLP.
Signed-off-by: Andrew Thomas <andrew.thomas@oracle.com>
Tested-by: Peter Robinson <pbrobinson@gmail.com>
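For readers following along, the change described in the patch amounts to a read-modify-write of one configuration bit on the lan78xx. Below is a minimal sketch in C, not the actual patch. The register offset, the bit position, the helper names, and the fake register are all hypothetical. It also assumes that a cleared BIR bit selects ZLP and a set bit selects NAK, which should be checked against the LAN78xx datasheet and the u-boot driver.

```c
#include <stdint.h>

/* Hypothetical register offset and bit mask, for illustration only. */
#define SKETCH_USB_CFG0     0x080u
#define SKETCH_USB_CFG_BIR  (1u << 6)

/*
 * Stand-in for the device register. A real driver would reach the
 * register through USB vendor control transfers, not a variable.
 * It starts with BIR set, i.e. "respond with NAK" under the
 * assumption stated above.
 */
static uint32_t fake_usb_cfg0 = SKETCH_USB_CFG_BIR;

/* Hypothetical register read helper; returns 0 on success. */
static int sketch_read_reg(uint32_t addr, uint32_t *val)
{
    if (addr != SKETCH_USB_CFG0)
        return -1;
    *val = fake_usb_cfg0;
    return 0;
}

/* Hypothetical register write helper; returns 0 on success. */
static int sketch_write_reg(uint32_t addr, uint32_t val)
{
    if (addr != SKETCH_USB_CFG0)
        return -1;
    fake_usb_cfg0 = val;
    return 0;
}

/*
 * Switch the Bulk-In Empty Response from NAK to ZLP.
 * Assumption: BIR clear = ZLP, BIR set = NAK.
 * All other bits in the register are preserved.
 */
static int sketch_set_bir_zlp(void)
{
    uint32_t cfg;
    int ret = sketch_read_reg(SKETCH_USB_CFG0, &cfg);

    if (ret)
        return ret;

    cfg &= ~SKETCH_USB_CFG_BIR;
    return sketch_write_reg(SKETCH_USB_CFG0, cfg);
}
```

With ZLP selected, an empty bulk-in poll completes with a zero-length transfer instead of a NAK, so the receive path has to treat a zero-length result as "no packet yet" rather than as an error.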
Tested on the RPi 3B+; it certainly improves the situation that a number of Fedora users have seen.
What exactly have you tested?
Even with this patch, I am not able to reliably boot into grub. It almost seems as if the packet buffer keeps getting overwritten by newer packets so that by the time we process the old ones, the ones we wanted to see are gone.
Booting to grub and then a local system. It has seemed more stable for me and a few other users; at least it was more consistent in getting to grub without "Rx: failed to receive: -5". But from the testing various Fedora users did prior to this patch, the issue seems to be very network/cable/environment specific. I'm fairly certain that this patch doesn't fix the problem, but from current reports it at least appears to improve the situation for some users.
Improve, yes, but not solve it to a point where it's fully usable. And I don't think we have a full grasp on what's broken yet.
I completely agree. It makes the problem go away for a number of users, so it's not completely terrible, and fewer people bothering me is a good start IMO, even if it's not the final fix.
What I'm seeing here is that booting grub stalls within early bootup (where it loads modules) until I press any key on serial. Pressing a key on a USB keyboard does not help.
Given that there are no interrupts enabled on the RPi at this point, I can only think of caches as the culprit for this breakage. But why did it work with the normal 3B then?
Different USB NIC; maybe the usb3 interface has a different impact on the dwc2 interface. The lan78xx driver seems to have had little use in general before it landed in the 3B+, and the Linux driver had numerous issues that have been quickly addressed, so I figured that the u-boot driver is probably just as terrible. Due to other commitments I've had little time to investigate that theory further, though, and my understanding of low level usb isn't huge.
Peter