[U-Boot] [REGRESSION] Ethernet broken on Colibri T20/T30

Hi Simon
As mentioned before, I noticed that Ethernet (the on-module USB ASIX chip) is broken on master, while it still worked fine in v2016.01. I vaguely remember noticing something along those lines when testing some of your early dm stuff, but I could not find our discussion about it anymore, and I don't think it actually led anywhere.
FWIW it currently crashes as follows (showing Colibri T20 but the same happens on Colibri T30):
U-Boot SPL 2016.01-00205-gb8c5b47 (Jan 17 2016 - 01:31:48)
Trying to boot from RAM

U-Boot 2016.01-00205-gb8c5b47 (Jan 17 2016 - 01:31:48 +0100)

TEGRA20
Model: Toradex Colibri T20
Board: Toradex Colibri T20
DRAM: 512 MiB
NAND: 1024 MiB
MMC: Tegra SD/MMC: 0
*** Warning - bad CRC, using default environment

In: serial
Out: lcd
Err: lcd
USB recovery mode
Net: No ethernet found.
Hit any key to stop autoboot: 2 0
Colibri T20 # usb start
starting USB...
USB0: USB EHCI 1.00
USB1: USB EHCI 1.00
USB2: USB EHCI 1.00
scanning bus 1 for devices... 1 USB Device(s) found
scanning bus 2 for devices...
Warning: asix_eth using MAC address from ROM
2 USB Device(s) found
scanning bus 0 for devices... 2 USB Device(s) found
Colibri T20 # dhcp
BOOTP broadcast 1
EHCI timed out on TD - token=0x88008d80
Rx: failed to receive: -5
BOOTP broadcast 2
EHCI timed out on TD - token=0x8008d80
Rx: failed to receive: -5
BOOTP broadcast 3
EHCI timed out on TD - token=0x88008d80
Rx: failed to receive: -5
BOOTP broadcast 4
EHCI timed out on TD - token=0x8008d80
Rx: failed to receive: -5
BOOTP broadcast 5
EHCI timed out on TD - token=0x88008d80
Rx: failed to receive: -5
BOOTP broadcast 6
EHCI timed out on TD - token=0x8008d80
Rx: failed to receive: -5

Retry time exceeded; starting again
data abort
pc : [<1fb4d194>] lr : [<1fbb9784>]
reloc pc : [<0012d194>] lr : [<00199784>]
sp : 1d61c5e0 ip : 6aa665ea fp : 1d62bca0
r10: 00000000 r9 : 1d61fee0 r8 : 1d62bca0
r7 : 00000000 r6 : 1d6312e0 r5 : 1f6da8e0 r4 : 00000000
r3 : 1c585760 r2 : 03155180 r1 : 00000000 r0 : caa661ca
Flags: NzCv IRQs off FIQs off Mode SVC_32
Resetting CPU ...

resetting ...
Do you have any clue what could be going on?
Cheers
Marcel
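
Editorial note: the data abort dump above prints each address twice, once as the runtime value (`pc`, `lr`) and once as `reloc pc` / `lr`. On the assumption that the second pair is the runtime address minus U-Boot's relocation offset (that is, the link-time address one would look up in the build's symbol map), the offset can be recovered with a small sketch like the following. The function name is hypothetical and only serves the illustration.

```python
# Addresses copied from the data abort dump in the report above.
PC, RELOC_PC = 0x1FB4D194, 0x0012D194
LR, RELOC_LR = 0x1FBB9784, 0x00199784


def reloc_offset(runtime_addr: int, link_addr: int) -> int:
    """Return the distance between a runtime address and its link-time address.

    Assumption: "reloc pc" is the runtime pc with the relocation offset
    subtracted, so the difference is the relocation offset itself.
    """
    return runtime_addr - link_addr


offset = reloc_offset(PC, RELOC_PC)

# Both register pairs should agree on one offset if the assumption holds.
assert offset == reloc_offset(LR, RELOC_LR)

print(f"relocation offset: {offset:#010x}")
print(f"link-time pc to look up in the symbol map: {PC - offset:#010x}")
```

Both pairs giving the same offset suggests the dump is internally consistent, so the link-time `pc` is probably the more useful value when searching the symbol map for the faulting function.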

On 01/16/2016 05:51 PM, Marcel Ziswiler wrote:
> [...]
> Do you have any clue what could be going on?
FWIW, I tried that same commit on Beaver with an asix USB Ethernet device and didn't see that problem, even after aligning the config file for Beaver more closely with colibri_t30 (e.g. disabling PCIe support, copying all the config options that colibri_t30 has but beaver doesn't into beaver's config file).
I wonder if it's anything to do with having 3 USB controllers active rather than 2 (although that seems unlikely; I can see going from 1->2 causing issues, but once 2 work, adding more seems like it should just work). Perhaps there's a DRAM bandwidth issue, but then I can't see why the patches between v2016.01 and the commit you mentioned would affect that.
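
Editorial note: the `token=` values in the EHCI timeout messages from the original report can be decoded by hand. The sketch below assumes that the printed value is the raw qTD token dword and uses the field layout from the EHCI specification (status in bits 7:0, PID code in bits 9:8, error counter in bits 11:10, current page in bits 14:12, interrupt-on-complete in bit 15, total bytes in bits 30:16, data toggle in bit 31). The class and function names are hypothetical.

```python
from typing import NamedTuple

# PID code values as defined for the qTD token field.
PID_NAMES = {0: "OUT", 1: "IN", 2: "SETUP", 3: "reserved"}


class QtdToken(NamedTuple):
    data_toggle: int
    total_bytes: int
    ioc: bool
    c_page: int
    cerr: int
    pid: str
    active: bool
    halted: bool


def decode_qtd_token(token: int) -> QtdToken:
    """Split a 32-bit EHCI qTD token into its fields."""
    return QtdToken(
        data_toggle=(token >> 31) & 0x1,
        total_bytes=(token >> 16) & 0x7FFF,
        ioc=bool((token >> 15) & 0x1),
        c_page=(token >> 12) & 0x7,
        cerr=(token >> 10) & 0x3,
        pid=PID_NAMES[(token >> 8) & 0x3],
        active=bool(token & 0x80),   # status bit 7
        halted=bool(token & 0x40),   # status bit 6
    )


# The two values that alternate in the log.
for raw in (0x88008D80, 0x08008D80):
    print(f"{raw:#010x}", decode_qtd_token(raw))
```

Under these assumptions both values describe an IN transfer that is still marked active and not halted, with 2048 bytes left to transfer, and they differ only in the data toggle bit. That would be consistent with the controller never completing the receive rather than reporting a bus error, although the log alone does not prove it.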

Hi,
On 18 January 2016 at 10:42, Stephen Warren <swarren@wwwdotorg.org> wrote:
> On 01/16/2016 05:51 PM, Marcel Ziswiler wrote:
>> [...]
>> Do you have any clue what could be going on?
>
> FWIW, I tried that same commit on Beaver with an asix USB Ethernet device and didn't see that problem [...]
I have seen the EHCI timeout on and off. I thought it might be due to cache misalignment such that we fail to talk to the hardware. I have never dug into it though, or at least not successfully. If you have a repeatable case it might be worth taking a look.
Regards, Simon
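
Editorial note: the cache misalignment idea mentioned above can be illustrated with a small check. A DMA buffer whose start or length is not a multiple of the cache line size shares a line with unrelated data, so flushing or invalidating it can either miss part of the buffer or clobber its neighbours. The sketch below is only an illustration: the 64-byte line size and the example address are assumptions, not values taken from this thread or from the Tegra configuration.

```python
CACHE_LINE = 64  # assumed line size in bytes; the real value is platform specific


def is_cache_aligned(addr: int, length: int, line: int = CACHE_LINE) -> bool:
    """True if the buffer starts and ends exactly on cache line boundaries."""
    return addr % line == 0 and length % line == 0


def aligned_range(addr: int, length: int, line: int = CACHE_LINE) -> tuple[int, int]:
    """Return the smallest line-aligned [start, end) range covering the buffer."""
    start = addr & ~(line - 1)
    end = (addr + length + line - 1) & ~(line - 1)
    return start, end


# Hypothetical receive buffer: 2048 bytes starting 16 bytes into a cache line.
buf_addr, buf_len = 0x10000010, 2048

print("aligned:", is_cache_aligned(buf_addr, buf_len))
start, end = aligned_range(buf_addr, buf_len)
print(f"cache maintenance would have to cover {start:#x}..{end:#x}")
```

In this hypothetical case the maintenance range is larger than the buffer at both ends, which is the kind of situation where an invalidate after DMA could discard or expose data belonging to something else.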
Participants (3): Marcel Ziswiler, Simon Glass, Stephen Warren