Function xhci_bulk_tx( ) failed to handle double transfer EVENT TRB case in one call

Hello, On platform LS1028ARDB, we observed USB3.0/2.0 dongle failed to do tftp data transfer and caused system reset, I have some analysis below but still your input for help:
=>usb reset&&setenv ethact r8152_eth&&tftp a0000000 test_data resetting USB... Bus usb3@3100000: Register 200017f NbrPorts 2 Starting the controller USB XHCI 1.00 Bus usb3@3110000: Register 200017f NbrPorts 2 Starting the controller USB XHCI 1.00 scanning bus usb3@3100000 for devices... Warning: r8152_eth MAC addresses don't match: Address in ROM is d4:6e:0e:0c:8d:14 Address in environment is 00:0e:c6:c9:5e:0f 2 USB Device(s) found scanning bus usb3@3110000 for devices... 1 USB Device(s) found scanning usb for storage devices... 0 Storage Device(s) found Waiting for Ethernet connection... done. Using r8152_eth device TFTP from server 10.192.208.233; our IP address is 10.193.21.125 Filename 'test_data'. Load address: 0xa0000000 Loading: BUG at drivers/usb/host/xhci-ring.c:726/xhci_bulk_tx()! BUG! ### ERROR ### Please RESET the board ###
Further debugging shows that is due to xhci_blk_tx() miss handling 2nd event TRB in one call, causing in next call (for different EP), it got that missed event TRB and found ep_index mismatched, then hold system. A typical scenario is shown as below (with my debug prints added):
dev=00000000fbb4f3c0, pipe=c0010283, buffer=00000000fbb4fd80, length=2048----------------------------buffer would cross 64KB boundary, so we will send request in more than 1 TRB, this is the key issue trigger condition xhci_bulk_tx()#0.1.running_total:0x280 xhci_bulk_tx()#0.2.trb_buff_len:0x280 xhci_bulk_tx()#0.3.running_total:0x280 xhci_bulk_tx()#0.4.num_trbs:0x2--------------------------2 Transfer TRB xhci_bulk_tx()#0.5.running_total:0x10280 xhci_bulk_tx()#0.start_trb:0x00000000fbb47140 ----------xhci_bulk_tx()#0.&ring->enqueue->generic:0x00000000fbb47140----------Assemble 1st Transfer TRB xhci_bulk_tx()#0.addr:0xfbb4fd80 xhci_bulk_tx()#0.trb_buff_len:0x280 xhci_bulk_tx()#0.running_total:0x280 xhci_bulk_tx()#0.length:0x800 xhci_bulk_tx()#0.TRB_MAX_BUFF_SIZE:0x10000 ----------xhci_bulk_tx()#0.&ring->enqueue->generic:0x00000000fbb47150----------Assemble 2nd Transfer TRB xhci_bulk_tx()#0.addr:0xfbb50000 xhci_bulk_tx()#0.trb_buff_len:0x580 xhci_bulk_tx()#0.running_total:0x800 xhci_bulk_tx()#0.length:0x800 xhci_bulk_tx()#0.TRB_MAX_BUFF_SIZE:0x10000 xhci_bulk_tx()#0.event->trans_event.buffer:0x00000000fbb47140----------------handle event TRB for 1st Transfer TRB xhci_bulk_tx()#0.event->trans_event.transfer_len:0xd000180 xhci_bulk_tx()#0.event->trans_event.flags:0x1058001 xhci_bulk_tx()#0.event->len:384 xhci_bulk_tx()#0.event->complete_code:13 xhci_bulk_tx()#1.field:0x1058001 xhci_bulk_tx()#2.TRB_TO_EP_ENDEX(field):0x4 xhci_bulk_tx()#3.ep_index:0x4----------------------------------------------------------xhci_bulk_tx() return
dev=00000000fbb4f3c0, pipe=c0018203, buffer=00000000fbb2f9c0, length=110----------an other call comming, and for different EP (ep_index = 05) xhci_bulk_tx()#0.1.running_total:0x640 xhci_bulk_tx()#0.2.trb_buff_len:0x640 xhci_bulk_tx()#0.3.running_total:0x640 xhci_bulk_tx()#0.4.num_trbs:0x1 xhci_bulk_tx()#0.5.running_total:0x640 xhci_bulk_tx()#0.start_trb:0x00000000fbb47610 ----------xhci_bulk_tx()#0.&ring->enqueue->generic:0x00000000fbb47610---------queue 1st transfer TRB xhci_bulk_tx()#0.addr:0xfbb2f9c0 xhci_bulk_tx()#0.trb_buff_len:0x6e xhci_bulk_tx()#0.running_total:0x6e xhci_bulk_tx()#0.length:0x6e xhci_bulk_tx()#0.TRB_MAX_BUFF_SIZE:0x10000 xhci_bulk_tx()#0.event->trans_event.buffer:0x00000000fbb47150------------hand event TRB, note that this buffer is for last Transfer TRB in last call (see above '2nd Transfer TRB') xhci_bulk_tx()#0.event->trans_event.transfer_len:0x1000000 xhci_bulk_tx()#0.event->trans_event.flags:0x1058000 xhci_bulk_tx()#0.event->len:0 xhci_bulk_tx()#0.event->complete_code:1 xhci_bulk_tx()#1.field:0x1058000 xhci_bulk_tx()#2.TRB_TO_EP_ENDEX(field):0x4----------------ep_index is 4 rather than 5, then cause BUG() xhci_bulk_tx()#3.ep_index:0x5------------------------------------- ep_index mis-match wait for event again!!!!!!!!!---------------------------------- BUG at drivers/usb/host/xhci-ring.c:759/xhci_bulk_tx()! BUG! ### ERROR ### Please RESET the board ###
Basing on above analysis, I ever tried to remove below code from this function, issue disappear, which prove my assumption: diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 092ed6e..28197ef 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -591,14 +591,7 @@ int xhci_bulk_tx(struct usb_device *udev, unsigned long pipe, ep_ctx = xhci_get_ep_ctx(ctrl, virt_dev->out_ctx, ep_index);
ring = virt_dev->eps[ep_index].ring; - /* - * How much data is (potentially) left before the 64KB boundary? - * XHCI Spec puts restriction( TABLE 49 and 6.4.1 section of XHCI Spec) - * that the buffer should not span 64KB boundary. if so - * we send request in more than 1 TRB by chaining them. - */ - running_total = TRB_MAX_BUFF_SIZE - - (lower_32_bits(val_64) & (TRB_MAX_BUFF_SIZE - 1)); + running_total = length; trb_buff_len = running_total; running_total &= TRB_MAX_BUFF_SIZE - 1;
Besides that, I also ever tried to let xhci_bulk_tx() repeatedly handle multiple EVENT TRB in one call. However, things works for several rounds but end up with event TRB Ring hang finally, reason unknow. - Log: dev=00000000fbb4f3c0, pipe=c0010283, buffer=00000000fbb4fd80, length=2048 xhci_bulk_tx()#0.1.running_total:0x280 xhci_bulk_tx()#0.2.trb_buff_len:0x280 xhci_bulk_tx()#0.3.running_total:0x280 xhci_bulk_tx()#0.4.num_trbs:0x2 xhci_bulk_tx()#0.5.running_total:0x10280 xhci_bulk_tx()#0.start_trb:0x00000000fbb47160 ----------xhci_bulk_tx()#0.&ring->enqueue->generic:0x00000000fbb47160---------------1st Transfer TRB ----------xhci_bulk_tx()#0.&ring->enqueue->generic:0x00000000fbb47170-------------2nd TransferTRB xhci_bulk_tx()#0.num_trbs_count:2 num_trbs_count > 1!!!!!!!!!--------------------------------------Here I add code to handle another event TRB xhci_bulk_tx()#0.num_trbs_count:1 xhci_bulk_tx()#0.event->trans_event.buffer:0x00000000fbb47170----------------------Now we got the final event TRB xhci_bulk_tx()#0.event->trans_event.transfer_len:0x1000000 xhci_bulk_tx()#0.event->trans_event.flags:0x1058000 xhci_bulk_tx()#0.event->len:0 xhci_bulk_tx()#0.event->complete_code:1 xhci_bulk_tx()#1.field:0x1058000 xhci_bulk_tx()#2.TRB_TO_EP_ENDEX(field):0x4 xhci_bulk_tx()#3.ep_index:0x4 xhci_bulk_tx()#4.ret:0-------------------------------------Return success ...----------------------------------------------------------------This success can last for several round call for the same EP dev=00000000fbb4f3c0, pipe=c0010283, buffer=00000000fbb4fd80, length=2048--------------The last call when event Ring hang xhci_bulk_tx()#0.1.running_total:0x280 xhci_bulk_tx()#0.2.trb_buff_len:0x280 xhci_bulk_tx()#0.3.running_total:0x280 xhci_bulk_tx()#0.4.num_trbs:0x2 xhci_bulk_tx()#0.5.running_total:0x10280 xhci_bulk_tx()#0.start_trb:0x00000000fbb47180 ----------xhci_bulk_tx()#0.&ring->enqueue->generic:0x00000000fbb47180 xhci_bulk_tx()#0.addr:0xfbb4fd80 xhci_bulk_tx()#0.trb_buff_len:0x280 xhci_bulk_tx()#0.running_total:0x280 xhci_bulk_tx()#0.length:0x800 xhci_bulk_tx()#0.TRB_MAX_BUFF_SIZE:0x10000 ----------xhci_bulk_tx()#0.&ring->enqueue->generic:0x00000000fbb47190 xhci_bulk_tx()#0.addr:0xfbb50000 xhci_bulk_tx()#0.trb_buff_len:0x580 xhci_bulk_tx()#0.running_total:0x800 xhci_bulk_tx()#0.length:0x800 xhci_bulk_tx()#0.TRB_MAX_BUFF_SIZE:0x10000 xhci_bulk_tx()#0.num_trbs_count:2 num_trbs_count > 1!!!!!!!!! XHCI bulk transfer timed out, aborting.................................. Rx: failed to receive: -5 T
- Related hacking: diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 092ed6e..0b4da26 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -558,6 +558,7 @@ int xhci_bulk_tx(struct usb_device *udev, unsigned long pipe, int length, void *buffer) { int num_trbs = 0; + int num_trbs_count = 0; struct xhci_generic_trb *start_trb; bool first_trb = false; int start_cycle; @@ -614,6 +615,7 @@ int xhci_bulk_tx(struct usb_device *udev, unsigned long pipe, num_trbs++; running_total += TRB_MAX_BUFF_SIZE; } + num_trbs_count = num_trbs;
/* * XXX: Calling routine prepare_ring() called in place of @@ -713,7 +715,7 @@ int xhci_bulk_tx(struct usb_device *udev, unsigned long pipe, } while (running_total < length);
giveback_first_trb(udev, ep_index, start_cycle, start_trb); - +again: event = xhci_wait_for_event(ctrl, TRB_TRANSFER); if (!event) { debug("XHCI bulk transfer timed out, aborting...\n"); @@ -722,6 +724,12 @@ int xhci_bulk_tx(struct usb_device *udev, unsigned long pipe, udev->act_len = 0; return -ETIMEDOUT; } + if(num_trbs_count-- > 1) { + printf("num_trbs_count > 1!!!!!!!!!\n"); + xhci_acknowledge_event(ctrl); + goto again; + } + field = le32_to_cpu(event->trans_event.flags);
BUG_ON(TRB_TO_SLOT_ID(field) != slot_id);
So, any comment? Thanks in advance.
Regards, Ran
participants (1)
-
Ran Wang