[PATCH v2 0/3] dwc3: gadget: properly fix cache operations

We experience huge problems with cache handling on Qualcomm systems, and it appears the dcache handling in the DWC3 gadget code is quite wrong and causes operational issues.
This serie fixes the dcache operations on unaligned data, and properly invalidate buffers when reading back data from hardware.
Signed-off-by: Neil Armstrong neil.armstrong@linaro.org --- Changes in v2: - Fix typo in drivers/usb/dwc3/core.h and rewrite patch 1 commit message - Link to v1: https://lore.kernel.org/r/20240719-u-boot-dwc3-gadget-dcache-fixup-v1-0-58a5...
--- Neil Armstrong (3): usb: dwc3: allocate setup_buf with dma_alloc_coherent() usb: dwc3: fix dcache flush range calculation usb: dwc3: invalidate dcache on buffer used in interrupt handling
drivers/usb/dwc3/core.h | 2 ++ drivers/usb/dwc3/ep0.c | 6 ++++-- drivers/usb/dwc3/gadget.c | 10 ++++++---- drivers/usb/dwc3/io.h | 13 ++++++++++++- 4 files changed, 24 insertions(+), 7 deletions(-) --- base-commit: 3f772959501c99fbe5aa0b22a36efe3478d1ae1c change-id: 20240719-u-boot-dwc3-gadget-dcache-fixup-ea1e92758663
Best regards,

Since setup_buf is also consumed by hardware DMA, aligns it's allocation like other hardware buffers by introduce setup_buf_addr populated by dma_alloc_coherent(), and use it to pass the physical address of the buffer to the hardware.
Reviewed-by: Mattijs Korpershoek mkorpershoek@baylibre.com Signed-off-by: Neil Armstrong neil.armstrong@linaro.org --- drivers/usb/dwc3/core.h | 2 ++ drivers/usb/dwc3/ep0.c | 4 ++-- drivers/usb/dwc3/gadget.c | 8 ++++---- 3 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h index 7374ce950da..b572ea340c8 100644 --- a/drivers/usb/dwc3/core.h +++ b/drivers/usb/dwc3/core.h @@ -670,6 +670,7 @@ struct dwc3_scratchpad_array { * @ep0_trb: dma address of ep0_trb * @ep0_usb_req: dummy req used while handling STD USB requests * @ep0_bounce_addr: dma address of ep0_bounce + * @setup_buf_addr: dma address of setup_buf * @scratch_addr: dma address of scratchbuf * @lock: for synchronizing * @dev: pointer to our struct device @@ -757,6 +758,7 @@ struct dwc3 { dma_addr_t ep0_trb_addr; dma_addr_t ep0_bounce_addr; dma_addr_t scratch_addr; + dma_addr_t setup_buf_addr; struct dwc3_request ep0_usb_req;
/* device lock */ diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c index 16b11ce3d9f..0c7e0123368 100644 --- a/drivers/usb/dwc3/ep0.c +++ b/drivers/usb/dwc3/ep0.c @@ -381,7 +381,7 @@ static int dwc3_ep0_handle_status(struct dwc3 *dwc, dep = dwc->eps[0]; dwc->ep0_usb_req.dep = dep; dwc->ep0_usb_req.request.length = sizeof(*response_pkt); - dwc->ep0_usb_req.request.buf = dwc->setup_buf; + dwc->ep0_usb_req.request.buf = (void *)dwc->setup_buf_addr; dwc->ep0_usb_req.request.complete = dwc3_ep0_status_cmpl;
return __dwc3_gadget_ep0_queue(dep, &dwc->ep0_usb_req); @@ -663,7 +663,7 @@ static int dwc3_ep0_set_sel(struct dwc3 *dwc, struct usb_ctrlrequest *ctrl) dep = dwc->eps[0]; dwc->ep0_usb_req.dep = dep; dwc->ep0_usb_req.request.length = dep->endpoint.maxpacket; - dwc->ep0_usb_req.request.buf = dwc->setup_buf; + dwc->ep0_usb_req.request.buf = (void *)dwc->setup_buf_addr; dwc->ep0_usb_req.request.complete = dwc3_ep0_set_sel_cmpl;
return __dwc3_gadget_ep0_queue(dep, &dwc->ep0_usb_req); diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 7d6bcc2627f..d41b590afb8 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -2622,8 +2622,8 @@ int dwc3_gadget_init(struct dwc3 *dwc) goto err1; }
- dwc->setup_buf = memalign(CONFIG_SYS_CACHELINE_SIZE, - DWC3_EP0_BOUNCE_SIZE); + dwc->setup_buf = dma_alloc_coherent(DWC3_EP0_BOUNCE_SIZE, + (unsigned long *)&dwc->setup_buf_addr); if (!dwc->setup_buf) { ret = -ENOMEM; goto err2; @@ -2670,7 +2670,7 @@ err4: dma_free_coherent(dwc->ep0_bounce);
err3: - kfree(dwc->setup_buf); + dma_free_coherent(dwc->setup_buf);
err2: dma_free_coherent(dwc->ep0_trb); @@ -2692,7 +2692,7 @@ void dwc3_gadget_exit(struct dwc3 *dwc)
dma_free_coherent(dwc->ep0_bounce);
- kfree(dwc->setup_buf); + dma_free_coherent(dwc->setup_buf);
dma_free_coherent(dwc->ep0_trb);

The current flush operation will omit doing a flush/invalidate on the first and last bytes if the base address and size are not aligned with DMA_MINALIGN.
This causes operation failures Qualcomm platforms.
Take in account the alignment and size of the buffer and also flush the previous and last cacheline.
Reviewed-by: Mattijs Korpershoek mkorpershoek@baylibre.com Signed-off-by: Neil Armstrong neil.armstrong@linaro.org --- drivers/usb/dwc3/io.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/io.h b/drivers/usb/dwc3/io.h index 04791d4c9be..1aaf5413c6d 100644 --- a/drivers/usb/dwc3/io.h +++ b/drivers/usb/dwc3/io.h @@ -50,6 +50,9 @@ static inline void dwc3_writel(void __iomem *base, u32 offset, u32 value)
static inline void dwc3_flush_cache(uintptr_t addr, int length) { - flush_dcache_range(addr, addr + ROUND(length, CACHELINE_SIZE)); + uintptr_t start_addr = (uintptr_t)addr & ~(ARCH_DMA_MINALIGN - 1); + uintptr_t end_addr = ALIGN((uintptr_t)addr + length, ARCH_DMA_MINALIGN); + + flush_dcache_range(start_addr, end_addr); } #endif /* __DRIVERS_USB_DWC3_IO_H */

On 7/24/24 5:48 PM, Neil Armstrong wrote:
The current flush operation will omit doing a flush/invalidate on the first and last bytes if the base address and size are not aligned with DMA_MINALIGN.
This causes operation failures Qualcomm platforms.
Take in account the alignment and size of the buffer and also flush the previous and last cacheline.
Reviewed-by: Mattijs Korpershoek mkorpershoek@baylibre.com Signed-off-by: Neil Armstrong neil.armstrong@linaro.org
drivers/usb/dwc3/io.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/io.h b/drivers/usb/dwc3/io.h index 04791d4c9be..1aaf5413c6d 100644 --- a/drivers/usb/dwc3/io.h +++ b/drivers/usb/dwc3/io.h @@ -50,6 +50,9 @@ static inline void dwc3_writel(void __iomem *base, u32 offset, u32 value)
static inline void dwc3_flush_cache(uintptr_t addr, int length) {
- flush_dcache_range(addr, addr + ROUND(length, CACHELINE_SIZE));
- uintptr_t start_addr = (uintptr_t)addr & ~(ARCH_DMA_MINALIGN - 1);
- uintptr_t end_addr = ALIGN((uintptr_t)addr + length, ARCH_DMA_MINALIGN);
- flush_dcache_range(start_addr, end_addr);
include/cpu_func.h:void flush_dcache_range(unsigned long start, unsigned long stop);
So you shouldn't be feeding this function uintptr_t , right ?
Also, why change CACHELINE_SIZE to ARCH_DMA_MINALIGN ?

On 01/10/2024 17:21, Marek Vasut wrote:
On 7/24/24 5:48 PM, Neil Armstrong wrote:
The current flush operation will omit doing a flush/invalidate on the first and last bytes if the base address and size are not aligned with DMA_MINALIGN.
This causes operation failures Qualcomm platforms.
Take in account the alignment and size of the buffer and also flush the previous and last cacheline.
Reviewed-by: Mattijs Korpershoek mkorpershoek@baylibre.com Signed-off-by: Neil Armstrong neil.armstrong@linaro.org
drivers/usb/dwc3/io.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/io.h b/drivers/usb/dwc3/io.h index 04791d4c9be..1aaf5413c6d 100644 --- a/drivers/usb/dwc3/io.h +++ b/drivers/usb/dwc3/io.h @@ -50,6 +50,9 @@ static inline void dwc3_writel(void __iomem *base, u32 offset, u32 value) static inline void dwc3_flush_cache(uintptr_t addr, int length) { - flush_dcache_range(addr, addr + ROUND(length, CACHELINE_SIZE)); + uintptr_t start_addr = (uintptr_t)addr & ~(ARCH_DMA_MINALIGN - 1); + uintptr_t end_addr = ALIGN((uintptr_t)addr + length, ARCH_DMA_MINALIGN);
+ flush_dcache_range(start_addr, end_addr);
include/cpu_func.h:void flush_dcache_range(unsigned long start, unsigned long stop);
So you shouldn't be feeding this function uintptr_t , right ?
uintptr_t is the right type to perform pointer calculation, I should probably cast it to unsigned long for flush_dcache_range.
I'll do this in v2.
Also, why change CACHELINE_SIZE to ARCH_DMA_MINALIGN ?
Because all the memory allocated for DMA was done with dma_alloc_coherent() which uses ARCH_DMA_MINALIGN directly or indirectly.
I should probably remove CACHELINE_SIZE in the same patch.
Thanks, Neil

On 10/1/24 5:32 PM, Neil Armstrong wrote:
On 01/10/2024 17:21, Marek Vasut wrote:
On 7/24/24 5:48 PM, Neil Armstrong wrote:
The current flush operation will omit doing a flush/invalidate on the first and last bytes if the base address and size are not aligned with DMA_MINALIGN.
This causes operation failures Qualcomm platforms.
Take in account the alignment and size of the buffer and also flush the previous and last cacheline.
Reviewed-by: Mattijs Korpershoek mkorpershoek@baylibre.com Signed-off-by: Neil Armstrong neil.armstrong@linaro.org
drivers/usb/dwc3/io.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/io.h b/drivers/usb/dwc3/io.h index 04791d4c9be..1aaf5413c6d 100644 --- a/drivers/usb/dwc3/io.h +++ b/drivers/usb/dwc3/io.h @@ -50,6 +50,9 @@ static inline void dwc3_writel(void __iomem *base, u32 offset, u32 value) static inline void dwc3_flush_cache(uintptr_t addr, int length) { - flush_dcache_range(addr, addr + ROUND(length, CACHELINE_SIZE)); + uintptr_t start_addr = (uintptr_t)addr & ~(ARCH_DMA_MINALIGN - 1); + uintptr_t end_addr = ALIGN((uintptr_t)addr + length, ARCH_DMA_MINALIGN);
+ flush_dcache_range(start_addr, end_addr);
include/cpu_func.h:void flush_dcache_range(unsigned long start, unsigned long stop);
So you shouldn't be feeding this function uintptr_t , right ?
uintptr_t is the right type to perform pointer calculation, I should probably cast it to unsigned long for flush_dcache_range.
Better fix flush_dcache_range(), using unsigned long there is not great, but that's for separate series from this bugfix.
I'll do this in v2.
Also, why change CACHELINE_SIZE to ARCH_DMA_MINALIGN ?
Because all the memory allocated for DMA was done with dma_alloc_coherent() which uses ARCH_DMA_MINALIGN directly or indirectly.
I should probably remove CACHELINE_SIZE in the same patch.
And document this in the commit message, that's useful information.
Thanks

On 01/10/2024 17:47, Marek Vasut wrote:
On 10/1/24 5:32 PM, Neil Armstrong wrote:
On 01/10/2024 17:21, Marek Vasut wrote:
On 7/24/24 5:48 PM, Neil Armstrong wrote:
The current flush operation will omit doing a flush/invalidate on the first and last bytes if the base address and size are not aligned with DMA_MINALIGN.
This causes operation failures Qualcomm platforms.
Take in account the alignment and size of the buffer and also flush the previous and last cacheline.
Reviewed-by: Mattijs Korpershoek mkorpershoek@baylibre.com Signed-off-by: Neil Armstrong neil.armstrong@linaro.org
drivers/usb/dwc3/io.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/io.h b/drivers/usb/dwc3/io.h index 04791d4c9be..1aaf5413c6d 100644 --- a/drivers/usb/dwc3/io.h +++ b/drivers/usb/dwc3/io.h @@ -50,6 +50,9 @@ static inline void dwc3_writel(void __iomem *base, u32 offset, u32 value) static inline void dwc3_flush_cache(uintptr_t addr, int length) { - flush_dcache_range(addr, addr + ROUND(length, CACHELINE_SIZE)); + uintptr_t start_addr = (uintptr_t)addr & ~(ARCH_DMA_MINALIGN - 1); + uintptr_t end_addr = ALIGN((uintptr_t)addr + length, ARCH_DMA_MINALIGN);
+ flush_dcache_range(start_addr, end_addr);
include/cpu_func.h:void flush_dcache_range(unsigned long start, unsigned long stop);
So you shouldn't be feeding this function uintptr_t , right ?
uintptr_t is the right type to perform pointer calculation, I should probably cast it to unsigned long for flush_dcache_range.
Better fix flush_dcache_range(), using unsigned long there is not great, but that's for separate series from this bugfix.
Since uintptr_t is a typedef to unsigned long, it won't change anything.
I'll do this in v2.
Also, why change CACHELINE_SIZE to ARCH_DMA_MINALIGN ?
Because all the memory allocated for DMA was done with dma_alloc_coherent() which uses ARCH_DMA_MINALIGN directly or indirectly.
I should probably remove CACHELINE_SIZE in the same patch.
And document this in the commit message, that's useful information.
Done, thx
Thanks

On Qualcomm systems, the setup buffer and even buffers are in a bad state at interrupt handling, so invalidate the dcache lines for the setup_buf and event buffer to make sure we read correct data written by the hardware.
This fixes the following error: dwc3-generic-peripheral usb@a600000: UNKNOWN IRQ type -1 dwc3-generic-peripheral usb@a600000: UNKNOWN IRQ type 4673109
and invalid situation in dwc3_gadget_giveback() because setup_buf content is read at 0s and leads to fatal crash fixed by [1].
[1] https://lore.kernel.org/all/20240528-topic-sm8x50-dwc3-gadget-crash-fix-v1-1...
Reviewed-by: Mattijs Korpershoek mkorpershoek@baylibre.com Signed-off-by: Neil Armstrong neil.armstrong@linaro.org --- drivers/usb/dwc3/ep0.c | 2 ++ drivers/usb/dwc3/gadget.c | 2 ++ drivers/usb/dwc3/io.h | 8 ++++++++ 3 files changed, 12 insertions(+)
diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c index 0c7e0123368..fc1d5892106 100644 --- a/drivers/usb/dwc3/ep0.c +++ b/drivers/usb/dwc3/ep0.c @@ -743,6 +743,8 @@ static void dwc3_ep0_inspect_setup(struct dwc3 *dwc, if (!dwc->gadget_driver) goto out;
+ dwc3_invalidate_cache(ctrl, sizeof(*ctrl)); + len = le16_to_cpu(ctrl->wLength); if (!len) { dwc->three_stage_setup = false; diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index d41b590afb8..0bc9aee4daa 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -2503,6 +2503,8 @@ static irqreturn_t dwc3_process_event_buf(struct dwc3 *dwc, u32 buf) while (left > 0) { union dwc3_event event;
+ dwc3_invalidate_cache((uintptr_t)evt->buf, evt->length); + event.raw = *(u32 *) (evt->buf + evt->lpos);
dwc3_process_event_entry(dwc, &event); diff --git a/drivers/usb/dwc3/io.h b/drivers/usb/dwc3/io.h index 1aaf5413c6d..7cf05203b0d 100644 --- a/drivers/usb/dwc3/io.h +++ b/drivers/usb/dwc3/io.h @@ -55,4 +55,12 @@ static inline void dwc3_flush_cache(uintptr_t addr, int length)
flush_dcache_range(start_addr, end_addr); } + +static inline void dwc3_invalidate_cache(uintptr_t addr, int length) +{ + uintptr_t start_addr = (uintptr_t)addr & ~(ARCH_DMA_MINALIGN - 1); + uintptr_t end_addr = ALIGN((uintptr_t)addr + length, ARCH_DMA_MINALIGN); + + invalidate_dcache_range(start_addr, end_addr); +} #endif /* __DRIVERS_USB_DWC3_IO_H */

Hi,
On 24/07/2024 17:48, Neil Armstrong wrote:
We experience huge problems with cache handling on Qualcomm systems, and it appears the dcache handling in the DWC3 gadget code is quite wrong and causes operational issues.
This serie fixes the dcache operations on unaligned data, and properly invalidate buffers when reading back data from hardware.
Signed-off-by: Neil Armstrong neil.armstrong@linaro.org
Changes in v2:
- Fix typo in drivers/usb/dwc3/core.h and rewrite patch 1 commit message
- Link to v1: https://lore.kernel.org/r/20240719-u-boot-dwc3-gadget-dcache-fixup-v1-0-58a5...
Neil Armstrong (3): usb: dwc3: allocate setup_buf with dma_alloc_coherent() usb: dwc3: fix dcache flush range calculation usb: dwc3: invalidate dcache on buffer used in interrupt handling
drivers/usb/dwc3/core.h | 2 ++ drivers/usb/dwc3/ep0.c | 6 ++++-- drivers/usb/dwc3/gadget.c | 10 ++++++---- drivers/usb/dwc3/io.h | 13 ++++++++++++- 4 files changed, 24 insertions(+), 7 deletions(-)
base-commit: 3f772959501c99fbe5aa0b22a36efe3478d1ae1c change-id: 20240719-u-boot-dwc3-gadget-dcache-fixup-ea1e92758663
Best regards,
Gentle ping, those fixes are quite important to make USB Gadget reliable on Qualcomm platforms,
Thanks, Neil

On mar., oct. 01, 2024 at 16:43, Neil Armstrong neil.armstrong@linaro.org wrote:
Hi,
On 24/07/2024 17:48, Neil Armstrong wrote:
We experience huge problems with cache handling on Qualcomm systems, and it appears the dcache handling in the DWC3 gadget code is quite wrong and causes operational issues.
This serie fixes the dcache operations on unaligned data, and properly invalidate buffers when reading back data from hardware.
Signed-off-by: Neil Armstrong neil.armstrong@linaro.org
Changes in v2:
- Fix typo in drivers/usb/dwc3/core.h and rewrite patch 1 commit message
- Link to v1: https://lore.kernel.org/r/20240719-u-boot-dwc3-gadget-dcache-fixup-v1-0-58a5...
Neil Armstrong (3): usb: dwc3: allocate setup_buf with dma_alloc_coherent() usb: dwc3: fix dcache flush range calculation usb: dwc3: invalidate dcache on buffer used in interrupt handling
drivers/usb/dwc3/core.h | 2 ++ drivers/usb/dwc3/ep0.c | 6 ++++-- drivers/usb/dwc3/gadget.c | 10 ++++++---- drivers/usb/dwc3/io.h | 13 ++++++++++++- 4 files changed, 24 insertions(+), 7 deletions(-)
base-commit: 3f772959501c99fbe5aa0b22a36efe3478d1ae1c change-id: 20240719-u-boot-dwc3-gadget-dcache-fixup-ea1e92758663
Best regards,
Gentle ping, those fixes are quite important to make USB Gadget reliable on Qualcomm platforms,
Sorry for the wait on this one. I did not act upon this series because it's not assigned to me.
According to patchwork [1], this is assigned to bmeng.
Bin, do you want to review this or can I pick this up through u-boot-dfu since it's usb gadget related?
[1] https://patchwork.ozlabs.org/project/uboot/cover/20240724-u-boot-dwc3-gadget...
Thanks, Mattijs
Thanks, Neil
participants (4)
-
Marek Vasut
-
Mattijs Korpershoek
-
Neil Armstrong
-
neil.armstrong@linaro.org