[U-Boot] [PATCH v4 0/7] dcache support for Raspberry Pi 1

This patchset enables dcache support for Raspberry Pi 1. First, the cache support code for arm1136 and arm1176 is merged. CONFIG_SYS_CACHELINE_SIZE is defined as 32 bytes and is used as the alignment for mailbox buffer allocations. The rpi mailbox code now flushes the dcache before sending the mailbox request and invalidates it before reading the mailbox answer. Finally, the CONFIG_SYS_DCACHE_OFF switch is removed from the rpi1 config. It is still set in the rpi2 config.
dcache support increases the MMC read performance on RPi 1 from 5.4 MiB/s to 12.3 MiB/s. TFTP download over USB Ethernet increases from 2.3 MiB/s to between 2.6 and 3.3 MiB/s (it seems to vary a lot).
This was tested by the following commands:
fatload mmc 0:1 ${kernel_addr_r} zImage
and
tftp u-boot.bin
Changes in v4:
* Added Acked-By and Tested-By of Stephen Warren
* Set CONFIG_SYS_CACHELINE_SIZE for rpi and rpi2 appropriately
* Updated USB buffer comment about required alignment
* Used git format-patch -M to generate patches

Changes in v3:
* dwc2 dcache support
* Use ALLOC_CACHE_ALIGN_BUFFER for mailbox buffer allocation
* Use ARCH_DMA_MINALIGN for size to roundup

Changes in v2:
* Merge arm1136/1176 cache code
* Use cacheline size as mailbox buffer alignment
* Flush/invalidate mailbox buffer up to cacheline size
Alexander Stein (7):
  arm1136: Remove dead code
  arm1136/arm1176: Merge cache handling code
  ARM: bcm283x: Define CONFIG_SYS_CACHELINE_SIZE
  ARM: bcm283x: Allocate all mailbox buffers cacheline aligned
  arm/mach-bcm283x/mbox: Flush and invalidate dcache when using fw mailbox
  dwc2: Add dcache support
  arm/rpi: Enable dcache

 arch/arm/cpu/arm11/Makefile               |  8 +++++
 arch/arm/cpu/{arm1136 => arm11}/cpu.c     | 10 ------
 arch/arm/cpu/arm1136/Makefile             |  1 -
 arch/arm/cpu/arm1176/Makefile             |  4 ++-
 arch/arm/cpu/arm1176/cpu.c                | 51 -------------------------------
 arch/arm/mach-bcm283x/include/mach/mbox.h |  3 ++
 arch/arm/mach-bcm283x/mbox.c              |  9 ++++++
 board/raspberrypi/rpi/rpi.c               | 10 +++---
 drivers/usb/host/dwc2.c                   | 18 ++++++++---
 drivers/video/bcm2835.c                   |  4 +--
 include/configs/rpi-common.h              |  1 -
 include/configs/rpi.h                     |  2 ++
 include/configs/rpi_2.h                   |  2 ++
 13 files changed, 48 insertions(+), 75 deletions(-)
 create mode 100644 arch/arm/cpu/arm11/Makefile
 rename arch/arm/cpu/{arm1136 => arm11}/cpu.c (94%)
 delete mode 100644 arch/arm/cpu/arm1176/cpu.c

Apparently lcd_panel_disable is not defined anywhere, so no config for an arm1136 board would have set CONFIG_LCD. Remove the unused code.
Signed-off-by: Alexander Stein alexanders83@web.de
Acked-by: Stephen Warren swarren@wwwdotorg.org
Tested-by: Stephen Warren swarren@wwwdotorg.org
---
 arch/arm/cpu/arm1136/cpu.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/arch/arm/cpu/arm1136/cpu.c b/arch/arm/cpu/arm1136/cpu.c
index a7aed4b..5d4b3c2 100644
--- a/arch/arm/cpu/arm1136/cpu.c
+++ b/arch/arm/cpu/arm1136/cpu.c
@@ -32,16 +32,6 @@ int cleanup_before_linux (void)

 	disable_interrupts ();

-#ifdef CONFIG_LCD
-	{
-		extern void lcd_disable(void);
-		extern void lcd_panel_disable(void);
-
-		lcd_disable(); /* proper disable of lcd & panel */
-		lcd_panel_disable();
-	}
-#endif
-
 	/* turn off I/D-cache */
 	icache_disable();
 	dcache_disable();

On Fri, Jul 24, 2015 at 09:22:09AM +0200, Alexander Stein wrote:
> Apparently lcd_panel_disable is not defined anywhere, so no config for an arm1136 board would have set CONFIG_LCD. Remove the unused code.
>
> Signed-off-by: Alexander Stein alexanders83@web.de
> Acked-by: Stephen Warren swarren@wwwdotorg.org
> Tested-by: Stephen Warren swarren@wwwdotorg.org
Applied to u-boot/master, thanks!

As both cores are similar, merge the cache handling code for both CPUs into a common arm11 directory.
Signed-off-by: Alexander Stein alexanders83@web.de
Acked-by: Stephen Warren swarren@wwwdotorg.org
Tested-by: Stephen Warren swarren@wwwdotorg.org
---
Changes in v4:
* Used git format-patch -M to generate patches

 arch/arm/cpu/arm11/Makefile           |  8 ++++++
 arch/arm/cpu/{arm1136 => arm11}/cpu.c |  0
 arch/arm/cpu/arm1136/Makefile         |  1 -
 arch/arm/cpu/arm1176/Makefile         |  4 ++-
 arch/arm/cpu/arm1176/cpu.c            | 51 -----------------------------------
 5 files changed, 11 insertions(+), 53 deletions(-)
 create mode 100644 arch/arm/cpu/arm11/Makefile
 rename arch/arm/cpu/{arm1136 => arm11}/cpu.c (100%)
 delete mode 100644 arch/arm/cpu/arm1176/cpu.c

diff --git a/arch/arm/cpu/arm11/Makefile b/arch/arm/cpu/arm11/Makefile
new file mode 100644
index 0000000..2379b0f
--- /dev/null
+++ b/arch/arm/cpu/arm11/Makefile
@@ -0,0 +1,8 @@
+#
+# (C) Copyright 2000-2006
+# Wolfgang Denk, DENX Software Engineering, wd@denx.de.
+#
+# SPDX-License-Identifier: GPL-2.0+
+#
+
+obj-y = cpu.o
diff --git a/arch/arm/cpu/arm1136/cpu.c b/arch/arm/cpu/arm11/cpu.c
similarity index 100%
rename from arch/arm/cpu/arm1136/cpu.c
rename to arch/arm/cpu/arm11/cpu.c
diff --git a/arch/arm/cpu/arm1136/Makefile b/arch/arm/cpu/arm1136/Makefile
index 56a9390..5d6f0aa 100644
--- a/arch/arm/cpu/arm1136/Makefile
+++ b/arch/arm/cpu/arm1136/Makefile
@@ -6,7 +6,6 @@
 #

 extra-y = start.o
-obj-y = cpu.o

 obj-$(CONFIG_MX31) += mx31/
 obj-$(CONFIG_MX35) += mx35/
diff --git a/arch/arm/cpu/arm1176/Makefile b/arch/arm/cpu/arm1176/Makefile
index deec427..cd6dc9c 100644
--- a/arch/arm/cpu/arm1176/Makefile
+++ b/arch/arm/cpu/arm1176/Makefile
@@ -8,5 +8,7 @@
 # SPDX-License-Identifier: GPL-2.0+
 #

+obj- += dummy.o
 extra-y = start.o
-obj-y = cpu.o
+
+obj-y += ../arm11/
diff --git a/arch/arm/cpu/arm1176/cpu.c b/arch/arm/cpu/arm1176/cpu.c
deleted file mode 100644
index 2d81651..0000000
--- a/arch/arm/cpu/arm1176/cpu.c
+++ /dev/null
@@ -1,51 +0,0 @@
-/*
- * (C) Copyright 2004 Texas Insturments
- *
- * (C) Copyright 2002
- * Sysgo Real-Time Solutions, GmbH <www.elinos.com>
- * Marius Groeger mgroeger@sysgo.de
- *
- * (C) Copyright 2002
- * Gary Jennejohn, DENX Software Engineering, garyj@denx.de
- *
- * SPDX-License-Identifier: GPL-2.0+
- */
-
-/*
- * CPU specific code
- */
-
-#include <common.h>
-#include <command.h>
-#include <asm/system.h>
-
-static void cache_flush (void);
-
-int cleanup_before_linux (void)
-{
-	/*
-	 * this function is called just before we call linux
-	 * it prepares the processor for linux
-	 *
-	 * we turn off caches etc ...
-	 */
-
-	disable_interrupts ();
-
-	/* turn off I/D-cache */
-	icache_disable();
-	dcache_disable();
-	/* flush I/D-cache */
-	cache_flush();
-
-	return 0;
-}
-
-/* flush I/D-cache */
-static void cache_flush (void)
-{
-	/* invalidate both caches and flush btb */
-	asm ("mcr p15, 0, %0, c7, c7, 0": :"r" (0));
-	/* mem barrier to sync things */
-	asm ("mcr p15, 0, %0, c7, c10, 4": :"r" (0));
-}

On Fri, Jul 24, 2015 at 09:22:10AM +0200, Alexander Stein wrote:
> As both cores are similar, merge the cache handling code for both CPUs into a common arm11 directory.
>
> Signed-off-by: Alexander Stein alexanders83@web.de
> Acked-by: Stephen Warren swarren@wwwdotorg.org
> Tested-by: Stephen Warren swarren@wwwdotorg.org
Applied to u-boot/master, thanks!

The cacheline is always 32 bytes for arm1176 CPUs, so define it at board config level for cache handling code. The ARM Cortex-A7 has a dcache line size of 64 bytes.
Signed-off-by: Alexander Stein alexanders83@web.de
Acked-by: Stephen Warren swarren@wwwdotorg.org
Tested-by: Stephen Warren swarren@wwwdotorg.org
---
Changes in v4:
* Set CONFIG_SYS_CACHELINE_SIZE for rpi and rpi2 separately as they differ

 include/configs/rpi.h   | 2 ++
 include/configs/rpi_2.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/include/configs/rpi.h b/include/configs/rpi.h
index ab2f4db..86422e3 100644
--- a/include/configs/rpi.h
+++ b/include/configs/rpi.h
@@ -7,6 +7,8 @@
 #ifndef __CONFIG_H
 #define __CONFIG_H

+#define CONFIG_SYS_CACHELINE_SIZE 32
+
 #include "rpi-common.h"

 #endif
diff --git a/include/configs/rpi_2.h b/include/configs/rpi_2.h
index 2e7e74f..13dc8de 100644
--- a/include/configs/rpi_2.h
+++ b/include/configs/rpi_2.h
@@ -9,6 +9,7 @@

 #define CONFIG_SKIP_LOWLEVEL_INIT
 #define CONFIG_BCM2836
+#define CONFIG_SYS_CACHELINE_SIZE 64

 #include "rpi-common.h"
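
As an illustration of why the line size matters to the later patches: cache maintenance works on whole cachelines, so lengths are typically rounded up to a multiple of the line size. The following is a minimal sketch in plain C, not U-Boot code. The helper round_up_to_line() and the SKETCH_* macros are hypothetical names; only the values 32 and 64 come from the patch above.

```c
#include <assert.h>
#include <stddef.h>

/* Line sizes taken from the patch above: 32 bytes (rpi), 64 bytes (rpi_2) */
#define SKETCH_RPI1_CACHELINE_SIZE ((size_t)32)
#define SKETCH_RPI2_CACHELINE_SIZE ((size_t)64)

/*
 * Hypothetical helper: round len up to the next multiple of line.
 * Assumes line is a non-zero power of two, which holds for 32 and 64.
 */
static size_t round_up_to_line(size_t len, size_t line)
{
	assert(line != 0 && (line & (line - 1)) == 0);
	return (len + line - 1) & ~(line - 1);
}
```

With these definitions a 28 byte message occupies one full line on either board, that is 32 bytes on rpi and 64 bytes on rpi_2, which is why a single hard-coded value would not be correct for both.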

On Fri, Jul 24, 2015 at 09:22:11AM +0200, Alexander Stein wrote:
> The cacheline is always 32 bytes for arm1176 CPUs, so define it at board config level for cache handling code. The ARM Cortex-A7 has a dcache line size of 64 bytes.
>
> Signed-off-by: Alexander Stein alexanders83@web.de
> Acked-by: Stephen Warren swarren@wwwdotorg.org
> Tested-by: Stephen Warren swarren@wwwdotorg.org
Applied to u-boot/master, thanks!

The mailbox buffer is required to be at least 16 bytes aligned, but for cache invalidation and/or flush it needs to be cacheline aligned. Use ALLOC_CACHE_ALIGN_BUFFER for all mailbox buffer allocations.
Signed-off-by: Alexander Stein alexanders83@web.de
Acked-by: Stephen Warren swarren@wwwdotorg.org
Tested-by: Stephen Warren swarren@wwwdotorg.org
---
Changes in v3:
* Use ALLOC_CACHE_ALIGN_BUFFER instead of ALLOC_ALIGN_BUFFER + CONFIG_SYS_CACHELINE_SIZE

 board/raspberrypi/rpi/rpi.c | 10 +++++-----
 drivers/video/bcm2835.c     |  4 ++--
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/board/raspberrypi/rpi/rpi.c b/board/raspberrypi/rpi/rpi.c
index 96fe870..d21750e 100644
--- a/board/raspberrypi/rpi/rpi.c
+++ b/board/raspberrypi/rpi/rpi.c
@@ -182,7 +182,7 @@ u32 rpi_board_rev = 0;

 int dram_init(void)
 {
-	ALLOC_ALIGN_BUFFER(struct msg_get_arm_mem, msg, 1, 16);
+	ALLOC_CACHE_ALIGN_BUFFER(struct msg_get_arm_mem, msg, 1);
 	int ret;

 	BCM2835_MBOX_INIT_HDR(msg);
@@ -212,7 +212,7 @@ static void set_fdtfile(void)

 static void set_usbethaddr(void)
 {
-	ALLOC_ALIGN_BUFFER(struct msg_get_mac_address, msg, 1, 16);
+	ALLOC_CACHE_ALIGN_BUFFER(struct msg_get_mac_address, msg, 1);
 	int ret;

 	if (!models[rpi_board_rev].has_onboard_eth)
@@ -245,7 +245,7 @@ int misc_init_r(void)

 static int power_on_module(u32 module)
 {
-	ALLOC_ALIGN_BUFFER(struct msg_set_power_state, msg_pwr, 1, 16);
+	ALLOC_CACHE_ALIGN_BUFFER(struct msg_set_power_state, msg_pwr, 1);
 	int ret;

 	BCM2835_MBOX_INIT_HDR(msg_pwr);
@@ -269,7 +269,7 @@ static int power_on_module(u32 module)

 static void get_board_rev(void)
 {
-	ALLOC_ALIGN_BUFFER(struct msg_get_board_rev, msg, 1, 16);
+	ALLOC_CACHE_ALIGN_BUFFER(struct msg_get_board_rev, msg, 1);
 	int ret;
 	const char *name;

@@ -324,7 +324,7 @@ int board_init(void)

 int board_mmc_init(bd_t *bis)
 {
-	ALLOC_ALIGN_BUFFER(struct msg_get_clock_rate, msg_clk, 1, 16);
+	ALLOC_CACHE_ALIGN_BUFFER(struct msg_get_clock_rate, msg_clk, 1);
 	int ret;

 	power_on_module(BCM2835_MBOX_POWER_DEVID_SDHCI);
diff --git a/drivers/video/bcm2835.c b/drivers/video/bcm2835.c
index 1f18231..61d054d 100644
--- a/drivers/video/bcm2835.c
+++ b/drivers/video/bcm2835.c
@@ -38,8 +38,8 @@ struct msg_setup {

 void lcd_ctrl_init(void *lcdbase)
 {
-	ALLOC_ALIGN_BUFFER(struct msg_query, msg_query, 1, 16);
-	ALLOC_ALIGN_BUFFER(struct msg_setup, msg_setup, 1, 16);
+	ALLOC_CACHE_ALIGN_BUFFER(struct msg_query, msg_query, 1);
+	ALLOC_CACHE_ALIGN_BUFFER(struct msg_setup, msg_setup, 1);
 	int ret;
 	u32 w, h;
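
The general idea behind a cacheline-aligned stack buffer can be sketched as follows: reserve more storage than the message needs, then pick the first line-aligned address inside it. This is a standalone illustration under assumptions, not U-Boot's ALLOC_CACHE_ALIGN_BUFFER macro. All sketch_* and SKETCH_* names are hypothetical, struct sketch_msg is a stand-in rather than a real mailbox message layout, and a 32 byte line is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cacheline size for this sketch (32 bytes, as on Raspberry Pi 1) */
#define SKETCH_LINE ((size_t)32)

/* Stand-in for a mailbox message; NOT the real struct layout */
struct sketch_msg {
	uint32_t words[7];	/* 28 bytes, deliberately not a multiple of 32 */
};

/* Size of the message padded up to a whole number of cachelines */
#define SKETCH_PADDED \
	((sizeof(struct sketch_msg) + SKETCH_LINE - 1) & ~(SKETCH_LINE - 1))

/* Hypothetical helper: first address >= p that is a multiple of align */
static void *sketch_align_up(void *p, size_t align)
{
	uintptr_t addr = (uintptr_t)p;

	assert(align != 0 && (align & (align - 1)) == 0);
	return (void *)((addr + align - 1) & ~(uintptr_t)(align - 1));
}

/* Returns 1 if the carved-out region is line aligned and fits in storage */
static int sketch_carve_ok(void)
{
	/* Padded size plus (line - 1) bytes of slack for the alignment step */
	unsigned char raw[SKETCH_PADDED + SKETCH_LINE - 1];
	unsigned char *msg = sketch_align_up(raw, SKETCH_LINE);

	return ((uintptr_t)msg % SKETCH_LINE) == 0 &&
	       msg >= raw && msg + SKETCH_PADDED <= raw + sizeof(raw);
}
```

Padding the size as well as aligning the start means the buffer owns every cacheline it touches, so a later flush or invalidate of those lines cannot affect unrelated stack variables.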

On Fri, Jul 24, 2015 at 09:22:12AM +0200, Alexander Stein wrote:
> The mailbox buffer is required to be at least 16 bytes aligned, but for cache invalidation and/or flush it needs to be cacheline aligned. Use ALLOC_CACHE_ALIGN_BUFFER for all mailbox buffer allocations.
>
> Signed-off-by: Alexander Stein alexanders83@web.de
> Acked-by: Stephen Warren swarren@wwwdotorg.org
> Tested-by: Stephen Warren swarren@wwwdotorg.org
Applied to u-boot/master, thanks!

When using dcache the setup data for the mailbox must be actually written into memory before calling into firmware. Thus flush and invalidate the memory.
Signed-off-by: Alexander Stein alexanders83@web.de
Acked-by: Stephen Warren swarren@wwwdotorg.org
Tested-by: Stephen Warren swarren@wwwdotorg.org
---
Changes in v3:
* Use ARCH_DMA_MINALIGN instead of fixed 32
* Adjust comment in header

Changes in v2:
* Add hint in header about alignment requirements
* Invalidate cache after calling into mailbox
* round size up to next cacheline size

 arch/arm/mach-bcm283x/include/mach/mbox.h | 3 +++
 arch/arm/mach-bcm283x/mbox.c              | 9 +++++++++
 2 files changed, 12 insertions(+)

diff --git a/arch/arm/mach-bcm283x/include/mach/mbox.h b/arch/arm/mach-bcm283x/include/mach/mbox.h
index 54d369c..ff959c8 100644
--- a/arch/arm/mach-bcm283x/include/mach/mbox.h
+++ b/arch/arm/mach-bcm283x/include/mach/mbox.h
@@ -522,6 +522,9 @@ int bcm2835_mbox_call_raw(u32 chan, u32 send, u32 *recv);
  * a termination value are expected to immediately follow the header in
  * memory, as required by the property protocol.
  *
+ * Each struct bcm2835_mbox_hdr passed must be allocated with
+ * ALLOC_CACHE_ALIGN_BUFFER(x, y, z) to ensure proper cache flush/invalidate.
+ *
  * Returns 0 for success, any other value for error.
  */
 int bcm2835_mbox_call_prop(u32 chan, struct bcm2835_mbox_hdr *buffer);
diff --git a/arch/arm/mach-bcm283x/mbox.c b/arch/arm/mach-bcm283x/mbox.c
index 1af9be7..311bd8f 100644
--- a/arch/arm/mach-bcm283x/mbox.c
+++ b/arch/arm/mach-bcm283x/mbox.c
@@ -111,9 +111,18 @@ int bcm2835_mbox_call_prop(u32 chan, struct bcm2835_mbox_hdr *buffer)
 	dump_buf(buffer);
 #endif

+	flush_dcache_range((unsigned long)buffer,
+			   (unsigned long)((void *)buffer +
+			   roundup(buffer->buf_size, ARCH_DMA_MINALIGN)));
+
 	ret = bcm2835_mbox_call_raw(chan, phys_to_bus((u32)buffer), &rbuffer);
 	if (ret)
 		return ret;
+
+	invalidate_dcache_range((unsigned long)buffer,
+				(unsigned long)((void *)buffer +
+				roundup(buffer->buf_size, ARCH_DMA_MINALIGN)));
+
 	if (rbuffer != phys_to_bus((u32)buffer)) {
 		printf("mbox: Response buffer mismatch\n");
 		return -1;
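
The address range passed to both cache calls in this patch is the buffer start and the buffer start plus the size rounded up to ARCH_DMA_MINALIGN. That arithmetic can be sketched in isolation as below. This is an illustration only: sketch_mbox_range(), struct sketch_range and SKETCH_DMA_MINALIGN are hypothetical names, and the value 32 is an assumption matching the Raspberry Pi 1 cacheline size.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed DMA/cacheline alignment for this sketch (32 bytes on RPi 1) */
#define SKETCH_DMA_MINALIGN 32u

/* Half-open address range [start, stop) handed to a cache maintenance call */
struct sketch_range {
	uintptr_t start;
	uintptr_t stop;
};

/* Hypothetical helper mirroring the range arithmetic used in the patch */
static struct sketch_range sketch_mbox_range(uintptr_t buffer, uint32_t buf_size)
{
	struct sketch_range r;
	uint32_t lines = (buf_size + SKETCH_DMA_MINALIGN - 1) / SKETCH_DMA_MINALIGN;

	/* The start must already be line aligned, hence the aligned allocation */
	assert(buffer % SKETCH_DMA_MINALIGN == 0);
	r.start = buffer;
	r.stop = buffer + (uintptr_t)lines * SKETCH_DMA_MINALIGN;
	return r;
}
```

For example, a 28 byte message at address 0x1000 yields the range 0x1000 to 0x1020, so exactly one 32 byte line is flushed before the call and invalidated afterwards.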

On Fri, Jul 24, 2015 at 09:22:13AM +0200, Alexander Stein wrote:
> When using dcache the setup data for the mailbox must be actually written into memory before calling into firmware. Thus flush and invalidate the memory.
>
> Signed-off-by: Alexander Stein alexanders83@web.de
> Acked-by: Stephen Warren swarren@wwwdotorg.org
> Tested-by: Stephen Warren swarren@wwwdotorg.org
Applied to u-boot/master, thanks!

This adds dcache support for dwc2. The DMA buffers must be DMA aligned. For outgoing transactions the buffer is flushed before the transfer is started. For incoming transactions it is invalidated after the transfer has finished.
Signed-off-by: Alexander Stein alexanders83@web.de
Acked-by: Stephen Warren swarren@wwwdotorg.org
---
Changes in v4:
* Updated USB buffer comment about required alignment

 drivers/usb/host/dwc2.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/host/dwc2.c b/drivers/usb/host/dwc2.c
index eee60a2..b7fb4f8 100644
--- a/drivers/usb/host/dwc2.c
+++ b/drivers/usb/host/dwc2.c
@@ -21,9 +21,9 @@
 #define DWC2_STATUS_BUF_SIZE 64
 #define DWC2_DATA_BUF_SIZE (64 * 1024)

-/* We need doubleword-aligned buffers for DMA transfers */
-DEFINE_ALIGN_BUFFER(uint8_t, aligned_buffer, DWC2_DATA_BUF_SIZE, 8);
-DEFINE_ALIGN_BUFFER(uint8_t, status_buffer, DWC2_STATUS_BUF_SIZE, 8);
+/* We need cacheline-aligned buffers for DMA transfers and dcache support */
+DEFINE_ALIGN_BUFFER(uint8_t, aligned_buffer, DWC2_DATA_BUF_SIZE, ARCH_DMA_MINALIGN);
+DEFINE_ALIGN_BUFFER(uint8_t, status_buffer, DWC2_STATUS_BUF_SIZE, ARCH_DMA_MINALIGN);

 #define MAX_DEVICE 16
 #define MAX_ENDPOINT 16
@@ -802,9 +802,14 @@ int chunk_msg(struct usb_device *dev, unsigned long pipe, int *pid, int in,
 		       (*pid << DWC2_HCTSIZ_PID_OFFSET),
 		       &hc_regs->hctsiz);

-		if (!in)
+		if (!in) {
 			memcpy(aligned_buffer, (char *)buffer + done, len);

+			flush_dcache_range((unsigned long)aligned_buffer,
+				(unsigned long)((void *)aligned_buffer +
+				roundup(len, ARCH_DMA_MINALIGN)));
+		}
+
 		writel(phys_to_bus((unsigned long)aligned_buffer),
 		       &hc_regs->hcdma);

@@ -820,6 +825,11 @@ int chunk_msg(struct usb_device *dev, unsigned long pipe, int *pid, int in,

 		if (in) {
 			xfer_len -= sub;
+
+			invalidate_dcache_range((unsigned long)aligned_buffer,
+				(unsigned long)((void *)aligned_buffer +
+				roundup(xfer_len, ARCH_DMA_MINALIGN)));
+
 			memcpy(buffer + done, aligned_buffer, xfer_len);
 			if (sub)
 				stop_transfer = 1;
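
Why the flush goes before an outgoing transfer and the invalidate after an incoming one can be shown with a toy model. This is a deliberately simplified sketch, not the dwc2 driver and not real cache behaviour. It assumes a write-back cache where the CPU only sees the "cache" copy and the device only sees the "ram" copy, and it models invalidation as reloading the cache copy from RAM. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define TOY_LEN 8

/* Toy model: the device sees only ram[], the CPU sees only cache[] */
struct toy_buf {
	unsigned char ram[TOY_LEN];
	unsigned char cache[TOY_LEN];
};

/* CPU store lands in the cache first (write-back behaviour assumed) */
static void toy_cpu_write(struct toy_buf *b, const unsigned char *src)
{
	assert(b != NULL && src != NULL);
	memcpy(b->cache, src, TOY_LEN);
}

/* Flush: push the CPU's view out to RAM so the device can read it */
static void toy_flush(struct toy_buf *b)
{
	memcpy(b->ram, b->cache, TOY_LEN);
}

/* DMA write by the device goes straight to RAM, bypassing the cache */
static void toy_device_write(struct toy_buf *b, const unsigned char *src)
{
	memcpy(b->ram, src, TOY_LEN);
}

/* Invalidate: drop stale lines, modeled here as refetching from RAM */
static void toy_invalidate(struct toy_buf *b)
{
	memcpy(b->cache, b->ram, TOY_LEN);
}

/* OUT transfer: returns 1 if the device ends up seeing the CPU's data */
static int toy_out_ok(int do_flush)
{
	struct toy_buf b;
	const unsigned char data[TOY_LEN] = "payload";

	memset(&b, 0, sizeof(b));
	toy_cpu_write(&b, data);
	if (do_flush)
		toy_flush(&b);
	return memcmp(b.ram, data, TOY_LEN) == 0;
}

/* IN transfer: returns 1 if the CPU ends up seeing the device's data */
static int toy_in_ok(int do_invalidate)
{
	struct toy_buf b;
	const unsigned char data[TOY_LEN] = "rxbytes";

	memset(&b, 0, sizeof(b));
	toy_device_write(&b, data);
	if (do_invalidate)
		toy_invalidate(&b);
	return memcmp(b.cache, data, TOY_LEN) == 0;
}
```

In this model toy_out_ok(0) and toy_in_ok(0) both return 0, which corresponds to the device sending stale memory contents or the CPU copying stale cache contents when the maintenance step is skipped.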

On Fri, Jul 24, 2015 at 09:22:14AM +0200, Alexander Stein wrote:
> This adds dcache support for dwc2. The DMA buffers must be DMA aligned. For outgoing transactions the buffer is flushed before the transfer is started. For incoming transactions it is invalidated after the transfer has finished.
>
> Signed-off-by: Alexander Stein alexanders83@web.de
> Acked-by: Stephen Warren swarren@wwwdotorg.org
Applied to u-boot/master, thanks!

Now that the mailbox driver supports cache flush and invalidation, we can enable dcache.
Signed-off-by: Alexander Stein alexanders83@web.de
Acked-by: Stephen Warren swarren@wwwdotorg.org
Tested-by: Stephen Warren swarren@wwwdotorg.org
---
 include/configs/rpi-common.h | 1 -
 include/configs/rpi_2.h      | 1 +
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/configs/rpi-common.h b/include/configs/rpi-common.h
index 1012cdd..dd638c4 100644
--- a/include/configs/rpi-common.h
+++ b/include/configs/rpi-common.h
@@ -14,7 +14,6 @@
 #define CONFIG_SYS_GENERIC_BOARD
 #define CONFIG_BCM2835
 #define CONFIG_ARCH_CPU_INIT
-#define CONFIG_SYS_DCACHE_OFF

 #define CONFIG_SYS_TIMER_RATE 1000000
 #define CONFIG_SYS_TIMER_COUNTER \
diff --git a/include/configs/rpi_2.h b/include/configs/rpi_2.h
index 13dc8de..bea4ebd 100644
--- a/include/configs/rpi_2.h
+++ b/include/configs/rpi_2.h
@@ -10,6 +10,7 @@
 #define CONFIG_SKIP_LOWLEVEL_INIT
 #define CONFIG_BCM2836
 #define CONFIG_SYS_CACHELINE_SIZE 64
+#define CONFIG_SYS_DCACHE_OFF

 #include "rpi-common.h"

On Fri, Jul 24, 2015 at 09:22:15AM +0200, Alexander Stein wrote:
> Now that the mailbox driver supports cache flush and invalidation, we can enable dcache.
>
> Signed-off-by: Alexander Stein alexanders83@web.de
> Acked-by: Stephen Warren swarren@wwwdotorg.org
> Tested-by: Stephen Warren swarren@wwwdotorg.org
Applied to u-boot/master, thanks!
participants (2): Alexander Stein, Tom Rini