[U-Boot] [PATCH V2 1/4] Replace CONFIG_MMC_BOUNCE_BUFFER with CONFIG_BOUNCE_BUFFER in configs

From: Stephen Warren swarren@nvidia.com
Commits 6dc71c8 "MMC: MXS: Toggle the generic bounce buffer on the boards" and 49a627f "MMC: Remove the MMC bounce buffer" replaced CONFIG_MMC_BOUNCE_BUFFER with CONFIG_BOUNCE_BUFFER, but missed converting a few boards over to the new option. Fix this.
Signed-off-by: Stephen Warren swarren@nvidia.com
---
v2: New patch.
---
 include/configs/mx6qarm2.h         | 2 +-
 include/configs/mx6qsabre_common.h | 2 +-
 include/configs/mx6qsabrelite.h    | 2 +-
 include/configs/sc_sps_1.h         | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/configs/mx6qarm2.h b/include/configs/mx6qarm2.h
index 23562a8..28a3deb 100644
--- a/include/configs/mx6qarm2.h
+++ b/include/configs/mx6qarm2.h
@@ -50,7 +50,7 @@
 #define CONFIG_MMC
 #define CONFIG_CMD_MMC
 #define CONFIG_GENERIC_MMC
-#define CONFIG_MMC_BOUNCE_BUFFER
+#define CONFIG_BOUNCE_BUFFER
 #define CONFIG_CMD_FAT
 #define CONFIG_DOS_PARTITION
 
diff --git a/include/configs/mx6qsabre_common.h b/include/configs/mx6qsabre_common.h
index bfb9cd4..a5c93d0 100644
--- a/include/configs/mx6qsabre_common.h
+++ b/include/configs/mx6qsabre_common.h
@@ -45,7 +45,7 @@
 #define CONFIG_MMC
 #define CONFIG_CMD_MMC
 #define CONFIG_GENERIC_MMC
-#define CONFIG_MMC_BOUNCE_BUFFER
+#define CONFIG_BOUNCE_BUFFER
 #define CONFIG_CMD_EXT2
 #define CONFIG_CMD_FAT
 #define CONFIG_DOS_PARTITION
diff --git a/include/configs/mx6qsabrelite.h b/include/configs/mx6qsabrelite.h
index b56d7ca..a28d5a5 100644
--- a/include/configs/mx6qsabrelite.h
+++ b/include/configs/mx6qsabrelite.h
@@ -72,7 +72,7 @@
 #define CONFIG_MMC
 #define CONFIG_CMD_MMC
 #define CONFIG_GENERIC_MMC
-#define CONFIG_MMC_BOUNCE_BUFFER
+#define CONFIG_BOUNCE_BUFFER
 #define CONFIG_CMD_EXT2
 #define CONFIG_CMD_FAT
 #define CONFIG_DOS_PARTITION
diff --git a/include/configs/sc_sps_1.h b/include/configs/sc_sps_1.h
index f5dc393..cb99d58 100644
--- a/include/configs/sc_sps_1.h
+++ b/include/configs/sc_sps_1.h
@@ -140,7 +140,7 @@
 #ifdef CONFIG_CMD_MMC
 #define CONFIG_APBH_DMA
 #define CONFIG_MMC
-#define CONFIG_MMC_BOUNCE_BUFFER
+#define CONFIG_BOUNCE_BUFFER
 #define CONFIG_GENERIC_MMC
 #define CONFIG_MXS_MMC
 #endif

From: Stephen Warren swarren@nvidia.com
If any driver ever needs to use the bounce buffer API, it always needs to use it. As such, providing a dummy implementation of those APIs when CONFIG_BOUNCE_BUFFER isn't defined does not make sense. Remove the dummy implementation.
Signed-off-by: Stephen Warren swarren@nvidia.com
---
v2: New patch.
---
 include/bouncebuf.h | 14 --------------
 1 files changed, 0 insertions(+), 14 deletions(-)
diff --git a/include/bouncebuf.h b/include/bouncebuf.h
index 31021c5..aa2278c 100644
--- a/include/bouncebuf.h
+++ b/include/bouncebuf.h
@@ -51,7 +51,6 @@
  */
 #define GEN_BB_RW (GEN_BB_READ | GEN_BB_WRITE)
 
-#ifdef CONFIG_BOUNCE_BUFFER
 /**
  * bounce_buffer_start() -- Start the bounce buffer session
  * data: pointer to buffer to be aligned
@@ -70,18 +69,5 @@ int bounce_buffer_start(void **data, size_t len, void **backup, uint8_t flags);
  * flags: flags describing the transaction, see above.
  */
 int bounce_buffer_stop(void **data, size_t len, void **backup, uint8_t flags);
-#else
-static inline int bounce_buffer_start(void **data, size_t len, void **backup,
-				       uint8_t flags)
-{
-	return 0;
-}
-
-static inline int bounce_buffer_stop(void **data, size_t len, void **backup,
-				      uint8_t flags)
-{
-	return 0;
-}
-#endif
 
 #endif
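
With the stubs gone, a board whose drivers call bounce_buffer_start() /
bounce_buffer_stop() must now enable the real implementation in its config
header (as later patches in this series do for Tegra), for example:

	/* board config header (illustrative): selects the real bounce buffer code */
	#define CONFIG_BOUNCE_BUFFER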

On Tue, Nov 6, 2012 at 1:27 PM, Stephen Warren swarren@wwwdotorg.org wrote:
From: Stephen Warren swarren@nvidia.com
If any driver ever needs to use the bounce buffer API, it always needs to use it. As such, providing a dummy implementation of those APIs when CONFIG_BOUNCE_BUFFER isn't defined does not make sense. Remove the dummy implementation.
Signed-off-by: Stephen Warren swarren@nvidia.com
Tested on Seaboard, loading a kernel from ext2 partition.
Acked-by: Simon Glass sjg@chromium.org
Tested-by: Simon Glass sjg@chromium.org

From: Stephen Warren swarren@nvidia.com
The current bouncebuf API requires all parameters to be passed to both bounce_buffer_start() and bounce_buffer_stop(). Modify the bouncebuf start function to accept a state structure as a parameter, and require only that state struct to be passed to the stop function. This simplifies usage of the bounce buffer by clients.
Don't modify the data pointer, but rather store the temporary buffer in this state struct. The bouncebuf code ensures that client code can always use a single buffer pointer in the state structure, irrespective of whether a bounce buffer actually had to be allocated.
Move cache management logic into the bounce buffer code, so that each client doesn't have to duplicate this. I believe there's no need to invalidate the buffer before a DMA operation, since flushing the cache should prevent any write-backs.
Update the MXS MMC driver for this change.
Signed-off-by: Stephen Warren swarren@nvidia.com
---
v2:
* Move cache management into bounce buffer code.
* Simplify addr_aligned().
* free() any allocated bounce buffer.
* Added comments to struct bounce_buffer.
* Change flags parameter from uint8_t to unsigned int.
* s/bounce_buffer_state/bounce_buffer/.
* Include <linux/types.h> in bouncebuf.h so it's self-contained.
* Minor changes to rebase on top of earlier changed patches.
---
 common/bouncebuf.c   | 75 +++++++++++++++++++++++++++----------------------
 drivers/mmc/mxsmmc.c | 30 ++++---------------
 include/bouncebuf.h  | 33 ++++++++++++++++------
 3 files changed, 72 insertions(+), 66 deletions(-)
diff --git a/common/bouncebuf.c b/common/bouncebuf.c
index 4f827f8..1df12cd 100644
--- a/common/bouncebuf.c
+++ b/common/bouncebuf.c
@@ -27,21 +27,19 @@
 #include <errno.h>
 #include <bouncebuf.h>
 
-static int addr_aligned(void *data, size_t len)
+static int addr_aligned(struct bounce_buffer *state)
 {
 	const ulong align_mask = ARCH_DMA_MINALIGN - 1;
 
 	/* Check if start is aligned */
-	if ((ulong)data & align_mask) {
-		debug("Unaligned start address %p\n", data);
+	if ((ulong)state->user_buffer & align_mask) {
+		debug("Unaligned buffer address %p\n", state->user_buffer);
 		return 0;
 	}
 
-	data += len;
-
-	/* Check if end is aligned */
-	if ((ulong)data & align_mask) {
-		debug("Unaligned end address %p\n", data);
+	/* Check if length is aligned */
+	if (state->len != state->len_aligned) {
+		debug("Unaligned buffer length %d\n", state->len);
 		return 0;
 	}
 
@@ -49,44 +47,53 @@ static int addr_aligned(void *data, size_t len)
 	return 1;
 }
 
-int bounce_buffer_start(void **data, size_t len, void **backup, uint8_t flags)
+int bounce_buffer_start(struct bounce_buffer *state, void *data,
+			size_t len, unsigned int flags)
 {
-	void *tmp;
-	size_t alen;
-
-	if (addr_aligned(*data, len)) {
-		*backup = NULL;
-		return 0;
+	state->user_buffer = data;
+	state->bounce_buffer = data;
+	state->len = len;
+	state->len_aligned = roundup(len, ARCH_DMA_MINALIGN);
+	state->flags = flags;
+
+	if (!addr_aligned(state)) {
+		state->bounce_buffer = memalign(ARCH_DMA_MINALIGN,
						state->len_aligned);
+		if (!state->bounce_buffer)
+			return -ENOMEM;
+
+		if (state->flags & GEN_BB_READ)
+			memcpy(state->bounce_buffer, state->user_buffer,
				state->len);
 	}
 
-	alen = roundup(len, ARCH_DMA_MINALIGN);
-	tmp = memalign(ARCH_DMA_MINALIGN, alen);
-
-	if (!tmp)
-		return -ENOMEM;
-
-	if (flags & GEN_BB_READ)
-		memcpy(tmp, *data, len);
-
-	*backup = *data;
-	*data = tmp;
+	/*
+	 * Flush data to RAM so DMA reads can pick it up,
+	 * and any CPU writebacks don't race with DMA writes
+	 */
+	flush_dcache_range((unsigned long)state->bounce_buffer,
			   (unsigned long)(state->bounce_buffer) +
				state->len_aligned);
 
 	return 0;
 }
 
-int bounce_buffer_stop(void **data, size_t len, void **backup, uint8_t flags)
+int bounce_buffer_stop(struct bounce_buffer *state)
 {
-	void *tmp = *data;
+	if (state->flags & GEN_BB_WRITE) {
+		/* Invalidate cache so that CPU can see any newly DMA'd data */
+		invalidate_dcache_range((unsigned long)state->bounce_buffer,
					(unsigned long)(state->bounce_buffer) +
						state->len_aligned);
+	}
 
-	/* The buffer was already aligned, since "backup" is NULL. */
-	if (!*backup)
+	if (state->bounce_buffer == state->user_buffer)
 		return 0;
 
-	if (flags & GEN_BB_WRITE)
-		memcpy(*backup, *data, len);
+	if (state->flags & GEN_BB_WRITE)
+		memcpy(state->user_buffer, state->bounce_buffer, state->len);
 
-	*data = *backup;
-	free(tmp);
+	free(state->bounce_buffer);
 
 	return 0;
 }
diff --git a/drivers/mmc/mxsmmc.c b/drivers/mmc/mxsmmc.c
index 109acbf..024df59 100644
--- a/drivers/mmc/mxsmmc.c
+++ b/drivers/mmc/mxsmmc.c
@@ -96,11 +96,11 @@ static int mxsmmc_send_cmd_pio(struct mxsmmc_priv *priv, struct mmc_data *data)
 static int mxsmmc_send_cmd_dma(struct mxsmmc_priv *priv, struct mmc_data *data)
 {
 	uint32_t data_count = data->blocksize * data->blocks;
-	uint32_t cache_data_count = roundup(data_count, ARCH_DMA_MINALIGN);
 	int dmach;
 	struct mxs_dma_desc *desc = priv->desc;
-	void *addr, *backup;
-	uint8_t flags;
+	void *addr;
+	unsigned int flags;
+	struct bounce_buffer bbstate;
 
 	memset(desc, 0, sizeof(struct mxs_dma_desc));
 	desc->address = (dma_addr_t)desc;
@@ -115,19 +115,9 @@ static int mxsmmc_send_cmd_dma(struct mxsmmc_priv *priv, struct mmc_data *data)
 		flags = GEN_BB_READ;
 	}
 
-	bounce_buffer_start(&addr, data_count, &backup, flags);
+	bounce_buffer_start(&bbstate, addr, data_count, flags);
 
-	priv->desc->cmd.address = (dma_addr_t)addr;
-
-	if (data->flags & MMC_DATA_WRITE) {
-		/* Flush data to DRAM so DMA can pick them up */
-		flush_dcache_range((uint32_t)addr,
-				   (uint32_t)(addr) + cache_data_count);
-	}
-
-	/* Invalidate the area, so no writeback into the RAM races with DMA */
-	invalidate_dcache_range((uint32_t)priv->desc->cmd.address,
-			(uint32_t)(priv->desc->cmd.address + cache_data_count));
+	priv->desc->cmd.address = (dma_addr_t)bbstate.bounce_buffer;
 
 	priv->desc->cmd.data |= MXS_DMA_DESC_IRQ | MXS_DMA_DESC_DEC_SEM |
				(data_count << MXS_DMA_DESC_BYTES_OFFSET);
@@ -135,17 +125,11 @@ static int mxsmmc_send_cmd_dma(struct mxsmmc_priv *priv, struct mmc_data *data)
 	dmach = MXS_DMA_CHANNEL_AHB_APBH_SSP0 + priv->id;
 	mxs_dma_desc_append(dmach, priv->desc);
 	if (mxs_dma_go(dmach)) {
-		bounce_buffer_stop(&addr, data_count, &backup, flags);
+		bounce_buffer_stop(&bbstate);
 		return COMM_ERR;
 	}
 
-	/* The data arrived into DRAM, invalidate cache over them */
-	if (data->flags & MMC_DATA_READ) {
-		invalidate_dcache_range((uint32_t)addr,
-					(uint32_t)(addr) + cache_data_count);
-	}
-
-	bounce_buffer_stop(&addr, data_count, &backup, flags);
+	bounce_buffer_stop(&bbstate);
 
 	return 0;
 }
diff --git a/include/bouncebuf.h b/include/bouncebuf.h
index aa2278c..707b588 100644
--- a/include/bouncebuf.h
+++ b/include/bouncebuf.h
@@ -25,6 +25,8 @@
 #ifndef __INCLUDE_BOUNCEBUF_H__
 #define __INCLUDE_BOUNCEBUF_H__
 
+#include <linux/types.h>
+
 /*
  * GEN_BB_READ -- Data are read from the buffer eg. by DMA hardware.
  * The source buffer is copied into the bounce buffer (if unaligned, otherwise
@@ -51,23 +53,36 @@
  */
 #define GEN_BB_RW (GEN_BB_READ | GEN_BB_WRITE)
 
+struct bounce_buffer {
+	/* Copy of data parameter passed to start() */
+	void *user_buffer;
+	/*
+	 * DMA-aligned buffer. This field is always set to the value that
+	 * should be used for DMA; either equal to .user_buffer, or to a
+	 * freshly allocated aligned buffer.
+	 */
+	void *bounce_buffer;
+	/* Copy of len parameter passed to start() */
+	size_t len;
+	/* DMA-aligned buffer length */
+	size_t len_aligned;
+	/* Copy of flags parameter passed to start() */
+	unsigned int flags;
+};
+
 /**
  * bounce_buffer_start() -- Start the bounce buffer session
+ * state: stores state passed between bounce_buffer_{start,stop}
  * data: pointer to buffer to be aligned
  * len: length of the buffer
- * backup: pointer to backup buffer (the original value is stored here if
- *         needed
  * flags: flags describing the transaction, see above.
  */
-int bounce_buffer_start(void **data, size_t len, void **backup, uint8_t flags);
+int bounce_buffer_start(struct bounce_buffer *state, void *data,
+			size_t len, unsigned int flags);
 
 /**
  * bounce_buffer_stop() -- Finish the bounce buffer session
- * data: pointer to buffer that was aligned
- * len: length of the buffer
- * backup: pointer to backup buffer (the original value is stored here if
- *         needed
- * flags: flags describing the transaction, see above.
+ * state: stores state passed between bounce_buffer_{start,stop}
  */
-int bounce_buffer_stop(void **data, size_t len, void **backup, uint8_t flags);
+int bounce_buffer_stop(struct bounce_buffer *state);
 
 #endif
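
For illustration, a minimal sketch of a driver using the reworked API (the
example_dma_read() function and the dma_transfer_to_memory() call are invented
for this sketch; everything else is the API introduced by this patch):

#include <bouncebuf.h>

static int example_dma_read(void *dst, size_t len)
{
	struct bounce_buffer bbstate;
	int ret;

	/* The DMA engine will write into memory, hence GEN_BB_WRITE */
	ret = bounce_buffer_start(&bbstate, dst, len, GEN_BB_WRITE);
	if (ret)
		return ret;

	/* Always program the hardware with bbstate.bounce_buffer */
	dma_transfer_to_memory(bbstate.bounce_buffer, bbstate.len_aligned);

	/* Invalidates the cache, copies back to dst, frees any bounce buffer */
	return bounce_buffer_stop(&bbstate);
}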

On Tue, Nov 6, 2012 at 1:27 PM, Stephen Warren swarren@wwwdotorg.org wrote:
From: Stephen Warren swarren@nvidia.com
The current bouncebuf API requires all parameters to be passed to both bounce_buffer_start() and bounce_buffer_stop(). Modify the bouncebuf start function to accept a state structure as a parameter, and require only that state struct to be passed to the stop function. This simplifies usage of the bounce buffer by clients.
Don't modify the data pointer, but rather store the temporary buffer in this state struct. The bouncebuf code ensures that client code can always use a single buffer pointer in the state structure, irrespective of whether a bounce buffer actually had to be allocated.
Move cache management logic into the bounce buffer code, so that each client doesn't have to duplicate this. I believe there's no need to invalidate the buffer before a DMA operation, since flushing the cache should prevent any write-backs.
Update the MXS MMC driver for this change.
Signed-off-by: Stephen Warren swarren@nvidia.com
Tested on Seaboard, loading a kernel from ext2 partition.
Acked-by: Simon Glass sjg@chromium.org
Tested-by: Simon Glass sjg@chromium.org
This is a really nice change, thank you Stephen.
Regards,
Simon

From: Stephen Warren swarren@nvidia.com
Tegra's MMC driver does DMA, and hence needs cache-aligned buffers. In some cases (e.g. user load commands) this cannot be guaranteed by callers of the MMC APIs. To solve this, modify the Tegra MMC driver to use the new bounce_buffer_*() APIs.
Note: Ideally, all U-Boot code will always provide address- and size-aligned buffers, so a bounce buffer will only ever be needed for user-supplied buffers (e.g. load commands). Ensuring this removes the need for performance-sucking bounce buffer cache management and memcpy()s. The one known exception at present is the SCR buffer in sd_change_freq(), which is only 8 bytes long. Solving this requires enhancing struct mmc_data to know the difference between buffer size and transferred data size, or forcing all callers of mmc_send_cmd() to have allocated buffers using ALLOC_CACHE_ALIGN_BUFFER(), which, while true in this case, is not enforced in any way at present, and so cannot be assumed by the core MMC code.
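
As an illustrative sketch only (not part of this patch), such a caller-side
fix would declare the buffer with U-Boot's ALLOC_CACHE_ALIGN_BUFFER() helper,
along these lines:

	/* 8-byte SCR buffer, cache-line aligned on the stack */
	ALLOC_CACHE_ALIGN_BUFFER(uint, scr, 2);

	data.dest = (char *)scr;
	data.blocksize = 8;
	data.blocks = 1;
	data.flags = MMC_DATA_READ;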
Signed-off-by: Stephen Warren swarren@nvidia.com
---
v2:
* Simplified patch by splitting bounce buffer API usage into a wrapper
  function, which calls mmc_send_cmd_bounced(); a renamed version of the
  original mmc_send_cmd().
* Adjusted for bouncebuffer patch changes
---
 drivers/mmc/tegra_mmc.c          | 64 ++++++++++++++++++++++++-------------
 include/configs/tegra20-common.h |  3 ++
 2 files changed, 44 insertions(+), 23 deletions(-)
diff --git a/drivers/mmc/tegra_mmc.c b/drivers/mmc/tegra_mmc.c
index 1fd5592..d749ab0 100644
--- a/drivers/mmc/tegra_mmc.c
+++ b/drivers/mmc/tegra_mmc.c
@@ -19,6 +19,7 @@
  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
  */
 
+#include <bouncebuf.h>
 #include <common.h>
 #include <asm/gpio.h>
 #include <asm/io.h>
@@ -66,14 +67,17 @@ static void tegra_get_setup(struct mmc_host *host, int dev_index)
 	host->reg = (struct tegra_mmc *)host->base;
 }
 
-static void mmc_prepare_data(struct mmc_host *host, struct mmc_data *data)
+static void mmc_prepare_data(struct mmc_host *host, struct mmc_data *data,
+			     struct bounce_buffer *bbstate)
 {
 	unsigned char ctrl;
 
-	debug("data->dest: %08X, data->blocks: %u, data->blocksize: %u\n",
-		(u32)data->dest, data->blocks, data->blocksize);
 
-	writel((u32)data->dest, &host->reg->sysad);
+	debug("buf: %p (%p), data->blocks: %u, data->blocksize: %u\n",
+		bbstate->bounce_buffer, bbstate->user_buffer, data->blocks,
+		data->blocksize);
+
+	writel((u32)bbstate->bounce_buffer, &host->reg->sysad);
 	/*
 	 * DMASEL[4:3]
 	 * 00 = Selects SDMA
@@ -114,14 +118,6 @@ static void mmc_set_transfer_mode(struct mmc_host *host, struct mmc_data *data)
 	if (data->flags & MMC_DATA_READ)
 		mode |= TEGRA_MMC_TRNMOD_DATA_XFER_DIR_SEL_READ;
 
-	if (data->flags & MMC_DATA_WRITE) {
-		if ((uintptr_t)data->src & (ARCH_DMA_MINALIGN - 1))
-			printf("Warning: unaligned write to %p may fail\n",
-			       data->src);
-		flush_dcache_range((ulong)data->src, (ulong)data->src +
-			data->blocks * data->blocksize);
-	}
-
 	writew(mode, &host->reg->trnmod);
 }
 
@@ -156,8 +152,8 @@ static int mmc_wait_inhibit(struct mmc_host *host,
 	return 0;
 }
 
-static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd,
-			struct mmc_data *data)
+static int mmc_send_cmd_bounced(struct mmc *mmc, struct mmc_cmd *cmd,
+			struct mmc_data *data, struct bounce_buffer *bbstate)
 {
 	struct mmc_host *host = (struct mmc_host *)mmc->priv;
 	int flags, i;
@@ -172,7 +168,7 @@ static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd,
 		return result;
 
 	if (data)
-		mmc_prepare_data(host, data);
+		mmc_prepare_data(host, data, bbstate);
 
 	debug("cmd->arg: %08x\n", cmd->cmdarg);
 	writel(cmd->cmdarg, &host->reg->argument);
@@ -322,20 +318,42 @@ static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd,
 			}
 		}
 		writel(mask, &host->reg->norintsts);
-		if (data->flags & MMC_DATA_READ) {
-			if ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1))
-				printf("Warning: unaligned read from %p "
					"may fail\n", data->dest);
-			invalidate_dcache_range((ulong)data->dest,
-				(ulong)data->dest +
-				data->blocks * data->blocksize);
-		}
 	}
 
 	udelay(1000);
 	return 0;
 }
 
+static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd,
+			struct mmc_data *data)
+{
+	void *buf;
+	unsigned int bbflags;
+	size_t len;
+	struct bounce_buffer bbstate;
+	int ret;
+
+	if (data) {
+		if (data->flags & MMC_DATA_READ) {
+			buf = data->dest;
+			bbflags = GEN_BB_WRITE;
+		} else {
+			buf = (void *)data->src;
+			bbflags = GEN_BB_READ;
+		}
+		len = data->blocks * data->blocksize;
+
+		bounce_buffer_start(&bbstate, buf, len, bbflags);
+	}
+
+	ret = mmc_send_cmd_bounced(mmc, cmd, data, &bbstate);
+
+	if (data)
+		bounce_buffer_stop(&bbstate);
+
+	return ret;
+}
+
 static void mmc_change_clock(struct mmc_host *host, uint clock)
 {
 	int div;
diff --git a/include/configs/tegra20-common.h b/include/configs/tegra20-common.h
index 65b23cf..181af70 100644
--- a/include/configs/tegra20-common.h
+++ b/include/configs/tegra20-common.h
@@ -203,4 +203,7 @@
 #define CONFIG_SYS_NAND_SELF_INIT
 #define CONFIG_SYS_NAND_ONFI_DETECTION
 
+/* Misc utility code */
+#define CONFIG_BOUNCE_BUFFER
+
 #endif /* __TEGRA20_COMMON_H */

Hi Stephen,
On Tue, Nov 6, 2012 at 1:27 PM, Stephen Warren swarren@wwwdotorg.org wrote:
From: Stephen Warren swarren@nvidia.com
Tegra's MMC driver does DMA, and hence needs cache-aligned buffers. In some cases (e.g. user load commands) this cannot be guaranteed by callers of the MMC APIs. To solve this, modify the Tegra MMC driver to use the new bounce_buffer_*() APIs.
Note: Ideally, all U-Boot code will always provide address- and size-aligned buffers, so a bounce buffer will only ever be needed for user-supplied buffers (e.g. load commands). Ensuring this removes the need for performance-sucking bounce buffer cache management and memcpy()s. The one known exception at present is the SCR buffer in sd_change_freq(), which is only 8 bytes long. Solving this requires enhancing struct mmc_data to know the difference between buffer size and transferred data size, or forcing all callers of mmc_send_cmd() to have allocated buffers using ALLOC_CACHE_ALIGN_BUFFER(), which, while true in this case, is not enforced in any way at present, and so cannot be assumed by the core MMC code.
Signed-off-by: Stephen Warren swarren@nvidia.com
Tested on Seaboard, loading a kernel from ext2 partition.
Acked-by: Simon Glass sjg@chromium.org
Tested-by: Simon Glass sjg@chromium.org
It's really nice to clean up the cache code as well, very slick.
Regards,
Simon

On Tue, Nov 6, 2012 at 1:27 PM, Stephen Warren swarren@wwwdotorg.org wrote:
From: Stephen Warren swarren@nvidia.com
Commits 6dc71c8 "MMC: MXS: Toggle the generic bounce buffer on the boards" and 49a627f "MMC: Remove the MMC bounce buffer" replaced CONFIG_MMC_BOUNCE_BUFFER with CONFIG_BOUNCE_BUFFER, but missed converting a few boards over to the new option. Fix this.
Signed-off-by: Stephen Warren swarren@nvidia.com
Tested on Seaboard, loading a kernel from ext2 partition.
Acked-by: Simon Glass sjg@chromium.org
Tested-by: Simon Glass sjg@chromium.org

On 11/06/2012 02:27 PM, Stephen Warren wrote:
Commits 6dc71c8 "MMC: MXS: Toggle the generic bounce buffer on the boards" and 49a627f "MMC: Remove the MMC bounce buffer" replaced CONFIG_MMC_BOUNCE_BUFFER with CONFIG_BOUNCE_BUFFER, but missed converting a few boards over to the new option. Fix this.
Do these patches look good to be applied? I'd like to see them in the release, since they fix an actual bug on Tegra (buffer corruption, which definitely leads to /at least/ SD cards being accessed as 1-bit rather than 4-bit on some Tegra platforms some of the time).