[U-Boot] [PATCH 1/2] part_dos: allocate cacheline aligned buffers

Some hardware requires the data buffers to be cacheline-aligned to make sure DMA operations can be properly executed.
This patch uses the ALLOC_CACHE_ALIGN_BUFFER macro to allocate buffers with proper alignment. The same was already done for EFI partitions in commit f75dd58 "part_efi: dcache: allocate cacheline aligned buffers".
Signed-off-by: Thierry Reding thierry.reding@avionic-design.de --- This fixes boot failures on Avionic Design Plutux and Medcom boards.
disk/part_dos.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/disk/part_dos.c b/disk/part_dos.c index b5bcb37..c028aaf 100644 --- a/disk/part_dos.c +++ b/disk/part_dos.c @@ -87,7 +87,7 @@ static int test_block_type(unsigned char *buffer)
int test_part_dos (block_dev_desc_t *dev_desc) { - unsigned char buffer[dev_desc->blksz]; + ALLOC_CACHE_ALIGN_BUFFER(unsigned char, buffer, dev_desc->blksz);
if ((dev_desc->block_read(dev_desc->dev, 0, 1, (ulong *) buffer) != 1) || (buffer[DOS_PART_MAGIC_OFFSET + 0] != 0x55) || @@ -102,7 +102,7 @@ int test_part_dos (block_dev_desc_t *dev_desc) static void print_partition_extended (block_dev_desc_t *dev_desc, int ext_part_sector, int relative, int part_num) { - unsigned char buffer[dev_desc->blksz]; + ALLOC_CACHE_ALIGN_BUFFER(unsigned char, buffer, dev_desc->blksz); dos_partition_t *pt; int i;
@@ -166,7 +166,7 @@ static int get_partition_info_extended (block_dev_desc_t *dev_desc, int ext_part int relative, int part_num, int which_part, disk_partition_t *info) { - unsigned char buffer[dev_desc->blksz]; + ALLOC_CACHE_ALIGN_BUFFER(unsigned char, buffer, dev_desc->blksz); dos_partition_t *pt; int i;

The MMC core sometimes reads buffers that are smaller than a complete cacheline, for example when reading the SCR. In order to avoid a warning from the ARM v7 cache handling code, this patch makes sure that complete cachelines are flushed.
Signed-off-by: Thierry Reding thierry.reding@avionic-design.de --- drivers/mmc/tegra2_mmc.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/mmc/tegra2_mmc.c b/drivers/mmc/tegra2_mmc.c index fb8a57d..3615876 100644 --- a/drivers/mmc/tegra2_mmc.c +++ b/drivers/mmc/tegra2_mmc.c @@ -323,12 +323,13 @@ static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, } writel(mask, &host->reg->norintsts); if (data->flags & MMC_DATA_READ) { + ulong end = (ulong)data->dest + + roundup(data->blocks * data->blocksize, + ARCH_DMA_MINALIGN); if ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning: unaligned read from %p " "may fail\n", data->dest); - invalidate_dcache_range((ulong)data->dest, - (ulong)data->dest + - data->blocks * data->blocksize); + invalidate_dcache_range((ulong)data->dest, end); } }

On Tuesday 24 April 2012 03:53:44 Thierry Reding wrote:
The MMC core sometimes reads buffers that are smaller than a complete cacheline, for example when reading the SCR. In order to avoid a warning from the ARM v7 cache handling code, this patch makes sure that complete cachelines are flushed.
this is still wrong. all you've done is bypass the error message without addressing the underlying problem -- you're invalidating too much. -mike

* Mike Frysinger wrote:
On Tuesday 24 April 2012 03:53:44 Thierry Reding wrote:
The MMC core sometimes reads buffers that are smaller than a complete cacheline, for example when reading the SCR. In order to avoid a warning from the ARM v7 cache handling code, this patch makes sure that complete cachelines are flushed.
this is still wrong. all you've done is bypass the error message without addressing the underlying problem -- you're invalidating too much.
Reading 8 bytes is always less than a cacheline, so we don't have much choice, do we? We could of course always read a whole cacheline even if only 8 bytes are requested, but does that have any advantage over reading 8 bytes and then invalidating the cacheline?
Or maybe I'm missing the point.
Thierry

Hi Thierry,
On Thu, Apr 26, 2012 at 6:18 PM, Thierry Reding < thierry.reding@avionic-design.de> wrote:
- Mike Frysinger wrote:
On Tuesday 24 April 2012 03:53:44 Thierry Reding wrote:
The MMC core sometimes reads buffers that are smaller than a complete cacheline, for example when reading the SCR. In order to avoid a
warning
from the ARM v7 cache handling code, this patch makes sure that
complete
cachelines are flushed.
this is still wrong. all you've done is bypass the error message without addressing the underlying problem -- you're invalidating too much.
Reading 8 bytes is always less than a cacheline, so we don't have much choice, do we? We could of course always read a whole cacheline even if only 8 bytes are requested, but does that have any advantage over reading 8 bytes and then invalidating the cacheline?
Or maybe I'm missing the point.
Well the point is that you can read 8 bytes but you still must use a buffer that is large enough for DMA activity. So the caller must allocate a buffer larger than 8 bytes. I worry that what you have done will just introduce obscure bugs, since we will potentially invalidate stack variants (for example) and lose their values.
With the case problems, we are trying to fix them at source (i.e. at the higher level).
Regards, Simon
Thierry
U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot

* Simon Glass wrote:
Hi Thierry,
On Thu, Apr 26, 2012 at 6:18 PM, Thierry Reding < thierry.reding@avionic-design.de> wrote:
- Mike Frysinger wrote:
On Tuesday 24 April 2012 03:53:44 Thierry Reding wrote:
The MMC core sometimes reads buffers that are smaller than a complete cacheline, for example when reading the SCR. In order to avoid a
warning
from the ARM v7 cache handling code, this patch makes sure that
complete
cachelines are flushed.
this is still wrong. all you've done is bypass the error message without addressing the underlying problem -- you're invalidating too much.
Reading 8 bytes is always less than a cacheline, so we don't have much choice, do we? We could of course always read a whole cacheline even if only 8 bytes are requested, but does that have any advantage over reading 8 bytes and then invalidating the cacheline?
Or maybe I'm missing the point.
Well the point is that you can read 8 bytes but you still must use a buffer that is large enough for DMA activity. So the caller must allocate a buffer larger than 8 bytes.
The buffer used in this case is allocated with the ALLOC_CACHE_ALIGN_BUFFER() macro, so the buffer size is not an issue. However, the mmc_data structure's blocksize field is set to 8, which is the value that will eventually end up in invalidate_dcache_range().
For reference, see sd_change_freq() in drivers/mmc/mmc.c.
I worry that what you have done will just introduce obscure bugs, since we will potentially invalidate stack variants (for example) and lose their values.
With the case problems, we are trying to fix them at source (i.e. at the higher level).
I understand. The question then becomes how to best fix the passed in size. Always passing the size of a complete cacheline in the SEND_SCR command doesn't seem like a better option because it may have other implications on other hardware.
Thierry

On Thursday 26 April 2012 06:34:43 Thierry Reding wrote:
For reference, see sd_change_freq() in drivers/mmc/mmc.c.
yes, that shows what we're talking about. int sd_change_freq(struct mmc *mmc) { ... stack vars ... int err; ALLOC_CACHE_ALIGN_BUFFER(uint, scr, 2); struct mmc_data data; int timeout; ... stack vars ... ... data.dest = (char *)scr; data.blocksize = 8; data.blocks = 1; data.flags = MMC_DATA_READ; err = mmc_send_cmd(mmc, &cmd, &data); ...
static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, struct mmc_data *data) { ... if (data->flags & MMC_DATA_READ) { if ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning: unaligned read from %p " "may fail\n", data->dest); invalidate_dcache_range((ulong)data->dest, (ulong)data->dest + data->blocks * data->blocksize); } ...
so what invalidate_dcache_range() will invalidate is from &scr and 8 bytes after that. the higher layers make sure &scr is aligned properly, but not that it spans a full cache line. so if the stack layout is: [random stuff] 0x100 [scr[0]] 0x104 [scr[1]] 0x108 [err] 0x10a [timeout] [more stuff]
this implicitly invalidates [err] and [timeout] and more stuff. by you forcibly rounding up the length, you no longer get a warning, but you still (silently) blew away those variables from cache possibly losing changes that weren't written back to external memory yet.
I worry that what you have done will just introduce obscure bugs, since we will potentially invalidate stack variants (for example) and lose their values.
With the case problems, we are trying to fix them at source (i.e. at the higher level).
I understand. The question then becomes how to best fix the passed in size. Always passing the size of a complete cacheline in the SEND_SCR command doesn't seem like a better option because it may have other implications on other hardware.
i proposed this in a previous thread, but i think Simon missed it.
perhaps the warning in the core code could be dropped and all your changes in fringe code obsoleted (such as these USB patches): when it detects that an address is starting on an unaligned boundary, flush that line first, and then let it be invalidated. accordingly, when the end length is on an unaligned boundary, do the same flush-then-invalidate step. this should also make things work without a (significant) loss in performance. if anything, i suspect the overhead of doing runtime buffer size calculations and manually aligning pointers (which is what ALLOC_CACHE_ALIGN_BUFFER does) is a wash compared to partially flushing cache lines in the core ...
Simon: what do you think of this last idea ? -mike

Hi Mike,
On Fri, Apr 27, 2012 at 5:29 PM, Mike Frysinger vapier@gentoo.org wrote:
On Thursday 26 April 2012 06:34:43 Thierry Reding wrote:
For reference, see sd_change_freq() in drivers/mmc/mmc.c.
yes, that shows what we're talking about. int sd_change_freq(struct mmc *mmc) { ... stack vars ... int err; ALLOC_CACHE_ALIGN_BUFFER(uint, scr, 2); struct mmc_data data; int timeout; ... stack vars ... ... data.dest = (char *)scr; data.blocksize = 8; data.blocks = 1; data.flags = MMC_DATA_READ; err = mmc_send_cmd(mmc, &cmd, &data); ...
static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, struct mmc_data *data) { ... if (data->flags & MMC_DATA_READ) { if ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning: unaligned read from %p " "may fail\n", data->dest); invalidate_dcache_range((ulong)data->dest, (ulong)data->dest + data->blocks * data->blocksize); } ...
so what invalidate_dcache_range() will invalidate is from &scr and 8 bytes after that. the higher layers make sure &scr is aligned properly, but not that it spans a full cache line. so if the stack layout is: [random stuff] 0x100 [scr[0]] 0x104 [scr[1]] 0x108 [err] 0x10a [timeout] [more stuff]
this implicitly invalidates [err] and [timeout] and more stuff. by you forcibly rounding up the length, you no longer get a warning, but you still (silently) blew away those variables from cache possibly losing changes that weren't written back to external memory yet.
I worry that what you have done will just introduce obscure bugs, since we will potentially invalidate stack variants (for example) and lose their values.
With the case problems, we are trying to fix them at source (i.e. at
the
higher level).
I understand. The question then becomes how to best fix the passed in
size.
Always passing the size of a complete cacheline in the SEND_SCR command doesn't seem like a better option because it may have other implications
on
other hardware.
i proposed this in a previous thread, but i think Simon missed it.
perhaps the warning in the core code could be dropped and all your changes in fringe code obsoleted (such as these USB patches): when it detects that an address is starting on an unaligned boundary, flush that line first, and then let it be invalidated. accordingly, when the end length is on an unaligned boundary, do the same flush-then-invalidate step. this should also make things work without a (significant) loss in performance. if anything, i suspect the overhead of doing runtime buffer size calculations and manually aligning pointers (which is what ALLOC_CACHE_ALIGN_BUFFER does) is a wash compared to partially flushing cache lines in the core ...
Simon: what do you think of this last idea ?
I think I proved to myself a while back that it can't be done. Even if you flush you can't know that more activity will not occur on that cache line when you access the stack a microsecond later. Then much later when the DMA updates the RAM address with the data, you will not see it, because you have already pulled in that cache line. We are relying on the invalidate to 'hold' until we are ready to read the data.
How about just using an array of one element? So instead of:
struct fred fred;
do_something_with(&fred);
use:
ALLOC_CACHE_ALIGN_BUFFER(struct fred, scr, 1)
do_something_with(fred);
Regards, Simon
-mike

On 04/26/2012 11:29 PM, Mike Frysinger wrote:
On Thursday 26 April 2012 06:34:43 Thierry Reding wrote:
For reference, see sd_change_freq() in drivers/mmc/mmc.c.
This is a follow-up to:
http://lists.denx.de/pipermail/u-boot/2012-April/123080.html
which was referenced from:
http://lists.denx.de/pipermail/u-boot/2012-September/133641.html
yes, that shows what we're talking about. int sd_change_freq(struct mmc *mmc) { ... stack vars ... int err; ALLOC_CACHE_ALIGN_BUFFER(uint, scr, 2); struct mmc_data data; int timeout; ... stack vars ... ... data.dest = (char *)scr; data.blocksize = 8; data.blocks = 1; data.flags = MMC_DATA_READ; err = mmc_send_cmd(mmc, &cmd, &data); ...
static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, struct mmc_data *data) { ... if (data->flags & MMC_DATA_READ) { if ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning: unaligned read from %p " "may fail\n", data->dest); invalidate_dcache_range((ulong)data->dest, (ulong)data->dest + data->blocks * data->blocksize); } ...
so what invalidate_dcache_range() will invalidate is from &scr and 8 bytes after that. the higher layers make sure &scr is aligned properly, but not that it spans a full cache line. so if the stack layout is: [random stuff] 0x100 [scr[0]] 0x104 [scr[1]] 0x108 [err] 0x10a [timeout] [more stuff]
That's not the stack layout. scr is allocated (as quoted above) using ALLOC_CACHE_ALIGN_BUFFER. Therefore, the storage space allocated for scr is always cache-aligned in address and size, even if the code only uses 8 bytes of it.
this implicitly invalidates [err] and [timeout] and more stuff. by you forcibly rounding up the length, you no longer get a warning, but you still (silently) blew away those variables from cache possibly losing changes that weren't written back to external memory yet.
So, this problem won't actually occur here.
So, Thierry's proposed solution (which I'll re-quote below) would actually work just fine:
[PATCH 2/2] mmc: tegra: invalidate complete cachelines
The MMC core sometimes reads buffers that are smaller than a complete cacheline, for example when reading the SCR. In order to avoid a warning from the ARM v7 cache handling code, this patch makes sure that complete cachelines are flushed.
Signed-off-by: Thierry Reding <thierry.reding at avionic-design.de> --- drivers/mmc/tegra2_mmc.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/mmc/tegra2_mmc.c b/drivers/mmc/tegra2_mmc.c index fb8a57d..3615876 100644 --- a/drivers/mmc/tegra2_mmc.c +++ b/drivers/mmc/tegra2_mmc.c @@ -323,12 +323,13 @@ static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, } writel(mask, &host->reg->norintsts); if (data->flags & MMC_DATA_READ) { + ulong end = (ulong)data->dest + + roundup(data->blocks * data->blocksize, + ARCH_DMA_MINALIGN); if ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning: unaligned read from %p " "may fail\n", data->dest); - invalidate_dcache_range((ulong)data->dest, - (ulong)data->dest
data->blocks * data->blocksize); +
invalidate_dcache_range((ulong)data->dest, end); } }
--
... so long as we require any struct mmc_data data's .dest field to point at a buffer that was allocated using ALLOC_CACHE_ALIGN_BUFFER.
So, I'd like to propose that we take Thierry's fix, perhaps having validated (or implemented some way of enforcing) that ALLOC_CACHE_ALIGN_BUFFER() was used to allocate the buffer pointer.

Hi Stephen,
On Fri, Nov 2, 2012 at 1:22 PM, Stephen Warren swarren@wwwdotorg.org wrote:
On 04/26/2012 11:29 PM, Mike Frysinger wrote:
On Thursday 26 April 2012 06:34:43 Thierry Reding wrote:
For reference, see sd_change_freq() in drivers/mmc/mmc.c.
This is a follow-up to:
http://lists.denx.de/pipermail/u-boot/2012-April/123080.html
which was referenced from:
http://lists.denx.de/pipermail/u-boot/2012-September/133641.html
yes, that shows what we're talking about. int sd_change_freq(struct mmc *mmc) { ... stack vars ... int err; ALLOC_CACHE_ALIGN_BUFFER(uint, scr, 2); struct mmc_data data; int timeout; ... stack vars ... ... data.dest = (char *)scr; data.blocksize = 8; data.blocks = 1; data.flags = MMC_DATA_READ; err = mmc_send_cmd(mmc, &cmd, &data); ...
static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, struct mmc_data *data) { ... if (data->flags & MMC_DATA_READ) { if ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning: unaligned read from %p " "may fail\n", data->dest); invalidate_dcache_range((ulong)data->dest, (ulong)data->dest + data->blocks * data->blocksize); } ...
so what invalidate_dcache_range() will invalidate is from &scr and 8 bytes after that. the higher layers make sure &scr is aligned properly, but not that it spans a full cache line. so if the stack layout is: [random stuff] 0x100 [scr[0]] 0x104 [scr[1]] 0x108 [err] 0x10a [timeout] [more stuff]
That's not the stack layout. scr is allocated (as quoted above) using ALLOC_CACHE_ALIGN_BUFFER. Therefore, the storage space allocated for scr is always cache-aligned in address and size, even if the code only uses 8 bytes of it.
this implicitly invalidates [err] and [timeout] and more stuff. by you forcibly rounding up the length, you no longer get a warning, but you still (silently) blew away those variables from cache possibly losing changes that weren't written back to external memory yet.
So, this problem won't actually occur here.
So, Thierry's proposed solution (which I'll re-quote below) would actually work just fine:
[PATCH 2/2] mmc: tegra: invalidate complete cachelines
The MMC core sometimes reads buffers that are smaller than a complete cacheline, for example when reading the SCR. In order to avoid a warning from the ARM v7 cache handling code, this patch makes sure that complete cachelines are flushed.
Signed-off-by: Thierry Reding <thierry.reding at avionic-design.de> --- drivers/mmc/tegra2_mmc.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/mmc/tegra2_mmc.c b/drivers/mmc/tegra2_mmc.c index fb8a57d..3615876 100644 --- a/drivers/mmc/tegra2_mmc.c +++ b/drivers/mmc/tegra2_mmc.c @@ -323,12 +323,13 @@ static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, } writel(mask, &host->reg->norintsts); if (data->flags & MMC_DATA_READ) { + ulong end = (ulong)data->dest + + roundup(data->blocks * data->blocksize, + ARCH_DMA_MINALIGN); if ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning: unaligned read from %p " "may fail\n", data->dest); - invalidate_dcache_range((ulong)data->dest, - (ulong)data->dest
data->blocks * data->blocksize); +
invalidate_dcache_range((ulong)data->dest, end); } }
--
... so long as we require any struct mmc_data data's .dest field to point at a buffer that was allocated using ALLOC_CACHE_ALIGN_BUFFER.
So, I'd like to propose that we take Thierry's fix, perhaps having validated (or implemented some way of enforcing) that ALLOC_CACHE_ALIGN_BUFFER() was used to allocate the buffer pointer.
That sounds ok to me.
It is important to avoid silent failure if we can. Are you proposing to pass a 'size' parameter along with any buffer pointer, or something else?
Regards, Simon

Dear Simon Glass,
Hi Stephen,
On Fri, Nov 2, 2012 at 1:22 PM, Stephen Warren swarren@wwwdotorg.org wrote:
On 04/26/2012 11:29 PM, Mike Frysinger wrote:
On Thursday 26 April 2012 06:34:43 Thierry Reding wrote:
For reference, see sd_change_freq() in drivers/mmc/mmc.c.
This is a follow-up to:
http://lists.denx.de/pipermail/u-boot/2012-April/123080.html
which was referenced from:
http://lists.denx.de/pipermail/u-boot/2012-September/133641.html
yes, that shows what we're talking about. int sd_change_freq(struct mmc *mmc) { ... stack vars ... int err; ALLOC_CACHE_ALIGN_BUFFER(uint, scr, 2); struct mmc_data data; int timeout; ... stack vars ... ... data.dest = (char *)scr; data.blocksize = 8; data.blocks = 1; data.flags = MMC_DATA_READ; err = mmc_send_cmd(mmc, &cmd, &data); ...
static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, struct mmc_data *data) { ... if (data->flags & MMC_DATA_READ) { if ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning: unaligned read from %p " "may fail\n", data->dest); invalidate_dcache_range((ulong)data->dest, (ulong)data->dest + data->blocks * data->blocksize); } ...
so what invalidate_dcache_range() will invalidate is from &scr and 8 bytes after that. the higher layers make sure &scr is aligned properly, but not that it spans a full cache line. so if the stack layout is: [random stuff] 0x100 [scr[0]] 0x104 [scr[1]] 0x108 [err] 0x10a [timeout] [more stuff]
That's not the stack layout. scr is allocated (as quoted above) using ALLOC_CACHE_ALIGN_BUFFER. Therefore, the storage space allocated for scr is always cache-aligned in address and size, even if the code only uses 8 bytes of it.
this implicitly invalidates [err] and [timeout] and more stuff. by you forcibly rounding up the length, you no longer get a warning, but you still (silently) blew away those variables from cache possibly losing changes that weren't written back to external memory yet.
So, this problem won't actually occur here.
So, Thierry's proposed solution (which I'll re-quote below) would
actually work just fine:
[PATCH 2/2] mmc: tegra: invalidate complete cachelines
The MMC core sometimes reads buffers that are smaller than a complete cacheline, for example when reading the SCR. In order to avoid a warning from the ARM v7 cache handling code, this patch makes sure that complete cachelines are flushed.
Signed-off-by: Thierry Reding <thierry.reding at avionic-design.de> --- drivers/mmc/tegra2_mmc.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/mmc/tegra2_mmc.c b/drivers/mmc/tegra2_mmc.c index fb8a57d..3615876 100644 --- a/drivers/mmc/tegra2_mmc.c +++ b/drivers/mmc/tegra2_mmc.c @@ -323,12 +323,13 @@ static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, } writel(mask, &host->reg->norintsts); if (data->flags & MMC_DATA_READ) { + ulong end = (ulong)data->dest + + roundup(data->blocks * data->blocksize, + ARCH_DMA_MINALIGN); if ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning: unaligned read from %p " "may fail\n", data->dest); - invalidate_dcache_range((ulong)data->dest, - (ulong)data->dest + - data->blocks * data->blocksize); + invalidate_dcache_range((ulong)data->dest, end); } }
--
... so long as we require any struct mmc_data data's .dest field to point at a buffer that was allocated using ALLOC_CACHE_ALIGN_BUFFER.
So, I'd like to propose that we take Thierry's fix, perhaps having validated (or implemented some way of enforcing) that ALLOC_CACHE_ALIGN_BUFFER() was used to allocate the buffer pointer.
That sounds ok to me.
It is important to avoid silent failure if we can. Are you proposing to pass a 'size' parameter along with any buffer pointer, or something else?
Dumb question -- might be unrelated. Does the tegra mmc driver do DMA? And if so, what happens if you do raw read to unaligned address (aka. how come you don't need the bounce buffer)?

On 11/02/2012 02:38 PM, Marek Vasut wrote:
Dear Simon Glass,
Hi Stephen,
On Fri, Nov 2, 2012 at 1:22 PM, Stephen Warren swarren@wwwdotorg.org wrote:
On 04/26/2012 11:29 PM, Mike Frysinger wrote:
On Thursday 26 April 2012 06:34:43 Thierry Reding wrote:
For reference, see sd_change_freq() in drivers/mmc/mmc.c.
This is a follow-up to:
http://lists.denx.de/pipermail/u-boot/2012-April/123080.html
which was referenced from:
http://lists.denx.de/pipermail/u-boot/2012-September/133641.html
yes, that shows what we're talking about. int sd_change_freq(struct mmc *mmc) { ... stack vars ... int err; ALLOC_CACHE_ALIGN_BUFFER(uint, scr, 2); struct mmc_data data; int timeout; ... stack vars ... ... data.dest = (char *)scr; data.blocksize = 8; data.blocks = 1; data.flags = MMC_DATA_READ; err = mmc_send_cmd(mmc, &cmd, &data); ...
static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, struct mmc_data *data) { ... if (data->flags & MMC_DATA_READ) { if ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning: unaligned read from %p " "may fail\n", data->dest); invalidate_dcache_range((ulong)data->dest, (ulong)data->dest + data->blocks * data->blocksize); } ...
so what invalidate_dcache_range() will invalidate is from &scr and 8 bytes after that. the higher layers make sure &scr is aligned properly, but not that it spans a full cache line. so if the stack layout is: [random stuff] 0x100 [scr[0]] 0x104 [scr[1]] 0x108 [err] 0x10a [timeout] [more stuff]
That's not the stack layout. scr is allocated (as quoted above) using ALLOC_CACHE_ALIGN_BUFFER. Therefore, the storage space allocated for scr is always cache-aligned in address and size, even if the code only uses 8 bytes of it.
this implicitly invalidates [err] and [timeout] and more stuff. by you forcibly rounding up the length, you no longer get a warning, but you still (silently) blew away those variables from cache possibly losing changes that weren't written back to external memory yet.
So, this problem won't actually occur here.
So, Thierry's proposed solution (which I'll re-quote below) would
actually work just fine:
[PATCH 2/2] mmc: tegra: invalidate complete cachelines
The MMC core sometimes reads buffers that are smaller than a complete cacheline, for example when reading the SCR. In order to avoid a warning from the ARM v7 cache handling code, this patch makes sure that complete cachelines are flushed.
Signed-off-by: Thierry Reding <thierry.reding at avionic-design.de> --- drivers/mmc/tegra2_mmc.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/mmc/tegra2_mmc.c b/drivers/mmc/tegra2_mmc.c index fb8a57d..3615876 100644 --- a/drivers/mmc/tegra2_mmc.c +++ b/drivers/mmc/tegra2_mmc.c @@ -323,12 +323,13 @@ static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, } writel(mask, &host->reg->norintsts); if (data->flags & MMC_DATA_READ) { + ulong end = (ulong)data->dest + + roundup(data->blocks * data->blocksize, + ARCH_DMA_MINALIGN); if ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning: unaligned read from %p " "may fail\n", data->dest); - invalidate_dcache_range((ulong)data->dest, - (ulong)data->dest + - data->blocks * data->blocksize); + invalidate_dcache_range((ulong)data->dest, end); } }
--
... so long as we require any struct mmc_data data's .dest field to point at a buffer that was allocated using ALLOC_CACHE_ALIGN_BUFFER.
So, I'd like to propose that we take Thierry's fix, perhaps having validated (or implemented some way of enforcing) that ALLOC_CACHE_ALIGN_BUFFER() was used to allocate the buffer pointer.
That sounds ok to me.
It is important to avoid silent failure if we can. Are you proposing to pass a 'size' parameter along with any buffer pointer, or something else?
Dumb question -- might be unrelated. Does the tegra mmc driver do DMA? And if so, what happens if you do raw read to unaligned address (aka. how come you don't need the bounce buffer)?
Yes, it does DMA, I believe. (At least if it doesn't, I have no idea why the driver is flushing caches!)
I guess we only support the use of aligned addresses, so e.g. the following would work:
ext2load mmc 0:1 0x00100000 /file
but the following wouldn't:
ext2load mmc 0:1 0x00100004 /file
which while I suppose it is an artificial restriction, hasn't been an issue in practice.

Dear Stephen Warren,
On 11/02/2012 02:38 PM, Marek Vasut wrote:
Dear Simon Glass,
Hi Stephen,
On Fri, Nov 2, 2012 at 1:22 PM, Stephen Warren swarren@wwwdotorg.org
wrote:
On 04/26/2012 11:29 PM, Mike Frysinger wrote:
On Thursday 26 April 2012 06:34:43 Thierry Reding wrote:
For reference, see sd_change_freq() in drivers/mmc/mmc.c.
This is a follow-up to:
http://lists.denx.de/pipermail/u-boot/2012-April/123080.html
which was referenced from:
http://lists.denx.de/pipermail/u-boot/2012-September/133641.html
yes, that shows what we're talking about. int sd_change_freq(struct mmc *mmc) { ... stack vars ... int err; ALLOC_CACHE_ALIGN_BUFFER(uint, scr, 2); struct mmc_data data; int timeout; ... stack vars ... ... data.dest = (char *)scr; data.blocksize = 8; data.blocks = 1; data.flags = MMC_DATA_READ; err = mmc_send_cmd(mmc, &cmd, &data); ...
static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, struct mmc_data *data) { ... if (data->flags & MMC_DATA_READ) { if ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning: unaligned read from %p " "may fail\n", data->dest); invalidate_dcache_range((ulong)data->dest, (ulong)data->dest + data->blocks * data->blocksize); } ...
so what invalidate_dcache_range() will invalidate is from &scr and 8 bytes after that. the higher layers make sure &scr is aligned properly, but not that it spans a full cache line. so if the stack layout is: [random stuff] 0x100 [scr[0]] 0x104 [scr[1]] 0x108 [err] 0x10a [timeout] [more stuff]
That's not the stack layout. scr is allocated (as quoted above) using ALLOC_CACHE_ALIGN_BUFFER. Therefore, the storage space allocated for scr is always cache-aligned in address and size, even if the code only uses 8 bytes of it.
this implicitly invalidates [err] and [timeout] and more stuff. by you forcibly rounding up the length, you no longer get a warning, but you still (silently) blew away those variables from cache possibly losing changes that weren't written back to external memory yet.
So, this problem won't actually occur here.
So, Thierry's proposed solution (which I'll re-quote below) would
actually work just fine:
[PATCH 2/2] mmc: tegra: invalidate complete cachelines
The MMC core sometimes reads buffers that are smaller than a complete cacheline, for example when reading the SCR. In order to avoid a warning from the ARM v7 cache handling code, this patch makes sure that complete cachelines are flushed.
Signed-off-by: Thierry Reding <thierry.reding at avionic-design.de> --- drivers/mmc/tegra2_mmc.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/mmc/tegra2_mmc.c b/drivers/mmc/tegra2_mmc.c index fb8a57d..3615876 100644 --- a/drivers/mmc/tegra2_mmc.c +++ b/drivers/mmc/tegra2_mmc.c @@ -323,12 +323,13 @@ static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, } writel(mask, &host->reg->norintsts); if (data->flags & MMC_DATA_READ) { + ulong end = (ulong)data->dest + + roundup(data->blocks * data->blocksize, +
ARCH_DMA_MINALIGN); if ((uintptr_t)data->dest &
(ARCH_DMA_MINALIGN - 1)) printf("Warning: unaligned read from %p " "may fail\n", data->dest); - invalidate_dcache_range((ulong)data->dest, - (ulong)data->dest + - data->blocks * data->blocksize); + invalidate_dcache_range((ulong)data->dest, end); } }
--
... so long as we require any struct mmc_data data's .dest field to point at a buffer that was allocated using ALLOC_CACHE_ALIGN_BUFFER.
So, I'd like to propose that we take Thierry's fix, perhaps having validated (or implemented some way of enforcing) that ALLOC_CACHE_ALIGN_BUFFER() was used to allocate the buffer pointer.
That sounds ok to me.
It is important to avoid silent failure if we can. Are you proposing to pass a 'size' parameter along with any buffer pointer, or something else?
Dumb question -- might be unrelated. Does the tegra mmc driver do DMA? And if so, what happens if you do raw read to unaligned address (aka. how come you don't need the bounce buffer)?
Yes, it does DMA, I believe. (At least if it doesn't, I have no idea why the driver is flushing caches!)
I guess we only support the use of aligned addresses, so e.g. the following would work:
ext2load mmc 0:1 0x00100000 /file
but the following wouldn't:
ext2load mmc 0:1 0x00100004 /file
which while I suppose it is an artificial restriction, hasn't been an issue in practice.
Then just enable the bounce buffer and it will work ;-)

On 11/02/2012 03:28 PM, Marek Vasut wrote:
Dear Stephen Warren,
On 11/02/2012 02:38 PM, Marek Vasut wrote:
...
Dumb question -- might be unrelated. Does the tegra mmc driver do DMA? And if so, what happens if you do raw read to unaligned address (aka. how come you don't need the bounce buffer)?
Yes, it does DMA, I believe. (At least if it doesn't, I have no idea why the driver is flushing caches!)
I guess we only support the use of aligned addresses, so e.g. the following would work:
ext2load mmc 0:1 0x00100000 /file
but the following wouldn't:
ext2load mmc 0:1 0x00100004 /file
which while I suppose it is an artificial restriction, hasn't been an issue in practice.
Then just enable the bounce buffer and it will work ;-)
You suggested that last time, and it made no difference then... In fact, the config option you mentioned isn't used anywhere in the srouce tree except adding bouncebuf.o into the build right now; which config option do you think I should use?

Dear Stephen Warren,
On 11/02/2012 03:28 PM, Marek Vasut wrote:
Dear Stephen Warren,
On 11/02/2012 02:38 PM, Marek Vasut wrote:
...
Dumb question -- might be unrelated. Does the tegra mmc driver do DMA? And if so, what happens if you do raw read to unaligned address (aka. how come you don't need the bounce buffer)?
Yes, it does DMA, I believe. (At least if it doesn't, I have no idea why the driver is flushing caches!)
I guess we only support the use of aligned addresses, so e.g. the following would work:
ext2load mmc 0:1 0x00100000 /file
but the following wouldn't:
ext2load mmc 0:1 0x00100004 /file
which while I suppose it is an artificial restriction, hasn't been an issue in practice.
Then just enable the bounce buffer and it will work ;-)
You suggested that last time, and it made no difference then... In fact, the config option you mentioned isn't used anywhere in the srouce tree except adding bouncebuf.o
Bouncebuf.o ?
into the build right now; which config option do you think I should use?
CONFIG_BOUNCE_BUFFER
It's used on mx28-based boards.
Best regards, Marek Vasut

On 11/02/2012 04:10 PM, Marek Vasut wrote:
Dear Stephen Warren,
On 11/02/2012 03:28 PM, Marek Vasut wrote:
Dear Stephen Warren,
On 11/02/2012 02:38 PM, Marek Vasut wrote:
...
Dumb question -- might be unrelated. Does the tegra mmc driver do DMA? And if so, what happens if you do raw read to unaligned address (aka. how come you don't need the bounce buffer)?
Yes, it does DMA, I believe. (At least if it doesn't, I have no idea why the driver is flushing caches!)
I guess we only support the use of aligned addresses, so e.g. the following would work:
ext2load mmc 0:1 0x00100000 /file
but the following wouldn't:
ext2load mmc 0:1 0x00100004 /file
which while I suppose it is an artificial restriction, hasn't been an issue in practice.
Then just enable the bounce buffer and it will work ;-)
You suggested that last time, and it made no difference then... In fact, the config option you mentioned isn't used anywhere in the srouce tree except adding bouncebuf.o
Bouncebuf.o ?
common/bouncebuf.c
into the build right now; which config option do you think I should use?
CONFIG_BOUNCE_BUFFER
It's used on mx28-based boards.
Ahah, there's much more to it than just defining the config option, and grep'ing for the config option doesn't reveal that.
Defining CONFIG_BOUNCE_BUFFER simply pulls in bouncebuf.o into the build. It's then up to the individual MMC/... driver to actually make use of the new functions - i.e. explicitly call e.g. bounce_buffer_start(). The MXS MMC driver calls these without any ifdef CONFIG_BOUNCE_BUFFER, and hence doesn't show up when you grep for that. I had assumed that such functionality would be a conditional part of the MMC core rather than the individual drivers, hence had missed the usage in the MXS driver.
So, that config option plus some changes to the Tegra MMC driver might work - I'll take a look. However, making that change will simply hide any unaligned buffer and reduce performance, so I'd still prefer to require that buffers be aligned in the future where possible.

Dear Stephen Warren,
On 11/02/2012 04:10 PM, Marek Vasut wrote:
Dear Stephen Warren,
On 11/02/2012 03:28 PM, Marek Vasut wrote:
Dear Stephen Warren,
On 11/02/2012 02:38 PM, Marek Vasut wrote:
...
Dumb question -- might be unrelated. Does the tegra mmc driver do DMA? And if so, what happens if you do raw read to unaligned address (aka. how come you don't need the bounce buffer)?
Yes, it does DMA, I believe. (At least if it doesn't, I have no idea why the driver is flushing caches!)
I guess we only support the use of aligned addresses, so e.g. the following would work:
ext2load mmc 0:1 0x00100000 /file
but the following wouldn't:
ext2load mmc 0:1 0x00100004 /file
which while I suppose it is an artificial restriction, hasn't been an issue in practice.
Then just enable the bounce buffer and it will work ;-)
You suggested that last time, and it made no difference then... In fact, the config option you mentioned isn't used anywhere in the srouce tree except adding bouncebuf.o
Bouncebuf.o ?
common/bouncebuf.c
Ugh, it was applied ...
into the build right now; which config option do you think I should use?
CONFIG_BOUNCE_BUFFER
It's used on mx28-based boards.
Ahah, there's much more to it than just defining the config option, and grep'ing for the config option doesn't reveal that.
Defining CONFIG_BOUNCE_BUFFER simply pulls in bouncebuf.o into the build. It's then up to the individual MMC/... driver to actually make use of the new functions - i.e. explicitly call e.g. bounce_buffer_start(). The MXS MMC driver calls these without any ifdef CONFIG_BOUNCE_BUFFER, and hence doesn't show up when you grep for that. I had assumed that such functionality would be a conditional part of the MMC core rather than the individual drivers, hence had missed the usage in the MXS driver.
Sorry about that, I didn't know the patch was pulled in so I expected the old approach.
So, that config option plus some changes to the Tegra MMC driver might work - I'll take a look. However, making that change will simply hide any unaligned buffer and reduce performance, so I'd still prefer to require that buffers be aligned in the future where possible.
Best regards, Marek Vasut

Hi Stephen,
On Mon, Nov 5, 2012 at 11:50 AM, Stephen Warren swarren@wwwdotorg.org wrote:
On 11/02/2012 04:10 PM, Marek Vasut wrote:
Dear Stephen Warren,
On 11/02/2012 03:28 PM, Marek Vasut wrote:
Dear Stephen Warren,
On 11/02/2012 02:38 PM, Marek Vasut wrote:
...
Dumb question -- might be unrelated. Does the tegra mmc driver do DMA? And if so, what happens if you do raw read to unaligned address (aka. how come you don't need the bounce buffer)?
Yes, it does DMA, I believe. (At least if it doesn't, I have no idea why the driver is flushing caches!)
I guess we only support the use of aligned addresses, so e.g. the following would work:
ext2load mmc 0:1 0x00100000 /file
but the following wouldn't:
ext2load mmc 0:1 0x00100004 /file
which while I suppose it is an artificial restriction, hasn't been an issue in practice.
Then just enable the bounce buffer and it will work ;-)
You suggested that last time, and it made no difference then... In fact, the config option you mentioned isn't used anywhere in the srouce tree except adding bouncebuf.o
Bouncebuf.o ?
common/bouncebuf.c
into the build right now; which config option do you think I should use?
CONFIG_BOUNCE_BUFFER
It's used on mx28-based boards.
Ahah, there's much more to it than just defining the config option, and grep'ing for the config option doesn't reveal that.
Defining CONFIG_BOUNCE_BUFFER simply pulls in bouncebuf.o into the build. It's then up to the individual MMC/... driver to actually make use of the new functions - i.e. explicitly call e.g. bounce_buffer_start(). The MXS MMC driver calls these without any ifdef CONFIG_BOUNCE_BUFFER, and hence doesn't show up when you grep for that. I had assumed that such functionality would be a conditional part of the MMC core rather than the individual drivers, hence had missed the usage in the MXS driver.
So, that config option plus some changes to the Tegra MMC driver might work - I'll take a look. However, making that change will simply hide any unaligned buffer and reduce performance, so I'd still prefer to require that buffers be aligned in the future where possible.
Yes that seems right to me.
Regards, Simon

On 11/02/2012 02:25 PM, Simon Glass wrote:
Hi Stephen,
On Fri, Nov 2, 2012 at 1:22 PM, Stephen Warren swarren@wwwdotorg.org wrote:
On 04/26/2012 11:29 PM, Mike Frysinger wrote:
On Thursday 26 April 2012 06:34:43 Thierry Reding wrote:
For reference, see sd_change_freq() in drivers/mmc/mmc.c.
This is a follow-up to:
http://lists.denx.de/pipermail/u-boot/2012-April/123080.html
which was referenced from:
http://lists.denx.de/pipermail/u-boot/2012-September/133641.html
yes, that shows what we're talking about. int sd_change_freq(struct mmc *mmc) { ... stack vars ... int err; ALLOC_CACHE_ALIGN_BUFFER(uint, scr, 2); struct mmc_data data; int timeout; ... stack vars ... ... data.dest = (char *)scr; data.blocksize = 8; data.blocks = 1; data.flags = MMC_DATA_READ; err = mmc_send_cmd(mmc, &cmd, &data); ...
static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, struct mmc_data *data) { ... if (data->flags & MMC_DATA_READ) { if ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning: unaligned read from %p " "may fail\n", data->dest); invalidate_dcache_range((ulong)data->dest, (ulong)data->dest + data->blocks * data->blocksize); } ...
so what invalidate_dcache_range() will invalidate is from &scr and 8 bytes after that. the higher layers make sure &scr is aligned properly, but not that it spans a full cache line. so if the stack layout is: [random stuff] 0x100 [scr[0]] 0x104 [scr[1]] 0x108 [err] 0x10a [timeout] [more stuff]
That's not the stack layout. scr is allocated (as quoted above) using ALLOC_CACHE_ALIGN_BUFFER. Therefore, the storage space allocated for scr is always cache-aligned in address and size, even if the code only uses 8 bytes of it.
this implicitly invalidates [err] and [timeout] and more stuff. by you forcibly rounding up the length, you no longer get a warning, but you still (silently) blew away those variables from cache possibly losing changes that weren't written back to external memory yet.
So, this problem won't actually occur here.
So, Thierry's proposed solution (which I'll re-quote below) would actually work just fine:
[PATCH 2/2] mmc: tegra: invalidate complete cachelines
The MMC core sometimes reads buffers that are smaller than a complete cacheline, for example when reading the SCR. In order to avoid a warning from the ARM v7 cache handling code, this patch makes sure that complete cachelines are flushed.
Signed-off-by: Thierry Reding <thierry.reding at avionic-design.de> --- drivers/mmc/tegra2_mmc.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/mmc/tegra2_mmc.c b/drivers/mmc/tegra2_mmc.c index fb8a57d..3615876 100644 --- a/drivers/mmc/tegra2_mmc.c +++ b/drivers/mmc/tegra2_mmc.c @@ -323,12 +323,13 @@ static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, } writel(mask, &host->reg->norintsts); if (data->flags & MMC_DATA_READ) { + ulong end = (ulong)data->dest + + roundup(data->blocks * data->blocksize, + ARCH_DMA_MINALIGN); if ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning: unaligned read from %p " "may fail\n", data->dest); - invalidate_dcache_range((ulong)data->dest, - (ulong)data->dest
data->blocks * data->blocksize); +
invalidate_dcache_range((ulong)data->dest, end); } }
--
... so long as we require any struct mmc_data data's .dest field to point at a buffer that was allocated using ALLOC_CACHE_ALIGN_BUFFER.
So, I'd like to propose that we take Thierry's fix, perhaps having validated (or implemented some way of enforcing) that ALLOC_CACHE_ALIGN_BUFFER() was used to allocate the buffer pointer.
That sounds ok to me.
It is important to avoid silent failure if we can. Are you proposing to pass a 'size' parameter along with any buffer pointer, or something else?
I was wondering about creating some kind of macro to initialize the buffer pointer in struct mmc_data, and writing that macro so that the buffer passed to it had to have been allocated using ALLOC_CACHE_ALIGN_BUFFER (e.g. by referencing the hidden variable it creates, or something like that).
Passing in the buffer size would also work, although it seems like it'd be easier to screw that up and hide the silent errors you mentioned?

On Tue, Apr 24, 2012 at 7:53 PM, Thierry Reding thierry.reding@avionic-design.de wrote:
Some hardware requires the data buffers to be cacheline-aligned to make sure DMA operations can be properly executed.
This patch uses the ALLOC_CACHE_ALIGN_BUFFER macro to allocate buffers with proper alignment. The same was already done for EFI partitions in commit f75dd58 "part_efi: dcache: allocate cacheline aligned buffers".
Signed-off-by: Thierry Reding thierry.reding@avionic-design.de
Acked-by: Simon Glass sjg@chromium.org
This fixes boot failures on Avionic Design Plutux and Medcom boards.
disk/part_dos.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/disk/part_dos.c b/disk/part_dos.c index b5bcb37..c028aaf 100644 --- a/disk/part_dos.c +++ b/disk/part_dos.c @@ -87,7 +87,7 @@ static int test_block_type(unsigned char *buffer)
int test_part_dos (block_dev_desc_t *dev_desc) {
- unsigned char buffer[dev_desc->blksz];
- ALLOC_CACHE_ALIGN_BUFFER(unsigned char, buffer, dev_desc->blksz);
if ((dev_desc->block_read(dev_desc->dev, 0, 1, (ulong *) buffer) != 1) || (buffer[DOS_PART_MAGIC_OFFSET + 0] != 0x55) || @@ -102,7 +102,7 @@ int test_part_dos (block_dev_desc_t *dev_desc) static void print_partition_extended (block_dev_desc_t *dev_desc, int ext_part_sector, int relative, int part_num) {
- unsigned char buffer[dev_desc->blksz];
- ALLOC_CACHE_ALIGN_BUFFER(unsigned char, buffer, dev_desc->blksz);
dos_partition_t *pt; int i;
@@ -166,7 +166,7 @@ static int get_partition_info_extended (block_dev_desc_t *dev_desc, int ext_part int relative, int part_num, int which_part, disk_partition_t *info) {
- unsigned char buffer[dev_desc->blksz];
- ALLOC_CACHE_ALIGN_BUFFER(unsigned char, buffer, dev_desc->blksz);
dos_partition_t *pt; int i;
-- 1.7.10

* Simon Glass wrote:
On Tue, Apr 24, 2012 at 7:53 PM, Thierry Reding thierry.reding@avionic-design.de wrote:
Some hardware requires the data buffers to be cacheline-aligned to make sure DMA operations can be properly executed.
This patch uses the ALLOC_CACHE_ALIGN_BUFFER macro to allocate buffers with proper alignment. The same was already done for EFI partitions in commit f75dd58 "part_efi: dcache: allocate cacheline aligned buffers".
Signed-off-by: Thierry Reding thierry.reding@avionic-design.de
Acked-by: Simon Glass sjg@chromium.org
I just noticed that the same patch was already posted by Eric Nelson back in March. Anatolij Gustschin just applied it to u-boot-staging/agust@denx.de.
Thierry
participants (5)
-
Marek Vasut
-
Mike Frysinger
-
Simon Glass
-
Stephen Warren
-
Thierry Reding