[U-Boot] [PATCH] mmc:dcache: Cache line size aligned internal MMC buffers

MMC operations are performed on cache line size aligned buffers. In the current MMC implementation it is allowed to pass buffer with arbitrary alignment. In this patch assumption has been made, that it is better to align the buffer on the MMC framework boundary, than in a number of u-boot subsystems, which are using MMC.
Signed-off-by: Lukasz Majewski l.majewski@samsung.com Signed-off-by: Kyungmin Park kyungmin.park@samsung.com CC: Andy Fleming afleming@gmail.com CC: Albert ARIBAUD albert.u.boot@aribaud.net --- drivers/mmc/mmc.c | 30 +++++++++++++++++++++++++++--- 1 files changed, 27 insertions(+), 3 deletions(-)
diff --git a/drivers/mmc/mmc.c b/drivers/mmc/mmc.c index 5f79a17..47e94c8 100644 --- a/drivers/mmc/mmc.c +++ b/drivers/mmc/mmc.c @@ -263,6 +263,7 @@ mmc_write_blocks(struct mmc *mmc, ulong start, lbaint_t blkcnt, const void*src) struct mmc_cmd cmd; struct mmc_data data; int timeout = 1000; + void *cache_align_buf;
if ((start + blkcnt) > mmc->block_dev.lba) { printf("MMC: block number 0x%lx exceeds max(0x%lx)\n", @@ -283,13 +284,22 @@ mmc_write_blocks(struct mmc *mmc, ulong start, lbaint_t blkcnt, const void*src) cmd.resp_type = MMC_RSP_R1; cmd.flags = 0;
- data.src = src; + cache_align_buf = memalign(get_dcache_line_size(), + mmc->write_bl_len * blkcnt); + + if (!cache_align_buf) + return -ENOMEM; + + memcpy(cache_align_buf, src, mmc->write_bl_len * blkcnt); + + data.src = cache_align_buf; data.blocks = blkcnt; data.blocksize = mmc->write_bl_len; data.flags = MMC_DATA_WRITE;
if (mmc_send_cmd(mmc, &cmd, &data)) { printf("mmc write failed\n"); + free(cache_align_buf); return 0; }
@@ -303,6 +313,7 @@ mmc_write_blocks(struct mmc *mmc, ulong start, lbaint_t blkcnt, const void*src) cmd.flags = 0; if (mmc_send_cmd(mmc, &cmd, NULL)) { printf("mmc fail to send stop cmd\n"); + free(cache_align_buf); return 0; }
@@ -310,6 +321,7 @@ mmc_write_blocks(struct mmc *mmc, ulong start, lbaint_t blkcnt, const void*src) mmc_send_status(mmc, timeout); }
+ free(cache_align_buf); return blkcnt; }
@@ -342,6 +354,7 @@ int mmc_read_blocks(struct mmc *mmc, void *dst, ulong start, lbaint_t blkcnt) struct mmc_cmd cmd; struct mmc_data data; int timeout = 1000; + void *cache_align_buf;
if (blkcnt > 1) cmd.cmdidx = MMC_CMD_READ_MULTIPLE_BLOCK; @@ -356,13 +369,21 @@ int mmc_read_blocks(struct mmc *mmc, void *dst, ulong start, lbaint_t blkcnt) cmd.resp_type = MMC_RSP_R1; cmd.flags = 0;
- data.dest = dst; + cache_align_buf = memalign(get_dcache_line_size(), + mmc->read_bl_len * blkcnt); + + if (!cache_align_buf) + return -ENOMEM; + + data.dest = cache_align_buf; data.blocks = blkcnt; data.blocksize = mmc->read_bl_len; data.flags = MMC_DATA_READ;
- if (mmc_send_cmd(mmc, &cmd, &data)) + if (mmc_send_cmd(mmc, &cmd, &data)) { + free(cache_align_buf); return 0; + }
if (blkcnt > 1) { cmd.cmdidx = MMC_CMD_STOP_TRANSMISSION; @@ -371,6 +392,7 @@ int mmc_read_blocks(struct mmc *mmc, void *dst, ulong start, lbaint_t blkcnt) cmd.flags = 0; if (mmc_send_cmd(mmc, &cmd, NULL)) { printf("mmc fail to send stop cmd\n"); + free(cache_align_buf); return 0; }
@@ -378,6 +400,8 @@ int mmc_read_blocks(struct mmc *mmc, void *dst, ulong start, lbaint_t blkcnt) mmc_send_status(mmc, timeout); }
+ memcpy(dst, cache_align_buf, mmc->read_bl_len * blkcnt); + free(cache_align_buf); return blkcnt; }

On Friday, August 19, 2011 05:25:13 Lukasz Majewski wrote:
- cache_align_buf = memalign(get_dcache_line_size(),
nowhere do i see get_dcache_line_size() defined
also, what is the code size increase with your patch ? -mike

Hi Mike,
On Fri, 19 Aug 2011 09:57:10 -0400 Mike Frysinger vapier@gentoo.org wrote:
On Friday, August 19, 2011 05:25:13 Lukasz Majewski wrote:
- cache_align_buf = memalign(get_dcache_line_size(),
nowhere do i see get_dcache_line_size() defined
also, what is the code size increase with your patch ? -mike
Please look to the following post: http://patchwork.ozlabs.org/patch/110501/
and another related with this issue: http://patchwork.ozlabs.org/patch/110300/
Code size overhead (s5p_goni target): Without proposed changes: 167928 B (u-boot.bin) With changes: 168208 B (u-boot.bin)
Delta: 280 B

On Friday, August 19, 2011 11:28:18 Lukasz Majewski wrote:
On Fri, 19 Aug 2011 09:57:10 -0400 Mike Frysinger wrote:
On Friday, August 19, 2011 05:25:13 Lukasz Majewski wrote:
- cache_align_buf = memalign(get_dcache_line_size(),
nowhere do i see get_dcache_line_size() defined
Please look to the following post: http://patchwork.ozlabs.org/patch/110501/
and another related with this issue: http://patchwork.ozlabs.org/patch/110300/
if you're posting patches with dependencies, you need to mention them explicitly (below the "---" area), or send proper patch series ([PATCH N/M]).
ignoring that, this patch will break all arches except arm. that's bad mmmkay. you probably need to move that weak def out of arm's cache.c and into like lib/cache.c.
also, what is the code size increase with your patch ?
Code size overhead (s5p_goni target): Without proposed changes: 167928 B (u-boot.bin) With changes: 168208 B (u-boot.bin)
Delta: 280 B
np if it gives significant (more than system noise) speedups. any details on that ? -mike

Hi Mike,
On Fri, 19 Aug 2011 11:35:50 -0400 Mike Frysinger vapier@gentoo.org wrote:
On Friday, August 19, 2011 11:28:18 Lukasz Majewski wrote:
On Fri, 19 Aug 2011 09:57:10 -0400 Mike Frysinger wrote:
On Friday, August 19, 2011 05:25:13 Lukasz Majewski wrote:
- cache_align_buf = memalign(get_dcache_line_size(),
nowhere do i see get_dcache_line_size() defined
Please look to the following post: http://patchwork.ozlabs.org/patch/110501/
and another related with this issue: http://patchwork.ozlabs.org/patch/110300/
if you're posting patches with dependencies, you need to mention them explicitly (below the "---" area), or send proper patch series ([PATCH N/M]).
ignoring that, this patch will break all arches except arm. that's bad mmmkay. you probably need to move that weak def out of arm's cache.c and into like lib/cache.c.
Yes, I will prepare two patch series: One addressing the get_dcache_line_size function for all u-boot architectures.
Another patch series will address changes to the drivers/mmc.c file.
also, what is the code size increase with your patch ?
Code size overhead (s5p_goni target): Without proposed changes: 167928 B (u-boot.bin) With changes: 168208 B (u-boot.bin)
Delta: 280 B
np if it gives significant (more than system noise) speedups. any details on that ? -mike
No tests performed yet. The goal of those patches is to preserve the MMC subsystem functionality when dcache is enabled (the ext_csd[512] corruption is observed with d-cache enabled).
I'm particularly interested if the approach with memalign and get_dcache_line_size is the preferred way to go.
I will think about some reliable ways to measure the MMC performance with enabled and disabled MMC subsystem.

On Monday, August 22, 2011 03:29:52 Lukasz Majewski wrote:
On Fri, 19 Aug 2011 11:35:50 -0400 Mike Frysinger wrote:
On Friday, August 19, 2011 11:28:18 Lukasz Majewski wrote:
On Fri, 19 Aug 2011 09:57:10 -0400 Mike Frysinger wrote:
On Friday, August 19, 2011 05:25:13 Lukasz Majewski wrote: also, what is the code size increase with your patch ?
Code size overhead (s5p_goni target): Without proposed changes: 167928 B (u-boot.bin) With changes: 168208 B (u-boot.bin)
Delta: 280 B
np if it gives significant (more than system noise) speedups. any details on that ?
No tests performed yet. The goal of those patches is to preserve the MMC subsystem functionality when dcache is enabled (the ext_csd[512] corruption is observed with d-cache enabled).
so you're papering over a bug in some controller's cache handling ? shouldnt you fix the controller in question by having it flush its caches ? aligning random buffers to make cache issues "go away" isnt the right way for anything. -mike

On Mon, Aug 22, 2011 at 9:08 AM, Mike Frysinger vapier@gentoo.org wrote:
On Monday, August 22, 2011 03:29:52 Lukasz Majewski wrote:
On Fri, 19 Aug 2011 11:35:50 -0400 Mike Frysinger wrote:
On Friday, August 19, 2011 11:28:18 Lukasz Majewski wrote:
On Fri, 19 Aug 2011 09:57:10 -0400 Mike Frysinger wrote:
On Friday, August 19, 2011 05:25:13 Lukasz Majewski wrote: also, what is the code size increase with your patch ?
Code size overhead (s5p_goni target): Without proposed changes: 167928 B (u-boot.bin) With changes: 168208 B (u-boot.bin)
Delta: 280 B
np if it gives significant (more than system noise) speedups. any details on that ?
No tests performed yet. The goal of those patches is to preserve the MMC subsystem functionality when dcache is enabled (the ext_csd[512] corruption is observed with d-cache enabled).
so you're papering over a bug in some controller's cache handling ? shouldnt you fix the controller in question by having it flush its caches ? aligning random buffers to make cache issues "go away" isnt the right way for anything. -mike
No, this isn't something that can be fixed in the controller driver code. This is a fundamental problem with buffers in U-Boot that needs to be resolved by aligning all buffers used for DMA. The main problem is that invalidating a non-cache line aligned buffer is not a safe operation. There have been a number of threads discussing this. The general consensus was to make attempting to invalidate an unaligned buffer an error and then to clean up the unaligned buffers as we find them.
Lukasz, I also have been using memalign to clean up accesses in local patches, so you've got my vote there. I am curious as to whether we should provide a single block allocation API or if each subsection should add lazy memalign allocations to create aligned buffers when they are needed...
Thanks, Anton
U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot

On Monday, August 22, 2011 06:42:06 PM Anton Staaf wrote:
On Mon, Aug 22, 2011 at 9:08 AM, Mike Frysinger vapier@gentoo.org wrote:
On Monday, August 22, 2011 03:29:52 Lukasz Majewski wrote:
On Fri, 19 Aug 2011 11:35:50 -0400 Mike Frysinger wrote:
On Friday, August 19, 2011 11:28:18 Lukasz Majewski wrote:
On Fri, 19 Aug 2011 09:57:10 -0400 Mike Frysinger wrote:
On Friday, August 19, 2011 05:25:13 Lukasz Majewski wrote: also, what is the code size increase with your patch ?
Code size overhead (s5p_goni target): Without proposed changes: 167928 B (u-boot.bin) With changes: 168208 B (u-boot.bin)
Delta: 280 B
np if it gives significant (more than system noise) speedups. any details on that ?
No tests performed yet. The goal of those patches is to preserve the MMC subsystem functionality when dcache is enabled (the ext_csd[512] corruption is observed with d-cache enabled).
so you're papering over a bug in some controller's cache handling ? shouldnt you fix the controller in question by having it flush its caches ? aligning random buffers to make cache issues "go away" isnt the right way for anything. -mike
No, this isn't something that can be fixed in the controller driver code. This is a fundamental problem with buffers in U-Boot that needs to be resolved by aligning all buffers used for DMA. The main problem is that invalidating a non-cache line aligned buffer is not a safe operation. There have been a number of threads discussing this. The general consensus was to make attempting to invalidate an unaligned buffer an error and then to clean up the unaligned buffers as we find them.
Lukasz, I also have been using memalign to clean up accesses in local patches, so you've got my vote there. I am curious as to whether we should provide a single block allocation API or if each subsection should add lazy memalign allocations to create aligned buffers when they are needed...
Maybe some dma_allocate_aligned() would be cool. And probably control the alignment with some #define CONFIG PLATFORM_ALIGNMENT_SIZE ?
Cheers
Thanks, Anton
U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot
U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot

On Monday, August 22, 2011 12:42:06 Anton Staaf wrote:
On Mon, Aug 22, 2011 at 9:08 AM, Mike Frysinger vapier@gentoo.org wrote:
On Monday, August 22, 2011 03:29:52 Lukasz Majewski wrote:
No tests performed yet. The goal of those patches is to preserve the MMC subsystem functionality when dcache is enabled (the ext_csd[512] corruption is observed with d-cache enabled).
so you're papering over a bug in some controller's cache handling ? shouldnt you fix the controller in question by having it flush its caches ? aligning random buffers to make cache issues "go away" isnt the right way for anything.
No, this isn't something that can be fixed in the controller driver code. This is a fundamental problem with buffers in U-Boot that needs to be resolved by aligning all buffers used for DMA. The main problem is that invalidating a non-cache line aligned buffer is not a safe operation. There have been a number of threads discussing this. The general consensus was to make attempting to invalidate an unaligned buffer an error and then to clean up the unaligned buffers as we find them.
Linux has taken the approach of the callers providing dma safe buffers instead of having lower layers trying to fix things up all the time. and for obvious reasons: you avoid the double allocation, the memcpy, and you avoid unclear documentation between layers where multiple layers attempt to fixup the callers thus futher multiplying the duplicated allocations/copies/etc...
so in this case, why not fix the caller that is passing a dma unaligned buffer down to the mmc code ? -mike

Yes, this would be a much preferable approach. The only problem that I encountered when attempting to do that in the Chromium U-Boot was that there are exposed stand alone U-Boot API's that can pass in pointers that make their way all the way down to the device driver. Two solutions occurred to me. One was to put an Error or assert as high up as possible and expect violators to fix their code, the other was to patch things up in lower layers (right now we have a bounce buffer in the tegra MMC driver to deal with unaligned buffers). My intent was that this would give us a functional U-Boot while we took the time to clean up as many of the unaligned buffers that U-Boot uses, and then tackle the question of stand alone applications.
-Anton
On Mon, Aug 22, 2011 at 10:17 AM, Mike Frysinger vapier@gentoo.org wrote:
On Monday, August 22, 2011 12:42:06 Anton Staaf wrote:
On Mon, Aug 22, 2011 at 9:08 AM, Mike Frysinger vapier@gentoo.org wrote:
On Monday, August 22, 2011 03:29:52 Lukasz Majewski wrote:
No tests performed yet. The goal of those patches is to preserve the MMC subsystem functionality when dcache is enabled (the ext_csd[512] corruption is observed with d-cache enabled).
so you're papering over a bug in some controller's cache handling ? shouldnt you fix the controller in question by having it flush its caches ? aligning random buffers to make cache issues "go away" isnt the right way for anything.
No, this isn't something that can be fixed in the controller driver code. This is a fundamental problem with buffers in U-Boot that needs to be resolved by aligning all buffers used for DMA. The main problem is that invalidating a non-cache line aligned buffer is not a safe operation. There have been a number of threads discussing this. The general consensus was to make attempting to invalidate an unaligned buffer an error and then to clean up the unaligned buffers as we find them.
Linux has taken the approach of the callers providing dma safe buffers instead of having lower layers trying to fix things up all the time. and for obvious reasons: you avoid the double allocation, the memcpy, and you avoid unclear documentation between layers where multiple layers attempt to fixup the callers thus futher multiplying the duplicated allocations/copies/etc...
so in this case, why not fix the caller that is passing a dma unaligned buffer down to the mmc code ? -mike

On Monday, August 22, 2011 14:15:22 Anton Staaf wrote:
please dont top post
Yes, this would be a much preferable approach. The only problem that I encountered when attempting to do that in the Chromium U-Boot was that there are exposed stand alone U-Boot API's that can pass in pointers that make their way all the way down to the device driver. Two solutions occurred to me. One was to put an Error or assert as high up as possible and expect violators to fix their code, the other was to patch things up in lower layers (right now we have a bounce buffer in the tegra MMC driver to deal with unaligned buffers). My intent was that this would give us a functional U-Boot while we took the time to clean up as many of the unaligned buffers that U-Boot uses, and then tackle the question of stand alone applications.
could you elaborate on the code paths in question ? -mike

On Mon, Aug 22, 2011 at 11:31 AM, Mike Frysinger vapier@gentoo.org wrote:
On Monday, August 22, 2011 14:15:22 Anton Staaf wrote:
please dont top post
Sorry about that.
Yes, this would be a much preferable approach. The only problem that I encountered when attempting to do that in the Chromium U-Boot was that there are exposed stand alone U-Boot API's that can pass in pointers that make their way all the way down to the device driver. Two solutions occurred to me. One was to put an Error or assert as high up as possible and expect violators to fix their code, the other was to patch things up in lower layers (right now we have a bounce buffer in the tegra MMC driver to deal with unaligned buffers). My intent was that this would give us a functional U-Boot while we took the time to clean up as many of the unaligned buffers that U-Boot uses, and then tackle the question of stand alone applications.
could you elaborate on the code paths in question ?
Heh, now I've got to go fine them again. The ub_dev_read function in the example/api/glue.c is an example of a use of the public syscall API that as far as I can tell passes down the buffer pointer all the way to the driver layer.
As for the unaligned buffers I've run into:
fs/ext2/dev.c: sec_buf is allocated on the stack and thus is not guaranteed to be aligned. drivers/mmc/mmc.c: ext_csd in mmc_change_freq is allocated on the stack. drivers/mmc/mmc.c: scr and switch_status in sd_change_freq are allocated on the stack. drivers/mmc/mmc.c: ext_csd in mmc_startup is allocated on the stack. disk/part_efi.c: gpt_header in print_part_efi is allocated on the stack. disk/part_efi.c: gpt_header in get_partition_info_efi is allocated on the stack. disk/part_efi.c: legacy_mbr in test_part_efi is allocated on the stack. disk/part_efi.c: pte in alloc_read_gpt_entries is allocated using malloc (not memalign).
I have working solutions to all of these but they generally use the lazy allocation of an aligned buffer as needed pattern. I wanted to see the results of Lukasz current thread before sending them upstream. But that's probably not right. I should probably send them now so we can see a number of examples of the problem and get a better idea of how to fix it.
Thanks, Anton
-mike

Hi Anton,
On Mon, 22 Aug 2011 11:57:57 -0700 Anton Staaf robotboy@google.com wrote:
drivers/mmc/mmc.c: ext_csd in mmc_change_freq is allocated on the stac drivers/mmc/mmc.c: scr and switch_status in sd_change_freq are allocated on the stack. drivers/mmc/mmc.c: ext_csd in mmc_startup is allocated on the stack.
This allocations are already fixed:
http://patchwork.ozlabs.org/patch/110300/ http://patchwork.ozlabs.org/patch/109790/
If any doubts/comments/ideas, please let me know :-)
But that's probably not right. I should probably send them now so we can see a number of examples of the problem and get a better idea of how to fix it.
Discussion is always welcome.

On Tue, Aug 23, 2011 at 2:19 AM, Lukasz Majewski l.majewski@samsung.com wrote:
Hi Anton,
On Mon, 22 Aug 2011 11:57:57 -0700 Anton Staaf robotboy@google.com wrote:
drivers/mmc/mmc.c: ext_csd in mmc_change_freq is allocated on the stac drivers/mmc/mmc.c: scr and switch_status in sd_change_freq are allocated on the stack. drivers/mmc/mmc.c: ext_csd in mmc_startup is allocated on the stack.
This allocations are already fixed:
http://patchwork.ozlabs.org/patch/110300/ http://patchwork.ozlabs.org/patch/109790/
Yup, should have said that you'd already taken care of some of these. I'll send a patch now for another unaligned buffer in the mmc code that isn't in your patch set.
Thanks, Anton
If any doubts/comments/ideas, please let me know :-)
But that's probably not right. I should probably send them now so we can see a number of examples of the problem and get a better idea of how to fix it.
Discussion is always welcome.
-- Best regards,
Lukasz Majewski
Samsung Poland R&D Center Platform Group

On Tuesday, August 23, 2011 05:19:39 Lukasz Majewski wrote:
On Mon, 22 Aug 2011 11:57:57 -0700 Anton Staaf wrote:
drivers/mmc/mmc.c: ext_csd in mmc_change_freq is allocated on the stac drivers/mmc/mmc.c: scr and switch_status in sd_change_freq are allocated on the stack. drivers/mmc/mmc.c: ext_csd in mmc_startup is allocated on the stack.
This allocations are already fixed:
http://patchwork.ozlabs.org/patch/110300/ http://patchwork.ozlabs.org/patch/109790/
If any doubts/comments/ideas, please let me know :-)
hmm, i wish we had a memalign_alloca(). and all this copy & pasting of get_dcache_line_size() makes me unhappy as we're encoding too-low-of-a-level logic into funcs.
what about adding a new func like: #define dma_buffer_alloca(size)
and it would take care of allocating a big enough aligned buffer on the stack. -mike

On Tue, Aug 23, 2011 at 10:30 AM, Mike Frysinger vapier@gentoo.org wrote:
On Tuesday, August 23, 2011 05:19:39 Lukasz Majewski wrote:
On Mon, 22 Aug 2011 11:57:57 -0700 Anton Staaf wrote:
drivers/mmc/mmc.c: ext_csd in mmc_change_freq is allocated on the stac drivers/mmc/mmc.c: scr and switch_status in sd_change_freq are allocated on the stack. drivers/mmc/mmc.c: ext_csd in mmc_startup is allocated on the stack.
This allocations are already fixed:
http://patchwork.ozlabs.org/patch/110300/ http://patchwork.ozlabs.org/patch/109790/
If any doubts/comments/ideas, please let me know :-)
hmm, i wish we had a memalign_alloca(). and all this copy & pasting of get_dcache_line_size() makes me unhappy as we're encoding too-low-of-a-level logic into funcs.
what about adding a new func like: #define dma_buffer_alloca(size)
I generally avoid large allocations on the stack, they can confuse virtual stack management and blow out small embedded stacks. But neither of these are really a problem for most U-Boot targets. Are you thinking something like:
#define dma_buffer_alloca(size) alloca(size + get_dcache_line_size() - 1) & ~(get_dcache_line_size() - 1);
Subtracting one from the total allocated size is technically correct, but could fail poorly if the get_dcache_line_size function returned 0 for some reason. Perhaps because it's called on a target with no cache so the implementer figured 0 was as good as any value to return.
I have a nagging suspicion that I'm forgetting something though. I know I looked at using alloca first when I was starting to deal with cache and DMA issues on the Tegra. And I seem to recall a reason not to use it. But it's not coming back to me now. Perhaps someone else will think of it. :)
Thanks, Anton
and it would take care of allocating a big enough aligned buffer on the stack. -mike

On Tuesday, August 23, 2011 14:12:09 Anton Staaf wrote:
On Tue, Aug 23, 2011 at 10:30 AM, Mike Frysinger wrote:
On Tuesday, August 23, 2011 05:19:39 Lukasz Majewski wrote:
On Mon, 22 Aug 2011 11:57:57 -0700 Anton Staaf wrote:
drivers/mmc/mmc.c: ext_csd in mmc_change_freq is allocated on the stac drivers/mmc/mmc.c: scr and switch_status in sd_change_freq are allocated on the stack. drivers/mmc/mmc.c: ext_csd in mmc_startup is allocated on the stack.
This allocations are already fixed:
http://patchwork.ozlabs.org/patch/110300/ http://patchwork.ozlabs.org/patch/109790/
If any doubts/comments/ideas, please let me know :-)
hmm, i wish we had a memalign_alloca(). and all this copy & pasting of get_dcache_line_size() makes me unhappy as we're encoding too-low-of-a-level logic into funcs.
what about adding a new func like: #define dma_buffer_alloca(size)
I generally avoid large allocations on the stack, they can confuse virtual stack management and blow out small embedded stacks. But neither of these are really a problem for most U-Boot targets.
and more importantly, we're already doing these things on the stack ;)
Are you thinking something like:
#define dma_buffer_alloca(size) alloca(size + get_dcache_line_size() -
- & ~(get_dcache_line_size() - 1);
yes, exactly
Subtracting one from the total allocated size is technically correct, but could fail poorly if the get_dcache_line_size function returned 0 for some reason. Perhaps because it's called on a target with no cache so the implementer figured 0 was as good as any value to return.
i'm not sure we should cater to buggy get_dcache_line_size implementations
I have a nagging suspicion that I'm forgetting something though. I know I looked at using alloca first when I was starting to deal with cache and DMA issues on the Tegra. And I seem to recall a reason not to use it. But it's not coming back to me now. Perhaps someone else will think of it. :)
if the stack lives in a place that dma can't access, but that'd already be a problem for these funcs -mike

On Tuesday, August 23, 2011 14:12:09 Anton Staaf wrote:
On Tue, Aug 23, 2011 at 10:30 AM, Mike Frysinger wrote:
what about adding a new func like: #define dma_buffer_alloca(size)
I generally avoid large allocations on the stack, they can confuse virtual stack management and blow out small embedded stacks.
oh, and that doesnt preclude also adding a dma_buffer_malloc(size). it's just that all the cases here are already on the stack, so i was focusing on that. -mike

On Tue, Aug 23, 2011 at 11:36 AM, Mike Frysinger vapier@gentoo.org wrote:
On Tuesday, August 23, 2011 14:12:09 Anton Staaf wrote:
On Tue, Aug 23, 2011 at 10:30 AM, Mike Frysinger wrote:
what about adding a new func like: #define dma_buffer_alloca(size)
I generally avoid large allocations on the stack, they can confuse virtual stack management and blow out small embedded stacks.
oh, and that doesnt preclude also adding a dma_buffer_malloc(size). it's just that all the cases here are already on the stack, so i was focusing on that. -mike
Yup, totally reasonable. Lukasz, would you like to add the above macro to your dcache cache line size patch? Then we can change our other patches to use it instead of memalign.
Thanks, Anton

Dear Anton Staaf,
In message CAF6FioXeFQP2a0VaZOVOTmxPvJxF4=kamS18=2=p6TFNefwUuA@mail.gmail.com you wrote:
what about adding a new func like: #define dma_buffer_alloca(size)
I generally avoid large allocations on the stack, they can confuse virtual stack management and blow out small embedded stacks. But neither of these are really a problem for most U-Boot targets. Are you thinking something like:
#define dma_buffer_alloca(size) alloca(size + get_dcache_line_size() -
- & ~(get_dcache_line_size() - 1);
I don't think I will accept any alloca() based code into mainline.
Best regards,
Wolfgang Denk

On Tue, Aug 23, 2011 at 1:12 PM, Wolfgang Denk wd@denx.de wrote:
Dear Anton Staaf,
In message CAF6FioXeFQP2a0VaZOVOTmxPvJxF4=kamS18=2=p6TFNefwUuA@mail.gmail.com you wrote:
what about adding a new func like: #define dma_buffer_alloca(size)
I generally avoid large allocations on the stack, they can confuse virtual stack management and blow out small embedded stacks. But neither of these are really a problem for most U-Boot targets. Are you thinking something like:
#define dma_buffer_alloca(size) alloca(size + get_dcache_line_size() -
- & ~(get_dcache_line_size() - 1);
I don't think I will accept any alloca() based code into mainline.
Ahh, that must have been my reason for not using it in the first place, a premonition. :)
So then, to guide our efforts, what is a more suitable solution? Would you prefer we stick with the existing path of calling memalign and passing it the cache size by directly calling get_dcache_line_size? Or would you prefer something more like a dma_buffer_malloc function that allocates on the heap a cache line size aligned buffer and returns it?
Hmm, that will require a bit of thought because we need to recover the original pointer returned by malloc to cleanly free the buffer later.
Thanks, Anton
Best regards,
Wolfgang Denk
-- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de "There are three principal ways to lose money: wine, women, and en- gineers. While the first two are more pleasant, the third is by far the more certain." -- Baron Rothschild, ca. 1800

On Tuesday, August 23, 2011 16:27:26 Anton Staaf wrote:
So then, to guide our efforts, what is a more suitable solution? Would you prefer we stick with the existing path of calling memalign and passing it the cache size by directly calling get_dcache_line_size? Or would you prefer something more like a dma_buffer_malloc function that allocates on the heap a cache line size aligned buffer and returns it?
memalign() is simply a malloc() with offset fudging, so dma_buffer_malloc() is the way to go imo. anything that involves end code having to figure out how to align things itself is asking for pain. -mike

On Tue, Aug 23, 2011 at 1:37 PM, Mike Frysinger vapier@gentoo.org wrote:
On Tuesday, August 23, 2011 16:27:26 Anton Staaf wrote:
So then, to guide our efforts, what is a more suitable solution? Would you prefer we stick with the existing path of calling memalign and passing it the cache size by directly calling get_dcache_line_size? Or would you prefer something more like a dma_buffer_malloc function that allocates on the heap a cache line size aligned buffer and returns it?
memalign() is simply a malloc() with offset fudging, so dma_buffer_malloc() is the way to go imo. anything that involves end code having to figure out how to align things itself is asking for pain.
Indeed, I had temporarily forgotten about memalign it seems. :/
Does it make more sense to put such a function into Lukasz's patch, or a separate patch?
Thanks, Anton
-mike

On Tuesday, August 23, 2011 17:06:41 Anton Staaf wrote:
On Tue, Aug 23, 2011 at 1:37 PM, Mike Frysinger vapier@gentoo.org wrote:
On Tuesday, August 23, 2011 16:27:26 Anton Staaf wrote:
So then, to guide our efforts, what is a more suitable solution? Would you prefer we stick with the existing path of calling memalign and passing it the cache size by directly calling get_dcache_line_size? Or would you prefer something more like a dma_buffer_malloc function that allocates on the heap a cache line size aligned buffer and returns it?
memalign() is simply a malloc() with offset fudging, so dma_buffer_malloc() is the way to go imo. anything that involves end code having to figure out how to align things itself is asking for pain.
Indeed, I had temporarily forgotten about memalign it seems. :/
Does it make more sense to put such a function into Lukasz's patch, or a separate patch?
Lukasz' havent been merged yet, so imo it makes more sense to put the sane framework in place and then fix the relevant code on top of that -mike

Dear Mike Frysinger,
In message 201108231637.05845.vapier@gentoo.org you wrote:
On Tuesday, August 23, 2011 16:27:26 Anton Staaf wrote:
So then, to guide our efforts, what is a more suitable solution? Would you prefer we stick with the existing path of calling memalign and passing it the cache size by directly calling get_dcache_line_size? Or would you prefer something more like a dma_buffer_malloc function that allocates on the heap a cache line size aligned buffer and returns it?
memalign() is simply a malloc() with offset fudging, so dma_buffer_malloc() is the way to go imo. anything that involves end code having to figure out how to align things itself is asking for pain.
I would like to avoid using any malloc code here. We have to keep in mind that such code changes will spread, and will be copied into driver code, file systems, etc. which might be used (and even required, for example for NAND or SDCard booting systems) before relocation - but malloc becomes available only after relocation.
Why cannot we define a macro that declares a (sufficiently sized) buffer on the stack and provides and a pointer to a (correctly aligned) address in this buffer?
Best regards,
Wolfgang Denk

On Tuesday, August 23, 2011 17:09:37 Wolfgang Denk wrote:
Mike Frysinger wrote:
On Tuesday, August 23, 2011 16:27:26 Anton Staaf wrote:
So then, to guide our efforts, what is a more suitable solution? Would you prefer we stick with the existing path of calling memalign and passing it the cache size by directly calling get_dcache_line_size? Or would you prefer something more like a dma_buffer_malloc function that allocates on the heap a cache line size aligned buffer and returns it?
memalign() is simply a malloc() with offset fudging, so dma_buffer_malloc() is the way to go imo. anything that involves end code having to figure out how to align things itself is asking for pain.
I would like to avoid using any malloc code here. We have to keep in mind that such code changes will spread, and will be copied into driver code, file systems, etc. which might be used (and even required, for example for NAND or SDCard booting systems) before relocation - but malloc becomes available only after relocation.
Why cannot we define a macro that declares a (sufficiently sized) buffer on the stack and provides and a pointer to a (correctly aligned) address in this buffer?
isnt that what i already posted and you NAK-ed ? :)
DMA_DECLARE_BUFFER(...) -mike

On Tue, Aug 23, 2011 at 2:32 PM, Mike Frysinger vapier@gentoo.org wrote:
On Tuesday, August 23, 2011 17:09:37 Wolfgang Denk wrote:
Mike Frysinger wrote:
On Tuesday, August 23, 2011 16:27:26 Anton Staaf wrote:
So then, to guide our efforts, what is a more suitable solution? Would you prefer we stick with the existing path of calling memalign and passing it the cache size by directly calling get_dcache_line_size? Or would you prefer something more like a dma_buffer_malloc function that allocates on the heap a cache line size aligned buffer and returns it?
memalign() is simply a malloc() with offset fudging, so dma_buffer_malloc() is the way to go imo. anything that involves end code having to figure out how to align things itself is asking for pain.
I would like to avoid using any malloc code here. We have to keep in mind that such code changes will spread, and will be copied into driver code, file systems, etc. which might be used (and even required, for example for NAND or SDCard booting systems) before relocation - but malloc becomes available only after relocation.
Why cannot we define a macro that declares a (sufficiently sized) buffer on the stack and provides and a pointer to a (correctly aligned) address in this buffer?
isnt that what i already posted and you NAK-ed ? :)
DMA_DECLARE_BUFFER(...)
I wasn't going to say it. :) How about something like this, which is very similar to what you had Mike, but doesn't define the array in the macro. It's a bit clearer what is going on, but also requires a bit more work at each use.
#define DCACHE_RESIZE(size) ((size) + get_dcache_line_size() - 1) #define DCACHE_ALIGN(buffer) ((buffer) + get_dcache_line_size() - 1) & ~(get_dcache_line_size() - 1)
char buffer[DCACHE_RESIZE(100)]; char * aligned = DCACHE_ALIGN(buffer);
It would be awesome if the idea below worked, but it can't because the array is popped when the ({...}) scope is exited I believe.
#define allocate_dma_buffer_on_stack(size) \ ({ \ char _dma_buffer[(size) + get_dcache_line_size() - 1]; \ \ _dma_buffer & ~(get_dcache_line_size() - 1); \ })
And you could use it like:
char * buffer = allocate_dma_buffer_on_stack(100);
If anyone can think of a way to make this work that would be excellent...
Thanks, Anton
-mike

On Tuesday, August 23, 2011 17:48:50 Anton Staaf wrote:
I wasn't going to say it. :) How about something like this, which is very similar to what you had Mike, but doesn't define the array in the macro. It's a bit clearer what is going on, but also requires a bit more work at each use.
#define DCACHE_RESIZE(size) ((size) + get_dcache_line_size() - 1) #define DCACHE_ALIGN(buffer) ((buffer) + get_dcache_line_size() - 1) & ~(get_dcache_line_size() - 1)
char buffer[DCACHE_RESIZE(100)];
as long as people always use a byte-sized type (i.e. char), this should work. obviously using "u32 buffer[...]" will be bad.
It would be awesome if the idea below worked, but it can't because the array is popped when the ({...}) scope is exited I believe.
yes, i believe you are correct. -mike

Dear Mike Frysinger,
In message 201108231732.39791.vapier@gentoo.org you wrote:
Why cannot we define a macro that declares a (sufficiently sized) buffer on the stack and provides and a pointer to a (correctly aligned) address in this buffer?
isnt that what i already posted and you NAK-ed ? :)
DMA_DECLARE_BUFFER(...)
I just NAKed this specific implementation.
Best regards,
Wolfgang Denk

On Tuesday, August 23, 2011 18:42:46 Wolfgang Denk wrote:
Mike Frysinger wrote:
Why cannot we define a macro that declares a (sufficiently sized) buffer on the stack and provides and a pointer to a (correctly aligned) address in this buffer?
isnt that what i already posted and you NAK-ed ? :)
DMA_DECLARE_BUFFER(...)
I just NAKed this specific implementation.
well, it's hard to come up with a "good" one without knowing the parameters to work within ;). i'm not stuck on any particular implementation, i just want the helper to be simple to use and hard to screw up.
the trouble with doing something like: char foo[func_to_do_round(size)]; is that "size" is not in # of bytes but is number of elements. so the point of my (i dont deny) complicated cpp logic was so that the internal implementation took on more of the work leaving the user (which we all know can easily mess things up) with a very simple macro: DMA_DECLARE_BUFFER(<buffer type>, <variable name>, <num elements>); -mike

Hi,
On Tue, 23 Aug 2011 23:00:59 -0400 Mike Frysinger vapier@gentoo.org wrote:
On Tuesday, August 23, 2011 18:42:46 Wolfgang Denk wrote:
Mike Frysinger wrote:
Why cannot we define a macro that declares a (sufficiently sized) buffer on the stack and provides and a pointer to a (correctly aligned) address in this buffer?
isnt that what i already posted and you NAK-ed ? :)
DMA_DECLARE_BUFFER(...)
I just NAKed this specific implementation.
well, it's hard to come up with a "good" one without knowing the parameters to work within ;). i'm not stuck on any particular implementation, i just want the helper to be simple to use and hard to screw up.
the trouble with doing something like: char foo[func_to_do_round(size)]; is that "size" is not in # of bytes but is number of elements. so the point of my (i dont deny) complicated cpp logic was so that the internal implementation took on more of the work leaving the user (which we all know can easily mess things up) with a very simple macro: DMA_DECLARE_BUFFER(<buffer type>, <variable name>, <num elements>); -mike
After reading dcache related threads I'd like to sum them up:
1. alloca() -> not acceptable to u-boot mainline by Wolfgang. I agree that alloca is NOT the way to go.
2. malloc/memalign -> avoidable to use
3. Mike's DMA_DECLARE_BUFFER(<buffer type>, <variable name>, <size in bytes>) solution looks OK for me, at least for the stack allocated data (e.g. ext_csd). However this proposed implementation has been NAK'ed by Wolfgang.
I'm curious how this macro should look like? Is it only matter of code reordering or other approach shall be found?
4. get_dcache_line_size() can be simply defined as #define get_dcache_line_size() CONFIG_SYS_CACHE_LINE_SIZE if we _really_ want to save a few bytes. However, I think, that proposed get_dcache_line_size() implementation ( http://patchwork.ozlabs.org/patch/111048/ ) is more programmer friendly (one don't need to exactly know the dcache line size on a particular SoC).

Dear Lukasz Majewski,
In message 20110824120744.097ba2c5@lmajewski.digital.local you wrote:
After reading dcache related threads I'd like to sum them up:
- alloca() -> not acceptable to u-boot mainline by Wolfgang. I agree
that alloca is NOT the way to go.
malloc/memalign -> avoidable to use
Mike's DMA_DECLARE_BUFFER(<buffer type>, <variable name>,
<size in bytes>) solution looks OK for me, at least for the stack allocated data (e.g. ext_csd). However this proposed implementation has been NAK'ed by Wolfgang.
I'm curious how this macro should look like? Is it only matter of code reordering or other approach shall be found?
I think I'd like to see a macro that can be used like this:
void *dma_buffer_pointer = DMA_BUFFER(size_in_bytes);
- get_dcache_line_size() can be simply defined as
#define get_dcache_line_size() CONFIG_SYS_CACHE_LINE_SIZE if we _really_ want to save a few bytes.
Actually I fail to understand why we would ever need get_dcache_line_size() in a boot loader.
Best regards,
Wolfgang Denk

Hi Wolfgang,
I think I'd like to see a macro that can be used like this:
void *dma_buffer_pointer = DMA_BUFFER(size_in_bytes);
Frankly speaking I don't know any easy way to define buffer this way in a macro (at least without digging deep enough to C preprocessor programming tricks). Maybe Mike or Anton knows.
The void *dma_buffer_pointer = DMA_BUFFER(size_in_bytes); approach needs to do following things in macro:
#define DMA_BUFFER(100) \ char buf[100 + get_dcache_line_size]; \ unsigned long tmp = (unsigned long) buf; \ void* buf_out = (void*) ((tmp + (get_dcache_line_size() - 1)) & ~(get_dcache_line_size() - 1))
The problem here is to assign the buf_out pointer to the void *dma_buffer_pointer. How can we "return" the void *buf_out?
For comparison please look to this solution:
#define ALIGN_ADDR(addr) ((void *)(((unsigned long) addr + get_dcache_line_size() - 1)\ & ~(get_dcache_line_size() - 1)))
#define DMA_DECLARE_BUFFER(type, name, size) \ char *__##name[size + get_dcache_line_size()]; \ type *name = ALIGN_ADDR(__##name);
int mmc_startup(struct mmc *mmc) { /* Some declarations */ /* char ext_csd[512]; */ DMA_DECLARE_BUFFER(char, ext_csd, 512);
printf("%s: ptr:%p\n", __func__, ext_csd);
/* rest of the code */ }
Here the DMA_DECLARE_BUFFER really defines the buffer as an automatic variable with this function scope. This is tested and works :-)
- get_dcache_line_size() can be simply defined as
#define get_dcache_line_size() CONFIG_SYS_CACHE_LINE_SIZE if we _really_ want to save a few bytes.
Actually I fail to understand why we would ever need get_dcache_line_size() in a boot loader.
If I can ask for clarification here.
Shall not u-boot read on fly the cache line size (as it is now done) or you don't like the get_cache_line_size defined as function?
Going further shall the get_cache_line_size be removed completely and replaced with CONFIG_SYS_CACHE_LINE_SIZE?
Best regards,
Wolfgang Denk

On Wednesday, August 24, 2011 09:25:53 Wolfgang Denk wrote:
Lukasz Majewski wrote:
- Mike's DMA_DECLARE_BUFFER(<buffer type>, <variable name>,
<size in bytes>) solution looks OK for me, at least for the stack allocated data (e.g. ext_csd). However this proposed implementation has been NAK'ed by Wolfgang.
I'm curious how this macro should look like? Is it only matter of code reordering or other approach shall be found?
I think I'd like to see a macro that can be used like this:
void *dma_buffer_pointer = DMA_BUFFER(size_in_bytes);
as Anton highlighted, i'm not sure this is possible. DMA_BUFFER() would have to be defined with ({...}), and any storage declared in there is in new scope, so as soon as you exit that scope, any storage declared is invalid for use outside of it. the only thing that is good for is producing a single value (i.e. an "int" or "char" or a pointer etc..., but not storage).
if a person who knows more about gcc internals in this regard could comment, that'd be great :). -mike

On Wed, Aug 24, 2011 at 6:25 AM, Wolfgang Denk wd@denx.de wrote:
Dear Lukasz Majewski,
In message 20110824120744.097ba2c5@lmajewski.digital.local you wrote:
After reading dcache related threads I'd like to sum them up:
- alloca() -> not acceptable to u-boot mainline by Wolfgang. I agree
that alloca is NOT the way to go.
malloc/memalign -> avoidable to use
Mike's DMA_DECLARE_BUFFER(<buffer type>, <variable name>,
<size in bytes>) solution looks OK for me, at least for the stack allocated data (e.g. ext_csd). However this proposed implementation has been NAK'ed by Wolfgang.
I'm curious how this macro should look like? Is it only matter of code reordering or other approach shall be found?
I think I'd like to see a macro that can be used like this:
void *dma_buffer_pointer = DMA_BUFFER(size_in_bytes);
- get_dcache_line_size() can be simply defined as
#define get_dcache_line_size() CONFIG_SYS_CACHE_LINE_SIZE if we _really_ want to save a few bytes.
Actually I fail to understand why we would ever need get_dcache_line_size() in a boot loader.
It is required so that we can correctly allocate buffers that will be used by DMA engines to read or write data. The reason that these buffers need to be cache line size aligned is because unaligned cache invalidates are not possible to do in a safe way. The problem is that invalidating a partial cache line requires invalidating the entire line. And the other part of the line can contain nearby variables (especially if the buffer is stack allocated), so we have to first flush the cache line to be safe. However, that flush will clobber the values that the DMA engine wrote to main memory that are not reflected in the cache.
Thanks, Anton
Best regards,
Wolfgang Denk
-- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de "You know, after a woman's raised a family and so on, she wants to start living her own life." "Whose life she's _been_ living, then?" - Terry Pratchett, _Witches Abroad_

On Wednesday, August 24, 2011 13:27:05 Anton Staaf wrote:
On Wed, Aug 24, 2011 at 6:25 AM, Wolfgang Denk wrote:
Lukasz Majewski wrote:
- get_dcache_line_size() can be simply defined as
#define get_dcache_line_size() CONFIG_SYS_CACHE_LINE_SIZE if we _really_ want to save a few bytes.
Actually I fail to understand why we would ever need get_dcache_line_size() in a boot loader.
It is required so that we can correctly allocate buffers that will be used by DMA engines to read or write data. The reason that these buffers need to be cache line size aligned is because unaligned cache invalidates are not possible to do in a safe way. The problem is that invalidating a partial cache line requires invalidating the entire line. And the other part of the line can contain nearby variables (especially if the buffer is stack allocated), so we have to first flush the cache line to be safe. However, that flush will clobber the values that the DMA engine wrote to main memory that are not reflected in the cache.
right, and that's why i want to "hide" it be hind a "dma buffer allocate" API so that people can just say "i want a dma safe buffer" rather than having to delve into these details. they'll inevitable get confused and screw it up. ;) -mike

Dear Anton Staaf,
In message CAF6FioWBsjFa+QdFNHEQGO50eLoqZMc--Yt3-iYX6DWr=hcOrg@mail.gmail.com you wrote:
- get_dcache_line_size() can be simply defined as
#define get_dcache_line_size() CONFIG_SYS_CACHE_LINE_SIZE if we _really_ want to save a few bytes.
Actually I fail to understand why we would ever need get_dcache_line_size() in a boot loader.
It is required so that we can correctly allocate buffers that will be used by DMA engines to read or write data. The reason that these
No, it is definitely NOT needed for this purpose - we have been using CONFIG_SYS_CACHE_LINE_SIZE for more than a decade without problems, so I don't really understand why we now would need a new function for this?
Best regards,
Wolfgang Denk

On Wed, Aug 24, 2011 at 11:12 AM, Wolfgang Denk wd@denx.de wrote:
Dear Anton Staaf,
In message CAF6FioWBsjFa+QdFNHEQGO50eLoqZMc--Yt3-iYX6DWr=hcOrg@mail.gmail.com you wrote:
- get_dcache_line_size() can be simply defined as
#define get_dcache_line_size() CONFIG_SYS_CACHE_LINE_SIZE if we _really_ want to save a few bytes.
Actually I fail to understand why we would ever need get_dcache_line_size() in a boot loader.
It is required so that we can correctly allocate buffers that will be used by DMA engines to read or write data. The reason that these
No, it is definitely NOT needed for this purpose - we have been using CONFIG_SYS_CACHE_LINE_SIZE for more than a decade without problems, so I don't really understand why we now would need a new function for this?
Ahh, I misunderstood your question. I thought you were asking about the need to know the cache line size. Not it's specific implementation as a function call. In which case, I agree, and am happy to use CONFIG_SYS_CACHE_LINE_SIZE, though I don't see any definitions of CONFIG_SYS_CACHE_LINE_SIZE in the entire U-Boot tree. What have I missed?
Thanks, Anton
Best regards,
Wolfgang Denk
-- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de Brain: an apparatus with which we think we think. - Ambrose Bierce

Dear Anton Staaf,
In message CAF6FioWM6MezMMDr6+i8tGOLG-En4TihtOCauPbH0o7vqHaf_A@mail.gmail.com you wrote:
No, it is definitely NOT needed for this purpose - we have been using CONFIG_SYS_CACHE_LINE_SIZE for more than a decade without problems, so I don't really understand why we now would need a new function for this?
Ahh, I misunderstood your question. I thought you were asking about the need to know the cache line size. Not it's specific implementation as a function call. In which case, I agree, and am happy to use CONFIG_SYS_CACHE_LINE_SIZE, though I don't see any definitions of CONFIG_SYS_CACHE_LINE_SIZE in the entire U-Boot tree. What have I missed?
I copy & pasted the name, incorrectly. It's CONFIG_SYS_CACHELINE_SIZE
You will find this being used:
-> grep -R CONFIG_SYS_CACHELINE_SIZE * | wc -l 276
Best regards,
Wolfgang Denk

On Wed, Aug 24, 2011 at 12:04 PM, Wolfgang Denk wd@denx.de wrote:
Dear Anton Staaf,
In message CAF6FioWM6MezMMDr6+i8tGOLG-En4TihtOCauPbH0o7vqHaf_A@mail.gmail.com you wrote:
No, it is definitely NOT needed for this purpose - we have been using CONFIG_SYS_CACHE_LINE_SIZE for more than a decade without problems, so I don't really understand why we now would need a new function for this?
Ahh, I misunderstood your question. I thought you were asking about the need to know the cache line size. Not it's specific implementation as a function call. In which case, I agree, and am happy to use CONFIG_SYS_CACHE_LINE_SIZE, though I don't see any definitions of CONFIG_SYS_CACHE_LINE_SIZE in the entire U-Boot tree. What have I missed?
I copy & pasted the name, incorrectly. It's CONFIG_SYS_CACHELINE_SIZE
You will find this being used:
-> grep -R CONFIG_SYS_CACHELINE_SIZE * | wc -l 276
:) Hah, thanks. I tried a number of permutations, but never the right one.
-Anton
Best regards,
Wolfgang Denk
-- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de There are certain things men must do to remain men. -- Kirk, "The Ultimate Computer", stardate 4929.4

On Wed, 24 Aug 2011 11:25:42 -0700 Anton Staaf robotboy@google.com wrote:
On Wed, Aug 24, 2011 at 11:12 AM, Wolfgang Denk wd@denx.de wrote:
Dear Anton Staaf,
In message CAF6FioWBsjFa+QdFNHEQGO50eLoqZMc--Yt3-iYX6DWr=hcOrg@mail.gmail.com you wrote:
- get_dcache_line_size() can be simply defined as
#define get_dcache_line_size() CONFIG_SYS_CACHE_LINE_SIZE if we _really_ want to save a few bytes.
Actually I fail to understand why we would ever need get_dcache_line_size() in a boot loader.
It is required so that we can correctly allocate buffers that will be used by DMA engines to read or write data. The reason that these
No, it is definitely NOT needed for this purpose - we have been using CONFIG_SYS_CACHE_LINE_SIZE for more than a decade without problems, so I don't really understand why we now would need a new function for this?
Ok, so one problem solved :-).
Ahh, I misunderstood your question. I thought you were asking about the need to know the cache line size. Not it's specific implementation as a function call. In which case, I agree, and am happy to use CONFIG_SYS_CACHE_LINE_SIZE, though I don't see any definitions of CONFIG_SYS_CACHE_LINE_SIZE in the entire U-Boot tree. What have I missed?
There are some similar definitions: ./include/configs/atstk1006.h:#define CONFIG_SYS_DCACHE_LINESZ 32 ./include/configs/atngw100.h:#define CONFIG_SYS_DCACHE_LINESZ 32 ./include/configs/favr-32-ezkit.h:#define CONFIG_SYS_DCACHE_LINESZ 32
Thanks, Anton
Best regards,
Wolfgang Denk
-- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de Brain: an apparatus with which we think we think. - Ambrose Bierce
U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot
Regards, Lukasz Majewski

On Wed, Aug 24, 2011 at 12:18 PM, Lukasz Majewski majess1982@gmail.com wrote:
On Wed, 24 Aug 2011 11:25:42 -0700 Anton Staaf robotboy@google.com wrote:
On Wed, Aug 24, 2011 at 11:12 AM, Wolfgang Denk wd@denx.de wrote:
Dear Anton Staaf,
In message CAF6FioWBsjFa+QdFNHEQGO50eLoqZMc--Yt3-iYX6DWr=hcOrg@mail.gmail.com you wrote:
- get_dcache_line_size() can be simply defined as
#define get_dcache_line_size() CONFIG_SYS_CACHE_LINE_SIZE if we _really_ want to save a few bytes.
Actually I fail to understand why we would ever need get_dcache_line_size() in a boot loader.
It is required so that we can correctly allocate buffers that will be used by DMA engines to read or write data. The reason that these
No, it is definitely NOT needed for this purpose - we have been using CONFIG_SYS_CACHE_LINE_SIZE for more than a decade without problems, so I don't really understand why we now would need a new function for this?
Ok, so one problem solved :-).
Ahh, I misunderstood your question. I thought you were asking about the need to know the cache line size. Not it's specific implementation as a function call. In which case, I agree, and am happy to use CONFIG_SYS_CACHE_LINE_SIZE, though I don't see any definitions of CONFIG_SYS_CACHE_LINE_SIZE in the entire U-Boot tree. What have I missed?
There are some similar definitions: ./include/configs/atstk1006.h:#define CONFIG_SYS_DCACHE_LINESZ 32 ./include/configs/atngw100.h:#define CONFIG_SYS_DCACHE_LINESZ 32 ./include/configs/favr-32-ezkit.h:#define CONFIG_SYS_DCACHE_LINESZ 32
I would assume that these should be changed to the one mentioned by Wolfgang. But this still leaves us with a question of how to make a simple, easy to use macro for allocating cache line size aligned stack buffers. I'll work on that some more and see if I can come up with something.
Thanks, Anton
Thanks, Anton
Best regards,
Wolfgang Denk
-- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de Brain: an apparatus with which we think we think. - Ambrose Bierce
U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot
Regards, Lukasz Majewski

On Wed, Aug 24, 2011 at 1:13 PM, Anton Staaf robotboy@google.com wrote:
On Wed, Aug 24, 2011 at 12:18 PM, Lukasz Majewski majess1982@gmail.com wrote:
On Wed, 24 Aug 2011 11:25:42 -0700 Anton Staaf robotboy@google.com wrote:
On Wed, Aug 24, 2011 at 11:12 AM, Wolfgang Denk wd@denx.de wrote:
Dear Anton Staaf,
In message CAF6FioWBsjFa+QdFNHEQGO50eLoqZMc--Yt3-iYX6DWr=hcOrg@mail.gmail.com you wrote:
> 4. get_dcache_line_size() can be simply defined as > #define get_dcache_line_size() CONFIG_SYS_CACHE_LINE_SIZE if we > _really_ want to save a few bytes.
Actually I fail to understand why we would ever need get_dcache_line_size() in a boot loader.
It is required so that we can correctly allocate buffers that will be used by DMA engines to read or write data. The reason that these
No, it is definitely NOT needed for this purpose - we have been using CONFIG_SYS_CACHE_LINE_SIZE for more than a decade without problems, so I don't really understand why we now would need a new function for this?
Ok, so one problem solved :-).
Ahh, I misunderstood your question. I thought you were asking about the need to know the cache line size. Not it's specific implementation as a function call. In which case, I agree, and am happy to use CONFIG_SYS_CACHE_LINE_SIZE, though I don't see any definitions of CONFIG_SYS_CACHE_LINE_SIZE in the entire U-Boot tree. What have I missed?
There are some similar definitions: ./include/configs/atstk1006.h:#define CONFIG_SYS_DCACHE_LINESZ 32 ./include/configs/atngw100.h:#define CONFIG_SYS_DCACHE_LINESZ 32 ./include/configs/favr-32-ezkit.h:#define CONFIG_SYS_DCACHE_LINESZ 32
I would assume that these should be changed to the one mentioned by Wolfgang. But this still leaves us with a question of how to make a simple, easy to use macro for allocating cache line size aligned stack buffers. I'll work on that some more and see if I can come up with something.
OK, better late than never, I've run down all of the solutions I can think of. Below are three solutions and one non-solution, how they would be used, and what I see as pro's and con's for each.
1) Mikes's macro
#define DMA_ALIGN_SIZE(size) \ (((size) + CONFIG_SYS_CACHELINE_SIZE - 1)
#define DMA_DECLARE_BUFFER(type, name, size) \ void __##name[DMA_ALIGN_SIZE(size * sizeof(type))]; \ type * name = __##name & ~(CONFIG_SYS_CACHELINE_SIZE - 1));
DMA_DECLARE_BUFFER(int, buffer, 100);
Pros: Doesn't use alloca, doesn't use malloc and doesn't use new GCC alignment attribute abilities. This makes it the most portable, and the most supported solution.
Cons: It's a pretty ugly macro that obfuscates the creation of multiple variables. Thus I would say it falls into the macro magic category.
It looks like this is already working it's way into a patch. So the rest of my comments below may be moot.
2) Use alloca wrapped in a macro:
#define alloca_cacheline_alligned(size) alloca(size + CONFIG_SYS_CACHELINE_SIZE - 1) & ~(CONFIG_SYS_CACHELINE_SIZE - 1)
int * buffer = alloca_cacheline_alligned(100 * sizeof(int));
Pros: It is fairly obvious what the above code is intended to do, even to someone that doesn't know the underlying implementation of the macro.
Cons: Wolfgang does not want alloca in the U-Boot codebase.
I would like to know, mainly for my education, why you do not want alloca, but are OK with dynamic array sizing (as in the first solution above). My understanding is that they pretty much equate to the same thing. What is it about alloca that is objectionable?
3) Use GCC specific alignment attribute:
#define CACHLINE_ALIGNED __attribute__ ((aligned (CONFIG_SYS_CACHELINE_SIZE)))
int buffer[100] CACHELINE_ALIGNED;
Pros: The declaration of the buffer is even simpler and more obvious, no use of alloca at all.
Cons: This doesn't work in any version of GCC before October 2010. Meaning that it probably doesn't work in whatever compiler you're using.
It's really too bad that this isn't a usable solution. I suppose that we could switch to it at some point when we expect U-Boot to only be compiled by versions of GCC that support this. By the way, the failure mode here is pretty bad. If you compile the above code with an older GCC it will silently fail to align the variable. :(
4) Use memalign and free:
int * buffer = memalign(CONFIG_SYS_CACHELINE_SIZE, 100 * sizeof(int));
free(buffer);
Pros: portable.
Cons: Heap instead of Stack allocation. The result is that it's slower, and requires more manual management of buffers. Also, Wolfgang is opposed to this solution.
I think I agree with this not being a great solution. I do wonder if it would be useful to consider a dedicated buffer management API that could be used to allocate and free cache line aligned power of two buffers. Perhaps something like a buddy heap for simplicity.
Thanks, Anton
Thanks, Anton
Thanks, Anton
Best regards,
Wolfgang Denk
-- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de Brain: an apparatus with which we think we think. - Ambrose Bierce
U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot
Regards, Lukasz Majewski

Dear Anton Staaf,
In message CAF6FioWhhqbVQWWPrusb69S2mpfBtoiazxc8x56bkogJmzXZ3g@mail.gmail.com you wrote:
I would like to know, mainly for my education, why you do not want alloca, but are OK with dynamic array sizing (as in the first solution above). My understanding is that they pretty much equate to the same thing. What is it about alloca that is objectionable?
First, I've got bitten by alloca() before, when GCC misbehaved. Admitted, that was in some _very_ old version, but such experience sticks.
Second, the behaviour of alloca() depends on a number of non-obvious settings and compiler flags, and results are not always obvious. rom the man page:
Notes on the GNU Version Normally, gcc(1) translates calls to alloca() with inlined code. This is not done when either the -ansi, -std=c89, -std=c99, or the -fno-builtin option is given (and the header <alloca.h> is not included). But beware! By default the glibc version of <stdlib.h> includes <alloca.h> and that contains the line:
#define alloca(size) __builtin_alloca (size)
with messy consequences if one has a private version of this function.
This is nothing I want to have in any production software. OK, you may argument that U-Boot has a strictly controlled environment, but anyway.
Third, the man page says:
NOTES
The alloca() function is machine- and compiler-dependent. For certain applications, its use can improve efficiency compared to the use of malloc(3) plus free(3). In certain cases, it can also simplify memory deallocation in applications that use longjmp(3) or siglongjmp(3). Otherwise, its use is discouraged.
In my opinion, U-Boot falls into the "otherwise" category...
I think I agree with this not being a great solution. I do wonder if it would be useful to consider a dedicated buffer management API that could be used to allocate and free cache line aligned power of two buffers. Perhaps something like a buddy heap for simplicity.
The longer I read this thread, the less frightening Mikes's macro becomes (compared tho the alternatives).
Best regards,
Wolfgang Denk

On Mon, Aug 29, 2011 at 1:35 PM, Wolfgang Denk wd@denx.de wrote:
Dear Anton Staaf,
In message CAF6FioWhhqbVQWWPrusb69S2mpfBtoiazxc8x56bkogJmzXZ3g@mail.gmail.com you wrote:
I would like to know, mainly for my education, why you do not want alloca, but are OK with dynamic array sizing (as in the first solution above). My understanding is that they pretty much equate to the same thing. What is it about alloca that is objectionable?
First, I've got bitten by alloca() before, when GCC misbehaved. Admitted, that was in some _very_ old version, but such experience sticks.
Very fair, I have programming constructs I avoid for the same reason. *cough* exceptions *cough*
Second, the behaviour of alloca() depends on a number of non-obvious settings and compiler flags, and results are not always obvious. rom the man page:
Notes on the GNU Version Normally, gcc(1) translates calls to alloca() with inlined code. This is not done when either the -ansi, -std=c89, -std=c99, or the -fno-builtin option is given (and the header <alloca.h> is not included). But beware! By default the glibc version of <stdlib.h> includes <alloca.h> and that contains the line:
#define alloca(size) __builtin_alloca (size)
with messy consequences if one has a private version of this function.
This is nothing I want to have in any production software. OK, you may argument that U-Boot has a strictly controlled environment, but anyway.
Yes, I don't expect that we will allow a custom implementation of alloca into U-Boot, but it's a good point that the compiler treats it in strange ways. One alternative here would be to always call __builtin_alloca... This would probably have other issues though.
Third, the man page says:
NOTES
The alloca() function is machine- and compiler-dependent. For certain applications, its use can improve efficiency compared to the use of malloc(3) plus free(3). In certain cases, it can also simplify memory deallocation in applications that use longjmp(3) or siglongjmp(3). Otherwise, its use is discouraged.
In my opinion, U-Boot falls into the "otherwise" category...
Yup, and another man page in the "Bugs" section says:
The alloca() function is machine and compiler dependent. On many systems its implementation is buggy. Its use is discouraged.
Which seems even worse. So, OK, I think we've run this to the ground. And we can always find and replace all of the uses of Mikes macro easily if we think of something better later.
Thank you, Anton
I think I agree with this not being a great solution. I do wonder if it would be useful to consider a dedicated buffer management API that could be used to allocate and free cache line aligned power of two buffers. Perhaps something like a buddy heap for simplicity.
The longer I read this thread, the less frightening Mikes's macro becomes (compared tho the alternatives).
Best regards,
Wolfgang Denk
-- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de "Though a program be but three lines long, someday it will have to be maintained."
- The Tao of Programming

On 08/29/2011 03:12 PM, Anton Staaf wrote:
- Mikes's macro
#define DMA_ALIGN_SIZE(size) \ (((size) + CONFIG_SYS_CACHELINE_SIZE - 1)
#define DMA_DECLARE_BUFFER(type, name, size) \ void __##name[DMA_ALIGN_SIZE(size * sizeof(type))]; \ type * name = __##name & ~(CONFIG_SYS_CACHELINE_SIZE - 1));
DMA_DECLARE_BUFFER(int, buffer, 100);
This doesn't compile, and it tries to round the buffer down below its starting point.
After fixing the more obvious issues, I get "error: initializer element is not constant".
There might be no way to express and-by-constant as a relocation.
You could set the pointer at runtime, though, and remove some of the macrification:
#define DMA_ALIGN_SIZE(size) \ ((size) + CONFIG_SYS_CACHELINE_SIZE - 1) #define DMA_ALIGN_ADDR(addr) \ (DMA_ALIGN_SIZE(addr) & (CONFIG_SYS_CACHELINE_SIZE - 1))
int buffer_unaligned[DMA_ALIGN_SIZE(100)]; int *buffer;
some_init_func() { buffer = (int *)(DMA_ALIGN_ADDR((uintptr_t)buffer_unaligned)); }
- Use GCC specific alignment attribute:
#define CACHLINE_ALIGNED __attribute__ ((aligned (CONFIG_SYS_CACHELINE_SIZE)))
int buffer[100] CACHELINE_ALIGNED;
Pros: The declaration of the buffer is even simpler and more obvious, no use of alloca at all.
Cons: This doesn't work in any version of GCC before October 2010. Meaning that it probably doesn't work in whatever compiler you're using.
It's really too bad that this isn't a usable solution. I suppose that we could switch to it at some point when we expect U-Boot to only be compiled by versions of GCC that support this. By the way, the failure mode here is pretty bad. If you compile the above code with an older GCC it will silently fail to align the variable. :(
If the decision is made to depend on newer compilers, U-Boot could check for it and #error out if the compiler is too old (possibly just in the files that depend on this feature).
-Scott

On Mon, Aug 29, 2011 at 1:47 PM, Scott Wood scottwood@freescale.com wrote:
On 08/29/2011 03:12 PM, Anton Staaf wrote:
- Mikes's macro
#define DMA_ALIGN_SIZE(size) \ (((size) + CONFIG_SYS_CACHELINE_SIZE - 1)
#define DMA_DECLARE_BUFFER(type, name, size) \ void __##name[DMA_ALIGN_SIZE(size * sizeof(type))]; \ type * name = __##name & ~(CONFIG_SYS_CACHELINE_SIZE - 1));
DMA_DECLARE_BUFFER(int, buffer, 100);
This doesn't compile, and it tries to round the buffer down below its starting point.
You are correct. I wrote that one as a modification of mikes initial proposal. I should have caught the incorrect rounding when I did. The patch that Lukasz sent titled "dcache: Dcache line size aligned stack buffer allocation" has a correct implementation.
After fixing the more obvious issues, I get "error: initializer element is not constant".
I think this requires the use of -std=c99 or GCC extensions. More specifics can be found here: http://gcc.gnu.org/onlinedocs/gcc/Variable-Length.html
There might be no way to express and-by-constant as a relocation.
You could set the pointer at runtime, though, and remove some of the macrification:
#define DMA_ALIGN_SIZE(size) \ ((size) + CONFIG_SYS_CACHELINE_SIZE - 1) #define DMA_ALIGN_ADDR(addr) \ (DMA_ALIGN_SIZE(addr) & (CONFIG_SYS_CACHELINE_SIZE - 1))
int buffer_unaligned[DMA_ALIGN_SIZE(100)]; int *buffer;
some_init_func() { buffer = (int *)(DMA_ALIGN_ADDR((uintptr_t)buffer_unaligned)); }
:) This was one of my suggestions earlier on a different thread. It was rejected there, I believe because it makes things less clear.
- Use GCC specific alignment attribute:
#define CACHLINE_ALIGNED __attribute__ ((aligned (CONFIG_SYS_CACHELINE_SIZE)))
int buffer[100] CACHELINE_ALIGNED;
Pros: The declaration of the buffer is even simpler and more obvious, no use of alloca at all.
Cons: This doesn't work in any version of GCC before October 2010. Meaning that it probably doesn't work in whatever compiler you're using.
It's really too bad that this isn't a usable solution. I suppose that we could switch to it at some point when we expect U-Boot to only be compiled by versions of GCC that support this. By the way, the failure mode here is pretty bad. If you compile the above code with an older GCC it will silently fail to align the variable. :(
If the decision is made to depend on newer compilers, U-Boot could check for it and #error out if the compiler is too old (possibly just in the files that depend on this feature).
Yup, I think whatever macro we come up with we can simplify it eventually using the GCC attribute solution and we can make the macro conditional on compiler version.
Thanks, Anton
-Scott

On 08/29/2011 03:58 PM, Anton Staaf wrote:
On Mon, Aug 29, 2011 at 1:47 PM, Scott Wood scottwood@freescale.com wrote:
On 08/29/2011 03:12 PM, Anton Staaf wrote:
- Mikes's macro
#define DMA_ALIGN_SIZE(size) \ (((size) + CONFIG_SYS_CACHELINE_SIZE - 1)
#define DMA_DECLARE_BUFFER(type, name, size) \ void __##name[DMA_ALIGN_SIZE(size * sizeof(type))]; \ type * name = __##name & ~(CONFIG_SYS_CACHELINE_SIZE - 1));
DMA_DECLARE_BUFFER(int, buffer, 100);
This doesn't compile, and it tries to round the buffer down below its starting point.
You are correct. I wrote that one as a modification of mikes initial proposal. I should have caught the incorrect rounding when I did. The patch that Lukasz sent titled "dcache: Dcache line size aligned stack buffer allocation" has a correct implementation.
With the version in that patch I get the slightly different "error: initializer element is not computable at load time". Seems like whether you cast the address to (type *) or (void *) determines which error you get. This is with GCC 4.5.1 (powerpc) and 4.6.0 (x86). Maybe it's arch-dependent, based on available relocation types.
Also, shouldn't the array be of type "char" rather than "char *"?
How do you make the declaration static?
After fixing the more obvious issues, I get "error: initializer element is not constant".
I think this requires the use of -std=c99 or GCC extensions. More specifics can be found here: http://gcc.gnu.org/onlinedocs/gcc/Variable-Length.html
-std=c99 doesn't help.
The problem isn't the array itself, it's the pointer initializer.
You could set the pointer at runtime, though, and remove some of the macrification:
#define DMA_ALIGN_SIZE(size) \ ((size) + CONFIG_SYS_CACHELINE_SIZE - 1) #define DMA_ALIGN_ADDR(addr) \ (DMA_ALIGN_SIZE(addr) & (CONFIG_SYS_CACHELINE_SIZE - 1))
int buffer_unaligned[DMA_ALIGN_SIZE(100)]; int *buffer;
some_init_func() { buffer = (int *)(DMA_ALIGN_ADDR((uintptr_t)buffer_unaligned)); }
:) This was one of my suggestions earlier on a different thread. It was rejected there, I believe because it makes things less clear.
So, the complex macro is bad because it obscures things, and this version is bad because it doesn't? :-)
-Scott

On Mon, Aug 29, 2011 at 2:23 PM, Scott Wood scottwood@freescale.com wrote:
On 08/29/2011 03:58 PM, Anton Staaf wrote:
On Mon, Aug 29, 2011 at 1:47 PM, Scott Wood scottwood@freescale.com wrote:
On 08/29/2011 03:12 PM, Anton Staaf wrote:
- Mikes's macro
#define DMA_ALIGN_SIZE(size) \ (((size) + CONFIG_SYS_CACHELINE_SIZE - 1)
#define DMA_DECLARE_BUFFER(type, name, size) \ void __##name[DMA_ALIGN_SIZE(size * sizeof(type))]; \ type * name = __##name & ~(CONFIG_SYS_CACHELINE_SIZE - 1));
DMA_DECLARE_BUFFER(int, buffer, 100);
This doesn't compile, and it tries to round the buffer down below its starting point.
You are correct. I wrote that one as a modification of mikes initial proposal. I should have caught the incorrect rounding when I did. The patch that Lukasz sent titled "dcache: Dcache line size aligned stack buffer allocation" has a correct implementation.
With the version in that patch I get the slightly different "error: initializer element is not computable at load time". Seems like whether you cast the address to (type *) or (void *) determines which error you get. This is with GCC 4.5.1 (powerpc) and 4.6.0 (x86). Maybe it's arch-dependent, based on available relocation types.
Also, shouldn't the array be of type "char" rather than "char *"?
Yes, you are correct, it should be a char. That may be the problem.
How do you make the declaration static?
you can't with this version of the macro. Are there cases where you need the buffer to be static?
Thanks, Anton
After fixing the more obvious issues, I get "error: initializer element is not constant".
I think this requires the use of -std=c99 or GCC extensions. More specifics can be found here: http://gcc.gnu.org/onlinedocs/gcc/Variable-Length.html
-std=c99 doesn't help.
The problem isn't the array itself, it's the pointer initializer.
You could set the pointer at runtime, though, and remove some of the macrification:
#define DMA_ALIGN_SIZE(size) \ ((size) + CONFIG_SYS_CACHELINE_SIZE - 1) #define DMA_ALIGN_ADDR(addr) \ (DMA_ALIGN_SIZE(addr) & (CONFIG_SYS_CACHELINE_SIZE - 1))
int buffer_unaligned[DMA_ALIGN_SIZE(100)]; int *buffer;
some_init_func() { buffer = (int *)(DMA_ALIGN_ADDR((uintptr_t)buffer_unaligned)); }
:) This was one of my suggestions earlier on a different thread. It was rejected there, I believe because it makes things less clear.
So, the complex macro is bad because it obscures things, and this version is bad because it doesn't? :-)
-Scott

On 08/29/2011 04:54 PM, Anton Staaf wrote:
On Mon, Aug 29, 2011 at 2:23 PM, Scott Wood scottwood@freescale.com wrote:
With the version in that patch I get the slightly different "error: initializer element is not computable at load time". Seems like whether you cast the address to (type *) or (void *) determines which error you get. This is with GCC 4.5.1 (powerpc) and 4.6.0 (x86). Maybe it's arch-dependent, based on available relocation types.
Also, shouldn't the array be of type "char" rather than "char *"?
Yes, you are correct, it should be a char. That may be the problem.
It didn't make a difference.
How do you make the declaration static?
you can't with this version of the macro. Are there cases where you need the buffer to be static?
I think you'd want it to be static more often than not.
-Scott

On Mon, Aug 29, 2011 at 3:03 PM, Scott Wood scottwood@freescale.com wrote:
On 08/29/2011 04:54 PM, Anton Staaf wrote:
On Mon, Aug 29, 2011 at 2:23 PM, Scott Wood scottwood@freescale.com wrote:
With the version in that patch I get the slightly different "error: initializer element is not computable at load time". Seems like whether you cast the address to (type *) or (void *) determines which error you get. This is with GCC 4.5.1 (powerpc) and 4.6.0 (x86). Maybe it's arch-dependent, based on available relocation types.
Also, shouldn't the array be of type "char" rather than "char *"?
Yes, you are correct, it should be a char. That may be the problem.
It didn't make a difference.
Hmm, I haven't played with this code yet, so I can't say for sure. But it would certainly throw a wrench in the works if Mike's solution was also unavailable on some platforms. By chance, you aren't also adding static here are you? I could certainly see that that wouldn't work.
How do you make the declaration static?
you can't with this version of the macro. Are there cases where you need the buffer to be static?
I think you'd want it to be static more often than not.
If the buffer is allocated at file scope, then yes, we would want it to be static. But not at function scope. It would no longer be allocated on the stack in that case, and that was frowned upon earlier.
Thanks, Anton
-Scott

On 08/29/2011 05:49 PM, Anton Staaf wrote:
How do you make the declaration static?
you can't with this version of the macro. Are there cases where you need the buffer to be static?
I think you'd want it to be static more often than not.
If the buffer is allocated at file scope, then yes, we would want it to be static. But not at function scope. It would no longer be allocated on the stack in that case, and that was frowned upon earlier.
Ah, that's the issue. I was trying to declare it at file scope. :-P
-Scott

On Mon, Aug 29, 2011 at 4:01 PM, Scott Wood scottwood@freescale.com wrote:
On 08/29/2011 05:49 PM, Anton Staaf wrote:
How do you make the declaration static?
you can't with this version of the macro. Are there cases where you need the buffer to be static?
I think you'd want it to be static more often than not.
If the buffer is allocated at file scope, then yes, we would want it to be static. But not at function scope. It would no longer be allocated on the stack in that case, and that was frowned upon earlier.
Ah, that's the issue. I was trying to declare it at file scope. :-P
Thank goodness, I was trying to figure out how that couldn't be a valid GCC complaint. :)
Thanks, Anton
-Scott

On Tuesday, August 23, 2011 16:12:03 Wolfgang Denk wrote:
Anton Staaf wrote:
what about adding a new func like: #define dma_buffer_alloca(size)
I generally avoid large allocations on the stack, they can confuse virtual stack management and blow out small embedded stacks. But neither of these are really a problem for most U-Boot targets. Are you thinking something like:
#define dma_buffer_alloca(size) alloca(size + get_dcache_line_size() -
- & ~(get_dcache_line_size() - 1);
I don't think I will accept any alloca() based code into mainline.
hmm, for some reason i thought we were using alloca. but i dont see any consumers in u-boot itself. -mike

Hi Mike,
On Mon, 22 Aug 2011 12:08:42 -0400 Mike Frysinger vapier@gentoo.org wrote:
On Monday, August 22, 2011 03:29:52 Lukasz Majewski wrote:
On Fri, 19 Aug 2011 11:35:50 -0400 Mike Frysinger wrote:
On Friday, August 19, 2011 11:28:18 Lukasz Majewski wrote:
On Fri, 19 Aug 2011 09:57:10 -0400 Mike Frysinger wrote:
On Friday, August 19, 2011 05:25:13 Lukasz Majewski wrote: also, what is the code size increase with your patch ?
Code size overhead (s5p_goni target): Without proposed changes: 167928 B (u-boot.bin) With changes: 168208 B (u-boot.bin)
Delta: 280 B
np if it gives significant (more than system noise) speedups. any details on that ?
No tests performed yet. The goal of those patches is to preserve the MMC subsystem functionality when dcache is enabled (the ext_csd[512] corruption is observed with d-cache enabled).
so you're papering over a bug in some controller's cache handling ? shouldnt you fix the controller in question by having it flush its caches ? aligning random buffers to make cache issues "go away" isnt the right way for anything. -mike
Please look into the following patch: http://patchwork.ozlabs.org/patch/110576/
It seems that only flushing/invalidating buffers is not enough.
participants (7)
-
Anton Staaf
-
Lukasz Majewski
-
Lukasz Majewski
-
Marek Vasut
-
Mike Frysinger
-
Scott Wood
-
Wolfgang Denk