
On Thursday 26 April 2012 06:34:43 Thierry Reding wrote:
For reference, see sd_change_freq() in drivers/mmc/mmc.c.
yes, that shows what we're talking about. int sd_change_freq(struct mmc *mmc) { ... stack vars ... int err; ALLOC_CACHE_ALIGN_BUFFER(uint, scr, 2); struct mmc_data data; int timeout; ... stack vars ... ... data.dest = (char *)scr; data.blocksize = 8; data.blocks = 1; data.flags = MMC_DATA_READ; err = mmc_send_cmd(mmc, &cmd, &data); ...
static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, struct mmc_data *data) { ... if (data->flags & MMC_DATA_READ) { if ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning: unaligned read from %p " "may fail\n", data->dest); invalidate_dcache_range((ulong)data->dest, (ulong)data->dest + data->blocks * data->blocksize); } ...
so what invalidate_dcache_range() will invalidate is from &scr and 8 bytes after that. the higher layers make sure &scr is aligned properly, but not that it spans a full cache line. so if the stack layout is: [random stuff] 0x100 [scr[0]] 0x104 [scr[1]] 0x108 [err] 0x10a [timeout] [more stuff]
this implicitly invalidates [err] and [timeout] and more stuff. by you forcibly rounding up the length, you no longer get a warning, but you still (silently) blew away those variables from cache possibly losing changes that weren't written back to external memory yet.
I worry that what you have done will just introduce obscure bugs, since we will potentially invalidate stack variants (for example) and lose their values.
With the case problems, we are trying to fix them at source (i.e. at the higher level).
I understand. The question then becomes how to best fix the passed in size. Always passing the size of a complete cacheline in the SEND_SCR command doesn't seem like a better option because it may have other implications on other hardware.
i proposed this in a previous thread, but i think Simon missed it.
perhaps the warning in the core code could be dropped and all your changes in fringe code obsoleted (such as these USB patches): when it detects that an address is starting on an unaligned boundary, flush that line first, and then let it be invalidated. accordingly, when the end length is on an unaligned boundary, do the same flush-then-invalidate step. this should also make things work without a (significant) loss in performance. if anything, i suspect the overhead of doing runtime buffer size calculations and manually aligning pointers (which is what ALLOC_CACHE_ALIGN_BUFFER does) is a wash compared to partially flushing cache lines in the core ...
Simon: what do you think of this last idea ? -mike