Re: [U-Boot] [linux-sunxi] Uboot error: address not aligned in v7_dcache_inval_range

On Sun, 2014-04-13 at 23:45 -0400, Shixin Zeng wrote:
Hi,
I compiled the current u-boot from https://github.com/jwrdegoede/u-boot-sunxi.git for cubieboard2, and wrote it to the SD card. I was trying to boot the kernel on my computer over the network by tftp, however it failed when I ran the "dhcp" or "tftp" command in uboot with tons of:
ERROR: v7_dcache_inval_range - start address is not aligned - 0x7fb677e0
ERROR: v7_dcache_inval_range - stop address is not aligned - 0x7fb67820
I'm seeing this on Cubieboard2 and Cubietruck. It appears to be down to a change to the upstream designware driver:
commit 50b0df814b0f75c08a3d45a017016a75af3edb5d
Author: Alexey Brodkin <Alexey.Brodkin@synopsys.com>
Date: Wed Jan 22 20:49:09 2014 +0400
net/designware: make driver compatible with data cache
Up until now this driver only worked with data cache disabled. To make it work with enabled data cache following changes were required:
* Flush Tx/Rx buffer descriptors after their modification
* Invalidate Tx/Rx buffer descriptors before reading its values
* Flush cache for data passed from CPU to GMAC
* Invalidate cache for data passed from GMAC to CPU
http://git.denx.de/?p=u-boot.git;a=commit;h=50b0df814b0f75c08a3d45a017016a75...
I suppose this was only tested on some architecture which allows DMA flush/invalidation at a fairly fine granularity (at least down to 4 byte boundaries)
Making sure that struct dw_eth_dev is DMA aligned helps with the invalidate of the descriptors in dw_eth_recv (see below) but with that the invalidate of the txrx_status field in dw_eth_send is still problematic -- the field is only 4 bytes, so although the descriptor is aligned the end is not.
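The two addresses in the log can be checked by hand. A minimal sketch, assuming a 64-byte cache line (the thread does not state the value of ARCH_DMA_MINALIGN for these boards, so LINE_SIZE is an assumption) and two hypothetical helper functions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed cache line size; stands in for ARCH_DMA_MINALIGN on the board. */
#define LINE_SIZE 64u

/* Hypothetical helper: true when addr sits on a cache line boundary. */
bool is_line_aligned(uint32_t addr)
{
	return (addr & (LINE_SIZE - 1u)) == 0;
}

/* Hypothetical helper: true when addr meets the old memalign(16) guarantee. */
bool is_16_aligned(uint32_t addr)
{
	return (addr & 15u) == 0;
}
```

Under that assumption both 0x7fb677e0 and 0x7fb67820 satisfy the 16-byte guarantee but fall 0x20 bytes past a 64-byte boundary, which would explain why memalign(16) is not enough.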
Ian.
commit 8878d858ede12584b885fa9439f9093bf2186a90
Author: Ian Campbell <ijc@hellion.org.uk>
Date: Sat Apr 19 14:16:04 2014 +0100
net/designware: ensure device private data is DMA aligned.
struct dw_eth_dev contains fields which are accessed via DMA, so make sure it is aligned to a dma boundary. Without this I see: ERROR: v7_dcache_inval_range - start address is not aligned - 0x7fb677e0
Signed-off-by: Ian Campbell ian.campbell@citrix.com
diff --git a/drivers/net/designware.c b/drivers/net/designware.c
index 6ece479..1120f70 100644
--- a/drivers/net/designware.c
+++ b/drivers/net/designware.c
@@ -412,7 +412,8 @@ int designware_initialize(ulong base_addr, u32 interface)
 	 * Since the priv structure contains the descriptors which need a strict
 	 * buswidth alignment, memalign is used to allocate memory
 	 */
-	priv = (struct dw_eth_dev *) memalign(16, sizeof(struct dw_eth_dev));
+	priv = (struct dw_eth_dev *) memalign(ARCH_DMA_MINALIGN,
+					      sizeof(struct dw_eth_dev));
 	if (!priv) {
 		free(dev);
 		return -ENOMEM;

struct dw_eth_dev contains fields which are accessed via DMA, so make sure it is aligned to a dma boundary. Without this I see: ERROR: v7_dcache_inval_range - start address is not aligned - 0x7fb677e0
Signed-off-by: Ian Campbell ian.campbell@citrix.com
---
 drivers/net/designware.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/designware.c b/drivers/net/designware.c
index 6ece479..1120f70 100644
--- a/drivers/net/designware.c
+++ b/drivers/net/designware.c
@@ -412,7 +412,8 @@ int designware_initialize(ulong base_addr, u32 interface)
 	 * Since the priv structure contains the descriptors which need a strict
 	 * buswidth alignment, memalign is used to allocate memory
 	 */
-	priv = (struct dw_eth_dev *) memalign(16, sizeof(struct dw_eth_dev));
+	priv = (struct dw_eth_dev *) memalign(ARCH_DMA_MINALIGN,
+					      sizeof(struct dw_eth_dev));
 	if (!priv) {
 		free(dev);
 		return -ENOMEM;

Dear Ian,
On Sat, 2014-04-19 at 14:52 +0100, Ian Campbell wrote:
struct dw_eth_dev contains fields which are accessed via DMA, so make sure it is aligned to a dma boundary. Without this I see: ERROR: v7_dcache_inval_range - start address is not aligned - 0x7fb677e0
Signed-off-by: Ian Campbell ian.campbell@citrix.com
 drivers/net/designware.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/designware.c b/drivers/net/designware.c
index 6ece479..1120f70 100644
--- a/drivers/net/designware.c
+++ b/drivers/net/designware.c
@@ -412,7 +412,8 @@ int designware_initialize(ulong base_addr, u32 interface)
 	 * Since the priv structure contains the descriptors which need a strict
 	 * buswidth alignment, memalign is used to allocate memory
 	 */
-	priv = (struct dw_eth_dev *) memalign(16, sizeof(struct dw_eth_dev));
+	priv = (struct dw_eth_dev *) memalign(ARCH_DMA_MINALIGN,
+					      sizeof(struct dw_eth_dev));
 	if (!priv) {
 		free(dev);
 		return -ENOMEM;
Thanks for this fix. It was a left-over from initially submitted driver and I missed this hard-coded item.
Still I haven't tried to execute this on the real board. Hope to do it soon but I don't expect any issues.
Regards, Alexey
Reviewed-by: Alexey Brodkin abrodkin@synopsys.com

On Saturday, April 19, 2014 at 03:52:20 PM, Ian Campbell wrote:
struct dw_eth_dev contains fields which are accessed via DMA, so make sure it is aligned to a dma boundary. Without this I see: ERROR: v7_dcache_inval_range - start address is not aligned - 0x7fb677e0
Signed-off-by: Ian Campbell ian.campbell@citrix.com
 drivers/net/designware.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/designware.c b/drivers/net/designware.c
index 6ece479..1120f70 100644
--- a/drivers/net/designware.c
+++ b/drivers/net/designware.c
@@ -412,7 +412,8 @@ int designware_initialize(ulong base_addr, u32 interface)
 	 * Since the priv structure contains the descriptors which need a strict
 	 * buswidth alignment, memalign is used to allocate memory
 	 */
-	priv = (struct dw_eth_dev *) memalign(16, sizeof(struct dw_eth_dev));
+	priv = (struct dw_eth_dev *) memalign(ARCH_DMA_MINALIGN,
+					      sizeof(struct dw_eth_dev));
Acked-by: Marek Vasut marex@denx.de
Best regards, Marek Vasut

Some platforms cannot invalidate the cache at four byte intervals, so invalidate the entire descriptor.
Signed-off-by: Ian Campbell ijc@hellion.org.uk
---
 drivers/net/designware.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/designware.c b/drivers/net/designware.c
index 1120f70..7d14cec 100644
--- a/drivers/net/designware.c
+++ b/drivers/net/designware.c
@@ -280,10 +280,13 @@ static int dw_eth_send(struct eth_device *dev, void *packet, int length)
 	u32 desc_num = priv->tx_currdescnum;
 	struct dmamacdescr *desc_p = &priv->tx_mac_descrtable[desc_num];

-	/* Invalidate only "status" field for the following check */
-	invalidate_dcache_range((unsigned long)&desc_p->txrx_status,
-				(unsigned long)&desc_p->txrx_status +
-				sizeof(desc_p->txrx_status));
+	/* Strictly we only need to invalidate the "status" field for
+	 * the following check, but on some platforms we cannot
+	 * invalidate only 4 bytes, so invalidate the the whole thing
+	 * which is known to be DMA aligned. */
+	invalidate_dcache_range((unsigned long)desc_p,
+				(unsigned long)desc_p +
+				sizeof(struct dmamacdescr));

 	/* Check if the descriptor is owned by CPU */
 	if (desc_p->txrx_status & DESC_TXSTS_OWNBYDMA) {

Dear Ian,
On Sat, 2014-04-19 at 14:52 +0100, Ian Campbell wrote:
-	/* Invalidate only "status" field for the following check */
-	invalidate_dcache_range((unsigned long)&desc_p->txrx_status,
-				(unsigned long)&desc_p->txrx_status +
-				sizeof(desc_p->txrx_status));
+	/* Strictly we only need to invalidate the "status" field for
+	 * the following check, but on some platforms we cannot
+	 * invalidate only 4 bytes, so invalidate the the whole thing
+	 * which is known to be DMA aligned. */
+	invalidate_dcache_range((unsigned long)desc_p,
+				(unsigned long)desc_p +
+				sizeof(struct dmamacdescr));

 	/* Check if the descriptor is owned by CPU */
 	if (desc_p->txrx_status & DESC_TXSTS_OWNBYDMA) {
Unfortunately I cannot recall exactly why I wanted to invalidate only "status" field.
Now looking at this code I may assume that I wanted to save some CPU cycles. Because:
1. We don't care about all other fields except "status". GMAC only changes "status" field when it resets "OWNED_BY_DMA" flag and all other fields CPU writes but not reads while sending packets.
2. We may save quite a few CPU cycles if only invalidating minimum amount of bytes (remember each read from external memory may cost 100s of cycles).
So I would advise:
1. Don't invalidate "sizeof(struct dmamacdescr)" but only "roundup(sizeof(desc_p->txrx_status), ARCH_DMA_MINALIGN)".
2. In the following lines implement rounding as well:
============
/* Flush data to be sent */
flush_dcache_range((unsigned long)desc_p->dmamac_addr,
		   (unsigned long)desc_p->dmamac_addr + length);
============
We may be sure "desc_p->dmamac_addr" is properly aligned, but length could be not-aligned. So I'd replace "length" with "roundup(length, ARCH_DMA_MINALIGN)" as you did in 3rd patch.
3. Check carefully if there're other instances of probably unaligned cache operations. I erroneously didn't care about alignment on cache invalidation/flushing because my implementation of those cache operations deals with non-aligned start/end internally within invalidate/flush functions - which might be not that good even if it's convenient for me.
4. Why don't you squeeze all 3 patches in 1 and name it like "fix alignment issues with caches on some platforms"? Basically with all 3 patches you fix one and only issue and application of any one of those 3 patches doesn't solve your problem, right?
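The rounding asked for in points 1 and 2 can be sketched as follows. This is only an illustration: DMA_MINALIGN is an assumed value of 64, ROUND_UP is a local stand-in rather than the U-Boot macro itself, and cache_range_end is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed value; the real ARCH_DMA_MINALIGN is defined per architecture. */
#define DMA_MINALIGN 64u

/* Local stand-in for a roundup() macro: next multiple of y at or above x. */
#define ROUND_UP(x, y) ((((x) + ((y) - 1u)) / (y)) * (y))

/*
 * Hypothetical helper: end address to hand to a cache operation for a
 * buffer of `length` bytes, assuming `start` is already line aligned.
 */
uintptr_t cache_range_end(uintptr_t start, size_t length)
{
	return start + ROUND_UP(length, DMA_MINALIGN);
}
```

With these assumed values a 4-byte status field rounds up to 64 bytes, and a 65-byte packet rounds up to 128 bytes.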
Regards, Alexey

On Thu, 2014-04-24 at 17:41 +0000, Alexey Brodkin wrote:
- Don't invalidate "sizeof(struct dmamacdescr)" but only
"roundup(sizeof(desc_p->txrx_status), ARCH_DMA_MINALIGN)".
OK. (Although given the realities of the real world values of ARCH_DMA_MINALIGN on every arch and the sizes of the structs & fields involved this isn't actually buying you anything at all)
- In the following lines implements rounding as well:
Will fix as well.
- Check carefully if there're other instances of probably unaligned
cache operations.
I'm not seeing any others, in practice or by eye-balling the code.
- Why don't you squeeze all 3 patches in 1 and name it like "fix
alignment issues with caches on some platforms"? Basically with all 3 patches you fix one and only issue and application of any one of those 3 patches doesn't solve your problem, right?
These are the issues as I discovered them one by one. I can fold them if you like but doing them separately will aid bisection if one of them turns out to be wrong in some way. As you prefer.
Ian.

Hi Ian,
On Thu, 2014-04-24 at 20:14 +0100, Ian Campbell wrote:
On Thu, 2014-04-24 at 17:41 +0000, Alexey Brodkin wrote:
- Don't invalidate "sizeof(struct dmamacdescr)" but only
"roundup(sizeof(desc_p->txrx_status), ARCH_DMA_MINALIGN)".
OK. (Although given the realities of the real world values of ARCH_DMA_MINALIGN on every arch and the sizes of the structs & fields involved this isn't actually buying you anything at all)
Well this particular structure is of size sizeof(uint32_t) * 4 = 16 bytes. And I may suppose that cache lines could be shorter than 16 bytes even though it could be pretty rare situation. So definitely not a big deal.
But since we're dealing with macros here all mentioned calculations will be done by pre-processor and execution performance won't be affected.
- In the following lines implements rounding as well:
Will fix as well.
- Check carefully if there're other instances of probably unaligned
cache operations.
I thought a bit more about this situation and now I'm not that sure if we need to align addresses we pass to cache invalidate/flush functions.
Because IMHO drivers shouldn't care about specifics of particular platform or architecture. Otherwise we'll need to patch each and every driver only for cache invalidate/flush functions. I looked how these functions are used in other drivers and see that in most of cases no additional alignment precautions were implemented. People just pass start and end addresses.
In its turn platform and architecture provides cache invalidate/flush functions implement its functionality depending on hardware specifics.
For example on architectures that may only flush/invalidate with granularity of 1 cache line cache invalidate/flush functions make sure to start processing from the start of the cache line to which start address falls and end processing when cache line where end address falls is processed.
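The widening behaviour described here can be sketched in a few lines. The 32-byte LINE_SIZE and all three function names are assumptions made for illustration; this is not code from any of the architectures mentioned.

```c
#include <stdint.h>

/* Assumed cache line size for the sketch. */
#define LINE_SIZE 32u

/* Start of the cache line that contains addr. */
uint32_t line_floor(uint32_t addr)
{
	return addr & ~(LINE_SIZE - 1u);
}

/* First line boundary at or above addr. */
uint32_t line_ceil(uint32_t addr)
{
	return (addr + LINE_SIZE - 1u) & ~(LINE_SIZE - 1u);
}

/* Number of lines processed when [start, stop) is widened to whole lines. */
uint32_t lines_touched_widened(uint32_t start, uint32_t stop)
{
	return (line_ceil(stop) - line_floor(start)) / LINE_SIZE;
}
```

A 4-byte range inside one line touches that whole line, and an 8-byte range that straddles a boundary touches two lines.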
I may assume that there're architectures that automatically understand from which cache line to start and at which line to stop processing.
But if your architecture requires cache line aligned addresses to be used for start/end addresses you may look for examples in ARC (http://git.denx.de/?p=u-boot/u-boot-arc.git;a=blob;f=arch/arc/cpu/arc700/cac...), MIPS (http://git.denx.de/?p=u-boot/u-boot-arc.git;a=blob;f=arch/mips/cpu/mips32/cp...), SH (http://git.denx.de/?p=u-boot/u-boot-arc.git;a=blob;f=arch/sh/cpu/sh4/cache.c),
and what's interesting even implementation you use have semi-proper start/end addresses handling - http://git.denx.de/?p=u-boot.git;a=blob;f=arch/arm/lib/cache-pl310.c
Here's your invalidation procedure:
============
/* invalidate memory from start to stop-1 */
void v7_outer_cache_inval_range(u32 start, u32 stop)
{
	/* PL310 currently supports only 32 bytes cache line */
	u32 pa, line_size = 32;

	/*
	 * If start address is not aligned to cache-line do not
	 * invalidate the first cache-line
	 */
	if (start & (line_size - 1)) {
		printf("ERROR: %s - start address is not aligned - 0x%08x\n",
			__func__, start);
		/* move to next cache line */
		start = (start + line_size - 1) & ~(line_size - 1);
	}

	/*
	 * If stop address is not aligned to cache-line do not
	 * invalidate the last cache-line
	 */
	if (stop & (line_size - 1)) {
		printf("ERROR: %s - stop address is not aligned - 0x%08x\n",
			__func__, stop);
		/* align to the beginning of this cache line */
		stop &= ~(line_size - 1);
	}

	for (pa = start; pa < stop; pa = pa + line_size)
		writel(pa, &pl310->pl310_inv_line_pa);

	pl310_cache_sync();
}
============
1. I don't understand why start from the next cache line if start address is not aligned to cache line boundary? I'd say that you want to invalidate cache line that contains unaligned start address. Otherwise first bytes won't be invalidated, right?
2. Why do we throw _error_ message. I may understand if you emit _warning_ message in case of debug build (with DEBUG defined). Well in current implementation (see 1) it could be error because behavior is really dangerous. But if you start from correct cache line only warning in debug mode makes sense (IMHO).
3. Stop/end address in contrast might need to be extended depending on HW implementation (see above comment).
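The effect of the quoted range adjustment can be worked through with plain arithmetic. The sketch below copies only the two masking steps from the quoted function (32-byte lines) and counts how many lines would be invalidated; the function name is hypothetical and no hardware is touched.

```c
#include <stdint.h>

/* Line size taken from the quoted PL310 code. */
#define PL310_LINE 32u

/*
 * Count the lines the quoted loop would invalidate for [start, stop):
 * an unaligned start skips ahead to the next boundary, an unaligned
 * stop falls back to the beginning of its line.
 */
uint32_t lines_invalidated_shrunk(uint32_t start, uint32_t stop)
{
	start = (start + PL310_LINE - 1u) & ~(PL310_LINE - 1u);
	stop &= ~(PL310_LINE - 1u);
	return stop > start ? (stop - start) / PL310_LINE : 0u;
}
```

For a 4-byte field at an aligned address the stop address is pulled back onto the start address, so the loop runs zero times and nothing is invalidated at all.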
And here's your flush procedure:
===========
void v7_outer_cache_flush_range(u32 start, u32 stop)
{
	/* PL310 currently supports only 32 bytes cache line */
	u32 pa, line_size = 32;

	/*
	 * Align to the beginning of cache-line - this ensures that
	 * the first 5 bits are 0 as required by PL310 TRM
	 */
	start &= ~(line_size - 1);

	for (pa = start; pa < stop; pa = pa + line_size)
		writel(pa, &pl310->pl310_clean_inv_line_pa);

	pl310_cache_sync();
}
===========
Which looks very correct to me. I'm wondering if there was a reason to have so different implementation of functions that do very similar things.
So at this point I would ask you to modify cache invalidate function for your architecture. This way you prevent mentioned issues with other drivers.
I'm not seeing any others, in practice or by eye-balling the code.
- Why don't you squeeze all 3 patches in 1 and name it like "fix
alignment issues with caches on some platforms"? Basically with all 3 patches you fix one and only issue and application of any one of those 3 patches doesn't solve your problem, right?
These are the issues as I discovered them one by one. I can fold them if you like but doing them separately will aid bisection if one of them turns out to be wrong in some way. As you prefer.
Keeping in mind things written above I'd say that patches 2 & 3 are not needed at all, while patch 1 makes perfect sense and fixes an obvious issue.
Regards, Alexey

CCing the ARM custodian. Albert, what do you think of Alexey's comments below? Actually, having read it properly myself I think Alexey is confusing cache flushing with cache invalidation, I've left the CC in place though in case you have any thoughts on the matter.
On Fri, 2014-04-25 at 08:48 +0000, Alexey Brodkin wrote:
I thought a bit more about this situation and now I'm not that sure if we need to align addresses we pass to cache invalidate/flush functions.
Because IMHO drivers shouldn't care about specifics of particular platform or architecture. Otherwise we'll need to patch each and every driver only for cache invalidate/flush functions. I looked how these functions are used in other drivers and see that in most of cases no additional alignment precautions were implemented. People just pass start and end addresses.
In its turn platform and architecture provides cache invalidate/flush functions implement its functionality depending on hardware specifics.
For example on architectures that may only flush/invalidate with granularity of 1 cache line cache invalidate/flush functions make sure to start processing from the start of the cache line to which start address falls and end processing when cache line where end address falls is processed.
I may assume that there're architectures that automatically understand from which cache line to start and at which line to stop processing.
But if your architecture requires cache line aligned addresses to be used for start/end addresses you may look for examples in ARC (http://git.denx.de/?p=u-boot/u-boot-arc.git;a=blob;f=arch/arc/cpu/arc700/cac...), MIPS (http://git.denx.de/?p=u-boot/u-boot-arc.git;a=blob;f=arch/mips/cpu/mips32/cp...), SH (http://git.denx.de/?p=u-boot/u-boot-arc.git;a=blob;f=arch/sh/cpu/sh4/cache.c),
and what's interesting even implementation you use have semi-proper start/end addresses handling - http://git.denx.de/?p=u-boot.git;a=blob;f=arch/arm/lib/cache-pl310.c
This is the driver for one particular ARM cache controller and not the one used for the SoC. In any case it does "proper" start/end handling only for cache flush operations, not cache invalidate.
Cache invalidate is a potentially destructive operation (throwing away data in the caches), having it operate on anything more than the precise region requested would be very surprising to almost anyone I think.
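Why widening an invalidate is risky can be shown with a toy model of a single write-back cache line. Everything here (struct toy_line, the toy_* functions, the 8-word line) is made up for illustration and does not model any particular controller.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_WORDS 8 /* toy line of eight 4-byte words (assumed) */

/* One cache line plus the memory behind it. */
struct toy_line {
	uint32_t mem[LINE_WORDS];   /* backing memory, also written by "DMA" */
	uint32_t cache[LINE_WORDS]; /* CPU's cached copy */
	bool valid;
};

/* Load the line from memory into the cache. */
void toy_fill(struct toy_line *l)
{
	memcpy(l->cache, l->mem, sizeof(l->cache));
	l->valid = true;
}

/* CPU read: served from the cache, filling it first if needed. */
uint32_t toy_read(struct toy_line *l, int i)
{
	if (!l->valid)
		toy_fill(l);
	return l->cache[i];
}

/* CPU write: write-back policy, so memory is not updated yet. */
void toy_write(struct toy_line *l, int i, uint32_t v)
{
	if (!l->valid)
		toy_fill(l);
	l->cache[i] = v;
}

/* Invalidate: drop the cached copy, including any unwritten CPU data. */
void toy_invalidate(struct toy_line *l)
{
	l->valid = false;
}

/* Flush: copy the cached line back to memory. */
void toy_flush(struct toy_line *l)
{
	if (l->valid)
		memcpy(l->mem, l->cache, sizeof(l->mem));
}
```

If the CPU writes word 1, a device then updates word 0 in memory, and the whole line is invalidated to pick up word 0, the CPU's write to word 1 is silently lost in this model. Flushing first preserves word 1 but overwrites the device's word 0, which is the other half of the same problem.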
Here's your invalidation procedure:
============
/* invalidate memory from start to stop-1 */
void v7_outer_cache_inval_range(u32 start, u32 stop)
{
	/* PL310 currently supports only 32 bytes cache line */
	u32 pa, line_size = 32;

	/*
	 * If start address is not aligned to cache-line do not
	 * invalidate the first cache-line
	 */
	if (start & (line_size - 1)) {
		printf("ERROR: %s - start address is not aligned - 0x%08x\n",
			__func__, start);
		/* move to next cache line */
		start = (start + line_size - 1) & ~(line_size - 1);
	}

	/*
	 * If stop address is not aligned to cache-line do not
	 * invalidate the last cache-line
	 */
	if (stop & (line_size - 1)) {
		printf("ERROR: %s - stop address is not aligned - 0x%08x\n",
			__func__, stop);
		/* align to the beginning of this cache line */
		stop &= ~(line_size - 1);
	}

	for (pa = start; pa < stop; pa = pa + line_size)
		writel(pa, &pl310->pl310_inv_line_pa);

	pl310_cache_sync();
}
============
- I don't understand why start from the next cache line if start
address is not aligned to cache line boundary? I'd say that you want to invalidate cache line that contains unaligned start address. Otherwise first bytes won't be invalidated, right?
- Why do we throw _error_ message. I may understand if you emit
_warning_ message in case of debug build (with DEBUG defined). Well in current implementation (see 1) it could be error because behavior is really dangerous. But if you start from correct cache line only warning in debug mode makes sense (IMHO).
- Stop/end address in contrast might need to be extended depending on
HW implementation (see above comment).
And here's your flush procedure:
===========
void v7_outer_cache_flush_range(u32 start, u32 stop)
{
	/* PL310 currently supports only 32 bytes cache line */
	u32 pa, line_size = 32;

	/*
	 * Align to the beginning of cache-line - this ensures that
	 * the first 5 bits are 0 as required by PL310 TRM
	 */
	start &= ~(line_size - 1);

	for (pa = start; pa < stop; pa = pa + line_size)
		writel(pa, &pl310->pl310_clean_inv_line_pa);

	pl310_cache_sync();
}
===========
Which looks very correct to me. I'm wondering if there was a reason to have so different implementation of functions that do very similar things.
I think you are missing the important differences between a cache flush and a cache invalidate.
So at this point I would ask you to modify cache invalidate function for your architecture. This way you prevent mentioned issues with other drivers.
As I describe above, I don't think this would be at all wise.
I'm not seeing any others, in practice or by eye-balling the code.
- Why don't you squeeze all 3 patches in 1 and name it like "fix
alignment issues with caches on some platforms"? Basically with all 3 patches you fix one and only issue and application of any one of those 3 patches doesn't solve your problem, right?
These are the issues as I discovered them one by one. I can fold them if you like but doing them separately will aid bisection if one of them turns out to be wrong in some way. As you prefer.
Keeping in mind things written above I'd say that patches 2 & 3 are not needed at all, while patch 1 makes perfect sense and fixes an obvious issue.
Regards, Alexey

Dear Ian,
On Sun, 2014-04-27 at 19:47 +0100, Ian Campbell wrote:
This is the driver for one particular ARM cache controller and not the one used for the SoC. In any case it does "proper" start/end handling only for cache flush operations, not cache invalidate.
Cache invalidate is a potentially destructive operation (throwing away data in the caches), having it operate on anything more than the precise region requested would be very surprising to almost anyone I think.
...
I think you are missing the important differences between a cache flush and a cache invalidate.
IMHO cache invalidation and flush operations are sort of antipodes.
With invalidation you discard all data in corresponding line in cache and replace it with freshly read data from memory.
With flush you move cache line to corresponding memory location overriding previously existing values in memory.
So if you deal with 2 independent data fields which both share the same one cache line it's potentially dangerous to do both flush and invalidate of this cache line.
In case of MMU utilization we have a luxury of uncached access, so we may safely access control structures in memory with granularity which is available for this particular CPU. This is AFAIK how drivers deal with buffer descriptors in Linux kernel.
In case of U-Boot where we prefer to keep things simple we don't use MMU. So no generic way for cache bypassing. Still some architectures like ARC700 have special instructions for accessing memory bypassing cache but I prefer to not use them and keep sources platform-independent.
And in this situation IMHO the only safe solution could be in proper design of data layout. In other words we need to keep independent data blocks aligned to cache line.
And as you may see from "designware.h" buffer descriptor structure is aligned:
==============
struct dmamacdescr {
	u32 txrx_status;
	u32 dmamac_cntl;
	void *dmamac_addr;
	struct dmamacdescr *dmamac_next;
} __aligned(ARCH_DMA_MINALIGN);
==============
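What the alignment attribute does and does not guarantee can be checked with a stand-in structure. The sketch assumes GCC or Clang (for __attribute__((aligned))) and a 64-byte DMA_MINALIGN; struct toy_desc and status_end_offset are hypothetical names mirroring the descriptor above.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed value standing in for ARCH_DMA_MINALIGN. */
#define DMA_MINALIGN 64

/* Stand-in for the descriptor: same fields, same alignment request. */
struct toy_desc {
	uint32_t status;
	uint32_t cntl;
	void *addr;
	struct toy_desc *next;
} __attribute__((aligned(DMA_MINALIGN)));

/* Offset of the first byte after the status field. */
size_t status_end_offset(void)
{
	return offsetof(struct toy_desc, status) + sizeof(uint32_t);
}
```

The attribute pads the structure to a whole multiple of the alignment, so each array element starts on a boundary. The end of the status field still sits at offset 4, so a range covering only that field has an unaligned stop address.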
Regards, Alexey

On Mon, 2014-04-28 at 12:05 +0000, Alexey Brodkin wrote:
And in this situation IMHO the only safe solution could be in proper design of data layout. In other words we need to keep independent data blocks aligned to cache line.
And as you may see from "designware.h" buffer descriptor structure is aligned:
There's no point in taking all this care if you then go and flush subfields, as the driver does, since they are not necessarily going to have the required alignment. That was the entire point of this patch!
I'm going to do the roundup thing you asked for, even though it seems like a pointless optimisation to me given the context.
==============
struct dmamacdescr {
	u32 txrx_status;
	u32 dmamac_cntl;
	void *dmamac_addr;
	struct dmamacdescr *dmamac_next;
} __aligned(ARCH_DMA_MINALIGN);
==============
Regards, Alexey

- Don't invalidate "sizeof(struct dmamacdescr)" but only
"roundup(sizeof(desc_p->txrx_status), ARCH_DMA_MINALIGN)".
I'm not sure I like this: if ARCH_DMA_MINALIGN is "too large" and ends up invalidating more than the struct, it could be an error, so it's safer to ask it to invalidate the struct (which we know can be safely invalidated).
If invalidate_dcache_range is used "often", then I'd suggest to change its API so it receives 2 bounds: the one that has to be invalidated and the surrounding one that can safely be invalidated.
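One way to picture the two-bound idea is a helper that widens the required range to whole cache lines and refuses if the result leaves the region the caller declared safe. This is a hypothetical sketch, not an existing U-Boot interface; the name widen_within, its parameters and the 64-byte LINE_SIZE are all assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed cache line size. */
#define LINE_SIZE 64u

/*
 * Hypothetical API: widen [need_start, need_end) to whole cache lines.
 * Succeeds only if the widened range stays inside [safe_start, safe_end),
 * the region the caller says may be invalidated without losing data.
 */
bool widen_within(uintptr_t need_start, uintptr_t need_end,
		  uintptr_t safe_start, uintptr_t safe_end,
		  uintptr_t *out_start, uintptr_t *out_end)
{
	uintptr_t mask = (uintptr_t)LINE_SIZE - 1u;
	uintptr_t start = need_start & ~mask;
	uintptr_t end = (need_end + mask) & ~mask;

	if (start < safe_start || end > safe_end)
		return false;

	*out_start = start;
	*out_end = end;
	return true;
}
```

With a 4-byte status field inside a 64-byte padded descriptor the call succeeds and returns the full line. If only 16 bytes were declared safe, the same request would be rejected instead of silently invalidating neighbouring data.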
Stefan

Signed-off-by: Ian Campbell ijc@hellion.org.uk
---
 drivers/net/designware.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/designware.c b/drivers/net/designware.c
index 7d14cec..30446d3 100644
--- a/drivers/net/designware.c
+++ b/drivers/net/designware.c
@@ -354,7 +354,7 @@ static int dw_eth_recv(struct eth_device *dev)
 		/* Invalidate received data */
 		invalidate_dcache_range((unsigned long)desc_p->dmamac_addr,
 					(unsigned long)desc_p->dmamac_addr +
-					length);
+					roundup(length, ARCH_DMA_MINALIGN));

 		NetReceive(desc_p->dmamac_addr, length);

On Sat, Apr 19, 2014 at 9:52 AM, Ian Campbell ijc@hellion.org.uk wrote:
Signed-off-by: Ian Campbell ijc@hellion.org.uk
 drivers/net/designware.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/designware.c b/drivers/net/designware.c
index 7d14cec..30446d3 100644
--- a/drivers/net/designware.c
+++ b/drivers/net/designware.c
@@ -354,7 +354,7 @@ static int dw_eth_recv(struct eth_device *dev)
 		/* Invalidate received data */
 		invalidate_dcache_range((unsigned long)desc_p->dmamac_addr,
 					(unsigned long)desc_p->dmamac_addr +
-					length);
+					roundup(length, ARCH_DMA_MINALIGN));

 		NetReceive(desc_p->dmamac_addr, length);
With these three patches, I don't see v7_dcache_inval_range errors anymore. Thanks!
-- 1.9.0
-- You received this message because you are subscribed to the Google Groups "linux-sunxi" group. To unsubscribe from this group and stop receiving emails from it, send an email to linux-sunxi+unsubscribe@googlegroups.com. For more options, visit https://groups.google.com/d/optout.

On Sat, Apr 19, 2014 at 9:30 AM, Ian Campbell ijc@hellion.org.uk wrote:
On Sun, 2014-04-13 at 23:45 -0400, Shixin Zeng wrote:
Hi,
I compiled the current u-boot from https://github.com/jwrdegoede/u-boot-sunxi.git for cubieboard2, and wrote it to the SD card. I was trying to boot the kernel on my computer over the network by tftp, however it failed when I ran the "dhcp" or "tftp" command in uboot with tons of:
ERROR: v7_dcache_inval_range - start address is not aligned - 0x7fb677e0
ERROR: v7_dcache_inval_range - stop address is not aligned - 0x7fb67820
I'm seeing this on Cubieboard2 and Cubietruck. It appears to be down to a change to the upstream designware driver:
commit 50b0df814b0f75c08a3d45a017016a75af3edb5d
Author: Alexey Brodkin <Alexey.Brodkin@synopsys.com>
Date: Wed Jan 22 20:49:09 2014 +0400
net/designware: make driver compatible with data cache
Up until now this driver only worked with data cache disabled. To make it work with enabled data cache following changes were required:
* Flush Tx/Rx buffer descriptors after their modification
* Invalidate Tx/Rx buffer descriptors before reading its values
* Flush cache for data passed from CPU to GMAC
* Invalidate cache for data passed from GMAC to CPU
http://git.denx.de/?p=u-boot.git;a=commit;h=50b0df814b0f75c08a3d45a017016a75...
I suppose this was only tested on some architecture which allows DMA flush/invalidation at a fairly fine granularity (at least down to 4 byte boundaries)
Making sure that struct dw_eth_dev is DMA aligned helps with the invalidate of the descriptors in dw_eth_recv (see below) but with that the invalidate of the txrx_status field in dw_eth_send is still problematic -- the field is only 4 bytes, so although the descriptor is aligned the end is not.
Indeed, with this change, I now get these errors instead:
Loading:
ERROR: v7_dcache_inval_range - stop address is not aligned - 0x7fb663c4
ERROR: v7_dcache_inval_range - stop address is not aligned - 0x7fb66404
ERROR: v7_dcache_inval_range - stop address is not aligned - 0x7fb6f401
Best Regards
Shixin Zeng
Ian.
commit 8878d858ede12584b885fa9439f9093bf2186a90
Author: Ian Campbell <ijc@hellion.org.uk>
Date: Sat Apr 19 14:16:04 2014 +0100
net/designware: ensure device private data is DMA aligned.
struct dw_eth_dev contains fields which are accessed via DMA, so make sure it is aligned to a dma boundary. Without this I see: ERROR: v7_dcache_inval_range - start address is not aligned - 0x7fb677e0
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
diff --git a/drivers/net/designware.c b/drivers/net/designware.c
index 6ece479..1120f70 100644
--- a/drivers/net/designware.c
+++ b/drivers/net/designware.c
@@ -412,7 +412,8 @@ int designware_initialize(ulong base_addr, u32 interface)
 	 * Since the priv structure contains the descriptors which need a strict
 	 * buswidth alignment, memalign is used to allocate memory
 	 */
-	priv = (struct dw_eth_dev *) memalign(16, sizeof(struct dw_eth_dev));
+	priv = (struct dw_eth_dev *) memalign(ARCH_DMA_MINALIGN,
+					      sizeof(struct dw_eth_dev));
 	if (!priv) {
 		free(dev);
 		return -ENOMEM;

On Saturday, April 19, 2014 at 03:30:14 PM, Ian Campbell wrote:
On Sun, 2014-04-13 at 23:45 -0400, Shixin Zeng wrote:
Hi,
I compiled the current u-boot from https://github.com/jwrdegoede/u-boot-sunxi.git for cubieboard2, and wrote it to the SD card. I was trying to boot the kernel on my computer over the network by tftp, however it failed when I ran the "dhcp" or "tftp" command in uboot with tons of:
ERROR: v7_dcache_inval_range - start address is not aligned - 0x7fb677e0
ERROR: v7_dcache_inval_range - stop address is not aligned - 0x7fb67820
I'm seeing this on Cubieboard2 and Cubietruck. It appears to be down to a change to the upstream designware driver:
commit 50b0df814b0f75c08a3d45a017016a75af3edb5d
Author: Alexey Brodkin <Alexey.Brodkin@synopsys.com>
Date: Wed Jan 22 20:49:09 2014 +0400
net/designware: make driver compatible with data cache
Up until now this driver only worked with data cache disabled. To make it work with enabled data cache following changes were required:
* Flush Tx/Rx buffer descriptors after their modification
* Invalidate Tx/Rx buffer descriptors before reading its values
* Flush cache for data passed from CPU to GMAC
* Invalidate cache for data passed from GMAC to CPU
http://git.denx.de/?p=u-boot.git;a=commit;h=50b0df814b0f75c08a3d45a017016a75af3edb5d
I suppose this was only tested on some architecture which allows DMA flush/invalidation at a fairly fine granularity (at least down to 4 byte boundaries)
It was sheer luck that this ever worked. Looking at the entire driver, to fix all your issues with DMA and caches, it would be sufficient to re-align "struct dw_eth_dev" properly.
See drivers/net/designware.h:
1) struct dmamacdescr {} is already __aligned(ARCH_DMA_MINALIGN)
   => This structure, if aligned in memory to proper boundary, can be flushed/invalidated without problems.
2) struct dw_eth_dev {} can be aligned to ANY 4-byte boundary
   But this structure contains two arrays of struct dmamacdescr {}, which each have their elements' length aligned to ARCH_DMA_MINALIGN
Solution:
Your patch [1/3] and reorder the structure in designware.h so that the
	struct dmamacdescr tx_mac_descrtable[]
	struct dmamacdescr rx_mac_descrtable[]
are first and anything that does not need to be aligned follows. This way, the DMA descriptors will always be aligned and you need not worry about the flushes. You don't even need to ROUNDUP their length, since they are already fine.
When reordering the struct dw_eth_dev {}, make sure to add a comment about the alignment.
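The reordering described above could look roughly like the sketch below. It is only an illustration, not the actual patch: the 64-byte ARCH_DMA_MINALIGN, the descriptor counts of 16, the name dw_eth_dev_reordered and the two offset helpers are assumptions made for the example, and uint32_t stands in for U-Boot's u32.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. U-Boot takes the real value from arch headers. */
#define ARCH_DMA_MINALIGN   64
/* Assumption: descriptor counts chosen for the example only. */
#define CONFIG_TX_DESCR_NUM 16
#define CONFIG_RX_DESCR_NUM 16

struct dmamacdescr {
	uint32_t txrx_status;
	uint32_t dmamac_cntl;
	void *dmamac_addr;
	struct dmamacdescr *dmamac_next;
} __attribute__((aligned(ARCH_DMA_MINALIGN)));

/* Hypothetical reordered layout, not the real struct dw_eth_dev. */
struct dw_eth_dev_reordered {
	/*
	 * The descriptor tables are accessed by DMA and must start on an
	 * ARCH_DMA_MINALIGN boundary, so keep them first in the structure.
	 */
	struct dmamacdescr tx_mac_descrtable[CONFIG_TX_DESCR_NUM];
	struct dmamacdescr rx_mac_descrtable[CONFIG_RX_DESCR_NUM];

	/* Fields that are never touched by DMA follow. */
	uint32_t interface;
	uint32_t tx_currdescnum;
	uint32_t rx_currdescnum;
};

/* Offset of the Tx descriptor table from the start of the structure. */
size_t tx_table_offset(void)
{
	return offsetof(struct dw_eth_dev_reordered, tx_mac_descrtable);
}

/* Offset of the Rx descriptor table from the start of the structure. */
size_t rx_table_offset(void)
{
	return offsetof(struct dw_eth_dev_reordered, rx_mac_descrtable);
}
```

With this ordering the Tx table sits at offset 0, so aligning the allocation of the structure is enough to align both tables.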
Best regards, Marek Vasut

On Sat, 2014-04-26 at 20:27 +0200, Marek Vasut wrote:
It was sheer luck that this ever worked. Looking at the entire driver, to fix all your issues with DMA and caches, it would be sufficient to re-align "struct dw_eth_dev" properly.
See drivers/net/designware.h:
- struct dmamacdescr {} is already __aligned(ARCH_DMA_MINALIGN) => This structure, if aligned in memory to a proper boundary, can be flushed/invalidated without problems.
- struct dw_eth_dev {} can be aligned to ANY 4-byte boundary. But this structure contains two arrays of struct dmamacdescr {}, whose elements each have their length aligned to ARCH_DMA_MINALIGN.
Solution:
Take your patch [1/3] and reorder the structure in designware.h so that struct dmamacdescr tx_mac_descrtable[] and struct dmamacdescr rx_mac_descrtable[] come first and anything that does not need to be aligned follows. This way, the DMA descriptors will always be aligned and you need not worry about the flushes. You don't even need to ROUNDUP their length, since they are already fine.
That sounds like a good plan. I'll take a look.
When reordering the struct dw_eth_dev {}, make sure to add a comment about the alignment.
Of course.
Ian.

On Sunday, April 27, 2014 at 08:40:50 PM, Ian Campbell wrote:
On Sat, 2014-04-26 at 20:27 +0200, Marek Vasut wrote:
It was sheer luck that this ever worked. Looking at the entire driver, to fix all your issues with DMA and caches, it would be sufficient to re-align "struct dw_eth_dev" properly.
See drivers/net/designware.h:
- struct dmamacdescr {} is already __aligned(ARCH_DMA_MINALIGN) => This structure, if aligned in memory to a proper boundary, can be flushed/invalidated without problems.
- struct dw_eth_dev {} can be aligned to ANY 4-byte boundary. But this structure contains two arrays of struct dmamacdescr {}, whose elements each have their length aligned to ARCH_DMA_MINALIGN.
Solution:
Take your patch [1/3] and reorder the structure in designware.h so that struct dmamacdescr tx_mac_descrtable[] and struct dmamacdescr rx_mac_descrtable[] come first and anything that does not need to be aligned follows. This way, the DMA descriptors will always be aligned and you need not worry about the flushes. You don't even need to ROUNDUP their length, since they are already fine.
That sounds like a good plan. I'll take a look.
When reordering the struct dw_eth_dev {}, make sure to add a comment about the alignment.
Of course.
Thanks
Best regards, Marek Vasut

On Sat, 2014-04-26 at 20:27 +0200, Marek Vasut wrote:
It was sheer luck that this ever worked. Looking at the entire driver, to fix all your issues with DMA and caches, it would be sufficient to re-align "struct dw_eth_dev" properly.
See drivers/net/designware.h:
- struct dmamacdescr {} is already __aligned(ARCH_DMA_MINALIGN) => This structure, if aligned in memory to a proper boundary, can be flushed/invalidated without problems.
- struct dw_eth_dev {} can be aligned to ANY 4-byte boundary. But this structure contains two arrays of struct dmamacdescr {}, whose elements each have their length aligned to ARCH_DMA_MINALIGN.
Solution:
Take your patch [1/3] and reorder the structure in designware.h so that struct dmamacdescr tx_mac_descrtable[] and struct dmamacdescr rx_mac_descrtable[] come first and anything that does not need to be aligned follows. This way, the DMA descriptors will always be aligned and you need not worry about the flushes. You don't even need to ROUNDUP their length, since they are already fine.
Unfortunately this isn't sufficient: at least a change in the spirit of my second patch (to flush the entire descriptor) is also needed, because flushing just a subfield misaligns things again.
And my patch 3/3 is still needed because it deals with the data itself and not the descriptors.
So having done all that it doesn't seem that reordering dw_eth_dev is necessary.
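To illustrate why a subfield flush or invalidate goes wrong even on an aligned descriptor, here is a minimal sketch. It assumes 64-byte cache lines; range_is_cache_aligned() is a simplified stand-in for the kind of check that prints the errors above, not the real v7_dcache_inval_range code, and the descriptor address 0x7fb67800 is made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. */
#define ARCH_DMA_MINALIGN 64

struct dmamacdescr {
	uint32_t txrx_status;
	uint32_t dmamac_cntl;
	void *dmamac_addr;
	struct dmamacdescr *dmamac_next;
} __attribute__((aligned(ARCH_DMA_MINALIGN)));

/* Simplified stand-in: both ends of the range must sit on a cache line boundary. */
bool range_is_cache_aligned(uintptr_t start, uintptr_t stop)
{
	return (start % ARCH_DMA_MINALIGN) == 0 &&
	       (stop % ARCH_DMA_MINALIGN) == 0;
}

/* Range covering only the 4-byte txrx_status field of a descriptor. */
bool subfield_range_ok(uintptr_t desc)
{
	return range_is_cache_aligned(desc, desc + sizeof(uint32_t));
}

/* Range covering the whole (padded) descriptor. */
bool whole_desc_range_ok(uintptr_t desc)
{
	return range_is_cache_aligned(desc, desc + sizeof(struct dmamacdescr));
}
```

For an aligned descriptor the subfield range ends 4 bytes in, which is not a cache line boundary, while the whole-descriptor range ends exactly one line later. The pair of addresses from the original report (0x7fb677e0, 0x7fb67820) fails the check at both ends.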
Ian.

On Monday, April 28, 2014 at 09:55:46 PM, Ian Campbell wrote:
On Sat, 2014-04-26 at 20:27 +0200, Marek Vasut wrote:
It was sheer luck that this ever worked. Looking at the entire driver, to fix all your issues with DMA and caches, it would be sufficient to re-align "struct dw_eth_dev" properly.
See drivers/net/designware.h:
- struct dmamacdescr {} is already __aligned(ARCH_DMA_MINALIGN) => This structure, if aligned in memory to a proper boundary, can be flushed/invalidated without problems.
- struct dw_eth_dev {} can be aligned to ANY 4-byte boundary. But this structure contains two arrays of struct dmamacdescr {}, whose elements each have their length aligned to ARCH_DMA_MINALIGN.
Solution:
Take your patch [1/3] and reorder the structure in designware.h so that struct dmamacdescr tx_mac_descrtable[] and struct dmamacdescr rx_mac_descrtable[] come first and anything that does not need to be aligned follows. This way, the DMA descriptors will always be aligned and you need not worry about the flushes. You don't even need to ROUNDUP their length, since they are already fine.
Unfortunately this isn't sufficient: at least a change in the spirit of my second patch (to flush the entire descriptor) is also needed, because flushing just a subfield misaligns things again.
Ah, true. Your second patch is needed as well, sorry.
And my patch 3/3 is still needed because it deals with the data itself and not the descriptors.
True.
So having done all that it doesn't seem that reordering dw_eth_dev is necessary.
Reordering dw_eth_dev is necessary. Look:
struct dmamacdescr {                        // sizeof() = 0x10
        u32 txrx_status;                    // +0x0
        u32 dmamac_cntl;                    // +0x4
        void *dmamac_addr;                  // +0x8
        struct dmamacdescr *dmamac_next;    // +0xc
} __aligned(ARCH_DMA_MINALIGN);             // total size = 0x40

struct dw_eth_dev {
        u32 interface;                      // +0x0
        u32 tx_currdescnum;                 // +0x4
        u32 rx_currdescnum;                 // +0x8

        struct dmamacdescr tx_mac_descrtable[CONFIG_TX_DESCR_NUM]; // +0xc
You do memalign() to allocate this. The interface field ends up at an address aligned to a 64-byte boundary (i.e. it's cache aligned). Now, the structure is naturally aligned, so tx_mac_descrtable[0] ends up at +0xc offset from the start of the structure, am I right?
If I am wrong, then the compiler considers struct dmamacdescr {} as one big chunk of data aligned to an ARCH_DMA_MINALIGN boundary and thus inserts a big slop between rx_currdescnum and tx_mac_descrtable[0] to pad it correctly, which is not nice.
Reordering the structure will make sure there is no slop and there is no possibility of making tx_mac_descrtable unaligned ever.
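Which of the two cases applies can be checked with offsetof(). The sketch below assumes GCC or Clang, a 64-byte ARCH_DMA_MINALIGN and 16 descriptors; dw_eth_dev_orig and the helper names are made up for the example and only the leading fields of the structure are reproduced. As far as the documented behaviour of the aligned type attribute goes, the attribute raises the alignment requirement of the type itself, so the second case (padding) is the one to expect.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines and 16 Tx descriptors, for the example only. */
#define ARCH_DMA_MINALIGN   64
#define CONFIG_TX_DESCR_NUM 16

struct dmamacdescr {
	uint32_t txrx_status;
	uint32_t dmamac_cntl;
	void *dmamac_addr;
	struct dmamacdescr *dmamac_next;
} __attribute__((aligned(ARCH_DMA_MINALIGN)));

/* Hypothetical copy of the original field order (leading fields only). */
struct dw_eth_dev_orig {
	uint32_t interface;
	uint32_t tx_currdescnum;
	uint32_t rx_currdescnum;

	struct dmamacdescr tx_mac_descrtable[CONFIG_TX_DESCR_NUM];
};

/* Offset the compiler actually assigns to the Tx descriptor table. */
size_t tx_table_offset_orig(void)
{
	return offsetof(struct dw_eth_dev_orig, tx_mac_descrtable);
}

/* Bytes of padding inserted between rx_currdescnum and the table. */
size_t leading_padding(void)
{
	return tx_table_offset_orig() - 3 * sizeof(uint32_t);
}
```

Under these assumptions the table lands at offset 0x40 rather than 0xc, so the descriptors stay aligned whenever the structure itself is, at the cost of the padding that the reordering would avoid.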

On Mon, 2014-04-28 at 22:21 +0200, Marek Vasut wrote:
On Monday, April 28, 2014 at 09:55:46 PM, Ian Campbell wrote:
So having done all that it doesn't seem that reordering dw_eth_dev is necessary.
Reordering dw_eth_dev is necessary. Look:
It's certainly desirable, but it's not *necessary*. Anyway, that's just splitting hairs, I'm going to send a patch (on top of the v2 of this series) shortly.
Ian.
participants (5): Alexey Brodkin, Ian Campbell, Marek Vasut, Shixin Zeng, Stefan Monnier