
On Wed, 20 Jul 2011 08:36:12 -0700 "J. William Campbell" jwilliamcampbell@comcast.net wrote:
On 7/20/2011 7:35 AM, Albert ARIBAUD wrote:
Le 20/07/2011 16:01, J. William Campbell a écrit :
On 7/20/2011 6:02 AM, Albert ARIBAUD wrote:
Le 19/07/2011 22:11, J. William Campbell a écrit :
If this is true, then it means that the cache is of type write-back (as opposed to write-through). From a (very brief) look at the ARM manuals, it appears that both types of cache may be present in the CPU. Do you know how this operates?
Usually, copy-back (rather than write-back) and write-through are modes of operation, not cache types.
Hi Albert, On some CPUs both cache modes are available. On many other CPUs (I would guess most), only one fixed mode is available, not both. I have always seen the two modes described as write-back and write-through, but I am sure we are talking about the same things.
We are. Copy-back is another name for write-back, not used by ARM but by some others.
The examples with both modes that I am familiar with have the mode as a "global" setting; it is not controlled by bits in the TLB or anything like that. How does it work on ARM? Is it globally fixed, globally controlled, or controlled by memory management?
Well, it's a bit complicated, because it depends on the architecture version *and* implementation -- ARM themselves do not mandate things, and it is up to the SoC designer to specify what cache they want and what mode it supports, both at L1 and L2, in their specific instance of ARM cores. And yes, you can have memory areas that are write-back and others that are write-through in the same system.
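As an illustration of the per-region control: on ARMv4/v5 cores using the classic short-descriptor MMU format, the C (cacheable) and B (bufferable) bits of a first-level section descriptor select the mode per 1 MiB region. A minimal sketch of building such a descriptor (the macro names are mine, not U-Boot's, and the domain field is left at zero for brevity):

```c
#include <assert.h>
#include <stdint.h>

/* ARMv4/v5 first-level "section" descriptor bits (short-descriptor format).
 * The C (cacheable) and B (bufferable) bits select the memory type:
 *   C=0 B=0 -> non-cacheable
 *   C=1 B=0 -> write-through
 *   C=1 B=1 -> write-back
 */
#define SECTION_TYPE   0x2u        /* bits [1:0] = 0b10 mark a section entry */
#define SECTION_B      (1u << 2)   /* bufferable */
#define SECTION_C      (1u << 3)   /* cacheable */
#define SECTION_SBO    (1u << 4)   /* "should be one" on ARMv4/v5 */
#define SECTION_AP_RW  (3u << 10)  /* AP = 0b11: full read/write access */

/* Build a 1 MiB section descriptor mapping physical 'base' with the
 * requested C/B combination (domain 0 assumed). */
static uint32_t mk_section(uint32_t base, uint32_t cb_bits)
{
    return (base & 0xFFF00000u) | SECTION_AP_RW | SECTION_SBO
         | cb_bits | SECTION_TYPE;
}
```

With this, one region can be mapped `SECTION_C | SECTION_B` (write-back) while another is mapped `SECTION_C` only (write-through) or neither (non-cacheable), which is exactly the per-area flexibility described above.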
If it is controlled by memory management, it looks to me like lots of problems could be avoided by operating with input buffers set as write-through. One probably isn't going to be writing to input buffers much under program control anyway, so the performance loss should be minimal. This gets rid of the alignment restrictions on these buffers, but not the invalidate/flush requirements.
There's not much you can do about alignment issues except align to cache line boundaries.
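A tiny helper pair makes the boundary issue concrete: rounding a range outward to whole lines is harmless for a flush, but rounding an *invalidate* outward can destroy a neighbour's dirty data whenever the buffer does not start and end exactly on line boundaries, which is why the buffers themselves need line alignment. A sketch, assuming a 32-byte line (real code would query the core's actual line size):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cache line size; typical ARM cores use 32 or 64 bytes. */
#define CACHE_LINE 32u

/* Round an address down / up to a cache-line boundary.  Maintenance
 * operations work on whole lines, so a range [addr, addr+len) must be
 * widened to [line_floor(addr), line_ceil(addr+len)) before flushing.
 * For an invalidate, any bytes pulled in by this widening that belong
 * to *other* data are lost -- hence the need for aligned buffers. */
static uintptr_t line_floor(uintptr_t addr)
{
    return addr & ~(uintptr_t)(CACHE_LINE - 1);
}

static uintptr_t line_ceil(uintptr_t addr)
{
    return (addr + CACHE_LINE - 1) & ~(uintptr_t)(CACHE_LINE - 1);
}
```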
However, if memory management is required to set the cache mode, it might be best to operate with the buffers and descriptors un-cached. That gets rid of the flush/invalidate requirement at the expense of slowing down copying from read buffers.
That makes 'best' a subjective choice, doesn't it? :)
Hi All, Yes, it probably depends on the usage.
Probably a reasonable price to pay for the associated simplicity.
Others would say that spending some time setting up alignments and flushes and invalidates is a reasonable price to pay for increased performance... That's an open debate where no solution is The Right One(tm).
For instance, consider the TFTP image reading. People would like the image to end up in cached memory because we'll do some checksumming on it before we give it control, and having it cached makes this step considerably faster; but we lose that if we put it in non-cached memory because it comes through the Ethernet controller's DMA; and it would be worse to receive packets in non-cached memory only to move their contents into cached memory later on.
I think properly aligning descriptors and buffers is enough to avoid the mixed flush/invalidate line issue, and wisely putting instruction barriers should be enough to get the added performance of cache without too much of the hassle of memory management.
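The ordering being described (flush before starting TX DMA, invalidate before the CPU reads a received buffer, with a barrier between the maintenance op and the next step) can be sketched with stub functions that merely record the call sequence. In real U-Boot code these would be flush_dcache_range(), invalidate_dcache_range() and a DSB; everything else here is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Host-side sketch: the maintenance/barrier/DMA functions are stubs
 * that just record the order in which they were called. */
static const char *trace[8];
static size_t ntrace;
static void record(const char *op) { trace[ntrace++] = op; }

static void flush_dcache_range(unsigned long s, unsigned long e)
{ (void)s; (void)e; record("flush"); }
static void invalidate_dcache_range(unsigned long s, unsigned long e)
{ (void)s; (void)e; record("inval"); }
static void barrier(void)               { record("barrier"); }
static void start_dma_tx(unsigned long b) { (void)b; record("dma_tx"); }

/* TX path: write the packet, push it out of the cache, and make sure
 * the flush has completed before the DMA engine is kicked. */
static void send_packet(unsigned long buf)
{
    flush_dcache_range(buf, buf + 1536);
    barrier();
    start_dma_tx(buf);
}

/* RX path: once DMA completion is known, discard any stale cached
 * copy of the buffer before the CPU reads it. */
static void receive_packet(unsigned long buf)
{
    invalidate_dcache_range(buf, buf + 1536);
    barrier();
    /* now safe to read buf */
}
```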
I am pretty sure that all the drivers read the input data into intermediate buffers in all cases. There is no practical way to be sure the next packet received is the "right one" for the TFTP transfer. Plus there are headers involved, and there is no way to ensure that a TFTP destination is located on a sector boundary. In short, you are going to copy from an input buffer to a destination anyway.

It is still true that copying from a non-cached area is slower than from a cached one, because of burst reads vs. individual reads. However, I doubt that the U-Boot user can tell the difference, as the network latency will far exceed the difference in copy time. The question is which is easier to do, and that is probably a matter of opinion. It is safe to say that so far a cached solution has eluded us. That may be changing, but it would still be nice to know how to allocate a section of un-cached RAM on the ARM processor, insofar as the question has a single answer! That would allow easy portability of drivers that do not know about caches, of which there seem to be many.
I agree. Unfortunately, my time is up for now, and I can't go on trying to fix this driver. Maybe I'll pick it up after my vacation. For now I have settled for the ugly solution of keeping the dcache disabled while Ethernet is being used :-( IMHO, doing cache maintenance all over the driver is not an easy or nice solution. Implementing a non-cached memory pool in the MMU and a corresponding dma_malloc() sounds much more universally applicable to any driver.
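The dma_malloc() idea could be as simple as a bump allocator over a pool that the MMU setup is assumed to have mapped non-cacheable; drivers would then never need maintenance ops on those buffers. A minimal sketch, with hypothetical names and sizes:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of a dma_malloc(): a trivial bump allocator over a pool that
 * the MMU setup is assumed to have mapped non-cacheable.  The pool
 * size and alignment are illustrative, not real U-Boot values. */
#define DMA_POOL_SIZE 4096
#define DMA_ALIGN     32   /* assumed cache line size */

static uint8_t dma_pool[DMA_POOL_SIZE] __attribute__((aligned(DMA_ALIGN)));
static size_t dma_off;

/* Return a cache-line-aligned block from the non-cached pool,
 * or NULL when the pool is exhausted.  No free(): suits the typical
 * allocate-once descriptor/buffer pattern of boot-loader drivers. */
static void *dma_malloc(size_t size)
{
    size_t off = (dma_off + DMA_ALIGN - 1) & ~(size_t)(DMA_ALIGN - 1);

    if (off + size > DMA_POOL_SIZE)
        return NULL;
    dma_off = off + size;
    return &dma_pool[off];
}
```

The allocate-only design is a deliberate simplification: descriptors and DMA buffers in a boot loader are usually allocated once at driver init and never released, so a bump pointer avoids free-list bookkeeping entirely.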
Best regards,