
On Wed, 20 Jul 2011 08:36:12 -0700 "J. William Campbell" jwilliamcampbell@comcast.net wrote:
On 7/20/2011 7:35 AM, Albert ARIBAUD wrote:
Le 20/07/2011 16:01, J. William Campbell a écrit :
On 7/20/2011 6:02 AM, Albert ARIBAUD wrote:
Le 19/07/2011 22:11, J. William Campbell a écrit :
If this is true, then it means that the cache is of type write-back (as opposed to write-through). From a (very brief) look at the ARM manuals, it appears that both types of cache may be present in the CPU. Do you know how this operates?
Usually, copy-back (rather than write-back) and write-through are modes of operation, not cache types.
Hi Albert, On some CPUs both cache modes are available. On many other CPUs (I would guess most), only one fixed mode is available, not both. I have always seen the two modes described as write-back and write-through, but I am sure we are talking about the same things.
We are. Copy-back is another name for write-back, not used by ARM but by some others.
The examples with both modes that I am familiar with have the mode as a "global" setting; it is not controlled by bits in the TLB or anything like that. How does it work on ARM? Is it globally fixed, globally controlled, or controlled by memory management?
Well, it's a bit complicated, because it depends on the architecture version *and* implementation -- ARM themselves do not mandate things, and it is up to the SoC designer to specify what cache they want and what mode it supports, both at L1 and L2, in their specific instance of ARM cores. And yes, you can have memory areas that are write-back and others that are write-through in the same system.
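As an illustration of the per-region control: on ARMv4/v5 cores using the classic short-descriptor MMU format, the C (cacheable) and B (bufferable) bits of a first-level section descriptor select the mode per 1 MiB region. A minimal sketch of building such a descriptor (the macro names are mine, not U-Boot's, and the domain field is left at zero for brevity):

```c
#include <assert.h>
#include <stdint.h>

/* ARMv4/v5 first-level "section" descriptor bits (short-descriptor format).
 * The C (cacheable) and B (bufferable) bits select the memory type:
 *   C=0 B=0 -> non-cacheable
 *   C=1 B=0 -> write-through
 *   C=1 B=1 -> write-back
 */
#define SECTION_TYPE   0x2u        /* bits [1:0] = 0b10 mark a section entry */
#define SECTION_B      (1u << 2)   /* bufferable */
#define SECTION_C      (1u << 3)   /* cacheable */
#define SECTION_SBO    (1u << 4)   /* "should be one" on ARMv4/v5 */
#define SECTION_AP_RW  (3u << 10)  /* AP = 0b11: full read/write access */

/* Build a 1 MiB section descriptor mapping physical 'base' with the
 * requested C/B combination (domain 0 assumed). */
static uint32_t mk_section(uint32_t base, uint32_t cb_bits)
{
    return (base & 0xFFF00000u) | SECTION_AP_RW | SECTION_SBO
         | cb_bits | SECTION_TYPE;
}
```

With this, one region can be mapped `SECTION_C | SECTION_B` (write-back) while another is mapped `SECTION_C` only (write-through) or neither (non-cacheable), which is exactly the per-area flexibility described above.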
If it is controlled by memory management, it looks to me like lots of problems could be avoided by operating with input buffers set as write-through. One probably isn't going to be writing to input buffers much under program control anyway, so the performance loss should be minimal. This gets rid of the alignment restrictions on these buffers, but not the invalidate/flush requirements.
There's not much you can do about alignment issues except align to cache line boundaries.
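A tiny helper pair makes the boundary issue concrete: rounding a range outward to whole lines is harmless for a flush, but rounding an *invalidate* outward can destroy a neighbour's dirty data whenever the buffer does not start and end exactly on line boundaries, which is why the buffers themselves need line alignment. A sketch, assuming a 32-byte line (real code would query the core's actual line size):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cache line size; typical ARM cores use 32 or 64 bytes. */
#define CACHE_LINE 32u

/* Round an address down / up to a cache-line boundary.  Maintenance
 * operations work on whole lines, so a range [addr, addr+len) must be
 * widened to [line_floor(addr), line_ceil(addr+len)) before flushing.
 * For an invalidate, any bytes pulled in by this widening that belong
 * to *other* data are lost -- hence the need for aligned buffers. */
static uintptr_t line_floor(uintptr_t addr)
{
    return addr & ~(uintptr_t)(CACHE_LINE - 1);
}

static uintptr_t line_ceil(uintptr_t addr)
{
    return (addr + CACHE_LINE - 1) & ~(uintptr_t)(CACHE_LINE - 1);
}
```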
However, if memory management is required to set the cache mode, it might be best to operate with the buffers and descriptors un-cached. That gets rid of the flush/invalidate requirement at the expense of slowing down copying from read buffers.
That makes 'best' a subjective choice, doesn't it? :)
Hi All, Yes, it probably depends on the usage.
Probably a reasonable price to pay for the associated simplicity.
Others would say that spending some time setting up alignments and flushes and invalidates is a reasonable price to pay for increased performance... That's an open debate where no solution is The Right One(tm).
For instance, consider the TFTP image reading. People would like the image to end up in cached memory because we'll do some checksumming on it before we give it control, and having it cached makes this step considerably faster; but we lose that if we put it in non-cached memory because it comes through the Ethernet controller's DMA; and it would be worse to receive packets in non-cached memory only to move their contents into cached memory later on.
I think properly aligning descriptors and buffers is enough to avoid the mixed flush/invalidate line issue, and wisely putting instruction barriers should be enough to get the added performance of cache without too much of the hassle of memory management.
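The ordering being described (flush before starting TX DMA, invalidate before the CPU reads a received buffer, with a barrier between the maintenance op and the next step) can be sketched with stub functions that merely record the call sequence. In real U-Boot code these would be flush_dcache_range(), invalidate_dcache_range() and a DSB; everything else here is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Host-side sketch: the maintenance/barrier/DMA functions are stubs
 * that just record the order in which they were called. */
static const char *trace[8];
static size_t ntrace;
static void record(const char *op) { trace[ntrace++] = op; }

static void flush_dcache_range(unsigned long s, unsigned long e)
{ (void)s; (void)e; record("flush"); }
static void invalidate_dcache_range(unsigned long s, unsigned long e)
{ (void)s; (void)e; record("inval"); }
static void barrier(void)               { record("barrier"); }
static void start_dma_tx(unsigned long b) { (void)b; record("dma_tx"); }

/* TX path: write the packet, push it out of the cache, and make sure
 * the flush has completed before the DMA engine is kicked. */
static void send_packet(unsigned long buf)
{
    flush_dcache_range(buf, buf + 1536);
    barrier();
    start_dma_tx(buf);
}

/* RX path: once DMA completion is known, discard any stale cached
 * copy of the buffer before the CPU reads it. */
static void receive_packet(unsigned long buf)
{
    invalidate_dcache_range(buf, buf + 1536);
    barrier();
    /* now safe to read buf */
}
```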
I am pretty sure that all the drivers read the input data into intermediate buffers in all cases. There is no practical way to be sure the next packet received is the "right one" for the TFTP transfer. Plus there are headers involved, and there is no way to ensure that a TFTP destination is located on a sector boundary. In short, you are going to copy from an input buffer to a destination anyway.

It is still true that copying from a non-cached area is slower than from a cached one, because of burst reads vs. individual reads. However, I doubt that the U-Boot user can tell the difference, as the network latency will far exceed the difference in copy time. The question is which is easier to do, and that is probably a matter of opinion. It is safe to say that so far a cached solution has eluded us. That may be changing, but it would still be nice to know how to allocate a section of un-cached RAM on the ARM processor, insofar as the question has a single answer! That would allow easy portability of drivers that do not know about caches, of which there seem to be many.
I agree. Unfortunately, my time is up for now, and I can't go on trying to fix this driver. Maybe I'll pick it up after my vacation. For now I have settled for the ugly solution of keeping the dcache disabled while Ethernet is being used :-( IMHO, doing cache maintenance all over the driver is not an easy or nice solution. Implementing a non-cached memory pool in the MMU and a corresponding dma_malloc() sounds much more universally applicable to any driver.
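The dma_malloc() idea could be as simple as a bump allocator over a pool that the MMU setup is assumed to have mapped non-cacheable; drivers would then never need maintenance ops on those buffers. A minimal sketch, with hypothetical names and sizes:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of a dma_malloc(): a trivial bump allocator over a pool that
 * the MMU setup is assumed to have mapped non-cacheable.  The pool
 * size and alignment are illustrative, not real U-Boot values. */
#define DMA_POOL_SIZE 4096
#define DMA_ALIGN     32   /* assumed cache line size */

static uint8_t dma_pool[DMA_POOL_SIZE] __attribute__((aligned(DMA_ALIGN)));
static size_t dma_off;

/* Return a cache-line-aligned block from the non-cached pool,
 * or NULL when the pool is exhausted.  No free(): suits the typical
 * allocate-once descriptor/buffer pattern of boot-loader drivers. */
static void *dma_malloc(size_t size)
{
    size_t off = (dma_off + DMA_ALIGN - 1) & ~(size_t)(DMA_ALIGN - 1);

    if (off + size > DMA_POOL_SIZE)
        return NULL;
    dma_off = off + size;
    return &dma_pool[off];
}
```

The allocate-only design is a deliberate simplification: descriptors and DMA buffers in a boot loader are usually allocated once at driver init and never released, so a bump pointer avoids free-list bookkeeping entirely.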
Best regards,