
On 7/23/2011 6:04 AM, Albert ARIBAUD wrote:
On 21/07/2011 08:48, David Jander wrote:
However, it is still correct that copying from a non-cached area is slower than from a cached area, because of burst reads vs. individual reads. Even so, I doubt that the u-boot user can tell the difference, as the network latency will far exceed the difference in copy time.
That's assuming the cache is only for networking. There can be DMA engines in a lot of other peripherals which do not have the same latency as the network (and then, even for networking, TFTP can be done from a very nearby server, possibly even on the same Ethernet segment).
Hi All,

Yes, there are other uses of DMA. On a network, unless you have a Gigabit network, your memory access speed is at least an order of magnitude faster than the network, probably more. On top of that, the latency of sending the ack and the request for the next record undoubtedly swamps out any reduction in memory speed due to the single copy that takes place.

In the case of other devices, disks for example, the percentage effect is probably greater, but these devices are so fast anyway that the human-perceived slowdown is essentially nil. If we were talking about a CPU running Linux and doing all kinds of I/O all day long, the reduction in throughput might be 10%, and that might matter. In a boot loader that does I/O mostly to read in a program to replace itself, I would argue that nobody will notice the difference between cached and un-cached buffers. Counter-examples welcome, however.
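A quick back-of-the-envelope with made-up but plausible numbers: a 512-byte TFTP block takes about 512 B / 12.5 MB/s = 41 us on a 100 Mbit/s wire, an extra uncached copy at, say, 20 MB/s costs about 26 us, and the ack/next-block round trip on even a quiet LAN is a couple of hundred us, so the copy vanishes into the protocol latency.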
The question is which is easier to do, and that is probably a matter of opinion. However, it is safe to say that so far a cached solution has eluded us. That may be changing, but it would still be nice to know how to allocate a section of un-cached RAM on the ARM processor, in so far as the question has a single answer! That would allow easy portability of drivers that do not know about caches, of which there seem to be many.
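For the curious, here is roughly what that looks like with the classic ARM short-descriptor page table. This is only a sketch: page_table stands in for wherever the port keeps its 16KB first-level translation table, and flush_dcache_range() is the usual u-boot helper.

    /* Mark one 1MB first-level section as non-cacheable by clearing
     * the C and B bits of its descriptor (ARM short-descriptor
     * format). Sketch only, not code from any in-tree port. */
    void set_section_uncached(u32 *page_table, unsigned long addr)
    {
            u32 idx = addr >> 20;   /* one table entry per 1MB section */

            page_table[idx] &= ~((1 << 3) | (1 << 2));      /* C=0, B=0 */

            /* make the updated descriptor visible to the table walker,
             * then drop stale TLB entries */
            flush_dcache_range((unsigned long)&page_table[idx],
                               (unsigned long)&page_table[idx + 1]);
            asm volatile("mcr p15, 0, %0, c8, c7, 0" : : "r" (0));
    }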
That is one approach, which I think prevents cache from being used beyond caching pure CPU-used DRAM.
You are certainly correct there. However, I think the pure CPU-used RAM case is the one that matters most. Uncompressing and checksumming input data are typical u-boot operations that take significant time. The performance increase due to cache hits in these cases is huge, and easily perceptible by the user.
I agree. Unfortunately, my time is up for now, and I can't go on trying to fix this driver. Maybe I'll pick it up again after my vacation. For now I have settled for the ugly solution of keeping the dcache disabled while ethernet is being used :-(
Make sure you flush before disabling. :)
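In u-boot terms that pattern is only a few calls; some dcache_disable() implementations flush for you, but flushing explicitly first is the cautious version:

    flush_dcache_all();     /* push any dirty lines out to DRAM first */
    dcache_disable();
    /* ... do the ethernet transfers with the cache off ... */
    dcache_enable();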
IMHO, doing cache maintenance all over the driver is not an easy or nice solution. Implementing a non-cached memory pool in the MMU and a corresponding dma_malloc() sounds much more universally applicable to any driver.
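Something like the sketch below would already cover u-boot's needs, since boot-time drivers rarely free their buffers. The pool base and size are invented for illustration and would really come from the board configuration, with the region itself mapped non-cacheable in the MMU:

    #include <stddef.h>     /* size_t; in u-boot, common.h covers this */

    #define DMA_POOL_BASE   0x87f00000UL    /* mapped uncached via the MMU */
    #define DMA_POOL_SIZE   0x00100000UL    /* 1MB pool */

    static unsigned long dma_brk = DMA_POOL_BASE;

    /* Trivial bump allocator: cache-line align each buffer so that no
     * two buffers ever share a line, and never free. */
    void *dma_malloc(size_t size)
    {
            unsigned long p = (dma_brk + 31) & ~31UL;   /* 32-byte lines assumed */

            if (p + size > DMA_POOL_BASE + DMA_POOL_SIZE)
                    return NULL;
            dma_brk = p + size;
            return (void *)p;
    }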
I think cache maintenance is feasible if one makes sure the cached areas used by the driver are properly aligned, which simplifies things a lot: you don't have to care about combined flush-invalidates or just-in-time invalidates; you just flush before sending and invalidate before reading.
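With buffers aligned and sized in whole cache lines, the discipline reduces to one call per direction. flush_dcache_range(), invalidate_dcache_range() and ALIGN() are the standard u-boot names; the buffer variables are only for illustration:

    /* TX: push the CPU's writes out to DRAM so the DMA engine sees them */
    flush_dcache_range((unsigned long)tx_buf,
                       (unsigned long)tx_buf +
                       ALIGN(len, CONFIG_SYS_CACHELINE_SIZE));
    /* ... start the transmit DMA ... */

    /* RX: discard stale cache lines so the CPU sees what DMA wrote */
    invalidate_dcache_range((unsigned long)rx_buf,
                            (unsigned long)rx_buf +
                            ALIGN(rx_len, CONFIG_SYS_CACHELINE_SIZE));
    /* ... now it is safe to parse the received frame ... */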
I do agree it can be done. However, most (I think?) of the CPUs to which u-boot has been ported have cache-coherent DMA. As a result, cache issues for these CPUs are not addressed in the driver at all. Often this means that cache support is added after the fact by somebody other than the original author, who may not totally understand the original driver. If DMA buffers were always allocated from cache-coherent memory, either because the memory is un-cached or because the CPU's DMA is cache coherent, no changes would be necessary to get the driver working correctly. If performance ever became an issue in the un-cached case, then more work would be required, but in most cases I expect nobody will notice.
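That argues for one allocator that hides the platform difference from the driver entirely. A sketch, where CONFIG_SYS_DMA_COHERENT is a made-up symbol and memalign()/ARCH_DMA_MINALIGN are the usual u-boot names:

    /* Drivers call this everywhere. On DMA-coherent CPUs ordinary heap
     * memory is fine; elsewhere, fall back to an uncached pool such as
     * the dma_malloc() sketched earlier in the thread. */
    void *dma_buffer_alloc(size_t size)
    {
    #ifdef CONFIG_SYS_DMA_COHERENT
            return memalign(ARCH_DMA_MINALIGN, size);
    #else
            return dma_malloc(size);
    #endif
    }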
Best Regards, Bill Campbell
Best regards,
Kind regards,