
On 08/18/2014 02:00 AM, Thierry Reding wrote:
From: Thierry Reding <treding@nvidia.com>
This series attempts to fix a long-standing problem in the rtl8169 driver (though the same problem may exist in other drivers as well). Let me first explain what exactly the issue is:
The rtl8169 driver provides a set of RX and TX descriptors for the device to use. Once they're set up, the device is told about their location so that it can fetch the descriptors using DMA. The device will also write packet state back into these descriptors using DMA. For this to work properly, whenever a driver needs to access these descriptors it needs to invalidate the D-cache line(s) associated with them. Similarly when changes to the descriptor have been made by the driver, the cache lines need to be flushed to make sure the changes are visible to the device.
The descriptors are 16 bytes in size. This causes problems when used on CPUs that have a cache-line size that is larger than 16 bytes. One example is the NVIDIA Tegra124 which has 64-byte cache-lines. That means that 4 descriptors fit into a single cache-line. So whenever the driver flushes a cache-line it has the potential to discard changes made to another descriptor by the DMA device. One typical symptom is that large transfers over TFTP will often not complete and hang somewhere midway because the device marked a packet as received, but the driver then flushed the cache-line and overwrote that status, causing the packet to be lost.
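To make the arithmetic concrete, here is a small host-side sketch (not driver code) that maps a descriptor index to the cache-line it lands in. The helper names are made up for illustration, and the ring base is assumed to be cache-line aligned.

```c
#include <assert.h>
#include <stddef.h>

#define DESC_SIZE        16u	/* descriptor size, as stated above */
#define CACHE_LINE_SIZE  64u	/* Tegra124 cache-line size, as stated above */

/* The sharing problem only exists when several descriptors fit in one line. */
static_assert(CACHE_LINE_SIZE % DESC_SIZE == 0,
	      "cache line must hold a whole number of descriptors");

/*
 * Hypothetical helper: index of the cache-line holding descriptor idx,
 * counted from the ring base (assumed to be cache-line aligned).
 */
size_t desc_cache_line(size_t idx)
{
	return (idx * DESC_SIZE) / CACHE_LINE_SIZE;
}

/* Hypothetical helper: non-zero if descriptors a and b share a cache-line. */
int desc_share_line(size_t a, size_t b)
{
	return desc_cache_line(a) == desc_cache_line(b);
}
```

With these numbers descriptors 0 to 3 all map to line 0, so cache maintenance on any one of them also touches the other three, while descriptor 4 starts the next line.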
Since the descriptors need to be consecutive in memory, I don't see a way to fix this other than to use uncached memory. Therefore the solution proposed in this patch series is to introduce a mechanism in U-Boot to allow a driver to allocate from a pool of uncached memory. Currently an implementation is provided only for ARM v7. The idea is that a region (of user-definable size) immediately below (taking into account architecture-specific alignment restrictions) the malloc() area is mapped uncacheable in the MMU. A driver can use the new noncached_alloc() function to allocate a chunk of memory from this pool dynamically for buffers that it can't or doesn't want to do any explicit cache maintenance on, yet that need to be shared with DMA devices.
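As a rough illustration of the pool idea, the following is a minimal bump allocator that can be run on a host. It is only a sketch: pool_init() and pool_alloc() are hypothetical names, the (size, align) parameters are an assumption rather than the signature of noncached_alloc(), and nothing here touches the MMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool state: bounds of the region and the next free address. */
static uintptr_t pool_end, pool_next;

/* Hypothetical: record the bounds of a region that is already mapped uncached. */
void pool_init(uintptr_t start, size_t size)
{
	pool_end = start + size;
	pool_next = start;
}

/*
 * Hypothetical: hand out size bytes aligned to align (a power of two).
 * Returns 0 when the pool is exhausted. Memory is never returned to the pool.
 */
uintptr_t pool_alloc(size_t size, size_t align)
{
	uintptr_t addr;

	assert(align != 0 && (align & (align - 1)) == 0);

	/* Round the next free address up to the requested alignment. */
	addr = (pool_next + align - 1) & ~((uintptr_t)align - 1);
	if (addr > pool_end || size > pool_end - addr)
		return 0;

	pool_next = addr + size;
	return addr;
}
```

A driver in this model would allocate its whole descriptor ring with a single call, so the descriptors stay consecutive in memory and never need explicit flushes or invalidates.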
Patches 1-3 are minor preparatory work. Patch 1 cleans up some coding style issues in the ARM v7 cache code and patch 2 uses more future-proof types for the mmu_set_region_dcache_behaviour() function arguments. Patch 3 is purely for debugging purposes. It will print out the region used by malloc() when DEBUG is enabled. This can be useful to see where the malloc() region is in the memory map (compared to the noncached region introduced in a later patch for example).
Patch 4 implements the noncached API for ARM v7. It obtains the start of the malloc() area and places the noncached region immediately below it so that noncached_alloc() can allocate from it. During boot, the noncached area will be set up immediately after malloc() has been initialized.
Patch 5 enables noncached memory for all Tegra boards. It uses a 1 MiB chunk which should be plenty (it's also the minimum on ARM v7 because it matches the MMU section size and therefore the granularity at which U-Boot can set the cacheable attributes).
If LPAE were to be enabled, the minimum would be 2 MiB, but I suppose we can deal with that if/when the time comes.
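The placement and granularity described above can be sketched as a small address computation. This is an illustration under assumptions, not the code from patch 4: place_below() and struct region are hypothetical names, and the exact rounding used in the series may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: the region covers [start, end). */
struct region {
	uintptr_t start;
	uintptr_t end;
};

/*
 * Hypothetical: place a region of at least size bytes below malloc_start.
 * Both ends are aligned to section (a power of two), because cache
 * attributes can only be changed per MMU section.
 */
struct region place_below(uintptr_t malloc_start, size_t size,
			  uintptr_t section)
{
	struct region r;

	assert(section != 0 && (section & (section - 1)) == 0);

	/* The top of the region is the section boundary at or below malloc(). */
	r.end = malloc_start & ~(section - 1);
	/* The requested size is rounded up to a whole number of sections. */
	r.start = r.end - (((uintptr_t)size + section - 1) & ~(section - 1));

	return r;
}
```

For a malloc() area starting at 0x8ff12345 and a 4 KiB request, 1 MiB sections give the region 0x8fe00000 to 0x8ff00000, and 2 MiB sections give 0x8fc00000 to 0x8fe00000, which shows why the section size is the minimum size of the pool.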