
From: Thierry Reding <treding@nvidia.com>
This series attempts to fix a long-standing problem in the rtl8169 driver (though the same problem may exist in other drivers as well). Let me first explain what exactly the issue is:
The rtl8169 driver provides a set of RX and TX descriptors for the device to use. Once they're set up, the device is told about their location so that it can fetch the descriptors using DMA. The device will also write packet state back into these descriptors using DMA. For this to work properly, whenever a driver needs to access these descriptors it needs to invalidate the D-cache line(s) associated with them. Similarly when changes to the descriptor have been made by the driver, the cache lines need to be flushed to make sure the changes are visible to the device.
The descriptors are 16 bytes in size. This causes problems on CPUs whose cache-line size is larger than 16 bytes. One example is the NVIDIA Tegra124, which has 64-byte cache-lines, so 4 descriptors fit into a single cache-line. Whenever the driver flushes a cache-line, it therefore has the potential to discard changes that the DMA device made to another descriptor in the same line. One typical symptom is that large transfers over TFTP will often not complete and hang somewhere midway, because the device marked a packet as received but the driver then flushed the cache-line, causing the packet to be lost.
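To make the arithmetic concrete, here is a small sketch that computes which cache-line a given descriptor falls into. The sizes come from the description above; the macro and function names are made up for illustration and do not appear in the driver, and the ring is assumed to start on a cache-line boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes from the text above; all names here are hypothetical. */
#define DESC_SIZE        16u  /* bytes per RX/TX descriptor */
#define CACHE_LINE_SIZE  64u  /* Tegra124 D-cache line size */

/* Index of the cache-line holding descriptor idx, assuming the ring
 * starts on a cache-line boundary. */
static size_t desc_cache_line(size_t idx)
{
	/* The sketch only covers descriptors that do not straddle lines. */
	assert(CACHE_LINE_SIZE % DESC_SIZE == 0);
	return (idx * DESC_SIZE) / CACHE_LINE_SIZE;
}

/* Nonzero if cache maintenance on descriptor a also touches descriptor b. */
static int desc_share_line(size_t a, size_t b)
{
	return desc_cache_line(a) == desc_cache_line(b);
}
```

With these sizes, descriptors 0 through 3 map to line 0 and descriptor 4 starts line 1, so a flush issued for descriptor 0 also writes back whatever stale copy of descriptors 1 to 3 the CPU holds.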
Since the descriptors need to be consecutive in memory, I don't see a way to fix this other than to use uncached memory. Therefore the solution proposed in this patch series is to introduce a mechanism in U-Boot to allow a driver to allocate from a pool of uncached memory. Currently an implementation is provided only for ARM v7. The idea is that a region (of user-definable size) immediately below the malloc() area (taking into account architecture-specific alignment restrictions) is mapped uncacheable in the MMU. A driver can use the new noncached_alloc() function to dynamically allocate a chunk of memory from this pool for buffers that need to be shared with DMA devices but on which it can't or doesn't want to do any explicit cache-maintenance.
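One simple way to hand out memory from such a pool is a bump allocator that never frees. The sketch below illustrates that idea only: the function name, its signature, and the static array standing in for the uncached region are all assumptions made for this example, not the code from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the uncached region. In U-Boot this would be a range of
 * memory mapped uncacheable in the MMU, not a static array. */
static uint8_t pool[4096];
static size_t pool_next; /* offset of the first free byte in pool */

/* Hypothetical signature: returns a chunk aligned to align (a power of
 * two), or NULL when the pool is exhausted. Nothing is ever freed. */
static void *noncached_alloc_sketch(size_t size, size_t align)
{
	uintptr_t base = (uintptr_t)pool;
	uintptr_t start;

	assert(align != 0 && (align & (align - 1)) == 0);

	/* Round the next free address up to the requested alignment. */
	start = (base + pool_next + align - 1) & ~((uintptr_t)align - 1);
	if (start - base > sizeof(pool) || size > sizeof(pool) - (start - base))
		return NULL;

	pool_next = (start - base) + size;
	return (void *)start;
}
```

A driver would call this once per descriptor ring during initialization. Because the rings live for as long as the driver does, the lack of a free operation is not a practical limitation in this sketch.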
Patches 1-3 are minor preparatory work. Patch 1 cleans up some coding style issues in the ARM v7 cache code and patch 2 uses more future-proof types for the mmu_set_region_dcache_behaviour() function arguments. Patch 3 is purely for debugging purposes. It will print out the region used by malloc() when DEBUG is enabled. This can be useful to see where the malloc() region is in the memory map (compared to the noncached region introduced in a later patch for example).
Patch 4 implements the noncached API for ARM v7. It obtains the start of the malloc() area and places the noncached region immediately below it so that noncached_alloc() can allocate from it. During boot, the noncached area will be set up immediately after malloc().
Patch 5 enables noncached memory for all Tegra boards. It uses a 1 MiB chunk which should be plenty (it's also the minimum on ARM v7 because it matches the MMU section size and therefore the granularity at which U-Boot can set the cacheable attributes).
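The placement described for patches 4 and 5 can be sketched as follows: take the start of the malloc() area, round it down to a section boundary, and reserve a whole number of 1 MiB sections below that. The struct, the function name, and the exact rounding are assumptions for illustration; the actual patch may handle the boundary case differently.

```c
#include <assert.h>
#include <stdint.h>

#define SECTION_SIZE (1u << 20) /* 1 MiB ARM v7 MMU section */

/* Half-open address range [start, end); hypothetical helper type. */
struct region {
	uintptr_t start;
	uintptr_t end;
};

/* Compute an uncached region of at least size bytes that ends at or
 * below malloc_start and is aligned to MMU sections on both sides. */
static struct region noncached_region(uintptr_t malloc_start, uintptr_t size)
{
	const uintptr_t mask = ~((uintptr_t)SECTION_SIZE - 1);
	struct region r;

	assert(size != 0);

	/* The region must not share a section with the malloc() area. */
	r.end = malloc_start & mask;
	/* Cacheability can only be changed in whole sections. */
	size = (size + SECTION_SIZE - 1) & mask;
	r.start = r.end - size;

	return r;
}
```

For example, with a malloc() area starting at 0x8fe12345 and a requested size of 1 MiB, this yields the range 0x8fd00000 to 0x8fe00000. Any request smaller than 1 MiB is rounded up to one full section, which is why 1 MiB is the minimum.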
Patch 6 is not really related but just something I stumbled across when going through the code. According to the top-level README file, network drivers are supposed to respect the CONFIG_SYS_RX_ETH_BUFFER. rtl8169 doesn't currently do that, so this patch fixes it.
Patch 7 is the result of earlier rework that still aimed at solving the problem using explicit cache maintenance. rtl8169 hardware requires buffers to be aligned to 256 byte boundaries. The rtl8169 driver used to employ some trickery to make that work, but nowadays there are macros that can be used to the same effect, so this patch uses them and gets rid of the custom trickery. This patch also prints out a warning if it detects a potential caching issue (i.e. ARCH_DMA_MINALIGN > sizeof(struct RxDesc)).
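The two checks mentioned for patch 7 are easy to state in code. The sketch below rounds an address up to the 256-byte boundary the hardware requires and tests the condition under which the warning would fire. The descriptor layout and all names are hypothetical stand-ins; the driver itself uses U-Boot's existing alignment macros rather than a helper like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RTL_ALIGN 256u /* buffer alignment required by the hardware */

/* Hypothetical 16-byte descriptor layout, for illustration only. */
struct rx_desc_sketch {
	uint32_t status;
	uint32_t vlan_tag;
	uint32_t buf_addr_lo;
	uint32_t buf_addr_hi;
};

/* Round addr up to the next multiple of align (a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/* Nonzero when the DMA alignment unit is larger than one descriptor,
 * i.e. when several descriptors can share a cache-line. */
static int desc_cache_hazard(size_t dma_minalign)
{
	return dma_minalign > sizeof(struct rx_desc_sketch);
}
```

On a platform with 64-byte cache-lines, desc_cache_hazard(64) is nonzero and the warning applies; with 16-byte lines it would not.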
Patch 8 finally adds optional support for non-cached memory. When available the driver will now use the noncached API to obtain uncached buffers for the RX and TX descriptor rings. At the same time the cache-maintenance functions for the RX and TX descriptors become no-ops so that the code can work with or without the noncached API available.
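The "no-op when uncached" pattern can be sketched with a compile-time switch. Everything named here is hypothetical: HAVE_NONCACHED_SKETCH stands in for whatever configuration option the series uses, and cache_flush_range_sketch() is a counting stub in place of a real cache operation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical switch: 1 when an uncached pool is available. */
#ifndef HAVE_NONCACHED_SKETCH
#define HAVE_NONCACHED_SKETCH 1
#endif

unsigned int flush_count; /* number of real cache operations issued */

/* Stub standing in for an actual cache flush of [start, start + len). */
void cache_flush_range_sketch(const void *start, size_t len)
{
	(void)start;
	(void)len;
	flush_count++;
}

/* Make driver writes to a descriptor visible to the device. */
static void desc_flush(const void *desc, size_t len)
{
	assert(desc != NULL);
#if HAVE_NONCACHED_SKETCH
	/* Descriptor lives in uncached memory: nothing to write back. */
	(void)len;
#else
	cache_flush_range_sketch(desc, len);
#endif
}
```

The rest of the driver calls desc_flush() unconditionally, so the same code path works whether or not uncached memory is configured.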
With all of the above in place, patch 9 adds support for RTL-8168/8111g as found on the NVIDIA Jetson TK1 board (which has a Tegra124 SoC).
Note that this series also fixes the sporadic hangs of large TFTP transfers on earlier Tegra SoC generations (Tegra20 and Tegra30). The hangs were less frequent there, probably because the cache-lines are 32 bytes rather than 64.
Thierry
Thierry Reding (9):
  ARM: cache_v7: Various minor cleanups
  ARM: cache-cp15: Use unsigned long for address and size
  malloc: Output region when debugging
  ARM: Implement non-cached memory support
  ARM: tegra: Enable non-cached memory
  net: rtl8169: Honor CONFIG_SYS_RX_ETH_BUFFER
  net: rtl8169: Properly align buffers
  net: rtl8169: Use non-cached memory if available
  net: rtl8169: Add support for RTL-8168/8111g
 README                         |  16 ++++++
 arch/arm/cpu/armv7/cache_v7.c  |  14 +++---
 arch/arm/include/asm/system.h  |   7 ++-
 arch/arm/lib/cache-cp15.c      |   6 +--
 arch/arm/lib/cache.c           |  41 +++++++++++++++
 common/board_r.c               |  11 +++++
 common/dlmalloc.c              |   3 ++
 drivers/net/rtl8169.c          | 110 ++++++++++++++++++++++++++++++-----------
 include/configs/tegra-common.h |   1 +
 9 files changed, 168 insertions(+), 41 deletions(-)