
On 10/1/2013 9:50 PM, Scott Wood wrote:
On Tue, 2013-10-01 at 14:38 +0300, Claudiu Manoil wrote:
On 10/1/2013 2:22 AM, Scott Wood wrote:
On Mon, 2013-09-30 at 12:44 +0300, Claudiu Manoil wrote:
+static RTXBD rtx __aligned(8); +#define RXBD(i) rtx.rxbd[i] +#define TXBD(i) rtx.txbd[i] +#define GET_BD_STAT(T, i) be16_to_cpu((__force __be16)T##BD(i).status) +#define SET_BD_STAT(T, i, v) T##BD(i).status = (__force __u16)cpu_to_be16(v) +#define GET_BD_BLEN(T, i) be16_to_cpu((__force __be16)T##BD(i).length) +#define SET_BD_BLEN(T, i, v) T##BD(i).length = (__force __u16)cpu_to_be16(v) +#define GET_BD_BPTR(T, i) be32_to_cpu((__force __be32)T##BD(i).bufptr) +#define SET_BD_BPTR(T, i, v) T##BD(i).bufptr = (__force __u32)cpu_to_be32(v)
Why the forcing? Can't you declare the data with __be16/__be32?
The __force annotation is obviously needed to suppress the sparse warnings due to BD data declaration type not being __bexx, but the generic uintxx_t, as you noticed as well. Now, why I didn't use __bexx instead? The main reason would be maintainability: the DMA descriptors may not be in big endian format exclusively. The eTSEC hw may actually have an option to interpret the DMA descriptors in little endian format.
May have or does have?
If it does have such a feature, do you plan to use it? Usually I have not seen such features used for (e.g.) little-endian PCI devices on big endian hosts.
I has that option, but I don't really plan to use it, clearly not for big endian hosts. The "may have" was for future little endian hosts. But I think this option is not really needed by the uboot driver so doing the byte swapping in software should be ok (i.e. performance wise).
What's wrong with:
for (t = 0; in_be16(&rtx.rxbd[rx_idx].status) & RXBD_EMPTY; t++) {
Or if you don't want to use I/O accessors on the DMA descriptors (Is synchronization ensured some other way? We had problems with this in the Linux driver before...):
Synchronization here is is insured by declaring the RTXBD structure type as "volatile" (see RTXBD declaration, a couple of lines above).
That does not achieve hardware synchronization, and even the effects on the compiler are questionable due to volatile's vague semantics.
The existing code has been working this way for quite a while on the mpc85xx platforms,
It was "working" for a while in Linux as well, until we encountered a workload where it didn't (though granted, there was no volatile in that case). See Linux commit 3b6330ce2a3e1f152f79a6203f73d23356e243a7
Good point, I guess it would be safer too use some memory barriers around accesses to BD fields in the uboot driver too. However some portable barriers would be needed, eieio() doesn't have an equivalent for ARM.
FWIW, I see some other places in U-Boot's TSEC driver that use out_be32() on the descriptors (e.g. startup_tsec after "Point to the buffer descriptors").
That's only to program the tbase and rbase registers with the physical address of the Tx/Rx BD rings' base.
so I thought it would be better not to change this approach. Using i/o accessors for the Buffer Descriptors would be a significant change, and I don't see how to argue such a change: why would the I/O accessors be better than the current approach - i.e. declaring the DMA descriptors as volatile? Is there something wrong with about volatile usage in this case? (afaik, this is a case were the volatile declaration is permitted)
Also, there doesn't seem to be other net drivers using I/O accessors for the BD rings.
I picked some random examples, and the first driver in Linux in which I could quickly find the BD rings uses I/O accessors (drivers/net/ethernet/realtek/8319too.c). I then checked its U-Boot eqivalent (drivers/net/rtl8139.c) and it also uses I/O accessors for the descriptors.
I actually meant accessing buffer descriptor fields (like status, length, dma address). As you can see in this example from linux 8319too.c's Rx routine, there's no I/O accessor involved:
/* read size+status of next frame from DMA ring buffer */ rx_status = le32_to_cpu (*(__le32 *) (rx_ring + ring_offset));
However the driver does use a rmb() just before this line.
for (t = 0; be16_to_cpup(&rtx.rxbd[rx_idx].status) & RXBD_EMPTY; t++) {
I opted for using macros instead due to code maintainability,
Obfuscatory macros do not help.
and to avoid overly long lines (80)
You could factor out an rxbd_empty() function, or factor that loop out to be its own function, or have a local variable point to rtx.rxbd[rx_idx]...
and to better control these changes. For instance, if etsec were to access it's BDs in little endian format in the future,
Either don't do that (preferred option), or at that point add tsec16_to_cpup() and friends.
I'll have a try with the portable I/O accessors (in_be, out_be) to see how it works that way.
Thanks, Claudiu