Re: [U-Boot] [PATCH 7/9] net: tsec: Use portable types and accessors for BDs

2 Oct 2013


      On 10/1/2013 9:50 PM, Scott Wood wrote:
...
On Tue, 2013-10-01 at 14:38 +0300, Claudiu Manoil wrote:
...
On 10/1/2013 2:22 AM, Scott Wood wrote:
...
On Mon, 2013-09-30 at 12:44 +0300, Claudiu Manoil wrote:
...
+static RTXBD rtx __aligned(8);
+#define RXBD(i) rtx.rxbd[i]
+#define TXBD(i) rtx.txbd[i]
+#define GET_BD_STAT(T, i) be16_to_cpu((__force __be16)T##BD(i).status)
+#define SET_BD_STAT(T, i, v) T##BD(i).status = (__force __u16)cpu_to_be16(v)
+#define GET_BD_BLEN(T, i) be16_to_cpu((__force __be16)T##BD(i).length)
+#define SET_BD_BLEN(T, i, v) T##BD(i).length = (__force __u16)cpu_to_be16(v)
+#define GET_BD_BPTR(T, i) be32_to_cpu((__force __be32)T##BD(i).bufptr)
+#define SET_BD_BPTR(T, i, v) T##BD(i).bufptr = (__force __u32)cpu_to_be32(v)
Why the forcing?  Can't you declare the data with __be16/__be32?
The __force annotation is obviously needed to suppress the sparse
warnings due to BD data declaration type not being __bexx, but the
generic uintxx_t, as you noticed as well.
Now, why I didn't use __bexx instead?  The main reason would be
maintainability: the DMA descriptors may not be in big endian format
exclusively.  The eTSEC hw may actually have an option to interpret
the DMA descriptors in little endian format.
May have or does have?
If it does have such a feature, do you plan to use it?  Usually I have
not seen such features used for (e.g.) little-endian PCI devices on big
endian hosts.
I has that option, but I don't really plan to use it, clearly not for 
big endian hosts. The "may have" was for future little endian hosts.
But I think this option is not really needed by the uboot driver
so doing the byte swapping in software should be ok (i.e. performance
wise).
...
...
...
What's wrong with:
for (t = 0; in_be16(&rtx.rxbd[rx_idx].status) & RXBD_EMPTY; t++) {
Or if you don't want to use I/O accessors on the DMA descriptors (Is
synchronization ensured some other way?  We had problems with this in
the Linux driver before...):
Synchronization here is is insured by declaring the RTXBD structure
type as "volatile" (see RTXBD declaration, a couple of lines above).
That does not achieve hardware synchronization, and even the effects on
the compiler are questionable due to volatile's vague semantics.
...
The existing code has been working this way for quite a while on the
mpc85xx platforms,
It was "working" for a while in Linux as well, until we encountered a
workload where it didn't (though granted, there was no volatile in that
case).  See Linux commit 3b6330ce2a3e1f152f79a6203f73d23356e243a7
Good point, I guess it would be safer too use some memory barriers
around accesses to BD fields in the uboot driver too.  However some
portable barriers would be needed, eieio() doesn't have an equivalent
for ARM.
...
FWIW, I see some other places in U-Boot's TSEC driver that use
out_be32() on the descriptors (e.g. startup_tsec after "Point to the
buffer descriptors").
That's only to program the tbase and rbase registers with the physical
address of the Tx/Rx BD rings' base.
...
...
so I thought it would be better not to change this
approach.  Using i/o accessors for the Buffer Descriptors would be a
significant change, and I don't see how to argue such a change: why
would the I/O accessors be better than the current approach - i.e.
declaring the DMA descriptors as volatile?  Is there something wrong
with about volatile usage in this case? (afaik, this is a case were
the volatile declaration is permitted)
...
Also, there doesn't seem to be other net drivers using I/O accessors
for the BD rings.
I picked some random examples, and the first driver in Linux in which I
could quickly find the BD rings uses I/O accessors
(drivers/net/ethernet/realtek/8319too.c).  I then checked its U-Boot
eqivalent (drivers/net/rtl8139.c) and it also uses I/O accessors for the
descriptors.
I actually meant accessing buffer descriptor fields (like status, 
length, dma address). As you can see in this example from linux
8319too.c's Rx routine, there's no I/O accessor involved:
/* read size+status of next frame from DMA ring buffer */
rx_status = le32_to_cpu (*(__le32 *) (rx_ring + ring_offset));
However the driver does use a rmb() just before this line.
...
...
...
for (t = 0; be16_to_cpup(&rtx.rxbd[rx_idx].status) & RXBD_EMPTY; t++) {
I opted for using macros instead due to code maintainability,
Obfuscatory macros do not help.
...
and to avoid overly long lines (80)
You could factor out an rxbd_empty() function, or factor that loop out
to be its own function, or have a local variable point to
rtx.rxbd[rx_idx]...
...
and to better control these changes.
For instance, if etsec were to access it's BDs in little endian format
in the future,
Either don't do that (preferred option), or at that point add
tsec16_to_cpup() and friends.
I'll have a try with the portable I/O accessors (in_be, out_be) to see
how it works that way.
Thanks,
Claudiu