[U-Boot] [PATCH 0/3] pcnet driver fixes

This series fixes issues with the pcnet driver & its memory accesses.
Previously the network interface on the MIPS Malta board was unreliable & would often time out whilst transferring files via TFTP, but with this series applied it is now stable.
Paul Burton (3):
  pcnet: access descriptor rings & init block uncached
  pcnet: align rx buffers for cache invalidation
  pcnet: force ordering of descriptor accesses
 drivers/net/pcnet.c | 122 ++++++++++++++++++++++++++++------------------------
 1 file changed, 65 insertions(+), 57 deletions(-)

The prior accesses to the descriptor rings & init block via cached memory had a few issues:
- The memory needs cache flushes or invalidation at the appropriate times, but was not necessarily aligned on cache line boundaries. This could lead to data being incorrectly lost or written back to RAM at the wrong time.
- There are points where ordering of writes to the memory is important, but because it's cached memory the pcnet controller would see cache lines written back ordered by address. This could occasionally lead to hardware seeing descriptors in an incorrect state.
- Flushing the cache constantly is inefficient.
So, to avoid all of those issues simply access the descriptors & init block via uncached memory. The MIPS-specific UNCACHED_SDRAM macro is used to do this (retrieving an address in kseg1) as I could see no existing generic solution. Since the MIPS Malta board is the only user of the pcnet driver, hopefully this doesn't matter.
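For readers unfamiliar with the trick, the allocation pattern boils down to something like the sketch below. This is only an illustration of the idea described above, using the same U-Boot helpers the patch relies on (memalign, flush_dcache_range and the MIPS-specific UNCACHED_SDRAM macro); the function name is invented for the example.

struct pcnet_uncached_priv *pcnet_alloc_uncached(void)
{
        u32 addr;

        /* allocate the descriptor/init block area cache-line aligned */
        addr = (u32)memalign(ARCH_DMA_MINALIGN,
                             sizeof(struct pcnet_uncached_priv));
        if (!addr)
                return NULL;

        /* write back & discard any cache lines covering the area */
        flush_dcache_range(addr, addr + sizeof(struct pcnet_uncached_priv));

        /* from here on, only touch it through its uncached kseg1 alias */
        return (struct pcnet_uncached_priv *)UNCACHED_SDRAM(addr);
}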
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
---
 drivers/net/pcnet.c | 66 ++++++++++++++++++++++++++++-------------------------
 1 file changed, 35 insertions(+), 31 deletions(-)

diff --git a/drivers/net/pcnet.c b/drivers/net/pcnet.c
index 71a3110..8c334c7 100644
--- a/drivers/net/pcnet.c
+++ b/drivers/net/pcnet.c
@@ -71,10 +71,14 @@ struct pcnet_init_block {
         u32 reserved2;
 };

-typedef struct pcnet_priv {
+struct pcnet_uncached_priv {
         struct pcnet_rx_head rx_ring[RX_RING_SIZE];
         struct pcnet_tx_head tx_ring[TX_RING_SIZE];
         struct pcnet_init_block init_block;
+};
+
+typedef struct pcnet_priv {
+        struct pcnet_uncached_priv *uc;
         /* Receive Buffer space */
         unsigned char rx_buf[RX_RING_SIZE][PKT_BUF_SZ + 4];
         int cur_rx;
@@ -283,6 +287,7 @@ static int pcnet_probe(struct eth_device *dev, bd_t *bis, int dev_nr)

 static int pcnet_init(struct eth_device *dev, bd_t *bis)
 {
+        struct pcnet_uncached_priv *uc;
         int i, val;
         u32 addr;

@@ -325,24 +330,31 @@ static int pcnet_init(struct eth_device *dev, bd_t *bis)
                 addr = (u32)malloc(sizeof(pcnet_priv_t) + 0x10);
                 addr = (addr + 0xf) & ~0xf;
                 lp = (pcnet_priv_t *)addr;
+
+                addr = (u32)memalign(ARCH_DMA_MINALIGN, sizeof(*lp->uc));
+                flush_dcache_range(addr, addr + sizeof(*lp->uc));
+                addr = UNCACHED_SDRAM(addr);
+                lp->uc = (struct pcnet_uncached_priv *)addr;
         }

-        lp->init_block.mode = cpu_to_le16(0x0000);
-        lp->init_block.filter[0] = 0x00000000;
-        lp->init_block.filter[1] = 0x00000000;
+        uc = lp->uc;
+
+        uc->init_block.mode = cpu_to_le16(0x0000);
+        uc->init_block.filter[0] = 0x00000000;
+        uc->init_block.filter[1] = 0x00000000;

         /*
          * Initialize the Rx ring.
          */
         lp->cur_rx = 0;
         for (i = 0; i < RX_RING_SIZE; i++) {
-                lp->rx_ring[i].base = PCI_TO_MEM_LE(dev, lp->rx_buf[i]);
-                lp->rx_ring[i].buf_length = cpu_to_le16(-PKT_BUF_SZ);
-                lp->rx_ring[i].status = cpu_to_le16(0x8000);
+                uc->rx_ring[i].base = PCI_TO_MEM_LE(dev, lp->rx_buf[i]);
+                uc->rx_ring[i].buf_length = cpu_to_le16(-PKT_BUF_SZ);
+                uc->rx_ring[i].status = cpu_to_le16(0x8000);
                 PCNET_DEBUG1
                     ("Rx%d: base=0x%x buf_length=0x%hx status=0x%hx\n", i,
-                     lp->rx_ring[i].base, lp->rx_ring[i].buf_length,
-                     lp->rx_ring[i].status);
+                     uc->rx_ring[i].base, uc->rx_ring[i].buf_length,
+                     uc->rx_ring[i].status);
         }

         /*
@@ -351,34 +363,34 @@ static int pcnet_init(struct eth_device *dev, bd_t *bis)
          */
         lp->cur_tx = 0;
         for (i = 0; i < TX_RING_SIZE; i++) {
-                lp->tx_ring[i].base = 0;
-                lp->tx_ring[i].status = 0;
+                uc->tx_ring[i].base = 0;
+                uc->tx_ring[i].status = 0;
         }

         /*
          * Setup Init Block.
          */
-        PCNET_DEBUG1("Init block at 0x%p: MAC", &lp->init_block);
+        PCNET_DEBUG1("Init block at 0x%p: MAC", &lp->uc->init_block);

         for (i = 0; i < 6; i++) {
-                lp->init_block.phys_addr[i] = dev->enetaddr[i];
-                PCNET_DEBUG1(" %02x", lp->init_block.phys_addr[i]);
+                lp->uc->init_block.phys_addr[i] = dev->enetaddr[i];
+                PCNET_DEBUG1(" %02x", lp->uc->init_block.phys_addr[i]);
         }

-        lp->init_block.tlen_rlen = cpu_to_le16(TX_RING_LEN_BITS |
+        uc->init_block.tlen_rlen = cpu_to_le16(TX_RING_LEN_BITS |
                                                RX_RING_LEN_BITS);
-        lp->init_block.rx_ring = PCI_TO_MEM_LE(dev, lp->rx_ring);
-        lp->init_block.tx_ring = PCI_TO_MEM_LE(dev, lp->tx_ring);
-        flush_dcache_range((unsigned long)lp, (unsigned long)&lp->rx_buf);
+        uc->init_block.rx_ring = PCI_TO_MEM_LE(dev, uc->rx_ring);
+        uc->init_block.tx_ring = PCI_TO_MEM_LE(dev, uc->tx_ring);

         PCNET_DEBUG1("\ntlen_rlen=0x%x rx_ring=0x%x tx_ring=0x%x\n",
-                     lp->init_block.tlen_rlen,
-                     lp->init_block.rx_ring, lp->init_block.tx_ring);
+                     uc->init_block.tlen_rlen,
+                     uc->init_block.rx_ring, uc->init_block.tx_ring);

         /*
          * Tell the controller where the Init Block is located.
          */
-        addr = PCI_TO_MEM(dev, &lp->init_block);
+        barrier();
+        addr = PCI_TO_MEM(dev, &lp->uc->init_block);
         pcnet_write_csr(dev, 1, addr & 0xffff);
         pcnet_write_csr(dev, 2, (addr >> 16) & 0xffff);

@@ -408,7 +420,7 @@ static int pcnet_init(struct eth_device *dev, bd_t *bis)
 static int pcnet_send(struct eth_device *dev, void *packet, int pkt_len)
 {
         int i, status;
-        struct pcnet_tx_head *entry = &lp->tx_ring[lp->cur_tx];
+        struct pcnet_tx_head *entry = &lp->uc->tx_ring[lp->cur_tx];

         PCNET_DEBUG2("Tx%d: %d bytes from 0x%p ", lp->cur_tx, pkt_len,
                      packet);
@@ -418,8 +430,6 @@ static int pcnet_send(struct eth_device *dev, void *packet, int pkt_len)

         /* Wait for completion by testing the OWN bit */
         for (i = 1000; i > 0; i--) {
-                invalidate_dcache_range((unsigned long)entry,
-                                        (unsigned long)entry + sizeof(*entry));
                 status = le16_to_cpu(entry->status);
                 if ((status & 0x8000) == 0)
                         break;
@@ -442,8 +452,6 @@ static int pcnet_send(struct eth_device *dev, void *packet, int pkt_len)
         entry->misc = 0x00000000;
         entry->base = PCI_TO_MEM_LE(dev, packet);
         entry->status = cpu_to_le16(status);
-        flush_dcache_range((unsigned long)entry,
-                           (unsigned long)entry + sizeof(*entry));

         /* Trigger an immediate send poll. */
         pcnet_write_csr(dev, 0, 0x0008);
@@ -463,9 +471,7 @@ static int pcnet_recv (struct eth_device *dev)
         u16 status;

         while (1) {
-                entry = &lp->rx_ring[lp->cur_rx];
-                invalidate_dcache_range((unsigned long)entry,
-                                        (unsigned long)entry + sizeof(*entry));
+                entry = &lp->uc->rx_ring[lp->cur_rx];
                 /*
                  * If we own the next entry, it's a new packet. Send it up.
                  */
@@ -505,8 +511,6 @@ static int pcnet_recv (struct eth_device *dev)
                         }
                 }
                 entry->status |= cpu_to_le16(0x8000);
-                flush_dcache_range((unsigned long)entry,
-                                   (unsigned long)entry + sizeof(*entry));

                 if (++lp->cur_rx >= RX_RING_SIZE)
                         lp->cur_rx = 0;

On Mon, Apr 07, 2014 at 04:41:46PM +0100, Paul Burton wrote:
The prior accesses to the descriptor rings & init block via cached memory had a few issues:
- The memory needs cache flushes or invalidation at the appropriate times, but was not necessarily aligned on cache line boundaries. This could lead to data being incorrectly lost or written back to RAM at the wrong time.
- There are points where ordering of writes to the memory is important, but because it's cached memory the pcnet controller would see cache lines written back ordered by address. This could occasionally lead to hardware seeing descriptors in an incorrect state.
- Flushing the cache constantly is inefficient.
So, to avoid all of those issues simply access the descriptors & init block via uncached memory. The MIPS-specific UNCACHED_SDRAM macro is used to do this (retrieving an address in kseg1) as I could see no existing generic solution. Since the MIPS Malta board is the only user of the pcnet driver, hopefully this doesn't matter.
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Applied to u-boot/master, thanks!

The RX buffers are invalidated when a packet is received; however, they were not suitably cache-line aligned. Allocate them separately from the pcnet_priv structure and align them to ARCH_DMA_MINALIGN in order to ensure suitable alignment for the cache invalidation, preventing anything else from being placed in the same lines & lost.
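In other words, a buffer that is going to be invalidated has to own its cache lines outright. The sketch below is only an illustration of that rule, not code from the patch; the helper names are invented, while memalign, ALIGN, ARCH_DMA_MINALIGN and invalidate_dcache_range are the usual U-Boot facilities the patch already uses.

/* both the start address and the size cover whole cache lines */
static unsigned char *pcnet_rx_buf_alloc(size_t bytes)
{
        return memalign(ARCH_DMA_MINALIGN, ALIGN(bytes, ARCH_DMA_MINALIGN));
}

/* after the controller has DMAed pkt_len bytes into buf */
static void pcnet_rx_buf_sync_for_cpu(unsigned char *buf, int pkt_len)
{
        /*
         * Safe: buf is line-aligned and the allocation covers whole lines,
         * so no unrelated data sits in the lines being discarded.
         */
        invalidate_dcache_range((unsigned long)buf,
                                (unsigned long)buf + pkt_len);
}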
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
---
 drivers/net/pcnet.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/net/pcnet.c b/drivers/net/pcnet.c
index 8c334c7..2802411 100644
--- a/drivers/net/pcnet.c
+++ b/drivers/net/pcnet.c
@@ -80,7 +80,7 @@ struct pcnet_uncached_priv {
 typedef struct pcnet_priv {
         struct pcnet_uncached_priv *uc;
         /* Receive Buffer space */
-        unsigned char rx_buf[RX_RING_SIZE][PKT_BUF_SZ + 4];
+        unsigned char (*rx_buf)[RX_RING_SIZE][PKT_BUF_SZ + 4];
         int cur_rx;
         int cur_tx;
 } pcnet_priv_t;
@@ -335,6 +335,10 @@ static int pcnet_init(struct eth_device *dev, bd_t *bis)
                 flush_dcache_range(addr, addr + sizeof(*lp->uc));
                 addr = UNCACHED_SDRAM(addr);
                 lp->uc = (struct pcnet_uncached_priv *)addr;
+
+                addr = (u32)memalign(ARCH_DMA_MINALIGN, sizeof(*lp->rx_buf));
+                flush_dcache_range(addr, addr + sizeof(*lp->rx_buf));
+                lp->rx_buf = (void *)addr;
         }

         uc = lp->uc;
@@ -348,7 +352,7 @@ static int pcnet_init(struct eth_device *dev, bd_t *bis)
          */
         lp->cur_rx = 0;
         for (i = 0; i < RX_RING_SIZE; i++) {
-                uc->rx_ring[i].base = PCI_TO_MEM_LE(dev, lp->rx_buf[i]);
+                uc->rx_ring[i].base = PCI_TO_MEM_LE(dev, (*lp->rx_buf)[i]);
                 uc->rx_ring[i].buf_length = cpu_to_le16(-PKT_BUF_SZ);
                 uc->rx_ring[i].status = cpu_to_le16(0x8000);
                 PCNET_DEBUG1
@@ -467,6 +471,7 @@ static int pcnet_send(struct eth_device *dev, void *packet, int pkt_len)
 static int pcnet_recv (struct eth_device *dev)
 {
         struct pcnet_rx_head *entry;
+        unsigned char *buf;
         int pkt_len = 0;
         u16 status;

@@ -500,14 +505,12 @@ static int pcnet_recv (struct eth_device *dev)
                                 printf("%s: Rx%d: invalid packet length %d\n",
                                        dev->name, lp->cur_rx, pkt_len);
                         } else {
-                                invalidate_dcache_range(
-                                        (unsigned long)lp->rx_buf[lp->cur_rx],
-                                        (unsigned long)lp->rx_buf[lp->cur_rx] +
-                                        pkt_len);
-                                NetReceive(lp->rx_buf[lp->cur_rx], pkt_len);
+                                buf = (*lp->rx_buf)[lp->cur_rx];
+                                invalidate_dcache_range((unsigned long)buf,
+                                        (unsigned long)buf + pkt_len);
+                                NetReceive(buf, pkt_len);
                                 PCNET_DEBUG2("Rx%d: %d bytes from 0x%p\n",
-                                             lp->cur_rx, pkt_len,
-                                             lp->rx_buf[lp->cur_rx]);
+                                             lp->cur_rx, pkt_len, buf);
                         }
                 }
                 entry->status |= cpu_to_le16(0x8000);

On Mon, Apr 07, 2014 at 04:41:47PM +0100, Paul Burton wrote:
The RX buffers are invalidated when a packet is received; however, they were not suitably cache-line aligned. Allocate them separately from the pcnet_priv structure and align them to ARCH_DMA_MINALIGN in order to ensure suitable alignment for the cache invalidation, preventing anything else from being placed in the same lines & lost.
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Applied to u-boot/master, thanks!

The ordering of accesses to the rx & tx descriptors is important, yet the send & recv functions accessed them via regular structure accesses. This leaves the compiler with the opportunity to reorder those accesses or to hoist them outside of loops. Prevent that from happening by using readl & writel to access the descriptors. As a nice bonus, this removes the need for the driver to care about endianness.
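The pattern the patch moves to looks roughly like the sketch below. It is an illustration only; pcnet_tx_submit and the bus_addr parameter are invented for the example, while readw/writew/writel are the standard U-Boot MMIO accessors the patch uses.

static void pcnet_tx_submit(struct pcnet_tx_head *entry, u32 bus_addr,
                            int pkt_len)
{
        writew(-pkt_len, &entry->length);
        writel(0, &entry->misc);
        writel(bus_addr, &entry->base);
        /*
         * The accessors are volatile, so the compiler can neither reorder
         * them nor hoist them out of loops; the OWN bit is set last, only
         * once the rest of the descriptor is valid.
         */
        writew(0x8300, &entry->status);
}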
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
---
 drivers/net/pcnet.c | 37 +++++++++++++++++++------------------
 1 file changed, 19 insertions(+), 18 deletions(-)

diff --git a/drivers/net/pcnet.c b/drivers/net/pcnet.c
index 2802411..237fbba 100644
--- a/drivers/net/pcnet.c
+++ b/drivers/net/pcnet.c
@@ -434,7 +434,7 @@ static int pcnet_send(struct eth_device *dev, void *packet, int pkt_len)

         /* Wait for completion by testing the OWN bit */
         for (i = 1000; i > 0; i--) {
-                status = le16_to_cpu(entry->status);
+                status = readw(&entry->status);
                 if ((status & 0x8000) == 0)
                         break;
                 udelay(100);
@@ -451,11 +451,10 @@ static int pcnet_send(struct eth_device *dev, void *packet, int pkt_len)
          * Setup Tx ring. Caution: the write order is important here,
          * set the status with the "ownership" bits last.
          */
-        status = 0x8300;
-        entry->length = cpu_to_le16(-pkt_len);
-        entry->misc = 0x00000000;
-        entry->base = PCI_TO_MEM_LE(dev, packet);
-        entry->status = cpu_to_le16(status);
+        writew(-pkt_len, &entry->length);
+        writel(0, &entry->misc);
+        writel(PCI_TO_MEM(dev, packet), &entry->base);
+        writew(0x8300, &entry->status);

         /* Trigger an immediate send poll. */
         pcnet_write_csr(dev, 0, 0x0008);
@@ -473,34 +472,34 @@ static int pcnet_recv (struct eth_device *dev)
         struct pcnet_rx_head *entry;
         unsigned char *buf;
         int pkt_len = 0;
-        u16 status;
+        u16 status, err_status;

         while (1) {
                 entry = &lp->uc->rx_ring[lp->cur_rx];
                 /*
                  * If we own the next entry, it's a new packet. Send it up.
                  */
-                status = le16_to_cpu(entry->status);
+                status = readw(&entry->status);
                 if ((status & 0x8000) != 0)
                         break;
-                status >>= 8;
+                err_status = status >> 8;

-                if (status != 0x03) {        /* There was an error. */
+                if (err_status != 0x03) {        /* There was an error. */
                         printf("%s: Rx%d", dev->name, lp->cur_rx);
-                        PCNET_DEBUG1(" (status=0x%x)", status);
-                        if (status & 0x20)
+                        PCNET_DEBUG1(" (status=0x%x)", err_status);
+                        if (err_status & 0x20)
                                 printf(" Frame");
-                        if (status & 0x10)
+                        if (err_status & 0x10)
                                 printf(" Overflow");
-                        if (status & 0x08)
+                        if (err_status & 0x08)
                                 printf(" CRC");
-                        if (status & 0x04)
+                        if (err_status & 0x04)
                                 printf(" Fifo");
                         printf(" Error\n");
-                        entry->status &= le16_to_cpu(0x03ff);
+                        status &= 0x03ff;

                 } else {
-                        pkt_len = (le32_to_cpu(entry->msg_length) & 0xfff) - 4;
+                        pkt_len = (readl(&entry->msg_length) & 0xfff) - 4;
                         if (pkt_len < 60) {
                                 printf("%s: Rx%d: invalid packet length %d\n",
                                        dev->name, lp->cur_rx, pkt_len);
@@ -513,7 +512,9 @@ static int pcnet_recv (struct eth_device *dev)
                                              lp->cur_rx, pkt_len, buf);
                         }
                 }
-                entry->status |= cpu_to_le16(0x8000);
+
+                status |= 0x8000;
+                writew(status, &entry->status);

                 if (++lp->cur_rx >= RX_RING_SIZE)
                         lp->cur_rx = 0;

On Mon, Apr 07, 2014 at 04:41:48PM +0100, Paul Burton wrote:
The ordering of accesses to the rx & tx descriptors is important, yet the send & recv functions accessed them via regular structure accesses. This leaves the compiler with the opportunity to reorder those accesses or to hoist them outside of loops. Prevent that from happening by using readl & writel to access the descriptors. As a nice bonus, this removes the need for the driver to care about endianness.
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Applied to u-boot/master, thanks!

Hi Tom,
2014-04-07 17:41 GMT+02:00 Paul Burton <paul.burton@imgtec.com>:
This series fixes issues with the pcnet driver & its memory accesses.
Previously the network interface on the MIPS Malta board was unreliable & would often time out whilst transferring files via TFTP, but with this series applied it is now stable.
Paul Burton (3):
  pcnet: access descriptor rings & init block uncached
  pcnet: align rx buffers for cache invalidation
  pcnet: force ordering of descriptor accesses
In patchwork someone assigned all patches in this series to you. Is that intentional, or shall I pick them up for the MIPS tree?

On Thu, Apr 17, 2014 at 09:44:06PM +0200, Daniel Schwierzeck wrote:
Hi Tom,
2014-04-07 17:41 GMT+02:00 Paul Burton <paul.burton@imgtec.com>:
This series fixes issues with the pcnet driver & its memory accesses.
Previously the network interface on the MIPS Malta board was unreliable & would often time out whilst transferring files via TFTP, but with this series applied it is now stable.
Paul Burton (3):
  pcnet: access descriptor rings & init block uncached
  pcnet: align rx buffers for cache invalidation
  pcnet: force ordering of descriptor accesses
In patchwork someone assigned all patches in this series to you. Is that intentional, or shall I pick them up for the MIPS tree?
I grabbed them since they were networking related and Joe is a bit busy. As a rule of thumb, if you see something assigned to me in patchwork but you want to take it, feel free. I often grab things to make sure they don't get missed.