[PATCH 0/1] Workaround for timeout error in NETSEC driver

This patch adds a workaround for a timeout error that the NETSEC driver hits when sending packets.
The NETSEC driver occasionally reports a "netsec_wait_while_busy" timeout error and fails to send packets, as shown below:
====================
U-Boot 2023.01 (Jan 09 2023 - 16:07:33 +0000)
CPU: SC2A11:Cortex-A53 MPCore 24cores
Model: Socionext Developer Box
DRAM: 1.9 GiB (effective 63.9 GiB)
optee optee: OP-TEE: revision 3.20 (8e74d476)
I/TC: Reserved shared memory is enabled
I/TC: Dynamic shared memory is enabled
I/TC: Normal World virtualization support is disabled
I/TC: Asynchronous notifications are disabled
Core: 24 devices, 19 uclasses, devicetree: separate
MMC: sdhci@52300000: 0
Loading Environment from nowhere... OK
PCI: Failed autoconfig bar 14
In: uart@2a400000
Out: uart@2a400000
Err: uart@2a400000
Net: SF: Detected mx25u51245g with page size 256 Bytes, erase size 4 KiB, total 64 MiB
eth0: ethernet@522d0000
starting USB...
Bus xhci_pci: Register 8000820 NbrPorts 8
Starting the controller
USB XHCI 1.00
scanning bus xhci_pci for devices... 1 USB Device(s) found
scanning usb for storage devices... 0 Storage Device(s) found
scanning bus for devices...
SATA link 0 timeout.
SATA link 1 timeout.
AHCI 0001.0200 32 slots 2 ports 6 Gbps 0x3 impl SATA mode
flags: 64bit ncq stag led clo pmp pio slum part ccc sxs
Hit any key to stop autoboot: 2 0
=> setenv serverip 192.168.1.109
=> dhcp
BOOTP broadcast 1
BOOTP broadcast 2
BOOTP broadcast 3
BOOTP broadcast 4
*** Unhandled DHCP Option in OFFER/ACK: 125
*** Unhandled DHCP Option in OFFER/ACK: 125
DHCP client bound to address 192.168.1.111 (2131 ms)
*** Warning: no boot file name; using 'C0A8016F.img'
Using ethernet@522d0000 device
TFTP from server 192.168.1.109; our IP address is 192.168.1.111
Filename 'C0A8016F.img'.
Load address: 0x80000000
Loading: *
TFTP error: 'File not found' (1)
Not retrying...
=> ping 192.168.1.1
Using ethernet@522d0000 device
host 192.168.1.1 is alive
=> ping 192.168.1.1
ethernet@522d0000 Waiting for PHY auto negotiation to complete... done
Using ethernet@522d0000 device
host 192.168.1.1 is alive
=> ping 192.168.1.1
(...skipping...)
=> ping 192.168.1.1
ethernet@522d0000 Waiting for PHY auto negotiation to complete... done
netsec_wait_while_busy: timeout
Using ethernet@522d0000 device
ARP Retry count exceeded; starting again
ping failed; host 192.168.1.1 is not alive
====================
The same failure can also occur with other networking commands, e.g., tftp and dhcp.
After investigation, it turns out that the driver is waiting for MAC_REG_DESC_SOFT_RST to be cleared to 0 after writing 1 to it. The NETSEC firmware normally clears it, but sometimes it seems to enter a weird state in which the register is never cleared until the next NETSEC software reset. Reproducibility seems to vary with the environment (for example, 100 ping attempts would trigger the issue in my environment); however, we have hit the issue on at least three different boards in three different networking environments.
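To illustrate the failure mode, here is a minimal, self-contained C sketch of polling a self-clearing reset bit with a bounded number of reads. It is not the driver code: `fake_netsec`, `fake_write_soft_rst`, `fake_read_soft_rst` and `wait_while_busy` are hypothetical stand-ins that simulate the register in plain memory, and a negative `polls_until_clear` is an assumed way to model the stuck state in which the firmware never clears the bit.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the hardware: one self-clearing reset bit. */
struct fake_netsec {
	uint32_t soft_rst;     /* models MAC_REG_DESC_SOFT_RST */
	int polls_until_clear; /* reads before the bit clears; negative = stuck */
};

/* Software writes 1 to request the reset. */
static void fake_write_soft_rst(struct fake_netsec *dev, uint32_t val)
{
	dev->soft_rst = val;
}

/* Each read lets the simulated firmware make progress, unless it is stuck. */
static uint32_t fake_read_soft_rst(struct fake_netsec *dev)
{
	if (dev->soft_rst && dev->polls_until_clear >= 0) {
		if (dev->polls_until_clear == 0)
			dev->soft_rst = 0;
		else
			dev->polls_until_clear--;
	}
	return dev->soft_rst;
}

/* Poll until the masked bits read back as 0, giving up after max_polls reads. */
static int wait_while_busy(struct fake_netsec *dev, uint32_t mask, int max_polls)
{
	assert(max_polls > 0);
	while (max_polls-- > 0) {
		if (!(fake_read_soft_rst(dev) & mask))
			return 0;
	}
	return -ETIMEDOUT;
}
```

With a simulated device that clears the bit after a few reads, `wait_while_busy()` returns 0. With a stuck device it runs out of polls and returns `-ETIMEDOUT`, which is the kind of situation behind the timeout message in the log above.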
We have already reported the issue to Socionext, the supplier of the NETSEC firmware, but it will take some time to find the root cause and fix it. Meanwhile, we can work around the problematic state by performing a software reset of NETSEC.
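The workaround can be pictured as a bounded retry loop: each attempt either succeeds or performs a recovery reset and reports `-EAGAIN`, and the caller gives up after a fixed number of failures. The sketch below is a simulation under stated assumptions, not the driver: `fake_mac`, `try_reset` and `reset_with_retry` are hypothetical names, and the recovery reset is reduced to a counter. The loop shape mirrors the one used in the patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of a MAC that may need several recovery resets. */
struct fake_mac {
	int stuck_attempts; /* attempts that will still hit the stuck state */
	int full_resets;    /* number of recovery resets performed so far */
};

/* One attempt: 0 on success, -EAGAIN after performing a recovery reset. */
static int try_reset(struct fake_mac *mac)
{
	if (mac->stuck_attempts > 0) {
		mac->stuck_attempts--;
		mac->full_resets++; /* stands in for the NETSEC/PHY restart */
		return -EAGAIN;
	}
	return 0;
}

/* Retry while the attempt asks for it, up to max_tries failed attempts. */
static int reset_with_retry(struct fake_mac *mac, int max_tries)
{
	int ret;
	int failures = 0;

	assert(max_tries > 0);
	while ((ret = try_reset(mac)) == -EAGAIN && ++failures < max_tries)
		;
	return ret;
}
```

A device that recovers within the limit makes `reset_with_retry()` return 0; one that stays stuck makes it return `-EAGAIN` after `max_tries` attempts, so the caller can report the error instead of looping forever.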
Ryosuke Saito (1):
  net: sni_netsec: Add workaround for timeout error

 drivers/net/sni_netsec.c | 50 ++++++++++++++++++++++++++++++++--------
 1 file changed, 41 insertions(+), 9 deletions(-)

The NETSEC GMAC occasionally falls into a weird state where MAC_REG_DESC_SOFT_RST is never cleared, and errors like the following appear when networking commands are issued:
=> ping 192.168.1.1
ethernet@522d0000 Waiting for PHY auto negotiation to complete... done
netsec_wait_while_busy: timeout
Using ethernet@522d0000 device
ARP Retry count exceeded; starting again
ping failed; host 192.168.1.1 is not alive
It happens not only with 'ping' but also with 'dhcp', 'tftp', and so on.
Luckily, restarting the NETSEC GMAC and trying again seems to clear the problematic state. So first check that MAC_REG_DESC_SOFT_RST has been cleared to make sure we have not entered that state; if it has not been cleared, restart NETSEC and the PHY and try again as a workaround.
Signed-off-by: Ryosuke Saito <ryosuke.saito@linaro.org>
---
 drivers/net/sni_netsec.c | 50 ++++++++++++++++++++++++++++++++--------
 1 file changed, 41 insertions(+), 9 deletions(-)
diff --git a/drivers/net/sni_netsec.c b/drivers/net/sni_netsec.c
index 9780f2092bd4..71afe78fd28a 100644
--- a/drivers/net/sni_netsec.c
+++ b/drivers/net/sni_netsec.c
@@ -286,6 +286,8 @@ struct netsec_rx_pkt_info {
 	bool err_flag;
 };
 
+static int netsec_reset_hardware(struct netsec_priv *priv, bool load_ucode);
+
 static void netsec_write_reg(struct netsec_priv *priv, u32 reg_addr, u32 val)
 {
 	writel(val, priv->ioaddr + reg_addr);
@@ -532,18 +534,11 @@ static int netsec_mac_update_to_phy_state(struct netsec_priv *priv)
 	return 0;
 }
 
-static int netsec_start_gmac(struct netsec_priv *priv)
+static int netsec_reset_gmac(struct netsec_priv *priv)
 {
 	u32 value = 0;
 	int ret;
 
-	if (priv->max_speed != SPEED_1000)
-		value = (NETSEC_GMAC_MCR_REG_CST |
-			 NETSEC_GMAC_MCR_REG_HALF_DUPLEX_COMMON);
-
-	if (netsec_set_mac_reg(priv, GMAC_REG_MCR, value))
-		return -ETIMEDOUT;
-
 	if (netsec_set_mac_reg(priv, GMAC_REG_BMR,
 			       NETSEC_GMAC_BMR_REG_RESET))
 		return -ETIMEDOUT;
@@ -558,10 +553,47 @@ static int netsec_start_gmac(struct netsec_priv *priv)
 	if (value & NETSEC_GMAC_BMR_REG_SWR)
 		return -EAGAIN;
 
+	/**
+	 * NETSEC GMAC sometimes shows the peculiar behaviour where
+	 * MAC_REG_DESC_SOFT_RST never been cleared, resulting in the loss of
+	 * sending packets.
+	 *
+	 * Workaround:
+	 * Restart NETSEC and PHY, retry again.
+	 */
 	netsec_write_reg(priv, MAC_REG_DESC_SOFT_RST, 1);
-	if (netsec_wait_while_busy(priv, MAC_REG_DESC_SOFT_RST, 1))
+	udelay(1000);
+	if (netsec_read_reg(priv, MAC_REG_DESC_SOFT_RST)) {
+		phy_shutdown(priv->phydev);
+		netsec_reset_hardware(priv, false);
+		phy_startup(priv->phydev);
+		return -EAGAIN;
+	}
+	return 0;
+}
+
+static int netsec_start_gmac(struct netsec_priv *priv)
+{
+	u32 value = 0;
+	u32 failure = 0;
+	int ret;
+
+	if (priv->max_speed != SPEED_1000)
+		value = (NETSEC_GMAC_MCR_REG_CST |
+			 NETSEC_GMAC_MCR_REG_HALF_DUPLEX_COMMON);
+
+	if (netsec_set_mac_reg(priv, GMAC_REG_MCR, value))
 		return -ETIMEDOUT;
 
+	/* Reset GMAC */
+	while ((ret = netsec_reset_gmac(priv)) == -EAGAIN && ++failure < 3)
+		;
+
+	if (ret) {
+		pr_err("%s: failed to reset gmac(err=%d).\n", __func__, ret);
+		return ret;
+	}
+
 	netsec_write_reg(priv, MAC_REG_DESC_INIT, 1);
 	if (netsec_wait_while_busy(priv, MAC_REG_DESC_INIT, 1))
 		return -ETIMEDOUT;

On Thu, 3 Aug 2023 at 23:56, Ryosuke Saito <ryosuke.saito@linaro.org> wrote:
Tested-By: Masahisa Kojima <masahisa.kojima@linaro.org>
An overnight test of the 'dhcp->ping->reset' sequence works for me. Thank you for fixing the issue.
Regards,
Masahisa Kojima

On Thu, Aug 03, 2023 at 11:56:48PM +0900, Ryosuke Saito wrote:
Signed-off-by: Ryosuke Saito <ryosuke.saito@linaro.org>
Tested-By: Masahisa Kojima <masahisa.kojima@linaro.org>
Applied to u-boot/master, thanks!
participants (3)
- Masahisa Kojima
- Ryosuke Saito
- Tom Rini