[PATCH v2] pci: Work around PCIe link training failures

Attempt to handle cases with a downstream port of a PCIe switch where link training never completes and the link continues switching between speeds indefinitely with the data link layer never reaching the active state.
It has been observed with a downstream port of the ASMedia ASM2824 Gen 3 switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2 switch, using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device, P/N 41433, wired to a SiFive HiFive Unmatched board. In this setup the switches are supposed to negotiate the link speed of preferably 5.0GT/s, falling back to 2.5GT/s.
However the link continues oscillating between the two speeds, at the rate of 34-35 times per second, with link training reported active ~84% of the time repeatedly, e.g.:
02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode]) [...] Bus: primary=02, secondary=05, subordinate=05, sec-latency=0 [...] Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00 [...] LnkSta: Speed 5GT/s (downgraded), Width x1 (ok) TrErr- Train+ SlotClk+ DLActive- BWMgmt+ ABWMgmt- [...] LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis+, Selectable De-emphasis: -3.5dB Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB [...]
Forcibly limiting the target link speed to 2.5GT/s with the upstream ASM2824 device makes the two switches communicate correctly however:
02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode]) [...] Bus: primary=02, secondary=05, subordinate=09, sec-latency=0 [...] Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00 [...] LnkSta: Speed 2.5GT/s (downgraded), Width x1 (ok) TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt- [...] LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis+, Selectable De-emphasis: -3.5dB Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB [...]
and then:
05:00.0 PCI bridge [0604]: Pericom Semiconductor PI7C9X2G304 EL/SL PCIe2 3-Port/4-Lane Packet Switch [12d8:2304] (rev 05) (prog-if 00 [Normal decode]) [...] Bus: primary=05, secondary=06, subordinate=09, sec-latency=0 [...] Capabilities: [c0] Express (v2) Upstream Port, MSI 00 [...] LnkSta: Speed 2.5GT/s (downgraded), Width x1 (downgraded) TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- [...] LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB [...]
Make use of this observation then and attempt to detect the inability to negotiate the link speed automatically, and then handle it by hand. Use the Data Link Layer Link Active status flag as the primary indicator of successful link speed negotiation, but given that the flag is optional by hardware to implement (the ASM2824 does have it though), resort to checking for the mandatory Link Bandwidth Management Status flag showing that the link speed or width has been changed in an attempt to correct unreliable link operation (the ASM2824 does set it too).
If these checks indicate that link may not operate correctly, then poll the Data Link Layer Link Active status flag along with the Link Training flag for the duration of 200ms to see if the link has stabilised, that is either that the Data Link Layer Link Active status flag has been set or that Link Training has been inactive during at least the second half of the inteval.
If that has indicated failure, reduce the target speed, request a link retrain and check again if the link has stabilised. Repeat until either successful or the link speeds supported by the downstream port have been exhausted.
Signed-off-by: Maciej W. Rozycki macro@orcam.me.uk --- Hi,
Thank you, Pali, for your valuable feedback. Here's v2 of the change with your suggestions taken into account. I believe I have addressed all your concerns and I haven't heard from anyone else, so I consider this version final unless someone speaks out.
Please apply then.
Maciej
Changes from v1:
- also handle root complex and PCI/PCI-X to PCIe bridge devices,
- update comments throughout accordingly,
- likewise the change description; mention the rate of oscillations observed with the ASM2824 device. --- drivers/pci/pci_auto.c | 157 +++++++++++++++++++++++++++++++++++++++++++++++++ include/pci.h | 18 +++++ 2 files changed, 174 insertions(+), 1 deletion(-)
u-boot-pcie-manual-retrain.diff Index: u-boot/drivers/pci/pci_auto.c =================================================================== --- u-boot.orig/drivers/pci/pci_auto.c +++ u-boot/drivers/pci/pci_auto.c @@ -5,6 +5,7 @@ * Author: Matt Porter mporter@mvista.com * * Copyright 2000 MontaVista Software Inc. + * Copyright (c) 2021 Maciej W. Rozycki macro@orcam.me.uk */
#include <common.h> @@ -12,6 +13,7 @@ #include <errno.h> #include <log.h> #include <pci.h> +#include <time.h> #include "pci_internal.h"
/* the user can define CONFIG_SYS_PCI_CACHE_LINE_SIZE to avoid problems */ @@ -180,6 +182,155 @@ static void dm_pciauto_setup_device(stru dm_pci_write_config8(dev, PCI_LATENCY_TIMER, 0x80); }
+/* + * Check if the link of a downstream PCIe port operates correctly. + * + * For that check if the optional Data Link Layer Link Active status gets + * on within a 200ms period or failing that wait until the completion of + * that period and check if link training has shown the completed status + * continuously throughout the second half of that period. + * + * Observation with the ASMedia ASM2824 Gen 3 switch indicates it takes + * 11-44ms to indicate the Data Link Layer Link Active status at 2.5GT/s, + * though it may take a couple of link training iterations. + */ +static bool dm_pciauto_exp_link_stable(struct udevice *dev, int pcie_off) +{ + u64 loops = 0, trcount = 0, ntrcount = 0, flips = 0; + bool dllla, lnktr, plnktr; + u16 exp_lnksta; + pci_dev_t bdf; + u64 end; + + dm_pci_read_config16(dev, pcie_off + PCI_EXP_LNKSTA, &exp_lnksta); + plnktr = !!(exp_lnksta & PCI_EXP_LNKSTA_LNKTR); + + end = get_ticks() + usec_to_tick(200000); + do { + dm_pci_read_config16(dev, pcie_off + PCI_EXP_LNKSTA, + &exp_lnksta); + dllla = !!(exp_lnksta & PCI_EXP_LNKSTA_DLLLA); + lnktr = !!(exp_lnksta & PCI_EXP_LNKSTA_LNKTR); + + flips += plnktr ^ lnktr; + if (lnktr) { + ntrcount = 0; + trcount++; + } else { + ntrcount++; + } + loops++; + + plnktr = lnktr; + } while (!dllla && get_ticks() < end); + + bdf = dm_pci_get_bdf(dev); + debug("PCI Autoconfig: %02x.%02x.%02x: Fixup link: DL active: %u; " + "%3llu flips, %6llu loops of which %6llu while training, " + "final %6llu stable\n", + PCI_BUS(bdf), PCI_DEV(bdf), PCI_FUNC(bdf), + (unsigned int)dllla, + (unsigned long long)flips, (unsigned long long)loops, + (unsigned long long)trcount, (unsigned long long)ntrcount); + + return dllla || ntrcount >= loops / 2; +} + +/* + * Retrain the link of a downstream PCIe port by hand if necessary. + * + * This is needed at least where a downstream port of the ASMedia ASM2824 + * Gen 3 switch is wired to the upstream port of the Pericom PI7C9X2G304 + * Gen 2 switch, and observed with the Delock Riser Card PCI Express x1 > + * 2 x PCIe x1 device, P/N 41433, plugged into the SiFive HiFive Unmatched + * board. + * + * In such a configuration the switches are supposed to negotiate the link + * speed of preferably 5.0GT/s, falling back to 2.5GT/s. However the link + * continues switching between the two speeds indefinitely and the data + * link layer never reaches the active state, with link training reported + * active ~84% of the time repeatedly. Forcing the target link speed to + * 2.5GT/s with the upstream ASM2824 device makes the two switches talk to + * each other correctly however. + * + * As this can potentially happen with any device and is cheap in the case + * of correctly operating hardware, let's do it for all downstream ports, + * for root complexes, PCIe switches and PCI/PCI-X to PCIe bridges, and if + * automatic link training may have failed to complete, as indicated by + * the optional Data Link Layer Link Active status being off and the Link + * Bandwidth Management Status showing that hardware has changed the link + * speed or width in an attempt to correct unreliable link operation, then + * try lower and lower speeds one by one until one has succeeded or we + * have run out of speeds. + * + * This requires the presence of the Link Control 2 register, so make sure + * the PCI Express Capability Version is at least 2. Also don't try, for + * obvious reasons, to limit the speed if 2.5GT/s is the only link speed + * supported. + */ +static void dm_pciauto_exp_fixup_link(struct udevice *dev, int pcie_off) +{ + u16 exp_lnksta, exp_lnkctl, exp_lnkctl2; + u16 exp_flags, exp_type, exp_version; + u32 exp_lnkcap, exp_lnkcap2; + unsigned int speed; + pci_dev_t bdf; + + dm_pci_read_config16(dev, pcie_off + PCI_EXP_FLAGS, &exp_flags); + exp_version = exp_flags & PCI_EXP_FLAGS_VERSION; + if (exp_version < 2) + return; + exp_type = (exp_flags & PCI_EXP_FLAGS_TYPE) >> 4; + switch (exp_type) { + case PCI_EXP_TYPE_ROOT_PORT: + case PCI_EXP_TYPE_DOWN_PORT: + case PCI_EXP_TYPE_PCIE_BRDG: + break; + default: + return; + } + + dm_pci_read_config32(dev, pcie_off + PCI_EXP_LNKCAP, &exp_lnkcap); + if ((exp_lnkcap & PCI_EXP_LNKCAP_SLS) <= PCI_EXP_LNKCAP_SLS_2_5GB) + return; + + dm_pci_read_config16(dev, pcie_off + PCI_EXP_LNKSTA, &exp_lnksta); + if ((exp_lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) != + PCI_EXP_LNKSTA_LBMS) + return; + + if (dm_pciauto_exp_link_stable(dev, pcie_off)) + return; + + bdf = dm_pci_get_bdf(dev); + debug("PCI Autoconfig: %02x.%02x.%02x: Downgrading speed by hand\n", + PCI_BUS(bdf), PCI_DEV(bdf), PCI_FUNC(bdf)); + + dm_pci_read_config16(dev, pcie_off + PCI_EXP_LNKCTL, &exp_lnkctl); + dm_pci_read_config16(dev, pcie_off + PCI_EXP_LNKCTL2, &exp_lnkctl2); + exp_lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS; + + dm_pci_read_config32(dev, pcie_off + PCI_EXP_LNKCAP, &exp_lnkcap); + dm_pci_read_config32(dev, pcie_off + PCI_EXP_LNKCAP2, &exp_lnkcap2); + for (speed = (exp_lnkcap & PCI_EXP_LNKCAP_SLS) - 1; + speed >= PCI_EXP_LNKCAP_SLS_2_5GB; + speed--) { + if (exp_lnkcap2 && (exp_lnkcap2 & 1 << speed) == 0) + continue; + + debug("PCI Autoconfig: %02x.%02x.%02x: Trying speed: %u\n", + PCI_BUS(bdf), PCI_DEV(bdf), PCI_FUNC(bdf), speed); + + dm_pci_write_config16(dev, pcie_off + PCI_EXP_LNKCTL2, + exp_lnkctl2 | speed); + dm_pci_write_config16(dev, pcie_off + PCI_EXP_LNKCTL, + exp_lnkctl | PCI_EXP_LNKCTL_RETRLNK); + + if (dm_pciauto_exp_link_stable(dev, pcie_off)) + return; + } +} + void dm_pciauto_prescan_setup_bridge(struct udevice *dev, int sub_bus) { struct pci_region *pci_mem; @@ -189,6 +340,7 @@ void dm_pciauto_prescan_setup_bridge(str u8 io_32; struct udevice *ctlr = pci_get_controller(dev); struct pci_controller *ctlr_hose = dev_get_uclass_priv(ctlr); + int pcie_off;
pci_mem = ctlr_hose->pci_mem; pci_prefetch = ctlr_hose->pci_prefetch; @@ -267,6 +419,11 @@ void dm_pciauto_prescan_setup_bridge(str cmdstat |= PCI_COMMAND_IO; }
+ /* For PCIe devices see if we need to retrain the link by hand */ + pcie_off = dm_pci_find_capability(dev, PCI_CAP_ID_EXP); + if (pcie_off) + dm_pciauto_exp_fixup_link(dev, pcie_off); + /* Enable memory and I/O accesses, enable bus master */ dm_pci_write_config16(dev, PCI_COMMAND, cmdstat | PCI_COMMAND_MASTER); } Index: u-boot/include/pci.h =================================================================== --- u-boot.orig/include/pci.h +++ u-boot/include/pci.h @@ -5,6 +5,7 @@ * * (C) Copyright 2002 * Wolfgang Denk, DENX Software Engineering, wd@denx.de. + * Copyright (c) 2021 Maciej W. Rozycki macro@orcam.me.uk */
#ifndef _PCI_H @@ -475,16 +476,24 @@
/* PCI Express capabilities */ #define PCI_EXP_FLAGS 2 /* Capabilities register */ +#define PCI_EXP_FLAGS_VERSION 0x000f /* Capability Version */ #define PCI_EXP_FLAGS_TYPE 0x00f0 /* Device/Port type */ #define PCI_EXP_TYPE_ROOT_PORT 0x4 /* Root Port */ +#define PCI_EXP_TYPE_DOWN_PORT 0x6 /* Downstream Port */ +#define PCI_EXP_TYPE_PCIE_BRDG 0x8 /* PCI/PCI-X to PCI Express Bridge */ #define PCI_EXP_DEVCAP 4 /* Device capabilities */ #define PCI_EXP_DEVCAP_FLR 0x10000000 /* Function Level Reset */ #define PCI_EXP_DEVCTL 8 /* Device Control */ #define PCI_EXP_DEVCTL_BCR_FLR 0x8000 /* Bridge Configuration Retry / FLR */ #define PCI_EXP_LNKCAP 12 /* Link Capabilities */ #define PCI_EXP_LNKCAP_SLS 0x0000000f /* Supported Link Speeds */ +#define PCI_EXP_LNKCAP_SLS_2_5GB 0x0001 /* Supported Link Speed 2.5GT/s */ +#define PCI_EXP_LNKCAP_SLS_5_0GB 0x0002 /* Supported Link Speed 5.0GT/s */ +#define PCI_EXP_LNKCAP_SLS_8_0GB 0x0003 /* Supported Link Speed 8.0GT/s */ #define PCI_EXP_LNKCAP_MLW 0x000003f0 /* Maximum Link Width */ #define PCI_EXP_LNKCAP_DLLLARC 0x00100000 /* Data Link Layer Link Active Reporting Capable */ +#define PCI_EXP_LNKCTL 16 /* Link Control */ +#define PCI_EXP_LNKCTL_RETRLNK 0x0020 /* Retrain Link */ #define PCI_EXP_LNKSTA 18 /* Link Status */ #define PCI_EXP_LNKSTA_CLS 0x000f /* Current Link Speed */ #define PCI_EXP_LNKSTA_CLS_2_5GB 0x0001 /* Current Link Speed 2.5GT/s */ @@ -492,7 +501,9 @@ #define PCI_EXP_LNKSTA_CLS_8_0GB 0x0003 /* Current Link Speed 8.0GT/s */ #define PCI_EXP_LNKSTA_NLW 0x03f0 /* Negotiated Link Width */ #define PCI_EXP_LNKSTA_NLW_SHIFT 4 /* start of NLW mask in link status */ +#define PCI_EXP_LNKSTA_LNKTR 0x0800 /* Link Training */ #define PCI_EXP_LNKSTA_DLLLA 0x2000 /* Data Link Layer Link Active */ +#define PCI_EXP_LNKSTA_LBMS 0x4000 /* Link Bandwidth Management Status */ #define PCI_EXP_SLTCAP 20 /* Slot Capabilities */ #define PCI_EXP_SLTCAP_PSN 0xfff80000 /* Physical Slot Number */ #define PCI_EXP_RTCTL 28 /* Root Control */ @@ -503,8 +514,13 @@ #define PCI_EXP_DEVCAP2_ARI 0x00000020 /* ARI Forwarding Supported */ #define PCI_EXP_DEVCTL2 40 /* Device Control 2 */ #define PCI_EXP_DEVCTL2_ARI 0x0020 /* Alternative Routing-ID */ - +#define PCI_EXP_LNKCAP2 44 /* Link Capability 2 */ +#define PCI_EXP_LNKCAP2_SLSV 0x000000fe /* Supported Link Speeds Vector */ #define PCI_EXP_LNKCTL2 48 /* Link Control 2 */ +#define PCI_EXP_LNKCTL2_TLS 0x000f /* Target Link Speed */ +#define PCI_EXP_LNKSTA_TLS_2_5GB 0x0001 /* Target Link Speed 2.5GT/s */ +#define PCI_EXP_LNKSTA_TLS_5_0GB 0x0002 /* Target Link Speed 5.0GT/s */ +#define PCI_EXP_LNKSTA_TLS_8_0GB 0x0003 /* Target Link Speed 8.0GT/s */ /* Single Root I/O Virtualization Registers */ #define PCI_SRIOV_CAP 0x04 /* SR-IOV Capabilities */ #define PCI_SRIOV_CTRL 0x08 /* SR-IOV Control */

Hi Maciej,
On 11/16/21 12:35, Maciej W. Rozycki wrote:
Attempt to handle cases with a downstream port of a PCIe switch where link training never completes and the link continues switching between speeds indefinitely with the data link layer never reaching the active state.
It has been observed with a downstream port of the ASMedia ASM2824 Gen 3 switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2 switch, using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device, P/N 41433, wired to a SiFive HiFive Unmatched board. In this setup the switches are supposed to negotiate the link speed of preferably 5.0GT/s, falling back to 2.5GT/s.
However the link continues oscillating between the two speeds, at the rate of 34-35 times per second, with link training reported active ~84% of the time repeatedly, e.g.:
Interesting topic and approach. Please find a few questions / comments below.
02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode]) [...] Bus: primary=02, secondary=05, subordinate=05, sec-latency=0 [...] Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00 [...] LnkSta: Speed 5GT/s (downgraded), Width x1 (ok) TrErr- Train+ SlotClk+ DLActive- BWMgmt+ ABWMgmt- [...] LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis+, Selectable De-emphasis: -3.5dB Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB [...]
Forcibly limiting the target link speed to 2.5GT/s with the upstream ASM2824 device makes the two switches communicate correctly however:
02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode]) [...] Bus: primary=02, secondary=05, subordinate=09, sec-latency=0 [...] Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00 [...] LnkSta: Speed 2.5GT/s (downgraded), Width x1 (ok) TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt- [...] LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis+, Selectable De-emphasis: -3.5dB Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB [...]
and then:
05:00.0 PCI bridge [0604]: Pericom Semiconductor PI7C9X2G304 EL/SL PCIe2 3-Port/4-Lane Packet Switch [12d8:2304] (rev 05) (prog-if 00 [Normal decode]) [...] Bus: primary=05, secondary=06, subordinate=09, sec-latency=0 [...] Capabilities: [c0] Express (v2) Upstream Port, MSI 00 [...] LnkSta: Speed 2.5GT/s (downgraded), Width x1 (downgraded) TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- [...] LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB [...]
Make use of this observation then and attempt to detect the inability to negotiate the link speed automatically, and then handle it by hand. Use the Data Link Layer Link Active status flag as the primary indicator of successful link speed negotiation, but given that the flag is optional by hardware to implement (the ASM2824 does have it though), resort to checking for the mandatory Link Bandwidth Management Status flag showing that the link speed or width has been changed in an attempt to correct unreliable link operation (the ASM2824 does set it too).
If these checks indicate that link may not operate correctly, then poll the Data Link Layer Link Active status flag along with the Link Training flag for the duration of 200ms to see if the link has stabilised, that is either that the Data Link Layer Link Active status flag has been set or that Link Training has been inactive during at least the second half of the inteval.
If that has indicated failure, reduce the target speed, request a link retrain and check again if the link has stabilised. Repeat until either successful or the link speeds supported by the downstream port have been exhausted.
So in such cases, the link speed will be downgraded? I would expect at least a big warning in such cases.
Did you try to change some other configuration options for the link establishment? I remember that on some hardware we were able to get better "link-up results" by setting the de-emphasis level to -3.5dB instead of -6dB (in the link control status register 2), before trying to re-estblish the link. Did you also test with "tuning" such parameters. There might be other, which I'm missing right now.
Thanks, Stefan
Signed-off-by: Maciej W. Rozycki macro@orcam.me.uk
Hi,
Thank you, Pali, for your valuable feedback. Here's v2 of the change with your suggestions taken into account. I believe I have addressed all your concerns and I haven't heard from anyone else, so I consider this version final unless someone speaks out.
Please apply then.
Maciej
Changes from v1:
also handle root complex and PCI/PCI-X to PCIe bridge devices,
update comments throughout accordingly,
likewise the change description; mention the rate of oscillations observed with the ASM2824 device.
drivers/pci/pci_auto.c | 157 +++++++++++++++++++++++++++++++++++++++++++++++++ include/pci.h | 18 +++++ 2 files changed, 174 insertions(+), 1 deletion(-)
u-boot-pcie-manual-retrain.diff Index: u-boot/drivers/pci/pci_auto.c =================================================================== --- u-boot.orig/drivers/pci/pci_auto.c +++ u-boot/drivers/pci/pci_auto.c @@ -5,6 +5,7 @@
- Author: Matt Porter mporter@mvista.com
- Copyright 2000 MontaVista Software Inc.
- Copyright (c) 2021 Maciej W. Rozycki macro@orcam.me.uk
*/
#include <common.h>
@@ -12,6 +13,7 @@ #include <errno.h> #include <log.h> #include <pci.h> +#include <time.h> #include "pci_internal.h"
/* the user can define CONFIG_SYS_PCI_CACHE_LINE_SIZE to avoid problems */ @@ -180,6 +182,155 @@ static void dm_pciauto_setup_device(stru dm_pci_write_config8(dev, PCI_LATENCY_TIMER, 0x80); }
+/*
- Check if the link of a downstream PCIe port operates correctly.
- For that check if the optional Data Link Layer Link Active status gets
- on within a 200ms period or failing that wait until the completion of
- that period and check if link training has shown the completed status
- continuously throughout the second half of that period.
- Observation with the ASMedia ASM2824 Gen 3 switch indicates it takes
- 11-44ms to indicate the Data Link Layer Link Active status at 2.5GT/s,
- though it may take a couple of link training iterations.
- */
+static bool dm_pciauto_exp_link_stable(struct udevice *dev, int pcie_off) +{
- u64 loops = 0, trcount = 0, ntrcount = 0, flips = 0;
- bool dllla, lnktr, plnktr;
- u16 exp_lnksta;
- pci_dev_t bdf;
- u64 end;
- dm_pci_read_config16(dev, pcie_off + PCI_EXP_LNKSTA, &exp_lnksta);
- plnktr = !!(exp_lnksta & PCI_EXP_LNKSTA_LNKTR);
- end = get_ticks() + usec_to_tick(200000);
- do {
dm_pci_read_config16(dev, pcie_off + PCI_EXP_LNKSTA,
&exp_lnksta);
dllla = !!(exp_lnksta & PCI_EXP_LNKSTA_DLLLA);
lnktr = !!(exp_lnksta & PCI_EXP_LNKSTA_LNKTR);
flips += plnktr ^ lnktr;
if (lnktr) {
ntrcount = 0;
trcount++;
} else {
ntrcount++;
}
loops++;
plnktr = lnktr;
- } while (!dllla && get_ticks() < end);
- bdf = dm_pci_get_bdf(dev);
- debug("PCI Autoconfig: %02x.%02x.%02x: Fixup link: DL active: %u; "
"%3llu flips, %6llu loops of which %6llu while training, "
"final %6llu stable\n",
PCI_BUS(bdf), PCI_DEV(bdf), PCI_FUNC(bdf),
(unsigned int)dllla,
(unsigned long long)flips, (unsigned long long)loops,
(unsigned long long)trcount, (unsigned long long)ntrcount);
- return dllla || ntrcount >= loops / 2;
+}
+/*
- Retrain the link of a downstream PCIe port by hand if necessary.
- This is needed at least where a downstream port of the ASMedia ASM2824
- Gen 3 switch is wired to the upstream port of the Pericom PI7C9X2G304
- Gen 2 switch, and observed with the Delock Riser Card PCI Express x1 >
- 2 x PCIe x1 device, P/N 41433, plugged into the SiFive HiFive Unmatched
- board.
- In such a configuration the switches are supposed to negotiate the link
- speed of preferably 5.0GT/s, falling back to 2.5GT/s. However the link
- continues switching between the two speeds indefinitely and the data
- link layer never reaches the active state, with link training reported
- active ~84% of the time repeatedly. Forcing the target link speed to
- 2.5GT/s with the upstream ASM2824 device makes the two switches talk to
- each other correctly however.
- As this can potentially happen with any device and is cheap in the case
- of correctly operating hardware, let's do it for all downstream ports,
- for root complexes, PCIe switches and PCI/PCI-X to PCIe bridges, and if
- automatic link training may have failed to complete, as indicated by
- the optional Data Link Layer Link Active status being off and the Link
- Bandwidth Management Status showing that hardware has changed the link
- speed or width in an attempt to correct unreliable link operation, then
- try lower and lower speeds one by one until one has succeeded or we
- have run out of speeds.
- This requires the presence of the Link Control 2 register, so make sure
- the PCI Express Capability Version is at least 2. Also don't try, for
- obvious reasons, to limit the speed if 2.5GT/s is the only link speed
- supported.
- */
+static void dm_pciauto_exp_fixup_link(struct udevice *dev, int pcie_off) +{
- u16 exp_lnksta, exp_lnkctl, exp_lnkctl2;
- u16 exp_flags, exp_type, exp_version;
- u32 exp_lnkcap, exp_lnkcap2;
- unsigned int speed;
- pci_dev_t bdf;
- dm_pci_read_config16(dev, pcie_off + PCI_EXP_FLAGS, &exp_flags);
- exp_version = exp_flags & PCI_EXP_FLAGS_VERSION;
- if (exp_version < 2)
return;
- exp_type = (exp_flags & PCI_EXP_FLAGS_TYPE) >> 4;
- switch (exp_type) {
- case PCI_EXP_TYPE_ROOT_PORT:
- case PCI_EXP_TYPE_DOWN_PORT:
- case PCI_EXP_TYPE_PCIE_BRDG:
break;
- default:
return;
- }
- dm_pci_read_config32(dev, pcie_off + PCI_EXP_LNKCAP, &exp_lnkcap);
- if ((exp_lnkcap & PCI_EXP_LNKCAP_SLS) <= PCI_EXP_LNKCAP_SLS_2_5GB)
return;
- dm_pci_read_config16(dev, pcie_off + PCI_EXP_LNKSTA, &exp_lnksta);
- if ((exp_lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) !=
PCI_EXP_LNKSTA_LBMS)
return;
- if (dm_pciauto_exp_link_stable(dev, pcie_off))
return;
- bdf = dm_pci_get_bdf(dev);
- debug("PCI Autoconfig: %02x.%02x.%02x: Downgrading speed by hand\n",
PCI_BUS(bdf), PCI_DEV(bdf), PCI_FUNC(bdf));
- dm_pci_read_config16(dev, pcie_off + PCI_EXP_LNKCTL, &exp_lnkctl);
- dm_pci_read_config16(dev, pcie_off + PCI_EXP_LNKCTL2, &exp_lnkctl2);
- exp_lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
- dm_pci_read_config32(dev, pcie_off + PCI_EXP_LNKCAP, &exp_lnkcap);
- dm_pci_read_config32(dev, pcie_off + PCI_EXP_LNKCAP2, &exp_lnkcap2);
- for (speed = (exp_lnkcap & PCI_EXP_LNKCAP_SLS) - 1;
speed >= PCI_EXP_LNKCAP_SLS_2_5GB;
speed--) {
if (exp_lnkcap2 && (exp_lnkcap2 & 1 << speed) == 0)
continue;
debug("PCI Autoconfig: %02x.%02x.%02x: Trying speed: %u\n",
PCI_BUS(bdf), PCI_DEV(bdf), PCI_FUNC(bdf), speed);
dm_pci_write_config16(dev, pcie_off + PCI_EXP_LNKCTL2,
exp_lnkctl2 | speed);
dm_pci_write_config16(dev, pcie_off + PCI_EXP_LNKCTL,
exp_lnkctl | PCI_EXP_LNKCTL_RETRLNK);
if (dm_pciauto_exp_link_stable(dev, pcie_off))
return;
- }
+}
- void dm_pciauto_prescan_setup_bridge(struct udevice *dev, int sub_bus) { struct pci_region *pci_mem;
@@ -189,6 +340,7 @@ void dm_pciauto_prescan_setup_bridge(str u8 io_32; struct udevice *ctlr = pci_get_controller(dev); struct pci_controller *ctlr_hose = dev_get_uclass_priv(ctlr);
int pcie_off;
pci_mem = ctlr_hose->pci_mem; pci_prefetch = ctlr_hose->pci_prefetch;
@@ -267,6 +419,11 @@ void dm_pciauto_prescan_setup_bridge(str cmdstat |= PCI_COMMAND_IO; }
- /* For PCIe devices see if we need to retrain the link by hand */
- pcie_off = dm_pci_find_capability(dev, PCI_CAP_ID_EXP);
- if (pcie_off)
dm_pciauto_exp_fixup_link(dev, pcie_off);
- /* Enable memory and I/O accesses, enable bus master */ dm_pci_write_config16(dev, PCI_COMMAND, cmdstat | PCI_COMMAND_MASTER); }
Index: u-boot/include/pci.h
--- u-boot.orig/include/pci.h +++ u-boot/include/pci.h @@ -5,6 +5,7 @@
- (C) Copyright 2002
- Wolfgang Denk, DENX Software Engineering, wd@denx.de.
- Copyright (c) 2021 Maciej W. Rozycki macro@orcam.me.uk
*/
#ifndef _PCI_H
@@ -475,16 +476,24 @@
/* PCI Express capabilities */ #define PCI_EXP_FLAGS 2 /* Capabilities register */ +#define PCI_EXP_FLAGS_VERSION 0x000f /* Capability Version */ #define PCI_EXP_FLAGS_TYPE 0x00f0 /* Device/Port type */ #define PCI_EXP_TYPE_ROOT_PORT 0x4 /* Root Port */ +#define PCI_EXP_TYPE_DOWN_PORT 0x6 /* Downstream Port */ +#define PCI_EXP_TYPE_PCIE_BRDG 0x8 /* PCI/PCI-X to PCI Express Bridge */ #define PCI_EXP_DEVCAP 4 /* Device capabilities */ #define PCI_EXP_DEVCAP_FLR 0x10000000 /* Function Level Reset */ #define PCI_EXP_DEVCTL 8 /* Device Control */ #define PCI_EXP_DEVCTL_BCR_FLR 0x8000 /* Bridge Configuration Retry / FLR */ #define PCI_EXP_LNKCAP 12 /* Link Capabilities */ #define PCI_EXP_LNKCAP_SLS 0x0000000f /* Supported Link Speeds */ +#define PCI_EXP_LNKCAP_SLS_2_5GB 0x0001 /* Supported Link Speed 2.5GT/s */ +#define PCI_EXP_LNKCAP_SLS_5_0GB 0x0002 /* Supported Link Speed 5.0GT/s */ +#define PCI_EXP_LNKCAP_SLS_8_0GB 0x0003 /* Supported Link Speed 8.0GT/s */ #define PCI_EXP_LNKCAP_MLW 0x000003f0 /* Maximum Link Width */ #define PCI_EXP_LNKCAP_DLLLARC 0x00100000 /* Data Link Layer Link Active Reporting Capable */ +#define PCI_EXP_LNKCTL 16 /* Link Control */ +#define PCI_EXP_LNKCTL_RETRLNK 0x0020 /* Retrain Link */ #define PCI_EXP_LNKSTA 18 /* Link Status */ #define PCI_EXP_LNKSTA_CLS 0x000f /* Current Link Speed */ #define PCI_EXP_LNKSTA_CLS_2_5GB 0x0001 /* Current Link Speed 2.5GT/s */ @@ -492,7 +501,9 @@ #define PCI_EXP_LNKSTA_CLS_8_0GB 0x0003 /* Current Link Speed 8.0GT/s */ #define PCI_EXP_LNKSTA_NLW 0x03f0 /* Negotiated Link Width */ #define PCI_EXP_LNKSTA_NLW_SHIFT 4 /* start of NLW mask in link status */ +#define PCI_EXP_LNKSTA_LNKTR 0x0800 /* Link Training */ #define PCI_EXP_LNKSTA_DLLLA 0x2000 /* Data Link Layer Link Active */ +#define PCI_EXP_LNKSTA_LBMS 0x4000 /* Link Bandwidth Management Status */ #define PCI_EXP_SLTCAP 20 /* Slot Capabilities */ #define PCI_EXP_SLTCAP_PSN 0xfff80000 /* Physical Slot Number */ #define PCI_EXP_RTCTL 28 /* Root Control */ @@ -503,8 +514,13 @@ #define PCI_EXP_DEVCAP2_ARI 0x00000020 /* ARI Forwarding Supported */ #define PCI_EXP_DEVCTL2 40 /* Device Control 2 */ #define PCI_EXP_DEVCTL2_ARI 0x0020 /* Alternative Routing-ID */
+#define PCI_EXP_LNKCAP2 44 /* Link Capability 2 */ +#define PCI_EXP_LNKCAP2_SLSV 0x000000fe /* Supported Link Speeds Vector */ #define PCI_EXP_LNKCTL2 48 /* Link Control 2 */ +#define PCI_EXP_LNKCTL2_TLS 0x000f /* Target Link Speed */ +#define PCI_EXP_LNKSTA_TLS_2_5GB 0x0001 /* Target Link Speed 2.5GT/s */ +#define PCI_EXP_LNKSTA_TLS_5_0GB 0x0002 /* Target Link Speed 5.0GT/s */ +#define PCI_EXP_LNKSTA_TLS_8_0GB 0x0003 /* Target Link Speed 8.0GT/s */ /* Single Root I/O Virtualization Registers */ #define PCI_SRIOV_CAP 0x04 /* SR-IOV Capabilities */ #define PCI_SRIOV_CTRL 0x08 /* SR-IOV Control */
Viele Grüße, Stefan Roese

Hi Stefan,
Make use of this observation then and attempt to detect the inability to negotiate the link speed automatically, and then handle it by hand. Use the Data Link Layer Link Active status flag as the primary indicator of successful link speed negotiation, but given that the flag is optional by hardware to implement (the ASM2824 does have it though), resort to checking for the mandatory Link Bandwidth Management Status flag showing that the link speed or width has been changed in an attempt to correct unreliable link operation (the ASM2824 does set it too).
If these checks indicate that link may not operate correctly, then poll the Data Link Layer Link Active status flag along with the Link Training flag for the duration of 200ms to see if the link has stabilised, that is either that the Data Link Layer Link Active status flag has been set or that Link Training has been inactive during at least the second half of the inteval.
If that has indicated failure, reduce the target speed, request a link retrain and check again if the link has stabilised. Repeat until either successful or the link speeds supported by the downstream port have been exhausted.
So in such cases, the link speed will be downgraded? I would expect at least a big warning in such cases.
I had mixed feelings about such extra clutter and chose not to include it, but perhaps it's worth adding after all, especially with the most recent findings, noted below.
Did you try to change some other configuration options for the link establishment? I remember that on some hardware we were able to get better "link-up results" by setting the de-emphasis level to -3.5dB instead of -6dB (in the link control status register 2), before trying to re-estblish the link. Did you also test with "tuning" such parameters. There might be other, which I'm missing right now.
Thank you for the suggestion. I've never been too familiar with analogue electronics engineering, so I didn't consider working at that level.
So as it has turned out the ASM2824 has the de-emphasis level already set to -3.5dB by default (at power-up or reset; the power-up default is 0063, and some bits are sticky, so may not change at reset). Interestingly, the bit is defined as HwInit, and as such it is meant to be "read-only after intialization", however with the ASM2824 it appears freely writable at any time and state reported in other registers and at the other end of the link indicates these changes do take effect. They do not fix the issue with link training though; I have tried both settings to no avail.
However while fiddling with the register I have discovered an interesting phenomenon in that the link will actually switch to 5GT/s and then work reliably, provided that it is done in two steps: first clamping the target link speed to 2.5GT/s and letting link training succeed at it, and only then switching the target link speed to 5GT/s (or for that matter 8GT/s). At that point the link changes to 5GT/s instantaneously (there's no Link Training reported active, not even momentarily, or Data Link Layer Link Active reported inactive), as shown by the Link Status Register at both ends (and the de-emphasis level does not matter; it works at either value, as reported in the Link Status 2 register, again at both ends).
It makes me suspect that the problem with link negotiation is at the data link layer, rather than at the physical layer as I originally thought. IOW the two devices disagree at the protocol rather than electrical level, and only allowing a higher link speed once the data link layer has gone up somehow avoids the incompatibility.
It works the same regardless of whether I change the target link speed in U-Boot (whether by firmware code itself or by poking with commands entered at the prompt by hand) or in Linux (with `setpci'). However changing the target link speed back to any beyond 2.5GT/s in U-Boot has an unfortunate side effect of devices behind the problematic link being only accessible until Linux boots. This is because Linux issues a reset to the PCIe tree, which causes the link to be reinitialised with the target link speed set beyond 2.5GT/s, and that brings the problem back. The reset however does not cause an issue and lets devices behind the problematic link continue working if the target link speed has been set by U-Boot to 2.5GT/s. This is because the Target Link Speed field in the Link Control 2 register is sticky, so the clamp continues to be applied.
So I think my observations above have implications as follows:
1. We don't need to try lower and lower target link speeds as a workaround in U-Boot. It is enough if we force any link found problematic just to 2.5GT/s, as a minimal requirement to make such a link to work and also the speed all PCIe devices must support.
2. We don't want to try to switch to any higher speed afterwards in U-Boot as it will prevent an OS that does not have a workaround in place, but issues a PCIe reset from working with devices behind such a problematic link. I think we ought to do our best to prevent that from happening, i.e. have the most robust workaround possible.
3. We do want to have a more sophisticated workaround in Linux (and other OSes, as someone steps in to implement one) that will ensure correct hot-plug operation and also give better performance. With hot-plug events possible at any time an OS driver cannot do aggressive polling however, so I think unlike with the workaround in U-Boot it'll have to be structured differently, e.g. if it is vendor:device-specific, then it can rely on ASM2824 Data Link Layer Link Active reporting facility and can sleep instead then as it doesn't have to do the link training detection dance.
4. This all means that changes for U-Boot and Linux will definitely have to be different each.
I'll work on simplifying the U-Boot change along the lines outlined above then, and also look into a corresponding workaround for Linux.
Thank you very much indeed for your feedback, I think I have now made good progress here.
Maciej

On Thursday 18 November 2021 00:03:58 Maciej W. Rozycki wrote:
At that point the link changes to 5GT/s instantaneously (there's no Link Training reported active, not even momentarily, or Data Link Layer Link Active reported inactive), as shown by the Link Status Register at both ends (and the de-emphasis level does not matter; it works at either value, as reported in the Link Status 2 register, again at both ends).
Here is simplified PCIe LTSSM diagram: https://www.oreilly.com/library/view/pci-express-system/0321156307/032115630...
Active link is in L0 state and changing link speed is performed via Recovery state followed by Configuration and back to L0. And there is important note that LinkUp is reported as 1b also when going into Configuration state from Recovery state. DLLLA bit in Link Status register reports 1b when DLCMSM is in DL_Active state. And DL_Active is changed (to DL_Inactive) only when LinkUp changes to 0b.
So it means that DLLLA bit in Link Status is not changed during successful link retraining when changing link speed. So your above observation is correct that DLLLA was not changed. And reason is not so obvious...
Link Training bit in Link Status register is 1b when link is in Configuration or Recovery state. It is possible that link speed change is too fast and you do not observe it (or device return old value, sampled last time).

Hello!
On Tuesday 16 November 2021 11:35:24 Maciej W. Rozycki wrote:
Attempt to handle cases with a downstream port of a PCIe switch where link training never completes and the link continues switching between speeds indefinitely with the data link layer never reaching the active state.
It has been observed with a downstream port of the ASMedia ASM2824 Gen 3 switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2 switch, using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device, P/N 41433, wired to a SiFive HiFive Unmatched board. In this setup the switches are supposed to negotiate the link speed of preferably 5.0GT/s, falling back to 2.5GT/s.
However the link continues oscillating between the two speeds, at the rate of 34-35 times per second, with link training reported active ~84% of the time repeatedly, e.g.:
02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode]) [...] Bus: primary=02, secondary=05, subordinate=05, sec-latency=0 [...] Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00 [...] LnkSta: Speed 5GT/s (downgraded), Width x1 (ok) TrErr- Train+ SlotClk+ DLActive- BWMgmt+ ABWMgmt- [...] LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis+, Selectable De-emphasis: -3.5dB Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB [...]
Forcibly limiting the target link speed to 2.5GT/s with the upstream ASM2824 device makes the two switches communicate correctly however:
02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode]) [...] Bus: primary=02, secondary=05, subordinate=09, sec-latency=0 [...] Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00 [...] LnkSta: Speed 2.5GT/s (downgraded), Width x1 (ok) TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt- [...] LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis+, Selectable De-emphasis: -3.5dB Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB [...]
and then:
05:00.0 PCI bridge [0604]: Pericom Semiconductor PI7C9X2G304 EL/SL PCIe2 3-Port/4-Lane Packet Switch [12d8:2304] (rev 05) (prog-if 00 [Normal decode]) [...] Bus: primary=05, secondary=06, subordinate=09, sec-latency=0 [...] Capabilities: [c0] Express (v2) Upstream Port, MSI 00 [...] LnkSta: Speed 2.5GT/s (downgraded), Width x1 (downgraded) TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- [...] LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB [...]
Make use of this observation then and attempt to detect the inability to negotiate the link speed automatically, and then handle it by hand. Use the Data Link Layer Link Active status flag as the primary indicator of successful link speed negotiation, but given that the flag is optional by hardware to implement (the ASM2824 does have it though), resort to checking for the mandatory Link Bandwidth Management Status flag showing that the link speed or width has been changed in an attempt to correct unreliable link operation (the ASM2824 does set it too).
If these checks indicate that link may not operate correctly, then poll the Data Link Layer Link Active status flag along with the Link Training flag for the duration of 200ms to see if the link has stabilised, that is either that the Data Link Layer Link Active status flag has been set or that Link Training has been inactive during at least the second half of the inteval.
If that has indicated failure, reduce the target speed, request a link retrain and check again if the link has stabilised. Repeat until either successful or the link speeds supported by the downstream port have been exhausted.
I would like to hear what Bjorn Helgaas thinks about this issue at all and how to handle it, without potentially break something else. And based on the fact that in U-Boot's PCI core code there is no such global quirk implemented, I really do not know if U-Boot maintainers and developers want to review and understand all PCI Express aspect of this code and possible side-effects. This is just what I think about it, I do not have maintainer hat.
I suggested to to first send this fixup to the Linux kernel for proper review (or to any other similar bigger project with PCI Express support) and then implement (accepted) solution into U-Boot. As I think this can speed up inclusion of this fix/workaround in U-Boot. I do not have a problem to ack/review PCI patches for U-Boot which just re-implements PCI logic from Linux kernel.
Signed-off-by: Maciej W. Rozycki macro@orcam.me.uk
Hi,
Thank you, Pali, for your valuable feedback. Here's v2 of the change with your suggestions taken into account. I believe I have addressed all your concerns and I haven't heard from anyone else, so I consider this version final unless someone speaks out.
Please apply then.
Maciej
Changes from v1:
also handle root complex and PCI/PCI-X to PCIe bridge devices,
update comments throughout accordingly,
likewise the change description; mention the rate of oscillations observed with the ASM2824 device.
drivers/pci/pci_auto.c | 157 +++++++++++++++++++++++++++++++++++++++++++++++++ include/pci.h | 18 +++++ 2 files changed, 174 insertions(+), 1 deletion(-)
u-boot-pcie-manual-retrain.diff Index: u-boot/drivers/pci/pci_auto.c =================================================================== --- u-boot.orig/drivers/pci/pci_auto.c +++ u-boot/drivers/pci/pci_auto.c @@ -5,6 +5,7 @@
- Author: Matt Porter mporter@mvista.com
- Copyright 2000 MontaVista Software Inc.
*/
- Copyright (c) 2021 Maciej W. Rozycki macro@orcam.me.uk
#include <common.h> @@ -12,6 +13,7 @@ #include <errno.h> #include <log.h> #include <pci.h> +#include <time.h> #include "pci_internal.h"
/* the user can define CONFIG_SYS_PCI_CACHE_LINE_SIZE to avoid problems */ @@ -180,6 +182,155 @@ static void dm_pciauto_setup_device(stru dm_pci_write_config8(dev, PCI_LATENCY_TIMER, 0x80); }
+/*
- Check if the link of a downstream PCIe port operates correctly.
- For that check if the optional Data Link Layer Link Active status gets
- on within a 200ms period or failing that wait until the completion of
- that period and check if link training has shown the completed status
- continuously throughout the second half of that period.
- Observation with the ASMedia ASM2824 Gen 3 switch indicates it takes
- 11-44ms to indicate the Data Link Layer Link Active status at 2.5GT/s,
- though it may take a couple of link training iterations.
- */
+static bool dm_pciauto_exp_link_stable(struct udevice *dev, int pcie_off) +{
- u64 loops = 0, trcount = 0, ntrcount = 0, flips = 0;
- bool dllla, lnktr, plnktr;
- u16 exp_lnksta;
- pci_dev_t bdf;
- u64 end;
- dm_pci_read_config16(dev, pcie_off + PCI_EXP_LNKSTA, &exp_lnksta);
- plnktr = !!(exp_lnksta & PCI_EXP_LNKSTA_LNKTR);
- end = get_ticks() + usec_to_tick(200000);
- do {
dm_pci_read_config16(dev, pcie_off + PCI_EXP_LNKSTA,
&exp_lnksta);
dllla = !!(exp_lnksta & PCI_EXP_LNKSTA_DLLLA);
lnktr = !!(exp_lnksta & PCI_EXP_LNKSTA_LNKTR);
flips += plnktr ^ lnktr;
if (lnktr) {
ntrcount = 0;
trcount++;
} else {
ntrcount++;
}
loops++;
plnktr = lnktr;
- } while (!dllla && get_ticks() < end);
- bdf = dm_pci_get_bdf(dev);
- debug("PCI Autoconfig: %02x.%02x.%02x: Fixup link: DL active: %u; "
"%3llu flips, %6llu loops of which %6llu while training, "
"final %6llu stable\n",
PCI_BUS(bdf), PCI_DEV(bdf), PCI_FUNC(bdf),
(unsigned int)dllla,
(unsigned long long)flips, (unsigned long long)loops,
(unsigned long long)trcount, (unsigned long long)ntrcount);
- return dllla || ntrcount >= loops / 2;
+}
+/*
- Retrain the link of a downstream PCIe port by hand if necessary.
- This is needed at least where a downstream port of the ASMedia ASM2824
- Gen 3 switch is wired to the upstream port of the Pericom PI7C9X2G304
- Gen 2 switch, and observed with the Delock Riser Card PCI Express x1 >
- 2 x PCIe x1 device, P/N 41433, plugged into the SiFive HiFive Unmatched
- board.
- In such a configuration the switches are supposed to negotiate the link
- speed of preferably 5.0GT/s, falling back to 2.5GT/s. However the link
- continues switching between the two speeds indefinitely and the data
- link layer never reaches the active state, with link training reported
- active ~84% of the time repeatedly. Forcing the target link speed to
- 2.5GT/s with the upstream ASM2824 device makes the two switches talk to
- each other correctly however.
- As this can potentially happen with any device and is cheap in the case
- of correctly operating hardware, let's do it for all downstream ports,
- for root complexes, PCIe switches and PCI/PCI-X to PCIe bridges, and if
- automatic link training may have failed to complete, as indicated by
- the optional Data Link Layer Link Active status being off and the Link
- Bandwidth Management Status showing that hardware has changed the link
- speed or width in an attempt to correct unreliable link operation, then
- try lower and lower speeds one by one until one has succeeded or we
- have run out of speeds.
- This requires the presence of the Link Control 2 register, so make sure
- the PCI Express Capability Version is at least 2. Also don't try, for
- obvious reasons, to limit the speed if 2.5GT/s is the only link speed
- supported.
- */
+static void dm_pciauto_exp_fixup_link(struct udevice *dev, int pcie_off) +{
- u16 exp_lnksta, exp_lnkctl, exp_lnkctl2;
- u16 exp_flags, exp_type, exp_version;
- u32 exp_lnkcap, exp_lnkcap2;
- unsigned int speed;
- pci_dev_t bdf;
- dm_pci_read_config16(dev, pcie_off + PCI_EXP_FLAGS, &exp_flags);
- exp_version = exp_flags & PCI_EXP_FLAGS_VERSION;
- if (exp_version < 2)
return;
- exp_type = (exp_flags & PCI_EXP_FLAGS_TYPE) >> 4;
- switch (exp_type) {
- case PCI_EXP_TYPE_ROOT_PORT:
- case PCI_EXP_TYPE_DOWN_PORT:
- case PCI_EXP_TYPE_PCIE_BRDG:
break;
- default:
return;
- }
- dm_pci_read_config32(dev, pcie_off + PCI_EXP_LNKCAP, &exp_lnkcap);
- if ((exp_lnkcap & PCI_EXP_LNKCAP_SLS) <= PCI_EXP_LNKCAP_SLS_2_5GB)
return;
- dm_pci_read_config16(dev, pcie_off + PCI_EXP_LNKSTA, &exp_lnksta);
- if ((exp_lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) !=
PCI_EXP_LNKSTA_LBMS)
return;
- if (dm_pciauto_exp_link_stable(dev, pcie_off))
return;
- bdf = dm_pci_get_bdf(dev);
- debug("PCI Autoconfig: %02x.%02x.%02x: Downgrading speed by hand\n",
PCI_BUS(bdf), PCI_DEV(bdf), PCI_FUNC(bdf));
- dm_pci_read_config16(dev, pcie_off + PCI_EXP_LNKCTL, &exp_lnkctl);
- dm_pci_read_config16(dev, pcie_off + PCI_EXP_LNKCTL2, &exp_lnkctl2);
- exp_lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
- dm_pci_read_config32(dev, pcie_off + PCI_EXP_LNKCAP, &exp_lnkcap);
- dm_pci_read_config32(dev, pcie_off + PCI_EXP_LNKCAP2, &exp_lnkcap2);
- for (speed = (exp_lnkcap & PCI_EXP_LNKCAP_SLS) - 1;
speed >= PCI_EXP_LNKCAP_SLS_2_5GB;
speed--) {
if (exp_lnkcap2 && (exp_lnkcap2 & 1 << speed) == 0)
continue;
debug("PCI Autoconfig: %02x.%02x.%02x: Trying speed: %u\n",
PCI_BUS(bdf), PCI_DEV(bdf), PCI_FUNC(bdf), speed);
dm_pci_write_config16(dev, pcie_off + PCI_EXP_LNKCTL2,
exp_lnkctl2 | speed);
dm_pci_write_config16(dev, pcie_off + PCI_EXP_LNKCTL,
exp_lnkctl | PCI_EXP_LNKCTL_RETRLNK);
if (dm_pciauto_exp_link_stable(dev, pcie_off))
return;
I was thinking more about it and maybe this code still needs some improvements. PCIe link consist of two ends and you should not force speed on one end which is unsupported by other. Because iteration is going from highest speed to slowest speed, it means that this code can try to setup also 8GT/s speed on root/downstream port to which is connected only 5GT/s card. I have 5GT/s PCIe cards which successfully initialize link when downstream port forces 8GT/s speed, but link is setup only to 2.5GT/s. So this code for those PCIe cards connected to unstable ports (when this workaround is needed) degrades performance from 5GT/s to just 2.5GT/s. But I do not know how to correctly handle it (one quick idea: always force 2.5GT/s, stabilize link, read capability from card and retrain again; but I do not know if this is correct).
And... I have also PCIe HW which does not support DLLA bit and RL bit is write-only. I'm not sure right now, if this code would not cause issues for such HW (if to it is connected broken PCIe card which does not like link retraining), this needs to be tested.
- }
+}
void dm_pciauto_prescan_setup_bridge(struct udevice *dev, int sub_bus) { struct pci_region *pci_mem; @@ -189,6 +340,7 @@ void dm_pciauto_prescan_setup_bridge(str u8 io_32; struct udevice *ctlr = pci_get_controller(dev); struct pci_controller *ctlr_hose = dev_get_uclass_priv(ctlr);
int pcie_off;
pci_mem = ctlr_hose->pci_mem; pci_prefetch = ctlr_hose->pci_prefetch;
@@ -267,6 +419,11 @@ void dm_pciauto_prescan_setup_bridge(str cmdstat |= PCI_COMMAND_IO; }
- /* For PCIe devices see if we need to retrain the link by hand */
- pcie_off = dm_pci_find_capability(dev, PCI_CAP_ID_EXP);
- if (pcie_off)
dm_pciauto_exp_fixup_link(dev, pcie_off);
- /* Enable memory and I/O accesses, enable bus master */ dm_pci_write_config16(dev, PCI_COMMAND, cmdstat | PCI_COMMAND_MASTER);
} Index: u-boot/include/pci.h =================================================================== --- u-boot.orig/include/pci.h +++ u-boot/include/pci.h @@ -5,6 +5,7 @@
- (C) Copyright 2002
- Wolfgang Denk, DENX Software Engineering, wd@denx.de.
*/
- Copyright (c) 2021 Maciej W. Rozycki macro@orcam.me.uk
#ifndef _PCI_H @@ -475,16 +476,24 @@
/* PCI Express capabilities */ #define PCI_EXP_FLAGS 2 /* Capabilities register */ +#define PCI_EXP_FLAGS_VERSION 0x000f /* Capability Version */ #define PCI_EXP_FLAGS_TYPE 0x00f0 /* Device/Port type */ #define PCI_EXP_TYPE_ROOT_PORT 0x4 /* Root Port */ +#define PCI_EXP_TYPE_DOWN_PORT 0x6 /* Downstream Port */ +#define PCI_EXP_TYPE_PCIE_BRDG 0x8 /* PCI/PCI-X to PCI Express Bridge */
Currently U-Boot macros in pci.h comes from Linux file include/uapi/linux/pci_regs.h. So I would suggest to stick with macro names as defined in that Linux file and when introducing new macros into U-Boot pci.h, reuse names from Linux pci_regs.h.
In Linux pci_regs.h are currently:
#define PCI_EXP_FLAGS_TYPE 0x00f0 /* Device/Port type */ #define PCI_EXP_TYPE_ENDPOINT 0x0 /* Express Endpoint */ #define PCI_EXP_TYPE_LEG_END 0x1 /* Legacy Endpoint */ #define PCI_EXP_TYPE_ROOT_PORT 0x4 /* Root Port */ #define PCI_EXP_TYPE_UPSTREAM 0x5 /* Upstream Port */ #define PCI_EXP_TYPE_DOWNSTREAM 0x6 /* Downstream Port */ #define PCI_EXP_TYPE_PCI_BRIDGE 0x7 /* PCIe to PCI/PCI-X Bridge */ #define PCI_EXP_TYPE_PCIE_BRIDGE 0x8 /* PCI/PCI-X to PCIe Bridge */ #define PCI_EXP_TYPE_RC_END 0x9 /* Root Complex Integrated Endpoint */ #define PCI_EXP_TYPE_RC_EC 0xa /* Root Complex Event Collector */
#define PCI_EXP_DEVCAP 4 /* Device capabilities */ #define PCI_EXP_DEVCAP_FLR 0x10000000 /* Function Level Reset */ #define PCI_EXP_DEVCTL 8 /* Device Control */ #define PCI_EXP_DEVCTL_BCR_FLR 0x8000 /* Bridge Configuration Retry / FLR */ #define PCI_EXP_LNKCAP 12 /* Link Capabilities */ #define PCI_EXP_LNKCAP_SLS 0x0000000f /* Supported Link Speeds */ +#define PCI_EXP_LNKCAP_SLS_2_5GB 0x0001 /* Supported Link Speed 2.5GT/s */ +#define PCI_EXP_LNKCAP_SLS_5_0GB 0x0002 /* Supported Link Speed 5.0GT/s */ +#define PCI_EXP_LNKCAP_SLS_8_0GB 0x0003 /* Supported Link Speed 8.0GT/s */ #define PCI_EXP_LNKCAP_MLW 0x000003f0 /* Maximum Link Width */ #define PCI_EXP_LNKCAP_DLLLARC 0x00100000 /* Data Link Layer Link Active Reporting Capable */ +#define PCI_EXP_LNKCTL 16 /* Link Control */ +#define PCI_EXP_LNKCTL_RETRLNK 0x0020 /* Retrain Link */ #define PCI_EXP_LNKSTA 18 /* Link Status */ #define PCI_EXP_LNKSTA_CLS 0x000f /* Current Link Speed */ #define PCI_EXP_LNKSTA_CLS_2_5GB 0x0001 /* Current Link Speed 2.5GT/s */ @@ -492,7 +501,9 @@ #define PCI_EXP_LNKSTA_CLS_8_0GB 0x0003 /* Current Link Speed 8.0GT/s */ #define PCI_EXP_LNKSTA_NLW 0x03f0 /* Negotiated Link Width */ #define PCI_EXP_LNKSTA_NLW_SHIFT 4 /* start of NLW mask in link status */ +#define PCI_EXP_LNKSTA_LNKTR 0x0800 /* Link Training */ #define PCI_EXP_LNKSTA_DLLLA 0x2000 /* Data Link Layer Link Active */ +#define PCI_EXP_LNKSTA_LBMS 0x4000 /* Link Bandwidth Management Status */ #define PCI_EXP_SLTCAP 20 /* Slot Capabilities */ #define PCI_EXP_SLTCAP_PSN 0xfff80000 /* Physical Slot Number */ #define PCI_EXP_RTCTL 28 /* Root Control */ @@ -503,8 +514,13 @@ #define PCI_EXP_DEVCAP2_ARI 0x00000020 /* ARI Forwarding Supported */ #define PCI_EXP_DEVCTL2 40 /* Device Control 2 */ #define PCI_EXP_DEVCTL2_ARI 0x0020 /* Alternative Routing-ID */
+#define PCI_EXP_LNKCAP2 44 /* Link Capability 2 */ +#define PCI_EXP_LNKCAP2_SLSV 0x000000fe /* Supported Link Speeds Vector */ #define PCI_EXP_LNKCTL2 48 /* Link Control 2 */ +#define PCI_EXP_LNKCTL2_TLS 0x000f /* Target Link Speed */ +#define PCI_EXP_LNKSTA_TLS_2_5GB 0x0001 /* Target Link Speed 2.5GT/s */ +#define PCI_EXP_LNKSTA_TLS_5_0GB 0x0002 /* Target Link Speed 5.0GT/s */ +#define PCI_EXP_LNKSTA_TLS_8_0GB 0x0003 /* Target Link Speed 8.0GT/s */ /* Single Root I/O Virtualization Registers */ #define PCI_SRIOV_CAP 0x04 /* SR-IOV Capabilities */ #define PCI_SRIOV_CTRL 0x08 /* SR-IOV Control */

Hi Pali,
If that has indicated failure, reduce the target speed, request a link retrain and check again if the link has stabilised. Repeat until either successful or the link speeds supported by the downstream port have been exhausted.
I would like to hear what Bjorn Helgaas thinks about this issue at all and how to handle it, without potentially break something else. And based on the fact that in U-Boot's PCI core code there is no such global quirk implemented, I really do not know if U-Boot maintainers and developers want to review and understand all PCI Express aspect of this code and possible side-effects. This is just what I think about it, I do not have maintainer hat.
Please leave the decision up to actual maintainers then. They can speak for themselves.
I suggested to to first send this fixup to the Linux kernel for proper review (or to any other similar bigger project with PCI Express support) and then implement (accepted) solution into U-Boot. As I think this can speed up inclusion of this fix/workaround in U-Boot. I do not have a problem to ack/review PCI patches for U-Boot which just re-implements PCI logic from Linux kernel.
I think that with a change functionally reviewed elsewhere and merely carried over there'd be little to nothing to ack/review here. However based on my most recent findings reported in the other message I don't think we're going to have the same workaround as Linux likely will, because U-Boot and Linux have different objectives each.
- for (speed = (exp_lnkcap & PCI_EXP_LNKCAP_SLS) - 1;
speed >= PCI_EXP_LNKCAP_SLS_2_5GB;
speed--) {
if (exp_lnkcap2 && (exp_lnkcap2 & 1 << speed) == 0)
continue;
debug("PCI Autoconfig: %02x.%02x.%02x: Trying speed: %u\n",
PCI_BUS(bdf), PCI_DEV(bdf), PCI_FUNC(bdf), speed);
dm_pci_write_config16(dev, pcie_off + PCI_EXP_LNKCTL2,
exp_lnkctl2 | speed);
dm_pci_write_config16(dev, pcie_off + PCI_EXP_LNKCTL,
exp_lnkctl | PCI_EXP_LNKCTL_RETRLNK);
if (dm_pciauto_exp_link_stable(dev, pcie_off))
return;
I was thinking more about it and maybe this code still needs some improvements. PCIe link consist of two ends and you should not force speed on one end which is unsupported by other. Because iteration is going from highest speed to slowest speed, it means that this code can try to setup also 8GT/s speed on root/downstream port to which is connected only 5GT/s card. I have 5GT/s PCIe cards which successfully initialize link when downstream port forces 8GT/s speed, but link is setup only to 2.5GT/s. So this code for those PCIe cards connected to unstable ports (when this workaround is needed) degrades performance from 5GT/s to just 2.5GT/s. But I do not know how to correctly handle it (one quick idea: always force 2.5GT/s, stabilize link, read capability from card and retrain again; but I do not know if this is correct).
Are you sure you know what you are talking about here?
As a part of link training ends first negotiate link at 2.5GT/s, which is a speed all devices must support, then exchange information as to speeds actually supported, and only then possibly choose to switch to another speed supported by both. How lowering the target link speed is supposed to break it? It's no different to connecting a device the maximum supported link speed of which is the target link speed chosen. At worst they'll stay at 2.5GT/s.
And... I have also PCIe HW which does not support DLLA bit and RL bit is write-only. I'm not sure right now, if this code would not cause issues for such HW (if to it is connected broken PCIe card which does not like link retraining), this needs to be tested.
RL is always write-only and DLLLA is often not supported. I think both code itself and my description is clear as to the handling of these bits.
Are you sure you are up to reviewing code in this area?
Maciej

Hi Maciej,
On 11/18/21 01:32, Maciej W. Rozycki wrote:
Hi Pali,
If that has indicated failure, reduce the target speed, request a link retrain and check again if the link has stabilised. Repeat until either successful or the link speeds supported by the downstream port have been exhausted.
I would like to hear what Bjorn Helgaas thinks about this issue at all and how to handle it, without potentially break something else. And based on the fact that in U-Boot's PCI core code there is no such global quirk implemented, I really do not know if U-Boot maintainers and developers want to review and understand all PCI Express aspect of this code and possible side-effects. This is just what I think about it, I do not have maintainer hat.
Please leave the decision up to actual maintainers then. They can speak for themselves.
Sure, this is correct. But still, review comments from other developers (e.g. non maintainers) are often very helpful. And especially Pali has done quite a lot of work in the area of PCIe lately (amongst others) and IMHO his thoughts and comments seem quite reasonable - at least to me.
I suggested to to first send this fixup to the Linux kernel for proper review (or to any other similar bigger project with PCI Express support) and then implement (accepted) solution into U-Boot. As I think this can speed up inclusion of this fix/workaround in U-Boot. I do not have a problem to ack/review PCI patches for U-Boot which just re-implements PCI logic from Linux kernel.
I think that with a change functionally reviewed elsewhere and merely carried over there'd be little to nothing to ack/review here. However based on my most recent findings reported in the other message I don't think we're going to have the same workaround as Linux likely will, because U-Boot and Linux have different objectives each.
- for (speed = (exp_lnkcap & PCI_EXP_LNKCAP_SLS) - 1;
speed >= PCI_EXP_LNKCAP_SLS_2_5GB;
speed--) {
if (exp_lnkcap2 && (exp_lnkcap2 & 1 << speed) == 0)
continue;
debug("PCI Autoconfig: %02x.%02x.%02x: Trying speed: %u\n",
PCI_BUS(bdf), PCI_DEV(bdf), PCI_FUNC(bdf), speed);
dm_pci_write_config16(dev, pcie_off + PCI_EXP_LNKCTL2,
exp_lnkctl2 | speed);
dm_pci_write_config16(dev, pcie_off + PCI_EXP_LNKCTL,
exp_lnkctl | PCI_EXP_LNKCTL_RETRLNK);
if (dm_pciauto_exp_link_stable(dev, pcie_off))
return;
I was thinking more about it and maybe this code still needs some improvements. PCIe link consist of two ends and you should not force speed on one end which is unsupported by other. Because iteration is going from highest speed to slowest speed, it means that this code can try to setup also 8GT/s speed on root/downstream port to which is connected only 5GT/s card. I have 5GT/s PCIe cards which successfully initialize link when downstream port forces 8GT/s speed, but link is setup only to 2.5GT/s. So this code for those PCIe cards connected to unstable ports (when this workaround is needed) degrades performance from 5GT/s to just 2.5GT/s. But I do not know how to correctly handle it (one quick idea: always force 2.5GT/s, stabilize link, read capability from card and retrain again; but I do not know if this is correct).
Are you sure you know what you are talking about here?
As a part of link training ends first negotiate link at 2.5GT/s, which is a speed all devices must support, then exchange information as to speeds actually supported, and only then possibly choose to switch to another speed supported by both. How lowering the target link speed is supposed to break it? It's no different to connecting a device the maximum supported link speed of which is the target link speed chosen. At worst they'll stay at 2.5GT/s.
And... I have also PCIe HW which does not support DLLA bit and RL bit is write-only. I'm not sure right now, if this code would not cause issues for such HW (if to it is connected broken PCIe card which does not like link retraining), this needs to be tested.
RL is always write-only and DLLLA is often not supported. I think both code itself and my description is clear as to the handling of these bits.
Are you sure you are up to reviewing code in this area?
Maciej, please don't get me wrong. I value your input here. But this sounds a bit offensive IMHO and I would like to keep the discussion here on the technical level.
Thanks, Stefan

Hi Stefan,
I would like to hear what Bjorn Helgaas thinks about this issue at all and how to handle it, without potentially break something else. And based on the fact that in U-Boot's PCI core code there is no such global quirk implemented, I really do not know if U-Boot maintainers and developers want to review and understand all PCI Express aspect of this code and possible side-effects. This is just what I think about it, I do not have maintainer hat.
Please leave the decision up to actual maintainers then. They can speak for themselves.
Sure, this is correct. But still, review comments from other developers (e.g. non maintainers) are often very helpful. And especially Pali has done quite a lot of work in the area of PCIe lately (amongst others) and IMHO his thoughts and comments seem quite reasonable - at least to me.
Having done it myself I know reviews are often more challenging than actual problem solving, also from my own experience, and I do appreciate Pali's effort and the willingness to help, as I have already expressed.
And... I have also PCIe HW which does not support DLLA bit and RL bit is write-only. I'm not sure right now, if this code would not cause issues for such HW (if to it is connected broken PCIe card which does not like link retraining), this needs to be tested.
RL is always write-only and DLLLA is often not supported. I think both code itself and my description is clear as to the handling of these bits.
Are you sure you are up to reviewing code in this area?
Maciej, please don't get me wrong. I value your input here. But this sounds a bit offensive IMHO and I would like to keep the discussion here on the technical level.
Apologies to sound a little harsh.
I've started feeling being side-tracked into non-issues though, as some of the matters raised are clear either from the specification or I have specifically pointed them out with my submission, e.g.:
"Use the Data Link Layer Link Active status flag as the primary indicator of successful link speed negotiation, but given that the flag is optional by hardware to implement (the ASM2824 does have it though) [...]"
(previously also the matter with the Supported Link Speeds Vector, etc.), so I started feeling like my effort to get all the details given in the submission was for nothing.
Thank you for your input.
Maciej

On Thu, 18 Nov 2021, Maciej W. Rozycki wrote:
"Use the Data Link Layer Link Active status flag as the primary indicator of successful link speed negotiation, but given that the flag is optional by hardware to implement (the ASM2824 does have it though) [...]"
NB I did verify the change too by making code ignore the presence of the DLLLA bit in the ASM2824, the usual approach in simulating a different environment that is not readily available. Otherwise I would have no guarantee that the loop termination conditions are indeed right or what a reasonable timeout would be for when there is no DLLLA bit. If that was the case, I'd be offering unverified code (which is sometimes inevitable) without mentioning that fact and/or asking for verification, and that would be a major fault with any submission.
I could have mentioned it with my submission that operation without DLLLA has been verified, which was clearly an oversight on my side. OTOH it's been routine for me when working with hardware, so it didn't occur to me it may not be obvious to someone else (though indeed I've seen submissions of varying quality, so things cannot be taken for granted).
Maciej
participants (3)
-
Maciej W. Rozycki
-
Pali Rohár
-
Stefan Roese