
On Friday 08 April 2022 00:18:48 Maciej W. Rozycki wrote:
On Thu, 7 Apr 2022, Stefan Roese wrote:
Hello! What do you think about this change? I think it is good compromise between enable this workaround for all builds on all boards and enable it only based on device id. Or would it be better to restrict this workaround just for ASM2824 device like the last iteration of kernel patch?
I'm not sure if we should name this "workaround" ASM2824, even though it's currently (only) targeted exactly for this PCIe switch. It might be helpful for other PCIe switches as well. So perhaps it's better to give this function a more generic name instead? With this change, it makes perhaps also sense to keep this function in pci_auto.c but also rename the Kconfig option to some more generic version.
By now I have become somewhat tired arguing and explaining matters over and over again as things have been moving as slow as molasses in this area, but one point I want to raise here is while it is indeed the ASM2824 device that seems problematic, it may actually be downstream, so you won't know it's there until you go through the workaround, as observed with the root port of the SiFive FU740-C000 SOC (which has a separate workaround in U-boot, clearly for the same issue; cf. `pcie_sifive_force_gen1'). So it looks like the erratum is going to show up with some device combinations in which the device enumerator may not have a way to know an ASM2824 is there until the workaround applied to an upstream device has let the link work.
I see that Linux patch was not not merged yet and there are already comments that this issue is probably board or arch specific and maybe should be in arch/riscv linux dir: https://lore.kernel.org/linux-pci/20220421202711.GA1415244@bhelgaas/
From that comment I have feeling that the issue could be really specific
to board or combination of connected devices (ASM2824+PI7C9X2G304) as there is really no other report about this issue.
In any case it is weird.
And as I previously already mentioned the Linux version of the workaround is only activated by the vendor:device ID because you cannot busy-loop polling on the Link Training bit in Linux (while you can do it in U-Boot, because U-Boot is not an OS).
Is is not _only_ because of this. For a longer time there is a direction to specify exact list of _affected_ hw which needs workaround. And not usage of wildcard which match all hardware, even unaffected. See for example comments which are adding workaround for broken GIC HW: https://lore.kernel.org/lkml/87ilsutb6w.wl-maz@kernel.org/
And this makes sense, workarounds should be targeted.
Arguably I could have broadened it to cover all Gen 3+ devices and poll on the Data Link Layer Link Active bit, which doesn't require busy-looping for meaningful results, but that would still leave Gen 2 devices out and chances are the system boots from U-Boot with the generic workaround applied and the link already negotiated at 2.5GT/s.
NB the ASM2824 switch has been used with option cards as well, e.g. https://www.amazon.com/dp/B07PRN2QCV, so it can be there in any system that has a connector of any kind that lets one use PCIe option cards.
FWIW,
Maciej