
On Sat, 2020-06-27 at 20:57 +0800, Kever Yang wrote:
Hi Kurt,
On 2020/6/4 上午5:17, Peter Geis wrote:
On Tue, Jun 2, 2020 at 11:12 AM Kurt Miller kurt@intricatesoftware.com wrote:
On Tue, 2020-06-02 at 10:23 +0800, Shawn Lin wrote:
在 2020/6/2 9:59, Kever Yang 写道:
Hi Kurt,
On 2020/6/2 上午4:30, Kurt Miller wrote:
On at least the RockPro64, many cards will trip a synchronous abort when first accessing PCIe config space during bus scanning. A delay after link training allows some of these cards to function.
Signed-off-by: Kurt Miller kurt@intricatesoftware.com
On the RockPro64, some pci cards trip a synchronous abort when scanning the pci bus. For example with HighPoint Rocket Raid 640L which is based on Marvell 88SE9230 I see this:
=> pci "Synchronous Abort" handler, esr 0x96000210 elr: 000000000022d034 lr : 000000000022cfd0 (reloc) elr: 00000000f4568034 lr : 00000000f4567fd0 x0 : 0000000000100000 x1 : 00000000f8000000 x2 : 0000000000000000 x3 : 0000000000100000 x4 : 00000000f2559290 x5 : 0000000000000000 x6 : 0000000000000001 x7 : 00000000f2559860 x8 : 0000000000000030 x9 : 0000000000000008 x10: 0000000000000010 x11: 00000000f251fd1c x12: 0000000000001421 x13: 0000000000001468 x14: 00000000f251fd4c x15: 00000000ffffffff x16: 0000000000060001 x17: 000000000000001f x18: 00000000f2532dc0 x19: 00000000f251fcd0 x20: 0000000000000001 x21: 0000000000000000 x22: 0000000000010000 x23: 00000000f45d4000 x24: 0000000000000000 x25: 00000000f45bc000 x26: 0000000000000000 x27: 0000000000000000 x28: 00000000f2541440 x29: 00000000f251fc20
Code: 540000c1 350000a5 93407c00 f9400081 (b8616800) Resetting CPU ...
Adding a delay after link training works-around the problem. I added this delay to the OpenBSD rkpcie driver as well:
https://github.com/openbsd/src/commit/9857dee3520d8ca5bec68538f4b0708d7e64fc...
HighPoint Rocket Raid 640L needs a 1.75 sec delay and Crossfield SAS9211-4i needs a 1 second delay, so I arbitrarily decided on 2 seconds.
The delay work-around was originally discovered by nuumio: https://github.com/nuumio/linux-kernel/commit/5a65b17686002dc84d461bffa324a2...
drivers/pci/pcie_rockchip.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/pci/pcie_rockchip.c b/drivers/pci/pcie_rockchip.c index 0edc2464a8..51cfbf6b18 100644 --- a/drivers/pci/pcie_rockchip.c +++ b/drivers/pci/pcie_rockchip.c @@ -288,6 +288,14 @@ static int rockchip_pcie_init_port(struct udevice *dev) goto err_power_off_phy; } + /* + * XXX: On at least the RockPro64, many cards will trip a + * synchronous abort when first accessing PCIe config space + * during bus scanning. A delay after link training allows + * some of these cards to function. + */ + mdelay(2000);
I don't understand what kind of delay for init needs 2 Seconds, root cause will preferred.
Hi Kever,
While working on this issue for the OpenBSD PCIe driver I was not able to determine the root cause. I tested the following adapters:
ROCKPro64 2 Port SATA StarTech PEXSAT32 2 Port SATA Samsung 970 Evo NVMe w/m.2 adapter IO CREST SI-PEX40148 2 Port SATA IO CREST SI-PEX40057 4 port HighPoint Rocket Raid 640L Crossfield SAS9211-4i Del PERC H200 Dell PERC 6/i Intel Gigabit VT Quad Port Server
All of the above adapters successfully link trained, however three of them would panic upon the first read of the PCI config space. nuumio's work-around of placing a delay after link-training allows these cards to work.
HighPoint Rocket Raid 640L ~1.75 sec delay Crossfield SAS9211-4i ~1 sec delay Dell PERC H200 ~1 sec delay
In attempt to determine if there was a way to detect how long to wait, I compared every status and debug register documented in the rk3399 TRM part 2 for the PCI controller. I compared the values pre-delay and post-delay. I was not able to find a value that would indicate it was safe to proceed.
Strictly speaking, how long should we need for this had been provided, see this:
I can accept the same fix like kernel which is 100ms, but 2 Second is really too much for most cases.
Hi Kever,
Thank you for following up on this. The 100ms delay before link-training appears to be solving a different issue where some cards fail to link train. From the comment in the referenced patch "Otherwise we do see some failures for link training."
The delay I am proposing is after successful link-training to avoid the synchronous abort that happens when scanning the pci bus and attempting read the pci config space for some cards.
I performed a test where I added the 100ms delay before link-training and it did not correct the synchronous abort on the HighPoint Rocket Raid 640L.
Shawn Lin suggested I check if the power supply is stable before enabling training. How would I go about doing this?
Regards, -Kurt
Thanks,
- Kever
If you need more delay, I highly suspect you should check if the power supply is stable before enabling training.
Hi Shawn,
Thank you for pointing out the need for the 100ms delay before beginning link-training. I believe this is to correct link training failures, while the delay after link training is to work-around post link training reading of PCI config space on certain cards.
There are similar issues on the Linux driver with various cards randomly throwing an abort. If it is power, a 2 second wait to allow cards to stabilize after power on might be wise for the Linux driver as well.
I imagine some cards take longer to complete power on reset than others, and attempting to read them immediately after powering them up could be the issue here.
In my testing I discussed above, I suspected that power was a likely cause. The HighPoint Rocket Raid 640L is a good test card because it documents its 3.3v power at 0.7 watts:
https://highpoint-tech.com/USA_new/series_r600-specifications.htm
How can I check if the power supply is stable?
Regards, -Kurt
Thanks,
- Kever
/* Initialize Root Complex registers. */ writel(PCIE_LM_VENDOR_ROCKCHIP, priv->apb_base + PCIE_LM_VENDOR_ID); writel(PCI_CLASS_BRIDGE_PCI << 16,