[U-Boot] [PATCH] arm: timer: sunxi: fix a64 spurious timeout issues

Fixes spurious timeouts which have been seen during testing SPI_SUNXI driver. The false timeouts disappear when number of bits reduced to 10 in workaround.
The false timeouts are caused by timer backward jumps.
Signed-off-by: Oskari Lemmela oskari@lemmela.net --- arch/arm/cpu/armv8/generic_timer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/cpu/armv8/generic_timer.c b/arch/arm/cpu/armv8/generic_timer.c index c1706dcec1..2e06ee4ed2 100644 --- a/arch/arm/cpu/armv8/generic_timer.c +++ b/arch/arm/cpu/armv8/generic_timer.c @@ -66,7 +66,7 @@ unsigned long timer_read_counter(void) isb(); do { asm volatile("mrs %0, cntpct_el0" : "=r" (cntpct)); - } while (((cntpct + 1) & GENMASK(10, 0)) <= 1); + } while (((cntpct + 1) & GENMASK(9, 0)) <= 1);
return cntpct; }

On Sun, Mar 17, 2019 at 2:56 PM Oskari Lemmela oskari@lemmela.net wrote:
Fixes spurious timeouts which have been seen during testing SPI_SUNXI driver. The false timeouts disappear when number of bits reduced to 10 in workaround.
The false timeouts are caused by timer backward jumps.
Wouldn't it be best to apply the same fix as the kernel uses here [1] as a more permanent fix?
[1] https://www.spinics.net/lists/arm-kernel/msg699622.html
Signed-off-by: Oskari Lemmela oskari@lemmela.net
arch/arm/cpu/armv8/generic_timer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/cpu/armv8/generic_timer.c b/arch/arm/cpu/armv8/generic_timer.c index c1706dcec1..2e06ee4ed2 100644 --- a/arch/arm/cpu/armv8/generic_timer.c +++ b/arch/arm/cpu/armv8/generic_timer.c @@ -66,7 +66,7 @@ unsigned long timer_read_counter(void) isb(); do { asm volatile("mrs %0, cntpct_el0" : "=r" (cntpct));
} while (((cntpct + 1) & GENMASK(10, 0)) <= 1);
} while (((cntpct + 1) & GENMASK(9, 0)) <= 1); return cntpct;
}
2.17.1
U-Boot mailing list U-Boot@lists.denx.de https://lists.denx.de/listinfo/u-boot

On 3/17/19 6:04 PM, Peter Robinson wrote:
On Sun, Mar 17, 2019 at 2:56 PM Oskari Lemmela oskari@lemmela.net wrote:
Fixes spurious timeouts which have been seen during testing SPI_SUNXI driver. The false timeouts disappear when number of bits reduced to 10 in workaround.
The false timeouts are caused by timer backward jumps.
Wouldn't it be best to apply the same fix as the kernel uses here [1] as a more permanent fix?
Sure. Difference is that retry counter was added to while loop in kernel side.
Only CNTPCT is used in u-boot. So CNTVCT part of kernel patch is currently not needed?
Signed-off-by: Oskari Lemmela oskari@lemmela.net
arch/arm/cpu/armv8/generic_timer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/cpu/armv8/generic_timer.c b/arch/arm/cpu/armv8/generic_timer.c index c1706dcec1..2e06ee4ed2 100644 --- a/arch/arm/cpu/armv8/generic_timer.c +++ b/arch/arm/cpu/armv8/generic_timer.c @@ -66,7 +66,7 @@ unsigned long timer_read_counter(void) isb(); do { asm volatile("mrs %0, cntpct_el0" : "=r" (cntpct));
} while (((cntpct + 1) & GENMASK(10, 0)) <= 1);
} while (((cntpct + 1) & GENMASK(9, 0)) <= 1); return cntpct;
}
-- 2.17.1
U-Boot mailing list U-Boot@lists.denx.de https://lists.denx.de/listinfo/u-boot
Oskari

On 03/17/19 13:41, Oskari Lemmelä wrote:
On 3/17/19 6:04 PM, Peter Robinson wrote:
On Sun, Mar 17, 2019 at 2:56 PM Oskari Lemmela oskari@lemmela.net wrote:
Fixes spurious timeouts which have been seen during testing SPI_SUNXI driver. The false timeouts disappear when number of bits reduced to 10 in workaround.
The false timeouts are caused by timer backward jumps.
Wouldn't it be best to apply the same fix as the kernel uses here [1] as a more permanent fix?
This _is_ the same fix as the kernel uses.
Sure. Difference is that retry counter was added to while loop in kernel side.
And the retry counter isn't needed here, since there is no preemption in u-boot.
Only CNTPCT is used in u-boot. So CNTVCT part of kernel patch is currently not needed?
Correct.
Signed-off-by: Oskari Lemmela oskari@lemmela.net
arch/arm/cpu/armv8/generic_timer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/cpu/armv8/generic_timer.c b/arch/arm/cpu/armv8/generic_timer.c index c1706dcec1..2e06ee4ed2 100644 --- a/arch/arm/cpu/armv8/generic_timer.c +++ b/arch/arm/cpu/armv8/generic_timer.c @@ -66,7 +66,7 @@ unsigned long timer_read_counter(void) isb(); do { asm volatile("mrs %0, cntpct_el0" : "=r" (cntpct)); - } while (((cntpct + 1) & GENMASK(10, 0)) <= 1); + } while (((cntpct + 1) & GENMASK(9, 0)) <= 1);
Please update the 11 to 10 in the comment above as well.
return cntpct; } -- 2.17.1
Thanks, Samuel

On 17/03/2019 18:41, Oskari Lemmelä wrote:
On 3/17/19 6:04 PM, Peter Robinson wrote:
On Sun, Mar 17, 2019 at 2:56 PM Oskari Lemmela oskari@lemmela.net wrote:
Fixes spurious timeouts which have been seen during testing SPI_SUNXI driver. The false timeouts disappear when number of bits reduced to 10 in workaround.
So in general I am fine with this patch, if it fixes things for you, but still scratching my head about this. AFAIK Samuel has never seen less than 11 identical bits in his testing, and I am using the new SPI driver for some weeks now (without the patch) and never had any issues. I am on the Pine64 LTS (with that "R18" SoC).
So Oskari, can you share how exactly you triggered this problem? I'd rather know what's going on before papering over the issue, especially if that leaves the much more important Linux kernel still at risk.
Cheers, Andre.
The false timeouts are caused by timer backward jumps.
Wouldn't it be best to apply the same fix as the kernel uses here [1] as a more permanent fix?
Sure. Difference is that retry counter was added to while loop in kernel side.
Only CNTPCT is used in u-boot. So CNTVCT part of kernel patch is currently not needed?
Signed-off-by: Oskari Lemmela oskari@lemmela.net
arch/arm/cpu/armv8/generic_timer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/cpu/armv8/generic_timer.c b/arch/arm/cpu/armv8/generic_timer.c index c1706dcec1..2e06ee4ed2 100644 --- a/arch/arm/cpu/armv8/generic_timer.c +++ b/arch/arm/cpu/armv8/generic_timer.c @@ -66,7 +66,7 @@ unsigned long timer_read_counter(void) isb(); do { asm volatile("mrs %0, cntpct_el0" : "=r" (cntpct)); - } while (((cntpct + 1) & GENMASK(10, 0)) <= 1); + } while (((cntpct + 1) & GENMASK(9, 0)) <= 1);
return cntpct; } -- 2.17.1
U-Boot mailing list U-Boot@lists.denx.de https://lists.denx.de/listinfo/u-boot
Oskari

On 3/17/19 11:14 PM, André Przywara wrote:
On 17/03/2019 18:41, Oskari Lemmelä wrote:
On 3/17/19 6:04 PM, Peter Robinson wrote:
On Sun, Mar 17, 2019 at 2:56 PM Oskari Lemmela oskari@lemmela.net wrote:
Fixes spurious timeouts which have been seen during testing SPI_SUNXI driver. The false timeouts disappear when number of bits reduced to 10 in workaround.
So in general I am fine with this patch, if it fixes things for you, but still scratching my head about this. AFAIK Samuel has never seen less than 11 identical bits in his testing, and I am using the new SPI driver for some weeks now (without the patch) and never had any issues. I am on the Pine64 LTS (with that "R18" SoC).
So Oskari, can you share how exactly you triggered this problem? I'd rather know what's going on before papering over the issue, especially if that leaves the much more important Linux kernel still at risk.
Actually now when I'm trying to retrigger it seems to be hard to catch. I was running following loop couple of days without any issues
setenv read_loop 'setenv i 0; while sf read ${kernel_addr_r} 0; do setexpr i ${i} + 1; echo run ${i}; done'
Then I changed wait loop code in spi driver to original one.
diff --git a/drivers/spi/spi-sunxi.c b/drivers/spi/spi-sunxi.c index dbfeac77ee..6aaa79ddd9 100644 --- a/drivers/spi/spi-sunxi.c +++ b/drivers/spi/spi-sunxi.c @@ -377,14 +377,11 @@ static int sun4i_spi_xfer(struct udevice *dev, unsigned int bitlen, setbits_le32(SPI_REG(priv, SPI_TCR), SPI_BIT(priv, SPI_TCR_XCH));
- /* Wait till RX FIFO to be empty */ - ret = readl_poll_timeout(SPI_REG(priv, SPI_FSR), - rx_fifocnt, - (((rx_fifocnt & - SPI_BIT(priv, SPI_FSR_RF_CNT_MASK)) >> - SUN4I_FIFO_STA_RF_CNT_BITS) >= nbytes), - SUN4I_SPI_TIMEOUT_US); - if (ret < 0) { + /* Wait transfer to complete */ + u32 *tcr = SPI_REG(priv, SPI_TCR); + ret = wait_for_bit_le32(tcr, 0x80000000, + false, SUN4I_SPI_TIMEOUT_US, false); + if (ret) { printf("ERROR: sun4i_spi: Timeout transferring data\n"); sun4i_spi_set_cs(bus, slave_plat->cs, false); return ret;
Don't know it that was difference but then only one of my boards trigger it two times in row.
=> setenv read_loop 'setenv i 0; while sf read ${kernel_addr_r} 0; do setexpr i ${i} + 1; echo run ${i}; done' => sf probe SF: Detected w25q128 with page size 256 Bytes, erase size 4 KiB, total 16 MiB => run read_loop device 0 whole chip ERROR: sun4i_spi: Timeout transferring data SF: 16777216 bytes @ 0x0 Read: ERROR -110 => run read_loop device 0 whole chip ERROR: sun4i_spi: Timeout transferring data SF: 16777216 bytes @ 0x0 Read: ERROR -110 =>
And after that also couple of times. I'll leave all those three Pine64-LTS boards to run same code to see if only one of them have this issue.
Thanks, Oskari
Cheers, Andre.
The false timeouts are caused by timer backward jumps.
Wouldn't it be best to apply the same fix as the kernel uses here [1] as a more permanent fix?
Sure. Difference is that retry counter was added to while loop in kernel side.
Only CNTPCT is used in u-boot. So CNTVCT part of kernel patch is currently not needed?
Signed-off-by: Oskari Lemmela oskari@lemmela.net
arch/arm/cpu/armv8/generic_timer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/cpu/armv8/generic_timer.c b/arch/arm/cpu/armv8/generic_timer.c index c1706dcec1..2e06ee4ed2 100644 --- a/arch/arm/cpu/armv8/generic_timer.c +++ b/arch/arm/cpu/armv8/generic_timer.c @@ -66,7 +66,7 @@ unsigned long timer_read_counter(void) isb(); do { asm volatile("mrs %0, cntpct_el0" : "=r" (cntpct)); - } while (((cntpct + 1) & GENMASK(10, 0)) <= 1); + } while (((cntpct + 1) & GENMASK(9, 0)) <= 1);
return cntpct; } -- 2.17.1
U-Boot mailing list U-Boot@lists.denx.de https://lists.denx.de/listinfo/u-boot
Oskari

On Fri, Mar 22, 2019 at 1:55 AM Oskari Lemmelä oskari@lemmela.net wrote:
On 3/17/19 11:14 PM, André Przywara wrote:
On 17/03/2019 18:41, Oskari Lemmelä wrote:
On 3/17/19 6:04 PM, Peter Robinson wrote:
On Sun, Mar 17, 2019 at 2:56 PM Oskari Lemmela oskari@lemmela.net wrote:
Fixes spurious timeouts which have been seen during testing SPI_SUNXI driver. The false timeouts disappear when number of bits reduced to 10 in workaround.
So in general I am fine with this patch, if it fixes things for you, but still scratching my head about this. AFAIK Samuel has never seen less than 11 identical bits in his testing, and I am using the new SPI driver for some weeks now (without the patch) and never had any issues. I am on the Pine64 LTS (with that "R18" SoC).
So Oskari, can you share how exactly you triggered this problem? I'd rather know what's going on before papering over the issue, especially if that leaves the much more important Linux kernel still at risk.
Actually now when I'm trying to retrigger it seems to be hard to catch. I was running following loop couple of days without any issues
setenv read_loop 'setenv i 0; while sf read ${kernel_addr_r} 0; do setexpr i ${i} + 1; echo run ${i}; done'
Then I changed wait loop code in spi driver to original one.
W/o this timer patch change?
diff --git a/drivers/spi/spi-sunxi.c b/drivers/spi/spi-sunxi.c index dbfeac77ee..6aaa79ddd9 100644 --- a/drivers/spi/spi-sunxi.c +++ b/drivers/spi/spi-sunxi.c @@ -377,14 +377,11 @@ static int sun4i_spi_xfer(struct udevice *dev, unsigned int bitlen, setbits_le32(SPI_REG(priv, SPI_TCR), SPI_BIT(priv, SPI_TCR_XCH));
/* Wait till RX FIFO to be empty */
ret = readl_poll_timeout(SPI_REG(priv, SPI_FSR),
rx_fifocnt,
(((rx_fifocnt &
SPI_BIT(priv,
SPI_FSR_RF_CNT_MASK)) >>
- SUN4I_FIFO_STA_RF_CNT_BITS) >= nbytes),
- SUN4I_SPI_TIMEOUT_US);
if (ret < 0) {
/* Wait transfer to complete */
u32 *tcr = SPI_REG(priv, SPI_TCR);
ret = wait_for_bit_le32(tcr, 0x80000000,
This look strange, to me since we are relying on poll transfer where the existing code is checking fifo count which seems valid.
false, SUN4I_SPI_TIMEOUT_US, false);
if (ret) { printf("ERROR: sun4i_spi: Timeout transferring
data\n"); sun4i_spi_set_cs(bus, slave_plat->cs, false); return ret;
Don't know it that was difference but then only one of my boards trigger it two times in row.
=> setenv read_loop 'setenv i 0; while sf read ${kernel_addr_r} 0; do setexpr i ${i} + 1; echo run ${i}; done'
let me test it.
participants (6)
-
André Przywara
-
Jagan Teki
-
Oskari Lemmela
-
Oskari Lemmelä
-
Peter Robinson
-
Samuel Holland