[PATCH 0/2] spi: sunxi: Improve the loading speed

Right now, the maximal transfer speed from an SPI flash on a V3s is about 240kb/s. That is pretty slow. It turns out, that due to an error u-boot is setting the maximum frequency to 1MHz. By fixing that another bug is unearthed: one cannot set a clock divider of 1:1 due to the handling between CDR1 and CDR2 handling. By fixing that I achieved loading speeds of about 1.5MB/s.
Michael Walle (2): spi: sunxi: drop max_hz handling spi: sunxi: fix clock divider calculation for max frequency setting
drivers/spi/spi-sunxi.c | 28 +++++++++++++++------------- 1 file changed, 15 insertions(+), 13 deletions(-)

The driver is trying to read the "spi-max-frequency" property of the *controller* driver node. There is no such property. The "spi-max-frequency" property belongs to the SPI devices on the bus.
Right now, the driver will always fall back to the default value of 1MHz and thus flash reads are very slow with just about 215kb/s.
In fact, the SPI uclass will already take care of everything and we just have to clamp the frequency to the values the driver/hardware supports. Thus, drop the whole max_hz handling.
Signed-off-by: Michael Walle mwalle@kernel.org --- drivers/spi/spi-sunxi.c | 21 +++++++++------------ 1 file changed, 9 insertions(+), 12 deletions(-)
diff --git a/drivers/spi/spi-sunxi.c b/drivers/spi/spi-sunxi.c index 13725ee7a2d..bfb402902b8 100644 --- a/drivers/spi/spi-sunxi.c +++ b/drivers/spi/spi-sunxi.c @@ -135,7 +135,6 @@ struct sun4i_spi_variant { struct sun4i_spi_plat { struct sun4i_spi_variant *variant; u32 base; - u32 max_hz; };
struct sun4i_spi_priv { @@ -237,6 +236,13 @@ static void sun4i_spi_set_speed_mode(struct udevice *dev) unsigned int div; u32 reg;
+ /* + * The uclass should take care that this won't happen. But anyway, + * avoid a div-by-zero exception. + */ + if (!priv->freq) + return; + /* * Setup clock divider. * @@ -401,11 +407,10 @@ static int sun4i_spi_xfer(struct udevice *dev, unsigned int bitlen,
static int sun4i_spi_set_speed(struct udevice *dev, uint speed) { - struct sun4i_spi_plat *plat = dev_get_plat(dev); struct sun4i_spi_priv *priv = dev_get_priv(dev);
- if (speed > plat->max_hz) - speed = plat->max_hz; + if (speed > SUN4I_SPI_MAX_RATE) + speed = SUN4I_SPI_MAX_RATE;
if (speed < SUN4I_SPI_MIN_RATE) speed = SUN4I_SPI_MIN_RATE; @@ -458,7 +463,6 @@ static int sun4i_spi_probe(struct udevice *bus)
priv->variant = plat->variant; priv->base = plat->base; - priv->freq = plat->max_hz;
return 0; } @@ -466,16 +470,9 @@ static int sun4i_spi_probe(struct udevice *bus) static int sun4i_spi_of_to_plat(struct udevice *bus) { struct sun4i_spi_plat *plat = dev_get_plat(bus); - int node = dev_of_offset(bus);
plat->base = dev_read_addr(bus); plat->variant = (struct sun4i_spi_variant *)dev_get_driver_data(bus); - plat->max_hz = fdtdec_get_int(gd->fdt_blob, node, - "spi-max-frequency", - SUN4I_SPI_DEFAULT_RATE); - - if (plat->max_hz > SUN4I_SPI_MAX_RATE) - plat->max_hz = SUN4I_SPI_MAX_RATE;
return 0; }

On Fri, 12 Jul 2024 19:14:56 +0200 Michael Walle mwalle@kernel.org wrote:
Hi Michael,
The driver is trying to read the "spi-max-frequency" property of the *controller* driver node. There is no such property. The "spi-max-frequency" property belongs to the SPI devices on the bus.
Ah, indeed, good catch! Many thanks for sending this!
Right now, the driver will always fall back to the default value of 1MHz and thus flash reads are very slow with just about 215kb/s.
That's even slower, right? I guess around 125 KB/s?
In fact, the SPI uclass will already take care of everything and we just have to clamp the frequency to the values the driver/hardware supports. Thus, drop the whole max_hz handling.
Looks good to me, I verified this by timing the read, this patch indeed significantly increases the performance. Also changing the limit in the DT gets reflected in the driver and in the read speed. Also verified that the values read from the SPI flash are the same in all cases.
Signed-off-by: Michael Walle mwalle@kernel.org
Reviewed-by: Andre Przywara andre.przywara@arm.com Tested-by: Andre Przywara andre.przywara@arm.com
I will make this part of the first 2024.10 PR.
Cheers, Andre
drivers/spi/spi-sunxi.c | 21 +++++++++------------ 1 file changed, 9 insertions(+), 12 deletions(-)
diff --git a/drivers/spi/spi-sunxi.c b/drivers/spi/spi-sunxi.c index 13725ee7a2d..bfb402902b8 100644 --- a/drivers/spi/spi-sunxi.c +++ b/drivers/spi/spi-sunxi.c @@ -135,7 +135,6 @@ struct sun4i_spi_variant { struct sun4i_spi_plat { struct sun4i_spi_variant *variant; u32 base;
- u32 max_hz;
};
struct sun4i_spi_priv { @@ -237,6 +236,13 @@ static void sun4i_spi_set_speed_mode(struct udevice *dev) unsigned int div; u32 reg;
- /*
* The uclass should take care that this won't happen. But anyway,
* avoid a div-by-zero exception.
*/
- if (!priv->freq)
return;
- /*
- Setup clock divider.
@@ -401,11 +407,10 @@ static int sun4i_spi_xfer(struct udevice *dev, unsigned int bitlen,
static int sun4i_spi_set_speed(struct udevice *dev, uint speed) {
struct sun4i_spi_plat *plat = dev_get_plat(dev); struct sun4i_spi_priv *priv = dev_get_priv(dev);
if (speed > plat->max_hz)
speed = plat->max_hz;
if (speed > SUN4I_SPI_MAX_RATE)
speed = SUN4I_SPI_MAX_RATE;
if (speed < SUN4I_SPI_MIN_RATE) speed = SUN4I_SPI_MIN_RATE;
@@ -458,7 +463,6 @@ static int sun4i_spi_probe(struct udevice *bus)
priv->variant = plat->variant; priv->base = plat->base;
priv->freq = plat->max_hz;
return 0;
} @@ -466,16 +470,9 @@ static int sun4i_spi_probe(struct udevice *bus) static int sun4i_spi_of_to_plat(struct udevice *bus) { struct sun4i_spi_plat *plat = dev_get_plat(bus);
int node = dev_of_offset(bus);
plat->base = dev_read_addr(bus); plat->variant = (struct sun4i_spi_variant *)dev_get_driver_data(bus);
plat->max_hz = fdtdec_get_int(gd->fdt_blob, node,
"spi-max-frequency",
SUN4I_SPI_DEFAULT_RATE);
if (plat->max_hz > SUN4I_SPI_MAX_RATE)
plat->max_hz = SUN4I_SPI_MAX_RATE;
return 0;
}

Hi,
The driver is trying to read the "spi-max-frequency" property of the *controller* driver node. There is no such property. The "spi-max-frequency" property belongs to the SPI devices on the bus.
Ah, indeed, good catch! Many thanks for sending this!
Right now, the driver will always fall back to the default value of 1MHz and thus flash reads are very slow with just about 215kb/s.
That's even slower, right? I guess around 125 KB/s?
Yes of course :) 1Mhz/8 at most. I was fooled by the "sf update" command which will skip the same sectors and then the overall speed will be faster.
In fact, the SPI uclass will already take care of everything and we just have to clamp the frequency to the values the driver/hardware supports. Thus, drop the whole max_hz handling.
Looks good to me, I verified this by timing the read, this patch indeed significantly increases the performance. Also changing the limit in the DT gets reflected in the driver and in the read speed. Also verified that the values read from the SPI flash are the same in all cases.
Signed-off-by: Michael Walle mwalle@kernel.org
Reviewed-by: Andre Przywara andre.przywara@arm.com Tested-by: Andre Przywara andre.przywara@arm.com
I will make this part of the first 2024.10 PR.
This means just 1/2 or both? Because there was no Rb on the second patch.
-michael

On Tue, 16 Jul 2024 08:58:14 +0200 "Michael Walle" mwalle@kernel.org wrote:
Hi,
The driver is trying to read the "spi-max-frequency" property of the *controller* driver node. There is no such property. The "spi-max-frequency" property belongs to the SPI devices on the bus.
Ah, indeed, good catch! Many thanks for sending this!
Right now, the driver will always fall back to the default value of 1MHz and thus flash reads are very slow with just about 215kb/s.
That's even slower, right? I guess around 125 KB/s?
Yes of course :) 1Mhz/8 at most. I was fooled by the "sf update" command which will skip the same sectors and then the overall speed will be faster.
In fact, the SPI uclass will already take care of everything and we just have to clamp the frequency to the values the driver/hardware supports. Thus, drop the whole max_hz handling.
Looks good to me, I verified this by timing the read, this patch indeed significantly increases the performance. Also changing the limit in the DT gets reflected in the driver and in the read speed. Also verified that the values read from the SPI flash are the same in all cases.
Signed-off-by: Michael Walle mwalle@kernel.org
Reviewed-by: Andre Przywara andre.przywara@arm.com Tested-by: Andre Przywara andre.przywara@arm.com
I will make this part of the first 2024.10 PR.
This means just 1/2 or both? Because there was no Rb on the second patch.
Just this one for now, as I was preparing the pull request. It's a fix, so I can send it anytime later, it doesn't have to wait for anything. I was hoping we can fix that other calculation issue at the same time.
Cheers, Andre.

If the maximum frequency is requested, we still fall into the CDR2 handling. But there the minimal divider is 2. For the sun6i and sun8i we can do better with the CDR1 setting where the minimal divider is 1: SPI_CLK = MOD_CLK / 2 ^ cdr with cdr = 0
Thus, handle the div = 1 case specially.
While at it, correct the comment above the calculation.
Signed-off-by: Michael Walle mwalle@kernel.org --- drivers/spi/spi-sunxi.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/spi/spi-sunxi.c b/drivers/spi/spi-sunxi.c index bfb402902b8..3048ab0ecf7 100644 --- a/drivers/spi/spi-sunxi.c +++ b/drivers/spi/spi-sunxi.c @@ -249,6 +249,8 @@ static void sun4i_spi_set_speed_mode(struct udevice *dev) * We have two choices there. Either we can use the clock * divide rate 1, which is calculated thanks to this formula: * SPI_CLK = MOD_CLK / (2 ^ (cdr + 1)) + * Or for sun6i/sun8i variants: + * SPI_CLK = MOD_CLK / (2 ^ cdr) * Or we can use CDR2, which is calculated with the formula: * SPI_CLK = MOD_CLK / (2 * (cdr + 1)) * Whether we use the former or the latter is set through the @@ -256,12 +258,15 @@ static void sun4i_spi_set_speed_mode(struct udevice *dev) * * First try CDR2, and if we can't reach the expected * frequency, fall back to CDR1. + * There is one exception if the requested clock is the input + * clock. In that case we always use CDR1 because we'll get a + * 1:1: ration for sun6i/sun8i variants. */
div = DIV_ROUND_UP(SUNXI_INPUT_CLOCK, priv->freq); reg = readl(SPI_REG(priv, SPI_CCR));
- if ((div / 2) <= (SUN4I_CLK_CTL_CDR2_MASK + 1)) { + if (div != 1 && ((div / 2) <= (SUN4I_CLK_CTL_CDR2_MASK + 1))) { div /= 2; if (div > 0) div--;

On Fri, 12 Jul 2024 19:14:57 +0200 Michael Walle mwalle@kernel.org wrote:
Hi,
If the maximum frequency is requested, we still fall into the CDR2 handling. But there the minimal divider is 2. For the sun6i and sun8i we can do better with the CDR1 setting where the minimal divider is 1: SPI_CLK = MOD_CLK / 2 ^ cdr with cdr = 0
Thus, handle the div = 1 case specially.
Yes, this is correct: we lose half the performance without this patch, in the (common) full clock speed case. However ....
While at it, correct the comment above the calculation.
Signed-off-by: Michael Walle mwalle@kernel.org
drivers/spi/spi-sunxi.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/spi/spi-sunxi.c b/drivers/spi/spi-sunxi.c index bfb402902b8..3048ab0ecf7 100644 --- a/drivers/spi/spi-sunxi.c +++ b/drivers/spi/spi-sunxi.c @@ -249,6 +249,8 @@ static void sun4i_spi_set_speed_mode(struct udevice *dev) * We have two choices there. Either we can use the clock * divide rate 1, which is calculated thanks to this formula: * SPI_CLK = MOD_CLK / (2 ^ (cdr + 1))
* Or for sun6i/sun8i variants:
* SPI_CLK = MOD_CLK / (2 ^ cdr)
- Or we can use CDR2, which is calculated with the formula:
- SPI_CLK = MOD_CLK / (2 * (cdr + 1))
- Whether we use the former or the latter is set through the
@@ -256,12 +258,15 @@ static void sun4i_spi_set_speed_mode(struct udevice *dev) * * First try CDR2, and if we can't reach the expected * frequency, fall back to CDR1.
* There is one exception if the requested clock is the input
* clock. In that case we always use CDR1 because we'll get a
* 1:1: ration for sun6i/sun8i variants.
*/
div = DIV_ROUND_UP(SUNXI_INPUT_CLOCK, priv->freq); reg = readl(SPI_REG(priv, SPI_CCR));
- if ((div / 2) <= (SUN4I_CLK_CTL_CDR2_MASK + 1)) {
- if (div != 1 && ((div / 2) <= (SUN4I_CLK_CTL_CDR2_MASK + 1))) { div /= 2;
This is still not fully correct, is it? If I ask for 10 MHz, the algorithm should select 8 MHz (24/3) or actually 6 MHz (24/4), but it chooses 12 MHz (24/2), which is too much. So I think this division here should be either: div = (div + 1) / 2; or: div = DIV_ROUND_UP(div, 2);
Can someone confirm this?
Cheers, Andre
if (div > 0) div--;

- if ((div / 2) <= (SUN4I_CLK_CTL_CDR2_MASK + 1)) {
- if (div != 1 && ((div / 2) <= (SUN4I_CLK_CTL_CDR2_MASK + 1))) { div /= 2;
This is still not fully correct, is it? If I ask for 10 MHz, the algorithm should select 8 MHz (24/3) or actually 6 MHz (24/4), but it chooses 12 MHz (24/2), which is too much. So I think this division here should be either: div = (div + 1) / 2; or: div = DIV_ROUND_UP(div, 2);
Can someone confirm this?
When I've written this patch, I've looked at how linux does it (and it's history) and I'm sure you know that linux has two drivers, for sun4i and sun6i/sun8i. Somehow u-boot conflates them into one with just one being correct with the SPI_CLK in the CDR2 case (?).
But anyway, this is about CDR1 and it seems you're right. But OTOH I've tested this briefly with "sf probe 0:0 <hz>" and looked at the SCK frequency of the readid command with a scope and it was always less than my requested frequency. At least after the second probe (there must be another bug which will still keep the frequency of the probe at the former speed). Soo.. I'm not sure. Mh.
While this might be a bug, it doesn't affect this patch which will just make sure we can get a 1:1 ratio on SoCs where this is possible, i.e. not on the sun4i variant.
-michael

On Fri, 12 Jul 2024 at 18:25, Michael Walle mwalle@kernel.org wrote:
Right now, the maximal transfer speed from an SPI flash on a V3s is about 240kb/s. That is pretty slow. It turns out, that due to an error u-boot is setting the maximum frequency to 1MHz. By fixing that another bug is unearthed: one cannot set a clock divider of 1:1 due to the handling between CDR1 and CDR2 handling. By fixing that I achieved loading speeds of about 1.5MB/s.
Minor nit, should the clock fix go first so there's not a regression if someone needs to do a bisect on the first commit?
Michael Walle (2): spi: sunxi: drop max_hz handling spi: sunxi: fix clock divider calculation for max frequency setting
drivers/spi/spi-sunxi.c | 28 +++++++++++++++------------- 1 file changed, 15 insertions(+), 13 deletions(-)
-- 2.39.2

Right now, the maximal transfer speed from an SPI flash on a V3s is about 240kb/s. That is pretty slow. It turns out, that due to an error u-boot is setting the maximum frequency to 1MHz. By fixing that another bug is unearthed: one cannot set a clock divider of 1:1 due to the handling between CDR1 and CDR2 handling. By fixing that I achieved loading speeds of about 1.5MB/s.
Minor nit, should the clock fix go first so there's not a regression if someone needs to do a bisect on the first commit?
Sure can do for the next version.
-michael

On Fri, 12 Jul 2024 18:28:15 +0100 Peter Robinson pbrobinson@gmail.com wrote:
On Fri, 12 Jul 2024 at 18:25, Michael Walle mwalle@kernel.org wrote:
Right now, the maximal transfer speed from an SPI flash on a V3s is about 240kb/s. That is pretty slow. It turns out, that due to an error u-boot is setting the maximum frequency to 1MHz. By fixing that another bug is unearthed: one cannot set a clock divider of 1:1 due to the handling between CDR1 and CDR2 handling. By fixing that I achieved loading speeds of about 1.5MB/s.
Minor nit, should the clock fix go first so there's not a regression if someone needs to do a bisect on the first commit?
I am not sure this really matters here, since this patch just lifts the frequency from 12 MHz to 24 MHz, while patch 1/2 lifts it from 1 MHz to 12 MHz. So there is no regression as such.
Cheers, Andre
Michael Walle (2): spi: sunxi: drop max_hz handling spi: sunxi: fix clock divider calculation for max frequency setting
drivers/spi/spi-sunxi.c | 28 +++++++++++++++------------- 1 file changed, 15 insertions(+), 13 deletions(-)
-- 2.39.2
participants (3)
-
Andre Przywara
-
Michael Walle
-
Peter Robinson