[PATCH 0/8] sunxi: mmc: Fixes and speed increase

While debugging some eMMC problem on the H616 SoC, I stumbled upon some weird code in the Allwinner MMC driver. Some closer inspection and some help from Ondrej later this series of fixes emerged: Some patches remove part of the #ifdef hell we needlessly had in the driver. A big chunk is around the "new timing mode", which all "newer" SoCs (since around 2015) have, and which requires some extra bit to be set. We didn't enable this mode for all SoCs, this is now fixed in patches 3-6. Patch 7 fixes a big performance problem we had due to using MMIO accesses for the actual data transfer, as opposed to DMA transfers used in Linux. Short from adding a lot of code to use DMA as well, we can actually halve the number of MMIO accesses on reads, effectively doubling the bus transfer performance. This helps the H6 a lot, but also improves the eMMC read performance. The final patch makes use of some generic MMC DT code, to parse generic DT properties. This allows us to remove sunxi specific code, but also adds support for "broken-cd" and more advanced MMC speed modes.
Please have a look and test this code on as many boards as possible. While a performance increase is nice, we don't want to risk data integrity over this, so please try to verify that it still works for you.
Cheers, Andre.
P.S. Patches 5 and 6 use different approaches to differentiate between SoCs specific quirks: Patch 5/8 selects an explicit symbol from the SoC specific sections in our Kconfig file, while patch 6/8 compares the selected SoC type in the C code. Please let me know which approach is better, I can then use this for both patches (and in the future).
Andre Przywara (8): mmc: sunxi: Avoid #ifdefs in delay and width setup mmc: sunxi: Fix warnings with CONFIG_PHYS_64BIT mmc: sunxi: Fix MMC clock parent selection mmc: sunxi: Cleanup "new timing mode" selection mmc: sunxi: Enable "new timing mode" on all new SoCs mmc: sunxi: Cleanup and fix self-calibration code mmc: sunxi: Increase MMIO FIFO read performance mmc: sunxi: Use mmc_of_parse()
.../include/asm/arch-sunxi/clock_sun50i_h6.h | 2 +- arch/arm/include/asm/arch-sunxi/mmc.h | 1 + arch/arm/mach-sunxi/Kconfig | 3 + drivers/mmc/sunxi_mmc.c | 160 +++++++++++------- 4 files changed, 102 insertions(+), 64 deletions(-)

The delay and bus-width setup are slightly different across the Allwinner SoC generations, and we covered this so far with some preprocessor conditionals.
Use the more readable IS_ENABLE() instead.
Signed-off-by: Andre Przywara andre.przywara@arm.com --- drivers/mmc/sunxi_mmc.c | 33 +++++++++++++++------------------ 1 file changed, 15 insertions(+), 18 deletions(-)
diff --git a/drivers/mmc/sunxi_mmc.c b/drivers/mmc/sunxi_mmc.c index 3503ccdb2ee..87b79fcf5ef 100644 --- a/drivers/mmc/sunxi_mmc.c +++ b/drivers/mmc/sunxi_mmc.c @@ -156,23 +156,19 @@ static int mmc_set_mod_clk(struct sunxi_mmc_priv *priv, unsigned int hz) } else if (hz <= 25000000) { oclk_dly = 0; sclk_dly = 5; -#ifdef CONFIG_MACH_SUN9I - } else if (hz <= 52000000) { - oclk_dly = 5; - sclk_dly = 4; - } else { - /* hz > 52000000 */ - oclk_dly = 2; - sclk_dly = 4; -#else - } else if (hz <= 52000000) { - oclk_dly = 3; - sclk_dly = 4; } else { - /* hz > 52000000 */ - oclk_dly = 1; + if (IS_ENABLED(CONFIG_MACH_SUN9I)) { + if (hz <= 52000000) + oclk_dly = 5; + else + oclk_dly = 2; + } else { + if (hz <= 52000000) + oclk_dly = 3; + else + oclk_dly = 1; + } sclk_dly = 4; -#endif }
if (new_mode) { @@ -521,10 +517,11 @@ struct mmc *sunxi_mmc_init(int sdc_no)
cfg->voltages = MMC_VDD_32_33 | MMC_VDD_33_34; cfg->host_caps = MMC_MODE_4BIT; -#if defined(CONFIG_MACH_SUN50I) || defined(CONFIG_MACH_SUN8I) || defined(CONFIG_SUN50I_GEN_H6) - if (sdc_no == 2) + + if ((IS_ENABLED(CONFIG_MACH_SUN50I) || IS_ENABLED(CONFIG_MACH_SUN8I) || + IS_ENABLED(CONFIG_SUN50I_GEN_H6)) && (sdc_no == 2)) cfg->host_caps = MMC_MODE_8BIT; -#endif + cfg->host_caps |= MMC_MODE_HS_52MHz | MMC_MODE_HS; cfg->b_max = CONFIG_SYS_MMC_MAX_BLK_COUNT;

On 5/25/21 8:30 AM, Andre Przywara wrote:
The delay and bus-width setup are slightly different across the Allwinner SoC generations, and we covered this so far with some preprocessor conditionals.
Use the more readable IS_ENABLE() instead.
Signed-off-by: Andre Przywara andre.przywara@arm.com
Reviewed-by: Jaehoon Chung jh80.chung@samsung.com
Best Regards, Jaehoon Chung
drivers/mmc/sunxi_mmc.c | 33 +++++++++++++++------------------ 1 file changed, 15 insertions(+), 18 deletions(-)
diff --git a/drivers/mmc/sunxi_mmc.c b/drivers/mmc/sunxi_mmc.c index 3503ccdb2ee..87b79fcf5ef 100644 --- a/drivers/mmc/sunxi_mmc.c +++ b/drivers/mmc/sunxi_mmc.c @@ -156,23 +156,19 @@ static int mmc_set_mod_clk(struct sunxi_mmc_priv *priv, unsigned int hz) } else if (hz <= 25000000) { oclk_dly = 0; sclk_dly = 5; -#ifdef CONFIG_MACH_SUN9I
- } else if (hz <= 52000000) {
oclk_dly = 5;
sclk_dly = 4;
- } else {
/* hz > 52000000 */
oclk_dly = 2;
sclk_dly = 4;
-#else
- } else if (hz <= 52000000) {
oclk_dly = 3;
} else {sclk_dly = 4;
/* hz > 52000000 */
oclk_dly = 1;
if (IS_ENABLED(CONFIG_MACH_SUN9I)) {
if (hz <= 52000000)
oclk_dly = 5;
else
oclk_dly = 2;
} else {
if (hz <= 52000000)
oclk_dly = 3;
else
oclk_dly = 1;
sclk_dly = 4;}
-#endif }
if (new_mode) { @@ -521,10 +517,11 @@ struct mmc *sunxi_mmc_init(int sdc_no)
cfg->voltages = MMC_VDD_32_33 | MMC_VDD_33_34; cfg->host_caps = MMC_MODE_4BIT; -#if defined(CONFIG_MACH_SUN50I) || defined(CONFIG_MACH_SUN8I) || defined(CONFIG_SUN50I_GEN_H6)
- if (sdc_no == 2)
- if ((IS_ENABLED(CONFIG_MACH_SUN50I) || IS_ENABLED(CONFIG_MACH_SUN8I) ||
cfg->host_caps = MMC_MODE_8BIT;IS_ENABLED(CONFIG_SUN50I_GEN_H6)) && (sdc_no == 2))
-#endif
- cfg->host_caps |= MMC_MODE_HS_52MHz | MMC_MODE_HS; cfg->b_max = CONFIG_SYS_MMC_MAX_BLK_COUNT;

When enabling PHYS_64BIT on 32-bit platforms, we get two warnings about pointer casts in sunxi_mmc.c. Those are related to MMIO addresses, which are always below 1GB on all Allwinner SoCs, so there is no problem with anything having more than 32 bits.
Add the proper casts to make it compile cleanly.
Signed-off-by: Andre Przywara andre.przywara@arm.com --- drivers/mmc/sunxi_mmc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/mmc/sunxi_mmc.c b/drivers/mmc/sunxi_mmc.c index 87b79fcf5ef..869af993d35 100644 --- a/drivers/mmc/sunxi_mmc.c +++ b/drivers/mmc/sunxi_mmc.c @@ -631,14 +631,14 @@ static int sunxi_mmc_probe(struct udevice *dev) cfg->f_min = 400000; cfg->f_max = 52000000;
- priv->reg = (void *)dev_read_addr(dev); + priv->reg = dev_read_addr_ptr(dev);
/* We don't have a sunxi clock driver so find the clock address here */ ret = dev_read_phandle_with_args(dev, "clocks", "#clock-cells", 0, 1, &args); if (ret) return ret; - ccu_reg = (u32 *)ofnode_get_addr(args.node); + ccu_reg = (u32 *)(uintptr_t)ofnode_get_addr(args.node);
priv->mmc_no = ((uintptr_t)priv->reg - SUNXI_MMC0_BASE) / 0x1000; priv->mclkreg = (void *)ccu_reg + get_mclk_offset() + priv->mmc_no * 4;

On 5/25/21 8:30 AM, Andre Przywara wrote:
When enabling PHYS_64BIT on 32-bit platforms, we get two warnings about pointer casts in sunxi_mmc.c. Those are related to MMIO addresses, which are always below 1GB on all Allwinner SoCs, so there is no problem with anything having more than 32 bits.
Add the proper casts to make it compile cleanly.
Signed-off-by: Andre Przywara andre.przywara@arm.com
Reviewed-by: Jaehoon Chung jh80.chung@samsung.com
Best Regards, Jaehoon Chung
drivers/mmc/sunxi_mmc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/mmc/sunxi_mmc.c b/drivers/mmc/sunxi_mmc.c index 87b79fcf5ef..869af993d35 100644 --- a/drivers/mmc/sunxi_mmc.c +++ b/drivers/mmc/sunxi_mmc.c @@ -631,14 +631,14 @@ static int sunxi_mmc_probe(struct udevice *dev) cfg->f_min = 400000; cfg->f_max = 52000000;
- priv->reg = (void *)dev_read_addr(dev);
priv->reg = dev_read_addr_ptr(dev);
/* We don't have a sunxi clock driver so find the clock address here */ ret = dev_read_phandle_with_args(dev, "clocks", "#clock-cells", 0, 1, &args); if (ret) return ret;
- ccu_reg = (u32 *)ofnode_get_addr(args.node);
ccu_reg = (u32 *)(uintptr_t)ofnode_get_addr(args.node);
priv->mmc_no = ((uintptr_t)priv->reg - SUNXI_MMC0_BASE) / 0x1000; priv->mclkreg = (void *)ccu_reg + get_mclk_offset() + priv->mmc_no * 4;

Most Allwinner SoCs which use the so called "new timing mode" in their MMC controllers actually use the double-rate PLL6/PERIPH0 clock as their parent input clock. This is interestingly enough compensated by a hidden "by 2" post-divider in the mod clock, so the divider and actual output rate stay the same.
Even though for the H6 and H616 (but only for them!) we use the doubled input clock for the divider computation, we never accounted for the implicit post-divider, so the clock was only half the speed on those SoCs. This didn't really matter so far, as our slow MMIO routine limits the transfer speed anyway, but we will fix this soon.
Clean up the code around that selection, to always use the normal PLL6 (PERIPH0(1x)) clock as an input. As the rate and divider are the same, that makes no difference. Explain the hardware differences in a comment.
Signed-off-by: Andre Przywara andre.przywara@arm.com --- arch/arm/include/asm/arch-sunxi/clock_sun50i_h6.h | 2 +- drivers/mmc/sunxi_mmc.c | 10 +++++++--- 2 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/arch/arm/include/asm/arch-sunxi/clock_sun50i_h6.h b/arch/arm/include/asm/arch-sunxi/clock_sun50i_h6.h index 62abfc4ef6b..e000f78ff45 100644 --- a/arch/arm/include/asm/arch-sunxi/clock_sun50i_h6.h +++ b/arch/arm/include/asm/arch-sunxi/clock_sun50i_h6.h @@ -326,7 +326,7 @@ struct sunxi_ccm_reg { #define CCM_MMC_CTRL_M(x) ((x) - 1) #define CCM_MMC_CTRL_N(x) ((x) << 8) #define CCM_MMC_CTRL_OSCM24 (0x0 << 24) -#define CCM_MMC_CTRL_PLL6X2 (0x1 << 24) +#define CCM_MMC_CTRL_PLL6 (0x1 << 24) #define CCM_MMC_CTRL_PLL_PERIPH2X2 (0x2 << 24) #define CCM_MMC_CTRL_ENABLE (0x1 << 31) /* H6 doesn't have these delays */ diff --git a/drivers/mmc/sunxi_mmc.c b/drivers/mmc/sunxi_mmc.c index 869af993d35..bc68debdad6 100644 --- a/drivers/mmc/sunxi_mmc.c +++ b/drivers/mmc/sunxi_mmc.c @@ -124,10 +124,14 @@ static int mmc_set_mod_clk(struct sunxi_mmc_priv *priv, unsigned int hz) #ifdef CONFIG_MACH_SUN9I pll = CCM_MMC_CTRL_PLL_PERIPH0; pll_hz = clock_get_pll4_periph0(); -#elif defined(CONFIG_SUN50I_GEN_H6) - pll = CCM_MMC_CTRL_PLL6X2; - pll_hz = clock_get_pll6() * 2; #else + /* + * SoCs since the A64 (H5, H6, H616) actually use the doubled + * rate of PLL6/PERIPH0 as an input clock, but compensate for + * that with a fixed post-divider of 2 in the mod clock. + * This cancels each other out, so for simplicity we just + * pretend it's always PLL6 without a post divider here. + */ pll = CCM_MMC_CTRL_PLL6; pll_hz = clock_get_pll6(); #endif

Among the SoCs using the "new timing mode", only the A83T needs to explicitly switch to that mode.
By just defining the symbol for that one odd A83T bit to 0 for any other SoCs, we can always OR that in, and save the confusing nested #ifdefs.
Clean up the also confusing new_mode setting on the way.
Signed-off-by: Andre Przywara andre.przywara@arm.com --- drivers/mmc/sunxi_mmc.c | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/drivers/mmc/sunxi_mmc.c b/drivers/mmc/sunxi_mmc.c index bc68debdad6..33cedb4edba 100644 --- a/drivers/mmc/sunxi_mmc.c +++ b/drivers/mmc/sunxi_mmc.c @@ -23,6 +23,10 @@ #include <asm-generic/gpio.h> #include <linux/delay.h>
+#ifndef CCM_MMC_CTRL_MODE_SEL_NEW +#define CCM_MMC_CTRL_MODE_SEL_NEW 0 +#endif + struct sunxi_mmc_plat { struct mmc_config cfg; struct mmc mmc; @@ -102,13 +106,10 @@ static int mmc_resource_init(int sdc_no) static int mmc_set_mod_clk(struct sunxi_mmc_priv *priv, unsigned int hz) { unsigned int pll, pll_hz, div, n, oclk_dly, sclk_dly; - bool new_mode = true; + bool new_mode = IS_ENABLED(CONFIG_MMC_SUNXI_HAS_NEW_MODE); bool calibrate = false; u32 val = 0;
- if (!IS_ENABLED(CONFIG_MMC_SUNXI_HAS_NEW_MODE)) - new_mode = false; - /* A83T support new mode only on eMMC */ if (IS_ENABLED(CONFIG_MACH_SUN8I_A83T) && priv->mmc_no != 2) new_mode = false; @@ -176,12 +177,8 @@ static int mmc_set_mod_clk(struct sunxi_mmc_priv *priv, unsigned int hz) }
if (new_mode) { -#ifdef CONFIG_MMC_SUNXI_HAS_NEW_MODE -#ifdef CONFIG_MMC_SUNXI_HAS_MODE_SWITCH - val = CCM_MMC_CTRL_MODE_SEL_NEW; -#endif + val |= CCM_MMC_CTRL_MODE_SEL_NEW; setbits_le32(&priv->reg->ntsr, SUNXI_MMC_NTSR_MODE_SEL_NEW); -#endif } else if (!calibrate) { /* * Use hardcoded delay values if controller doesn't support

On 5/25/21 8:30 AM, Andre Przywara wrote:
Among the SoCs using the "new timing mode", only the A83T needs to explicitly switch to that mode.
By just defining the symbol for that one odd A83T bit to 0 for any other SoCs, we can always OR that in, and save the confusing nested #ifdefs.
Clean up the also confusing new_mode setting on the way.
Signed-off-by: Andre Przywara andre.przywara@arm.com
Reviewed-by: Jaehoon Chung jh80.chung@samsung.com
Best Regards, Jaehoon Chung
drivers/mmc/sunxi_mmc.c | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/drivers/mmc/sunxi_mmc.c b/drivers/mmc/sunxi_mmc.c index bc68debdad6..33cedb4edba 100644 --- a/drivers/mmc/sunxi_mmc.c +++ b/drivers/mmc/sunxi_mmc.c @@ -23,6 +23,10 @@ #include <asm-generic/gpio.h> #include <linux/delay.h>
+#ifndef CCM_MMC_CTRL_MODE_SEL_NEW +#define CCM_MMC_CTRL_MODE_SEL_NEW 0 +#endif
struct sunxi_mmc_plat { struct mmc_config cfg; struct mmc mmc; @@ -102,13 +106,10 @@ static int mmc_resource_init(int sdc_no) static int mmc_set_mod_clk(struct sunxi_mmc_priv *priv, unsigned int hz) { unsigned int pll, pll_hz, div, n, oclk_dly, sclk_dly;
- bool new_mode = true;
- bool new_mode = IS_ENABLED(CONFIG_MMC_SUNXI_HAS_NEW_MODE); bool calibrate = false; u32 val = 0;
- if (!IS_ENABLED(CONFIG_MMC_SUNXI_HAS_NEW_MODE))
new_mode = false;
- /* A83T support new mode only on eMMC */ if (IS_ENABLED(CONFIG_MACH_SUN8I_A83T) && priv->mmc_no != 2) new_mode = false;
@@ -176,12 +177,8 @@ static int mmc_set_mod_clk(struct sunxi_mmc_priv *priv, unsigned int hz) }
if (new_mode) { -#ifdef CONFIG_MMC_SUNXI_HAS_NEW_MODE -#ifdef CONFIG_MMC_SUNXI_HAS_MODE_SWITCH
val = CCM_MMC_CTRL_MODE_SEL_NEW;
-#endif
setbits_le32(&priv->reg->ntsr, SUNXI_MMC_NTSR_MODE_SEL_NEW);val |= CCM_MMC_CTRL_MODE_SEL_NEW;
-#endif } else if (!calibrate) { /* * Use hardcoded delay values if controller doesn't support

All SoCs since the Allwinner A64 (H5, H6, R40, H616) feature the so called "new timing mode", so enable this in Kconfig for those SoCs.
Signed-off-by: Andre Przywara andre.przywara@arm.com --- arch/arm/mach-sunxi/Kconfig | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/arch/arm/mach-sunxi/Kconfig b/arch/arm/mach-sunxi/Kconfig index bc8509b72a2..b8abe2fefa9 100644 --- a/arch/arm/mach-sunxi/Kconfig +++ b/arch/arm/mach-sunxi/Kconfig @@ -152,6 +152,7 @@ config SUN50I_GEN_H6 bool select FIT select SPL_LOAD_FIT + select MMC_SUNXI_HAS_NEW_MODE select SUPPORT_SPL ---help--- Select this for sunxi SoCs which have H6 like peripherals, clocks @@ -297,6 +298,7 @@ config MACH_SUN8I_R40 select CPU_V7_HAS_VIRT select ARCH_SUPPORT_PSCI select SUNXI_GEN_SUN6I + select MMC_SUNXI_HAS_NEW_MODE select SUPPORT_SPL select SUNXI_DRAM_DW select SUNXI_DRAM_DW_32BIT @@ -346,6 +348,7 @@ config MACH_SUN50I_H5 bool "sun50i (Allwinner H5)" select ARM64 select MACH_SUNXI_H3_H5 + select MMC_SUNXI_HAS_NEW_MODE select FIT select SPL_LOAD_FIT

Newer SoCs have a self calibration feature, which avoids us writing hard coded phase delay values into the controller.
Consolidate the code by avoiding unnecessary #ifdefs, and also enabling the feature for all those newer SoCs.
Signed-off-by: Andre Przywara andre.przywara@arm.com --- drivers/mmc/sunxi_mmc.c | 27 +++++++++++++++++++-------- 1 file changed, 19 insertions(+), 8 deletions(-)
diff --git a/drivers/mmc/sunxi_mmc.c b/drivers/mmc/sunxi_mmc.c index 33cedb4edba..a30fd8fbdb1 100644 --- a/drivers/mmc/sunxi_mmc.c +++ b/drivers/mmc/sunxi_mmc.c @@ -103,21 +103,29 @@ static int mmc_resource_init(int sdc_no) } #endif
+/* + * All A64 and later MMC controllers feature auto-calibration. This would + * normally be detected via the compatible string, but we need something + * which works in the SPL as well. + */ +static bool sunxi_mmc_can_calibrate(void) +{ + return IS_ENABLED(CONFIG_MACH_SUN50I) || + IS_ENABLED(CONFIG_MACH_SUN50I_H5) || + IS_ENABLED(CONFIG_SUN50I_GEN_H6) || + IS_ENABLED(CONFIG_MACH_SUN8I_R40); +} + static int mmc_set_mod_clk(struct sunxi_mmc_priv *priv, unsigned int hz) { unsigned int pll, pll_hz, div, n, oclk_dly, sclk_dly; bool new_mode = IS_ENABLED(CONFIG_MMC_SUNXI_HAS_NEW_MODE); - bool calibrate = false; u32 val = 0;
/* A83T support new mode only on eMMC */ if (IS_ENABLED(CONFIG_MACH_SUN8I_A83T) && priv->mmc_no != 2) new_mode = false;
-#if defined(CONFIG_MACH_SUN50I) || defined(CONFIG_SUN50I_GEN_H6) - calibrate = true; -#endif - if (hz <= 24000000) { pll = CCM_MMC_CTRL_OSCM24; pll_hz = 24000000; @@ -179,7 +187,9 @@ static int mmc_set_mod_clk(struct sunxi_mmc_priv *priv, unsigned int hz) if (new_mode) { val |= CCM_MMC_CTRL_MODE_SEL_NEW; setbits_le32(&priv->reg->ntsr, SUNXI_MMC_NTSR_MODE_SEL_NEW); - } else if (!calibrate) { + } + + if (!sunxi_mmc_can_calibrate()) { /* * Use hardcoded delay values if controller doesn't support * calibration @@ -237,14 +247,15 @@ static int mmc_config_clock(struct sunxi_mmc_priv *priv, struct mmc *mmc) rval &= ~SUNXI_MMC_CLK_DIVIDER_MASK; writel(rval, &priv->reg->clkcr);
-#if defined(CONFIG_MACH_SUN50I) || defined(CONFIG_SUN50I_GEN_H6) +#if defined(CONFIG_SUNXI_GEN_SUN6I) || defined(CONFIG_SUN50I_GEN_H6) /* A64 supports calibration of delays on MMC controller and we * have to set delay of zero before starting calibration. * Allwinner BSP driver sets a delay only in the case of * using HS400 which is not supported by mainline U-Boot or * Linux at the moment */ - writel(SUNXI_MMC_CAL_DL_SW_EN, &priv->reg->samp_dl); + if (sunxi_mmc_can_calibrate()) + writel(SUNXI_MMC_CAL_DL_SW_EN, &priv->reg->samp_dl); #endif
/* Re-enable Clock */

To avoid the complexity of DMA operations (with chained descriptors), we use repeated MMIO reads and writes to the SD_FIFO_REG, which allows us to drain or fill the MMC data buffer FIFO very easily.
However those MMIO accesses are somewhat costly, so this limits our MMC performance, to between 17 and 22 MB/s, but down to 9.5 MB/s on the H6 (partly due to the lower AHB1 frequency).
As it turns out we read the FIFO status register after *every* word we read or write, which effectively doubles the number of MMIO accesses, thus effectively more than halving our performance.
To avoid this overhead, we can make use of the FIFO level bits, which are in the very same FIFO status registers. So for a read request, we now can collect as many words as the FIFO level originally indicated, and only then need to update the status register.
We don't know for sure the size of the FIFO (and it seems to differ across SoCs anyway), so writing is more fragile, which is why we still use the old method for that. If we find a minimum FIFO size available on all SoCs, we could use that, in a later optimisation.
This patch increases the eMMC read speed on a Pine64-LTS from about 22MB/s to 44 MB/s. SD card reads don't gain that much, but with 23 MB/s we now reach the practical limit for 3.3V SD cards. On the H6 we double our transfer speed, from 9.5 MB/s to 19.7 MB/s.
Signed-off-by: Andre Przywara andre.przywara@arm.com --- arch/arm/include/asm/arch-sunxi/mmc.h | 1 + drivers/mmc/sunxi_mmc.c | 39 +++++++++++++++++++++------ 2 files changed, 32 insertions(+), 8 deletions(-)
diff --git a/arch/arm/include/asm/arch-sunxi/mmc.h b/arch/arm/include/asm/arch-sunxi/mmc.h index 340e25b04d2..5daacf10eb1 100644 --- a/arch/arm/include/asm/arch-sunxi/mmc.h +++ b/arch/arm/include/asm/arch-sunxi/mmc.h @@ -119,6 +119,7 @@ struct sunxi_mmc { #define SUNXI_MMC_STATUS_CARD_PRESENT (0x1 << 8) #define SUNXI_MMC_STATUS_CARD_DATA_BUSY (0x1 << 9) #define SUNXI_MMC_STATUS_DATA_FSM_BUSY (0x1 << 10) +#define SUNXI_MMC_STATUS_FIFO_LEVEL(reg) (((reg) >> 17) & 0x3fff)
#define SUNXI_MMC_NTSR_MODE_SEL_NEW (0x1 << 31)
diff --git a/drivers/mmc/sunxi_mmc.c b/drivers/mmc/sunxi_mmc.c index a30fd8fbdb1..115b519546e 100644 --- a/drivers/mmc/sunxi_mmc.c +++ b/drivers/mmc/sunxi_mmc.c @@ -311,8 +311,9 @@ static int mmc_trans_data_by_cpu(struct sunxi_mmc_priv *priv, struct mmc *mmc, SUNXI_MMC_STATUS_FIFO_FULL; unsigned i; unsigned *buff = (unsigned int *)(reading ? data->dest : data->src); - unsigned byte_cnt = data->blocksize * data->blocks; - unsigned timeout_msecs = byte_cnt >> 8; + unsigned word_cnt = (data->blocksize * data->blocks) >> 2; + unsigned timeout_msecs = word_cnt >> 6; + uint32_t status; unsigned long start;
if (timeout_msecs < 2000) @@ -323,16 +324,38 @@ static int mmc_trans_data_by_cpu(struct sunxi_mmc_priv *priv, struct mmc *mmc,
start = get_timer(0);
- for (i = 0; i < (byte_cnt >> 2); i++) { - while (readl(&priv->reg->status) & status_bit) { + for (i = 0; i < word_cnt;) { + unsigned int in_fifo; + + while ((status = readl(&priv->reg->status)) & status_bit) { if (get_timer(start) > timeout_msecs) return -1; }
- if (reading) - buff[i] = readl(&priv->reg->fifo); - else - writel(buff[i], &priv->reg->fifo); + /* + * For writing we do not easily know the FIFO size, so have + * to check the FIFO status after every word written. + * TODO: For optimisation we could work out a minimum FIFO + * size across all SoCs, and use that together with the current + * fill level to write chunks of words. + */ + if (!reading) { + writel(buff[i++], &priv->reg->fifo); + continue; + } + + /* + * The status register holds the current FIFO level, so we + * can be sure to collect as many words from the FIFO + * register without checking the status register after every + * read. That saves half of the costly MMIO reads, effectively + * doubling the read performance. + */ + for (in_fifo = SUNXI_MMC_STATUS_FIFO_LEVEL(status); + in_fifo > 0; + in_fifo--) + buff[i++] = readl_relaxed(&priv->reg->fifo); + dmb(); }
return 0;

At the moment the Allwinner MMC driver parses the bus-width and non-removable DT properties itself, in the probe() routine.
There is actually a generic function provided by the MMC framework doing this job, also it parses more generic properties like broken-cd and advanced transfer modes.
Drop our own code and call mmc_of_parse() instead, to get all new features for free.
Signed-off-by: Andre Przywara andre.przywara@arm.com --- drivers/mmc/sunxi_mmc.c | 32 +++++++++++++++++--------------- 1 file changed, 17 insertions(+), 15 deletions(-)
diff --git a/drivers/mmc/sunxi_mmc.c b/drivers/mmc/sunxi_mmc.c index 115b519546e..178b8cf106d 100644 --- a/drivers/mmc/sunxi_mmc.c +++ b/drivers/mmc/sunxi_mmc.c @@ -37,7 +37,6 @@ struct sunxi_mmc_priv { uint32_t *mclkreg; unsigned fatal_err; struct gpio_desc cd_gpio; /* Change Detect GPIO */ - int cd_inverted; /* Inverted Card Detect */ struct sunxi_mmc *reg; struct mmc_config cfg; }; @@ -612,12 +611,21 @@ static int sunxi_mmc_send_cmd(struct udevice *dev, struct mmc_cmd *cmd,
static int sunxi_mmc_getcd(struct udevice *dev) { + struct mmc *mmc = mmc_get_mmc_dev(dev); struct sunxi_mmc_priv *priv = dev_get_priv(dev);
+ /* If polling, assume that the card is always present. */ + if ((mmc->cfg->host_caps & MMC_CAP_NONREMOVABLE) || + (mmc->cfg->host_caps & MMC_CAP_NEEDS_POLL)) + return 1; + if (dm_gpio_is_valid(&priv->cd_gpio)) { int cd_state = dm_gpio_get_value(&priv->cd_gpio);
- return cd_state ^ priv->cd_inverted; + if (mmc->cfg->host_caps & MMC_CAP_CD_ACTIVE_HIGH) + return !cd_state; + else + return cd_state; } return 1; } @@ -649,23 +657,21 @@ static int sunxi_mmc_probe(struct udevice *dev) struct mmc_config *cfg = &plat->cfg; struct ofnode_phandle_args args; u32 *ccu_reg; - int bus_width, ret; + int ret;
cfg->name = dev->name; - bus_width = dev_read_u32_default(dev, "bus-width", 1);
cfg->voltages = MMC_VDD_32_33 | MMC_VDD_33_34; - cfg->host_caps = 0; - if (bus_width == 8) - cfg->host_caps |= MMC_MODE_8BIT; - if (bus_width >= 4) - cfg->host_caps |= MMC_MODE_4BIT; - cfg->host_caps |= MMC_MODE_HS_52MHz | MMC_MODE_HS; + cfg->host_caps = MMC_MODE_HS_52MHz | MMC_MODE_HS; cfg->b_max = CONFIG_SYS_MMC_MAX_BLK_COUNT;
cfg->f_min = 400000; cfg->f_max = 52000000;
+ ret = mmc_of_parse(dev, cfg); + if (ret) + return ret; + priv->reg = dev_read_addr_ptr(dev);
/* We don't have a sunxi clock driver so find the clock address here */ @@ -691,17 +697,13 @@ static int sunxi_mmc_probe(struct udevice *dev) return ret;
/* This GPIO is optional */ - if (!dev_read_bool(dev, "non-removable") && - !gpio_request_by_name(dev, "cd-gpios", 0, &priv->cd_gpio, + if (!gpio_request_by_name(dev, "cd-gpios", 0, &priv->cd_gpio, GPIOD_IS_IN)) { int cd_pin = gpio_get_number(&priv->cd_gpio);
sunxi_gpio_set_pull(cd_pin, SUNXI_GPIO_PULL_UP); }
- /* Check if card detect is inverted */ - priv->cd_inverted = dev_read_bool(dev, "cd-inverted"); - upriv->mmc = &plat->mmc;
/* Reset controller */

On 5/25/21 8:30 AM, Andre Przywara wrote:
At the moment the Allwinner MMC driver parses the bus-width and non-removable DT properties itself, in the probe() routine.
There is actually a generic function provided by the MMC framework doing this job, also it parses more generic properties like broken-cd and advanced transfer modes.
Drop our own code and call mmc_of_parse() instead, to get all new features for free.
Signed-off-by: Andre Przywara andre.przywara@arm.com
Reviewed-by: Jaehoon Chung jh80.chung@samsung.com
Best Regards, Jaehoon Chung
drivers/mmc/sunxi_mmc.c | 32 +++++++++++++++++--------------- 1 file changed, 17 insertions(+), 15 deletions(-)
diff --git a/drivers/mmc/sunxi_mmc.c b/drivers/mmc/sunxi_mmc.c index 115b519546e..178b8cf106d 100644 --- a/drivers/mmc/sunxi_mmc.c +++ b/drivers/mmc/sunxi_mmc.c @@ -37,7 +37,6 @@ struct sunxi_mmc_priv { uint32_t *mclkreg; unsigned fatal_err; struct gpio_desc cd_gpio; /* Change Detect GPIO */
- int cd_inverted; /* Inverted Card Detect */ struct sunxi_mmc *reg; struct mmc_config cfg;
}; @@ -612,12 +611,21 @@ static int sunxi_mmc_send_cmd(struct udevice *dev, struct mmc_cmd *cmd,
static int sunxi_mmc_getcd(struct udevice *dev) {
struct mmc *mmc = mmc_get_mmc_dev(dev); struct sunxi_mmc_priv *priv = dev_get_priv(dev);
/* If polling, assume that the card is always present. */
if ((mmc->cfg->host_caps & MMC_CAP_NONREMOVABLE) ||
(mmc->cfg->host_caps & MMC_CAP_NEEDS_POLL))
return 1;
if (dm_gpio_is_valid(&priv->cd_gpio)) { int cd_state = dm_gpio_get_value(&priv->cd_gpio);
return cd_state ^ priv->cd_inverted;
if (mmc->cfg->host_caps & MMC_CAP_CD_ACTIVE_HIGH)
return !cd_state;
else
} return 1;return cd_state;
} @@ -649,23 +657,21 @@ static int sunxi_mmc_probe(struct udevice *dev) struct mmc_config *cfg = &plat->cfg; struct ofnode_phandle_args args; u32 *ccu_reg;
- int bus_width, ret;
int ret;
cfg->name = dev->name;
bus_width = dev_read_u32_default(dev, "bus-width", 1);
cfg->voltages = MMC_VDD_32_33 | MMC_VDD_33_34;
cfg->host_caps = 0;
if (bus_width == 8)
cfg->host_caps |= MMC_MODE_8BIT;
if (bus_width >= 4)
cfg->host_caps |= MMC_MODE_4BIT;
cfg->host_caps |= MMC_MODE_HS_52MHz | MMC_MODE_HS;
cfg->host_caps = MMC_MODE_HS_52MHz | MMC_MODE_HS; cfg->b_max = CONFIG_SYS_MMC_MAX_BLK_COUNT;
cfg->f_min = 400000; cfg->f_max = 52000000;
ret = mmc_of_parse(dev, cfg);
if (ret)
return ret;
priv->reg = dev_read_addr_ptr(dev);
/* We don't have a sunxi clock driver so find the clock address here */
@@ -691,17 +697,13 @@ static int sunxi_mmc_probe(struct udevice *dev) return ret;
/* This GPIO is optional */
- if (!dev_read_bool(dev, "non-removable") &&
!gpio_request_by_name(dev, "cd-gpios", 0, &priv->cd_gpio,
if (!gpio_request_by_name(dev, "cd-gpios", 0, &priv->cd_gpio, GPIOD_IS_IN)) { int cd_pin = gpio_get_number(&priv->cd_gpio);
sunxi_gpio_set_pull(cd_pin, SUNXI_GPIO_PULL_UP); }
/* Check if card detect is inverted */
priv->cd_inverted = dev_read_bool(dev, "cd-inverted");
upriv->mmc = &plat->mmc;
/* Reset controller */

On Tue, 25 May 2021 00:30:21 +0100 Andre Przywara andre.przywara@arm.com wrote:
Hi,
thanks to Jaehoon for reviewing some patches in here!
Can anyone else please have a look at this series? I am tempted to push them in the upcoming merge window, to expose them to a wider testing audience, but would really like to have some people's eyes on it.
Also testing this on different boards would be much appreciated, especially patch 7/8 deserves some scrutiny, I guess.
Thanks! Andre
While debugging some eMMC problem on the H616 SoC, I stumbled upon some weird code in the Allwinner MMC driver. Some closer inspection and some help from Ondrej later this series of fixes emerged: Some patches remove part of the #ifdef hell we needlessly had in the driver. A big chunk is around the "new timing mode", which all "newer" SoCs (since around 2015) have, and which requires some extra bit to be set. We didn't enable this mode for all SoCs, this is now fixed in patches 3-6. Patch 7 fixes a big performance problem we had due to using MMIO accesses for the actual data transfer, as opposed to DMA transfers used in Linux. Short from adding a lot of code to use DMA as well, we can actually halve the number of MMIO accesses on reads, effectively doubling the bus transfer performance. This helps the H6 a lot, but also improves the eMMC read performance. The final patch makes use of some generic MMC DT code, to parse generic DT properties. This allows us to remove sunxi specific code, but also adds support for "broken-cd" and more advanced MMC speed modes.
Please have a look and test this code on as many boards as possible. While a performance increase is nice, we don't want to risk data integrity over this, so please try to verify that it still works for you.
Cheers, Andre.
P.S. Patches 5 and 6 use different approaches to differentiate between SoCs specific quirks: Patch 5/8 selects an explicit symbol from the SoC specific sections in our Kconfig file, while patch 6/8 compares the selected SoC type in the C code. Please let me know which approach is better, I can then use this for both patches (and in the future).
Andre Przywara (8): mmc: sunxi: Avoid #ifdefs in delay and width setup mmc: sunxi: Fix warnings with CONFIG_PHYS_64BIT mmc: sunxi: Fix MMC clock parent selection mmc: sunxi: Cleanup "new timing mode" selection mmc: sunxi: Enable "new timing mode" on all new SoCs mmc: sunxi: Cleanup and fix self-calibration code mmc: sunxi: Increase MMIO FIFO read performance mmc: sunxi: Use mmc_of_parse()
.../include/asm/arch-sunxi/clock_sun50i_h6.h | 2 +- arch/arm/include/asm/arch-sunxi/mmc.h | 1 + arch/arm/mach-sunxi/Kconfig | 3 + drivers/mmc/sunxi_mmc.c | 160 +++++++++++------- 4 files changed, 102 insertions(+), 64 deletions(-)

On 7/3/21 6:24 PM, Andre Przywara wrote:
On Tue, 25 May 2021 00:30:21 +0100 Andre Przywara andre.przywara@arm.com wrote:
Hi,
thanks to Jaehoon for reviewing some patches in here!
Can anyone else please have a look at this series? I am tempted to push them in the upcoming merge window, to expose them to a wider testing audience, but would really like to have some people's eyes on it.
Also testing this on different boards would be much appreciated, especially patch 7/8 deserves some scrutiny, I guess.
For the series:
Tested-by: Samuel Holland samuel@sholland.org
I have this patch series integrated into my local branch, so I have boot-tested it on several boards: PinePhone (A64), Pine A64+ (A64), Orange Pi Win (A64), Orange Pi PC2 (H5), and Orange Pi Zero 2 (H616).
Cheers, Samuel
participants (3)
-
Andre Przywara
-
Jaehoon Chung
-
Samuel Holland