am654_sdhci: mmc fail to send stop cmd

Hi all,
On one device with one specific SD card (possibly an aging one), I'm seeing frequent "mmc fail to send stop cmd" messages, followed by read errors when loading the kernel and dtb. mmc_send_cmd returns -ETIMEDOUT. However, I can always resolve this by simply retrying the stop command like this:
diff --git a/drivers/mmc/mmc.c b/drivers/mmc/mmc.c
index f36d11ddc8..9019d9f2ed 100644
--- a/drivers/mmc/mmc.c
+++ b/drivers/mmc/mmc.c
@@ -406,7 +406,11 @@ static int mmc_read_blocks(struct mmc *mmc, void *dst, lbaint_t start,
 #if !defined(CONFIG_SPL_BUILD) || defined(CONFIG_SPL_LIBCOMMON_SUPPORT)
 			pr_err("mmc fail to send stop cmd\n");
 #endif
-			return 0;
+			pr_err("retrying...\n");
+			if (mmc_send_cmd(mmc, &cmd, NULL)) {
+				pr_err("failed again\n");
+				return 0;
+			}
 		}
 	}
Hardware is our IOT2050; the baseline is today's master (1c4b5038afcc) with board enabling and a bunch of patches from your tree [1]. However, 4d6da10ce611 already exposes the problem.
What could cause this?
Jan
[1] https://github.com/siemens/u-boot/commits/jan/iot2050
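The workaround in the diff retries the stop command exactly once. As a minimal sketch of the same idea in a bounded form, the following standalone C retries a command only while it keeps returning -ETIMEDOUT. It is not U-Boot code: `send_cmd_fn`, `send_cmd_with_retry`, `struct flaky_card` and `flaky_send` are hypothetical names used only for illustration, and the "card" is a test double that times out a configurable number of times.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a command-issuing callback; not the U-Boot API. */
typedef int (*send_cmd_fn)(void *ctx);

/*
 * Issue a command up to max_attempts times, retrying only on -ETIMEDOUT.
 * Returns 0 on success, otherwise the error code of the last attempt.
 */
static int send_cmd_with_retry(send_cmd_fn send, void *ctx, int max_attempts)
{
	int ret = -EINVAL;
	int attempt;

	assert(send != NULL && max_attempts > 0);
	for (attempt = 0; attempt < max_attempts; attempt++) {
		ret = send(ctx);
		if (ret != -ETIMEDOUT)
			break;	/* success or a non-timeout error: stop retrying */
	}
	return ret;
}

/* Test double: reports -ETIMEDOUT a configurable number of times, then 0. */
struct flaky_card {
	int timeouts_left;
	int calls;
};

static int flaky_send(void *ctx)
{
	struct flaky_card *card = ctx;

	card->calls++;
	if (card->timeouts_left > 0) {
		card->timeouts_left--;
		return -ETIMEDOUT;
	}
	return 0;
}
```

With `max_attempts` set to 2 this behaves like the single retry in the diff: a card that times out once succeeds on the second attempt, while a card that keeps timing out still ends with -ETIMEDOUT.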

Hi Jan,
> What could cause this?
Where in the driver does the timeout happen?
Did you try enlarging the timeout value?
Regards, Peng.

On 20.07.20 03:21, Peng Fan wrote:
> Where in the driver does the timeout happen?
> Did you try enlarging the timeout value?
Not sure yet where I could do that. The timeout is detected and reported by the hardware via SDHCI_INT_STATUS (= 0x18000 in case of an error).
Thanks, Jan
> [1] https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.
                    ^^^^^^^^^
Welcome to the club. If you are on TB, I can recommend "Unmangle Outlook Safelinks" to get rid of this insecurity measure, at least on the client side.
> com%2Fsiemens%2Fu-boot%2Fcommits%2Fjan%2Fiot2050&data=02%7C01%7CPeng.Fan%40nxp.com%7Cda088100ee5a46cdc37008d82b29779f%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C637306804395527106&sdata=oiS6nOxxAQykjMyTecz%2FTJY4OW8WiZ2CbszR2mBrXuI%3D&amp;reserved=0

On 7/20/20 10:21 AM, Peng Fan wrote:
> Where in the driver does the timeout happen?
> Did you try enlarging the timeout value?
How about adding SDHCI_QUIRK_WAIT_SEND_CMD? And as Peng commented, we need to find where in the driver code the error is returned.
Best Regards, Jaehoon Chung

On 21.07.20 01:23, Jaehoon Chung wrote:
> How about adding SDHCI_QUIRK_WAIT_SEND_CMD?
I tried that already, but the result was even worse: a non-working mmc.
> And as Peng commented, we need to find where in the driver code the error is returned.
As written in my other reply: https://gitlab.denx.de/u-boot/u-boot/-/blob/f12341a9529540113f01989149bbbeb6... Thus, it's reported by the hw.
Thanks, Jan

Jan,
On 21/07/20 12:06 pm, Jan Kiszka wrote:
>> And as Peng commented, we need to find where in the driver code the error is returned.
> As written in my other reply: https://gitlab.denx.de/u-boot/u-boot/-/blob/f12341a9529540113f01989149bbbeb6... Thus, it's reported by the hw.
It's a command timeout, for which we cannot program a higher timeout value.
Can you send a full failure log?
Also, does the same card + board combination work in the kernel? That should help us tell a hardware problem from a U-Boot problem.
Thanks, Faiz

On 21.07.20 19:03, Faiz Abbas wrote:
> It's a command timeout, for which we cannot program a higher timeout value.
> Can you send a full failure log?
[unrelated fsbl, spl stuff]
U-Boot 2020.07-00883-g4d6da10ce6-dirty (Jul 20 2020 - 06:30:08 +0200)
Model: Siemens IOT2050 Advanced Base Board
DRAM: 2 GiB
MMC: sdhci@4f80000: 1, sdhci@04FA0000: 0
Loading Environment from SPI Flash... SF: Detected w25q128 with page size 256 Bytes, erase size 64 KiB, total 16 MiB
OK
In: serial
Out: serial
Err: serial
Hit any key to stop autoboot: 0
stat: 18000
stat: 18000
stat: 208000
switch to partitions #0, OK
mmc1(part 0) is current device
** No partition table - mmc 1 **
switch to partitions #0, OK
mmc0 is current device
Scanning mmc 0:1...
Found U-Boot script /boot/boot.scr
784 bytes read in 2 ms (382.8 KiB/s)
## Executing script at 83000000
65329 bytes read in 11 ms (5.7 MiB/s)
stat: 18000
mmc fail to send stop cmd, -110
retrying...
17113096 bytes read in 1409 ms (11.6 MiB/s)
Moving Image from 0x80080000 to 0x80200000, end=812c0000
## Flattened Device Tree blob at 82000000
Booting using the fdt blob at 0x82000000
Loading Device Tree to 00000000fdf0f000, end 00000000fdf21f30 ... OK
[kernel boot]
The diff I'm carrying on top of [1] is below.
> Also, does the same card + board combination work in the kernel? That should help us tell a hardware problem from a U-Boot problem.
The same card on the same board works without complaints with the kernel driver (5.8-rc5 at the moment). Even more strange, the same card on a different board (IOT2050 Basic, same SoC series, slightly different type) does not throw those errors with the same U-Boot.
Note that we are still carrying those clock swapping changes in [2]. I've also tried to remove it, but it has no impact on this issue.
Thanks! Jan
[1] https://github.com/siemens/u-boot/commits/4d6da10ce611484befd4cebbf294c89bff...
[2] https://github.com/siemens/u-boot/commit/4d6da10ce611484befd4cebbf294c89bffe...
diff --git a/drivers/mmc/mmc.c b/drivers/mmc/mmc.c
index f36d11ddc8..c855e3075e 100644
--- a/drivers/mmc/mmc.c
+++ b/drivers/mmc/mmc.c
@@ -402,11 +402,16 @@ static int mmc_read_blocks(struct mmc *mmc, void *dst, lbaint_t start,
 		cmd.cmdidx = MMC_CMD_STOP_TRANSMISSION;
 		cmd.cmdarg = 0;
 		cmd.resp_type = MMC_RSP_R1b;
-		if (mmc_send_cmd(mmc, &cmd, NULL)) {
+		int ret = mmc_send_cmd(mmc, &cmd, NULL);
+		if (ret) {
 #if !defined(CONFIG_SPL_BUILD) || defined(CONFIG_SPL_LIBCOMMON_SUPPORT)
-			pr_err("mmc fail to send stop cmd\n");
+			pr_err("mmc fail to send stop cmd, %d\n", ret);
 #endif
-			return 0;
+			pr_err("retrying...\n");
+			if (mmc_send_cmd(mmc, &cmd, NULL)) {
+				pr_err("failed again\n");
+				return 0;
+			}
 		}
 	}
diff --git a/drivers/mmc/sdhci.c b/drivers/mmc/sdhci.c
index f4eb655f6e..faefe6c8c9 100644
--- a/drivers/mmc/sdhci.c
+++ b/drivers/mmc/sdhci.c
@@ -381,6 +381,7 @@ static int sdhci_send_command(struct mmc *mmc, struct mmc_cmd *cmd,
 	sdhci_reset(host, SDHCI_RESET_CMD);
 	sdhci_reset(host, SDHCI_RESET_DATA);
+	printf("stat: %x\n", stat);
 	if (stat & SDHCI_INT_TIMEOUT)
 		return -ETIMEDOUT;
 	else
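For reading the "stat:" values printed by this debug patch, here is a minimal standalone sketch that names the error bits in an SDHCI interrupt status word. The bit values follow the usual SDHCI interrupt status layout as I understand it from the U-Boot and Linux sdhci.h headers; please treat them as an assumption and check them against the tree in use. The table, the short bit names and `sdhci_decode_status` are made up for illustration and are not part of any driver.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed SDHCI interrupt status bits (bit 15 and the error half above it). */
#define SDHCI_INT_ERROR        0x00008000u
#define SDHCI_INT_TIMEOUT      0x00010000u
#define SDHCI_INT_CRC          0x00020000u
#define SDHCI_INT_END_BIT      0x00040000u
#define SDHCI_INT_INDEX        0x00080000u
#define SDHCI_INT_DATA_TIMEOUT 0x00100000u
#define SDHCI_INT_DATA_CRC     0x00200000u
#define SDHCI_INT_DATA_END_BIT 0x00400000u

/* Illustrative lookup table: status bit to a short, made-up label. */
struct sdhci_bit_name {
	uint32_t bit;
	const char *name;
};

static const struct sdhci_bit_name sdhci_err_bits[] = {
	{ SDHCI_INT_ERROR,        "ERROR" },
	{ SDHCI_INT_TIMEOUT,      "CMD_TIMEOUT" },
	{ SDHCI_INT_CRC,          "CMD_CRC" },
	{ SDHCI_INT_END_BIT,      "CMD_END_BIT" },
	{ SDHCI_INT_INDEX,        "CMD_INDEX" },
	{ SDHCI_INT_DATA_TIMEOUT, "DATA_TIMEOUT" },
	{ SDHCI_INT_DATA_CRC,     "DATA_CRC" },
	{ SDHCI_INT_DATA_END_BIT, "DATA_END_BIT" },
};

/* Write a space-separated list of the error bits set in stat into buf. */
static void sdhci_decode_status(uint32_t stat, char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(sdhci_err_bits) / sizeof(sdhci_err_bits[0]); i++) {
		if (!(stat & sdhci_err_bits[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, sdhci_err_bits[i].name, len - strlen(buf) - 1);
	}
}
```

Under these definitions, 0x18000 from the log decodes to the error summary bit plus a command timeout, which matches the -ETIMEDOUT (-110) return, and the single 0x208000 decodes to the error summary bit plus a data CRC error.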

Jan,
On 21/07/20 10:52 pm, Jan Kiszka wrote:
> The same card on the same board works without complaints with the kernel driver (5.8-rc5 at the moment). Even more strange, the same card on a different board (IOT2050 Basic, same SoC series, slightly different type) does not throw those errors with the same U-Boot.
Was this card working with an older U-Boot version and only failing in mainline?
> Note that we are still carrying those clock swapping changes in [2]. I've also tried to remove it, but it has no impact on this issue.
One more thing to try is to reduce the speed mode to default, as we are already gating the frequency to 25 MHz. Can you modify the sdhci-caps-mask to the following for sdhci1?
sdhci-caps-mask = <0x7 0x200000>;
Thanks, Faiz
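To make the effect of that mask concrete, here is a small standalone sketch of how a two-cell sdhci-caps-mask could be applied to the capability registers. It rests on assumptions that should be verified against the driver: the two cells form one 64-bit value with the first cell as the upper half, the lower half clears bits in CAPABILITIES and the upper half clears bits in CAPABILITIES_1, and the bit values are the ones I recall from the sdhci.h headers. `struct sdhci_caps`, `dt_cells_to_u64` and `apply_caps_mask` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed capability bits, to be checked against sdhci.h. */
#define SDHCI_CAN_DO_HISPD   0x00200000u	/* CAPABILITIES */
#define SDHCI_SUPPORT_SDR50  0x00000001u	/* CAPABILITIES_1 */
#define SDHCI_SUPPORT_SDR104 0x00000002u	/* CAPABILITIES_1 */
#define SDHCI_SUPPORT_DDR50  0x00000004u	/* CAPABILITIES_1 */

/* Hypothetical container for the two capability registers. */
struct sdhci_caps {
	uint32_t caps;		/* CAPABILITIES */
	uint32_t caps_1;	/* CAPABILITIES_1 */
};

/* Combine two 32-bit device tree cells <hi lo> into one 64-bit value. */
static uint64_t dt_cells_to_u64(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Clear every capability bit that is set in the 64-bit mask. */
static struct sdhci_caps apply_caps_mask(struct sdhci_caps raw, uint64_t mask)
{
	struct sdhci_caps out;

	out.caps = raw.caps & ~(uint32_t)(mask & 0xffffffffu);
	out.caps_1 = raw.caps_1 & ~(uint32_t)(mask >> 32);
	return out;
}
```

Read this way, <0x7 0x200000> hides SDR50, SDR104 and DDR50 in the upper word and high speed in the lower word, leaving the controller to advertise default speed only.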

Jan,
On 23/07/20 8:55 am, Faiz Abbas wrote:
> One more thing to try is to reduce the speed mode to default, as we are already gating the frequency to 25 MHz. Can you modify the sdhci-caps-mask to the following for sdhci1?
> sdhci-caps-mask = <0x7 0x200000>;
You'll need to apply this fix for this mask to work:
https://patchwork.ozlabs.org/project/uboot/patch/20200723041219.2438-1-faiz_...
Thanks, Faiz

On 23.07.20 06:14, Faiz Abbas wrote:
> You'll need to apply this fix for this mask to work:
> https://patchwork.ozlabs.org/project/uboot/patch/20200723041219.2438-1-faiz_...
BTW, could this be queued for upstream? We depend on it now.
Thanks, Jan
PS: Subject has a typo ("correspnding").

On 23.07.20 05:25, Faiz Abbas wrote:
> Was this card working with an older U-Boot version and only failing in mainline?
Good point: I just tested our legacy firmware, which was based on https://git.ti.com/cgit/processor-sdk/processor-sdk-u-boot/log/?h=029e4c009a... (https://github.com/siemens/meta-iot2050/blob/master/recipes-bsp/u-boot/u-boo...), and it has not exposed the issue so far. Looking at the transfer rate, 2.8 MiB/s with the old firmware vs. 11.x MiB/s with upstream, your suggestion below may make the difference.
> One more thing to try is to reduce the speed mode to default, as we are already gating the frequency to 25 MHz. Can you modify the sdhci-caps-mask to the following for sdhci1?
> sdhci-caps-mask = <0x7 0x200000>;
Trying that out now...
Jan

On 23.07.20 07:25, Jan Kiszka wrote:
>> One more thing to try is to reduce the speed mode to default, as we are already gating the frequency to 25 MHz. Can you modify the sdhci-caps-mask to the following for sdhci1?
>> sdhci-caps-mask = <0x7 0x200000>;
> Trying that out now...
Yep, that works as well (and it does not even degrade the read performance: still 11 MiB/s with this card).
What does it tell us?
Jan