[U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds

newer
[U-Boot] [PATCH v2 1/5] spi_sf:...

Lukasz Majewski

28 Aug 2015 28 Aug '15

3:50 p.m.

The commit: d9dbb97be0e4a550457aec5f11afefb446169c90 "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting for end of dw mmc transfer.

For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) - and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB) the default timeout is to short.

The new value - 20 seconds - takes into account the situation when SD card triggers internal clean up. Such process may take more than 10 seconds on some cards.

Signed-off-by: Lukasz Majewski l.majewski@samsung.com Cc: Marek Vasut marex@denx.de Cc: Pantelis Antoniou panto@antoniou-consulting.com Cc: Tom Rini trini@konsulko.com --- drivers/mmc/dw_mmc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mmc/dw_mmc.c b/drivers/mmc/dw_mmc.c index 77b87e0..21a92d2 100644 --- a/drivers/mmc/dw_mmc.c +++ b/drivers/mmc/dw_mmc.c @@ -213,7 +213,7 @@ static int dwmci_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd,

if (data) { start = get_timer(0); - timeout = 1000; + timeout = 20000; for (;;) { mask = dwmci_readl(host, DWMCI_RINTSTS); /* Error during data transfer. */

-- 2.0.0.rc2

Show replies by date

Lukasz Majewski

28 Aug 28 Aug

3:50 p.m.

New subject: [U-Boot] [PATCH 2/2] mmc: dw_mmc: Make timeout error visible to u-boot console

The timeout error for DW MMC transfer should be visible on the u-boot console to speed up the process of debugging.

Signed-off-by: Lukasz Majewski l.majewski@samsung.com Cc: Tom Rini trini@konsulko.com Cc: Simon Glass sjg@chromium.org --- drivers/mmc/dw_mmc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mmc/dw_mmc.c b/drivers/mmc/dw_mmc.c index 21a92d2..98f5cb7 100644 --- a/drivers/mmc/dw_mmc.c +++ b/drivers/mmc/dw_mmc.c @@ -231,7 +231,7 @@ static int dwmci_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd,

/* Check for timeout. */ if (get_timer(start) > timeout) { - debug("%s: Timeout waiting for data!\n", + printf("%s: Timeout waiting for data!\n", __func__); ret = TIMEOUT; break;

-- 2.0.0.rc2

Simon Glass

29 Aug 29 Aug

1:21 a.m.

New subject: [U-Boot] [PATCH 2/2] mmc: dw_mmc: Make timeout error visible to u-boot console

Hi Lukasz,

On 28 August 2015 at 07:50, Lukasz Majewski l.majewski@samsung.com wrote:

...

The timeout error for DW MMC transfer should be visible on the u-boot console to speed up the process of debugging.

Signed-off-by: Lukasz Majewski l.majewski@samsung.com Cc: Tom Rini trini@konsulko.com Cc: Simon Glass sjg@chromium.org

drivers/mmc/dw_mmc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

Can this be reported at the command line level instead? It feels wrong to be printing errors in a driver. Also, why does it return TIMEOUT instead of -ETIMEOUT?

Regards, Simon

Lukasz Majewski

2:09 p.m.

New subject: [U-Boot] [PATCH 2/2] mmc: dw_mmc: Make timeout error visible to u-boot console

On Fri, 28 Aug 2015 17:21:51 -0600 Simon Glass sjg@chromium.org wrote:

...

Hi Lukasz,

On 28 August 2015 at 07:50, Lukasz Majewski l.majewski@samsung.com wrote:

...
The timeout error for DW MMC transfer should be visible on the u-boot console to speed up the process of debugging.

Signed-off-by: Lukasz Majewski l.majewski@samsung.com Cc: Tom Rini trini@konsulko.com Cc: Simon Glass sjg@chromium.org

drivers/mmc/dw_mmc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

Can this be reported at the command line level instead?

With my setup - dfu tests - I didn't receive any error/information about the MMC (SD Card to be precise) timeout.

(I would expect MMC subsystem to return -ETIMEOUT and then this error would be propagated to dfu and stop execution of the dfu command).

This didn't work and hence should be scrutinized.

...

It feels wrong to be printing errors in a driver.

If this information would be printed on the console I might have noticed it immediately and save some time for debugging.

...

Also, why does it return TIMEOUT instead of -ETIMEOUT?

Good question.

...

Regards, Simon

Best regards,

Lukasz Majewski

Simon Glass

5:07 p.m.

New subject: [U-Boot] [PATCH 2/2] mmc: dw_mmc: Make timeout error visible to u-boot console

Hi Lukasz,

On 29 August 2015 at 06:09, Lukasz Majewski l.majewski@majess.pl wrote:

...

On Fri, 28 Aug 2015 17:21:51 -0600 Simon Glass sjg@chromium.org wrote:

...
Hi Lukasz,

On 28 August 2015 at 07:50, Lukasz Majewski l.majewski@samsung.com wrote:

...
The timeout error for DW MMC transfer should be visible on the u-boot console to speed up the process of debugging.

Signed-off-by: Lukasz Majewski l.majewski@samsung.com Cc: Tom Rini trini@konsulko.com Cc: Simon Glass sjg@chromium.org

drivers/mmc/dw_mmc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

Can this be reported at the command line level instead?

With my setup - dfu tests - I didn't receive any error/information about the MMC (SD Card to be precise) timeout.

(I would expect MMC subsystem to return -ETIMEOUT and then this error would be propagated to dfu and stop execution of the dfu command).

This didn't work and hence should be scrutinized.

Agreed.

...

...
It feels wrong to be printing errors in a driver.

If this information would be printed on the console I might have noticed it immediately and save some time for debugging.

...
Also, why does it return TIMEOUT instead of -ETIMEOUT?

Good question.

It looks likes mmc.h has its own errors (NO_CARD_ERR, etc.). I suspect these could be converted to use standard errors.

Anyway your question remains. A driver error should be reported by the caller.

Regards, Simon

Lukasz Majewski

3 Sep 3 Sep

2:33 p.m.

New subject: [U-Boot] [PATCH 2/2] mmc: dw_mmc: Make timeout error visible to u-boot console

Hi Simon,

...

Hi Lukasz,

On 29 August 2015 at 06:09, Lukasz Majewski l.majewski@majess.pl wrote:

...
On Fri, 28 Aug 2015 17:21:51 -0600 Simon Glass sjg@chromium.org wrote:

...
Hi Lukasz,

On 28 August 2015 at 07:50, Lukasz Majewski l.majewski@samsung.com wrote:

...
The timeout error for DW MMC transfer should be visible on the u-boot console to speed up the process of debugging.

Signed-off-by: Lukasz Majewski l.majewski@samsung.com Cc: Tom Rini trini@konsulko.com Cc: Simon Glass sjg@chromium.org

drivers/mmc/dw_mmc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

Can this be reported at the command line level instead?

With my setup - dfu tests - I didn't receive any error/information about the MMC (SD Card to be precise) timeout.

(I would expect MMC subsystem to return -ETIMEOUT and then this error would be propagated to dfu and stop execution of the dfu command).

This didn't work and hence should be scrutinized.

Agreed.

...
...
It feels wrong to be printing errors in a driver.

If this information would be printed on the console I might have noticed it immediately and save some time for debugging.

...
Also, why does it return TIMEOUT instead of -ETIMEOUT?

Good question.

It looks likes mmc.h has its own errors (NO_CARD_ERR, etc.). I suspect these could be converted to use standard errors.

Anyway your question remains. A driver error should be reported by the caller.

Please drop this patch.

I've posted other one (as a reply tp this one): "[PATCH] FIX: fat: Provide correct return code from disk_{read|write}"

to fix this problem.

...

Regards, Simon

-- Best regards, Lukasz Majewski Samsung R&D Institute Poland (SRPOL) | Linux Platform Group

Lukasz Majewski

2:21 p.m.

New subject: [U-Boot] [PATCH] FIX: fat: Provide correct return code from disk_{read|write} to upper layers

It is very common that FAT code is using following pattern: if (disk_{read|write}() < 0) return -1;

Up till now the above code was dead, since disk_{read|write) could only return value >= 0. As a result some errors from medium layer (i.e. eMMC/SD) were not caught.

The above behavior was caused by block_{read|write|erase} declared at struct block_dev_desc (@part.h). It returns unsigned long, where 0 indicates error and > 0 indicates that medium operation was correct.

This patch as error regards 0 returned from block_{read|write|erase} when nr_blocks is grater than zero. Read/Write operation with nr_blocks=0 should return 0 and hence is not considered as an error.

Signed-off-by: Lukasz Majewski l.majewski@samsung.com

Test HW: Odroid XU3 - Exynos 5433 --- fs/fat/fat.c | 11 +++++++++-- fs/fat/fat_write.c | 11 +++++++++-- 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/fs/fat/fat.c b/fs/fat/fat.c index bccc3e3..d743014 100644 --- a/fs/fat/fat.c +++ b/fs/fat/fat.c @@ -45,11 +45,18 @@ static disk_partition_t cur_part_info;

static int disk_read(__u32 block, __u32 nr_blocks, void *buf) { + ulong ret; + if (!cur_dev || !cur_dev->block_read) return -1;

- return cur_dev->block_read(cur_dev->dev, - cur_part_info.start + block, nr_blocks, buf); + ret = cur_dev->block_read(cur_dev->dev, + cur_part_info.start + block, nr_blocks, buf); + + if (nr_blocks && ret == 0) + return -1; + + return ret; }

int fat_set_blk_dev(block_dev_desc_t *dev_desc, disk_partition_t *info) diff --git a/fs/fat/fat_write.c b/fs/fat/fat_write.c index 98b88ad..adb6940 100644 --- a/fs/fat/fat_write.c +++ b/fs/fat/fat_write.c @@ -30,6 +30,8 @@ static void uppercase(char *str, int len) static int total_sector; static int disk_write(__u32 block, __u32 nr_blocks, void *buf) { + ulong ret; + if (!cur_dev || !cur_dev->block_write) return -1;

@@ -39,8 +41,13 @@ static int disk_write(__u32 block, __u32 nr_blocks, void *buf) return -1; }

- return cur_dev->block_write(cur_dev->dev, - cur_part_info.start + block, nr_blocks, buf); + ret = cur_dev->block_write(cur_dev->dev, + cur_part_info.start + block, + nr_blocks, buf); + if (nr_blocks && ret == 0) + return -1; + + return ret; }

-- 2.0.0.rc2

Tom Rini

2:44 p.m.

New subject: [U-Boot] [PATCH] FIX: fat: Provide correct return code from disk_{read|write} to upper layers

On Thu, Sep 03, 2015 at 02:21:39PM +0200, Lukasz Majewski wrote:

...

It is very common that FAT code is using following pattern: if (disk_{read|write}() < 0) return -1;

Up till now the above code was dead, since disk_{read|write) could only return value >= 0. As a result some errors from medium layer (i.e. eMMC/SD) were not caught.

The above behavior was caused by block_{read|write|erase} declared at struct block_dev_desc (@part.h). It returns unsigned long, where 0 indicates error and > 0 indicates that medium operation was correct.

This patch as error regards 0 returned from block_{read|write|erase} when nr_blocks is grater than zero. Read/Write operation with nr_blocks=0 should return 0 and hence is not considered as an error.

Signed-off-by: Lukasz Majewski l.majewski@samsung.com

Test HW: Odroid XU3 - Exynos 5433

Can you pick up Stephen's FAT replacement series and see if it also fixes this problem? Thanks!

-- Tom

Lukasz Majewski

3:40 p.m.

New subject: [U-Boot] [PATCH] FIX: fat: Provide correct return code from disk_{read|write} to upper layers

Hi Tom,

...

On Thu, Sep 03, 2015 at 02:21:39PM +0200, Lukasz Majewski wrote:

...
It is very common that FAT code is using following pattern: if (disk_{read|write}() < 0) return -1;

Up till now the above code was dead, since disk_{read|write) could only return value >= 0. As a result some errors from medium layer (i.e. eMMC/SD) were not caught.

The above behavior was caused by block_{read|write|erase} declared at struct block_dev_desc (@part.h). It returns unsigned long, where 0 indicates error and > 0 indicates that medium operation was correct.

This patch as error regards 0 returned from block_{read|write|erase} when nr_blocks is grater than zero. Read/Write operation with nr_blocks=0 should return 0 and hence is not considered as an error.

Signed-off-by: Lukasz Majewski l.majewski@samsung.com

Test HW: Odroid XU3 - Exynos 5433

Can you pick up Stephen's FAT replacement series and see if it also fixes this problem? Thanks!

Ok, I will test this fat implementation.

However, since for v2015.10 it won't be included, I would also opt for adding this fix to the current u-boot.

-- Best regards, Lukasz Majewski Samsung R&D Institute Poland (SRPOL) | Linux Platform Group

Lukasz Majewski

4:18 p.m.

New subject: [U-Boot] [PATCH] FIX: fat: Provide correct return code from disk_{read|write} to upper layers

Hi Lukasz,

...

Hi Tom,

...
On Thu, Sep 03, 2015 at 02:21:39PM +0200, Lukasz Majewski wrote:

...
It is very common that FAT code is using following pattern: if (disk_{read|write}() < 0) return -1;

Up till now the above code was dead, since disk_{read|write) could only return value >= 0. As a result some errors from medium layer (i.e. eMMC/SD) were not caught.

The above behavior was caused by block_{read|write|erase} declared at struct block_dev_desc (@part.h). It returns unsigned long, where 0 indicates error and > 0 indicates that medium operation was correct.

This patch as error regards 0 returned from block_{read|write|erase} when nr_blocks is grater than zero. Read/Write operation with nr_blocks=0 should return 0 and hence is not considered as an error.

Signed-off-by: Lukasz Majewski l.majewski@samsung.com

Test HW: Odroid XU3 - Exynos 5433

Can you pick up Stephen's FAT replacement series and see if it also fixes this problem? Thanks!

Ok, I will test this fat implementation.

I've applied v2 of this patchset on top of SHA1: 79c884d7e449a63fa8f07b7495f8f9873355c48f

Unfortunately, DFU tests fail with first attempt to pass the test.

Apparently this code needs some more review/testing before being included to main line.

...

However, since for v2015.10 it won't be included, I would also opt for adding this fix to the current u-boot.

-- Best regards, Lukasz Majewski Samsung R&D Institute Poland (SRPOL) | Linux Platform Group

Stephen Warren

23 Sep 23 Sep

5:17 a.m.

New subject: [U-Boot] [PATCH] FIX: fat: Provide correct return code from disk_{read|write} to upper layers

On 09/03/2015 08:18 AM, Lukasz Majewski wrote:

...

Hi Lukasz,

...
Hi Tom,

...
On Thu, Sep 03, 2015 at 02:21:39PM +0200, Lukasz Majewski wrote:

...
It is very common that FAT code is using following pattern: if (disk_{read|write}() < 0) return -1;

Up till now the above code was dead, since disk_{read|write) could only return value >= 0. As a result some errors from medium layer (i.e. eMMC/SD) were not caught.

The above behavior was caused by block_{read|write|erase} declared at struct block_dev_desc (@part.h). It returns unsigned long, where 0 indicates error and > 0 indicates that medium operation was correct.

This patch as error regards 0 returned from block_{read|write|erase} when nr_blocks is grater than zero. Read/Write operation with nr_blocks=0 should return 0 and hence is not considered as an error.

Signed-off-by: Lukasz Majewski l.majewski@samsung.com

Test HW: Odroid XU3 - Exynos 5433

Can you pick up Stephen's FAT replacement series and see if it also fixes this problem? Thanks!

Ok, I will test this fat implementation.

I've applied v2 of this patchset on top of SHA1: 79c884d7e449a63fa8f07b7495f8f9873355c48f

Unfortunately, DFU tests fail with first attempt to pass the test.

I've found a couple of problems.

First up, file_fat_write() wasn't truncating the file when writing, so the file size wasn't changing when over-writing a large file with a small file. With this fixed, I can run the DFU tests just fine for all the small files (<1M). I've fixed this locally and in the ff branch on my github.

Second, ff is slow:

Some random old build I had in flash on my system:

...

Tegra124 (Jetson TK1) # load mmc 1:1 $loadaddr dfu1.bin reading dfu1.bin 1048576 bytes read in 95 ms (10.5 MiB/s)

With my ff branch:

...

Tegra124 (Jetson TK1) # load mmc 1:1 $loadaddr dfu1.bin 1048576 bytes read in 5038 ms (203.1 KiB/s)

That's quite the slow-down! I believe this is causing dfu-util to time out on the larger files (1M+). Just for functional testing, I'll try and find a way to hack dfu-util to have a much larger timeout for the final flush operation. I wonder if the old FAT implementation had a disk cache (e.g. that 32K buffer in BSS?) and we need the same for ff? I'll try and track down why it's so slow.

Perhaps there are other issues as yet unfound.

Lukasz Majewski

10:40 a.m.

New subject: [U-Boot] [PATCH] FIX: fat: Provide correct return code from disk_{read|write} to upper layers

Hi Stephen,

...

On 09/03/2015 08:18 AM, Lukasz Majewski wrote:

...
Hi Lukasz,

...
Hi Tom,

...
On Thu, Sep 03, 2015 at 02:21:39PM +0200, Lukasz Majewski wrote:

...
It is very common that FAT code is using following pattern: if (disk_{read|write}() < 0) return -1;

Up till now the above code was dead, since disk_{read|write) could only return value >= 0. As a result some errors from medium layer (i.e. eMMC/SD) were not caught.

The above behavior was caused by block_{read|write|erase} declared at struct block_dev_desc (@part.h). It returns unsigned long, where 0 indicates error and > 0 indicates that medium operation was correct.

This patch as error regards 0 returned from block_{read|write|erase} when nr_blocks is grater than zero. Read/Write operation with nr_blocks=0 should return 0 and hence is not considered as an error.

Signed-off-by: Lukasz Majewski l.majewski@samsung.com

Test HW: Odroid XU3 - Exynos 5433

Can you pick up Stephen's FAT replacement series and see if it also fixes this problem? Thanks!

Ok, I will test this fat implementation.

I've applied v2 of this patchset on top of SHA1: 79c884d7e449a63fa8f07b7495f8f9873355c48f

Unfortunately, DFU tests fail with first attempt to pass the test.

I've found a couple of problems.

First up, file_fat_write() wasn't truncating the file when writing, so the file size wasn't changing when over-writing a large file with a small file. With this fixed, I can run the DFU tests just fine for all the small files (<1M). I've fixed this locally and in the ff branch on my github.

Nice to hear that you have found the error.

...

Second, ff is slow:

Some random old build I had in flash on my system:

...
Tegra124 (Jetson TK1) # load mmc 1:1 $loadaddr dfu1.bin reading dfu1.bin 1048576 bytes read in 95 ms (10.5 MiB/s)

With my ff branch:

...
Tegra124 (Jetson TK1) # load mmc 1:1 $loadaddr dfu1.bin 1048576 bytes read in 5038 ms (203.1 KiB/s)

That's quite the slow-down! I believe this is causing dfu-util to time out on the larger files (1M+). Just for functional testing, I'll try and find a way to hack dfu-util to have a much larger timeout for the final flush operation. I wonder if the old FAT implementation had a disk cache (e.g. that 32K buffer in BSS?) and we need the same for ff?

I think that our current Fat implementation is optimized for tiny embedded system (and probably no cache).

...

I'll try and track down why it's so slow.

Perhaps there are other issues as yet unfound.

We might also check with sandbox FS set of tests.

-- Best regards, Lukasz Majewski Samsung R&D Institute Poland (SRPOL) | Linux Platform Group

Stephen Warren

25 Sep 25 Sep

7:47 a.m.

New subject: [U-Boot] [PATCH] FIX: fat: Provide correct return code from disk_{read|write} to upper layers

On 09/23/2015 02:40 AM, Lukasz Majewski wrote:

...

Hi Stephen,

...
On 09/03/2015 08:18 AM, Lukasz Majewski wrote:

...
Hi Lukasz,

...
Hi Tom,

...
On Thu, Sep 03, 2015 at 02:21:39PM +0200, Lukasz Majewski wrote:

...
It is very common that FAT code is using following pattern: if (disk_{read|write}() < 0) return -1;

Up till now the above code was dead, since disk_{read|write) could only return value >= 0. As a result some errors from medium layer (i.e. eMMC/SD) were not caught.

The above behavior was caused by block_{read|write|erase} declared at struct block_dev_desc (@part.h). It returns unsigned long, where 0 indicates error and > 0 indicates that medium operation was correct.

This patch as error regards 0 returned from block_{read|write|erase} when nr_blocks is grater than zero. Read/Write operation with nr_blocks=0 should return 0 and hence is not considered as an error.

Signed-off-by: Lukasz Majewski l.majewski@samsung.com

Test HW: Odroid XU3 - Exynos 5433

Can you pick up Stephen's FAT replacement series and see if it also fixes this problem? Thanks!

Ok, I will test this fat implementation.

I've applied v2 of this patchset on top of SHA1: 79c884d7e449a63fa8f07b7495f8f9873355c48f

Unfortunately, DFU tests fail with first attempt to pass the test.

I've found a couple of problems.

First up, file_fat_write() wasn't truncating the file when writing, so the file size wasn't changing when over-writing a large file with a small file. With this fixed, I can run the DFU tests just fine for all the small files (<1M). I've fixed this locally and in the ff branch on my github.

Nice to hear that you have found the error.

...
Second, ff is slow:

Some random old build I had in flash on my system:

...
Tegra124 (Jetson TK1) # load mmc 1:1 $loadaddr dfu1.bin reading dfu1.bin 1048576 bytes read in 95 ms (10.5 MiB/s)

With my ff branch:

...
Tegra124 (Jetson TK1) # load mmc 1:1 $loadaddr dfu1.bin 1048576 bytes read in 5038 ms (203.1 KiB/s)

That's quite the slow-down! I believe this is causing dfu-util to time out on the larger files (1M+). Just for functional testing, I'll try and find a way to hack dfu-util to have a much larger timeout for the final flush operation. I wonder if the old FAT implementation had a disk cache (e.g. that 32K buffer in BSS?) and we need the same for ff?

Extending the timeout (massively) in dfu-util did make dfu_gadget_test.sh work.

...

I think that our current Fat implementation is optimized for tiny embedded system (and probably no cache).

The ff library claims to be too. I'll try tracing the IO pattern in sandbox and see where the difference lies.

...

...
I'll try and track down why it's so slow.

Perhaps there are other issues as yet unfound.

We might also check with sandbox FS set of tests.

I get the same failures for fs-test.sh with or without this series; TC10 fails and everything else passes, for the non-"sb" FAT tests. (with the file truncation fix I mentioned above applied, although I don't know if it matters for fs-test.sh)

Lukasz Majewski

9 Sep 9 Sep

9:02 a.m.

New subject: [U-Boot] [PATCH] FIX: fat: Provide correct return code from disk_{read|write} to upper layers

Hi,

...

It is very common that FAT code is using following pattern: if (disk_{read|write}() < 0) return -1;

Up till now the above code was dead, since disk_{read|write) could only return value >= 0. As a result some errors from medium layer (i.e. eMMC/SD) were not caught.

The above behavior was caused by block_{read|write|erase} declared at struct block_dev_desc (@part.h). It returns unsigned long, where 0 indicates error and > 0 indicates that medium operation was correct.

This patch as error regards 0 returned from block_{read|write|erase} when nr_blocks is grater than zero. Read/Write operation with nr_blocks=0 should return 0 and hence is not considered as an error.

Signed-off-by: Lukasz Majewski l.majewski@samsung.com

Are there any more questions regarding this patch? I'd be more than happy if it would be added to v2015.10 :-).

...

Test HW: Odroid XU3 - Exynos 5433

fs/fat/fat.c | 11 +++++++++-- fs/fat/fat_write.c | 11 +++++++++-- 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/fs/fat/fat.c b/fs/fat/fat.c index bccc3e3..d743014 100644 --- a/fs/fat/fat.c +++ b/fs/fat/fat.c @@ -45,11 +45,18 @@ static disk_partition_t cur_part_info;

static int disk_read(__u32 block, __u32 nr_blocks, void *buf) {

ulong ret;

if (!cur_dev || !cur_dev->block_read) return -1;
return cur_dev->block_read(cur_dev->dev,
	cur_part_info.start + block, nr_blocks, buf);
ret = cur_dev->block_read(cur_dev->dev,
		  cur_part_info.start + block,
nr_blocks, buf); +
if (nr_blocks && ret == 0)
return -1;
return ret;
}

int fat_set_blk_dev(block_dev_desc_t *dev_desc, disk_partition_t *info) diff --git a/fs/fat/fat_write.c b/fs/fat/fat_write.c index 98b88ad..adb6940 100644 --- a/fs/fat/fat_write.c +++ b/fs/fat/fat_write.c @@ -30,6 +30,8 @@ static void uppercase(char *str, int len) static int total_sector; static int disk_write(__u32 block, __u32 nr_blocks, void *buf) {

ulong ret;

if (!cur_dev || !cur_dev->block_write) return -1;

@@ -39,8 +41,13 @@ static int disk_write(__u32 block, __u32 nr_blocks, void *buf) return -1; }
return cur_dev->block_write(cur_dev->dev,
	cur_part_info.start + block,
nr_blocks, buf);
ret = cur_dev->block_write(cur_dev->dev,
		   cur_part_info.start + block,
		   nr_blocks, buf);
if (nr_blocks && ret == 0)
return -1;
return ret;
}

/*

-- Best regards, Lukasz Majewski Samsung R&D Institute Poland (SRPOL) | Linux Platform Group

Lukasz Majewski

17 Sep 17 Sep

4:44 p.m.

New subject: [U-Boot] [PATCH] FIX: fat: Provide correct return code from disk_{read|write} to upper layers

Hi Tom,,

...

Hi,

...
It is very common that FAT code is using following pattern: if (disk_{read|write}() < 0) return -1;

Up till now the above code was dead, since disk_{read|write) could only return value >= 0. As a result some errors from medium layer (i.e. eMMC/SD) were not caught.

The above behavior was caused by block_{read|write|erase} declared at struct block_dev_desc (@part.h). It returns unsigned long, where 0 indicates error and > 0 indicates that medium operation was correct.

This patch as error regards 0 returned from block_{read|write|erase} when nr_blocks is grater than zero. Read/Write operation with nr_blocks=0 should return 0 and hence is not considered as an error.

Signed-off-by: Lukasz Majewski l.majewski@samsung.com

Are there any more questions regarding this patch? I'd be more than happy if it would be added to v2015.10 :-).

Gentle ping :-)

...

...
Test HW: Odroid XU3 - Exynos 5433

fs/fat/fat.c | 11 +++++++++-- fs/fat/fat_write.c | 11 +++++++++-- 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/fs/fat/fat.c b/fs/fat/fat.c index bccc3e3..d743014 100644 --- a/fs/fat/fat.c +++ b/fs/fat/fat.c @@ -45,11 +45,18 @@ static disk_partition_t cur_part_info;

static int disk_read(__u32 block, __u32 nr_blocks, void *buf) {

ulong ret;

if (!cur_dev || !cur_dev->block_read) return -1;
return cur_dev->block_read(cur_dev->dev,
	cur_part_info.start + block, nr_blocks,
buf);
ret = cur_dev->block_read(cur_dev->dev,
		  cur_part_info.start + block,
nr_blocks, buf); +
if (nr_blocks && ret == 0)
return -1;
return ret;
}

int fat_set_blk_dev(block_dev_desc_t *dev_desc, disk_partition_t *info) diff --git a/fs/fat/fat_write.c b/fs/fat/fat_write.c index 98b88ad..adb6940 100644 --- a/fs/fat/fat_write.c +++ b/fs/fat/fat_write.c @@ -30,6 +30,8 @@ static void uppercase(char *str, int len) static int total_sector; static int disk_write(__u32 block, __u32 nr_blocks, void *buf) {

ulong ret;

if (!cur_dev || !cur_dev->block_write) return -1;

@@ -39,8 +41,13 @@ static int disk_write(__u32 block, __u32 nr_blocks, void *buf) return -1; }
return cur_dev->block_write(cur_dev->dev,
	cur_part_info.start + block,
nr_blocks, buf);
ret = cur_dev->block_write(cur_dev->dev,
		   cur_part_info.start + block,
		   nr_blocks, buf);
if (nr_blocks && ret == 0)
return -1;
return ret;
}

/*

-- Best regards, Lukasz Majewski Samsung R&D Institute Poland (SRPOL) | Linux Platform Group

Tom Rini

12 Sep 12 Sep

2:51 p.m.

New subject: [U-Boot] FIX: fat: Provide correct return code from disk_{read|write} to upper layers

On Thu, Sep 03, 2015 at 02:21:39PM +0200, Łukasz Majewski wrote:

...

It is very common that FAT code is using following pattern: if (disk_{read|write}() < 0) return -1;

Up till now the above code was dead, since disk_{read|write) could only return value >= 0. As a result some errors from medium layer (i.e. eMMC/SD) were not caught.

The above behavior was caused by block_{read|write|erase} declared at struct block_dev_desc (@part.h). It returns unsigned long, where 0 indicates error and > 0 indicates that medium operation was correct.

This patch as error regards 0 returned from block_{read|write|erase} when nr_blocks is grater than zero. Read/Write operation with nr_blocks=0 should return 0 and hence is not considered as an error.

Signed-off-by: Lukasz Majewski l.majewski@samsung.com

Test HW: Odroid XU3 - Exynos 5433

Applied to u-boot/master, thanks!

-- Tom

Marek Vasut

28 Aug 28 Aug

11:55 p.m.

On Friday, August 28, 2015 at 03:50:20 PM, Lukasz Majewski wrote:

...

The commit: d9dbb97be0e4a550457aec5f11afefb446169c90 "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting for end of dw mmc transfer.

For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) - and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB) the default timeout is to short.

The new value - 20 seconds - takes into account the situation when SD card triggers internal clean up. Such process may take more than 10 seconds on some cards.

What happens if you pull the SD card out of the slot during such a process?

Also, where did you find out there is such "cleanup" mechanism please ?

...

Signed-off-by: Lukasz Majewski l.majewski@samsung.com Cc: Marek Vasut marex@denx.de Cc: Pantelis Antoniou panto@antoniou-consulting.com Cc: Tom Rini trini@konsulko.com

drivers/mmc/dw_mmc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mmc/dw_mmc.c b/drivers/mmc/dw_mmc.c index 77b87e0..21a92d2 100644 --- a/drivers/mmc/dw_mmc.c +++ b/drivers/mmc/dw_mmc.c @@ -213,7 +213,7 @@ static int dwmci_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd,

if (data) { start = get_timer(0);
timeout = 1000;
timeout = 20000;
for (;;) { mask = dwmci_readl(host, DWMCI_RINTSTS); /* Error during data transfer. */

Best regards, Marek Vasut

Lukasz Majewski

29 Aug 29 Aug

1:55 p.m.

On Fri, 28 Aug 2015 23:55:17 +0200 Marek Vasut marex@denx.de wrote:

...

On Friday, August 28, 2015 at 03:50:20 PM, Lukasz Majewski wrote:

...
The commit: d9dbb97be0e4a550457aec5f11afefb446169c90 "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting for end of dw mmc transfer.

For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) - and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB) the default timeout is to short.

The new value - 20 seconds - takes into account the situation when SD card triggers internal clean up. Such process may take more than 10 seconds on some cards.

What happens if you pull the SD card out of the slot during such a process?

Then we would wait 20 seconds :-) to proceed.

To be clear - the mentioned patch introduced regression. It works for small files on a commonly available SD cards (like 4 GiB Kingston/Adata).

When I ran DFU tests I've discovered that there is a problem with storing 8MiB file (dat_8M.img).

Even worse - when one wants to store Image.itb file (which might be 4-6 MiB) it sometimes works and sometimes not. Nightmare for debugging.

Please correct me if I'm wrong - but is seems like we are now using 1 second timeout for detection if SD card has been removed?

Shouldn't we use polling on the card detect IO pin instead? We could add such polling in several places in the MMC subsystem (like we do it with watchdog).

Marek, Pantelis, what do you think about this?

...

Also, where did you find out there is such "cleanup" mechanism please ?

Internally we did some tests with several SD cards. We were stunned when it turned out that for some workloads it took up to 15 seconds to end write operation for small data.

The culprit is the SD Card embedded controller responsible for FTL - flash translation layer. It allows NAND memory on the card to be visible as the block device. More importantly it also takes care of wear leveling and bad block management.

Hence, we don't know when it would start housekeeping operations. We can only poll/wait until this controller finishes it work. The code as it was (with the indefinite loop) was taking this situation into account.

The 1 second timeout is apparently too short and makes using SD card non-deterministic and error prone in u-boot.

Even worse, many devices use SD card as the only storage device.

...

...
Signed-off-by: Lukasz Majewski l.majewski@samsung.com Cc: Marek Vasut marex@denx.de Cc: Pantelis Antoniou panto@antoniou-consulting.com Cc: Tom Rini trini@konsulko.com

drivers/mmc/dw_mmc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mmc/dw_mmc.c b/drivers/mmc/dw_mmc.c index 77b87e0..21a92d2 100644 --- a/drivers/mmc/dw_mmc.c +++ b/drivers/mmc/dw_mmc.c @@ -213,7 +213,7 @@ static int dwmci_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd,

if (data) { start = get_timer(0);
timeout = 1000;
timeout = 20000;
for (;;) { mask = dwmci_readl(host, DWMCI_RINTSTS); /* Error during data transfer. */
Best regards, Marek Vasut

Best regards,

Lukasz Majewski

Marek Vasut

3:52 p.m.

On Saturday, August 29, 2015 at 01:55:36 PM, Lukasz Majewski wrote:

...

On Fri, 28 Aug 2015 23:55:17 +0200

Hi!

...

Marek Vasut marex@denx.de wrote:

...
On Friday, August 28, 2015 at 03:50:20 PM, Lukasz Majewski wrote:

...
The commit: d9dbb97be0e4a550457aec5f11afefb446169c90 "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting for end of dw mmc transfer.

For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) - and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB) the default timeout is to short.

The new value - 20 seconds - takes into account the situation when SD card triggers internal clean up. Such process may take more than 10 seconds on some cards.

What happens if you pull the SD card out of the slot during such a process?

Then we would wait 20 seconds :-) to proceed.

Oops, I think I was not clear here. I was wondering what happens to the card if you yank it out of the slot whole it's performing it's internal cleanup or whatever. Is it possible that the card suffers data corruption, effectively trashing user data ? Is this behavior specific to Samsung SD cards ?

...

To be clear - the mentioned patch introduced regression.

That's a bug, not a regression, but anyway, that's not the point. I do agree with you that we do have a problem and I'm inclined to Ack this patch, I'd like to understand what the real implications of such a behavior of these cards are.

...

It works for small files on a commonly available SD cards (like 4 GiB Kingston/Adata).

When I ran DFU tests I've discovered that there is a problem with storing 8MiB file (dat_8M.img).

Even worse - when one wants to store Image.itb file (which might be 4-6 MiB) it sometimes works and sometimes not. Nightmare for debugging.

Ew, that's one crappy card you have there. I'm reading the SD card "Physical Layer Simplified Specification Version 4.10" (part1_410.pdf) section 4.6.2.2 and it states that for SDHC cards, the write operation should take at most 250mS, for SDXC it's 500mS. Could it be that your card is violating the spec ?

...

Please correct me if I'm wrong - but is seems like we are now using 1 second timeout for detection if SD card has been removed?

Shouldn't we use polling on the card detect IO pin instead? We could add such polling in several places in the MMC subsystem (like we do it with watchdog).

Marek, Pantelis, what do you think about this?

If you implement board_mmc_getcd(), you can check if the card is present this way instead of waiting for command to time out. The infrastructure for that is already in place. Right ?

It'd be cool if the MMC subsystem could pull the wp-gpios and cd-gpios from DT though :)

...

...
Also, where did you find out there is such "cleanup" mechanism please ?

Internally we did some tests with several SD cards. We were stunned when it turned out that for some workloads it took up to 15 seconds to end write operation for small data.

The culprit is the SD Card embedded controller responsible for FTL - flash translation layer. It allows NAND memory on the card to be visible as the block device. More importantly it also takes care of wear leveling and bad block management.

Hence, we don't know when it would start housekeeping operations. We can only poll/wait until this controller finishes it work. The code as it was (with the indefinite loop) was taking this situation into account.

The 1 second timeout is apparently too short and makes using SD card non-deterministic and error prone in u-boot.

Even worse, many devices use SD card as the only storage device.

Yes, horrible.

Best regards, Marek Vasut

Lukasz Majewski

6:38 p.m.

Hi Marek,

...

On Saturday, August 29, 2015 at 01:55:36 PM, Lukasz Majewski wrote:

...
On Fri, 28 Aug 2015 23:55:17 +0200

Hi!

...
Marek Vasut marex@denx.de wrote:

...
On Friday, August 28, 2015 at 03:50:20 PM, Lukasz Majewski wrote:

...
The commit: d9dbb97be0e4a550457aec5f11afefb446169c90 "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting for end of dw mmc transfer.

For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) - and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB) the default timeout is to short.

The new value - 20 seconds - takes into account the situation when SD card triggers internal clean up. Such process may take more than 10 seconds on some cards.

What happens if you pull the SD card out of the slot during such a process?

Then we would wait 20 seconds :-) to proceed.

Oops, I think I was not clear here. I was wondering what happens to the card if you yank it out of the slot whole it's performing it's internal cleanup or whatever. Is it possible that the card suffers data corruption, effectively trashing user data ?

I think that only the card manufacturer may reveal what can happen with the card (what policy have been implemented in their FW).

I guess that you may lost data in such case.

...

Is this behavior specific to Samsung SD cards ?

I've experienced the problem with Kingston (brand new one) and Adata MicroSD HC (4GiB) cards.

...

...
To be clear - the mentioned patch introduced regression.

That's a bug, not a regression, but anyway, that's not the point. I do agree with you that we do have a problem and I'm inclined to Ack this patch, I'd like to understand what the real implications of such a behavior of these cards are.

...
It works for small files on a commonly available SD cards (like 4 GiB Kingston/Adata).

When I ran DFU tests I've discovered that there is a problem with storing 8MiB file (dat_8M.img).

Even worse - when one wants to store Image.itb file (which might be 4-6 MiB) it sometimes works and sometimes not. Nightmare for debugging.

Ew, that's one crappy card you have there. I'm reading the SD card "Physical Layer Simplified Specification Version 4.10" (part1_410.pdf) section 4.6.2.2 and it states that for SDHC cards, the write operation should take at most 250mS, for SDXC it's 500mS. Could it be that your card is violating the spec ?

I'll look into the spec and then comment :-).

For my boards the time was 1.2 seconds for storing 8 MiB file.

...

...
Please correct me if I'm wrong - but is seems like we are now using 1 second timeout for detection if SD card has been removed?

Shouldn't we use polling on the card detect IO pin instead? We could add such polling in several places in the MMC subsystem (like we do it with watchdog).

Marek, Pantelis, what do you think about this?

If you implement board_mmc_getcd(), you can check if the card is present this way instead of waiting for command to time out. The infrastructure for that is already in place. Right ?

So you suggest adding board_mmc_getcd() in several places in the mmc subsystem driver to detect removal of the SD card?

...

It'd be cool if the MMC subsystem could pull the wp-gpios and cd-gpios from DT though :)

...

...
...
Also, where did you find out there is such "cleanup" mechanism please ?

Internally we did some tests with several SD cards. We were stunned when it turned out that for some workloads it took up to 15 seconds to end write operation for small data.

The culprit is the SD Card embedded controller responsible for FTL - flash translation layer. It allows NAND memory on the card to be visible as the block device. More importantly it also takes care of wear leveling and bad block management.

Hence, we don't know when it would start housekeeping operations. We can only poll/wait until this controller finishes it work. The code as it was (with the indefinite loop) was taking this situation into account.

The 1 second timeout is apparently too short and makes using SD card non-deterministic and error prone in u-boot.

Even worse, many devices use SD card as the only storage device.

Yes, horrible.

Good that we have agreed.

...

Best regards, Marek Vasut

Best regards,

Łukasz Majewski

Marek Vasut

9:19 p.m.

On Saturday, August 29, 2015 at 06:38:48 PM, Lukasz Majewski wrote:

...

Hi Marek,

Hi Lukasz,

...

...
On Saturday, August 29, 2015 at 01:55:36 PM, Lukasz Majewski wrote:

...
On Fri, 28 Aug 2015 23:55:17 +0200

Hi!

...
Marek Vasut marex@denx.de wrote:

...
On Friday, August 28, 2015 at 03:50:20 PM, Lukasz Majewski wrote:

...
The commit: d9dbb97be0e4a550457aec5f11afefb446169c90 "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting for end of dw mmc transfer.

For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) - and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB) the default timeout is to short.

The new value - 20 seconds - takes into account the situation when SD card triggers internal clean up. Such process may take more than 10 seconds on some cards.

What happens if you pull the SD card out of the slot during such a process?

Then we would wait 20 seconds :-) to proceed.

Oops, I think I was not clear here. I was wondering what happens to the card if you yank it out of the slot whole it's performing it's internal cleanup or whatever. Is it possible that the card suffers data corruption, effectively trashing user data ?

I think that only the card manufacturer may reveal what can happen with the card (what policy have been implemented in their FW).

I guess that you may lost data in such case.

Uuuurgh, that's seriously shitty.

...

...
Is this behavior specific to Samsung SD cards ?

I've experienced the problem with Kingston (brand new one) and Adata MicroSD HC (4GiB) cards.

I had bad previous experience with Kingston too.

...

...
...
To be clear - the mentioned patch introduced regression.

That's a bug, not a regression, but anyway, that's not the point. I do agree with you that we do have a problem and I'm inclined to Ack this patch, I'd like to understand what the real implications of such a behavior of these cards are.

...
It works for small files on a commonly available SD cards (like 4 GiB Kingston/Adata).

When I ran DFU tests I've discovered that there is a problem with storing 8MiB file (dat_8M.img).

Even worse - when one wants to store Image.itb file (which might be 4-6 MiB) it sometimes works and sometimes not. Nightmare for debugging.

Ew, that's one crappy card you have there. I'm reading the SD card "Physical Layer Simplified Specification Version 4.10" (part1_410.pdf) section 4.6.2.2 and it states that for SDHC cards, the write operation should take at most 250mS, for SDXC it's 500mS. Could it be that your card is violating the spec ?

I'll look into the spec and then comment :-).

For my boards the time was 1.2 seconds for storing 8 MiB file.

OK, but that's weird. The transfer should always be up to 512B long and not any longer, unless it's a multiblock transfer.

...

...
...
Please correct me if I'm wrong - but is seems like we are now using 1 second timeout for detection if SD card has been removed?

Shouldn't we use polling on the card detect IO pin instead? We could add such polling in several places in the MMC subsystem (like we do it with watchdog).

Marek, Pantelis, what do you think about this?

If you implement board_mmc_getcd(), you can check if the card is present this way instead of waiting for command to time out. The infrastructure for that is already in place. Right ?

So you suggest adding board_mmc_getcd() in several places in the mmc subsystem driver to detect removal of the SD card?

Hmmmm, I'm not sure about this one. Panto ?

...

...
It'd be cool if the MMC subsystem could pull the wp-gpios and cd-gpios from DT though :)

+1

...
...
...
Also, where did you find out there is such "cleanup" mechanism please ?

Internally we did some tests with several SD cards. We were stunned when it turned out that for some workloads it took up to 15 seconds to end write operation for small data.

The culprit is the SD Card embedded controller responsible for FTL - flash translation layer. It allows NAND memory on the card to be visible as the block device. More importantly it also takes care of wear leveling and bad block management.

Hence, we don't know when it would start housekeeping operations. We can only poll/wait until this controller finishes it work. The code as it was (with the indefinite loop) was taking this situation into account.

The 1 second timeout is apparently too short and makes using SD card non-deterministic and error prone in u-boot.

Even worse, many devices use SD card as the only storage device.

Yes, horrible.

Good that we have agreed.

Heh :)

Best regards, Marek Vasut

Lukasz Majewski

1 Sep 1 Sep

1:19 p.m.

Hi Marek,

...

On Saturday, August 29, 2015 at 06:38:48 PM, Lukasz Majewski wrote:

...
Hi Marek,

Hi Lukasz,

...
...
On Saturday, August 29, 2015 at 01:55:36 PM, Lukasz Majewski wrote:

...
On Fri, 28 Aug 2015 23:55:17 +0200

Hi!

...
Marek Vasut marex@denx.de wrote:

...
On Friday, August 28, 2015 at 03:50:20 PM, Lukasz Majewski wrote:

...
The commit: d9dbb97be0e4a550457aec5f11afefb446169c90 "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting for end of dw mmc transfer.

For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) - and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB) the default timeout is to short.

The new value - 20 seconds - takes into account the situation when SD card triggers internal clean up. Such process may take more than 10 seconds on some cards.

What happens if you pull the SD card out of the slot during such a process?

Then we would wait 20 seconds :-) to proceed.

Oops, I think I was not clear here. I was wondering what happens to the card if you yank it out of the slot whole it's performing it's internal cleanup or whatever. Is it possible that the card suffers data corruption, effectively trashing user data ?

I think that only the card manufacturer may reveal what can happen with the card (what policy have been implemented in their FW).

I guess that you may lost data in such case.

Uuuurgh, that's seriously shitty.

...
...
Is this behavior specific to Samsung SD cards ?

I've experienced the problem with Kingston (brand new one) and Adata MicroSD HC (4GiB) cards.

I had bad previous experience with Kingston too.

...
...
...
To be clear - the mentioned patch introduced regression.

That's a bug, not a regression, but anyway, that's not the point. I do agree with you that we do have a problem and I'm inclined to Ack this patch, I'd like to understand what the real implications of such a behavior of these cards are.

...
It works for small files on a commonly available SD cards (like 4 GiB Kingston/Adata).

When I ran DFU tests I've discovered that there is a problem with storing 8MiB file (dat_8M.img).

Even worse - when one wants to store Image.itb file (which might be 4-6 MiB) it sometimes works and sometimes not. Nightmare for debugging.

Ew, that's one crappy card you have there. I'm reading the SD card "Physical Layer Simplified Specification Version 4.10" (part1_410.pdf) section 4.6.2.2 and it states that for SDHC cards, the write operation should take at most 250mS, for SDXC it's 500mS. Could it be that your card is violating the spec ?

The "timeout" error is for situation when you issue write command (either single or multiple block) and you don't receive any response from the card.

In our case we use multiblock transfer (CMD25) with either set number of block to transfer (CMD23) or explicit end of transmission (CMD12).

Let's consider the second case.

We setup data and issue CMD25. Then we check the CMD25 status (if we don't receive reply in 250 ms we would get timeout error). Afterwards data from buffer is transmitted to the card and flashed. This operation may take long time. During this process you can issue CMD13 (SEND_STATUS) to receive information about card state ([*] 4.10. - page 85).

Two notable fields of [*] to check are READY_FOR_DATA and CURRENT_STATE. Those two state what is the SD Card controller situation.

Then, you end the transfer with CMD12, which also provides some status information from the SDCard (like "prg"|"rcv", etc).

If you want to issue next command you must check READY_FOR_DATA and CURRENT_STATE. In the case of internal SD card controller operation you would not get READY_FOR_DATA until it ends.

...

...
I'll look into the spec and then comment :-).

For my boards the time was 1.2 seconds for storing 8 MiB file.

OK, but that's weird. The transfer should always be up to 512B long and not any longer, unless it's a multiblock transfer.

It is a multi block transfer.

...

...
...
...
Please correct me if I'm wrong - but is seems like we are now using 1 second timeout for detection if SD card has been removed?

Shouldn't we use polling on the card detect IO pin instead? We could add such polling in several places in the MMC subsystem (like we do it with watchdog).

Marek, Pantelis, what do you think about this?

If you implement board_mmc_getcd(), you can check if the card is present this way instead of waiting for command to time out. The infrastructure for that is already in place. Right ?

So you suggest adding board_mmc_getcd() in several places in the mmc subsystem driver to detect removal of the SD card?

Hmmmm, I'm not sure about this one. Panto ?

...
...
It'd be cool if the MMC subsystem could pull the wp-gpios and cd-gpios from DT though :)

+1

...
...
...
Also, where did you find out there is such "cleanup" mechanism please ?

Internally we did some tests with several SD cards. We were stunned when it turned out that for some workloads it took up to 15 seconds to end write operation for small data.

The culprit is the SD Card embedded controller responsible for FTL - flash translation layer. It allows NAND memory on the card to be visible as the block device. More importantly it also takes care of wear leveling and bad block management.

Hence, we don't know when it would start housekeeping operations. We can only poll/wait until this controller finishes it work. The code as it was (with the indefinite loop) was taking this situation into account.

The 1 second timeout is apparently too short and makes using SD card non-deterministic and error prone in u-boot.

Even worse, many devices use SD card as the only storage device.

Yes, horrible.

Good that we have agreed.

Heh :)

Best regards, Marek Vasut

-- Best regards, Lukasz Majewski Samsung R&D Institute Poland (SRPOL) | Linux Platform Group

Marek Vasut

1:33 p.m.

On Tuesday, September 01, 2015 at 01:19:09 PM, Lukasz Majewski wrote:

...

Hi Marek,

Hi!

...

...
On Saturday, August 29, 2015 at 06:38:48 PM, Lukasz Majewski wrote:

...
Hi Marek,

Hi Lukasz,

...
...
On Saturday, August 29, 2015 at 01:55:36 PM, Lukasz Majewski

wrote:

...
On Fri, 28 Aug 2015 23:55:17 +0200

Hi!

...
Marek Vasut marex@denx.de wrote:

...
On Friday, August 28, 2015 at 03:50:20 PM, Lukasz Majewski

wrote: > The commit: d9dbb97be0e4a550457aec5f11afefb446169c90 > "mmc: dw_mmc: Zap endless timeout" removed endless loop > waiting for end of dw mmc transfer. > > For some workloads - dfu test @ Odroid XU3 (sending 8MiB > file) - and SD cards (e.g. MicroSD Kingston 4GiB, Adata > 4GiB) the default timeout is to short. > > The new value - 20 seconds - takes into account the > situation when SD card triggers internal clean up. Such > process may take more than 10 seconds on some cards.

What happens if you pull the SD card out of the slot during such a process?

Then we would wait 20 seconds :-) to proceed.

Oops, I think I was not clear here. I was wondering what happens to the card if you yank it out of the slot whole it's performing it's internal cleanup or whatever. Is it possible that the card suffers data corruption, effectively trashing user data ?

I think that only the card manufacturer may reveal what can happen with the card (what policy have been implemented in their FW).

I guess that you may lost data in such case.

Uuuurgh, that's seriously shitty.

...
...
Is this behavior specific to Samsung SD cards ?

I've experienced the problem with Kingston (brand new one) and Adata MicroSD HC (4GiB) cards.

I had bad previous experience with Kingston too.

...
...
...
To be clear - the mentioned patch introduced regression.

That's a bug, not a regression, but anyway, that's not the point. I do agree with you that we do have a problem and I'm inclined to Ack this patch, I'd like to understand what the real implications of such a behavior of these cards are.

...
It works for small files on a commonly available SD cards (like 4 GiB Kingston/Adata).

When I ran DFU tests I've discovered that there is a problem with storing 8MiB file (dat_8M.img).

Even worse - when one wants to store Image.itb file (which might be 4-6 MiB) it sometimes works and sometimes not. Nightmare for debugging.

Ew, that's one crappy card you have there. I'm reading the SD card "Physical Layer Simplified Specification Version 4.10" (part1_410.pdf) section 4.6.2.2 and it states that for SDHC cards, the write operation should take at most 250mS, for SDXC it's 500mS. Could it be that your card is violating the spec ?

The "timeout" error is for situation when you issue write command (either single or multiple block) and you don't receive any response from the card.

In our case we use multiblock transfer (CMD25) with either set number of block to transfer (CMD23) or explicit end of transmission (CMD12).

Let's consider the second case.

We setup data and issue CMD25. Then we check the CMD25 status (if we don't receive reply in 250 ms we would get timeout error). Afterwards data from buffer is transmitted to the card and flashed. This operation may take long time. During this process you can issue CMD13 (SEND_STATUS) to receive information about card state ([*] 4.10. - page 85).

Maybe we should implement such polling through CMD13 then ? But, this should be a matter for next MW, this MW we should go with increasing the timeout to some 20 or 30 second. What do you think ?

...

Two notable fields of [*] to check are READY_FOR_DATA and CURRENT_STATE. Those two state what is the SD Card controller situation.

Then, you end the transfer with CMD12, which also provides some status information from the SDCard (like "prg"|"rcv", etc).

If you want to issue next command you must check READY_FOR_DATA and CURRENT_STATE. In the case of internal SD card controller operation you would not get READY_FOR_DATA until it ends.

...
...
I'll look into the spec and then comment :-).

For my boards the time was 1.2 seconds for storing 8 MiB file.

OK, but that's weird. The transfer should always be up to 512B long and not any longer, unless it's a multiblock transfer.

It is a multi block transfer.

[...]

Lukasz Majewski

5:25 p.m.

Hi Marek,

...

On Tuesday, September 01, 2015 at 01:19:09 PM, Lukasz Majewski wrote:

...
Hi Marek,

Hi!

...
...
On Saturday, August 29, 2015 at 06:38:48 PM, Lukasz Majewski wrote:

...
Hi Marek,

Hi Lukasz,

...
...
On Saturday, August 29, 2015 at 01:55:36 PM, Lukasz Majewski

wrote:

...
On Fri, 28 Aug 2015 23:55:17 +0200

Hi!

...
Marek Vasut marex@denx.de wrote: > On Friday, August 28, 2015 at 03:50:20 PM, Lukasz Majewski > > wrote: > > The commit: d9dbb97be0e4a550457aec5f11afefb446169c90 > > "mmc: dw_mmc: Zap endless timeout" removed endless loop > > waiting for end of dw mmc transfer. > > > > For some workloads - dfu test @ Odroid XU3 (sending 8MiB > > file) - and SD cards (e.g. MicroSD Kingston 4GiB, Adata > > 4GiB) the default timeout is to short. > > > > The new value - 20 seconds - takes into account the > > situation when SD card triggers internal clean up. Such > > process may take more than 10 seconds on some cards. > > What happens if you pull the SD card out of the slot > during such a process?

Then we would wait 20 seconds :-) to proceed.

Oops, I think I was not clear here. I was wondering what happens to the card if you yank it out of the slot whole it's performing it's internal cleanup or whatever. Is it possible that the card suffers data corruption, effectively trashing user data ?

I think that only the card manufacturer may reveal what can happen with the card (what policy have been implemented in their FW).

I guess that you may lost data in such case.

Uuuurgh, that's seriously shitty.

...
...
Is this behavior specific to Samsung SD cards ?

I've experienced the problem with Kingston (brand new one) and Adata MicroSD HC (4GiB) cards.

I had bad previous experience with Kingston too.

...
...
...
To be clear - the mentioned patch introduced regression.

That's a bug, not a regression, but anyway, that's not the point. I do agree with you that we do have a problem and I'm inclined to Ack this patch, I'd like to understand what the real implications of such a behavior of these cards are.

...
It works for small files on a commonly available SD cards (like 4 GiB Kingston/Adata).

When I ran DFU tests I've discovered that there is a problem with storing 8MiB file (dat_8M.img).

Even worse - when one wants to store Image.itb file (which might be 4-6 MiB) it sometimes works and sometimes not. Nightmare for debugging.

Ew, that's one crappy card you have there. I'm reading the SD card "Physical Layer Simplified Specification Version 4.10" (part1_410.pdf) section 4.6.2.2 and it states that for SDHC cards, the write operation should take at most 250mS, for SDXC it's 500mS. Could it be that your card is violating the spec ?

The "timeout" error is for situation when you issue write command (either single or multiple block) and you don't receive any response from the card.

In our case we use multiblock transfer (CMD25) with either set number of block to transfer (CMD23) or explicit end of transmission (CMD12).

Let's consider the second case.

We setup data and issue CMD25. Then we check the CMD25 status (if we don't receive reply in 250 ms we would get timeout error). Afterwards data from buffer is transmitted to the card and flashed. This operation may take long time. During this process you can issue CMD13 (SEND_STATUS) to receive information about card state ([*] 4.10. - page 85).

Maybe we should implement such polling through CMD13 then ? But, this should be a matter for next MW, this MW we should go with increasing the timeout to some 20 or 30 second. What do you think ?

For next release we can increase the timeout. Rewritting eMMC subsystem code is a challenging task and we will not finish until the next u-boot release.

...

...
Two notable fields of [*] to check are READY_FOR_DATA and CURRENT_STATE. Those two state what is the SD Card controller situation.

Then, you end the transfer with CMD12, which also provides some status information from the SDCard (like "prg"|"rcv", etc).

If you want to issue next command you must check READY_FOR_DATA and CURRENT_STATE. In the case of internal SD card controller operation you would not get READY_FOR_DATA until it ends.

...
...
I'll look into the spec and then comment :-).

For my boards the time was 1.2 seconds for storing 8 MiB file.

OK, but that's weird. The transfer should always be up to 512B long and not any longer, unless it's a multiblock transfer.

It is a multi block transfer.

[...]

-- Best regards, Lukasz Majewski Samsung R&D Institute Poland (SRPOL) | Linux Platform Group

Marek Vasut

5:35 p.m.

On Tuesday, September 01, 2015 at 05:25:00 PM, Lukasz Majewski wrote:

...

Hi Marek,

...
On Tuesday, September 01, 2015 at 01:19:09 PM, Lukasz Majewski wrote:

...
Hi Marek,

Hi!

...
...
On Saturday, August 29, 2015 at 06:38:48 PM, Lukasz Majewski

wrote:

...
Hi Marek,

Hi Lukasz,

...
...
On Saturday, August 29, 2015 at 01:55:36 PM, Lukasz Majewski

wrote: > On Fri, 28 Aug 2015 23:55:17 +0200

Hi!

> Marek Vasut marex@denx.de wrote: > > On Friday, August 28, 2015 at 03:50:20 PM, Lukasz Majewski > > > > wrote: > > > The commit: d9dbb97be0e4a550457aec5f11afefb446169c90 > > > "mmc: dw_mmc: Zap endless timeout" removed endless loop > > > waiting for end of dw mmc transfer. > > > > > > For some workloads - dfu test @ Odroid XU3 (sending 8MiB > > > file) - and SD cards (e.g. MicroSD Kingston 4GiB, Adata > > > 4GiB) the default timeout is to short. > > > > > > The new value - 20 seconds - takes into account the > > > situation when SD card triggers internal clean up. Such > > > process may take more than 10 seconds on some cards. > > > > What happens if you pull the SD card out of the slot > > during such a process? > > Then we would wait 20 seconds :-) to proceed.

Oops, I think I was not clear here. I was wondering what happens to the card if you yank it out of the slot whole it's performing it's internal cleanup or whatever. Is it possible that the card suffers data corruption, effectively trashing user data ?

I think that only the card manufacturer may reveal what can happen with the card (what policy have been implemented in their FW).

I guess that you may lost data in such case.

Uuuurgh, that's seriously shitty.

...
...
Is this behavior specific to Samsung SD cards ?

I've experienced the problem with Kingston (brand new one) and Adata MicroSD HC (4GiB) cards.

I had bad previous experience with Kingston too.

...
...
> To be clear - the mentioned patch introduced regression.

That's a bug, not a regression, but anyway, that's not the point. I do agree with you that we do have a problem and I'm inclined to Ack this patch, I'd like to understand what the real implications of such a behavior of these cards are.

> It works for > small files on a commonly available SD cards (like 4 GiB > Kingston/Adata). > > When I ran DFU tests I've discovered that there is a problem > with storing 8MiB file (dat_8M.img). > > Even worse - when one wants to store Image.itb file (which > might be 4-6 MiB) it sometimes works and sometimes not. > Nightmare for debugging.

Ew, that's one crappy card you have there. I'm reading the SD card "Physical Layer Simplified Specification Version 4.10" (part1_410.pdf) section 4.6.2.2 and it states that for SDHC cards, the write operation should take at most 250mS, for SDXC it's 500mS. Could it be that your card is violating the spec ?

The "timeout" error is for situation when you issue write command (either single or multiple block) and you don't receive any response from the card.

In our case we use multiblock transfer (CMD25) with either set number of block to transfer (CMD23) or explicit end of transmission (CMD12).

Let's consider the second case.

We setup data and issue CMD25. Then we check the CMD25 status (if we don't receive reply in 250 ms we would get timeout error). Afterwards data from buffer is transmitted to the card and flashed. This operation may take long time. During this process you can issue CMD13 (SEND_STATUS) to receive information about card state ([*] 4.10. - page 85).

Maybe we should implement such polling through CMD13 then ? But, this should be a matter for next MW, this MW we should go with increasing the timeout to some 20 or 30 second. What do you think ?

For next release we can increase the timeout. Rewritting eMMC subsystem code is a challenging task and we will not finish until the next u-boot release.

Yeah, we're in agreement.

Pantelis Antoniou

6:22 p.m.

Hi Marek,

...

On Aug 29, 2015, at 22:19 , Marek Vasut marex@denx.de wrote:

On Saturday, August 29, 2015 at 06:38:48 PM, Lukasz Majewski wrote:

...
Hi Marek,

Hi Lukasz,

...
...
On Saturday, August 29, 2015 at 01:55:36 PM, Lukasz Majewski wrote:

...
On Fri, 28 Aug 2015 23:55:17 +0200

Hi!

...
Marek Vasut marex@denx.de wrote:

...
On Friday, August 28, 2015 at 03:50:20 PM, Lukasz Majewski wrote:

...
The commit: d9dbb97be0e4a550457aec5f11afefb446169c90 "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting for end of dw mmc transfer.

For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) - and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB) the default timeout is to short.

The new value - 20 seconds - takes into account the situation when SD card triggers internal clean up. Such process may take more than 10 seconds on some cards.

What happens if you pull the SD card out of the slot during such a process?

Then we would wait 20 seconds :-) to proceed.

Oops, I think I was not clear here. I was wondering what happens to the card if you yank it out of the slot whole it's performing it's internal cleanup or whatever. Is it possible that the card suffers data corruption, effectively trashing user data ?

I think that only the card manufacturer may reveal what can happen with the card (what policy have been implemented in their FW).

I guess that you may lost data in such case.

Uuuurgh, that's seriously shitty.

Welcome to the world of managed flash. Manufacturers tend not to follow the spec to the letter. What matters is standard consumer usage benchmarks.

...

...
...
Is this behavior specific to Samsung SD cards ?

I've experienced the problem with Kingston (brand new one) and Adata MicroSD HC (4GiB) cards.

I had bad previous experience with Kingston too.

...
...
...
To be clear - the mentioned patch introduced regression.

That's a bug, not a regression, but anyway, that's not the point. I do agree with you that we do have a problem and I'm inclined to Ack this patch, I'd like to understand what the real implications of such a behavior of these cards are.

...
It works for small files on a commonly available SD cards (like 4 GiB Kingston/Adata).

When I ran DFU tests I've discovered that there is a problem with storing 8MiB file (dat_8M.img).

Even worse - when one wants to store Image.itb file (which might be 4-6 MiB) it sometimes works and sometimes not. Nightmare for debugging.

Ew, that's one crappy card you have there. I'm reading the SD card "Physical Layer Simplified Specification Version 4.10" (part1_410.pdf) section 4.6.2.2 and it states that for SDHC cards, the write operation should take at most 250mS, for SDXC it's 500mS. Could it be that your card is violating the spec ?

I'll look into the spec and then comment :-).

For my boards the time was 1.2 seconds for storing 8 MiB file.

OK, but that's weird. The transfer should always be up to 512B long and not any longer, unless it's a multiblock transfer.

...
...
...
Please correct me if I'm wrong - but is seems like we are now using 1 second timeout for detection if SD card has been removed?

Shouldn't we use polling on the card detect IO pin instead? We could add such polling in several places in the MMC subsystem (like we do it with watchdog).

Marek, Pantelis, what do you think about this?

If you implement board_mmc_getcd(), you can check if the card is present this way instead of waiting for command to time out. The infrastructure for that is already in place. Right ?

So you suggest adding board_mmc_getcd() in several places in the mmc subsystem driver to detect removal of the SD card?

Hmmmm, I'm not sure about this one. Panto ?

I’m fine with this. Perhaps we can avoid spamming this all over the place, and that would be better.

...

...
...
It'd be cool if the MMC subsystem could pull the wp-gpios and cd-gpios from DT though :)

+1

Indeed. I would expect this to be a per-board thing. I would not expect all platforms to provide that capability.

...

...
...
...
...
Also, where did you find out there is such "cleanup" mechanism please ?

Internally we did some tests with several SD cards. We were stunned when it turned out that for some workloads it took up to 15 seconds to end write operation for small data.

The culprit is the SD Card embedded controller responsible for FTL - flash translation layer. It allows NAND memory on the card to be visible as the block device. More importantly it also takes care of wear leveling and bad block management.

Hence, we don't know when it would start housekeeping operations. We can only poll/wait until this controller finishes it work. The code as it was (with the indefinite loop) was taking this situation into account.

The 1 second timeout is apparently too short and makes using SD card non-deterministic and error prone in u-boot.

Even worse, many devices use SD card as the only storage device.

Yes, horrible.

Good that we have agreed.

Heh :)

Horrible closed source flash management algorithm is horrible, film at 11 :)

...

Best regards, Marek Vasut

Regards

— Pantelis

Marek Vasut

2 Sep 2 Sep

10:06 a.m.

On Tuesday, September 01, 2015 at 06:22:08 PM, Pantelis Antoniou wrote:

...

Hi Marek,

Hi!

[...]

...

...
...
So you suggest adding board_mmc_getcd() in several places in the mmc subsystem driver to detect removal of the SD card?

Hmmmm, I'm not sure about this one. Panto ?

I’m fine with this. Perhaps we can avoid spamming this all over the place, and that would be better.

...
...
...
It'd be cool if the MMC subsystem could pull the wp-gpios and cd-gpios from DT though :)

+1

Indeed. I would expect this to be a per-board thing. I would not expect all platforms to provide that capability.

That's right.

...

...
...
...
...
...
Also, where did you find out there is such "cleanup" mechanism please ?

Internally we did some tests with several SD cards. We were stunned when it turned out that for some workloads it took up to 15 seconds to end write operation for small data.

The culprit is the SD Card embedded controller responsible for FTL - flash translation layer. It allows NAND memory on the card to be visible as the block device. More importantly it also takes care of wear leveling and bad block management.

Hence, we don't know when it would start housekeeping operations. We can only poll/wait until this controller finishes it work. The code as it was (with the indefinite loop) was taking this situation into account.

The 1 second timeout is apparently too short and makes using SD card non-deterministic and error prone in u-boot.

Even worse, many devices use SD card as the only storage device.

Yes, horrible.

Good that we have agreed.

Heh :)

Horrible closed source flash management algorithm is horrible, film at 11 :)

It was some seriously repugnant flick ...

Lukasz Majewski

9 Sep 9 Sep

9:01 a.m.

Hi,

...

The commit: d9dbb97be0e4a550457aec5f11afefb446169c90 "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting for end of dw mmc transfer.

For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) - and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB) the default timeout is to short.

The new value - 20 seconds - takes into account the situation when SD card triggers internal clean up. Such process may take more than 10 seconds on some cards.

Signed-off-by: Lukasz Majewski l.majewski@samsung.com Cc: Marek Vasut marex@denx.de Cc: Pantelis Antoniou panto@antoniou-consulting.com Cc: Tom Rini trini@konsulko.com

Are there any more questions regarding this patch or is it ready for submission as fix for v2015.10?

...

drivers/mmc/dw_mmc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mmc/dw_mmc.c b/drivers/mmc/dw_mmc.c index 77b87e0..21a92d2 100644 --- a/drivers/mmc/dw_mmc.c +++ b/drivers/mmc/dw_mmc.c @@ -213,7 +213,7 @@ static int dwmci_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, if (data) { start = get_timer(0);
timeout = 1000;
timeout = 20000;
for (;;) { mask = dwmci_readl(host, DWMCI_RINTSTS); /* Error during data transfer. */

-- Best regards, Lukasz Majewski Samsung R&D Institute Poland (SRPOL) | Linux Platform Group

Marek Vasut

1:34 p.m.

On Wednesday, September 09, 2015 at 09:01:30 AM, Lukasz Majewski wrote:

...

Hi,

...
The commit: d9dbb97be0e4a550457aec5f11afefb446169c90 "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting for end of dw mmc transfer.

For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) - and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB) the default timeout is to short.

The new value - 20 seconds - takes into account the situation when SD card triggers internal clean up. Such process may take more than 10 seconds on some cards.

Signed-off-by: Lukasz Majewski l.majewski@samsung.com Cc: Marek Vasut marex@denx.de Cc: Pantelis Antoniou panto@antoniou-consulting.com Cc: Tom Rini trini@konsulko.com

Are there any more questions regarding this patch or is it ready for submission as fix for v2015.10?

No comments, just apply this.

But this should really be fixed properly in the next MW.

Best regards, Marek Vasut

Alexey Brodkin

11 Sep 11 Sep

7:20 p.m.

Hi Marek, Lukasz,

...

On Wednesday, September 09, 2015 at 09:01:30 AM, Lukasz Majewski wrote:

...
Hi,

...
The commit: d9dbb97be0e4a550457aec5f11afefb446169c90 "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting for end of dw mmc transfer.

For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) - and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB) the default timeout is to short.

The new value - 20 seconds - takes into account the situation when SD card triggers internal clean up. Such process may take more than 10 seconds on some cards.

Signed-off-by: Lukasz Majewski l.majewski@samsung.com Cc: Marek Vasut marex@denx.de Cc: Pantelis Antoniou panto@antoniou-consulting.com Cc: Tom Rini trini@konsulko.com

Are there any more questions regarding this patch or is it ready for submission as fix for v2015.10?

No comments, just apply this.

But this should really be fixed properly in the next MW.

Best regards, Marek Vasut

FWIW I faced similar problem even reading data. At least on one of my boards reading of ~8Mb file took ~1.7 seconds and so 1 second timeout was interrupting data exchange.

So indeed we need to have some dirty hack for upcoming release like bumping timeout to something really huge but later we need to fix that problem properly.

I though proper solution would be to set timeout depending of amount of data to be exchanged... something like take slowest speed of SD/MMC that is allowed by spec and calculate delay based on how much time it might take for that slow device and for safety multiply it by say 2.

Now from this thread I see that there're other reasons that might affect length of at least write operation. In other words it could be complicated unfortunately.

Still we need to fix regression first with virtually infinite timeout :) I would even thing that simple revert of Marek's patch may make sense for now. From both points of view for keeping history clean (compared to stacked fixes/workarounds) and from removal of regression root cause.

It's not that I like to have infinite loops but given previous implementation worked fine for people in the previous U-Boot release.

-Alexey

Lukasz Majewski

11:45 p.m.

Hi Alexey,

...

Hi Marek, Lukasz,

...
On Wednesday, September 09, 2015 at 09:01:30 AM, Lukasz Majewski wrote:

...
Hi,

...
The commit: d9dbb97be0e4a550457aec5f11afefb446169c90 "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting for end of dw mmc transfer.

For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) - and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB) the default timeout is to short.

The new value - 20 seconds - takes into account the situation when SD card triggers internal clean up. Such process may take more than 10 seconds on some cards.

Signed-off-by: Lukasz Majewski l.majewski@samsung.com Cc: Marek Vasut marex@denx.de Cc: Pantelis Antoniou panto@antoniou-consulting.com Cc: Tom Rini trini@konsulko.com

Are there any more questions regarding this patch or is it ready for submission as fix for v2015.10?

No comments, just apply this.

But this should really be fixed properly in the next MW.

Best regards, Marek Vasut

FWIW I faced similar problem even reading data. At least on one of my boards reading of ~8Mb file took ~1.7 seconds and so 1 second timeout was interrupting data exchange.

Was it SD card or eMMC device?

...

So indeed we need to have some dirty hack for upcoming release like bumping timeout to something really huge but later we need to fix that problem properly.

I though proper solution would be to set timeout depending of amount of data to be exchanged... something like take slowest speed of SD/MMC that is allowed by spec and calculate delay based on how much time it might take for that slow device and for safety multiply it by say 2.

As fair as I remember, card provide this kind of information. We can try to investigate this possibility.

...

Now from this thread I see that there're other reasons that might affect length of at least write operation. In other words it could be complicated unfortunately.

My gut feeling is that proper handling of eMMC would require quite a fair mmc subsystem rework.

...

Still we need to fix regression first with virtually infinite timeout :) I would even thing that simple revert of Marek's patch may make sense for now.

+1 - unfortunately there were some other patches applied to this particular patch. Simple revert might be a bit tricky here.

...

From both points of view for keeping history clean (compared to stacked fixes/workarounds) and from removal of regression root cause.

As I said before - +1 from me.

...

It's not that I like to have infinite loops but given previous implementation worked fine for people in the previous U-Boot release.

Good justification

Best regards, Lukasz Majewski

...

-Alexey _______________________________________________ U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot

Marek Vasut

12 Sep 12 Sep

6:13 p.m.

On Friday, September 11, 2015 at 11:45:28 PM, Lukasz Majewski wrote:

...

Hi Alexey,

Hi,

...

...
Hi Marek, Lukasz,

...
On Wednesday, September 09, 2015 at 09:01:30 AM, Lukasz Majewski

wrote:

...
Hi,

...
The commit: d9dbb97be0e4a550457aec5f11afefb446169c90 "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting for end of dw mmc transfer.

For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) - and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB) the default timeout is to short.

The new value - 20 seconds - takes into account the situation when SD card triggers internal clean up. Such process may take more than 10 seconds on some cards.

Signed-off-by: Lukasz Majewski l.majewski@samsung.com Cc: Marek Vasut marex@denx.de Cc: Pantelis Antoniou panto@antoniou-consulting.com Cc: Tom Rini trini@konsulko.com

Are there any more questions regarding this patch or is it ready for submission as fix for v2015.10?

No comments, just apply this.

But this should really be fixed properly in the next MW.

Best regards, Marek Vasut

FWIW I faced similar problem even reading data. At least on one of my boards reading of ~8Mb file took ~1.7 seconds and so 1 second timeout was interrupting data exchange.

Was it SD card or eMMC device?

...
So indeed we need to have some dirty hack for upcoming release like bumping timeout to something really huge but later we need to fix that problem properly.

I though proper solution would be to set timeout depending of amount of data to be exchanged... something like take slowest speed of SD/MMC that is allowed by spec and calculate delay based on how much time it might take for that slow device and for safety multiply it by say 2.

As fair as I remember, card provide this kind of information. We can try to investigate this possibility.

Please do.

...

...
Now from this thread I see that there're other reasons that might affect length of at least write operation. In other words it could be complicated unfortunately.

My gut feeling is that proper handling of eMMC would require quite a fair mmc subsystem rework.

Can you please elaborate ?

...

...
Still we need to fix regression first with virtually infinite timeout :) I would even thing that simple revert of Marek's patch may make sense for now.

+1 - unfortunately there were some other patches applied to this particular patch. Simple revert might be a bit tricky here.

-1 - In case the card gets removed during the DMA transfer and the board doesn't have a watchdog, it will get stuck indefinitelly. We absolutelly don't want this sort of behavior in U-Boot. I understand that this is the easiest way for everyone to achieve some sort of "working" solution, but it is definitelly not the correct one. While I do agree to increasing the timeout, I do not agree to unbounded loops, sorry.

...

...
From both points of view for keeping history clean (compared to stacked fixes/workarounds) and from removal of regression root cause.

As I said before - +1 from me.

As I said before, -1 from me. Btw. did anything regress in here? To me, this seems like a newly discovered bug ...

...

...
It's not that I like to have infinite loops but given previous implementation worked fine for people in the previous U-Boot release.

Good justification

It is never a justified to return to a potentially problematic version for the sake of getting some sort of crappy hardware operational.

Best regards, Marek Vasut

Lukasz Majewski

13 Sep 13 Sep

12:03 p.m.

Hi Marek,

...

On Friday, September 11, 2015 at 11:45:28 PM, Lukasz Majewski wrote:

...
Hi Alexey,

Hi,

...
...
Hi Marek, Lukasz,

...
On Wednesday, September 09, 2015 at 09:01:30 AM, Lukasz Majewski

wrote:

...
Hi,

...
The commit: d9dbb97be0e4a550457aec5f11afefb446169c90 "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting for end of dw mmc transfer.

For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) - and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB) the default timeout is to short.

The new value - 20 seconds - takes into account the situation when SD card triggers internal clean up. Such process may take more than 10 seconds on some cards.

Signed-off-by: Lukasz Majewski l.majewski@samsung.com Cc: Marek Vasut marex@denx.de Cc: Pantelis Antoniou panto@antoniou-consulting.com Cc: Tom Rini trini@konsulko.com

Are there any more questions regarding this patch or is it ready for submission as fix for v2015.10?

No comments, just apply this.

But this should really be fixed properly in the next MW.

Best regards, Marek Vasut

FWIW I faced similar problem even reading data. At least on one of my boards reading of ~8Mb file took ~1.7 seconds and so 1 second timeout was interrupting data exchange.

Was it SD card or eMMC device?

...
So indeed we need to have some dirty hack for upcoming release like bumping timeout to something really huge but later we need to fix that problem properly.

I though proper solution would be to set timeout depending of amount of data to be exchanged... something like take slowest speed of SD/MMC that is allowed by spec and calculate delay based on how much time it might take for that slow device and for safety multiply it by say 2.

As fair as I remember, card provide this kind of information. We can try to investigate this possibility.

Please do.

...
...
Now from this thread I see that there're other reasons that might affect length of at least write operation. In other words it could be complicated unfortunately.

My gut feeling is that proper handling of eMMC would require quite a fair mmc subsystem rework.

Can you please elaborate ?

This is only my personal guess (when taking account my previous work done with dw_mmc on Exynos HW) - I think Pantelis would have more knowledge to elaborate here.

...

...
...
Still we need to fix regression first with virtually infinite timeout :) I would even thing that simple revert of Marek's patch may make sense for now.

+1 - unfortunately there were some other patches applied to this particular patch. Simple revert might be a bit tricky here.

-1 - In case the card gets removed during the DMA transfer and the board doesn't have a watchdog, it will get stuck indefinitelly.

I'm just wondering here - why the indefinite loop was working previously? Was anybody complaining (on the ML) about the problem of removing the SD card when some operation is ongoing?

The problem with potential removal of SD card (after booting the board) is with us for quite long time. Even with indefinite loop (without your patch) we also could "hang" the board if the SD card was removed during a transfer.

...

We absolutelly don't want this sort of behavior in U-Boot. I understand that this is the easiest way for everyone to achieve some sort of "working" solution, but it is definitelly not the correct one. While I do agree to increasing the timeout, I do not agree to unbounded loops, sorry.

We have agreed to not agree :-)

...

...
...
From both points of view for keeping history clean (compared to stacked fixes/workarounds) and from removal of regression root cause.

As I said before - +1 from me.

As I said before, -1 from me. Btw. did anything regress in here? To me, this seems like a newly discovered bug ...

Yes, this is a bug. We had similar problem with Samsung's SDHCI, before we switched to dw_mmc. This issue is new at dw_mmc.

...

...
...
It's not that I like to have infinite loops but given previous implementation worked fine for people in the previous U-Boot release.

Good justification

It is never a justified to return to a potentially problematic version

IMHO revering the change (before the release) is from the software development point of view better solution than adding some heuristic delta to timeout.

...

for the sake of getting some sort of crappy hardware operational.

Unfortunately this "crappy hardware" is pervasive and we cannot do anything about it.

To sum up (my point of view): 1. The best would be to revert the patch - but if simple "git revert" is not working then, 2. We should increase the timeout (with my patch) for v2015.10 release 3. After release we can devise some kind of solution 4. It is a good topic for U-boot's minisummit discussion at Dublin

Marek, Alexey, Tom, Pantelis what do you think?

...

Best regards, Marek Vasut

Best regards, Lukasz Majewski

Marek Vasut

4 p.m.

On Sunday, September 13, 2015 at 12:03:18 PM, Lukasz Majewski wrote:

...

Hi Marek,

Hi,

[...]

...

...
...
...
Now from this thread I see that there're other reasons that might affect length of at least write operation. In other words it could be complicated unfortunately.

My gut feeling is that proper handling of eMMC would require quite a fair mmc subsystem rework.

Can you please elaborate ?

This is only my personal guess (when taking account my previous work done with dw_mmc on Exynos HW)

I think Pantelis would have more knowledge to elaborate here.

...

...
...
...
Still we need to fix regression first with virtually infinite timeout :) I would even thing that simple revert of Marek's patch may make sense for now.

+1 - unfortunately there were some other patches applied to this particular patch. Simple revert might be a bit tricky here.

-1 - In case the card gets removed during the DMA transfer and the board doesn't have a watchdog, it will get stuck indefinitelly.

I'm just wondering here - why the indefinite loop was working previously? Was anybody complaining (on the ML) about the problem of removing the SD card when some operation is ongoing?

It worked for me for all the workloads I used. Noone was complaining.

...

The problem with potential removal of SD card (after booting the board) is with us for quite long time. Even with indefinite loop (without your patch) we also could "hang" the board if the SD card was removed during a transfer.

Which is why we should weed out the unbounded loops.

...

...
We absolutelly don't want this sort of behavior in U-Boot. I understand that this is the easiest way for everyone to achieve some sort of "working" solution, but it is definitelly not the correct one. While I do agree to increasing the timeout, I do not agree to unbounded loops, sorry.

We have agreed to not agree :-)

Yes :-)

...

...
...
...
From both points of view for keeping history clean (compared to stacked fixes/workarounds) and from removal of regression root cause.

As I said before - +1 from me.

As I said before, -1 from me. Btw. did anything regress in here? To me, this seems like a newly discovered bug ...

Yes, this is a bug. We had similar problem with Samsung's SDHCI, before we switched to dw_mmc. This issue is new at dw_mmc.

...
...
...
It's not that I like to have infinite loops but given previous implementation worked fine for people in the previous U-Boot release.

Good justification

It is never a justified to return to a potentially problematic version

IMHO revering the change (before the release) is from the software development point of view better solution than adding some heuristic delta to timeout.

...
for the sake of getting some sort of crappy hardware operational.

Unfortunately this "crappy hardware" is pervasive and we cannot do anything about it.

To sum up (my point of view):

The best would be to revert the patch - but if simple "git revert" is

not working then, 2. We should increase the timeout (with my patch) for v2015.10 release

Let's do this for the sake of crappy cards.

...

After release we can devise some kind of solution

It is a good topic for U-boot's minisummit discussion at Dublin

Marek, Alexey, Tom, Pantelis what do you think?

I think yes.

Alexey Brodkin

14 Sep 14 Sep

12:15 p.m.

Hi Marek, Lukasz,

On Sun, 2015-09-13 at 16:00 +0200, Marek Vasut wrote:

...

On Sunday, September 13, 2015 at 12:03:18 PM, Lukasz Majewski wrote:

...
Hi Marek,

Hi,

[...]

...

...
...
...
...
Still we need to fix regression first with virtually infinite timeout :) I would even thing that simple revert of Marek's patch may make sense for now.

+1 - unfortunately there were some other patches applied to this particular patch. Simple revert might be a bit tricky here.

-1 - In case the card gets removed during the DMA transfer and the board doesn't have a watchdog, it will get stuck indefinitelly.

I'm just wondering here - why the indefinite loop was working previously? Was anybody complaining (on the ML) about the problem of removing the SD card when some operation is ongoing?

It worked for me for all the workloads I used. Noone was complaining.

The same story here - previous code with infinite loop was working for my boards. And now I do see a problem with pretty simple scenario that we do use in our products.

...

...
The problem with potential removal of SD card (after booting the board) is with us for quite long time. Even with indefinite loop (without your patch) we also could "hang" the board if the SD card was removed during a transfer.

Which is why we should weed out the unbounded loops.

...
...
We absolutelly don't want this sort of behavior in U-Boot. I understand that this is the easiest way for everyone to achieve some sort of "working" solution, but it is definitelly not the correct one. While I do agree to increasing the timeout, I do not agree to unbounded loops, sorry.

We have agreed to not agree :-)

Yes :-)

The first thing I care is working U-Boot v2015.10 out of the box on my boards. And so I may agree on any temporary solution. I see it as timeout value either being infinite or obviously very high like 60 seconds.

60 seconds might sound stupid but my thought behind this is to make sure even long transfers succeed. Imagine 100 Mb rootfs or update file downloaded from slow SD-card.

...

...
...
...
...
From both points of view for keeping history clean (compared to stacked fixes/workarounds) and from removal of regression root cause.

As I said before - +1 from me.

As I said before, -1 from me. Btw. did anything regress in here? To me, this seems like a newly discovered bug ...

Yes, this is a bug. We had similar problem with Samsung's SDHCI, before we switched to dw_mmc. This issue is new at dw_mmc.

...
...
...
It's not that I like to have infinite loops but given previous implementation worked fine for people in the previous U-Boot release.

Good justification

It is never a justified to return to a potentially problematic version

IMHO revering the change (before the release) is from the software development point of view better solution than adding some heuristic delta to timeout.

...
for the sake of getting some sort of crappy hardware operational.

Unfortunately this "crappy hardware" is pervasive and we cannot do anything about it.

To sum up (my point of view):

The best would be to revert the patch - but if simple "git revert" is

not working then,

Well even if clean revert won't work we may do manual tweaks so that functionally it is "revert". If of any interest I may come up with that sort of patch.

...

...

We should increase the timeout (with my patch) for v2015.10 release

If everybody is OK with that let's go do it. Because release is around the corner and I don't want to explain each and every user how to fix their new problem.

...

Let's do this for the sake of crappy cards.

...

After release we can devise some kind of solution

It is a good topic for U-boot's minisummit discussion at Dublin

Marek, Alexey, Tom, Pantelis what do you think?

I think yes.

What's important we need to make sure Tom is aware of this situation and he won't cut a release until our fix is in place and all involved parties reported their happiness.

-Alexey

Lukasz Majewski

1:22 p.m.

Hi Alexey,

...

Hi Marek, Lukasz,

On Sun, 2015-09-13 at 16:00 +0200, Marek Vasut wrote:

...
On Sunday, September 13, 2015 at 12:03:18 PM, Lukasz Majewski wrote:

...
Hi Marek,

Hi,

[...]

...
...
...
...
...
Still we need to fix regression first with virtually infinite timeout :) I would even thing that simple revert of Marek's patch may make sense for now.

+1 - unfortunately there were some other patches applied to this particular patch. Simple revert might be a bit tricky here.

-1 - In case the card gets removed during the DMA transfer and the board doesn't have a watchdog, it will get stuck indefinitelly.

I'm just wondering here - why the indefinite loop was working previously? Was anybody complaining (on the ML) about the problem of removing the SD card when some operation is ongoing?

It worked for me for all the workloads I used. Noone was complaining.

The same story here - previous code with infinite loop was working for my boards. And now I do see a problem with pretty simple scenario that we do use in our products.

...
...
The problem with potential removal of SD card (after booting the board) is with us for quite long time. Even with indefinite loop (without your patch) we also could "hang" the board if the SD card was removed during a transfer.

Which is why we should weed out the unbounded loops.

...
...
We absolutelly don't want this sort of behavior in U-Boot. I understand that this is the easiest way for everyone to achieve some sort of "working" solution, but it is definitelly not the correct one. While I do agree to increasing the timeout, I do not agree to unbounded loops, sorry.

We have agreed to not agree :-)

Yes :-)

The first thing I care is working U-Boot v2015.10 out of the box on my boards. And so I may agree on any temporary solution. I see it as timeout value either being infinite or obviously very high like 60 seconds.

60 seconds might sound stupid but my thought behind this is to make sure even long transfers succeed. Imagine 100 Mb rootfs or update file downloaded from slow SD-card.

Transfer of rootfs to SD-card (downloaded to memory via tftp) is definitely valid scenario.

...

...
...
...
...
...
From both points of view for keeping history clean (compared to stacked fixes/workarounds) and from removal of regression root cause.

As I said before - +1 from me.

As I said before, -1 from me. Btw. did anything regress in here? To me, this seems like a newly discovered bug ...

Yes, this is a bug. We had similar problem with Samsung's SDHCI, before we switched to dw_mmc. This issue is new at dw_mmc.

...
...
...
It's not that I like to have infinite loops but given previous implementation worked fine for people in the previous U-Boot release.

Good justification

It is never a justified to return to a potentially problematic version

IMHO revering the change (before the release) is from the software development point of view better solution than adding some heuristic delta to timeout.

...
for the sake of getting some sort of crappy hardware operational.

Unfortunately this "crappy hardware" is pervasive and we cannot do anything about it.

To sum up (my point of view):

The best would be to revert the patch - but if simple "git

revert" is not working then,

Well even if clean revert won't work we may do manual tweaks so that functionally it is "revert". If of any interest I may come up with that sort of patch.

...
...

We should increase the timeout (with my patch) for v2015.10

release

If everybody is OK with that let's go do it. Because release is around the corner and I don't want to explain each and every user how to fix their new problem.

v2015.10 correctness is crucial here.

...

...
Let's do this for the sake of crappy cards.

...

After release we can devise some kind of solution

It is a good topic for U-boot's minisummit discussion at Dublin

Marek, Alexey, Tom, Pantelis what do you think?

I think yes.

What's important we need to make sure Tom is aware of this situation and he won't cut a release until our fix is in place and all involved parties reported their happiness.

I also think that Tom should speak up on this issue.

...

-Alexey

-- Best regards, Lukasz Majewski Samsung R&D Institute Poland (SRPOL) | Linux Platform Group

Marek Vasut

3:36 p.m.

On Monday, September 14, 2015 at 01:22:20 PM, Lukasz Majewski wrote:

...

Hi Alexey,

...
Hi Marek, Lukasz,

On Sun, 2015-09-13 at 16:00 +0200, Marek Vasut wrote:

...
On Sunday, September 13, 2015 at 12:03:18 PM, Lukasz Majewski wrote:

...
Hi Marek,

Hi,

[...]

...
...
...
> Still we need to fix regression first with virtually > infinite timeout :) I would even thing that simple revert > of Marek's patch may make sense for now.

+1 - unfortunately there were some other patches applied to this particular patch. Simple revert might be a bit tricky here.

-1 - In case the card gets removed during the DMA transfer and the board doesn't have a watchdog, it will get stuck indefinitelly.

I'm just wondering here - why the indefinite loop was working previously? Was anybody complaining (on the ML) about the problem of removing the SD card when some operation is ongoing?

It worked for me for all the workloads I used. Noone was complaining.

The same story here - previous code with infinite loop was working for my boards. And now I do see a problem with pretty simple scenario that we do use in our products.

...
...
The problem with potential removal of SD card (after booting the board) is with us for quite long time. Even with indefinite loop (without your patch) we also could "hang" the board if the SD card was removed during a transfer.

Which is why we should weed out the unbounded loops.

...
...
We absolutelly don't want this sort of behavior in U-Boot. I understand that this is the easiest way for everyone to achieve some sort of "working" solution, but it is definitelly not the correct one. While I do agree to increasing the timeout, I do not agree to unbounded loops, sorry.

We have agreed to not agree :-)

Yes :-)

The first thing I care is working U-Boot v2015.10 out of the box on my boards. And so I may agree on any temporary solution. I see it as timeout value either being infinite or obviously very high like 60 seconds.

60 seconds might sound stupid but my thought behind this is to make sure even long transfers succeed. Imagine 100 Mb rootfs or update file downloaded from slow SD-card.

Transfer of rootfs to SD-card (downloaded to memory via tftp) is definitely valid scenario.

...
...
...
...
...
> From both points of view for keeping history > clean (compared to stacked fixes/workarounds) and from > removal of regression root cause.

As I said before - +1 from me.

As I said before, -1 from me. Btw. did anything regress in here? To me, this seems like a newly discovered bug ...

Yes, this is a bug. We had similar problem with Samsung's SDHCI, before we switched to dw_mmc. This issue is new at dw_mmc.

...
...
> It's not that I like to have infinite loops but given > previous implementation worked fine for people in the > previous U-Boot release.

Good justification

It is never a justified to return to a potentially problematic version

IMHO revering the change (before the release) is from the software development point of view better solution than adding some heuristic delta to timeout.

...
for the sake of getting some sort of crappy hardware operational.

Unfortunately this "crappy hardware" is pervasive and we cannot do anything about it.

To sum up (my point of view):

The best would be to revert the patch - but if simple "git

revert" is not working then,

Well even if clean revert won't work we may do manual tweaks so that functionally it is "revert". If of any interest I may come up with that sort of patch.

...
...

We should increase the timeout (with my patch) for v2015.10

release

If everybody is OK with that let's go do it. Because release is around the corner and I don't want to explain each and every user how to fix their new problem.

v2015.10 correctness is crucial here.

...
...
Let's do this for the sake of crappy cards.

...

After release we can devise some kind of solution

It is a good topic for U-boot's minisummit discussion at Dublin

Marek, Alexey, Tom, Pantelis what do you think?

I think yes.

What's important we need to make sure Tom is aware of this situation and he won't cut a release until our fix is in place and all involved parties reported their happiness.

I also think that Tom should speak up on this issue.

btw. I think I know why SoCFPGA never triggered this issue, it has CONFIG_SYS_MMC_MAX_BLK_COUNT set to 256 so a large transfer was always broken to smaller chunks.

Lukasz Majewski

17 Sep 17 Sep

4:43 p.m.

Hi Tom,

...

On Monday, September 14, 2015 at 01:22:20 PM, Lukasz Majewski wrote:

...
Hi Alexey,

...
Hi Marek, Lukasz,

On Sun, 2015-09-13 at 16:00 +0200, Marek Vasut wrote:

...
On Sunday, September 13, 2015 at 12:03:18 PM, Lukasz Majewski wrote:

...
Hi Marek,

Hi,

[...]

...
...
> > Still we need to fix regression first with virtually > > infinite timeout :) I would even thing that simple > > revert of Marek's patch may make sense for now. > > +1 - unfortunately there were some other patches applied > to this particular patch. Simple revert might be a bit > tricky here.

-1 - In case the card gets removed during the DMA transfer and the board doesn't have a watchdog, it will get stuck indefinitelly.

I'm just wondering here - why the indefinite loop was working previously? Was anybody complaining (on the ML) about the problem of removing the SD card when some operation is ongoing?

It worked for me for all the workloads I used. Noone was complaining.

The same story here - previous code with infinite loop was working for my boards. And now I do see a problem with pretty simple scenario that we do use in our products.

...
...
The problem with potential removal of SD card (after booting the board) is with us for quite long time. Even with indefinite loop (without your patch) we also could "hang" the board if the SD card was removed during a transfer.

Which is why we should weed out the unbounded loops.

...
...
We absolutelly don't want this sort of behavior in U-Boot. I understand that this is the easiest way for everyone to achieve some sort of "working" solution, but it is definitelly not the correct one. While I do agree to increasing the timeout, I do not agree to unbounded loops, sorry.

We have agreed to not agree :-)

Yes :-)

The first thing I care is working U-Boot v2015.10 out of the box on my boards. And so I may agree on any temporary solution. I see it as timeout value either being infinite or obviously very high like 60 seconds.

60 seconds might sound stupid but my thought behind this is to make sure even long transfers succeed. Imagine 100 Mb rootfs or update file downloaded from slow SD-card.

Transfer of rootfs to SD-card (downloaded to memory via tftp) is definitely valid scenario.

...
...
...
...
> > From both points of view for keeping history > > clean (compared to stacked fixes/workarounds) and from > > removal of regression root cause. > > As I said before - +1 from me.

As I said before, -1 from me. Btw. did anything regress in here? To me, this seems like a newly discovered bug ...

Yes, this is a bug. We had similar problem with Samsung's SDHCI, before we switched to dw_mmc. This issue is new at dw_mmc.

...
> > It's not that I like to have infinite loops but given > > previous implementation worked fine for people in the > > previous U-Boot release. > > Good justification

It is never a justified to return to a potentially problematic version

IMHO revering the change (before the release) is from the software development point of view better solution than adding some heuristic delta to timeout.

...
for the sake of getting some sort of crappy hardware operational.

Unfortunately this "crappy hardware" is pervasive and we cannot do anything about it.

To sum up (my point of view):

The best would be to revert the patch - but if simple "git

revert" is not working then,

Well even if clean revert won't work we may do manual tweaks so that functionally it is "revert". If of any interest I may come up with that sort of patch.

...
...

We should increase the timeout (with my patch) for v2015.10

release

If everybody is OK with that let's go do it. Because release is around the corner and I don't want to explain each and every user how to fix their new problem.

v2015.10 correctness is crucial here.

...
...
Let's do this for the sake of crappy cards.

...

After release we can devise some kind of solution

It is a good topic for U-boot's minisummit discussion at

Dublin

Marek, Alexey, Tom, Pantelis what do you think?

I think yes.

What's important we need to make sure Tom is aware of this situation and he won't cut a release until our fix is in place and all involved parties reported their happiness.

...

...
I also think that Tom should speak up on this issue.

Tom, could you give your opinion on this issue?

...

btw. I think I know why SoCFPGA never triggered this issue, it has CONFIG_SYS_MMC_MAX_BLK_COUNT set to 256 so a large transfer was always broken to smaller chunks.

-- Best regards, Lukasz Majewski Samsung R&D Institute Poland (SRPOL) | Linux Platform Group

Tom Rini

18 Sep 18 Sep

2:31 a.m.

On Thu, Sep 17, 2015 at 04:43:33PM +0200, Lukasz Majewski wrote:

...

Hi Tom,

...
On Monday, September 14, 2015 at 01:22:20 PM, Lukasz Majewski wrote:

...
Hi Alexey,

...
Hi Marek, Lukasz,

On Sun, 2015-09-13 at 16:00 +0200, Marek Vasut wrote:

...
On Sunday, September 13, 2015 at 12:03:18 PM, Lukasz Majewski wrote:

...
Hi Marek,

Hi,

[...]

...
> > > Still we need to fix regression first with virtually > > > infinite timeout :) I would even thing that simple > > > revert of Marek's patch may make sense for now. > > > > +1 - unfortunately there were some other patches applied > > to this particular patch. Simple revert might be a bit > > tricky here. > > -1 - In case the card gets removed during the DMA transfer > and the board doesn't have a watchdog, it will get stuck > indefinitelly.

I'm just wondering here - why the indefinite loop was working previously? Was anybody complaining (on the ML) about the problem of removing the SD card when some operation is ongoing?

It worked for me for all the workloads I used. Noone was complaining.

The same story here - previous code with infinite loop was working for my boards. And now I do see a problem with pretty simple scenario that we do use in our products.

...
...
The problem with potential removal of SD card (after booting the board) is with us for quite long time. Even with indefinite loop (without your patch) we also could "hang" the board if the SD card was removed during a transfer.

Which is why we should weed out the unbounded loops.

...
> We > absolutelly don't want this sort of behavior in U-Boot. I > understand that this is the easiest way for everyone to > achieve some sort of "working" solution, but it is > definitelly not the correct one. While I do agree to > increasing the timeout, I do not agree to unbounded loops, > sorry.

We have agreed to not agree :-)

Yes :-)

The first thing I care is working U-Boot v2015.10 out of the box on my boards. And so I may agree on any temporary solution. I see it as timeout value either being infinite or obviously very high like 60 seconds.

60 seconds might sound stupid but my thought behind this is to make sure even long transfers succeed. Imagine 100 Mb rootfs or update file downloaded from slow SD-card.

Transfer of rootfs to SD-card (downloaded to memory via tftp) is definitely valid scenario.

...
...
...
> > > From both points of view for keeping history > > > clean (compared to stacked fixes/workarounds) and from > > > removal of regression root cause. > > > > As I said before - +1 from me. > > As I said before, -1 from me. Btw. did anything regress in > here? To me, this seems like a newly discovered bug ...

Yes, this is a bug. We had similar problem with Samsung's SDHCI, before we switched to dw_mmc. This issue is new at dw_mmc.

> > > It's not that I like to have infinite loops but given > > > previous implementation worked fine for people in the > > > previous U-Boot release. > > > > Good justification > > It is never a justified to return to a potentially > problematic version

IMHO revering the change (before the release) is from the software development point of view better solution than adding some heuristic delta to timeout.

> for the sake of getting some sort of crappy hardware > operational.

Unfortunately this "crappy hardware" is pervasive and we cannot do anything about it.

To sum up (my point of view):

The best would be to revert the patch - but if simple "git

revert" is not working then,

Well even if clean revert won't work we may do manual tweaks so that functionally it is "revert". If of any interest I may come up with that sort of patch.

...
...

We should increase the timeout (with my patch) for v2015.10

release

If everybody is OK with that let's go do it. Because release is around the corner and I don't want to explain each and every user how to fix their new problem.

v2015.10 correctness is crucial here.

Yes.

...

...
...
...
...
Let's do this for the sake of crappy cards.

...

After release we can devise some kind of solution

It is a good topic for U-boot's minisummit discussion at

Dublin

Marek, Alexey, Tom, Pantelis what do you think?

I think yes.

What's important we need to make sure Tom is aware of this situation and he won't cut a release until our fix is in place and all involved parties reported their happiness.

...
...
I also think that Tom should speak up on this issue.

Tom, could you give your opinion on this issue?

Well, is there a concensus patch now?

-- Tom

Lukasz Majewski

9:32 a.m.

Hi Tom,

...

On Thu, Sep 17, 2015 at 04:43:33PM +0200, Lukasz Majewski wrote:

...
Hi Tom,

...
On Monday, September 14, 2015 at 01:22:20 PM, Lukasz Majewski wrote:

...
Hi Alexey,

...
Hi Marek, Lukasz,

On Sun, 2015-09-13 at 16:00 +0200, Marek Vasut wrote:

...
On Sunday, September 13, 2015 at 12:03:18 PM, Lukasz Majewski wrote: > Hi Marek,

Hi,

[...]

> > > > Still we need to fix regression first with virtually > > > > infinite timeout :) I would even thing that simple > > > > revert of Marek's patch may make sense for now. > > > > > > +1 - unfortunately there were some other patches > > > applied to this particular patch. Simple revert might > > > be a bit tricky here. > > > > -1 - In case the card gets removed during the DMA > > transfer and the board doesn't have a watchdog, it will > > get stuck indefinitelly. > > I'm just wondering here - why the indefinite loop was > working previously? Was anybody complaining (on the ML) > about the problem of removing the SD card when some > operation is ongoing?

It worked for me for all the workloads I used. Noone was complaining.

The same story here - previous code with infinite loop was working for my boards. And now I do see a problem with pretty simple scenario that we do use in our products.

...
> The problem with potential removal of SD card (after > booting the board) is with us for quite long time. Even > with indefinite loop (without your patch) we also could > "hang" the board if the SD card was removed during a > transfer.

Which is why we should weed out the unbounded loops.

> > We > > absolutelly don't want this sort of behavior in U-Boot. > > I understand that this is the easiest way for everyone > > to achieve some sort of "working" solution, but it is > > definitelly not the correct one. While I do agree to > > increasing the timeout, I do not agree to unbounded > > loops, sorry. > > We have agreed to not agree :-)

Yes :-)

The first thing I care is working U-Boot v2015.10 out of the box on my boards. And so I may agree on any temporary solution. I see it as timeout value either being infinite or obviously very high like 60 seconds.

60 seconds might sound stupid but my thought behind this is to make sure even long transfers succeed. Imagine 100 Mb rootfs or update file downloaded from slow SD-card.

Transfer of rootfs to SD-card (downloaded to memory via tftp) is definitely valid scenario.

...
...
> > > > From both points of view for keeping history > > > > clean (compared to stacked fixes/workarounds) and > > > > from removal of regression root cause. > > > > > > As I said before - +1 from me. > > > > As I said before, -1 from me. Btw. did anything regress > > in here? To me, this seems like a newly discovered > > bug ... > > Yes, this is a bug. We had similar problem with Samsung's > SDHCI, before we switched to dw_mmc. This issue is new at > dw_mmc. > > > > > It's not that I like to have infinite loops but > > > > given previous implementation worked fine for > > > > people in the previous U-Boot release. > > > > > > Good justification > > > > It is never a justified to return to a potentially > > problematic version > > IMHO revering the change (before the release) is from the > software development point of view better solution than > adding some heuristic delta to timeout. > > > for the sake of getting some sort of crappy hardware > > operational. > > Unfortunately this "crappy hardware" is pervasive and we > cannot do anything about it. > > To sum up (my point of view): > 1. The best would be to revert the patch - but if simple > "git revert" is not working then,

Well even if clean revert won't work we may do manual tweaks so that functionally it is "revert". If of any interest I may come up with that sort of patch.

...
> 2. We should increase the timeout (with my patch) for > v2015.10 release

If everybody is OK with that let's go do it. Because release is around the corner and I don't want to explain each and every user how to fix their new problem.

v2015.10 correctness is crucial here.

Yes.

...
...
...
...
...
Let's do this for the sake of crappy cards.

> 3. After release we can devise some kind of solution > 4. It is a good topic for U-boot's minisummit discussion > at Dublin > > Marek, Alexey, Tom, Pantelis what do you think?

I think yes.

What's important we need to make sure Tom is aware of this situation and he won't cut a release until our fix is in place and all involved parties reported their happiness.

...
...
I also think that Tom should speak up on this issue.

Tom, could you give your opinion on this issue?

Well, is there a concensus patch now?

There isn't a consensus patch.

There are two options: 1. Try to revert changes (which remove endless loop)

2. Increase the delay to e.g. 4 minutes (as it is done in Linux) before v2015.10 and provide correct solution (based on internal SD card information) after the release.

-- Best regards, Lukasz Majewski Samsung R&D Institute Poland (SRPOL) | Linux Platform Group

Przemyslaw Marczak

10:07 a.m.

Hello,

On 09/18/2015 09:32 AM, Lukasz Majewski wrote:

...

Hi Tom,

...
On Thu, Sep 17, 2015 at 04:43:33PM +0200, Lukasz Majewski wrote:

...
Hi Tom,

...
On Monday, September 14, 2015 at 01:22:20 PM, Lukasz Majewski wrote:

...
Hi Alexey,

...
Hi Marek, Lukasz,

On Sun, 2015-09-13 at 16:00 +0200, Marek Vasut wrote: > On Sunday, September 13, 2015 at 12:03:18 PM, Lukasz > Majewski wrote: >> Hi Marek, > > Hi, > > [...] > >>>>> Still we need to fix regression first with virtually >>>>> infinite timeout :) I would even thing that simple >>>>> revert of Marek's patch may make sense for now. >>>> >>>> +1 - unfortunately there were some other patches >>>> applied to this particular patch. Simple revert might >>>> be a bit tricky here. >>> >>> -1 - In case the card gets removed during the DMA >>> transfer and the board doesn't have a watchdog, it will >>> get stuck indefinitelly. >> >> I'm just wondering here - why the indefinite loop was >> working previously? Was anybody complaining (on the ML) >> about the problem of removing the SD card when some >> operation is ongoing? > > It worked for me for all the workloads I used. Noone was > complaining.

The same story here - previous code with infinite loop was working for my boards. And now I do see a problem with pretty simple scenario that we do use in our products.

>> The problem with potential removal of SD card (after >> booting the board) is with us for quite long time. Even >> with indefinite loop (without your patch) we also could >> "hang" the board if the SD card was removed during a >> transfer. > > Which is why we should weed out the unbounded loops. > >>> We >>> absolutelly don't want this sort of behavior in U-Boot. >>> I understand that this is the easiest way for everyone >>> to achieve some sort of "working" solution, but it is >>> definitelly not the correct one. While I do agree to >>> increasing the timeout, I do not agree to unbounded >>> loops, sorry. >> >> We have agreed to not agree :-) > > Yes :-)

The first thing I care is working U-Boot v2015.10 out of the box on my boards. And so I may agree on any temporary solution. I see it as timeout value either being infinite or obviously very high like 60 seconds.

60 seconds might sound stupid but my thought behind this is to make sure even long transfers succeed. Imagine 100 Mb rootfs or update file downloaded from slow SD-card.

Transfer of rootfs to SD-card (downloaded to memory via tftp) is definitely valid scenario.

...
>>>>> From both points of view for keeping history >>>>> clean (compared to stacked fixes/workarounds) and >>>>> from removal of regression root cause. >>>> >>>> As I said before - +1 from me. >>> >>> As I said before, -1 from me. Btw. did anything regress >>> in here? To me, this seems like a newly discovered >>> bug ... >> >> Yes, this is a bug. We had similar problem with Samsung's >> SDHCI, before we switched to dw_mmc. This issue is new at >> dw_mmc. >> >>>>> It's not that I like to have infinite loops but >>>>> given previous implementation worked fine for >>>>> people in the previous U-Boot release. >>>> >>>> Good justification >>> >>> It is never a justified to return to a potentially >>> problematic version >> >> IMHO revering the change (before the release) is from the >> software development point of view better solution than >> adding some heuristic delta to timeout. >> >>> for the sake of getting some sort of crappy hardware >>> operational. >> >> Unfortunately this "crappy hardware" is pervasive and we >> cannot do anything about it. >> >> To sum up (my point of view): >> 1. The best would be to revert the patch - but if simple >> "git revert" is not working then,

Well even if clean revert won't work we may do manual tweaks so that functionally it is "revert". If of any interest I may come up with that sort of patch.

>> 2. We should increase the timeout (with my patch) for >> v2015.10 release

If everybody is OK with that let's go do it. Because release is around the corner and I don't want to explain each and every user how to fix their new problem.

v2015.10 correctness is crucial here.

Yes.

...
...
...
...
> Let's do this for the sake of crappy cards. > >> 3. After release we can devise some kind of solution >> 4. It is a good topic for U-boot's minisummit discussion >> at Dublin >> >> Marek, Alexey, Tom, Pantelis what do you think? > > I think yes.

What's important we need to make sure Tom is aware of this situation and he won't cut a release until our fix is in place and all involved parties reported their happiness.

...
...
I also think that Tom should speak up on this issue.

Tom, could you give your opinion on this issue?

Well, is there a concensus patch now?

There isn't a consensus patch.

There are two options:

Try to revert changes (which remove endless loop)

Increase the delay to e.g. 4 minutes (as it is done in Linux) before

v2015.10 and provide correct solution (based on internal SD card information) after the release.

I recommend the second option, however please note that SD card doesn't provide any information about the delay required to finish the command, beside some informations about block erase operation time.

The block erase time info, probably depends on vendor implementation, so it would be better to use as long time for delay as possible, and just rely on the controller error status info instead of breaking the operation after too short timeout.

In such case I think, that it would be nice if user could be notified with some time interval, that the card is busy.

In other words, when he's waiting the second minute for the transfer end, then how does he know if the board doesn't hang?

Best regards,

-- Przemyslaw Marczak Samsung R&D Institute Poland Samsung Electronics p.marczak@samsung.com

Tom Rini

9:27 p.m.

On Fri, Sep 18, 2015 at 09:32:47AM +0200, Lukasz Majewski wrote:

...

Hi Tom,

...
On Thu, Sep 17, 2015 at 04:43:33PM +0200, Lukasz Majewski wrote:

...
Hi Tom,

...
On Monday, September 14, 2015 at 01:22:20 PM, Lukasz Majewski wrote:

...
Hi Alexey,

...
Hi Marek, Lukasz,

On Sun, 2015-09-13 at 16:00 +0200, Marek Vasut wrote: > On Sunday, September 13, 2015 at 12:03:18 PM, Lukasz > Majewski wrote: > > Hi Marek, > > Hi, > > [...] > > > > > > Still we need to fix regression first with virtually > > > > > infinite timeout :) I would even thing that simple > > > > > revert of Marek's patch may make sense for now. > > > > > > > > +1 - unfortunately there were some other patches > > > > applied to this particular patch. Simple revert might > > > > be a bit tricky here. > > > > > > -1 - In case the card gets removed during the DMA > > > transfer and the board doesn't have a watchdog, it will > > > get stuck indefinitelly. > > > > I'm just wondering here - why the indefinite loop was > > working previously? Was anybody complaining (on the ML) > > about the problem of removing the SD card when some > > operation is ongoing? > > It worked for me for all the workloads I used. Noone was > complaining.

The same story here - previous code with infinite loop was working for my boards. And now I do see a problem with pretty simple scenario that we do use in our products.

> > The problem with potential removal of SD card (after > > booting the board) is with us for quite long time. Even > > with indefinite loop (without your patch) we also could > > "hang" the board if the SD card was removed during a > > transfer. > > Which is why we should weed out the unbounded loops. > > > > We > > > absolutelly don't want this sort of behavior in U-Boot. > > > I understand that this is the easiest way for everyone > > > to achieve some sort of "working" solution, but it is > > > definitelly not the correct one. While I do agree to > > > increasing the timeout, I do not agree to unbounded > > > loops, sorry. > > > > We have agreed to not agree :-) > > Yes :-)

The first thing I care is working U-Boot v2015.10 out of the box on my boards. And so I may agree on any temporary solution. I see it as timeout value either being infinite or obviously very high like 60 seconds.

60 seconds might sound stupid but my thought behind this is to make sure even long transfers succeed. Imagine 100 Mb rootfs or update file downloaded from slow SD-card.

Transfer of rootfs to SD-card (downloaded to memory via tftp) is definitely valid scenario.

...
> > > > > From both points of view for keeping history > > > > > clean (compared to stacked fixes/workarounds) and > > > > > from removal of regression root cause. > > > > > > > > As I said before - +1 from me. > > > > > > As I said before, -1 from me. Btw. did anything regress > > > in here? To me, this seems like a newly discovered > > > bug ... > > > > Yes, this is a bug. We had similar problem with Samsung's > > SDHCI, before we switched to dw_mmc. This issue is new at > > dw_mmc. > > > > > > > It's not that I like to have infinite loops but > > > > > given previous implementation worked fine for > > > > > people in the previous U-Boot release. > > > > > > > > Good justification > > > > > > It is never a justified to return to a potentially > > > problematic version > > > > IMHO revering the change (before the release) is from the > > software development point of view better solution than > > adding some heuristic delta to timeout. > > > > > for the sake of getting some sort of crappy hardware > > > operational. > > > > Unfortunately this "crappy hardware" is pervasive and we > > cannot do anything about it. > > > > To sum up (my point of view): > > 1. The best would be to revert the patch - but if simple > > "git revert" is not working then,

Well even if clean revert won't work we may do manual tweaks so that functionally it is "revert". If of any interest I may come up with that sort of patch.

> > 2. We should increase the timeout (with my patch) for > > v2015.10 release

If everybody is OK with that let's go do it. Because release is around the corner and I don't want to explain each and every user how to fix their new problem.

v2015.10 correctness is crucial here.

Yes.

...
...
...
...
> Let's do this for the sake of crappy cards. > > > 3. After release we can devise some kind of solution > > 4. It is a good topic for U-boot's minisummit discussion > > at Dublin > > > > Marek, Alexey, Tom, Pantelis what do you think? > > I think yes.

What's important we need to make sure Tom is aware of this situation and he won't cut a release until our fix is in place and all involved parties reported their happiness.

...
...
I also think that Tom should speak up on this issue.

Tom, could you give your opinion on this issue?

Well, is there a concensus patch now?

There isn't a consensus patch.

There are two options:

Try to revert changes (which remove endless loop)

Increase the delay to e.g. 4 minutes (as it is done in Linux) before

v2015.10 and provide correct solution (based on internal SD card information) after the release.

OK, lets go with #2.

-- Tom

Pantelis Antoniou

21 Sep 21 Sep

5:32 p.m.

Hi Tom,

...

On Sep 18, 2015, at 22:27 , Tom Rini trini@konsulko.com wrote:

On Fri, Sep 18, 2015 at 09:32:47AM +0200, Lukasz Majewski wrote:

...
Hi Tom,

...
On Thu, Sep 17, 2015 at 04:43:33PM +0200, Lukasz Majewski wrote:

...
Hi Tom,

...
On Monday, September 14, 2015 at 01:22:20 PM, Lukasz Majewski wrote:

...
Hi Alexey,

> Hi Marek, Lukasz, > > On Sun, 2015-09-13 at 16:00 +0200, Marek Vasut wrote: >> On Sunday, September 13, 2015 at 12:03:18 PM, Lukasz >> Majewski wrote: >>> Hi Marek, >> >> Hi, >> >> [...] >> >>>>>> Still we need to fix regression first with virtually >>>>>> infinite timeout :) I would even thing that simple >>>>>> revert of Marek's patch may make sense for now. >>>>> >>>>> +1 - unfortunately there were some other patches >>>>> applied to this particular patch. Simple revert might >>>>> be a bit tricky here. >>>> >>>> -1 - In case the card gets removed during the DMA >>>> transfer and the board doesn't have a watchdog, it will >>>> get stuck indefinitelly. >>> >>> I'm just wondering here - why the indefinite loop was >>> working previously? Was anybody complaining (on the ML) >>> about the problem of removing the SD card when some >>> operation is ongoing? >> >> It worked for me for all the workloads I used. Noone was >> complaining. > > The same story here - previous code with infinite loop was > working for my boards. And now I do see a problem with pretty > simple scenario that we do use in our products. > >>> The problem with potential removal of SD card (after >>> booting the board) is with us for quite long time. Even >>> with indefinite loop (without your patch) we also could >>> "hang" the board if the SD card was removed during a >>> transfer. >> >> Which is why we should weed out the unbounded loops. >> >>>> We >>>> absolutelly don't want this sort of behavior in U-Boot. >>>> I understand that this is the easiest way for everyone >>>> to achieve some sort of "working" solution, but it is >>>> definitelly not the correct one. While I do agree to >>>> increasing the timeout, I do not agree to unbounded >>>> loops, sorry. >>> >>> We have agreed to not agree :-) >> >> Yes :-) > > The first thing I care is working U-Boot v2015.10 out of the > box on my boards. And so I may agree on any temporary > solution. I see it as timeout value either being infinite or > obviously very high like 60 seconds. > > 60 seconds might sound stupid but my thought behind this is to > make sure even long transfers succeed. Imagine 100 Mb rootfs > or update file downloaded from slow SD-card.

Transfer of rootfs to SD-card (downloaded to memory via tftp) is definitely valid scenario.

>>>>>> From both points of view for keeping history >>>>>> clean (compared to stacked fixes/workarounds) and >>>>>> from removal of regression root cause. >>>>> >>>>> As I said before - +1 from me. >>>> >>>> As I said before, -1 from me. Btw. did anything regress >>>> in here? To me, this seems like a newly discovered >>>> bug ... >>> >>> Yes, this is a bug. We had similar problem with Samsung's >>> SDHCI, before we switched to dw_mmc. This issue is new at >>> dw_mmc. >>> >>>>>> It's not that I like to have infinite loops but >>>>>> given previous implementation worked fine for >>>>>> people in the previous U-Boot release. >>>>> >>>>> Good justification >>>> >>>> It is never a justified to return to a potentially >>>> problematic version >>> >>> IMHO revering the change (before the release) is from the >>> software development point of view better solution than >>> adding some heuristic delta to timeout. >>> >>>> for the sake of getting some sort of crappy hardware >>>> operational. >>> >>> Unfortunately this "crappy hardware" is pervasive and we >>> cannot do anything about it. >>> >>> To sum up (my point of view): >>> 1. The best would be to revert the patch - but if simple >>> "git revert" is not working then, > > Well even if clean revert won't work we may do manual tweaks > so that functionally it is "revert". If of any interest I may > come up with that sort of patch. > >>> 2. We should increase the timeout (with my patch) for >>> v2015.10 release > > If everybody is OK with that let's go do it. Because release > is around the corner and I don't want to explain each and > every user how to fix their new problem.

v2015.10 correctness is crucial here.

Yes.

...
...
...
>> Let's do this for the sake of crappy cards. >> >>> 3. After release we can devise some kind of solution >>> 4. It is a good topic for U-boot's minisummit discussion >>> at Dublin >>> >>> Marek, Alexey, Tom, Pantelis what do you think? >> >> I think yes. > > What's important we need to make sure Tom is aware of this > situation and he won't cut a release until our fix is in place > and all involved parties reported their happiness.

...
...
I also think that Tom should speak up on this issue.

Tom, could you give your opinion on this issue?

Well, is there a concensus patch now?

There isn't a consensus patch.

There are two options:

Try to revert changes (which remove endless loop)

Increase the delay to e.g. 4 minutes (as it is done in Linux) before

v2015.10 and provide correct solution (based on internal SD card information) after the release.

OK, lets go with #2.

Send me a patch with the delay to 4mins please (blergh).

...

-- Tom

Regards

— Pantelis

Alexey Brodkin

14 Sep 14 Sep

12:30 p.m.

On Fri, 2015-09-11 at 23:45 +0200, Lukasz Majewski wrote:

...

Hi Alexey,

...

...
FWIW I faced similar problem even reading data. At least on one of my boards reading of ~8Mb file took ~1.7 seconds and so 1 second timeout was interrupting data exchange.

Was it SD card or eMMC device?

It was external SD-card.

...

...
So indeed we need to have some dirty hack for upcoming release like bumping timeout to something really huge but later we need to fix that problem properly.

I though proper solution would be to set timeout depending of amount of data to be exchanged... something like take slowest speed of SD/MMC that is allowed by spec and calculate delay based on how much time it might take for that slow device and for safety multiply it by say 2.

As fair as I remember, card provide this kind of information. We can try to investigate this possibility.

I'm pretty sure card may report a lot of info about itself. And if we look at what people do in Linux kernel we may see calculations of different timeouts in "drivers/mmc/core/core.c".

But keeping in mind we're not writing another OS kernel but we're just dealing with bootloader I'd prefer to keep things simple and do check only critical things like totally broken HW. I.e. something simple should be enough for U-Boot.

...

...
Now from this thread I see that there're other reasons that might affect length of at least write operation. In other words it could be complicated unfortunately.

-Alexey

Przemyslaw Marczak

1:15 p.m.

Hi Alexey,

On 09/14/2015 12:30 PM, Alexey Brodkin wrote:

...

On Fri, 2015-09-11 at 23:45 +0200, Lukasz Majewski wrote:

...
Hi Alexey,

...
...
FWIW I faced similar problem even reading data. At least on one of my boards reading of ~8Mb file took ~1.7 seconds and so 1 second timeout was interrupting data exchange.

Was it SD card or eMMC device?

It was external SD-card.

...
...
So indeed we need to have some dirty hack for upcoming release like bumping timeout to something really huge but later we need to fix that problem properly.

I though proper solution would be to set timeout depending of amount of data to be exchanged... something like take slowest speed of SD/MMC that is allowed by spec and calculate delay based on how much time it might take for that slow device and for safety multiply it by say 2.

As fair as I remember, card provide this kind of information. We can try to investigate this possibility.

I'm pretty sure card may report a lot of info about itself. And if we look at what people do in Linux kernel we may see calculations of different timeouts in "drivers/mmc/core/core.c".

But keeping in mind we're not writing another OS kernel but we're just dealing with bootloader I'd prefer to keep things simple and do check only critical things like totally broken HW. I.e. something simple should be enough for U-Boot.

The standard doesn't specify the timeout for read/write commands, because it is not a constant value. To finish the command, card needs make some internal cleanup, which is called "Background operations" in Jedec standard.

There is no way to estimate the time to finish the "bkops", maybe the card's firmware also doesn't know when it finish.

The 4 min of timeout for background operations is probably a result after some tests.

If I good remember only erase timeout is specified by the standard, and can be estimated after read some values from EXT_CSD, if the vendor provides it.

...

...
...
Now from this thread I see that there're other reasons that might affect length of at least write operation. In other words it could be complicated unfortunately.

-Alexey

Some day, I had a device which couldn't boot - it goes down after a while, and the console was silent. The device was temporary broken.

After power-up with JTAG, and manually run the "background operations" on the card, the device could then boot.

So the conclusion was, that the timeout for read in the bootloader was to short, and the tired card tried to make some internal cleanup - so the read command could not be finished in the timeout estimated for fresh card.

The bkops are supported from eMMC 4.3, and are not supported for SD cards. In this case, Linux can trigger the card internal cleanup in it's idle state, so the read/write operation will probably finish as fast as possible.

It is going to be little frustrating, to wait for the fix, to such critical issue...

Best regards,

-- Przemyslaw Marczak Samsung R&D Institute Poland Samsung Electronics p.marczak@samsung.com

Przemyslaw Marczak

12:33 p.m.

Hi all,

On 09/09/2015 09:01 AM, Lukasz Majewski wrote:

...

Hi,

...
The commit: d9dbb97be0e4a550457aec5f11afefb446169c90 "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting for end of dw mmc transfer.

For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) - and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB) the default timeout is to short.

The new value - 20 seconds - takes into account the situation when SD card triggers internal clean up. Such process may take more than 10 seconds on some cards.

Signed-off-by: Lukasz Majewski l.majewski@samsung.com Cc: Marek Vasut marex@denx.de Cc: Pantelis Antoniou panto@antoniou-consulting.com Cc: Tom Rini trini@konsulko.com

Are there any more questions regarding this patch or is it ready for submission as fix for v2015.10?

...
drivers/mmc/dw_mmc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mmc/dw_mmc.c b/drivers/mmc/dw_mmc.c index 77b87e0..21a92d2 100644 --- a/drivers/mmc/dw_mmc.c +++ b/drivers/mmc/dw_mmc.c @@ -213,7 +213,7 @@ static int dwmci_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, if (data) { start = get_timer(0);
timeout = 1000;
timeout = 20000;
for (;;) { mask = dwmci_readl(host, DWMCI_RINTSTS); /* Error during data transfer. */

I made some quick mmc command time test with the following code changes:

======================================================================== diff --git a/drivers/mmc/dw_mmc.c b/drivers/mmc/dw_mmc.c index a84c1e1..78e5d5d 100644 --- a/drivers/mmc/dw_mmc.c +++ b/drivers/mmc/dw_mmc.c @@ -113,7 +113,7 @@ static int dwmci_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, ALLOC_CACHE_ALIGN_BUFFER(struct dwmci_idmac, cur_idmac, data ? DIV_ROUND_UP(data->blocks, 8) : 0); int ret = 0, flags = 0, i; - unsigned int timeout = 100000; + unsigned int t = 0, timeout = 100000; u32 retry = 10000; u32 mask, ctrl; ulong start = get_timer(0); @@ -219,7 +219,7 @@ static int dwmci_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, mask = dwmci_readl(host, DWMCI_RINTSTS); /* Error during data transfer. */ if (mask & (DWMCI_DATA_ERR | DWMCI_DATA_TOUT)) { - debug("%s: DATA ERROR!\n", __func__); + printf("%s: DATA ERROR!\n", __func__); ret = -EINVAL; break; } @@ -231,13 +231,15 @@ static int dwmci_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, }

/* Check for timeout. */ - if (get_timer(start) > timeout) { - debug("%s: Timeout waiting for data!\n", + t = get_timer(start); + if (t > timeout) { + printf("%s: Timeout waiting for data!\n", __func__); ret = TIMEOUT; break; } } + printf("[MMC CMD] Tms: %u\n", t);

dwmci_writel(host, DWMCI_RINTSTS, mask); ========================================================================

Test info: ---------- 1. Tree: commit 850f788709cef8f7d53d571aec3bfb73b14c5531 "Merge branch 'rmobile' of git://git.denx.de/u-boot-sh" 2. SD card: SanDisk 32GB HC 4 3. eMMC card: Toshiba 16GB, 8-bit DDR mode 4. Board: Odroid XU3

Example usecase: ---------------- ODROID-XU3 # mmc read 0x40000000 0x0 0x800

MMC read: dev # 0, block # 0, count 2048 ... [MMC CMD] Tms: 46 2048 blocks read: OK

Time results for SD read/write: ------------------------------------ 1MB: 46ms / 220ms 5MB: 229ms / 931ms 10MB: 456ms / timeout

Time results for eMMC read/write: ------------------------------------ 1MB: 13ms / 46ms 5MB: 62ms / 260ms 10MB: 123ms / 515ms 20MB: 245ms / timeout

Timeout error for SD, write of 10MB data: ----------------------------------------- ODROID-XU3 # mmc write 0x40000000 0x0 0x5000

MMC write: dev # 0, block # 0, count 20480 ... dwmci_send_cmd: Timeout waiting for data! [MMC CMD] Tms: 1001 mmc write failed 0 blocks written: ERROR

I made one more test, by removing the SD card when reading the data: ---------------------------------------------------------------- ODROID-XU3 # mmc read 0x40000000 0x0 0xa000

MMC read: dev # 0, block # 0, count 40960 ... dwmci_send_cmd: DATA ERROR! [MMC CMD] Tus: 283 0 blocks read: ERROR

Summary ------- So, as we can see, the 1 second of timeout should be enough for small data size transfer. I also note some use cases for which, update the kernel of size ~4MB was failed. So this depends on the present card condition, and it can change for each use case - you never know, how much time card needs to finish the same command.

So, the conslusion is, that the 20-30 seconds of timeout seem to be okay. This will not introduce "unbounded loops", because the transfer status is read in the loop, and it works well - as the card removing example shows.

Please test this code on your platform and share your results.

Best regards,

-- Przemyslaw Marczak Samsung R&D Institute Poland Samsung Electronics p.marczak@samsung.com

Lukasz Majewski

25 Sep 25 Sep

6:25 p.m.

New subject: [U-Boot] [PATCH] mmc: dw_mmc: Increase timeout to 4 minutes (as in Linux kernel)

The commit: d9dbb97be0e4a550457aec5f11afefb446169c90 "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting for end of dw mmc transfer.

For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) - and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB) the default timeout is to short.

The new value - 4 minutes (240 seconds) - is the same as the one used in Linux kernel driver. Such fix should be good enough until we come up with better fix for this issue.

diff --git a/drivers/mmc/dw_mmc.c b/drivers/mmc/dw_mmc.c index a84c1e1..26d34ae 100644 --- a/drivers/mmc/dw_mmc.c +++ b/drivers/mmc/dw_mmc.c @@ -214,7 +214,7 @@ static int dwmci_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd,

if (data) { start = get_timer(0); - timeout = 1000; + timeout = 240000; for (;;) { mask = dwmci_readl(host, DWMCI_RINTSTS); /* Error during data transfer. */

-- 2.0.0.rc2

Przemyslaw Marczak

28 Sep 28 Sep

3:43 p.m.

New subject: [U-Boot] [PATCH] mmc: dw_mmc: Increase timeout to 4 minutes (as in Linux kernel)

Hi Lukasz,

On 09/25/2015 06:25 PM, Lukasz Majewski wrote:

...

The commit: d9dbb97be0e4a550457aec5f11afefb446169c90 "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting for end of dw mmc transfer.

For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) - and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB) the default timeout is to short.

The new value - 4 minutes (240 seconds) - is the same as the one used in Linux kernel driver. Such fix should be good enough until we come up with better fix for this issue.

I think, that you should adjust the commit message, because it describes the result of the real problem which is in the background.

It doesn't fix the DFU, because it wasn't broken. What is the real problem and how does this commit fix it?

...

Signed-off-by: Lukasz Majewski l.majewski@samsung.com Cc: Marek Vasut marex@denx.de Cc: Pantelis Antoniou panto@antoniou-consulting.com Cc: Tom Rini trini@konsulko.com

drivers/mmc/dw_mmc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mmc/dw_mmc.c b/drivers/mmc/dw_mmc.c index a84c1e1..26d34ae 100644 --- a/drivers/mmc/dw_mmc.c +++ b/drivers/mmc/dw_mmc.c @@ -214,7 +214,7 @@ static int dwmci_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd,

if (data) { start = get_timer(0);
timeout = 1000;
timeout = 240000;
for (;;) { mask = dwmci_readl(host, DWMCI_RINTSTS); /* Error during data transfer. */

I tested this patch on Odroid XU3 and Odroid U3, with SD and eMMC cards.

For read/write data of size 1800MB, it doesn't break the operation. Read takes about one minute for some eMMC card, and the write takes about 2 and a half minute.

The normal read/write operations are done partially for big size data, so it should be good, which shows test with write 1800MB to SD card.

Writing 1800MB of data takes about 8 minutes on some SD card and then I can see only the info that something was started - but I don't know if it's going on, or the board is dead... So maybe we should get some info from this part of code, that the transfer is going on, e.g. after each next 10% of data portion?

Tested-by: Przemyslaw Marczak p.marczak@samsung.com

Best regards,

-- Przemyslaw Marczak Samsung R&D Institute Poland Samsung Electronics p.marczak@samsung.com

Tom Rini

11:08 p.m.

New subject: [U-Boot] [PATCH] mmc: dw_mmc: Increase timeout to 4 minutes (as in Linux kernel)

On Mon, Sep 28, 2015 at 03:43:50PM +0200, Przemyslaw Marczak wrote:

...

Hi Lukasz,

On 09/25/2015 06:25 PM, Lukasz Majewski wrote:

...
The commit: d9dbb97be0e4a550457aec5f11afefb446169c90 "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting for end of dw mmc transfer.

For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) - and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB) the default timeout is to short.

The new value - 4 minutes (240 seconds) - is the same as the one used in Linux kernel driver. Such fix should be good enough until we come up with better fix for this issue.

I think, that you should adjust the commit message, because it describes the result of the real problem which is in the background.

It doesn't fix the DFU, because it wasn't broken. What is the real problem and how does this commit fix it?

I think this message is good enough (esp since I want to apply it now) and will link back to all of these other messages about it for details.

-- Tom

Tom Rini

11:08 p.m.

New subject: [U-Boot] mmc: dw_mmc: Increase timeout to 4 minutes (as in Linux kernel)

On Fri, Sep 25, 2015 at 06:25:25PM +0200, Łukasz Majewski wrote:

...

The commit: d9dbb97be0e4a550457aec5f11afefb446169c90 "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting for end of dw mmc transfer.

For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) - and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB) the default timeout is to short.

The new value - 4 minutes (240 seconds) - is the same as the one used in Linux kernel driver. Such fix should be good enough until we come up with better fix for this issue.

Signed-off-by: Lukasz Majewski l.majewski@samsung.com Cc: Marek Vasut marex@denx.de Cc: Pantelis Antoniou panto@antoniou-consulting.com Cc: Tom Rini trini@konsulko.com Tested-by: Przemyslaw Marczak p.marczak@samsung.com

Applied to u-boot/master, thanks!

-- Tom

3508

Age (days ago)

3539

Last active (days ago)

List overview

Download

49 comments

9 participants

tags (0)

participants (9)

Alexey Brodkin
Lukasz Majewski
Lukasz Majewski
Marek Vasut
Pantelis Antoniou
Przemyslaw Marczak
Simon Glass
Stephen Warren
Tom Rini