[U-Boot] [PATCH 1/2] mtd: vf610_nfc: remove caching of page in buffer

To improve performance we remember the current page in the buffer and avoid reading it twice. This implicit page cache increases complexity while does not increase performance in real world cases. This patch removes that feature. --- As discussed in the other patchset... http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/215802
...I did some performance measurements:
Time to "Starting kernel ..." - without bad block scan & with UBIFS fastmap: 2.02s - with bad block scan & with UBIFS fastmap: 3.99s - without bad block scan & without UBIFS fastmap: 4.42s - with bad block scan & without UBIFS fastmap: 6.38s
Without page cache (with this patch applied): Time to "Starting kernel ..." - without bad block scan & with UBIFS fastmap: 2.02s - with bad block scan & with UBIFS fastmap: 4.01s - without bad block scan & without UBIFS fastmap: 4.41s - with bad block scan & without UBIFS fastmap: 6.39s
I then checked whether that reread occurs at least once with a printf in the "Already read?" if. It looks like this access pattern does not occure (anymore?) during a normal boot.
Hence I now vote to get rid of it too...
drivers/mtd/nand/vf610_nfc.c | 11 +---------- 1 file changed, 1 insertion(+), 10 deletions(-)
diff --git a/drivers/mtd/nand/vf610_nfc.c b/drivers/mtd/nand/vf610_nfc.c index d98dd28..fa0bb9d 100644 --- a/drivers/mtd/nand/vf610_nfc.c +++ b/drivers/mtd/nand/vf610_nfc.c @@ -147,7 +147,6 @@ struct vf610_nfc { uint column; int spareonly; int page_sz; - int page; /* Status and ID are in alternate locations. */ int alt_buf; #define ALT_BUF_ID 1 @@ -347,7 +346,6 @@ static void vf610_nfc_command(struct mtd_info *mtd, unsigned command,
switch (command) { case NAND_CMD_PAGEPROG: - nfc->page = -1; vf610_nfc_transfer_size(nfc->regs, nfc->page_sz); vf610_nfc_send_commands(nfc->regs, NAND_CMD_SEQIN, command, PROGRAM_PAGE_CMD_CODE); @@ -367,10 +365,6 @@ static void vf610_nfc_command(struct mtd_info *mtd, unsigned command, case NAND_CMD_SEQIN: /* Pre-read for partial writes. */ case NAND_CMD_READ0: column = 0; - /* Already read? */ - if (nfc->page == page) - return; - nfc->page = page; vf610_nfc_transfer_size(nfc->regs, nfc->page_sz); vf610_nfc_send_commands(nfc->regs, NAND_CMD_READ0, NAND_CMD_READSTART, READ_PAGE_CMD_CODE); @@ -378,7 +372,6 @@ static void vf610_nfc_command(struct mtd_info *mtd, unsigned command, break;
case NAND_CMD_ERASE1: - nfc->page = -1; vf610_nfc_transfer_size(nfc->regs, 0); vf610_nfc_send_commands(nfc->regs, command, NAND_CMD_ERASE2, ERASE_CMD_CODE); @@ -532,10 +525,8 @@ static inline int vf610_nfc_correct_data(struct mtd_info *mtd, u_char *dat) flip = count_written_bits(dat, nfc->chip.ecc.size, ecc_count);
/* ECC failed. */ - if (flip > ecc_count) { - nfc->page = -1; + if (flip > ecc_count) return -1; - }
/* Erased page. */ memset(dat, 0xff, nfc->chip.ecc.size);

Implement read of OOB area only. When using column and sector size properties, only parts of the page can be read. However, this works only when hardware ECC is disabled, otherwise the ECC engine would ruin the data in the buffer. To allow OOB only reads, three points had to be addressed: - Set ECC mode per command. - Handle NAND_CMD_READOOB seperate. Make sure column and sector size is correctly set up, while disabling ECC. - Now, the OOB data end up at the beginning of the buffer. Remove the special handling of OOB (spareonly).
Especially bad block scans benefit from this change. On a 512MiB SLC NAND device, the bad block scan took 1.5s less than before. --- During the testing of the first patch, I came accross this bottleneck. The same measurements as done in the first patch:
Time to "Starting kernel ..." - without bad block scan & with UBIFS fastmap: 2.01s - with bad block scan & with UBIFS fastmap: 2.53s - without bad block scan & without UBIFS fastmap: 4.41s - with bad block scan & without UBIFS fastmap: 4.95s
While we move more complexity into vf610_nfc_command, given the simplification of code overall and the improved bad block scanning performance, it's a worthwhile change.
drivers/mtd/nand/vf610_nfc.c | 103 ++++++++++++++++++------------------------- 1 file changed, 42 insertions(+), 61 deletions(-)
diff --git a/drivers/mtd/nand/vf610_nfc.c b/drivers/mtd/nand/vf610_nfc.c index fa0bb9d..75c2493 100644 --- a/drivers/mtd/nand/vf610_nfc.c +++ b/drivers/mtd/nand/vf610_nfc.c @@ -145,8 +145,6 @@ struct vf610_nfc { struct nand_chip chip; void __iomem *regs; uint column; - int spareonly; - int page_sz; /* Status and ID are in alternate locations. */ int alt_buf; #define ALT_BUF_ID 1 @@ -319,8 +317,8 @@ static void vf610_nfc_addr_cycle(struct mtd_info *mtd, int column, int page) { if (column != -1) { struct vf610_nfc *nfc = mtd_to_nfc(mtd); - if (nfc->chip.options | NAND_BUSWIDTH_16) - column = column/2; + if (nfc->chip.options & NAND_BUSWIDTH_16) + column = column / 2; vf610_nfc_set_field(mtd, NFC_COL_ADDR, COL_ADDR_MASK, COL_ADDR_SHIFT, column); } @@ -329,6 +327,13 @@ static void vf610_nfc_addr_cycle(struct mtd_info *mtd, int column, int page) ROW_ADDR_SHIFT, page); }
+static inline void vf610_nfc_ecc_mode(struct mtd_info *mtd, int ecc_mode) +{ + vf610_nfc_set_field(mtd, NFC_FLASH_CONFIG, + CONFIG_ECC_MODE_MASK, + CONFIG_ECC_MODE_SHIFT, ecc_mode); +} + static inline void vf610_nfc_transfer_size(void __iomem *regbase, int size) { __raw_writel(size, regbase + NFC_SECTOR_SIZE); @@ -339,36 +344,48 @@ static void vf610_nfc_command(struct mtd_info *mtd, unsigned command, int column, int page) { struct vf610_nfc *nfc = mtd_to_nfc(mtd); + int page_sz = nfc->chip.options & NAND_BUSWIDTH_16 ? 1 : 0;
- nfc->column = max(column, 0); - nfc->spareonly = 0; - nfc->alt_buf = 0; + nfc->column = max(column, 0); + nfc->alt_buf = 0;
switch (command) { case NAND_CMD_PAGEPROG: - vf610_nfc_transfer_size(nfc->regs, nfc->page_sz); + page_sz += mtd->writesize + mtd->oobsize; + vf610_nfc_transfer_size(nfc->regs, page_sz); vf610_nfc_send_commands(nfc->regs, NAND_CMD_SEQIN, command, PROGRAM_PAGE_CMD_CODE); vf610_nfc_addr_cycle(mtd, column, page); + vf610_nfc_ecc_mode(mtd, ECC_45_BYTE); break;
case NAND_CMD_RESET: vf610_nfc_transfer_size(nfc->regs, 0); vf610_nfc_send_command(nfc->regs, command, RESET_CMD_CODE); break; + + case NAND_CMD_READOOB: + page_sz += mtd->oobsize; + column = mtd->writesize; + vf610_nfc_transfer_size(nfc->regs, page_sz); + vf610_nfc_send_commands(nfc->regs, NAND_CMD_READ0, + NAND_CMD_READSTART, READ_PAGE_CMD_CODE); + vf610_nfc_addr_cycle(mtd, column, page); + vf610_nfc_ecc_mode(mtd, ECC_BYPASS); + break; /* - * NFC does not support sub-page reads and writes, + * NFC does not support sub-page reads and writes when using HW-ECC, * so emulate them using full page transfers. */ - case NAND_CMD_READOOB: - nfc->spareonly = 1; case NAND_CMD_SEQIN: /* Pre-read for partial writes. */ case NAND_CMD_READ0: + page_sz += mtd->writesize + mtd->oobsize; column = 0; - vf610_nfc_transfer_size(nfc->regs, nfc->page_sz); + vf610_nfc_transfer_size(nfc->regs, page_sz); vf610_nfc_send_commands(nfc->regs, NAND_CMD_READ0, NAND_CMD_READSTART, READ_PAGE_CMD_CODE); vf610_nfc_addr_cycle(mtd, column, page); + vf610_nfc_ecc_mode(mtd, ECC_45_BYTE); break;
case NAND_CMD_ERASE1: @@ -397,46 +414,26 @@ static void vf610_nfc_command(struct mtd_info *mtd, unsigned command, vf610_nfc_done(mtd); }
-static inline void vf610_nfc_read_spare(struct mtd_info *mtd, void *buf, - int len) -{ - struct vf610_nfc *nfc = mtd_to_nfc(mtd); - - len = min(mtd->oobsize, (uint)len); - if (len > 0) - vf610_nfc_memcpy(buf, nfc->regs + mtd->writesize, len); -} - /* Read data from NFC buffers */ static void vf610_nfc_read_buf(struct mtd_info *mtd, u_char *buf, int len) { struct vf610_nfc *nfc = mtd_to_nfc(mtd); uint c = nfc->column; - uint l;
- /* Handle main area */ - if (!nfc->spareonly) { - l = min((uint)len, mtd->writesize - c); - nfc->column += l; - - if (!nfc->alt_buf) - vf610_nfc_memcpy(buf, nfc->regs + NFC_MAIN_AREA(0) + c, - l); - else - if (nfc->alt_buf & ALT_BUF_ID) - *buf = vf610_nfc_get_id(mtd, c); - else - *buf = vf610_nfc_get_status(mtd); - - buf += l; - len -= l; + switch (nfc->alt_buf) + { + case ALT_BUF_ID: + *buf = vf610_nfc_get_id(mtd, c); + break; + case ALT_BUF_STAT: + *buf = vf610_nfc_get_status(mtd); + break; + default: + vf610_nfc_memcpy(buf, nfc->regs + NFC_MAIN_AREA(0) + c, len); + break; }
- /* Handle spare area access */ - if (len) { - nfc->column += len; - vf610_nfc_read_spare(mtd, buf, len); - } + nfc->column += len; }
/* Write data to NFC buffers */ @@ -627,17 +624,9 @@ static int vf610_nfc_nand_init(int devnum, void __iomem *addr) if (cfg.flash_bbt) chip->bbt_options = NAND_BBT_USE_FLASH | NAND_BBT_CREATE;
- /* Default to software ECC until flash ID. */ - vf610_nfc_set_field(mtd, NFC_FLASH_CONFIG, - CONFIG_ECC_MODE_MASK, - CONFIG_ECC_MODE_SHIFT, ECC_BYPASS); - chip->bbt_td = &bbt_main_descr; chip->bbt_md = &bbt_mirror_descr;
- nfc->page_sz = PAGE_2K + OOB_64; - nfc->page_sz += cfg.width == 16 ? 1 : 0; - /* Set configuration register. */ vf610_nfc_clear(mtd, NFC_FLASH_CONFIG, CONFIG_ADDR_AUTO_INCR_BIT); vf610_nfc_clear(mtd, NFC_FLASH_CONFIG, CONFIG_BUFNO_AUTO_INCR_BIT); @@ -665,15 +654,12 @@ static int vf610_nfc_nand_init(int devnum, void __iomem *addr)
chip->ecc.mode = NAND_ECC_SOFT; /* default */
- nfc->page_sz = mtd->writesize + mtd->oobsize; - /* Single buffer only, max 256 OOB minus ECC status */ - if (nfc->page_sz > PAGE_2K + 256 - 8) { + if (mtd->writesize + mtd->oobsize > PAGE_2K + 256 - 8) { dev_err(nfc->dev, "Unsupported flash size\n"); err = -ENXIO; goto error; } - nfc->page_sz += cfg.width == 16 ? 1 : 0;
if (cfg.hardware_ecc) { if (mtd->writesize != PAGE_2K && mtd->oobsize < 64) { @@ -694,11 +680,6 @@ static int vf610_nfc_nand_init(int devnum, void __iomem *addr) chip->ecc.size = PAGE_2K; chip->ecc.strength = 24;
- /* set ECC mode to 45 bytes OOB with 24 bits correction */ - vf610_nfc_set_field(mtd, NFC_FLASH_CONFIG, - CONFIG_ECC_MODE_MASK, - CONFIG_ECC_MODE_SHIFT, ECC_45_BYTE); - /* Enable ECC_STATUS */ vf610_nfc_set(mtd, NFC_FLASH_CONFIG, CONFIG_ECC_SRAM_REQ_BIT); }

On 2 Apr 2015, stefan@agner.ch wrote:
To improve performance we remember the current page in the buffer and avoid reading it twice. This implicit page cache increases complexity while does not increase performance in real world cases. This patch removes that feature.
As discussed in the other patchset... http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/215802
...I did some performance measurements:
Time to "Starting kernel ..."
- without bad block scan & with UBIFS fastmap: 2.02s
- with bad block scan & with UBIFS fastmap: 3.99s
- without bad block scan & without UBIFS fastmap: 4.42s
- with bad block scan & without UBIFS fastmap: 6.38s
Without page cache (with this patch applied): Time to "Starting kernel ..."
- without bad block scan & with UBIFS fastmap: 2.02s
- with bad block scan & with UBIFS fastmap: 4.01s
- without bad block scan & without UBIFS fastmap: 4.41s
- with bad block scan & without UBIFS fastmap: 6.39s
[snip]
I also measured 'write performance' with the mtd_speedtest (performing similar patch to the Linux driver) and I see no difference. I think a write benchmark is more appropriate to test this functionality? While at least it seems that neither read nor write is affected by the simplification.
Fwiw, Bill Pringlemeir.

On 2015-04-02 22:30, Bill Pringlemeir wrote:
On 2 Apr 2015, stefan@agner.ch wrote:
To improve performance we remember the current page in the buffer and avoid reading it twice. This implicit page cache increases complexity while does not increase performance in real world cases. This patch removes that feature.
As discussed in the other patchset... http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/215802
...I did some performance measurements:
Time to "Starting kernel ..."
- without bad block scan & with UBIFS fastmap: 2.02s
- with bad block scan & with UBIFS fastmap: 3.99s
- without bad block scan & without UBIFS fastmap: 4.42s
- with bad block scan & without UBIFS fastmap: 6.38s
Without page cache (with this patch applied): Time to "Starting kernel ..."
- without bad block scan & with UBIFS fastmap: 2.02s
- with bad block scan & with UBIFS fastmap: 4.01s
- without bad block scan & without UBIFS fastmap: 4.41s
- with bad block scan & without UBIFS fastmap: 6.39s
[snip]
I also measured 'write performance' with the mtd_speedtest (performing similar patch to the Linux driver) and I see no difference. I think a write benchmark is more appropriate to test this functionality? While at least it seems that neither read nor write is affected by the simplification.
On U-Boot, I just benchmarked the overall boot time since this is most important for us. I plan to (re-)integrate the changes into the Linux driver and check the performance again later this week.
Thanks for for the write test. So I can take this as a Ack?
I will send all the NFC changes in one patchset as v2 probably later today.
-- Stefan

On 7 Apr 2015, stefan@agner.ch wrote:
On 2015-04-02 22:30, Bill Pringlemeir wrote:
On 2 Apr 2015, stefan@agner.ch wrote:
[snip]
I also measured 'write performance' with the mtd_speedtest (performing similar patch to the Linux driver) and I see no difference. I think a write benchmark is more appropriate to test this functionality? While at least it seems that neither read nor write is affected by the simplification.
On U-Boot, I just benchmarked the overall boot time since this is most important for us. I plan to (re-)integrate the changes into the Linux driver and check the performance again later this week.
Thanks for for the write test. So I can take this as a Ack?
Of course, if you want it,
Acked-by: Bill Pringlemeir bpringlemeir@nbsps.com
The OOB patch also significantly decreases UbiFS mounting time in Linux. I load Linux itself via tftp/network and not using u-boot with nand. I guess I should try that.
I will send all the NFC changes in one patchset as v2 probably later today.

On 2015-04-07 16:24, Bill Pringlemeir wrote:
On 7 Apr 2015, stefan@agner.ch wrote:
On 2015-04-02 22:30, Bill Pringlemeir wrote:
On 2 Apr 2015, stefan@agner.ch wrote:
[snip]
I also measured 'write performance' with the mtd_speedtest (performing similar patch to the Linux driver) and I see no difference. I think a write benchmark is more appropriate to test this functionality? While at least it seems that neither read nor write is affected by the simplification.
On U-Boot, I just benchmarked the overall boot time since this is most important for us. I plan to (re-)integrate the changes into the Linux driver and check the performance again later this week.
Thanks for for the write test. So I can take this as a Ack?
Of course, if you want it,
Acked-by: Bill Pringlemeir bpringlemeir@nbsps.com
Thx
The OOB patch also significantly decreases UbiFS mounting time in Linux. I load Linux itself via tftp/network and not using u-boot with nand. I guess I should try that.
Is it UBIFS mounting time or is it bad block scanning? Afaik, it should speed up the latter significantly, but I don't see why it should speed up the former.
-- Stefan

On 7 Apr 2015, stefan@agner.ch wrote:
On 2015-04-07 16:24, Bill Pringlemeir wrote:
The OOB patch also significantly decreases UbiFS mounting time in Linux. I load Linux itself via tftp/network and not using u-boot with nand. I guess I should try that.
Is it UBIFS mounting time or is it bad block scanning? Afaik, it should speed up the latter significantly, but I don't see why it should speed up the former.
It is both,
* no changes.
[0.840632] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xca [0.964838] 0x000001040000-0x000010000000 : "root" .124s
base ubi0 mount is .833585s
* improved READ_OOB
[0.638869] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xca [0.707994] 0x000001040000-0x000010000000 : "root" .069s
base ubi0 mount 1/10s reduction. .738204s
This is for a 256MB device. The Ubi mount/scan time is not completely in-significant.
For instance, here is my last run with improved READ_OOB
[ 0.942538] ubi0: attaching mtd3 ... [ 1.680742] ubi0: background thread "ubi_bgt0d" started, PID 104
This is my 'base ubi0 mount' numbers.
The time is slightly different than what I recorded previously. I booted several times without 'READ_OOB' and the times were consistently '.83xxS'. It is possible that the initial BBT scan being quicker altered something; so it is not UBI use of OOB. I am not sure. I just noted that it was ??significantly?? different.
Regards, Bill Pringlemeir.
participants (2)
-
Bill Pringlemeir
-
Stefan Agner