[U-Boot] [PATCH RFC] NAND: Improve read performance from Large Page NAND devices

Improve read performance from Large Page NAND devices.
This patch produces a ~31% improvement in oob_first read speed (on a 300MHz ARM9). The time for a mid-buffer 2k page read is now 293us, 6.99MB/s (was 385us, 5.31MB/s). oob_first is probably the best case improvement.
Signed-off-by: Nick Thompson nick.thompson@ge.com
---
From the last RFC:
Added a new config option CONFIG_NAND_READ_CMD_NO_WAIT. Not defined on any platform (except my local da830evm), this enables the main point of the patch - processing, usually ECC and looping constructs, while waiting for the NAND to load its cache.
Without the new config option, the code should behave as before and the patch simply layers the code a little better. However, it still allows page_reads to control their own command sequences, saving time most notably in the oob_first page read. The nand_wait_cache_load function optimises away to nothing.
[This option should allow boards that define their own cmdfunc, rather than defining a new page_read function, to continue as is for now. The cost is that the pre-fetch read optimisations are not automatically made available]
With the new option, read commands can be sent to the NAND, but processing can continue until the data is actually required. It can easily take longer to process ECCs than the NAND takes to load its cache, so the cache load time overhead disappears into a parallel operation.
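A rough timing model of that overlap, as a sketch only. The struct and function names below are hypothetical and do not appear in the patch, and any timings passed in are made-up inputs rather than measurements. The model assumes that each page costs a cache load, a data transfer and some ECC work, and that in the non-waiting case the cache load for the next page runs while the CPU does the ECC work for the current one.

```c
#include <stdint.h>

/* Hypothetical per-page costs in microseconds (illustrative, not measured). */
struct sketch_page_timing {
	uint32_t cache_load_us;	/* NAND array -> NAND cache register */
	uint32_t transfer_us;	/* NAND cache -> RAM via read_buf */
	uint32_t ecc_us;	/* ECC extraction, calculation, correction */
};

/* Waiting read: every page pays load, transfer and ECC one after another. */
uint32_t sketch_serial_read_us(const struct sketch_page_timing *t,
			       uint32_t pages)
{
	return pages * (t->cache_load_us + t->transfer_us + t->ecc_us);
}

/*
 * Non-waiting read: the command for the next page is issued before the ECC
 * work of the current page, so between two transfers only the longer of
 * "ECC work" and "cache load" costs time. The first page still pays its
 * full cache load and the last page's ECC work has nothing to overlap with.
 */
uint32_t sketch_overlapped_read_us(const struct sketch_page_timing *t,
				   uint32_t pages)
{
	uint32_t gap = t->ecc_us > t->cache_load_us ?
		       t->ecc_us : t->cache_load_us;

	if (pages == 0)
		return 0;

	return t->cache_load_us + pages * t->transfer_us +
	       (pages - 1) * gap + t->ecc_us;
}
```

In this model the saving per additional page is the smaller of the two overlapped costs, so a single-page read gains nothing and long sequential reads gain the most.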
nand_wait_cache_load never falls back to a udelay. Significant time should have passed and a fixed delay is not acceptable. Either the ready/busy pin is probed, or the NAND status register is directly read (as in write and erase operations currently).
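The polling idea can be sketched in isolation as follows. This is not the code from the patch: the struct, the function names and the mock device are hypothetical, and a simple poll budget stands in for the timer based timeout that nand_wait uses. SKETCH_STATUS_READY is assumed to mirror the NAND_STATUS_READY bit (0x40).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_STATUS_READY 0x40	/* assumed to match NAND_STATUS_READY */

/* Hypothetical, cut-down view of the chip callbacks used for waiting. */
struct sketch_chip {
	bool (*dev_ready)(void *ctx);		/* optional ready/busy pin probe */
	uint8_t (*read_status)(void *ctx);	/* status register read, required */
	void *ctx;
};

/*
 * Poll until the device reports ready. There is no fixed delay: the pin is
 * probed if the board provides dev_ready, otherwise the status register is
 * read. Returns 0 when ready, -1 when the poll budget runs out.
 */
int sketch_wait_ready(const struct sketch_chip *chip, uint32_t max_polls)
{
	while (max_polls--) {
		if (chip->dev_ready) {
			if (chip->dev_ready(chip->ctx))
				return 0;
		} else if (chip->read_status(chip->ctx) & SKETCH_STATUS_READY) {
			return 0;
		}
	}
	return -1;
}

/* Mock device for illustration: reports busy for a number of polls. */
struct sketch_mock_nand {
	uint32_t polls_until_ready;
};

bool sketch_mock_dev_ready(void *ctx)
{
	struct sketch_mock_nand *m = ctx;

	if (m->polls_until_ready == 0)
		return true;
	m->polls_until_ready--;
	return false;
}

uint8_t sketch_mock_read_status(void *ctx)
{
	return sketch_mock_dev_ready(ctx) ? SKETCH_STATUS_READY : 0;
}
```

When the ECC work has already taken longer than the cache load, the first poll succeeds and the wait costs almost nothing, which is the case the cover text describes.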
diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index 7171bdd..78725d1 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -90,6 +90,26 @@
 #define CONFIG_SYS_NAND_RESET_CNT 200000
 #endif

+/* NAND Read States */
+#define NAND_RSTATE_INIT 0
+#define NAND_RSTATE_LAST (1 << 31) /* Don't pre-request next page */
+
+#define nand_rstate_is_init(x) ((x & ~NAND_RSTATE_LAST) == NAND_RSTATE_INIT)
+#define nand_rstate_is_last(x) (x & NAND_RSTATE_LAST)
+
+/* Is the device a Large Page device? */
+#ifdef CONFIG_NAND_NO_SMALL_PAGE
+#define nand_is_lp_device(mtd) 1
+#else
+#define nand_is_lp_device(mtd) (mtd->writesize > 512)
+#endif
+
+#ifdef CONFIG_NAND_READ_CMD_NO_WAIT
+#define read_waits 0
+#else
+#define read_waits 1
+#endif
+
 /* Define default oob placement schemes for large and small page devices */
 static struct nand_ecclayout nand_oob_8 = {
 	.eccbytes = 3,
@@ -143,6 +163,8 @@ static int nand_do_write_oob(struct mtd_info *mtd, loff_t to,

 static int nand_wait(struct mtd_info *mtd, struct nand_chip *this);

+static int nand_wait_cache_load(struct mtd_info *mtd, struct nand_chip *this);
+
 /*
  * For devices which display every fart in the system on a separate LED. Is
  * compiled away when LED support is disabled.
@@ -385,6 +407,7 @@ static int nand_block_bad(struct mtd_info *mtd, loff_t ofs, int getchip)
 	if (chip->options & NAND_BUSWIDTH_16) {
 		chip->cmdfunc(mtd, NAND_CMD_READOOB, chip->badblockpos & 0xFE,
 			      page);
+		nand_wait_cache_load(mtd, chip);
 		bad = cpu_to_le16(chip->read_word(mtd));
 		if (chip->badblockpos & 0x1)
 			bad >>= 8;
@@ -392,6 +415,7 @@ static int nand_block_bad(struct mtd_info *mtd, loff_t ofs, int getchip)
 			res = 1;
 	} else {
 		chip->cmdfunc(mtd, NAND_CMD_READOOB, chip->badblockpos, page);
+		nand_wait_cache_load(mtd, chip);
 		if (chip->read_byte(mtd) != 0xff)
 			res = 1;
 	}
@@ -534,8 +558,9 @@ void nand_wait_ready(struct mtd_info *mtd)
  * Send command to NAND device. This function is used for small page
  * devices (256/512 Bytes per page)
  */
-static void nand_command(struct mtd_info *mtd, unsigned int command,
-			 int column, int page_addr)
+static void __attribute__((unused)) nand_command(struct mtd_info *mtd,
+						 unsigned int command,
+						 int column, int page_addr)
 {
 	register struct nand_chip *chip = mtd->priv;
 	int ctrl = NAND_CTRL_CLE | NAND_CTRL_CHANGE;
@@ -644,6 +669,8 @@ static void nand_command_lp(struct mtd_info *mtd, unsigned int command,
 {
 	register struct nand_chip *chip = mtd->priv;
 	uint32_t rst_sts_cnt = CONFIG_SYS_NAND_RESET_CNT;
+	void (*cmd_ctrl)(struct mtd_info *mtd, int cmd, unsigned int ctrl);
+	cmd_ctrl = chip->cmd_ctrl;

 	/* Emulate NAND_CMD_READOOB */
 	if (command == NAND_CMD_READOOB) {
@@ -663,21 +690,21 @@ static void nand_command_lp(struct mtd_info *mtd, unsigned int command,
 			/* Adjust columns for 16 bit buswidth */
 			if (chip->options & NAND_BUSWIDTH_16)
 				column >>= 1;
-			chip->cmd_ctrl(mtd, column, ctrl);
+			cmd_ctrl(mtd, column, ctrl);
 			ctrl &= ~NAND_CTRL_CHANGE;
-			chip->cmd_ctrl(mtd, column >> 8, ctrl);
+			cmd_ctrl(mtd, column >> 8, ctrl);
 		}
 		if (page_addr != -1) {
-			chip->cmd_ctrl(mtd, page_addr, ctrl);
-			chip->cmd_ctrl(mtd, page_addr >> 8,
+			cmd_ctrl(mtd, page_addr, ctrl);
+			cmd_ctrl(mtd, page_addr >> 8,
 				       NAND_NCE | NAND_ALE);
 			/* One more address cycle for devices > 128MiB */
 			if (chip->chipsize > (128 << 20))
-				chip->cmd_ctrl(mtd, page_addr >> 16,
-					       NAND_NCE | NAND_ALE);
+				cmd_ctrl(mtd, page_addr >> 16,
+					 NAND_NCE | NAND_ALE);
 		}
 	}
-	chip->cmd_ctrl(mtd, NAND_CMD_NONE, NAND_NCE | NAND_CTRL_CHANGE);
+	cmd_ctrl(mtd, NAND_CMD_NONE, NAND_NCE | NAND_CTRL_CHANGE);

 	/*
 	 * program and erase have their own busy handlers
@@ -710,29 +737,31 @@ static void nand_command_lp(struct mtd_info *mtd, unsigned int command,
 		if (chip->dev_ready)
 			break;
 		udelay(chip->chip_delay);
-		chip->cmd_ctrl(mtd, NAND_CMD_STATUS,
-			       NAND_NCE | NAND_CLE | NAND_CTRL_CHANGE);
-		chip->cmd_ctrl(mtd, NAND_CMD_NONE,
-			       NAND_NCE | NAND_CTRL_CHANGE);
+		cmd_ctrl(mtd, NAND_CMD_STATUS,
+			 NAND_NCE | NAND_CLE | NAND_CTRL_CHANGE);
+		cmd_ctrl(mtd, NAND_CMD_NONE,
+			 NAND_NCE | NAND_CTRL_CHANGE);
 		while (!(chip->read_byte(mtd) & NAND_STATUS_READY) &&
 			(rst_sts_cnt--));
 		return;

 	case NAND_CMD_RNDOUT:
 		/* No ready / busy check necessary */
-		chip->cmd_ctrl(mtd, NAND_CMD_RNDOUTSTART,
-			       NAND_NCE | NAND_CLE | NAND_CTRL_CHANGE);
-		chip->cmd_ctrl(mtd, NAND_CMD_NONE,
-			       NAND_NCE | NAND_CTRL_CHANGE);
+		cmd_ctrl(mtd, NAND_CMD_RNDOUTSTART,
+			 NAND_NCE | NAND_CLE | NAND_CTRL_CHANGE);
+		cmd_ctrl(mtd, NAND_CMD_NONE,
+			 NAND_NCE | NAND_CTRL_CHANGE);
 		return;

 	case NAND_CMD_READ0:
-		chip->cmd_ctrl(mtd, NAND_CMD_READSTART,
-			       NAND_NCE | NAND_CLE | NAND_CTRL_CHANGE);
-		chip->cmd_ctrl(mtd, NAND_CMD_NONE,
-			       NAND_NCE | NAND_CTRL_CHANGE);
+		cmd_ctrl(mtd, NAND_CMD_READSTART,
+			 NAND_NCE | NAND_CLE | NAND_CTRL_CHANGE);
+		cmd_ctrl(mtd, NAND_CMD_NONE,
+			 NAND_NCE | NAND_CTRL_CHANGE);
+		if (!read_waits)
+			return;

-		/* This applies to read commands */
+		/* read falls through if reading should wait */
 	default:
 		/*
 		 * If we don't have access to the busy pin, we apply the given
@@ -866,11 +895,6 @@ static int nand_wait(struct mtd_info *mtd, struct nand_chip *this)
 	reset_timer();

 	while (1) {
-		if (get_timer(0) > timeo) {
-			printf("Timeout!");
-			return 0x01;
-		}
-
 		if (this->dev_ready) {
 			if (this->dev_ready(mtd))
 				break;
@@ -878,7 +902,13 @@ static int nand_wait(struct mtd_info *mtd, struct nand_chip *this)
 			if (this->read_byte(mtd) & NAND_STATUS_READY)
 				break;
 		}
+
+		if (get_timer(0) > timeo) {
+			printf("Timeout!");
+			return 0x01;
+		}
 	}
+
 #ifdef PPCHAMELON_NAND_TIMER_HACK
 	reset_timer();
 	while (get_timer(0) < 10);
@@ -889,19 +919,53 @@ static int nand_wait(struct mtd_info *mtd, struct nand_chip *this)
 #endif

 /**
+ * Wait for cache ready after read request.
+ *
+ * Returns to read state before returning.
+ *
+ * @mtd:	mtd info structure
+ * @chip:	nand chip info structure
+ */
+static int nand_wait_cache_load(struct mtd_info *mtd, struct nand_chip *chip)
+{
+	if (nand_is_lp_device(mtd) && !read_waits) {
+		int state = nand_wait(mtd, chip);
+		chip->cmd_ctrl(mtd, NAND_CMD_READSTART, NAND_CTRL_CLE |
+			       NAND_CTRL_CHANGE);
+		chip->cmd_ctrl(mtd, NAND_CMD_NONE, NAND_NCE |
+			       NAND_CTRL_CHANGE);
+		return state;
+	} else
+		return 0;
+}
+
+/**
  * nand_read_page_raw - [Intern] read raw page data without ecc
  * @mtd:	mtd info structure
  * @chip:	nand chip info structure
  * @buf:	buffer to store read data
  * @page:	page number to read
- *
- * Not for syndrome calculating ecc controllers, which use a special oob layout
+ * @rstate:	read state
  */
 static int nand_read_page_raw(struct mtd_info *mtd, struct nand_chip *chip,
-			      uint8_t *buf, int page)
+			      uint8_t *buf, int page, uint32_t *rstate)
 {
+	if (nand_rstate_is_init(*rstate)) {
+		chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page);
+		(*rstate)++;
+	}
+
+	nand_wait_cache_load(mtd, chip);
+
 	chip->read_buf(mtd, buf, mtd->writesize);
 	chip->read_buf(mtd, chip->oob_poi, mtd->oobsize);
+
+	if (!nand_rstate_is_last(*rstate)) {
+		if (!NAND_CANAUTOINCR(chip) ||
+		    ((page + 1) & NAND_BLOCK_MASK(chip)) == 0)
+			chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page + 1);
+	}
+
 	return 0;
 }

@@ -911,17 +975,25 @@ static int nand_read_page_raw(struct mtd_info *mtd, struct nand_chip *chip,
  * @chip:	nand chip info structure
  * @buf:	buffer to store read data
  * @page:	page number to read
+ * @rstate:	read state
  *
  * We need a special oob layout and handling even when OOB isn't used.
  */
 static int nand_read_page_raw_syndrome(struct mtd_info *mtd, struct nand_chip *chip,
-			      uint8_t *buf, int page)
+			      uint8_t *buf, int page, uint32_t *rstate)
 {
 	int eccsize = chip->ecc.size;
 	int eccbytes = chip->ecc.bytes;
 	uint8_t *oob = chip->oob_poi;
 	int steps, size;

+	if (nand_rstate_is_init(*rstate)) {
+		chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page);
+		(*rstate)++;
+	}
+
+	nand_wait_cache_load(mtd, chip);
+
 	for (steps = chip->ecc.steps; steps > 0; steps--) {
 		chip->read_buf(mtd, buf, eccsize);
 		buf += eccsize;
@@ -944,6 +1016,12 @@ static int nand_read_page_raw_syndrome(struct mtd_info *mtd, struct nand_chip *c
 	if (size)
 		chip->read_buf(mtd, oob, size);

+	if (!nand_rstate_is_last(*rstate)) {
+		if (!NAND_CANAUTOINCR(chip) ||
+		    ((page + 1) & NAND_BLOCK_MASK(chip)) == 0)
+			chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page + 1);
+	}
+
 	return 0;
 }

@@ -953,9 +1031,10 @@ static int nand_read_page_raw_syndrome(struct mtd_info *mtd, struct nand_chip *c
  * @chip:	nand chip info structure
  * @buf:	buffer to store read data
  * @page:	page number to read
+ * @rstate:	read state
  */
 static int nand_read_page_swecc(struct mtd_info *mtd, struct nand_chip *chip,
-				uint8_t *buf, int page)
+				uint8_t *buf, int page, uint32_t *rstate)
 {
 	int i, eccsize = chip->ecc.size;
 	int eccbytes = chip->ecc.bytes;
@@ -965,7 +1044,7 @@ static int nand_read_page_swecc(struct mtd_info *mtd, struct nand_chip *chip,
 	uint8_t *ecc_code = chip->buffers->ecccode;
 	uint32_t *eccpos = chip->ecc.layout->eccpos;

-	chip->ecc.read_page_raw(mtd, chip, buf, page);
+	chip->ecc.read_page_raw(mtd, chip, buf, page, rstate);

 	for (i = 0; eccsteps; eccsteps--, i += eccbytes, p += eccsize)
 		chip->ecc.calculate(mtd, p, &ecc_calc[i]);
@@ -995,8 +1074,12 @@ static int nand_read_page_swecc(struct mtd_info *mtd, struct nand_chip *chip,
  * @data_offs:	offset of requested data within the page
  * @readlen:	data length
  * @bufpoi:	buffer to store read data
+ * @page:	page to read
+ * @rstate:	page read state
  */
-static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip, uint32_t data_offs, uint32_t readlen, uint8_t *bufpoi)
+static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip,
+			     uint32_t data_offs, uint32_t readlen,
+			     uint8_t *bufpoi, int page, uint32_t *rstate)
 {
 	int start_step, end_step, num_steps;
 	uint32_t *eccpos = chip->ecc.layout->eccpos;
@@ -1005,6 +1088,11 @@ static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip, uint3
 	int datafrag_len, eccfrag_len, aligned_len, aligned_pos;
 	int busw = (chip->options & NAND_BUSWIDTH_16) ? 2 : 1;

+	if (nand_rstate_is_init(*rstate)) {
+		chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page);
+		(*rstate)++;
+	}
+
 	/* Column address wihin the page aligned to ECC size (256bytes). */
 	start_step = data_offs / chip->ecc.size;
 	end_step = (data_offs + readlen - 1) / chip->ecc.size;
@@ -1015,6 +1103,9 @@ static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip, uint3
 	eccfrag_len = num_steps * chip->ecc.bytes;

 	data_col_addr = start_step * chip->ecc.size;
+
+	nand_wait_cache_load(mtd, chip);
+
 	/* If we read not a page aligned data */
 	if (data_col_addr != 0)
 		chip->cmdfunc(mtd, NAND_CMD_RNDOUT, data_col_addr, -1);
@@ -1053,6 +1144,12 @@ static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip, uint3
 		chip->read_buf(mtd, &chip->oob_poi[aligned_pos], aligned_len);
 	}

+	if (!nand_rstate_is_last(*rstate)) {
+		if (!NAND_CANAUTOINCR(chip) ||
+		    ((page + 1) & NAND_BLOCK_MASK(chip)) == 0)
+			chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page + 1);
+	}
+
 	for (i = 0; i < eccfrag_len; i++)
 		chip->buffers->ecccode[i] = chip->oob_poi[eccpos[i + start_step * chip->ecc.bytes]];

@@ -1075,11 +1172,12 @@ static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip, uint3
  * @chip:	nand chip info structure
  * @buf:	buffer to store read data
  * @page:	page number to read
+ * @rstate:	page read state
  *
  * Not for syndrome calculating ecc controllers which need a special oob layout
  */
 static int nand_read_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip,
-				uint8_t *buf, int page)
+				uint8_t *buf, int page, uint32_t *rstate)
 {
 	int i, eccsize = chip->ecc.size;
 	int eccbytes = chip->ecc.bytes;
@@ -1089,6 +1187,13 @@ static int nand_read_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip,
 	uint8_t *ecc_code = chip->buffers->ecccode;
 	uint32_t *eccpos = chip->ecc.layout->eccpos;

+	if (nand_rstate_is_init(*rstate)) {
+		chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page);
+		(*rstate)++;
+	}
+
+	nand_wait_cache_load(mtd, chip);
+
 	for (i = 0; eccsteps; eccsteps--, i += eccbytes, p += eccsize) {
 		chip->ecc.hwctl(mtd, NAND_ECC_READ);
 		chip->read_buf(mtd, p, eccsize);
@@ -1096,6 +1201,12 @@ static int nand_read_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip,
 	}
 	chip->read_buf(mtd, chip->oob_poi, mtd->oobsize);

+	if (!nand_rstate_is_last(*rstate)) {
+		if (!NAND_CANAUTOINCR(chip) ||
+		    ((page + 1) & NAND_BLOCK_MASK(chip)) == 0)
+			chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page + 1);
+	}
+
 	for (i = 0; i < chip->ecc.total; i++)
 		ecc_code[i] = chip->oob_poi[eccpos[i]];

@@ -1120,6 +1231,7 @@ static int nand_read_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip,
  * @chip:	nand chip info structure
  * @buf:	buffer to store read data
  * @page:	page number to read
+ * @rstate:	page read state
  *
  * Hardware ECC for large page chips, require OOB to be read first.
  * For this ECC mode, the write_page method is re-used from ECC_HW.
@@ -1129,23 +1241,39 @@ static int nand_read_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip,
  * overwriting the NAND manufacturer bad block markings.
  */
 static int nand_read_page_hwecc_oob_first(struct mtd_info *mtd,
-	struct nand_chip *chip, uint8_t *buf, int page)
+					  struct nand_chip *chip, uint8_t *buf,
+					  int page, uint32_t *rstate)
 {
 	int i, eccsize = chip->ecc.size;
 	int eccbytes = chip->ecc.bytes;
 	int eccsteps = chip->ecc.steps;
 	uint8_t *p = buf;
 	uint8_t *ecc_code = chip->buffers->ecccode;
-	uint32_t *eccpos = chip->ecc.layout->eccpos;
 	uint8_t *ecc_calc = chip->buffers->ecccalc;
+	uint8_t * const oob_poi = chip->oob_poi;
+	uint8_t *ecc_p;
+	uint32_t eccpos;

-	/* Read the OOB area first */
-	chip->cmdfunc(mtd, NAND_CMD_READOOB, 0, page);
-	chip->read_buf(mtd, chip->oob_poi, mtd->oobsize);
-	chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page);
+	if (nand_rstate_is_init(*rstate)) {
+		chip->cmdfunc(mtd, NAND_CMD_READOOB, 0, page);
+		(*rstate)++;
+	}

-	for (i = 0; i < chip->ecc.total; i++)
-		ecc_code[i] = chip->oob_poi[eccpos[i]];
+	nand_wait_cache_load(mtd, chip);
+
+	chip->read_buf(mtd, oob_poi, mtd->oobsize);
+
+	/* Read from start of page */
+	if (nand_is_lp_device(mtd))
+		chip->cmdfunc(mtd, NAND_CMD_RNDOUT, 0, -1);
+	else
+		chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page);
+
+	/* extract ECC codes while we wait */
+	ecc_p = ecc_code;
+	eccpos = chip->ecc.layout->eccpos[0];
+	for (i = chip->ecc.total; i > 0; i--)
+		*ecc_p++ = oob_poi[eccpos++];

 	for (i = 0; eccsteps; eccsteps--, i += eccbytes, p += eccsize) {
 		int stat;
@@ -1154,6 +1282,10 @@ static int nand_read_page_hwecc_oob_first(struct mtd_info *mtd,
 		chip->read_buf(mtd, p, eccsize);
 		chip->ecc.calculate(mtd, p, &ecc_calc[i]);

+		/* kick off new read if next page required */
+		if (eccsteps == 1 && !nand_rstate_is_last(*rstate))
+			chip->cmdfunc(mtd, NAND_CMD_READOOB, 0, page + 1);
+
 		stat = chip->ecc.correct(mtd, p, &ecc_code[i], NULL);
 		if (stat < 0)
 			mtd->ecc_stats.failed++;
@@ -1169,12 +1301,13 @@ static int nand_read_page_hwecc_oob_first(struct mtd_info *mtd,
  * @chip:	nand chip info structure
  * @buf:	buffer to store read data
  * @page:	page number to read
+ * @rstate:	page read state
  *
  * The hw generator calculates the error syndrome automatically. Therefor
  * we need a special oob layout and handling.
  */
 static int nand_read_page_syndrome(struct mtd_info *mtd, struct nand_chip *chip,
-				   uint8_t *buf, int page)
+				   uint8_t *buf, int page, uint32_t *rstate)
 {
 	int i, eccsize = chip->ecc.size;
 	int eccbytes = chip->ecc.bytes;
@@ -1182,6 +1315,13 @@ static int nand_read_page_syndrome(struct mtd_info *mtd, struct nand_chip *chip,
 	uint8_t *p = buf;
 	uint8_t *oob = chip->oob_poi;

+	if (nand_rstate_is_init(*rstate)) {
+		chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page);
+		(*rstate)++;
+	}
+
+	nand_wait_cache_load(mtd, chip);
+
 	for (i = 0; eccsteps; eccsteps--, i += eccbytes, p += eccsize) {
 		int stat;

@@ -1215,6 +1355,12 @@ static int nand_read_page_syndrome(struct mtd_info *mtd, struct nand_chip *chip,
 	if (i)
 		chip->read_buf(mtd, oob, i);

+	if (!nand_rstate_is_last(*rstate)) {
+		if (!NAND_CANAUTOINCR(chip) ||
+		    ((page + 1) & NAND_BLOCK_MASK(chip)) == 0)
+			chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page + 1);
+	}
+
 	return 0;
 }

@@ -1281,12 +1427,11 @@ static int nand_do_read_ops(struct mtd_info *mtd, loff_t from,
 	int chipnr, page, realpage, col, bytes, aligned;
 	struct nand_chip *chip = mtd->priv;
 	struct mtd_ecc_stats stats;
-	int blkcheck = (1 << (chip->phys_erase_shift - chip->page_shift)) - 1;
-	int sndcmd = 1;
 	int ret = 0;
 	uint32_t readlen = ops->len;
 	uint32_t oobreadlen = ops->ooblen;
 	uint8_t *bufpoi, *oob, *buf;
+	uint32_t rstate = NAND_RSTATE_INIT;

 	stats = mtd->ecc_stats;

@@ -1309,20 +1454,22 @@ static int nand_do_read_ops(struct mtd_info *mtd, loff_t from,
 		if (realpage != chip->pagebuf || oob) {
 			bufpoi = aligned ? buf : chip->buffers->databuf;

-			if (likely(sndcmd)) {
-				chip->cmdfunc(mtd, NAND_CMD_READ0, 0x00, page);
-				sndcmd = 0;
-			}
+			/* Last read from this chip? */
+			if (((readlen - bytes) == 0) ||
+			    (((realpage + 1) & (chip->pagemask)) == 0))
+				rstate |= NAND_RSTATE_LAST;

 			/* Now read the page into the buffer */
 			if (unlikely(ops->mode == MTD_OOB_RAW))
 				ret = chip->ecc.read_page_raw(mtd, chip,
-							      bufpoi, page);
+						bufpoi, page, &rstate);
 			else if (!aligned && NAND_SUBPAGE_READ(chip) && !oob)
-				ret = chip->ecc.read_subpage(mtd, chip, col, bytes, bufpoi);
+				ret = chip->ecc.read_subpage(mtd, chip, col,
+							     bytes, bufpoi,
+							     page, &rstate);
 			else
 				ret = chip->ecc.read_page(mtd, chip, bufpoi,
-							  page);
+							  page, &rstate);
 			if (ret < 0)
 				break;

@@ -1385,12 +1532,6 @@ static int nand_do_read_ops(struct mtd_info *mtd, loff_t from,
 			chip->select_chip(mtd, -1);
 			chip->select_chip(mtd, chipnr);
 		}
-
-		/* Check, if the chip supports auto page increment
-		 * or if we have hit a block boundary.
-		 */
-		if (!NAND_CANAUTOINCR(chip) || !(page & blkcheck))
-			sndcmd = 1;
 	}

 	ops->retlen = ops->len - (size_t) readlen;
@@ -1455,6 +1596,7 @@ static int nand_read_oob_std(struct mtd_info *mtd, struct nand_chip *chip,
 {
 	if (sndcmd) {
 		chip->cmdfunc(mtd, NAND_CMD_READOOB, 0, page);
+		nand_wait_cache_load(mtd, chip);
 		sndcmd = 0;
 	}
 	chip->read_buf(mtd, chip->oob_poi, mtd->oobsize);
@@ -1480,13 +1622,16 @@ static int nand_read_oob_syndrome(struct mtd_info *mtd, struct nand_chip *chip,
 	int i, toread, sndrnd = 0, pos;

 	chip->cmdfunc(mtd, NAND_CMD_READ0, chip->ecc.size, page);
+	nand_wait_cache_load(mtd, chip);
 	for (i = 0; i < chip->ecc.steps; i++) {
 		if (sndrnd) {
 			pos = eccsize + i * (eccsize + chunk);
-			if (mtd->writesize > 512)
+			if (nand_is_lp_device(mtd))
 				chip->cmdfunc(mtd, NAND_CMD_RNDOUT, pos, -1);
-			else
+			else {
 				chip->cmdfunc(mtd, NAND_CMD_READ0, pos, page);
+				nand_wait_cache_load(mtd, chip);
+			}
 		} else
 			sndrnd = 1;
 		toread = min_t(int, length, chunk);
@@ -1552,7 +1697,7 @@ static int nand_write_oob_syndrome(struct mtd_info *mtd,
 	chip->cmdfunc(mtd, NAND_CMD_SEQIN, pos, page);
 	for (i = 0; i < steps; i++) {
 		if (sndcmd) {
-			if (mtd->writesize <= 512) {
+			if (!nand_is_lp_device(mtd)) {
 				uint32_t fill = 0xFFFFFFFF;

 				len = eccsize;
@@ -1921,6 +2066,7 @@ static int nand_write_page(struct mtd_info *mtd, struct nand_chip *chip,
 #ifdef CONFIG_MTD_NAND_VERIFY_WRITE
 	/* Send command to read back the data */
 	chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page);
+	nand_wait_cache_load(mtd, chip);

 	if (chip->verify_buf(mtd, buf, mtd->writesize))
 		return -EIO;
@@ -2560,7 +2706,11 @@ static void nand_set_defaults(struct nand_chip *chip, int busw)

 	/* check, if a user supplied command function given */
 	if (chip->cmdfunc == NULL)
+#ifdef CONFIG_NAND_NO_SMALL_PAGE
+		chip->cmdfunc = nand_command_lp;
+#else
 		chip->cmdfunc = nand_command;
+#endif

 	/* check, if a user supplied wait function given */
 	if (chip->waitfunc == NULL)
@@ -2722,7 +2872,7 @@ static struct nand_flash_dev *nand_get_flash_type(struct mtd_info *mtd,
 		chip->chip_shift = ffs((unsigned)(chip->chipsize >> 32)) + 31;

 	/* Set the bad block position */
-	chip->badblockpos = mtd->writesize > 512 ?
+	chip->badblockpos = nand_is_lp_device(mtd) ?
 		NAND_LARGE_BADBLOCK_POS : NAND_SMALL_BADBLOCK_POS;

 	/* Get chip options, preserve non chip based options */
@@ -2746,9 +2896,15 @@ static struct nand_flash_dev *nand_get_flash_type(struct mtd_info *mtd,
 	else
 		chip->erase_cmd = single_erase_cmd;

+#ifdef CONFIG_NAND_NO_SMALL_PAGE
+	if (mtd->writesize <= 512)
+		/* no support for small page devices */
+		return ERR_PTR(-ENODEV);
+#else
 	/* Do not replace user supplied command function ! */
-	if (mtd->writesize > 512 && chip->cmdfunc == nand_command)
+	if (nand_is_lp_device(mtd) && chip->cmdfunc == nand_command)
 		chip->cmdfunc = nand_command_lp;
+#endif

 	MTDDEBUG (MTD_DEBUG_LEVEL0, "NAND device: Manufacturer ID:"
 		  " 0x%02x, Chip ID: 0x%02x (%s %s)\n", *maf_id, dev_id,
diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h
index 94ad0c0..3d632ee 100644
--- a/include/linux/mtd/nand.h
+++ b/include/linux/mtd/nand.h
@@ -181,7 +181,13 @@ typedef enum {
 	(NAND_NO_PADDING | NAND_CACHEPRG | NAND_COPYBACK)

 /* Macros to identify the above */
+#ifdef CONFIG_NAND_NO_SMALL_PAGE
+#define NAND_CANAUTOINCR(chip) 0
+#else
 #define NAND_CANAUTOINCR(chip) (!(chip->options & NAND_NO_AUTOINCR))
+#endif
+#define NAND_BLOCK_MASK(chip) \
+	((1 << (chip->phys_erase_shift - chip->page_shift)) - 1)
 #define NAND_MUST_PAD(chip) (!(chip->options & NAND_NO_PADDING))
 #define NAND_HAS_CACHEPROG(chip) ((chip->options & NAND_CACHEPRG))
 #define NAND_HAS_COPYBACK(chip) ((chip->options & NAND_COPYBACK))
@@ -269,17 +275,20 @@ struct nand_ecc_ctrl {
 					     uint8_t *calc_ecc);
 	int			(*read_page_raw)(struct mtd_info *mtd,
 						 struct nand_chip *chip,
-						 uint8_t *buf, int page);
+						 uint8_t *buf, int page,
+						 uint32_t *rstate);
 	void			(*write_page_raw)(struct mtd_info *mtd,
 						  struct nand_chip *chip,
 						  const uint8_t *buf);
 	int			(*read_page)(struct mtd_info *mtd,
 					     struct nand_chip *chip,
-					     uint8_t *buf, int page);
+					     uint8_t *buf, int page,
+					     uint32_t *rstate);
 	int			(*read_subpage)(struct mtd_info *mtd,
 					     struct nand_chip *chip,
 					     uint32_t offs, uint32_t len,
-					     uint8_t *buf);
+					     uint8_t *buf, int page,
+					     uint32_t *rstate);
 	void			(*write_page)(struct mtd_info *mtd,
 					      struct nand_chip *chip,
 					      const uint8_t *buf);

Dear Nick Thompson,
In message 4B1E71D9.6080802@ge.com you wrote:
> Improve read performance from Large Page NAND devices.
> This patch produces a ~31% improvement in oob_first read speed (on a 300MHz ARM9). The time for a mid-buffer 2k page read is now 293us, 6.99MB/s (was 385us, 5.31MB/s). oob_first is probably the best case improvement.
> Signed-off-by: Nick Thompson nick.thompson@ge.com
I tested this on mpc5121ads (sector size 128 KiB) and sequoia (sector size 16 KiB).
The patch did not apply cleanly against "master" (but was easy to fix).
However, I did not notice any change in speed for a "nand read" at all.
Are these not the right flash types, or is this not the right type of test?
Best regards,
Wolfgang Denk

On 08/12/09 22:06, Wolfgang Denk wrote:
> Dear Nick Thompson,
> In message 4B1E71D9.6080802@ge.com you wrote:
>> Improve read performance from Large Page NAND devices.
>> This patch produces a ~31% improvement in oob_first read speed (on a 300MHz ARM9). The time for a mid-buffer 2k page read is now 293us, 6.99MB/s (was 385us, 5.31MB/s). oob_first is probably the best case improvement.
>> Signed-off-by: Nick Thompson nick.thompson@ge.com
> I tested this on mpc5121ads (sector size 128 KiB) and sequoia (sector size 16 KiB).
> The patch did not apply cleanly against "master" (but was easy to fix).
> However, I did not notice any change in speed for a "nand read" at all.
> Are these not the right flash types, or is this not the right type of test?
> Best regards,
> Wolfgang Denk
Hi Wolfgang,
Thanks for testing this. I think this has worked as I would have hoped. Your test is fine. I use "nand read" and make measurements on a mixed signal 'scope.
On the mpc5121ads, mpc5121_nfc.c is used which defines its own command function. This function always waits for read commands to complete and the config file doesn't define the CONFIG_NAND_READ_CMD_NO_WAIT config (and in this case it can't) so the read should behave more or less identically.
[Possibly the command function could be changed to not wait for read to complete...]
On Sequoia, ndfc.c is used which appears to use the default command functions. If the sector size is 16kBytes I assume this is a small page NAND device and you should see no change at all. If it was a large page device you would also see no difference, unless the config file set CONFIG_NAND_READ_CMD_NO_WAIT, in which case the read_page function for H/W ECC should fetch the next page in parallel with doing ECC correction, leading to an improvement in performance.
All in all, your tests were very successful, and though I left you disappointed this time, from my point of view I'm very pleased :)
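The cases above come down to the two compile-time macros that the patch adds to nand_base.c. The sketch below copies those two macro definitions and wraps them in a hypothetical helper that mirrors the condition tested in nand_wait_cache_load. The struct sketch_mtd type and the helper name are illustrative only and are not part of the patch.

```c
#include <stdint.h>

/* Minimal stand-in for struct mtd_info; only the field the macro reads. */
struct sketch_mtd {
	uint32_t writesize;	/* page size in bytes */
};

/* A board opts in from its config file; remove this line for the old behaviour. */
#define CONFIG_NAND_READ_CMD_NO_WAIT

/* Macro definitions as they appear in the patch. */
#ifdef CONFIG_NAND_NO_SMALL_PAGE
#define nand_is_lp_device(mtd) 1
#else
#define nand_is_lp_device(mtd) (mtd->writesize > 512)
#endif

#ifdef CONFIG_NAND_READ_CMD_NO_WAIT
#define read_waits 0
#else
#define read_waits 1
#endif

/* Hypothetical helper: 1 when the page read has to poll for the cache load. */
int sketch_cache_wait_needed(const struct sketch_mtd *mtd)
{
	return nand_is_lp_device(mtd) && !read_waits;
}
```

With the CONFIG_NAND_READ_CMD_NO_WAIT line removed, read_waits is the constant 1 and the helper returns 0 for every device, which matches the "no change" results on boards that do not set the option.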
Thanks, Nick.

Dear Nick Thompson,
In message 4B1E71D9.6080802@ge.com you wrote:
> Improve read performance from Large Page NAND devices.
> This patch produces a ~31% improvement in oob_first read speed (on a 300MHz ARM9). The time for a mid-buffer 2k page read is now 293us, 6.99MB/s (was 385us, 5.31MB/s). oob_first is probably the best case improvement.
> Signed-off-by: Nick Thompson nick.thompson@ge.com
Also tested on Canyonlands (460EX); here I actually see a slight improvement: 5.5% faster, i.e. the time to read 126 MB from NAND goes down from 28.8 to 27.2 seconds (4.4 -> 4.6 MiB/s).
Tested-by: Wolfgang Denk wd@denx.de
Best regards,
Wolfgang Denk

On 09/12/09 11:02, Wolfgang Denk wrote:
> Dear Nick Thompson,
> In message 4B1E71D9.6080802@ge.com you wrote:
>> Improve read performance from Large Page NAND devices.
>> This patch produces a ~31% improvement in oob_first read speed (on a 300MHz ARM9). The time for a mid-buffer 2k page read is now 293us, 6.99MB/s (was 385us, 5.31MB/s). oob_first is probably the best case improvement.
>> Signed-off-by: Nick Thompson nick.thompson@ge.com
> Also tested on Canyonlands (460EX); here I actually see a slight improvement: 5.5% faster, i.e. the time to read 126 MB from NAND goes down from 28.8 to 27.2 seconds (4.4 -> 4.6 MiB/s).
> Tested-by: Wolfgang Denk wd@denx.de
Hi Wolfgang,
Thanks again.
It seems the raw page data transfer rate is quite low on that board. The patch saves time between page data transfers, so the percentage improvement seen is better if you can get the page data out quicker.
The default read_buf (and write_buf) in nand_base.c are safe, but slow. I put in davinci specific optimised versions (DMA or multibyte read ticks might be used) to double my raw transfer rate. Without that, my measurements would show ~15% improvement only.
In total on da830evm I'm getting a >300% speed improvement. 1.69MB/s changes to 6.99MB/s. (5.31MB/s without this patch.)
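The figures quoted in this thread can be cross-checked with a little arithmetic. The sketch below assumes that a "2k page" means 2048 data bytes and that MB means 10^6 bytes; both are assumptions, although they are consistent with the numbers quoted. The function names are hypothetical.

```c
/* Bytes per microsecond equals 10^6 bytes per second, i.e. MB/s. */
double sketch_mb_per_s(double bytes, double microseconds)
{
	return bytes / microseconds;
}

/* Relative gain of `after` over `before`, in percent. */
double sketch_percent_gain(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

Under those assumptions, sketch_mb_per_s(2048, 293) comes to about 6.99 and sketch_mb_per_s(2048, 385) to about 5.32, while sketch_percent_gain(5.31, 6.99) is a little under 32 and sketch_percent_gain(1.69, 6.99) is roughly 314, in line with the "~31%" and ">300%" figures.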
Nick.

What kind of CPU usage are you seeing? I am seeing throughput of ~1.9MB/s for writes on a Samsung K9WBG08U1M 4GB with 4K page but with high CPU usage.

On 16/01/10 01:51, Josh Gelinske wrote:
> What kind of CPU usage are you seeing? I am seeing throughput of ~1.9MB/s for writes on a Samsung K9WBG08U1M 4GB with 4K page but with high CPU usage.
I'm not sure what you are asking here. There is no idle loop to measure so CPU is running at 100%.
The code does poll the NAND device for data ready, but in my case the NAND loads its cache faster than the CPU can finish sorting out ECC checks (despite ECC H/W assistance). The result is that the CPU never actually waits on the NAND.
This patch (not yet formally submitted) removes this potential waiting time that was present - plus some redundant command sequences due to incorrect function levelling. Most of my performance gain comes from optimising the data transfer (to and) from the NAND - see drivers/mtd/nand/davinci_nand.c - which is already in the mainline.
Apologies if that doesn't answer your question.
Nick.

I hadn't dug into it yet to find the polling behavior but, as you said, with no idle loop the CPU is always busy. It was just different behavior from what I see with the SD, where some CPU time is spent waiting on I/O (maybe because of DMA vs. polling).
I did find a few simple optimizations that gained me a ~1MB/s increase on NAND and SD as well:
- /proc/sys/vm/dirty_expire_centisecs = 2
- /proc/sys/vm/dirty_background_ratio = 5
- Unchecked the Kconfig option "Optimize for size", which was enabled by default.
Thanks for your feedback.
Appareo Systems, LLC 1810 NDSU Research Circle N Fargo, ND 58102
JOSH GELINSKE FIRMWARE ENGINEERING MANAGER
Appareo Systems, LLC 1810 NDSU Research Circle N Fargo, ND 58102 P: (701) 356-2200 Ext 251 C: (701) 306-0899 F: (701) 356-3157
http://www.appareo.com jgelinske@appareo.com