[U-Boot] [PATCH RFC] NAND: Improve read performance from Large Page NAND devices

Improve read performance from Large Page NAND devices.
This patch employs the following concepts to produce a ~37% improvement in oob_first read speed (on a 300MHz ARM9). The time for a mid-buffer 2k page read is now 260us (7.88MB/s), down from 357us (5.74MB/s). oob_first is probably the best-case improvement.
Provides a new config option, CONFIG_SYS_NAND_NO_SMALL_PAGE, to allow building for large page devices only, reducing code size by ~800 bytes. This almost exactly compensates for the code increase due to the other changes.
Removes waiting after read0 and readoob large page cmdfunc calls. This allows the caller to continue processing in parallel to the NAND cache load.
Provides a large page only nand_wait_cache_load function for detection of end of cache loading in the NAND after a (possibly much) earlier read request. This uses chip->dev_ready or NAND STATUS register reads, never falling back to fixed udelay calls.
Provides a persistent state store over a buffer read cycle, to allow "page read" functions to control their own read sequencing.
Requires "page read" functions to initiate read transfers themselves and allows them to pre-request the next page over multiple page reads. NAND cache load can then proceed in parallel with software operation. nand_do_read_ops maintains a flag in the state register indicating if the following page can be requested (and will be read) after the current page read is complete. This flag also prevents pre-requests on autoinc NAND devices.
Uses random read command, when available, in oob_first to avoid a second read0 by rewinding the cache column counter to 0 instead.
oob_first page read uses a "continuous and in order" OOB ECC assumption for a 3x improvement in ECC extraction time (34us->10us). This seems to be valid for all current standard OOB definitions, but may not be valid in all cases. This 10us is still a 3.9% overhead in read times on LP devices.
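Concretely, that assumption lets the per-byte eccpos[] table lookup collapse into a straight copy starting at the first ECC offset; a before/after sketch (index form here, the patch itself uses pointer increments):

	/* Before: one layout table lookup per ECC byte */
	for (i = 0; i < chip->ecc.total; i++)
		ecc_code[i] = chip->oob_poi[eccpos[i]];

	/* After: assume ECC bytes sit contiguously in OOB from eccpos[0] */
	uint32_t pos = chip->ecc.layout->eccpos[0];
	for (i = 0; i < chip->ecc.total; i++)
		ecc_code[i] = chip->oob_poi[pos + i];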
Optimises repeated function call indirections in nand_command_lp by caching chip->cmd_ctrl in a local variable.
Signed-off-by: Nick Thompson <nick.thompson@gefanuc.com>
---
Note: For comments only. This patch is not intended to be committed anywhere. Other files replace some of these functions and require matching modification.
Equivalent changes may also be useful in Linux.
diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c index 426bb95..64e6d19 100644 --- a/drivers/mtd/nand/nand_base.c +++ b/drivers/mtd/nand/nand_base.c @@ -90,6 +90,22 @@ #define CONFIG_SYS_NAND_RESET_CNT 200000 #endif
+/* NAND Read States */ +#define NAND_RSTATE_INIT 0 +#define NAND_RSTATE_READY 1 +#define NAND_RSTATE_NO_REQ (1 << 31) /* Don't pre-request next page */ + +#define nand_rstate_is_init(x) ((x & ~NAND_RSTATE_NO_REQ) == NAND_RSTATE_INIT) +#define nand_next_page_req(x) (!(x & NAND_RSTATE_NO_REQ)) + +#define CONFIG_SYS_NAND_NO_SMALL_PAGE +/* Is the device a Large Page device? */ +#ifdef CONFIG_SYS_NAND_NO_SMALL_PAGE +#define nand_is_lp_device(mtd) (1) +#else +#define nand_is_lp_device(mtd) (mtd->writesize > 512) +#endif + /* Define default oob placement schemes for large and small page devices */ static struct nand_ecclayout nand_oob_8 = { .eccbytes = 3, @@ -143,6 +159,8 @@ static int nand_do_write_oob(struct mtd_info *mtd, loff_t to,
static int nand_wait(struct mtd_info *mtd, struct nand_chip *this);
+static int nand_wait_cache_load(struct mtd_info *mtd, struct nand_chip *this); + /* * For devices which display every fart in the system on a separate LED. Is * compiled away when LED support is disabled. @@ -385,6 +403,8 @@ static int nand_block_bad(struct mtd_info *mtd, loff_t ofs, int getchip) if (chip->options & NAND_BUSWIDTH_16) { chip->cmdfunc(mtd, NAND_CMD_READOOB, chip->badblockpos & 0xFE, page); + if (nand_is_lp_device(mtd)) + nand_wait_cache_load(mtd, chip); bad = cpu_to_le16(chip->read_word(mtd)); if (chip->badblockpos & 0x1) bad >>= 8; @@ -392,6 +412,8 @@ static int nand_block_bad(struct mtd_info *mtd, loff_t ofs, int getchip) res = 1; } else { chip->cmdfunc(mtd, NAND_CMD_READOOB, chip->badblockpos, page); + if (nand_is_lp_device(mtd)) + nand_wait_cache_load(mtd, chip); if (chip->read_byte(mtd) != 0xff) res = 1; } @@ -644,6 +666,8 @@ static void nand_command_lp(struct mtd_info *mtd, unsigned int command, { register struct nand_chip *chip = mtd->priv; uint32_t rst_sts_cnt = CONFIG_SYS_NAND_RESET_CNT; + void (*cmd_ctrl)(struct mtd_info *mtd, int cmd, unsigned int ctrl); + cmd_ctrl = chip->cmd_ctrl;
/* Emulate NAND_CMD_READOOB */ if (command == NAND_CMD_READOOB) { @@ -663,21 +687,21 @@ static void nand_command_lp(struct mtd_info *mtd, unsigned int command, /* Adjust columns for 16 bit buswidth */ if (chip->options & NAND_BUSWIDTH_16) column >>= 1; - chip->cmd_ctrl(mtd, column, ctrl); + cmd_ctrl(mtd, column, ctrl); ctrl &= ~NAND_CTRL_CHANGE; - chip->cmd_ctrl(mtd, column >> 8, ctrl); + cmd_ctrl(mtd, column >> 8, ctrl); } if (page_addr != -1) { - chip->cmd_ctrl(mtd, page_addr, ctrl); - chip->cmd_ctrl(mtd, page_addr >> 8, + cmd_ctrl(mtd, page_addr, ctrl); + cmd_ctrl(mtd, page_addr >> 8, NAND_NCE | NAND_ALE); /* One more address cycle for devices > 128MiB */ if (chip->chipsize > (128 << 20)) - chip->cmd_ctrl(mtd, page_addr >> 16, - NAND_NCE | NAND_ALE); + cmd_ctrl(mtd, page_addr >> 16, + NAND_NCE | NAND_ALE); } } - chip->cmd_ctrl(mtd, NAND_CMD_NONE, NAND_NCE | NAND_CTRL_CHANGE); + cmd_ctrl(mtd, NAND_CMD_NONE, NAND_NCE | NAND_CTRL_CHANGE);
/* * program and erase have their own busy handlers @@ -710,29 +734,28 @@ static void nand_command_lp(struct mtd_info *mtd, unsigned int command, if (chip->dev_ready) break; udelay(chip->chip_delay); - chip->cmd_ctrl(mtd, NAND_CMD_STATUS, - NAND_NCE | NAND_CLE | NAND_CTRL_CHANGE); - chip->cmd_ctrl(mtd, NAND_CMD_NONE, - NAND_NCE | NAND_CTRL_CHANGE); + cmd_ctrl(mtd, NAND_CMD_STATUS, + NAND_NCE | NAND_CLE | NAND_CTRL_CHANGE); + cmd_ctrl(mtd, NAND_CMD_NONE, + NAND_NCE | NAND_CTRL_CHANGE); while (!(chip->read_byte(mtd) & NAND_STATUS_READY) && (rst_sts_cnt--)); return;
case NAND_CMD_RNDOUT: /* No ready / busy check necessary */ - chip->cmd_ctrl(mtd, NAND_CMD_RNDOUTSTART, - NAND_NCE | NAND_CLE | NAND_CTRL_CHANGE); - chip->cmd_ctrl(mtd, NAND_CMD_NONE, - NAND_NCE | NAND_CTRL_CHANGE); + cmd_ctrl(mtd, NAND_CMD_RNDOUTSTART, + NAND_NCE | NAND_CLE | NAND_CTRL_CHANGE); + cmd_ctrl(mtd, NAND_CMD_NONE, + NAND_NCE | NAND_CTRL_CHANGE); return;
case NAND_CMD_READ0: - chip->cmd_ctrl(mtd, NAND_CMD_READSTART, - NAND_NCE | NAND_CLE | NAND_CTRL_CHANGE); - chip->cmd_ctrl(mtd, NAND_CMD_NONE, - NAND_NCE | NAND_CTRL_CHANGE); - - /* This applies to read commands */ + cmd_ctrl(mtd, NAND_CMD_READSTART, + NAND_NCE | NAND_CLE | NAND_CTRL_CHANGE); + cmd_ctrl(mtd, NAND_CMD_NONE, + NAND_NCE | NAND_CTRL_CHANGE); + return; default: /* * If we don't have access to the busy pin, we apply the given @@ -863,6 +886,15 @@ static int nand_wait(struct mtd_info *mtd, struct nand_chip *this) else this->cmdfunc(mtd, NAND_CMD_STATUS, -1, -1);
+ /* The chip might be ready by now, don't lose anymore time */ + if (this->dev_ready) { + if (this->dev_ready(mtd)) + goto ready; + } else { + if (this->read_byte(mtd) & NAND_STATUS_READY) + goto ready; + } + reset_timer();
while (1) { @@ -879,6 +911,8 @@ static int nand_wait(struct mtd_info *mtd, struct nand_chip *this) break; } } + +ready: #ifdef PPCHAMELON_NAND_TIMER_HACK reset_timer(); while (get_timer(0) < 10); @@ -889,16 +923,45 @@ static int nand_wait(struct mtd_info *mtd, struct nand_chip *this) #endif
/** + * Wait for cache ready after read request. + * + * Returns to read state before returning. + * + * @mtd: mtd info structure + * @chip: nand chip info structure + */ +static int nand_wait_cache_load(struct mtd_info *mtd, struct nand_chip *chip) +{ + int state = nand_wait(mtd, chip); + chip->cmd_ctrl(mtd, NAND_CMD_READSTART, NAND_CLE | NAND_CTRL_CLE | + NAND_CTRL_CHANGE); + chip->cmd_ctrl(mtd, NAND_CMD_NONE, NAND_CLE | NAND_CTRL_CHANGE); + return state; +} + +/** * nand_read_page_raw - [Intern] read raw page data without ecc * @mtd: mtd info structure * @chip: nand chip info structure * @buf: buffer to store read data */ static int nand_read_page_raw(struct mtd_info *mtd, struct nand_chip *chip, - uint8_t *buf, int page) + uint8_t *buf, int page, uint32_t *rstate) { + if (nand_rstate_is_init(*rstate)) { + chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page); + *rstate = NAND_RSTATE_READY; + } + + if (nand_is_lp_device(mtd)) + nand_wait_cache_load(mtd, chip); + chip->read_buf(mtd, buf, mtd->writesize); chip->read_buf(mtd, chip->oob_poi, mtd->oobsize); + + if (nand_next_page_req(*rstate)) + chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page+1); + return 0; }
@@ -909,7 +972,7 @@ static int nand_read_page_raw(struct mtd_info *mtd, struct nand_chip *chip, * @buf: buffer to store read data */ static int nand_read_page_swecc(struct mtd_info *mtd, struct nand_chip *chip, - uint8_t *buf, int page) + uint8_t *buf, int page, uint32_t *rstate) { int i, eccsize = chip->ecc.size; int eccbytes = chip->ecc.bytes; @@ -919,7 +982,7 @@ static int nand_read_page_swecc(struct mtd_info *mtd, struct nand_chip *chip, uint8_t *ecc_code = chip->buffers->ecccode; uint32_t *eccpos = chip->ecc.layout->eccpos;
- chip->ecc.read_page_raw(mtd, chip, buf, page); + chip->ecc.read_page_raw(mtd, chip, buf, page, rstate);
for (i = 0; eccsteps; eccsteps--, i += eccbytes, p += eccsize) chip->ecc.calculate(mtd, p, &ecc_calc[i]); @@ -950,7 +1013,9 @@ static int nand_read_page_swecc(struct mtd_info *mtd, struct nand_chip *chip, * @readlen data length * @buf: buffer to store read data */ -static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip, uint32_t data_offs, uint32_t readlen, uint8_t *bufpoi) +static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip, + uint32_t data_offs, uint32_t readlen, + uint8_t *bufpoi, int page, uint32_t *rstate) { int start_step, end_step, num_steps; uint32_t *eccpos = chip->ecc.layout->eccpos; @@ -959,6 +1024,11 @@ static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip, uint3 int datafrag_len, eccfrag_len, aligned_len, aligned_pos; int busw = (chip->options & NAND_BUSWIDTH_16) ? 2 : 1;
+ if (nand_rstate_is_init(*rstate)) { + chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page); + *rstate = NAND_RSTATE_READY; + } + /* Column address wihin the page aligned to ECC size (256bytes). */ start_step = data_offs / chip->ecc.size; end_step = (data_offs + readlen - 1) / chip->ecc.size; @@ -969,6 +1039,10 @@ static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip, uint3 eccfrag_len = num_steps * chip->ecc.bytes;
data_col_addr = start_step * chip->ecc.size; + + if (nand_is_lp_device(mtd)) + nand_wait_cache_load(mtd, chip); + /* If we read not a page aligned data */ if (data_col_addr != 0) chip->cmdfunc(mtd, NAND_CMD_RNDOUT, data_col_addr, -1); @@ -1007,6 +1081,9 @@ static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip, uint3 chip->read_buf(mtd, &chip->oob_poi[aligned_pos], aligned_len); }
+ if (nand_next_page_req(*rstate)) + chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page+1); + for (i = 0; i < eccfrag_len; i++) chip->buffers->ecccode[i] = chip->oob_poi[eccpos[i + start_step * chip->ecc.bytes]];
@@ -1032,7 +1109,7 @@ static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip, uint3 * Not for syndrome calculating ecc controllers which need a special oob layout */ static int nand_read_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip, - uint8_t *buf, int page) + uint8_t *buf, int page, uint32_t *rstate) { int i, eccsize = chip->ecc.size; int eccbytes = chip->ecc.bytes; @@ -1042,6 +1119,14 @@ static int nand_read_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip, uint8_t *ecc_code = chip->buffers->ecccode; uint32_t *eccpos = chip->ecc.layout->eccpos;
+ if (nand_rstate_is_init(*rstate)) { + chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page); + *rstate = NAND_RSTATE_READY; + } + + if (nand_is_lp_device(mtd)) + nand_wait_cache_load(mtd, chip); + for (i = 0; eccsteps; eccsteps--, i += eccbytes, p += eccsize) { chip->ecc.hwctl(mtd, NAND_ECC_READ); chip->read_buf(mtd, p, eccsize); @@ -1049,6 +1134,9 @@ static int nand_read_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip, } chip->read_buf(mtd, chip->oob_poi, mtd->oobsize);
+ if (nand_next_page_req(*rstate)) + chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page+1); + for (i = 0; i < chip->ecc.total; i++) ecc_code[i] = chip->oob_poi[eccpos[i]];
@@ -1081,29 +1169,52 @@ static int nand_read_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip, * overwriting the NAND manufacturer bad block markings. */ static int nand_read_page_hwecc_oob_first(struct mtd_info *mtd, - struct nand_chip *chip, uint8_t *buf, int page) + struct nand_chip *chip, uint8_t *buf, + int page, uint32_t *rstate) { int i, eccsize = chip->ecc.size; int eccbytes = chip->ecc.bytes; int eccsteps = chip->ecc.steps; uint8_t *p = buf; uint8_t *ecc_code = chip->buffers->ecccode; - uint32_t *eccpos = chip->ecc.layout->eccpos; uint8_t *ecc_calc = chip->buffers->ecccalc; + uint8_t * const oob_poi = chip->oob_poi; + uint8_t *ecc_p; + uint32_t eccpos; + const int lp_device = nand_is_lp_device(mtd);
- /* Read the OOB area first */ - chip->cmdfunc(mtd, NAND_CMD_READOOB, 0, page); - chip->read_buf(mtd, chip->oob_poi, mtd->oobsize); - chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page); + if (nand_rstate_is_init(*rstate)) { + chip->cmdfunc(mtd, NAND_CMD_READOOB, 0, page); + *rstate = NAND_RSTATE_READY; + }
- for (i = 0; i < chip->ecc.total; i++) - ecc_code[i] = chip->oob_poi[eccpos[i]]; + if (lp_device) + nand_wait_cache_load(mtd, chip); + + chip->read_buf(mtd, oob_poi, mtd->oobsize); + + /* Read from start of page */ + if (lp_device) + chip->cmdfunc(mtd, NAND_CMD_RNDOUT, 0, -1); + else + chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page); + + /* extract ECC codes while we wait */ + ecc_p = ecc_code; + eccpos = chip->ecc.layout->eccpos[0]; + for (i = chip->ecc.total; i > 0; i--) + *ecc_p++ = oob_poi[eccpos++];
for (i = 0; eccsteps; eccsteps--, i += eccbytes, p += eccsize) { int stat;
chip->ecc.hwctl(mtd, NAND_ECC_READ); chip->read_buf(mtd, p, eccsize); + + /* kick off new read if next page required */ + if (eccsteps == 1 && nand_next_page_req(*rstate)) + chip->cmdfunc(mtd, NAND_CMD_READOOB, 0, page+1); + chip->ecc.calculate(mtd, p, &ecc_calc[i]);
stat = chip->ecc.correct(mtd, p, &ecc_code[i], NULL); @@ -1125,7 +1236,7 @@ static int nand_read_page_hwecc_oob_first(struct mtd_info *mtd, * we need a special oob layout and handling. */ static int nand_read_page_syndrome(struct mtd_info *mtd, struct nand_chip *chip, - uint8_t *buf, int page) + uint8_t *buf, int page, uint32_t *rstate) { int i, eccsize = chip->ecc.size; int eccbytes = chip->ecc.bytes; @@ -1133,6 +1244,14 @@ static int nand_read_page_syndrome(struct mtd_info *mtd, struct nand_chip *chip, uint8_t *p = buf; uint8_t *oob = chip->oob_poi;
+ if (nand_rstate_is_init(*rstate)) { + chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page); + *rstate = NAND_RSTATE_READY; + } + + if (nand_is_lp_device(mtd)) + nand_wait_cache_load(mtd, chip); + for (i = 0; eccsteps; eccsteps--, i += eccbytes, p += eccsize) { int stat;
@@ -1166,6 +1285,9 @@ static int nand_read_page_syndrome(struct mtd_info *mtd, struct nand_chip *chip, if (i) chip->read_buf(mtd, oob, i);
+ if (nand_next_page_req(*rstate)) + chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page+1); + return 0; }
@@ -1233,11 +1355,11 @@ static int nand_do_read_ops(struct mtd_info *mtd, loff_t from, struct nand_chip *chip = mtd->priv; struct mtd_ecc_stats stats; int blkcheck = (1 << (chip->phys_erase_shift - chip->page_shift)) - 1; - int sndcmd = 1; int ret = 0; uint32_t readlen = ops->len; uint32_t oobreadlen = ops->ooblen; uint8_t *bufpoi, *oob, *buf; + uint32_t rstate = NAND_RSTATE_INIT;
stats = mtd->ecc_stats;
@@ -1260,20 +1382,33 @@ static int nand_do_read_ops(struct mtd_info *mtd, loff_t from, if (realpage != chip->pagebuf || oob) { bufpoi = aligned ? buf : chip->buffers->databuf;
- if (likely(sndcmd)) { - chip->cmdfunc(mtd, NAND_CMD_READ0, 0x00, page); - sndcmd = 0; + /* + * Autoinc devices automatically fetch the next page, + * but can't increment over a block boundary + */ + if (NAND_CANAUTOINCR(chip)) { + if (((page + 1) & blkcheck) != 0) + rstate |= NAND_RSTATE_NO_REQ; + else + rstate &= ~NAND_RSTATE_NO_REQ; }
+ /* Last read from this chip? */ + if (((readlen - bytes) == 0) || + (((realpage + 1) & (chip->pagemask)) == 0)) + rstate |= NAND_RSTATE_NO_REQ; + /* Now read the page into the buffer */ if (unlikely(ops->mode == MTD_OOB_RAW)) ret = chip->ecc.read_page_raw(mtd, chip, - bufpoi, page); + bufpoi, page, &rstate); else if (!aligned && NAND_SUBPAGE_READ(chip) && !oob) - ret = chip->ecc.read_subpage(mtd, chip, col, bytes, bufpoi); + ret = chip->ecc.read_subpage(mtd, chip, col, + bytes, bufpoi, + page, &rstate); else ret = chip->ecc.read_page(mtd, chip, bufpoi, - page); + page, &rstate); if (ret < 0) break;
@@ -1336,12 +1471,6 @@ static int nand_do_read_ops(struct mtd_info *mtd, loff_t from, chip->select_chip(mtd, -1); chip->select_chip(mtd, chipnr); } - - /* Check, if the chip supports auto page increment - * or if we have hit a block boundary. - */ - if (!NAND_CANAUTOINCR(chip) || !(page & blkcheck)) - sndcmd = 1; }
ops->retlen = ops->len - (size_t) readlen; @@ -1406,6 +1535,8 @@ static int nand_read_oob_std(struct mtd_info *mtd, struct nand_chip *chip, { if (sndcmd) { chip->cmdfunc(mtd, NAND_CMD_READOOB, 0, page); + if (nand_is_lp_device(mtd)) + nand_wait_cache_load(mtd, chip); sndcmd = 0; } chip->read_buf(mtd, chip->oob_poi, mtd->oobsize); @@ -1434,12 +1565,15 @@ static int nand_read_oob_syndrome(struct mtd_info *mtd, struct nand_chip *chip, for (i = 0; i < chip->ecc.steps; i++) { if (sndrnd) { pos = eccsize + i * (eccsize + chunk); - if (mtd->writesize > 512) + if (nand_is_lp_device(mtd)) chip->cmdfunc(mtd, NAND_CMD_RNDOUT, pos, -1); else chip->cmdfunc(mtd, NAND_CMD_READ0, pos, page); - } else + } else { sndrnd = 1; + if (nand_is_lp_device(mtd)) + nand_wait_cache_load(mtd, chip); + } toread = min_t(int, length, chunk); chip->read_buf(mtd, bufpoi, toread); bufpoi += toread; @@ -1503,7 +1637,7 @@ static int nand_write_oob_syndrome(struct mtd_info *mtd, chip->cmdfunc(mtd, NAND_CMD_SEQIN, pos, page); for (i = 0; i < steps; i++) { if (sndcmd) { - if (mtd->writesize <= 512) { + if (!nand_is_lp_device(mtd)) { uint32_t fill = 0xFFFFFFFF;
len = eccsize; @@ -1832,6 +1966,8 @@ static int nand_write_page(struct mtd_info *mtd, struct nand_chip *chip, #ifdef CONFIG_MTD_NAND_VERIFY_WRITE /* Send command to read back the data */ chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page); + if (nand_is_lp_device(mtd)) + nand_wait_cache_load(mtd, chip);
if (chip->verify_buf(mtd, buf, mtd->writesize)) return -EIO; @@ -2468,7 +2604,11 @@ static void nand_set_defaults(struct nand_chip *chip, int busw)
/* check, if a user supplied command function given */ if (chip->cmdfunc == NULL) +#ifdef CONFIG_SYS_NAND_NO_SMALL_PAGE + chip->cmdfunc = nand_command_lp; +#else chip->cmdfunc = nand_command; +#endif
/* check, if a user supplied wait function given */ if (chip->waitfunc == NULL) @@ -2627,7 +2767,7 @@ static struct nand_flash_dev *nand_get_flash_type(struct mtd_info *mtd, chip->chip_shift = ffs(chip->chipsize) - 1;
/* Set the bad block position */ - chip->badblockpos = mtd->writesize > 512 ? + chip->badblockpos = nand_is_lp_device(mtd) ? NAND_LARGE_BADBLOCK_POS : NAND_SMALL_BADBLOCK_POS;
/* Get chip options, preserve non chip based options */ @@ -2651,9 +2791,14 @@ static struct nand_flash_dev *nand_get_flash_type(struct mtd_info *mtd, else chip->erase_cmd = single_erase_cmd;
+#ifdef CONFIG_SYS_NAND_NO_SMALL_PAGE + if (!nand_is_lp_device(mtd)) + return ERR_PTR(-ENODEV); +#else /* Do not replace user supplied command function ! */ - if (mtd->writesize > 512 && chip->cmdfunc == nand_command) + if (nand_is_lp_device(mtd) && chip->cmdfunc == nand_command) chip->cmdfunc = nand_command_lp; +#endif
MTDDEBUG (MTD_DEBUG_LEVEL0, "NAND device: Manufacturer ID:" " 0x%02x, Chip ID: 0x%02x (%s %s)\n", *maf_id, dev_id, diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h index cb7c19a..85b7c3c 100644 --- a/include/linux/mtd/nand.h +++ b/include/linux/mtd/nand.h @@ -269,17 +269,20 @@ struct nand_ecc_ctrl { uint8_t *calc_ecc); int (*read_page_raw)(struct mtd_info *mtd, struct nand_chip *chip, - uint8_t *buf, int page); + uint8_t *buf, int page, + uint32_t *rstate); void (*write_page_raw)(struct mtd_info *mtd, struct nand_chip *chip, const uint8_t *buf); int (*read_page)(struct mtd_info *mtd, struct nand_chip *chip, - uint8_t *buf, int page); + uint8_t *buf, int page, + uint32_t *rstate); int (*read_subpage)(struct mtd_info *mtd, struct nand_chip *chip, uint32_t offs, uint32_t len, - uint8_t *buf); + uint8_t *buf, int page, + uint32_t *rstate); void (*write_page)(struct mtd_info *mtd, struct nand_chip *chip, const uint8_t *buf);

Nick Thompson wrote:
Improve read performance from Large Page NAND devices.
This patch employs the following concepts to produce a ~37% improvement in oob_first read speed (on a 300MHz ARM9). The time for a mid-buffer 2k page read is now 260us (7.88MB/s), down from 357us (5.74MB/s). oob_first is probably the best-case improvement.
Provides a new config option, CONFIG_SYS_NAND_NO_SMALL_PAGE, to allow building for large page devices only, reducing code size by ~800 bytes. This almost exactly compensates for the code increase due to the other changes.
Could we make it more orthogonal? I.e. CONFIG_NAND_512B_PAGE, CONFIG_NAND_2K_PAGE, CONFIG_NAND_4K_PAGE? As is, it does nothing to keep things from growing for small-page-only boards.
As it would determine what support is present rather than what the hardware actually is, I don't think it would go in CONFIG_SYS.
+	/* The chip might be ready by now, don't lose anymore time */
+	if (this->dev_ready) {
+		if (this->dev_ready(mtd))
+			goto ready;
+	} else {
+		if (this->read_byte(mtd) & NAND_STATUS_READY)
+			goto ready;
+	}
Does it really take a noticeable amount of time to do reset_timer() and get_timer() once?
+ * Wait for cache ready after read request.
+ *
+ * Returns to read state before returning.
+ *
+ * @mtd: mtd info structure
+ * @chip: nand chip info structure
+ */
+static int nand_wait_cache_load(struct mtd_info *mtd, struct nand_chip *chip)
+{
+	int state = nand_wait(mtd, chip);
+	chip->cmd_ctrl(mtd, NAND_CMD_READSTART, NAND_CLE | NAND_CTRL_CLE |
+		       NAND_CTRL_CHANGE);
NAND_CTRL_CLE includes NAND_CLE.
Why nand_wait() before READSTART? The existing nand_command_lp() doesn't do this AFAICT.
This change will break drivers that support large page and use the default read_page functions, but do not implement cmd_ctrl (they replace cmdfunc instead). This includes fsl_elbc_nand, mxc_nand, and mpc5121_nfc. While I'd like to move them to implementing their own read_page-type functions instead of cmdfunc, is there any way to make it a smoother transition?
+	chip->cmd_ctrl(mtd, NAND_CMD_NONE, NAND_CLE | NAND_CTRL_CHANGE);
Shouldn't this be NAND_NCE | NAND_CTRL_CHANGE? Don't we want to drop CLE here?
+	if (nand_next_page_req(*rstate))
+		chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page+1);
Spaces around binary operators.
diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h
index cb7c19a..85b7c3c 100644
--- a/include/linux/mtd/nand.h
+++ b/include/linux/mtd/nand.h
@@ -269,17 +269,20 @@ struct nand_ecc_ctrl {
 			uint8_t *calc_ecc);
 	int	(*read_page_raw)(struct mtd_info *mtd, struct nand_chip *chip,
-			uint8_t *buf, int page);
+			uint8_t *buf, int page,
+			uint32_t *rstate);
 	void	(*write_page_raw)(struct mtd_info *mtd, struct nand_chip *chip,
 			const uint8_t *buf);
 	int	(*read_page)(struct mtd_info *mtd, struct nand_chip *chip,
-			uint8_t *buf, int page);
+			uint8_t *buf, int page,
+			uint32_t *rstate);
 	int	(*read_subpage)(struct mtd_info *mtd, struct nand_chip *chip,
 			uint32_t offs, uint32_t len,
-			uint8_t *buf);
+			uint8_t *buf, int page,
+			uint32_t *rstate);
Does rstate really need to be a parameter, or could it be part of the mtd struct? I'd really like nand_do_read_ops() to not have to know about any of this, and have it be internal to the read_page functions -- other than perhaps clearing the value on exit, or some other way to let read_page know that its context has changed.
If we need to communicate to the read_page function whether this is the last page, then that can be a separate flag that isn't tied up with what the hardware is capable of, or whether a boundary is being spanned.
-Scott

On 01/12/09 00:55, Scott Wood wrote:
Nick Thompson wrote:
Improve read performance from Large Page NAND devices.
This patch employs the following concepts to produce a ~37% improvement in oob_first read speed (on a 300MHz ARM9). The time for a mid-buffer 2k page read is now 260us (7.88MB/s), down from 357us (5.74MB/s). oob_first is probably the best-case improvement.
Provides a new config option, CONFIG_SYS_NAND_NO_SMALL_PAGE, to allow building for large page devices only, reducing code size by ~800 bytes. This almost exactly compensates for the code increase due to the other changes.
Could we make it more orthogonal? I.e. CONFIG_NAND_512B_PAGE, CONFIG_NAND_2K_PAGE, CONFIG_NAND_4K_PAGE? As is, it does nothing to keep things from growing for small-page-only boards.
I guess that would be possible. The proposed usage is to determine whether the incompatible small page command set is supported in addition to the ONFI large page command set. So it excludes nand_command, which should now have an unused attribute, and optimises away conditionals in many other functions.
It's not currently there to decide what page sizes are supported.
I could put in a CONFIG_NAND_NO_LARGE_PAGE as well, but since small page devices are EOL now I was only looking to make it easier to remove legacy code (a bit like the Museum IDs in nand_ids.c).
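For illustration only, the two option styles could feed the same helper macro, roughly like this (hypothetical sketch; neither the option names nor this exact layering has been agreed here):

#if defined(CONFIG_NAND_NO_LARGE_PAGE)
#define nand_is_lp_device(mtd)	(0)	/* small page command set only */
#elif defined(CONFIG_SYS_NAND_NO_SMALL_PAGE)
#define nand_is_lp_device(mtd)	(1)	/* large page command set only */
#else
#define nand_is_lp_device(mtd)	((mtd)->writesize > 512)
#endif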
As it would determine what support is present rather than what the hardware actually is, I don't think it would go in CONFIG_SYS.
Right, that's fine.
+	/* The chip might be ready by now, don't lose anymore time */
+	if (this->dev_ready) {
+		if (this->dev_ready(mtd))
+			goto ready;
+	} else {
+		if (this->read_byte(mtd) & NAND_STATUS_READY)
+			goto ready;
+	}
Does it really take a noticeable amount of time to do reset_timer() and get_timer() once?
On ARM the timer functions use divides to adjust the timer values. It takes several microseconds longer to spot the ready status when the chip is in fact already ready. It's not major, but it is easily measurable.
+ * Wait for cache ready after read request.
+ *
+ * Returns to read state before returning.
+ *
+ * @mtd: mtd info structure
+ * @chip: nand chip info structure
+ */
+static int nand_wait_cache_load(struct mtd_info *mtd, struct nand_chip *chip)
+{
+	int state = nand_wait(mtd, chip);
+	chip->cmd_ctrl(mtd, NAND_CMD_READSTART, NAND_CLE | NAND_CTRL_CLE |
+		       NAND_CTRL_CHANGE);
NAND_CTRL_CLE includes NAND_CLE.
Yes, will fix.
Why nand_wait() before READSTART? The existing nand_command_lp() doesn't do this AFAICT.
In fact I'm only after the functionality in nand_wait. nand_wait always sends a "read status" command. To get out of read status state you have to send a new command and, if waiting for a read to complete, the obvious command is read start as this puts the chip back into read state.
The full sequence is "read", "read start", (chip goes busy), "read status", (polling status reads...), (chip goes ready), "read start", (data reads...).
Currently nand_command_lp always uses the "ready busy" read function, or a timer if the ready/busy pin is not accessible. Using a timer is out of the question here, since (we are hoping) significant time has already elapsed.
This change will break drivers that support large page and use the default read_page functions, but do not implement cmd_ctrl (they replace cmdfunc instead). This includes fsl_elbc_nand, mxc_nand, and mpc5121_nfc. While I'd like to move them to implementing their own read_page-type functions instead of cmdfunc, is there any way to make it a smoother transition?
Yes, as it stands they would need modifying simultaneously and I have no way to test such a change myself. The only required change in cmdfunc is not to wait after a read0 request. You may be in a better position to decide if this has wider repercussions, but I will take a look at the above drivers as well. [This is the main reason I made this an RFC].
+	chip->cmd_ctrl(mtd, NAND_CMD_NONE, NAND_CLE | NAND_CTRL_CHANGE);
Shouldn't this be NAND_NCE | NAND_CTRL_CHANGE? Don't we want to drop CLE here?
You are right, will fix.
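For reference, with both of those review fixes folded in the helper would read roughly like this (untested sketch):

static int nand_wait_cache_load(struct mtd_info *mtd, struct nand_chip *chip)
{
	/* nand_wait() polls tR completion but leaves the chip in status mode */
	int state = nand_wait(mtd, chip);

	/* READSTART puts the chip back into read mode for the data phase */
	chip->cmd_ctrl(mtd, NAND_CMD_READSTART,
		       NAND_CTRL_CLE | NAND_CTRL_CHANGE);
	chip->cmd_ctrl(mtd, NAND_CMD_NONE, NAND_NCE | NAND_CTRL_CHANGE);

	return state;
}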
+	if (nand_next_page_req(*rstate))
+		chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page+1);
Spaces around binary operators.
Okay.
diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h
index cb7c19a..85b7c3c 100644
--- a/include/linux/mtd/nand.h
+++ b/include/linux/mtd/nand.h
@@ -269,17 +269,20 @@ struct nand_ecc_ctrl {
 			uint8_t *calc_ecc);
 	int	(*read_page_raw)(struct mtd_info *mtd, struct nand_chip *chip,
-			uint8_t *buf, int page);
+			uint8_t *buf, int page,
+			uint32_t *rstate);
 	void	(*write_page_raw)(struct mtd_info *mtd, struct nand_chip *chip,
 			const uint8_t *buf);
 	int	(*read_page)(struct mtd_info *mtd, struct nand_chip *chip,
-			uint8_t *buf, int page);
+			uint8_t *buf, int page,
+			uint32_t *rstate);
 	int	(*read_subpage)(struct mtd_info *mtd, struct nand_chip *chip,
 			uint32_t offs, uint32_t len,
-			uint8_t *buf);
+			uint8_t *buf, int page,
+			uint32_t *rstate);
Does rstate really need to be a parameter, or could it be part of the mtd struct? I'd really like nand_do_read_ops() to not have to know about any of this, and have it be internal to the read_page functions -- other than perhaps clearing the value on exit, or some other way to let read_page know that its context has changed.
If we need to communicate to the read_page function whether this is the last page, then that can be a separate flag that isn't tied up with what the hardware is capable of, or whether a boundary is being spanned.
Only do_read_ops knows if this is the last page read, so that does need to be passed to read page. I didn't want to add such low level stuff into mtd struct as it is very transitory information and bloats the struct.
The page read function also needs to know if this is the first page read. Currently rstate's main use is this first (INIT) and/or last (NO_REQ) state.
The page read functions can handle the boundaries and autoinc status, but only at the expense of repeating the same tests in each read page function. I'm okay with that, if it's acceptable. It will make the code slightly larger, but I don't really like the way I'm forcing a "last page" state to avoid sending reads in the autoinc case as it is masking the true purpose of the flag.
Thanks, Nick.

On 01/12/09 10:13, Nick Thompson wrote:
On 01/12/09 00:55, Scott Wood wrote:
This change will break drivers that support large page and use the default read_page functions, but do not implement cmd_ctrl (they replace cmdfunc instead). This includes fsl_elbc_nand, mxc_nand, and mpc5121_nfc. While I'd like to move them to implementing their own read_page-type functions instead of cmdfunc, is there any way to make it a smoother transition?
Yes, as it stands they would need modifying simultaneously and I have no way to test such a change myself. The only required change in cmdfunc is not to wait after a read0 request. You may be in a better position to decide if this has wider repercussions, but I will take a look at the above drivers as well. [This is the main reason I made this an RFC].
How about if nand_wait_cache_load was replaceable (by a no-op in your case) and the pre-fetch optimisation could be disabled by setting rstate to (INIT | NO_REQ) on every page read function call, under #ifdef CONFIG_NAND_NO_PREFETCH_READS?
This leaves a problem with NAND_CMD_RNDOUT which is used by oob_first page reads but not supported by fsl_elbc_cmdfunc. I expect you don't use oob_first though..?
I believe this would allow you to restore the original sequences and keep you going until you can define your own page read functions.
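Very roughly, that fallback could amount to something like this (purely illustrative sketch; the config name is only the one proposed above and nand_reset_rstate is a placeholder name):

#ifdef CONFIG_NAND_NO_PREFETCH_READS
/*
 * The controller's own cmdfunc already waits for the page to load, so the
 * generic wait becomes a no-op (the real helper would sit in the #else).
 */
static int nand_wait_cache_load(struct mtd_info *mtd, struct nand_chip *chip)
{
	return 0;
}

/* Force every page read back to "issue READ0 now, never pre-fetch" */
#define nand_reset_rstate(rstate) \
	(*(rstate) = NAND_RSTATE_INIT | NAND_RSTATE_NO_REQ)
#else
#define nand_reset_rstate(rstate)	do { } while (0)
#endif

nand_reset_rstate() would then be called at the top of each page read function, restoring the original one-READ0-per-page behaviour on those controllers.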
[BTW these changes applied quite cleanly to my Linux tree and give similar performance gains there as well. If we can reach agreement here, I will make patches for Linux as well. Unfortunately, my tree is 2.6.18 with backported NAND support from 2.6.32rc1, but I can deal with that.]
Nick.

Nick Thompson wrote:
On 01/12/09 10:13, Nick Thompson wrote:
On 01/12/09 00:55, Scott Wood wrote:
This change will break drivers that support large page and use the default read_page functions, but do not implement cmd_ctrl (they replace cmdfunc instead). This includes fsl_elbc_nand, mxc_nand, and mpc5121_nfc. While I'd like to move them to implementing their own read_page-type functions instead of cmdfunc, is there any way to make it a smoother transition?
Yes, as it stands they would need modifying simultaneously and I have no way to test such a change myself. The only required change in cmdfunc is not to wait after a read0 request. You may be in a better position to decide if this has wider repercussions, but I will take a look at the above drivers as well. [This is the main reason I made this an RFC].
How about if nand_wait_cache_load was replaceable (by a no-op in your case)
Or it could be off by default, and enabled only on those platforms where it works and is beneficial.
and the pre-fetch optimisation could be disabled by setting rstate to (INIT | NO_REQ) on every page read function call, under #ifdef CONFIG_NAND_NO_PREFETCH_READS?
This leaves a problem with NAND_CMD_RNDOUT which is used by oob_first page reads but not supported by fsl_elbc_cmdfunc. I expect you don't use oob_first though..?
Right. It's also used on swecc, but we don't use that either.
I believe this would allow you to restore the original sequences and keep you going until you can define your own page read functions.
[BTW these changes applied quite cleanly to my Linux tree and give similar performance gains there as well. If we can reach agreement here, I will make patches for Linux as well. Unfortunately, my tree is 2.6.18 with backported NAND support from 2.6.32rc1, but I can deal with that.]
If possible, it would be nice to run these patches by the Linux list now, so we get their feedback earlier rather than later.
-Scott

Nick Thompson wrote:
I could put in a CONFIG_NAND_NO_LARGE_PAGE as well, but since small page devices are EOL now I was only looking to make it easier to remove legacy code (a bit like the Museum IDs in nand_ids.c).
I don't think small page support is going away any time soon.
Does it really take a noticeable amount of time to do reset_timer() and get_timer() once?
On ARM the timer functions use divides to adjust the timer values.
Yay mandatory 1000 Hz clock rate. :-P
Does ARM have a 64-bit multiply that can be used to speed up division-by-constant? If not, can the division be pulled out of reset_timer by doing (new - old) / rate instead of new/rate - old/rate?
I assume it's OK for reset_timer to reset completely to zero, including the remainder.
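In code terms the suggestion amounts to roughly this (purely illustrative; read_raw_timer() and TICKS_PER_MS stand in for whatever the board's timer code actually provides):

static unsigned long timer_start;	/* raw hardware ticks at reset */

void reset_timer(void)
{
	timer_start = read_raw_timer();	/* no divide here */
}

ulong get_timer(ulong base)
{
	/* one divide on the delta, instead of a divide per raw sample */
	return (read_raw_timer() - timer_start) / TICKS_PER_MS - base;
}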
It takes several microseconds longer to spot the ready status when the chip is in fact already ready. It's not major, but it is easily measurable.
Wow, I know divide is expensive, but several microseconds?
Why nand_wait() before READSTART? The existing nand_command_lp() doesn't do this AFAICT.
In fact I'm only after the functionality in nand_wait. nand_wait always sends a "read status" command. To get out of read status state you have to send a new command and, if waiting for a read to complete, the obvious command is read start as this puts the chip back into read state.
The full sequence is "read", "read start", (chip goes busy), "read status", (polling status reads...), (chip goes ready), "read start", (data reads...).
Ah, I see. So this is specifically to help the case where you don't have access to the busy pin? I'm wondering if it will slow things down when the busy pin is hooked up, versus not having to issue the extra commands and read the status.
If we need to communicate to the read_page function whether this is the last page, then that can be a separate flag that isn't tied up with what the hardware is capable of, or whether a boundary is being spanned.
Only do_read_ops knows if this is the last page read, so that does need to be passed to read page.
Right, the "if" was wondering what the impact would be of just unconditionally fetching the next page.
I didn't want to add such low level stuff into mtd struct as it is very transitory information and bloats the struct.
It's internal state of functions supplied by the generic code; that seems the place for such things.
Though I don't really care that much, as long as nand_do_read_ops() doesn't have to manage the state beyond saying "start" and "end".
The page read function also needs to know if this is the first page read.
Clearing the state on entry to the read ops function would allow that, with only minor dictation of how read_page uses the field.
The page read functions can handle the boundaries and autoinc status, but only at the expense of repeating the same tests in each read page function.
The tests can be factored out into helper functions -- but it should be the replaceable read_page functions that contain such low-level knowledge. nand_do_read_ops() should just be a high-level loop around read_page(), plus the OOB and partial page shuffling.
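For example, the decision about whether a page read may pre-request the next page could live in one place; a hypothetical helper (the name and parameters are illustrative, the tests mirror what nand_do_read_ops does in the patch):

/*
 * May this page read pre-request page + 1? "last" is set by the caller
 * when it knows this is the final page of the operation.
 */
static int nand_may_prefetch(struct nand_chip *chip, int page, int last)
{
	int blkcheck = (1 << (chip->phys_erase_shift - chip->page_shift)) - 1;

	if (last)
		return 0;
	if (((page + 1) & chip->pagemask) == 0)
		return 0;	/* next page is on another chip */
	if (NAND_CANAUTOINCR(chip) && ((page + 1) & blkcheck))
		return 0;	/* autoinc part will fetch it by itself */
	return 1;
}

nand_do_read_ops() would then only need to tell the page read function whether this is the last page.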
-Scott