[U-Boot] [PATCH v2 0/8] JFFS2 fixes and performance improvements

In reply to comments from Wolfgang Denk and Heiko Schocher:
My aim was to optimize U-Boot's loading of a JFFS2 file, since we needed to boot much more quickly on one of our devices than we currently were. We had discussed the possibility of abandoning JFFS2 entirely, but chose to see how quickly JFFS2 could be made to work. We also knew that Linux would mount this file system much quicker than u-boot did, so knew that at least some speed improvement was possible.
The improvments was from three sources: (1) searching U-Boot mailing list archives, (2) looking at the linux kernel, and (3) our own measurements and improvements. For example, the merge sort (before my modifications) can be found here: http://lists.denx.de/pipermail/u-boot/2007-January/018777.html
Two of the patches were inspired from the Linux kernel: Changing scansize, and recognising the CLEANMARKER. While trying to synchronize U-Boot code with Linux sounds like a good idea, I think this wouldn't be the right way to go, since Linux needs to also build lists of blocks which are empty, and a list of blocks that still need to be erased. Stripping this out is more work than just enhancing what U-Boot currently has.
=== Original cover message follows ===
These patches fix bugs and improve performance of JFFS2. Some of these improvements can already be found in old mailing lists, but for some reason they have not made their way into the u-boot source. I have the feeling that any one of these patches didn't show enough performance gain to warrant adding it to the source. I am hopeful that together, all these patches can be seen to make a big difference.
One of these patches ("Only list each directory entry once") is a bug fix, all the rest are for performance. Although performance is not high on the priority list for a bootloader, the length of time it was taking to scan a JFFS2 filesystem was painfully slow.
The code for mergesort was found in an abandoned u-boot patch, although I have refactored the code somewhat. The original author is still shown at the top of that file.
The timings below are with jffs2_summary_support turned off. With these improvements, summary support was no longer useful - most of our time is now spent loading the actual release. Even without this feature turned on, the code will still load from a filesystem that has summary nodes.
Due to not having other resources, I also have not done anything for NAND flash. At least some of these changes will be directly applicable to NAND as well as NOR.
My own testing is on a system with a 1GHz PowerPC, and 256MB of NOR Flash. The flash accesses are slow compared with processing power and RAM, so minimising the number of flash accesses makes a huge difference. Here are the timing comparisons for three JFFS2 operations on this system: 1) Scanning the file system 2) Getting a director listing of the top level, and 3) Loading a 30MB file.
Times are in seconds, and the contribution from each patch has been measured:
Scan List Load Total Original: 266.0 17.3 23.4 306.7 Speed up comparison: 52.3 17.3 23.4 93.0 List dir entries once: 52.3 11.0 23.4 86.7 Improve read speed: 52.3 0.8 5.4 58.5 Optimize building lists: 31.9 0.8 5.4 38.1 Change scansize: 30.8 0.8 5.4 37.0 Use cleanmarker: 16.0 0.8 5.4 22.2 Use mergesort: 2.0 0.8 5.4 8.2
Note that "List dir entries once" is not a speed improvement as such, but because old versions of a file and deleted files are no longer scanned, there is an improvement on this filesystem (the flash is approx half full).
Also, "change scansize" appears to do very little in this benchmark list. But without it, the "use cleanmarker" would not show as much improvement.
Changes in v2: - Fix comment style - Remove extra {} pair. - Changed comment style. - Fixed some missing calls to put_fl_mem(). - Change comment style - Change comment style - Changed comment style - Changed copyright notice to use SPDX-Licence-Identifier. - Added URL to original mergesort code. - Removed #ifdef, using Makefile change instead.
Mark Tomlinson (8): JFFS2: Return early when file read not necessary JFFS2: Speed up and fix comparison functions JFFS2: Only list each directory entry once JFFS2: Improve speed reading flash files JFFS2: Optimize building lists during scan JFFS2: Change scansize to match linux kernel JFFS2: Use CLEANMARKER to reduce scanning time JFFS2: Use merge sort when parsing filesystem
fs/jffs2/Makefile | 1 + fs/jffs2/jffs2_1pass.c | 257 ++++++++++++++++++++++++++++++----------------- fs/jffs2/jffs2_private.h | 4 + fs/jffs2/mergesort.c | 52 ++++++++++ 4 files changed, 223 insertions(+), 91 deletions(-) create mode 100644 fs/jffs2/mergesort.c

If a destination is not provided, jffs2_1pass_read_inode() only returns the length of the file. In this case, avoid reading all the data nodes, and return as soon as the length of the file is known.
Signed-off-by: Mark Tomlinson mark.tomlinson@alliedtelesis.co.nz
---
Changes in v2: - Fix comment style - Remove extra {} pair.
fs/jffs2/jffs2_1pass.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/fs/jffs2/jffs2_1pass.c b/fs/jffs2/jffs2_1pass.c index b1d6470..2e569ff 100644 --- a/fs/jffs2/jffs2_1pass.c +++ b/fs/jffs2/jffs2_1pass.c @@ -719,6 +719,12 @@ jffs2_1pass_read_inode(struct b_lists *pL, u32 inode, char *dest) } put_fl_mem(jNode, pL->readbuf); } + /* + * If no destination is provided, we are done. + * Just return the total size. + */ + if (!dest) + return totalSize; #endif
for (b = pL->frag.listHead; b != NULL; b = b->next) {

On Wed, Jul 01, 2015 at 04:38:22PM +1200, Mark Tomlinson wrote:
If a destination is not provided, jffs2_1pass_read_inode() only returns the length of the file. In this case, avoid reading all the data nodes, and return as soon as the length of the file is known.
Signed-off-by: Mark Tomlinson mark.tomlinson@alliedtelesis.co.nz
Applied to u-boot/master, thanks!

Copying complete nodes from flash can be slow if the flash is slow to read. By only reading the data needed, the sorting operation can be made much faster.
The directory entry comparison function also had a two bugs. First, it did not ensure the name was copied, so the name comparison may have been faulty (although it would have worked with NOR flash). Second, setting the ino to zero to ignore the entry did not work, since this was either writing to a temporary buffer, or (for NOR flash) directly to flash. Either way, the change was not remembered.
Signed-off-by: Mark Tomlinson mark.tomlinson@alliedtelesis.co.nz
---
Changes in v2: - Changed comment style.
fs/jffs2/jffs2_1pass.c | 87 +++++++++++++++++++++++++++----------------------- 1 file changed, 47 insertions(+), 40 deletions(-)
diff --git a/fs/jffs2/jffs2_1pass.c b/fs/jffs2/jffs2_1pass.c index 2e569ff..aaeb522 100644 --- a/fs/jffs2/jffs2_1pass.c +++ b/fs/jffs2/jffs2_1pass.c @@ -598,14 +598,18 @@ insert_node(struct b_list *list, u32 offset) */ static int compare_inodes(struct b_node *new, struct b_node *old) { - struct jffs2_raw_inode ojNew; - struct jffs2_raw_inode ojOld; - struct jffs2_raw_inode *jNew = - (struct jffs2_raw_inode *)get_fl_mem(new->offset, sizeof(ojNew), &ojNew); - struct jffs2_raw_inode *jOld = - (struct jffs2_raw_inode *)get_fl_mem(old->offset, sizeof(ojOld), &ojOld); - - return jNew->version > jOld->version; + /* + * Only read in the version info from flash, not the entire inode. + * This can make a big difference to speed if flash is slow. + */ + u32 new_version; + u32 old_version; + get_fl_mem(new->offset + offsetof(struct jffs2_raw_inode, version), + sizeof(new_version), &new_version); + get_fl_mem(old->offset + offsetof(struct jffs2_raw_inode, version), + sizeof(old_version), &old_version); + + return new_version > old_version; }
/* Sort directory entries so all entries in the same directory @@ -615,42 +619,45 @@ static int compare_inodes(struct b_node *new, struct b_node *old) */ static int compare_dirents(struct b_node *new, struct b_node *old) { - struct jffs2_raw_dirent ojNew; - struct jffs2_raw_dirent ojOld; - struct jffs2_raw_dirent *jNew = - (struct jffs2_raw_dirent *)get_fl_mem(new->offset, sizeof(ojNew), &ojNew); - struct jffs2_raw_dirent *jOld = - (struct jffs2_raw_dirent *)get_fl_mem(old->offset, sizeof(ojOld), &ojOld); - int cmp; - - /* ascending sort by pino */ - if (jNew->pino != jOld->pino) - return jNew->pino > jOld->pino; - - /* pino is the same, so use ascending sort by nsize, so - * we don't do strncmp unless we really must. - */ - if (jNew->nsize != jOld->nsize) - return jNew->nsize > jOld->nsize; - - /* length is also the same, so use ascending sort by name - */ - cmp = strncmp((char *)jNew->name, (char *)jOld->name, jNew->nsize); - if (cmp != 0) - return cmp > 0; - - /* we have duplicate names in this directory, so use ascending - * sort by version + /* + * Using NULL as the buffer for NOR flash prevents the entire node + * being read. This makes most comparisons much quicker as only one + * or two entries from the node will be used most of the time. */ - if (jNew->version > jOld->version) { - /* since jNew is newer, we know jOld is not valid, so - * mark it with inode 0 and it will not be used + struct jffs2_raw_dirent *jNew = get_node_mem(new->offset, NULL); + struct jffs2_raw_dirent *jOld = get_node_mem(old->offset, NULL); + int cmp; + int ret; + + if (jNew->pino != jOld->pino) { + /* ascending sort by pino */ + ret = jNew->pino > jOld->pino; + } else if (jNew->nsize != jOld->nsize) { + /* + * pino is the same, so use ascending sort by nsize, + * so we don't do strncmp unless we really must. */ - jOld->ino = 0; - return 1; + ret = jNew->nsize > jOld->nsize; + } else { + /* + * length is also the same, so use ascending sort by name + */ + cmp = strncmp((char *)jNew->name, (char *)jOld->name, + jNew->nsize); + if (cmp != 0) { + ret = cmp > 0; + } else { + /* + * we have duplicate names in this directory, + * so use ascending sort by version + */ + ret = jNew->version > jOld->version; + } } + put_fl_mem(jNew, NULL); + put_fl_mem(jOld, NULL);
- return 0; + return ret; } #endif

On Wed, Jul 01, 2015 at 04:38:23PM +1200, Mark Tomlinson wrote:
Copying complete nodes from flash can be slow if the flash is slow to read. By only reading the data needed, the sorting operation can be made much faster.
The directory entry comparison function also had a two bugs. First, it did not ensure the name was copied, so the name comparison may have been faulty (although it would have worked with NOR flash). Second, setting the ino to zero to ignore the entry did not work, since this was either writing to a temporary buffer, or (for NOR flash) directly to flash. Either way, the change was not remembered.
Signed-off-by: Mark Tomlinson mark.tomlinson@alliedtelesis.co.nz
Applied to u-boot/master, thanks!

If multiple versions of a file exist, only the most recent version should be used. The scheme to write 0 for the inode in older versions did not work, since this would have required writing to flash.
The only time this caused an issue was listing a directory, where older versions of the file would still be seen. Since the directory entries are sorted, just look at the next entry in the list, and if it's the same move to that entry instead.
Signed-off-by: Mark Tomlinson mark.tomlinson@alliedtelesis.co.nz
---
Changes in v2: - Fixed some missing calls to put_fl_mem().
fs/jffs2/jffs2_1pass.c | 38 +++++++++++++++++++++++++++++++++----- 1 file changed, 33 insertions(+), 5 deletions(-)
diff --git a/fs/jffs2/jffs2_1pass.c b/fs/jffs2/jffs2_1pass.c index aaeb522..346f3a1 100644 --- a/fs/jffs2/jffs2_1pass.c +++ b/fs/jffs2/jffs2_1pass.c @@ -845,7 +845,6 @@ jffs2_1pass_find_inode(struct b_lists * pL, const char *name, u32 pino) jDir = (struct jffs2_raw_dirent *) get_node_mem(b->offset, pL->readbuf); if ((pino == jDir->pino) && (len == jDir->nsize) && - (jDir->ino) && /* 0 for unlink */ (!strncmp((char *)jDir->name, name, len))) { /* a match */ if (jDir->version < version) { put_fl_mem(jDir, pL->readbuf); @@ -966,13 +965,43 @@ jffs2_1pass_list_inodes(struct b_lists * pL, u32 pino) for (b = pL->dir.listHead; b; b = b->next) { jDir = (struct jffs2_raw_dirent *) get_node_mem(b->offset, pL->readbuf); - if ((pino == jDir->pino) && (jDir->ino)) { /* ino=0 -> unlink */ + if (pino == jDir->pino) { u32 i_version = 0; struct jffs2_raw_inode ojNode; struct jffs2_raw_inode *jNode, *i = NULL; - struct b_node *b2 = pL->frag.listHead; + struct b_node *b2;
- while (b2) { +#ifdef CONFIG_SYS_JFFS2_SORT_FRAGMENTS + /* Check for more recent versions of this file */ + int match; + do { + struct b_node *next = b->next; + struct jffs2_raw_dirent *jDirNext; + if (!next) + break; + jDirNext = (struct jffs2_raw_dirent *) + get_node_mem(next->offset, NULL); + match = jDirNext->pino == jDir->pino && + jDirNext->nsize == jDir->nsize && + strncmp((char *)jDirNext->name, + (char *)jDir->name, + jDir->nsize) == 0; + if (match) { + /* Use next. It is more recent */ + b = next; + /* Update buffer with the new info */ + *jDir = *jDirNext; + } + put_fl_mem(jDirNext, NULL); + } while (match); +#endif + if (jDir->ino == 0) { + /* Deleted file */ + put_fl_mem(jDir, pL->readbuf); + continue; + } + + for (b2 = pL->frag.listHead; b2; b2 = b2->next) { jNode = (struct jffs2_raw_inode *) get_fl_mem(b2->offset, sizeof(ojNode), &ojNode); if (jNode->ino == jDir->ino && jNode->version >= i_version) { @@ -988,7 +1017,6 @@ jffs2_1pass_list_inodes(struct b_lists * pL, u32 pino) sizeof(*i), NULL); } - b2 = b2->next; }
dump_inode(pL, jDir, i);

On Wed, Jul 01, 2015 at 04:38:24PM +1200, Mark Tomlinson wrote:
If multiple versions of a file exist, only the most recent version should be used. The scheme to write 0 for the inode in older versions did not work, since this would have required writing to flash.
The only time this caused an issue was listing a directory, where older versions of the file would still be seen. Since the directory entries are sorted, just look at the next entry in the list, and if it's the same move to that entry instead.
Signed-off-by: Mark Tomlinson mark.tomlinson@alliedtelesis.co.nz
Applied to u-boot/master, thanks!

jffs2_1pass_read_inode() would read the entire data for each node in the filesystem, regardless of whether it was part of the file to be loaded or not. By only reading the header data for an inode, and then reading the data only when it is found to be part of the file to be loaded, much copying of data is saved.
jffs2_1pass_list_inodes() read each inode for every file in the directory into a buffer. By using NULL as a buffer pointer, NOR flash simply returns a pointer, and therefore avoids a memory copy.
Signed-off-by: Mark Tomlinson mark.tomlinson@alliedtelesis.co.nz
---
Changes in v2: - Change comment style
fs/jffs2/jffs2_1pass.c | 27 +++++++++++++++++++++------ 1 file changed, 21 insertions(+), 6 deletions(-)
diff --git a/fs/jffs2/jffs2_1pass.c b/fs/jffs2/jffs2_1pass.c index 346f3a1..e58e7d2 100644 --- a/fs/jffs2/jffs2_1pass.c +++ b/fs/jffs2/jffs2_1pass.c @@ -735,8 +735,13 @@ jffs2_1pass_read_inode(struct b_lists *pL, u32 inode, char *dest) #endif
for (b = pL->frag.listHead; b != NULL; b = b->next) { - jNode = (struct jffs2_raw_inode *) get_node_mem(b->offset, - pL->readbuf); + /* + * Copy just the node and not the data at this point, + * since we don't yet know if we need this data. + */ + jNode = (struct jffs2_raw_inode *)get_fl_mem(b->offset, + sizeof(struct jffs2_raw_inode), + pL->readbuf); if (inode == jNode->ino) { #if 0 putLabeledWord("\r\n\r\nread_inode: totlen = ", jNode->totlen); @@ -760,7 +765,15 @@ jffs2_1pass_read_inode(struct b_lists *pL, u32 inode, char *dest) #endif
if(dest) { - src = ((uchar *) jNode) + sizeof(struct jffs2_raw_inode); + /* + * Now that the inode has been checked, + * read the entire inode, including data. + */ + put_fl_mem(jNode, pL->readbuf); + jNode = (struct jffs2_raw_inode *) + get_node_mem(b->offset, pL->readbuf); + src = ((uchar *)jNode) + + sizeof(struct jffs2_raw_inode); /* ignore data behind latest known EOF */ if (jNode->offset > totalSize) { put_fl_mem(jNode, pL->readbuf); @@ -967,7 +980,6 @@ jffs2_1pass_list_inodes(struct b_lists * pL, u32 pino) pL->readbuf); if (pino == jDir->pino) { u32 i_version = 0; - struct jffs2_raw_inode ojNode; struct jffs2_raw_inode *jNode, *i = NULL; struct b_node *b2;
@@ -1003,8 +1015,10 @@ jffs2_1pass_list_inodes(struct b_lists * pL, u32 pino)
for (b2 = pL->frag.listHead; b2; b2 = b2->next) { jNode = (struct jffs2_raw_inode *) - get_fl_mem(b2->offset, sizeof(ojNode), &ojNode); - if (jNode->ino == jDir->ino && jNode->version >= i_version) { + get_fl_mem(b2->offset, sizeof(*jNode), + NULL); + if (jNode->ino == jDir->ino && + jNode->version >= i_version) { i_version = jNode->version; if (i) put_fl_mem(i, NULL); @@ -1017,6 +1031,7 @@ jffs2_1pass_list_inodes(struct b_lists * pL, u32 pino) sizeof(*i), NULL); } + put_fl_mem(jNode, NULL); }
dump_inode(pL, jDir, i);

On Wed, Jul 01, 2015 at 04:38:25PM +1200, Mark Tomlinson wrote:
jffs2_1pass_read_inode() would read the entire data for each node in the filesystem, regardless of whether it was part of the file to be loaded or not. By only reading the header data for an inode, and then reading the data only when it is found to be part of the file to be loaded, much copying of data is saved.
jffs2_1pass_list_inodes() read each inode for every file in the directory into a buffer. By using NULL as a buffer pointer, NOR flash simply returns a pointer, and therefore avoids a memory copy.
Signed-off-by: Mark Tomlinson mark.tomlinson@alliedtelesis.co.nz
Applied to u-boot/master, thanks!

If the flash is slow, reading less from the flash into buffers makes the process faster.
Signed-off-by: Mark Tomlinson mark.tomlinson@alliedtelesis.co.nz
---
Changes in v2: - Change comment style
fs/jffs2/jffs2_1pass.c | 25 +++++++++++++++++++++---- 1 file changed, 21 insertions(+), 4 deletions(-)
diff --git a/fs/jffs2/jffs2_1pass.c b/fs/jffs2/jffs2_1pass.c index e58e7d2..f488537 100644 --- a/fs/jffs2/jffs2_1pass.c +++ b/fs/jffs2/jffs2_1pass.c @@ -1501,7 +1501,7 @@ jffs2_1pass_build_lists(struct part_info * part) u32 counterF = 0; u32 counterN = 0; u32 max_totlen = 0; - u32 buf_size = DEFAULT_EMPTY_SCAN_SIZE; + u32 buf_size; char *buf;
nr_sectors = lldiv(part->size, part->sector_size); @@ -1513,7 +1513,7 @@ jffs2_1pass_build_lists(struct part_info * part) /* if we are building a list we need to refresh the cache. */ jffs_init_1pass_list(part); pL = (struct b_lists *)part->jffs2_priv; - buf = malloc(buf_size); + buf = malloc(DEFAULT_EMPTY_SCAN_SIZE); puts ("Scanning JFFS2 FS: ");
/* start at the beginning of the partition */ @@ -1529,6 +1529,8 @@ jffs2_1pass_build_lists(struct part_info * part) int ret; #endif
+ /* Set buf_size to maximum length */ + buf_size = DEFAULT_EMPTY_SCAN_SIZE; WATCHDOG_RESET();
#ifdef CONFIG_JFFS2_SUMMARY @@ -1603,6 +1605,11 @@ jffs2_1pass_build_lists(struct part_info * part)
ofs += sector_ofs; prevofs = ofs - 1; + /* + * Set buf_size down to the minimum size required. + * This prevents reading in chunks of flash data unnecessarily. + */ + buf_size = sizeof(union jffs2_node_union);
scan_more: while (ofs < sector_ofs + part->sector_size) { @@ -1683,13 +1690,18 @@ jffs2_1pass_build_lists(struct part_info * part) case JFFS2_NODETYPE_INODE: if (buf_ofs + buf_len < ofs + sizeof(struct jffs2_raw_inode)) { + buf_len = min_t(uint32_t, + sizeof(struct jffs2_raw_inode), + sector_ofs + + part->sector_size - + ofs); get_fl_mem((u32)part->offset + ofs, buf_len, buf); buf_ofs = ofs; node = (void *)buf; } - if (!inode_crc((struct jffs2_raw_inode *) node)) - break; + if (!inode_crc((struct jffs2_raw_inode *)node)) + break;
if (insert_node(&pL->frag, (u32) part->offset + ofs) == NULL) { @@ -1706,6 +1718,11 @@ jffs2_1pass_build_lists(struct part_info * part) ((struct jffs2_raw_dirent *) node)->nsize) { + buf_len = min_t(uint32_t, + node->totlen, + sector_ofs + + part->sector_size - + ofs); get_fl_mem((u32)part->offset + ofs, buf_len, buf); buf_ofs = ofs;

On Wed, Jul 01, 2015 at 04:38:26PM +1200, Mark Tomlinson wrote:
If the flash is slow, reading less from the flash into buffers makes the process faster.
Signed-off-by: Mark Tomlinson mark.tomlinson@alliedtelesis.co.nz
Applied to u-boot/master, thanks!

The scan code is similar to the linux kernel, but the kernel defines a much smaller size to scan through before deciding a sector is blank. Assuming that what is in the kernel is OK, make these two match.
On its own, this change makes no difference to scanning of any sectors which have a clean marker at the beginning, since the entire sector is not blank.
Signed-off-by: Mark Tomlinson mark.tomlinson@alliedtelesis.co.nz ---
Changes in v2: None
fs/jffs2/jffs2_1pass.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/jffs2/jffs2_1pass.c b/fs/jffs2/jffs2_1pass.c index f488537..c55d472 100644 --- a/fs/jffs2/jffs2_1pass.c +++ b/fs/jffs2/jffs2_1pass.c @@ -1480,7 +1480,7 @@ dump_dirents(struct b_lists *pL) } #endif
-#define DEFAULT_EMPTY_SCAN_SIZE 4096 +#define DEFAULT_EMPTY_SCAN_SIZE 256
static inline uint32_t EMPTY_SCAN_SIZE(uint32_t sector_size) {

On Wed, Jul 01, 2015 at 04:38:27PM +1200, Mark Tomlinson wrote:
The scan code is similar to the linux kernel, but the kernel defines a much smaller size to scan through before deciding a sector is blank. Assuming that what is in the kernel is OK, make these two match.
On its own, this change makes no difference to scanning of any sectors which have a clean marker at the beginning, since the entire sector is not blank.
Signed-off-by: Mark Tomlinson mark.tomlinson@alliedtelesis.co.nz
Applied to u-boot/master, thanks!

If a sector has a CLEANMARKER at the beginning, it indicates that the entire sector has been erased. Therefore, if this is found, we can skip the entire block. This was not being done before this patch.
The code now does the same as the kernel does when encountering a CLEANMARKER. It still checks that the next few words are FFFFFFFF, and if so, the block is assumed to be empty, and so is skipped.
Signed-off-by: Mark Tomlinson mark.tomlinson@alliedtelesis.co.nz
---
Changes in v2: - Changed comment style
fs/jffs2/jffs2_1pass.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+)
diff --git a/fs/jffs2/jffs2_1pass.c b/fs/jffs2/jffs2_1pass.c index c55d472..4325792 100644 --- a/fs/jffs2/jffs2_1pass.c +++ b/fs/jffs2/jffs2_1pass.c @@ -1528,6 +1528,8 @@ jffs2_1pass_build_lists(struct part_info * part) uint32_t sumlen; int ret; #endif + /* Indicates a sector with a CLEANMARKER was found */ + int clean_sector = 0;
/* Set buf_size to maximum length */ buf_size = DEFAULT_EMPTY_SCAN_SIZE; @@ -1652,6 +1654,14 @@ jffs2_1pass_build_lists(struct part_info * part) ofs += 4; } /* Ran off end. */ + /* + * If this sector had a clean marker at the + * beginning, and immediately following this + * have been a bunch of FF bytes, treat the + * entire sector as empty. + */ + if (clean_sector) + break;
/* See how much more there is to read in this * eraseblock... @@ -1673,6 +1683,11 @@ jffs2_1pass_build_lists(struct part_info * part) buf_ofs = ofs; goto more_empty; } + /* + * Found something not erased in the sector, so reset + * the 'clean_sector' flag. + */ + clean_sector = 0; if (node->magic != JFFS2_MAGIC_BITMASK || !hdr_crc(node)) { ofs += 4; @@ -1754,6 +1769,16 @@ jffs2_1pass_build_lists(struct part_info * part) "%d != %zu\n", node->totlen, sizeof(struct jffs2_unknown_node)); + if ((node->totlen == + sizeof(struct jffs2_unknown_node)) && + (ofs == sector_ofs)) { + /* + * Found a CLEANMARKER at the beginning + * of the sector. It's in the correct + * place with correct size and CRC. + */ + clean_sector = 1; + } break; case JFFS2_NODETYPE_PADDING: if (node->totlen < sizeof(struct jffs2_unknown_node))

On Wed, Jul 01, 2015 at 04:38:28PM +1200, Mark Tomlinson wrote:
If a sector has a CLEANMARKER at the beginning, it indicates that the entire sector has been erased. Therefore, if this is found, we can skip the entire block. This was not being done before this patch.
The code now does the same as the kernel does when encountering a CLEANMARKER. It still checks that the next few words are FFFFFFFF, and if so, the block is assumed to be empty, and so is skipped.
Signed-off-by: Mark Tomlinson mark.tomlinson@alliedtelesis.co.nz
Applied to u-boot/master, thanks!

When building the file system the existing code does an insertion into a linked list. It attempts to speed this up by keeping a pointer to where the last entry was inserted but it's still slow.
Now the nodes are just inserted into the list without searching through for the correct place. This unsorted list is then sorted once using mergesort after all the entries have been added to the list. This speeds up the scanning of the flash file system considerably.
Signed-off-by: Mark Tomlinson mark.tomlinson@alliedtelesis.co.nz
---
Changes in v2: - Changed copyright notice to use SPDX-Licence-Identifier. - Added URL to original mergesort code. - Removed #ifdef, using Makefile change instead.
fs/jffs2/Makefile | 1 + fs/jffs2/jffs2_1pass.c | 47 +++++++++++-------------------------------- fs/jffs2/jffs2_private.h | 4 ++++ fs/jffs2/mergesort.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 69 insertions(+), 35 deletions(-) create mode 100644 fs/jffs2/mergesort.c
diff --git a/fs/jffs2/Makefile b/fs/jffs2/Makefile index 4cb0600..3625d74 100644 --- a/fs/jffs2/Makefile +++ b/fs/jffs2/Makefile @@ -10,4 +10,5 @@ obj-y += compr_rtime.o obj-y += compr_rubin.o obj-y += compr_zlib.o obj-y += jffs2_1pass.o +obj-$(CONFIG_SYS_JFFS2_SORT_FRAGMENTS) += mergesort.o obj-y += mini_inflate.o diff --git a/fs/jffs2/jffs2_1pass.c b/fs/jffs2/jffs2_1pass.c index 4325792..64f5542 100644 --- a/fs/jffs2/jffs2_1pass.c +++ b/fs/jffs2/jffs2_1pass.c @@ -545,49 +545,19 @@ static struct b_node * insert_node(struct b_list *list, u32 offset) { struct b_node *new; -#ifdef CONFIG_SYS_JFFS2_SORT_FRAGMENTS - struct b_node *b, *prev; -#endif
if (!(new = add_node(list))) { putstr("add_node failed!\r\n"); return NULL; } new->offset = offset; + new->next = NULL;
-#ifdef CONFIG_SYS_JFFS2_SORT_FRAGMENTS - if (list->listTail != NULL && list->listCompare(new, list->listTail)) - prev = list->listTail; - else if (list->listLast != NULL && list->listCompare(new, list->listLast)) - prev = list->listLast; + if (list->listTail != NULL) + list->listTail->next = new; else - prev = NULL; - - for (b = (prev ? prev->next : list->listHead); - b != NULL && list->listCompare(new, b); - prev = b, b = b->next) { - list->listLoops++; - } - if (b != NULL) - list->listLast = prev; - - if (b != NULL) { - new->next = b; - if (prev != NULL) - prev->next = new; - else - list->listHead = new; - } else -#endif - { - new->next = (struct b_node *) NULL; - if (list->listTail != NULL) { - list->listTail->next = new; - list->listTail = new; - } else { - list->listTail = list->listHead = new; - } - } + list->listHead = new; + list->listTail = new;
return new; } @@ -1800,6 +1770,13 @@ jffs2_1pass_build_lists(struct part_info * part) }
free(buf); +#if defined(CONFIG_SYS_JFFS2_SORT_FRAGMENTS) + /* + * Sort the lists. + */ + sort_list(&pL->frag); + sort_list(&pL->dir); +#endif putstr("\b\b done.\r\n"); /* close off the dots */
/* We don't care if malloc failed - then each read operation will diff --git a/fs/jffs2/jffs2_private.h b/fs/jffs2/jffs2_private.h index 658b325..06b6ca2 100644 --- a/fs/jffs2/jffs2_private.h +++ b/fs/jffs2/jffs2_private.h @@ -98,4 +98,8 @@ data_crc(struct jffs2_raw_inode *node) } }
+#if defined(CONFIG_SYS_JFFS2_SORT_FRAGMENTS) +/* External merge sort. */ +int sort_list(struct b_list *list); +#endif #endif /* jffs2_private.h */ diff --git a/fs/jffs2/mergesort.c b/fs/jffs2/mergesort.c new file mode 100644 index 0000000..6e633a1 --- /dev/null +++ b/fs/jffs2/mergesort.c @@ -0,0 +1,52 @@ +/* + * This file is copyright 2001 Simon Tatham. + * Rewritten from original source 2006 by Dan Merillat for use in u-boot. + * + * Original code can be found at: + * http://www.chiark.greenend.org.uk/~sgtatham/algorithms/listsort.html + * + * SPDX-License-Identifier: MIT + */ + +#include <common.h> +#include "jffs2_private.h" + +int sort_list(struct b_list *list) +{ + struct b_node *p, *q, *e, **tail; + int k, psize, qsize; + + if (!list->listHead) + return 0; + + for (k = 1; k < list->listCount; k *= 2) { + tail = &list->listHead; + for (p = q = list->listHead; p; p = q) { + /* step 'k' places from p; */ + for (psize = 0; q && psize < k; psize++) + q = q->next; + qsize = k; + + /* two lists, merge them. */ + while (psize || (qsize && q)) { + /* merge the next element */ + if (psize == 0 || + ((qsize && q) && + list->listCompare(p, q))) { + /* p is empty, or p > q, so q next */ + e = q; + q = q->next; + qsize--; + } else { + e = p; + p = p->next; + psize--; + } + e->next = NULL; /* break accidental loops. */ + *tail = e; + tail = &e->next; + } + } + } + return 0; +}

On Wed, Jul 01, 2015 at 04:38:29PM +1200, Mark Tomlinson wrote:
When building the file system the existing code does an insertion into a linked list. It attempts to speed this up by keeping a pointer to where the last entry was inserted but it's still slow.
Now the nodes are just inserted into the list without searching through for the correct place. This unsorted list is then sorted once using mergesort after all the entries have been added to the list. This speeds up the scanning of the flash file system considerably.
Signed-off-by: Mark Tomlinson mark.tomlinson@alliedtelesis.co.nz
Applied to u-boot/master, thanks!

Mark,
On Wed, Jul 1, 2015 at 4:38 PM, Mark Tomlinson mark.tomlinson@alliedtelesis.co.nz wrote:
In reply to comments from Wolfgang Denk and Heiko Schocher:
I think you forgot to Cc Wolfgang and Heiko on this. Ditto with the updated patches Heiko reviewed from v1.
My aim was to optimize U-Boot's loading of a JFFS2 file, since we needed to boot much more quickly on one of our devices than we currently were. We had discussed the possibility of abandoning JFFS2 entirely, but chose to see how quickly JFFS2 could be made to work. We also knew that Linux would mount this file system much quicker than u-boot did, so knew that at least some speed improvement was possible.
The improvments was from three sources: (1) searching U-Boot mailing list archives, (2) looking at the linux kernel, and (3) our own measurements and improvements. For example, the merge sort (before my modifications) can be found here: http://lists.denx.de/pipermail/u-boot/2007-January/018777.html
Two of the patches were inspired from the Linux kernel: Changing scansize, and recognising the CLEANMARKER. While trying to synchronize U-Boot code with Linux sounds like a good idea, I think this wouldn't be the right way to go, since Linux needs to also build lists of blocks which are empty, and a list of blocks that still need to be erased. Stripping this out is more work than just enhancing what U-Boot currently has.
=== Original cover message follows ===
These patches fix bugs and improve performance of JFFS2. Some of these improvements can already be found in old mailing lists, but for some reason they have not made their way into the u-boot source. I have the feeling that any one of these patches didn't show enough performance gain to warrant adding it to the source. I am hopeful that together, all these patches can be seen to make a big difference.
One of these patches ("Only list each directory entry once") is a bug fix, all the rest are for performance. Although performance is not high on the priority list for a bootloader, the length of time it was taking to scan a JFFS2 filesystem was painfully slow.
The code for mergesort was found in an abandoned u-boot patch, although I have refactored the code somewhat. The original author is still shown at the top of that file.
The timings below are with jffs2_summary_support turned off. With these improvements, summary support was no longer useful - most of our time is now spent loading the actual release. Even without this feature turned on, the code will still load from a filesystem that has summary nodes.
Due to not having other resources, I also have not done anything for NAND flash. At least some of these changes will be directly applicable to NAND as well as NOR.
My own testing is on a system with a 1GHz PowerPC, and 256MB of NOR Flash. The flash accesses are slow compared with processing power and RAM, so minimising the number of flash accesses makes a huge difference. Here are the timing comparisons for three JFFS2 operations on this system:
- Scanning the file system
- Getting a director listing of the top level, and
- Loading a 30MB file.
Times are in seconds, and the contribution from each patch has been measured:
Scan List Load Total
Original: 266.0 17.3 23.4 306.7 Speed up comparison: 52.3 17.3 23.4 93.0 List dir entries once: 52.3 11.0 23.4 86.7 Improve read speed: 52.3 0.8 5.4 58.5 Optimize building lists: 31.9 0.8 5.4 38.1 Change scansize: 30.8 0.8 5.4 37.0 Use cleanmarker: 16.0 0.8 5.4 22.2 Use mergesort: 2.0 0.8 5.4 8.2
Note that "List dir entries once" is not a speed improvement as such, but because old versions of a file and deleted files are no longer scanned, there is an improvement on this filesystem (the flash is approx half full).
Also, "change scansize" appears to do very little in this benchmark list. But without it, the "use cleanmarker" would not show as much improvement.
Changes in v2:
- Fix comment style
- Remove extra {} pair.
- Changed comment style.
- Fixed some missing calls to put_fl_mem().
- Change comment style
- Change comment style
- Changed comment style
- Changed copyright notice to use SPDX-Licence-Identifier.
- Added URL to original mergesort code.
- Removed #ifdef, using Makefile change instead.
Mark Tomlinson (8): JFFS2: Return early when file read not necessary JFFS2: Speed up and fix comparison functions JFFS2: Only list each directory entry once JFFS2: Improve speed reading flash files JFFS2: Optimize building lists during scan JFFS2: Change scansize to match linux kernel JFFS2: Use CLEANMARKER to reduce scanning time JFFS2: Use merge sort when parsing filesystem
fs/jffs2/Makefile | 1 + fs/jffs2/jffs2_1pass.c | 257 ++++++++++++++++++++++++++++++----------------- fs/jffs2/jffs2_private.h | 4 + fs/jffs2/mergesort.c | 52 ++++++++++ 4 files changed, 223 insertions(+), 91 deletions(-) create mode 100644 fs/jffs2/mergesort.c
-- 1.9.1
U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot
participants (3)
-
Chris Packham
-
Mark Tomlinson
-
Tom Rini