[U-Boot] [RFC] utilize flash small block sizes to reduce flash footprint

In a nit-picking moment trying to save some flash storage, I looked at the current typical flash layout of U-Boot and came up with below RFC.
Before I start to implement anything, I'd like to hear your comments about this, especially from the architecture maintainers. Given that you have some spare time to look into this, of course. Sorry in case this was discussed before, did only a coarse list archive search.
I'd volunteer to provide an implementation for the architectures I have close at hand today (Blackfin, ARM11), as time permits. For other architectures, I'd need some assistance.
Abstract
--------
Approach to utilize flashes with different block sizes to reduce U-Boot flash footprint. In general only relevant for boards with environment in flash.
Prerequisites
-------------
Quite a few NOR flashes are divided into different block sizes. For example, the widespread Intel/Numonyx/Micron P30 series is divided into 4 blocks of 32 kByte, followed by n blocks of 128 kByte. Depending on variant, the smaller blocks are the first blocks (bottom) or the last blocks (top). The bottom variant is more common, as it allows flexible total sizes without a change in the location of the small blocks.
Given a not uncommon U-Boot size of around 128kB (+/- a few 10kB), accompanied by a separate flash block for a configurable environment, it would make sense to make use of this layout.
All below statements are focused on NOR flashes like above, with environment in flash. Also things like standalone applications, out-of-binary graphics data, etc. are not taken into account. Hence this RFC, as I probably miss some requirements and/or problems.
Flash layouts
-------------
Current best-case layout:
   32kB  U-Boot
   32kB  U-Boot
   32kB  U-Boot
   32kB  U-Boot
  128kB  U-Boot environment
  => 256kB flash used

Current worst-case layout:
   32kB  U-Boot
   32kB  U-Boot
   32kB  U-Boot
   32kB  U-Boot
  128kB  U-Boot (given U-Boot is >128kB)
  128kB  U-Boot environment
  128kB  optional: VPD
  => 384kB to 512kB flash used

Suggested layout:
   32kB  U-Boot environment
   32kB  U-Boot or optional VPD
   32kB  U-Boot
   32kB  U-Boot
  128kB  U-Boot
  => 256kB flash used with 192kB/224kB room for U-Boot
VPD: Vital Product Data (board type, MAC addresses, serial number, licenses, certificates, etc.), which are not to be (solely) saved in U-Boot environment, for whatever reason.
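The totals in the layouts above can be reproduced with a small calculation. This is only a sketch, assuming the 4 x 32 kB + n x 128 kB bottom-boot geometry described in the prerequisites and assuming every item occupies whole erase blocks; the function name `footprint` and its parameters are made up for illustration, and the small reset stub in front of the environment is ignored.

```python
KB = 1024

# Bottom-boot geometry from the RFC: 4 x 32 kB blocks, then 128 kB blocks.
SECTORS = [32 * KB] * 4 + [128 * KB] * 4


def footprint(sectors, uboot_size, env_first, vpd=False):
    """Bytes of flash consumed when every item occupies whole erase blocks.

    env_first=False models the current layout (U-Boot, environment, VPD);
    env_first=True models the suggested one (environment, VPD, U-Boot).
    """
    extra = 2 if vpd else 1           # blocks for environment (+ VPD)
    index = extra if env_first else 0
    used = sum(sectors[:index])
    covered = 0
    while covered < uboot_size:       # U-Boot takes consecutive blocks
        covered += sectors[index]
        index += 1
    used += covered
    if not env_first:                 # environment (+ VPD) follow U-Boot
        used += sum(sectors[index:index + extra])
    return used


for size in (128 * KB, 160 * KB):
    print(size // KB,
          footprint(SECTORS, size, env_first=False) // KB,
          footprint(SECTORS, size, env_first=True) // KB)
```

For a 160 kB image this gives 384 kB for the current layout (512 kB with VPD) and 256 kB for the suggested one, matching the figures above.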
Current status
--------------
A quick grep indicates that most, if not all boards locate U-Boot at the start of the flash.
The U-Boot environment is written to the start of the configured flash block. Prior to that, this block is erased, its content is lost.
Possible approach
-----------------
Most CPUs start execution at flash address 0 or somewhere in this region at the reset ISR vector.
In case U-Boot is not located there but some flash sectors later, at this reset vector there has to be some architecture specific code block, which could be a
- simple jump to U-Boot entry
- dummy ISR table with reset vector pointing to U-Boot
- specific trampoline code
- ...
In other words, a few bytes (up to e.g. 1kB) of (static) binary data, generated at compile time.
This has to be put before the environment, either implicit or included at the start of "struct environment_s". The latter would of course break backward compatibility, at least without some additional code.
The variant of putting this code into a specific block of its own would again waste a flash block, or else mandate the presence of a VPD or similar.
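As an illustration of the "simple jump to U-Boot entry" option, the stub could be generated at build time. The sketch below assumes a classic little-endian 32-bit ARM core (A32 state), where the reset vector holds an instruction and an unconditional `B` reaches +/-32 MB; other architectures such as Blackfin would need their own encoding. The helper name `arm_branch` is hypothetical.

```python
import struct


def arm_branch(from_addr, to_addr):
    """Encode an unconditional A32 'B' instruction placed at from_addr."""
    offset = to_addr - (from_addr + 8)   # PC reads as instruction address + 8
    if offset % 4:
        raise ValueError("target must be word aligned")
    if not -(1 << 25) <= offset < (1 << 25):
        raise ValueError("target outside the +/-32 MB range of B")
    imm24 = (offset >> 2) & 0xFFFFFF     # signed word offset, 24 bits
    return struct.pack("<I", 0xEA000000 | imm24)


# Reset vector at offset 0 jumping over a 32 kB environment sector.
stub = arm_branch(0x0000, 0x8000)
print(stub.hex())
```

On such a core the stub is four bytes, so a jump of 32 kB or 64 kB is well within range; the open question in the RFC is about architectures where that does not hold.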
Things to define / check
------------------------
- U-Boot entry point on non-PIC builds (U-Boot start != CONFIG_SYS_FLASH_BASE)
- U-Boot on architectures not capable of a single jump forward by e.g. 32kB or 64kB
- redundant environment
- mixed flash/external environment
- ...

Hi Andreas,
On 27/09/2010 07:15, Andreas Pretzsch wrote:
In a nit-picking moment trying to save some flash storage, I looked at the current typical flash layout of U-Boot and came up with below RFC.
Before I start to implement anything, I'd like to hear your comments about this, especially from the architecture maintainers. Given that you have some spare time to look into this, of course. Sorry in case this was discussed before, did only a coarse list archive search.
I'd volunteer to provide an implementation for the architectures I have close at hand today (Blackfin, ARM11), as time permits. For other architectures, I'd need some assistance.
I am willing to provide some assistance, as I recently posted (then withdrew in order to take Heiko's reloc patches into account) a patch that would maximize use of the topmost flash block for systems that boot at a high address yet far from the topmost address.
Find my comments interspersed with your RFC.
Abstract
Approach to utilize flashes with different block sizes to reduce U-Boot flash footprint. In general only relevant for boards with environment in flash.
Prerequisites
Quite a few NOR flashes are divided into different block sizes. For example, the widespread Intel/Numonyx/Micron P30 series is divided into 4 blocks of 32 kByte, followed by n blocks of 128 kByte. Depending on variant, the smaller blocks are the first blocks (bottom) or the last blocks (top). The bottom variant is more common, as it allows flexible total sizes without a change in the location of the small blocks.
This considers only a single case of FLASH block layout. On my systems, I have two cases: a 29LV400 with a 1x16, 2x8, 1x32, 7x64 KB layout, and a uniform Spansion s29gl with 64 KB sectors -- and even the uniform case could benefit from some optimizing.
Given a not uncommon U-Boot size of around 128kB (+/- a few 10kB), accompanied by a separate flash block for a configurable environment, it would make sense to make use of this layout.
All below statements are focused on NOR flashes like above, with environment in flash. Also things like standalone applications, out-of-binary graphics data, etc. are not taken into account. Hence this RFC, as I probably miss some requirements and/or problems.
Flash layouts
Current best-case layout:
   32kB  U-Boot
   32kB  U-Boot
   32kB  U-Boot
   32kB  U-Boot
  128kB  U-Boot environment
  => 256kB flash used

Current worst-case layout:
   32kB  U-Boot
   32kB  U-Boot
   32kB  U-Boot
   32kB  U-Boot
  128kB  U-Boot (given U-Boot is >128kB)
  128kB  U-Boot environment
  128kB  optional: VPD
  => 384kB to 512kB flash used

Suggested layout:
   32kB  U-Boot environment
   32kB  U-Boot or optional VPD
   32kB  U-Boot
   32kB  U-Boot
  128kB  U-Boot
  => 256kB flash used with 192kB/224kB room for U-Boot
This is actually what is done by LaCie when using the 29LV400: they put the environment in one of the 8K sectors.
VPD: Vital Product Data (board type, MAC addresses, serial number, licenses, certificates, etc.), which are not to be (solely) saved in U-Boot environment, for whatever reason.
Current status
A quick grep indicates that most, if not all boards locate U-Boot at the start of the flash.
Not all, definitely -- the net5big has a big flash, yet places u-boot rather high (still, not high enough for me as it wastes 64 KB out of 512 on my edmini, so I'd done a patch to use that, but it does not work with Heiko's reloc patches).
The U-Boot environment is written to the start of the configured flash block. Prior to that, this block is erased, its content is lost.
Possible approach
Most CPUs start execution at flash address 0 or somewhere in this region at the reset ISR vector.
Hmm... Wrong for the two boards I use: they boot at 0xffff0000, with flash areas starting at fff80000 and ffc00000 respectively. I suspect many ARM926 based boards boot to ffff0000 too.
In case U-Boot is not located there but some flash sectors later, at this reset vector there has to be some architecture specific code block, which could be a
- simple jump to U-Boot entry
- dummy ISR table with reset vector pointing to U-Boot
- specific trampoline code
- ...
This point applies to the boards I use: they reset at ffff0000 but the u-boot code is never put there; instead, a small block of code does some checks then branches to fff90000, where u-boot's _start entry point resides.
In other words, a few bytes (up to e.g. 1kB) of (static) binary data, generated at compile time.
This has to be put before the environment, either implicit or included at the start of "struct environment_s".
This would work only for cases where booting is at the start of the flash. In my case, the small booting code is at the top, wasting 64 KB.
The latter would of course break backward compatibility, at least without some additional code.
The variant of putting this code into a specific block of its own would again waste a flash block, or else mandate the presence of a VPD or similar.
Things to define / check
- U-Boot entry point on non-PIC builds (U-Boot start != CONFIG_SYS_FLASH_BASE)
- U-Boot on architectures not capable of a single jump forward by e.g. 32kB or 64kB
- redundant environment
- mixed flash/external environment
- ...
The obvious issue in your design is that if there is a loss of power while the environment is written into, then the boot code runs the risk of being erased; making the board boot again properly will then require technical operations far beyond customer abilities if this board is used in production.
However I see a solution to your problem and mine in a single unified design.
My 'top 64KB' patch would work by putting the start of u-boot (the part that would run from flash) in the top 64 KB so that _start would end up exactly at ffff0000, then putting the rest of the u-boot code below this, not touching the environment.
Now in your case, you want (unlike me) to have u-boot in the lowest possible area of a bottom boot FLASH.
Bottom boot flashes always have several smaller sectors which add up in size to a big sector so that big sectors are properly aligned. The U-boot environment takes only one of them. So what if u-boot's _start is located in the lowest sector along with some more of u-boot's code, then the environment in the next sector, then the rest of u-boot in subsequent ones?
That's where our designs could meet: define a set of configuration options which indicate where part 1 and part 2 of the u-boot code should go in FLASH. In my case part 1 would go to the upper 64 KB; in your case, it would go to the lower sector and size.
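For the bottom-boot case, the placement of the two parts can be sketched as follows. The geometry is the 4 x 32 kB + 128 kB example from the RFC; `plan_split` and `env_index` are hypothetical names standing in for whatever configuration options would eventually describe the layout.

```python
KB = 1024


def plan_split(sectors, image_size, env_index=1):
    """Place part 1 below the environment sector and part 2 above it."""
    offsets = [sum(sectors[:i]) for i in range(len(sectors) + 1)]
    part1_len = min(image_size, offsets[env_index])
    return {
        "part1": (0, part1_len),
        "env": (offsets[env_index], sectors[env_index]),
        "part2": (offsets[env_index + 1], image_size - part1_len),
    }


layout = plan_split([32 * KB] * 4 + [128 * KB] * 4, 160 * KB)
for name, (offset, length) in layout.items():
    print(f"{name}: offset 0x{offset:05x}, {length // KB} kB")
```

With a 160 kB image, part 1 fills the first 32 kB sector, the environment sits at 0x08000, and the remaining 128 kB of part 2 start at 0x10000.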
One important note: I don't mean to manually map code to these two parts at link time by fine-tuning the linker file as Wolfgang suggested, because I think this is pretty sensitive to any change in the code which would make the mapping over- or under-fill one area.
What I intend to do is build a u-boot which has its text+data in the usual single linear part (it could be linked for some RAM location), then split the .bin up into parts 1 and 2, and flash each part in its own FLASH area. The startup code would fetch parts 1 and 2 and assemble them back in RAM.
This of course requires making the startup code really position-independent so that it could run from anywhere within the FLASH, and that it all resides in part 1.
The second point is what Heiko's patches made more difficult, because they full in much code for FLASH execution that was run from RAM before in arm926 systems.
Solving this will probably require that code and data intended for FLASH execution be marked with a specific attribute similar to the __init one in Linux; this attribute will put the code and data in separate sections (.text.flash and .data.flash, for instance) which the linker file would then output before the others (.text, .*data*).
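The split-and-reassemble idea can be modelled on the host side. This is a rough sketch, not the actual patch: the image is a stand-in byte string, the flash is a `bytearray`, and the offsets (part 1 in the lowest 32 kB sector, part 2 above a 32 kB environment sector) are assumptions chosen to match the bottom-boot example; all function names are hypothetical.

```python
KB = 1024


def split_image(image, part1_len):
    """Cut the linear .bin into the two parts that get flashed separately."""
    return image[:part1_len], image[part1_len:]


def flash_write(flash, offset, data):
    flash[offset:offset + len(data)] = data


def reassemble(flash, part1, part2):
    """What the startup code would do: copy both parts contiguously to RAM."""
    (off1, len1), (off2, len2) = part1, part2
    return bytes(flash[off1:off1 + len1]) + bytes(flash[off2:off2 + len2])


image = bytes(i % 251 for i in range(160 * KB))   # stand-in for u-boot.bin
flash = bytearray(b"\xff" * (512 * KB))           # erased NOR flash
p1, p2 = split_image(image, 32 * KB)
flash_write(flash, 0, p1)                # part 1 in the lowest sector
flash_write(flash, 64 * KB, p2)          # part 2 above the environment sector
ram = reassemble(flash, (0, len(p1)), (64 * KB, len(p2)))
print(ram == image)
```

The environment sector between the two parts is never written by this procedure, which is the property the design relies on.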
Kind regards,

On 27/09/2010 08:37, Albert ARIBAUD wrote:
The second point is what Heiko's patches made more difficult, because they full in much code for FLASH execution that was run from RAM before in arm926 systems.
s/full/pull/
Apologies,

Dear Albert,
first, thanks for reading through my RFC. I agree that there are a couple of simplifications in there, e.g. reset vectors. Written after about 12 hours of hacking...
I see you're struggling with a comparable, yet a bit more complicated issue. I have to admit that I only skimmed over the threads, but it looks to me that most of it could also be done with the linker's help.
Personally, I'm fine with a tuned linker file. Preparations are already there in U-Boot and the real life block allocation is pretty static, too. So following the KISS principle, I'd go with that approach.
For the specific case of mixed-sector-size flashes with linear allocation (which is all the RFC was about in the first place), Wolfgang pointed to the embedded environment, i.e. adapting the linker script, which solves the issue perfectly. Therefore I see "my" case as closed and the RFC as redundant.
As time permits, I'll have a look at your points again later. But honestly, the stack on my desk piles up a bit too much right now... Sorry.
Best regards, Andreas

On 27/09/2010 18:16, Andreas Pretzsch wrote:
Dear Albert,
first, thanks for reading through my RFC. I agree that there are a couple of simplifications in there, e.g. reset vectors. Written after about 12 hours of hacking...
:)
I see you're struggling with a comparable, yet a bit more complicated issue. I have to admit that I only skimmed over the threads, but it looks to me that most of it could also be done with the linker's help.
As I said, linker file tweaking can provide a non-perfectly-optimal solution (as you can hardly perfectly fill the sector which contains _start) for one specific system, i.e. one SoC model with one FLASH model and a given configuration (i.e. set of object files to link together). But if you change any of these, there is a risk, however small, that the manual mapping becomes even less optimal by creating more unused space (one can live with that), or infeasible by overfilling the sector (and that is more problematic).
OTOH, the patch I'm envisioning would have the advantage of optimally filling all sectors used for code and data (except one obviously), working out-of-the-box over a large set of (ARM[926]) systems and configurations -- the location and size of the two .bin parts would be the only thing to adapt for each board, once and for all as long as the type of FLASH remains unchanged.
(I realize there is also a marginal advantage to the patch: as I said, it requires separating the code and data that are needed when running from FLASH from those needed only when running from RAM. This could be further improved to copy from FLASH to RAM only what is needed for running from RAM. When u-boot is used to launch OSes, it won't make a difference; when it launches standalone apps, that's a bit more RAM available to the apps. Not much, mind.)
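The overfill risk of a hand-tuned linker mapping mentioned above can be made concrete with a quick check. The object names and sizes below are invented for illustration only; the point is that the slack of the sector holding _start depends on the configuration and can go negative after an unrelated change.

```python
KB = 1024


def sector_slack(sector_size, object_sizes):
    """Bytes left in the sector; a negative result means it is overfilled."""
    return sector_size - sum(object_sizes.values())


# Hypothetical object sizes for the code mapped into the first 32 kB sector.
first_sector = {"start.o": 2 * KB, "lowlevel_init.o": 1 * KB, "board.o": 24 * KB}
print(sector_slack(32 * KB, first_sector))   # unused space, tolerable
first_sector["board.o"] += 6 * KB            # a config change grows one object
print(sector_slack(32 * KB, first_sector))   # negative: mapping no longer fits
```

A check like this could run at build time, but it only detects the problem; it does not remove the need to rework the mapping by hand.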
Personally, I'm fine with a tuned linker file. Preparations are already there in U-Boot and the real life block allocation is pretty static, too. So following the KISS principle, I'd go with that approach.
I understand perfectly -- I'm all for everyone choosing the solution they see fit. :)
For the specific case of mixed-sector-size flashes with linear allocation (which is all the RFC was about in the first place), Wolfgang pointed to the embedded environment, i.e. adapting the linker script, which solves the issue perfectly. Therefore I see "my" case as closed and the RFC as redundant.
As time permits, I'll have a look at your points again later. But honestly, the stack on my desk piles up a bit too much right now... Sorry.
No problems. I'll post a patch anyway; reading your RFC simply made me aware that the patch should be more generic than it is right now, so as to cover your needs as I understand them. :)
Best regards, Andreas
Kind regards,

Dear Andreas Pretzsch,
In message 1285564526.21734.187.camel@ws-apr.office.loc you wrote:
In a nit-picking moment trying to save some flash storage, I looked at the current typical flash layout of U-Boot and came up with below RFC.
Thanks.
I'd volunteer to provide an implementation for the architectures I have close at hand today (Blackfin, ARM11), as time permits. For other architectures, I'd need some assistance.
You design a big and somewhat complicated solution for a problem which does not exist, because it has already been solved more than a decade ago, i.e. right with the very first versions of U-Boot (or PPCBoot, as it was called back then).
We call this feature "embedded environment", and all it takes to use it on a machine is a somewhat hand-crafted version of the linker script which aligns the location of the environment in the right (small) flash sectors.
This works like a charm, and I recommend you have a look into this.
Best regards,
Wolfgang Denk

On Monday, 27.09.2010, at 10:59 +0200, Wolfgang Denk wrote:
You design a big and somewhat complicated solution for a problem which
ACK. Had the linker approach also in mind, but didn't want to touch each and every lds. But definitely the more elegant solution.
does not exist, because it has already been solved more than a decade ago, i.e. right with the very first versions of U-Boot (or PPCBoot, as it was called back then).
We call this feature "embedded environment", and all it takes to use it on a machine is a somewhat hand-crafted version of the linker script which aligns the location of the environment in the right (small) flash sectors.
This works like a charm, and I recommend you have a look into this.
Sigh. Big brown paper bag for me.
Solves exactly the "issue" I described and works perfectly. And is documented very well in the README. As usual, things are already solved in U-Boot. Thanks for the pointer.
Really had blinkers on, no idea why I didn't see it. Or rather, I saw it but somehow mixed it up with the hardcoded, read-only environment. Lesson learned: Get some sleep before scribbling senseless RFCs.
Sorry for the noise, case closed.
participants (3)
- Albert ARIBAUD
- Andreas Pretzsch
- Wolfgang Denk