[U-Boot] [PATCH] atngw100: Use virtual address in CONFIG_ENV_ADDR

Ever since the CFI driver was rewritten to use virtual addresses, thus eliminating the whole point of the map_physmem() macro, ATNGW100 has been broken like this:
U-Boot> saveenv Saving Environment to Flash... Error: start and/or end address not on sector boundary
So let's take a different approach and store the virtual address in CONFIG_ENV_ADDR. I personally believe that's rubbish and will break whenever we decide to change the virtual-to-physical mapping in any way, but it looks like it's the direction in which u-boot is currently moving, and it does fix the problem.
Signed-off-by: Haavard Skinnemoen haavard.skinnemoen@atmel.com --- include/configs/atngw100.h | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/include/configs/atngw100.h b/include/configs/atngw100.h index 4ed5514..9777ec0 100644 --- a/include/configs/atngw100.h +++ b/include/configs/atngw100.h @@ -155,7 +155,7 @@
#define CONFIG_ENV_IS_IN_FLASH 1 #define CONFIG_ENV_SIZE 65536 -#define CONFIG_ENV_ADDR (CONFIG_SYS_FLASH_BASE + CONFIG_SYS_FLASH_SIZE - CONFIG_ENV_SIZE) +#define CONFIG_ENV_ADDR (0xa0000000 + CONFIG_SYS_FLASH_SIZE - CONFIG_ENV_SIZE)
#define CONFIG_SYS_INIT_SP_ADDR (CONFIG_SYS_INTRAM_BASE + CONFIG_SYS_INTRAM_SIZE)

Haavard Skinnemoen wrote:
Ever since the CFI driver was rewritten to use virtual addresses, thus eliminating the whole point of the map_physmem() macro, ATNGW100 has been broken like this:
How about other boards (like the MIMC200) ?
Aren't *all* AVR32 boards affected in this way ?
Mark

Mark Jackson mpfj-list@mimc.co.uk wrote:
Haavard Skinnemoen wrote:
Ever since the CFI driver was rewritten to use virtual addresses, thus eliminating the whole point of the map_physmem() macro, ATNGW100 has been broken like this:
How about other boards (like the MIMC200) ?
Aren't *all* AVR32 boards affected in this way ?
Possibly, but NGW100 is the only one which I've seen reports about. STK1000 is safe since it doesn't use the CFI driver.
Haavard

Haavard Skinnemoen wrote:
Mark Jackson mpfj-list@mimc.co.uk wrote:
Haavard Skinnemoen wrote:
Ever since the CFI driver was rewritten to use virtual addresses, thus eliminating the whole point of the map_physmem() macro, ATNGW100 has been broken like this:
How about other boards (like the MIMC200) ?
Aren't *all* AVR32 boards affected in this way ?
Possibly, but NGW100 is the only one which I've seen reports about. STK1000 is safe since it doesn't use the CFI driver.
I did kinda report this in the thread "JFFS2 scanning bug", and the "triple-revert" patch you posted on 26/05/09 16:58 appeared to fix it.
Since this didn't change any board files (only the core CFI files) I guess I assumed this "revert" would work its way upstream and I wouldn't have to change anything.
Shall I just submit a patch to fix the mimc200 board in the same way as your NGW100 patch ?
Regards Mark

Mark Jackson mpfj-list@mimc.co.uk wrote:
Haavard Skinnemoen wrote:
Possibly, but NGW100 is the only one which I've seen reports about. STK1000 is safe since it doesn't use the CFI driver.
I did kinda report this in the thread "JFFS2 scanning bug", and the "triple-revert" patch you posted on 26/05/09 16:58 appeared to fix it.
Ah...so it breaks JFFS2 as well? I doubt that changing the environment address fixes that...
Since this didn't change any board files (only the core CFI files) I guess I assumed this "revert" would work its way upstream and I wouldn't have to change anything.
Hmm, yeah, maybe I should post the revert again.
I have to admit I'm completely confused about how u-boot deals with virtual and physical addresses these days. It used to expose only physical addresses through external interfaces, but now it looks like it's a bit of both, and it's impossible to tell which goes where.
Shall I just submit a patch to fix the mimc200 board in the same way as your NGW100 patch ?
Yes, that will probably be a good idea if it has the same problem with saveenv.
Haavard

Haavard Skinnemoen wrote:
Mark Jackson mpfj-list@mimc.co.uk wrote:
Haavard Skinnemoen wrote:
Possibly, but NGW100 is the only one which I've seen reports about. STK1000 is safe since it doesn't use the CFI driver.
I did kinda report this in the thread "JFFS2 scanning bug", and the "triple-revert" patch you posted on 26/05/09 16:58 appeared to fix it.
Ah...so it breaks JFFS2 as well? I doubt that changing the environment address fixes that...
Since this didn't change any board files (only the core CFI files) I guess I assumed this "revert" would work its way upstream and I wouldn't have to change anything.
Hmm, yeah, maybe I should post the revert again.
I have to admit I'm completely confused about how u-boot deals with virtual and physical addresses these days. It used to expose only physical addresses through external interfaces, but now it looks like it's a bit of both, and it's impossible to tell which goes where.
Shall I just submit a patch to fix the mimc200 board in the same way as your NGW100 patch ?
Yes, that will probably be a good idea if it has the same problem with saveenv.
Okay ... looks like there are 2 problems revolving round CFI.
(1) saveenv (2) jffs2
The CONFIG_ENV_ADDR patch fixes (1) but *not* (2).
The "triple-revert" patch fixes both (1) and (2).
Not quite sure how to proceed from here. For the time being, I'll go down the "triple-revert" patch route until something better pops up !!
Regards Mark

Dear Mark Jackson,
In message 4A97B2D4.6080406@mimc.co.uk you wrote:
Okay ... looks like there are 2 problems revolving round CFI.
(1) saveenv (2) jffs2
The CONFIG_ENV_ADDR patch fixes (1) but *not* (2).
The "triple-revert" patch fixes both (1) and (2).
Not quite sure how to proceed from here. For the time being, I'll go down the "triple-revert" patch route until something better pops up !!
I wonder why nobody seems to want to put the CFI custodianon Cc: ?
Best regards,
Wolfgang Denk

Dear Haavard Skinnemoen,
In message 1251448970-15252-1-git-send-email-haavard.skinnemoen@atmel.com you wrote:
Ever since the CFI driver was rewritten to use virtual addresses, thus eliminating the whole point of the map_physmem() macro, ATNGW100 has been broken like this:
U-Boot> saveenv Saving Environment to Flash... Error: start and/or end address not on sector boundary
So let's take a different approach and store the virtual address in CONFIG_ENV_ADDR. I personally believe that's rubbish and will break whenever we decide to change the virtual-to-physical mapping in any way, but it looks like it's the direction in which u-boot is currently moving, and it does fix the problem.
Signed-off-by: Haavard Skinnemoen haavard.skinnemoen@atmel.com
include/configs/atngw100.h | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/include/configs/atngw100.h b/include/configs/atngw100.h index 4ed5514..9777ec0 100644 --- a/include/configs/atngw100.h +++ b/include/configs/atngw100.h @@ -155,7 +155,7 @@
#define CONFIG_ENV_IS_IN_FLASH 1 #define CONFIG_ENV_SIZE 65536 -#define CONFIG_ENV_ADDR (CONFIG_SYS_FLASH_BASE + CONFIG_SYS_FLASH_SIZE - CONFIG_ENV_SIZE) +#define CONFIG_ENV_ADDR (0xa0000000 + CONFIG_SYS_FLASH_SIZE - CONFIG_ENV_SIZE)
This is definitely a change to the worse.
I feel a strong urge to NAK this. There must be a better way to fix the problems you are experiencing.
Stefan, can you please comment, too?
Best regards,
Wolfgang Denk

Wolfgang Denk wd@denx.de wrote:
#define CONFIG_ENV_IS_IN_FLASH 1 #define CONFIG_ENV_SIZE 65536 -#define CONFIG_ENV_ADDR (CONFIG_SYS_FLASH_BASE + CONFIG_SYS_FLASH_SIZE - CONFIG_ENV_SIZE) +#define CONFIG_ENV_ADDR (0xa0000000 + CONFIG_SYS_FLASH_SIZE - CONFIG_ENV_SIZE)
This is definitely a change to the worse.
Yes, I think so too.
I feel a strong urge to NAK this. There must be a better way to fix the problems you are experiencing.
Yes...the idea behind the map_physmem() macro was to do any physical-to-virtual address mapping inside the CFI flash driver and never expose anything but physical addresses to the outside world. This meant that the sector start addresses were expressed in terms of physical addresses, matching how things were wired up on the board, and the architecture was free to map those to any virtual address which is most suitable in terms of caching properties, etc.
Now, however, the flash start addresses are mapped to virtual addresses at initialization time, so everything that wants to interact with the flash must known which address the architecture decided to map the flash at, which is not necessarily even possible to know in advance if it is being done dynamically through a hardware MMU.
Reverting 09ce9921a7d8b1ce764656b14b42217bbf4faa38 would bring things back to the way they were, and fix both the saveenv problem and the jffs2 problem.
Haavard

On Aug 28, 2009, at 7:14 AM, Haavard Skinnemoen wrote:
Wolfgang Denk wd@denx.de wrote:
#define CONFIG_ENV_IS_IN_FLASH 1 #define CONFIG_ENV_SIZE 65536 -#define CONFIG_ENV_ADDR (CONFIG_SYS_FLASH_BASE + CONFIG_SYS_FLASH_SIZE - CONFIG_ENV_SIZE) +#define CONFIG_ENV_ADDR (0xa0000000 + CONFIG_SYS_FLASH_SIZE - CONFIG_ENV_SIZE)
This is definitely a change to the worse.
Yes, I think so too.
I feel a strong urge to NAK this. There must be a better way to fix the problems you are experiencing.
Yes...the idea behind the map_physmem() macro was to do any physical-to-virtual address mapping inside the CFI flash driver and never expose anything but physical addresses to the outside world. This meant that the sector start addresses were expressed in terms of physical addresses, matching how things were wired up on the board, and the architecture was free to map those to any virtual address which is most suitable in terms of caching properties, etc.
Now, however, the flash start addresses are mapped to virtual addresses at initialization time, so everything that wants to interact with the flash must known which address the architecture decided to map the flash at, which is not necessarily even possible to know in advance if it is being done dynamically through a hardware MMU.
Reverting 09ce9921a7d8b1ce764656b14b42217bbf4faa38 would bring things back to the way they were, and fix both the saveenv problem and the jffs2 problem.
Such a revert would break other boards that now expect the new behavior (like all the 36-bit physical cfg for FSL boards)
- k

Kumar Gala galak@kernel.crashing.org wrote:
Reverting 09ce9921a7d8b1ce764656b14b42217bbf4faa38 would bring things back to the way they were, and fix both the saveenv problem and the jffs2 problem.
Such a revert would break other boards that now expect the new behavior (like all the 36-bit physical cfg for FSL boards)
Right, I suspected that.
So...which config symbols are supposed to be virtual now, and how are you supposed to know how the virtual-to-physical mappings are set up in advance?
Haavard

On Aug 28, 2009, at 8:49 AM, Haavard Skinnemoen wrote:
Kumar Gala galak@kernel.crashing.org wrote:
Reverting 09ce9921a7d8b1ce764656b14b42217bbf4faa38 would bring things back to the way they were, and fix both the saveenv problem and the jffs2 problem.
Such a revert would break other boards that now expect the new behavior (like all the 36-bit physical cfg for FSL boards)
Right, I suspected that.
FYI, before the patch, the CFI driver was sometimes doing the map, but IIRC it was also abusing the "physical" address, treating it as a virtual address without mapping it. The only way for that to work is when VA=PA (or, depending on what you were doing with the address, you just got lucky). The CFI driver was the outlier - all the other flash code was treating the start field as a VA already. So I don't think just reverting the patch is the answer.
So...which config symbols are supposed to be virtual now, and how are you supposed to know how the virtual-to-physical mappings are set up in advance?
Everything is treated as virtual unless it's being used for hardware setup. If you use something to do memory accesses, it's virtual. A lot of code had been just using the PA as a VA, because things were always mapped 1-1.
Can you point me at an example in your scenario of code that interacts with the flash?
Your problem Cheers, Becky

On Friday 28 August 2009 21:58:15 Becky Bruce wrote:
On Aug 28, 2009, at 8:49 AM, Haavard Skinnemoen wrote:
Kumar Gala galak@kernel.crashing.org wrote:
Reverting 09ce9921a7d8b1ce764656b14b42217bbf4faa38 would bring things back to the way they were, and fix both the saveenv problem and the jffs2 problem.
Such a revert would break other boards that now expect the new behavior (like all the 36-bit physical cfg for FSL boards)
Right, I suspected that.
FYI, before the patch, the CFI driver was sometimes doing the map, but IIRC it was also abusing the "physical" address, treating it as a virtual address without mapping it. The only way for that to work is when VA=PA (or, depending on what you were doing with the address, you just got lucky). The CFI driver was the outlier - all the other flash code was treating the start field as a VA already. So I don't think just reverting the patch is the answer.
I think too, that revering the patch in question is not the right "solution" for this problem in the current stage. But I have to admit that I don't have enough insight into your platform to come up with another good idea quickly.
So...which config symbols are supposed to be virtual now, and how are you supposed to know how the virtual-to-physical mappings are set up in advance?
Everything is treated as virtual unless it's being used for hardware setup. If you use something to do memory accesses, it's virtual. A lot of code had been just using the PA as a VA, because things were always mapped 1-1.
Correct. All those addresses to configure the CFI driver should be virtual addresses. BTW: This is valid for other drivers as well (e.g. IDE, video).
Can you point me at an example in your scenario of code that interacts with the flash?
Yes, please give us an example of your memory mapping (physical and virtual) on your failing platform.
Cheers, Stefan
-- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-0 Fax: (+49)-8142-66989-80 Email: office@denx.de

On Sat, 29 Aug 2009 13:39:02 +0200 Stefan Roese sr@denx.de wrote:
I think too, that revering the patch in question is not the right "solution" for this problem in the current stage. But I have to admit that I don't have enough insight into your platform to come up with another good idea quickly.
Yeah, I only meant to suggest it as one possible solution, with this patch being one other, less intrusive but also less complete, solution.
And I'm not really even opposed to the patch in question, I just need to know how I should configure my systems from now on since all of a sudden, some addresses are supposed to be virtual while others are physical, and it doesn't seem to be documented anywhere what should be used where.
So...which config symbols are supposed to be virtual now, and how are you supposed to know how the virtual-to-physical mappings are set up in advance?
Everything is treated as virtual unless it's being used for hardware setup. If you use something to do memory accesses, it's virtual. A lot of code had been just using the PA as a VA, because things were always mapped 1-1.
Correct. All those addresses to configure the CFI driver should be virtual addresses.
That cannot possibly be true, as the start address of the flash is passed to physmem_map().
It is also an absolutely idiotic thing to do since on any CPU with a MMU, the virtual address can be anything, so locking all the configuration symbols to specific virtual addresses would effectively lock down the MMU configuration forever (and I imagine that would be a particularly nasty problem on hardware with more physical address bits than virtual.)
Can you point me at an example in your scenario of code that interacts with the flash?
Yes, please give us an example of your memory mapping (physical and virtual) on your failing platform.
CFI-compatible flash at physical address 0. By default, all virtual addresses up to 2GB are mapped 1:1 with caching enabled (when paging is enabled, which it isn't in u-boot, this area is translated through the MMU.) Then, virtual address 0x80000000..0x9fffffff is mapped to 0x0..0x1fffffff with caching enabled (supervisor-only and not translated even with paging enabled; this is where the Linux kernel is mapped). Then, virtual address 0xa0000000..0xbfffffff is mapped to 0x0..0x1fffffff with caching disabled -- this is where we want to access the flash and other external hardware. Then there's another paged region and finally a 512MB uncached, direct-mapped region for accessing internal peripherals.
So the problem is that even though we may read from the flash using the physical address as virtual address, we cannot initialize, erase or write it using those addresses. Hence the need for map_physmem().
Haavard

On Fri, 28 Aug 2009 14:58:15 -0500 Becky Bruce becky.bruce@freescale.com wrote:
FYI, before the patch, the CFI driver was sometimes doing the map, but IIRC it was also abusing the "physical" address, treating it as a virtual address without mapping it.
In that case, those places should have been fixed, no?
The only way for that to work is when VA=PA (or, depending on what you were doing with the address, you just got lucky). The CFI driver was the outlier - all the other flash code was treating the start field as a VA already. So I don't think just reverting the patch is the answer.
Except for everything _outside_ the flash code which still deals with physical addresses, like the environment stuff and JFFS2. The flash code takes those addresses and compares them with the virtual addresses in the start array, and things break.
So...which config symbols are supposed to be virtual now, and how are you supposed to know how the virtual-to-physical mappings are set up in advance?
Everything is treated as virtual unless it's being used for hardware setup.
Exactly what constitutes "hardware setup"?
If you use something to do memory accesses, it's virtual.
Yes, but then the address should also be in a pointer, not an unsigned long which the flash 'start' array is.
A lot of code had been just using the PA as a VA, because things were always mapped 1-1.
Yes, there's lots of code which is broken in that respect...
Can you point me at an example in your scenario of code that interacts with the flash?
CONFIG_ENV_ADDR is used to store the environment in CFI flash. Reading the environment works OK-ish since the flash is accessible through a cacheable 1:1 mapping from virtual/physical address 0. However, when writing and erasing, the physical address stored in CONFIG_ENV_ADDR appears to be outside of the virtual sector addresses stored in the 'start' array, so the flash code throws an error.
There are basically two ways to fix it: Either go back to using physical addresses in the flash API, or make CONFIG_ENV_ADDR virtual (and from what I hear, the jffs2 code needs a similar fix.) This patch does the latter, but it seems like it doesn't fix things completely, and Wolfgang didn't appear very happy about it.
Haavard

Dear Haavard & Becky,
In message 20090830173640.2af9ce3c@siona you wrote:
The only way for that to work is when VA=PA (or, depending on what you were doing with the address,
Well, VA==PA is the default setting for U-Boot that shall be used for all systems, unless it is really impossible on a board to implement an exception from that rule.
you just got lucky). The CFI driver was the outlier - all the other flash code was treating the start field as a VA already. So I don't think just reverting the patch is the answer.
We did not even have any notion of VA's in U-Boot until very recently, and I call on everybody not to make U-Boot more complicated than necessary.
In almost all cases RAM and NOR flash fit easily in the physical address space of the CPUs, and for the sake of this majority of systems it must be possible to access memory on such systems assuming a simple 1:1 mapping.
Becky: the fact that Haavard's code is breaking is not considered an indication of a deficiency in his code.
If the CFI driver kind of allowed for VAs before (but incompletely / incorrectly), then this dis not cause problems on any systems using a strict 1:1 mapping.
Any changes to the code to correctly support other mappings must be done in a way that they (1) do not break and (2) do not add additional burdon on systems with a simple 1:1 mapping.
Everything is treated as virtual unless it's being used for hardware setup.
Thisis NOT correct. U-Boot usually does NOT use virtual addresses. Only very few systems do, and these must care not to disturb the majority of systems which do no need to differentiate between physical and virtual addresses.
If you use something to do memory accesses, it's virtual.
Unless you have a very special system you can rely on a strict 1:1 mapping.
A lot of code had been just using the PA as a VA, because things were always mapped 1-1.
Not only were, but _are_ and _will_be_.
There are basically two ways to fix it: Either go back to using physical addresses in the flash API, or make CONFIG_ENV_ADDR virtual
I understand we cannot do that, because some systems need to map (NOR) flash to virtual addresses outside the physical address space. Ok, so the CFI driver shall consistently be able to use VAs - but on simple systems with a 1:1 mapping there shall be no need in the system configuration to spend any thoughts on this.
(and from what I hear, the jffs2 code needs a similar fix.) This patch does the latter, but it seems like it doesn't fix things completely, and Wolfgang didn't appear very happy about it.
Indeed I'm deeply trouble when log standing rules get silently bent and even broken.
Best regards,
Wolfgang Denk

On Sun, 30 Aug 2009 20:11:01 +0200 Wolfgang Denk wd@denx.de wrote:
Dear Haavard & Becky,
In message 20090830173640.2af9ce3c@siona you wrote:
The only way for that to work is when VA=PA (or, depending on what you were doing with the address,
Well, VA==PA is the default setting for U-Boot that shall be used for all systems, unless it is really impossible on a board to implement an exception from that rule.
While not impossible, following that rule on the NGW100 would require that I either disable the caches (which would result in a massive slowdown) or start using the MMU more actively to get the caching properties right.
you just got lucky). The CFI driver was the outlier - all the other flash code was treating the start field as a VA already. So I don't think just reverting the patch is the answer.
We did not even have any notion of VA's in U-Boot until very recently, and I call on everybody not to make U-Boot more complicated than necessary.
I don't think any boards with PA==VA are affected. This issue is about two platforms with VA!=PA following different rules...
In almost all cases RAM and NOR flash fit easily in the physical address space of the CPUs, and for the sake of this majority of systems it must be possible to access memory on such systems assuming a simple 1:1 mapping.
You're ignoring cache issues.
Any changes to the code to correctly support other mappings must be done in a way that they (1) do not break and (2) do not add additional burdon on systems with a simple 1:1 mapping.
That was the idea when I introduced map_physmem() to the CFI driver a while back: the externally visible addresses were kept unchanged, while the remapping was done internally. Also, map_physmem() is a no-op on platforms with a simple 1:1 mapping.
If you use something to do memory accesses, it's virtual.
Unless you have a very special system you can rely on a strict 1:1 mapping.
Technically, the addresses seen by the CPU are virtual. And I don't think systems with a cache should be considered 'very special' these days...
There are basically two ways to fix it: Either go back to using physical addresses in the flash API, or make CONFIG_ENV_ADDR virtual
I understand we cannot do that, because some systems need to map (NOR) flash to virtual addresses outside the physical address space. Ok, so the CFI driver shall consistently be able to use VAs - but on simple systems with a 1:1 mapping there shall be no need in the system configuration to spend any thoughts on this.
But exactly what's wrong with hiding all this complexity inside map_physmem()? It was designed _exactly_ for this purpose -- to allow platforms with non-trivial mappings between VA and PA to do the necessary mapping when the driver requests it.
(and from what I hear, the jffs2 code needs a similar fix.) This patch does the latter, but it seems like it doesn't fix things completely, and Wolfgang didn't appear very happy about it.
Indeed I'm deeply trouble when log standing rules get silently bent and even broken.
And I deeply regret using the CFI driver on NGW100...it took a lot of effort to get it mostly limping along, and then it got broken at the first opportunity. I should have stuck with the much smaller and more efficient board-specific driver I had to begin with.
That PA==VA rule is pretty much the reason we're in this mess -- if more platforms had broken this rule, perhaps more of the code would have been written properly without lots of implicit conversion from PA to VA via ugly casts between unsigned long and all sorts of pointers.
Haavard

Dear Haavard Skinnemoen,
In message 20090830224218.10f14dc4@siona you wrote:
Well, VA==PA is the default setting for U-Boot that shall be used for all systems, unless it is really impossible on a board to implement an exception from that rule.
While not impossible, following that rule on the NGW100 would require that I either disable the caches (which would result in a massive slowdown) or start using the MMU more actively to get the caching properties right.
OK, so this would be the plan for a clean fix, then?
In almost all cases RAM and NOR flash fit easily in the physical address space of the CPUs, and for the sake of this majority of systems it must be possible to access memory on such systems assuming a simple 1:1 mapping.
You're ignoring cache issues.
Indeed I am, and intentionally, because this is a different topic. If your system requires to set up the MMU to enable caching, then you are supposed to do it in a way compatible with the rest of the software design, i. e. as transparently as possible. None of the architectures I know resonably well have problems setting up a 1: 1 address mapping even when the MMU is on (but I have to admit that AVR32 is not among these architectures).
Technically, the addresses seen by the CPU are virtual. And I don't think systems with a cache should be considered 'very special' these days...
Cache should not be relevant at all when defining a physcal or virtual memory map.
But exactly what's wrong with hiding all this complexity inside map_physmem()? It was designed _exactly_ for this purpose -- to allow platforms with non-trivial mappings between VA and PA to do the necessary mapping when the driver requests it.
Sorry, I don't know that code and it's users well enough to coment. Maybe Becky and/or Stefan can shed some light on this...
That PA==VA rule is pretty much the reason we're in this mess -- if
Let's say, the fact that this rule has not been stricter enforced has caused that teh appearant problems were not discovered earlier.
more platforms had broken this rule, perhaps more of the code would
Heh. If more platforms had broken this rule we would probably have become aware of these violations earlier, and stopped them doing such naughty things ;-)
As it seems, this requires some more discussion, i. e. a clean fix within the next couple of days is unlikely. To me that means that it makes no sense to offer to delay the release of v2009.08 by a week or so, as we'd still not be ready then. I will therefore proceed as planned, putting up with the fact that some (two, IIRC?) AVR32 boards are broken in this release. [we had a very long testing phase, and the problem is not exactly new, so it should have been detected (and fixed) long before.]
Best regards,
Wolfgang Denk

Wolfgang Denk wd@denx.de wrote:
Dear Haavard Skinnemoen,
In message 20090830224218.10f14dc4@siona you wrote:
Well, VA==PA is the default setting for U-Boot that shall be used for all systems, unless it is really impossible on a board to implement an exception from that rule.
While not impossible, following that rule on the NGW100 would require that I either disable the caches (which would result in a massive slowdown) or start using the MMU more actively to get the caching properties right.
OK, so this would be the plan for a clean fix, then?
Possibly. But it means even more effort and even larger code, so I'm not exactly happy about it.
In almost all cases RAM and NOR flash fit easily in the physical address space of the CPUs, and for the sake of this majority of systems it must be possible to access memory on such systems assuming a simple 1:1 mapping.
You're ignoring cache issues.
Indeed I am, and intentionally, because this is a different topic. If your system requires to set up the MMU to enable caching, then you are supposed to do it in a way compatible with the rest of the software design, i. e. as transparently as possible. None of the architectures I know resonably well have problems setting up a 1: 1 address mapping even when the MMU is on (but I have to admit that AVR32 is not among these architectures).
I suspect quite a few other architectures run with caches disabled because it's not safe to run with caches enabled with the current software design.
Technically, the addresses seen by the CPU are virtual. And I don't think systems with a cache should be considered 'very special' these days...
Cache should not be relevant at all when defining a physcal or virtual memory map.
Physical, no, but it's very common that the MMU defines caching properties (enabled/disabled, writeback/writethrough, etc.)
That PA==VA rule is pretty much the reason we're in this mess -- if
Let's say, the fact that this rule has not been stricter enforced has caused that teh appearant problems were not discovered earlier.
more platforms had broken this rule, perhaps more of the code would
Heh. If more platforms had broken this rule we would probably have become aware of these violations earlier, and stopped them doing such naughty things ;-)
Seems like you think it's more important to follow arbitrary rules than writing code that works well.
As it seems, this requires some more discussion, i. e. a clean fix within the next couple of days is unlikely. To me that means that it makes no sense to offer to delay the release of v2009.08 by a week or so, as we'd still not be ready then. I will therefore proceed as planned, putting up with the fact that some (two, IIRC?) AVR32 boards are broken in this release. [we had a very long testing phase, and the problem is not exactly new, so it should have been detected (and fixed) long before.]
Yes, I agree.
Haavard

Dear Haavard Skinnemoen,
In message 20090831155327.62b58f8f@hskinnemoen-d830 you wrote:
Possibly. But it means even more effort and even larger code, so I'm not exactly happy about it.
Really? Sorry if I'm asking dumb questions - I don't know AVR32: is it true that stting up a non-1:1 mapping for the NOR flash (i. e. what you are doing now) is easier (less code) than setting up a 1:1 mapping? What exactly are the reasons for this?
Indeed I am, and intentionally, because this is a different topic. If your system requires to set up the MMU to enable caching, then you are supposed to do it in a way compatible with the rest of the software design, i. e. as transparently as possible. None of the architectures I know resonably well have problems setting up a 1: 1 address mapping even when the MMU is on (but I have to admit that AVR32 is not among these architectures).
I suspect quite a few other architectures run with caches disabled because it's not safe to run with caches enabled with the current software design.
Well, usually we run with IC on and with DC off, usually because quite a number of drivers and other code do not use proper I/O accessors yet, and/or because it's easier and nobody really cares. For example on PowerPC, adding support for the data cache usually gives only a minimal performance boost. This may be different on other architectures.
Cache should not be relevant at all when defining a physcal or virtual memory map.
Physical, no, but it's very common that the MMU defines caching properties (enabled/disabled, writeback/writethrough, etc.)
Agreed. But it should be not so difficult to use the MMU to set up a 1:1 mapping if you have to set up the MMU at all - or is there any problems with that which I'm not aware of?
Heh. If more platforms had broken this rule we would probably have become aware of these violations earlier, and stopped them doing such naughty things ;-)
Seems like you think it's more important to follow arbitrary rules than writing code that works well.
Keeping the code as simple as possible is not exactly an arbitrary rule. At least not for me.
Best regards,
Wolfgang Denk

Wolfgang Denk wd@denx.de wrote:
Dear Haavard Skinnemoen,
In message 20090831155327.62b58f8f@hskinnemoen-d830 you wrote:
Possibly. But it means even more effort and even larger code, so I'm not exactly happy about it.
Really? Sorry if I'm asking dumb questions - I don't know AVR32: is it true that stting up a non-1:1 mapping for the NOR flash (i. e. what you are doing now) is easier (less code) than setting up a 1:1 mapping? What exactly are the reasons for this?
Like I explained in an earlier mail, the default setup includes a 1:1 mapping of the lowest addresses, but it's cacheable. The default setup also includes an uncached mapping of the same memory at a higher virtual address.
Yes, it is much easier (and smaller) to use the default virtual memory layout than setting up paging (which involves several exception handlers).
Indeed I am, and intentionally, because this is a different topic. If your system requires to set up the MMU to enable caching, then you are supposed to do it in a way compatible with the rest of the software design, i. e. as transparently as possible. None of the architectures I know resonably well have problems setting up a 1: 1 address mapping even when the MMU is on (but I have to admit that AVR32 is not among these architectures).
I suspect quite a few other architectures run with caches disabled because it's not safe to run with caches enabled with the current software design.
Well, usually we run with IC on and with DC off, usually because quite a number of drivers and other code do not use proper I/O accessors yet, and/or because it's easier and nobody really cares. For example on PowerPC, adding support for the data cache usually gives only a minimal performance boost. This may be different on other architectures.
Ok, so the code is broken and nobody else cares?
I suppose I could disable the DC (which is a bit complicated, but possible), but that would just add to the already high cost (in terms of both code size and performance) of using common code (i.e. the CFI driver), so I'm leaning towards a custom flash driver instead.
Cache should not be relevant at all when defining a physcal or virtual memory map.
Physical, no, but it's very common that the MMU defines caching properties (enabled/disabled, writeback/writethrough, etc.)
Agreed. But it should be not so difficult to use the MMU to set up a 1:1 mapping if you have to set up the MMU at all - or is there any problems with that which I'm not aware of?
Yes, there is: caching.
Heh. If more platforms had broken this rule we would probably have become aware of these violations earlier, and stopped them doing such naughty things ;-)
Seems like you think it's more important to follow arbitrary rules than writing code that works well.
Keeping the code as simple as possible is not exactly an arbitrary rule. At least not for me.
As simple as possible, but no simpler...
Haavard

On Tuesday 01 September 2009 10:57:52 Haavard Skinnemoen wrote:
Well, usually we run with IC on and with DC off, usually because quite a number of drivers and other code do not use proper I/O accessors yet, and/or because it's easier and nobody really cares. For example on PowerPC, adding support for the data cache usually gives only a minimal performance boost. This may be different on other architectures.
Ok, so the code is broken and nobody else cares?
I wouldn't put it like this. The CFI driver assumes that the FLASH mapping is not cached. This makes perfect sense in my point of view.
I suppose I could disable the DC (which is a bit complicated, but possible), but that would just add to the already high cost (in terms of both code size and performance) of using common code (i.e. the CFI driver), so I'm leaning towards a custom flash driver instead.
On some 440 platforms we configure the FLASH cached upon powerup, and disable the caching after relocating to SDRAM. So when the CFI driver is started, the FLASH is not cached any more, but we have the cache speedup upon relocation.
Cheers, Stefan
-- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-0 Fax: (+49)-8142-66989-80 Email: office@denx.de

Stefan Roese sr@denx.de wrote:
On Tuesday 01 September 2009 10:57:52 Haavard Skinnemoen wrote:
Well, usually we run with IC on and with DC off, usually because quite a number of drivers and other code do not use proper I/O accessors yet, and/or because it's easier and nobody really cares. For example on PowerPC, adding support for the data cache usually gives only a minimal performance boost. This may be different on other architectures.
Ok, so the code is broken and nobody else cares?
I wouldn't put it like this. The CFI driver assumes that the FLASH mapping is not cached. This makes perfect sense in my point of view.
Yes, and it can do that safely because it specifically asks for an uncached address from map_physmem(). That part didn't actually change with the patch in question -- the problem is that the address returned by map_physmem() is now exposed through external interfaces -- it's not just an internal implementation detail anymore.
I suppose I could disable the DC (which is a bit complicated, but possible), but that would just add to the already high cost (in terms of both code size and performance) of using common code (i.e. the CFI driver), so I'm leaning towards a custom flash driver instead.
On some 440 platforms we configure the FLASH cached upon powerup, and disable the caching after relocating to SDRAM. So when the CFI driver is started, the FLASH is not cached any more, but we have the cache speedup upon relocation.
That's what I want to do as well! And I can actually do that as long as the flash subsystem uses map_physmem() consistently.
Haavard

Dear Haavard Skinnemoen,
In message 20090901105752.5bb77fc0@hskinnemoen-d830 you wrote:
Really? Sorry if I'm asking dumb questions - I don't know AVR32: is it true that stting up a non-1:1 mapping for the NOR flash (i. e. what you are doing now) is easier (less code) than setting up a 1:1 mapping? What exactly are the reasons for this?
Like I explained in an earlier mail, the default setup includes a 1:1 mapping of the lowest addresses, but it's cacheable. The default setup also includes an uncached mapping of the same memory at a higher virtual address.
You mean you want to have the same memory area mapped _twice_, once cached and once (at another address) uncached?
Well, this obviously cannot be done with a plain 1:1 mapping. But then: isn't that asking for trouble anyway? Or is there anything that prevents you for example reading stale cached data after the memory content has changed by accesses through the uncached mapping?
Yes, it is much easier (and smaller) to use the default virtual memory layout than setting up paging (which involves several exception handlers).
I don't see how paging comes into play here?
Well, usually we run with IC on and with DC off, usually because quite a number of drivers and other code do not use proper I/O accessors yet, and/or because it's easier and nobody really cares. For example on PowerPC, adding support for the data cache usually gives only a minimal performance boost. This may be different on other architectures.
Ok, so the code is broken and nobody else cares?
Broken? You might call it a design decision. This is a boot loader, and simplicity of design and easy porting and board bring up are usually higher rated that sequeezing out the last few percent of performance. IIRC, on PowerPC the difference of running with DC enabled or not is only in the 10% range or so.
I suppose I could disable the DC (which is a bit complicated, but possible), but that would just add to the already high cost (in terms of both code size and performance) of using common code (i.e. the CFI driver), so I'm leaning towards a custom flash driver instead.
If you want to run with data caches enabled by default, then it would probably make more sense if you invested efforts in extending the CFI driver to provide hooks / callbacks to (temporarily) switch of data cache for the memory range in question. This wouls allow you to run with DC enabled on the flash area, and still use the CFI driver.
Such changes will have it much easier to find their way into mainline than adding a proprietary flash driver.
Best regards,
Wolfgang Denk

Wolfgang Denk wd@denx.de wrote:
Dear Haavard Skinnemoen,
Like I explained in an earlier mail, the default setup includes a 1:1 mapping of the lowest addresses, but it's cacheable. The default setup also includes an uncached mapping of the same memory at a higher virtual address.
You mean you want to have the same memory area mapped _twice_, once cached and once (at another address) uncached?
Yes.
Well, this obviously cannot be done with a plain 1:1 mapping. But then: isn't that asking for trouble anyway? Or is there anything that prevents you for example reading stale cached data after the memory content has changed by accesses through the uncached mapping?
There's nothing which prevents me from accessing a completely different address either -- I just need to make sure that I access the correct address, or things will blow up one way or another.
The default virtual memory model makes it very easy to do uncached access to certain types of memory (i.e. flash) and cached access to others (i.e. SDRAM). It also makes it easy to run from cached flash memory at startup and switch to uncached access later (after flushing the cache, of course).
Yes, it is much easier (and smaller) to use the default virtual memory layout than setting up paging (which involves several exception handlers).
I don't see how paging comes into play here?
If I can't use the default scheme described above (and you're pretty much saying I can't), I can define caching properties on a page-by-page basis, but that obviously requires paging to be enabled.
Ok, so the code is broken and nobody else cares?
Broken? You might call it a design decision. This is a boot loader, and simplicity of design and easy porting and board bring up are usually higher rated that sequeezing out the last few percent of performance.
And that is EXACTLY why I'm trying to advocate: Keep the additional complexity (which can be kept to a minimum) localized to a handful of drivers, and don't change the command line interface or the board configuration interface.
IIRC, on PowerPC the difference of running with DC enabled or not is only in the 10% range or so.
But as a matter of principle, I don't want to reduce the performance _and_ increase the code size just because a driver that used to work got broken. I want to fix the driver instead!
I suppose I could disable the DC (which is a bit complicated, but possible), but that would just add to the already high cost (in terms of both code size and performance) of using common code (i.e. the CFI driver), so I'm leaning towards a custom flash driver instead.
If you want to run with data caches enabled by default, then it would probably make more sense if you invested efforts in extending the CFI driver to provide hooks / callbacks to (temporarily) switch of data cache for the memory range in question. This wouls allow you to run with DC enabled on the flash area, and still use the CFI driver.
But that is exactly how map_physmem() works -- it allows the CFI driver (or any other driver) to request the caches to be bypassed for a given physical address range, possibly resulting in a different virtual address (though for backwards compatibility, all platforms except AVR32 simply return the physical address unchanged.)
The problem is that this virtual address (which currently isn't dynamic, but it's easy to imagine a platform where it could be) is exposed to the rest of the world, thus breaking both existing board configurations and potentially any command-line scripts (and, apparently, the jffs2 driver though I'm not entirely sure how that happened.)
Such changes will have it much easier to find their way into mainline than adding a proprietary flash driver.
It certainly won't be a proprietary driver; if anything, it would be a variant of the driver currently used by ATSTK1000.
Haavard

Dear Haavard Skinnemoen,
In message 20090901123829.55fcb24e@hskinnemoen-d830 you wrote:
And that is EXACTLY why I'm trying to advocate: Keep the additional complexity (which can be kept to a minimum) localized to a handful of drivers, and don't change the command line interface or the board configuration interface.
We're not doing this. At least not intentionally.
The discussion we had was based on our knowledge about existing systems, and needs to also handle more complex situations like for example 32 bit systems with 36 bit physical addresses.
But as a matter of principle, I don't want to reduce the performance _and_ increase the code size just because a driver that used to work got broken. I want to fix the driver instead!
Agreed - assuming it is possible without hurting the majority of other existing configurations.
If you want to run with data caches enabled by default, then it would probably make more sense if you invested efforts in extending the CFI driver to provide hooks / callbacks to (temporarily) switch of data cache for the memory range in question. This wouls allow you to run with DC enabled on the flash area, and still use the CFI driver.
But that is exactly how map_physmem() works -- it allows the CFI driver (or any other driver) to request the caches to be bypassed for a given physical address range, possibly resulting in a different virtual address (though for backwards compatibility, all platforms except AVR32 simply return the physical address unchanged.)
I have to admit that I have no idea how map_physmem() used to work or how it works now; at this point, I don;t care about implementation details, I try to focus only on the overall design.
I think your double mapping is a hightly architecture-specific feature which I do not like at all, but if you are happy with it and it works for you I cannot and will not prevent it. But after discussions with Stefan Roese and Detlev Zundel it seems to me that map_physmem() is probably not the right approach (judging only from the name). We should not try to fix cache attributes by modifying addresses / address maps
Instead, we should really extend the CFI driver such that it does not matter if the addresses you are passing refer to cached or uncached memory, and then provide hooks in the CFI driver to allow for testing if cache is enabled, and switching cache off if it is.
Detlev Zundel suggested to use this as a chance to come up with something like a cache API which then could be used by other drivers as well.
To me such an approach makes much more sense, as it is generic and can be used by other drivers and other architectures - even if it may come at the cost of more effort on your systems.
Such changes will have it much easier to find their way into mainline than adding a proprietary flash driver.
It certainly won't be a proprietary driver; if anything, it would be a variant of the driver currently used by ATSTK1000.
Well, but board/atmel/atstk1000/flash.c _is_ a proprietary driver for some flash chips that seem to be CFI conformant at first glance. You would not get such a driver into mailine any more.
Best regards,
Wolfgang Denk

Wolfgang Denk wd@denx.de wrote:
Dear Haavard Skinnemoen,
In message 20090901123829.55fcb24e@hskinnemoen-d830 you wrote:
And that is EXACTLY why I'm trying to advocate: Keep the additional complexity (which can be kept to a minimum) localized to a handful of drivers, and don't change the command line interface or the board configuration interface.
We're not doing this. At least not intentionally.
Good. Then let's please put a stop to the madness of exposing virtual addresses all over the place.
The discussion we had was based on our knowledge about existing systems, and needs to also handle more complex situations like for example 32 bit systems with 36 bit physical addresses.
As far as I understand, the only difference for such systems is that keeping 64-bit physical addresses around are a bit more costly than passing around 32-bit virtual pointers. But presumably those systems have enough memory to make it a non-issue...
But as a matter of principle, I don't want to reduce the performance _and_ increase the code size just because a driver that used to work got broken. I want to fix the driver instead!
Agreed - assuming it is possible without hurting the majority of other existing configurations.
Yes, that is indeed possible, as evidenced by the fact that it used to work without hurting any other configurations. It took another "special case" to break it.
If you want to run with data caches enabled by default, then it would probably make more sense if you invested efforts in extending the CFI driver to provide hooks / callbacks to (temporarily) switch of data cache for the memory range in question. This wouls allow you to run with DC enabled on the flash area, and still use the CFI driver.
But that is exactly how map_physmem() works -- it allows the CFI driver (or any other driver) to request the caches to be bypassed for a given physical address range, possibly resulting in a different virtual address (though for backwards compatibility, all platforms except AVR32 simply return the physical address unchanged.)
I have to admit that I have no idea how map_physmem() used to work or how it works now; at this point, I don;t care about implementation details, I try to focus only on the overall design.
It works exactly the same way now as it used to work. The difference is that its return value (which is basically just an architecture-specific cookie, aka. virtual address) is exposed to a much larger part of the system.
I think your double mapping is a hightly architecture-specific feature which I do not like at all, but if you are happy with it and it works for you I cannot and will not prevent it.
Yes, it was always meant to be an architecture-specific thing. Though I think some MIPS- and SH-based processors do something very similar.
But in order to utilize architecture-specific features, we need an architecture-independent abstraction and that's basically what map_physmem() is.
But after discussions with Stefan Roese and Detlev Zundel it seems to me that map_physmem() is probably not the right approach (judging only from the name). We should not try to fix cache attributes by modifying addresses / address maps
And why not? That's what Linux does, and it works wonderfully across 26 different architectures.
Instead, we should really extend the CFI driver such that it does not matter if the addresses you are passing refer to cached or uncached memory, and then provide hooks in the CFI driver to allow for testing if cache is enabled, and switching cache off if it is.
What's the advantage of such an approach? It sounds much more complicated from the driver's point of view as well.
Detlev Zundel suggested to use this as a chance to come up with something like a cache API which then could be used by other drivers as well.
My suggestion is to use map_physmem() and unmap_physmem(). It exists today, and it works, provided that we keep its usage internal to the driver and don't expose whatever architecture-specific values it returns.
To me such an approach makes much more sense, as it is generic and can be used by other drivers and other architectures - even if it may come at the cost of more effort on your systems.
In what way is it more generic? In what way can map/unmap_physmem() not be used with other drivers and architectures?
Such changes will have it much easier to find their way into mainline than adding a proprietary flash driver.
It certainly won't be a proprietary driver; if anything, it would be a variant of the driver currently used by ATSTK1000.
Well, but board/atmel/atstk1000/flash.c _is_ a proprietary driver for some flash chips that seem to be CFI conformant at first glance. You would not get such a driver into mailine any more.
So I guess dropping support for ATNGW100 is the only remaining option? At least STK1000 has a working flash driver.
Haavard

Dear Haavard Skinnemoen,
In message 20090901134257.59961e79@hskinnemoen-d830 you wrote:
complexity (which can be kept to a minimum) localized to a handful of drivers, and don't change the command line interface or the board configuration interface.
We're not doing this. At least not intentionally.
Good. Then let's please put a stop to the madness of exposing virtual addresses all over the place.
But that's what we've been doing all the time. You just did not notice it because of the usual 1:1 mapping.
On a 32 bit system with 36 bit physical addresses you cannot use a physical address when running the "md" command, for example. we always assumed that the 32 bit VA we used matched 1:1 to a PA with the missing high bits set to 0, right?
The discussion we had was based on our knowledge about existing systems, and needs to also handle more complex situations like for example 32 bit systems with 36 bit physical addresses.
As far as I understand, the only difference for such systems is that keeping 64-bit physical addresses around are a bit more costly than passing around 32-bit virtual pointers. But presumably those systems have enough memory to make it a non-issue...
That's not true. There are 32 bit systems with bigger physical address spaces.
And note that this is not a new thing. We have been doing this allt he time - just without ever explicitly mentioning it, because so far nobody ever complained about it.
Agreed - assuming it is possible without hurting the majority of other existing configurations.
Yes, that is indeed possible, as evidenced by the fact that it used to work without hurting any other configurations. It took another "special case" to break it.
My understanding is that the special case is yours - using a non-1:1 mapping.
discussions with Stefan Roese and Detlev Zundel it seems to me that map_physmem() is probably not the right approach (judging only from the name). We should not try to fix cache attributes by modifying addresses / address maps
And why not? That's what Linux does, and it works wonderfully across 26 different architectures.
We do not want to implement a full-fledged OS with virtual memory and on-demand paging and all that stuff in U-Boot.
Instead, we should really extend the CFI driver such that it does not matter if the addresses you are passing refer to cached or uncached memory, and then provide hooks in the CFI driver to allow for testing if cache is enabled, and switching cache off if it is.
What's the advantage of such an approach? It sounds much more complicated from the driver's point of view as well.
The advantage is that other drivers with similar needs can use it as well.
To me such an approach makes much more sense, as it is generic and can be used by other drivers and other architectures - even if it may come at the cost of more effort on your systems.
In what way is it more generic? In what way can map/unmap_physmem() not be used with other drivers and architectures?
On other architectures it is not possible to map the same memory area multiple times with different attributes. Remapping addresses thus cannot help - you really have to switch the MMO resp. memory controller setinngs to turn data cache on or off.
Well, but board/atmel/atstk1000/flash.c _is_ a proprietary driver for some flash chips that seem to be CFI conformant at first glance. You would not get such a driver into mailine any more.
So I guess dropping support for ATNGW100 is the only remaining option?
No, why? We're discussing ways to fix the problems, aren;t we?
At least STK1000 has a working flash driver.
Only because it was added so long ago, before we were more consequent about using the generic driver.
It would be a good idea to clean up this board support, remove the proprietary flash driver and use the CFI driver instead. Patches are welcome.
Best regards,
Wolfgang Denk

Wolfgang Denk wd@denx.de wrote:
Dear Haavard Skinnemoen,
In message 20090901134257.59961e79@hskinnemoen-d830 you wrote:
complexity (which can be kept to a minimum) localized to a handful of drivers, and don't change the command line interface or the board configuration interface.
We're not doing this. At least not intentionally.
Good. Then let's please put a stop to the madness of exposing virtual addresses all over the place.
But that's what we've been doing all the time. You just did not notice it because of the usual 1:1 mapping.
Up until this commit, yes:
commit 09ce9921a7d8b1ce764656b14b42217bbf4faa38 Author: Becky Bruce beckyb@kernel.crashing.org Date: Mon Feb 2 16:34:51 2009 -0600
flash/cfi_flash: Use virtual sector start address, not phys
after that, NGW100 support is broken because virtual addresses are no longer an implementation detail and are being exposed all over the place.
On a 32 bit system with 36 bit physical addresses you cannot use a physical address when running the "md" command, for example.
Yes you can, if you teach the "md" command to map it at a virtual address first. Again, map_physmem() makes this possible without changing the external interface in any way.
we always assumed that the 32 bit VA we used matched 1:1 to a PA with the missing high bits set to 0, right?
Yes, but how can you possibly access an arbitrary 36-bit address using that setup?
The discussion we had was based on our knowledge about existing systems, and needs to also handle more complex situations like for example 32 bit systems with 36 bit physical addresses.
As far as I understand, the only difference for such systems is that keeping 64-bit physical addresses around are a bit more costly than passing around 32-bit virtual pointers. But presumably those systems have enough memory to make it a non-issue...
That's not true. There are 32 bit systems with bigger physical address spaces.
Which part of what I said isn't true? The part about some systems might require 64-bit variables to store a physical address or that such variables take more storage than a 32-bit virtual address?
And note that this is not a new thing. We have been doing this allt he time - just without ever explicitly mentioning it, because so far nobody ever complained about it.
Doing what exactly? Limiting 36-bit PA systems to only use the lower 4GB of memory because VA must always equal PA come hell or high water?
Agreed - assuming it is possible without hurting the majority of other existing configurations.
Yes, that is indeed possible, as evidenced by the fact that it used to work without hurting any other configurations. It took another "special case" to break it.
My understanding is that the special case is yours - using a non-1:1 mapping.
But the other system must be a special case too -- otherwise it would work fine without commit 09ce9921a7d8b1ce764656b14b42217bbf4faa38. That commit does not make any difference whatsoever on 1:1 systems.
If the other system isn't a special case, why don't we just revert that commit?
discussions with Stefan Roese and Detlev Zundel it seems to me that map_physmem() is probably not the right approach (judging only from the name). We should not try to fix cache attributes by modifying addresses / address maps
And why not? That's what Linux does, and it works wonderfully across 26 different architectures.
We do not want to implement a full-fledged OS with virtual memory and on-demand paging and all that stuff in U-Boot.
Then why are you forcing me to implement it for AVR32?!?
Instead, we should really extend the CFI driver such that it does not matter if the addresses you are passing refer to cached or uncached memory, and then provide hooks in the CFI driver to allow for testing if cache is enabled, and switching cache off if it is.
What's the advantage of such an approach? It sounds much more complicated from the driver's point of view as well.
The advantage is that other drivers with similar needs can use it as well.
But they can use map_physmem() today! It allows you to specify exactly what caching properties you need. The fact that it _may_ return a virtual address which is different from the physical one just allows more flexibility in how the architecture chooses to implement it!
To me such an approach makes much more sense, as it is generic and can be used by other drivers and other architectures - even if it may come at the cost of more effort on your systems.
In what way is it more generic? In what way can map/unmap_physmem() not be used with other drivers and architectures?
On other architectures it is not possible to map the same memory area multiple times with different attributes.
So what? They can just return the physical address unchanged. It's not _mandatory_ to remap anything...
Remapping addresses thus cannot help - you really have to switch the MMO resp. memory controller setinngs to turn data cache on or off.
Yes, and map_physmem() allows an implementation to do that.
Well, but board/atmel/atstk1000/flash.c _is_ a proprietary driver for some flash chips that seem to be CFI conformant at first glance. You would not get such a driver into mailine any more.
So I guess dropping support for ATNGW100 is the only remaining option?
No, why? We're discussing ways to fix the problems, aren;t we?
Because we don't seem to be getting anywhere. You're ignoring all my arguments about why it is necessary to care about virtual addresses internally in the driver, yet you think it's fine to expose virtual addresses to the rest of the world, causing incompatibilities all over the place.
If your design doesn't fit existing and working hardware, it's your design which is broken, not the hardware. Especially when other software gets along with it just fine, and even u-boot used to work without problems.
At least STK1000 has a working flash driver.
Only because it was added so long ago, before we were more consequent about using the generic driver.
And thank $DEITY for that!
It would be a good idea to clean up this board support, remove the proprietary flash driver and use the CFI driver instead. Patches are welcome.
You must be joking. Replacing a working driver with a broken one? Why the hell would anyone do that?
Haavard

Dear Haavard Skinnemoen,
In message 20090901152305.68e8d339@hskinnemoen-d830 you wrote:
On a 32 bit system with 36 bit physical addresses you cannot use a physical address when running the "md" command, for example.
Yes you can, if you teach the "md" command to map it at a virtual address first. Again, map_physmem() makes this possible without changing the external interface in any way.
Well, that was the part of me saying before: "as long as it doesn't hurt others". We don't want to add that complexity to U-Boot as noone needs it.
The only few systems that seem to have problems are yours with their exotic memory mappping.
As far as I understand, the only difference for such systems is that keeping 64-bit physical addresses around are a bit more costly than passing around 32-bit virtual pointers. But presumably those systems have enough memory to make it a non-issue...
That's not true. There are 32 bit systems with bigger physical address spaces.
Which part of what I said isn't true? The part about some systems might require 64-bit variables to store a physical address or that such variables take more storage than a 32-bit virtual address?
Well, that the have enough memory not to case about size, for example. [Keep in mind that this also effects the U-Boot image size in NOR flash.]
We do not want to implement a full-fledged OS with virtual memory and on-demand paging and all that stuff in U-Boot.
Then why are you forcing me to implement it for AVR32?!?
I'm not. I'm suggesting to implement a plain stupid 1:1 mapping and a function to turn on and off data cache (at least on the flash area).
The advantage is that other drivers with similar needs can use it as well.
But they can use map_physmem() today! It allows you to specify exactly what caching properties you need. The fact that it _may_ return a virtual address which is different from the physical one just allows more flexibility in how the architecture chooses to implement it!
You make assumptions on how other architectures work that may or may not be true.
On other architectures it is not possible to map the same memory area multiple times with different attributes.
So what? They can just return the physical address unchanged. It's not _mandatory_ to remap anything...
So how would these benefit from using data cache when reading from flash? This works for you only because you are then reading from a different address range.
It would be a good idea to clean up this board support, remove the proprietary flash driver and use the CFI driver instead. Patches are welcome.
You must be joking. Replacing a working driver with a broken one? Why the hell would anyone do that?
Fix the issues on the way?
Best regards,
Wolfgang Denk

Wolfgang Denk wd@denx.de wrote:
Well, that was the part of me saying before: "as long as it doesn't hurt others". We don't want to add that complexity to U-Boot as noone needs it.
Right. I forgot that nobody actually needs this.
Fuck it, I'm out.
Haavard

Hi,
I don't want to intrude too much into the discussion, but I would like to offer a small bit of info
On Tue, Sep 1, 2009 at 10:23 AM, Haavard Skinnemoenhaavard.skinnemoen@atmel.com wrote:
It would be a good idea to clean up this board  support,  remove  the proprietary  flash driver and use the CFI driver instead. Patches are welcome.
You must be joking. Replacing a working driver with a broken one? Why the hell would anyone do that?
My custom board uses AT49BV640D. I wen't thru a lot of trouble getting u-boot to work with it. And one of my attempts was to use the "working driver" from stk1000 and it didn't work. On the other hand, the CFI driver with the tripple revert that Haavard proposed did. I'm currently patching it like that so I can continue my development, while people with much more knowleadge than I have on u-boot code could fix the issue, but looks like I'm about to get orphan on u-boot and Atmel.
Kind Regards, Thiago A. Correa

Thiago A. CorrĂȘa thiago.correa@gmail.com wrote:
Hi,
I don't want to intrude too much into the discussion, but I would like to offer a small bit of info
Oh, I wish more people would intrude ;-)
On Tue, Sep 1, 2009 at 10:23 AM, Haavard Skinnemoenhaavard.skinnemoen@atmel.com wrote:
It would be a good idea to clean up this board  support,  remove  the proprietary  flash driver and use the CFI driver instead. Patches are welcome.
You must be joking. Replacing a working driver with a broken one? Why the hell would anyone do that?
My custom board uses AT49BV640D. I wen't thru a lot of trouble getting u-boot to work with it. And one of my attempts was to use the "working driver" from stk1000 and it didn't work.
Understandably since 640D uses the intel command set while the 6416 chip on STK1000 uses the AMD command set and has a couple of interesting bugs...
Now, I still want to use common code as much as possible, but it's always been quite expensive in terms of code size, and it currently doesn't even work. While both of those should be possible to overcome, I'm getting incredibly frustrated that all my attempts at fixing it are being shot down by arcane, non-sensical rules which aren't even being enforced consistently.
On the other hand, the CFI driver with the tripple revert that Haavard proposed did. I'm currently patching it like that so I can continue my development, while people with much more knowleadge than I have on u-boot code could fix the issue, but looks like I'm about to get orphan on u-boot and Atmel.
Right...I'm beginning to doubt that anyone is familiar enough with the u-boot code, since everyone seems to have their own opinion about how things are supposed to work.
To summarize, here are the possible ways to fix the problem as I see it: - Use virtual address in CONFIG_ENV_ADDR to conform with the way the CFI driver currently works. Rejected by Wolfgang because virtual addresses don't exist. - Fix the API and user interface breakage by reverting commit 09ce9921. Rejected because virtual addresses are used everywhere. - Fix it by using map_physmem() in a way that works for all architectures. Rejected because all other architectures than PPC are evil and need to be punished for doing things differently. - Introduce a custom flash driver for ATNGW100. Rejected because stupid principles are more important than making things work.
So I don't really know where to proceed from here. I guess two additional options are forking the damn thing or creating a proprietary bootloader, but I don't really want to do either.
Haavard

Haavard Skinnemoen wrote:
<snip>
Right...I'm beginning to doubt that anyone is familiar enough with the u-boot code, since everyone seems to have their own opinion about how things are supposed to work.
To summarize, here are the possible ways to fix the problem as I see it:
- Use virtual address in CONFIG_ENV_ADDR to conform with the way the CFI driver currently works. Rejected by Wolfgang because virtual addresses don't exist.
- Fix the API and user interface breakage by reverting commit 09ce9921. Rejected because virtual addresses are used everywhere.
- Fix it by using map_physmem() in a way that works for all architectures. Rejected because all other architectures than PPC are evil and need to be punished for doing things differently.
Your "triple revert" patch doesn't look overly complex, and seems to only add a few map_physmem() calls (I'm summarising *quite* a bit there !!).
Is there not some way using weak functions (or similar) to add some AVR32 specific workarounds.
Or ... there's *plenty* of arch specific #ifdefs in most of the rest of u-boot, so could we not just "#ifdef AVR32" the existing code for the time being until this sticking point gets unstuck ?
- Introduce a custom flash driver for ATNGW100. Rejected because stupid principles are more important than making things work.
So I don't really know where to proceed from here. I guess two additional options are forking the damn thing or creating a proprietary bootloader, but I don't really want to do either.
Me neither !!
Mark

I have followed the recent discussions about problems in the CFI driver caused by the need to change the attributes of the address at which the flash is mapped. This discussion has raised some questions in my mind regarding the assumptions u-boot makes regarding the behavior of the addresses used in the code. I am writing this message for comments that may correct any mis-understandings I have. First, the addresses used by the u-boot program to reference memory and code are all "virtual" addresses (VA) because they go through the MMU to become a physical address(PA). If there is no MMU, the virtual address and the physical address are identical. The "normal", or legacy,policy for u-boot is to arrange a memory map such that all physical addresses are mapped to some virtual address. Originally, the mapping were such that the VA was actually == the PA, but today on some CPUs, this is not possible. When the size of the physical address space exceeds the size of the virtual address space, the VA may not =- the PA numerically, but there is a one-to-one correspondence. It MAY also acceptable to map the same PA so that it appears more than once in the address space (VA), but if this is done, any references inside u-boot (such as in drivers) must be consistent, i.e. the flash chip must exist at exactly one virtual address from u-boots viewpoint. This is required because the u-boot drivers maintain pointers into the flash, and it is important to be able to compare these pointers to input virtual addresses. It is also necessary that the VA of the flash chip not change after the driver has been set up, or else any saved pointers will be wrong. Becky Bruce "re-wrote the driver to use VAs instead of PAs." I am not exactly sure what this means, but I assume it meant allowing the VA referencing the flash to be distinct from the PA where the flash "lives" (and may require 36 bits on a PPC system to represent the PA). Does the driver re-map portions of the flash in order to access them? If the flash is really large, I can certainly see a need to do so. However, I assume on "medium size" flashes, it does not need to remap. In that case, don't all references just go through the MMU and get translated? The VA != PA, but from the point of view of u-boot, the VA is the only address that matters. The AVR32 certainly does not map flash dynamically, so it would not matter on that CPU. The issue with the CFI driver on the AVR32 is that it needs to disable cache on the address space of the flash memory when it is writing to the flash. This apparently is not trivial to do, but there is a second mapping that does have the cache off. Wolfgang has recommended the creation of a function to turn off the cache for the flash area, and also (presumably) one to turn the flash back on when the write is complete. Haavard has at present a function that returns an alternate address with the cache already off that addresses the same memory. This wouldn't cause a problem if the mapping happened immediately before the actual copy operation took place and was used for nothing else. However, if it happens early on in the driver, the address will not match the structure set up by the rest of the flash code using the non-translated address. Therefore, I have the following questions: If the map_physmem() macro is removed from the driver on the AVR32, does the driver work if it is told that the flash PA=VA = the un-cached address? If not, why not? Shouldn't this be just like any CFI on an un-cached PPC address? The driver will be somewhat slower reading but otherwise it should work. If/when it does work, couldn't a map_in_cache() macro be placed directly in front of the read code that copies data from flash to other buffers. The macro would return an address of the same data referenced through a cached address if it exists. This address would go nowhere else and never be stored anywhere. This would speed up the copy operation for situations where it matters, and is applicable to all platforms that can do such a thing. The most general solution would be a call to map_in_cache/map_in_not_cache for both reads and writes in the CFI driver. These routines would return a "substitute" address (or the same input one), and may actually add another mapping dynamically, use a pre-existing appropriate mapping, just turn on/turn off data cache globally, or do nothing). At the end of the copy, map_restore() would put the map back By default, the assumption would be that the flash is not cached and the macros do nothing. Sounds simple to me, what have I overlooked?
Bill Campbell
Haavard Skinnemoen wrote:
<snip>
Right...I'm beginning to doubt that anyone is familiar enough with the u-boot code, since everyone seems to have their own opinion about how things are supposed to work.
To summarize, here are the possible ways to fix the problem as I see it:
- Use virtual address in CONFIG_ENV_ADDR to conform with the way the CFI driver currently works. Rejected by Wolfgang because virtual addresses don't exist.
- Fix the API and user interface breakage by reverting commit 09ce9921. Rejected because virtual addresses are used everywhere.
- Fix it by using map_physmem() in a way that works for all architectures. Rejected because all other architectures than PPC are evil and need to be punished for doing things differently.
Your "triple revert" patch doesn't look overly complex, and seems to only add a few map_physmem() calls (I'm summarising *quite* a bit there !!).
Is there not some way using weak functions (or similar) to add some AVR32 specific workarounds.
Or ... there's *plenty* of arch specific #ifdefs in most of the rest of u-boot, so could we not just "#ifdef AVR32" the existing code for the time being until this sticking point gets unstuck ?
- Introduce a custom flash driver for ATNGW100. Rejected because stupid principles are more important than making things work.
So I don't really know where to proceed from here. I guess two additional options are forking the damn thing or creating a proprietary bootloader, but I don't really want to do either.
Me neither !!
Mark _______________________________________________ U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot

Dear "J. William Campbell",
In message 4A9D5EF2.4030307@comcast.net you wrote:
I have followed the recent discussions about problems in the CFI
driver caused by the need to change the attributes of the address at which the flash is mapped. This discussion has raised some questions in my mind regarding the assumptions u-boot makes regarding the behavior of the addresses used in the code. I am writing this message for comments that may correct any mis-understandings I have. First, the addresses used by the u-boot program to reference memory and code are all "virtual" addresses (VA) because they go through the MMU to become a physical address(PA). If there is no MMU, the virtual address and the physical address are identical.
Even if there is a MMU, we keep it "switched off" on most systems, or otherwise set it up in such a way that there is still a 1:1 mapping, i. e. the virtual address and physical address are identical, too.
There have been several discussions concerning this topic on IRC (#u-boot at freenode); I'll try to summarize these here - not sure though if I don't miss anything: please feel free to complement what might be missing.
[Times are MET/MEST]
Nov 20, 2008:
[19:40:03] galak: do we think of nand->IO_ADDR_R as a physical or virtual addr? [19:40:10] scottwood: Wouldn't it need to change on Linux too? [19:40:42] galak: don't know do we does drivers/mtd/nand/fsl_elbc_nand.c look the same in linux? [19:40:48] scottwood: IO_ADDR_R is virtual [19:41:01] scottwood: So it needs to change with virt != phys [19:41:51] galak: ok, so this loop in board_nand_init(): [19:41:51] galak: for (priv->bank = 0; priv->bank < MAX_BANKS; priv->bank++) { [19:41:51] galak: br = in_be32(&elbc_ctrl->regs->bank[priv->bank].br); [19:41:51] galak: or = in_be32(&elbc_ctrl->regs->bank[priv->bank].or); [19:41:51] galak: [19:41:52] galak: if ((br & BR_V) && (br & BR_MSEL) == BR_MS_FCM && [19:41:54] galak: (br & or & BR_BA) == nand->IO_ADDR_R) [19:41:56] galak: break; [19:41:58] galak: } [19:42:24] galak: use comparing a virt (IO_ADDR_R) with physical (br & or & BR_BA) [19:43:25] scottwood: Right, it assumess identity mapping. We'll need some way of finding the physical address now. [19:45:06] scottwood: Does eLBC just ignore the upper 4 bits of the physical address, and rely on LAWs for that? [19:45:16] galak: yep [19:45:31] galak: which makes this slightly more annoying [19:47:21] galak: we could have board code implement a function that deals with the mapping (given IO_ADDR_R, it hands bank the bank #) [19:47:36] scottwood: I'd rather not make it board-specific. [19:47:56] scottwood: Is there a general virt-to-phys function in u-boot? [19:49:10] galak: nope [19:49:27] scottwood: There probably should be. [19:49:38] galak: agreed, but that a huge overhaul [19:50:29] scottwood: How many other platforms have virt != phys? [19:50:56] galak: not sure, probably none [19:51:10] scottwood: So virt_to_phys could do tlbsx on book-e and return the argument on everything else. [19:51:36] galak: yeah, we could do that [19:52:23] galak: we should probably have it walk BATs on classic [19:52:33] scottwood: yeah [19:53:57] galak: what do you think the API for virt_to_phys() looks like? (how to report error if no mapping exists?) [19:54:54] scottwood: Could return ~0, or have the physical address returned via pointer. [19:57:32] galak: ok..will work something up [19:58:17] scottwood: thanks
Nov 26, 2008:
[20:09:43] beckyb: it's the whole virtual vs physical address question [20:10:04] beckyb: when I did the 36b stuff that's in the tree now, I went for the minimally invasive approach [20:10:27] beckyb: so, at the moment, all the command-line stuff takes a *virtual* address as an argument [20:10:41] beckyb: I'm wondering if that's really what we want [20:17:18] beckyb: wdenk_: The whole issue has come up because with galak's map_physmem patches, we have to actually start distinguishing between the 2 kinds of addresses inside u-boot [20:17:50] beckyb: right now, the 36-bit code is taking advantage of the fact that u-boot doesn't know the difference [20:18:20] beckyb: so the *physical* address only really exists right now in the MMU mapping [20:18:33] beckyb: and in a few other places that actually care about the PA [20:18:39] beckyb: I'll stop babbling now [20:20:39] wdenk_: beckyb: I have to admit that I don't see the immediate problem yet. [20:20:56] beckyb: wdenk_: OK, I'll babble some more [20:21:00] wdenk_: beckyb: So far, everybody seems to be happy with using virtual addresses. [20:21:01] beckyb: Let me use a concrete example [20:21:14] beckyb: wdenk_: and I'm *fine* if we stick with that [20:21:26] beckyb: I just want to be sure that's what we really want [20:21:54] beckyb: I've been working on the flash code, which stores the physical sector address in a structure, then calls map_physmem to get a va *most* of the time it uses it [20:22:16] beckyb: I've corrected all the places where it treats a PA as a pointer, no problem [20:22:33] beckyb: but the "fli" command right now displays the physical address [20:22:48] beckyb: if we're sticking with VAs on the command line, I'd like to change "fli" to display the VAs [20:23:00] beckyb: so that the cp commands and fli take the same addrs [20:23:06] wdenk_: Hm... I can't tell for sure what we want either. [20:23:14] beckyb: it's a tough problem [20:23:25] beckyb: using the VA in "fli" is a much less invasive solution [20:23:44] wdenk_: But I agree that in such cases where it makes a difference, we should agree to use one consistent set of addresses. [20:24:05] beckyb: if we go to using the "pa" on the command line, the changes get fairly invasive [20:24:21] beckyb: as we have to go to phys_addr_t and strtoull [20:24:44] beckyb: using the va will work *as long as u-boot doesn't start doing paging* [20:25:03] beckyb: because using the VA assumes that mapping from VA to PA exists [20:25:24] beckyb: otoh, if we use the PA, we have to change all the commands to call map_physmem to get a VA before doing anything [20:25:25] wdenk_: Right. And I guess we can push this point at least out of this decade [20:25:36] beckyb: sure, which is why I went with the VA for now [20:25:47] beckyb: but before I made any more code changes, I wanted to talk with you about it [20:26:20] wdenk_: I think we should ask this again on the ML - I'm easily overlooking some aspects here. [20:26:36] beckyb: ok, I'll craft a note and send it out [20:26:41] beckyb: thanks [20:26:44] wdenk_: My gut feeling is that VA's are just fine. Hey, that's still a boot loader. [20:26:51] beckyb: exactly [20:27:22] beckyb: I'll probably word the note as "we're going with VAs on the command line unless somebody gives me a very good reason not to " [20:27:23] beckyb: [20:27:34] wdenk_: But I also see that we might need to extend some commands to operate on the PA's as well - in some cases you really want to be able to "md" or "mw" directly to some PCI addresses or similar. [20:27:43] beckyb: right, that makes sense [20:28:01] beckyb: at the least, we could start with a translation command [20:28:19] wdenk_: good idea, that's even less intrusive. [20:28:22] beckyb: that gives you the PA for a VA, so you can invoke the commands correctly. It's a quick and easy fix [20:28:31] wdenk_: But then, it doesn't solve the problem. [20:28:42] beckyb: nope, the user still has to be aware [20:28:52] wdenk_: Assume you have a 32 bit system with 36 bit PA's. [20:28:52] beckyb: The problem is, we don't really know what the user is thinking now [20:29:07] beckyb: which I do [20:29:10] wdenk_: How do you enter a PA then? [20:29:40] beckyb: the translate command would have to be able to parse that, but all the other commands could still take the VA [20:30:14] wdenk_: We have this case with the 440SPe "katmai" board with 4 GB of RAM. [20:30:52] beckyb: right, and there are some limitations there currently [20:31:04] beckyb: like you can't "mtest" all of RAM, because you can't map it all in [20:31:11] beckyb: I haven't fixed that yet [20:31:44] beckyb: but the rearranging of the memory map I did will make it easier to deal with on 8641 [20:31:48] wdenk_: "mtest" is the smallest problem. You always can run it on a documented range only. [20:31:59] beckyb: sure, it's just an example [20:32:51] beckyb: the fundamental point is, you're limited to accessing what you can map in 32 bits, unless we start making wholesale changes to u-boot [20:33:10] beckyb: because software sees things at VAs [20:35:17] wdenk_: So we agree to use all VA's in U-Boot, plus initially we will add an "xlat" command, and eventually (where and when needed) we might extend commands to recognize and handle PA's (something like "md xxxx" for VA versus "md .xxxx" for PA or similar) ? [20:35:39] beckyb: something like that, yeah, or maybe md -p xxxx [20:36:23] beckyb: and we need to start watching new code coming in very carefully, to start weaning people off this whole VA=PA assumption [20:36:38] beckyb: for common stuff, anyway [20:36:49] wdenk_: beckyb: I think it might be better to ad a marker to the address argument in case we should have commands where we have to pass more than one address and want to use different ones - like "cp" from VA to PA or vice versa. [20:36:58] beckyb: ah, right [20:37:10] beckyb: do any other u-boot commands do this? [20:37:41] wdenk_: take more than one address? well, let me think.... bootm does. [20:37:48] wdenk_: cmp [20:37:54] beckyb: no, I mean that have some sort of marker in an arg [20:38:04] beckyb: sorry, I wasn't clear [20:38:41] wdenk_: I'm not sure. At least we do this in the command names - like "cp" versus "cp.b" [20:39:00] beckyb: true..... so what if we had the marker be p. [20:39:17] beckyb: The "." by itself is going to screw me up [20:39:25] wdenk_: ".p" then? [20:39:38] wdenk_: md feeddeadbeef.p [20:40:01] beckyb: ah, it's at the end now. Sure, works for me. And we should also accept .v for completeness [20:40:43] wdenk_: We can also use a "p." / "v." prefix - I'm not sure if I have any preferences yet [20:41:12] beckyb: yeah, we can work that out later I'm sure the list will have preferences! [20:41:59] beckyb: I'll try to get something out to the list this afternoon
Becky then posted the summary of this discussion here:
http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/50705
Note that there was a general agreement among those who raised their voices.
The "normal", or legacy,policy for u-boot is to arrange a memory
map such that all physical addresses are mapped to some virtual address. Originally, the mapping were such that the VA was actually == the PA, but today on some CPUs, this is not possible. When the size of the physical address space exceeds the size of the virtual address space, the VA may not =- the PA numerically, but there is a one-to-one correspondence. It MAY also acceptable to map the same PA so that it appears more than once in the address space (VA), but if this is done,
This may or may not be possible. It may even make sense or be needed on some systems, and it may be impossible to do on others.
In any case, I think we should be careful not to mix things: what we are discussing here are address mappings. What we are not discussing is specific memory properties like being cached/uncached, guarded/ non-guarded, etc.
Such properties are important, too, but they need to get handled through a separate interface.
Becky Bruce "re-wrote the driver to use VAs instead of PAs." I am
not exactly sure what this means, but I assume it meant allowing the VA referencing the flash to be distinct from the PA where the flash "lives" (and may require 36 bits on a PPC system to represent the PA). Does the driver re-map
I think the information provided above sheds more light on this.
portions of the flash in order to access them? If the flash is really large, I can certainly see a need to do so. However, I assume on "medium size" flashes, it does not need to remap. In that case, don't all references just go through the MMU and get translated? The VA != PA, but from the point of view of u-boot, the VA is the only address that matters. The AVR32 certainly does not map flash dynamically, so it would not matter on that CPU.
OK.
The issue with the CFI driver on the AVR32 is that it needs to
disable cache on the address space of the flash memory when it is writing to the flash. This apparently is not trivial to do, but there is
Actually this is not specific to the AVR32, and so far most systems simply do not enable caches at all on the flash memory regions. I understand why the AVR32 solution is interesting, and I think, when we try to find a solution for this we should use this chance to find a solution that also allows other systems to turn on caches on the flash memory - things like loading the Linux kernel or ramdisk images etc. will benefit from that.
a second mapping that does have the cache off. Wolfgang has recommended the creation of a function to turn off the cache for the flash area, and also (presumably) one to turn the flash back on when the write is complete. Haavard has at present a function that returns an alternate
Right. My rationale for this was the wish to provide such a solution for all systems, including those that don;t allow to have several mappings (with different attributes) for the same physical memory region.
address with the cache already off that addresses the same memory. This wouldn't cause a problem if the mapping happened immediately before the actual copy operation took place and was used for nothing else. However, if it happens early on in the driver, the address will not match the structure set up by the rest of the flash code using the non-translated address.
And this method will not work on all systems that don't support such multiple mappings.
Therefore, I have the following questions: If the map_physmem()
macro is removed from the driver on the AVR32, does the driver work if it is told that the flash PA=VA = the un-cached address? If not, why
In the current state the situation PA=VA=un-cached is the default on almost all systems, and is workign fine there.
not? Shouldn't this be just like any CFI on an un-cached PPC address? The driver will be somewhat slower reading but otherwise it should work. If/when it does work, couldn't a map_in_cache() macro be placed directly in front of the read code that copies data from flash to other buffers.
While there are specific routines to "write" to the flash (init, erase, write), there is no specific code to "read" from flash. Reading is allowed everywhere by just performing load instructions from this memory area. The CFI driver (nor any other flash driver) is needed or involved to do that. That's the whole big advantage of NOR flash (which makes it _memory_) over storage devices like NAND etc. (which are _not_ memory).
You would have to add this macro everywhere - on anything that accesses memory. All commands that take memory addresses either directly or indirectly, each and every load instruction. That's not practical. [OK, you could probably set up the MMU to trap on read accesses on such a memory reagion, but that would not exactly be simpler either.]
The macro would return an address of the same data referenced through a cached address if it exists. This address would go nowhere else and never be stored anywhere. This would speed up the copy operation for situations where it matters, and is applicable to all platforms that can do such a thing. The most general solution would be a call to map_in_cache/map_in_not_cache for both reads and writes in the CFI
No. The CFI driver is not used for read operations, so this does not work.
driver. These routines would return a "substitute" address (or the same input one), and may actually add another mapping dynamically, use a pre-existing appropriate mapping, just turn on/turn off data cache globally, or do nothing). At the end of the copy, map_restore() would put the map back By default, the assumption would be that the flash is not cached and the macros do nothing. Sounds simple to me, what have I overlooked?
The only thing you overlooked in my opinion is that read accesses to NOR flash are plain memory accesses that are not handled by the CFI or any other driver.
Best regards,
Wolfgang Denk

Wolfgang Denk wrote:
Dear "J. William Campbell",
In message 4A9D5EF2.4030307@comcast.net you wrote:
I have followed the recent discussions about problems in the CFI
driver caused by the need to change the attributes of the address at which the flash is mapped. This discussion has raised some questions in my mind regarding the assumptions u-boot makes regarding the behavior of the addresses used in the code. I am writing this message for comments that may correct any mis-understandings I have. First, the addresses used by the u-boot program to reference memory and code are all "virtual" addresses (VA) because they go through the MMU to become a physical address(PA). If there is no MMU, the virtual address and the physical address are identical.
Even if there is a MMU, we keep it "switched off" on most systems, or otherwise set it up in such a way that there is still a 1:1 mapping, i. e. the virtual address and physical address are identical, too.
There have been several discussions concerning this topic on IRC (#u-boot at freenode); I'll try to summarize these here - not sure though if I don't miss anything: please feel free to complement what might be missing.
<snip>
Becky then posted the summary of this discussion here:
http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/50705
Note that there was a general agreement among those who raised their voices.
In quick summary, for the next few years, we will require that all "important" physical addresses have corresponding virtual addresses. Some limited support for mapping in "other resources" may be provided at an operator interface level, but it will be quite limited. OK, seems reasonable to me.
The "normal", or legacy,policy for u-boot is to arrange a memory
map such that all physical addresses are mapped to some virtual address. Originally, the mapping were such that the VA was actually == the PA, but today on some CPUs, this is not possible. When the size of the physical address space exceeds the size of the virtual address space, the VA may not =- the PA numerically, but there is a one-to-one correspondence. It MAY also acceptable to map the same PA so that it appears more than once in the address space (VA), but if this is done,
This may or may not be possible. It may even make sense or be needed on some systems, and it may be impossible to do on others.
In any case, I think we should be careful not to mix things: what we are discussing here are address mappings. What we are not discussing is specific memory properties like being cached/uncached, guarded/ non-guarded, etc.
Such properties are important, too, but they need to get handled through a separate interface.
Here is where I am quite sure you are going to have a problem. In very many CPUs, cache control and memory management are joined at the hip. Some systems have no easy way to enable and disable (D,I) cache globally, it is only doable on a page or segment basis. The PPC hardware has a relatively low cost way to do so, but not all architectures do.
Becky Bruce "re-wrote the driver to use VAs instead of PAs." I am
not exactly sure what this means, but I assume it meant allowing the VA referencing the flash to be distinct from the PA where the flash "lives" (and may require 36 bits on a PPC system to represent the PA). Does the driver re-map
I think the information provided above sheds more light on this.
Yes, it did.
portions of the flash in order to access them? If the flash is really large, I can certainly see a need to do so. However, I assume on "medium size" flashes, it does not need to remap. In that case, don't all references just go through the MMU and get translated? The VA != PA, but from the point of view of u-boot, the VA is the only address that matters. The AVR32 certainly does not map flash dynamically, so it would not matter on that CPU.
OK.
The issue with the CFI driver on the AVR32 is that it needs to
disable cache on the address space of the flash memory when it is writing to the flash. This apparently is not trivial to do, but there is
Actually this is not specific to the AVR32, and so far most systems simply do not enable caches at all on the flash memory regions. I understand why the AVR32 solution is interesting, and I think, when we try to find a solution for this we should use this chance to find a solution that also allows other systems to turn on caches on the flash memory - things like loading the Linux kernel or ramdisk images etc. will benefit from that.
Full ACK.
<snip>
While there are specific routines to "write" to the flash (init, erase, write), there is no specific code to "read" from flash. Reading is allowed everywhere by just performing load instructions from this memory area. The CFI driver (nor any other flash driver) is needed or involved to do that. That's the whole big advantage of NOR flash (which makes it _memory_) over storage devices like NAND etc. (which are _not_ memory).
You would have to add this macro everywhere - on anything that accesses memory. All commands that take memory addresses either directly or indirectly, each and every load instruction. That's not practical. [OK, you could probably set up the MMU to trap on read accesses on such a memory reagion, but that would not exactly be simpler either.]
Very True. I did forget about the read being just a memory reference. So if we desire the flash to be cached, it would have to "normally" be cached for reads to take advantage of the operation. <snip>
The only thing you overlooked in my opinion is that read accesses to NOR flash are plain memory accesses that are not handled by the CFI or any other driver.
Thanks for looking at this. It therefore seems to me that adding an "uncache(virtual address)" operation (that may return a substitute address for the actual write to the flash) followed by a "restore_cache()" operation inside the flash driver write routine should work. The uncache routine would do nothing if the flash is not cached to begin with, would globally turn off data cache if that is easy to do, or would provide an alternate virtual address to be used in the write. That alternate address would either be obtained from a statically mapped copy of the flash memory with cache disabled in that virtual address region, or a dynamic map added to the MMU that will cause references via the returned virtual address to the physical flash memory to be un-cached. Choose the approach that fits your hardware best. In any case, the D cache and/or I cache may need to be flushed on a write if the flash is mapped in cache anywhere. The restore_cache routine would do these operations if necessary, and also un-do whatever was done in the uncache routine. This way the flash can be mapped in to regular memory as cacheable, so we can get the speed advantages in normal operation. Writing to flash will require somewhat different tricks depending on the CPU.
Best regards,
Wolfgang Denk
Best Regards, Bill Campbell

Dear "J. William Campbell",
In message 4A9D99B1.1010707@comcast.net you wrote:
...
Becky then posted the summary of this discussion here:
http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/50705
...
In quick summary, for the next few years, we will require that all "important" physical addresses have corresponding virtual addresses.
...unless there are good reasons to deviate from that rule, but these cases are expected to be rare exceptions from the rule.
In any case, I think we should be careful not to mix things: what we are discussing here are address mappings. What we are not discussing is specific memory properties like being cached/uncached, guarded/ non-guarded, etc.
Such properties are important, too, but they need to get handled through a separate interface.
Here is where I am quite sure you are going to have a problem. In very many CPUs, cache control and memory management are joined at the hip. Some systems have no easy way to enable and disable (D,I) cache globally, it is only doable on a page or segment basis. The PPC hardware has a relatively low cost way to do so, but not all architectures do.
I am well aware of this problem, which is one of the reasons that the majority of systems is running with data cache turned off. Even PowerPC has DC off (at least after relocation to RAM), ARM does not implement cache support yet, etc.
When we state that U-Boot is a boot loader and thus should be kept simple, this more or less logically results in the consequence that if it's difficult to enable the DC (on some systems), then just don't do it, then.
Nobody enforces you to enable caches when you find it hard to do.
Very True. I did forget about the read being just a memory reference. So if we desire the flash to be cached, it would have to "normally" be cached for reads to take advantage of the operation.
ACK.
Thanks for looking at this. It therefore seems to me that adding an "uncache(virtual address)" operation (that may return a substitute address for the actual write to the flash) followed by a "restore_cache()" operation inside the flash driver write routine should work. The uncache routine would do nothing if the flash is not cached to
This is where Detlev's comment about using the chance to define a cache API comes into play.
Yes, we probably should create a set of functions like
enable_data_cache(address, size); and disable_data_cache(address, size);
which would turn on resp. off the caching attributes on the given memory range.
begin with, would globally turn off data cache if that is easy to do, or would provide an alternate virtual address to be used in the write. That
This is where I disagree.
I'm not really deep enough in the implementation details and thus would appreciate comments for example from Becky and Stefan. In my opinion, turning on or off the cache on an address range should be implemented as outlined above, i. e. as an operation changing the caching properties of the address range.
Using a completely different address range instead is a different thing, and not what I have in mind. I really dislike the idea of supporting "alternate addresses" in this context - even if this is what would be easiest to implement on some architectures.
Becky, Stefan: please comment...
Best regards,
Wolfgang Denk

On Sep 2, 2009, at 2:59 AM, Wolfgang Denk wrote:
Dear "J. William Campbell",
In message 4A9D99B1.1010707@comcast.net you wrote:
...
Becky then posted the summary of this discussion here:
http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/50705
...
In quick summary, for the next few years, we will require that all "important" physical addresses have corresponding virtual addresses.
...unless there are good reasons to deviate from that rule, but these cases are expected to be rare exceptions from the rule.
In any case, I think we should be careful not to mix things: what we are discussing here are address mappings. What we are not discussing is specific memory properties like being cached/uncached, guarded/ non-guarded, etc.
Such properties are important, too, but they need to get handled through a separate interface.
Here is where I am quite sure you are going to have a problem. In very many CPUs, cache control and memory management are joined at the hip. Some systems have no easy way to enable and disable (D,I) cache globally, it is only doable on a page or segment basis. The PPC hardware has a relatively low cost way to do so, but not all architectures do.
I am well aware of this problem, which is one of the reasons that the majority of systems is running with data cache turned off. Even PowerPC has DC off (at least after relocation to RAM), ARM does not implement cache support yet, etc.
When we state that U-Boot is a boot loader and thus should be kept simple, this more or less logically results in the consequence that if it's difficult to enable the DC (on some systems), then just don't do it, then.
Nobody enforces you to enable caches when you find it hard to do.
Very True. I did forget about the read being just a memory reference. So if we desire the flash to be cached, it would have to "normally" be cached for reads to take advantage of the operation.
ACK.
Thanks for looking at this. It therefore seems to me that adding an "uncache(virtual address)" operation (that may return a substitute address for the actual write to the flash) followed by a "restore_cache()" operation inside the flash driver write routine should work. The uncache routine would do nothing if the flash is not cached to
This is where Detlev's comment about using the chance to define a cache API comes into play.
Yes, we probably should create a set of functions like
enable_data_cache(address, size); and disable_data_cache(address, size);
which would turn on resp. off the caching attributes on the given memory range.
begin with, would globally turn off data cache if that is easy to do, or would provide an alternate virtual address to be used in the write. That
This is where I disagree.
I'm not really deep enough in the implementation details and thus would appreciate comments for example from Becky and Stefan. In my opinion, turning on or off the cache on an address range should be implemented as outlined above, i. e. as an operation changing the caching properties of the address range.
This makes sense to me. The disable function would need to flush the range from the cache, but that's the only difficulty I forsee. However, I dug up some AVR32 docs, and it looks like the whole dual cacheable/CI mapping thing may be architectural. The architecture seems to specify a virtual memory map for privileged state, and part of the VA range is not translated by the MMU, but has a default translation. There appear to be segments in the untranslated VA space that map to the same PA, one cacheable and the other not.
Using a completely different address range instead is a different thing, and not what I have in mind. I really dislike the idea of supporting "alternate addresses" in this context - even if this is what would be easiest to implement on some architectures.
I agree, but, then, I'm coming from an architecture where doing this is bad joo-joo. Again, it looks like AVR may actually need this - hopefully someone who knows more about AVR will speak up here.
Cheers, Becky
Becky, Stefan: please comment...
Best regards,
Wolfgang Denk
-- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de Conscious is when you are aware of something, and conscience is when you wish you weren't.

Becky Bruce wrote:
On Sep 2, 2009, at 2:59 AM, Wolfgang Denk wrote:
Dear "J. William Campbell",
In message 4A9D99B1.1010707@comcast.net you wrote:
...
Becky then posted the summary of this discussion here:
http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/50705
...
In quick summary, for the next few years, we will require that all "important" physical addresses have corresponding virtual addresses.
...unless there are good reasons to deviate from that rule, but these cases are expected to be rare exceptions from the rule.
In any case, I think we should be careful not to mix things: what we are discussing here are address mappings. What we are not discussing is specific memory properties like being cached/uncached, guarded/ non-guarded, etc.
Such properties are important, too, but they need to get handled through a separate interface.
Here is where I am quite sure you are going to have a problem. In very many CPUs, cache control and memory management are joined at the hip. Some systems have no easy way to enable and disable (D,I) cache globally, it is only doable on a page or segment basis. The PPC hardware has a relatively low cost way to do so, but not all architectures do.
I am well aware of this problem, which is one of the reasons that the majority of systems is running with data cache turned off. Even PowerPC has DC off (at least after relocation to RAM), ARM does not implement cache support yet, etc.
When we state that U-Boot is a boot loader and thus should be kept simple, this more or less logically results in the consequence that if it's difficult to enable the DC (on some systems), then just don't do it, then.
Nobody enforces you to enable caches when you find it hard to do.
Very True. I did forget about the read being just a memory reference. So if we desire the flash to be cached, it would have to "normally" be cached for reads to take advantage of the operation.
ACK.
Thanks for looking at this. It therefore seems to me that adding an "uncache(virtual address)" operation (that may return a substitute address for the actual write to the flash) followed by a "restore_cache()" operation inside the flash driver write routine should work. The uncache routine would do nothing if the flash is not cached to
This is where Detlev's comment about using the chance to define a cache API comes into play.
Yes, we probably should create a set of functions like
enable_data_cache(address, size);
and disable_data_cache(address, size);
which would turn on resp. off the caching attributes on the given memory range.
begin with, would globally turn off data cache if that is easy to do, or would provide an alternate virtual address to be used in the write. That
This is where I disagree.
I'm not really deep enough in the implementation details and thus would appreciate comments for example from Becky and Stefan. In my opinion, turning on or off the cache on an address range should be implemented as outlined above, i. e. as an operation changing the caching properties of the address range.
This makes sense to me. The disable function would need to flush the range from the cache, but that's the only difficulty I forsee. However, I dug up some AVR32 docs, and it looks like the whole dual cacheable/CI mapping thing may be architectural. The architecture seems to specify a virtual memory map for privileged state, and part of the VA range is not translated by the MMU, but has a default translation. There appear to be segments in the untranslated VA space that map to the same PA, one cacheable and the other not.
Unfortunately, more than one architecture that has problems turning off cache in an arbitrary region of memory. If a system has a "global" cache disable, that would be the easiest option. Since u-boot is single threaded, we don't have to worry about the cache not being turned back on because we lost control of the CPU. Alternatively, one could change the cache enable bits in an appropriate "BAT" if such a thing exists, and then change them back. However, I think you have to be open to the idea of an "alternate address", as that may be the easiest way to go on many systems, and possibly the only way to go on others. In any case, Becky is quite correct in pointing out that flushing the cache for the appropriate VAs is necessary if you alter the contents of memory region that were previously cached.
Using a completely different address range instead is a different thing, and not what I have in mind. I really dislike the idea of supporting "alternate addresses" in this context - even if this is what would be easiest to implement on some architectures.
I understand what you are saying here Wolfgang. However, I would plead the following possible mitigating circumstances make this "not so bad". First, the rule will be that the usage will be very local, inside at most say 100 lines of C code, just enough to do the "I/O" access required to the memory block. Look at it as a "write_be32_array" I/O accessor function. Second, when you have a situation where the sizeof(PA) > sizeof(VA), you are eventually going to require mappings inside of drivers in order to reach things that are not statically mapped in, both for reading and writing, independent of caching issues. The VA used to access the device will be "temporary". I know that this issue is being postponed a bit, but it will arise sooner than one hopes. Thinking about the problem now and establishing what is and is not necessary/permitted may result in an easier time of it later. (Speaking as an old PDP-11 programmer, KISAR6 will become your friend even in the boot loader!)
Best Regards, Bill Campbell
I agree, but, then, I'm coming from an architecture where doing this is bad joo-joo. Again, it looks like AVR may actually need this - hopefully someone who knows more about AVR will speak up here.
Cheers, Becky
Becky, Stefan: please comment...
Best regards,
Wolfgang Denk
-- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de Conscious is when you are aware of something, and conscience is when you wish you weren't.

On Thu, Sep 3, 2009 at 11:09 AM, Becky Brucebecky.bruce@freescale.com wrote:
This makes sense to me. Â The disable function would need to flush the range from the cache, but that's the only difficulty I forsee. However, I dug up some AVR32 docs, and it looks like the whole dual cacheable/CI mapping thing may be architectural. Â The architecture seems to specify a virtual memory map for privileged state, and part of the VA range is not translated by the MMU, but has a default translation. Â There appear to be segments in the untranslated VA space that map to the same PA, one cacheable and the other not.
MIPS32 uses essentially the same setup, in kernel mode (where u-boot runs), there is 1/2 the virtual address space that is mapped through the TLB and cacheability is a page attribute, the other 1/2 contains 4 fixed regions with different combinations of cache policy and whether they are mapped via TLB or static mapping. IIRC, MIPS64 adds a few small wrinkles, but is pretty much the same idea.
The unmapped cached and unmapped uncached regions always map to the same physical address and are the only segments typically used for u-boot. I have used the mapped segments on a machine with 32 bit VA and 36 bit PA to map a peripheral that wasn't accessable in the fixed mapped segments, but that was a bit of an odd case.

Becky Bruce wrote:
<snip>
This is where Detlev's comment about using the chance to define a cache API comes into play.
Yes, we probably should create a set of functions like
enable_data_cache(address, size);
and disable_data_cache(address, size);
which would turn on resp. off the caching attributes on the given memory range.
<snip>
Using a completely different address range instead is a different thing, and not what I have in mind. I really dislike the idea of supporting "alternate addresses" in this context - even if this is what would be easiest to implement on some architectures.
I agree, but, then, I'm coming from an architecture where doing this is bad joo-joo. Again, it looks like AVR may actually need this - hopefully someone who knows more about AVR will speak up here.
Although I wouldn't consider myself an AVR32 "expert", I am following this thread closely. It seems to me that the above 2 functions could be used to support Detlev's idea as well as Haavard's map_physmem() idea.
The functions could also return (architecture dependant) a "remapped" address to be used, then we could support:-
(1) AVR32-type cache which is based on different address ranges
Here the xxx_data_cache() functions would flush the cache, and return a new address that points to the relevant cached / uncached section of memory.
(2) Platforms with "single address" ranges but finer cache control
These would flush the cache, adjust the caching for the memory range as required, and then just return the *same* address (since no address re-mapping is required).
The functions using xxx_data_cache() would then code up as follows:-
void foo(address, size) { uncached_address = disable_data_cache(address, size); /* * ... do some code using only uncached_address ... */ cached_address = enable_data_cache(address, size); }
Regards Mark

Mark Jackson mpfj-list@mimc.co.uk wrote:
The functions could also return (architecture dependant) a "remapped" address to be used, then we could support:-
Right, and that is the important part: If I'm allowed to return a remapped address, this API will be trivial to implement on AVR32. If not, it will be quite difficult (and make everything larger and slower).
Your idea is good; it's mostly similar to what map_physmem() does today, but perhaps the name is wrong, and I suppose it should probably flush the caches in addition to just remapping the address.
Haavard

Becky Bruce becky.bruce@freescale.com wrote:
I'm not really deep enough in the implementation details and thus would appreciate comments for example from Becky and Stefan. In my opinion, turning on or off the cache on an address range should be implemented as outlined above, i. e. as an operation changing the caching properties of the address range.
This makes sense to me. The disable function would need to flush the range from the cache, but that's the only difficulty I forsee. However, I dug up some AVR32 docs, and it looks like the whole dual cacheable/CI mapping thing may be architectural. The architecture seems to specify a virtual memory map for privileged state, and part of the VA range is not translated by the MMU, but has a default translation. There appear to be segments in the untranslated VA space that map to the same PA, one cacheable and the other not.
Yes, that is correct. As Andrew pointed out, MIPS does essentially the same thing, and I _think_ some versions of SH have a similar setup as well. This makes it easy to set up uncached access on certain physical addresses without wasting any TLB entries.
On the other hand, turning off the cache entirely is a quite complicated operation which involves accessing the memory-mapped dcache tag memory and marking everything as allocated and invalid. So I'd rather not do that, especially when there's a much easier way.
Another alternative is to enable paging, which will allow fine-grained control over caching properties. It's probably not all that difficult to do, but it just seems so _pointless_.
Also, what I don't get is that your architecture _also_ needs to remap physical to virtual addresses, but for a different reason, so what is wrong with having an API that can do both?
In other words, the map_physmem() API can support the following three classes of architectures: - Architectures which don't need address remapping but do need to change caching properties: Just update the caching properties and return the physical address unchanged. - Architectures like AVR32 which change caching properties through the MMU: Return a new virtual address with the desired properties. - Architectures with PA>VA: Update the caching properties if necessary and return a temporary virtual address corresponding to the given physical address, allowing the entire physical address range of the CPU to be made available to drivers utilizing this API.
What is wrong with this API? Why are everyone so hell-bent on making it difficult for the last two classes of architectures? Did I get any or all of the scenarios above wrong?
Using a completely different address range instead is a different thing, and not what I have in mind. I really dislike the idea of supporting "alternate addresses" in this context - even if this is what would be easiest to implement on some architectures.
I agree, but, then, I'm coming from an architecture where doing this is bad joo-joo. Again, it looks like AVR may actually need this - hopefully someone who knows more about AVR will speak up here.
I've said what I intend to say about this. I find it really hard to argue about "opinions" which are not backed by facts. The "alternate addresses" are there whether we decide use them or not, but not using them makes what should be a trivial operation really hard to do.
Oh, and I'm really sorry about the tone I used a couple of days ago. I deliberately stopped posting for a few days after that...but I'm hoping we can all continue working towards a solution now.
Btw, I do have a few suggestions on how to resolve this, but I'm not going to come forward with them as long as I'm being stonewalled on the VA/PA mapping issue.
Haavard

On Aug 31, 2009, at 6:57 AM, Wolfgang Denk wrote:
Dear Haavard Skinnemoen,
In message 20090830224218.10f14dc4@siona you wrote:
Well, VA==PA is the default setting for U-Boot that shall be used for all systems, unless it is really impossible on a board to implement an exception from that rule.
While not impossible, following that rule on the NGW100 would require that I either disable the caches (which would result in a massive slowdown) or start using the MMU more actively to get the caching properties right.
OK, so this would be the plan for a clean fix, then?
Sorry, guys, I'm still not clear on what's going on. Haavard, did you expect each call to flash_map to return a different VA each time (i.e. you're trying to do some sort of dynamic mapping), or are you just calling it to get the VA that corresponds to some PA, since VA != PA on these 2 boards?
In almost all cases RAM and NOR flash fit easily in the physical address space of the CPUs, and for the sake of this majority of systems it must be possible to access memory on such systems assuming a simple 1:1 mapping.
You're ignoring cache issues.
Indeed I am, and intentionally, because this is a different topic. If your system requires to set up the MMU to enable caching, then you are supposed to do it in a way compatible with the rest of the software design, i. e. as transparently as possible. None of the architectures I know resonably well have problems setting up a 1: 1 address mapping even when the MMU is on (but I have to admit that AVR32 is not among these architectures).
Technically, the addresses seen by the CPU are virtual. And I don't think systems with a cache should be considered 'very special' these days...
Cache should not be relevant at all when defining a physcal or virtual memory map.
But exactly what's wrong with hiding all this complexity inside map_physmem()? It was designed _exactly_ for this purpose -- to allow platforms with non-trivial mappings between VA and PA to do the necessary mapping when the driver requests it.
Sorry, I don't know that code and it's users well enough to coment. Maybe Becky and/or Stefan can shed some light on this...
The problem is that would mean the CFI driver would be treating start as a PA, while every other flash driver that I looked at treats it as a VA. That kind of inconsistency would be bad. Plus, Wolfgang, Stefan, Kumar, and I discussed this on the list/IRC last november and agreed that it made sense for command-line foo (including the flash commands) to take a virtual address as the argument. Platforms that have a non-fixed memory map would need to provide a command-line interface to get a VA to use (since that's highly unusual and expected to remain so).
That PA==VA rule is pretty much the reason we're in this mess -- if
Let's say, the fact that this rule has not been stricter enforced has caused that teh appearant problems were not discovered earlier.
more platforms had broken this rule, perhaps more of the code would
Heh. If more platforms had broken this rule we would probably have become aware of these violations earlier, and stopped them doing such naughty things ;-)
Well, u-boot was supposed to be simple, and a VA=PA assumption ended up built into a lot of the code. Which isn't a problem until you need something different.... I had a lot of fun standing on my head trying to get 36-bit physical addressing on PowerPC working as a result of this. I suspect the next big u-boot problem will be the need for dynamic mappings, because we're assuming for the most part now that the board memory map is static.
As it seems, this requires some more discussion, i. e. a clean fix within the next couple of days is unlikely. To me that means that it makes no sense to offer to delay the release of v2009.08 by a week or so, as we'd still not be ready then. I will therefore proceed as planned, putting up with the fact that some (two, IIRC?) AVR32 boards are broken in this release. [we had a very long testing phase, and the problem is not exactly new, so it should have been detected (and fixed) long before.]
SGTM. But then, it isn't my board that's broken..... We do need a solution here, but I think we have time to work it out.
Cheers, Becky
Best regards,
Wolfgang Denk
-- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de 1st Old Man: Gee, its windy today. 2nd Old Man: No it's not... it's Thursday. 3rd Old Man: Yeh, me too. Let's go for a beer. _______________________________________________ U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot

Becky Bruce becky.bruce@freescale.com wrote:
Sorry, guys, I'm still not clear on what's going on. Haavard, did you expect each call to flash_map to return a different VA each time (i.e. you're trying to do some sort of dynamic mapping), or are you just calling it to get the VA that corresponds to some PA, since VA != PA on these 2 boards?
I'm calling it to get some VA which corresponds to a given PA with given caching properties. I don't think it's good design if the board code for every single board needs to know exactly which VA is going to be returned.
But exactly what's wrong with hiding all this complexity inside map_physmem()? It was designed _exactly_ for this purpose -- to allow platforms with non-trivial mappings between VA and PA to do the necessary mapping when the driver requests it.
Sorry, I don't know that code and it's users well enough to coment. Maybe Becky and/or Stefan can shed some light on this...
The problem is that would mean the CFI driver would be treating start as a PA, while every other flash driver that I looked at treats it as a VA. That kind of inconsistency would be bad.
Except that it's even more inconsistent now since lots of code elsewhere treats start as a PA, and the CFI driver was the only driver which did it mostly correctly.
Plus, Wolfgang, Stefan, Kumar, and I discussed this on the list/IRC last november and agreed that it made sense for command-line foo (including the flash commands) to take a virtual address as the argument. Platforms that have a non-fixed memory map would need to provide a command-line interface to get a VA to use (since that's highly unusual and expected to remain so).
I think that's an incredibly stupid idea, and it's really a shame that I didn't participate in the IRC meeting.
It would have been so much easier if the command line only had to deal with PA and the platform had to remap things transparently if necessary. Now, if you need to access an arbitrary physical address through a script, you need to first call this "set up a mapping" command, record the value it prints out (presumably) and make sure not to interpret any error message as a virtual address, use it to perform the command, and then unmap it afterwards.
So if anything is breaking the u-boot philosophy of simplicity it is this, and it does it in the worst possible way: it exposes all the complexity through the user interface!
That PA==VA rule is pretty much the reason we're in this mess -- if
Let's say, the fact that this rule has not been stricter enforced has caused that teh appearant problems were not discovered earlier.
more platforms had broken this rule, perhaps more of the code would
Heh. If more platforms had broken this rule we would probably have become aware of these violations earlier, and stopped them doing such naughty things ;-)
Well, u-boot was supposed to be simple, and a VA=PA assumption ended up built into a lot of the code. Which isn't a problem until you need something different.... I had a lot of fun standing on my head trying to get 36-bit physical addressing on PowerPC working as a result of this. I suspect the next big u-boot problem will be the need for dynamic mappings, because we're assuming for the most part now that the board memory map is static.
Exactly, and with the current approach, dynamic mappings will _never_ be possible because all the VA-to-PA mapping assumptions will be built into both the board code and the user interface!
So I ask again: Why don't we just fix the broken code instead? We have the mechanisms available to make this work via the map_physmem() functions, we just need to use them.
Haavard

On Aug 30, 2009, at 1:11 PM, Wolfgang Denk wrote:
Dear Haavard & Becky,
In message 20090830173640.2af9ce3c@siona you wrote:
The only way for that to work is when VA=PA (or, depending on what you were doing with the address,
Well, VA==PA is the default setting for U-Boot that shall be used for all systems, unless it is really impossible on a board to implement an exception from that rule.
you just got lucky). The CFI driver was the outlier - all the other flash code was treating the start field as a VA already. So I don't think just reverting the patch is the answer.
We did not even have any notion of VA's in U-Boot until very recently, and I call on everybody not to make U-Boot more complicated than necessary.
There have always been VAs in U-boot, whether the programmer was aware of it or not. They just happened to always be the same as a PA, and we never needed a separate "U-boot concept" for them. However, for any system with an MMU that has been turned on by u-boot, the notion of VAs exists. You can't dereference a PA directly in a system with translation (even though the MMU is going to translate it to 1:1), and that's what all of the flash drivers were doing. You can *treat* a PA as a VA and dereference it when you're 1:1, but at that point, you are in fact using it as a VA (nevermind its value).
In almost all cases RAM and NOR flash fit easily in the physical address space of the CPUs, and for the sake of this majority of systems it must be possible to access memory on such systems assuming a simple 1:1 mapping.
Agreed.
Becky: the fact that Haavard's code is breaking is not considered an indication of a deficiency in his code.
I'm not calling the code deficient, just a bit inconsistent, and it wasn't clear to me why. When code uses the same value 1) by mapping it and 2) by dereferencing it directly, that's a bit strange. Why map it in some cases and in not others? How is that supposed to work when VA!=PA or when the VA can change? This is one of the reasons that it seemed to make sense to modify the driver as I did - it should now be able to work when VA!=PA as well as when we're 1:1. I could find no users that seemed to need the dynamic mapping. However, if anyone does need to *dynamically* map the flash in and out with a different VA each time, then we do need to do things a bit differently and we should look into a solution for that. Clearly, I'm not the expert on the CFI code, so when I published that patch I expected someone to smack me if I was being a moron :)
If the CFI driver kind of allowed for VAs before (but incompletely / incorrectly), then this dis not cause problems on any systems using a strict 1:1 mapping.
Any changes to the code to correctly support other mappings must be done in a way that they (1) do not break and (2) do not add additional burdon on systems with a simple 1:1 mapping.
Agreed, there shouldn't be any burden on those systems.
Everything is treated as virtual unless it's being used for hardware setup.
Thisis NOT correct. U-Boot usually does NOT use virtual addresses. Only very few systems do, and these must care not to disturb the majority of systems which do no need to differentiate between physical and virtual addresses.
I'm not saying it *is* a VA as far as U-Boot knows, but that it is *treated* as one, as mentioned above. And this code was not expected to disturb the 1:1 case.
If you use something to do memory accesses, it's virtual.
Unless you have a very special system you can rely on a strict 1:1 mapping.
A lot of code had been just using the PA as a VA, because things were always mapped 1-1.
Not only were, but _are_ and _will_be_.
Of course - that should continue to be the default case unless it needs to differ for some reason.
There are basically two ways to fix it: Either go back to using physical addresses in the flash API, or make CONFIG_ENV_ADDR virtual
I understand we cannot do that, because some systems need to map (NOR) flash to virtual addresses outside the physical address space. Ok, so the CFI driver shall consistently be able to use VAs - but on simple systems with a 1:1 mapping there shall be no need in the system configuration to spend any thoughts on this.
(and from what I hear, the jffs2 code needs a similar fix.) This patch does the latter, but it seems like it doesn't fix things completely, and Wolfgang didn't appear very happy about it.
Indeed I'm deeply trouble when log standing rules get silently bent and even broken.
I agree 100% that 1:1 support should not be disturbed, since that's the default case. There was absolutely no intent here to bend any "log<sic ;-)> standing rules", and nothing has been done here that should have any impact on 1:1 as far as I'm aware. It's the !1:1 case that becomes more interesting and could have problems if you mix up your VAs and PAs, or you actually need dynamic mappings.
In any event, in this case, the problem that provoked this thread is not with 1:1, but with !1:1......
Cheers, b

Becky Bruce becky.bruce@freescale.com wrote:
Becky: the fact that Haavard's code is breaking is not considered an indication of a deficiency in his code.
I'm not calling the code deficient, just a bit inconsistent, and it wasn't clear to me why. When code uses the same value 1) by mapping it and 2) by dereferencing it directly, that's a bit strange. Why map it in some cases and in not others?
I agree, that is inconsistent. However, you "fixed" the code which was actually doing it correctly...we should instead fix the code which is incorrectly using the physical address as a virtual one.
While doing that, we could also consider dropping that hideous start array altogether and instead provide a handful of accessors for locating sector start addresses on the flash.
How is that supposed to work when VA!=PA or when the VA can change? This is one of the reasons that it seemed to make sense to modify the driver as I did - it should now be able to work when VA!=PA as well as when we're 1:1. I could find no users that seemed to need the dynamic mapping. However, if anyone does need to *dynamically* map the flash in and out with a different VA each time, then we do need to do things a bit differently and we should look into a solution for that.
I don't think anything needs a dynamic mapping right now, but if we're going to _ever_ support such systems in the future (and your 36-bit PA system is a likely candidate, isn't it?) we have to stop locking ourselves into a static mapping.
Clearly, I'm not the expert on the CFI code, so when I published that patch I expected someone to smack me if I was being a moron :)
I apologize for not doing so earlier ;-)
Anyway, I have to take most of the blame for this situation...if I had paid attention and flagged the problem earlier, much of this could have been avoided.
I'm hoping we can avoid too much pointing of fingers and work towards a reasonably future-proof solution which works well on all platforms.
If the CFI driver kind of allowed for VAs before (but incompletely / incorrectly), then this dis not cause problems on any systems using a strict 1:1 mapping.
Any changes to the code to correctly support other mappings must be done in a way that they (1) do not break and (2) do not add additional burdon on systems with a simple 1:1 mapping.
Agreed, there shouldn't be any burden on those systems.
Agree too, as long as "those systems" does not include common code which needs to run on all systems.
Everything is treated as virtual unless it's being used for hardware setup.
Thisis NOT correct. U-Boot usually does NOT use virtual addresses. Only very few systems do, and these must care not to disturb the majority of systems which do no need to differentiate between physical and virtual addresses.
I'm not saying it *is* a VA as far as U-Boot knows, but that it is *treated* as one, as mentioned above. And this code was not expected to disturb the 1:1 case.
Exposing VAs to the user interface and board code is IMO a very bad idea, however. And that is what's happening now -- instead of localizing the VAs to the CFI flash driver, it is now spread all over the place.
A lot of code had been just using the PA as a VA, because things were always mapped 1-1.
Not only were, but _are_ and _will_be_.
Of course - that should continue to be the default case unless it needs to differ for some reason.
Yes, and that's why the external interface (at least the command line and board configuration files) needs to use PA.
Indeed I'm deeply trouble when log standing rules get silently bent and even broken.
I agree 100% that 1:1 support should not be disturbed, since that's the default case. There was absolutely no intent here to bend any "log<sic ;-)> standing rules", and nothing has been done here that should have any impact on 1:1 as far as I'm aware.
Agree, 1:1 isn't the issue, it's two different systems with !1:1 which have started making incompatible changes to the CFI driver.
Haavard
participants (9)
-
Andrew Dyer
-
Becky Bruce
-
Haavard Skinnemoen
-
J. William Campbell
-
Kumar Gala
-
Mark Jackson
-
Stefan Roese
-
Thiago A. CorrĂȘa
-
Wolfgang Denk