[U-Boot] [PATCH] arm: pxa: PXA270 D-Cache as ram

2.2.5.2 of Marvell PXA27x Processor Family Developers Manual says: "The PXA27x processor cache configuration is identical to that of the PXA255 processor."
As a result, it is perfectly legitimate to use PXA25X 'lock_cache_for_stack' on PXA27X as well.
Signed-off-by: Sergey Yanovich ynvich@gmail.com --- arch/arm/cpu/pxa/start.S | 10 ++++++++-- include/configs/lp8x4x.h | 5 +++-- 2 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/arch/arm/cpu/pxa/start.S b/arch/arm/cpu/pxa/start.S index ada91a6..5ea512e 100644 --- a/arch/arm/cpu/pxa/start.S +++ b/arch/arm/cpu/pxa/start.S @@ -40,6 +40,12 @@ #include <version.h>
#ifdef CONFIG_CPU_PXA25X +#ifndef CONFIG_PXA2XX_CACHE_AS_RAM +#define CONFIG_PXA2XX_CACHE_AS_RAM +#endif +#endif + +#ifdef CONFIG_PXA2XX_CACHE_AS_RAM #if ((CONFIG_SYS_INIT_SP_ADDR) != 0xfffff800) #error "Init SP address must be set to 0xfffff800 for PXA250" #endif @@ -164,7 +170,7 @@ reset: bl cpu_init_crit #endif
-#ifdef CONFIG_CPU_PXA25X +#ifdef CONFIG_PXA2XX_CACHE_AS_RAM bl lock_cache_for_stack #endif
@@ -482,7 +488,7 @@ fiq: * This is useful on PXA25x and PXA26x in early bootstages, where there is no * other possible memory available to hold stack. */ -#ifdef CONFIG_CPU_PXA25X +#ifdef CONFIG_PXA2XX_CACHE_AS_RAM .macro CPWAIT reg mrc p15, 0, \reg, c2, c0, 0 mov \reg, \reg diff --git a/include/configs/lp8x4x.h b/include/configs/lp8x4x.h index 026f321..27ff2f4 100644 --- a/include/configs/lp8x4x.h +++ b/include/configs/lp8x4x.h @@ -149,8 +149,9 @@
#define CONFIG_SYS_LOAD_ADDR 0xa0008000 #define CONFIG_SYS_SDRAM_BASE PHYS_SDRAM_1 -/* Use first 64kb bank of the internal SRAM */ -#define CONFIG_SYS_INIT_SP_ADDR 0x5c010000 +/* Use CPU data cache as internal RAM */ +#define CONFIG_PXA2XX_CACHE_AS_RAM +#define CONFIG_SYS_INIT_SP_ADDR 0xfffff800
/* * NOR FLASH

Dear Sergey Yanovich,
2.2.5.2 of Marvell PXA27x Processor Family Developers Manual says: "The PXA27x processor cache configuration is identical to that of the PXA255 processor."
As a result, it is perfectly legitimate to use PXA25X 'lock_cache_for_stack' on PXA27X as well.
Signed-off-by: Sergey Yanovich ynvich@gmail.com
arch/arm/cpu/pxa/start.S | 10 ++++++++-- include/configs/lp8x4x.h | 5 +++-- 2 files changed, 11 insertions(+), 4 deletions(-)
Why would you need this on PXA27x ? Is SRAM not enough?
Best regards, Marek Vasut

Dear Marek Vasut,
On Tue, 2013-05-21 at 12:39 +0200, Marek Vasut wrote:
2.2.5.2 of Marvell PXA27x Processor Family Developers Manual says: "The PXA27x processor cache configuration is identical to that of the PXA255 processor."
As a result, it is perfectly legitimate to use PXA25X 'lock_cache_for_stack' on PXA27X as well.
Why would you need this on PXA27x ? Is SRAM not enough?
SRAM is battery-backup. It can be used as a ultra-fast persistent storage in OS. Wasting a quater of it just to boot the system isn't the best choice.

Dear Sergey Yanovich,
Dear Marek Vasut,
On Tue, 2013-05-21 at 12:39 +0200, Marek Vasut wrote:
2.2.5.2 of Marvell PXA27x Processor Family Developers Manual says: "The PXA27x processor cache configuration is identical to that of the PXA255 processor."
As a result, it is perfectly legitimate to use PXA25X 'lock_cache_for_stack' on PXA27X as well.
Why would you need this on PXA27x ? Is SRAM not enough?
SRAM is battery-backup.
SRAM is just the in-CPU bit of fast RAM. What do you mean by "battery-backup" ?
It can be used as a ultra-fast persistent storage in OS. Wasting a quater of it just to boot the system isn't the best choice.
The SRAM is used for stack in U-Boot until you leave board_init_f, then the stack is relocated to DRAM. The OS can use SRAM as needed, U-Boot is no longer operational once you load subsequent OS. What's the problem?
Best regards, Marek Vasut

Dear Marek Vasut,
On Tue, 2013-05-21 at 12:54 +0200, Marek Vasut wrote:
SRAM is just the in-CPU bit of fast RAM. What do you mean by "battery-backup" ?
Yes, you are right. It is 'for high speed code or data storage preserved during low-power states' using a quote from PXA270 EMTS (top of page 1). Battery-backup is optional. I mixed PXA270 and LP-8x4x specs.
It can be used as a ultra-fast persistent storage in OS. Wasting a quater of it just to boot the system isn't the best choice.
The SRAM is used for stack in U-Boot until you leave board_init_f, then the stack is relocated to DRAM. The OS can use SRAM as needed, U-Boot is no longer operational once you load subsequent OS. What's the problem?
Anyway, SRAM preserves its state when power is off. Poweroff time could be in years with a backup battery. In addition, D-Cache is an order of magnitude faster than SRAM (approx. 9 times) for both reads and writes.

Dear Sergey Yanovich,
Dear Marek Vasut,
On Tue, 2013-05-21 at 12:54 +0200, Marek Vasut wrote:
SRAM is just the in-CPU bit of fast RAM. What do you mean by "battery-backup" ?
Yes, you are right. It is 'for high speed code or data storage preserved during low-power states' using a quote from PXA270 EMTS (top of page 1). Battery-backup is optional. I mixed PXA270 and LP-8x4x specs.
Yes, it's just an in-CPU RAM.
It can be used as a ultra-fast persistent storage in OS. Wasting a quater of it just to boot the system isn't the best choice.
The SRAM is used for stack in U-Boot until you leave board_init_f, then the stack is relocated to DRAM. The OS can use SRAM as needed, U-Boot is no longer operational once you load subsequent OS. What's the problem?
Anyway, SRAM preserves its state when power is off. Poweroff time could be in years with a backup battery. In addition, D-Cache is an order of magnitude faster than SRAM (approx. 9 times) for both reads and writes.
Is there any measurable difference between using DCache and SRAM? Do you have any evidence that a speedup happens?
Still, the SRAM/DCache is only used until you leave board_init_f(), then it's all DRAM.
Best regards, Marek Vasut

Dear Marek Vasut,
On Tue, 2013-05-21 at 13:38 +0200, Marek Vasut wrote:
Yes, it's just an in-CPU RAM.
Well, it is not 'just' RAM. It preserves its state during deep sleep and power off modes.
Anyway, SRAM preserves its state when power is off. Poweroff time could be in years with a backup battery. In addition, D-Cache is an order of magnitude faster than SRAM (approx. 9 times) for both reads and writes.
Is there any measurable difference between using DCache and SRAM? Do you have any evidence that a speedup happens?
I haven't done any special profiling. I am just relying on PXA270 EMTS table 6-14. The table says SRAM reads take 9 cycles, writes take 7 cycles. D-Cache operations take 1 cycle.
Still, the SRAM/DCache is only used until you leave board_init_f(), then it's all DRAM.
Yes, the patch as it is will only affects relocation speed and preserve SRAM from corruption. The speed gain can also be applied to uImage copying/unpacking, but that requires deeper understanding than I have at the moment.

Dear Sergey Yanovich,
Dear Marek Vasut,
On Tue, 2013-05-21 at 13:38 +0200, Marek Vasut wrote:
Yes, it's just an in-CPU RAM.
Well, it is not 'just' RAM. It preserves its state during deep sleep and power off modes.
So does RAM during sleep state ;-)
Anyway, SRAM preserves its state when power is off. Poweroff time could be in years with a backup battery. In addition, D-Cache is an order of magnitude faster than SRAM (approx. 9 times) for both reads and writes.
Is there any measurable difference between using DCache and SRAM? Do you have any evidence that a speedup happens?
I haven't done any special profiling. I am just relying on PXA270 EMTS table 6-14. The table says SRAM reads take 9 cycles, writes take 7 cycles. D-Cache operations take 1 cycle.
Still, the SRAM/DCache is only used until you leave board_init_f(), then it's all DRAM.
Yes, the patch as it is will only affects relocation speed and preserve SRAM from corruption.
Now this is the right (convincing) argument! What kind of corruption ? When does it occur ?
The speed gain can also be applied to uImage copying/unpacking, but that requires deeper understanding than I have at the moment.
Uh ... I lost you here. Can you please elaborate some more ?
Best regards, Marek Vasut

On Tue, 2013-05-21 at 17:00 +0200, Marek Vasut wrote:
Yes, the patch as it is will only affects relocation speed and preserve SRAM from corruption.
Now this is the right (convincing) argument! What kind of corruption ? When does it occur ?
The whole 256 kB of SRAM could be used for persistent storage with the patch. Without it, part of SRAM should be dedicated for U-Boot stack or be overwritten on boot.
The speed gain can also be applied to uImage copying/unpacking, but that requires deeper understanding than I have at the moment.
Uh ... I lost you here. Can you please elaborate some more ?
You are right. After SDRAM is configured, it is enough to turn on data caching to receive its speed benefits.

Dear Sergey Yanovich,
On Tue, 2013-05-21 at 17:00 +0200, Marek Vasut wrote:
Yes, the patch as it is will only affects relocation speed and preserve SRAM from corruption.
Now this is the right (convincing) argument! What kind of corruption ? When does it occur ?
The whole 256 kB of SRAM could be used for persistent storage with the patch. Without it, part of SRAM should be dedicated for U-Boot stack or be overwritten on boot.
This won't hold on any PXA that uses SPL, like the vpac270 with OneNAND SPL and PXA3xx (which is out of tree, none of your concern ;-) )
The speed gain can also be applied to uImage copying/unpacking, but that requires deeper understanding than I have at the moment.
Uh ... I lost you here. Can you please elaborate some more ?
You are right. After SDRAM is configured, it is enough to turn on data caching to receive its speed benefits.
You must make sure anything that uses DMA won't crash. But I don't understand how locking cachelines as RAM and enabling dcache relate to each other in this context.
Best regards, Marek Vasut

Dear Marek Vasut,
On Tue, 2013-05-21 at 21:02 +0200, Marek Vasut wrote:
The whole 256 kB of SRAM could be used for persistent storage with the patch. Without it, part of SRAM should be dedicated for U-Boot stack or be overwritten on boot.
This won't hold on any PXA that uses SPL, like the vpac270 with OneNAND SPL and PXA3xx (which is out of tree, none of your concern ;-) )
I am no way trying to enforce D-Cache as RAM. The patch just provides an option for those who needs it.
You are right. After SDRAM is configured, it is enough to turn on data caching to receive its speed benefits.
You must make sure anything that uses DMA won't crash.
I wasn't sure why data cache is disabled in U-Boot for every board I saw configuration of. Thanks for pointing out.
But I don't understand how locking cachelines as RAM and enabling dcache relate to each other in this context.
I meant D-Cache is several times faster than SDRAM, so it may be possible to get a bit faster, if stack remains on D-Cache even after SDRAM is configured. Repeating my hedge, I don't see the full picture, yet. It may well be impossible (if U-Boot needs more than 32 kB of stack) or not worth the effort (if the gain is too small).

Dear Sergey Yanovich,
Dear Marek Vasut,
On Tue, 2013-05-21 at 21:02 +0200, Marek Vasut wrote:
The whole 256 kB of SRAM could be used for persistent storage with the patch. Without it, part of SRAM should be dedicated for U-Boot stack or be overwritten on boot.
This won't hold on any PXA that uses SPL, like the vpac270 with OneNAND SPL and PXA3xx (which is out of tree, none of your concern ;-) )
I am no way trying to enforce D-Cache as RAM. The patch just provides an option for those who needs it.
I'd love to have a uniform way to do this cache thing, really ...
You are right. After SDRAM is configured, it is enough to turn on data caching to receive its speed benefits.
You must make sure anything that uses DMA won't crash.
I wasn't sure why data cache is disabled in U-Boot for every board I saw configuration of. Thanks for pointing out.
But I don't understand how locking cachelines as RAM and enabling dcache relate to each other in this context.
I meant D-Cache is several times faster than SDRAM, so it may be possible to get a bit faster, if stack remains on D-Cache even after SDRAM is configured.
Not really, enabling dcache altogether and marking DRAM region as cached would be much better.
Repeating my hedge, I don't see the full picture, yet. It may well be impossible (if U-Boot needs more than 32 kB of stack)
No way.
or not worth the effort (if the gain is too small).
The larger gain would be from fixing the U-Boot drivers for PXA to work well with DCache ;-) Then the speedup would really be plenty significant, this can be well confirmed on many other ARM chips.
Best regards, Marek Vasut

Dear Marek Vasut,
On Tue, 2013-05-21 at 21:24 +0200, Marek Vasut wrote:
I'd love to have a uniform way to do this cache thing, really ...
Requoting the spec 'The PXA27x processor cache configuration is identical to that of the PXA255 processor'. It looks safe to configure all PXA2XX chipsets this way.
Maybe I am missing something, but SPL is no exception in this case.
Not really, enabling dcache altogether and marking DRAM region as cached would be much better.
Repeating my hedge, I don't see the full picture, yet. It may well be impossible (if U-Boot needs more than 32 kB of stack)
No way.
or not worth the effort (if the gain is too small).
The larger gain would be from fixing the U-Boot drivers for PXA to work well with DCache ;-) Then the speedup would really be plenty significant, this can be well confirmed on many other ARM chips.
I have plans to dig deeper into this after I complete the current project. Faster boot is always a good thing. Thanks for explaining in details.

Dear Sergey Yanovich,
Dear Marek Vasut,
On Tue, 2013-05-21 at 21:24 +0200, Marek Vasut wrote:
I'd love to have a uniform way to do this cache thing, really ...
Requoting the spec 'The PXA27x processor cache configuration is identical to that of the PXA255 processor'. It looks safe to configure all PXA2XX chipsets this way.
Maybe I am missing something, but SPL is no exception in this case.
The OneNAND has 1kbyte limit on code it will directly address when booting from it, you can't even make generate the MMU table there.
Not really, enabling dcache altogether and marking DRAM region as cached would be much better.
Repeating my hedge, I don't see the full picture, yet. It may well be impossible (if U-Boot needs more than 32 kB of stack)
No way.
or not worth the effort (if the gain is too small).
The larger gain would be from fixing the U-Boot drivers for PXA to work well with DCache ;-) Then the speedup would really be plenty significant, this can be well confirmed on many other ARM chips.
I have plans to dig deeper into this after I complete the current project. Faster boot is always a good thing. Thanks for explaining in details.
Sure, yet I think I just piled work onto you ;-)
Best regards, Marek Vasut

Dear Marek Vasut,
On Tue, 2013-05-21 at 22:07 +0200, Marek Vasut wrote:
The OneNAND has 1kbyte limit on code it will directly address when booting from it, you can't even make generate the MMU table there.
Do you mean there is no space left inside that 1K for lock_cache_for_stack()?
I have plans to dig deeper into this after I complete the current project. Faster boot is always a good thing. Thanks for explaining in details.
Sure, yet I think I just piled work onto you ;-)
It's OK. I have plans to do this anyway.

Dear Sergey Yanovich,
Dear Marek Vasut,
On Tue, 2013-05-21 at 22:07 +0200, Marek Vasut wrote:
The OneNAND has 1kbyte limit on code it will directly address when booting from it, you can't even make generate the MMU table there.
Do you mean there is no space left inside that 1K for lock_cache_for_stack()?
How would you do that ? You need MMU enabled to lock lines IIRC.
I have plans to dig deeper into this after I complete the current project. Faster boot is always a good thing. Thanks for explaining in details.
Sure, yet I think I just piled work onto you ;-)
It's OK. I have plans to do this anyway.
CCing Mike Dunn, you two should coordinate.
Best regards, Marek Vasut

Dear Marek Vasut,
On Tue, 2013-05-21 at 23:38 +0200, Marek Vasut wrote:
Do you mean there is no space left inside that 1K for lock_cache_for_stack()?
How would you do that ? You need MMU enabled to lock lines IIRC.
I see I don't know enough to continue the discussion, yet.
Concerning the patch in question, is it possible to allow cache for ram on PXA270 somehow before we have a final solution?

Dear Sergey Yanovich,
Dear Marek Vasut,
On Tue, 2013-05-21 at 23:38 +0200, Marek Vasut wrote:
Do you mean there is no space left inside that 1K for lock_cache_for_stack()?
How would you do that ? You need MMU enabled to lock lines IIRC.
I see I don't know enough to continue the discussion, yet.
Concerning the patch in question, is it possible to allow cache for ram on PXA270 somehow before we have a final solution?
I dont mind discussing this further and even using DCache for stack in board_init_f(), but we must make sure nothing gets broken.
Best regards, Marek Vasut

Dear Marek Vasut,
On Wed, 2013-05-22 at 15:04 +0200, Marek Vasut wrote:
Concerning the patch in question, is it possible to allow cache for ram on PXA270 somehow before we have a final solution?
I dont mind discussing this further and even using DCache for stack in board_init_f(), but we must make sure nothing gets broken.
If it is possible, I would like to have a configuration option, which allows to use D-Cache as RAM not only on PXA25X, but also on PXA27X. I propose to preserve status-quo for now and only use D-Cache as RAM when it is explicitly configured so.
You mentioned in a separate letter, that config options need documenting. This part is missing in my patch, and I'll fix this, if the above is OK.

Dear Sergey Yanovich,
Dear Marek Vasut,
On Wed, 2013-05-22 at 15:04 +0200, Marek Vasut wrote:
Concerning the patch in question, is it possible to allow cache for ram on PXA270 somehow before we have a final solution?
I dont mind discussing this further and even using DCache for stack in board_init_f(), but we must make sure nothing gets broken.
If it is possible, I would like to have a configuration option, which allows to use D-Cache as RAM not only on PXA25X, but also on PXA27X. I propose to preserve status-quo for now and only use D-Cache as RAM when it is explicitly configured so.
See above, let's do it properly please. Check the OneNAND SPL config on vpac270 board, will it still work with this enabled?
You mentioned in a separate letter, that config options need documenting. This part is missing in my patch, and I'll fix this, if the above is OK.
Best regards, Marek Vasut

On 05/21/2013 02:38 PM, Marek Vasut wrote:
Dear Sergey Yanovich,
Dear Marek Vasut,
On Tue, 2013-05-21 at 22:07 +0200, Marek Vasut wrote:
The OneNAND has 1kbyte limit on code it will directly address when booting from it, you can't even make generate the MMU table there.
Do you mean there is no space left inside that 1K for lock_cache_for_stack()?
How would you do that ? You need MMU enabled to lock lines IIRC.
I have plans to dig deeper into this after I complete the current project. Faster boot is always a good thing. Thanks for explaining in details.
Sure, yet I think I just piled work onto you ;-)
It's OK. I have plans to do this anyway.
CCing Mike Dunn, you two should coordinate.
Thanks Marek. Was following this on the ML. Will take a look when I get a chance.
participants (3)
-
Marek Vasut
-
Mike Dunn
-
Sergey Yanovich