[U-Boot] [PATCH 03/26] ARM: cp15: setup mmu and enable dcache

This has been tested on at91sam9263 and STN8815. Again, I didn't check if it has bad effects on non-arm926 cores.
Initially I had a "done" bit to only set up page tables at the beginning. However, since the alignment requirement was for the whole object file, this extra integer took 16kB in BSS, so I chose to remove it.
Also, note that not all boards use PHYS_SDRAM, but it looks like it's the most used name (more than CONFIG_SYS_DRAM_BASE, for example).
Signed-off-by: Alessandro Rubini <rubini@gnudd.com>
Signed-off-by: Heiko Schocher <hs@denx.de>
---
- changes since v1:
  - add possibility to use dcache in write_through mode, as Nick Thompson suggested.
  - use the ram setup info in bd_t to setup the TLB
  - added my Signed-off-by upon consultation with Alessandro
- changes since v2:
  - changed commit message
  - moved cache patches before relocation patches
 arch/arm/lib/cache-cp15.c |   51 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 51 insertions(+), 0 deletions(-)
diff --git a/arch/arm/lib/cache-cp15.c b/arch/arm/lib/cache-cp15.c
index 62ed54f..b2811f3 100644
--- a/arch/arm/lib/cache-cp15.c
+++ b/arch/arm/lib/cache-cp15.c
@@ -25,6 +25,15 @@
 #include <asm/system.h>
 
 #if !(defined(CONFIG_SYS_NO_ICACHE) && defined(CONFIG_SYS_NO_DCACHE))
+
+#if defined(CONFIG_SYS_ARM_CACHE_WRITETHROUGH)
+#define CACHE_SETUP	0x1a
+#else
+#define CACHE_SETUP	0x1e
+#endif
+
+DECLARE_GLOBAL_DATA_PTR;
+
 static void cp_delay (void)
 {
 	volatile int i;
@@ -32,6 +41,40 @@ static void cp_delay (void)
 	/* copro seems to need some delay between reading and writing */
 	for (i = 0; i < 100; i++)
 		nop();
+	asm volatile("" : : : "memory");
+}
+
+/* to activate the MMU we need to set up virtual memory: use 1M areas in bss */
+static inline void mmu_setup(void)
+{
+	static u32 __attribute__((aligned(16384))) page_table[4096];
+	bd_t *bd = gd->bd;
+	int i, j;
+	u32 reg;
+
+	/* Set up an identity-mapping for all 4GB, rw for everyone */
+	for (i = 0; i < 4096; i++)
+		page_table[i] = i << 20 | (3 << 10) | 0x12;
+	/* Then, enable cacheable and bufferable for RAM only */
+	for (j = 0; j < CONFIG_NR_DRAM_BANKS; j++) {
+		for (i = bd->bi_dram[j].start >> 20;
+			i < (bd->bi_dram[j].start + bd->bi_dram[j].size) >> 20;
+			i++) {
+			page_table[i] = i << 20 | (3 << 10) | CACHE_SETUP;
+		}
+	}
+
+	/* Copy the page table address to cp15 */
+	asm volatile("mcr p15, 0, %0, c2, c0, 0"
+		     : : "r" (page_table) : "memory");
+	/* Set the access control to all-supervisor */
+	asm volatile("mcr p15, 0, %0, c3, c0, 0"
+		     : : "r" (~0));
+	/* and enable the mmu */
+	reg = get_cr();	/* get control reg. */
+	cp_delay();
+	set_cr(reg | CR_M);
+
 }
 
 /* cache_bit must be either CR_I or CR_C */
@@ -39,6 +82,9 @@ static void cache_enable(uint32_t cache_bit)
 {
 	uint32_t reg;
 
+	/* The data cache is not active unless the mmu is enabled too */
+	if (cache_bit == CR_C)
+		mmu_setup();
 	reg = get_cr();	/* get control reg. */
 	cp_delay();
 	set_cr(reg | cache_bit);
@@ -49,6 +95,11 @@ static void cache_disable(uint32_t cache_bit)
 {
 	uint32_t reg;
 
+	if (cache_bit == CR_C) {
+		/* if disabling data cache, disable mmu too */
+		cache_bit |= CR_M;
+		flush_cache(0, ~0);
+	}
 	reg = get_cr();
 	cp_delay();
 	set_cr(reg & ~cache_bit);
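For readers decoding the constants in mmu_setup(), here is a minimal host-runnable sketch of how each first-level section entry is composed. The helper names (section_entry, map_all_uncached, map_bank) and the SECT_* macro names are illustrative assumptions, not U-Boot identifiers; only the bit values (0x12, 0x1a, 0x1e, 3 << 10) come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro names; the values are the constants from the patch. */
#define SECT_BASE		0x12u		/* bits [1:0] = 0b10 (section) plus bit 4 */
#define SECT_AP_RW		(3u << 10)	/* AP = 0b11: read/write for everyone */
#define SECT_C			(1u << 3)	/* cacheable */
#define SECT_B			(1u << 2)	/* bufferable */
#define SECT_UNCACHED		SECT_BASE			/* 0x12 */
#define SECT_WRITETHROUGH	(SECT_BASE | SECT_C)		/* 0x1a */
#define SECT_WRITEBACK		(SECT_BASE | SECT_C | SECT_B)	/* 0x1e */

#define NUM_SECTIONS		4096u	/* 4096 x 1 MiB sections cover 4 GiB */

/* Descriptor that maps 1 MiB section 'idx' onto itself (identity mapping). */
static uint32_t section_entry(uint32_t idx, uint32_t attrs)
{
	assert(idx < NUM_SECTIONS);
	return (idx << 20) | SECT_AP_RW | attrs;
}

/* Identity-map the whole address space as uncached, like the first loop. */
static void map_all_uncached(uint32_t *table)
{
	uint32_t i;

	for (i = 0; i < NUM_SECTIONS; i++)
		table[i] = section_entry(i, SECT_UNCACHED);
}

/*
 * Apply 'attrs' to the sections from start >> 20 up to, but excluding,
 * (start + size) >> 20, like the per-bank loop in the patch. The sum is
 * formed in 64 bits so a bank ending at 4 GiB does not wrap to zero.
 */
static void map_bank(uint32_t *table, uint32_t start, uint32_t size,
		     uint32_t attrs)
{
	uint32_t i = start >> 20;
	uint32_t end = (uint32_t)(((uint64_t)start + size) >> 20);

	for (; i < end; i++)
		table[i] = section_entry(i, attrs);
}
```

With a hypothetical 64 MiB bank at 0x20000000, entry 0x200 becomes 0x20000c1e while entry 0 stays 0x00000c12. The table is 4096 entries of 4 bytes, which is the 16 kB that the aligned(16384) attribute refers to. The sketch uses unsigned indices because shifting a signed int left by 20 is formally undefined in C once the index reaches 2048, even though it tends to work in practice with GCC on ARM.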

Hello Heiko Schocher,
> bounces@lists.denx.de] On Behalf Of Heiko Schocher
> +static inline void mmu_setup(void)
> +{
> +	static u32 __attribute__((aligned(16384))) page_table[4096];
> +	bd_t *bd = gd->bd;
> +	int i, j;
> +	u32 reg;
> +
> +	/* Set up an identity-mapping for all 4GB, rw for everyone */
> +	for (i = 0; i < 4096; i++)
> +		page_table[i] = i << 20 | (3 << 10) | 0x12;
> +	/* Then, enable cacheable and bufferable for RAM only */
> +	for (j = 0; j < CONFIG_NR_DRAM_BANKS; j++) {
> +		for (i = bd->bi_dram[j].start >> 20;
> +			i < (bd->bi_dram[j].start + bd->bi_dram[j].size) >> 20;
> +			i++) {
> +			page_table[i] = i << 20 | (3 << 10) | CACHE_SETUP;
> +		}
> +	}
> +
> +	/* Copy the page table address to cp15 */
> +	asm volatile("mcr p15, 0, %0, c2, c0, 0"
> +		     : : "r" (page_table) : "memory");
> +	/* Set the access control to all-supervisor */
> +	asm volatile("mcr p15, 0, %0, c3, c0, 0"
> +		     : : "r" (~0));
> +	/* and enable the mmu */
> +	reg = get_cr();	/* get control reg. */
> +	cp_delay();
> +	set_cr(reg | CR_M);
I think you need to invalidate the caches, TLB, branch-prediction array, etc. before enabling the MMU. I don't think the caches are guaranteed to be invalidated at power-on reset.
Please find below some experimental (but working) code I had written for enabling the MMU and caches for an internal project. It's written for the ARM RVCT tool chain, so it will not compile straight away on GCC.
It has code for invalidating the caches, TLB, branch-prediction array, etc. It also uses macros for the bit fields in the SCTLR register. However, the L1 D$ invalidation is not generic: it assumes the size and organization of the L1 cache. arch/arm/cpu/armv7/omap3/cache.S has a more generic implementation of invalidating the entire L1 D$.
In case you find it useful, you are welcome to reuse any part of this code. If you want me to create a patch myself, I am open to doing that too.
Best regards,
Aneesh
Mmu.c:

/*
 * (C) Copyright 2008
 * Texas Instruments, <www.ti.com>
 * Author:
 *	Aneesh V <aneesh@ti.com>
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License as
 * published by the Free Software Foundation's version 2 of
 * the License.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston,
 * MA 02111-1307 USA
 */

#include <utils.h>

#ifdef MMU_ENABLED

extern unsigned int pagetable[1024*4];
extern unsigned int Vector(void);

#define PAGE_TAB_SECTION_CONSTANTS	0x40002
#define NS_BIT_1			0x80000
#define nG_0				0
#define S_BIT_1				0x010000
#define AP_FULL_ACCESS			0x00C00
#define DOMAIN_0			0
#define XN_1				0x10
#define TEX_C_B_NORMAL_I_WBWA_O_WBWA	0x05004
#define TEX_C_B_NORMAL_I_WBWA_O_NC	0x04004
#define TEX_C_B_NORMAL_I_NC_O_NC	0x01002
#define TEX_C_B_NORMAL_SHARABLE_DEVICE	0x4

#define SECTION_ENTRY_FULL_ACCESS_FULL_CACHED \
	(PAGE_TAB_SECTION_CONSTANTS|NS_BIT_1|nG_0|S_BIT_1|AP_FULL_ACCESS|DOMAIN_0|TEX_C_B_NORMAL_I_WBWA_O_WBWA)
#define SECTION_ENTRY_FULL_ACCESS_FULL_CACHED_NON_SHARED \
	(PAGE_TAB_SECTION_CONSTANTS|NS_BIT_1|nG_0|AP_FULL_ACCESS|DOMAIN_0|TEX_C_B_NORMAL_I_WBWA_O_WBWA)
#define SECTION_ENTRY_FULL_ACCESS_NORMAL_NC \
	(PAGE_TAB_SECTION_CONSTANTS|NS_BIT_1|nG_0|S_BIT_1|AP_FULL_ACCESS|DOMAIN_0|TEX_C_B_NORMAL_I_NC_O_NC)
#define SECTION_ENTRY_FULL_ACCESS_SHARED_DEVICE \
	(PAGE_TAB_SECTION_CONSTANTS|NS_BIT_1|nG_0|S_BIT_1|AP_FULL_ACCESS|DOMAIN_0|TEX_C_B_NORMAL_SHARABLE_DEVICE)

#define M_BIT_1		0x1
#define C_BIT_1		0x4
#define Z_BIT_1		0x800
#define I_BIT_1		0x1000
#define V_BIT_1		0x2000

#define SCTRL_VALUE	(M_BIT_1|C_BIT_1|Z_BIT_1|I_BIT_1|V_BIT_1)
//#define SCTRL_VALUE	(M_BIT_1|Z_BIT_1|I_BIT_1|V_BIT_1)

__asm void setvector(void);
void setup_pagetable(unsigned int *pagetable, unsigned int devicemem_attr,
		     unsigned int code_and_data);
__asm unsigned int setup_vector(unsigned int (*vector)(void));
__asm unsigned int setup_ttbr0(unsigned int ttbr0value);
__asm unsigned int setup_ttbr1(unsigned int ttbr1value);
__asm unsigned int setup_ttbrc(unsigned int ttbrc_value);
__asm unsigned int setup_dacr(unsigned int dacr_value);
__asm unsigned int setup_sctlr(unsigned int sctlr_value);
__asm void invalidate_caches(void);
__asm void barriers(void);
__asm void enable_pl310(void);
void invalidate_dcache_setway(unsigned int setway);
void stresstestaftermmu(void);

__asm unsigned int setup_prrr(unsigned int prrr_value)
{
	MCR	p15,0,r0,c10,c2,0	; Write CP15 Primary Region Remap Register
	bx	lr
}

__asm unsigned int setup_nmrr(unsigned int nmrr_value)
{
	MCR	p15,0,r0,c10,c2,1	; Write CP15 Normal Memory Remap Register
	bx	lr
}

void enablemmu(void)
{
	int set, way, setway;

	volatile int i = 10;

	setup_pagetable(pagetable, SECTION_ENTRY_FULL_ACCESS_SHARED_DEVICE,
			SECTION_ENTRY_FULL_ACCESS_FULL_CACHED);

	printf("page table address 0x%08x\r\n", pagetable);
	invalidate_caches();
	printf("Cache and TLB invalidation done..\r\n");

	//All TTBR permission values(RGN etc) are 0's. So nothing else to be done for TTBR0 value.
	setup_ttbr0(((unsigned int)pagetable) | 0x48);
	setup_ttbr1(((unsigned int)pagetable) | 0x48);

	//TTBRC is 1(N=1)
	setup_ttbrc(1);

	setup_dacr(1);
	setup_vector(Vector);
	//Only I and M bit set in SCR
	for (set = 0; set < 256; set++) {
		for (way = 0; way < 4; way++) {
			setway = way << 30;
			setway |= set << 5;
			invalidate_dcache_setway(setway);
		}
	}
	printf("Data cache invalidation done..\r\n");

	//Enable L2
	enable_pl310();

	barriers();
	setup_sctlr(SCTRL_VALUE);	//Enable everything except I$ and D$
	printf("mmu enabled..\r\n");

	printf("If you are seeing this mmu and caches are working fine..\r\n");
}

__asm void invalidate_dcache_setway(unsigned int setway)
{
	MCR	p15, 0, r0, c7, c6, 2	; invalidate (aka purge) by set/way
	bx	lr
}

__asm void barriers(void)
{
	DCD	0xF57FF04F	; DSBSY (full system DSB)
	DCD	0xF57FF06F	; ISBSY (full system ISB)
	bx	lr
}

//__asm int setuppagetables
__asm void invalidate_caches(void)
{
	mov	r1, #0
	mcr	p15, 0, r1, c8, c7, 0	; Invalidate entire unified TLB
	mcr	p15, 0, r1, c8, c6, 0	; Invalidate entire data TLB
	mcr	p15, 0, r1, c8, c5, 0	; Invalidate entire instruction TLB
	mcr	p15, 0, r1, c7, c5, 6	; Invalidate entire branch prediction array
	mcr	p15, 0, r1, c7, c5, 0	; Invalidate icache
	bx	lr
}

__asm unsigned int setup_dacr(unsigned int dacr_value)
{
	MCR	p15,0,r0,c3,c0,0	; Write CP15 Domain Access Control Register
	MRC	p15,0,r0,c3,c0,0	; Read CP15 Domain Access Control Register
	bx	lr
}

__asm unsigned int setup_ttbrc(unsigned int ttbrc_value)
{
	MCR	p15, 0, r0, c2, c0, 2	//Write TTBRC (N is chosen by the caller)
	MRC	p15, 0, r0, c2, c0, 2	//Read back TTBRC
	bx	lr
}

__asm unsigned int setup_ttbr0(unsigned int ttbr0value)
{
	//Only TTBR0 is used.
	MCR	p15, 0, r0, c2, c0, 0	//Write TTBR0
	MRC	p15, 0, r0, c2, c0, 0	//Read back TTBR0
	bx	lr
}

__asm unsigned int setup_ttbr1(unsigned int ttbr1value)
{
	//Only TTBR0 is used.
	MCR	p15, 0,r0, c2, c0, 1	;Write Translation Table Base Register 1
	MRC	p15, 0,r0, c2, c0, 1	;Read Translation Table Base Register 1
	bx	lr
}

__asm unsigned int setup_vector(unsigned int (*vector)(void))
{
	MCR	p15,0,r0,c12,c0,0	; Set VBAR
	MRC	p15,0,r0,c12,c0,0	; Read VBAR
	bx	lr
}

__asm unsigned int setup_sctlr(unsigned int sctlr_value)
{
	MCR	p15,0,r0,c1,c0,0	; Write CP15 System Control Register
	DCD	0xF57FF04F		; DSBSY (full system DSB)
	DCD	0xF57FF06F		; ISBSY (full system ISB)
	MRC	p15,0,r0,c1,c0,0	; Read CP15 System Control Register
	bx	lr
}

__asm void enable_pl310(void)
{
	STMFD	sp!, {r4-r12,lr}
	MOV	r0, #1
	LDR	r12, =0x102
	DCD	0xE1600070	; call ROM Code API to enable PL310 on ZEBU/Si
	ldmfd	sp!, {r4-r12,pc}
}

void setup_pagetable(unsigned int *pagetable, unsigned int devicemem_attr,
		     unsigned int code_and_data)
{
	int i;
	//volatile int j=10;
	printf(">setup_pagetable - device mem: 0x%x code_data: 0x%x\r\n",
	       devicemem_attr, code_and_data);
	// while(j);
	for (i = 0; i < 2*1024; i++) {
		pagetable[i] = (i << 20) | devicemem_attr;
	}
#if 1
	for (; i < 2*1024+256; i++) {
		pagetable[i] = (i << 20) | code_and_data;
	}
	for (; i < 2*1024+512; i++) {
		pagetable[i] = (i << 20) | code_and_data;
	}
#endif

	for (; i < 4*1024; i++) {
		pagetable[i] = (i << 20) | code_and_data;
	}

	pagetable[0x403] = (0x403 << 20) | code_and_data;

	printf("section entry for 0x00000000 - 0x%08x\r\n", pagetable[0]);
	printf("section entry for 0x80000000 - 0x%08x\r\n", pagetable[0x800]);
}

#endif
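The set/way loop in enablemmu() above hard-codes a 4-way, 256-set cache with 32-byte lines (way << 30, set << 5). As a rough illustration of what a more generic variant involves, the sketch below derives both shifts from the cache geometry. It is host-runnable C rather than RVCT code. struct cache_geom, log2_ceil, setway_operand, for_each_setway and count_setway are hypothetical names, and on real hardware the geometry would be read from the cache ID registers rather than passed in by hand.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one cache level. */
struct cache_geom {
	uint32_t ways;		/* associativity */
	uint32_t sets;		/* number of sets */
	uint32_t line_bytes;	/* line length in bytes, a power of two */
};

/* Smallest n such that (1 << n) >= x. */
static uint32_t log2_ceil(uint32_t x)
{
	uint32_t n = 0;

	assert(x > 0 && x <= (1u << 31));
	while ((1u << n) < x)
		n++;
	return n;
}

/*
 * Operand for an ARMv7 set/way operation: way number in the top bits,
 * set index just above the line-offset bits, cache level in bits [3:1].
 */
static uint32_t setway_operand(const struct cache_geom *g, uint32_t level,
			       uint32_t way, uint32_t set)
{
	uint32_t way_bits = log2_ceil(g->ways);
	uint32_t line_bits = log2_ceil(g->line_bytes);
	uint32_t val;

	assert(level < 7 && way < g->ways && set < g->sets);
	val = (set << line_bits) | (level << 1);
	if (way_bits > 0)	/* a direct-mapped cache has no way field */
		val |= way << (32 - way_bits);
	return val;
}

typedef void (*setway_fn)(uint32_t operand);

/* Visit every set/way pair of one cache level, e.g. to invalidate it. */
static void for_each_setway(const struct cache_geom *g, uint32_t level,
			    setway_fn fn)
{
	uint32_t set, way;

	for (set = 0; set < g->sets; set++)
		for (way = 0; way < g->ways; way++)
			fn(setway_operand(g, level, way, set));
}

/* Host stand-in for "MCR p15, 0, <operand>, c7, c6, 2": only counts calls. */
static unsigned long setway_calls;

static void count_setway(uint32_t operand)
{
	(void)operand;
	setway_calls++;
}
```

For the geometry assumed in Mmu.c ({4, 256, 32}) this reproduces the same operands as the hand-written loop, so way 3 of set 255 yields (3 << 30) | (255 << 5). On a target, count_setway would be replaced by a function that issues the actual MCR.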

Hello V, Aneesh,
V, Aneesh wrote:
> In case you find it useful, you are welcome to reuse any part of this code. If you want me to create a patch myself I am open to do that too.
Thanks!
I will try out your suggestions. If you can make a patch (and try it on a platform), it would be great ;-)
bye,
Heiko

Hello Heiko,
> -----Original Message-----
> From: Heiko Schocher [mailto:hs@denx.de]
> Sent: Wednesday, September 01, 2010 4:56 PM
> To: V, Aneesh
> Cc: U-Boot user list; Alessandro Rubini
> Subject: Re: [U-Boot] [PATCH 03/26] ARM: cp15: setup mmu and enable dcache
> Thanks!
> I will try out your suggestions. If you can make a patch (and try it on a platform), it would be great ;-)
I will try this out as time permits. I should be able to test it on OMAP3430 (Cortex-A8) and OMAP4430 (Cortex-A9).
Thanks,
Aneesh

Hello Heiko,
> -----Original Message-----
> From: u-boot-bounces@lists.denx.de [mailto:u-boot-bounces@lists.denx.de] On Behalf Of Heiko Schocher
> Sent: Wednesday, August 11, 2010 11:46 PM
> To: U-Boot user list
> Cc: Alessandro Rubini
> Subject: [U-Boot] [PATCH 03/26] ARM: cp15: setup mmu and enable dcache
>
> This has been tested on at91sam9263 and STN8815. Again, I didn't check if it has bad effects on non-arm926 cores.
> Initially I had a "done" bit to only set up page tables at the beginning. However, since the alignment requirement was for the whole object file, this extra integer took 16kB in BSS, so I chose to remove it.
This is rather strange. I thought the linker would have done better. However, I could overcome this problem by making 'done' a non-zero initialized variable. Can't you try that? I think the 'done' bit is quite useful. I see that the dcache_enable() is called from multiple places.
I tested your patch on OMAP4430 (Cortex-A9). It works fine.
Best regards,
Aneesh

Dear "V, Aneesh",
In message <FF55437E1F14DA4BAEB721A458B6701706C58880FB@dbde02.ent.ti.com> you wrote:
> > Initially I had a "done" bit to only set up page tables at the beginning. However, since the alignment requirement was for the whole object file, this extra integer took 16kB in BSS, so I chose to remove it.
>
> This is rather strange. I thought the linker would have done better. However, I could overcome this problem by making 'done' a non-zero initialized variable. Can't you try that?
"non-zero initialized variable" means you moved the storage location from the bss into the data segment.
The same can be done by using something like
	int done __attribute__ ((section (".data"))) = 0;
This is a better approach because 1) a non-zero value will be easily misinterpreted as "already done", and 2) this clearly documents what is going on and why.
Best regards,
Wolfgang Denk
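The suggestion above could be sketched as a run-once guard around the page-table setup. This is only an illustration under stated assumptions: it targets GCC (or Clang) producing ELF, where ".data" is a valid section name, and build_page_table, mmu_setup_once and setup_runs are hypothetical names standing in for the real mmu_setup() body.

```c
#include <assert.h>

/*
 * Zero means "not done yet". The section attribute places the flag in
 * .data instead of .bss while keeping the natural initial value of 0,
 * as proposed in the message above.
 */
static int done __attribute__ ((section (".data"))) = 0;

/* Counts how often the expensive setup ran; for demonstration only. */
static unsigned int setup_runs;

/* Hypothetical stand-in for filling the page table and writing cp15. */
static void build_page_table(void)
{
	assert(!done);	/* the guard below must prevent a second run */
	setup_runs++;
}

/* Safe to call from every dcache_enable() path; sets up only once. */
static void mmu_setup_once(void)
{
	if (done)
		return;
	build_page_table();
	done = 1;
}
```

Calling mmu_setup_once() repeatedly leaves setup_runs at 1. Compared with a non-zero initializer, the flag keeps its obvious meaning (0 = not done), and the attribute documents why the variable is not left in .bss.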