[U-Boot] [PATCH 0/22] x86: Add support for MTRRs

One of the four items left to fix up for bare ivybridge support in U-Boot is MTRRs. These are an important part of the x86 platform since they provide a means to control the cache behaviour for areas of memory.
Memory areas can be marked as uncacheable or cacheable. For cacheable the write behaviour can be controlled:
- write-back: data is not written back from the cache until evicted - write-though: data goes into the cache but is also written to memory - write-protect: data cannot be written to this area - write-combining: multiple writes to this area can be combined
This series adds support for deciding the MTRR setup that should be used, commiting it to registers before relocation, updating it for the video frame buffer, disabling CAR (cache-as-RAM) when needed and making sure that the faster possible execution speed is provided. In addition an 'mtrr' command is added to permit inspection and adjustment of MTRR registers from the command line.
Simon Glass (22): x86: Correct XIP_ROM_SIZE x86: Drop RAMTOP Kconfig x86: Correct ifdtool microcode calculation x86: video: Add support for CONFIG_CONSOLE_SCROLL_LINES x86: config: Always scroll the display by 5 lines, for speed x86: video: Add a debug() to display the frame buffer address x86: pci: Don't return a vesa mode when there is not video x86: video: Add debug option to time the BIOS copy x86: ivybridge: Only run the Video BIOS when video is enabled x86: Use cache, don't clear the display in video BIOS x86: Tidy up VESA mode numbers x86: pci: Display vesa modes in hex x86: ivybridge: Drop support for ROM caching x86: Add support for MTRRs x86: ivybridge: Set up an MTRR for the video frame buffer x86: board_f: Adjust x86 boot order for performance x86: ivybridge: Request MTRRs for DRAM regions x86: Commit the current MTRRs before relocation x86: ivybridge: Add a way to turn off the CAR x86: Disable CAR before relocation on platforms that need it x86: ivybridge: Update microcode early in boot x86: Add an 'mtrr' command to list and adjust MTRRs
arch/x86/Kconfig | 6 +- arch/x86/cpu/Makefile | 1 + arch/x86/cpu/coreboot/coreboot.c | 22 ++--- arch/x86/cpu/ivybridge/car.S | 78 +++++++++++++-- arch/x86/cpu/ivybridge/cpu.c | 27 +---- arch/x86/cpu/ivybridge/gma.c | 25 ++++- arch/x86/cpu/ivybridge/microcode_intel.c | 9 +- arch/x86/cpu/ivybridge/sdram.c | 10 ++ arch/x86/cpu/mtrr.c | 81 +++++++++++++++ arch/x86/cpu/start.S | 10 ++ arch/x86/dts/link.dts | 3 - arch/x86/include/asm/global_data.h | 15 +++ arch/x86/include/asm/mtrr.h | 163 ++++++++++++++----------------- arch/x86/lib/bios.c | 14 +-- arch/x86/lib/init_helpers.c | 8 ++ common/Makefile | 1 + common/board_f.c | 8 +- common/cmd_mtrr.c | 138 ++++++++++++++++++++++++++ doc/README.x86 | 18 +++- drivers/pci/pci_rom.c | 9 +- drivers/video/cfb_console.c | 26 +++-- drivers/video/x86_fb.c | 1 + include/configs/x86-common.h | 1 + tools/ifdtool.c | 4 +- 24 files changed, 509 insertions(+), 169 deletions(-) create mode 100644 arch/x86/cpu/mtrr.c create mode 100644 common/cmd_mtrr.c

We don't need this in U-Boot since we calculate it based on available memory.
Signed-off-by: Simon Glass sjg@chromium.org ---
arch/x86/Kconfig | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 7d007bb..e992e64 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -47,10 +47,6 @@ config RAMBASE hex default 0x100000
-config RAMTOP - hex - default 0x200000 - config XIP_ROM_SIZE hex default ROM_SIZE

Hi Simon,
On Sun, Dec 28, 2014 at 10:20 AM, Simon Glass sjg@chromium.org wrote:
We don't need this in U-Boot since we calculate it based on available memory.
Signed-off-by: Simon Glass sjg@chromium.org
arch/x86/Kconfig | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 7d007bb..e992e64 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -47,10 +47,6 @@ config RAMBASE hex default 0x100000
-config RAMTOP
hex
default 0x200000
config XIP_ROM_SIZE hex default ROM_SIZE --
What about mtrr.h which is still referring to CONFIG_RAMTOP?
~/work/u-boot-x86/arch/x86$ grep -nr RAMTOP * include/asm/mtrr.h:103:#if !defined(CONFIG_RAMTOP) include/asm/mtrr.h:104:# error "CONFIG_RAMTOP not defined" include/asm/mtrr.h:117:#if (CONFIG_RAMTOP & (CONFIG_RAMTOP - 1)) != 0 include/asm/mtrr.h:118:# error "CONFIG_RAMTOP must be a power of 2" Kconfig:50:config RAMTOP
Regards, Bin

This currently assumes that U-Boot resides at the start of ROM. Update it to remove this assumption.
Signed-off-by: Simon Glass sjg@chromium.org ---
tools/ifdtool.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/ifdtool.c b/tools/ifdtool.c index fe8366b..590ccc9 100644 --- a/tools/ifdtool.c +++ b/tools/ifdtool.c @@ -788,9 +788,9 @@ static int write_uboot(char *image, int size, struct input_file *uboot, fdt_strerror(data_size)); return -ENOENT; } - offset = ucode_ptr - uboot->addr; + offset = (uint32_t)(ucode_ptr + size); ptr = (void *)image + offset; - ptr[0] = uboot->addr + (data - image); + ptr[0] = (data - image) - size; ptr[1] = data_size; debug("Wrote microcode pointer at %x: addr=%x, size=%x\n", ucode_ptr, ptr[0], ptr[1]);

On Sun, Dec 28, 2014 at 10:20 AM, Simon Glass sjg@chromium.org wrote:
This currently assumes that U-Boot resides at the start of ROM. Update it to remove this assumption.
Signed-off-by: Simon Glass sjg@chromium.org
tools/ifdtool.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/ifdtool.c b/tools/ifdtool.c index fe8366b..590ccc9 100644 --- a/tools/ifdtool.c +++ b/tools/ifdtool.c @@ -788,9 +788,9 @@ static int write_uboot(char *image, int size, struct input_file *uboot, fdt_strerror(data_size)); return -ENOENT; }
offset = ucode_ptr - uboot->addr;
offset = (uint32_t)(ucode_ptr + size); ptr = (void *)image + offset;
ptr[0] = uboot->addr + (data - image);
ptr[0] = (data - image) - size; ptr[1] = data_size; debug("Wrote microcode pointer at %x: addr=%x, size=%x\n", ucode_ptr, ptr[0], ptr[1]);
--
Reviewed-by: Bin Meng bmeng.cn@gmail.com
Tested on Intel Crown Bay by adjusting ROM_SIZE to 2MB
Tested-by: Bin Meng bmeng.cn@gmail.com

On Wed, Dec 31, 2014 at 10:47 AM, Bin Meng bmeng.cn@gmail.com wrote:
On Sun, Dec 28, 2014 at 10:20 AM, Simon Glass sjg@chromium.org wrote:
This currently assumes that U-Boot resides at the start of ROM. Update it to remove this assumption.
Signed-off-by: Simon Glass sjg@chromium.org
tools/ifdtool.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/ifdtool.c b/tools/ifdtool.c index fe8366b..590ccc9 100644 --- a/tools/ifdtool.c +++ b/tools/ifdtool.c @@ -788,9 +788,9 @@ static int write_uboot(char *image, int size, struct input_file *uboot, fdt_strerror(data_size)); return -ENOENT; }
offset = ucode_ptr - uboot->addr;
offset = (uint32_t)(ucode_ptr + size); ptr = (void *)image + offset;
ptr[0] = uboot->addr + (data - image);
ptr[0] = (data - image) - size; ptr[1] = data_size; debug("Wrote microcode pointer at %x: addr=%x, size=%x\n", ucode_ptr, ptr[0], ptr[1]);
--
Reviewed-by: Bin Meng bmeng.cn@gmail.com
Tested on Intel Crown Bay by adjusting ROM_SIZE to 2MB
Tested-by: Bin Meng bmeng.cn@gmail.com
Oops, one additional space after 'Tested-by:' should be removed.
Tested-by: Bin Meng bmeng.cn@gmail.com

Some machines are very slow to scroll their displays. To cope with this, support the CONFIG_CONSOLE_SCROLL_LINES option. Setting this to 5 allows the display to operate at an acceptable speed by scrolling 5 lines at a time.
This same option is available for LCDs so when these systems are unified this code can be unified also.
Signed-off-by: Simon Glass sjg@chromium.org ---
drivers/video/cfb_console.c | 26 +++++++++++++++++--------- 1 file changed, 17 insertions(+), 9 deletions(-)
diff --git a/drivers/video/cfb_console.c b/drivers/video/cfb_console.c index a653bb4..75cdf61 100644 --- a/drivers/video/cfb_console.c +++ b/drivers/video/cfb_console.c @@ -312,7 +312,11 @@ void console_cursor(int state); #define CONSOLE_ROW_SECOND (video_console_address + CONSOLE_ROW_SIZE) #define CONSOLE_ROW_LAST (video_console_address + CONSOLE_SIZE - CONSOLE_ROW_SIZE) #define CONSOLE_SIZE (CONSOLE_ROW_SIZE * CONSOLE_ROWS) -#define CONSOLE_SCROLL_SIZE (CONSOLE_SIZE - CONSOLE_ROW_SIZE) + +/* By default we scroll by a single line */ +#ifndef CONFIG_CONSOLE_SCROLL_LINES +#define CONFIG_CONSOLE_SCROLL_LINES 1 +#endif
/* Macros */ #ifdef VIDEO_FB_LITTLE_ENDIAN @@ -753,26 +757,33 @@ static void console_clear_line(int line, int begin, int end)
static void console_scrollup(void) { + const int rows = CONFIG_CONSOLE_SCROLL_LINES; + int i; + /* copy up rows ignoring the first one */
#ifdef VIDEO_HW_BITBLT video_hw_bitblt(VIDEO_PIXEL_SIZE, /* bytes per pixel */ 0, /* source pos x */ video_logo_height + - VIDEO_FONT_HEIGHT, /* source pos y */ + VIDEO_FONT_HEIGHT * rows, /* source pos y */ 0, /* dest pos x */ video_logo_height, /* dest pos y */ VIDEO_VISIBLE_COLS, /* frame width */ VIDEO_VISIBLE_ROWS - video_logo_height - - VIDEO_FONT_HEIGHT /* frame height */ + - VIDEO_FONT_HEIGHT * rows /* frame height */ ); #else - memcpyl(CONSOLE_ROW_FIRST, CONSOLE_ROW_SECOND, - CONSOLE_SCROLL_SIZE >> 2); + memcpyl(CONSOLE_ROW_FIRST, CONSOLE_ROW_FIRST + rows * CONSOLE_ROW_SIZE, + (CONSOLE_SIZE - CONSOLE_ROW_SIZE * rows) >> 2); #endif /* clear the last one */ - console_clear_line(CONSOLE_ROWS - 1, 0, CONSOLE_COLS - 1); + for (i = 1; i <= rows; i++) + console_clear_line(CONSOLE_ROWS - i, 0, CONSOLE_COLS - 1); + + /* Decrement row number */ + console_row -= rows; }
static void console_back(void) @@ -884,9 +895,6 @@ static void console_newline(int n) if (console_row >= CONSOLE_ROWS) { /* Scroll everything up */ console_scrollup(); - - /* Decrement row number */ - console_row = CONSOLE_ROWS - 1; } }

Scrolling a line at a time is very slow for reasons that I don't understand. It seems to take about 100ms to copy 4MB of RAM in the frame buffer. To cope with this, scroll 5 lines each time.
Signed-off-by: Simon Glass sjg@chromium.org ---
include/configs/x86-common.h | 1 + 1 file changed, 1 insertion(+)
diff --git a/include/configs/x86-common.h b/include/configs/x86-common.h index f16ae32..4f0a3c5 100644 --- a/include/configs/x86-common.h +++ b/include/configs/x86-common.h @@ -179,6 +179,7 @@ #define VIDEO_FB_16BPP_WORD_SWAP #define CONFIG_I8042_KBD #define CONFIG_CFB_CONSOLE +#define CONFIG_CONSOLE_SCROLL_LINES 5
/*----------------------------------------------------------------------- * CPU Features

Provide a way to display this address when booting.
Signed-off-by: Simon Glass sjg@chromium.org ---
drivers/video/x86_fb.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/video/x86_fb.c b/drivers/video/x86_fb.c index 8743a8c..6641033 100644 --- a/drivers/video/x86_fb.c +++ b/drivers/video/x86_fb.c @@ -32,6 +32,7 @@ void *video_hw_init(void) sprintf(gdev->modeIdent, "%dx%dx%d", gdev->winSizeX, gdev->winSizeY, bits_per_pixel); printf("%s\n", gdev->modeIdent); + debug("Frame buffer at %x\n", gdev->frameAdrs);
return (void *)gdev; }

If the video has not been set up, we should not return a success code. This can be detected by seeing if any of the variables are non-zero.
Signed-off-by: Simon Glass sjg@chromium.org ---
drivers/pci/pci_rom.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/pci/pci_rom.c b/drivers/pci/pci_rom.c index af6a3ae..9808bb3 100644 --- a/drivers/pci/pci_rom.c +++ b/drivers/pci/pci_rom.c @@ -205,7 +205,7 @@ int vbe_get_video_info(struct graphic_device *gdev) gdev->vprBase = vesa->phys_base_ptr; gdev->cprBase = vesa->phys_base_ptr;
- return 0; + return gdev->winSizeX ? 0 : -ENOSYS; #else return -ENOSYS; #endif

This can be very slow - typically 80ms even on a fast machine since it uses the SPI flash to read the data. Add an option to display the time taken.
Signed-off-by: Simon Glass sjg@chromium.org ---
drivers/pci/pci_rom.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/pci/pci_rom.c b/drivers/pci/pci_rom.c index 9808bb3..5ba315b 100644 --- a/drivers/pci/pci_rom.c +++ b/drivers/pci/pci_rom.c @@ -156,6 +156,8 @@ int pci_rom_load(uint16_t class, struct pci_rom_header *rom_header,
target = (void *)PCI_VGA_RAM_IMAGE_START; if (target != rom_header) { + ulong start = get_timer(0); + debug("Copying VGA ROM Image from %p to %p, 0x%x bytes\n", rom_header, target, rom_size); memcpy(target, rom_header, rom_size); @@ -163,6 +165,7 @@ int pci_rom_load(uint16_t class, struct pci_rom_header *rom_header, printf("VGA ROM copy failed\n"); return -EFAULT; } + debug("Copy took %lums\n", get_timer(start)); } *ram_headerp = target;

On Sun, Dec 28, 2014 at 10:20 AM, Simon Glass sjg@chromium.org wrote:
This can be very slow - typically 80ms even on a fast machine since it uses the SPI flash to read the data. Add an option to display the time taken.
Signed-off-by: Simon Glass sjg@chromium.org
drivers/pci/pci_rom.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/pci/pci_rom.c b/drivers/pci/pci_rom.c index 9808bb3..5ba315b 100644 --- a/drivers/pci/pci_rom.c +++ b/drivers/pci/pci_rom.c @@ -156,6 +156,8 @@ int pci_rom_load(uint16_t class, struct pci_rom_header *rom_header,
target = (void *)PCI_VGA_RAM_IMAGE_START; if (target != rom_header) {
ulong start = get_timer(0);
debug("Copying VGA ROM Image from %p to %p, 0x%x bytes\n", rom_header, target, rom_size); memcpy(target, rom_header, rom_size);
@@ -163,6 +165,7 @@ int pci_rom_load(uint16_t class, struct pci_rom_header *rom_header, printf("VGA ROM copy failed\n"); return -EFAULT; }
debug("Copy took %lums\n", get_timer(start)); } *ram_headerp = target;
--
Reviewed-by: Bin Meng bmeng.cn@gmail.com

This takes about about 700ms on link when running natively and 900ms when running using the emulator. It is a waste of time if video is not enabled, so don't bother running the video BIOS in that case.
We could add a command to run the video BIOS later when needed, but this is not considered at present.
Signed-off-by: Simon Glass sjg@chromium.org ---
arch/x86/cpu/ivybridge/gma.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/arch/x86/cpu/ivybridge/gma.c b/arch/x86/cpu/ivybridge/gma.c index 3d7f740..125021b 100644 --- a/arch/x86/cpu/ivybridge/gma.c +++ b/arch/x86/cpu/ivybridge/gma.c @@ -15,6 +15,13 @@ #include <asm/pci.h> #include <asm/arch/pch.h> #include <asm/arch/sandybridge.h> +#include <linux/kconfig.h> + +#ifdef CONFIG_VIDEO +#define RUN_VIDEO_BIOS 1 +#else +#define RUN_VIDEO_BIOS 1 +#endif
struct gt_powermeter { u16 reg; @@ -745,7 +752,15 @@ int gma_func0_init(pci_dev_t dev, struct pci_controller *hose, if (ret) return ret;
- ret = pci_run_vga_bios(dev, int15_handler, false); + /* + * TODO: Change to IS_ENABLED(CONFIG_VIDEO) when Kconfig supports + * CONFIG_VIDEO. + */ + if (RUN_VIDEO_BIOS) { + start = get_timer(0); + ret = pci_run_vga_bios(dev, int15_handler, false); + debug("BIOS ran in %lums\n", get_timer(start)); + }
/* Post VBIOS init */ ret = gma_pm_init_post_vbios(gtt_bar, blob, node);

Hi Simon,
On Sun, Dec 28, 2014 at 10:20 AM, Simon Glass sjg@chromium.org wrote:
This takes about about 700ms on link when running natively and 900ms when running using the emulator. It is a waste of time if video is not enabled, so don't bother running the video BIOS in that case.
We could add a command to run the video BIOS later when needed, but this is not considered at present.
Signed-off-by: Simon Glass sjg@chromium.org
arch/x86/cpu/ivybridge/gma.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/arch/x86/cpu/ivybridge/gma.c b/arch/x86/cpu/ivybridge/gma.c index 3d7f740..125021b 100644 --- a/arch/x86/cpu/ivybridge/gma.c +++ b/arch/x86/cpu/ivybridge/gma.c @@ -15,6 +15,13 @@ #include <asm/pci.h> #include <asm/arch/pch.h> #include <asm/arch/sandybridge.h> +#include <linux/kconfig.h>
+#ifdef CONFIG_VIDEO +#define RUN_VIDEO_BIOS 1 +#else +#define RUN_VIDEO_BIOS 1
Intentional? Should this be 0?
+#endif
struct gt_powermeter { u16 reg; @@ -745,7 +752,15 @@ int gma_func0_init(pci_dev_t dev, struct pci_controller *hose, if (ret) return ret;
ret = pci_run_vga_bios(dev, int15_handler, false);
/*
* TODO: Change to IS_ENABLED(CONFIG_VIDEO) when Kconfig supports
* CONFIG_VIDEO.
*/
Or maybe just simply do this? The "if (RUN_VIDEO_BIOS)" looks strange.
#ifdef CONFIG_VIDEO start = get_timer(0); ret = pci_run_vga_bios(dev, int15_handler, false); debug("BIOS ran in %lums\n", get_timer(start)); #endif
if (RUN_VIDEO_BIOS) {
start = get_timer(0);
ret = pci_run_vga_bios(dev, int15_handler, false);
debug("BIOS ran in %lums\n", get_timer(start));
} /* Post VBIOS init */ ret = gma_pm_init_post_vbios(gtt_bar, blob, node);
--
Regards, Bin

There is no need to run with the cache disabled, and there is no point in clearing the display frame buffer since U-Boot does it later.
Signed-off-by: Simon Glass sjg@chromium.org ---
arch/x86/lib/bios.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/arch/x86/lib/bios.c b/arch/x86/lib/bios.c index d1f8933..4285348 100644 --- a/arch/x86/lib/bios.c +++ b/arch/x86/lib/bios.c @@ -210,8 +210,8 @@ static u8 vbe_set_mode(struct vbe_mode_info *mi) debug("VBE: Setting VESA mode %#04x\n", mi->video_mode); /* request linear framebuffer mode */ mi->video_mode |= (1 << 14); - /* request clearing of framebuffer */ - mi->video_mode &= ~(1 << 15); + /* don't clear the framebuffer, we do that later */ + mi->video_mode |= (1 << 15); realmode_interrupt(0x10, VESA_SET_MODE, mi->video_mode, 0x0000, 0x0000, 0x0000, 0x0000);
@@ -262,7 +262,6 @@ void bios_run_on_x86(pci_dev_t pcidev, unsigned long addr, int vesa_mode, /* Make sure the code is placed. */ setup_realmode_code();
- disable_caches(); debug("Calling Option ROM at %lx, pci device %#x...", addr, num_dev);
/* Option ROM entry point is at OPROM start + 3 */

There are some bits which should be ignored when displaying the mode number. Make sure that they are not included in the mode that is displayed.
Signed-off-by: Simon Glass sjg@chromium.org ---
arch/x86/lib/bios.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/arch/x86/lib/bios.c b/arch/x86/lib/bios.c index 4285348..1d75cfc 100644 --- a/arch/x86/lib/bios.c +++ b/arch/x86/lib/bios.c @@ -207,12 +207,14 @@ static u8 vbe_get_mode_info(struct vbe_mode_info *mi)
static u8 vbe_set_mode(struct vbe_mode_info *mi) { - debug("VBE: Setting VESA mode %#04x\n", mi->video_mode); + int video_mode = mi->video_mode; + + debug("VBE: Setting VESA mode %#04x\n", video_mode); /* request linear framebuffer mode */ - mi->video_mode |= (1 << 14); + video_mode |= (1 << 14); /* don't clear the framebuffer, we do that later */ - mi->video_mode |= (1 << 15); - realmode_interrupt(0x10, VESA_SET_MODE, mi->video_mode, + video_mode |= (1 << 15); + realmode_interrupt(0x10, VESA_SET_MODE, video_mode, 0x0000, 0x0000, 0x0000, 0x0000);
return 0; @@ -236,6 +238,7 @@ static void vbe_set_graphics(int vesa_mode, struct vbe_mode_info *mode_info) return; }
+ mode_info->video_mode &= 0x3ff; vbe_set_mode(mode_info); }

The hex value is more commonly understood, so use that instead of decimal. Add a 0x prefix to avoid confusion.
Signed-off-by: Simon Glass sjg@chromium.org ---
drivers/pci/pci_rom.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/pci_rom.c b/drivers/pci/pci_rom.c index 5ba315b..7d25cc9 100644 --- a/drivers/pci/pci_rom.c +++ b/drivers/pci/pci_rom.c @@ -247,7 +247,7 @@ int pci_run_vga_bios(pci_dev_t dev, int (*int15_handler)(void), bool emulate) defined(CONFIG_FRAMEBUFFER_VESA_MODE) vesa_mode = CONFIG_FRAMEBUFFER_VESA_MODE; #endif - debug("Selected vesa mode %d\b", vesa_mode); + debug("Selected vesa mode %#x\n", vesa_mode); if (emulate) { #ifdef CONFIG_BIOSEMU BE_VGAInfo *info; @@ -275,7 +275,7 @@ int pci_run_vga_bios(pci_dev_t dev, int (*int15_handler)(void), bool emulate) return -ENOSYS; #endif } - debug("Final vesa mode %d\n", mode_info.video_mode); + debug("Final vesa mode %#x\n", mode_info.video_mode);
return 0; }

On Sun, Dec 28, 2014 at 10:20 AM, Simon Glass sjg@chromium.org wrote:
The hex value is more commonly understood, so use that instead of decimal. Add a 0x prefix to avoid confusion.
Signed-off-by: Simon Glass sjg@chromium.org
drivers/pci/pci_rom.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/pci_rom.c b/drivers/pci/pci_rom.c index 5ba315b..7d25cc9 100644 --- a/drivers/pci/pci_rom.c +++ b/drivers/pci/pci_rom.c @@ -247,7 +247,7 @@ int pci_run_vga_bios(pci_dev_t dev, int (*int15_handler)(void), bool emulate) defined(CONFIG_FRAMEBUFFER_VESA_MODE) vesa_mode = CONFIG_FRAMEBUFFER_VESA_MODE; #endif
debug("Selected vesa mode %d\b", vesa_mode);
debug("Selected vesa mode %#x\n", vesa_mode); if (emulate) {
#ifdef CONFIG_BIOSEMU BE_VGAInfo *info; @@ -275,7 +275,7 @@ int pci_run_vga_bios(pci_dev_t dev, int (*int15_handler)(void), bool emulate) return -ENOSYS; #endif }
debug("Final vesa mode %d\n", mode_info.video_mode);
debug("Final vesa mode %#x\n", mode_info.video_mode); return 0;
}
Reviewed-by: Bin Meng bmeng.cn@gmail.com

This is set up along with CAR (Cache-as-RAM) anyway. When we relocate we don't really need ROM caching (we read the VGA BIOS from ROM but that is about it)
Drop it.
Signed-off-by: Simon Glass sjg@chromium.org ---
arch/x86/cpu/ivybridge/cpu.c | 25 ------------------------- 1 file changed, 25 deletions(-)
diff --git a/arch/x86/cpu/ivybridge/cpu.c b/arch/x86/cpu/ivybridge/cpu.c index 969b07b..0543e06 100644 --- a/arch/x86/cpu/ivybridge/cpu.c +++ b/arch/x86/cpu/ivybridge/cpu.c @@ -49,27 +49,6 @@ static void enable_spi_prefetch(struct pci_controller *hose, pci_dev_t dev) pci_hose_write_config_byte(hose, dev, 0xdc, reg8); }
-static void set_var_mtrr( - unsigned reg, unsigned base, unsigned size, unsigned type) - -{ - /* Bit Bit 32-35 of MTRRphysMask should be set to 1 */ - /* FIXME: It only support 4G less range */ - wrmsr(MTRRphysBase_MSR(reg), base | type, 0); - wrmsr(MTRRphysMask_MSR(reg), ~(size - 1) | MTRRphysMaskValid, - (1 << (CONFIG_CPU_ADDR_BITS - 32)) - 1); -} - -static void enable_rom_caching(void) -{ - disable_caches(); - set_var_mtrr(1, 0xffc00000, 4 << 20, MTRR_TYPE_WRPROT); - enable_caches(); - - /* Enable Variable MTRRs */ - wrmsr(MTRRdefType_MSR, 0x800, 0); -} - static int set_flex_ratio_to_tdp_nominal(void) { msr_t flex_ratio, msr; @@ -165,10 +144,6 @@ int arch_cpu_init(void) /* This is already done in start.S, but let's do it in C */ enable_port80_on_lpc(hose, PCH_LPC_DEV);
- /* already done in car.S */ - if (false) - enable_rom_caching(); - set_spi_speed();
/*

Memory Type Range Registers are used to tell the CPU whether memory is cacheable and if so the cache write mode to use.
Clean up the existing header file to follow style, and remove the unneeded code.
These can speed up booting so should be supported. Add these to global_data so they can be requested while booting. We will apply the changes during relocation (in a later commit).
Signed-off-by: Simon Glass sjg@chromium.org ---
arch/x86/cpu/Makefile | 1 + arch/x86/cpu/coreboot/coreboot.c | 22 +++-- arch/x86/cpu/ivybridge/car.S | 12 +-- arch/x86/cpu/mtrr.c | 81 ++++++++++++++++++ arch/x86/include/asm/global_data.h | 15 ++++ arch/x86/include/asm/mtrr.h | 163 +++++++++++++++++-------------------- 6 files changed, 186 insertions(+), 108 deletions(-) create mode 100644 arch/x86/cpu/mtrr.c
diff --git a/arch/x86/cpu/Makefile b/arch/x86/cpu/Makefile index 5033d2b..62e43c0 100644 --- a/arch/x86/cpu/Makefile +++ b/arch/x86/cpu/Makefile @@ -17,5 +17,6 @@ obj-$(CONFIG_NORTHBRIDGE_INTEL_SANDYBRIDGE) += ivybridge/ obj-$(CONFIG_NORTHBRIDGE_INTEL_IVYBRIDGE) += ivybridge/ obj-$(CONFIG_INTEL_QUEENSBAY) += queensbay/ obj-y += lapic.o +obj-y += mtrr.o obj-$(CONFIG_PCI) += pci.o obj-y += turbo.o diff --git a/arch/x86/cpu/coreboot/coreboot.c b/arch/x86/cpu/coreboot/coreboot.c index cfacc05..6d06d5a 100644 --- a/arch/x86/cpu/coreboot/coreboot.c +++ b/arch/x86/cpu/coreboot/coreboot.c @@ -15,6 +15,7 @@ #include <asm/cache.h> #include <asm/cpu.h> #include <asm/io.h> +#include <asm/mtrr.h> #include <asm/arch/tables.h> #include <asm/arch/sysinfo.h> #include <asm/arch/timestamp.h> @@ -64,11 +65,6 @@ int board_eth_init(bd_t *bis) return pci_eth_init(bis); }
-#define MTRR_TYPE_WP 5 -#define MTRRcap_MSR 0xfe -#define MTRRphysBase_MSR(reg) (0x200 + 2 * (reg)) -#define MTRRphysMask_MSR(reg) (0x200 + 2 * (reg) + 1) - void board_final_cleanup(void) { /* Un-cache the ROM so the kernel has one @@ -77,15 +73,17 @@ void board_final_cleanup(void) * Coreboot should have assigned this to the * top available variable MTRR. */ - u8 top_mtrr = (native_read_msr(MTRRcap_MSR) & 0xff) - 1; - u8 top_type = native_read_msr(MTRRphysBase_MSR(top_mtrr)) & 0xff; + u8 top_mtrr = (native_read_msr(MTRR_CAP_MSR) & 0xff) - 1; + u8 top_type = native_read_msr(MTRR_PHYS_BASE_MSR(top_mtrr)) & 0xff;
/* Make sure this MTRR is the correct Write-Protected type */ - if (top_type == MTRR_TYPE_WP) { - disable_caches(); - wrmsrl(MTRRphysBase_MSR(top_mtrr), 0); - wrmsrl(MTRRphysMask_MSR(top_mtrr), 0); - enable_caches(); + if (top_type == MTRR_TYPE_WRPROT) { + struct mtrr_state state; + + mtrr_open(&state); + wrmsrl(MTRR_PHYS_BASE_MSR(top_mtrr), 0); + wrmsrl(MTRR_PHYS_MASK_MSR(top_mtrr), 0); + mtrr_close(&state); }
/* Issue SMI to Coreboot to lock down ME and registers */ diff --git a/arch/x86/cpu/ivybridge/car.S b/arch/x86/cpu/ivybridge/car.S index dca68e4..72b22ea 100644 --- a/arch/x86/cpu/ivybridge/car.S +++ b/arch/x86/cpu/ivybridge/car.S @@ -61,7 +61,7 @@ clear_mtrrs:
post_code(POST_CAR_MTRR) /* Configure the default memory type to uncacheable */ - movl $MTRRdefType_MSR, %ecx + movl $MTRR_DEF_TYPE_MSR, %ecx rdmsr andl $(~0x00000cff), %eax wrmsr @@ -76,16 +76,16 @@ clear_mtrrs: post_code(POST_CAR_BASE_ADDRESS) /* Set Cache-as-RAM mask */ movl $(MTRR_PHYS_MASK_MSR(0)), %ecx - movl $(~(CACHE_AS_RAM_SIZE - 1) | MTRRphysMaskValid), %eax + movl $(~(CACHE_AS_RAM_SIZE - 1) | MTRR_PHYS_MASK_VALID), %eax movl $CPU_PHYSMASK_HI, %edx wrmsr
post_code(POST_CAR_MASK)
/* Enable MTRR */ - movl $MTRRdefType_MSR, %ecx + movl $MTRR_DEF_TYPE_MSR, %ecx rdmsr - orl $MTRRdefTypeEn, %eax + orl $MTRR_DEF_TYPE_EN, %eax wrmsr
/* Enable cache (CR0.CD = 0, CR0.NW = 0) */ @@ -130,7 +130,7 @@ clear_mtrrs:
movl $MTRR_PHYS_MASK_MSR(1), %ecx movl $CPU_PHYSMASK_HI, %edx - movl $(~(CONFIG_XIP_ROM_SIZE - 1) | MTRRphysMaskValid), %eax + movl $(~(CONFIG_XIP_ROM_SIZE - 1) | MTRR_PHYS_MASK_VALID), %eax wrmsr
post_code(POST_CAR_ROM_CACHE) @@ -141,7 +141,7 @@ clear_mtrrs: xorl %edx, %edx wrmsr movl $MTRR_PHYS_MASK_MSR(2), %ecx - movl $(CACHE_MRC_MASK | MTRRphysMaskValid), %eax + movl $(CACHE_MRC_MASK | MTRR_PHYS_MASK_VALID), %eax movl $CPU_PHYSMASK_HI, %edx wrmsr #endif diff --git a/arch/x86/cpu/mtrr.c b/arch/x86/cpu/mtrr.c new file mode 100644 index 0000000..d5a825d1 --- /dev/null +++ b/arch/x86/cpu/mtrr.c @@ -0,0 +1,81 @@ +/* + * (C) Copyright 2014 Google, Inc + * + * SPDX-License-Identifier: GPL-2.0+ + * + * Memory Type Range Regsters - these are used to tell the CPU whether + * memory is cacheable and if so the cache write mode to use. + * + * These can speed up booting. See the mtrr command. + * + * Reference: Intel Architecture Software Developer's Manual, Volume 3: + * System Programming + */ + +#include <common.h> +#include <asm/io.h> +#include <asm/msr.h> +#include <asm/mtrr.h> + +/* Prepare to adjust MTRRs */ +void mtrr_open(struct mtrr_state *state) +{ + state->enable_cache = dcache_status(); + + if (state->enable_cache) + disable_caches(); + state->deftype = native_read_msr(MTRR_DEF_TYPE_MSR); + wrmsrl(MTRR_DEF_TYPE_MSR, state->deftype & ~MTRR_DEF_TYPE_EN); +} + +/* Clean up after adjusting MTRRs, and enable them */ +void mtrr_close(struct mtrr_state *state) +{ + wrmsrl(MTRR_DEF_TYPE_MSR, state->deftype | MTRR_DEF_TYPE_EN); + if (state->enable_cache) + enable_caches(); +} + +int mtrr_commit(bool do_caches) +{ + struct mtrr_request *req = gd->arch.mtrr_req; + struct mtrr_state state; + uint64_t mask; + int i; + + mtrr_open(&state); + for (i = 0; i < gd->arch.mtrr_req_count; i++, req++) { + mask = ~(req->size - 1); + mask &= (1ULL << CONFIG_CPU_ADDR_BITS) - 1; + wrmsrl(MTRR_PHYS_BASE_MSR(i), req->start | req->type); + wrmsrl(MTRR_PHYS_MASK_MSR(i), mask | MTRR_PHYS_MASK_VALID); + } + + /* Clear the ones that are unused */ + for (; i < MTRR_COUNT; i++) + wrmsrl(MTRR_PHYS_MASK_MSR(i), 0); + mtrr_close(&state); + + return 0; +} + +int mtrr_add_request(int type, uint64_t start, uint64_t size) +{ + struct mtrr_request *req; + uint64_t mask; + + if (gd->arch.mtrr_req_count == MAX_MTRR_REQUESTS) + return -ENOSPC; + req = &gd->arch.mtrr_req[gd->arch.mtrr_req_count++]; + req->type = type; + req->start = start; + req->size = size; + debug("%d: type=%d, %08llx %08llx\n", gd->arch.mtrr_req_count - 1, + req->type, req->start, req->size); + mask = ~(req->size - 1); + mask &= (1ULL << CONFIG_CPU_ADDR_BITS) - 1; + mask |= MTRR_PHYS_MASK_VALID; + debug(" %016llx %016llx\n", req->start | req->type, mask); + + return 0; +} diff --git a/arch/x86/include/asm/global_data.h b/arch/x86/include/asm/global_data.h index 03d491a..15e76f6 100644 --- a/arch/x86/include/asm/global_data.h +++ b/arch/x86/include/asm/global_data.h @@ -29,6 +29,19 @@ struct memory_info { struct memory_area area[CONFIG_NR_DRAM_BANKS]; };
+#define MAX_MTRR_REQUESTS 8 + +/** + * A request for a memory region to be set up in a particular way. These + * requests are processed before board_init_r() is called. They are generally + * optional and can be ignored with some performance impact. + */ +struct mtrr_request { + int type; /* MTRR_TYPE_... */ + uint64_t start; + uint64_t size; +}; + /* Architecture-specific global data */ struct arch_global_data { struct global_data *gd_addr; /* Location of Global Data */ @@ -50,6 +63,8 @@ struct arch_global_data { #ifdef CONFIG_HAVE_FSP void *hob_list; /* FSP HOB list */ #endif + struct mtrr_request mtrr_req[MAX_MTRR_REQUESTS]; + int mtrr_req_count; };
#endif diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h index 5f05a48..3c11740 100644 --- a/arch/x86/include/asm/mtrr.h +++ b/arch/x86/include/asm/mtrr.h @@ -9,99 +9,86 @@ #ifndef _ASM_MTRR_H #define _ASM_MTRR_H
-/* These are the region types */ -#define MTRR_TYPE_UNCACHEABLE 0 -#define MTRR_TYPE_WRCOMB 1 -/*#define MTRR_TYPE_ 2*/ -/*#define MTRR_TYPE_ 3*/ -#define MTRR_TYPE_WRTHROUGH 4 -#define MTRR_TYPE_WRPROT 5 -#define MTRR_TYPE_WRBACK 6 -#define MTRR_NUM_TYPES 7 - -#define MTRRcap_MSR 0x0fe -#define MTRRdefType_MSR 0x2ff - -#define MTRRdefTypeEn (1 << 11) -#define MTRRdefTypeFixEn (1 << 10) - -#define SMRRphysBase_MSR 0x1f2 -#define SMRRphysMask_MSR 0x1f3 - -#define MTRRphysBase_MSR(reg) (0x200 + 2 * (reg)) -#define MTRRphysMask_MSR(reg) (0x200 + 2 * (reg) + 1) - -#define MTRRphysMaskValid (1 << 11) - -#define NUM_FIXED_RANGES 88 -#define RANGES_PER_FIXED_MTRR 8 -#define MTRRfix64K_00000_MSR 0x250 -#define MTRRfix16K_80000_MSR 0x258 -#define MTRRfix16K_A0000_MSR 0x259 -#define MTRRfix4K_C0000_MSR 0x268 -#define MTRRfix4K_C8000_MSR 0x269 -#define MTRRfix4K_D0000_MSR 0x26a -#define MTRRfix4K_D8000_MSR 0x26b -#define MTRRfix4K_E0000_MSR 0x26c -#define MTRRfix4K_E8000_MSR 0x26d -#define MTRRfix4K_F0000_MSR 0x26e -#define MTRRfix4K_F8000_MSR 0x26f +/* MTRR region types */ +#define MTRR_TYPE_UNCACHEABLE 0 +#define MTRR_TYPE_WRCOMB 1 +#define MTRR_TYPE_WRTHROUGH 4 +#define MTRR_TYPE_WRPROT 5 +#define MTRR_TYPE_WRBACK 6 + +#define MTRR_TYPE_COUNT 7 + +#define MTRR_CAP_MSR 0x0fe +#define MTRR_DEF_TYPE_MSR 0x2ff + +#define MTRR_DEF_TYPE_EN (1 << 11) +#define MTRR_DEF_TYPE_FIX_EN (1 << 10) + +#define MTRR_PHYS_BASE_MSR(reg) (0x200 + 2 * (reg)) +#define MTRR_PHYS_MASK_MSR(reg) (0x200 + 2 * (reg) + 1) + +#define MTRR_PHYS_MASK_VALID (1 << 11) + +#define MTRR_BASE_TYPE_MASK 0x7 + +/* Number of MTRRs supported */ +#define MTRR_COUNT 8
#if !defined(__ASSEMBLER__)
-/* - * The MTRR code has some side effects that the callers should be aware for. - * 1. The call sequence matters. x86_setup_mtrrs() calls - * x86_setup_fixed_mtrrs_no_enable() then enable_fixed_mtrrs() (equivalent - * of x86_setup_fixed_mtrrs()) then x86_setup_var_mtrrs(). If the callers - * want to call the components of x86_setup_mtrrs() because of other - * rquirements the ordering should still preserved. - * 2. enable_fixed_mtrr() will enable both variable and fixed MTRRs because - * of the nature of the global MTRR enable flag. Therefore, all direct - * or indirect callers of enable_fixed_mtrr() should ensure that the - * variable MTRR MSRs do not contain bad ranges. - * 3. If CONFIG_CACHE_ROM is selected an MTRR is allocated for enabling - * the caching of the ROM. However, it is set to uncacheable (UC). It - * is the responsiblity of the caller to enable it by calling - * x86_mtrr_enable_rom_caching(). +/** + * Information about the previous MTRR state, set up by mtrr_open() + * + * @deftype: Previous value of MTRR_DEF_TYPE_MSR + * @enable_cache: true if cache was enabled */ -void x86_setup_mtrrs(void); -/* - * x86_setup_var_mtrrs() parameters: - * address_bits - number of physical address bits supported by cpu - * above4gb - 2 means dynamically detect number of variable MTRRs available. - * non-zero means handle memory ranges above 4GiB. - * 0 means ignore memory ranges above 4GiB +struct mtrr_state { + uint64_t deftype; + bool enable_cache; +}; + +/** + * mtrr_open() - Prepare to adjust MTRRs + * + * Use mtrr_open() passing in a structure - this function will init it. Then + * when done, pass the same structure to mtrr_close() to re-enable MTRRs and + * possibly the cache. + * + * @state: Empty structure to pass in to hold settings */ -void x86_setup_var_mtrrs(unsigned int address_bits, unsigned int above4gb); -void enable_fixed_mtrr(void); -void x86_setup_fixed_mtrrs(void); -/* Set up fixed MTRRs but do not enable them. */ -void x86_setup_fixed_mtrrs_no_enable(void); -int x86_mtrr_check(void); -/* ROM caching can be used after variable MTRRs are set up. Beware that - * enabling CONFIG_CACHE_ROM will eat through quite a few MTRRs based on - * one's IO hole size and WRCOMB resources. Be sure to check the console - * log when enabling CONFIG_CACHE_ROM or adding WRCOMB resources. Beware that - * on CPUs with core-scoped MTRR registers such as hyperthreaded CPUs the - * rom caching will be disabled if all threads run the MTRR code. Therefore, - * one needs to call x86_mtrr_enable_rom_caching() after all threads of the - * same core have run the MTRR code. */ -#if CONFIG_CACHE_ROM -void x86_mtrr_enable_rom_caching(void); -void x86_mtrr_disable_rom_caching(void); -/* Return the variable range MTRR index of the ROM cache. */ -long x86_mtrr_rom_cache_var_index(void); -#else -static inline void x86_mtrr_enable_rom_caching(void) {} -static inline void x86_mtrr_disable_rom_caching(void) {} -static inline long x86_mtrr_rom_cache_var_index(void) { return -1; } -#endif /* CONFIG_CACHE_ROM */ +void mtrr_open(struct mtrr_state *state);
-#endif +/** + * mtrr_open() - Clean up after adjusting MTRRs, and enable them + * + * This uses the structure containing information returned from mtrr_open(). + * + * @state: Structure from mtrr_open() + */ +/* */ +void mtrr_close(struct mtrr_state *state); + +/** + * mtrr_add_request() - Add a new MTRR request + * + * This adds a request for a memory region to be set up in a particular way. + * + * @type: Requested type (MTRR_TYPE_) + * @start: Start address + * @size: Size + */ +int mtrr_add_request(int type, uint64_t start, uint64_t size); + +/** + * mtrr_commit() - set up the MTRR registers based on current requests + * + * This sets up MTRRs for the available DRAM and the requests received so far. + * It must be called with caches disabled. + * + * @do_caches: true if caches are currently on + */ +int mtrr_commit(bool do_caches);
-#if !defined(CONFIG_RAMTOP) -# error "CONFIG_RAMTOP not defined" #endif
#if ((CONFIG_XIP_ROM_SIZE & (CONFIG_XIP_ROM_SIZE - 1)) != 0) @@ -114,8 +101,4 @@ static inline long x86_mtrr_rom_cache_var_index(void) { return -1; }
#define CACHE_ROM_BASE (((1 << 20) - (CONFIG_CACHE_ROM_SIZE >> 12)) << 12)
-#if (CONFIG_RAMTOP & (CONFIG_RAMTOP - 1)) != 0 -# error "CONFIG_RAMTOP must be a power of 2" -#endif - #endif

Hi Simon,
On Sun, Dec 28, 2014 at 10:20 AM, Simon Glass sjg@chromium.org wrote:
Memory Type Range Registers are used to tell the CPU whether memory is cacheable and if so the cache write mode to use.
Clean up the existing header file to follow style, and remove the unneeded code.
These can speed up booting so should be supported. Add these to global_data so they can be requested while booting. We will apply the changes during relocation (in a later commit).
Signed-off-by: Simon Glass sjg@chromium.org
arch/x86/cpu/Makefile | 1 + arch/x86/cpu/coreboot/coreboot.c | 22 +++-- arch/x86/cpu/ivybridge/car.S | 12 +-- arch/x86/cpu/mtrr.c | 81 ++++++++++++++++++ arch/x86/include/asm/global_data.h | 15 ++++ arch/x86/include/asm/mtrr.h | 163 +++++++++++++++++-------------------- 6 files changed, 186 insertions(+), 108 deletions(-) create mode 100644 arch/x86/cpu/mtrr.c
diff --git a/arch/x86/cpu/Makefile b/arch/x86/cpu/Makefile index 5033d2b..62e43c0 100644 --- a/arch/x86/cpu/Makefile +++ b/arch/x86/cpu/Makefile @@ -17,5 +17,6 @@ obj-$(CONFIG_NORTHBRIDGE_INTEL_SANDYBRIDGE) += ivybridge/ obj-$(CONFIG_NORTHBRIDGE_INTEL_IVYBRIDGE) += ivybridge/ obj-$(CONFIG_INTEL_QUEENSBAY) += queensbay/ obj-y += lapic.o +obj-y += mtrr.o obj-$(CONFIG_PCI) += pci.o obj-y += turbo.o diff --git a/arch/x86/cpu/coreboot/coreboot.c b/arch/x86/cpu/coreboot/coreboot.c index cfacc05..6d06d5a 100644 --- a/arch/x86/cpu/coreboot/coreboot.c +++ b/arch/x86/cpu/coreboot/coreboot.c @@ -15,6 +15,7 @@ #include <asm/cache.h> #include <asm/cpu.h> #include <asm/io.h> +#include <asm/mtrr.h> #include <asm/arch/tables.h> #include <asm/arch/sysinfo.h> #include <asm/arch/timestamp.h> @@ -64,11 +65,6 @@ int board_eth_init(bd_t *bis) return pci_eth_init(bis); }
-#define MTRR_TYPE_WP 5 -#define MTRRcap_MSR 0xfe -#define MTRRphysBase_MSR(reg) (0x200 + 2 * (reg)) -#define MTRRphysMask_MSR(reg) (0x200 + 2 * (reg) + 1)
void board_final_cleanup(void) { /* Un-cache the ROM so the kernel has one @@ -77,15 +73,17 @@ void board_final_cleanup(void) * Coreboot should have assigned this to the * top available variable MTRR. */
u8 top_mtrr = (native_read_msr(MTRRcap_MSR) & 0xff) - 1;
u8 top_type = native_read_msr(MTRRphysBase_MSR(top_mtrr)) & 0xff;
u8 top_mtrr = (native_read_msr(MTRR_CAP_MSR) & 0xff) - 1;
u8 top_type = native_read_msr(MTRR_PHYS_BASE_MSR(top_mtrr)) & 0xff; /* Make sure this MTRR is the correct Write-Protected type */
if (top_type == MTRR_TYPE_WP) {
disable_caches();
wrmsrl(MTRRphysBase_MSR(top_mtrr), 0);
wrmsrl(MTRRphysMask_MSR(top_mtrr), 0);
enable_caches();
if (top_type == MTRR_TYPE_WRPROT) {
struct mtrr_state state;
mtrr_open(&state);
wrmsrl(MTRR_PHYS_BASE_MSR(top_mtrr), 0);
wrmsrl(MTRR_PHYS_MASK_MSR(top_mtrr), 0);
mtrr_close(&state); } /* Issue SMI to Coreboot to lock down ME and registers */
diff --git a/arch/x86/cpu/ivybridge/car.S b/arch/x86/cpu/ivybridge/car.S index dca68e4..72b22ea 100644 --- a/arch/x86/cpu/ivybridge/car.S +++ b/arch/x86/cpu/ivybridge/car.S @@ -61,7 +61,7 @@ clear_mtrrs:
post_code(POST_CAR_MTRR) /* Configure the default memory type to uncacheable */
movl $MTRRdefType_MSR, %ecx
movl $MTRR_DEF_TYPE_MSR, %ecx rdmsr andl $(~0x00000cff), %eax wrmsr
@@ -76,16 +76,16 @@ clear_mtrrs: post_code(POST_CAR_BASE_ADDRESS) /* Set Cache-as-RAM mask */ movl $(MTRR_PHYS_MASK_MSR(0)), %ecx
movl $(~(CACHE_AS_RAM_SIZE - 1) | MTRRphysMaskValid), %eax
movl $(~(CACHE_AS_RAM_SIZE - 1) | MTRR_PHYS_MASK_VALID), %eax movl $CPU_PHYSMASK_HI, %edx wrmsr post_code(POST_CAR_MASK) /* Enable MTRR */
movl $MTRRdefType_MSR, %ecx
movl $MTRR_DEF_TYPE_MSR, %ecx rdmsr
orl $MTRRdefTypeEn, %eax
orl $MTRR_DEF_TYPE_EN, %eax wrmsr /* Enable cache (CR0.CD = 0, CR0.NW = 0) */
@@ -130,7 +130,7 @@ clear_mtrrs:
movl $MTRR_PHYS_MASK_MSR(1), %ecx movl $CPU_PHYSMASK_HI, %edx
movl $(~(CONFIG_XIP_ROM_SIZE - 1) | MTRRphysMaskValid), %eax
movl $(~(CONFIG_XIP_ROM_SIZE - 1) | MTRR_PHYS_MASK_VALID), %eax wrmsr post_code(POST_CAR_ROM_CACHE)
@@ -141,7 +141,7 @@ clear_mtrrs: xorl %edx, %edx wrmsr movl $MTRR_PHYS_MASK_MSR(2), %ecx
movl $(CACHE_MRC_MASK | MTRRphysMaskValid), %eax
movl $(CACHE_MRC_MASK | MTRR_PHYS_MASK_VALID), %eax movl $CPU_PHYSMASK_HI, %edx wrmsr
#endif diff --git a/arch/x86/cpu/mtrr.c b/arch/x86/cpu/mtrr.c new file mode 100644 index 0000000..d5a825d1 --- /dev/null +++ b/arch/x86/cpu/mtrr.c @@ -0,0 +1,81 @@ +/*
- (C) Copyright 2014 Google, Inc
- SPDX-License-Identifier: GPL-2.0+
- Memory Type Range Regsters - these are used to tell the CPU whether
- memory is cacheable and if so the cache write mode to use.
- These can speed up booting. See the mtrr command.
- Reference: Intel Architecture Software Developer's Manual, Volume 3:
- System Programming
- */
+#include <common.h> +#include <asm/io.h> +#include <asm/msr.h> +#include <asm/mtrr.h>
+/* Prepare to adjust MTRRs */ +void mtrr_open(struct mtrr_state *state) +{
state->enable_cache = dcache_status();
if (state->enable_cache)
disable_caches();
state->deftype = native_read_msr(MTRR_DEF_TYPE_MSR);
wrmsrl(MTRR_DEF_TYPE_MSR, state->deftype & ~MTRR_DEF_TYPE_EN);
+}
+/* Clean up after adjusting MTRRs, and enable them */ +void mtrr_close(struct mtrr_state *state) +{
wrmsrl(MTRR_DEF_TYPE_MSR, state->deftype | MTRR_DEF_TYPE_EN);
if (state->enable_cache)
enable_caches();
+}
+int mtrr_commit(bool do_caches) +{
struct mtrr_request *req = gd->arch.mtrr_req;
struct mtrr_state state;
uint64_t mask;
int i;
mtrr_open(&state);
for (i = 0; i < gd->arch.mtrr_req_count; i++, req++) {
mask = ~(req->size - 1);
mask &= (1ULL << CONFIG_CPU_ADDR_BITS) - 1;
wrmsrl(MTRR_PHYS_BASE_MSR(i), req->start | req->type);
wrmsrl(MTRR_PHYS_MASK_MSR(i), mask | MTRR_PHYS_MASK_VALID);
}
/* Clear the ones that are unused */
for (; i < MTRR_COUNT; i++)
wrmsrl(MTRR_PHYS_MASK_MSR(i), 0);
mtrr_close(&state);
return 0;
+}
+int mtrr_add_request(int type, uint64_t start, uint64_t size) +{
struct mtrr_request *req;
uint64_t mask;
if (gd->arch.mtrr_req_count == MAX_MTRR_REQUESTS)
return -ENOSPC;
req = &gd->arch.mtrr_req[gd->arch.mtrr_req_count++];
req->type = type;
req->start = start;
req->size = size;
debug("%d: type=%d, %08llx %08llx\n", gd->arch.mtrr_req_count - 1,
req->type, req->start, req->size);
mask = ~(req->size - 1);
mask &= (1ULL << CONFIG_CPU_ADDR_BITS) - 1;
mask |= MTRR_PHYS_MASK_VALID;
debug(" %016llx %016llx\n", req->start | req->type, mask);
return 0;
+} diff --git a/arch/x86/include/asm/global_data.h b/arch/x86/include/asm/global_data.h index 03d491a..15e76f6 100644 --- a/arch/x86/include/asm/global_data.h +++ b/arch/x86/include/asm/global_data.h @@ -29,6 +29,19 @@ struct memory_info { struct memory_area area[CONFIG_NR_DRAM_BANKS]; };
+#define MAX_MTRR_REQUESTS 8
+/**
- A request for a memory region to be set up in a particular way. These
- requests are processed before board_init_r() is called. They are generally
- optional and can be ignored with some performance impact.
- */
+struct mtrr_request {
int type; /* MTRR_TYPE_... */
uint64_t start;
uint64_t size;
+};
/* Architecture-specific global data */ struct arch_global_data { struct global_data *gd_addr; /* Location of Global Data */ @@ -50,6 +63,8 @@ struct arch_global_data { #ifdef CONFIG_HAVE_FSP void *hob_list; /* FSP HOB list */ #endif
struct mtrr_request mtrr_req[MAX_MTRR_REQUESTS];
int mtrr_req_count;
};
#endif diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h index 5f05a48..3c11740 100644 --- a/arch/x86/include/asm/mtrr.h +++ b/arch/x86/include/asm/mtrr.h @@ -9,99 +9,86 @@ #ifndef _ASM_MTRR_H #define _ASM_MTRR_H
-/* These are the region types */ -#define MTRR_TYPE_UNCACHEABLE 0 -#define MTRR_TYPE_WRCOMB 1 -/*#define MTRR_TYPE_ 2*/ -/*#define MTRR_TYPE_ 3*/ -#define MTRR_TYPE_WRTHROUGH 4 -#define MTRR_TYPE_WRPROT 5 -#define MTRR_TYPE_WRBACK 6 -#define MTRR_NUM_TYPES 7
-#define MTRRcap_MSR 0x0fe -#define MTRRdefType_MSR 0x2ff
-#define MTRRdefTypeEn (1 << 11) -#define MTRRdefTypeFixEn (1 << 10)
-#define SMRRphysBase_MSR 0x1f2 -#define SMRRphysMask_MSR 0x1f3
-#define MTRRphysBase_MSR(reg) (0x200 + 2 * (reg)) -#define MTRRphysMask_MSR(reg) (0x200 + 2 * (reg) + 1)
-#define MTRRphysMaskValid (1 << 11)
-#define NUM_FIXED_RANGES 88 -#define RANGES_PER_FIXED_MTRR 8 -#define MTRRfix64K_00000_MSR 0x250 -#define MTRRfix16K_80000_MSR 0x258 -#define MTRRfix16K_A0000_MSR 0x259 -#define MTRRfix4K_C0000_MSR 0x268 -#define MTRRfix4K_C8000_MSR 0x269 -#define MTRRfix4K_D0000_MSR 0x26a -#define MTRRfix4K_D8000_MSR 0x26b -#define MTRRfix4K_E0000_MSR 0x26c -#define MTRRfix4K_E8000_MSR 0x26d -#define MTRRfix4K_F0000_MSR 0x26e -#define MTRRfix4K_F8000_MSR 0x26f +/* MTRR region types */ +#define MTRR_TYPE_UNCACHEABLE 0 +#define MTRR_TYPE_WRCOMB 1 +#define MTRR_TYPE_WRTHROUGH 4 +#define MTRR_TYPE_WRPROT 5 +#define MTRR_TYPE_WRBACK 6
+#define MTRR_TYPE_COUNT 7
+#define MTRR_CAP_MSR 0x0fe +#define MTRR_DEF_TYPE_MSR 0x2ff
+#define MTRR_DEF_TYPE_EN (1 << 11) +#define MTRR_DEF_TYPE_FIX_EN (1 << 10)
+#define MTRR_PHYS_BASE_MSR(reg) (0x200 + 2 * (reg)) +#define MTRR_PHYS_MASK_MSR(reg) (0x200 + 2 * (reg) + 1)
+#define MTRR_PHYS_MASK_VALID (1 << 11)
+#define MTRR_BASE_TYPE_MASK 0x7
+/* Number of MTRRs supported */ +#define MTRR_COUNT 8
#if !defined(__ASSEMBLER__)
-/*
- The MTRR code has some side effects that the callers should be aware for.
- The call sequence matters. x86_setup_mtrrs() calls
- x86_setup_fixed_mtrrs_no_enable() then enable_fixed_mtrrs() (equivalent
- of x86_setup_fixed_mtrrs()) then x86_setup_var_mtrrs(). If the callers
- want to call the components of x86_setup_mtrrs() because of other
- rquirements the ordering should still preserved.
- enable_fixed_mtrr() will enable both variable and fixed MTRRs because
- of the nature of the global MTRR enable flag. Therefore, all direct
- or indirect callers of enable_fixed_mtrr() should ensure that the
- variable MTRR MSRs do not contain bad ranges.
- If CONFIG_CACHE_ROM is selected an MTRR is allocated for enabling
- the caching of the ROM. However, it is set to uncacheable (UC). It
- is the responsiblity of the caller to enable it by calling
- x86_mtrr_enable_rom_caching().
+/**
- Information about the previous MTRR state, set up by mtrr_open()
- @deftype: Previous value of MTRR_DEF_TYPE_MSR
*/
- @enable_cache: true if cache was enabled
-void x86_setup_mtrrs(void); -/*
- x86_setup_var_mtrrs() parameters:
- address_bits - number of physical address bits supported by cpu
- above4gb - 2 means dynamically detect number of variable MTRRs available.
non-zero means handle memory ranges above 4GiB.
0 means ignore memory ranges above 4GiB
+struct mtrr_state {
uint64_t deftype;
bool enable_cache;
+};
+/**
- mtrr_open() - Prepare to adjust MTRRs
- Use mtrr_open() passing in a structure - this function will init it. Then
- when done, pass the same structure to mtrr_close() to re-enable MTRRs and
- possibly the cache.
*/
- @state: Empty structure to pass in to hold settings
-void x86_setup_var_mtrrs(unsigned int address_bits, unsigned int above4gb); -void enable_fixed_mtrr(void); -void x86_setup_fixed_mtrrs(void); -/* Set up fixed MTRRs but do not enable them. */ -void x86_setup_fixed_mtrrs_no_enable(void); -int x86_mtrr_check(void); -/* ROM caching can be used after variable MTRRs are set up. Beware that
- enabling CONFIG_CACHE_ROM will eat through quite a few MTRRs based on
- one's IO hole size and WRCOMB resources. Be sure to check the console
- log when enabling CONFIG_CACHE_ROM or adding WRCOMB resources. Beware that
- on CPUs with core-scoped MTRR registers such as hyperthreaded CPUs the
- rom caching will be disabled if all threads run the MTRR code. Therefore,
- one needs to call x86_mtrr_enable_rom_caching() after all threads of the
- same core have run the MTRR code. */
-#if CONFIG_CACHE_ROM -void x86_mtrr_enable_rom_caching(void); -void x86_mtrr_disable_rom_caching(void); -/* Return the variable range MTRR index of the ROM cache. */ -long x86_mtrr_rom_cache_var_index(void); -#else -static inline void x86_mtrr_enable_rom_caching(void) {} -static inline void x86_mtrr_disable_rom_caching(void) {} -static inline long x86_mtrr_rom_cache_var_index(void) { return -1; } -#endif /* CONFIG_CACHE_ROM */ +void mtrr_open(struct mtrr_state *state);
-#endif +/**
- mtrr_open() - Clean up after adjusting MTRRs, and enable them
- This uses the structure containing information returned from mtrr_open().
- @state: Structure from mtrr_open()
- */
+/* */ +void mtrr_close(struct mtrr_state *state);
+/**
- mtrr_add_request() - Add a new MTRR request
- This adds a request for a memory region to be set up in a particular way.
- @type: Requested type (MTRR_TYPE_)
- @start: Start address
- @size: Size
- */
+int mtrr_add_request(int type, uint64_t start, uint64_t size);
+/**
- mtrr_commit() - set up the MTRR registers based on current requests
- This sets up MTRRs for the available DRAM and the requests received so far.
- It must be called with caches disabled.
- @do_caches: true if caches are currently on
- */
+int mtrr_commit(bool do_caches);
-#if !defined(CONFIG_RAMTOP) -# error "CONFIG_RAMTOP not defined" #endif
Can we move the removal of CONFIG_RAMTOP to the patch#2 in this series?
#if ((CONFIG_XIP_ROM_SIZE & (CONFIG_XIP_ROM_SIZE - 1)) != 0) @@ -114,8 +101,4 @@ static inline long x86_mtrr_rom_cache_var_index(void) { return -1; }
#define CACHE_ROM_BASE (((1 << 20) - (CONFIG_CACHE_ROM_SIZE >> 12)) << 12)
-#if (CONFIG_RAMTOP & (CONFIG_RAMTOP - 1)) != 0 -# error "CONFIG_RAMTOP must be a power of 2" -#endif
Ditto.
#endif
Regards, Bin

Set the frame buffer to write-combining. This makes it faster, although for scrolling write-through is even faster for U-Boot.
Signed-off-by: Simon Glass sjg@chromium.org ---
arch/x86/cpu/ivybridge/gma.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/arch/x86/cpu/ivybridge/gma.c b/arch/x86/cpu/ivybridge/gma.c index 125021b..03b2ebd 100644 --- a/arch/x86/cpu/ivybridge/gma.c +++ b/arch/x86/cpu/ivybridge/gma.c @@ -12,6 +12,7 @@ #include <fdtdec.h> #include <pci_rom.h> #include <asm/io.h> +#include <asm/mtrr.h> #include <asm/pci.h> #include <asm/arch/pch.h> #include <asm/arch/sandybridge.h> @@ -738,6 +739,8 @@ int gma_func0_init(pci_dev_t dev, struct pci_controller *hose, const void *blob, int node) { void *gtt_bar; + ulong start; + ulong base; u32 reg32; int ret;
@@ -746,6 +749,11 @@ int gma_func0_init(pci_dev_t dev, struct pci_controller *hose, reg32 |= PCI_COMMAND_MASTER | PCI_COMMAND_MEMORY | PCI_COMMAND_IO; pci_write_config32(dev, PCI_COMMAND, reg32);
+ /* Use write-combining for the graphics memory, 256MB */ + base = pci_read_bar32(hose, dev, 2); + mtrr_add_request(MTRR_TYPE_WRCOMB, base, 256 << 20); + mtrr_commit(true); + gtt_bar = (void *)pci_read_bar32(pci_bus_to_hose(0), dev, 0); debug("GT bar %p\n", gtt_bar); ret = gma_pm_init_pre_vbios(gtt_bar);

Hi Simon,
On Sun, Dec 28, 2014 at 10:20 AM, Simon Glass sjg@chromium.org wrote:
Set the frame buffer to write-combining. This makes it faster, although for scrolling write-through is even faster for U-Boot.
Signed-off-by: Simon Glass sjg@chromium.org
arch/x86/cpu/ivybridge/gma.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/arch/x86/cpu/ivybridge/gma.c b/arch/x86/cpu/ivybridge/gma.c index 125021b..03b2ebd 100644 --- a/arch/x86/cpu/ivybridge/gma.c +++ b/arch/x86/cpu/ivybridge/gma.c @@ -12,6 +12,7 @@ #include <fdtdec.h> #include <pci_rom.h> #include <asm/io.h> +#include <asm/mtrr.h> #include <asm/pci.h> #include <asm/arch/pch.h> #include <asm/arch/sandybridge.h> @@ -738,6 +739,8 @@ int gma_func0_init(pci_dev_t dev, struct pci_controller *hose, const void *blob, int node) { void *gtt_bar;
ulong start;
ulong base; u32 reg32; int ret;
@@ -746,6 +749,11 @@ int gma_func0_init(pci_dev_t dev, struct pci_controller *hose, reg32 |= PCI_COMMAND_MASTER | PCI_COMMAND_MEMORY | PCI_COMMAND_IO; pci_write_config32(dev, PCI_COMMAND, reg32);
/* Use write-combining for the graphics memory, 256MB */
base = pci_read_bar32(hose, dev, 2);
mtrr_add_request(MTRR_TYPE_WRCOMB, base, 256 << 20);
To make the codes more generic, should we decode the memory size via bar instead of hardcoded 256MB?
mtrr_commit(true);
gtt_bar = (void *)pci_read_bar32(pci_bus_to_hose(0), dev, 0); debug("GT bar %p\n", gtt_bar); ret = gma_pm_init_pre_vbios(gtt_bar);
--
Regards, Bin

For bare platforms we turn off ROM-caching before calling board_init_f_r() It is then very slow to copy U-Boot from ROM to RAM. So adjust the order so that the copying happens before we turn off ROM-caching.
Signed-off-by: Simon Glass sjg@chromium.org ---
common/board_f.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/common/board_f.c b/common/board_f.c index 98c9c72..1b65575 100644 --- a/common/board_f.c +++ b/common/board_f.c @@ -983,6 +983,11 @@ static init_fnc_t init_sequence_f[] = { INIT_FUNC_WATCHDOG_RESET reloc_fdt, setup_reloc, +#ifdef CONFIG_X86 + copy_uboot_to_ram, + clear_bss, + do_elf_reloc_fixups, +#endif #if !defined(CONFIG_ARM) && !defined(CONFIG_SANDBOX) jump_to_copy, #endif @@ -1042,9 +1047,6 @@ void board_init_f(ulong boot_flags) */ static init_fnc_t init_sequence_f_r[] = { init_cache_f_r, - copy_uboot_to_ram, - clear_bss, - do_elf_reloc_fixups,
NULL, };

Hi Simon,
On Sun, Dec 28, 2014 at 10:20 AM, Simon Glass sjg@chromium.org wrote:
For bare platforms we turn off ROM-caching before calling board_init_f_r() It is then very slow to copy U-Boot from ROM to RAM. So adjust the order so that the copying happens before we turn off ROM-caching.
Signed-off-by: Simon Glass sjg@chromium.org
common/board_f.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/common/board_f.c b/common/board_f.c index 98c9c72..1b65575 100644 --- a/common/board_f.c +++ b/common/board_f.c @@ -983,6 +983,11 @@ static init_fnc_t init_sequence_f[] = { INIT_FUNC_WATCHDOG_RESET reloc_fdt, setup_reloc, +#ifdef CONFIG_X86
copy_uboot_to_ram,
clear_bss,
do_elf_reloc_fixups,
+#endif #if !defined(CONFIG_ARM) && !defined(CONFIG_SANDBOX) jump_to_copy, #endif @@ -1042,9 +1047,6 @@ void board_init_f(ulong boot_flags) */ static init_fnc_t init_sequence_f_r[] = { init_cache_f_r,
copy_uboot_to_ram,
clear_bss,
do_elf_reloc_fixups, NULL,
};
I don't understand. Isn't the init_cache_f_r() which turns on the cache?
Regards, Bin

Hi Bin,
On 30 December 2014 at 22:51, Bin Meng bmeng.cn@gmail.com wrote:
Hi Simon,
On Sun, Dec 28, 2014 at 10:20 AM, Simon Glass sjg@chromium.org wrote:
For bare platforms we turn off ROM-caching before calling board_init_f_r() It is then very slow to copy U-Boot from ROM to RAM. So adjust the order so that the copying happens before we turn off ROM-caching.
Signed-off-by: Simon Glass sjg@chromium.org
common/board_f.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/common/board_f.c b/common/board_f.c index 98c9c72..1b65575 100644 --- a/common/board_f.c +++ b/common/board_f.c @@ -983,6 +983,11 @@ static init_fnc_t init_sequence_f[] = { INIT_FUNC_WATCHDOG_RESET reloc_fdt, setup_reloc, +#ifdef CONFIG_X86
copy_uboot_to_ram,
clear_bss,
do_elf_reloc_fixups,
+#endif #if !defined(CONFIG_ARM) && !defined(CONFIG_SANDBOX) jump_to_copy, #endif @@ -1042,9 +1047,6 @@ void board_init_f(ulong boot_flags) */ static init_fnc_t init_sequence_f_r[] = { init_cache_f_r,
copy_uboot_to_ram,
clear_bss,
do_elf_reloc_fixups, NULL,
};
I don't understand. Isn't the init_cache_f_r() which turns on the cache?
Yes it turns on the cache, but turns off the ROM cache (they are different things). So this ordering is much faster. I think I might have more tuning to do though.
Regards, Simon

Hi Simon,
On Fri, Jan 2, 2015 at 6:23 AM, Simon Glass sjg@chromium.org wrote:
Hi Bin,
On 30 December 2014 at 22:51, Bin Meng bmeng.cn@gmail.com wrote:
Hi Simon,
On Sun, Dec 28, 2014 at 10:20 AM, Simon Glass sjg@chromium.org wrote:
For bare platforms we turn off ROM-caching before calling board_init_f_r() It is then very slow to copy U-Boot from ROM to RAM. So adjust the order so that the copying happens before we turn off ROM-caching.
Signed-off-by: Simon Glass sjg@chromium.org
common/board_f.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/common/board_f.c b/common/board_f.c index 98c9c72..1b65575 100644 --- a/common/board_f.c +++ b/common/board_f.c @@ -983,6 +983,11 @@ static init_fnc_t init_sequence_f[] = { INIT_FUNC_WATCHDOG_RESET reloc_fdt, setup_reloc, +#ifdef CONFIG_X86
copy_uboot_to_ram,
clear_bss,
do_elf_reloc_fixups,
+#endif #if !defined(CONFIG_ARM) && !defined(CONFIG_SANDBOX) jump_to_copy, #endif @@ -1042,9 +1047,6 @@ void board_init_f(ulong boot_flags) */ static init_fnc_t init_sequence_f_r[] = { init_cache_f_r,
copy_uboot_to_ram,
clear_bss,
do_elf_reloc_fixups, NULL,
};
I don't understand. Isn't the init_cache_f_r() which turns on the cache?
Yes it turns on the cache, but turns off the ROM cache (they are different things). So this ordering is much faster. I think I might have more tuning to do though.
Got it. Since we moved these 3 routines from init_sequence_f_r[] to init_sequence_f[], how about we remove the whole init_sequence_f_r[] completely? If not possible, the comment block above init_sequence_f_r[] needs to be fixed?
* * The '_f_r' sequence must, as a minimum, copy U-Boot to RAM (if * supported). It _should_, if possible, copy global data to RAM and * initialise the CPU caches (to speed up the relocation process) * * NOTE: At present only x86 uses this route, but it is intended that * all archs will move to this when generic relocation is implemented. */
So looks that we should either drop this _f_r stage, or make other arch use this _f_r?
Regards, Bin

Hi Bin,
On 3 January 2015 at 20:26, Bin Meng bmeng.cn@gmail.com wrote:
Hi Simon,
On Fri, Jan 2, 2015 at 6:23 AM, Simon Glass sjg@chromium.org wrote:
Hi Bin,
On 30 December 2014 at 22:51, Bin Meng bmeng.cn@gmail.com wrote:
Hi Simon,
On Sun, Dec 28, 2014 at 10:20 AM, Simon Glass sjg@chromium.org wrote:
For bare platforms we turn off ROM-caching before calling board_init_f_r() It is then very slow to copy U-Boot from ROM to RAM. So adjust the order so that the copying happens before we turn off ROM-caching.
Signed-off-by: Simon Glass sjg@chromium.org
common/board_f.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/common/board_f.c b/common/board_f.c index 98c9c72..1b65575 100644 --- a/common/board_f.c +++ b/common/board_f.c @@ -983,6 +983,11 @@ static init_fnc_t init_sequence_f[] = { INIT_FUNC_WATCHDOG_RESET reloc_fdt, setup_reloc, +#ifdef CONFIG_X86
copy_uboot_to_ram,
clear_bss,
do_elf_reloc_fixups,
+#endif #if !defined(CONFIG_ARM) && !defined(CONFIG_SANDBOX) jump_to_copy, #endif @@ -1042,9 +1047,6 @@ void board_init_f(ulong boot_flags) */ static init_fnc_t init_sequence_f_r[] = { init_cache_f_r,
copy_uboot_to_ram,
clear_bss,
do_elf_reloc_fixups, NULL,
};
I don't understand. Isn't the init_cache_f_r() which turns on the cache?
Yes it turns on the cache, but turns off the ROM cache (they are different things). So this ordering is much faster. I think I might have more tuning to do though.
Got it. Since we moved these 3 routines from init_sequence_f_r[] to init_sequence_f[], how about we remove the whole init_sequence_f_r[] completely? If not possible, the comment block above init_sequence_f_r[] needs to be fixed?
I'll remove the comment.
- The '_f_r' sequence must, as a minimum, copy U-Boot to RAM (if
- supported). It _should_, if possible, copy global data to RAM and
- initialise the CPU caches (to speed up the relocation process)
- NOTE: At present only x86 uses this route, but it is intended that
- all archs will move to this when generic relocation is implemented.
*/
So looks that we should either drop this _f_r stage, or make other arch use this _f_r?
I think we should move to generic relocation, meaning that all archs do relocation the same way with the same code. Then only arch-specific stuff is the then ELF fixup function, which adjusts a relocation site, and the code to actually change the stack pointer.
This board_init_f_r() code is part of one attempt to do this - the premise was that turning the cache on before relocating U-Boot will save time. That's true, but it would be even better to turn the cache on much earlier. With pit (Chromebook 2) we turn on the cache in SPL. So I think turning it on here is too late if we are trying to save time. Still it is a good start and if we could make it happen generally it would be nice.
See here for my original attempt, which was tied up with generic board. In the end I untied them and we got one but not the other.
http://lists.denx.de/pipermail/u-boot/2012-February/118409.html
Since then Albert has tidied up ARM start.S a lot which makes this much easier.
So that's the background. One of these days I might take another look at it if it doesn't get someone's attention.
Regards, Simon

Hi Simon,
On Mon, Jan 5, 2015 at 9:44 AM, Simon Glass sjg@chromium.org wrote:
Hi Bin,
On 3 January 2015 at 20:26, Bin Meng bmeng.cn@gmail.com wrote:
Hi Simon,
On Fri, Jan 2, 2015 at 6:23 AM, Simon Glass sjg@chromium.org wrote:
Hi Bin,
On 30 December 2014 at 22:51, Bin Meng bmeng.cn@gmail.com wrote:
Hi Simon,
On Sun, Dec 28, 2014 at 10:20 AM, Simon Glass sjg@chromium.org wrote:
For bare platforms we turn off ROM-caching before calling board_init_f_r() It is then very slow to copy U-Boot from ROM to RAM. So adjust the order so that the copying happens before we turn off ROM-caching.
Signed-off-by: Simon Glass sjg@chromium.org
common/board_f.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/common/board_f.c b/common/board_f.c index 98c9c72..1b65575 100644 --- a/common/board_f.c +++ b/common/board_f.c @@ -983,6 +983,11 @@ static init_fnc_t init_sequence_f[] = { INIT_FUNC_WATCHDOG_RESET reloc_fdt, setup_reloc, +#ifdef CONFIG_X86
copy_uboot_to_ram,
clear_bss,
do_elf_reloc_fixups,
+#endif #if !defined(CONFIG_ARM) && !defined(CONFIG_SANDBOX) jump_to_copy, #endif @@ -1042,9 +1047,6 @@ void board_init_f(ulong boot_flags) */ static init_fnc_t init_sequence_f_r[] = { init_cache_f_r,
copy_uboot_to_ram,
clear_bss,
do_elf_reloc_fixups, NULL,
};
I don't understand. Isn't the init_cache_f_r() which turns on the cache?
Yes it turns on the cache, but turns off the ROM cache (they are different things). So this ordering is much faster. I think I might have more tuning to do though.
Got it. Since we moved these 3 routines from init_sequence_f_r[] to init_sequence_f[], how about we remove the whole init_sequence_f_r[] completely? If not possible, the comment block above init_sequence_f_r[] needs to be fixed?
I'll remove the comment.
- The '_f_r' sequence must, as a minimum, copy U-Boot to RAM (if
- supported). It _should_, if possible, copy global data to RAM and
- initialise the CPU caches (to speed up the relocation process)
- NOTE: At present only x86 uses this route, but it is intended that
- all archs will move to this when generic relocation is implemented.
*/
So looks that we should either drop this _f_r stage, or make other arch use this _f_r?
I think we should move to generic relocation, meaning that all archs do relocation the same way with the same code. Then only arch-specific stuff is the then ELF fixup function, which adjusts a relocation site, and the code to actually change the stack pointer.
This board_init_f_r() code is part of one attempt to do this - the premise was that turning the cache on before relocating U-Boot will save time. That's true, but it would be even better to turn the cache on much earlier. With pit (Chromebook 2) we turn on the cache in SPL. So I think turning it on here is too late if we are trying to save time. Still it is a good start and if we could make it happen generally it would be nice.
See here for my original attempt, which was tied up with generic board. In the end I untied them and we got one but not the other.
http://lists.denx.de/pipermail/u-boot/2012-February/118409.html
Since then Albert has tidied up ARM start.S a lot which makes this much easier.
So that's the background. One of these days I might take another look at it if it doesn't get someone's attention.
Regards, Simon
Thanks for the background information. I will take a look. Hope we can achieve generic board support as soon as we can.
Regards, Bin

Hi Simon & Bin,
On Tue, Jan 6, 2015 at 12:40 AM, Bin Meng bmeng.cn@gmail.com wrote:
Hi Simon,
On Mon, Jan 5, 2015 at 9:44 AM, Simon Glass sjg@chromium.org wrote:
Hi Bin,
On 3 January 2015 at 20:26, Bin Meng bmeng.cn@gmail.com wrote:
Hi Simon,
On Fri, Jan 2, 2015 at 6:23 AM, Simon Glass sjg@chromium.org wrote:
Hi Bin,
On 30 December 2014 at 22:51, Bin Meng bmeng.cn@gmail.com wrote:
Hi Simon,
On Sun, Dec 28, 2014 at 10:20 AM, Simon Glass sjg@chromium.org
wrote:
For bare platforms we turn off ROM-caching before calling
board_init_f_r()
It is then very slow to copy U-Boot from ROM to RAM. So adjust the
order so
that the copying happens before we turn off ROM-caching.
Signed-off-by: Simon Glass sjg@chromium.org
common/board_f.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/common/board_f.c b/common/board_f.c index 98c9c72..1b65575 100644 --- a/common/board_f.c +++ b/common/board_f.c @@ -983,6 +983,11 @@ static init_fnc_t init_sequence_f[] = { INIT_FUNC_WATCHDOG_RESET reloc_fdt, setup_reloc, +#ifdef CONFIG_X86
copy_uboot_to_ram,
clear_bss,
do_elf_reloc_fixups,
+#endif #if !defined(CONFIG_ARM) && !defined(CONFIG_SANDBOX) jump_to_copy, #endif @@ -1042,9 +1047,6 @@ void board_init_f(ulong boot_flags) */ static init_fnc_t init_sequence_f_r[] = { init_cache_f_r,
copy_uboot_to_ram,
clear_bss,
do_elf_reloc_fixups, NULL,
};
Wow, doesn't this bring back some memories. I've had a look over this code as it is now and it took a while to sink in. Things have moved on in the past 2 years :)
I don't understand. Isn't the init_cache_f_r() which turns on the
cache?
Yes it turns on the cache, but turns off the ROM cache (they are different things). So this ordering is much faster. I think I might have more tuning to do though.
Got it. Since we moved these 3 routines from init_sequence_f_r[] to init_sequence_f[], how about we remove the whole init_sequence_f_r[] completely? If not possible, the comment block above init_sequence_f_r[] needs to be fixed?
I'll remove the comment.
I think init_sequence_f_r can go... but I need to have a better look at the code
If turning off the ROM cache by init_cache_f_r is the problem, then perhaps the following order would be better:
copy_uboot_to_ram, init_cache_f_r, clear_bss, do_elf_reloc_fixups,
Without enabling the CPU's cache, clear_bss and do_elf_reloc_fixups will suffer.
- The '_f_r' sequence must, as a minimum, copy U-Boot to RAM (if
- supported). It _should_, if possible, copy global data to RAM and
- initialise the CPU caches (to speed up the relocation process)
- NOTE: At present only x86 uses this route, but it is intended that
- all archs will move to this when generic relocation is implemented.
*/
So looks that we should either drop this _f_r stage, or make other arch use this _f_r?
I think we should move to generic relocation, meaning that all archs do relocation the same way with the same code. Then only arch-specific stuff is the then ELF fixup function, which adjusts a relocation site, and the code to actually change the stack pointer.
This was always my plan - have arch specific do_reloc_fixups and the rest would be generic
This board_init_f_r() code is part of one attempt to do this - the premise was that turning the cache on before relocating U-Boot will save time. That's true, but it would be even better to turn the cache on much earlier. With pit (Chromebook 2) we turn on the cache in SPL. So I think turning it on here is too late if we are trying to save time. Still it is a good start and if we could make it happen generally it would be nice.
And now you have me thinking board_init_f_r is not needed at all...
If we can setup the stack, clear BSS, copy U-Boot to RAM and perform relocation fixups while running from ROM, what is left for board_init_f_r to do?
The only thing I can think of is the caveats mentioned in the comment ('static' variables are read-only / Global Data (gd->xxx) is read/write). But aren't these true when running from ROM?
Looking at non-x86 arches, relocate_code() looks to do what copy_uboot_to_ram and clear_bss does in x86 land (not sure about do_elf_reloc_fixups - seems to be arch specific as to whether relocate_code() does this or it is done later (hence the CONFIG_NEEDS_MANUAL_RELOC #define?)
See here for my original attempt, which was tied up with generic board. In the end I untied them and we got one but not the other.
http://lists.denx.de/pipermail/u-boot/2012-February/118409.html
Ah, generic relocation... I really wish my life hadn't taken a hard-left turn when it did. We got so close.
From where I'm looking (fresh eyes - I might be missing something big) we
should be able to do the ROM->RAM copy, BSS clearing, and relocation fixups in board_init_f.
When we return from board_init_f it should be a fairly simple matter of: - Copying the contents of the Global Data structure from CAR to RAM (or from RAM to RAM if your on a platform which initialises RAM before U-Boot) - Set the gd pointer (reserved register) to point to the new copy - Figure out where board_init_r is and jump to it
board_init_r should be able to do any remaining cache tweaks. If cache tweaks cannot be done while executing from RAM then they need to be done in board_init_f
I just cannot see the point of board_init_f_r any more
Since then Albert has tidied up ARM start.S a lot which makes this much
easier.
So that's the background. One of these days I might take another look at it if it doesn't get someone's attention.
Oh dear - it looks like I just put my hand up :)
Regards, Simon
Thanks for the background information. I will take a look. Hope we can achieve generic board support as soon as we can.
Regards, Bin
Regards,
Graeme

Hi,
On 5 January 2015 at 10:22, Graeme Russ gruss@tss-engineering.com wrote:
Hi Simon & Bin,
On Tue, Jan 6, 2015 at 12:40 AM, Bin Meng bmeng.cn@gmail.com wrote:
Hi Simon,
On Mon, Jan 5, 2015 at 9:44 AM, Simon Glass sjg@chromium.org wrote:
Hi Bin,
On 3 January 2015 at 20:26, Bin Meng bmeng.cn@gmail.com wrote:
Hi Simon,
On Fri, Jan 2, 2015 at 6:23 AM, Simon Glass sjg@chromium.org wrote:
Hi Bin,
On 30 December 2014 at 22:51, Bin Meng bmeng.cn@gmail.com wrote:
Hi Simon,
On Sun, Dec 28, 2014 at 10:20 AM, Simon Glass sjg@chromium.org wrote: > For bare platforms we turn off ROM-caching before calling > board_init_f_r() > It is then very slow to copy U-Boot from ROM to RAM. So adjust the > order so > that the copying happens before we turn off ROM-caching. > > Signed-off-by: Simon Glass sjg@chromium.org > --- > > common/board_f.c | 8 +++++--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/common/board_f.c b/common/board_f.c > index 98c9c72..1b65575 100644 > --- a/common/board_f.c > +++ b/common/board_f.c > @@ -983,6 +983,11 @@ static init_fnc_t init_sequence_f[] = { > INIT_FUNC_WATCHDOG_RESET > reloc_fdt, > setup_reloc, > +#ifdef CONFIG_X86 > + copy_uboot_to_ram, > + clear_bss, > + do_elf_reloc_fixups, > +#endif > #if !defined(CONFIG_ARM) && !defined(CONFIG_SANDBOX) > jump_to_copy, > #endif > @@ -1042,9 +1047,6 @@ void board_init_f(ulong boot_flags) > */ > static init_fnc_t init_sequence_f_r[] = { > init_cache_f_r, > - copy_uboot_to_ram, > - clear_bss, > - do_elf_reloc_fixups, > > NULL, > }; > --
Wow, doesn't this bring back some memories. I've had a look over this code as it is now and it took a while to sink in. Things have moved on in the past 2 years :)
Nice to hear from you again!
I don't understand. Isn't the init_cache_f_r() which turns on the cache?
Yes it turns on the cache, but turns off the ROM cache (they are different things). So this ordering is much faster. I think I might have more tuning to do though.
Got it. Since we moved these 3 routines from init_sequence_f_r[] to init_sequence_f[], how about we remove the whole init_sequence_f_r[] completely? If not possible, the comment block above init_sequence_f_r[] needs to be fixed?
I'll remove the comment.
I think init_sequence_f_r can go... but I need to have a better look at the code
If turning off the ROM cache by init_cache_f_r is the problem, then perhaps the following order would be better:
copy_uboot_to_ram, init_cache_f_r, clear_bss, do_elf_reloc_fixups,
Without enabling the CPU's cache, clear_bss and do_elf_reloc_fixups will suffer.
Better would be to have init_cache_f_r after all this I think.
- The '_f_r' sequence must, as a minimum, copy U-Boot to RAM (if
- supported). It _should_, if possible, copy global data to RAM and
- initialise the CPU caches (to speed up the relocation process)
- NOTE: At present only x86 uses this route, but it is intended that
- all archs will move to this when generic relocation is implemented.
*/
So looks that we should either drop this _f_r stage, or make other arch use this _f_r?
I think we should move to generic relocation, meaning that all archs do relocation the same way with the same code. Then only arch-specific stuff is the then ELF fixup function, which adjusts a relocation site, and the code to actually change the stack pointer.
This was always my plan - have arch specific do_reloc_fixups and the rest would be generic
This board_init_f_r() code is part of one attempt to do this - the premise was that turning the cache on before relocating U-Boot will save time. That's true, but it would be even better to turn the cache on much earlier. With pit (Chromebook 2) we turn on the cache in SPL. So I think turning it on here is too late if we are trying to save time. Still it is a good start and if we could make it happen generally it would be nice.
And now you have me thinking board_init_f_r is not needed at all...
If we can setup the stack, clear BSS, copy U-Boot to RAM and perform relocation fixups while running from ROM, what is left for board_init_f_r to do?
The only thing I can think of is the caveats mentioned in the comment ('static' variables are read-only / Global Data (gd->xxx) is read/write). But aren't these true when running from ROM?
Looking at non-x86 arches, relocate_code() looks to do what copy_uboot_to_ram and clear_bss does in x86 land (not sure about do_elf_reloc_fixups - seems to be arch specific as to whether relocate_code() does this or it is done later (hence the CONFIG_NEEDS_MANUAL_RELOC #define?)
Yes this can be unified. There is still something in there though for board_init_f_r(), at least as things are now. It just so happens on x86 that we are running from ROM and CAR so as I understand it we sort-of have the cache on before relocation. That doesn't apply for coreboot though, so there is probably an optimisation to be made somewhere.
See here for my original attempt, which was tied up with generic board. In the end I untied them and we got one but not the other.
http://lists.denx.de/pipermail/u-boot/2012-February/118409.html
Ah, generic relocation... I really wish my life hadn't taken a hard-left turn when it did. We got so close.
From where I'm looking (fresh eyes - I might be missing something big) we should be able to do the ROM->RAM copy, BSS clearing, and relocation fixups in board_init_f.
When we return from board_init_f it should be a fairly simple matter of:
- Copying the contents of the Global Data structure from CAR to RAM (or
from RAM to RAM if your on a platform which initialises RAM before U-Boot)
- Set the gd pointer (reserved register) to point to the new copy
- Figure out where board_init_r is and jump to it
board_init_r should be able to do any remaining cache tweaks. If cache tweaks cannot be done while executing from RAM then they need to be done in board_init_f
I just cannot see the point of board_init_f_r any more
Yes, it's hard to see, I'm not sure either.
Anyway I'm going to apply this patch while we figure out what further work can be done.
Regards, Simon

Hi Simon,
On 06/01/15 04:33, Simon Glass wrote:
Nice to hear from you again!
Thanks - Good to be back in the game
> I don't understand. Isn't the init_cache_f_r() which turns on the > cache? >
Yes it turns on the cache, but turns off the ROM cache (they are different things). So this ordering is much faster. I think I might have more tuning to do though.
Got it. Since we moved these 3 routines from init_sequence_f_r[] to init_sequence_f[], how about we remove the whole init_sequence_f_r[] completely? If not possible, the comment block above init_sequence_f_r[] needs to be fixed?
I'll remove the comment.
I think init_sequence_f_r can go... but I need to have a better look at the code
If turning off the ROM cache by init_cache_f_r is the problem, then perhaps the following order would be better:
copy_uboot_to_ram, init_cache_f_r, clear_bss, do_elf_reloc_fixups,
Without enabling the CPU's cache, clear_bss and do_elf_reloc_fixups will suffer.
Better would be to have init_cache_f_r after all this I think.
Why so? After copying the U-Boot code from ROM to RAM, we would want RAM based operations to be as fast as possible. By this point, we have passed most of the XIP code, so XIP performance _should_ be a non-issue.
This board_init_f_r() code is part of one attempt to do this - the premise was that turning the cache on before relocating U-Boot will save time. That's true, but it would be even better to turn the cache on much earlier. With pit (Chromebook 2) we turn on the cache in SPL. So I think turning it on here is too late if we are trying to save time. Still it is a good start and if we could make it happen generally it would be nice.
And now you have me thinking board_init_f_r is not needed at all...
If we can setup the stack, clear BSS, copy U-Boot to RAM and perform relocation fixups while running from ROM, what is left for board_init_f_r to do?
The only thing I can think of is the caveats mentioned in the comment ('static' variables are read-only / Global Data (gd->xxx) is read/write). But aren't these true when running from ROM?
Looking at non-x86 arches, relocate_code() looks to do what copy_uboot_to_ram and clear_bss does in x86 land (not sure about do_elf_reloc_fixups - seems to be arch specific as to whether relocate_code() does this or it is done later (hence the CONFIG_NEEDS_MANUAL_RELOC #define?)
Yes this can be unified. There is still something in there though for board_init_f_r(), at least as things are now. It just so happens on x86 that we are running from ROM and CAR so as I understand it we sort-of have the cache on before relocation. That doesn't apply for coreboot though, so there is probably an optimisation to be made somewhere.
U-Boot runs from RAM right from the start in coreboot correct? In which case, coreboot builds can probably do away with all the relocation code I assume coreboot can relocate U-Boot to any place in memory. Last time I looked, coreboot had some ELF processing code.
See here for my original attempt, which was tied up with generic board. In the end I untied them and we got one but not the other.
http://lists.denx.de/pipermail/u-boot/2012-February/118409.html
Ah, generic relocation... I really wish my life hadn't taken a hard-left turn when it did. We got so close.
From where I'm looking (fresh eyes - I might be missing something big) we should be able to do the ROM->RAM copy, BSS clearing, and relocation fixups in board_init_f.
When we return from board_init_f it should be a fairly simple matter of:
- Copying the contents of the Global Data structure from CAR to RAM (or
from RAM to RAM if your on a platform which initialises RAM before U-Boot)
- Set the gd pointer (reserved register) to point to the new copy
- Figure out where board_init_r is and jump to it
board_init_r should be able to do any remaining cache tweaks. If cache tweaks cannot be done while executing from RAM then they need to be done in board_init_f
I just cannot see the point of board_init_f_r any more
Yes, it's hard to see, I'm not sure either.
Anyway I'm going to apply this patch while we figure out what further work can be done.
Sounds good to me.
The pity is that I don't have ANY hardware to test any of this any more, and it looks like any development I do in the foreseeable future will be on ARM. So hacking and testing x86 might be a bit of a problem
Regards,
Graeme

Hi Graeme,
On 5 January 2015 at 10:46, Graeme Russ graeme.russ@gmail.com wrote:
Hi Simon,
On 06/01/15 04:33, Simon Glass wrote:
Nice to hear from you again!
Thanks - Good to be back in the game
>> I don't understand. Isn't the init_cache_f_r() which turns on the >> cache? >> > > Yes it turns on the cache, but turns off the ROM cache (they are > different things). So this ordering is much faster. I think I might > have more tuning to do though. >
Got it. Since we moved these 3 routines from init_sequence_f_r[] to init_sequence_f[], how about we remove the whole init_sequence_f_r[] completely? If not possible, the comment block above init_sequence_f_r[] needs to be fixed?
I'll remove the comment.
I think init_sequence_f_r can go... but I need to have a better look at the code
If turning off the ROM cache by init_cache_f_r is the problem, then perhaps the following order would be better:
copy_uboot_to_ram, init_cache_f_r, clear_bss, do_elf_reloc_fixups,
Without enabling the CPU's cache, clear_bss and do_elf_reloc_fixups will suffer.
Better would be to have init_cache_f_r after all this I think.
Why so? After copying the U-Boot code from ROM to RAM, we would want RAM based operations to be as fast as possible. By this point, we have passed most of the XIP code, so XIP performance _should_ be a non-issue.
Well I figure that the code is still running from ROM, and with the ROM cache off this is almost exquisitely slow. On the other hand we don't access the same DRAM location twice when clearing BSS and fixing up relocations.
That said, I have not actually benchmarked this.
This board_init_f_r() code is part of one attempt to do this - the premise was that turning the cache on before relocating U-Boot will save time. That's true, but it would be even better to turn the cache on much earlier. With pit (Chromebook 2) we turn on the cache in SPL. So I think turning it on here is too late if we are trying to save time. Still it is a good start and if we could make it happen generally it would be nice.
And now you have me thinking board_init_f_r is not needed at all...
If we can setup the stack, clear BSS, copy U-Boot to RAM and perform relocation fixups while running from ROM, what is left for board_init_f_r to do?
The only thing I can think of is the caveats mentioned in the comment ('static' variables are read-only / Global Data (gd->xxx) is read/write). But aren't these true when running from ROM?
Looking at non-x86 arches, relocate_code() looks to do what copy_uboot_to_ram and clear_bss does in x86 land (not sure about do_elf_reloc_fixups - seems to be arch specific as to whether relocate_code() does this or it is done later (hence the CONFIG_NEEDS_MANUAL_RELOC #define?)
Yes this can be unified. There is still something in there though for board_init_f_r(), at least as things are now. It just so happens on x86 that we are running from ROM and CAR so as I understand it we sort-of have the cache on before relocation. That doesn't apply for coreboot though, so there is probably an optimisation to be made somewhere.
U-Boot runs from RAM right from the start in coreboot correct? In which case, coreboot builds can probably do away with all the relocation code I assume coreboot can relocate U-Boot to any place in memory. Last time I looked, coreboot had some ELF processing code.
Sure but U-Boot expects to relocate, and wants to decide where to put itself (e.g. allowing room for other things). We use the 'flat binary' method for Coreboot, so U-Boot is not actually an ELF image.
I think there are some changes that could be made to speed things up with Coreboot (e.g. leaving the cache on).
See here for my original attempt, which was tied up with generic board. In the end I untied them and we got one but not the other.
http://lists.denx.de/pipermail/u-boot/2012-February/118409.html
Ah, generic relocation... I really wish my life hadn't taken a hard-left turn when it did. We got so close.
From where I'm looking (fresh eyes - I might be missing something big) we should be able to do the ROM->RAM copy, BSS clearing, and relocation fixups in board_init_f.
When we return from board_init_f it should be a fairly simple matter of:
- Copying the contents of the Global Data structure from CAR to RAM (or
from RAM to RAM if your on a platform which initialises RAM before U-Boot)
- Set the gd pointer (reserved register) to point to the new copy
- Figure out where board_init_r is and jump to it
board_init_r should be able to do any remaining cache tweaks. If cache tweaks cannot be done while executing from RAM then they need to be done in board_init_f
I just cannot see the point of board_init_f_r any more
Yes, it's hard to see, I'm not sure either.
Anyway I'm going to apply this patch while we figure out what further work can be done.
Sounds good to me.
The pity is that I don't have ANY hardware to test any of this any more, and it looks like any development I do in the foreseeable future will be on ARM. So hacking and testing x86 might be a bit of a problem
ARM is a bit saner anyway :-) The binary blob thing on x86 is painful. I still do most stuff on ARM. But it's been interesting for me learning about x86 again after all these years.
Regards, Simon

We should use MTRRs to speed up execution. Add a list of MTRR requests which will dealt with when we relocate and run from RAM.
We set RAM as cacheable (with write-back) and registers as non-cacheable.
Signed-off-by: Simon Glass sjg@chromium.org ---
arch/x86/cpu/ivybridge/sdram.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/arch/x86/cpu/ivybridge/sdram.c b/arch/x86/cpu/ivybridge/sdram.c index b95e781..9504735 100644 --- a/arch/x86/cpu/ivybridge/sdram.c +++ b/arch/x86/cpu/ivybridge/sdram.c @@ -17,6 +17,7 @@ #include <asm/processor.h> #include <asm/gpio.h> #include <asm/global_data.h> +#include <asm/mtrr.h> #include <asm/pci.h> #include <asm/arch/me.h> #include <asm/arch/pei_data.h> @@ -430,6 +431,15 @@ static int sdram_find(pci_dev_t dev) add_memory_area(info, (2 << 28) + (2 << 20), 4 << 28); add_memory_area(info, (4 << 28) + (2 << 20), tseg_base); add_memory_area(info, 1ULL << 32, touud); + + /* Add MTRRs for memory */ + mtrr_add_request(MTRR_TYPE_WRBACK, 0, 2ULL << 30); + mtrr_add_request(MTRR_TYPE_WRBACK, 2ULL << 30, 512 << 20); + mtrr_add_request(MTRR_TYPE_WRBACK, 0xaULL << 28, 256 << 20); + mtrr_add_request(MTRR_TYPE_UNCACHEABLE, tseg_base, 16 << 20); + mtrr_add_request(MTRR_TYPE_UNCACHEABLE, tseg_base + (16 << 20), + 32 << 20); + /* * If >= 4GB installed then memory from TOLUD to 4GB * is remapped above TOM, TOUUD will account for both

Once we stop running from ROM we should set up the MTTRs to speed up execution. This is only needed for platforms that don't have an FSP. Also in the Coreboot case, the MTRRs are set up for us.
Signed-off-by: Simon Glass sjg@chromium.org ---
arch/x86/lib/init_helpers.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/arch/x86/lib/init_helpers.c b/arch/x86/lib/init_helpers.c index be4eb12..fc211d9 100644 --- a/arch/x86/lib/init_helpers.c +++ b/arch/x86/lib/init_helpers.c @@ -7,6 +7,7 @@ #include <common.h> #include <fdtdec.h> #include <spi.h> +#include <asm/mtrr.h> #include <asm/sections.h>
DECLARE_GLOBAL_DATA_PTR; @@ -66,6 +67,13 @@ int calculate_relocation_address(void)
int init_cache_f_r(void) { +#if defined(CONFIG_X86_RESET_VECTOR) & !defined(CONFIG_HAVE_FSP) + int ret; + + ret = mtrr_commit(false); + if (ret) + return ret; +#endif /* Initialise the CPU cache(s) */ return init_cache(); }

Cache-as-RAM should be turned off when we relocate since we want to run from RAM. Add a function to perform this task.
Signed-off-by: Simon Glass sjg@chromium.org ---
arch/x86/cpu/ivybridge/car.S | 52 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 52 insertions(+)
diff --git a/arch/x86/cpu/ivybridge/car.S b/arch/x86/cpu/ivybridge/car.S index 72b22ea..6e7e1e4 100644 --- a/arch/x86/cpu/ivybridge/car.S +++ b/arch/x86/cpu/ivybridge/car.S @@ -163,6 +163,58 @@ wait_for_sipi: /* return */ jmp car_init_ret
+.globl car_uninit +car_uninit: + /* Disable cache */ + movl %cr0, %eax + orl $X86_CR0_CD, %eax + movl %eax, %cr0 + + /* Disable MTRRs */ + movl $MTRR_DEF_TYPE_MSR, %ecx + rdmsr + andl $(~MTRR_DEF_TYPE_EN), %eax + wrmsr + + /* Disable the no-eviction run state */ + movl NOEVICTMOD_MSR, %ecx + rdmsr + andl $~2, %eax + wrmsr + + invd + + /* Disable the no-eviction mode */ + rdmsr + andl $~1, %eax + wrmsr + +#ifdef CONFIG_CACHE_MRC_BIN + /* Clear the MTRR that was used to cache MRC */ + xorl %eax, %eax + xorl %edx, %edx + movl $MTRR_PHYS_BASE_MSR(2), %ecx + wrmsr + movl $MTRR_PHYS_MASK_MSR(2), %ecx + wrmsr +#endif + + /* Enable MTRRs */ + movl $MTRR_DEF_TYPE_MSR, %ecx + rdmsr + orl $MTRR_DEF_TYPE_EN, %eax + wrmsr + + invd + + /* Return to the caller */ + jmp *%ebx + +.Lhlt: + post_code(0xee) + hlt + jmp .Lhlt + mtrr_table: /* Fixed MTRRs */ .word 0x250, 0x258, 0x259

Hi Simon,
On Sun, Dec 28, 2014 at 10:20 AM, Simon Glass sjg@chromium.org wrote:
Cache-as-RAM should be turned off when we relocate since we want to run from RAM. Add a function to perform this task.
Signed-off-by: Simon Glass sjg@chromium.org
arch/x86/cpu/ivybridge/car.S | 52 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 52 insertions(+)
diff --git a/arch/x86/cpu/ivybridge/car.S b/arch/x86/cpu/ivybridge/car.S index 72b22ea..6e7e1e4 100644 --- a/arch/x86/cpu/ivybridge/car.S +++ b/arch/x86/cpu/ivybridge/car.S @@ -163,6 +163,58 @@ wait_for_sipi: /* return */ jmp car_init_ret
+.globl car_uninit +car_uninit:
/* Disable cache */
movl %cr0, %eax
orl $X86_CR0_CD, %eax
movl %eax, %cr0
/* Disable MTRRs */
movl $MTRR_DEF_TYPE_MSR, %ecx
rdmsr
andl $(~MTRR_DEF_TYPE_EN), %eax
wrmsr
/* Disable the no-eviction run state */
movl NOEVICTMOD_MSR, %ecx
rdmsr
andl $~2, %eax
wrmsr
invd
/* Disable the no-eviction mode */
rdmsr
andl $~1, %eax
wrmsr
+#ifdef CONFIG_CACHE_MRC_BIN
/* Clear the MTRR that was used to cache MRC */
xorl %eax, %eax
xorl %edx, %edx
movl $MTRR_PHYS_BASE_MSR(2), %ecx
wrmsr
movl $MTRR_PHYS_MASK_MSR(2), %ecx
wrmsr
+#endif
/* Enable MTRRs */
movl $MTRR_DEF_TYPE_MSR, %ecx
rdmsr
orl $MTRR_DEF_TYPE_EN, %eax
wrmsr
invd
/* Return to the caller */
jmp *%ebx
+.Lhlt:
post_code(0xee)
hlt
jmp .Lhlt
I don't see any codes could jump to this 4 lines above. Remove it?
mtrr_table: /* Fixed MTRRs */ .word 0x250, 0x258, 0x259 --
Regards, Bin

For platforms with CAR we should disable it before relocation. Check if this function is available and call it if so.
Signed-off-by: Simon Glass sjg@chromium.org ---
arch/x86/cpu/start.S | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/arch/x86/cpu/start.S b/arch/x86/cpu/start.S index 125782c..8cebde1 100644 --- a/arch/x86/cpu/start.S +++ b/arch/x86/cpu/start.S @@ -205,6 +205,16 @@ board_init_f_r_trampoline: /* Setup global descriptor table so gd->xyz works */ call setup_gdt
+ /* Set if we need to disable CAR */ + movl $car_uninit, %eax + cmpl $0, %eax + jz car_ret + + /* Pass return address in ebx */ +.weak car_uninit + movl $car_ret, %ebx + jmp car_uninit +car_ret: /* Re-enter U-Boot by calling board_init_f_r */ call board_init_f_r

Hi Simon,
On Sun, Dec 28, 2014 at 10:20 AM, Simon Glass sjg@chromium.org wrote:
For platforms with CAR we should disable it before relocation. Check if this function is available and call it if so.
Signed-off-by: Simon Glass sjg@chromium.org
arch/x86/cpu/start.S | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/arch/x86/cpu/start.S b/arch/x86/cpu/start.S index 125782c..8cebde1 100644 --- a/arch/x86/cpu/start.S +++ b/arch/x86/cpu/start.S @@ -205,6 +205,16 @@ board_init_f_r_trampoline: /* Setup global descriptor table so gd->xyz works */ call setup_gdt
/* Set if we need to disable CAR */
movl $car_uninit, %eax
cmpl $0, %eax
jz car_ret
/* Pass return address in ebx */
+.weak car_uninit
movl $car_ret, %ebx
jmp car_uninit
Can we use 'call' here instead of jmp and %ebx as the return address?
+car_ret:
car_uninit_ret
/* Re-enter U-Boot by calling board_init_f_r */ call board_init_f_r
--
Regards, Bin

Hi Bin,
On 30 December 2014 at 23:21, Bin Meng bmeng.cn@gmail.com wrote:
Hi Simon,
On Sun, Dec 28, 2014 at 10:20 AM, Simon Glass sjg@chromium.org wrote:
For platforms with CAR we should disable it before relocation. Check if this function is available and call it if so.
Signed-off-by: Simon Glass sjg@chromium.org
arch/x86/cpu/start.S | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/arch/x86/cpu/start.S b/arch/x86/cpu/start.S index 125782c..8cebde1 100644 --- a/arch/x86/cpu/start.S +++ b/arch/x86/cpu/start.S @@ -205,6 +205,16 @@ board_init_f_r_trampoline: /* Setup global descriptor table so gd->xyz works */ call setup_gdt
/* Set if we need to disable CAR */
movl $car_uninit, %eax
cmpl $0, %eax
jz car_ret
/* Pass return address in ebx */
+.weak car_uninit
movl $car_ret, %ebx
jmp car_uninit
Can we use 'call' here instead of jmp and %ebx as the return address?
Yes let's do that. The stack must be working so we might as well use it.
+car_ret:
car_uninit_ret
OK
/* Re-enter U-Boot by calling board_init_f_r */ call board_init_f_r
Regards, Simon

At present the normal update (which happens much later) does not work. This seems to have something to do with the 'no eviction' mode in the CAR, or at least moving the microcode update after that causes it not to work.
For now, do an update early on so that it definitely works. Also refuse to continue unless the microcode update check (later in boot) is successful.
Signed-off-by: Simon Glass sjg@chromium.org ---
arch/x86/cpu/ivybridge/car.S | 14 ++++++++++++++ arch/x86/cpu/ivybridge/cpu.c | 2 +- arch/x86/cpu/ivybridge/microcode_intel.c | 9 +++++++-- arch/x86/dts/link.dts | 3 --- 4 files changed, 22 insertions(+), 6 deletions(-)
diff --git a/arch/x86/cpu/ivybridge/car.S b/arch/x86/cpu/ivybridge/car.S index 6e7e1e4..95da087 100644 --- a/arch/x86/cpu/ivybridge/car.S +++ b/arch/x86/cpu/ivybridge/car.S @@ -45,6 +45,14 @@ car_init: movl $0xFEE00300, %esi movl %eax, (%esi)
+ /* TODO: Load microcode later - the 'no eviction' mode breaks this */ + movl $0x79, %ecx + xorl %edx, %edx + movl $_dt_ucode_base_size, %eax + movl (%eax), %eax + addl $0x30, %eax + wrmsr + post_code(POST_CAR_SIPI) /* Zero out all fixed range and variable range MTRRs */ movl $mtrr_table, %esi @@ -228,3 +236,9 @@ mtrr_table: .word 0x20C, 0x20D, 0x20E, 0x20F .word 0x210, 0x211, 0x212, 0x213 mtrr_table_end: + + .align 4 +_dt_ucode_base_size: + /* These next two fields are filled in by ifdtool */ + .long 0 /* microcode base */ + .long 0 /* microcode size */ diff --git a/arch/x86/cpu/ivybridge/cpu.c b/arch/x86/cpu/ivybridge/cpu.c index 0543e06..e925310 100644 --- a/arch/x86/cpu/ivybridge/cpu.c +++ b/arch/x86/cpu/ivybridge/cpu.c @@ -263,7 +263,7 @@ int print_cpuinfo(void) enable_lapic();
ret = microcode_update_intel(); - if (ret && ret != -ENOENT && ret != -EEXIST) + if (ret) return ret;
/* Enable upper 128bytes of CMOS */ diff --git a/arch/x86/cpu/ivybridge/microcode_intel.c b/arch/x86/cpu/ivybridge/microcode_intel.c index 0817751..c012820 100644 --- a/arch/x86/cpu/ivybridge/microcode_intel.c +++ b/arch/x86/cpu/ivybridge/microcode_intel.c @@ -120,6 +120,7 @@ int microcode_update_intel(void) int count; int node; int ret; + int rev;
microcode_read_cpu(&cpu); node = 0; @@ -147,12 +148,16 @@ int microcode_update_intel(void) skipped++; continue; } - ret = microcode_read_rev(); wrmsr(0x79, (ulong)update.data, 0); + rev = microcode_read_rev(); debug("microcode: updated to revision 0x%x date=%04x-%02x-%02x\n", - microcode_read_rev(), update.date_code & 0xffff, + rev, update.date_code & 0xffff, (update.date_code >> 24) & 0xff, (update.date_code >> 16) & 0xff); + if (update.update_revision != rev) { + printf("Microcode update failed\n"); + return -EFAULT; + } count++; } while (1); } diff --git a/arch/x86/dts/link.dts b/arch/x86/dts/link.dts index a739080..1ebc334 100644 --- a/arch/x86/dts/link.dts +++ b/arch/x86/dts/link.dts @@ -214,9 +214,6 @@
microcode { update@0 { -#include "microcode/m12206a7_00000029.dtsi" - }; - update@1 { #include "microcode/m12306a9_0000001b.dtsi" }; };

Hi Simon,
On Sun, Dec 28, 2014 at 10:20 AM, Simon Glass sjg@chromium.org wrote:
At present the normal update (which happens much later) does not work. This seems to have something to do with the 'no eviction' mode in the CAR, or at least moving the microcode update after that causes it not to work.
For now, do an update early on so that it definitely works. Also refuse to continue unless the microcode update check (later in boot) is successful.
Signed-off-by: Simon Glass sjg@chromium.org
arch/x86/cpu/ivybridge/car.S | 14 ++++++++++++++ arch/x86/cpu/ivybridge/cpu.c | 2 +- arch/x86/cpu/ivybridge/microcode_intel.c | 9 +++++++-- arch/x86/dts/link.dts | 3 --- 4 files changed, 22 insertions(+), 6 deletions(-)
diff --git a/arch/x86/cpu/ivybridge/car.S b/arch/x86/cpu/ivybridge/car.S index 6e7e1e4..95da087 100644 --- a/arch/x86/cpu/ivybridge/car.S +++ b/arch/x86/cpu/ivybridge/car.S @@ -45,6 +45,14 @@ car_init: movl $0xFEE00300, %esi movl %eax, (%esi)
/* TODO: Load microcode later - the 'no eviction' mode breaks this */
movl $0x79, %ecx
Replace 0x79 to MSR_IA32_UCODE_WRITE from msr-index.h
xorl %edx, %edx
movl $_dt_ucode_base_size, %eax
movl (%eax), %eax
addl $0x30, %eax
And here 0x30 to something like MICROCODE_HEADER_LEN.
wrmsr
post_code(POST_CAR_SIPI) /* Zero out all fixed range and variable range MTRRs */ movl $mtrr_table, %esi
@@ -228,3 +236,9 @@ mtrr_table: .word 0x20C, 0x20D, 0x20E, 0x20F .word 0x210, 0x211, 0x212, 0x213 mtrr_table_end:
.align 4
+_dt_ucode_base_size:
/* These next two fields are filled in by ifdtool */
.long 0 /* microcode base */
.long 0 /* microcode size */
diff --git a/arch/x86/cpu/ivybridge/cpu.c b/arch/x86/cpu/ivybridge/cpu.c index 0543e06..e925310 100644 --- a/arch/x86/cpu/ivybridge/cpu.c +++ b/arch/x86/cpu/ivybridge/cpu.c @@ -263,7 +263,7 @@ int print_cpuinfo(void) enable_lapic();
ret = microcode_update_intel();
Since we already did the microcode update in car_init, why should we do this again here?
if (ret && ret != -ENOENT && ret != -EEXIST)
if (ret) return ret; /* Enable upper 128bytes of CMOS */
diff --git a/arch/x86/cpu/ivybridge/microcode_intel.c b/arch/x86/cpu/ivybridge/microcode_intel.c index 0817751..c012820 100644 --- a/arch/x86/cpu/ivybridge/microcode_intel.c +++ b/arch/x86/cpu/ivybridge/microcode_intel.c @@ -120,6 +120,7 @@ int microcode_update_intel(void) int count; int node; int ret;
int rev; microcode_read_cpu(&cpu); node = 0;
@@ -147,12 +148,16 @@ int microcode_update_intel(void) skipped++; continue; }
ret = microcode_read_rev(); wrmsr(0x79, (ulong)update.data, 0);
rev = microcode_read_rev(); debug("microcode: updated to revision 0x%x date=%04x-%02x-%02x\n",
microcode_read_rev(), update.date_code & 0xffff,
rev, update.date_code & 0xffff, (update.date_code >> 24) & 0xff, (update.date_code >> 16) & 0xff);
if (update.update_revision != rev) {
printf("Microcode update failed\n");
return -EFAULT;
} count++; } while (1);
} diff --git a/arch/x86/dts/link.dts b/arch/x86/dts/link.dts index a739080..1ebc334 100644 --- a/arch/x86/dts/link.dts +++ b/arch/x86/dts/link.dts @@ -214,9 +214,6 @@
microcode { update@0 {
-#include "microcode/m12206a7_00000029.dtsi"
};
update@1 {
#include "microcode/m12306a9_0000001b.dtsi" }; }; --
Regards, Bin

On Wed, Dec 31, 2014 at 2:45 PM, Bin Meng bmeng.cn@gmail.com wrote:
Hi Simon,
On Sun, Dec 28, 2014 at 10:20 AM, Simon Glass sjg@chromium.org wrote:
At present the normal update (which happens much later) does not work. This seems to have something to do with the 'no eviction' mode in the CAR, or at least moving the microcode update after that causes it not to work.
For now, do an update early on so that it definitely works. Also refuse to continue unless the microcode update check (later in boot) is successful.
Signed-off-by: Simon Glass sjg@chromium.org
arch/x86/cpu/ivybridge/car.S | 14 ++++++++++++++ arch/x86/cpu/ivybridge/cpu.c | 2 +- arch/x86/cpu/ivybridge/microcode_intel.c | 9 +++++++-- arch/x86/dts/link.dts | 3 --- 4 files changed, 22 insertions(+), 6 deletions(-)
diff --git a/arch/x86/cpu/ivybridge/car.S b/arch/x86/cpu/ivybridge/car.S index 6e7e1e4..95da087 100644 --- a/arch/x86/cpu/ivybridge/car.S +++ b/arch/x86/cpu/ivybridge/car.S @@ -45,6 +45,14 @@ car_init: movl $0xFEE00300, %esi movl %eax, (%esi)
/* TODO: Load microcode later - the 'no eviction' mode breaks this */
movl $0x79, %ecx
Replace 0x79 to MSR_IA32_UCODE_WRITE from msr-index.h
xorl %edx, %edx
movl $_dt_ucode_base_size, %eax
movl (%eax), %eax
addl $0x30, %eax
And here 0x30 to something like MICROCODE_HEADER_LEN.
I realized UCODE_HEADER_LEN might be better. At least shorter than microcode :-), and is consistent with the MSR_IA32_UCODE_WRITE.
[snip]
Regards, Bin

Hi Bin,
On 30 December 2014 at 23:45, Bin Meng bmeng.cn@gmail.com wrote:
Hi Simon,
On Sun, Dec 28, 2014 at 10:20 AM, Simon Glass sjg@chromium.org wrote:
At present the normal update (which happens much later) does not work. This seems to have something to do with the 'no eviction' mode in the CAR, or at least moving the microcode update after that causes it not to work.
For now, do an update early on so that it definitely works. Also refuse to continue unless the microcode update check (later in boot) is successful.
Signed-off-by: Simon Glass sjg@chromium.org
arch/x86/cpu/ivybridge/car.S | 14 ++++++++++++++ arch/x86/cpu/ivybridge/cpu.c | 2 +- arch/x86/cpu/ivybridge/microcode_intel.c | 9 +++++++-- arch/x86/dts/link.dts | 3 --- 4 files changed, 22 insertions(+), 6 deletions(-)
diff --git a/arch/x86/cpu/ivybridge/car.S b/arch/x86/cpu/ivybridge/car.S index 6e7e1e4..95da087 100644 --- a/arch/x86/cpu/ivybridge/car.S +++ b/arch/x86/cpu/ivybridge/car.S @@ -45,6 +45,14 @@ car_init: movl $0xFEE00300, %esi movl %eax, (%esi)
/* TODO: Load microcode later - the 'no eviction' mode breaks this */
movl $0x79, %ecx
Replace 0x79 to MSR_IA32_UCODE_WRITE from msr-index.h
xorl %edx, %edx
movl $_dt_ucode_base_size, %eax
movl (%eax), %eax
addl $0x30, %eax
And here 0x30 to something like MICROCODE_HEADER_LEN.
wrmsr
post_code(POST_CAR_SIPI) /* Zero out all fixed range and variable range MTRRs */ movl $mtrr_table, %esi
@@ -228,3 +236,9 @@ mtrr_table: .word 0x20C, 0x20D, 0x20E, 0x20F .word 0x210, 0x211, 0x212, 0x213 mtrr_table_end:
.align 4
+_dt_ucode_base_size:
/* These next two fields are filled in by ifdtool */
.long 0 /* microcode base */
.long 0 /* microcode size */
diff --git a/arch/x86/cpu/ivybridge/cpu.c b/arch/x86/cpu/ivybridge/cpu.c index 0543e06..e925310 100644 --- a/arch/x86/cpu/ivybridge/cpu.c +++ b/arch/x86/cpu/ivybridge/cpu.c @@ -263,7 +263,7 @@ int print_cpuinfo(void) enable_lapic();
ret = microcode_update_intel();
Since we already did the microcode update in car_init, why should we do this again here?
The car_init() version is a work-around that I would like to remove.
This update actually checks that it happened and fails with an error if not. For car we don't have a way yet of displaying an error.
if (ret && ret != -ENOENT && ret != -EEXIST)
if (ret) return ret; /* Enable upper 128bytes of CMOS */
diff --git a/arch/x86/cpu/ivybridge/microcode_intel.c b/arch/x86/cpu/ivybridge/microcode_intel.c index 0817751..c012820 100644 --- a/arch/x86/cpu/ivybridge/microcode_intel.c +++ b/arch/x86/cpu/ivybridge/microcode_intel.c @@ -120,6 +120,7 @@ int microcode_update_intel(void) int count; int node; int ret;
int rev; microcode_read_cpu(&cpu); node = 0;
@@ -147,12 +148,16 @@ int microcode_update_intel(void) skipped++; continue; }
ret = microcode_read_rev(); wrmsr(0x79, (ulong)update.data, 0);
rev = microcode_read_rev(); debug("microcode: updated to revision 0x%x date=%04x-%02x-%02x\n",
microcode_read_rev(), update.date_code & 0xffff,
rev, update.date_code & 0xffff, (update.date_code >> 24) & 0xff, (update.date_code >> 16) & 0xff);
if (update.update_revision != rev) {
printf("Microcode update failed\n");
return -EFAULT;
} count++; } while (1);
} diff --git a/arch/x86/dts/link.dts b/arch/x86/dts/link.dts index a739080..1ebc334 100644 --- a/arch/x86/dts/link.dts +++ b/arch/x86/dts/link.dts @@ -214,9 +214,6 @@
microcode { update@0 {
-#include "microcode/m12206a7_00000029.dtsi"
};
update@1 {
#include "microcode/m12306a9_0000001b.dtsi" }; }; --
Regards, Simon

It is useful to be able to see the MTRR setup in U-Boot. Add a command to list the state of the variable MTRR registers and allow them to be changed.
Update the documentation to list some of the available commands.
This does not support fixed MTRRs as yet.
Signed-off-by: Simon Glass sjg@chromium.org ---
common/Makefile | 1 + common/cmd_mtrr.c | 138 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ doc/README.x86 | 18 ++++++- 3 files changed, 156 insertions(+), 1 deletion(-) create mode 100644 common/cmd_mtrr.c
diff --git a/common/Makefile b/common/Makefile index c668a2f..39e4e5f 100644 --- a/common/Makefile +++ b/common/Makefile @@ -135,6 +135,7 @@ obj-$(CONFIG_CMD_MMC) += cmd_mmc.o obj-$(CONFIG_CMD_MMC_SPI) += cmd_mmc_spi.o obj-$(CONFIG_MP) += cmd_mp.o obj-$(CONFIG_CMD_MTDPARTS) += cmd_mtdparts.o +obj-$(CONFIG_X86) += cmd_mtrr.o obj-$(CONFIG_CMD_NAND) += cmd_nand.o obj-$(CONFIG_CMD_NET) += cmd_net.o obj-$(CONFIG_CMD_ONENAND) += cmd_onenand.o diff --git a/common/cmd_mtrr.c b/common/cmd_mtrr.c new file mode 100644 index 0000000..7e0506b --- /dev/null +++ b/common/cmd_mtrr.c @@ -0,0 +1,138 @@ +/* + * (C) Copyright 2014 Google, Inc + * + * SPDX-License-Identifier: GPL-2.0+ + */ + +#include <common.h> +#include <asm/msr.h> +#include <asm/mtrr.h> + +static const char *const mtrr_type_name[MTRR_TYPE_COUNT] = { + "Uncacheable", + "Combine", + "2", + "3", + "Through", + "Protect", + "Back", +}; + +static int do_mtrr_list(void) +{ + int i; + + printf("Reg Valid Write-type %-16s %-16s %-16s\n", "Base ||", + "Mask ||", "Size ||"); + for (i = 0; i < MTRR_COUNT; i++) { + const char *type = "Invalid"; + uint64_t base, mask, size; + bool valid; + + base = native_read_msr(MTRR_PHYS_BASE_MSR(i)); + mask = native_read_msr(MTRR_PHYS_MASK_MSR(i)); + size = ~mask & ((1ULL << CONFIG_CPU_ADDR_BITS) - 1); + size |= (1 << 12) - 1; + size += 1; + valid = mask & MTRR_PHYS_MASK_VALID; + type = mtrr_type_name[base & MTRR_BASE_TYPE_MASK]; + printf("%d %-5s %-12s %016llx %016llx %016llx\n", i, + valid ? "Y" : "N", type, base, mask, size); + } + + return 0; +} + +static int do_mtrr_set(uint reg, int argc, char * const argv[]) +{ + const char *typename = argv[0]; + struct mtrr_state state; + uint32_t start, size; + uint64_t base, mask; + int i, type = -1; + bool valid; + + if (argc < 3) + return CMD_RET_USAGE; + for (i = 0; i < MTRR_TYPE_COUNT; i++) { + if (*typename == *mtrr_type_name[i]) + type = i; + } + if (type == -1) { + printf("Invalid type name %s\n", typename); + return CMD_RET_USAGE; + } + start = simple_strtoul(argv[1], NULL, 16); + size = simple_strtoul(argv[2], NULL, 16); + + base = start | type; + valid = native_read_msr(MTRR_PHYS_MASK_MSR(reg)) & MTRR_PHYS_MASK_VALID; + mask = ~((uint64_t)size - 1); + mask &= (1ULL << CONFIG_CPU_ADDR_BITS) - 1; + if (valid) + mask |= MTRR_PHYS_MASK_VALID; + + printf("base=%llx, mask=%llx\n", base, mask); + mtrr_open(&state); + wrmsrl(MTRR_PHYS_BASE_MSR(reg), base); + wrmsrl(MTRR_PHYS_MASK_MSR(reg), mask); + mtrr_close(&state); + + return 0; +} + +static int mtrr_set_valid(int reg, bool valid) +{ + struct mtrr_state state; + uint64_t mask; + + mtrr_open(&state); + mask = native_read_msr(MTRR_PHYS_MASK_MSR(reg)); + if (valid) + mask |= MTRR_PHYS_MASK_VALID; + else + mask &= ~MTRR_PHYS_MASK_VALID; + wrmsrl(MTRR_PHYS_MASK_MSR(reg), mask); + mtrr_close(&state); + + return 0; +} + +static int do_mtrr(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[]) +{ + const char *cmd; + uint reg; + + cmd = argv[1]; + if (argc < 2 || *cmd == 'l') + return do_mtrr_list(); + argc -= 2; + argv += 2; + if (argc <= 0) + return CMD_RET_USAGE; + reg = simple_strtoul(argv[0], NULL, 16); + if (reg >= MTRR_COUNT) { + printf("Invalid register number\n"); + return CMD_RET_USAGE; + } + if (*cmd == 'e') + return mtrr_set_valid(reg, true); + else if (*cmd == 'd') + return mtrr_set_valid(reg, false); + else if (*cmd == 's') + return do_mtrr_set(reg, argc - 1, argv + 1); + else + return CMD_RET_USAGE; + + return 0; +} + +U_BOOT_CMD( + mtrr, 6, 1, do_mtrr, + "Use x86 memory type range registers (32-bit only)", + "[list] - list current registers\n" + "set <reg> <type> <start> <size> - set a register\n" + "\t<type> is Uncacheable, Combine, Through, Protect, Back\n" + "disable <reg> - disable a register\n" + "ensable <reg> - enable a register" +); diff --git a/doc/README.x86 b/doc/README.x86 index 5fab044..6fbfb90 100644 --- a/doc/README.x86 +++ b/doc/README.x86 @@ -110,9 +110,25 @@ be turned on. Not every device on the board is configured via devie tree, but more and more devices will be added as time goes by. Check out the directory arch/x86/dts/ for these device tree source files.
+Useful Commands +--------------- + +In keeping with the U-Boot philosophy of providing functions to check and +adjust internal settings, there are several x86-specific commands that may be +useful: + +hob - Display information about Firmware Support Package (FSP) Hand-off + Block. This is only available on platform which use FSP, mostly + Atom. +iod - Display I/O memory +iow - Write I/O memory +mtrr - List and set the Memory Type Range Registers (MTRR). These are used to + tell the CPU whether memory is cacheable and if so the cache write + mode to use. U-Boot sets up some reasonable values but you can + adjust then with this command. + TODO List --------- -- MTRR support (for performance) - Audio - Chrome OS verified boot - SMI and ACPI support, to provide platform info and facilities to Linux

Hi Simon,
On Sun, Dec 28, 2014 at 10:20 AM, Simon Glass sjg@chromium.org wrote:
It is useful to be able to see the MTRR setup in U-Boot. Add a command to list the state of the variable MTRR registers and allow them to be changed.
Nice work! This will be a useful debug command.
Update the documentation to list some of the available commands.
This does not support fixed MTRRs as yet.
Signed-off-by: Simon Glass sjg@chromium.org
common/Makefile | 1 + common/cmd_mtrr.c | 138 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ doc/README.x86 | 18 ++++++- 3 files changed, 156 insertions(+), 1 deletion(-) create mode 100644 common/cmd_mtrr.c
diff --git a/common/Makefile b/common/Makefile index c668a2f..39e4e5f 100644 --- a/common/Makefile +++ b/common/Makefile @@ -135,6 +135,7 @@ obj-$(CONFIG_CMD_MMC) += cmd_mmc.o obj-$(CONFIG_CMD_MMC_SPI) += cmd_mmc_spi.o obj-$(CONFIG_MP) += cmd_mp.o obj-$(CONFIG_CMD_MTDPARTS) += cmd_mtdparts.o +obj-$(CONFIG_X86) += cmd_mtrr.o obj-$(CONFIG_CMD_NAND) += cmd_nand.o obj-$(CONFIG_CMD_NET) += cmd_net.o obj-$(CONFIG_CMD_ONENAND) += cmd_onenand.o diff --git a/common/cmd_mtrr.c b/common/cmd_mtrr.c new file mode 100644 index 0000000..7e0506b --- /dev/null +++ b/common/cmd_mtrr.c
Should we put cmd_mtrr.c to arch/x86/lib?
@@ -0,0 +1,138 @@ +/*
- (C) Copyright 2014 Google, Inc
- SPDX-License-Identifier: GPL-2.0+
- */
+#include <common.h> +#include <asm/msr.h> +#include <asm/mtrr.h>
+static const char *const mtrr_type_name[MTRR_TYPE_COUNT] = {
"Uncacheable",
"Combine",
"2",
"3",
"Through",
"Protect",
"Back",
+};
+static int do_mtrr_list(void) +{
int i;
printf("Reg Valid Write-type %-16s %-16s %-16s\n", "Base ||",
"Mask ||", "Size ||");
for (i = 0; i < MTRR_COUNT; i++) {
const char *type = "Invalid";
uint64_t base, mask, size;
bool valid;
base = native_read_msr(MTRR_PHYS_BASE_MSR(i));
mask = native_read_msr(MTRR_PHYS_MASK_MSR(i));
size = ~mask & ((1ULL << CONFIG_CPU_ADDR_BITS) - 1);
size |= (1 << 12) - 1;
size += 1;
valid = mask & MTRR_PHYS_MASK_VALID;
type = mtrr_type_name[base & MTRR_BASE_TYPE_MASK];
printf("%d %-5s %-12s %016llx %016llx %016llx\n", i,
valid ? "Y" : "N", type, base, mask, size);
}
return 0;
+}
+static int do_mtrr_set(uint reg, int argc, char * const argv[]) +{
const char *typename = argv[0];
struct mtrr_state state;
uint32_t start, size;
uint64_t base, mask;
int i, type = -1;
bool valid;
if (argc < 3)
return CMD_RET_USAGE;
for (i = 0; i < MTRR_TYPE_COUNT; i++) {
if (*typename == *mtrr_type_name[i])
type = i;
}
if (type == -1) {
printf("Invalid type name %s\n", typename);
return CMD_RET_USAGE;
}
start = simple_strtoul(argv[1], NULL, 16);
size = simple_strtoul(argv[2], NULL, 16);
base = start | type;
valid = native_read_msr(MTRR_PHYS_MASK_MSR(reg)) & MTRR_PHYS_MASK_VALID;
mask = ~((uint64_t)size - 1);
mask &= (1ULL << CONFIG_CPU_ADDR_BITS) - 1;
if (valid)
mask |= MTRR_PHYS_MASK_VALID;
printf("base=%llx, mask=%llx\n", base, mask);
mtrr_open(&state);
wrmsrl(MTRR_PHYS_BASE_MSR(reg), base);
wrmsrl(MTRR_PHYS_MASK_MSR(reg), mask);
mtrr_close(&state);
return 0;
+}
+static int mtrr_set_valid(int reg, bool valid) +{
struct mtrr_state state;
uint64_t mask;
mtrr_open(&state);
mask = native_read_msr(MTRR_PHYS_MASK_MSR(reg));
if (valid)
mask |= MTRR_PHYS_MASK_VALID;
else
mask &= ~MTRR_PHYS_MASK_VALID;
wrmsrl(MTRR_PHYS_MASK_MSR(reg), mask);
mtrr_close(&state);
return 0;
+}
+static int do_mtrr(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[]) +{
const char *cmd;
uint reg;
cmd = argv[1];
if (argc < 2 || *cmd == 'l')
return do_mtrr_list();
argc -= 2;
argv += 2;
if (argc <= 0)
return CMD_RET_USAGE;
reg = simple_strtoul(argv[0], NULL, 16);
if (reg >= MTRR_COUNT) {
printf("Invalid register number\n");
return CMD_RET_USAGE;
}
if (*cmd == 'e')
return mtrr_set_valid(reg, true);
else if (*cmd == 'd')
return mtrr_set_valid(reg, false);
else if (*cmd == 's')
return do_mtrr_set(reg, argc - 1, argv + 1);
else
return CMD_RET_USAGE;
return 0;
+}
+U_BOOT_CMD(
mtrr, 6, 1, do_mtrr,
"Use x86 memory type range registers (32-bit only)",
"[list] - list current registers\n"
"set <reg> <type> <start> <size> - set a register\n"
"\t<type> is Uncacheable, Combine, Through, Protect, Back\n"
"disable <reg> - disable a register\n"
"ensable <reg> - enable a register"
+); diff --git a/doc/README.x86 b/doc/README.x86 index 5fab044..6fbfb90 100644 --- a/doc/README.x86 +++ b/doc/README.x86 @@ -110,9 +110,25 @@ be turned on. Not every device on the board is configured via devie tree, but more and more devices will be added as time goes by. Check out the directory arch/x86/dts/ for these device tree source files.
+Useful Commands +---------------
+In keeping with the U-Boot philosophy of providing functions to check and +adjust internal settings, there are several x86-specific commands that may be +useful:
+hob - Display information about Firmware Support Package (FSP) Hand-off
Block. This is only available on platform which use FSP, mostly
use -> uses
Atom.
+iod - Display I/O memory +iow - Write I/O memory +mtrr - List and set the Memory Type Range Registers (MTRR). These are used to
tell the CPU whether memory is cacheable and if so the cache write
mode to use. U-Boot sets up some reasonable values but you can
adjust then with this command.
TODO List
-- MTRR support (for performance)
- Audio
- Chrome OS verified boot
- SMI and ACPI support, to provide platform info and facilities to Linux
--
Regards, Bin
participants (4)
-
Bin Meng
-
Graeme Russ
-
Graeme Russ
-
Simon Glass