[U-Boot] [PATCH v3 00/26] sunxi: Allwinner A64: SPL support

Hi,
another reworked version of the SPL support series for the Allwinner A64 SoC. Again many thanks to the diligent reviewers; I hope I didn't miss any comments.
As with the previous versions, this one includes support for both AArch64 and AArch32 SPL builds. FIT support is still missing, which means the functionality is limited: without ARM Trusted Firmware (ATF) in this firmware chain we lose Ethernet and SMP, among other minor things. A full 64-bit build can be written to an SD card as expected and will boot to the U-Boot proper prompt. However Linux will crash on boot, as PSCI is missing.
Building the 32-bit version of the SPL and combining it with an ATF build and the 64-bit U-Boot proper now allows FEL booting:
  # sunxi-fel spl sunxi-spl.bin write 0x4a000000 u-boot-dtb.bin \
              write 0x44000 bl31.bin reset64 0x44000
This way of booting the board gives full functionality.
The first patch is a rather simple fix (with no changes to v2). Patches 2-8 prepare the SPL code to be compiled for 64-bit in general and AArch64 in particular. Patches 9-11 refactor the existing boot0 header functionality to be used by patch 12, which introduces the 64-bit switch in the first SPL instructions.
Patches 13-20 then introduce the actual core of the SPL support: the DRAM initialization, courtesy of Jens. This piggybacks on the existing H3 DRAM code, deviating where needed. This has been reworked compared to v2: I added a patch from Philipp to replace the rather uninspired register writes in the MBUS priority setup function with some meaningful code, explaining the various bits. Also the actual A64 DRAM code is no longer #ifdef'ed into the H3 driver, but uses parameters to (static) functions. The compiler detects this and removes the dead code from the other variant, resulting in the same binary size for the H3.
Patch 21 finally enables the 64-bit SPL support. So now building the existing pine64_plus_defconfig will generate a sunxi-spl.bin, which can be prepended to the U-Boot proper image (not .bin) to boot from an SD card. Due to the missing ATF support this is of limited usability at the moment, though. Also FEL support requires more love - to switch back to AArch32 before returning to FEL (without crashing, that is ;-), so this is disabled.
On my setup this results in a 26KB SPL binary, which is close to the 28KB limit mksunxiboot imposes at the moment. Adding anything (like FIT support or DEBUG) will exceed this, and although I have patches to let mksunxiboot get close to 32KB, this is the ultimate frontier.
So patches 22-25 then teach the SPL how to detect a U-Boot image file of a different bitness and do the RMR switch from AArch32 to AArch64, if needed. This is used by the final patch 26, which creates another _defconfig to let the SPL compile for AArch32 using the Thumb2 encoding. This results in a binary of less than 17KB in my case, so it has plenty of room for extensions.
Cheers,
Andre.
Changelog v2 .. v3:
- add various Reviewed-by: and Acked-by: tags
- split tiny-printf fix to handle "-" separately
- add various comments and extend commit messages
- add assembly file to re-create the embedded RMR switch code
- add patch 14/26 to explain the MBUS priority setup
- move DRAM r/w delay values into #defines to simplify re-usability
- replace #ifdef'ed addition of A64 support to the H3 DRAM driver with an approach using static parameters
Changelog v1 .. v2:
- drop SPI build fix (already merged)
- confine A31 register init change to H3 and A64
- use IS_ENABLED() instead of #ifdef to guard MBUS2 clock init
- fix tiny-printf (proper sign extension for 32-bit integers)
- add "size" output in commit msg to document tiny-printf size impact
- fix sdelay(): use only one register, add "cc" clobber
- update RMR switch code to provide easy access to RVBAR register address
- drop redundant DRAM frequency setting from Pine64 defconfig
- minor changes as requested by reviewers
Andre Przywara (21):
  sun6i: Restrict some register initialization to Allwinner A31 SoC
  armv8: prevent using THUMB
  armv8: add lowlevel_init.S
  SPL: tiny-printf: add "l" modifier
  SPL: tiny-printf: ignore "-" modifier
  move UL() macro from armv8/mmu.h into common.h
  SPL: make struct spl_image 64-bit safe
  armv8: add simple sdelay implementation
  armv8: move reset branch into boot hook
  ARM: boot0 hook: remove macro, include whole header file
  sunxi: introduce extra config option for boot0 header
  sunxi: A64: do an RMR switch if started in AArch32 mode
  sunxi: provide default DRAM config for sun50i in Kconfig
  sunxi: H3/A64: fix non-ODT setting
  sunxi: DRAM: fix H3 DRAM size display on aarch64
  sunxi: A64: enable SPL
  SPL: read and store arch property from U-Boot image
  Makefile: use "arm64" architecture for U-Boot image files
  ARM: SPL/FIT: differentiate between arm and arm64 arch properties
  sunxi: introduce RMR switch to enter payloads in 64-bit mode
  sunxi: A64: add 32-bit SPL support
Jens Kuske (3):
  sunxi: H3: add and rename some DRAM contoller registers
  sunxi: H3: add DRAM controller single bit delay support
  sunxi: A64: use H3 DRAM initialization code for A64 as well
Philipp Tomsich (2):
  sunxi: H3: Rework MBUS priority setup
  sunxi: clocks: Use the correct pattern register for PLL11
 Makefile                                        |   9 +-
 arch/arm/cpu/armv8/Makefile                     |   1 +
 arch/arm/cpu/armv8/cpu.c                        |  14 +
 arch/arm/cpu/armv8/lowlevel_init.S              |  44 +++
 arch/arm/cpu/armv8/start.S                      |   5 +-
 arch/arm/include/asm/arch-bcm235xx/boot0.h      |   8 +-
 arch/arm/include/asm/arch-bcm281xx/boot0.h      |   8 +-
 arch/arm/include/asm/arch-sunxi/boot0.h         |  37 ++-
 arch/arm/include/asm/arch-sunxi/clock_sun6i.h   |   1 +
 arch/arm/include/asm/arch-sunxi/cpu.h           |   3 +
 arch/arm/include/asm/arch-sunxi/dram.h          |   2 +-
 arch/arm/include/asm/arch-sunxi/dram_sun8i_h3.h |  53 ++--
 arch/arm/include/asm/armv8/mmu.h                |   8 -
 arch/arm/lib/Makefile                           |   2 +
 arch/arm/lib/spl.c                              |  15 +
 arch/arm/lib/vectors.S                          |   1 -
 arch/arm/mach-omap2/boot-common.c               |   2 +-
 arch/arm/mach-sunxi/Makefile                    |   2 +
 arch/arm/mach-sunxi/board.c                     |   2 +-
 arch/arm/mach-sunxi/clock_sun6i.c               |  10 +-
 arch/arm/mach-sunxi/dram_sun8i_h3.c             | 382 +++++++++++++++++-------
 arch/arm/mach-sunxi/rmr_switch.S                |  41 +++
 arch/arm/mach-sunxi/spl_switch.c                |  81 +++++
 arch/arm/mach-tegra/spl.c                       |   2 +-
 board/sunxi/Kconfig                             |  41 ++-
 common/spl/spl.c                                |   9 +-
 common/spl/spl_fit.c                            |   8 +
 common/spl/spl_mmc.c                            |   2 +-
 configs/pine64_plus_defconfig                   |   7 +-
 configs/sun50i_spl32_defconfig                  |  10 +
 include/common.h                                |  13 +-
 include/configs/sunxi-common.h                  |   4 +-
 include/spl.h                                   |  19 +-
 lib/tiny-printf.c                               |  50 +++-
 34 files changed, 695 insertions(+), 201 deletions(-)
 create mode 100644 arch/arm/cpu/armv8/lowlevel_init.S
 create mode 100644 arch/arm/mach-sunxi/rmr_switch.S
 create mode 100644 arch/arm/mach-sunxi/spl_switch.c
 create mode 100644 configs/sun50i_spl32_defconfig

These days many Allwinner SoCs use clock_sun6i.c, although of these only the (original sun6i) A31 has a second MBUS clock register. Also the requirement to set up the PRCM PLL_CTRL1 register to provide the proper voltage seems to be a property of older SoCs only.
Restrict the MBUS initialization to this SoC only to avoid writing bogus values to (undefined) registers in other chips. I can only verify that the PLL voltage setup is not needed for H3 and A64, so for now we only spare those two SoCs.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Chen-Yu Tsai <wens@csie.org>
Reviewed-by: Simon Glass <sjg@chromium.org>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
---
 arch/arm/mach-sunxi/clock_sun6i.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mach-sunxi/clock_sun6i.c b/arch/arm/mach-sunxi/clock_sun6i.c
index ed8cd9b..80cfc0b 100644
--- a/arch/arm/mach-sunxi/clock_sun6i.c
+++ b/arch/arm/mach-sunxi/clock_sun6i.c
@@ -21,6 +21,8 @@ void clock_init_safe(void)
 {
 	struct sunxi_ccm_reg * const ccm =
 		(struct sunxi_ccm_reg *)SUNXI_CCM_BASE;
+
+#if !defined(CONFIG_MACH_SUN8I_H3) && !defined(CONFIG_MACH_SUN50I)
 	struct sunxi_prcm_reg * const prcm =
 		(struct sunxi_prcm_reg *)SUNXI_PRCM_BASE;
 
@@ -31,6 +33,7 @@ void clock_init_safe(void)
 		PRCM_PLL_CTRL_LDO_DIGITAL_EN | PRCM_PLL_CTRL_LDO_ANALOG_EN |
 		PRCM_PLL_CTRL_EXT_OSC_EN | PRCM_PLL_CTRL_LDO_OUT_L(1140));
 	clrbits_le32(&prcm->pll_ctrl1, PRCM_PLL_CTRL_LDO_KEY_MASK);
+#endif
 
 	clock_set_pll1(408000000);
 
@@ -41,7 +44,8 @@ void clock_init_safe(void)
 	writel(AHB1_ABP1_DIV_DEFAULT, &ccm->ahb1_apb1_div);
 
 	writel(MBUS_CLK_DEFAULT, &ccm->mbus0_clk_cfg);
-	writel(MBUS_CLK_DEFAULT, &ccm->mbus1_clk_cfg);
+	if (IS_ENABLED(CONFIG_MACH_SUN6I))
+		writel(MBUS_CLK_DEFAULT, &ccm->mbus1_clk_cfg);
 }
 #endif
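To illustrate why IS_ENABLED() is preferred over #ifdef in the last hunk, here is a minimal standalone sketch. The helper macros are a simplified stand-in for the real kconfig helper, and count_mbus_writes() is a hypothetical function that only counts what would be written. Unlike an #ifdef, both branches are always parsed and type-checked, while the compiler can still drop the dead one.

```c
/*
 * Simplified stand-in for the kconfig IS_ENABLED() helper (illustrative
 * names): expands to 1 if the option is defined as 1, and to 0 if the
 * option is not defined at all.
 */
#define ARG_PLACEHOLDER_1			0,
#define TAKE_SECOND_ARG(ignored, val, ...)	val
#define IS_DEFINED_3(arg1_or_junk)	TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_2(val)		IS_DEFINED_3(ARG_PLACEHOLDER_##val)
#define IS_ENABLED(option)		IS_DEFINED_2(option)

/* Pretend this build targets the A31; CONFIG_MACH_SUN50I stays undefined. */
#define CONFIG_MACH_SUN6I 1

/* Hypothetical helper: count the MBUS clock registers that get written. */
static int count_mbus_writes(void)
{
	int writes = 1;		/* mbus0_clk_cfg is written on every SoC */

	/* A compile-time constant condition, so the dead branch can go. */
	if (IS_ENABLED(CONFIG_MACH_SUN6I))
		writes++;	/* mbus1_clk_cfg is written on the A31 only */

	return writes;
}
```

With CONFIG_MACH_SUN6I left undefined the same function would return 1, without any preprocessor conditionals in the function body.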

The predominantly 32-bit ARM targets try to compile the SPL in Thumb mode to reduce code size. The 64-bit AArch64 instruction set does not know an alternative, concise encoding, so the Thumb build option should only be set for 32-bit targets. Likewise -marm machine options are only valid for ARMv7 targets.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Simon Glass <sjg@chromium.org>
Reviewed-by: Tom Rini <trini@konsulko.com>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
---
 arch/arm/lib/Makefile          | 2 ++
 include/configs/sunxi-common.h | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index 0051f76..024139d 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -77,8 +77,10 @@ ifndef CONFIG_HAS_THUMB2
 
 # for C files, just apend -marm, which will override previous -mthumb*
 
+ifndef CONFIG_ARM64
 CFLAGS_cache.o := -marm
 CFLAGS_cache-cp15.o := -marm
+endif
 
 # For .S, drop -mthumb* and other thumb-related options.
 # CFLAGS_REMOVE_* would not have an effet, so AFLAGS_REMOVE_*
diff --git a/include/configs/sunxi-common.h b/include/configs/sunxi-common.h
index b0bfc0d..e05c318 100644
--- a/include/configs/sunxi-common.h
+++ b/include/configs/sunxi-common.h
@@ -35,7 +35,7 @@
 /*
  * High Level Configuration Options
  */
-#ifdef CONFIG_SPL_BUILD
+#if defined(CONFIG_SPL_BUILD) && !defined(CONFIG_ARM64)
 #define CONFIG_SYS_THUMB_BUILD	/* Thumbs mode to save space in SPL */
 #endif

For boards that call s_init() when the SPL runs, we are expected to set up an early stack before calling this C function. Implement the proper AArch64 version of this based on the ARMv7 code. This allows sunxi boards to set up the basic peripherals even with a 64-bit SPL.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 arch/arm/cpu/armv8/Makefile        |  1 +
 arch/arm/cpu/armv8/lowlevel_init.S | 44 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+)
 create mode 100644 arch/arm/cpu/armv8/lowlevel_init.S

diff --git a/arch/arm/cpu/armv8/Makefile b/arch/arm/cpu/armv8/Makefile
index dea1465..799a752 100644
--- a/arch/arm/cpu/armv8/Makefile
+++ b/arch/arm/cpu/armv8/Makefile
@@ -25,3 +25,4 @@ obj-$(CONFIG_FSL_LAYERSCAPE) += fsl-layerscape/
 obj-$(CONFIG_S32V234) += s32v234/
 obj-$(CONFIG_ARCH_ZYNQMP) += zynqmp/
 obj-$(CONFIG_TARGET_HIKEY) += hisilicon/
+obj-$(CONFIG_ARCH_SUNXI) += lowlevel_init.o
diff --git a/arch/arm/cpu/armv8/lowlevel_init.S b/arch/arm/cpu/armv8/lowlevel_init.S
new file mode 100644
index 0000000..189e35f
--- /dev/null
+++ b/arch/arm/cpu/armv8/lowlevel_init.S
@@ -0,0 +1,44 @@
+/*
+ * A lowlevel_init function that sets up the stack to call a C function to
+ * perform further init.
+ *
+ * SPDX-License-Identifier:	GPL-2.0+
+ */
+
+#include <asm-offsets.h>
+#include <config.h>
+#include <linux/linkage.h>
+
+ENTRY(lowlevel_init)
+	/*
+	 * Setup a temporary stack. Global data is not available yet.
+	 */
+#if defined(CONFIG_SPL_BUILD) && defined(CONFIG_SPL_STACK)
+	ldr	w0, =CONFIG_SPL_STACK
+#else
+	ldr	w0, =CONFIG_SYS_INIT_SP_ADDR
+#endif
+	bic	sp, x0, #0xf	/* 16-byte alignment for ABI compliance */
+
+	/*
+	 * Save the old LR(passed in x29) and the current LR to stack
+	 */
+	stp	x29, x30, [sp, #-16]!
+
+	/*
+	 * Call the very early init function. This should do only the
+	 * absolute bare minimum to get started. It should not:
+	 *
+	 * - set up DRAM
+	 * - use global_data
+	 * - clear BSS
+	 * - try to start a console
+	 *
+	 * For boards with SPL this should be empty since SPL can do all of
+	 * this init in the SPL board_init_f() function which is called
+	 * immediately after this.
+	 */
+	bl	s_init
+	ldp	x29, x30, [sp]
+	ret
+ENDPROC(lowlevel_init)
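The "bic sp, x0, #0xf" instruction above clears the low four bits of the configured stack address. As a rough C equivalent (align_stack() is a hypothetical name, used here for illustration only), the operation rounds the address down to the next 16-byte boundary:

```c
#include <stdint.h>

/* Round an address down to a 16-byte boundary, like "bic sp, x0, #0xf". */
static inline uint64_t align_stack(uint64_t addr)
{
	return addr & ~UINT64_C(0xf);
}
```

An already aligned address is left unchanged, so a correctly configured CONFIG_SPL_STACK is not affected by this.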

On 19 December 2016 at 14:49, Andre Przywara <andre.przywara@arm.com> wrote:
> For boards that call s_init() when the SPL runs, we are expected to setup an early stack before calling this C function. Implement the proper AArch64 version of this based on the ARMv7 code. This allows sunxi boards to setup the basic peripherals even with a 64-bit SPL.
>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>  arch/arm/cpu/armv8/Makefile        |  1 +
>  arch/arm/cpu/armv8/lowlevel_init.S | 44 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 45 insertions(+)
>  create mode 100644 arch/arm/cpu/armv8/lowlevel_init.S
Reviewed-by: Simon Glass <sjg@chromium.org>

tiny-printf does not know about the "l" modifier so far, which breaks the crash dump on AArch64, because it uses %lx to print the registers. Add an easy way of handling longs correctly.
Using a relatively decent compiler (GCC 5.3.0) this does _not_ increase the code size of tiny-printf.o for 32-bit builds (where long and int are actually the same); in fact it loses three (ARM Thumb2) instructions from the actual SPL (numbers for orangepi_plus_defconfig):
   text    data     bss     dec     hex filename
    758       0       0     758     2f6 spl/lib/tiny-printf.o   before
  18839     488     232   19559    4c67 spl/u-boot-spl          before
    758       0       0     758     2f6 spl/lib/tiny-printf.o   after
  18833     488     232   19553    4c61 spl/u-boot-spl          after
This adds a substantial amount of code to a 64-bit build, though (taken after a later commit, which enables the ARM64 SPL build for sunxi):
   text    data     bss     dec     hex filename
   1542       0       0    1542     606 spl/lib/tiny-printf.o   before
  25830     392     360   26582    67d6 spl/u-boot-spl          before
   1758       0       0    1758     6de spl/lib/tiny-printf.o   after
  26040     392     360   26792    68a8 spl/u-boot-spl          after
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Simon Glass <sjg@chromium.org>
---
 lib/tiny-printf.c | 47 ++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 36 insertions(+), 11 deletions(-)

diff --git a/lib/tiny-printf.c b/lib/tiny-printf.c
index 30ac759..0b8512f 100644
--- a/lib/tiny-printf.c
+++ b/lib/tiny-printf.c
@@ -38,8 +38,8 @@ static void out_dgt(struct printf_info *info, char dgt)
 	info->zs = 1;
 }
 
-static void div_out(struct printf_info *info, unsigned int *num,
-		    unsigned int div)
+static void div_out(struct printf_info *info, unsigned long *num,
+		    unsigned long div)
 {
 	unsigned char dgt = 0;
 
@@ -56,9 +56,9 @@ int _vprintf(struct printf_info *info, const char *fmt, va_list va)
 {
 	char ch;
 	char *p;
-	unsigned int num;
+	unsigned long num;
 	char buf[12];
-	unsigned int div;
+	unsigned long div;
 
 	while ((ch = *(fmt++))) {
 		if (ch != '%') {
@@ -66,6 +66,7 @@ int _vprintf(struct printf_info *info, const char *fmt, va_list va)
 		} else {
 			bool lz = false;
 			int width = 0;
+			bool islong = false;
 
 			ch = *(fmt++);
 			if (ch == '0') {
@@ -80,6 +81,11 @@ int _vprintf(struct printf_info *info, const char *fmt, va_list va)
 					ch = *fmt++;
 				}
 			}
+			if (ch == 'l') {
+				ch = *(fmt++);
+				islong = true;
+			}
+
 			info->bf = buf;
 			p = info->bf;
 			info->zs = 0;
@@ -89,24 +95,43 @@ int _vprintf(struct printf_info *info, const char *fmt, va_list va)
 				goto abort;
 			case 'u':
 			case 'd':
-				num = va_arg(va, unsigned int);
-				if (ch == 'd' && (int)num < 0) {
-					num = -(int)num;
-					out(info, '-');
+				div = 1000000000;
+				if (islong) {
+					num = va_arg(va, unsigned long);
+					if (sizeof(long) > 4)
+						div *= div * 10;
+				} else {
+					num = va_arg(va, unsigned int);
+				}
+
+				if (ch == 'd') {
+					if (islong && (long)num < 0) {
+						num = -(long)num;
+						out(info, '-');
+					} else if (!islong && (int)num < 0) {
+						num = -(int)num;
+						out(info, '-');
+					}
 				}
 				if (!num) {
 					out_dgt(info, 0);
 				} else {
-					for (div = 1000000000; div; div /= 10)
+					for (; div; div /= 10)
 						div_out(info, &num, div);
 				}
 				break;
 			case 'x':
-				num = va_arg(va, unsigned int);
+				if (islong) {
+					num = va_arg(va, unsigned long);
+					div = 1UL << (sizeof(long) * 8 - 4);
+				} else {
+					num = va_arg(va, unsigned int);
+					div = 0x10000000;
+				}
 				if (!num) {
 					out_dgt(info, 0);
 				} else {
-					for (; div; div /= 0x10)
+					for (; div; div /= 0x10)
 						div_out(info, &num, div);
 				}
 				break;
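The core of the change above is the choice of the starting divisor: 10^9 covers every 32-bit value, while a 64-bit long needs 10^19, which the patch derives as 10^9 * 10^9 * 10. The following standalone sketch mirrors that logic outside of tiny-printf. The function names (start_div_dec(), start_div_hex(), format_dec()) are made up for this illustration, and the digit loop is a simplified model of div_out(), not a copy of it.

```c
#include <stddef.h>

/* Highest decimal divisor needed: 10^9 for 32 bits, 10^19 for 64 bits. */
static unsigned long start_div_dec(int islong)
{
	unsigned long div = 1000000000;

	if (islong && sizeof(long) > 4)
		div *= div * 10;	/* 10^9 * (10^9 * 10) = 10^19 */

	return div;
}

/* Highest hex divisor: selects the topmost nibble of the value. */
static unsigned long start_div_hex(int islong)
{
	if (islong)
		return 1UL << (sizeof(long) * 8 - 4);

	return 0x10000000;
}

/* Write num as decimal digits into buf (at least 21 bytes), return length. */
static size_t format_dec(char *buf, unsigned long num, int islong)
{
	unsigned long div = start_div_dec(islong);
	size_t len = 0;
	int started = 0;	/* suppress leading zeros only */

	if (!num) {
		buf[len++] = '0';
	} else {
		for (; div; div /= 10) {
			unsigned int digit = num / div;

			num %= div;
			if (digit || started) {
				buf[len++] = '0' + digit;
				started = 1;
			}
		}
	}
	buf[len] = '\0';

	return len;
}
```

On a target where long is 32 bits wide the multiplication is never executed, so both code paths start from the same divisor, which matches the unchanged 32-bit code size reported in the commit message.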

tiny-printf does not know about the "-" modifier, which aligns numbers. This is used by some SPL code, but as it's purely cosmetic, we just ignore this modifier here to avoid changing correct printf strings.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 lib/tiny-printf.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/tiny-printf.c b/lib/tiny-printf.c
index 0b8512f..dfa8432 100644
--- a/lib/tiny-printf.c
+++ b/lib/tiny-printf.c
@@ -69,6 +69,9 @@ int _vprintf(struct printf_info *info, const char *fmt, va_list va)
 			bool islong = false;
 
 			ch = *(fmt++);
+			if (ch == '-')
+				ch = *(fmt++);
+
 			if (ch == '0') {
 				ch = *(fmt++);
 				lz = 1;

On 19 December 2016 at 14:49, Andre Przywara <andre.przywara@arm.com> wrote:
> tiny-printf does not know about the "-" modifier, which aligns numbers. This is used by some SPL code, but as it's purely cosmetical, we just ignore this modifier here to avoid changing correct printf strings.
>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>  lib/tiny-printf.c | 3 +++
>  1 file changed, 3 insertions(+)
Reviewed-by: Simon Glass <sjg@chromium.org>

The UL() macro is pretty useful in sharing constants between assembly and C files while still being able to specify a type for C. Move the macro from an armv8 specific header into a common header file so that it can be used by arm code (for instance) as well.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
---
 arch/arm/include/asm/armv8/mmu.h |  8 --------
 include/common.h                 | 13 ++++++++++++-
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/arm/include/asm/armv8/mmu.h b/arch/arm/include/asm/armv8/mmu.h
index aa0f3c4..e9b4cdb 100644
--- a/arch/arm/include/asm/armv8/mmu.h
+++ b/arch/arm/include/asm/armv8/mmu.h
@@ -8,14 +8,6 @@
 #ifndef _ASM_ARMV8_MMU_H_
 #define _ASM_ARMV8_MMU_H_
 
-#ifdef __ASSEMBLY__
-#define _AC(X, Y) X
-#else
-#define _AC(X, Y) (X##Y)
-#endif
-
-#define UL(x) _AC(x, UL)
-
 /***************************************************************/
 /*
  * The following definitions are related each other, shoud be
diff --git a/include/common.h b/include/common.h
index a8d833b..ee0436b 100644
--- a/include/common.h
+++ b/include/common.h
@@ -15,6 +15,9 @@ typedef volatile unsigned long vu_long;
 typedef volatile unsigned short vu_short;
 typedef volatile unsigned char vu_char;
 
+/* Allow sharing constants with type modifiers between C and assembly. */
+#define _AC(X, Y) (X##Y)
+
 #include <config.h>
 #include <errno.h>
 #include <asm-offsets.h>
@@ -936,7 +939,12 @@ int cpu_disable(int nr);
 int cpu_release(int nr, int argc, char * const argv[]);
 #endif
 
-#endif /* __ASSEMBLY__ */
+#else /* __ASSEMBLY__ */
+
+/* Drop a C type modifier (like in 3UL) for constants used in assembly. */
+#define _AC(X, Y) X
+
+#endif /* __ASSEMBLY__ */
 
 #ifdef CONFIG_PPC
 /*
@@ -948,6 +956,9 @@ int cpu_release(int nr, int argc, char * const argv[]);
 
 /* Put only stuff here that the assembler can digest */
 
+/* Declare an unsigned long constant digestable both by C and an assembler. */
+#define UL(x) _AC(x, UL)
+
 #ifdef CONFIG_POST
 #define CONFIG_HAS_POST
 #ifndef CONFIG_POST_ALT_LIST
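As a small standalone sketch of what the macro pair does: in C the suffix is pasted onto the constant, so it gets the type unsigned long, while an assembler (with __ASSEMBLY__ defined) would see the bare number. EXAMPLE_BASE and example_shifted() are hypothetical names chosen for this illustration.

```c
/* Same macro pair as in the patch, collapsed into one place. */
#ifdef __ASSEMBLY__
#define _AC(X, Y)	X		/* assembler: plain number */
#else
#define _AC(X, Y)	(X##Y)		/* C: number with type suffix */
#endif
#define UL(x)		_AC(x, UL)

/* Hypothetical constant that both C and assembly files could include. */
#define EXAMPLE_BASE	UL(0x80000000)

#ifndef __ASSEMBLY__
/*
 * The shift is done in unsigned long, so where long is 64 bits wide the
 * top bit is kept instead of being shifted out of a 32-bit value.
 */
static unsigned long example_shifted(void)
{
	return EXAMPLE_BASE << 1;
}
#endif
```

Without the UL suffix the constant 0x80000000 would have type unsigned int in C, and the same shift would yield 0.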

Since entry_point and load_addr are addresses, they should be represented as longs to cover the whole address space and to avoid warnings when compiling the SPL in 64-bit. Also adjust debug prints to add the 'l' specifier, where needed.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Simon Glass <sjg@chromium.org>
Reviewed-by: Tom Rini <trini@konsulko.com>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
---
 arch/arm/mach-omap2/boot-common.c | 2 +-
 arch/arm/mach-tegra/spl.c         | 2 +-
 common/spl/spl.c                  | 8 ++++----
 common/spl/spl_mmc.c              | 2 +-
 include/spl.h                     | 4 ++--
 5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/arm/mach-omap2/boot-common.c b/arch/arm/mach-omap2/boot-common.c
index 385310b..7ae3d80 100644
--- a/arch/arm/mach-omap2/boot-common.c
+++ b/arch/arm/mach-omap2/boot-common.c
@@ -228,7 +228,7 @@ void __noreturn jump_to_image_no_args(struct spl_image_info *spl_image)
 
 	u32 boot_params = *((u32 *)OMAP_SRAM_SCRATCH_BOOT_PARAMS);
 
-	debug("image entry point: 0x%X\n", spl_image->entry_point);
+	debug("image entry point: 0x%lX\n", spl_image->entry_point);
 	/* Pass the saved boot_params from rom code */
 	image_entry((u32 *)boot_params);
 }
diff --git a/arch/arm/mach-tegra/spl.c b/arch/arm/mach-tegra/spl.c
index e0f9d5b..41c88cb 100644
--- a/arch/arm/mach-tegra/spl.c
+++ b/arch/arm/mach-tegra/spl.c
@@ -42,7 +42,7 @@ u32 spl_boot_device(void)
 
 void __noreturn jump_to_image_no_args(struct spl_image_info *spl_image)
 {
-	debug("image entry point: 0x%X\n", spl_image->entry_point);
+	debug("image entry point: 0x%lX\n", spl_image->entry_point);
 
 	start_cpu((u32)spl_image->entry_point);
 	halt_avp();
diff --git a/common/spl/spl.c b/common/spl/spl.c
index f7df834..a76ea3a 100644
--- a/common/spl/spl.c
+++ b/common/spl/spl.c
@@ -115,7 +115,7 @@ int spl_parse_image_header(struct spl_image_info *spl_image,
 		}
 		spl_image->os = image_get_os(header);
 		spl_image->name = image_get_name(header);
-		debug("spl: payload image: %.*s load addr: 0x%x size: %d\n",
+		debug("spl: payload image: %.*s load addr: 0x%lx size: %d\n",
 			(int)sizeof(spl_image->name), spl_image->name,
 			spl_image->load_addr, spl_image->size);
 	} else {
@@ -140,7 +140,7 @@ int spl_parse_image_header(struct spl_image_info *spl_image,
 		spl_image->load_addr = CONFIG_SYS_LOAD_ADDR;
 		spl_image->entry_point = CONFIG_SYS_LOAD_ADDR;
 		spl_image->size = end - start;
-		debug("spl: payload zImage, load addr: 0x%x size: %d\n",
+		debug("spl: payload zImage, load addr: 0x%lx size: %d\n",
 			spl_image->load_addr, spl_image->size);
 		return 0;
 	}
@@ -164,9 +164,9 @@ __weak void __noreturn jump_to_image_no_args(struct spl_image_info *spl_image)
 	typedef void __noreturn (*image_entry_noargs_t)(void);
 
 	image_entry_noargs_t image_entry =
-		(image_entry_noargs_t)(unsigned long)spl_image->entry_point;
+		(image_entry_noargs_t)spl_image->entry_point;
 
-	debug("image entry point: 0x%X\n", spl_image->entry_point);
+	debug("image entry point: 0x%lX\n", spl_image->entry_point);
 	image_entry();
 }
 
diff --git a/common/spl/spl_mmc.c b/common/spl/spl_mmc.c
index 85e3de8..0cd355c 100644
--- a/common/spl/spl_mmc.c
+++ b/common/spl/spl_mmc.c
@@ -36,7 +36,7 @@ static int mmc_load_legacy(struct spl_image_info *spl_image, struct mmc *mmc,
 	/* Read the header too to avoid extra memcpy */
 	count = blk_dread(mmc_get_blk_desc(mmc), sector, image_size_sectors,
 			  (void *)(ulong)spl_image->load_addr);
-	debug("read %x sectors to %x\n", image_size_sectors,
+	debug("read %x sectors to %lx\n", image_size_sectors,
 	      spl_image->load_addr);
 	if (count != image_size_sectors)
 		return -EIO;
diff --git a/include/spl.h b/include/spl.h
index 6e746b2..bde4437 100644
--- a/include/spl.h
+++ b/include/spl.h
@@ -23,8 +23,8 @@
 struct spl_image_info {
 	const char *name;
 	u8 os;
-	u32 load_addr;
-	u32 entry_point;
+	ulong load_addr;
+	ulong entry_point;
 	u32 size;
 	u32 flags;
 };
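A cut-down sketch of the pattern this patch establishes: address fields are unsigned long, and every format string that prints them uses the "l" length modifier. The struct and function names below are made up for this illustration, and the standard snprintf() stands in for debug(). The address value is the U-Boot load address from the FEL example in the cover letter.

```c
#include <stdio.h>

/* Cut-down model of struct spl_image_info with the widened fields. */
struct image_info_sketch {
	unsigned long load_addr;	/* was u32 before this patch */
	unsigned long entry_point;	/* was u32 before this patch */
	unsigned int size;
};

/* Print the load address with a specifier matching its type. */
static int format_load_addr(char *buf, size_t len,
			    const struct image_info_sketch *img)
{
	return snprintf(buf, len, "load addr: 0x%lx size: %u",
			img->load_addr, img->size);
}
```

Keeping type and specifier in sync is what silences the format warnings on a 64-bit build, where unsigned long and unsigned int differ in size.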

The sunxi DRAM setup code needs an sdelay() implementation, which wasn't defined for armv8 so far. Shamelessly copy the armv7 version and adjust it to work in AArch64.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 arch/arm/cpu/armv8/cpu.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/arm/cpu/armv8/cpu.c b/arch/arm/cpu/armv8/cpu.c
index e06c3cc..c093ae7 100644
--- a/arch/arm/cpu/armv8/cpu.c
+++ b/arch/arm/cpu/armv8/cpu.c
@@ -16,6 +16,20 @@
 #include <asm/system.h>
 #include <linux/compiler.h>
 
+/*
+ * sdelay() - simple spin loop.
+ *
+ * Will delay execution by roughly (@loops * 2) cycles.
+ * This is necessary to be used before timers are accessible.
+ *
+ * A value of "0" will result in 2^64 loops.
+ */
+void sdelay(unsigned long loops)
+{
+	__asm__ volatile ("1:\n" "subs %0, %0, #1\n"
+			  "b.ne 1b" : "=r" (loops) : "0"(loops) : "cc");
+}
+
 int cleanup_before_linux(void)
 {
 	/*
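The remark about a value of "0" follows from the loop shape: the counter is decremented first and only then compared against zero. A portable C model of the same control flow (sdelay_iterations() is a hypothetical name, and the function counts iterations instead of burning cycles) makes this visible:

```c
/* Model of "1: subs %0, %0, #1; b.ne 1b": decrement, then test. */
static unsigned long sdelay_iterations(unsigned long loops)
{
	unsigned long count = 0;

	do {
		loops--;	/* 0 wraps around to ULONG_MAX here */
		count++;
	} while (loops != 0);

	return count;
}
```

Calling this model with 0 would take as long as the real function does, so callers should treat 0 as "maximum delay", not as "no delay".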

The boot0 hook we have so far is applied _after_ the initial branch to the "reset" entry point. An upcoming change requires even this branch to be changed, so we apply the hook macro at the earliest point, and have the branch in the hook file as well. This is no functional change at this point, just refactoring to simplify upcoming patches.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Simon Glass <sjg@chromium.org>
---
 arch/arm/cpu/armv8/start.S              | 4 ++--
 arch/arm/include/asm/arch-sunxi/boot0.h | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S
index 4f5f6d8..ee393d7 100644
--- a/arch/arm/cpu/armv8/start.S
+++ b/arch/arm/cpu/armv8/start.S
@@ -19,8 +19,6 @@
 
 .globl	_start
 _start:
-	b	reset
-
 #ifdef CONFIG_ENABLE_ARM_SOC_BOOT0_HOOK
 /*
  * Various SoCs need something special and SoC-specific up front in
@@ -29,6 +27,8 @@ _start:
  */
 #include <asm/arch/boot0.h>
 ARM_SOC_BOOT0_HOOK
+#else
+	b	reset
 #endif
 
 	.align 3
diff --git a/arch/arm/include/asm/arch-sunxi/boot0.h b/arch/arm/include/asm/arch-sunxi/boot0.h
index ea5675e..6f28d63 100644
--- a/arch/arm/include/asm/arch-sunxi/boot0.h
+++ b/arch/arm/include/asm/arch-sunxi/boot0.h
@@ -9,6 +9,7 @@
 
 /* reserve space for BOOT0 header information */
 #define ARM_SOC_BOOT0_HOOK	\
+	b	reset;		\
 	.space	1532
 
 #endif /* __BOOT0_H */

For prepending some board specific header area to U-Boot images we were so far including a header file with a macro definition containing the actual header specification. This works fine if there are just a few statements and if there is only one alternative. However adding more complex code quickly gets messy with this approach, so let's just drop that intermediate macro and let the #include actually insert the code directly. This converts the callers and the callees, but doesn't change anything at this point.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Simon Glass <sjg@chromium.org>
---
 arch/arm/cpu/armv8/start.S                 | 1 -
 arch/arm/include/asm/arch-bcm235xx/boot0.h | 8 +-------
 arch/arm/include/asm/arch-bcm281xx/boot0.h | 8 +-------
 arch/arm/include/asm/arch-sunxi/boot0.h    | 8 +-------
 arch/arm/lib/vectors.S                     | 1 -
 5 files changed, 3 insertions(+), 23 deletions(-)

diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S
index ee393d7..140609d 100644
--- a/arch/arm/cpu/armv8/start.S
+++ b/arch/arm/cpu/armv8/start.S
@@ -26,7 +26,6 @@ _start:
  * use it here.
  */
 #include <asm/arch/boot0.h>
-ARM_SOC_BOOT0_HOOK
 #else
 	b	reset
 #endif
diff --git a/arch/arm/include/asm/arch-bcm235xx/boot0.h b/arch/arm/include/asm/arch-bcm235xx/boot0.h
index 7e72882..9ff90b8 100644
--- a/arch/arm/include/asm/arch-bcm235xx/boot0.h
+++ b/arch/arm/include/asm/arch-bcm235xx/boot0.h
@@ -4,12 +4,6 @@
  * SPDX-License-Identifier: GPL-2.0+
  */
 
-#ifndef __BOOT0_H
-#define __BOOT0_H
-
 /* BOOT0 header information */
-#define ARM_SOC_BOOT0_HOOK \
-	.word	0xbabeface; \
+	.word	0xbabeface;
 	.word	_end - _start
-
-#endif /* __BOOT0_H */
diff --git a/arch/arm/include/asm/arch-bcm281xx/boot0.h b/arch/arm/include/asm/arch-bcm281xx/boot0.h
index 7e72882..9ff90b8 100644
--- a/arch/arm/include/asm/arch-bcm281xx/boot0.h
+++ b/arch/arm/include/asm/arch-bcm281xx/boot0.h
@@ -4,12 +4,6 @@
  * SPDX-License-Identifier: GPL-2.0+
  */
 
-#ifndef __BOOT0_H
-#define __BOOT0_H
-
 /* BOOT0 header information */
-#define ARM_SOC_BOOT0_HOOK \
-	.word	0xbabeface; \
+	.word	0xbabeface;
 	.word	_end - _start
-
-#endif /* __BOOT0_H */
diff --git a/arch/arm/include/asm/arch-sunxi/boot0.h b/arch/arm/include/asm/arch-sunxi/boot0.h
index 6f28d63..6a13db5 100644
--- a/arch/arm/include/asm/arch-sunxi/boot0.h
+++ b/arch/arm/include/asm/arch-sunxi/boot0.h
@@ -4,12 +4,6 @@
  * SPDX-License-Identifier: GPL-2.0+
  */
 
-#ifndef __BOOT0_H
-#define __BOOT0_H
-
 /* reserve space for BOOT0 header information */
-#define ARM_SOC_BOOT0_HOOK \
-	b	reset; \
+	b	reset
 	.space	1532
-
-#endif /* __BOOT0_H */
diff --git a/arch/arm/lib/vectors.S b/arch/arm/lib/vectors.S
index 5cc132b..9fe7415 100644
--- a/arch/arm/lib/vectors.S
+++ b/arch/arm/lib/vectors.S
@@ -67,7 +67,6 @@ _start:
  * use it here.
  */
 #include <asm/arch/boot0.h>
-ARM_SOC_BOOT0_HOOK
 #endif
 
 /*

Hi Andre,
On Sun, Dec 18, 2016 at 5:50 PM, Andre Przywara <andre.przywara@arm.com> wrote:
> For prepending some board specific header area to U-Boot images we were so far including a header file with a macro definition containing the actual header specification. This works fine if there are just a few statements and if there is only one alternative. However adding more complex code quickly gets messy with this approach, so let's just drop that intermediate macro and let the #include actually insert the code directly. This converts the callers and the callees, but doesn't change anything at this point.
>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> Reviewed-by: Simon Glass <sjg@chromium.org>
> ---
>  arch/arm/cpu/armv8/start.S                 | 1 -
>  arch/arm/include/asm/arch-bcm235xx/boot0.h | 8 +-------
>  arch/arm/include/asm/arch-bcm281xx/boot0.h | 8 +-------
>  arch/arm/include/asm/arch-sunxi/boot0.h    | 8 +-------
>  arch/arm/lib/vectors.S                     | 1 -
>  5 files changed, 3 insertions(+), 23 deletions(-)
>
> diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S
> index ee393d7..140609d 100644
> --- a/arch/arm/cpu/armv8/start.S
> +++ b/arch/arm/cpu/armv8/start.S
> @@ -26,7 +26,6 @@ _start:
>   * use it here.
>   */
>  #include <asm/arch/boot0.h>
> -ARM_SOC_BOOT0_HOOK
>  #else
>  	b	reset
>  #endif
> diff --git a/arch/arm/include/asm/arch-bcm235xx/boot0.h b/arch/arm/include/asm/arch-bcm235xx/boot0.h
> index 7e72882..9ff90b8 100644
> --- a/arch/arm/include/asm/arch-bcm235xx/boot0.h
> +++ b/arch/arm/include/asm/arch-bcm235xx/boot0.h
> @@ -4,12 +4,6 @@
>   * SPDX-License-Identifier: GPL-2.0+
>   */
>
> -#ifndef __BOOT0_H
> -#define __BOOT0_H
> -
>  /* BOOT0 header information */
> -#define ARM_SOC_BOOT0_HOOK \
> -	.word	0xbabeface; \
> +	.word	0xbabeface;
this trailing semi-colon is not necessary
>  	.word	_end - _start
> -
> -#endif /* __BOOT0_H */
> diff --git a/arch/arm/include/asm/arch-bcm281xx/boot0.h b/arch/arm/include/asm/arch-bcm281xx/boot0.h
> index 7e72882..9ff90b8 100644
> --- a/arch/arm/include/asm/arch-bcm281xx/boot0.h
> +++ b/arch/arm/include/asm/arch-bcm281xx/boot0.h
> @@ -4,12 +4,6 @@
>   * SPDX-License-Identifier: GPL-2.0+
>   */
>
> -#ifndef __BOOT0_H
> -#define __BOOT0_H
> -
>  /* BOOT0 header information */
> -#define ARM_SOC_BOOT0_HOOK \
> -	.word	0xbabeface; \
> +	.word	0xbabeface;
this trailing semi-colon is not necessary
>  	.word	_end - _start
> -
> -#endif /* __BOOT0_H */
> diff --git a/arch/arm/include/asm/arch-sunxi/boot0.h b/arch/arm/include/asm/arch-sunxi/boot0.h
> index 6f28d63..6a13db5 100644
> --- a/arch/arm/include/asm/arch-sunxi/boot0.h
> +++ b/arch/arm/include/asm/arch-sunxi/boot0.h
> @@ -4,12 +4,6 @@
>   * SPDX-License-Identifier: GPL-2.0+
>   */
>
> -#ifndef __BOOT0_H
> -#define __BOOT0_H
> -
>  /* reserve space for BOOT0 header information */
> -#define ARM_SOC_BOOT0_HOOK \
> -	b	reset; \
> +	b	reset
>  	.space	1532
> -
> -#endif /* __BOOT0_H */
> diff --git a/arch/arm/lib/vectors.S b/arch/arm/lib/vectors.S
> index 5cc132b..9fe7415 100644
> --- a/arch/arm/lib/vectors.S
> +++ b/arch/arm/lib/vectors.S
> @@ -67,7 +67,6 @@ _start:
>   * use it here.
>   */
>  #include <asm/arch/boot0.h>
> -ARM_SOC_BOOT0_HOOK
>  #endif
>
>  /*
> 2.8.2
>
> U-Boot mailing list
> U-Boot@lists.denx.de
> http://lists.denx.de/mailman/listinfo/u-boot
Tested-by: Steve Rae <steve.rae@raedomain.com>
Thanks, Steve

The ENABLE_ARM_SOC_BOOT0_HOOK option is a generic option shared with other boards. To allow alternative code to be inserted, we create another, function-specific config symbol on top of it to simplify later additions. No functional change at this time.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Reviewed-by: Simon Glass <sjg@chromium.org>
---
 board/sunxi/Kconfig           | 9 +++++++++
 configs/pine64_plus_defconfig | 2 +-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/board/sunxi/Kconfig b/board/sunxi/Kconfig
index e1d4ab1..0cd57a2 100644
--- a/board/sunxi/Kconfig
+++ b/board/sunxi/Kconfig
@@ -133,6 +133,15 @@ config MACH_SUN8I
 	bool
 	default y if MACH_SUN8I_A23 || MACH_SUN8I_A33 || MACH_SUN8I_H3 || MACH_SUN8I_A83T
 
+config RESERVE_ALLWINNER_BOOT0_HEADER
+	bool "reserve space for Allwinner boot0 header"
+	select ENABLE_ARM_SOC_BOOT0_HOOK
+	---help---
+	Prepend a 1536 byte (empty) header to the U-Boot image file, to be
+	filled with magic values post build. The Allwinner provided boot0
+	blob relies on this information to load and execute U-Boot.
+	Only needed on 64-bit Allwinner boards so far when using boot0.
+
 config DRAM_TYPE
 	int "sunxi dram type"
 	depends on MACH_SUN8I_A83T
diff --git a/configs/pine64_plus_defconfig b/configs/pine64_plus_defconfig
index 6d0198f..ea53b96 100644
--- a/configs/pine64_plus_defconfig
+++ b/configs/pine64_plus_defconfig
@@ -1,5 +1,5 @@
 CONFIG_ARM=y
-CONFIG_ENABLE_ARM_SOC_BOOT0_HOOK=y
+CONFIG_RESERVE_ALLWINNER_BOOT0_HEADER=y
 CONFIG_ARCH_SUNXI=y
 CONFIG_MACH_SUN50I=y
 CONFIG_DRAM_CLK=672

The Allwinner A64 SoC starts execution in AArch32 mode, and both the boot ROM and Allwinner's boot0 keep running in this mode. So U-Boot gets entered in 32-bit, although we want it to run in AArch64.
By using a "magic" instruction, which happens to be an almost-NOP in AArch64 and a branch in AArch32, we differentiate between being entered in 64-bit or 32-bit mode. If in 64-bit mode, we proceed with the branch to reset, but in 32-bit mode we trigger an RMR write to bring the core into AArch64/EL3 and re-enter U-Boot at CONFIG_SYS_TEXT_BASE. This allows a 64-bit U-Boot to be entered in either 32-bit or 64-bit mode, so we can use the same start code for the SPL and U-Boot proper.
We use the existing custom header (boot0.h) functionality, but now restrict the existing boot0 header reservation to the non-SPL build. An SPL wouldn't need such a header anyway. This allows both options to be defined, so we can use one for the SPL and the other for U-Boot proper.
Also add arch/arm/mach-sunxi/rmr_switch.S, which contains the original ARM assembly code and instructions on how to regenerate the encoded version.
Signed-off-by: Andre Przywara andre.przywara@arm.com
---
 arch/arm/include/asm/arch-sunxi/boot0.h | 30 ++++++++++++++++++++++++
 arch/arm/mach-sunxi/rmr_switch.S        | 41 +++++++++++++++++++++++++++++++++
 board/sunxi/Kconfig                     | 14 +++++++++++
 3 files changed, 85 insertions(+)
 create mode 100644 arch/arm/mach-sunxi/rmr_switch.S

diff --git a/arch/arm/include/asm/arch-sunxi/boot0.h b/arch/arm/include/asm/arch-sunxi/boot0.h
index 6a13db5..9c6d82d 100644
--- a/arch/arm/include/asm/arch-sunxi/boot0.h
+++ b/arch/arm/include/asm/arch-sunxi/boot0.h
@@ -4,6 +4,36 @@
  * SPDX-License-Identifier: GPL-2.0+
  */

+#if defined(CONFIG_RESERVE_ALLWINNER_BOOT0_HEADER) && !defined(CONFIG_SPL_BUILD)
 /* reserve space for BOOT0 header information */
 	b	reset
 	.space	1532
+#elif defined(CONFIG_ARM_BOOT_HOOK_RMR)
+/*
+ * Switch into AArch64 if needed.
+ * Refer to arch/arm/mach-sunxi/rmr_switch.S for the original source.
+ */
+	tst	x0, x0			// this is "b #0x84" in ARM
+	b	reset
+	.space	0x7c
+	.word	0xe59f1024	// ldr	r1, [pc, #36] ; 0x017000a0
+	.word	0xe59f0024	// ldr	r0, [pc, #36] ; CONFIG_*_TEXT_BASE
+	.word	0xe5810000	// str	r0, [r1]
+	.word	0xf57ff04f	// dsb	sy
+	.word	0xf57ff06f	// isb	sy
+	.word	0xee1c0f50	// mrc	15, 0, r0, cr12, cr0, {2} ; RMR
+	.word	0xe3800003	// orr	r0, r0, #3
+	.word	0xee0c0f50	// mcr	15, 0, r0, cr12, cr0, {2} ; RMR
+	.word	0xf57ff06f	// isb	sy
+	.word	0xe320f003	// wfi
+	.word	0xeafffffd	// b	@wfi
+	.word	0x017000a0	// writeable RVBAR mapping address
+#ifdef CONFIG_SPL_BUILD
+	.word	CONFIG_SPL_TEXT_BASE
+#else
+	.word	CONFIG_SYS_TEXT_BASE
+#endif
+#else
+/* normal execution */
+	b	reset
+#endif
diff --git a/arch/arm/mach-sunxi/rmr_switch.S b/arch/arm/mach-sunxi/rmr_switch.S
new file mode 100644
index 0000000..cefa930
--- /dev/null
+++ b/arch/arm/mach-sunxi/rmr_switch.S
@@ -0,0 +1,41 @@
+@
+@ ARMv8 RMR reset sequence on Allwinner SoCs.
+@
+@ All 64-bit capable Allwinner SoCs reset in AArch32 (and continue to
+@ execute the Boot ROM in this state), so we need to switch to AArch64
+@ at some point.
+@ Section G6.2.133 of the ARMv8 ARM describes the Reset Management Register
+@ (RMR), which triggers a warm-reset of a core and can request to switch
+@ into a different execution state (AArch32 or AArch64).
+@ The address at which execution starts after the reset is held in the
+@ RVBAR system register, which is architecturally read-only.
+@ Allwinner provides a writable alias of this register in MMIO space, so
+@ we can easily set the start address of AArch64 code.
+@ This code below switches to AArch64 and starts execution at the specified
+@ start address. It needs to be assembled by an ARM(32) assembler and
+@ the machine code must be inserted as verbatim .word statements into the
+@ beginning of the AArch64 U-Boot code.
+@ To get the encoded bytes, use:
+@ ${CROSS_COMPILE}gcc -c -o rmr_switch.o rmr_switch.S
+@ ${CROSS_COMPILE}objdump -d rmr_switch.o
+@
+@ The resulting words should be inserted into the U-Boot file at
+@ arch/arm/include/asm/arch-sunxi/boot0.h.
+@
+@ This file is not built by the U-Boot build system, but provided only as a
+@ reference and to be able to regenerate a (probably fixed) version of this
+@ code found in encoded form in boot0.h.
+
+.text
+
+	ldr	r1, =0x017000a0		@ MMIO mapped RVBAR[0] register
+	ldr	r0, =0x57aA7add		@ start address, to be replaced
+	str	r0, [r1]
+	dsb	sy
+	isb	sy
+	mrc	15, 0, r0, cr12, cr0, 2	@ read RMR register
+	orr	r0, r0, #3		@ request reset in AArch64
+	mcr	15, 0, r0, cr12, cr0, 2	@ write RMR register
+	isb	sy
+1:	wfi
+	b	1b
diff --git a/board/sunxi/Kconfig b/board/sunxi/Kconfig
index 0cd57a2..f020573 100644
--- a/board/sunxi/Kconfig
+++ b/board/sunxi/Kconfig
@@ -142,6 +142,20 @@ config RESERVE_ALLWINNER_BOOT0_HEADER
 	  blob relies on this information to load and execute U-Boot.
 	  Only needed on 64-bit Allwinner boards so far when using boot0.

+config ARM_BOOT_HOOK_RMR
+	bool
+	depends on ARM64
+	default y
+	select ENABLE_ARM_SOC_BOOT0_HOOK
+	---help---
+	  Insert some ARM32 code at the very beginning of the U-Boot binary
+	  which uses an RMR register write to bring the core into AArch64 mode.
+	  The very first instruction acts as a switch, since it's carefully
+	  chosen to be a NOP in one mode and a branch in the other, so the
+	  code would only be executed if not already in AArch64.
+	  This allows both the SPL and the U-Boot proper to be entered in
+	  either mode and switch to AArch64 if needed.
+
 config DRAM_TYPE
 	int "sunxi dram type"
 	depends on MACH_SUN8I_A83T
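The layout of the header above can be cross-checked by decoding the branch words by hand. The following is a minimal sketch, not part of the patch: the helper names are made up for illustration, and it assumes that "tst x0, x0" assembles to the A64 word 0xea00001f, which is what the "b #0x84" comment in boot0.h implies. Read as an A32 instruction, that word is an unconditional branch whose target should land right behind the two A64 instructions and the 0x7c bytes of padding.

```c
#include <stdint.h>

/*
 * Hypothetical helpers, for illustration only (not U-Boot code).
 * Return 1 if the word is an unconditional A32 "B" (cond = AL, no link).
 */
static int a32_is_branch_always(uint32_t insn)
{
	return (insn & 0xff000000u) == 0xea000000u;
}

/* Compute the target of an A32 "B" instruction located at address addr. */
static uint32_t a32_branch_target(uint32_t insn, uint32_t addr)
{
	uint32_t imm24 = insn & 0x00ffffffu;
	/* imm24 is a signed word offset, sign-extend it by hand */
	int32_t offset = (imm24 & 0x00800000u) ?
			 (int32_t)imm24 - 0x01000000 : (int32_t)imm24;

	/* In A32 state the PC reads as the instruction address plus 8 */
	return addr + 8u + (uint32_t)offset * 4u;
}
```

With these helpers, a32_branch_target(0xea00001f, 0) yields 0x84, the offset of the first encoded .word, and a32_branch_target(0xeafffffd, 0xac) yields 0xa8, the wfi in front of the final branch.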

On Mon, Dec 19, 2016 at 01:50:02AM +0000, Andre Przywara wrote:
> The Allwinner A64 SoC starts execution in AArch32 mode, and both the boot ROM and Allwinner's boot0 keep running in this mode. So U-Boot gets entered in 32-bit, although we want it to run in AArch64.
> By using a "magic" instruction, which happens to be an almost-NOP in AArch64 and a branch in AArch32, we differentiate between being entered in 64-bit or 32-bit mode. If in 64-bit mode, we proceed with the branch to reset, but in 32-bit mode we trigger an RMR write to bring the core into AArch64/EL3 and re-enter U-Boot at CONFIG_SYS_TEXT_BASE. This allows a 64-bit U-Boot to be entered in either 32-bit or 64-bit mode, so we can use the same start code for the SPL and U-Boot proper.
> We use the existing custom header (boot0.h) functionality, but now restrict the existing boot0 header reservation to the non-SPL build. An SPL wouldn't need such a header anyway. This allows both options to be defined, so we can use one for the SPL and the other for U-Boot proper.
> Also add arch/arm/mach-sunxi/rmr_switch.S, which contains the original ARM assembly code and instructions on how to regenerate the encoded version.
> Signed-off-by: Andre Przywara andre.przywara@arm.com
Acked-by: Maxime Ripard maxime.ripard@free-electrons.com
Thanks, Maxime

To avoid enumerating the very same DRAM values in defconfig files for each and every Allwinner A64 board out there, let's put some sane default values in the Kconfig file. Boards with different needs can override them at any time.
Signed-off-by: Andre Przywara andre.przywara@arm.com
Reviewed-by: Simon Glass sjg@chromium.org
---
 board/sunxi/Kconfig           | 2 ++
 configs/pine64_plus_defconfig | 2 --
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/board/sunxi/Kconfig b/board/sunxi/Kconfig
index f020573..c2eb85e 100644
--- a/board/sunxi/Kconfig
+++ b/board/sunxi/Kconfig
@@ -168,6 +168,7 @@ config DRAM_CLK
 	default 792 if MACH_SUN9I
 	default 312 if MACH_SUN6I || MACH_SUN8I
 	default 360 if MACH_SUN4I || MACH_SUN5I || MACH_SUN7I
+	default 672 if MACH_SUN50I
 	---help---
 	Set the dram clock speed, valid range 240 - 480 (prior to sun9i),
 	must be a multiple of 24. For the sun9i (A80), the tested values
@@ -187,6 +188,7 @@ config DRAM_ZQ
 	default 123 if MACH_SUN4I || MACH_SUN5I || MACH_SUN6I || MACH_SUN8I
 	default 127 if MACH_SUN7I
 	default 4145117 if MACH_SUN9I
+	default 3881915 if MACH_SUN50I
 	---help---
 	Set the dram zq value.

diff --git a/configs/pine64_plus_defconfig b/configs/pine64_plus_defconfig
index ea53b96..ebc24b8 100644
--- a/configs/pine64_plus_defconfig
+++ b/configs/pine64_plus_defconfig
@@ -2,8 +2,6 @@ CONFIG_ARM=y
 CONFIG_RESERVE_ALLWINNER_BOOT0_HEADER=y
 CONFIG_ARCH_SUNXI=y
 CONFIG_MACH_SUN50I=y
-CONFIG_DRAM_CLK=672
-CONFIG_DRAM_ZQ=3881915
 CONFIG_DEFAULT_DEVICE_TREE="sun50i-a64-pine64-plus"
 # CONFIG_SYS_MALLOC_CLEAR_ON_INIT is not set
 CONFIG_CONSOLE_MUX=y

On Mon, Dec 19, 2016 at 01:50:03AM +0000, Andre Przywara wrote:
> To avoid enumerating the very same DRAM values in defconfig files for each and every Allwinner A64 board out there, let's put some sane default values in the Kconfig file. Boards with different needs can override them at any time.
> Signed-off-by: Andre Przywara andre.przywara@arm.com
> Reviewed-by: Simon Glass sjg@chromium.org
Acked-by: Maxime Ripard maxime.ripard@free-electrons.com
Maxime

From: Philipp Tomsich philipp.tomsich@theobroma-systems.com
So far the MBUS priority setup was done by writing "magic" values taken from a DRAM controller register dump after a boot0 run. By peeking at the Linux (sic!) MBUS driver [1] from the Allwinner BSP kernel, we learned more about the actual meaning of those bits. Add macros and refactor the setup function to make the MBUS setup much more readable and meaningful. The actual values used now are a transformation of the values used before, which are assembled by the new code to result in the same register writes. So this rework does not change any settings, and the code size stays the same.
The respective source files in the BSP kernel had a proper GPL header, so lifting this code and information into U-Boot is legal.
[Andre: provide a convenience macro to fit definitions on one line]
[1] https://github.com/longsleep/linux-pine64/blob/lichee-dev-v3.10.65/drivers/b...
Signed-off-by: Philipp Tomsich philipp.tomsich@theobroma-systems.com
Signed-off-by: Andre Przywara andre.przywara@arm.com
---
 arch/arm/mach-sunxi/dram_sun8i_h3.c | 88 +++++++++++++++++++++++++++----------
 1 file changed, 64 insertions(+), 24 deletions(-)

diff --git a/arch/arm/mach-sunxi/dram_sun8i_h3.c b/arch/arm/mach-sunxi/dram_sun8i_h3.c
index b08b8e6..8925446 100644
--- a/arch/arm/mach-sunxi/dram_sun8i_h3.c
+++ b/arch/arm/mach-sunxi/dram_sun8i_h3.c
@@ -94,6 +94,58 @@ static void mctl_dq_delay(u32 read, u32 write)
 	udelay(1);
 }

+enum {
+	MBUS_PORT_CPU = 0,
+	MBUS_PORT_GPU = 1,
+	MBUS_PORT_UNUSED = 2,
+	MBUS_PORT_DMA = 3,
+	MBUS_PORT_VE = 4,
+	MBUS_PORT_CSI = 5,
+	MBUS_PORT_NAND = 6,
+	MBUS_PORT_SS = 7,
+	MBUS_PORT_TS = 8,
+	MBUS_PORT_DI = 9,
+	MBUS_PORT_DE = 10,
+	MBUS_PORT_DE_CFD = 11,
+};
+
+enum {
+	MBUS_QOS_LOWEST = 0,
+	MBUS_QOS_LOW,
+	MBUS_QOS_HIGH,
+	MBUS_QOS_HIGHEST
+};
+
+inline void mbus_configure_port(u8 port,
+				bool bwlimit,
+				bool priority,
+				u8 qos,		/* MBUS_QOS_LOWEST .. MBUS_QOS_HIGHEST */
+				u8 waittime,	/* 0 .. 0xf */
+				u8 acs,		/* 0 .. 0xff */
+				u16 bwl0,	/* 0 .. 0xffff, bandwidth limit in MB/s */
+				u16 bwl1,
+				u16 bwl2)
+{
+	struct sunxi_mctl_com_reg * const mctl_com =
+			(struct sunxi_mctl_com_reg *)SUNXI_DRAM_COM_BASE;
+
+	const u32 cfg0 = ( (bwlimit ? (1 << 0) : 0)
+			   | (priority ? (1 << 1) : 0)
+			   | ((qos & 0x3) << 2)
+			   | ((waittime & 0xf) << 4)
+			   | ((acs & 0xff) << 8)
+			   | (bwl0 << 16) );
+	const u32 cfg1 = ((u32)bwl2 << 16) | (bwl1 & 0xffff);
+
+	debug("MBUS port %d cfg0 %08x cfg1 %08x\n", port, cfg0, cfg1);
+	writel(cfg0, &mctl_com->mcr[port][0]);
+	writel(cfg1, &mctl_com->mcr[port][1]);
+}
+
+#define MBUS_CONF(port, bwlimit, qos, acs, bwl0, bwl1, bwl2)	\
+	mbus_configure_port(MBUS_PORT_ ## port, bwlimit, false, \
+			    MBUS_QOS_ ## qos, 0, acs, bwl0, bwl1, bwl2)
+
 static void mctl_set_master_priority(void)
 {
 	struct sunxi_mctl_com_reg * const mctl_com =
@@ -105,30 +157,18 @@ static void mctl_set_master_priority(void)
 	/* set cpu high priority */
 	writel(0x00000001, &mctl_com->mapr);

-	writel(0x0200000d, &mctl_com->mcr[0][0]);
-	writel(0x00800100, &mctl_com->mcr[0][1]);
-	writel(0x06000009, &mctl_com->mcr[1][0]);
-	writel(0x01000400, &mctl_com->mcr[1][1]);
-	writel(0x0200000d, &mctl_com->mcr[2][0]);
-	writel(0x00600100, &mctl_com->mcr[2][1]);
-	writel(0x0100000d, &mctl_com->mcr[3][0]);
-	writel(0x00200080, &mctl_com->mcr[3][1]);
-	writel(0x07000009, &mctl_com->mcr[4][0]);
-	writel(0x01000640, &mctl_com->mcr[4][1]);
-	writel(0x0100000d, &mctl_com->mcr[5][0]);
-	writel(0x00200080, &mctl_com->mcr[5][1]);
-	writel(0x01000009, &mctl_com->mcr[6][0]);
-	writel(0x00400080, &mctl_com->mcr[6][1]);
-	writel(0x0100000d, &mctl_com->mcr[7][0]);
-	writel(0x00400080, &mctl_com->mcr[7][1]);
-	writel(0x0100000d, &mctl_com->mcr[8][0]);
-	writel(0x00400080, &mctl_com->mcr[8][1]);
-	writel(0x04000009, &mctl_com->mcr[9][0]);
-	writel(0x00400100, &mctl_com->mcr[9][1]);
-	writel(0x2000030d, &mctl_com->mcr[10][0]);
-	writel(0x04001800, &mctl_com->mcr[10][1]);
-	writel(0x04000009, &mctl_com->mcr[11][0]);
-	writel(0x00400120, &mctl_com->mcr[11][1]);
+	MBUS_CONF(   CPU,  true, HIGHEST, 0,  512,  256,  128);
+	MBUS_CONF(   GPU,  true,    HIGH, 0, 1536, 1024,  256);
+	MBUS_CONF(UNUSED,  true, HIGHEST, 0,  512,  256,   96);
+	MBUS_CONF(   DMA,  true, HIGHEST, 0,  256,  128,   32);
+	MBUS_CONF(    VE,  true,    HIGH, 0, 1792, 1600,  256);
+	MBUS_CONF(   CSI,  true, HIGHEST, 0,  256,  128,   32);
+	MBUS_CONF(  NAND,  true,    HIGH, 0,  256,  128,   64);
+	MBUS_CONF(    SS,  true, HIGHEST, 0,  256,  128,   64);
+	MBUS_CONF(    TS,  true, HIGHEST, 0,  256,  128,   64);
+	MBUS_CONF(    DI,  true,    HIGH, 0, 1024,  256,   64);
+	MBUS_CONF(    DE,  true, HIGHEST, 3, 8192, 6120, 1024);
+	MBUS_CONF(DE_CFD,  true,    HIGH, 0, 1024,  288,   64);
 }

 static void mctl_set_timing_params(struct dram_para *para)
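To see how the named fields map back onto the old raw register values, the bit packing from mbus_configure_port() can be tried out on a host machine. This is only a sketch under a few assumptions: the helper names mbus_cfg0() and mbus_cfg1() are invented here, the register writes are left out, and the explicit uint32_t casts are an addition so that the host build never shifts into the sign bit. It is spot-checked against the CPU and GPU ports only.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
	MBUS_QOS_LOWEST = 0,
	MBUS_QOS_LOW,
	MBUS_QOS_HIGH,
	MBUS_QOS_HIGHEST
};

/* Hypothetical host-side helper: pack the first config word of a port */
static uint32_t mbus_cfg0(bool bwlimit, bool priority, uint8_t qos,
			  uint8_t waittime, uint8_t acs, uint16_t bwl0)
{
	return (bwlimit ? 1u << 0 : 0u) |
	       (priority ? 1u << 1 : 0u) |
	       ((uint32_t)(qos & 0x3) << 2) |
	       ((uint32_t)(waittime & 0xf) << 4) |
	       ((uint32_t)acs << 8) |
	       ((uint32_t)bwl0 << 16);
}

/* Hypothetical host-side helper: pack the second config word of a port */
static uint32_t mbus_cfg1(uint16_t bwl1, uint16_t bwl2)
{
	return ((uint32_t)bwl2 << 16) | bwl1;
}
```

For the CPU port (bandwidth limit on, HIGHEST, limits 512/256/128) this gives 0x0200000d and 0x00800100, and for the GPU port (HIGH, 1536/1024/256) it gives 0x06000009 and 0x01000400, matching the first four writel() calls that the patch removes.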

On Mon, Dec 19, 2016 at 01:50:04AM +0000, Andre Przywara wrote:
> From: Philipp Tomsich philipp.tomsich@theobroma-systems.com
> So far the MBUS priority setup was done by writing "magic" values taken from a DRAM controller register dump after a boot0 run. By peeking at the Linux (sic!) MBUS driver [1] from the Allwinner BSP kernel, we learned more about the actual meaning of those bits. Add macros and refactor the setup function to make the MBUS setup much more readable and meaningful. The actual values used now are a transformation of the values used before, which are assembled by the new code to result in the same register writes. So this rework does not change any settings, and the code size stays the same.
> The respective source files in the BSP kernel had a proper GPL header, so lifting this code and information into U-Boot is legal.
> [Andre: provide a convenience macro to fit definitions on one line]
> [1] https://github.com/longsleep/linux-pine64/blob/lichee-dev-v3.10.65/drivers/b...
> Signed-off-by: Philipp Tomsich philipp.tomsich@theobroma-systems.com
> Signed-off-by: Andre Przywara andre.przywara@arm.com
Nice cleanup!
Acked-by: Maxime Ripard maxime.ripard@free-electrons.com
Thanks, Maxime

From: Jens Kuske jenskuske@gmail.com
The IOCR registers got renamed to BDLR to match the public documentation of similar controllers.
Signed-off-by: Jens Kuske jenskuske@gmail.com
Signed-off-by: Andre Przywara andre.przywara@arm.com
---
 arch/arm/include/asm/arch-sunxi/dram_sun8i_h3.h | 43 ++++++++++++++-----------
 arch/arm/mach-sunxi/dram_sun8i_h3.c             | 34 +++++++++----------
 2 files changed, 41 insertions(+), 36 deletions(-)

diff --git a/arch/arm/include/asm/arch-sunxi/dram_sun8i_h3.h b/arch/arm/include/asm/arch-sunxi/dram_sun8i_h3.h
index d0f2b8a..346538c 100644
--- a/arch/arm/include/asm/arch-sunxi/dram_sun8i_h3.h
+++ b/arch/arm/include/asm/arch-sunxi/dram_sun8i_h3.h
@@ -106,20 +106,23 @@ struct sunxi_mctl_ctl_reg {
 	u32 perfhpr[2];		/* 0x1c4 */
 	u32 perflpr[2];		/* 0x1cc */
 	u32 perfwr[2];		/* 0x1d4 */
-	u8 res8[0x2c];		/* 0x1dc */
-	u32 aciocr;		/* 0x208 */
-	u8 res9[0xf4];		/* 0x20c */
+	u8 res8[0x24];		/* 0x1dc */
+	u32 acmdlr;		/* 0x200 AC master delay line register */
+	u32 aclcdlr;		/* 0x204 AC local calibrated delay line register */
+	u32 aciocr;		/* 0x208 AC I/O configuration register */
+	u8 res9[0x4];		/* 0x20c */
+	u32 acbdlr[31];		/* 0x210 AC bit delay line registers */
+	u8 res10[0x74];		/* 0x28c */
 	struct {		/* 0x300 DATX8 modules*/
-		u32 mdlr;		/* 0x00 */
-		u32 lcdlr[3];		/* 0x04 */
-		u32 iocr[11];		/* 0x10 IO configuration register */
-		u32 bdlr6;		/* 0x3c */
-		u32 gtr;		/* 0x40 */
-		u32 gcr;		/* 0x44 */
-		u32 gsr[3];		/* 0x48 */
+		u32 mdlr;		/* 0x00 master delay line register */
+		u32 lcdlr[3];		/* 0x04 local calibrated delay line registers */
+		u32 bdlr[12];		/* 0x10 bit delay line registers */
+		u32 gtr;		/* 0x40 general timing register */
+		u32 gcr;		/* 0x44 general configuration register */
+		u32 gsr[3];		/* 0x48 general status registers */
 		u8 res0[0x2c];		/* 0x54 */
-	} datx[4];
-	u8 res10[0x388];	/* 0x500 */
+	} dx[4];
+	u8 res11[0x388];	/* 0x500 */
 	u32 upd2;		/* 0x888 */
 };

@@ -172,14 +175,16 @@ struct sunxi_mctl_ctl_reg {

 #define PGSR_INIT_DONE	(0x1 << 0)	/* PHY init done */

-#define ZQCR_PWRDOWN	(0x1 << 31)	/* ZQ power down */
+#define ZQCR_PWRDOWN	(1U << 31)	/* ZQ power down */

-#define DATX_IOCR_DQ(x)	(x)		/* DQ0-7 IOCR index */
-#define DATX_IOCR_DM	(8)		/* DM IOCR index */
-#define DATX_IOCR_DQS	(9)		/* DQS IOCR index */
-#define DATX_IOCR_DQSN	(10)		/* DQSN IOCR index */
+#define ACBDLR_WRITE_DELAY(x)	((x) << 8)

-#define DATX_IOCR_WRITE_DELAY(x)	((x) << 8)
-#define DATX_IOCR_READ_DELAY(x)	((x) << 0)
+#define DXBDLR_DQ(x)	(x)		/* DQ0-7 BDLR index */
+#define DXBDLR_DM	8		/* DM BDLR index */
+#define DXBDLR_DQS	9		/* DQS BDLR index */
+#define DXBDLR_DQSN	10		/* DQSN BDLR index */
+
+#define DXBDLR_WRITE_DELAY(x)	((x) << 8)
+#define DXBDLR_READ_DELAY(x)	((x) << 0)

 #endif /* _SUNXI_DRAM_SUN8I_H3_H */
diff --git a/arch/arm/mach-sunxi/dram_sun8i_h3.c b/arch/arm/mach-sunxi/dram_sun8i_h3.c
index 8925446..539268f 100644
--- a/arch/arm/mach-sunxi/dram_sun8i_h3.c
+++ b/arch/arm/mach-sunxi/dram_sun8i_h3.c
@@ -72,21 +72,21 @@ static void mctl_dq_delay(u32 read, u32 write)
 	u32 val;

 	for (i = 0; i < 4; i++) {
-		val = DATX_IOCR_WRITE_DELAY((write >> (i * 4)) & 0xf) |
-		      DATX_IOCR_READ_DELAY(((read >> (i * 4)) & 0xf) * 2);
+		val = DXBDLR_WRITE_DELAY((write >> (i * 4)) & 0xf) |
+		      DXBDLR_READ_DELAY(((read >> (i * 4)) & 0xf) * 2);

-		for (j = DATX_IOCR_DQ(0); j <= DATX_IOCR_DM; j++)
-			writel(val, &mctl_ctl->datx[i].iocr[j]);
+		for (j = DXBDLR_DQ(0); j <= DXBDLR_DM; j++)
+			writel(val, &mctl_ctl->dx[i].bdlr[j]);
 	}

 	clrbits_le32(&mctl_ctl->pgcr[0], 1 << 26);

 	for (i = 0; i < 4; i++) {
-		val = DATX_IOCR_WRITE_DELAY((write >> (16 + i * 4)) & 0xf) |
-		      DATX_IOCR_READ_DELAY((read >> (16 + i * 4)) & 0xf);
+		val = DXBDLR_WRITE_DELAY((write >> (16 + i * 4)) & 0xf) |
+		      DXBDLR_READ_DELAY((read >> (16 + i * 4)) & 0xf);

-		writel(val, &mctl_ctl->datx[i].iocr[DATX_IOCR_DQS]);
-		writel(val, &mctl_ctl->datx[i].iocr[DATX_IOCR_DQSN]);
+		writel(val, &mctl_ctl->dx[i].bdlr[DXBDLR_DQS]);
+		writel(val, &mctl_ctl->dx[i].bdlr[DXBDLR_DQSN]);
 	}

 	setbits_le32(&mctl_ctl->pgcr[0], 1 << 26);
@@ -384,7 +384,7 @@ static int mctl_channel_init(struct dram_para *para)

 	/* set dramc odt */
 	for (i = 0; i < 4; i++)
-		clrsetbits_le32(&mctl_ctl->datx[i].gcr, (0x3 << 4) |
+		clrsetbits_le32(&mctl_ctl->dx[i].gcr, (0x3 << 4) |
 				(0x1 << 1) | (0x3 << 2) | (0x3 << 12) |
 				(0x3 << 14),
 				IS_ENABLED(CONFIG_DRAM_ODT_EN) ? 0x0 : 0x2);
@@ -404,8 +404,8 @@ static int mctl_channel_init(struct dram_para *para)

 	/* set half DQ */
 	if (para->bus_width != 32) {
-		writel(0x0, &mctl_ctl->datx[2].gcr);
-		writel(0x0, &mctl_ctl->datx[3].gcr);
+		writel(0x0, &mctl_ctl->dx[2].gcr);
+		writel(0x0, &mctl_ctl->dx[3].gcr);
 	}

 	/* data training configuration */
@@ -426,17 +426,17 @@ static int mctl_channel_init(struct dram_para *para)
 	/* detect ranks and bus width */
 	if (readl(&mctl_ctl->pgsr[0]) & (0xfe << 20)) {
 		/* only one rank */
-		if (((readl(&mctl_ctl->datx[0].gsr[0]) >> 24) & 0x2) ||
-		    ((readl(&mctl_ctl->datx[1].gsr[0]) >> 24) & 0x2)) {
+		if (((readl(&mctl_ctl->dx[0].gsr[0]) >> 24) & 0x2) ||
+		    ((readl(&mctl_ctl->dx[1].gsr[0]) >> 24) & 0x2)) {
 			clrsetbits_le32(&mctl_ctl->dtcr, 0xf << 24, 0x1 << 24);
 			para->dual_rank = 0;
 		}

 		/* only half DQ width */
-		if (((readl(&mctl_ctl->datx[2].gsr[0]) >> 24) & 0x1) ||
-		    ((readl(&mctl_ctl->datx[3].gsr[0]) >> 24) & 0x1)) {
-			writel(0x0, &mctl_ctl->datx[2].gcr);
-			writel(0x0, &mctl_ctl->datx[3].gcr);
+		if (((readl(&mctl_ctl->dx[2].gsr[0]) >> 24) & 0x1) ||
+		    ((readl(&mctl_ctl->dx[3].gsr[0]) >> 24) & 0x1)) {
+			writel(0x0, &mctl_ctl->dx[2].gcr);
+			writel(0x0, &mctl_ctl->dx[3].gcr);
 			para->bus_width = 16;
 		}
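Since the rename also reshuffles the reserved padding, the offset comments in the new struct layout can be sanity-checked with offsetof() on a host. The struct below is a cut-down stand-in written for this purpose, not the real header: the name mctl_ctl_sketch and the head[] placeholder (covering everything below 0x200) are assumptions, and uint32_t/uint8_t stand in for u32/u8.

```c
#include <stddef.h>
#include <stdint.h>

/* Cut-down, hypothetical copy of the tail of struct sunxi_mctl_ctl_reg */
struct mctl_ctl_sketch {
	uint8_t  head[0x200];	/* placeholder for all registers below 0x200 */
	uint32_t acmdlr;	/* 0x200 */
	uint32_t aclcdlr;	/* 0x204 */
	uint32_t aciocr;	/* 0x208 */
	uint8_t  res9[0x4];	/* 0x20c */
	uint32_t acbdlr[31];	/* 0x210 */
	uint8_t  res10[0x74];	/* 0x28c */
	struct {		/* 0x300 */
		uint32_t mdlr;		/* 0x00 */
		uint32_t lcdlr[3];	/* 0x04 */
		uint32_t bdlr[12];	/* 0x10, was iocr[11] plus bdlr6 */
		uint32_t gtr;		/* 0x40 */
		uint32_t gcr;		/* 0x44 */
		uint32_t gsr[3];	/* 0x48 */
		uint8_t  res0[0x2c];	/* 0x54 */
	} dx[4];
	uint8_t  res11[0x388];	/* 0x500 */
	uint32_t upd2;		/* 0x888 */
};

/* Compile-time checks of the offsets given in the comments */
_Static_assert(offsetof(struct mctl_ctl_sketch, aciocr) == 0x208, "aciocr");
_Static_assert(offsetof(struct mctl_ctl_sketch, acbdlr) == 0x210, "acbdlr");
_Static_assert(offsetof(struct mctl_ctl_sketch, dx) == 0x300, "dx");
_Static_assert(offsetof(struct mctl_ctl_sketch, upd2) == 0x888, "upd2");
```

The old iocr[11] and bdlr6 together occupy the same twelve words as the new bdlr[12], so gtr stays at 0x40 within each DATX8 block and every block remains 0x80 bytes long.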

From: Jens Kuske jenskuske@gmail.com
So far the DRAM driver for the H3 SoC (and apparently boot0/libdram as well) only applied coarse delay line settings, with one delay value for all the data lines in each byte lane and one value for the control lines.
Instead of setting the delays for whole bytes only, allow setting them for each individual bit. Also add support for address/command lane delays.
For the purpose of this patch the rules for the existing coarse settings were just applied to the new scheme, so the actual register writes don't change for the H3. Other SoCs will make proper use of this feature later.
With a stock GCC 5.3.0 this increases the dram_sun8i_h3.o code size from 2296 to 2344 bytes.
[Andre: move delay parameters into macros to ease later sharing, use defines for numbers of delay registers, extend commit message]
Signed-off-by: Jens Kuske jenskuske@gmail.com
Signed-off-by: Andre Przywara andre.przywara@arm.com
---
 arch/arm/mach-sunxi/dram_sun8i_h3.c | 65 ++++++++++++++++++++++---------------
 1 file changed, 38 insertions(+), 27 deletions(-)

diff --git a/arch/arm/mach-sunxi/dram_sun8i_h3.c b/arch/arm/mach-sunxi/dram_sun8i_h3.c
index 539268f..f89ce5c 100644
--- a/arch/arm/mach-sunxi/dram_sun8i_h3.c
+++ b/arch/arm/mach-sunxi/dram_sun8i_h3.c
@@ -15,13 +15,19 @@
 #include <asm/arch/dram.h>
 #include <linux/kconfig.h>

+#define BITS_PER_BYTE		8
+#define NR_OF_BYTE_LANES	(32 / BITS_PER_BYTE)
+/* The eight data lines (DQn) plus DM, DQS and DQSN */
+#define LINES_PER_BYTE_LANE	(BITS_PER_BYTE + 3)
+
 struct dram_para {
-	u32 read_delays;
-	u32 write_delays;
 	u16 page_size;
 	u8 bus_width;
 	u8 dual_rank;
 	u8 row_bits;
+	const u8 dx_read_delays[NR_OF_BYTE_LANES][LINES_PER_BYTE_LANE];
+	const u8 dx_write_delays[NR_OF_BYTE_LANES][LINES_PER_BYTE_LANE];
+	const u8 ac_delays[31];
 };

 static inline int ns_to_t(int nanoseconds)
@@ -64,34 +70,25 @@ static void mctl_phy_init(u32 val)
 	mctl_await_completion(&mctl_ctl->pgsr[0], PGSR_INIT_DONE, 0x1);
 }

-static void mctl_dq_delay(u32 read, u32 write)
+static void mctl_set_bit_delays(struct dram_para *para)
 {
 	struct sunxi_mctl_ctl_reg * const mctl_ctl =
 			(struct sunxi_mctl_ctl_reg *)SUNXI_DRAM_CTL0_BASE;
 	int i, j;
-	u32 val;
-
-	for (i = 0; i < 4; i++) {
-		val = DXBDLR_WRITE_DELAY((write >> (i * 4)) & 0xf) |
-		      DXBDLR_READ_DELAY(((read >> (i * 4)) & 0xf) * 2);
-
-		for (j = DXBDLR_DQ(0); j <= DXBDLR_DM; j++)
-			writel(val, &mctl_ctl->dx[i].bdlr[j]);
-	}

 	clrbits_le32(&mctl_ctl->pgcr[0], 1 << 26);

-	for (i = 0; i < 4; i++) {
-		val = DXBDLR_WRITE_DELAY((write >> (16 + i * 4)) & 0xf) |
-		      DXBDLR_READ_DELAY((read >> (16 + i * 4)) & 0xf);
+	for (i = 0; i < NR_OF_BYTE_LANES; i++)
+		for (j = 0; j < LINES_PER_BYTE_LANE; j++)
+			writel(DXBDLR_WRITE_DELAY(para->dx_write_delays[i][j]) |
+			       DXBDLR_READ_DELAY(para->dx_read_delays[i][j]),
+			       &mctl_ctl->dx[i].bdlr[j]);

-		writel(val, &mctl_ctl->dx[i].bdlr[DXBDLR_DQS]);
-		writel(val, &mctl_ctl->dx[i].bdlr[DXBDLR_DQSN]);
-	}
+	for (i = 0; i < 31; i++)
+		writel(ACBDLR_WRITE_DELAY(para->ac_delays[i]),
+		       &mctl_ctl->acbdlr[i]);

 	setbits_le32(&mctl_ctl->pgcr[0], 1 << 26);
-
-	udelay(1);
 }

 enum {
@@ -412,11 +409,8 @@ static int mctl_channel_init(struct dram_para *para)
 	clrsetbits_le32(&mctl_ctl->dtcr, 0xf << 24,
 			(para->dual_rank ? 0x3 : 0x1) << 24);

-
-	if (para->read_delays || para->write_delays) {
-		mctl_dq_delay(para->read_delays, para->write_delays);
-		udelay(50);
-	}
+	mctl_set_bit_delays(para);
+	udelay(50);

 	mctl_zq_calibration(para);

@@ -490,6 +484,22 @@ static void mctl_auto_detect_dram_size(struct dram_para *para)
 			break;
 }

+#define SUN8I_H3_DX_READ_DELAYS					\
+	{{ 18, 18, 18, 18, 18, 18, 18, 18, 18,  0,  0 },	\
+	 { 14, 14, 14, 14, 14, 14, 14, 14, 14,  0,  0 },	\
+	 { 18, 18, 18, 18, 18, 18, 18, 18, 18,  0,  0 },	\
+	 { 14, 14, 14, 14, 14, 14, 14, 14, 14,  0,  0 }}
+#define SUN8I_H3_DX_WRITE_DELAYS				\
+	{{  0,  0,  0,  0,  0,  0,  0,  0,  0, 10, 10 },	\
+	 {  0,  0,  0,  0,  0,  0,  0,  0,  0, 10, 10 },	\
+	 {  0,  0,  0,  0,  0,  0,  0,  0,  0, 10, 10 },	\
+	 {  0,  0,  0,  0,  0,  0,  0,  0,  0,  6,  6 }}
+#define SUN8I_H3_AC_DELAYS					\
+	{  0,  0,  0,  0,  0,  0,  0,  0,			\
+	   0,  0,  0,  0,  0,  0,  0,  0,			\
+	   0,  0,  0,  0,  0,  0,  0,  0,			\
+	   0,  0,  0,  0,  0,  0,  0      }
+
 unsigned long sunxi_dram_init(void)
 {
 	struct sunxi_mctl_com_reg * const mctl_com =
@@ -498,12 +508,13 @@ unsigned long sunxi_dram_init(void)
 			(struct sunxi_mctl_ctl_reg *)SUNXI_DRAM_CTL0_BASE;

 	struct dram_para para = {
-		.read_delays = 0x00007979,	/* dram_tpr12 */
-		.write_delays = 0x6aaa0000,	/* dram_tpr11 */
 		.dual_rank = 0,
 		.bus_width = 32,
 		.row_bits = 15,
 		.page_size = 4096,
+		.dx_read_delays = SUN8I_H3_DX_READ_DELAYS,
+		.dx_write_delays = SUN8I_H3_DX_WRITE_DELAYS,
+		.ac_delays = SUN8I_H3_AC_DELAYS,
 	};

 	mctl_sys_init(&para);
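The per-bit tables in this patch can be derived mechanically from the two packed words they replace. The sketch below mirrors the rules of the removed mctl_dq_delay() on a host: the low halfword holds one nibble per byte lane for DQ0-7 and DM (with the read delay doubled), the high halfword one nibble per lane for DQS and DQSN. The function name expand_delays() is made up for this illustration and does not exist in U-Boot.

```c
#include <stdint.h>

#define NR_OF_BYTE_LANES	4
#define LINES_PER_BYTE_LANE	11	/* DQ0-7, DM, DQS, DQSN */

/* Hypothetical helper: expand the old packed delay words into tables */
static void expand_delays(uint32_t read, uint32_t write,
			  uint8_t rd[NR_OF_BYTE_LANES][LINES_PER_BYTE_LANE],
			  uint8_t wr[NR_OF_BYTE_LANES][LINES_PER_BYTE_LANE])
{
	for (int i = 0; i < NR_OF_BYTE_LANES; i++) {
		/* low halfword: DQ0-7 and DM share one nibble per lane */
		for (int j = 0; j <= 8; j++) {
			rd[i][j] = (uint8_t)(((read >> (i * 4)) & 0xf) * 2);
			wr[i][j] = (uint8_t)((write >> (i * 4)) & 0xf);
		}
		/* high halfword: DQS and DQSN share one nibble per lane */
		for (int j = 9; j <= 10; j++) {
			rd[i][j] = (uint8_t)((read >> (16 + i * 4)) & 0xf);
			wr[i][j] = (uint8_t)((write >> (16 + i * 4)) & 0xf);
		}
	}
}
```

Feeding in the old H3 values, read = 0x00007979 and write = 0x6aaa0000, reproduces the 18/14/18/14 read rows and the 10/10/10/6 DQS/DQSN write columns of SUN8I_H3_DX_READ_DELAYS and SUN8I_H3_DX_WRITE_DELAYS.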

On Mon, Dec 19, 2016 at 01:50:06AM +0000, Andre Przywara wrote:
> From: Jens Kuske jenskuske@gmail.com
> So far the DRAM driver for the H3 SoC (and apparently boot0/libdram as well) only applied coarse delay line settings, with one delay value for all the data lines in each byte lane and one value for the control lines.
> Instead of setting the delays for whole bytes only, allow setting them for each individual bit. Also add support for address/command lane delays.
> For the purpose of this patch the rules for the existing coarse settings were just applied to the new scheme, so the actual register writes don't change for the H3. Other SoCs will make proper use of this feature later.
> With a stock GCC 5.3.0 this increases the dram_sun8i_h3.o code size from 2296 to 2344 bytes.
> [Andre: move delay parameters into macros to ease later sharing, use defines for numbers of delay registers, extend commit message]
> Signed-off-by: Jens Kuske jenskuske@gmail.com
> Signed-off-by: Andre Przywara andre.przywara@arm.com
I said it earlier, but some comments on these new fields would really be welcome to document the structure and what values they're supposed to hold.

Hi,
On 19/12/16 09:57, Maxime Ripard wrote:
> On Mon, Dec 19, 2016 at 01:50:06AM +0000, Andre Przywara wrote:
>> From: Jens Kuske jenskuske@gmail.com
>> So far the DRAM driver for the H3 SoC (and apparently boot0/libdram as well) only applied coarse delay line settings, with one delay value for all the data lines in each byte lane and one value for the control lines.
>> Instead of setting the delays for whole bytes only, allow setting them for each individual bit. Also add support for address/command lane delays.
>> For the purpose of this patch the rules for the existing coarse settings were just applied to the new scheme, so the actual register writes don't change for the H3. Other SoCs will make proper use of this feature later.
>> With a stock GCC 5.3.0 this increases the dram_sun8i_h3.o code size from 2296 to 2344 bytes.
>> [Andre: move delay parameters into macros to ease later sharing, use defines for numbers of delay registers, extend commit message]
>> Signed-off-by: Jens Kuske jenskuske@gmail.com
>> Signed-off-by: Andre Przywara andre.przywara@arm.com
> I said it earlier, but some comments on these new fields would really be welcome to document the structure and what values they're supposed to hold.
I guess you know as much as I do on this topic. Apparently there are delays to compensate for differing trace lengths on the PCB, one for every bit line. On the 32-bit DRAM controller these are grouped in four groups of 8 bits each (hence byte lane).
Please keep in mind that there is no easily available documentation; some parts just copy what boot0/libdram does. As we don't know the exact meaning of these fields, I prefer not to add any guesses here. I was hoping that the defines I added in this version would shed some light on this? I am pretty sure of their meaning, but that's as far as it goes. For instance, I have no idea what the units are; for cycles they are quite big.
So frankly I don't really know what to add here still.
I can elaborate on how to get these actual values from an existing boot0/libdram, but that would be more of a documentation patch than actual code.
Cheers, Andre.

On Mon, Dec 19, 2016 at 10:53:32AM +0000, Andre Przywara wrote:
> Hi,
> On 19/12/16 09:57, Maxime Ripard wrote:
>> On Mon, Dec 19, 2016 at 01:50:06AM +0000, Andre Przywara wrote:
>>> From: Jens Kuske jenskuske@gmail.com
>>> So far the DRAM driver for the H3 SoC (and apparently boot0/libdram as well) only applied coarse delay line settings, with one delay value for all the data lines in each byte lane and one value for the control lines.
>>> Instead of setting the delays for whole bytes only, allow setting them for each individual bit. Also add support for address/command lane delays.
>>> For the purpose of this patch the rules for the existing coarse settings were just applied to the new scheme, so the actual register writes don't change for the H3. Other SoCs will make proper use of this feature later.
>>> With a stock GCC 5.3.0 this increases the dram_sun8i_h3.o code size from 2296 to 2344 bytes.
>>> [Andre: move delay parameters into macros to ease later sharing, use defines for numbers of delay registers, extend commit message]
>>> Signed-off-by: Jens Kuske jenskuske@gmail.com
>>> Signed-off-by: Andre Przywara andre.przywara@arm.com
>> I said it earlier, but some comments on these new fields would really be welcome to document the structure and what values they're supposed to hold.
> I guess you know as much as I do on this topic.
I'd say that you know much more than I do on this one ;)
> Apparently there are delays to compensate for differing trace lengths on the PCB, one for every bit line. On the 32-bit DRAM controller these are grouped in four groups of 8 bits each (hence byte lane).
> Please keep in mind that there is no easily available documentation; some parts just copy what boot0/libdram does. As we don't know the exact meaning of these fields, I prefer not to add any guesses here. I was hoping that the defines I added in this version would shed some light on this? I am pretty sure of their meaning, but that's as far as it goes. For instance, I have no idea what the units are; for cycles they are quite big.
> So frankly I don't really know what to add here still.
Ok.
> I can elaborate on how to get these actual values from an existing boot0/libdram, but that would be more of a documentation patch than actual code.
That would still be very valuable. If it makes sense, you can always put that in a separate patch.
Thanks, Maxime

From: Philipp Tomsich philipp.tomsich@theobroma-systems.com
Signed-off-by: Philipp Tomsich philipp.tomsich@theobroma-systems.com
Signed-off-by: Andre Przywara andre.przywara@arm.com
---
 arch/arm/mach-sunxi/clock_sun6i.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/mach-sunxi/clock_sun6i.c b/arch/arm/mach-sunxi/clock_sun6i.c
index 80cfc0b..8e39bbe 100644
--- a/arch/arm/mach-sunxi/clock_sun6i.c
+++ b/arch/arm/mach-sunxi/clock_sun6i.c
@@ -224,7 +224,7 @@ void clock_set_pll11(unsigned int clk, bool sigma_delta_enable)
 		(struct sunxi_ccm_reg *)SUNXI_CCM_BASE;

 	if (sigma_delta_enable)
-		writel(CCM_PLL11_PATTERN, &ccm->pll5_pattern_cfg);
+		writel(CCM_PLL11_PATTERN, &ccm->pll11_pattern_cfg0);

 	writel(CCM_PLL11_CTRL_EN | CCM_PLL11_CTRL_UPD |
 			(sigma_delta_enable ? CCM_PLL11_CTRL_SIGMA_DELTA_EN : 0) |

On Mon, Dec 19, 2016 at 01:50:07AM +0000, Andre Przywara wrote:
From: Philipp Tomsich philipp.tomsich@theobroma-systems.com
Signed-off-by: Philipp Tomsich philipp.tomsich@theobroma-systems.com Signed-off-by: Andre Przywara andre.przywara@arm.com
Acked-by: Maxime Ripard maxime.ripard@free-electrons.com
Thanks! Maxime

From: Jens Kuske jenskuske@gmail.com
The A64 DRAM controller is very similar to the H3 one, so the code can be reused with some small changes. This refactoring does not change the code size for the existing H3 part.
[Andre: rework from #ifdefs to using socid parameters in static functions, minor fixes, merging in fixes from Jens]
Signed-off-by: Jens Kuske jenskuske@gmail.com Signed-off-by: Andre Przywara andre.przywara@arm.com --- arch/arm/include/asm/arch-sunxi/clock_sun6i.h | 1 + arch/arm/include/asm/arch-sunxi/cpu.h | 3 + arch/arm/include/asm/arch-sunxi/dram.h | 2 +- arch/arm/include/asm/arch-sunxi/dram_sun8i_h3.h | 10 +- arch/arm/mach-sunxi/Makefile | 1 + arch/arm/mach-sunxi/clock_sun6i.c | 2 +- arch/arm/mach-sunxi/dram_sun8i_h3.c | 211 ++++++++++++++++++------ 7 files changed, 174 insertions(+), 56 deletions(-)
diff --git a/arch/arm/include/asm/arch-sunxi/clock_sun6i.h b/arch/arm/include/asm/arch-sunxi/clock_sun6i.h index be9fcfd..3f87672 100644 --- a/arch/arm/include/asm/arch-sunxi/clock_sun6i.h +++ b/arch/arm/include/asm/arch-sunxi/clock_sun6i.h @@ -322,6 +322,7 @@ struct sunxi_ccm_reg { #define CCM_DRAMCLK_CFG_DIV0_MASK (0xf << 8) #define CCM_DRAMCLK_CFG_SRC_PLL5 (0x0 << 20) #define CCM_DRAMCLK_CFG_SRC_PLL6x2 (0x1 << 20) +#define CCM_DRAMCLK_CFG_SRC_PLL11 (0x1 << 20) /* A64 only */ #define CCM_DRAMCLK_CFG_SRC_MASK (0x3 << 20) #define CCM_DRAMCLK_CFG_UPD (0x1 << 16) #define CCM_DRAMCLK_CFG_RST (0x1 << 31) diff --git a/arch/arm/include/asm/arch-sunxi/cpu.h b/arch/arm/include/asm/arch-sunxi/cpu.h index 73583ed..6f96a97 100644 --- a/arch/arm/include/asm/arch-sunxi/cpu.h +++ b/arch/arm/include/asm/arch-sunxi/cpu.h @@ -13,4 +13,7 @@ #include <asm/arch/cpu_sun4i.h> #endif
+#define SOCID_A64 0x1689 +#define SOCID_H3 0x1680 + #endif /* _SUNXI_CPU_H */ diff --git a/arch/arm/include/asm/arch-sunxi/dram.h b/arch/arm/include/asm/arch-sunxi/dram.h index e0be744..53e6d47 100644 --- a/arch/arm/include/asm/arch-sunxi/dram.h +++ b/arch/arm/include/asm/arch-sunxi/dram.h @@ -24,7 +24,7 @@ #include <asm/arch/dram_sun8i_a33.h> #elif defined(CONFIG_MACH_SUN8I_A83T) #include <asm/arch/dram_sun8i_a83t.h> -#elif defined(CONFIG_MACH_SUN8I_H3) +#elif defined(CONFIG_MACH_SUN8I_H3) || defined(CONFIG_MACH_SUN50I) #include <asm/arch/dram_sun8i_h3.h> #elif defined(CONFIG_MACH_SUN9I) #include <asm/arch/dram_sun9i.h> diff --git a/arch/arm/include/asm/arch-sunxi/dram_sun8i_h3.h b/arch/arm/include/asm/arch-sunxi/dram_sun8i_h3.h index 346538c..25d07d9 100644 --- a/arch/arm/include/asm/arch-sunxi/dram_sun8i_h3.h +++ b/arch/arm/include/asm/arch-sunxi/dram_sun8i_h3.h @@ -15,7 +15,8 @@
struct sunxi_mctl_com_reg { u32 cr; /* 0x00 control register */ - u8 res0[0xc]; /* 0x04 */ + u8 res0[0x8]; /* 0x04 */ + u32 tmr; /* 0x0c (unused on H3) */ u32 mcr[16][2]; /* 0x10 */ u32 bwcr; /* 0x90 bandwidth control register */ u32 maer; /* 0x94 master enable register */ @@ -32,7 +33,9 @@ struct sunxi_mctl_com_reg { u32 swoffr; /* 0xc4 */ u8 res2[0x8]; /* 0xc8 */ u32 cccr; /* 0xd0 */ - u8 res3[0x72c]; /* 0xd4 */ + u8 res3[0x54]; /* 0xd4 */ + u32 mdfs_bwlr[3]; /* 0x128 (unused on H3) */ + u8 res4[0x6cc]; /* 0x134 */ u32 protect; /* 0x800 */ };
@@ -81,7 +84,8 @@ struct sunxi_mctl_ctl_reg { u32 rfshtmg; /* 0x90 refresh timing */ u32 rfshctl1; /* 0x94 */ u32 pwrtmg; /* 0x98 */ - u8 res3[0x20]; /* 0x9c */ + u8 res3[0x1c]; /* 0x9c */ + u32 vtfcr; /* 0xb8 (unused on H3) */ u32 dqsgmr; /* 0xbc */ u32 dtcr; /* 0xc0 */ u32 dtar[4]; /* 0xc4 */ diff --git a/arch/arm/mach-sunxi/Makefile b/arch/arm/mach-sunxi/Makefile index e73114e..7daba11 100644 --- a/arch/arm/mach-sunxi/Makefile +++ b/arch/arm/mach-sunxi/Makefile @@ -50,4 +50,5 @@ obj-$(CONFIG_MACH_SUN8I_A33) += dram_sun8i_a33.o obj-$(CONFIG_MACH_SUN8I_A83T) += dram_sun8i_a83t.o obj-$(CONFIG_MACH_SUN8I_H3) += dram_sun8i_h3.o obj-$(CONFIG_MACH_SUN9I) += dram_sun9i.o +obj-$(CONFIG_MACH_SUN50I) += dram_sun8i_h3.o endif diff --git a/arch/arm/mach-sunxi/clock_sun6i.c b/arch/arm/mach-sunxi/clock_sun6i.c index 8e39bbe..d123b3a 100644 --- a/arch/arm/mach-sunxi/clock_sun6i.c +++ b/arch/arm/mach-sunxi/clock_sun6i.c @@ -217,7 +217,7 @@ done: } #endif
-#ifdef CONFIG_MACH_SUN8I_A33 +#if defined(CONFIG_MACH_SUN8I_A33) || defined(CONFIG_MACH_SUN50I) void clock_set_pll11(unsigned int clk, bool sigma_delta_enable) { struct sunxi_ccm_reg * const ccm = diff --git a/arch/arm/mach-sunxi/dram_sun8i_h3.c b/arch/arm/mach-sunxi/dram_sun8i_h3.c index f89ce5c..6ee73ae 100644 --- a/arch/arm/mach-sunxi/dram_sun8i_h3.c +++ b/arch/arm/mach-sunxi/dram_sun8i_h3.c @@ -13,6 +13,7 @@ #include <asm/io.h> #include <asm/arch/clock.h> #include <asm/arch/dram.h> +#include <asm/arch/cpu.h> #include <linux/kconfig.h>
#define BITS_PER_BYTE 8 @@ -37,30 +38,6 @@ static inline int ns_to_t(int nanoseconds) return DIV_ROUND_UP(ctrl_freq * nanoseconds, 1000); }
-static u32 bin_to_mgray(int val) -{ - static const u8 lookup_table[32] = { - 0x00, 0x01, 0x02, 0x03, 0x06, 0x07, 0x04, 0x05, - 0x0c, 0x0d, 0x0e, 0x0f, 0x0a, 0x0b, 0x08, 0x09, - 0x18, 0x19, 0x1a, 0x1b, 0x1e, 0x1f, 0x1c, 0x1d, - 0x14, 0x15, 0x16, 0x17, 0x12, 0x13, 0x10, 0x11, - }; - - return lookup_table[clamp(val, 0, 31)]; -} - -static int mgray_to_bin(u32 val) -{ - static const u8 lookup_table[32] = { - 0x00, 0x01, 0x02, 0x03, 0x06, 0x07, 0x04, 0x05, - 0x0e, 0x0f, 0x0c, 0x0d, 0x08, 0x09, 0x0a, 0x0b, - 0x1e, 0x1f, 0x1c, 0x1d, 0x18, 0x19, 0x1a, 0x1b, - 0x10, 0x11, 0x12, 0x13, 0x16, 0x17, 0x14, 0x15, - }; - - return lookup_table[val & 0x1f]; -} - static void mctl_phy_init(u32 val) { struct sunxi_mctl_ctl_reg * const mctl_ctl = @@ -143,13 +120,13 @@ inline void mbus_configure_port(u8 port, mbus_configure_port(MBUS_PORT_ ## port, bwlimit, false, \ MBUS_QOS_ ## qos, 0, acs, bwl0, bwl1, bwl2)
-static void mctl_set_master_priority(void) +static void mctl_set_master_priority_h3(void) { struct sunxi_mctl_com_reg * const mctl_com = (struct sunxi_mctl_com_reg *)SUNXI_DRAM_COM_BASE;
/* enable bandwidth limit windows and set windows size 1us */ - writel(0x00010190, &mctl_com->bwcr); + writel((1 << 16) | (400 << 0), &mctl_com->bwcr);
/* set cpu high priority */ writel(0x00000001, &mctl_com->mapr); @@ -168,7 +145,46 @@ static void mctl_set_master_priority(void) MBUS_CONF(DE_CFD, true, HIGH, 0, 1024, 288, 64); }
-static void mctl_set_timing_params(struct dram_para *para) +static void mctl_set_master_priority_a64(void) +{ + struct sunxi_mctl_com_reg * const mctl_com = + (struct sunxi_mctl_com_reg *)SUNXI_DRAM_COM_BASE; + + /* enable bandwidth limit windows and set windows size 1us */ + writel(399, &mctl_com->tmr); + writel((1 << 16), &mctl_com->bwcr); + + /* Port 2 is reserved per Allwinner's linux-3.10 source, yet they + * initialise it */ + MBUS_CONF( CPU, true, HIGHEST, 0, 160, 100, 80); + MBUS_CONF( GPU, false, HIGH, 0, 1536, 1400, 256); + MBUS_CONF(UNUSED, true, HIGHEST, 0, 512, 256, 96); + MBUS_CONF( DMA, true, HIGH, 0, 256, 80, 100); + MBUS_CONF( VE, true, HIGH, 0, 1792, 1600, 256); + MBUS_CONF( CSI, true, HIGH, 0, 256, 128, 0); + MBUS_CONF( NAND, true, HIGH, 0, 256, 128, 64); + MBUS_CONF( SS, true, HIGHEST, 0, 256, 128, 64); + MBUS_CONF( TS, true, HIGHEST, 0, 256, 128, 64); + MBUS_CONF( DI, true, HIGH, 0, 1024, 256, 64); + MBUS_CONF( DE, true, HIGH, 2, 8192, 6144, 2048); + MBUS_CONF(DE_CFD, true, HIGH, 0, 1280, 144, 64); + + writel(0x81000004, &mctl_com->mdfs_bwlr[2]); +} + +static void mctl_set_master_priority(uint16_t socid) +{ + switch (socid) { + case SOCID_H3: + mctl_set_master_priority_h3(); + return; + case SOCID_A64: + mctl_set_master_priority_a64(); + return; + } +} + +static void mctl_set_timing_params(uint16_t socid, struct dram_para *para) { struct sunxi_mctl_ctl_reg * const mctl_ctl = (struct sunxi_mctl_ctl_reg *)SUNXI_DRAM_CTL0_BASE; @@ -249,7 +265,31 @@ static void mctl_set_timing_params(struct dram_para *para) writel(RFSHTMG_TREFI(trefi) | RFSHTMG_TRFC(trfc), &mctl_ctl->rfshtmg); }
-static void mctl_zq_calibration(struct dram_para *para) +static u32 bin_to_mgray(int val) +{ + static const u8 lookup_table[32] = { + 0x00, 0x01, 0x02, 0x03, 0x06, 0x07, 0x04, 0x05, + 0x0c, 0x0d, 0x0e, 0x0f, 0x0a, 0x0b, 0x08, 0x09, + 0x18, 0x19, 0x1a, 0x1b, 0x1e, 0x1f, 0x1c, 0x1d, + 0x14, 0x15, 0x16, 0x17, 0x12, 0x13, 0x10, 0x11, + }; + + return lookup_table[clamp(val, 0, 31)]; +} + +static int mgray_to_bin(u32 val) +{ + static const u8 lookup_table[32] = { + 0x00, 0x01, 0x02, 0x03, 0x06, 0x07, 0x04, 0x05, + 0x0e, 0x0f, 0x0c, 0x0d, 0x08, 0x09, 0x0a, 0x0b, + 0x1e, 0x1f, 0x1c, 0x1d, 0x18, 0x19, 0x1a, 0x1b, + 0x10, 0x11, 0x12, 0x13, 0x16, 0x17, 0x14, 0x15, + }; + + return lookup_table[val & 0x1f]; +} + +static void mctl_h3_zq_calibration_quirk(struct dram_para *para) { struct sunxi_mctl_ctl_reg * const mctl_ctl = (struct sunxi_mctl_ctl_reg *)SUNXI_DRAM_CTL0_BASE; @@ -319,7 +359,7 @@ static void mctl_set_cr(struct dram_para *para) MCTL_CR_ROW_BITS(para->row_bits), &mctl_com->cr); }
-static void mctl_sys_init(struct dram_para *para) +static void mctl_sys_init(uint16_t socid, struct dram_para *para) { struct sunxi_ccm_reg * const ccm = (struct sunxi_ccm_reg *)SUNXI_CCM_BASE; @@ -331,16 +371,30 @@ static void mctl_sys_init(struct dram_para *para) clrbits_le32(&ccm->ahb_gate0, 1 << AHB_GATE_OFFSET_MCTL); clrbits_le32(&ccm->ahb_reset0_cfg, 1 << AHB_RESET_OFFSET_MCTL); clrbits_le32(&ccm->pll5_cfg, CCM_PLL5_CTRL_EN); + if (socid == SOCID_A64) + clrbits_le32(&ccm->pll11_cfg, CCM_PLL11_CTRL_EN); udelay(10);
clrbits_le32(&ccm->dram_clk_cfg, CCM_DRAMCLK_CFG_RST); udelay(1000);
- clock_set_pll5(CONFIG_DRAM_CLK * 2 * 1000000, false); - clrsetbits_le32(&ccm->dram_clk_cfg, - CCM_DRAMCLK_CFG_DIV_MASK | CCM_DRAMCLK_CFG_SRC_MASK, - CCM_DRAMCLK_CFG_DIV(1) | CCM_DRAMCLK_CFG_SRC_PLL5 | - CCM_DRAMCLK_CFG_UPD); + if (socid == SOCID_A64) { + clock_set_pll11(CONFIG_DRAM_CLK * 2 * 1000000, false); + clrsetbits_le32(&ccm->dram_clk_cfg, + CCM_DRAMCLK_CFG_DIV_MASK | + CCM_DRAMCLK_CFG_SRC_MASK, + CCM_DRAMCLK_CFG_DIV(1) | + CCM_DRAMCLK_CFG_SRC_PLL11 | + CCM_DRAMCLK_CFG_UPD); + } else if (socid == SOCID_H3) { + clock_set_pll5(CONFIG_DRAM_CLK * 2 * 1000000, false); + clrsetbits_le32(&ccm->dram_clk_cfg, + CCM_DRAMCLK_CFG_DIV_MASK | + CCM_DRAMCLK_CFG_SRC_MASK, + CCM_DRAMCLK_CFG_DIV(1) | + CCM_DRAMCLK_CFG_SRC_PLL5 | + CCM_DRAMCLK_CFG_UPD); + } mctl_await_completion(&ccm->dram_clk_cfg, CCM_DRAMCLK_CFG_UPD, 0);
setbits_le32(&ccm->ahb_reset0_cfg, 1 << AHB_RESET_OFFSET_MCTL); @@ -355,7 +409,7 @@ static void mctl_sys_init(struct dram_para *para) udelay(500); }
-static int mctl_channel_init(struct dram_para *para) +static int mctl_channel_init(uint16_t socid, struct dram_para *para) { struct sunxi_mctl_com_reg * const mctl_com = (struct sunxi_mctl_com_reg *)SUNXI_DRAM_COM_BASE; @@ -365,8 +419,8 @@ static int mctl_channel_init(struct dram_para *para) unsigned int i;
mctl_set_cr(para); - mctl_set_timing_params(para); - mctl_set_master_priority(); + mctl_set_timing_params(socid, para); + mctl_set_master_priority(socid);
/* setting VTC, default disable all VT */ clrbits_le32(&mctl_ctl->pgcr[0], (1 << 30) | 0x3f); @@ -392,12 +446,18 @@ static int mctl_channel_init(struct dram_para *para) /* set DQS auto gating PD mode */ setbits_le32(&mctl_ctl->pgcr[2], 0x3 << 6);
- /* dx ddr_clk & hdr_clk dynamic mode */ - clrbits_le32(&mctl_ctl->pgcr[0], (0x3 << 14) | (0x3 << 12)); - - /* dphy & aphy phase select 270 degree */ - clrsetbits_le32(&mctl_ctl->pgcr[2], (0x3 << 10) | (0x3 << 8), - (0x1 << 10) | (0x2 << 8)); + if (socid == SOCID_H3) { + /* dx ddr_clk & hdr_clk dynamic mode */ + clrbits_le32(&mctl_ctl->pgcr[0], (0x3 << 14) | (0x3 << 12)); + + /* dphy & aphy phase select 270 degree */ + clrsetbits_le32(&mctl_ctl->pgcr[2], (0x3 << 10) | (0x3 << 8), + (0x1 << 10) | (0x2 << 8)); + } else if (socid == SOCID_A64) { + /* dphy & aphy phase select ? */ + clrsetbits_le32(&mctl_ctl->pgcr[2], (0x3 << 10) | (0x3 << 8), + (0x0 << 10) | (0x3 << 8)); + }
/* set half DQ */ if (para->bus_width != 32) { @@ -412,10 +472,17 @@ static int mctl_channel_init(struct dram_para *para) mctl_set_bit_delays(para); udelay(50);
- mctl_zq_calibration(para); + if (socid == SOCID_H3) { + mctl_h3_zq_calibration_quirk(para);
- mctl_phy_init(PIR_PLLINIT | PIR_DCAL | PIR_PHYRST | PIR_DRAMRST | - PIR_DRAMINIT | PIR_QSGATE); + mctl_phy_init(PIR_PLLINIT | PIR_DCAL | PIR_PHYRST | + PIR_DRAMRST | PIR_DRAMINIT | PIR_QSGATE); + } else if (socid == SOCID_A64) { + clrsetbits_le32(&mctl_ctl->zqcr, 0xffffff, CONFIG_DRAM_ZQ); + + mctl_phy_init(PIR_ZCAL | PIR_PLLINIT | PIR_DCAL | PIR_PHYRST | + PIR_DRAMRST | PIR_DRAMINIT | PIR_QSGATE); + }
/* detect ranks and bus width */ if (readl(&mctl_ctl->pgsr[0]) & (0xfe << 20)) { @@ -453,7 +520,10 @@ static int mctl_channel_init(struct dram_para *para) udelay(10);
/* set PGCR3, CKE polarity */ - writel(0x00aa0060, &mctl_ctl->pgcr[3]); + if (socid == SOCID_H3) + writel(0x00aa0060, &mctl_ctl->pgcr[3]); + else if (socid == SOCID_A64) + writel(0xc0aa0060, &mctl_ctl->pgcr[3]);
/* power down zq calibration module for power save */ setbits_le32(&mctl_ctl->zqcr, ZQCR_PWRDOWN); @@ -500,6 +570,22 @@ static void mctl_auto_detect_dram_size(struct dram_para *para) 0, 0, 0, 0, 0, 0, 0, 0, \ 0, 0, 0, 0, 0, 0, 0 }
+#define SUN50I_A64_DX_READ_DELAYS					\
+	{{ 16, 16, 16, 16, 17, 16, 16, 17, 16,  1,  0 },	\
+	 { 17, 17, 17, 17, 17, 17, 17, 17, 17,  1,  0 },	\
+	 { 16, 17, 17, 16, 16, 16, 16, 16, 16,  0,  0 },	\
+	 { 17, 17, 17, 17, 17, 17, 17, 17, 17,  1,  0 }}
+#define SUN50I_A64_DX_WRITE_DELAYS					\
+	{{  0,  0,  0,  0,  0,  0,  0,  0,  0, 15, 15 },	\
+	 {  0,  0,  0,  0,  1,  1,  1,  1,  0, 10, 10 },	\
+	 {  1,  0,  1,  1,  1,  1,  1,  1,  0, 11, 11 },	\
+	 {  1,  0,  0,  1,  1,  1,  1,  1,  0, 12, 12 }}
+#define SUN50I_A64_AC_DELAYS					\
+	{  5,  5, 13, 10,  2,  5,  3,  3,			\
+	   0,  3,  3,  3,  1,  0,  0,  0,			\
+	   3,  4,  0,  3,  4,  1,  4,  0,			\
+	   1,  1,  0,  1, 13,  5,  4 }
+
 unsigned long sunxi_dram_init(void)
 {
 	struct sunxi_mctl_com_reg * const mctl_com =
@@ -512,13 +598,30 @@ unsigned long sunxi_dram_init(void)
 		.bus_width = 32,
 		.row_bits = 15,
 		.page_size = 4096,
+
+#if defined(CONFIG_MACH_SUN8I_H3)
 		.dx_read_delays = SUN8I_H3_DX_READ_DELAYS,
 		.dx_write_delays = SUN8I_H3_DX_WRITE_DELAYS,
 		.ac_delays = SUN8I_H3_AC_DELAYS,
+#elif defined(CONFIG_MACH_SUN50I)
+		.dx_read_delays = SUN50I_A64_DX_READ_DELAYS,
+		.dx_write_delays = SUN50I_A64_DX_WRITE_DELAYS,
+		.ac_delays = SUN50I_A64_AC_DELAYS,
+#endif
 	};
-
-	mctl_sys_init(&para);
-	if (mctl_channel_init(&para))
+/*
+ * Let the compiler optimize alternatives away by passing this value into
+ * the static functions. This saves us #ifdefs, but still keeps the binary
+ * small.
+ */
+#if defined(CONFIG_MACH_SUN8I_H3)
+	uint16_t socid = SOCID_H3;
+#elif defined(CONFIG_MACH_SUN50I)
+	uint16_t socid = SOCID_A64;
+#endif
+
+	mctl_sys_init(socid, &para);
+	if (mctl_channel_init(socid, &para))
 		return 0;
if (para.dual_rank) @@ -528,7 +631,13 @@ unsigned long sunxi_dram_init(void) udelay(1);
/* odt delay */ - writel(0x0c000400, &mctl_ctl->odtcfg); + if (socid == SOCID_H3) + writel(0x0c000400, &mctl_ctl->odtcfg); + + if (socid == SOCID_A64) { + setbits_le32(&mctl_ctl->vtfcr, 2 << 8); + clrbits_le32(&mctl_ctl->pgcr[2], (1 << 13)); + }
/* clear credit value */ setbits_le32(&mctl_com->cccr, 1 << 31);
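
The idea behind the socid parameter in the patch above, passing a compile-time constant into static functions instead of wrapping code in #ifdefs, can be illustrated outside of U-Boot. The following is only a sketch: `EXAMPLE_SOC_IS_A64`, `enum dram_clk_src`, `dram_clk_source()` and `example_init()` are made-up names, and only the two SOCID values are taken from the patch. Whether the unused branch is really dropped depends on the compiler and the optimisation level.

```c
#include <stdint.h>

#define SOCID_A64	0x1689
#define SOCID_H3	0x1680

/* Hypothetical stand-ins for the PLLs used as DRAM clock source. */
enum dram_clk_src { DRAM_SRC_PLL5, DRAM_SRC_PLL11, DRAM_SRC_NONE };

/*
 * Static function taking the SoC ID as a parameter. When every caller
 * passes a constant, an optimising compiler can propagate it and discard
 * the branches that belong to the other SoC.
 */
static enum dram_clk_src dram_clk_source(uint16_t socid)
{
	if (socid == SOCID_A64)
		return DRAM_SRC_PLL11;
	else if (socid == SOCID_H3)
		return DRAM_SRC_PLL5;

	return DRAM_SRC_NONE;
}

/* EXAMPLE_SOC_IS_A64 plays the role of CONFIG_MACH_SUN50I here. */
static enum dram_clk_src example_init(void)
{
#ifdef EXAMPLE_SOC_IS_A64
	const uint16_t socid = SOCID_A64;
#else
	const uint16_t socid = SOCID_H3;
#endif

	return dram_clk_source(socid);
}
```

The single #if block at the call site replaces the many #ifdefs that would otherwise be scattered through the helper functions.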

On Mon, Dec 19, 2016 at 01:50:08AM +0000, Andre Przywara wrote:
From: Jens Kuske jenskuske@gmail.com
The A64 DRAM controller is very similar to the H3 one, so the code can be reused with some small changes. This refactoring does not change the code size for the existing H3 part.
[Andre: rework from #ifdefs to using socid parameters in static functions, minor fixes, merging in fixes from Jens]
Signed-off-by: Jens Kuske jenskuske@gmail.com Signed-off-by: Andre Przywara andre.przywara@arm.com
Acked-by: Maxime Ripard maxime.ripard@free-electrons.com
Thanks, Maxime

According to Jens disabling the on-die-termination should set bit 5, not bit 1 in the respective register. Fix this.
Reported-by: Jens Kuske jenskuske@gmail.com
Signed-off-by: Andre Przywara andre.przywara@arm.com
---
 arch/arm/mach-sunxi/dram_sun8i_h3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/mach-sunxi/dram_sun8i_h3.c b/arch/arm/mach-sunxi/dram_sun8i_h3.c
index 6ee73ae..1bdd738 100644
--- a/arch/arm/mach-sunxi/dram_sun8i_h3.c
+++ b/arch/arm/mach-sunxi/dram_sun8i_h3.c
@@ -438,7 +438,7 @@ static int mctl_channel_init(uint16_t socid, struct dram_para *para)
 		clrsetbits_le32(&mctl_ctl->dx[i].gcr, (0x3 << 4) |
 				(0x1 << 1) | (0x3 << 2) | (0x3 << 12) |
 				(0x3 << 14),
-				IS_ENABLED(CONFIG_DRAM_ODT_EN) ? 0x0 : 0x2);
+				IS_ENABLED(CONFIG_DRAM_ODT_EN) ? 0x0 : 0x20);
 
 	/* AC PDR should always ON */
 	setbits_le32(&mctl_ctl->aciocr, 0x1 << 1);

On Mon, Dec 19, 2016 at 01:50:09AM +0000, Andre Przywara wrote:
According to Jens disabling the on-die-termination should set bit 5, not bit 1 in the respective register. Fix this.
Reported-by: Jens Kuske jenskuske@gmail.com Signed-off-by: Andre Przywara andre.przywara@arm.com
arch/arm/mach-sunxi/dram_sun8i_h3.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/mach-sunxi/dram_sun8i_h3.c b/arch/arm/mach-sunxi/dram_sun8i_h3.c index 6ee73ae..1bdd738 100644 --- a/arch/arm/mach-sunxi/dram_sun8i_h3.c +++ b/arch/arm/mach-sunxi/dram_sun8i_h3.c @@ -438,7 +438,7 @@ static int mctl_channel_init(uint16_t socid, struct dram_para *para) clrsetbits_le32(&mctl_ctl->dx[i].gcr, (0x3 << 4) | (0x1 << 1) | (0x3 << 2) | (0x3 << 12) | (0x3 << 14),
- IS_ENABLED(CONFIG_DRAM_ODT_EN) ? 0x0 : 0x2);
+ IS_ENABLED(CONFIG_DRAM_ODT_EN) ? 0x0 : 0x20);
You should use a define here if that bit function is known.
Maxime

Hi,
On 19/12/16 10:01, Maxime Ripard wrote:
On Mon, Dec 19, 2016 at 01:50:09AM +0000, Andre Przywara wrote:
According to Jens disabling the on-die-termination should set bit 5, not bit 1 in the respective register. Fix this.
Reported-by: Jens Kuske jenskuske@gmail.com Signed-off-by: Andre Przywara andre.przywara@arm.com
arch/arm/mach-sunxi/dram_sun8i_h3.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/mach-sunxi/dram_sun8i_h3.c b/arch/arm/mach-sunxi/dram_sun8i_h3.c index 6ee73ae..1bdd738 100644 --- a/arch/arm/mach-sunxi/dram_sun8i_h3.c +++ b/arch/arm/mach-sunxi/dram_sun8i_h3.c @@ -438,7 +438,7 @@ static int mctl_channel_init(uint16_t socid, struct dram_para *para) clrsetbits_le32(&mctl_ctl->dx[i].gcr, (0x3 << 4) | (0x1 << 1) | (0x3 << 2) | (0x3 << 12) | (0x3 << 14),
- IS_ENABLED(CONFIG_DRAM_ODT_EN) ? 0x0 : 0x2);
+ IS_ENABLED(CONFIG_DRAM_ODT_EN) ? 0x0 : 0x20);
You should use a define here if that bit function is known.
I agree, and I tried but I failed to find one. That part seems to differ from the Keystone documentation, which doesn't know an ODT _dis_able bit at all.
Maybe Jens can add his source?
Cheers, Andre.

On 19.12.2016 11:33, Andre Przywara wrote:
Hi,
On 19/12/16 10:01, Maxime Ripard wrote:
On Mon, Dec 19, 2016 at 01:50:09AM +0000, Andre Przywara wrote:
According to Jens disabling the on-die-termination should set bit 5, not bit 1 in the respective register. Fix this.
Reported-by: Jens Kuske jenskuske@gmail.com Signed-off-by: Andre Przywara andre.przywara@arm.com
arch/arm/mach-sunxi/dram_sun8i_h3.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/mach-sunxi/dram_sun8i_h3.c b/arch/arm/mach-sunxi/dram_sun8i_h3.c index 6ee73ae..1bdd738 100644 --- a/arch/arm/mach-sunxi/dram_sun8i_h3.c +++ b/arch/arm/mach-sunxi/dram_sun8i_h3.c @@ -438,7 +438,7 @@ static int mctl_channel_init(uint16_t socid, struct dram_para *para) clrsetbits_le32(&mctl_ctl->dx[i].gcr, (0x3 << 4) | (0x1 << 1) | (0x3 << 2) | (0x3 << 12) | (0x3 << 14),
- IS_ENABLED(CONFIG_DRAM_ODT_EN) ? 0x0 : 0x2);
+ IS_ENABLED(CONFIG_DRAM_ODT_EN) ? 0x0 : 0x20);
You should use a define here if that bit function is known.
I agree, and I tried but I failed to find one. That part seems to differ from the Keystone documentation, which doesn't know an ODT _dis_able bit at all.
Maybe Jens can add his source?
I think DXnGCR bits 5:4 mean:
0x0 = dynamic ODT
0x1 = ODT always on
0x2 = ODT off
But it's only a guess, there is no documentation. Compare to the A83 boot0 code [1]: setting the dram_odt_en parameter in H3 boot0 to these values results in matching changes in bits 5:4.
Jens
[1] https://github.com/allwinner-zh/bootloader/blob/182661abb29d5467a15eb83cf982...

On Mon, Dec 19, 2016 at 03:32:31PM +0100, Jens Kuske wrote:
On 19.12.2016 11:33, Andre Przywara wrote:
Hi,
On 19/12/16 10:01, Maxime Ripard wrote:
On Mon, Dec 19, 2016 at 01:50:09AM +0000, Andre Przywara wrote:
According to Jens disabling the on-die-termination should set bit 5, not bit 1 in the respective register. Fix this.
Reported-by: Jens Kuske jenskuske@gmail.com Signed-off-by: Andre Przywara andre.przywara@arm.com
arch/arm/mach-sunxi/dram_sun8i_h3.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/mach-sunxi/dram_sun8i_h3.c b/arch/arm/mach-sunxi/dram_sun8i_h3.c index 6ee73ae..1bdd738 100644 --- a/arch/arm/mach-sunxi/dram_sun8i_h3.c +++ b/arch/arm/mach-sunxi/dram_sun8i_h3.c @@ -438,7 +438,7 @@ static int mctl_channel_init(uint16_t socid, struct dram_para *para) clrsetbits_le32(&mctl_ctl->dx[i].gcr, (0x3 << 4) | (0x1 << 1) | (0x3 << 2) | (0x3 << 12) | (0x3 << 14),
- IS_ENABLED(CONFIG_DRAM_ODT_EN) ? 0x0 : 0x2);
+ IS_ENABLED(CONFIG_DRAM_ODT_EN) ? 0x0 : 0x20);
You should use a define here if that bit function is known.
I agree, and I tried but I failed to find one. That part seems to differ from the Keystone documentation, which doesn't know an ODT _dis_able bit at all.
Maybe Jens can add his source?
I think DXnGCR bits 5:4 mean:
0x0 = dynamic ODT
0x1 = ODT always on
0x2 = ODT off
But it's only a guess, there is no documentation. Compare to the A83 boot0 code [1]: setting the dram_odt_en parameter in H3 boot0 to these values results in matching changes in bits 5:4.
Then we can just add a comment on top of those defines saying that we're not quite sure.
Maxime
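
Following that suggestion, such defines might look like the sketch below. The macro names and the helper `dx_gcr_odt_bits()` are hypothetical, and the bit meanings are the guess described above (derived from comparing boot0 behaviour), not documented facts.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * DXnGCR bits 5:4, ODT mode. Undocumented: the meanings below are guessed
 * from observing what boot0 writes and may be wrong.
 */
#define DX_GCR_ODT_DYNAMIC	(0x0 << 4)
#define DX_GCR_ODT_ALWAYS_ON	(0x1 << 4)
#define DX_GCR_ODT_OFF		(0x2 << 4)
#define DX_GCR_ODT_MASK		(0x3 << 4)

/* Value for the ODT field, depending on whether ODT is wanted at all. */
static uint32_t dx_gcr_odt_bits(bool odt_enabled)
{
	return odt_enabled ? DX_GCR_ODT_DYNAMIC : DX_GCR_ODT_OFF;
}
```

With defines along these lines, the literal 0x20 from the fix would read as DX_GCR_ODT_OFF, and the comment records that the meaning is not confirmed.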

Fix the output of the DRAM size on AArch64 SPLs.
Signed-off-by: Andre Przywara andre.przywara@arm.com
Reviewed-by: Alexander Graf agraf@suse.de
Reviewed-by: Simon Glass sjg@chromium.org
---
 arch/arm/mach-sunxi/dram_sun8i_h3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/mach-sunxi/dram_sun8i_h3.c b/arch/arm/mach-sunxi/dram_sun8i_h3.c
index 1bdd738..cfc8479 100644
--- a/arch/arm/mach-sunxi/dram_sun8i_h3.c
+++ b/arch/arm/mach-sunxi/dram_sun8i_h3.c
@@ -646,6 +646,6 @@ unsigned long sunxi_dram_init(void)
 	mctl_auto_detect_dram_size(&para);
 	mctl_set_cr(&para);
 
-	return (1 << (para.row_bits + 3)) * para.page_size *
+	return (1UL << (para.row_bits + 3)) * para.page_size *
 		(para.dual_rank ? 2 : 1);
 }
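
To see why the type of the constant matters, take the initial parameters from this driver (15 row bits, 4096-byte pages) with two ranks as an example: the result is 2^18 * 2^12 * 2 = 2^31 bytes, which no longer fits into a signed 32-bit int. The sketch below redoes the calculation in a 64-bit type. `dram_size_bytes()` is an illustrative helper, not a function from the patch, and it uses uint64_t rather than unsigned long so that it behaves the same on 32-bit hosts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same formula as in sunxi_dram_init(), evaluated in a 64-bit type. */
static uint64_t dram_size_bytes(unsigned int row_bits, unsigned int page_size,
				bool dual_rank)
{
	return ((uint64_t)1 << (row_bits + 3)) * page_size *
		(dual_rank ? 2 : 1);
}
```

Because the leftmost operand is already 64 bits wide, every following multiplication is carried out in that width as well, which is the same effect the 1UL constant has on an AArch64 build.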

On Mon, Dec 19, 2016 at 01:50:10AM +0000, Andre Przywara wrote:
Fix the output of the DRAM size on AArch64 SPLs.
Signed-off-by: Andre Przywara andre.przywara@arm.com Reviewed-by: Alexander Graf agraf@suse.de Reviewed-by: Simon Glass sjg@chromium.org
Acked-by: Maxime Ripard maxime.ripard@free-electrons.com
Thanks, Maxime

Now that the SPL is ready to be compiled in AArch64 and the DRAM init code is ready, enable SPL support for the A64 SoC and in the Pine64 defconfig. For now we keep the boot0 header in the U-Boot proper, as this still allows using boot0 as an SPL replacement without hurting the SPL use case. We disable FEL support for now by making its compilation conditional and disabling it for ARM64, as the code isn't ready yet.
Signed-off-by: Andre Przywara andre.przywara@arm.com --- arch/arm/mach-sunxi/board.c | 2 +- board/sunxi/Kconfig | 2 ++ configs/pine64_plus_defconfig | 1 + include/configs/sunxi-common.h | 2 ++ 4 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/arm/mach-sunxi/board.c b/arch/arm/mach-sunxi/board.c index aa11493..52be5b0 100644 --- a/arch/arm/mach-sunxi/board.c +++ b/arch/arm/mach-sunxi/board.c @@ -133,7 +133,7 @@ static int gpio_init(void) return 0; }
-#ifdef CONFIG_SPL_BUILD +#if defined(CONFIG_SPL_BOARD_LOAD_IMAGE) && defined(CONFIG_SPL_BUILD) static int spl_board_load_image(struct spl_image_info *spl_image, struct spl_boot_device *bootdev) { diff --git a/board/sunxi/Kconfig b/board/sunxi/Kconfig index c2eb85e..0001133 100644 --- a/board/sunxi/Kconfig +++ b/board/sunxi/Kconfig @@ -125,6 +125,7 @@ config MACH_SUN50I bool "sun50i (Allwinner A64)" select ARM64 select SUNXI_GEN_SUN6I + select SUPPORT_SPL
endchoice
@@ -196,6 +197,7 @@ config DRAM_ODT_EN bool "sunxi dram odt enable" default n if !MACH_SUN8I_A23 default y if MACH_SUN8I_A23 + default y if MACH_SUN50I ---help--- Select this to enable dram odt (on die termination).
diff --git a/configs/pine64_plus_defconfig b/configs/pine64_plus_defconfig index ebc24b8..2374170 100644 --- a/configs/pine64_plus_defconfig +++ b/configs/pine64_plus_defconfig @@ -5,6 +5,7 @@ CONFIG_MACH_SUN50I=y CONFIG_DEFAULT_DEVICE_TREE="sun50i-a64-pine64-plus" # CONFIG_SYS_MALLOC_CLEAR_ON_INIT is not set CONFIG_CONSOLE_MUX=y +CONFIG_SPL=y # CONFIG_CMD_IMLS is not set # CONFIG_CMD_FLASH is not set # CONFIG_CMD_FPGA is not set diff --git a/include/configs/sunxi-common.h b/include/configs/sunxi-common.h index e05c318..ab2d33f 100644 --- a/include/configs/sunxi-common.h +++ b/include/configs/sunxi-common.h @@ -183,7 +183,9 @@
#define CONFIG_SPL_FRAMEWORK
+#ifndef CONFIG_ARM64 /* AArch64 FEL support is not ready yet */ #define CONFIG_SPL_BOARD_LOAD_IMAGE +#endif
#if defined(CONFIG_MACH_SUN9I) #define CONFIG_SPL_TEXT_BASE 0x10040 /* sram start+header */

On Mon, Dec 19, 2016 at 01:50:11AM +0000, Andre Przywara wrote:
Now that the SPL is ready to be compiled in AArch64 and the DRAM init code is ready, enable SPL support for the A64 SoC and in the Pine64 defconfig. For now we keep the boot0 header in the U-Boot proper, as this still allows using boot0 as an SPL replacement without hurting the SPL use case. We disable FEL support for now by making its compilation conditional and disabling it for ARM64, as the code isn't ready yet.
Signed-off-by: Andre Przywara andre.przywara@arm.com
Acked-by: Maxime Ripard maxime.ripard@free-electrons.com
Maxime

On 19 December 2016 at 23:01, Maxime Ripard maxime.ripard@free-electrons.com wrote:
On Mon, Dec 19, 2016 at 01:50:11AM +0000, Andre Przywara wrote:
Now that the SPL is ready to be compiled in AArch64 and the DRAM init code is ready, enable SPL support for the A64 SoC and in the Pine64 defconfig. For now we keep the boot0 header in the U-Boot proper, as this still allows using boot0 as an SPL replacement without hurting the SPL use case. We disable FEL support for now by making its compilation conditional and disabling it for ARM64, as the code isn't ready yet.
Signed-off-by: Andre Przywara andre.przywara@arm.com
Acked-by: Maxime Ripard maxime.ripard@free-electrons.com
Maxime
Reviewed-by: Simon Glass sjg@chromium.org

Read the specified "arch" value from a legacy or FIT U-Boot image and store it in our SPL data structure. This allows loaders to take the target architecture into account for custom loading procedures. Having the complete string -> arch mapping for FIT based images in the SPL would be too big, so we leave it up to architectures (or boards) to override the weak function that does the actual translation, possibly covering only the required subset there. Document struct spl_image_info on the way.
Signed-off-by: Andre Przywara andre.przywara@arm.com Reviewed-by: Simon Glass sjg@chromium.org Reviewed-by: Tom Rini trini@konsulko.com --- common/spl/spl.c | 1 + common/spl/spl_fit.c | 8 ++++++++ include/spl.h | 15 ++++++++++++++- 3 files changed, 23 insertions(+), 1 deletion(-)
diff --git a/common/spl/spl.c b/common/spl/spl.c index a76ea3a..ef195e0 100644 --- a/common/spl/spl.c +++ b/common/spl/spl.c @@ -114,6 +114,7 @@ int spl_parse_image_header(struct spl_image_info *spl_image, header_size; } spl_image->os = image_get_os(header); + spl_image->arch = image_get_arch(header); spl_image->name = image_get_name(header); debug("spl: payload image: %.*s load addr: 0x%lx size: %d\n", (int)sizeof(spl_image->name), spl_image->name, diff --git a/common/spl/spl_fit.c b/common/spl/spl_fit.c index aae556f..a5d903b 100644 --- a/common/spl/spl_fit.c +++ b/common/spl/spl_fit.c @@ -123,6 +123,11 @@ static int get_aligned_image_size(struct spl_load_info *info, int data_size, return (data_size + info->bl_len - 1) / info->bl_len; }
+__weak u8 spl_genimg_get_arch_id(const char *arch_str) +{ + return IH_ARCH_DEFAULT; +} + int spl_load_simple_fit(struct spl_image_info *spl_image, struct spl_load_info *info, ulong sector, void *fit) { @@ -136,6 +141,7 @@ int spl_load_simple_fit(struct spl_image_info *spl_image, int base_offset, align_len = ARCH_DMA_MINALIGN - 1; int src_sector; void *dst, *src; + const char *arch_str;
/* * Figure out where the external images start. This is the base for the @@ -184,10 +190,12 @@ int spl_load_simple_fit(struct spl_image_info *spl_image, data_offset = fdt_getprop_u32(fit, node, "data-offset"); data_size = fdt_getprop_u32(fit, node, "data-size"); load = fdt_getprop_u32(fit, node, "load"); + arch_str = fdt_getprop(fit, node, "arch", NULL); debug("data_offset=%x, data_size=%x\n", data_offset, data_size); spl_image->load_addr = load; spl_image->entry_point = load; spl_image->os = IH_OS_U_BOOT; + spl_image->arch = spl_genimg_get_arch_id(arch_str);
/* * Work out where to place the image. We read it so that the first diff --git a/include/spl.h b/include/spl.h index bde4437..8223f4b 100644 --- a/include/spl.h +++ b/include/spl.h @@ -20,13 +20,26 @@ #define MMCSD_MODE_FS 2 #define MMCSD_MODE_EMMCBOOT 3
+/* + * Information about an U-Boot image file as described in include/image.h. + * Parsed by the SPL code from a legacy or FIT image file. + * + * @name: descriptive string (mkimage -n) + * @load_addr: address to load the image file to (mkimage -a) + * @entry_point: address of first instruction to execute (mkimage -e) + * @size: size of image in bytes + * @flags: optional, used only for SPL_COPY_PAYLOAD_ONLY so far + * @os: target operating system, one of IH_OS_* (mkimage -O) + * @arch: target architecture, one of IH_ARCH_* (mkimage -A) + */ struct spl_image_info { const char *name; - u8 os; ulong load_addr; ulong entry_point; u32 size; u32 flags; + u8 os; + u8 arch; };
/*

At the moment we use the arch/arm directory for arm64 boards as well, so the Makefile will pick up the "arm" name for the architecture to use for tagging binaries in U-Boot image files. Differentiate between the two by looking at the CPU variable being defined to "armv8", and use the arm64 architecture name on creating the image file if that matches.
Signed-off-by: Andre Przywara andre.przywara@arm.com Reviewed-by: Simon Glass sjg@chromium.org Reviewed-by: Tom Rini trini@konsulko.com --- Makefile | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/Makefile b/Makefile index dfed58b..749fc5d 100644 --- a/Makefile +++ b/Makefile @@ -923,13 +923,18 @@ quiet_cmd_cpp_cfg = CFG $@ cmd_cpp_cfg = $(CPP) -Wp,-MD,$(depfile) $(cpp_flags) $(LDPPFLAGS) -ansi \ -DDO_DEPS_ONLY -D__ASSEMBLY__ -x assembler-with-cpp -P -dM -E -o $@ $<
+ifeq ($(CPU),armv8) +IH_ARCH := arm64 +else +IH_ARCH := $(ARCH) +endif ifdef CONFIG_SPL_LOAD_FIT -MKIMAGEFLAGS_u-boot.img = -f auto -A $(ARCH) -T firmware -C none -O u-boot \ +MKIMAGEFLAGS_u-boot.img = -f auto -A $(IH_ARCH) -T firmware -C none -O u-boot \ -a $(CONFIG_SYS_TEXT_BASE) -e $(CONFIG_SYS_UBOOT_START) \ -n "U-Boot $(UBOOTRELEASE) for $(BOARD) board" -E \ $(patsubst %,-b arch/$(ARCH)/dts/%.dtb,$(subst ",,$(CONFIG_OF_LIST))) else -MKIMAGEFLAGS_u-boot.img = -A $(ARCH) -T firmware -C none -O u-boot \ +MKIMAGEFLAGS_u-boot.img = -A $(IH_ARCH) -T firmware -C none -O u-boot \ -a $(CONFIG_SYS_TEXT_BASE) -e $(CONFIG_SYS_UBOOT_START) \ -n "U-Boot $(UBOOTRELEASE) for $(BOARD) board" endif

Since the SPL FIT loader can now differentiate between different architectures, teach it how to tell arm and arm64 apart when a FIT image is used. We just support those two for now, as these are so far the only sensible alternatives.
Signed-off-by: Andre Przywara andre.przywara@arm.com
Reviewed-by: Simon Glass sjg@chromium.org
Reviewed-by: Tom Rini trini@konsulko.com
---
 arch/arm/lib/spl.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/arm/lib/spl.c b/arch/arm/lib/spl.c
index e606d47..45d285c 100644
--- a/arch/arm/lib/spl.c
+++ b/arch/arm/lib/spl.c
@@ -63,3 +63,18 @@ void __noreturn jump_to_image_linux(struct spl_image_info *spl_image, void *arg)
 	image_entry(0, machid, arg);
 }
 #endif
+
+/* This overwrites the weak definition in spl_fit.c */
+u8 spl_genimg_get_arch_id(const char *arch_str)
+{
+	if (!arch_str)
+		return IH_ARCH_DEFAULT;
+
+	if (!strcmp(arch_str, "arm"))
+		return IH_ARCH_ARM;
+
+	if (!strcmp(arch_str, "arm64"))
+		return IH_ARCH_ARM64;
+
+	return IH_ARCH_DEFAULT;
+}
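To illustrate how the mapping above behaves for the possible inputs, here is a small host-side sketch. It is not U-Boot code: the enum values are placeholders (the real IH_ARCH_* constants live in include/image.h), and `sketch_get_arch_id` and `sketch_needs_switch` are hypothetical names. The `native` parameter plays the role of IH_ARCH_DEFAULT, which is assumed here to be the architecture the SPL itself was built for.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder IDs for illustration only, not the real IH_ARCH_* values. */
enum sketch_arch {
	SKETCH_ARCH_ARM = 1,
	SKETCH_ARCH_ARM64 = 2,
};

/*
 * Hypothetical stand-in for spl_genimg_get_arch_id(): 'native' takes the
 * place of IH_ARCH_DEFAULT, i.e. the architecture of the running SPL.
 */
static enum sketch_arch sketch_get_arch_id(const char *arch_str,
					   enum sketch_arch native)
{
	if (!arch_str)
		return native;		/* no arch property: assume native */
	if (!strcmp(arch_str, "arm"))
		return SKETCH_ARCH_ARM;
	if (!strcmp(arch_str, "arm64"))
		return SKETCH_ARCH_ARM64;
	return native;			/* unknown string: fall back to native */
}

/* Returns 1 if the payload's arch differs from the SPL's, 0 otherwise. */
static int sketch_needs_switch(const char *arch_str, enum sketch_arch native)
{
	return sketch_get_arch_id(arch_str, native) != native;
}
```

With this model, a 32-bit SPL loading an image tagged "arm64" reports a mismatch, while a missing or unrecognised arch string is treated as native, so the SPL would branch to it as before.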

The ARMv8 capable Allwinner A64 SoC comes out of reset in AArch32 mode. To run AArch64 code, we have to trigger a warm reset via the RMR register, which proceeds with code execution at the address stored in the RVBAR register. If the bootable payload in the FIT image is using a different architecture than the SPL has been compiled for, enter it via this said RMR switch mechanism, by writing the entry point address into the MMIO mapped, writable version of the RVBAR register. Then the warm reset is triggered via a system register write. If the payload architecture is the same as the SPL, we use the normal branch as usual.
Signed-off-by: Andre Przywara andre.przywara@arm.com
---
 arch/arm/mach-sunxi/Makefile     |  1 +
 arch/arm/mach-sunxi/spl_switch.c | 81 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 82 insertions(+)
 create mode 100644 arch/arm/mach-sunxi/spl_switch.c

diff --git a/arch/arm/mach-sunxi/Makefile b/arch/arm/mach-sunxi/Makefile
index 7daba11..128091e 100644
--- a/arch/arm/mach-sunxi/Makefile
+++ b/arch/arm/mach-sunxi/Makefile
@@ -51,4 +51,5 @@ obj-$(CONFIG_MACH_SUN8I_A83T) += dram_sun8i_a83t.o
 obj-$(CONFIG_MACH_SUN8I_H3) += dram_sun8i_h3.o
 obj-$(CONFIG_MACH_SUN9I) += dram_sun9i.o
 obj-$(CONFIG_MACH_SUN50I) += dram_sun8i_h3.o
+obj-$(CONFIG_MACH_SUN50I) += spl_switch.o
 endif
diff --git a/arch/arm/mach-sunxi/spl_switch.c b/arch/arm/mach-sunxi/spl_switch.c
new file mode 100644
index 0000000..855379e
--- /dev/null
+++ b/arch/arm/mach-sunxi/spl_switch.c
@@ -0,0 +1,81 @@
+/*
+ * (C) Copyright 2016 ARM Ltd.
+ *
+ * SPDX-License-Identifier: GPL-2.0+
+ */
+
+#include <common.h>
+#include <spl.h>
+
+#include <asm/io.h>
+#include <asm/barriers.h>
+
+static void __noreturn jump_to_image_native(struct spl_image_info *spl_image)
+{
+	typedef void __noreturn (*image_entry_noargs_t)(void);
+
+	image_entry_noargs_t image_entry =
+		(image_entry_noargs_t)spl_image->entry_point;
+
+	image_entry();
+}
+
+/*
+ * Do a warm-reset via the RMR register to enter the processor in a different
+ * execution mode. This allows to switch from AArch32 to AArch64 and vice
+ * versa. Execution starts at the address hold in the RVBAR register, which
+ * needs to be set before.
+ */
+static void __noreturn reset_rmr_switch(void)
+{
+#ifdef CONFIG_ARM64
+	__asm__ volatile ( "mrs x0, RMR_EL3\n\t"
+			   "bic x0, x0, #1\n\t" /* Clear enter-in-64 bit */
+			   "orr x0, x0, #2\n\t" /* set reset request bit */
+			   "msr RMR_EL3, x0\n\t"
+			   "isb sy\n\t"
+			   "nop\n\t"
+			   "wfi\n\t"
+			   "b .\n"
+			   ::: "x0");
+#else
+	__asm__ volatile ( "mrc 15, 0, r0, cr12, cr0, 2\n\t"
+			   "orr r0, r0, #3\n\t" /* request reset in 64 bit */
+			   "mcr 15, 0, r0, cr12, cr0, 2\n\t"
+			   "isb\n\t"
+			   "nop\n\t"
+			   "wfi\n\t"
+			   "b .\n"
+			   ::: "r0");
+#endif
+	while (1); /* to avoid a compiler warning about __noreturn */
+}
+
+void __noreturn jump_to_image_no_args(struct spl_image_info *spl_image)
+{
+	if (spl_image->arch == IH_ARCH_DEFAULT) {
+		/*
+		 * If the image to be executed is using the same architecture
+		 * as we are currently running in, just branch to the target
+		 * address.
+		 */
+		debug("entering by branch\n");
+		jump_to_image_native(spl_image);
+	} else {
+		/*
+		 * If the target architecture and the current one differ, use
+		 * the RMR routine to change it.
+		 */
+		debug("entering by RMR switch\n");
+		/*
+		 * The start address at which execution continues after the
+		 * RMR switch is held in the RVBAR system register, which is
+		 * architecturally read-only.
+		 * Allwinner provides a writeable alias in MMIO space for it.
+		 */
+		writel(spl_image->entry_point, 0x17000a0);
+		DSB;
+		ISB;
+		reset_rmr_switch();
+	}
+}
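The decision and the register manipulation in the patch above can be modelled on a host machine. The following is only a sketch under stated assumptions: `rmr_value_for` and `pick_entry` are hypothetical helpers, the architecture IDs are plain integers chosen by the caller, and nothing here touches real hardware. The bit meanings follow the comments in the patch (bit 0 selects AArch64 after the reset, bit 1 requests the reset).

```c
#include <assert.h>
#include <stdint.h>

#define RMR_AA64	(1u << 0)	/* bit 0: enter AArch64 after the reset */
#define RMR_RR		(1u << 1)	/* bit 1: request the warm reset */

enum entry_method { ENTER_BY_BRANCH, ENTER_BY_RMR_SWITCH };

/*
 * Hypothetical helper: compute the value to write back to RMR, given the
 * value read from it and the execution state wanted after the reset.
 */
static uint32_t rmr_value_for(uint32_t rmr, int target_is_aarch64)
{
	if (target_is_aarch64)
		rmr |= RMR_AA64;	/* like "orr r0, r0, #3" (bit 0 part) */
	else
		rmr &= ~RMR_AA64;	/* like "bic x0, x0, #1" */
	return rmr | RMR_RR;		/* always set the reset request bit */
}

/* Hypothetical helper: branch if the image matches the SPL, else switch. */
static enum entry_method pick_entry(int image_arch, int native_arch)
{
	return image_arch == native_arch ? ENTER_BY_BRANCH
					 : ENTER_BY_RMR_SWITCH;
}
```

Keeping the two steps separate mirrors the patch: the entry point is stored in the RVBAR alias first, and only then is the reset requested, so the core resumes at the payload in the new execution state.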

On Mon, Dec 19, 2016 at 01:50:15AM +0000, Andre Przywara wrote:
The ARMv8 capable Allwinner A64 SoC comes out of reset in AArch32 mode. To run AArch64 code, we have to trigger a warm reset via the RMR register, which proceeds with code execution at the address stored in the RVBAR register. If the bootable payload in the FIT image is using a different architecture than the SPL has been compiled for, enter it via this said RMR switch mechanism, by writing the entry point address into the MMIO mapped, writable version of the RVBAR register. Then the warm reset is triggered via a system register write. If the payload architecture is the same as the SPL, we use the normal branch as usual.
Signed-off-by: Andre Przywara andre.przywara@arm.com
Acked-by: Maxime Ripard maxime.ripard@free-electrons.com
Thanks, Maxime

When compiling the SPL for the Allwinner A64 in AArch64 mode, we can't use the more compact Thumb2 encoding, which only exists for AArch32 code. This makes the SPL rather big, up to a point where any code additions or even a different compiler may easily exceed the 32KB limit that the Allwinner BROM imposes. Introduce a separate, mostly generic sun50i-a64 configuration, which defines the CPU_V7 symbol and thus will create a 32-bit binary using the memory-saving Thumb2 encoding. This should only be used for the SPL, the U-Boot proper should still be using the existing 64-bit configuration. The SPL code can switch to AArch64 if needed, so a 32-bit SPL can be combined with a 64-bit U-Boot proper to eventually launch arm64 kernels.
Signed-off-by: Andre Przywara andre.przywara@arm.com
---
 board/sunxi/Kconfig            | 14 ++++++++++++--
 configs/pine64_plus_defconfig  |  2 +-
 configs/sun50i_spl32_defconfig | 10 ++++++++++
 3 files changed, 23 insertions(+), 3 deletions(-)
 create mode 100644 configs/sun50i_spl32_defconfig

diff --git a/board/sunxi/Kconfig b/board/sunxi/Kconfig
index 0001133..0d77c3a 100644
--- a/board/sunxi/Kconfig
+++ b/board/sunxi/Kconfig
@@ -43,6 +43,10 @@ config SUNXI_GEN_SUN6I
 	watchdog, etc.
 
 
+config MACH_SUN50I
+	bool
+	select SUNXI_GEN_SUN6I
+
 choice
 	prompt "Sunxi SoC Variant"
 	optional
@@ -121,10 +125,16 @@ config MACH_SUN9I
 	select SUNXI_GEN_SUN6I
 	select SUPPORT_SPL
 
-config MACH_SUN50I
+config MACH_SUN50I_64
 	bool "sun50i (Allwinner A64)"
+	select MACH_SUN50I
 	select ARM64
-	select SUNXI_GEN_SUN6I
+	select SUPPORT_SPL
+
+config MACH_SUN50I_32
+	bool "sun50i (Allwinner A64) SPL-32bit"
+	select MACH_SUN50I
+	select CPU_V7
 	select SUPPORT_SPL
 
 endchoice
diff --git a/configs/pine64_plus_defconfig b/configs/pine64_plus_defconfig
index 2374170..a76f66a 100644
--- a/configs/pine64_plus_defconfig
+++ b/configs/pine64_plus_defconfig
@@ -1,7 +1,7 @@
 CONFIG_ARM=y
 CONFIG_RESERVE_ALLWINNER_BOOT0_HEADER=y
 CONFIG_ARCH_SUNXI=y
-CONFIG_MACH_SUN50I=y
+CONFIG_MACH_SUN50I_64=y
 CONFIG_DEFAULT_DEVICE_TREE="sun50i-a64-pine64-plus"
 # CONFIG_SYS_MALLOC_CLEAR_ON_INIT is not set
 CONFIG_CONSOLE_MUX=y
diff --git a/configs/sun50i_spl32_defconfig b/configs/sun50i_spl32_defconfig
new file mode 100644
index 0000000..29c6a47
--- /dev/null
+++ b/configs/sun50i_spl32_defconfig
@@ -0,0 +1,10 @@
+CONFIG_ARM=y
+CONFIG_ARCH_SUNXI=y
+CONFIG_MACH_SUN50I_32=y
+CONFIG_SPL=y
+CONFIG_DEFAULT_DEVICE_TREE="sun50i-a64-pine64-plus"
+CONFIG_OF_LIST="sun50i-a64-pine64 sun50i-a64-pine64-plus"
+# CONFIG_CMD_IMLS is not set
+# CONFIG_CMD_FLASH is not set
+# CONFIG_CMD_FPGA is not set
+CONFIG_MMC_SUNXI_SLOT_EXTRA=2

On Mon, Dec 19, 2016 at 01:50:16AM +0000, Andre Przywara wrote:
When compiling the SPL for the Allwinner A64 in AArch64 mode, we can't use the more compact Thumb2 encoding, which only exists for AArch32 code. This makes the SPL rather big, up to a point where any code additions or even a different compiler may easily exceed the 32KB limit that the Allwinner BROM imposes. Introduce a separate, mostly generic sun50i-a64 configuration, which defines the CPU_V7 symbol and thus will create a 32-bit binary using the memory-saving Thumb2 encoding. This should only be used for the SPL, the U-Boot proper should still be using the existing 64-bit configuration. The SPL code can switch to AArch64 if needed, so a 32-bit SPL can be combined with a 64-bit U-Boot proper to eventually launch arm64 kernels.
Signed-off-by: Andre Przywara andre.przywara@arm.com
Like I said in the previous version of those patches, I'd like to discuss this more and not merge this patch right now. To keep the context in one place, it would be better to continue the ongoing discussion on the v2 thread.
Maxime

On Mon, Dec 19, 2016 at 2:49 AM, Andre Przywara andre.przywara@arm.com wrote:
Hi,
another reworked version of the SPL support series for the Allwinner A64 SoC. Again many thanks to the diligent reviewers, I hope I didn't miss any comments. Like the previous versions, this one includes support for both AArch64 and AArch32 SPL builds. The FIT support is still missing, which means the functionality is limited. Due to the missing ARM Trusted Firmware (ATF) in this firmware chain we lose Ethernet and SMP, among other minor things. A full 64-bit build can be written to an SD card as expected and will boot to the U-Boot proper prompt. However, Linux will crash on boot, as PSCI is missing. Building the 32-bit version of the SPL and combining this with an ATF build and the 64-bit U-Boot proper now allows FEL booting:
  # sunxi-fel spl sunxi-spl.bin write 0x4a000000 u-boot-dtb.bin \
        write 0x44000 bl31.bin reset64 0x44000
This way of booting the board gives full functionality.
The first patch is a rather simple fix (with no changes since v2). Patches 2-8 prepare the SPL code to be compiled for 64-bit in general and AArch64 in particular. Patches 9-11 refactor the existing boot0 header functionality to be used by patch 12, which introduces the 64-bit switch in the first SPL instructions. Patches 13-20 then introduce the actual core of the SPL support: the DRAM initialization, courtesy of Jens. This piggybacks on the existing H3 DRAM code, deviating where needed. This has been reworked compared to v2: I added a patch from Philipp to replace the rather uninspired register writes in the MBUS priority setup function with some meaningful code, explaining the various bits. Also the actual A64 DRAM code is no longer #ifdef'ed into the H3 driver, but uses parameters to (static) functions. The compiler detects this and removes the dead code from the other variant, resulting in the same binary size for the H3.
Patch 21 finally enables the 64-bit SPL support. So now building the existing pine64_plus_defconfig will generate a sunxi-spl.bin, which can be prepended to the U-Boot proper image (not .bin) to boot from an SD card. Due to the missing ATF support this is of limited usability at the moment, though. Also FEL support requires more love - to switch back to AArch32 before returning to FEL (without crashing, that is ;-), so this is disabled. On my setup this results in a 26KB SPL binary, which is close to the 28K limit mksunxiboot imposes at the moment. Adding anything (like FIT support or DEBUG) will exceed this, and although I have patches to let mksunxiboot get close to 32KB, this is the ultimate frontier.
So patches 22-25 then teach the SPL how to detect a U-Boot image file of a different bitness and do the RMR switch from AArch32 to AArch64, if needed. This is used by the final patch 26, which creates another _defconfig to let the SPL compile for AArch32 using the Thumb2 encoding. This results in a binary of less than 17KB in my case, so it has plenty of room for extensions.
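The size figures quoted in the cover letter can be put side by side with a few lines of arithmetic. This is purely an illustrative sketch: `spl_headroom` is a hypothetical helper, the limits are the round numbers from the text (28K for the current mksunxiboot, 32KB for the BROM), and the 17KB figure is taken as an upper bound since the text only says "less than 17KB".

```c
#include <assert.h>

#define KIB			1024L
#define MKSUNXIBOOT_LIMIT	(28 * KIB)	/* current mksunxiboot limit, per the text */
#define BROM_LIMIT		(32 * KIB)	/* Allwinner BROM limit, per the text */

/* Hypothetical helper: bytes left under 'limit' (negative means over budget). */
static long spl_headroom(long spl_size, long limit)
{
	return limit - spl_size;
}
```

Under these assumptions the AArch64 SPL has about 2KB left under the current mksunxiboot limit and at most 6KB under the BROM limit, whereas the Thumb2 build keeps at least 11KB free even with today's tool.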
Cheers, Andre.
Changelog v2 .. v3:
- add various Reviewed-by: and Acked-by: tags
- split tiny-printf fix to handle "-" separately
- add various comments and extend commit messages
- add assembly file to re-create the embedded RMR switch code
- add patch 14/26 to explain the MBUS priority setup
- move DRAM r/w delay values into #defines to simplify reusability
- replace #ifdef'ed addition of A64 support to the H3 DRAM driver with an approach using static parameters
Changelog v1 .. v2:
- drop SPI build fix (already merged)
- confine A31 register init change to H3 and A64
- use IS_ENABLED() instead of #ifdef to guard MBUS2 clock init
- fix tiny-printf (proper sign extension for 32-bit integers)
- add "size" output in commit msg to document tiny-printf size impact
- fix sdelay(): use only one register, add "cc" clobber
- update RMR switch code to provide easy access to RVBAR register address
- drop redundant DRAM frequency setting from Pine64 defconfig
- minor changes as requested by reviewers
Andre Przywara (21):
  sun6i: Restrict some register initialization to Allwinner A31 SoC
  armv8: prevent using THUMB
  armv8: add lowlevel_init.S
  SPL: tiny-printf: add "l" modifier
  SPL: tiny-printf: ignore "-" modifier
  move UL() macro from armv8/mmu.h into common.h
  SPL: make struct spl_image 64-bit safe
  armv8: add simple sdelay implementation
  armv8: move reset branch into boot hook
  ARM: boot0 hook: remove macro, include whole header file
  sunxi: introduce extra config option for boot0 header
  sunxi: A64: do an RMR switch if started in AArch32 mode
  sunxi: provide default DRAM config for sun50i in Kconfig
  sunxi: H3/A64: fix non-ODT setting
  sunxi: DRAM: fix H3 DRAM size display on aarch64
  sunxi: A64: enable SPL
  SPL: read and store arch property from U-Boot image
  Makefile: use "arm64" architecture for U-Boot image files
  ARM: SPL/FIT: differentiate between arm and arm64 arch properties
  sunxi: introduce RMR switch to enter payloads in 64-bit mode
  sunxi: A64: add 32-bit SPL support

Jens Kuske (3):
  sunxi: H3: add and rename some DRAM contoller registers
  sunxi: H3: add DRAM controller single bit delay support
  sunxi: A64: use H3 DRAM initialization code for A64 as well

Philipp Tomsich (2):
  sunxi: H3: Rework MBUS priority setup
  sunxi: clocks: Use the correct pattern register for PLL11
Please rebase on top of master, I am getting merge conflicts.
thanks!
participants (6)
- Andre Przywara
- Jagan Teki
- Jens Kuske
- Maxime Ripard
- Simon Glass
- Steve Rae