[U-Boot] [PATCH 00/14] sunxi: Allwinner A10/A13/A20 DRAM controller fixes

Hello,
First of all, it may be worth reminding that no accurate documentation for this particular DRAM controller exists in public access.
However it is suspected that Allwinner uses one of the revisions of Synopsys DesignWare DDR2/3-Lite Memory Controller IP (MCTL) combined with DDR2/3-Lite PHY IP in A10/A13/A20. Also this DRAM controller apparently has siblings in Rockchip RK29XX, RK30XX and TI KeyStone2 hardware, which have some documentation and some bits of kernel and bootloader sources available in the Internet. Not to mention the original Allwinner boot0 bootloader sources and the suspend support code from the linux-sunxi kernel. This provides enough hints for finding out how the DRAM controller actually works by checking various bits of information via the trial and error method.
In other words, a few people from the linux-sunxi community (me included) are essentially doing reverse engineering of the DRAM controller and using the linux-sunxi wiki to document all the findings: http://linux-sunxi.org/DRAM_Controller
If Allwinner Technology, Synopsys DesignWare or anyone else can share real documentation or at least provide some hints or review the code, that would be really appreciated.
Nevertheless, here is a patch set, which tries to improve the current u-boot sunxi dram code in the following way: * remove the convoluted dead code paths, which have no real use * fix obvious bugs and timing violations * add real ZQ calibration and ODT support for better reliability and higher clock speed potential * remove duplication and share code between sun4i/sun5i/sun7i * add more configuration knobs (such as MBUS clock frequency) * optional support for automatic detection of the bus width and chip density
This patch set only modifies the dram.c and dram.h source files and applies cleanly to u-boot v2014.07 (but is intended for v2014.10). The patch set is organized in such a way, that we first fix bugs in the current sun7i implementation, and only after that add the missing sun4i/sun5i support bits. As such, it clashes with the dram parts of the sun4i/sun5i support patches, earlier submitted by Hans de Goede.
Most of this work has been done during the last 3 months (and apart from the u-boot patches, it also includes improving the DRAM controler documentation in the wiki and developing extra tools, which assist in finding optimal DRAM configuration for each device). Many thanks to Jens Kuske, who helped really a lot in brainstorming and testing.
To sum it up, the main purpose of these patches is to provide: 1) The potential ability to have a universal generic failsafe DRAM initialization for all Allwinner A10/A13/A20 devices using the same u-boot binary (or at least the same SPL) 2) The configuration knobs, which allow to reach extreme clock speeds and maximize performance for the selected boards with the help of extra tuning.
These patches are also available at: https://github.com/ssvb/u-boot-sunxi-dram/commits/sunxi-dram-fixes
As a demonstration, there is also a highly experimental test branch with the DRAM performance tuning for the Cubietruck board (initially targeting 600 MHz DRAM clock speed, up from the current 432 MHz), which is compatible with the sunxi-3.4 kernel: https://github.com/ssvb/u-boot-sunxi-dram/commits/highspeedtruck-sunxi-3.4 Or alternatively, a similar branch for the mainline 3.16 kernel: https://github.com/ssvb/u-boot-sunxi-dram/commits/highspeedtruck-mainline-3....
The 600 MHz clock speed is clearly an overkill and may not work on every device, so we will have to settle with something more modest in the end. But, for example, the DRAM in my Cubietruck can be clocked up to 648 MHz. The preliminary DRAM parameters tuning instructions are available at: http://linux-sunxi.org/A10_DRAM_Controller_Calibration
Siarhei Siamashka (14): sunxi: dram: Remove useless 'dramc_scan_dll_para()' function sunxi: dram: Remove broken super-standby remnants sunxi: dram: Respect the DDR3 reset timing requirements sunxi: dram: Code cleanup and comments for the CKE delay handling sunxi: dram: Code cleanup for the impedance calibration sunxi: dram: Configurable MBUS clock speed (use PLL5 or PLL6) sunxi: dram: Use divisor P=1 for PLL5 sunxi: dram: Improve DQS gate data training error handling sunxi: dram: Add a helper function 'mctl_get_number_of_lanes' sunxi: dram: Configurable DQS gating window mode and delay sunxi: dram: Support sun4i (Allwinner A10) and sun5i (Allwinner A13) sunxi: dram: Drop DDR2 support and assume only single rank DDR3 memory sunxi: dram: Derive write recovery delay from DRAM clock speed sunxi: dram: Autodetect DDR3 bus width and density
arch/arm/cpu/armv7/sunxi/dram.c | 606 ++++++++++++++++++++------------- arch/arm/include/asm/arch-sunxi/dram.h | 11 +- 2 files changed, 374 insertions(+), 243 deletions(-)

The attempt to do DRAM parameters calibration in 'dramc_scan_dll_para()' function by trying different DLL adjustments and using the hardware DQS gate training result as a feedback is a great source of inspiration, but it just can't work properly the way it is implemented now. The fatal problem of this implementation is that the DQS gating window can be successfully found for almost every DLL delay adjustment setup that gets tried. Thus making it unable to see any real difference between 'good' and 'bad' settings.
Also this code was supposed to be only activated by setting the highest bit in the 'dram_tpr3' variable of the 'dram_para' struct (per-board dram configuration). But none of the linux-sunxi devices has ever used it for real. Basically, this code is just a dead weight.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com --- arch/arm/cpu/armv7/sunxi/dram.c | 125 +--------------------------------------- 1 file changed, 1 insertion(+), 124 deletions(-)
diff --git a/arch/arm/cpu/armv7/sunxi/dram.c b/arch/arm/cpu/armv7/sunxi/dram.c index b43c4b4..76a7106 100644 --- a/arch/arm/cpu/armv7/sunxi/dram.c +++ b/arch/arm/cpu/armv7/sunxi/dram.c @@ -264,117 +264,6 @@ static int dramc_scan_readpipe(void) return 0; }
-static int dramc_scan_dll_para(void) -{ - struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE; - const u32 dqs_dly[7] = {0x3, 0x2, 0x1, 0x0, 0xe, 0xd, 0xc}; - const u32 clk_dly[15] = {0x07, 0x06, 0x05, 0x04, 0x03, - 0x02, 0x01, 0x00, 0x08, 0x10, - 0x18, 0x20, 0x28, 0x30, 0x38}; - u32 clk_dqs_count[15]; - u32 dqs_i, clk_i, cr_i; - u32 max_val, min_val; - u32 dqs_index, clk_index; - - /* Find DQS_DLY Pass Count for every CLK_DLY */ - for (clk_i = 0; clk_i < 15; clk_i++) { - clk_dqs_count[clk_i] = 0; - clrsetbits_le32(&dram->dllcr[0], 0x3f << 6, - (clk_dly[clk_i] & 0x3f) << 6); - for (dqs_i = 0; dqs_i < 7; dqs_i++) { - for (cr_i = 1; cr_i < 5; cr_i++) { - clrsetbits_le32(&dram->dllcr[cr_i], - 0x4f << 14, - (dqs_dly[dqs_i] & 0x4f) << 14); - } - udelay(2); - if (dramc_scan_readpipe() == 0) - clk_dqs_count[clk_i]++; - } - } - /* Test DQS_DLY Pass Count for every CLK_DLY from up to down */ - for (dqs_i = 15; dqs_i > 0; dqs_i--) { - max_val = 15; - min_val = 15; - for (clk_i = 0; clk_i < 15; clk_i++) { - if (clk_dqs_count[clk_i] == dqs_i) { - max_val = clk_i; - if (min_val == 15) - min_val = clk_i; - } - } - if (max_val < 15) - break; - } - - /* Check if Find a CLK_DLY failed */ - if (!dqs_i) - goto fail; - - /* Find the middle index of CLK_DLY */ - clk_index = (max_val + min_val) >> 1; - if ((max_val == (15 - 1)) && (min_val > 0)) - /* if CLK_DLY[MCTL_CLK_DLY_COUNT] is very good, then the middle - * value can be more close to the max_val - */ - clk_index = (15 + clk_index) >> 1; - else if ((max_val < (15 - 1)) && (min_val == 0)) - /* if CLK_DLY[0] is very good, then the middle value can be more - * close to the min_val - */ - clk_index >>= 1; - if (clk_dqs_count[clk_index] < dqs_i) - clk_index = min_val; - - /* Find the middle index of DQS_DLY for the CLK_DLY got above, and Scan - * read pipe again - */ - clrsetbits_le32(&dram->dllcr[0], 0x3f << 6, - (clk_dly[clk_index] & 0x3f) << 6); - max_val = 7; - min_val = 7; - for (dqs_i = 0; dqs_i < 7; dqs_i++) { - clk_dqs_count[dqs_i] = 0; - for (cr_i = 1; cr_i < 5; cr_i++) { - clrsetbits_le32(&dram->dllcr[cr_i], - 0x4f << 14, - (dqs_dly[dqs_i] & 0x4f) << 14); - } - udelay(2); - if (dramc_scan_readpipe() == 0) { - clk_dqs_count[dqs_i] = 1; - max_val = dqs_i; - if (min_val == 7) - min_val = dqs_i; - } - } - - if (max_val < 7) { - dqs_index = (max_val + min_val) >> 1; - if ((max_val == (7-1)) && (min_val > 0)) - dqs_index = (7 + dqs_index) >> 1; - else if ((max_val < (7-1)) && (min_val == 0)) - dqs_index >>= 1; - if (!clk_dqs_count[dqs_index]) - dqs_index = min_val; - for (cr_i = 1; cr_i < 5; cr_i++) { - clrsetbits_le32(&dram->dllcr[cr_i], - 0x4f << 14, - (dqs_dly[dqs_index] & 0x4f) << 14); - } - udelay(2); - return dramc_scan_readpipe(); - } - -fail: - clrbits_le32(&dram->dllcr[0], 0x3f << 6); - for (cr_i = 1; cr_i < 5; cr_i++) - clrbits_le32(&dram->dllcr[cr_i], 0x4f << 14); - udelay(2); - - return dramc_scan_readpipe(); -} - static void dramc_clock_output_en(u32 on) { #if defined(CONFIG_SUN5I) || defined(CONFIG_SUN7I) @@ -569,19 +458,7 @@ unsigned long dramc_init(struct dram_para *para)
/* scan read pipe value */ mctl_itm_enable(); - if (para->tpr3 & (0x1 << 31)) { - ret_val = dramc_scan_dll_para(); - if (ret_val == 0) - para->tpr3 = - (((readl(&dram->dllcr[0]) >> 6) & 0x3f) << 16) | - (((readl(&dram->dllcr[1]) >> 14) & 0xf) << 0) | - (((readl(&dram->dllcr[2]) >> 14) & 0xf) << 4) | - (((readl(&dram->dllcr[3]) >> 14) & 0xf) << 8) | - (((readl(&dram->dllcr[4]) >> 14) & 0xf) << 12 - ); - } else { - ret_val = dramc_scan_readpipe(); - } + ret_val = dramc_scan_readpipe();
if (ret_val < 0) return 0;

On Fri, 2014-07-18 at 19:22 +0300, Siarhei Siamashka wrote:
The attempt to do DRAM parameters calibration in 'dramc_scan_dll_para()' function by trying different DLL adjustments and using the hardware DQS gate training result as a feedback is a great source of inspiration, but it just can't work properly the way it is implemented now. The fatal problem of this implementation is that the DQS gating window can be successfully found for almost every DLL delay adjustment setup that gets tried. Thus making it unable to see any real difference between 'good' and 'bad' settings.
Also this code was supposed to be only activated by setting the highest bit in the 'dram_tpr3' variable of the 'dram_para' struct (per-board dram configuration). But none of the linux-sunxi devices has ever used it for real. Basically, this code is just a dead weight.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com
Acked-by: Ian Campbell ijc@hellion.org.uk

If the dram->ppwrsctl (SDR_DPCR) register has the lowest bit set to 1, this means that DRAM is currently in self-refresh mode and retaining the old data. Since we have no idea what to do in this situation yet, just set this register to 0 and initialize DRAM in the same way as on any normal reboot (discarding whatever was stored there).
This part of code was apparently used by the Allwinner boot0 bootloader to handle resume from the so-called super-standby mode. But this particular code got somehow mangled on the way from the boot0 bootloader to the u-boot-sunxi bootloader and has no chance of doing anything even remotely sane. For example: 1. in the original boot0 code we had "mctl_write_w(SDR_DPCR, 0x16510000)" (write to the register) and in the u-boot it now looks like "setbits_le32(&dram->ppwrsctl, 0x16510000)" (set bits in the register) 2. in the original boot0 code it was issuing three commands "0x12, 0x17, 0x13" (Self-Refresh entry, Self-Refresh exit, Refresh), but in the u-boot they have become "0x12, 0x12, 0x13" (Self-Refresh entry, Self-Refresh entry, Refresh)
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com --- arch/arm/cpu/armv7/sunxi/dram.c | 67 ++++++++++++++--------------------------- 1 file changed, 22 insertions(+), 45 deletions(-)
diff --git a/arch/arm/cpu/armv7/sunxi/dram.c b/arch/arm/cpu/armv7/sunxi/dram.c index 76a7106..beaf95a 100644 --- a/arch/arm/cpu/armv7/sunxi/dram.c +++ b/arch/arm/cpu/armv7/sunxi/dram.c @@ -295,6 +295,24 @@ static void dramc_set_autorefresh_cycle(u32 clk, u32 type, u32 density) writel(DRAM_DRR_TREFI(tREFI) | DRAM_DRR_TRFC(tRFC), &dram->drr); }
+/* + * If the dram->ppwrsctl (SDR_DPCR) register has the lowest bit set to 1, this + * means that DRAM is currently in self-refresh mode and retaining the old + * data. Since we have no idea what to do in this situation yet, just set this + * register to 0 and initialize DRAM in the same way as on any normal reboot + * (discarding whatever was stored there). + * + * Note: on sun7i hardware, the highest 16 bits need to be set to 0x1651 magic + * value for this write operation to have any effect. On sun5i hadware this + * magic value is not necessary. And on sun4i hardware the writes to this + * register seem to have no effect at all. + */ +static void mctl_disable_power_save(void) +{ + struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE; + writel(0x16510000, &dram->ppwrsctl); +} + unsigned long dramc_init(struct dram_para *para) { struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE; @@ -309,6 +327,9 @@ unsigned long dramc_init(struct dram_para *para) /* setup DRAM relative clock */ mctl_setup_dram_clock(para->clock);
+ /* Disable any pad power save control */ + mctl_disable_power_save(); + /* reset external DRAM */ mctl_set_drive();
@@ -366,12 +387,7 @@ unsigned long dramc_init(struct dram_para *para) setbits_le32(&dram->idcr, 0x1ffff); #endif
-#ifdef CONFIG_SUN7I - if ((readl(&dram->ppwrsctl) & 0x1) != 0x1) - mctl_ddr3_reset(); - else - setbits_le32(&dram->mcr, DRAM_MCR_RESET); -#endif + mctl_ddr3_reset();
udelay(1);
@@ -417,45 +433,6 @@ unsigned long dramc_init(struct dram_para *para) setbits_le32(&dram->ccr, DRAM_CCR_INIT); await_completion(&dram->ccr, DRAM_CCR_INIT);
-#ifdef CONFIG_SUN7I - /* setup zq calibration manual */ - reg_val = readl(&dram->ppwrsctl); - if ((reg_val & 0x1) == 1) { - /* super_standby_flag = 1 */ - - reg_val = readl(0x01c20c00 + 0x120); /* rtc */ - reg_val &= 0x000fffff; - reg_val |= 0x17b00000; - writel(reg_val, &dram->zqcr0); - - /* exit self-refresh state */ - clrsetbits_le32(&dram->dcr, 0x1f << 27, 0x12 << 27); - /* check whether command has been executed */ - await_completion(&dram->dcr, 0x1 << 31); - - udelay(2); - - /* dram pad hold off */ - setbits_le32(&dram->ppwrsctl, 0x16510000); - - await_completion(&dram->ppwrsctl, 0x1); - - /* exit self-refresh state */ - clrsetbits_le32(&dram->dcr, 0x1f << 27, 0x12 << 27); - - /* check whether command has been executed */ - await_completion(&dram->dcr, 0x1 << 31); - - udelay(2); - - /* issue a refresh command */ - clrsetbits_le32(&dram->dcr, 0x1f << 27, 0x13 << 27); - await_completion(&dram->dcr, 0x1 << 31); - - udelay(2); - } -#endif - /* scan read pipe value */ mctl_itm_enable(); ret_val = dramc_scan_readpipe();

On Fri, 2014-07-18 at 19:22 +0300, Siarhei Siamashka wrote:
If the dram->ppwrsctl (SDR_DPCR) register has the lowest bit set to 1, this means that DRAM is currently in self-refresh mode and retaining the old data. Since we have no idea what to do in this situation yet, just set this register to 0 and initialize DRAM in the same way as on any normal reboot (discarding whatever was stored there).
This part of code was apparently used by the Allwinner boot0 bootloader to handle resume from the so-called super-standby mode. But this particular code got somehow mangled on the way from the boot0 bootloader to the u-boot-sunxi bootloader and has no chance of doing anything even remotely sane. For example:
- in the original boot0 code we had "mctl_write_w(SDR_DPCR, 0x16510000)" (write to the register) and in the u-boot it now looks like "setbits_le32(&dram->ppwrsctl, 0x16510000)" (set bits in the register)
- in the original boot0 code it was issuing three commands "0x12, 0x17, 0x13" (Self-Refresh entry, Self-Refresh exit, Refresh), but in the u-boot they have become "0x12, 0x12, 0x13" (Self-Refresh entry, Self-Refresh entry, Refresh)
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com
Acked-by: Ian Campbell ijc@hellion.org.uk

The RESET pin needs to be kept low for at least 200 us according to the DDR3 spec. So just do it the right way.
This issue did not cause any visible major problems earlier, because the DRAM RESET pin is usually already low after the board reset. And the time gap before reaching the sunxi u-boot DRAM initialization code appeared to be sufficient.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com --- arch/arm/cpu/armv7/sunxi/dram.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/arm/cpu/armv7/sunxi/dram.c b/arch/arm/cpu/armv7/sunxi/dram.c index beaf95a..01c492f 100644 --- a/arch/arm/cpu/armv7/sunxi/dram.c +++ b/arch/arm/cpu/armv7/sunxi/dram.c @@ -48,13 +48,18 @@ static void await_completion(u32 *reg, u32 mask) } }
+/* + * This performs the external DRAM reset by driving the RESET pin low and + * then high again. According to the DDR3 spec, the RESET pin needs to be + * kept low for at least 200 us. + */ static void mctl_ddr3_reset(void) { struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE;
clrbits_le32(&dram->mcr, DRAM_MCR_RESET); - udelay(2); + udelay(200); setbits_le32(&dram->mcr, DRAM_MCR_RESET); }

On Fri, 2014-07-18 at 19:22 +0300, Siarhei Siamashka wrote:
The RESET pin needs to be kept low for at least 200 us according to the DDR3 spec. So just do it the right way.
This issue did not cause any visible major problems earlier, because the DRAM RESET pin is usually already low after the board reset. And the time gap before reaching the sunxi u-boot DRAM initialization code appeared to be sufficient.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com
Acked-by: Ian Campbell ijc@hellion.org.uk

Before driving the CKE pin (Clock Enable) high, the DDR3 spec requires to wait for additional 500 us after the RESET pin is de-asserted.
The DRAM controller takes care of this delay by itself, using a configurable counter in the SDR_IDCR register. This works in the same way on sun4i/sun5i/sun7i hardware (even the default register value 0x00c80064 is identical). Except that the counter is ticking a bit slower on sun7i (3 DRAM clock cycles instead of 2), resulting in longer actual delays for the same settings.
This patch keeps the old code and only removes the CONFIG_SUN7I ifdef. But maybe we should drop all of this and just add 'udelay(500)' after the DDR3 reset without bothering to play with these undocumented registers.
Another interesting observation is that the u-boot-sunxi code (derived from the Allwinner boot0) did not configure the SDR_IDCR register for sun4i/sun5i, but performed the DDR3 reset very early. Possibly resulting in a sufficient time gap between the DDR3 reset and the DDR3 initialization steps.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com --- arch/arm/cpu/armv7/sunxi/dram.c | 45 ++++++++++++++++++++++++++++++++++------- 1 file changed, 38 insertions(+), 7 deletions(-)
diff --git a/arch/arm/cpu/armv7/sunxi/dram.c b/arch/arm/cpu/armv7/sunxi/dram.c index 01c492f..def4247 100644 --- a/arch/arm/cpu/armv7/sunxi/dram.c +++ b/arch/arm/cpu/armv7/sunxi/dram.c @@ -318,6 +318,41 @@ static void mctl_disable_power_save(void) writel(0x16510000, &dram->ppwrsctl); }
+/* + * After the DRAM is powered up or reset, the DDR3 spec requires to wait at + * least 500 us before driving the CKE pin (Clock Enable) high. The dram->idct + * (SDR_IDCR) register appears to configure this delay, which gets applied + * right at the time when the DRAM initialization is activated in the + * 'mctl_ddr3_initialize' function. + */ +static void mctl_set_cke_delay(void) +{ + struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE; + + /* The CKE delay is represented in dram clock cycles, multiplied by N + * (where N=2 for sun4i/sun5i and N=3 for sun7i). We are being lazy + * to do proper calculations and just set it to the maximum possible + * value 0x1ffff. This is enough to provide the needed 500 us delay + * at the DRAM clock freqencies up to ~524MHz on sun4i/sun5i hardware. + * The sun7i hardware has even more headroom due to a larger multiplier. + */ + setbits_le32(&dram->idcr, 0x1ffff); +} + +/* + * This triggers the DRAM initialization. It performs sending the mode registers + * to the DRAM among other things. Very likely the ZQCL command is also getting + * executed (to do the initial impedance calibration on the DRAM side of the + * wire). The memory controller and the PHY must be already configured before + * calling this function. + */ +static void mctl_ddr3_initialize(void) +{ + struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE; + setbits_le32(&dram->ccr, DRAM_CCR_INIT); + await_completion(&dram->ccr, DRAM_CCR_INIT); +} + unsigned long dramc_init(struct dram_para *para) { struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE; @@ -387,10 +422,7 @@ unsigned long dramc_init(struct dram_para *para) writel(reg_val, &dram->zqcr0); #endif
-#ifdef CONFIG_SUN7I - /* Set CKE Delay to about 1ms */ - setbits_le32(&dram->idcr, 0x1ffff); -#endif + mctl_set_cke_delay();
mctl_ddr3_reset();
@@ -434,9 +466,8 @@ unsigned long dramc_init(struct dram_para *para) if (para->tpr4 & 0x1) setbits_le32(&dram->ccr, DRAM_CCR_COMMAND_RATE_1T); #endif - /* reset external DRAM */ - setbits_le32(&dram->ccr, DRAM_CCR_INIT); - await_completion(&dram->ccr, DRAM_CCR_INIT); + /* initialize external DRAM */ + mctl_ddr3_initialize();
/* scan read pipe value */ mctl_itm_enable();

On Fri, 2014-07-18 at 19:22 +0300, Siarhei Siamashka wrote:
Before driving the CKE pin (Clock Enable) high, the DDR3 spec requires to wait for additional 500 us after the RESET pin is de-asserted.
The DRAM controller takes care of this delay by itself, using a configurable counter in the SDR_IDCR register. This works in the same way on sun4i/sun5i/sun7i hardware (even the default register value 0x00c80064 is identical). Except that the counter is ticking a bit slower on sun7i (3 DRAM clock cycles instead of 2), resulting in longer actual delays for the same settings.
This patch keeps the old code and only removes the CONFIG_SUN7I ifdef. But maybe we should drop all of this and just add 'udelay(500)' after the DDR3 reset without bothering to play with these undocumented registers.
I'm happy to go with whichever you think is better.
Another interesting observation is that the u-boot-sunxi code (derived from the Allwinner boot0) did not configure the SDR_IDCR register for sun4i/sun5i, but performed the DDR3 reset very early. Possibly resulting in a sufficient time gap between the DDR3 reset and the DDR3 initialization steps.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com
Acked-by: Ian Campbell ijc@hellion.org.uk

On Mon, 21 Jul 2014 19:51:50 +0100 Ian Campbell ijc@hellion.org.uk wrote:
On Fri, 2014-07-18 at 19:22 +0300, Siarhei Siamashka wrote:
Before driving the CKE pin (Clock Enable) high, the DDR3 spec requires to wait for additional 500 us after the RESET pin is de-asserted.
The DRAM controller takes care of this delay by itself, using a configurable counter in the SDR_IDCR register. This works in the same way on sun4i/sun5i/sun7i hardware (even the default register value 0x00c80064 is identical). Except that the counter is ticking a bit slower on sun7i (3 DRAM clock cycles instead of 2), resulting in longer actual delays for the same settings.
This patch keeps the old code and only removes the CONFIG_SUN7I ifdef. But maybe we should drop all of this and just add 'udelay(500)' after the DDR3 reset without bothering to play with these undocumented registers.
I'm happy to go with whichever you think is better.
If the total DRAM initialization time in u-boot is not really critical (all the delays are only fractions of millisecond), then I would probably go with the "cargo cult" approach and actually apply the delays in both places ('udelay(500)' after the DDR3 reset and keep the maximum delay in the SDR_IDCR register too).
I just feel uneasy about using only the SDR_IDCR approach, because it implies the 524MHz DRAM clock speed limit for sun4i/sun5i hardware if we don't want to violate the DDR3 requirements. And we would prefer to have at least the 528MHz clock speed option (the Allwinner A13 manual says that 533MHz is the DRAM clock limit for it).
The changes in the SDR_IDCR delay counter on sun7i hardware (which permit longer delays) are very interesting. It might be a hint that the DRAM controller in sun7i had been originally intended to support higher clock speeds than its predecessors. However the Allwinner A20 manual only advertises the 400MHz DRAM clock. It might be that the Allwinner engineers could not figure out how to configure the DRAM parameters to reach really high clock speeds and decided to be modest about this. However that's just a speculation on my side.
Anyway, if anybody has a logic analyzer (the hardware companies like Olimex and CubieTech?), then checking and confirming the timings between the signals on the CKE and RESET pins during the DRAM initialization would be very interesting to confirm that everything is alright.
Another interesting observation is that the u-boot-sunxi code (derived from the Allwinner boot0) did not configure the SDR_IDCR register for sun4i/sun5i, but performed the DDR3 reset very early. Possibly resulting in a sufficient time gap between the DDR3 reset and the DDR3 initialization steps.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com
Acked-by: Ian Campbell ijc@hellion.org.uk
Thanks.

On Fri, 2014-07-25 at 04:41 +0300, Siarhei Siamashka wrote:
On Mon, 21 Jul 2014 19:51:50 +0100 Ian Campbell ijc@hellion.org.uk wrote:
On Fri, 2014-07-18 at 19:22 +0300, Siarhei Siamashka wrote:
Before driving the CKE pin (Clock Enable) high, the DDR3 spec requires to wait for additional 500 us after the RESET pin is de-asserted.
The DRAM controller takes care of this delay by itself, using a configurable counter in the SDR_IDCR register. This works in the same way on sun4i/sun5i/sun7i hardware (even the default register value 0x00c80064 is identical). Except that the counter is ticking a bit slower on sun7i (3 DRAM clock cycles instead of 2), resulting in longer actual delays for the same settings.
This patch keeps the old code and only removes the CONFIG_SUN7I ifdef. But maybe we should drop all of this and just add 'udelay(500)' after the DDR3 reset without bothering to play with these undocumented registers.
I'm happy to go with whichever you think is better.
If the total DRAM initialization time in u-boot is not really critical (all the delays are only fractions of millisecond), then I would probably go with the "cargo cult" approach and actually apply the delays in both places ('udelay(500)' after the DDR3 reset and keep the maximum delay in the SDR_IDCR register too).
Makes sense to me.
If someone later decides they really care about boot time to this degree then they can implement the SDR_IDCR thing, hopefully with the aid of a logic analyser, as you say.
Ian.

Moved the impedance setup code part into a separate function. Added explicit wait for ZQ calibration completion before proceeding to the next initialization steps. Removed the CONFIG_SUN7I ifdef guard around the code, which has identical behaviour on sun4i/sun5i/sun7i. And if 'odt_en' is set in the 'dram_para' struct, then ODT now actually gets enabled in the DRAM_IOCR register (which the older code failed to do and was always running without ODT no matter the settings).
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com --- arch/arm/cpu/armv7/sunxi/dram.c | 71 ++++++++++++++++++++++++++++------ arch/arm/include/asm/arch-sunxi/dram.h | 4 ++ 2 files changed, 63 insertions(+), 12 deletions(-)
diff --git a/arch/arm/cpu/armv7/sunxi/dram.c b/arch/arm/cpu/armv7/sunxi/dram.c index def4247..afeb2df 100644 --- a/arch/arm/cpu/armv7/sunxi/dram.c +++ b/arch/arm/cpu/armv7/sunxi/dram.c @@ -49,6 +49,19 @@ static void await_completion(u32 *reg, u32 mask) }
/* + * Wait up to 1s for mask to be set in given reg. + */ +static void await_bits_set(u32 *reg, u32 mask) +{ + unsigned long tmo = timer_get_us() + 1000000; + + while ((readl(reg) & mask) != mask) { + if (timer_get_us() > tmo) + panic("Timeout initialising DRAM\n"); + } +} + +/* * This performs the external DRAM reset by driving the RESET pin low and * then high again. According to the DDR3 spec, the RESET pin needs to be * kept low for at least 200 us. @@ -353,6 +366,51 @@ static void mctl_ddr3_initialize(void) await_completion(&dram->ccr, DRAM_CCR_INIT); }
+/* + * Perform impedance calibration on the DRAM controller side of the wire. + */ +static void mctl_set_impedance(u32 zq, u32 odt_en) +{ + struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE; + u32 reg_val; + u32 zprog = zq & 0xFF, zdata = (zq >> 8) & 0xFFFFF; + +#ifndef CONFIG_SUN7I + /* wait for the default impedance configuration to settle */ + await_bits_set(&dram->zqsr, DRAM_ZQSR_ZDONE); +#endif + + if (!odt_en) + return; + +#ifdef CONFIG_SUN7I + /* some weird magic, but sun7i fails to boot without it */ + writel((1 << 24) | (1 << 1), &dram->zqcr1); +#endif + + /* needed at least for sun5i, because it does not self clear there */ + clrbits_le32(&dram->zqcr0, DRAM_ZQCR0_ZCAL); + + if (zdata) { + /* set the user supplied impedance data */ + reg_val = DRAM_ZQCR0_ZDEN | zdata; + writel(reg_val, &dram->zqcr0); + /* no need to wait, this takes effect immediately */ + } else { + /* do the calibration using the external resistor */ + reg_val = DRAM_ZQCR0_ZCAL | DRAM_ZQCR0_IMP_DIV(zprog); + writel(reg_val, &dram->zqcr0); + /* wait for the new impedance configuration to settle */ + await_bits_set(&dram->zqsr, DRAM_ZQSR_ZDONE); + } + + /* needed at least for sun5i, because it does not self clear there */ + clrbits_le32(&dram->zqcr0, DRAM_ZQCR0_ZCAL); + + /* set I/O configure register */ + writel(DRAM_IOCR_ODT_EN(odt_en), &dram->iocr); +} + unsigned long dramc_init(struct dram_para *para) { struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE; @@ -407,20 +465,9 @@ unsigned long dramc_init(struct dram_para *para) reg_val |= DRAM_DCR_MODE(DRAM_DCR_MODE_INTERLEAVE); writel(reg_val, &dram->dcr);
-#ifdef CONFIG_SUN7I - setbits_le32(&dram->zqcr1, (0x1 << 24) | (0x1 << 1)); - if (para->tpr4 & 0x2) - clrsetbits_le32(&dram->zqcr1, (0x1 << 24), (0x1 << 1)); dramc_clock_output_en(1); -#endif
-#if (defined(CONFIG_SUN5I) || defined(CONFIG_SUN7I)) - /* set odt impendance divide ratio */ - reg_val = ((para->zq) >> 8) & 0xfffff; - reg_val |= ((para->zq) & 0xff) << 20; - reg_val |= (para->zq) & 0xf0000000; - writel(reg_val, &dram->zqcr0); -#endif + mctl_set_impedance(para->zq, para->odt_en);
mctl_set_cke_delay();
diff --git a/arch/arm/include/asm/arch-sunxi/dram.h b/arch/arm/include/asm/arch-sunxi/dram.h index 67fbfad..4433eeb 100644 --- a/arch/arm/include/asm/arch-sunxi/dram.h +++ b/arch/arm/include/asm/arch-sunxi/dram.h @@ -159,6 +159,10 @@ struct dram_para {
#define DRAM_ZQCR0_IMP_DIV(n) (((n) & 0xff) << 20) #define DRAM_ZQCR0_IMP_DIV_MASK DRAM_ZQCR0_IMP_DIV(0xff) +#define DRAM_ZQCR0_ZCAL (1 << 31) /* Starts ZQ calibration when set to 1 */ +#define DRAM_ZQCR0_ZDEN (1 << 28) /* Uses ZDATA instead of doing calibration */ + +#define DRAM_ZQSR_ZDONE (1 << 31) /* ZQ calibration completion flag */
#define DRAM_IOCR_ODT_EN(n) ((((n) & 0x3) << 30) | ((n) & 0x3) << 0) #define DRAM_IOCR_ODT_EN_MASK DRAM_IOCR_ODT_EN(0x3)

On Fri, 2014-07-18 at 19:22 +0300, Siarhei Siamashka wrote:
Moved the impedance setup code part into a separate function. Added explicit wait for ZQ calibration completion before proceeding to the next initialization steps. Removed the CONFIG_SUN7I ifdef guard around the code, which has identical behaviour on sun4i/sun5i/sun7i. And if 'odt_en' is set in the 'dram_para' struct, then ODT now actually gets enabled in the DRAM_IOCR register (which the older code failed to do and was always running without ODT no matter the settings).
There's a few aspects of this code which don't seem to be explained here.
Firstly if odt_en is not enabled we now skip setting the impedance. Which seems logical but should me mentioned. It's also worth noting that none of the platforms in u-boot-sunxi.git#master set odt_en
Secondly the weird sun7i magic has changed from - setbits_le32(&dram->zqcr1, (0x1 << 24) | (0x1 << 1)); - if (para->tpr4 & 0x2) - clrsetbits_le32(&dram->zqcr1, (0x1 << 24), (0x1 << 1)); into just a write of the raw value. This should be mentioned. Also this now occurs after the call to dramc_clock_output_en().
Thirdly why do we not wait for ZQ calibration on sun7i?
Lastly it now seems to support calibration using an external resistor. And neither half of that new if (zdata) seems to correspond to the old code.
I think part of the problem here is that this patch is trying to do too much in one go. If separating things out isn't possible (e.g. because these changes are all interdependent) then it is important that the commit message describes them. I'd also steer clear of describing this as a code cleanup when it also has functional changes.
- Wait up to 1s for mask to be set in given reg.
- */
+static void await_bits_set(u32 *reg, u32 mask)
This could be combined with the existing await_completion into a function which takes a mask and a val. Perhaps with convenience wrappers for the two cases.
Ian.

On Mon, 21 Jul 2014 20:20:20 +0100 Ian Campbell ijc@hellion.org.uk wrote:
On Fri, 2014-07-18 at 19:22 +0300, Siarhei Siamashka wrote:
Moved the impedance setup code part into a separate function. Added explicit wait for ZQ calibration completion before proceeding to the next initialization steps. Removed the CONFIG_SUN7I ifdef guard around the code, which has identical behaviour on sun4i/sun5i/sun7i. And if 'odt_en' is set in the 'dram_para' struct, then ODT now actually gets enabled in the DRAM_IOCR register (which the older code failed to do and was always running without ODT no matter the settings).
There's a few aspects of this code which don't seem to be explained here.
Firstly if odt_en is not enabled we now skip setting the impedance. Which seems logical but should me mentioned.
Right. The commit message needs to be rewritten to address the fact that we are introducing the new ZQ calibration code and just removing the old one.
It's also worth noting that none of the platforms in u-boot-sunxi.git#master set odt_en
None of the sunxi devices are using it for anything meaningful (even though some of them attempt to set odt_en in the non-mainline sunxi u-boot, they are actually very lucky if they don't suffer from adverse effects doing this).
The demonstration of the usefulness of this patch is only presented in the 'highspeedtruck' branches, referenced from the cover letter. We know that this code works, because we can get a major DRAM clock speed uplift. Which is simply impossible without doing impedance configuration correctly.
Secondly the weird sun7i magic has changed from
setbits_le32(&dram->zqcr1, (0x1 << 24) | (0x1 << 1));
if (para->tpr4 & 0x2)
clrsetbits_le32(&dram->zqcr1, (0x1 << 24), (0x1 << 1));
If you want to have some extra fun, then you can compare this fragment with the original code from boot0:
//SDR_ZQCR1 set bit24 to 1 reg_val = mctl_read_w(SDR_ZQCR1); reg_val |= (0x1<<24) | (0x1<<1); if(para->dram_tpr4 & 0x2) { reg_val &= ~((0x1<<24) | (0x1<<1)); } mctl_write_w(SDR_ZQCR1, reg_val);
As you can see, the original boot0 logic already got mangled somewhat. But since we have no boards, which would pass the (para->dram_tpr4 & 0x2) check, it is not really important in practice.
into just a write of the raw value. This should be mentioned. Also this now occurs after the call to dramc_clock_output_en().
The old code is basically a non-working gibberish. And it is not very useful as a reference.
We had to first figure out how the hardware works (taking advantage of the fact that Rockchip and TI Keystone2 DRAM controllers are rather similar and we can get some hints from them): http://linux-sunxi.org/A10_DRAM_Controller_Register_Guide#SDR_ZQCR0 And then reimplement the ZQ calibration code using this information.
Only the ZPROG and ZDATA bits placement is kept the same in the 'zq' parameter for "backwards compatibility" reasons. But there is no real backwards compatibility, because it is difficult to be exactly compatible with something that is broken.
Thirdly why do we not wait for ZQ calibration on sun7i?
On sun4i and sun5i hardware, we see that the ZQ calibration has been already initiated by something and is in progress. So it is reasonable to wait until it finishes doing its things and only then start the real ZQ calibration with our settings.
On sun7i hardware, there are no signs of this auto-initiated calibration. So if we add a wait, it will only result in a timeout panic.
Lastly it now seems to support calibration using an external resistor. And neither half of that new if (zdata) seems to correspond to the old code.
That's right.
I think part of the problem here is that this patch is trying to do too much in one go.
The patch is rather small in terms of LOC and added functionality. The ZDATA handling can be split into a separate patch though.
If separating things out isn't possible (e.g. because these changes are all interdependent) then it is important that the commit message describes them.
If this would make you more happy, it is also possible to just remove all the old impedance code in one commit (since nothing is really using it yet). And then introduce the new impedance code in another commit.
Luckily, in the mainline u-boot we still don't have any copy-pasted board configs, which happen to pretend to be using the 'odt_en' parameter in the 'dram_para' struct.
I'd also steer clear of describing this as a code cleanup when it also has functional changes.
The 'cleanup' was just a bad choice of word. It is a reimplementation.
- Wait up to 1s for mask to be set in given reg.
- */
+static void await_bits_set(u32 *reg, u32 mask)
This could be combined with the existing await_completion into a function which takes a mask and a val. Perhaps with convenience wrappers for the two cases.
That's a good idea.

On Fri, 2014-07-25 at 06:44 +0300, Siarhei Siamashka wrote:
I'd also steer clear of describing this as a code cleanup when it also has functional changes.
The 'cleanup' was just a bad choice of word. It is a reimplementation.
Right, I think that's the crux of all issues raised (hence I didn't respond to your comments, which looked good though, thanks). The commit said clean up so I was expecting no (or few) functional changes, so all the undescribed changes made me antsy.
I'd be happy with a patch described as reimplement (ideally with a more comprehensive list of the changes in the changelog) or with a remove and replace as you prefer.
Ian.

The sun5i hardware (Allwinner A13) introduced configurable MBUS clock speed. Allwinner A13 uses only 16-bit data bus width to connect the external DRAM, which is halved compared to the 32-bit data bus of sun4i (Allwinner A10), so it does not make much sense to clock a wider internal bus at a very high speed. The Allwinner A13 manual specifies 300 MHz MBUS clock speed limit and 533 MHz DRAM clock speed limit. Newer sun7i hardware (Allwinner A20) has a full width 32-bit external memory interface again, but still keeps the MBUS clock speed configurable. Clocking MBUS too low inhibits memory performance and one has to find the optimal MBUS/DRAM clock speed ratio, which may depend on many factors.
This patch introduces a new 'mbus_clock' parameter for the 'dram_para' struct and uses it as a desired MBUS clock speed target. If 'mbus_clock' is not set, 300 MHz is used by default to match the older hardcoded settings.
Attempting to set the MBUS clock speed has no effect on sun4i, but does no harm either.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com --- arch/arm/cpu/armv7/sunxi/dram.c | 41 +++++++++++++++++++++++++++++----- arch/arm/include/asm/arch-sunxi/dram.h | 1 + 2 files changed, 36 insertions(+), 6 deletions(-)
diff --git a/arch/arm/cpu/armv7/sunxi/dram.c b/arch/arm/cpu/armv7/sunxi/dram.c index afeb2df..d41fb1e 100644 --- a/arch/arm/cpu/armv7/sunxi/dram.c +++ b/arch/arm/cpu/armv7/sunxi/dram.c @@ -178,11 +178,19 @@ static void mctl_configure_hostport(void) writel(hpcr_value[i], &dram->hpcr[i]); }
-static void mctl_setup_dram_clock(u32 clk) +static void mctl_setup_dram_clock(u32 clk, u32 mbus_clk) { u32 reg_val; struct sunxi_ccm_reg *ccm = (struct sunxi_ccm_reg *)SUNXI_CCM_BASE;
+ /* PLL5P and PLL6 are the potential clock sources for MBUS */ + u32 pll6x_div, pll5p_div; + u32 pll6x_clk = clock_get_pll6() / 1000000; + u32 pll5p_clk = clk / 24 * 24; +#ifdef CONFIG_SUN7I + pll6x_clk *= 2; /* sun7i uses PLL6*2, sun5i uses just PLL6 */ +#endif + /* setup DRAM PLL */ reg_val = readl(&ccm->pll5_cfg); reg_val &= ~CCM_PLL5_CTRL_M_MASK; /* set M to 0 (x1) */ @@ -191,30 +199,35 @@ static void mctl_setup_dram_clock(u32 clk) reg_val &= ~CCM_PLL5_CTRL_P_MASK; /* set P to 0 (x1) */ if (clk >= 540 && clk < 552) { /* dram = 540MHz, pll5p = 540MHz */ + pll5p_clk = 540; reg_val |= CCM_PLL5_CTRL_M(CCM_PLL5_CTRL_M_X(2)); reg_val |= CCM_PLL5_CTRL_K(CCM_PLL5_CTRL_K_X(3)); reg_val |= CCM_PLL5_CTRL_N(CCM_PLL5_CTRL_N_X(15)); reg_val |= CCM_PLL5_CTRL_P(1); } else if (clk >= 512 && clk < 528) { /* dram = 512MHz, pll5p = 384MHz */ + pll5p_clk = 384; reg_val |= CCM_PLL5_CTRL_M(CCM_PLL5_CTRL_M_X(3)); reg_val |= CCM_PLL5_CTRL_K(CCM_PLL5_CTRL_K_X(4)); reg_val |= CCM_PLL5_CTRL_N(CCM_PLL5_CTRL_N_X(16)); reg_val |= CCM_PLL5_CTRL_P(2); } else if (clk >= 496 && clk < 504) { /* dram = 496MHz, pll5p = 372MHz */ + pll5p_clk = 372; reg_val |= CCM_PLL5_CTRL_M(CCM_PLL5_CTRL_M_X(3)); reg_val |= CCM_PLL5_CTRL_K(CCM_PLL5_CTRL_K_X(2)); reg_val |= CCM_PLL5_CTRL_N(CCM_PLL5_CTRL_N_X(31)); reg_val |= CCM_PLL5_CTRL_P(2); } else if (clk >= 468 && clk < 480) { /* dram = 468MHz, pll5p = 468MHz */ + pll5p_clk = 468; reg_val |= CCM_PLL5_CTRL_M(CCM_PLL5_CTRL_M_X(2)); reg_val |= CCM_PLL5_CTRL_K(CCM_PLL5_CTRL_K_X(3)); reg_val |= CCM_PLL5_CTRL_N(CCM_PLL5_CTRL_N_X(13)); reg_val |= CCM_PLL5_CTRL_P(1); } else if (clk >= 396 && clk < 408) { /* dram = 396MHz, pll5p = 396MHz */ + pll5p_clk = 396; reg_val |= CCM_PLL5_CTRL_M(CCM_PLL5_CTRL_M_X(2)); reg_val |= CCM_PLL5_CTRL_K(CCM_PLL5_CTRL_K_X(3)); reg_val |= CCM_PLL5_CTRL_N(CCM_PLL5_CTRL_N_X(11)); @@ -242,10 +255,26 @@ static void mctl_setup_dram_clock(u32 clk) #endif
/* setup MBUS clock */ - reg_val = CCM_MBUS_CTRL_GATE | - CCM_MBUS_CTRL_CLK_SRC(CCM_MBUS_CTRL_CLK_SRC_PLL6) | - CCM_MBUS_CTRL_N(CCM_MBUS_CTRL_N_X(2)) | - CCM_MBUS_CTRL_M(CCM_MBUS_CTRL_M_X(2)); + if (!mbus_clk) + mbus_clk = 300; + pll6x_div = (pll6x_clk + mbus_clk - 1) / mbus_clk; + pll5p_div = (pll5p_clk + mbus_clk - 1) / mbus_clk; + + if (pll6x_div <= 16 && pll6x_clk / pll6x_div > pll5p_clk / pll5p_div) { + /* use PLL6 as the MBUS clock source */ + reg_val = CCM_MBUS_CTRL_GATE | + CCM_MBUS_CTRL_CLK_SRC(CCM_MBUS_CTRL_CLK_SRC_PLL6) | + CCM_MBUS_CTRL_N(CCM_MBUS_CTRL_N_X(1)) | + CCM_MBUS_CTRL_M(CCM_MBUS_CTRL_M_X(pll6x_div)); + } else if (pll5p_div <= 16) { + /* use PLL5P as the MBUS clock source */ + reg_val = CCM_MBUS_CTRL_GATE | + CCM_MBUS_CTRL_CLK_SRC(CCM_MBUS_CTRL_CLK_SRC_PLL5) | + CCM_MBUS_CTRL_N(CCM_MBUS_CTRL_N_X(1)) | + CCM_MBUS_CTRL_M(CCM_MBUS_CTRL_M_X(pll5p_div)); + } else { + panic("Bad mbus_clk\n"); + } writel(reg_val, &ccm->mbus_clk_cfg);
/* @@ -423,7 +452,7 @@ unsigned long dramc_init(struct dram_para *para) return 0;
/* setup DRAM relative clock */ - mctl_setup_dram_clock(para->clock); + mctl_setup_dram_clock(para->clock, para->mbus_clock);
/* Disable any pad power save control */ mctl_disable_power_save(); diff --git a/arch/arm/include/asm/arch-sunxi/dram.h b/arch/arm/include/asm/arch-sunxi/dram.h index 4433eeb..3c29256 100644 --- a/arch/arm/include/asm/arch-sunxi/dram.h +++ b/arch/arm/include/asm/arch-sunxi/dram.h @@ -69,6 +69,7 @@ struct sunxi_dram_reg {
struct dram_para { u32 clock; + u32 mbus_clock; u32 type; u32 rank_num; u32 density;

On Fri, 2014-07-18 at 19:22 +0300, Siarhei Siamashka wrote:
The sun5i hardware (Allwinner A13) introduced configurable MBUS clock speed. Allwinner A13 uses only 16-bit data bus width to connect the external DRAM, which is halved compared to the 32-bit data bus of sun4i (Allwinner A10), so it does not make much sense to clock a wider internal bus at a very high speed. The Allwinner A13 manual specifies 300 MHz MBUS clock speed limit and 533 MHz DRAM clock speed limit. Newer sun7i hardware (Allwinner A20) has a full width 32-bit external memory interface again, but still keeps the MBUS clock speed configurable. Clocking MBUS too low inhibits memory performance and one has to find the optimal MBUS/DRAM clock speed ratio, which may depend on many factors.
This patch introduces a new 'mbus_clock' parameter for the 'dram_para' struct and uses it as a desired MBUS clock speed target. If 'mbus_clock' is not set, 300 MHz is used by default to match the older hardcoded settings.
Nothing in this series seems to set it for any board -- is that expected?
- if (pll6x_div <= 16 && pll6x_clk / pll6x_div > pll5p_clk / pll5p_div) {
Some brackets or perhaps some temporaries ({pll5p,pll6x}_rate ?) might help clarity/readability here.
When pll6 is viable you prefer the faster clock, even if it might happen to be further from the requested clock, is that right? Or does all the arithmetic end up with that never being the case?
Ian.

On Mon, 21 Jul 2014 20:31:30 +0100 Ian Campbell ijc@hellion.org.uk wrote:
On Fri, 2014-07-18 at 19:22 +0300, Siarhei Siamashka wrote:
The sun5i hardware (Allwinner A13) introduced configurable MBUS clock speed. Allwinner A13 uses only 16-bit data bus width to connect the external DRAM, which is halved compared to the 32-bit data bus of sun4i (Allwinner A10), so it does not make much sense to clock a wider internal bus at a very high speed. The Allwinner A13 manual specifies 300 MHz MBUS clock speed limit and 533 MHz DRAM clock speed limit. Newer sun7i hardware (Allwinner A20) has a full width 32-bit external memory interface again, but still keeps the MBUS clock speed configurable. Clocking MBUS too low inhibits memory performance and one has to find the optimal MBUS/DRAM clock speed ratio, which may depend on many factors.
This patch introduces a new 'mbus_clock' parameter for the 'dram_para' struct and uses it as a desired MBUS clock speed target. If 'mbus_clock' is not set, 300 MHz is used by default to match the older hardcoded settings.
Nothing in this series seems to set it for any board -- is that expected?
Yes. Not touching the board config files avoids any extra dependencies and merging conflicts. I could explicitly add ".mbus_clock = 300" to the Cubietruck 'dram_para' struct, but this is the default MBUS value anyway and makes no real difference.
If we wanted to set something other than 300MHz, there are too many possible options to select from: http://linux-sunxi.org/A10_DRAM_Controller_Performance
- if (pll6x_div <= 16 && pll6x_clk / pll6x_div > pll5p_clk / pll5p_div) {
Some brackets or perhaps some temporaries ({pll5p,pll6x}_rate ?) might help clarity/readability here.
With the brackets we would exceed the 80 characters line limit. This leaves us with the only choice. I'll add the temporaries.
When pll6 is viable you prefer the faster clock, even if it might happen to be further from the requested clock, is that right? Or does all the arithmetic end up with that never being the case?
The 'pll5p_rate' and 'pll6x_rate' values are always equal to or less than the requested 'mbus_clock'. Selecting the larger of these two will make it closer to the requested clock.

On Fri, 2014-07-25 at 07:00 +0300, Siarhei Siamashka wrote:
On Mon, 21 Jul 2014 20:31:30 +0100 Ian Campbell ijc@hellion.org.uk wrote:
On Fri, 2014-07-18 at 19:22 +0300, Siarhei Siamashka wrote:
The sun5i hardware (Allwinner A13) introduced configurable MBUS clock speed. Allwinner A13 uses only 16-bit data bus width to connect the external DRAM, which is halved compared to the 32-bit data bus of sun4i (Allwinner A10), so it does not make much sense to clock a wider internal bus at a very high speed. The Allwinner A13 manual specifies 300 MHz MBUS clock speed limit and 533 MHz DRAM clock speed limit. Newer sun7i hardware (Allwinner A20) has a full width 32-bit external memory interface again, but still keeps the MBUS clock speed configurable. Clocking MBUS too low inhibits memory performance and one has to find the optimal MBUS/DRAM clock speed ratio, which may depend on many factors.
This patch introduces a new 'mbus_clock' parameter for the 'dram_para' struct and uses it as a desired MBUS clock speed target. If 'mbus_clock' is not set, 300 MHz is used by default to match the older hardcoded settings.
Nothing in this series seems to set it for any board -- is that expected?
Yes. Not touching the board config files avoids any extra dependencies and merging conflicts.
Makes sense.
I could explicitly add ".mbus_clock = 300" to the Cubietruck 'dram_para' struct, but this is the default MBUS value anyway and makes no real difference.
Sure, no need unless you feel it.
When pll6 is viable you prefer the faster clock, even if it might happen to be further from the requested clock, is that right? Or does all the arithmetic end up with that never being the case?
The 'pll5p_rate' and 'pll6x_rate' values are always equal to or less than the requested 'mbus_clock'. Selecting the larger of these two will make it closer to the requested clock.
Makes sense. Can you write that in a comment please.
Ian.

This configures the PLL5P clock frequency to something in the ballpark of 1GHz and allows more choices for MBUS and G2D clock frequency selection (using their own divisors). In particular, it enables the use of 2/3 clock speed ratio between MBUS and DRAM.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com --- arch/arm/cpu/armv7/sunxi/dram.c | 28 +++++++++++----------------- 1 file changed, 11 insertions(+), 17 deletions(-)
diff --git a/arch/arm/cpu/armv7/sunxi/dram.c b/arch/arm/cpu/armv7/sunxi/dram.c index d41fb1e..f756f23 100644 --- a/arch/arm/cpu/armv7/sunxi/dram.c +++ b/arch/arm/cpu/armv7/sunxi/dram.c @@ -186,7 +186,7 @@ static void mctl_setup_dram_clock(u32 clk, u32 mbus_clk) /* PLL5P and PLL6 are the potential clock sources for MBUS */ u32 pll6x_div, pll5p_div; u32 pll6x_clk = clock_get_pll6() / 1000000; - u32 pll5p_clk = clk / 24 * 24; + u32 pll5p_clk = clk / 24 * 48; #ifdef CONFIG_SUN7I pll6x_clk *= 2; /* sun7i uses PLL6*2, sun5i uses just PLL6 */ #endif @@ -198,46 +198,40 @@ static void mctl_setup_dram_clock(u32 clk, u32 mbus_clk) reg_val &= ~CCM_PLL5_CTRL_N_MASK; /* set N to 0 (x0) */ reg_val &= ~CCM_PLL5_CTRL_P_MASK; /* set P to 0 (x1) */ if (clk >= 540 && clk < 552) { - /* dram = 540MHz, pll5p = 540MHz */ - pll5p_clk = 540; + /* dram = 540MHz, pll5p = 1080MHz */ + pll5p_clk = 1080; reg_val |= CCM_PLL5_CTRL_M(CCM_PLL5_CTRL_M_X(2)); reg_val |= CCM_PLL5_CTRL_K(CCM_PLL5_CTRL_K_X(3)); reg_val |= CCM_PLL5_CTRL_N(CCM_PLL5_CTRL_N_X(15)); - reg_val |= CCM_PLL5_CTRL_P(1); } else if (clk >= 512 && clk < 528) { - /* dram = 512MHz, pll5p = 384MHz */ - pll5p_clk = 384; + /* dram = 512MHz, pll5p = 1536MHz */ + pll5p_clk = 1536; reg_val |= CCM_PLL5_CTRL_M(CCM_PLL5_CTRL_M_X(3)); reg_val |= CCM_PLL5_CTRL_K(CCM_PLL5_CTRL_K_X(4)); reg_val |= CCM_PLL5_CTRL_N(CCM_PLL5_CTRL_N_X(16)); - reg_val |= CCM_PLL5_CTRL_P(2); } else if (clk >= 496 && clk < 504) { - /* dram = 496MHz, pll5p = 372MHz */ - pll5p_clk = 372; + /* dram = 496MHz, pll5p = 1488MHz */ + pll5p_clk = 1488; reg_val |= CCM_PLL5_CTRL_M(CCM_PLL5_CTRL_M_X(3)); reg_val |= CCM_PLL5_CTRL_K(CCM_PLL5_CTRL_K_X(2)); reg_val |= CCM_PLL5_CTRL_N(CCM_PLL5_CTRL_N_X(31)); - reg_val |= CCM_PLL5_CTRL_P(2); } else if (clk >= 468 && clk < 480) { - /* dram = 468MHz, pll5p = 468MHz */ - pll5p_clk = 468; + /* dram = 468MHz, pll5p = 936MHz */ + pll5p_clk = 936; reg_val |= CCM_PLL5_CTRL_M(CCM_PLL5_CTRL_M_X(2)); reg_val |= CCM_PLL5_CTRL_K(CCM_PLL5_CTRL_K_X(3)); reg_val |= CCM_PLL5_CTRL_N(CCM_PLL5_CTRL_N_X(13)); - reg_val |= CCM_PLL5_CTRL_P(1); } else if (clk >= 396 && clk < 408) { - /* dram = 396MHz, pll5p = 396MHz */ - pll5p_clk = 396; + /* dram = 396MHz, pll5p = 792MHz */ + pll5p_clk = 792; reg_val |= CCM_PLL5_CTRL_M(CCM_PLL5_CTRL_M_X(2)); reg_val |= CCM_PLL5_CTRL_K(CCM_PLL5_CTRL_K_X(3)); reg_val |= CCM_PLL5_CTRL_N(CCM_PLL5_CTRL_N_X(11)); - reg_val |= CCM_PLL5_CTRL_P(1); } else { /* any other frequency that is a multiple of 24 */ reg_val |= CCM_PLL5_CTRL_M(CCM_PLL5_CTRL_M_X(2)); reg_val |= CCM_PLL5_CTRL_K(CCM_PLL5_CTRL_K_X(2)); reg_val |= CCM_PLL5_CTRL_N(CCM_PLL5_CTRL_N_X(clk / 24)); - reg_val |= CCM_PLL5_CTRL_P(CCM_PLL5_CTRL_P_X(2)); } reg_val &= ~CCM_PLL5_CTRL_VCO_GAIN; /* PLL VCO Gain off */ reg_val |= CCM_PLL5_CTRL_EN; /* PLL On */

On Fri, 2014-07-18 at 19:22 +0300, Siarhei Siamashka wrote:
This configures the PLL5P clock frequency to something in the ballpark of 1GHz and allows more choices for MBUS and G2D clock frequency selection (using their own divisors). In particular, it enables the use of 2/3 clock speed ratio between MBUS and DRAM.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com
Acked-by: Ian Campbell ijc@hellion.org.uk

The stale error status should be cleared for all sun4i/sun5i/sun7i hardware and not just for sun7i. Also there are two types of DQS gate training errors ("found no result" and "found more than one possible result"). Both are handled now.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com --- arch/arm/cpu/armv7/sunxi/dram.c | 2 -- arch/arm/include/asm/arch-sunxi/dram.h | 4 +++- 2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/arm/cpu/armv7/sunxi/dram.c b/arch/arm/cpu/armv7/sunxi/dram.c index f756f23..18a5c3b 100644 --- a/arch/arm/cpu/armv7/sunxi/dram.c +++ b/arch/arm/cpu/armv7/sunxi/dram.c @@ -289,9 +289,7 @@ static int dramc_scan_readpipe(void) u32 reg_val;
/* data training trigger */ -#ifdef CONFIG_SUN7I clrbits_le32(&dram->csr, DRAM_CSR_FAILED); -#endif setbits_le32(&dram->ccr, DRAM_CCR_DATA_TRAINING);
/* check whether data training process has completed */ diff --git a/arch/arm/include/asm/arch-sunxi/dram.h b/arch/arm/include/asm/arch-sunxi/dram.h index 3c29256..11e3507 100644 --- a/arch/arm/include/asm/arch-sunxi/dram.h +++ b/arch/arm/include/asm/arch-sunxi/dram.h @@ -133,7 +133,9 @@ struct dram_para { #define DRAM_DCR_MODE_SEQ 0x0 #define DRAM_DCR_MODE_INTERLEAVE 0x1
-#define DRAM_CSR_FAILED (0x1 << 20) +#define DRAM_CSR_DTERR (0x1 << 20) +#define DRAM_CSR_DTIERR (0x1 << 21) +#define DRAM_CSR_FAILED (DRAM_CSR_DTERR | DRAM_CSR_DTIERR)
#define DRAM_DRR_TRFC(n) ((n) & 0xff) #define DRAM_DRR_TREFI(n) (((n) & 0xffff) << 8)

On Fri, 2014-07-18 at 19:22 +0300, Siarhei Siamashka wrote:
The stale error status should be cleared for all sun4i/sun5i/sun7i hardware and not just for sun7i. Also there are two types of DQS gate training errors ("found no result" and "found more than one possible result"). Both are handled now.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com
Acked-by: Ian Campbell ijc@hellion.org.uk

It is going to be useful in more than one place.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com --- arch/arm/cpu/armv7/sunxi/dram.c | 30 +++++++++++++++++++----------- 1 file changed, 19 insertions(+), 11 deletions(-)
diff --git a/arch/arm/cpu/armv7/sunxi/dram.c b/arch/arm/cpu/armv7/sunxi/dram.c index 18a5c3b..49d1770 100644 --- a/arch/arm/cpu/armv7/sunxi/dram.c +++ b/arch/arm/cpu/armv7/sunxi/dram.c @@ -115,23 +115,31 @@ static void mctl_enable_dll0(u32 phase) udelay(22); }
+/* Get the number of DDR byte lanes */ +static u32 mctl_get_number_of_lanes(void) +{ + struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE; + switch (readl(&dram->dcr) & DRAM_DCR_BUS_WIDTH_MASK) { + case DRAM_DCR_BUS_WIDTH(DRAM_DCR_BUS_WIDTH_32BIT): + return 4; + case DRAM_DCR_BUS_WIDTH(DRAM_DCR_BUS_WIDTH_16BIT): + return 2; + default: + return 1; + } +} + /* * Note: This differs from pm/standby in that it checks the bus width */ static void mctl_enable_dllx(u32 phase) { struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE; - u32 i, n, bus_width; - - bus_width = readl(&dram->dcr); + u32 i, number_of_lanes;
- if ((bus_width & DRAM_DCR_BUS_WIDTH_MASK) == - DRAM_DCR_BUS_WIDTH(DRAM_DCR_BUS_WIDTH_32BIT)) - n = DRAM_DCR_NR_DLLCR_32BIT; - else - n = DRAM_DCR_NR_DLLCR_16BIT; + number_of_lanes = mctl_get_number_of_lanes();
- for (i = 1; i < n; i++) { + for (i = 1; i <= number_of_lanes; i++) { clrsetbits_le32(&dram->dllcr[i], 0xf << 14, (phase & 0xf) << 14); clrsetbits_le32(&dram->dllcr[i], DRAM_DLLCR_NRESET, @@ -140,12 +148,12 @@ static void mctl_enable_dllx(u32 phase) } udelay(2);
- for (i = 1; i < n; i++) + for (i = 1; i <= number_of_lanes; i++) clrbits_le32(&dram->dllcr[i], DRAM_DLLCR_NRESET | DRAM_DLLCR_DISABLE); udelay(22);
- for (i = 1; i < n; i++) + for (i = 1; i <= number_of_lanes; i++) clrsetbits_le32(&dram->dllcr[i], DRAM_DLLCR_DISABLE, DRAM_DLLCR_NRESET); udelay(22);

On Fri, 2014-07-18 at 19:23 +0300, Siarhei Siamashka wrote:
It is going to be useful in more than one place.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com
arch/arm/cpu/armv7/sunxi/dram.c | 30 +++++++++++++++++++----------- 1 file changed, 19 insertions(+), 11 deletions(-)
diff --git a/arch/arm/cpu/armv7/sunxi/dram.c b/arch/arm/cpu/armv7/sunxi/dram.c index 18a5c3b..49d1770 100644 --- a/arch/arm/cpu/armv7/sunxi/dram.c +++ b/arch/arm/cpu/armv7/sunxi/dram.c @@ -115,23 +115,31 @@ static void mctl_enable_dll0(u32 phase) udelay(22); }
+/* Get the number of DDR byte lanes */ +static u32 mctl_get_number_of_lanes(void) +{
- struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE;
- switch (readl(&dram->dcr) & DRAM_DCR_BUS_WIDTH_MASK) {
- case DRAM_DCR_BUS_WIDTH(DRAM_DCR_BUS_WIDTH_32BIT):
return 4;
- case DRAM_DCR_BUS_WIDTH(DRAM_DCR_BUS_WIDTH_16BIT):
return 2;
- default:
return 1;
- }
+}
/*
- Note: This differs from pm/standby in that it checks the bus width
*/ static void mctl_enable_dllx(u32 phase) { struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE;
- u32 i, n, bus_width;
- bus_width = readl(&dram->dcr);
- u32 i, number_of_lanes;
- if ((bus_width & DRAM_DCR_BUS_WIDTH_MASK) ==
DRAM_DCR_BUS_WIDTH(DRAM_DCR_BUS_WIDTH_32BIT))
n = DRAM_DCR_NR_DLLCR_32BIT;
- else
n = DRAM_DCR_NR_DLLCR_16BIT;
Either DRAM_DCR_NR_DLLCR_??BIT are obsolete now and should be removed or they should be be adjusted and used in the new function.
ISTM they don't add much so removing would be fine by me.
- number_of_lanes = mctl_get_number_of_lanes();
There is a subtle functional change here since number_of_lanes can be 1 whereas n could never have been 2. Is that intended/ok? Please mention in the commit message.
Ian.

On Mon, 21 Jul 2014 20:41:33 +0100 Ian Campbell ijc@hellion.org.uk wrote:
On Fri, 2014-07-18 at 19:23 +0300, Siarhei Siamashka wrote:
It is going to be useful in more than one place.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com
arch/arm/cpu/armv7/sunxi/dram.c | 30 +++++++++++++++++++----------- 1 file changed, 19 insertions(+), 11 deletions(-)
diff --git a/arch/arm/cpu/armv7/sunxi/dram.c b/arch/arm/cpu/armv7/sunxi/dram.c index 18a5c3b..49d1770 100644 --- a/arch/arm/cpu/armv7/sunxi/dram.c +++ b/arch/arm/cpu/armv7/sunxi/dram.c @@ -115,23 +115,31 @@ static void mctl_enable_dll0(u32 phase) udelay(22); }
+/* Get the number of DDR byte lanes */ +static u32 mctl_get_number_of_lanes(void) +{
- struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE;
- switch (readl(&dram->dcr) & DRAM_DCR_BUS_WIDTH_MASK) {
- case DRAM_DCR_BUS_WIDTH(DRAM_DCR_BUS_WIDTH_32BIT):
return 4;
- case DRAM_DCR_BUS_WIDTH(DRAM_DCR_BUS_WIDTH_16BIT):
return 2;
- default:
return 1;
- }
+}
/*
- Note: This differs from pm/standby in that it checks the bus width
*/ static void mctl_enable_dllx(u32 phase) { struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE;
- u32 i, n, bus_width;
- bus_width = readl(&dram->dcr);
- u32 i, number_of_lanes;
- if ((bus_width & DRAM_DCR_BUS_WIDTH_MASK) ==
DRAM_DCR_BUS_WIDTH(DRAM_DCR_BUS_WIDTH_32BIT))
n = DRAM_DCR_NR_DLLCR_32BIT;
- else
n = DRAM_DCR_NR_DLLCR_16BIT;
Either DRAM_DCR_NR_DLLCR_??BIT are obsolete now and should be removed or they should be be adjusted and used in the new function.
ISTM they don't add much so removing would be fine by me.
Agreed, I also think that they are better to be dropped.
- number_of_lanes = mctl_get_number_of_lanes();
There is a subtle functional change here since number_of_lanes can be 1 whereas n could never have been 2. Is that intended/ok? Please mention in the commit message.
I tried to experiment with setting the 8-bit bus width and it is semi-workable (single byte access is OK, but accessing more than one byte is broken). This part of the patch looks like a forgotten leftover of these experiments. But it clearly has no practical value and we only normally deal with the 16-bit or 32-bit bus width.
The most correct way of handling this unexpected code branch would be to panic. But that's an unnecessarily increase of the code size. So I think that the best solution is just to keep the old code logic (expect only 16-bit or 32-bit bus width and 2 or 4 lanes).

On Fri, 2014-07-25 at 07:26 +0300, Siarhei Siamashka wrote:
- number_of_lanes = mctl_get_number_of_lanes();
There is a subtle functional change here since number_of_lanes can be 1 whereas n could never have been 2. Is that intended/ok? Please mention in the commit message.
I tried to experiment with setting the 8-bit bus width and it is semi-workable (single byte access is OK, but accessing more than one byte is broken). This part of the patch looks like a forgotten leftover of these experiments. But it clearly has no practical value and we only normally deal with the 16-bit or 32-bit bus width.
The most correct way of handling this unexpected code branch would be to panic. But that's an unnecessarily increase of the code size. So I think that the best solution is just to keep the old code logic (expect only 16-bit or 32-bit bus width and 2 or 4 lanes).
Not sure how much a panic actually adds to the code size or if it is worth worrying about, but removing the 8-bit stuff is fine by me anyway.
Ian.

The hardware DQS gate training is a bit unreliable and does not always find the best delay settings.
So we introduce a 32-bit 'dqs_gating_delay' variable, where each byte encodes the DQS gating delay for each byte lane. The delay granularity is 1/4 cycle.
Also we allow to enable the active DQS gating window mode, which works better than the passive mode in practice. The DDR3 spec says that there is a 0.9 cycles preamble and 0.3 cycle postamble. The DQS window has to be opened during preamble and closed during postamble. In the passive window mode, the gating window is opened and closed by just using the gating delay settings. And because of the 1/4 cycle delay granularity, accurately hitting the 0.3 cycle long postamble is a bit tough. In the active window mode, the gating window is auto-closing with the help of monitoring the DQS line, which relaxes the gating delay accuracy requirements.
But the hardware DQS gate training is still performed in the passive window mode. It is a more strict test, which is reducing the results variance compared to the training with active window mode.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com --- arch/arm/cpu/armv7/sunxi/dram.c | 55 +++++++++++++++++++++++++++++++++- arch/arm/include/asm/arch-sunxi/dram.h | 2 ++ 2 files changed, 56 insertions(+), 1 deletion(-)
diff --git a/arch/arm/cpu/armv7/sunxi/dram.c b/arch/arm/cpu/armv7/sunxi/dram.c index 49d1770..2e994db 100644 --- a/arch/arm/cpu/armv7/sunxi/dram.c +++ b/arch/arm/cpu/armv7/sunxi/dram.c @@ -99,6 +99,14 @@ static void mctl_itm_enable(void) clrbits_le32(&dram->ccr, DRAM_CCR_ITM_OFF); }
+static void mctl_itm_reset(void) +{ + mctl_itm_disable(); + udelay(1); /* ITM reset needs a bit of delay */ + mctl_itm_enable(); + udelay(1); +} + static void mctl_enable_dll0(u32 phase) { struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE; @@ -291,6 +299,37 @@ static void mctl_setup_dram_clock(u32 clk, u32 mbus_clk) udelay(22); }
+/* + * The data from rslrX and rdgrX registers (X=rank) is stored + * in a single 32-bit value using the following format: + * bits [31:26] - DQS gating system latency for byte lane 3 + * bits [25:24] - DQS gating phase select for byte lane 3 + * bits [23:18] - DQS gating system latency for byte lane 2 + * bits [17:16] - DQS gating phase select for byte lane 2 + * bits [15:10] - DQS gating system latency for byte lane 1 + * bits [ 9:8 ] - DQS gating phase select for byte lane 1 + * bits [ 7:2 ] - DQS gating system latency for byte lane 0 + * bits [ 1:0 ] - DQS gating phase select for byte lane 0 + */ +static void mctl_set_dqs_gating_delay(int rank, u32 dqs_gating_delay) +{ + struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE; + u32 lane, number_of_lanes = mctl_get_number_of_lanes(); + /* rank0 gating system latency (3 bits per lane: cycles) */ + u32 slr = readl(rank == 0 ? &dram->rslr0 : &dram->rslr1); + /* rank0 gating phase select (2 bits per lane: 90, 180, 270, 360) */ + u32 dgr = readl(rank == 0 ? &dram->rdgr0 : &dram->rdgr1); + for (lane = 0; lane < number_of_lanes; lane++) { + u32 tmp = dqs_gating_delay >> (lane * 8); + slr &= ~(7 << (lane * 3)); + slr |= ((tmp >> 2) & 7) << (lane * 3); + dgr &= ~(3 << (lane * 2)); + dgr |= (tmp & 3) << (lane * 2); + } + writel(slr, rank == 0 ? &dram->rslr0 : &dram->rslr1); + writel(dgr, rank == 0 ? &dram->rdgr0 : &dram->rdgr1); +} + static int dramc_scan_readpipe(void) { struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE; @@ -534,7 +573,7 @@ unsigned long dramc_init(struct dram_para *para) writel(para->emr2, &dram->emr2); writel(para->emr3, &dram->emr3);
- /* set DQS window mode */ + /* disable drift compensation and set passive DQS window mode */ clrsetbits_le32(&dram->ccr, DRAM_CCR_DQS_DRIFT_COMP, DRAM_CCR_DQS_GATE);
#ifdef CONFIG_SUN7I @@ -547,11 +586,25 @@ unsigned long dramc_init(struct dram_para *para)
/* scan read pipe value */ mctl_itm_enable(); + + /* Hardware DQS gate training */ ret_val = dramc_scan_readpipe();
if (ret_val < 0) return 0;
+ /* allow to override the DQS training results with a custom delay */ + if (para->dqs_gating_delay) + mctl_set_dqs_gating_delay(0, para->dqs_gating_delay); + + /* set the DQS gating window type */ + if (para->active_windowing) + clrbits_le32(&dram->ccr, DRAM_CCR_DQS_GATE); + else + setbits_le32(&dram->ccr, DRAM_CCR_DQS_GATE); + + mctl_itm_reset(); + /* configure all host port */ mctl_configure_hostport();
diff --git a/arch/arm/include/asm/arch-sunxi/dram.h b/arch/arm/include/asm/arch-sunxi/dram.h index 11e3507..3b21225 100644 --- a/arch/arm/include/asm/arch-sunxi/dram.h +++ b/arch/arm/include/asm/arch-sunxi/dram.h @@ -88,6 +88,8 @@ struct dram_para { u32 emr1; u32 emr2; u32 emr3; + u32 dqs_gating_delay; + u32 active_windowing; };
#define DRAM_CCR_COMMAND_RATE_1T (0x1 << 5)

Add the necessary missing bits from the legacy u-boot-sunxi for the Allwinner A10 and A13 support (originally authored by Henrik Nordstrom, Stefan Roese, Oliver Schinagl and Hans de Goede).
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com --- arch/arm/cpu/armv7/sunxi/dram.c | 55 ++++++++++++++++++++++++++++++++++++++--- 1 file changed, 52 insertions(+), 3 deletions(-)
diff --git a/arch/arm/cpu/armv7/sunxi/dram.c b/arch/arm/cpu/armv7/sunxi/dram.c index 2e994db..8f35e1b 100644 --- a/arch/arm/cpu/armv7/sunxi/dram.c +++ b/arch/arm/cpu/armv7/sunxi/dram.c @@ -71,9 +71,26 @@ static void mctl_ddr3_reset(void) struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE;
- clrbits_le32(&dram->mcr, DRAM_MCR_RESET); - udelay(200); - setbits_le32(&dram->mcr, DRAM_MCR_RESET); +#ifdef CONFIG_SUN4I + struct sunxi_timer_reg *timer = + (struct sunxi_timer_reg *)SUNXI_TIMER_BASE; + u32 reg_val; + + writel(0, &timer->cpu_cfg); + reg_val = readl(&timer->cpu_cfg); + + if ((reg_val & CPU_CFG_CHIP_VER_MASK) != + CPU_CFG_CHIP_VER(CPU_CFG_CHIP_REV_A)) { + setbits_le32(&dram->mcr, DRAM_MCR_RESET); + udelay(200); + clrbits_le32(&dram->mcr, DRAM_MCR_RESET); + } else +#endif + { + clrbits_le32(&dram->mcr, DRAM_MCR_RESET); + udelay(200); + setbits_le32(&dram->mcr, DRAM_MCR_RESET); + } }
static void mctl_set_drive(void) @@ -168,6 +185,26 @@ static void mctl_enable_dllx(u32 phase) }
static u32 hpcr_value[32] = { +#ifdef CONFIG_SUN5I + 0, 0, 0, 0, + 0, 0, 0, 0, + 0, 0, 0, 0, + 0, 0, 0, 0, + 0x1031, 0x1031, 0x0735, 0x1035, + 0x1035, 0x0731, 0x1031, 0, + 0x0301, 0x0301, 0x0301, 0x0301, + 0x0301, 0x0301, 0x0301, 0 +#endif +#ifdef CONFIG_SUN4I + 0x0301, 0x0301, 0x0301, 0x0301, + 0x0301, 0x0301, 0, 0, + 0, 0, 0, 0, + 0, 0, 0, 0, + 0x1031, 0x1031, 0x0735, 0x5031, + 0x1035, 0x0731, 0x1031, 0x0735, + 0x1035, 0x1031, 0x0731, 0x1035, + 0x1031, 0x0301, 0x0301, 0x0731 +#endif #ifdef CONFIG_SUN7I 0x0301, 0x0301, 0x0301, 0x0301, 0x0301, 0x0301, 0x0301, 0x0301, @@ -360,6 +397,13 @@ static void dramc_clock_output_en(u32 on) else clrbits_le32(&dram->mcr, DRAM_MCR_DCLK_OUT); #endif +#ifdef CONFIG_SUN4I + struct sunxi_ccm_reg *ccm = (struct sunxi_ccm_reg *)SUNXI_CCM_BASE; + if (on) + setbits_le32(&ccm->dram_clk_cfg, CCM_DRAM_CTRL_DCLK_OUT); + else + clrbits_le32(&ccm->dram_clk_cfg, CCM_DRAM_CTRL_DCLK_OUT); +#endif }
static const u16 tRFC_table[2][6] = { @@ -502,6 +546,11 @@ unsigned long dramc_init(struct dram_para *para) /* dram clock off */ dramc_clock_output_en(0);
+#ifdef CONFIG_SUN4I + /* select dram controller 1 */ + writel(DRAM_CSEL_MAGIC, &dram->csel); +#endif + mctl_itm_disable(); mctl_enable_dll0(para->tpr3);

On Fri, 2014-07-18 at 19:23 +0300, Siarhei Siamashka wrote:
Add the necessary missing bits from the legacy u-boot-sunxi for the Allwinner A10 and A13 support (originally authored by Henrik Nordstrom, Stefan Roese, Oliver Schinagl and Hans de Goede).
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com
This is already in u-boot-sunxi.git#master. Please rebase.

All the known Allwinner A10/A13/A20 devices are using just single rank DDR3 memory. So don't pretend that we support DDR2 or more than one rank, because nobody could ever test these configurations for real and they are likely broken. Support for these features can be added back in the case if such hardware actually exists.
As part of this code cleanup, also replace division by 1024 with division by 1000 for the refresh timing calculations. This allows to use the original non-skewed tRFC timing table from the DRR3 spec and make code less confusing.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com --- arch/arm/cpu/armv7/sunxi/dram.c | 41 +++++++++++++++++++---------------------- 1 file changed, 19 insertions(+), 22 deletions(-)
diff --git a/arch/arm/cpu/armv7/sunxi/dram.c b/arch/arm/cpu/armv7/sunxi/dram.c index 8f35e1b..2db4f5a 100644 --- a/arch/arm/cpu/armv7/sunxi/dram.c +++ b/arch/arm/cpu/armv7/sunxi/dram.c @@ -406,20 +406,18 @@ static void dramc_clock_output_en(u32 on) #endif }
-static const u16 tRFC_table[2][6] = { - /* 256Mb 512Mb 1Gb 2Gb 4Gb 8Gb */ - /* DDR2 75ns 105ns 127.5ns 195ns 327.5ns invalid */ - { 77, 108, 131, 200, 336, 336 }, - /* DDR3 invalid 90ns 110ns 160ns 300ns 350ns */ - { 93, 93, 113, 164, 308, 359 } +/* tRFC in nanoseconds for different densities (from the DDR3 spec) */ +static const u16 tRFC_DDR3_table[6] = { + /* 256Mb 512Mb 1Gb 2Gb 4Gb 8Gb */ + 90, 90, 110, 160, 300, 350 };
-static void dramc_set_autorefresh_cycle(u32 clk, u32 type, u32 density) +static void dramc_set_autorefresh_cycle(u32 clk, u32 density) { struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE; u32 tRFC, tREFI;
- tRFC = (tRFC_table[type][density] * clk + 1023) >> 10; + tRFC = (tRFC_DDR3_table[density] * clk + 999) / 1000; tREFI = (7987 * clk) >> 10; /* <= 7.8us */
writel(DRAM_DRR_TREFI(tREFI) | DRAM_DRR_TRFC(tRFC), &dram->drr); @@ -534,6 +532,13 @@ unsigned long dramc_init(struct dram_para *para) if (!para) return 0;
+ /* + * only single rank DDR3 is supported by this code even though the + * hardware can theoretically support DDR2 and up to two ranks + */ + if (para->type != DRAM_MEMORY_TYPE_DDR3 || para->rank_num != 1) + return 0; + /* setup DRAM relative clock */ mctl_setup_dram_clock(para->clock, para->mbus_clock);
@@ -555,9 +560,7 @@ unsigned long dramc_init(struct dram_para *para) mctl_enable_dll0(para->tpr3);
/* configure external DRAM */ - reg_val = 0x0; - if (para->type == DRAM_MEMORY_TYPE_DDR3) - reg_val |= DRAM_DCR_TYPE_DDR3; + reg_val = DRAM_DCR_TYPE_DDR3; reg_val |= DRAM_DCR_IO_WIDTH(para->io_width >> 3);
if (para->density == 256) @@ -597,25 +600,19 @@ unsigned long dramc_init(struct dram_para *para) mctl_enable_dllx(para->tpr3);
/* set refresh period */ - dramc_set_autorefresh_cycle(para->clock, para->type - 2, density); + dramc_set_autorefresh_cycle(para->clock, density);
/* set timing parameters */ writel(para->tpr0, &dram->tpr0); writel(para->tpr1, &dram->tpr1); writel(para->tpr2, &dram->tpr2);
- if (para->type == DRAM_MEMORY_TYPE_DDR3) { - reg_val = DRAM_MR_BURST_LENGTH(0x0); + reg_val = DRAM_MR_BURST_LENGTH(0x0); #if (defined(CONFIG_SUN5I) || defined(CONFIG_SUN7I)) - reg_val |= DRAM_MR_POWER_DOWN; + reg_val |= DRAM_MR_POWER_DOWN; #endif - reg_val |= DRAM_MR_CAS_LAT(para->cas - 4); - reg_val |= DRAM_MR_WRITE_RECOVERY(0x5); - } else if (para->type == DRAM_MEMORY_TYPE_DDR2) { - reg_val = DRAM_MR_BURST_LENGTH(0x2); - reg_val |= DRAM_MR_CAS_LAT(para->cas); - reg_val |= DRAM_MR_WRITE_RECOVERY(0x5); - } + reg_val |= DRAM_MR_CAS_LAT(para->cas - 4); + reg_val |= DRAM_MR_WRITE_RECOVERY(0x5); writel(reg_val, &dram->mr);
writel(para->emr1, &dram->emr);

On Fri, 2014-07-18 at 19:23 +0300, Siarhei Siamashka wrote:
All the known Allwinner A10/A13/A20 devices are using just single rank DDR3 memory. So don't pretend that we support DDR2 or more than one rank, because nobody could ever test these configurations for real and they are likely broken. Support for these features can be added back in the case if such hardware actually exists.
- if (para->type != DRAM_MEMORY_TYPE_DDR3 || para->rank_num != 1)
return 0;
Can we not go further and remove these fields from the para struct too?
Ian.

On Mon, 21 Jul 2014 20:51:12 +0100 Ian Campbell ijc@hellion.org.uk wrote:
On Fri, 2014-07-18 at 19:23 +0300, Siarhei Siamashka wrote:
All the known Allwinner A10/A13/A20 devices are using just single rank DDR3 memory. So don't pretend that we support DDR2 or more than one rank, because nobody could ever test these configurations for real and they are likely broken. Support for these features can be added back in the case if such hardware actually exists.
- if (para->type != DRAM_MEMORY_TYPE_DDR3 || para->rank_num != 1)
return 0;
Can we not go further and remove these fields from the para struct too?
Right now the DRAM patchset keeps backwards compatibility with the old 'dram_para' settings. This is intended to ensure that it is a drop-in replacement for the old code and make the transition easier. The fields can be removed at any time later as a cosmetic fix.

The write recovery time is 15ns for all JEDEC DDR3 speed bins. And instead of hardcoding it to 10 cycles, it is possible to set tighter timings based on accurate calculations. For example, DRAM clock frequencies up to 533MHz need only 8 cycles for write recovery.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com --- arch/arm/cpu/armv7/sunxi/dram.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/arch/arm/cpu/armv7/sunxi/dram.c b/arch/arm/cpu/armv7/sunxi/dram.c index 2db4f5a..9f55799 100644 --- a/arch/arm/cpu/armv7/sunxi/dram.c +++ b/arch/arm/cpu/armv7/sunxi/dram.c @@ -423,6 +423,21 @@ static void dramc_set_autorefresh_cycle(u32 clk, u32 density) writel(DRAM_DRR_TREFI(tREFI) | DRAM_DRR_TRFC(tRFC), &dram->drr); }
+/* Calculate the value for A11, A10, A9 bits in MR0 (write recovery) */ +static u32 ddr3_write_recovery(u32 clk) +{ + u32 twr_ns = 15; /* DDR3 spec says that it is 15ns for all speed bins */ + u32 twr_ck = (twr_ns * clk + 999) / 1000; + if (twr_ck < 5) + return 1; + else if (twr_ck <= 8) + return twr_ck - 4; + else if (twr_ck <= 10) + return 5; + else + return 6; +} + /* * If the dram->ppwrsctl (SDR_DPCR) register has the lowest bit set to 1, this * means that DRAM is currently in self-refresh mode and retaining the old @@ -612,7 +627,7 @@ unsigned long dramc_init(struct dram_para *para) reg_val |= DRAM_MR_POWER_DOWN; #endif reg_val |= DRAM_MR_CAS_LAT(para->cas - 4); - reg_val |= DRAM_MR_WRITE_RECOVERY(0x5); + reg_val |= DRAM_MR_WRITE_RECOVERY(ddr3_write_recovery(para->clock)); writel(reg_val, &dram->mr);
writel(para->emr1, &dram->emr);

On Fri, 2014-07-18 at 19:23 +0300, Siarhei Siamashka wrote:
The write recovery time is 15ns for all JEDEC DDR3 speed bins. And instead of hardcoding it to 10 cycles, it is possible to set tighter timings based on accurate calculations. For example, DRAM clock frequencies up to 533MHz need only 8 cycles for write recovery.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com
Acked-by: Ian Campbell ijc@hellion.org.uk

In the case if the 'dram_para' struct does not specify the exact bus width or chip density, just use a trial and error method to find a usable configuration.
Because all the major bugs in the DRAM initialization sequence are now hopefully fixed, it should be safe to re-initialize the DRAM controller multiple times until we get it configured right. The original Allwinner's boot0 bootloader also used a similar autodetection trick.
The DDR3 spec contains the package pinout and addressing table for different possible chip densities. It appears to be impossible to distinguish between a single chip with 16 I/O data lines and a pair of chips with 8 I/O data lines in the case if they provide the same storage capacity. Because a single 16-bit chip has a higher density than a pair of equivalent 8-bit chips, it has stricter refresh timings. So in the case of doubt, we assume that 16-bit chips are used. Additionally, only Allwinner A20 has all A0-A15 address lines and can support densities up to 8192. The older Allwinner A10 and Allwinner A13 can only support densities up to 4096.
We deliberately leave out DDR2, dual-rank configurations and the special case of a 8-bit chip with density 8192. None of these configurations seem to have been ever used in real devices. And no new devices are likely to use these exotic configurations (because only up to 2GB of RAM can be populated in any case).
This DRAM autodetection feature potentially allows to have a single low performance fail-safe DDR3 initialiazation for a universal single bootloader binary, which can be compatible with all Allwinner A10/A13/A20 based devices (if the ifdefs are replaced with a runtime SoC type detection).
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com --- arch/arm/cpu/armv7/sunxi/dram.c | 52 +++++++++++++++++++++++++++++++++++++---- 1 file changed, 47 insertions(+), 5 deletions(-)
diff --git a/arch/arm/cpu/armv7/sunxi/dram.c b/arch/arm/cpu/armv7/sunxi/dram.c index 9f55799..3778cc9 100644 --- a/arch/arm/cpu/armv7/sunxi/dram.c +++ b/arch/arm/cpu/armv7/sunxi/dram.c @@ -536,17 +536,13 @@ static void mctl_set_impedance(u32 zq, u32 odt_en) writel(DRAM_IOCR_ODT_EN(odt_en), &dram->iocr); }
-unsigned long dramc_init(struct dram_para *para) +static unsigned long dramc_init_helper(struct dram_para *para) { struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE; u32 reg_val; u32 density; int ret_val;
- /* check input dram parameter structure */ - if (!para) - return 0; - /* * only single rank DDR3 is supported by this code even though the * hardware can theoretically support DDR2 and up to two ranks @@ -671,3 +667,49 @@ unsigned long dramc_init(struct dram_para *para)
return get_ram_size((long *)PHYS_SDRAM_0, PHYS_SDRAM_0_SIZE); } + +unsigned long dramc_init(struct dram_para *para) +{ + unsigned long dram_size, actual_density; + + /* If the dram configuration is not provided, use a default */ + if (!para) + return 0; + + /* if everything is known, then autodetection is not necessary */ + if (para->io_width && para->bus_width && para->density) + return dramc_init_helper(para); + + /* try to autodetect the DRAM bus width and density */ + para->io_width = 16; + para->bus_width = 32; +#if defined(CONFIG_SUN4I) || defined(CONFIG_SUN5I) + /* only A0-A14 address lines on A10/A13, limiting max density to 4096 */ + para->density = 4096; +#else + /* all A0-A15 address lines on A20, which allow density 8192 */ + para->density = 8192; +#endif + + dram_size = dramc_init_helper(para); + if (!dram_size) { + /* if 32-bit bus width failed, try 16-bit bus width instead */ + para->bus_width = 16; + dram_size = dramc_init_helper(para); + if (!dram_size) { + /* if 16-bit bus width also failed, then bail out */ + return dram_size; + } + } + + /* check if we need to adjust the density */ + actual_density = (dram_size >> 17) * para->io_width / para->bus_width; + + if (actual_density != para->density) { + /* update the density and re-initialize DRAM again */ + para->density = actual_density; + dram_size = dramc_init_helper(para); + } + + return dram_size; +}

On Fri, 2014-07-18 at 19:23 +0300, Siarhei Siamashka wrote:
In the case if the 'dram_para' struct does not specify the exact bus width or chip density, just use a trial and error method to find a usable configuration.
Because all the major bugs in the DRAM initialization sequence are now hopefully fixed, it should be safe to re-initialize the DRAM controller multiple times until we get it configured right. The original Allwinner's boot0 bootloader also used a similar autodetection trick.
The DDR3 spec contains the package pinout and addressing table for different possible chip densities. It appears to be impossible to distinguish between a single chip with 16 I/O data lines and a pair of chips with 8 I/O data lines in the case if they provide the same storage capacity. Because a single 16-bit chip has a higher density than a pair of equivalent 8-bit chips, it has stricter refresh timings. So in the case of doubt, we assume that 16-bit chips are used. Additionally, only Allwinner A20 has all A0-A15 address lines and can support densities up to 8192. The older Allwinner A10 and Allwinner A13 can only support densities up to 4096.
We deliberately leave out DDR2, dual-rank configurations and the special case of a 8-bit chip with density 8192. None of these configurations seem to have been ever used in real devices. And no new devices are likely to use these exotic configurations (because only up to 2GB of RAM can be populated in any case).
This DRAM autodetection feature potentially allows to have a single low performance fail-safe DDR3 initialiazation for a universal single bootloader binary, which can be compatible with all Allwinner A10/A13/A20 based devices (if the ifdefs are replaced with a runtime SoC type detection).
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com
Acked-by: Ian Campbell ijc@hellion.org.uk

Hi,
On 07/18/2014 06:22 PM, Siarhei Siamashka wrote:
Hello,
First of all, it may be worth reminding that no accurate documentation for this particular DRAM controller exists in public access.
However it is suspected that Allwinner uses one of the revisions of Synopsys DesignWare DDR2/3-Lite Memory Controller IP (MCTL) combined with DDR2/3-Lite PHY IP in A10/A13/A20. Also this DRAM controller apparently has siblings in Rockchip RK29XX, RK30XX and TI KeyStone2 hardware, which have some documentation and some bits of kernel and bootloader sources available in the Internet. Not to mention the original Allwinner boot0 bootloader sources and the suspend support code from the linux-sunxi kernel. This provides enough hints for finding out how the DRAM controller actually works by checking various bits of information via the trial and error method.
In other words, a few people from the linux-sunxi community (me included) are essentially doing reverse engineering of the DRAM controller and using the linux-sunxi wiki to document all the findings: http://linux-sunxi.org/DRAM_Controller
If Allwinner Technology, Synopsys DesignWare or anyone else can share real documentation or at least provide some hints or review the code, that would be really appreciated.
Nevertheless, here is a patch set, which tries to improve the current u-boot sunxi dram code in the following way:
- remove the convoluted dead code paths, which have no real use
- fix obvious bugs and timing violations
- add real ZQ calibration and ODT support for better reliability and higher clock speed potential
- remove duplication and share code between sun4i/sun5i/sun7i
- add more configuration knobs (such as MBUS clock frequency)
- optional support for automatic detection of the bus width and chip density
This patch set only modifies the dram.c and dram.h source files and applies cleanly to u-boot v2014.07 (but is intended for v2014.10). The patch set is organized in such a way, that we first fix bugs in the current sun7i implementation, and only after that add the missing sun4i/sun5i support bits. As such, it clashes with the dram parts of the sun4i/sun5i support patches, earlier submitted by Hans de Goede.
First of all, many many thanks for working on this!
Unfortunately the wheels of progress have been moving on though.
Ian Campbell and I now maintain a u-boot custodian tree for allwinner related patches, and we (mostly Ian) have already prepared a set of patches to get merged for v2014.10 as soon as the window opens, and that includes sun4i and sun5i support. You can find this tree here:
http://git.denx.de/?p=u-boot/u-boot-sunxi.git;a=summary
I really don't want to go and roll-back changes already there so that we can add your patch-set. Therefor I've to ask you, can you please rebase your set on top of this tree?
I understand this may be non trivial and thus may require some extra work from you side, and I'm sorry about that.
Once you've posted a new rebased version I or Ian will review it and get it into the above tree for merging upstream.
Regards,
Hans

On Sat, 2014-07-19 at 12:59 +0200, Hans de Goede wrote:
can you please rebase your set on top of this tree?
Yes, please.
I understand this may be non trivial and thus may require some extra work from you side, and I'm sorry about that.
Me too.
Having been through the series though I don't think there is much other than the rebase (and review comments) to address which is massively more complex on top of the current custodian master, but maybe I'm being overly optimistic.
Once you've posted a new rebased version I or Ian will review it and get it into the above tree for merging upstream.
I actually went through it as it stands already and I've acked various bits which seemed like they just need the obvious rebasing onto u-boot-sunxi#master. Siarhei, if the rebase turns out to be non-trivial/obvious then please drop the ack.
Ian.
participants (3)
-
Hans de Goede
-
Ian Campbell
-
Siarhei Siamashka