[U-Boot] [PATCH v2 00/16] sunxi: Allwinner A10/A13/A20 DRAM controller fixes

This is version 2 of http://lists.denx.de/pipermail/u-boot/2014-July/183981.html
Rebased on git://git.denx.de/u-boot-sunxi.git master branch (commit 3340eab26d89176dd0bf543e6d2590665c577423 "sun7i: Add bananapi board")
Siarhei Siamashka (16): sunxi: dram: Remove useless 'dramc_scan_dll_para()' function sunxi: dram: Remove broken super-standby remnants sunxi: dram: Respect the DDR3 reset timing requirements sunxi: dram: Fix CKE delay handling for sun4i/sun5i sunxi: dram: Remove broken impedance and ODT configuration code sunxi: dram: Do DDR3 reset in the same way on sun4i/sun5i/sun7i sunxi: dram: Add 'await_bits_clear'/'await_bits_set' helper functions sunxi: dram: Re-introduce the impedance calibration ond ODT sunxi: dram: Configurable MBUS clock speed (use PLL5 or PLL6) sunxi: dram: Use divisor P=1 for PLL5 sunxi: dram: Improve DQS gate data training error handling sunxi: dram: Add a helper function 'mctl_get_number_of_lanes' sunxi: dram: Configurable DQS gating window mode and delay sunxi: dram: Drop DDR2 support and assume only single rank DDR3 memory sunxi: dram: Derive write recovery delay from DRAM clock speed sunxi: dram: Autodetect DDR3 bus width and density
arch/arm/cpu/armv7/sunxi/dram.c | 621 ++++++++++++++++++--------------- arch/arm/include/asm/arch-sunxi/dram.h | 14 +- 2 files changed, 351 insertions(+), 284 deletions(-)

The attempt to do DRAM parameters calibration in 'dramc_scan_dll_para()' function by trying different DLL adjustments and using the hardware DQS gate training result as a feedback is a great source of inspiration, but it just can't work properly the way it is implemented now. The fatal problem of this implementation is that the DQS gating window can be successfully found for almost every DLL delay adjustment setup that gets tried. Thus making it unable to see any real difference between 'good' and 'bad' settings.
Also this code was supposed to be only activated by setting the highest bit in the 'dram_tpr3' variable of the 'dram_para' struct (per-board dram configuration). But none of the linux-sunxi devices has ever used it for real. Basically, this code is just a dead weight.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com Acked-by: Ian Campbell ijc@hellion.org.uk --- Changes for v2: - break lines in the commit message in order not to exceed 72 characters line limit - no other changes
arch/arm/cpu/armv7/sunxi/dram.c | 125 +--------------------------------------- 1 file changed, 1 insertion(+), 124 deletions(-)
diff --git a/arch/arm/cpu/armv7/sunxi/dram.c b/arch/arm/cpu/armv7/sunxi/dram.c index 0f1ceec..236eed7 100644 --- a/arch/arm/cpu/armv7/sunxi/dram.c +++ b/arch/arm/cpu/armv7/sunxi/dram.c @@ -321,117 +321,6 @@ static int dramc_scan_readpipe(void) return 0; }
-static int dramc_scan_dll_para(void) -{ - struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE; - const u32 dqs_dly[7] = {0x3, 0x2, 0x1, 0x0, 0xe, 0xd, 0xc}; - const u32 clk_dly[15] = {0x07, 0x06, 0x05, 0x04, 0x03, - 0x02, 0x01, 0x00, 0x08, 0x10, - 0x18, 0x20, 0x28, 0x30, 0x38}; - u32 clk_dqs_count[15]; - u32 dqs_i, clk_i, cr_i; - u32 max_val, min_val; - u32 dqs_index, clk_index; - - /* Find DQS_DLY Pass Count for every CLK_DLY */ - for (clk_i = 0; clk_i < 15; clk_i++) { - clk_dqs_count[clk_i] = 0; - clrsetbits_le32(&dram->dllcr[0], 0x3f << 6, - (clk_dly[clk_i] & 0x3f) << 6); - for (dqs_i = 0; dqs_i < 7; dqs_i++) { - for (cr_i = 1; cr_i < 5; cr_i++) { - clrsetbits_le32(&dram->dllcr[cr_i], - 0x4f << 14, - (dqs_dly[dqs_i] & 0x4f) << 14); - } - udelay(2); - if (dramc_scan_readpipe() == 0) - clk_dqs_count[clk_i]++; - } - } - /* Test DQS_DLY Pass Count for every CLK_DLY from up to down */ - for (dqs_i = 15; dqs_i > 0; dqs_i--) { - max_val = 15; - min_val = 15; - for (clk_i = 0; clk_i < 15; clk_i++) { - if (clk_dqs_count[clk_i] == dqs_i) { - max_val = clk_i; - if (min_val == 15) - min_val = clk_i; - } - } - if (max_val < 15) - break; - } - - /* Check if Find a CLK_DLY failed */ - if (!dqs_i) - goto fail; - - /* Find the middle index of CLK_DLY */ - clk_index = (max_val + min_val) >> 1; - if ((max_val == (15 - 1)) && (min_val > 0)) - /* if CLK_DLY[MCTL_CLK_DLY_COUNT] is very good, then the middle - * value can be more close to the max_val - */ - clk_index = (15 + clk_index) >> 1; - else if ((max_val < (15 - 1)) && (min_val == 0)) - /* if CLK_DLY[0] is very good, then the middle value can be more - * close to the min_val - */ - clk_index >>= 1; - if (clk_dqs_count[clk_index] < dqs_i) - clk_index = min_val; - - /* Find the middle index of DQS_DLY for the CLK_DLY got above, and Scan - * read pipe again - */ - clrsetbits_le32(&dram->dllcr[0], 0x3f << 6, - (clk_dly[clk_index] & 0x3f) << 6); - max_val = 7; - min_val = 7; - for (dqs_i = 0; dqs_i < 7; dqs_i++) { - clk_dqs_count[dqs_i] = 0; - for (cr_i = 1; cr_i < 5; cr_i++) { - clrsetbits_le32(&dram->dllcr[cr_i], - 0x4f << 14, - (dqs_dly[dqs_i] & 0x4f) << 14); - } - udelay(2); - if (dramc_scan_readpipe() == 0) { - clk_dqs_count[dqs_i] = 1; - max_val = dqs_i; - if (min_val == 7) - min_val = dqs_i; - } - } - - if (max_val < 7) { - dqs_index = (max_val + min_val) >> 1; - if ((max_val == (7-1)) && (min_val > 0)) - dqs_index = (7 + dqs_index) >> 1; - else if ((max_val < (7-1)) && (min_val == 0)) - dqs_index >>= 1; - if (!clk_dqs_count[dqs_index]) - dqs_index = min_val; - for (cr_i = 1; cr_i < 5; cr_i++) { - clrsetbits_le32(&dram->dllcr[cr_i], - 0x4f << 14, - (dqs_dly[dqs_index] & 0x4f) << 14); - } - udelay(2); - return dramc_scan_readpipe(); - } - -fail: - clrbits_le32(&dram->dllcr[0], 0x3f << 6); - for (cr_i = 1; cr_i < 5; cr_i++) - clrbits_le32(&dram->dllcr[cr_i], 0x4f << 14); - udelay(2); - - return dramc_scan_readpipe(); -} - static void dramc_clock_output_en(u32 on) { #if defined(CONFIG_SUN5I) || defined(CONFIG_SUN7I) @@ -665,19 +554,7 @@ unsigned long dramc_init(struct dram_para *para)
/* scan read pipe value */ mctl_itm_enable(); - if (para->tpr3 & (0x1 << 31)) { - ret_val = dramc_scan_dll_para(); - if (ret_val == 0) - para->tpr3 = - (((readl(&dram->dllcr[0]) >> 6) & 0x3f) << 16) | - (((readl(&dram->dllcr[1]) >> 14) & 0xf) << 0) | - (((readl(&dram->dllcr[2]) >> 14) & 0xf) << 4) | - (((readl(&dram->dllcr[3]) >> 14) & 0xf) << 8) | - (((readl(&dram->dllcr[4]) >> 14) & 0xf) << 12 - ); - } else { - ret_val = dramc_scan_readpipe(); - } + ret_val = dramc_scan_readpipe();
if (ret_val < 0) return 0;

If the dram->ppwrsctl (SDR_DPCR) register has the lowest bit set to 1, this means that DRAM is currently in self-refresh mode and retaining the old data. Since we have no idea what to do in this situation yet, just set this register to 0 and initialize DRAM in the same way as on any normal reboot (discarding whatever was stored there).
This part of code was apparently used by the Allwinner boot0 bootloader to handle resume from the so-called super-standby mode. But this particular code got somehow mangled on the way from the boot0 bootloader to the u-boot-sunxi bootloader and has no chance of doing anything even remotely sane. For example: 1. in the original boot0 code we had "mctl_write_w(SDR_DPCR, 0x16510000)" (write to the register) and in the u-boot it now looks like "setbits_le32(&dram->ppwrsctl, 0x16510000)" (set bits in the register) 2. in the original boot0 code it was issuing three commands "0x12, 0x17, 0x13" (Self-Refresh entry, Self-Refresh exit, Refresh), but in the u-boot they have become "0x12, 0x12, 0x13" (Self-Refresh entry, Self-Refresh entry, Refresh)
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com --- Changes for v2: - break lines in the commit message in order not to exceed 72 characters line limit - rebase on git://git.denx.de/u-boot-sunxi.git master and resolve minor conflicts
arch/arm/cpu/armv7/sunxi/dram.c | 66 +++++++++++++---------------------------- 1 file changed, 20 insertions(+), 46 deletions(-)
diff --git a/arch/arm/cpu/armv7/sunxi/dram.c b/arch/arm/cpu/armv7/sunxi/dram.c index 236eed7..dc79d1c 100644 --- a/arch/arm/cpu/armv7/sunxi/dram.c +++ b/arch/arm/cpu/armv7/sunxi/dram.c @@ -359,6 +359,24 @@ static void dramc_set_autorefresh_cycle(u32 clk, u32 type, u32 density) writel(DRAM_DRR_TREFI(tREFI) | DRAM_DRR_TRFC(tRFC), &dram->drr); }
+/* + * If the dram->ppwrsctl (SDR_DPCR) register has the lowest bit set to 1, this + * means that DRAM is currently in self-refresh mode and retaining the old + * data. Since we have no idea what to do in this situation yet, just set this + * register to 0 and initialize DRAM in the same way as on any normal reboot + * (discarding whatever was stored there). + * + * Note: on sun7i hardware, the highest 16 bits need to be set to 0x1651 magic + * value for this write operation to have any effect. On sun5i hadware this + * magic value is not necessary. And on sun4i hardware the writes to this + * register seem to have no effect at all. + */ +static void mctl_disable_power_save(void) +{ + struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE; + writel(0x16510000, &dram->ppwrsctl); +} + unsigned long dramc_init(struct dram_para *para) { struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE; @@ -373,10 +391,8 @@ unsigned long dramc_init(struct dram_para *para) /* setup DRAM relative clock */ mctl_setup_dram_clock(para->clock);
-#ifdef CONFIG_SUN5I /* Disable any pad power save control */ - writel(0, &dram->ppwrsctl); -#endif + mctl_disable_power_save();
/* reset external DRAM */ #ifndef CONFIG_SUN7I @@ -444,10 +460,7 @@ unsigned long dramc_init(struct dram_para *para) #endif
#ifdef CONFIG_SUN7I - if ((readl(&dram->ppwrsctl) & 0x1) != 0x1) - mctl_ddr3_reset(); - else - setbits_le32(&dram->mcr, DRAM_MCR_RESET); + mctl_ddr3_reset(); #else /* dram clock on */ dramc_clock_output_en(1); @@ -513,45 +526,6 @@ unsigned long dramc_init(struct dram_para *para) setbits_le32(&dram->ccr, DRAM_CCR_INIT); await_completion(&dram->ccr, DRAM_CCR_INIT);
-#ifdef CONFIG_SUN7I - /* setup zq calibration manual */ - reg_val = readl(&dram->ppwrsctl); - if ((reg_val & 0x1) == 1) { - /* super_standby_flag = 1 */ - - reg_val = readl(0x01c20c00 + 0x120); /* rtc */ - reg_val &= 0x000fffff; - reg_val |= 0x17b00000; - writel(reg_val, &dram->zqcr0); - - /* exit self-refresh state */ - clrsetbits_le32(&dram->dcr, 0x1f << 27, 0x12 << 27); - /* check whether command has been executed */ - await_completion(&dram->dcr, 0x1 << 31); - - udelay(2); - - /* dram pad hold off */ - setbits_le32(&dram->ppwrsctl, 0x16510000); - - await_completion(&dram->ppwrsctl, 0x1); - - /* exit self-refresh state */ - clrsetbits_le32(&dram->dcr, 0x1f << 27, 0x12 << 27); - - /* check whether command has been executed */ - await_completion(&dram->dcr, 0x1 << 31); - - udelay(2); - - /* issue a refresh command */ - clrsetbits_le32(&dram->dcr, 0x1f << 27, 0x13 << 27); - await_completion(&dram->dcr, 0x1 << 31); - - udelay(2); - } -#endif - /* scan read pipe value */ mctl_itm_enable(); ret_val = dramc_scan_readpipe();

On Sun, 2014-08-03 at 05:32 +0300, Siarhei Siamashka wrote:
If the dram->ppwrsctl (SDR_DPCR) register has the lowest bit set to 1, this means that DRAM is currently in self-refresh mode and retaining the old data. Since we have no idea what to do in this situation yet, just set this register to 0 and initialize DRAM in the same way as on any normal reboot (discarding whatever was stored there).
This part of code was apparently used by the Allwinner boot0 bootloader to handle resume from the so-called super-standby mode. But this particular code got somehow mangled on the way from the boot0 bootloader to the u-boot-sunxi bootloader and has no chance of doing anything even remotely sane. For example:
- in the original boot0 code we had "mctl_write_w(SDR_DPCR, 0x16510000)" (write to the register) and in the u-boot it now looks like "setbits_le32(&dram->ppwrsctl, 0x16510000)" (set bits in the register)
- in the original boot0 code it was issuing three commands "0x12, 0x17, 0x13" (Self-Refresh entry, Self-Refresh exit, Refresh), but in the u-boot they have become "0x12, 0x12, 0x13" (Self-Refresh entry, Self-Refresh entry, Refresh)
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com
Acked-by: Ian Campbell ijc@hellion.org.uk

The RESET pin needs to be kept low for at least 200 us according to the DDR3 spec. So just do it the right way.
This issue did not cause any visible major problems earlier, because the DRAM RESET pin is usually already low after the board reset. And the time gap before reaching the sunxi u-boot DRAM initialization code appeared to be sufficient.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com --- Changes for v2: - rebase on git://git.denx.de/u-boot-sunxi.git master and resolve minor conflicts (udelay in two places instead of one)
arch/arm/cpu/armv7/sunxi/dram.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/arch/arm/cpu/armv7/sunxi/dram.c b/arch/arm/cpu/armv7/sunxi/dram.c index dc79d1c..a632926 100644 --- a/arch/arm/cpu/armv7/sunxi/dram.c +++ b/arch/arm/cpu/armv7/sunxi/dram.c @@ -48,6 +48,11 @@ static void await_completion(u32 *reg, u32 mask) } }
+/* + * This performs the external DRAM reset by driving the RESET pin low and + * then high again. According to the DDR3 spec, the RESET pin needs to be + * kept low for at least 200 us. + */ static void mctl_ddr3_reset(void) { struct sunxi_dram_reg *dram = @@ -64,13 +69,13 @@ static void mctl_ddr3_reset(void) if ((reg_val & CPU_CFG_CHIP_VER_MASK) != CPU_CFG_CHIP_VER(CPU_CFG_CHIP_REV_A)) { setbits_le32(&dram->mcr, DRAM_MCR_RESET); - udelay(2); + udelay(200); clrbits_le32(&dram->mcr, DRAM_MCR_RESET); } else #endif { clrbits_le32(&dram->mcr, DRAM_MCR_RESET); - udelay(2); + udelay(200); setbits_le32(&dram->mcr, DRAM_MCR_RESET); } }

On Sun, 2014-08-03 at 05:32 +0300, Siarhei Siamashka wrote:
The RESET pin needs to be kept low for at least 200 us according to the DDR3 spec. So just do it the right way.
This issue did not cause any visible major problems earlier, because the DRAM RESET pin is usually already low after the board reset. And the time gap before reaching the sunxi u-boot DRAM initialization code appeared to be sufficient.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com
Acked-by: Ian Campbell ijc@hellion.org.uk

Before driving the CKE pin (Clock Enable) high, the DDR3 spec requires to wait for additional 500 us after the RESET pin is de-asserted.
The DRAM controller takes care of this delay by itself, using a configurable counter in the SDR_IDCR register. This works in the same way on sun4i/sun5i/sun7i hardware (even the default register value 0x00c80064 is identical). Except that the counter is ticking a bit slower on sun7i (3 DRAM clock cycles instead of 2), resulting in longer actual delays for the same settings.
This patch configures the SDR_IDCR register for all sun4i/sun5i/sun7i SoC variants and not just for sun7i alone. Also an explicit udelay(500) is added immediately after DDR3 reset for extra safety. This is a duplicated functionality. But since we don't have perfect documentation, it may be reasonable to play safe. Half a millisecond boot time increase is not that significant. Boot time can be always optimized later. Preferebly by the people, who have the hardware equipment to check the actual signals on the RESET and CKE lines and verify all the timings.
The old code did not configure the SDR_IDCR register for sun4i/sun5i, but performed the DDR3 reset very early for sun4i/sun5i. This resulted in a larger time gap between the DDR3 reset and the DDR3 initialization steps and reduced the chances of CKE delay timing violation to cause real troubles.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com --- Changes for v2: - use both methods for implementing CKE delay at once (add udelay after reset and still keep using SDR_IDCR), as agreed with Ian Campbell during v1 review - update commit message - more comments
arch/arm/cpu/armv7/sunxi/dram.c | 57 ++++++++++++++++++++++++++++++++++++----- 1 file changed, 50 insertions(+), 7 deletions(-)
diff --git a/arch/arm/cpu/armv7/sunxi/dram.c b/arch/arm/cpu/armv7/sunxi/dram.c index a632926..6849952 100644 --- a/arch/arm/cpu/armv7/sunxi/dram.c +++ b/arch/arm/cpu/armv7/sunxi/dram.c @@ -78,6 +78,19 @@ static void mctl_ddr3_reset(void) udelay(200); setbits_le32(&dram->mcr, DRAM_MCR_RESET); } + /* After the RESET pin is de-asserted, the DDR3 spec requires to wait + * for additional 500 us before driving the CKE pin (Clock Enable) + * high. The duration of this delay can be configured in the SDR_IDCR + * (Initialization Delay Configuration Register) and applied + * automatically by the DRAM controller during the DDR3 initialization + * step. But SDR_IDCR has limited range on sun4i/sun5i hardware and + * can't provide sufficient delay at DRAM clock frequencies higher than + * 524 MHz (while Allwinner A13 supports DRAM clock frequency up to + * 533 MHz according to the datasheet). Additionally, there is no + * official documentation for the SDR_IDCR register anywhere, and + * there is always a chance that we are interpreting it wrong. + * Better be safe than sorry, so add an explicit delay here. */ + udelay(500); }
static void mctl_set_drive(void) @@ -382,6 +395,40 @@ static void mctl_disable_power_save(void) writel(0x16510000, &dram->ppwrsctl); }
+/* + * After the DRAM is powered up or reset, the DDR3 spec requires to wait at + * least 500 us before driving the CKE pin (Clock Enable) high. The dram->idct + * (SDR_IDCR) register appears to configure this delay, which gets applied + * right at the time when the DRAM initialization is activated in the + * 'mctl_ddr3_initialize' function. + */ +static void mctl_set_cke_delay(void) +{ + struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE; + + /* The CKE delay is represented in DRAM clock cycles, multiplied by N + * (where N=2 for sun4i/sun5i and N=3 for sun7i). Here it is set to + * the maximum possible value 0x1ffff, just like in the Allwinner's + * boot0 bootloader. The resulting delay value is somewhere between + * ~0.4 ms (sun5i with 648 MHz DRAM clock speed) and ~1.1 ms (sun7i + * with 360 MHz DRAM clock speed). */ + setbits_le32(&dram->idcr, 0x1ffff); +} + +/* + * This triggers the DRAM initialization. It performs sending the mode registers + * to the DRAM among other things. Very likely the ZQCL command is also getting + * executed (to do the initial impedance calibration on the DRAM side of the + * wire). The memory controller and the PHY must be already configured before + * calling this function. + */ +static void mctl_ddr3_initialize(void) +{ + struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE; + setbits_le32(&dram->ccr, DRAM_CCR_INIT); + await_completion(&dram->ccr, DRAM_CCR_INIT); +} + unsigned long dramc_init(struct dram_para *para) { struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE; @@ -459,10 +506,7 @@ unsigned long dramc_init(struct dram_para *para) writel(reg_val, &dram->zqcr0); #endif
-#ifdef CONFIG_SUN7I - /* Set CKE Delay to about 1ms */ - setbits_le32(&dram->idcr, 0x1ffff); -#endif + mctl_set_cke_delay();
#ifdef CONFIG_SUN7I mctl_ddr3_reset(); @@ -527,9 +571,8 @@ unsigned long dramc_init(struct dram_para *para) if (para->tpr4 & 0x1) setbits_le32(&dram->ccr, DRAM_CCR_COMMAND_RATE_1T); #endif - /* reset external DRAM */ - setbits_le32(&dram->ccr, DRAM_CCR_INIT); - await_completion(&dram->ccr, DRAM_CCR_INIT); + /* initialize external DRAM */ + mctl_ddr3_initialize();
/* scan read pipe value */ mctl_itm_enable();

On Sun, 2014-08-03 at 05:32 +0300, Siarhei Siamashka wrote:
Before driving the CKE pin (Clock Enable) high, the DDR3 spec requires to wait for additional 500 us after the RESET pin is de-asserted.
The DRAM controller takes care of this delay by itself, using a configurable counter in the SDR_IDCR register. This works in the same way on sun4i/sun5i/sun7i hardware (even the default register value 0x00c80064 is identical). Except that the counter is ticking a bit slower on sun7i (3 DRAM clock cycles instead of 2), resulting in longer actual delays for the same settings.
This patch configures the SDR_IDCR register for all sun4i/sun5i/sun7i SoC variants and not just for sun7i alone. Also an explicit udelay(500) is added immediately after DDR3 reset for extra safety. This is a duplicated functionality. But since we don't have perfect documentation, it may be reasonable to play safe. Half a millisecond boot time increase is not that significant. Boot time can be always optimized later. Preferebly by the people, who have the hardware equipment to check the actual signals on the RESET and CKE lines and verify all the timings.
The old code did not configure the SDR_IDCR register for sun4i/sun5i, but performed the DDR3 reset very early for sun4i/sun5i. This resulted in a larger time gap between the DDR3 reset and the DDR3 initialization steps and reduced the chances of CKE delay timing violation to cause real troubles.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com
Acked-by: Ian Campbell ijc@hellion.org.uk

We can safely remove it, because none of the currently supported boards uses these features.
The existing implementation had multiple problems: - unnecessary code duplication between sun4i/sun5i/sun7i - ZQ calibration was never initiated explicitly, and could be only triggered by setting the highest bit in the 'zq' parameter in the 'dram_para' struct (this was never actually done for any of the known Allwinner devices). - even if the ZQ calibration could be started, no attempts were made to wait for its completion, or checking whether the default automatically initiated ZQ calibration is still in progress - ODT was only ever enabled on sun4i, but not on sun5i/sun7i
Additionally, SDR_IOCR was set to 0x00cc0000 only on sun4i. There are some hints in the Rockchip Linux kernel sources, indicating that these bits are related to the automatic I/O power down feature, which is poorly understood on sunxi hardware at the moment. Avoiding to set these bits on sun4i too does not seem to have any measurable/visible impact.
The impedance and ODT configuration code will be re-introdeced in one of the next comits.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com --- Changes for v2: - this is a new patch in the series - implement the first part of the old patch [PATCH 05/14] "sunxi: dram: Code cleanup for the impedance calibration" (the old patch is split as agreed with Ian Campbell)
arch/arm/cpu/armv7/sunxi/dram.c | 27 --------------------------- 1 file changed, 27 deletions(-)
diff --git a/arch/arm/cpu/armv7/sunxi/dram.c b/arch/arm/cpu/armv7/sunxi/dram.c index 6849952..33e8bd6 100644 --- a/arch/arm/cpu/armv7/sunxi/dram.c +++ b/arch/arm/cpu/armv7/sunxi/dram.c @@ -492,20 +492,9 @@ unsigned long dramc_init(struct dram_para *para) writel(reg_val, &dram->dcr);
#ifdef CONFIG_SUN7I - setbits_le32(&dram->zqcr1, (0x1 << 24) | (0x1 << 1)); - if (para->tpr4 & 0x2) - clrsetbits_le32(&dram->zqcr1, (0x1 << 24), (0x1 << 1)); dramc_clock_output_en(1); #endif
-#if (defined(CONFIG_SUN5I) || defined(CONFIG_SUN7I)) - /* set odt impendance divide ratio */ - reg_val = ((para->zq) >> 8) & 0xfffff; - reg_val |= ((para->zq) & 0xff) << 20; - reg_val |= (para->zq) & 0xf0000000; - writel(reg_val, &dram->zqcr0); -#endif - mctl_set_cke_delay();
#ifdef CONFIG_SUN7I @@ -521,22 +510,6 @@ unsigned long dramc_init(struct dram_para *para)
mctl_enable_dllx(para->tpr3);
-#ifdef CONFIG_SUN4I - /* set odt impedance divide ratio */ - reg_val = ((para->zq) >> 8) & 0xfffff; - reg_val |= ((para->zq) & 0xff) << 20; - reg_val |= (para->zq) & 0xf0000000; - writel(reg_val, &dram->zqcr0); -#endif - -#ifdef CONFIG_SUN4I - /* set I/O configure register */ - reg_val = 0x00cc0000; - reg_val |= (para->odt_en) & 0x3; - reg_val |= ((para->odt_en) & 0x3) << 30; - writel(reg_val, &dram->iocr); -#endif - /* set refresh period */ dramc_set_autorefresh_cycle(para->clock, para->type - 2, density);

On Sun, 2014-08-03 at 05:32 +0300, Siarhei Siamashka wrote:
We can safely remove it, because none of the currently supported boards uses these features.
The existing implementation had multiple problems:
- unnecessary code duplication between sun4i/sun5i/sun7i
- ZQ calibration was never initiated explicitly, and could be only triggered by setting the highest bit in the 'zq' parameter in the 'dram_para' struct (this was never actually done for any of the known Allwinner devices).
- even if the ZQ calibration could be started, no attempts were made to wait for its completion, or checking whether the default automatically initiated ZQ calibration is still in progress
- ODT was only ever enabled on sun4i, but not on sun5i/sun7i
Additionally, SDR_IOCR was set to 0x00cc0000 only on sun4i. There are some hints in the Rockchip Linux kernel sources, indicating that these bits are related to the automatic I/O power down feature, which is poorly understood on sunxi hardware at the moment. Avoiding to set these bits on sun4i too does not seem to have any measurable/visible impact.
The impedance and ODT configuration code will be re-introdeced in one of the next comits.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com
Acked-by: Ian Campbell ijc@hellion.org.uk

The older differences were likely justified by the need to mitigate the CKE delay timing violations on sun4i/sun5i. The CKE problem is already resolved, so now we can use the sun7i variant of this code everywhere.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com --- Changes for v2: - this is a new patch in the series - reduce differences between sun4i/sun5i/sun7i to avoid conflicts when applying the follow-up patches
arch/arm/cpu/armv7/sunxi/dram.c | 11 ----------- 1 file changed, 11 deletions(-)
diff --git a/arch/arm/cpu/armv7/sunxi/dram.c b/arch/arm/cpu/armv7/sunxi/dram.c index 33e8bd6..9042e9a 100644 --- a/arch/arm/cpu/armv7/sunxi/dram.c +++ b/arch/arm/cpu/armv7/sunxi/dram.c @@ -446,10 +446,6 @@ unsigned long dramc_init(struct dram_para *para) /* Disable any pad power save control */ mctl_disable_power_save();
- /* reset external DRAM */ -#ifndef CONFIG_SUN7I - mctl_ddr3_reset(); -#endif mctl_set_drive();
/* dram clock off */ @@ -491,18 +487,11 @@ unsigned long dramc_init(struct dram_para *para) reg_val |= DRAM_DCR_MODE(DRAM_DCR_MODE_INTERLEAVE); writel(reg_val, &dram->dcr);
-#ifdef CONFIG_SUN7I dramc_clock_output_en(1); -#endif
mctl_set_cke_delay();
-#ifdef CONFIG_SUN7I mctl_ddr3_reset(); -#else - /* dram clock on */ - dramc_clock_output_en(1); -#endif
udelay(1);

On Sun, 2014-08-03 at 05:32 +0300, Siarhei Siamashka wrote:
The older differences were likely justified by the need to mitigate the CKE delay timing violations on sun4i/sun5i. The CKE problem is already resolved, so now we can use the sun7i variant of this code everywhere.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com
Acked-by: Ian Campbell ijc@hellion.org.uk

The old 'await_completion' function is not sufficient, because in some cases we want to wait for bits to be cleared, and in the other cases we want to wait for bits to be set. So split the 'await_completion' into two new 'await_bits_clear' and 'await_bits_set' functions.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com --- Changes for v2: - this is a new patch in the series - implement the second part of the old patch [PATCH 05/14] "sunxi: dram: Code cleanup for the impedance calibration" (the old patch is split as agreed with Ian Campbell)
arch/arm/cpu/armv7/sunxi/dram.c | 28 ++++++++++++++++++++++------ 1 file changed, 22 insertions(+), 6 deletions(-)
diff --git a/arch/arm/cpu/armv7/sunxi/dram.c b/arch/arm/cpu/armv7/sunxi/dram.c index 9042e9a..4a72480 100644 --- a/arch/arm/cpu/armv7/sunxi/dram.c +++ b/arch/arm/cpu/armv7/sunxi/dram.c @@ -36,19 +36,35 @@ #define CPU_CFG_CHIP_REV_B 0x3
/* - * Wait up to 1s for mask to be clear in given reg. + * Wait up to 1s for value to be set in given part of reg. */ -static void await_completion(u32 *reg, u32 mask) +static void await_completion(u32 *reg, u32 mask, u32 val) { unsigned long tmo = timer_get_us() + 1000000;
- while (readl(reg) & mask) { + while ((readl(reg) & mask) != val) { if (timer_get_us() > tmo) panic("Timeout initialising DRAM\n"); } }
/* + * Wait up to 1s for mask to be clear in given reg. + */ +static inline void await_bits_clear(u32 *reg, u32 mask) +{ + await_completion(reg, mask, 0); +} + +/* + * Wait up to 1s for mask to be set in given reg. + */ +static inline void await_bits_set(u32 *reg, u32 mask) +{ + await_completion(reg, mask, mask); +} + +/* * This performs the external DRAM reset by driving the RESET pin low and * then high again. According to the DDR3 spec, the RESET pin needs to be * kept low for at least 200 us. @@ -329,7 +345,7 @@ static int dramc_scan_readpipe(void) setbits_le32(&dram->ccr, DRAM_CCR_DATA_TRAINING);
/* check whether data training process has completed */ - await_completion(&dram->ccr, DRAM_CCR_DATA_TRAINING); + await_bits_clear(&dram->ccr, DRAM_CCR_DATA_TRAINING);
/* check data training result */ reg_val = readl(&dram->csr); @@ -426,7 +442,7 @@ static void mctl_ddr3_initialize(void) { struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE; setbits_le32(&dram->ccr, DRAM_CCR_INIT); - await_completion(&dram->ccr, DRAM_CCR_INIT); + await_bits_clear(&dram->ccr, DRAM_CCR_INIT); }
unsigned long dramc_init(struct dram_para *para) @@ -495,7 +511,7 @@ unsigned long dramc_init(struct dram_para *para)
udelay(1);
- await_completion(&dram->ccr, DRAM_CCR_INIT); + await_bits_clear(&dram->ccr, DRAM_CCR_INIT);
mctl_enable_dllx(para->tpr3);

On Sun, 2014-08-03 at 05:32 +0300, Siarhei Siamashka wrote:
The old 'await_completion' function is not sufficient, because in some cases we want to wait for bits to be cleared, and in the other cases we want to wait for bits to be set. So split the 'await_completion' into two new 'await_bits_clear' and 'await_bits_set' functions.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com
Acked-by: Ian Campbell ijc@hellion.org.uk

The DRAM controller allows to configure impedance either by using the calibration against an external high precision 240 ohm resistor, or by skipping the calibration and loading pre-defined data. The DRAM controller register guide is available here:
http://linux-sunxi.org/A10_DRAM_Controller_Register_Guide#SDR_ZQCR0
The new code supports both of the impedance configuration modes: - If the higher bits of the 'zq' parameter in the 'dram_para' struct are zero, then the lowest 8 bits are used as the ZPROG value, where two divisors encoded in lower and higher 4 bits. One divisor is used for calibrating the termination impedance, and another is used for the output impedance. - If bits 27:8 in the 'zq' parameters are non-zero, then they are used as the pre-defined ZDATA value instead of performing the ZQ calibration.
Two lowest bits in the 'odt_en' parameter enable ODT for the DQ and DQS lines individually. Enabling ODT for both DQ and DQS means that the 'odt_en' parameter needs to be set to 3.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com --- Changes for v2: - this is the last part half of the old patch [PATCH 05/14] "sunxi: dram: Code cleanup for the impedance calibration", split as agreed with Ian Campbell. - new commit message now explains how the code works - more comments are added to the code
arch/arm/cpu/armv7/sunxi/dram.c | 56 ++++++++++++++++++++++++++++++++++ arch/arm/include/asm/arch-sunxi/dram.h | 4 +++ 2 files changed, 60 insertions(+)
diff --git a/arch/arm/cpu/armv7/sunxi/dram.c b/arch/arm/cpu/armv7/sunxi/dram.c index 4a72480..f87cbeb 100644 --- a/arch/arm/cpu/armv7/sunxi/dram.c +++ b/arch/arm/cpu/armv7/sunxi/dram.c @@ -445,6 +445,60 @@ static void mctl_ddr3_initialize(void) await_bits_clear(&dram->ccr, DRAM_CCR_INIT); }
+/* + * Perform impedance calibration on the DRAM controller side of the wire. + */ +static void mctl_set_impedance(u32 zq, u32 odt_en) +{ + struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE; + u32 reg_val; + u32 zprog = zq & 0xFF, zdata = (zq >> 8) & 0xFFFFF; + +#ifndef CONFIG_SUN7I + /* Appears that some kind of automatically initiated default + * ZQ calibration is already in progress at this point on sun4i/sun5i + * hardware, but not on sun7i. So it is reasonable to wait for its + * completion before doing anything else. */ + await_bits_set(&dram->zqsr, DRAM_ZQSR_ZDONE); +#endif + + /* ZQ calibration is not really useful unless ODT is enabled */ + if (!odt_en) + return; + +#ifdef CONFIG_SUN7I + /* Enabling ODT in SDR_IOCR on sun7i hardware results in a deadlock + * unless bit 24 is set in SDR_ZQCR1. Not much is known about the + * SDR_ZQCR1 register, but there are hints indicating that it might + * be related to periodic impedance re-calibration. This particular + * magic value is borrowed from the Allwinner boot0 bootloader, and + * using it helps to avoid troubles */ + writel((1 << 24) | (1 << 1), &dram->zqcr1); +#endif + + /* Needed at least for sun5i, because it does not self clear there */ + clrbits_le32(&dram->zqcr0, DRAM_ZQCR0_ZCAL); + + if (zdata) { + /* Set the user supplied impedance data */ + reg_val = DRAM_ZQCR0_ZDEN | zdata; + writel(reg_val, &dram->zqcr0); + /* no need to wait, this takes effect immediately */ + } else { + /* Do the calibration using the external resistor */ + reg_val = DRAM_ZQCR0_ZCAL | DRAM_ZQCR0_IMP_DIV(zprog); + writel(reg_val, &dram->zqcr0); + /* Wait for the new impedance configuration to settle */ + await_bits_set(&dram->zqsr, DRAM_ZQSR_ZDONE); + } + + /* Needed at least for sun5i, because it does not self clear there */ + clrbits_le32(&dram->zqcr0, DRAM_ZQCR0_ZCAL); + + /* Set I/O configure register */ + writel(DRAM_IOCR_ODT_EN(odt_en), &dram->iocr); +} + unsigned long dramc_init(struct dram_para *para) { struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE; @@ -505,6 +559,8 @@ unsigned long dramc_init(struct dram_para *para)
dramc_clock_output_en(1);
+ mctl_set_impedance(para->zq, para->odt_en); + mctl_set_cke_delay();
mctl_ddr3_reset(); diff --git a/arch/arm/include/asm/arch-sunxi/dram.h b/arch/arm/include/asm/arch-sunxi/dram.h index 67fbfad..4433eeb 100644 --- a/arch/arm/include/asm/arch-sunxi/dram.h +++ b/arch/arm/include/asm/arch-sunxi/dram.h @@ -159,6 +159,10 @@ struct dram_para {
#define DRAM_ZQCR0_IMP_DIV(n) (((n) & 0xff) << 20) #define DRAM_ZQCR0_IMP_DIV_MASK DRAM_ZQCR0_IMP_DIV(0xff) +#define DRAM_ZQCR0_ZCAL (1 << 31) /* Starts ZQ calibration when set to 1 */ +#define DRAM_ZQCR0_ZDEN (1 << 28) /* Uses ZDATA instead of doing calibration */ + +#define DRAM_ZQSR_ZDONE (1 << 31) /* ZQ calibration completion flag */
#define DRAM_IOCR_ODT_EN(n) ((((n) & 0x3) << 30) | ((n) & 0x3) << 0) #define DRAM_IOCR_ODT_EN_MASK DRAM_IOCR_ODT_EN(0x3)

On Sun, 2014-08-03 at 05:32 +0300, Siarhei Siamashka wrote:
The DRAM controller allows to configure impedance either by using the calibration against an external high precision 240 ohm resistor, or by skipping the calibration and loading pre-defined data. The DRAM controller register guide is available here:
http://linux-sunxi.org/A10_DRAM_Controller_Register_Guide#SDR_ZQCR0
The new code supports both of the impedance configuration modes:
- If the higher bits of the 'zq' parameter in the 'dram_para' struct are zero, then the lowest 8 bits are used as the ZPROG value, where two divisors encoded in lower and higher 4 bits. One divisor is used for calibrating the termination impedance, and another is used for the output impedance.
- If bits 27:8 in the 'zq' parameters are non-zero, then they are used as the pre-defined ZDATA value instead of performing the ZQ calibration.
Two lowest bits in the 'odt_en' parameter enable ODT for the DQ and DQS lines individually. Enabling ODT for both DQ and DQS means that the 'odt_en' parameter needs to be set to 3.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com
Acked-by: Ian Campbell ijc@hellion.org.uk

The sun5i hardware (Allwinner A13) introduced configurable MBUS clock speed. Allwinner A13 uses only 16-bit data bus width to connect the external DRAM, which is halved compared to the 32-bit data bus of sun4i (Allwinner A10), so it does not make much sense to clock a wider internal bus at a very high speed. The Allwinner A13 manual specifies 300 MHz MBUS clock speed limit and 533 MHz DRAM clock speed limit. Newer sun7i hardware (Allwinner A20) has a full width 32-bit external memory interface again, but still keeps the MBUS clock speed configurable. Clocking MBUS too low inhibits memory performance and one has to find the optimal MBUS/DRAM clock speed ratio, which may depend on many factors: http://linux-sunxi.org/A10_DRAM_Controller_Performance
This patch introduces a new 'mbus_clock' parameter for the 'dram_para' struct and uses it as a desired MBUS clock speed target. If 'mbus_clock' is not set, 300 MHz is used by default to match the older hardcoded settings.
PLL5P and PLL6 are both evaluated as possible clock sources. Preferring the one, which can provide higher clock frequency that is lower or equal to the 'mbus_clock' target. In the case of a tie, PLL5P has higher priority.
Attempting to set the MBUS clock speed has no effect on sun4i, but does no harm either.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com --- Changes for v2: - update commit message to explain PLL5P vs. PLL6 selection, as requested during review - introduce 'pll5p_rate' and 'pll6x_rate' temporary variables to make the 'if' expression more readable, as requested during review - use DIV_ROUND_UP macro
arch/arm/cpu/armv7/sunxi/dram.c | 52 +++++++++++++++++++++++++--------- arch/arm/include/asm/arch-sunxi/dram.h | 1 + 2 files changed, 39 insertions(+), 14 deletions(-)
diff --git a/arch/arm/cpu/armv7/sunxi/dram.c b/arch/arm/cpu/armv7/sunxi/dram.c index f87cbeb..2403659 100644 --- a/arch/arm/cpu/armv7/sunxi/dram.c +++ b/arch/arm/cpu/armv7/sunxi/dram.c @@ -235,11 +235,20 @@ static void mctl_configure_hostport(void) writel(hpcr_value[i], &dram->hpcr[i]); }
-static void mctl_setup_dram_clock(u32 clk) +static void mctl_setup_dram_clock(u32 clk, u32 mbus_clk) { u32 reg_val; struct sunxi_ccm_reg *ccm = (struct sunxi_ccm_reg *)SUNXI_CCM_BASE;
+ /* PLL5P and PLL6 are the potential clock sources for MBUS */ + u32 pll6x_div, pll5p_div; + u32 pll6x_clk = clock_get_pll6() / 1000000; + u32 pll5p_clk = clk / 24 * 24; + u32 pll5p_rate, pll6x_rate; +#ifdef CONFIG_SUN7I + pll6x_clk *= 2; /* sun7i uses PLL6*2, sun5i uses just PLL6 */ +#endif + /* setup DRAM PLL */ reg_val = readl(&ccm->pll5_cfg); reg_val &= ~CCM_PLL5_CTRL_M_MASK; /* set M to 0 (x1) */ @@ -248,30 +257,35 @@ static void mctl_setup_dram_clock(u32 clk) reg_val &= ~CCM_PLL5_CTRL_P_MASK; /* set P to 0 (x1) */ if (clk >= 540 && clk < 552) { /* dram = 540MHz, pll5p = 540MHz */ + pll5p_clk = 540; reg_val |= CCM_PLL5_CTRL_M(CCM_PLL5_CTRL_M_X(2)); reg_val |= CCM_PLL5_CTRL_K(CCM_PLL5_CTRL_K_X(3)); reg_val |= CCM_PLL5_CTRL_N(CCM_PLL5_CTRL_N_X(15)); reg_val |= CCM_PLL5_CTRL_P(1); } else if (clk >= 512 && clk < 528) { /* dram = 512MHz, pll5p = 384MHz */ + pll5p_clk = 384; reg_val |= CCM_PLL5_CTRL_M(CCM_PLL5_CTRL_M_X(3)); reg_val |= CCM_PLL5_CTRL_K(CCM_PLL5_CTRL_K_X(4)); reg_val |= CCM_PLL5_CTRL_N(CCM_PLL5_CTRL_N_X(16)); reg_val |= CCM_PLL5_CTRL_P(2); } else if (clk >= 496 && clk < 504) { /* dram = 496MHz, pll5p = 372MHz */ + pll5p_clk = 372; reg_val |= CCM_PLL5_CTRL_M(CCM_PLL5_CTRL_M_X(3)); reg_val |= CCM_PLL5_CTRL_K(CCM_PLL5_CTRL_K_X(2)); reg_val |= CCM_PLL5_CTRL_N(CCM_PLL5_CTRL_N_X(31)); reg_val |= CCM_PLL5_CTRL_P(2); } else if (clk >= 468 && clk < 480) { /* dram = 468MHz, pll5p = 468MHz */ + pll5p_clk = 468; reg_val |= CCM_PLL5_CTRL_M(CCM_PLL5_CTRL_M_X(2)); reg_val |= CCM_PLL5_CTRL_K(CCM_PLL5_CTRL_K_X(3)); reg_val |= CCM_PLL5_CTRL_N(CCM_PLL5_CTRL_N_X(13)); reg_val |= CCM_PLL5_CTRL_P(1); } else if (clk >= 396 && clk < 408) { /* dram = 396MHz, pll5p = 396MHz */ + pll5p_clk = 396; reg_val |= CCM_PLL5_CTRL_M(CCM_PLL5_CTRL_M_X(2)); reg_val |= CCM_PLL5_CTRL_K(CCM_PLL5_CTRL_K_X(3)); reg_val |= CCM_PLL5_CTRL_N(CCM_PLL5_CTRL_N_X(11)); @@ -298,20 +312,30 @@ static void mctl_setup_dram_clock(u32 clk) clrbits_le32(&ccm->ahb_gate0, CCM_AHB_GATE_GPS); #endif
-#if defined(CONFIG_SUN5I) || defined(CONFIG_SUN7I) /* setup MBUS clock */ - reg_val = CCM_MBUS_CTRL_GATE | -#ifdef CONFIG_SUN7I - CCM_MBUS_CTRL_CLK_SRC(CCM_MBUS_CTRL_CLK_SRC_PLL6) | - CCM_MBUS_CTRL_N(CCM_MBUS_CTRL_N_X(2)) | - CCM_MBUS_CTRL_M(CCM_MBUS_CTRL_M_X(2)); -#else /* defined(CONFIG_SUN5I) */ - CCM_MBUS_CTRL_CLK_SRC(CCM_MBUS_CTRL_CLK_SRC_PLL5) | - CCM_MBUS_CTRL_N(CCM_MBUS_CTRL_N_X(1)) | - CCM_MBUS_CTRL_M(CCM_MBUS_CTRL_M_X(2)); -#endif + if (!mbus_clk) + mbus_clk = 300; + pll6x_div = DIV_ROUND_UP(pll6x_clk, mbus_clk); + pll5p_div = DIV_ROUND_UP(pll5p_clk, mbus_clk); + pll6x_rate = pll6x_clk / pll6x_div; + pll5p_rate = pll5p_clk / pll5p_div; + + if (pll6x_div <= 16 && pll6x_rate > pll5p_rate) { + /* use PLL6 as the MBUS clock source */ + reg_val = CCM_MBUS_CTRL_GATE | + CCM_MBUS_CTRL_CLK_SRC(CCM_MBUS_CTRL_CLK_SRC_PLL6) | + CCM_MBUS_CTRL_N(CCM_MBUS_CTRL_N_X(1)) | + CCM_MBUS_CTRL_M(CCM_MBUS_CTRL_M_X(pll6x_div)); + } else if (pll5p_div <= 16) { + /* use PLL5P as the MBUS clock source */ + reg_val = CCM_MBUS_CTRL_GATE | + CCM_MBUS_CTRL_CLK_SRC(CCM_MBUS_CTRL_CLK_SRC_PLL5) | + CCM_MBUS_CTRL_N(CCM_MBUS_CTRL_N_X(1)) | + CCM_MBUS_CTRL_M(CCM_MBUS_CTRL_M_X(pll5p_div)); + } else { + panic("Bad mbus_clk\n"); + } writel(reg_val, &ccm->mbus_clk_cfg); -#endif
/* * open DRAMC AHB & DLL register clock @@ -511,7 +535,7 @@ unsigned long dramc_init(struct dram_para *para) return 0;
/* setup DRAM relative clock */ - mctl_setup_dram_clock(para->clock); + mctl_setup_dram_clock(para->clock, para->mbus_clock);
/* Disable any pad power save control */ mctl_disable_power_save(); diff --git a/arch/arm/include/asm/arch-sunxi/dram.h b/arch/arm/include/asm/arch-sunxi/dram.h index 4433eeb..3c29256 100644 --- a/arch/arm/include/asm/arch-sunxi/dram.h +++ b/arch/arm/include/asm/arch-sunxi/dram.h @@ -69,6 +69,7 @@ struct sunxi_dram_reg {
struct dram_para { u32 clock; + u32 mbus_clock; u32 type; u32 rank_num; u32 density;

On Sun, 2014-08-03 at 05:32 +0300, Siarhei Siamashka wrote:
The sun5i hardware (Allwinner A13) introduced configurable MBUS clock speed. Allwinner A13 uses only 16-bit data bus width to connect the external DRAM, which is halved compared to the 32-bit data bus of sun4i (Allwinner A10), so it does not make much sense to clock a wider internal bus at a very high speed. The Allwinner A13 manual specifies 300 MHz MBUS clock speed limit and 533 MHz DRAM clock speed limit. Newer sun7i hardware (Allwinner A20) has a full width 32-bit external memory interface again, but still keeps the MBUS clock speed configurable. Clocking MBUS too low inhibits memory performance and one has to find the optimal MBUS/DRAM clock speed ratio, which may depend on many factors: http://linux-sunxi.org/A10_DRAM_Controller_Performance
This patch introduces a new 'mbus_clock' parameter for the 'dram_para' struct and uses it as a desired MBUS clock speed target. If 'mbus_clock' is not set, 300 MHz is used by default to match the older hardcoded settings.
PLL5P and PLL6 are both evaluated as possible clock sources. Preferring the one, which can provide higher clock frequency that is lower or equal to the 'mbus_clock' target. In the case of a tie, PLL5P has higher priority.
Attempting to set the MBUS clock speed has no effect on sun4i, but does no harm either.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com
Acked-by: Ian Campbell ijc@hellion.org.uk

This configures the PLL5P clock frequency to something in the ballpark of 1GHz and allows more choices for MBUS and G2D clock frequency selection (using their own divisors). In particular, it enables the use of 2/3 clock speed ratio between MBUS and DRAM.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com Acked-by: Ian Campbell ijc@hellion.org.uk --- Changes for v2: - break lines in the commit message in order not to exceed 72 characters line limit - no other changes
arch/arm/cpu/armv7/sunxi/dram.c | 28 +++++++++++----------------- 1 file changed, 11 insertions(+), 17 deletions(-)
diff --git a/arch/arm/cpu/armv7/sunxi/dram.c b/arch/arm/cpu/armv7/sunxi/dram.c index 2403659..6f98c6a 100644 --- a/arch/arm/cpu/armv7/sunxi/dram.c +++ b/arch/arm/cpu/armv7/sunxi/dram.c @@ -243,7 +243,7 @@ static void mctl_setup_dram_clock(u32 clk, u32 mbus_clk) /* PLL5P and PLL6 are the potential clock sources for MBUS */ u32 pll6x_div, pll5p_div; u32 pll6x_clk = clock_get_pll6() / 1000000; - u32 pll5p_clk = clk / 24 * 24; + u32 pll5p_clk = clk / 24 * 48; u32 pll5p_rate, pll6x_rate; #ifdef CONFIG_SUN7I pll6x_clk *= 2; /* sun7i uses PLL6*2, sun5i uses just PLL6 */ @@ -256,46 +256,40 @@ static void mctl_setup_dram_clock(u32 clk, u32 mbus_clk) reg_val &= ~CCM_PLL5_CTRL_N_MASK; /* set N to 0 (x0) */ reg_val &= ~CCM_PLL5_CTRL_P_MASK; /* set P to 0 (x1) */ if (clk >= 540 && clk < 552) { - /* dram = 540MHz, pll5p = 540MHz */ - pll5p_clk = 540; + /* dram = 540MHz, pll5p = 1080MHz */ + pll5p_clk = 1080; reg_val |= CCM_PLL5_CTRL_M(CCM_PLL5_CTRL_M_X(2)); reg_val |= CCM_PLL5_CTRL_K(CCM_PLL5_CTRL_K_X(3)); reg_val |= CCM_PLL5_CTRL_N(CCM_PLL5_CTRL_N_X(15)); - reg_val |= CCM_PLL5_CTRL_P(1); } else if (clk >= 512 && clk < 528) { - /* dram = 512MHz, pll5p = 384MHz */ - pll5p_clk = 384; + /* dram = 512MHz, pll5p = 1536MHz */ + pll5p_clk = 1536; reg_val |= CCM_PLL5_CTRL_M(CCM_PLL5_CTRL_M_X(3)); reg_val |= CCM_PLL5_CTRL_K(CCM_PLL5_CTRL_K_X(4)); reg_val |= CCM_PLL5_CTRL_N(CCM_PLL5_CTRL_N_X(16)); - reg_val |= CCM_PLL5_CTRL_P(2); } else if (clk >= 496 && clk < 504) { - /* dram = 496MHz, pll5p = 372MHz */ - pll5p_clk = 372; + /* dram = 496MHz, pll5p = 1488MHz */ + pll5p_clk = 1488; reg_val |= CCM_PLL5_CTRL_M(CCM_PLL5_CTRL_M_X(3)); reg_val |= CCM_PLL5_CTRL_K(CCM_PLL5_CTRL_K_X(2)); reg_val |= CCM_PLL5_CTRL_N(CCM_PLL5_CTRL_N_X(31)); - reg_val |= CCM_PLL5_CTRL_P(2); } else if (clk >= 468 && clk < 480) { - /* dram = 468MHz, pll5p = 468MHz */ - pll5p_clk = 468; + /* dram = 468MHz, pll5p = 936MHz */ + pll5p_clk = 936; reg_val |= CCM_PLL5_CTRL_M(CCM_PLL5_CTRL_M_X(2)); reg_val |= CCM_PLL5_CTRL_K(CCM_PLL5_CTRL_K_X(3)); reg_val |= CCM_PLL5_CTRL_N(CCM_PLL5_CTRL_N_X(13)); - reg_val |= CCM_PLL5_CTRL_P(1); } else if (clk >= 396 && clk < 408) { - /* dram = 396MHz, pll5p = 396MHz */ - pll5p_clk = 396; + /* dram = 396MHz, pll5p = 792MHz */ + pll5p_clk = 792; reg_val |= CCM_PLL5_CTRL_M(CCM_PLL5_CTRL_M_X(2)); reg_val |= CCM_PLL5_CTRL_K(CCM_PLL5_CTRL_K_X(3)); reg_val |= CCM_PLL5_CTRL_N(CCM_PLL5_CTRL_N_X(11)); - reg_val |= CCM_PLL5_CTRL_P(1); } else { /* any other frequency that is a multiple of 24 */ reg_val |= CCM_PLL5_CTRL_M(CCM_PLL5_CTRL_M_X(2)); reg_val |= CCM_PLL5_CTRL_K(CCM_PLL5_CTRL_K_X(2)); reg_val |= CCM_PLL5_CTRL_N(CCM_PLL5_CTRL_N_X(clk / 24)); - reg_val |= CCM_PLL5_CTRL_P(CCM_PLL5_CTRL_P_X(2)); } reg_val &= ~CCM_PLL5_CTRL_VCO_GAIN; /* PLL VCO Gain off */ reg_val |= CCM_PLL5_CTRL_EN; /* PLL On */

The stale error status should be cleared for all sun4i/sun5i/sun7i hardware and not just for sun7i. Also there are two types of DQS gate training errors ("found no result" and "found more than one possible result"). Both are handled now.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com Acked-by: Ian Campbell ijc@hellion.org.uk --- Changes for v2: - no changes
arch/arm/cpu/armv7/sunxi/dram.c | 2 -- arch/arm/include/asm/arch-sunxi/dram.h | 4 +++- 2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/arm/cpu/armv7/sunxi/dram.c b/arch/arm/cpu/armv7/sunxi/dram.c index 6f98c6a..47017d2 100644 --- a/arch/arm/cpu/armv7/sunxi/dram.c +++ b/arch/arm/cpu/armv7/sunxi/dram.c @@ -357,9 +357,7 @@ static int dramc_scan_readpipe(void) u32 reg_val;
/* data training trigger */ -#ifdef CONFIG_SUN7I clrbits_le32(&dram->csr, DRAM_CSR_FAILED); -#endif setbits_le32(&dram->ccr, DRAM_CCR_DATA_TRAINING);
/* check whether data training process has completed */ diff --git a/arch/arm/include/asm/arch-sunxi/dram.h b/arch/arm/include/asm/arch-sunxi/dram.h index 3c29256..11e3507 100644 --- a/arch/arm/include/asm/arch-sunxi/dram.h +++ b/arch/arm/include/asm/arch-sunxi/dram.h @@ -133,7 +133,9 @@ struct dram_para { #define DRAM_DCR_MODE_SEQ 0x0 #define DRAM_DCR_MODE_INTERLEAVE 0x1
-#define DRAM_CSR_FAILED (0x1 << 20) +#define DRAM_CSR_DTERR (0x1 << 20) +#define DRAM_CSR_DTIERR (0x1 << 21) +#define DRAM_CSR_FAILED (DRAM_CSR_DTERR | DRAM_CSR_DTIERR)
#define DRAM_DRR_TRFC(n) ((n) & 0xff) #define DRAM_DRR_TREFI(n) (((n) & 0xffff) << 8)

It is going to be useful in more than one place.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com --- Changes for v2: - remove DRAM_DCR_NR_DLLCR_32BIT/DRAM_DCR_NR_DLLCR_16BIT/ DRAM_DCR_NR_DLLCR_8BIT macros - handle only 4 and 2 lanes (32-bit and 16-bit bus width), just like the old code did (now the patch introduces no functional changes)
arch/arm/cpu/armv7/sunxi/dram.c | 27 ++++++++++++++++----------- arch/arm/include/asm/arch-sunxi/dram.h | 3 --- 2 files changed, 16 insertions(+), 14 deletions(-)
diff --git a/arch/arm/cpu/armv7/sunxi/dram.c b/arch/arm/cpu/armv7/sunxi/dram.c index 47017d2..3048391 100644 --- a/arch/arm/cpu/armv7/sunxi/dram.c +++ b/arch/arm/cpu/armv7/sunxi/dram.c @@ -152,23 +152,28 @@ static void mctl_enable_dll0(u32 phase) udelay(22); }
+/* Get the number of DDR byte lanes */ +static u32 mctl_get_number_of_lanes(void) +{ + struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE; + if ((readl(&dram->dcr) & DRAM_DCR_BUS_WIDTH_MASK) == + DRAM_DCR_BUS_WIDTH(DRAM_DCR_BUS_WIDTH_32BIT)) + return 4; + else + return 2; +} + /* * Note: This differs from pm/standby in that it checks the bus width */ static void mctl_enable_dllx(u32 phase) { struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE; - u32 i, n, bus_width; - - bus_width = readl(&dram->dcr); + u32 i, number_of_lanes;
- if ((bus_width & DRAM_DCR_BUS_WIDTH_MASK) == - DRAM_DCR_BUS_WIDTH(DRAM_DCR_BUS_WIDTH_32BIT)) - n = DRAM_DCR_NR_DLLCR_32BIT; - else - n = DRAM_DCR_NR_DLLCR_16BIT; + number_of_lanes = mctl_get_number_of_lanes();
- for (i = 1; i < n; i++) { + for (i = 1; i <= number_of_lanes; i++) { clrsetbits_le32(&dram->dllcr[i], 0xf << 14, (phase & 0xf) << 14); clrsetbits_le32(&dram->dllcr[i], DRAM_DLLCR_NRESET, @@ -177,12 +182,12 @@ static void mctl_enable_dllx(u32 phase) } udelay(2);
- for (i = 1; i < n; i++) + for (i = 1; i <= number_of_lanes; i++) clrbits_le32(&dram->dllcr[i], DRAM_DLLCR_NRESET | DRAM_DLLCR_DISABLE); udelay(22);
- for (i = 1; i < n; i++) + for (i = 1; i <= number_of_lanes; i++) clrsetbits_le32(&dram->dllcr[i], DRAM_DLLCR_DISABLE, DRAM_DLLCR_NRESET); udelay(22); diff --git a/arch/arm/include/asm/arch-sunxi/dram.h b/arch/arm/include/asm/arch-sunxi/dram.h index 11e3507..71301db 100644 --- a/arch/arm/include/asm/arch-sunxi/dram.h +++ b/arch/arm/include/asm/arch-sunxi/dram.h @@ -122,9 +122,6 @@ struct dram_para { #define DRAM_DCR_BUS_WIDTH_32BIT 0x3 #define DRAM_DCR_BUS_WIDTH_16BIT 0x1 #define DRAM_DCR_BUS_WIDTH_8BIT 0x0 -#define DRAM_DCR_NR_DLLCR_32BIT 5 -#define DRAM_DCR_NR_DLLCR_16BIT 3 -#define DRAM_DCR_NR_DLLCR_8BIT 2 #define DRAM_DCR_RANK_SEL(n) (((n) & 0x3) << 10) #define DRAM_DCR_RANK_SEL_MASK DRAM_DCR_CMD_RANK(0x3) #define DRAM_DCR_CMD_RANK_ALL (0x1 << 12)

On Sun, 2014-08-03 at 05:32 +0300, Siarhei Siamashka wrote:
It is going to be useful in more than one place.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com
Acked-by: Ian Campbell ijc@hellion.org.uk

The hardware DQS gate training is a bit unreliable and does not always find the best delay settings.
So we introduce a 32-bit 'dqs_gating_delay' variable, where each byte encodes the DQS gating delay for each byte lane. The delay granularity is 1/4 cycle.
Also we allow to enable the active DQS gating window mode, which works better than the passive mode in practice. The DDR3 spec says that there is a 0.9 cycles preamble and 0.3 cycle postamble. The DQS window has to be opened during preamble and closed during postamble. In the passive window mode, the gating window is opened and closed by just using the gating delay settings. And because of the 1/4 cycle delay granularity, accurately hitting the 0.3 cycle long postamble is a bit tough. In the active window mode, the gating window is auto-closing with the help of monitoring the DQS line, which relaxes the gating delay accuracy requirements.
But the hardware DQS gate training is still performed in the passive window mode. It is a more strict test, which is reducing the results variance compared to the training with active window mode.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com --- Changes for v2: - no changes
arch/arm/cpu/armv7/sunxi/dram.c | 55 +++++++++++++++++++++++++++++++++- arch/arm/include/asm/arch-sunxi/dram.h | 2 ++ 2 files changed, 56 insertions(+), 1 deletion(-)
diff --git a/arch/arm/cpu/armv7/sunxi/dram.c b/arch/arm/cpu/armv7/sunxi/dram.c index 3048391..be6750d 100644 --- a/arch/arm/cpu/armv7/sunxi/dram.c +++ b/arch/arm/cpu/armv7/sunxi/dram.c @@ -136,6 +136,14 @@ static void mctl_itm_enable(void) clrbits_le32(&dram->ccr, DRAM_CCR_ITM_OFF); }
+static void mctl_itm_reset(void) +{ + mctl_itm_disable(); + udelay(1); /* ITM reset needs a bit of delay */ + mctl_itm_enable(); + udelay(1); +} + static void mctl_enable_dll0(u32 phase) { struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE; @@ -356,6 +364,37 @@ static void mctl_setup_dram_clock(u32 clk, u32 mbus_clk) udelay(22); }
+/* + * The data from rslrX and rdgrX registers (X=rank) is stored + * in a single 32-bit value using the following format: + * bits [31:26] - DQS gating system latency for byte lane 3 + * bits [25:24] - DQS gating phase select for byte lane 3 + * bits [23:18] - DQS gating system latency for byte lane 2 + * bits [17:16] - DQS gating phase select for byte lane 2 + * bits [15:10] - DQS gating system latency for byte lane 1 + * bits [ 9:8 ] - DQS gating phase select for byte lane 1 + * bits [ 7:2 ] - DQS gating system latency for byte lane 0 + * bits [ 1:0 ] - DQS gating phase select for byte lane 0 + */ +static void mctl_set_dqs_gating_delay(int rank, u32 dqs_gating_delay) +{ + struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE; + u32 lane, number_of_lanes = mctl_get_number_of_lanes(); + /* rank0 gating system latency (3 bits per lane: cycles) */ + u32 slr = readl(rank == 0 ? &dram->rslr0 : &dram->rslr1); + /* rank0 gating phase select (2 bits per lane: 90, 180, 270, 360) */ + u32 dgr = readl(rank == 0 ? &dram->rdgr0 : &dram->rdgr1); + for (lane = 0; lane < number_of_lanes; lane++) { + u32 tmp = dqs_gating_delay >> (lane * 8); + slr &= ~(7 << (lane * 3)); + slr |= ((tmp >> 2) & 7) << (lane * 3); + dgr &= ~(3 << (lane * 2)); + dgr |= (tmp & 3) << (lane * 2); + } + writel(slr, rank == 0 ? &dram->rslr0 : &dram->rslr1); + writel(dgr, rank == 0 ? &dram->rdgr0 : &dram->rdgr1); +} + static int dramc_scan_readpipe(void) { struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE; @@ -618,7 +657,7 @@ unsigned long dramc_init(struct dram_para *para) writel(para->emr2, &dram->emr2); writel(para->emr3, &dram->emr3);
- /* set DQS window mode */ + /* disable drift compensation and set passive DQS window mode */ clrsetbits_le32(&dram->ccr, DRAM_CCR_DQS_DRIFT_COMP, DRAM_CCR_DQS_GATE);
#ifdef CONFIG_SUN7I @@ -631,11 +670,25 @@ unsigned long dramc_init(struct dram_para *para)
/* scan read pipe value */ mctl_itm_enable(); + + /* Hardware DQS gate training */ ret_val = dramc_scan_readpipe();
if (ret_val < 0) return 0;
+ /* allow to override the DQS training results with a custom delay */ + if (para->dqs_gating_delay) + mctl_set_dqs_gating_delay(0, para->dqs_gating_delay); + + /* set the DQS gating window type */ + if (para->active_windowing) + clrbits_le32(&dram->ccr, DRAM_CCR_DQS_GATE); + else + setbits_le32(&dram->ccr, DRAM_CCR_DQS_GATE); + + mctl_itm_reset(); + /* configure all host port */ mctl_configure_hostport();
diff --git a/arch/arm/include/asm/arch-sunxi/dram.h b/arch/arm/include/asm/arch-sunxi/dram.h index 71301db..1945f75 100644 --- a/arch/arm/include/asm/arch-sunxi/dram.h +++ b/arch/arm/include/asm/arch-sunxi/dram.h @@ -88,6 +88,8 @@ struct dram_para { u32 emr1; u32 emr2; u32 emr3; + u32 dqs_gating_delay; + u32 active_windowing; };
#define DRAM_CCR_COMMAND_RATE_1T (0x1 << 5)

On Sun, 2014-08-03 at 05:32 +0300, Siarhei Siamashka wrote:
The hardware DQS gate training is a bit unreliable and does not always find the best delay settings.
So we introduce a 32-bit 'dqs_gating_delay' variable, where each byte encodes the DQS gating delay for each byte lane. The delay granularity is 1/4 cycle.
Also we allow to enable the active DQS gating window mode, which works better than the passive mode in practice. The DDR3 spec says that there is a 0.9 cycles preamble and 0.3 cycle postamble. The DQS window has to be opened during preamble and closed during postamble. In the passive window mode, the gating window is opened and closed by just using the gating delay settings. And because of the 1/4 cycle delay granularity, accurately hitting the 0.3 cycle long postamble is a bit tough. In the active window mode, the gating window is auto-closing with the help of monitoring the DQS line, which relaxes the gating delay accuracy requirements.
But the hardware DQS gate training is still performed in the passive window mode. It is a more strict test, which is reducing the results variance compared to the training with active window mode.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com
Acked-by: Ian Campbell ijc@hellion.org.uk

All the known Allwinner A10/A13/A20 devices are using just single rank DDR3 memory. So don't pretend that we support DDR2 or more than one rank, because nobody could ever test these configurations for real and they are likely broken. Support for these features can be added back in the case if such hardware actually exists.
As part of this code cleanup, also replace division by 1024 with division by 1000 for the refresh timing calculations. This allows to use the original non-skewed tRFC timing table from the DRR3 spec and make code less confusing.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com --- Changes for v2: - no changes
arch/arm/cpu/armv7/sunxi/dram.c | 41 +++++++++++++++++++---------------------- 1 file changed, 19 insertions(+), 22 deletions(-)
diff --git a/arch/arm/cpu/armv7/sunxi/dram.c b/arch/arm/cpu/armv7/sunxi/dram.c index be6750d..0bbb10d 100644 --- a/arch/arm/cpu/armv7/sunxi/dram.c +++ b/arch/arm/cpu/armv7/sunxi/dram.c @@ -434,20 +434,18 @@ static void dramc_clock_output_en(u32 on) #endif }
-static const u16 tRFC_table[2][6] = { - /* 256Mb 512Mb 1Gb 2Gb 4Gb 8Gb */ - /* DDR2 75ns 105ns 127.5ns 195ns 327.5ns invalid */ - { 77, 108, 131, 200, 336, 336 }, - /* DDR3 invalid 90ns 110ns 160ns 300ns 350ns */ - { 93, 93, 113, 164, 308, 359 } +/* tRFC in nanoseconds for different densities (from the DDR3 spec) */ +static const u16 tRFC_DDR3_table[6] = { + /* 256Mb 512Mb 1Gb 2Gb 4Gb 8Gb */ + 90, 90, 110, 160, 300, 350 };
-static void dramc_set_autorefresh_cycle(u32 clk, u32 type, u32 density) +static void dramc_set_autorefresh_cycle(u32 clk, u32 density) { struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE; u32 tRFC, tREFI;
- tRFC = (tRFC_table[type][density] * clk + 1023) >> 10; + tRFC = (tRFC_DDR3_table[density] * clk + 999) / 1000; tREFI = (7987 * clk) >> 10; /* <= 7.8us */
writel(DRAM_DRR_TREFI(tREFI) | DRAM_DRR_TRFC(tRFC), &dram->drr); @@ -570,6 +568,13 @@ unsigned long dramc_init(struct dram_para *para) if (!para) return 0;
+ /* + * only single rank DDR3 is supported by this code even though the + * hardware can theoretically support DDR2 and up to two ranks + */ + if (para->type != DRAM_MEMORY_TYPE_DDR3 || para->rank_num != 1) + return 0; + /* setup DRAM relative clock */ mctl_setup_dram_clock(para->clock, para->mbus_clock);
@@ -590,9 +595,7 @@ unsigned long dramc_init(struct dram_para *para) mctl_enable_dll0(para->tpr3);
/* configure external DRAM */ - reg_val = 0x0; - if (para->type == DRAM_MEMORY_TYPE_DDR3) - reg_val |= DRAM_DCR_TYPE_DDR3; + reg_val = DRAM_DCR_TYPE_DDR3; reg_val |= DRAM_DCR_IO_WIDTH(para->io_width >> 3);
if (para->density == 256) @@ -632,25 +635,19 @@ unsigned long dramc_init(struct dram_para *para) mctl_enable_dllx(para->tpr3);
/* set refresh period */ - dramc_set_autorefresh_cycle(para->clock, para->type - 2, density); + dramc_set_autorefresh_cycle(para->clock, density);
/* set timing parameters */ writel(para->tpr0, &dram->tpr0); writel(para->tpr1, &dram->tpr1); writel(para->tpr2, &dram->tpr2);
- if (para->type == DRAM_MEMORY_TYPE_DDR3) { - reg_val = DRAM_MR_BURST_LENGTH(0x0); + reg_val = DRAM_MR_BURST_LENGTH(0x0); #if (defined(CONFIG_SUN5I) || defined(CONFIG_SUN7I)) - reg_val |= DRAM_MR_POWER_DOWN; + reg_val |= DRAM_MR_POWER_DOWN; #endif - reg_val |= DRAM_MR_CAS_LAT(para->cas - 4); - reg_val |= DRAM_MR_WRITE_RECOVERY(0x5); - } else if (para->type == DRAM_MEMORY_TYPE_DDR2) { - reg_val = DRAM_MR_BURST_LENGTH(0x2); - reg_val |= DRAM_MR_CAS_LAT(para->cas); - reg_val |= DRAM_MR_WRITE_RECOVERY(0x5); - } + reg_val |= DRAM_MR_CAS_LAT(para->cas - 4); + reg_val |= DRAM_MR_WRITE_RECOVERY(0x5); writel(reg_val, &dram->mr);
writel(para->emr1, &dram->emr);

On Sun, 2014-08-03 at 05:32 +0300, Siarhei Siamashka wrote:
All the known Allwinner A10/A13/A20 devices are using just single rank DDR3 memory. So don't pretend that we support DDR2 or more than one rank, because nobody could ever test these configurations for real and they are likely broken. Support for these features can be added back in the case if such hardware actually exists.
As part of this code cleanup, also replace division by 1024 with division by 1000 for the refresh timing calculations. This allows to use the original non-skewed tRFC timing table from the DRR3 spec and make code less confusing.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com
Acked-by: Ian Campbell ijc@hellion.org.uk

The write recovery time is 15ns for all JEDEC DDR3 speed bins. And instead of hardcoding it to 10 cycles, it is possible to set tighter timings based on accurate calculations. For example, DRAM clock frequencies up to 533MHz need only 8 cycles for write recovery.
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com Acked-by: Ian Campbell ijc@hellion.org.uk --- Changes for v2: - no changes
arch/arm/cpu/armv7/sunxi/dram.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/arch/arm/cpu/armv7/sunxi/dram.c b/arch/arm/cpu/armv7/sunxi/dram.c index 0bbb10d..2ad685b 100644 --- a/arch/arm/cpu/armv7/sunxi/dram.c +++ b/arch/arm/cpu/armv7/sunxi/dram.c @@ -451,6 +451,21 @@ static void dramc_set_autorefresh_cycle(u32 clk, u32 density) writel(DRAM_DRR_TREFI(tREFI) | DRAM_DRR_TRFC(tRFC), &dram->drr); }
+/* Calculate the value for A11, A10, A9 bits in MR0 (write recovery) */ +static u32 ddr3_write_recovery(u32 clk) +{ + u32 twr_ns = 15; /* DDR3 spec says that it is 15ns for all speed bins */ + u32 twr_ck = (twr_ns * clk + 999) / 1000; + if (twr_ck < 5) + return 1; + else if (twr_ck <= 8) + return twr_ck - 4; + else if (twr_ck <= 10) + return 5; + else + return 6; +} + /* * If the dram->ppwrsctl (SDR_DPCR) register has the lowest bit set to 1, this * means that DRAM is currently in self-refresh mode and retaining the old @@ -647,7 +662,7 @@ unsigned long dramc_init(struct dram_para *para) reg_val |= DRAM_MR_POWER_DOWN; #endif reg_val |= DRAM_MR_CAS_LAT(para->cas - 4); - reg_val |= DRAM_MR_WRITE_RECOVERY(0x5); + reg_val |= DRAM_MR_WRITE_RECOVERY(ddr3_write_recovery(para->clock)); writel(reg_val, &dram->mr);
writel(para->emr1, &dram->emr);

In the case if the 'dram_para' struct does not specify the exact bus width or chip density, just use a trial and error method to find a usable configuration.
Because all the major bugs in the DRAM initialization sequence are now hopefully fixed, it should be safe to re-initialize the DRAM controller multiple times until we get it configured right. The original Allwinner's boot0 bootloader also used a similar autodetection trick.
The DDR3 spec contains the package pinout and addressing table for different possible chip densities. It appears to be impossible to distinguish between a single chip with 16 I/O data lines and a pair of chips with 8 I/O data lines in the case if they provide the same storage capacity. Because a single 16-bit chip has a higher density than a pair of equivalent 8-bit chips, it has stricter refresh timings. So in the case of doubt, we assume that 16-bit chips are used. Additionally, only Allwinner A20 has all A0-A15 address lines and can support densities up to 8192. The older Allwinner A10 and Allwinner A13 can only support densities up to 4096.
We deliberately leave out DDR2, dual-rank configurations and the special case of a 8-bit chip with density 8192. None of these configurations seem to have been ever used in real devices. And no new devices are likely to use these exotic configurations (because only up to 2GB of RAM can be populated in any case).
This DRAM autodetection feature potentially allows to have a single low performance fail-safe DDR3 initialiazation for a universal single bootloader binary, which can be compatible with all Allwinner A10/A13/A20 based devices (if the ifdefs are replaced with a runtime SoC type detection).
Signed-off-by: Siarhei Siamashka siarhei.siamashka@gmail.com Acked-by: Ian Campbell ijc@hellion.org.uk --- Changes for v2: - break lines in the commit message in order not to exceed 72 characters line limit - no other changes
arch/arm/cpu/armv7/sunxi/dram.c | 52 +++++++++++++++++++++++++++++++++++++---- 1 file changed, 47 insertions(+), 5 deletions(-)
diff --git a/arch/arm/cpu/armv7/sunxi/dram.c b/arch/arm/cpu/armv7/sunxi/dram.c index 2ad685b..584f742 100644 --- a/arch/arm/cpu/armv7/sunxi/dram.c +++ b/arch/arm/cpu/armv7/sunxi/dram.c @@ -572,17 +572,13 @@ static void mctl_set_impedance(u32 zq, u32 odt_en) writel(DRAM_IOCR_ODT_EN(odt_en), &dram->iocr); }
-unsigned long dramc_init(struct dram_para *para) +static unsigned long dramc_init_helper(struct dram_para *para) { struct sunxi_dram_reg *dram = (struct sunxi_dram_reg *)SUNXI_DRAMC_BASE; u32 reg_val; u32 density; int ret_val;
- /* check input dram parameter structure */ - if (!para) - return 0; - /* * only single rank DDR3 is supported by this code even though the * hardware can theoretically support DDR2 and up to two ranks @@ -706,3 +702,49 @@ unsigned long dramc_init(struct dram_para *para)
return get_ram_size((long *)PHYS_SDRAM_0, PHYS_SDRAM_0_SIZE); } + +unsigned long dramc_init(struct dram_para *para) +{ + unsigned long dram_size, actual_density; + + /* If the dram configuration is not provided, use a default */ + if (!para) + return 0; + + /* if everything is known, then autodetection is not necessary */ + if (para->io_width && para->bus_width && para->density) + return dramc_init_helper(para); + + /* try to autodetect the DRAM bus width and density */ + para->io_width = 16; + para->bus_width = 32; +#if defined(CONFIG_SUN4I) || defined(CONFIG_SUN5I) + /* only A0-A14 address lines on A10/A13, limiting max density to 4096 */ + para->density = 4096; +#else + /* all A0-A15 address lines on A20, which allow density 8192 */ + para->density = 8192; +#endif + + dram_size = dramc_init_helper(para); + if (!dram_size) { + /* if 32-bit bus width failed, try 16-bit bus width instead */ + para->bus_width = 16; + dram_size = dramc_init_helper(para); + if (!dram_size) { + /* if 16-bit bus width also failed, then bail out */ + return dram_size; + } + } + + /* check if we need to adjust the density */ + actual_density = (dram_size >> 17) * para->io_width / para->bus_width; + + if (actual_density != para->density) { + /* update the density and re-initialize DRAM again */ + para->density = actual_density; + dram_size = dramc_init_helper(para); + } + + return dram_size; +}

On Sun, 2014-08-03 at 05:32 +0300, Siarhei Siamashka wrote:
This is version 2 of http://lists.denx.de/pipermail/u-boot/2014-July/183981.html
Rebased on git://git.denx.de/u-boot-sunxi.git master branch (commit 3340eab26d89176dd0bf543e6d2590665c577423 "sun7i: Add bananapi board")
This all looks great, thanks!
I think this should be the primary subject of the next sunxi PR. A change of this magnitude really ought to be in an rc1 so hopefully there won't be any significant changes required (it doesn't look like there should be to me).
I'm AFK from tomorrow until after when I expect rc1 to be so I'm not around to prepare a PR -- Hans can you take care of it (if you agree with my assessment of the patches, of course)? I've delegated them to you in pw in anticipation.
Cheers, Ian.

Hi,
On 08/05/2014 09:02 AM, Ian Campbell wrote:
On Sun, 2014-08-03 at 05:32 +0300, Siarhei Siamashka wrote:
This is version 2 of http://lists.denx.de/pipermail/u-boot/2014-July/183981.html
Rebased on git://git.denx.de/u-boot-sunxi.git master branch (commit 3340eab26d89176dd0bf543e6d2590665c577423 "sun7i: Add bananapi board")
This all looks great, thanks!
Indeed, many thanks for working on this and for the rebase.
I think this should be the primary subject of the next sunxi PR. A change of this magnitude really ought to be in an rc1 so hopefully there won't be any significant changes required (it doesn't look like there should be to me).
I'm AFK from tomorrow until after when I expect rc1 to be so I'm not around to prepare a PR -- Hans can you take care of it (if you agree with my assessment of the patches, of course)? I've delegated them to you in pw in anticipation.
Ok, I've also reviewed this, and ran some tests, and this looks good, I agree with trying to get this into RC-1.
But currently we already have a pull-req pending for Albert. So for now I've pushed this to u-boot-sunxi/next, with u-boot-sunxi/master still reflecting the pull-req send to Albert.
I guess it is probably best to send a new pull-req to Albert superseding the old one. Albert how would you prefer to handle this ?
Regards,
Hans
p.s.
Siarhei Siamashka, I've seen your "Single u-boot binary" patches, and I'll try to review them during the coming days.
participants (3)
-
Hans de Goede
-
Ian Campbell
-
Siarhei Siamashka