
On Tue, 02 Mar 2021 21:50:49 +0800 Icenowy Zheng icenowy@aosc.io wrote:
Hi Icenowy,
于 2021年3月2日 GMT+08:00 下午9:40:44, Andre Przywara andre.przywara@arm.com 写到:
On Fri, 26 Feb 2021 00:13:25 +0800 Icenowy Zheng icenowy@aosc.io wrote:
Previously we do not have proper dual rank memory detection on R40 (because we omitted PIR_QSGATE, which does not work on R40 with our configuration), and dual rank memory is just simply disabled as early R40 boards available (Banana Pi M2 Ultra and Berry) have single rank memory.
As a board with dual rank memory (Forlinx OKA40i-C) is now known to
us,
we need to have a way to do memory rank detection to support that
board.
Add some routine to detect memory rank by trying to access the memory in rank 1 and check for error status of the memory controller, and
then
enable dual rank memory on R40.
Similar routine can be used to detect half DQ width (which is also detected by PIR_QSGATE on other SoCs), but it's left unimplemented because there's no known R40 board with half DQ width now.
So when looking at the SPL size I realised that this patch breaks the socid constant parameter propagation, especially for mctl_set_cr(). I don't see immediately why, this whole code here should be compiled out on any A64 builds, for instance. Instead GCC turns "<mctl_set_cr.constprop.0>:" into "<mctl_set_cr>:", and passes 0x1689 around everytime. I tried GCC 10.2.0 and 9.2.0, also tried adding const everywhere, to amplify the constant nature of this value. Patch 1/2 added to the code size, but kept the const propagation (only one instance of 0x1689 in the disassembly). This patch here should be for free on A64, but adds 104 bytes.
Does anyone have a clue why this is happening? Is this a compiler issue?
Maybe the issue is that I assume this codepath is only for R40 and I didn't add socid to it?
But that's clearly visible by this function only being called inside "if (socid == SOCID_R40)". And that works very well for the H3 ZQ calibration quirk, for instance.
Could you try to add a socid parameter to mctl_r40_detect_rank_count()?
I just tried that, and apart from looking weird it didn't do anything.
To be clear: Your code is absolutely fine, exactly the way it should be written. It's just that the compiler is playing stupid suddenly. I was thinking that your dummy readl might have upset it, but even with that commented out it's still happening.
Or maybe it makes mctl_calc_rank_size() not inlined?
So the assembly looks like everything apart from mctl_set_cr() and mctl_auto_detect_dram_size_rank() is inlined. Those are extra because they are called multiple times and we are using -Os.
(Well I think the code looks noop on non-R40)
Exactly that was my thinking: when compiling for the A64, it should be totally compiled out, and the object file should be the same before and after.
BTW, original rank/width detection part won't get called on R40. But R40 is not where we are tight on code size, so I didn't bother to ignore it on R40.
I see. Yeah, I am not concerned about R40 either, but I want to avoid the A64 bloating up.
Or should we switch back to #if's and do not rely on compiler behavior any longer?
I'd rather not do that. We are doing the right thing, and it works marvellously so far.
To be clear: it's not a show stopper, I was just curious why this happens. The code size is not really critical on the A64 at the moment, so I'd merge it as it, even if we don't find a solution. Maybe just leave a hint about it in the code should people need to come back to this.
I also asked some compiler buffs about it, but it's not exactly the simple reproducer case they like to deal with ;-)
Cheers, Andre
Cheers, Andre
Signed-off-by: Icenowy Zheng icenowy@aosc.io
arch/arm/mach-sunxi/dram_sunxi_dw.c | 55
+++++++++++++++++++++++++----
1 file changed, 49 insertions(+), 6 deletions(-)
diff --git a/arch/arm/mach-sunxi/dram_sunxi_dw.c
b/arch/arm/mach-sunxi/dram_sunxi_dw.c
index 2b9d631d49..b86ae7cdf3 100644 --- a/arch/arm/mach-sunxi/dram_sunxi_dw.c +++ b/arch/arm/mach-sunxi/dram_sunxi_dw.c @@ -414,11 +414,9 @@ static void mctl_set_cr(uint16_t socid, struct
dram_para *para)
}
if (socid == SOCID_R40) {
if (para->dual_rank)
panic("Dual rank memory not supported\n");
- /* Mux pin to A15 address line for single rank memory. */
setbits_le32(&mctl_com->cr_r1, MCTL_CR_R1_MUX_A15);
if (!para->dual_rank)
}setbits_le32(&mctl_com->cr_r1, MCTL_CR_R1_MUX_A15);
}
@@ -702,8 +700,55 @@ static unsigned long mctl_calc_rank_size(struct
rank_para *rank)
return (1UL << (rank->row_bits + rank->bank_bits)) *
rank->page_size;
}
+/*
- Because we cannot do mctl_phy_init(PIR_QSGATE) on R40 now (which
leads
- to failure), it's needed to detect the rank count of R40 in
another way.
- The code here is modelled after time_out_detect() in BSP, which
tries to
- access the memory and check for error code.
- TODO: auto detect half DQ width here
- */
+static void mctl_r40_detect_rank_count(struct dram_para *para) +{
- ulong rank1_base = (ulong) CONFIG_SYS_SDRAM_BASE +
mctl_calc_rank_size(¶->ranks[0]);
- struct sunxi_mctl_ctl_reg * const mctl_ctl =
(struct sunxi_mctl_ctl_reg *)SUNXI_DRAM_CTL0_BASE;
- /* Enable read time out */
- setbits_le32(&mctl_ctl->pgcr[0], 0x1 << 25);
- (void) readl((void *) rank1_base);
- udelay(10);
- if (readl(&mctl_ctl->pgsr[0]) & (0x1 << 13)) {
clrsetbits_le32(&mctl_ctl->dtcr, 0xf << 24, 0x1 << 24);
para->dual_rank = 0;
- }
- /* Reset PHY FIFO to clear it */
- clrbits_le32(&mctl_ctl->pgcr[0], 0x1 << 26);
- udelay(100);
- setbits_le32(&mctl_ctl->pgcr[0], 0x1 << 26);
- /* Clear error status */
- setbits_le32(&mctl_ctl->pgcr[0], 0x1 << 24);
- /* Clear time out flag */
- clrbits_le32(&mctl_ctl->pgsr[0], 0x1 << 13);
- /* Disable read time out */
- clrbits_le32(&mctl_ctl->pgcr[0], 0x1 << 25);
+}
static void mctl_auto_detect_dram_size(uint16_t socid, struct
dram_para *para)
{
- if (socid == SOCID_R40) {
mctl_r40_detect_rank_count(para);
mctl_set_cr(socid, para);
- }
- mctl_auto_detect_dram_size_rank(socid, para,
(ulong)CONFIG_SYS_SDRAM_BASE, ¶->ranks[0]);
if ((socid == SOCID_A64 || socid == SOCID_R40) && para->dual_rank)
{
@@ -854,8 +899,6 @@ unsigned long sunxi_dram_init(void) uint16_t socid = SOCID_H3; #elif defined(CONFIG_MACH_SUN8I_R40) uint16_t socid = SOCID_R40;
- /* Currently we cannot support R40 with dual rank memory */
- para.dual_rank = 0;
#elif defined(CONFIG_MACH_SUN8I_V3S) uint16_t socid = SOCID_V3S; #elif defined(CONFIG_MACH_SUN50I)