[PATCH 0/1] Fix for U-Boot SPL hang on sunxi H6 due to incorrect ram size detection

On Orange Pi 3 LTS, sometimes on reboot, SPL detects ram as 4 GB instead of 2 GB. This leads to a hang later and board fails to boot. Fix is tested by leaving the device in a reboot loop overnight where device successfully rebooted 1350+ times by the time I stopped rebooting it.
Gunjan Gupta (1): sunxi: dram: Fix incorrect ram size detection for some H6 boards
arch/arm/mach-sunxi/dram_helpers.c | 1 + arch/arm/mach-sunxi/dram_sun50i_h6.c | 2 ++ 2 files changed, 3 insertions(+)

On some H6 boards like Orange Pi 3 LTS, some times U-Boot fails to detect ram size correctly. Instead of 2GB thats available, it detects 4GB of ram and then SPL just hangs there making board not to boot further.
On debugging, I found that the rows value were being determined correctly, but columns were sometimes off by one value. I found that adding some delay after the mctl_core_init call along with making use of dsb in the start of the mctl_mem_matches solves the issue.
Signed-off-by: Gunjan Gupta viraniac@gmail.com ---
arch/arm/mach-sunxi/dram_helpers.c | 1 + arch/arm/mach-sunxi/dram_sun50i_h6.c | 2 ++ 2 files changed, 3 insertions(+)
diff --git a/arch/arm/mach-sunxi/dram_helpers.c b/arch/arm/mach-sunxi/dram_helpers.c index cdf2750f1c..5758c58e07 100644 --- a/arch/arm/mach-sunxi/dram_helpers.c +++ b/arch/arm/mach-sunxi/dram_helpers.c @@ -32,6 +32,7 @@ void mctl_await_completion(u32 *reg, u32 mask, u32 val) #ifndef CONFIG_MACH_SUNIV bool mctl_mem_matches(u32 offset) { + dsb(); /* Try to write different values to RAM at two addresses */ writel(0, CFG_SYS_SDRAM_BASE); writel(0xaa55aa55, (ulong)CFG_SYS_SDRAM_BASE + offset); diff --git a/arch/arm/mach-sunxi/dram_sun50i_h6.c b/arch/arm/mach-sunxi/dram_sun50i_h6.c index bff2e42513..a031a845f5 100644 --- a/arch/arm/mach-sunxi/dram_sun50i_h6.c +++ b/arch/arm/mach-sunxi/dram_sun50i_h6.c @@ -623,6 +623,8 @@ static void mctl_auto_detect_dram_size(struct dram_para *para) para->cols = 11; mctl_core_init(para);
+ udelay(50); + for (para->cols = 8; para->cols < 11; para->cols++) { /* 8 bits per byte and 16/32 bit width */ if (mctl_mem_matches(1 << (para->cols + 1 +

On Sun, 1 Oct 2023 21:43:32 +0530 Gunjan Gupta viraniac@gmail.com wrote:
(fixing Jernej's email)
Hi Gunjan,
thanks for sending a patch!
On some H6 boards like Orange Pi 3 LTS, some times U-Boot fails to detect ram size correctly. Instead of 2GB thats available, it detects 4GB of ram and then SPL just hangs there making board not to boot further.
On debugging, I found that the rows value were being determined correctly, but columns were sometimes off by one value. I found that adding some delay after the mctl_core_init call along with making use of dsb in the start of the mctl_mem_matches solves the issue.
Signed-off-by: Gunjan Gupta viraniac@gmail.com
arch/arm/mach-sunxi/dram_helpers.c | 1 + arch/arm/mach-sunxi/dram_sun50i_h6.c | 2 ++ 2 files changed, 3 insertions(+)
diff --git a/arch/arm/mach-sunxi/dram_helpers.c b/arch/arm/mach-sunxi/dram_helpers.c index cdf2750f1c..5758c58e07 100644 --- a/arch/arm/mach-sunxi/dram_helpers.c +++ b/arch/arm/mach-sunxi/dram_helpers.c @@ -32,6 +32,7 @@ void mctl_await_completion(u32 *reg, u32 mask, u32 val) #ifndef CONFIG_MACH_SUNIV bool mctl_mem_matches(u32 offset) {
- dsb();
This looks a bit odd, do you have an explanation for that? And are you sure that is really needed? I understand why we need the DSB after the writel's below, but before that? The only thing I could think of is that we are missing a barrier in mctl_core_init() - which is the function called before mctl_mem_matches(). Can you move that dsb(); into mctl_auto_detect_dram_size(), right after the mctl_core_init() call (where you add the udelay() below)? And I wonder if a dmb() would already be sufficient? I noticed recently that the clr/setbit_le32() functions don't have a barrier at all, maybe that should be fixed instead?
/* Try to write different values to RAM at two addresses */ writel(0, CFG_SYS_SDRAM_BASE); writel(0xaa55aa55, (ulong)CFG_SYS_SDRAM_BASE + offset); diff --git a/arch/arm/mach-sunxi/dram_sun50i_h6.c b/arch/arm/mach-sunxi/dram_sun50i_h6.c index bff2e42513..a031a845f5 100644 --- a/arch/arm/mach-sunxi/dram_sun50i_h6.c +++ b/arch/arm/mach-sunxi/dram_sun50i_h6.c @@ -623,6 +623,8 @@ static void mctl_auto_detect_dram_size(struct dram_para *para) para->cols = 11; mctl_core_init(para);
- udelay(50);
The location of that udelay() looks a bit odd, any chance that belongs at the end of mctl_channel_init() instead? And how did you come up with that and the value of 50? Just pure experimentation? I think the original BSP DRAM code has plenty of delay calls, so we might have missed this one here, but I would love to see some explanation here.
Cheers, Andre
for (para->cols = 8; para->cols < 11; para->cols++) { /* 8 bits per byte and 16/32 bit width */ if (mctl_mem_matches(1 << (para->cols + 1 +

bool mctl_mem_matches(u32 offset) {
dsb();
This looks a bit odd, do you have an explanation for that? And are you sure that is really needed? I understand why we need the DSB after the writel's below, but before that?
I started with Ondrej Jirman's patch from LibreELEC's tree that had a dsb call added after the first writel call. That reduced the frequency of the errors but didn't removed it completely. My reason for moving it before the writel was to make sure any memory access is completed before starting the actual logic for the test. That reduced the frequency even further, but didn't resolve the issue. I did try removing it leaving only udelay added to the code, but that brings back the issue.
The only thing I could think of is that we are missing a barrier in mctl_core_init() - which is the function called before mctl_mem_matches(). Can you move that dsb(); into mctl_auto_detect_dram_size(), right after the mctl_core_init() call (where you add the udelay() below)? And I wonder if a dmb() would already be sufficient?
Sure, I will try experimenting with it.
I noticed recently that the clr/setbit_le32() functions don't have a barrier at all, maybe that should be fixed instead?
I haven't done much of the low level programming myself. Mostly have done user space programming along with fixing minor kernel module compilation issues due to kernel api changes. So I wasn't sure what all places to debug. But if you point me to places with things to try, I surely can give time playing around testing the proposed fixes.
@@ -623,6 +623,8 @@ static void mctl_auto_detect_dram_size(struct dram_para *para) para->cols = 11; mctl_core_init(para);
udelay(50);
The location of that udelay() looks a bit odd, any chance that belongs at the end of mctl_channel_init() instead? And how did you come up with that and the value of 50? Just pure experimentation?
Before adding the udelay, I added 7 print statements to print all the members of the para struct. That itself solved the issue along with the dsb added to the top of the mctl_mem_matches function. This is what gave me the clue that a delay is needed there. The value of 50 is indeed from pure experimentation
Thanks & Regards Gunjan Gupta

Dne ponedeljek, 02. oktober 2023 ob 14:42:40 CEST je Gunjan Gupta napisal(a):
bool mctl_mem_matches(u32 offset) {
dsb();
This looks a bit odd, do you have an explanation for that? And are you sure that is really needed? I understand why we need the DSB after the writel's below, but before that?
I started with Ondrej Jirman's patch from LibreELEC's tree that had a dsb call added after the first writel call. That reduced the frequency of the errors but didn't removed it completely. My reason for moving it before the writel was to make sure any memory access is completed before starting the actual logic for the test. That reduced the frequency even further, but didn't resolve the issue. I did try removing it leaving only udelay added to the code, but that brings back the issue.
The only thing I could think of is that we are missing a barrier in mctl_core_init() - which is the function called before mctl_mem_matches(). Can you move that dsb(); into mctl_auto_detect_dram_size(), right after the mctl_core_init() call (where you add the udelay() below)? And I wonder if a dmb() would already be sufficient?
Sure, I will try experimenting with it.
I noticed recently that the clr/setbit_le32() functions don't have a barrier at all, maybe that should be fixed instead?
I haven't done much of the low level programming myself. Mostly have done user space programming along with fixing minor kernel module compilation issues due to kernel api changes. So I wasn't sure what all places to debug. But if you point me to places with things to try, I surely can give time playing around testing the proposed fixes.
@@ -623,6 +623,8 @@ static void mctl_auto_detect_dram_size(struct dram_para *para) para->cols = 11; mctl_core_init(para);
udelay(50);
The location of that udelay() looks a bit odd, any chance that belongs at the end of mctl_channel_init() instead? And how did you come up with that and the value of 50? Just pure experimentation?
Before adding the udelay, I added 7 print statements to print all the members of the para struct. That itself solved the issue along with the dsb added to the top of the mctl_mem_matches function. This is what gave me the clue that a delay is needed there. The value of 50 is indeed from pure experimentation
Oh, I found one major difference between BSP and mainline driver. Please test patch attached below. I don't know if this path is always taken when wrong configuration is tested or not.
Best regards, Jernej
--- a/arch/arm/mach-sunxi/dram_sun50i_h6.c +++ b/arch/arm/mach-sunxi/dram_sun50i_h6.c @@ -420,6 +420,7 @@ static bool mctl_channel_init(struct dram_para *para) (struct sunxi_mctl_ctl_reg *)SUNXI_DRAM_CTL0_BASE; struct sunxi_mctl_phy_reg * const mctl_phy = (struct sunxi_mctl_phy_reg *)SUNXI_DRAM_PHY0_BASE; + bool ret = true; int i; u32 val;
@@ -537,7 +538,7 @@ static bool mctl_channel_init(struct dram_para *para) debug("DRAM PHY DX%dRSR0 = %x\n", i, readl(&mctl_phy->dx[i].rsr[0])); debug("Error while initializing DRAM PHY!\n");
- return false; + ret = false; }
if (sunxi_dram_is_lpddr(para->type)) @@ -553,7 +554,7 @@ static bool mctl_channel_init(struct dram_para *para) writel(0x7ff, &mctl_com->maer1); writel(0xffff, &mctl_com->maer2);
- return true; + return ret; }
static void mctl_auto_detect_rank_width(struct dram_para *para)

On Mon, 02 Oct 2023 20:59:34 +0200 Jernej Škrabec jernej.skrabec@gmail.com wrote:
Hi Jernej,
Dne ponedeljek, 02. oktober 2023 ob 14:42:40 CEST je Gunjan Gupta napisal(a):
bool mctl_mem_matches(u32 offset) {
dsb();
This looks a bit odd, do you have an explanation for that? And are you sure that is really needed? I understand why we need the DSB after the writel's below, but before that?
I started with Ondrej Jirman's patch from LibreELEC's tree that had a dsb call added after the first writel call. That reduced the frequency of the errors but didn't removed it completely. My reason for moving it before the writel was to make sure any memory access is completed before starting the actual logic for the test. That reduced the frequency even further, but didn't resolve the issue. I did try removing it leaving only udelay added to the code, but that brings back the issue.
The only thing I could think of is that we are missing a barrier in mctl_core_init() - which is the function called before mctl_mem_matches(). Can you move that dsb(); into mctl_auto_detect_dram_size(), right after the mctl_core_init() call (where you add the udelay() below)? And I wonder if a dmb() would already be sufficient?
Sure, I will try experimenting with it.
I noticed recently that the clr/setbit_le32() functions don't have a barrier at all, maybe that should be fixed instead?
I haven't done much of the low level programming myself. Mostly have done user space programming along with fixing minor kernel module compilation issues due to kernel api changes. So I wasn't sure what all places to debug. But if you point me to places with things to try, I surely can give time playing around testing the proposed fixes.
@@ -623,6 +623,8 @@ static void mctl_auto_detect_dram_size(struct dram_para *para) para->cols = 11; mctl_core_init(para);
udelay(50);
The location of that udelay() looks a bit odd, any chance that belongs at the end of mctl_channel_init() instead? And how did you come up with that and the value of 50? Just pure experimentation?
Before adding the udelay, I added 7 print statements to print all the members of the para struct. That itself solved the issue along with the dsb added to the top of the mctl_mem_matches function. This is what gave me the clue that a delay is needed there. The value of 50 is indeed from pure experimentation
Oh, I found one major difference between BSP and mainline driver. Please test patch attached below. I don't know if this path is always taken when wrong configuration is tested or not.
Best regards, Jernej
--- a/arch/arm/mach-sunxi/dram_sun50i_h6.c +++ b/arch/arm/mach-sunxi/dram_sun50i_h6.c @@ -420,6 +420,7 @@ static bool mctl_channel_init(struct dram_para *para) (struct sunxi_mctl_ctl_reg *)SUNXI_DRAM_CTL0_BASE; struct sunxi_mctl_phy_reg * const mctl_phy = (struct sunxi_mctl_phy_reg *)SUNXI_DRAM_PHY0_BASE;
bool ret = true; int i; u32 val;
@@ -537,7 +538,7 @@ static bool mctl_channel_init(struct dram_para *para) debug("DRAM PHY DX%dRSR0 = %x\n", i, readl(&mctl_phy->dx[i].rsr[0])); debug("Error while initializing DRAM PHY!\n");
So those error messages (and the ones before them, not shown here) look odd: if I follow the code correctly, this here is the only place which makes mctl_core_init() return false. And we rely on it doing so up to three times, to detect the proper rank and bus width, in mctl_auto_detect_rank_width(). So we should not have dramatic error messages (even for debug), as they would occur during normal, eventually successful setup as well. So shall we tone those messages down? I'd suggest to change the "Oops!..." comment into something saying that we detected a wrong rank/bus-width setup. Also removing the pointless PLL debug print (since the failure is not DRAM clock related), and also the final error message (the last line shown here). I will make patch for that.
But regardless I doubt that this patch is doing anything: when this function returns false, we set new rank/width parameters, and call mctl_core_init() again, which starts with mctl_sys_init() resetting all DRAM controller registers, which I actually wonder is really necessary. But surely any register setup after that point above is useless, with the current code.
Does that make sense?
Cheers, Andre
return false;
ret = false; } if (sunxi_dram_is_lpddr(para->type))
@@ -553,7 +554,7 @@ static bool mctl_channel_init(struct dram_para *para) writel(0x7ff, &mctl_com->maer1); writel(0xffff, &mctl_com->maer2);
return true;
return ret;
}
static void mctl_auto_detect_rank_width(struct dram_para *para)

On Saturday, October 21, 2023 1:38:39 AM CEST Andre Przywara wrote:
On Mon, 02 Oct 2023 20:59:34 +0200 Jernej Škrabec jernej.skrabec@gmail.com wrote:
Hi Jernej,
Dne ponedeljek, 02. oktober 2023 ob 14:42:40 CEST je Gunjan Gupta
napisal(a):
bool mctl_mem_matches(u32 offset) {
dsb();
This looks a bit odd, do you have an explanation for that? And are you sure that is really needed? I understand why we need the DSB after the writel's below, but before that?
I started with Ondrej Jirman's patch from LibreELEC's tree that had a dsb call added after the first writel call. That reduced the frequency of the errors but didn't removed it completely. My reason for moving it before the writel was to make sure any memory access is completed before starting the actual logic for the test. That reduced the frequency even further, but didn't resolve the issue. I did try removing it leaving only udelay added to the code, but that brings back the issue.
The only thing I could think of is that we are missing a barrier in mctl_core_init() - which is the function called before mctl_mem_matches(). Can you move that dsb(); into mctl_auto_detect_dram_size(), right after the mctl_core_init() call (where you add the udelay() below)? And I wonder if a dmb() would already be sufficient?
Sure, I will try experimenting with it.
I noticed recently that the clr/setbit_le32() functions don't have a barrier at all, maybe that should be fixed instead?
I haven't done much of the low level programming myself. Mostly have done user space programming along with fixing minor kernel module compilation issues due to kernel api changes. So I wasn't sure what all places to debug. But if you point me to places with things to try, I surely can give time playing around testing the proposed fixes.> >
@@ -623,6 +623,8 @@ static void mctl_auto_detect_dram_size(struct dram_para *para)> > > > para->cols = 11; mctl_core_init(para);
udelay(50);
The location of that udelay() looks a bit odd, any chance that belongs at the end of mctl_channel_init() instead? And how did you come up with that and the value of 50? Just pure experimentation?
Before adding the udelay, I added 7 print statements to print all the members of the para struct. That itself solved the issue along with the dsb added to the top of the mctl_mem_matches function. This is what gave me the clue that a delay is needed there. The value of 50 is indeed from pure experimentation
Oh, I found one major difference between BSP and mainline driver. Please test patch attached below. I don't know if this path is always taken when wrong configuration is tested or not.
Best regards, Jernej
--- a/arch/arm/mach-sunxi/dram_sun50i_h6.c +++ b/arch/arm/mach-sunxi/dram_sun50i_h6.c @@ -420,6 +420,7 @@ static bool mctl_channel_init(struct dram_para *para)
(struct sunxi_mctl_ctl_reg *)SUNXI_DRAM_CTL0_BASE; struct sunxi_mctl_phy_reg * const mctl_phy = (struct sunxi_mctl_phy_reg *)SUNXI_DRAM_PHY0_BASE;
bool ret = true; int i; u32 val;
@@ -537,7 +538,7 @@ static bool mctl_channel_init(struct dram_para *para)
debug("DRAM PHY DX%dRSR0 = %x\n", i, readl(&mctl_phy->dx[i].rsr[0])); debug("Error while initializing DRAM PHY!\n");
So those error messages (and the ones before them, not shown here) look odd: if I follow the code correctly, this here is the only place which makes mctl_core_init() return false. And we rely on it doing so up to three times, to detect the proper rank and bus width, in mctl_auto_detect_rank_width(). So we should not have dramatic error messages (even for debug), as they would occur during normal, eventually successful setup as well. So shall we tone those messages down? I'd suggest to change the "Oops!..." comment into something saying that we detected a wrong rank/bus-width setup. Also removing the pointless PLL debug print (since the failure is not DRAM clock related), and also the final error message (the last line shown here). I will make patch for that.
Those messages are remnants of old behaviour. In the past, code first enabled dual rank and 32-bit lanes. If that failed, it analyzed error messages and in second round only enabled supported features. However, that proved unreliable, so I changed code so it simply tries every combination and uses whatever works first.
In any case, I concur that these messages might not make sense anymore.
But regardless I doubt that this patch is doing anything: when this function returns false, we set new rank/width parameters, and call mctl_core_init() again, which starts with mctl_sys_init() resetting all DRAM controller registers, which I actually wonder is really necessary. But surely any register setup after that point above is useless, with the current code.
Does that make sense?
My concern is that code after failure might put controller or DRAM in initial state which may gave more time for next round of attempts to init DRAM properly. It may also be the reason why initial approach didn't work reliably.
I'm fine with current approach if Gunjan confirms that initialization works reliably.
Best regards, Jernej
Cheers, Andre
return false;
ret = false; } if (sunxi_dram_is_lpddr(para->type))
@@ -553,7 +554,7 @@ static bool mctl_channel_init(struct dram_para *para)
writel(0x7ff, &mctl_com->maer1); writel(0xffff, &mctl_com->maer2);
return true;
return ret;
}
static void mctl_auto_detect_rank_width(struct dram_para *para)

I'm fine with the current approach if Gunjan confirms that initialization works reliably.
I will test it out over the coming week. Too sick to even pick up the laptop right now.
Regards Gunjan

Dne ponedeljek, 02. oktober 2023 ob 13:26:26 CEST je Andre Przywara napisal(a):
On Sun, 1 Oct 2023 21:43:32 +0530 Gunjan Gupta viraniac@gmail.com wrote:
(fixing Jernej's email)
Hi Gunjan,
thanks for sending a patch!
On some H6 boards like Orange Pi 3 LTS, some times U-Boot fails to detect ram size correctly. Instead of 2GB thats available, it detects 4GB of ram and then SPL just hangs there making board not to boot further.
On debugging, I found that the rows value were being determined correctly, but columns were sometimes off by one value. I found that adding some delay after the mctl_core_init call along with making use of dsb in the start of the mctl_mem_matches solves the issue.
Signed-off-by: Gunjan Gupta viraniac@gmail.com
arch/arm/mach-sunxi/dram_helpers.c | 1 + arch/arm/mach-sunxi/dram_sun50i_h6.c | 2 ++ 2 files changed, 3 insertions(+)
diff --git a/arch/arm/mach-sunxi/dram_helpers.c b/arch/arm/mach-sunxi/dram_helpers.c index cdf2750f1c..5758c58e07 100644 --- a/arch/arm/mach-sunxi/dram_helpers.c +++ b/arch/arm/mach-sunxi/dram_helpers.c @@ -32,6 +32,7 @@ void mctl_await_completion(u32 *reg, u32 mask, u32 val) #ifndef CONFIG_MACH_SUNIV bool mctl_mem_matches(u32 offset) {
- dsb();
This looks a bit odd, do you have an explanation for that? And are you sure that is really needed? I understand why we need the DSB after the writel's below, but before that? The only thing I could think of is that we are missing a barrier in mctl_core_init() - which is the function called before mctl_mem_matches(). Can you move that dsb(); into mctl_auto_detect_dram_size(), right after the mctl_core_init() call (where you add the udelay() below)? And I wonder if a dmb() would already be sufficient? I noticed recently that the clr/setbit_le32() functions don't have a barrier at all, maybe that should be fixed instead?
Looking at original BSP DRAM code, there is no data barriers that I can find. Cache shouldn't be a thing before DRAM is initialized, right? Conversely, I suggest adding memory barriers before each udelay(), as it is there for a reason.
/* Try to write different values to RAM at two addresses */ writel(0, CFG_SYS_SDRAM_BASE); writel(0xaa55aa55, (ulong)CFG_SYS_SDRAM_BASE + offset); diff --git a/arch/arm/mach-sunxi/dram_sun50i_h6.c b/arch/arm/mach-sunxi/dram_sun50i_h6.c index bff2e42513..a031a845f5 100644 --- a/arch/arm/mach-sunxi/dram_sun50i_h6.c +++ b/arch/arm/mach-sunxi/dram_sun50i_h6.c @@ -623,6 +623,8 @@ static void mctl_auto_detect_dram_size(struct dram_para *para) para->cols = 11; mctl_core_init(para);
- udelay(50);
The location of that udelay() looks a bit odd, any chance that belongs at the end of mctl_channel_init() instead? And how did you come up with that and the value of 50? Just pure experimentation? I think the original BSP DRAM code has plenty of delay calls, so we might have missed this one here, but I would love to see some explanation here.
I checked original driver and we have *almost* all delays. There are two missing. Both in mctl_phy_pir_init(). One before mctl_await_completion() and another after it. Both are 1 us long.
Maybe that solves it (in combination with data barriers)?
Best regards, Jernej
Cheers, Andre
for (para->cols = 8; para->cols < 11; para->cols++) { /* 8 bits per byte and 16/32 bit width */ if (mctl_mem_matches(1 << (para->cols + 1 +

On Mon, 02 Oct 2023 20:50:49 +0200 Jernej Škrabec jernej.skrabec@gmail.com wrote:
Hi,
Dne ponedeljek, 02. oktober 2023 ob 13:26:26 CEST je Andre Przywara napisal(a):
On Sun, 1 Oct 2023 21:43:32 +0530 Gunjan Gupta viraniac@gmail.com wrote:
(fixing Jernej's email)
Hi Gunjan,
thanks for sending a patch!
On some H6 boards like Orange Pi 3 LTS, some times U-Boot fails to detect ram size correctly. Instead of 2GB thats available, it detects 4GB of ram and then SPL just hangs there making board not to boot further.
On debugging, I found that the rows value were being determined correctly, but columns were sometimes off by one value. I found that adding some delay after the mctl_core_init call along with making use of dsb in the start of the mctl_mem_matches solves the issue.
Signed-off-by: Gunjan Gupta viraniac@gmail.com
arch/arm/mach-sunxi/dram_helpers.c | 1 + arch/arm/mach-sunxi/dram_sun50i_h6.c | 2 ++ 2 files changed, 3 insertions(+)
diff --git a/arch/arm/mach-sunxi/dram_helpers.c b/arch/arm/mach-sunxi/dram_helpers.c index cdf2750f1c..5758c58e07 100644 --- a/arch/arm/mach-sunxi/dram_helpers.c +++ b/arch/arm/mach-sunxi/dram_helpers.c @@ -32,6 +32,7 @@ void mctl_await_completion(u32 *reg, u32 mask, u32 val) #ifndef CONFIG_MACH_SUNIV bool mctl_mem_matches(u32 offset) {
- dsb();
This looks a bit odd, do you have an explanation for that? And are you sure that is really needed? I understand why we need the DSB after the writel's below, but before that? The only thing I could think of is that we are missing a barrier in mctl_core_init() - which is the function called before mctl_mem_matches(). Can you move that dsb(); into mctl_auto_detect_dram_size(), right after the mctl_core_init() call (where you add the udelay() below)? And I wonder if a dmb() would already be sufficient? I noticed recently that the clr/setbit_le32() functions don't have a barrier at all, maybe that should be fixed instead?
Looking at original BSP DRAM code, there is no data barriers that I can find. Cache shouldn't be a thing before DRAM is initialized, right?
With the *MMU off* every memory access is "device nGnRnE", so definitely not cached. But a Cortex-A53 still has a store buffer which is in effect even for device accesses.
Conversely, I suggest adding memory barriers before each udelay(), as it is there for a reason.
Talking to a colleague and looking at the ARM ARM and other documentation, including [1]: A DMB memory barrier (__iowmb() in U-Boot and Linux), as used in readl/writel, just ensures ordering, it does not force completion. For just programming the DRAM controller, this is what we want. A DSB does everything that a DMB does, plus ensures "completion" of memory accesses (plus other things like TLBs and CMOs, which we don't care about in this case). In a Cortex-A53, this seems to include flushing the store buffer, which is probably the culprit here.
So I don't know if the delays in the BSP DRAM driver are really pauses that the DRAM controller needs. In this case we would need at least a DSB before the udelay(), if not a device-read-back, though I just assume that the DRAM controller registers do not have another buffer on the hardware side. Check 27:47 onward of [1].
Another possibility is that the delays are just crude measures to paper over missing barriers, hoping that the buffer is drained after the time. But since the delays don't really hurt here, we should just assume they are meaningful and keep them.
Also we pretty surely need a DSB after we setup the DRAM controller, but before we use the DRAM array: the controller registers and the array are separate devices, on different AXI buses. So that aspect of this patch seems actually to be alright, and the fact that Gunjan needed the DSB supports that.
I will try to come up with a patch that implements these ideas.
Cheers, Andre
P.S. Please note that above statements are an application of the architecture rules to the A53 and the MMU-off/single core situation in the SPL. The situation is more complex when running Linux, especially on more modern cores that do speculation and out-of-order execution.
[1] https://www.youtube.com/watch?v=i6DayghhA8Q
/* Try to write different values to RAM at two addresses */ writel(0, CFG_SYS_SDRAM_BASE); writel(0xaa55aa55, (ulong)CFG_SYS_SDRAM_BASE + offset); diff --git a/arch/arm/mach-sunxi/dram_sun50i_h6.c b/arch/arm/mach-sunxi/dram_sun50i_h6.c index bff2e42513..a031a845f5 100644 --- a/arch/arm/mach-sunxi/dram_sun50i_h6.c +++ b/arch/arm/mach-sunxi/dram_sun50i_h6.c @@ -623,6 +623,8 @@ static void mctl_auto_detect_dram_size(struct dram_para *para) para->cols = 11; mctl_core_init(para);
- udelay(50);
The location of that udelay() looks a bit odd, any chance that belongs at the end of mctl_channel_init() instead? And how did you come up with that and the value of 50? Just pure experimentation? I think the original BSP DRAM code has plenty of delay calls, so we might have missed this one here, but I would love to see some explanation here.
I checked original driver and we have *almost* all delays. There are two missing. Both in mctl_phy_pir_init(). One before mctl_await_completion() and another after it. Both are 1 us long.
Maybe that solves it (in combination with data barriers)?
Best regards, Jernej
Cheers, Andre
for (para->cols = 8; para->cols < 11; para->cols++) { /* 8 bits per byte and 16/32 bit width */ if (mctl_mem_matches(1 << (para->cols + 1 +
participants (3)
-
Andre Przywara
-
Gunjan Gupta
-
Jernej Škrabec