drivers/ram/rockchip/sdram_common.c :: sdram_detect_dbw() LPDDR3/2 calculations seem wrong

Hi,
I got a nanopc-t4 amongst others which shipped with:
DDR Version 1.15 20181010
Channel 0: LPDDR3, 933MHz Bus Width=32 Col=10 Bank=8 Row=15/15 CS=2 Die Bus-Width=32 Size=2048MB ..
I have since upgraded to more recent u-boot versions:
U-Boot TPL 2020.07 (Sep 27 2020 - 12:34:15)
Channel 0: LPDDR3, 933MHz BW=32 Col=10 Bk=8 CS0 Row=15 CS1 Row=15 CS=2 Die BW=16 Size=2048MB

U-Boot TPL 2020.10 (Nov 10 2020 - 13:37:45)
Channel 0: LPDDR3, 933MHz BW=32 Col=10 Bk=8 CS0 Row=15 CS1 Row=15 CS=2 Die BW=16 Size=2048MB
The machine was highly unstable, showing memory and locking issues. When using only two little cores, it was a lot more stable.
I went and also tried:
DDR Version 1.24 20191016
Channel 0: LPDDR3, 933MHz Bus Width=32 Col=10 Bank=8 Row=15/15 CS=2 Die Bus-Width=16 Size=2048MB
which seems to match recent U-Boot, but all of them differ from the original Die BW of 32, which I currently assume to be correct for the Samsung K4E6E304EC-EGCG (so possibly the error also migrated into rockchip-linux/rkbin?).
Looking at sdram_common.c::sdram_detect_dbw()
    cs_cap = (1 << (row + col + bk + bw - 20));
    if (bw == 2) {
        if (cs_cap <= 0x2000000) /* 256Mb */
            die_bw_0 = (col < 9) ? 2 : 1;
        else if (cs_cap <= 0x10000000) /* 2Gb */
            die_bw_0 = (col < 10) ? 2 : 1;
        else if (cs_cap <= 0x40000000) /* 8Gb */
            die_bw_0 = (col < 11) ? 2 : 1;
        else
            die_bw_0 = (col < 12) ? 2 : 1;
        if (cs > 1) {
            row = cap_info->cs1_row;
            cs_cap = (1 << (row + col + bk + bw - 20));
            if (cs_cap <= 0x2000000) /* 256Mb */
                die_bw_0 = (col < 9) ? 2 : 1;
            else if (cs_cap <= 0x10000000) /* 2Gb */
                die_bw_0 = (col < 10) ? 2 : 1;
            else if (cs_cap <= 0x40000000) /* 8Gb */
                die_bw_0 = (col < 11) ? 2 : 1;
            else
                die_bw_0 = (col < 12) ? 2 : 1;
        }
    } else {
cs_cap is off by 20 bits compared to the thresholds it is checked against; in my case it is 0x400 and not 0x40000000:
type 6 row 15 col 10 bk 3 cs 2 bw 2 cs_cap 8 cs1_row 15
1 << (15 + 10 + 3 + 2 - 20) == 1 << 10 == 0x400
The same happens in the second case with cs1_row, given cs > 1.
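
To make the mismatch concrete, here is a small standalone sketch of the arithmetic. It is not the U-Boot code: the function names are made up for illustration, the threshold ladder is copied from the snippet quoted above, and I am assuming from the boot logs that a result of 2 means a 32-bit die and 1 means a 16-bit die.

```c
#include <stdint.h>

/* Hypothetical helper: cs_cap as computed today, i.e. scaled down by 2^20. */
uint64_t cs_cap_scaled(unsigned row, unsigned col, unsigned bk, unsigned bw)
{
	return 1ULL << (row + col + bk + bw - 20);
}

/* Hypothetical helper: cs_cap without the "-20", i.e. in bytes. */
uint64_t cs_cap_bytes(unsigned row, unsigned col, unsigned bk, unsigned bw)
{
	return 1ULL << (row + col + bk + bw);
}

/*
 * Threshold ladder copied from the quoted code.
 * Assumption: 2 == 32-bit die, 1 == 16-bit die.
 */
unsigned die_bw_from_cap(uint64_t cs_cap, unsigned col)
{
	if (cs_cap <= 0x2000000)
		return (col < 9) ? 2 : 1;
	else if (cs_cap <= 0x10000000)
		return (col < 10) ? 2 : 1;
	else if (cs_cap <= 0x40000000)
		return (col < 11) ? 2 : 1;
	return (col < 12) ? 2 : 1;
}
```

With my values (row 15, col 10, bk 3, bw 2) the scaled variant gives 0x400, which lands in the first branch and yields 1 (Die BW=16). The byte variant gives 0x40000000, which lands in the third branch and yields 2 (Die BW=32), matching what the original DDR 1.15 blob reported.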
Now I know very little about all the memory chips out there but it seems very unlikely to regain these 20 bits in these calculations. So either the “-20” goes or the cs_cap <= values need adjustment.
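
As a rough sketch of those two options (again not a patch, the function names are hypothetical, and I am assuming the thresholds in the existing code are meant as bytes, which is what the 256Mb/2Gb/8Gb comments suggest):

```c
#include <stdint.h>

/* Option A: drop the "-20" so cs_cap is in bytes; thresholds stay as they are. */
unsigned die_bw_option_a(unsigned row, unsigned col, unsigned bk, unsigned bw)
{
	uint64_t cs_cap = 1ULL << (row + col + bk + bw);

	if (cs_cap <= 0x2000000ULL)		/* 256Mb */
		return (col < 9) ? 2 : 1;
	else if (cs_cap <= 0x10000000ULL)	/* 2Gb */
		return (col < 10) ? 2 : 1;
	else if (cs_cap <= 0x40000000ULL)	/* 8Gb */
		return (col < 11) ? 2 : 1;
	return (col < 12) ? 2 : 1;
}

/*
 * Option B: keep the "-20" (cs_cap in MiB) and scale the thresholds
 * down by the same 20 bits. Only valid if row + col + bk + bw >= 20.
 */
unsigned die_bw_option_b(unsigned row, unsigned col, unsigned bk, unsigned bw)
{
	uint64_t cs_cap = 1ULL << (row + col + bk + bw - 20);

	if (cs_cap <= (0x2000000ULL >> 20))		/* 256Mb == 32 MiB */
		return (col < 9) ? 2 : 1;
	else if (cs_cap <= (0x10000000ULL >> 20))	/* 2Gb == 256 MiB */
		return (col < 10) ? 2 : 1;
	else if (cs_cap <= (0x40000000ULL >> 20))	/* 8Gb == 1024 MiB */
		return (col < 11) ? 2 : 1;
	return (col < 12) ? 2 : 1;
}
```

Both variants give 2 for my board (row 15, col 10, bk 3, bw 2). Which of the two is actually intended I cannot say.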
The problem now comes from the fact that cap_info->dbw gets the wrong value from die_bw_0 this way. Given that it is LPDDR3, I assume that set_cap_relate_config() in sdram_rk3399.c later restores the wrong values for the “memdata_ratio”.
There might be more problems lingering, but with this changed my machine got a lot more reliable. I still see memory errors when I push it to its (temperature) limits running on all 6 cores, even with decent cooling, but that might be a secondary problem.
Can someone with a lot more insight into this magic have a look and if needed please fix it?
/bz
Bjoern A. Zeeb