
On 03/17/2015 08:57 AM, popcorn mix wrote:
On 17/03/15 03:04, Stephen Warren wrote:
It would be nice though if someone from the RPi Foundation could comment on the exact effect of the upper bus address bits, and why 0xc would work for RPi2 but 0x4 for the RPi 1. I wonder if the ARM cache status (enabled, disabled) interacts with the GPU cache enable in any way, e.g. burst vs. non-burst transactions on the bus or something? That's about the only reason I can see for the RPi Foundation kernel working with 0x4 bus addresses on both chips, but U-Boot needing something different on RPi2...
Dom, for reference, see:
http://lists.denx.de/pipermail/u-boot/2015-March/207947.html
http://lists.denx.de/pipermail/u-boot/2015-March/thread.html#207947
Thanks for the great explanation. I'll have to bookmark/archive it:-)
First, remember that 2835 is a large GPU with a small ARM attached. On some platforms the ARM is not even used. The GPU boots first and may wake the ARM. The GPU is the centre of the universe, and the ARM has to fit in.
Okay, I'll try to explain what goes on. Here are my definitions of some terms:
bus address: a VideoCore/GPU address. The lower 30 bits define the 1G of addressable memory. The top two bits define the caching alias.
physical address: an ARM-side address given to the VC MMU. This is a 30-bit address space.
The GPU always uses bus addresses. GPU bus mastering peripherals (like DMA) use bus addresses. The ARM uses physical addresses.
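To make the bus address definition concrete, here is a minimal C sketch that splits a 32-bit bus address into its two-bit caching alias and its 30-bit offset. The helper names (bus_alias, bus_offset, make_bus_addr) are made up for illustration and do not come from U-Boot or the kernel. Note that the "0x4" and "0xc" used in this thread are the top hex digit of the address, which corresponds to alias values 1 and 3 in the top two bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, named for illustration only. */
#define BUS_OFFSET_BITS 30u
#define BUS_OFFSET_MASK ((UINT32_C(1) << BUS_OFFSET_BITS) - 1u) /* 0x3fffffff */

/* Top two bits of a bus address: the caching alias (0..3). */
static inline uint32_t bus_alias(uint32_t bus)
{
    return bus >> BUS_OFFSET_BITS;
}

/* Lower 30 bits of a bus address: the offset within the 1G space. */
static inline uint32_t bus_offset(uint32_t bus)
{
    return bus & BUS_OFFSET_MASK;
}

/* Combine an alias (0..3) and a 30-bit offset into a bus address. */
static inline uint32_t make_bus_addr(uint32_t alias, uint32_t offset)
{
    assert(alias < 4u);
    assert(offset <= BUS_OFFSET_MASK);
    return (alias << BUS_OFFSET_BITS) | offset;
}
```

For example, make_bus_addr(3, 0x1000) gives 0xc0001000, i.e. offset 0x1000 seen through the 0xc alias.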
VC MMU: A coarse MMU used by the ARM for accessing GPU memory. Each page is 16M and there are 64 pages. This maps 30 bits of physical address to 32 bits of bus address.
The setup of the VC MMU is handled by the GPU, and by default the mapping is:
2835: the first 32 pages map physical addresses 0x00000000-0x1fffffff to bus addresses 0x40000000-0x5fffffff. The next page maps physical addresses 0x20000000-0x20ffffff to bus addresses 0x7e000000-0x7effffff.
2836: the first 63 pages map physical addresses 0x00000000-0x3effffff to bus addresses 0xc0000000-0xfeffffff. The next page maps physical addresses 0x3f000000-0x3fffffff to bus addresses 0x7e000000-0x7effffff.
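The default mapping described above can be modelled in a few lines of C. This is only a sketch of the translation as stated in this mail (16M pages, RAM pages through the 0x4 or 0xc alias, then one peripheral page at bus 0x7e000000); the enum, the function name vc_mmu_default_map, and the choice to return 0 for pages not covered by the description are assumptions made for the example, not how the firmware or any driver expresses it.

```c
#include <assert.h>
#include <stdint.h>

/* All names here are hypothetical; the numbers come from the mapping above. */
#define VC_MMU_PAGE_SHIFT 24u                     /* 16M pages */
#define VC_MMU_PAGE_MASK  ((UINT32_C(1) << VC_MMU_PAGE_SHIFT) - 1u)
#define PERIPH_BUS_BASE   UINT32_C(0x7e000000)    /* peripheral bus window */

enum soc { SOC_BCM2835, SOC_BCM2836 };

/*
 * Translate a 30-bit ARM physical address to a bus address using the
 * default mapping. Returns 0 for pages the description does not cover
 * (no described mapping produces bus address 0).
 */
static uint32_t vc_mmu_default_map(enum soc soc, uint32_t phys)
{
    assert(phys < (UINT32_C(1) << 30));           /* 30-bit physical space */

    uint32_t page = phys >> VC_MMU_PAGE_SHIFT;    /* 0..63 */
    uint32_t off = phys & VC_MMU_PAGE_MASK;       /* offset within the page */

    /* 2835: 32 RAM pages via alias 0x4; 2836: 63 RAM pages via alias 0xc. */
    uint32_t ram_pages = (soc == SOC_BCM2835) ? 32u : 63u;
    uint32_t ram_alias = (soc == SOC_BCM2835) ? UINT32_C(0x40000000)
                                              : UINT32_C(0xc0000000);

    if (page < ram_pages)
        return ram_alias | phys;                  /* RAM: OR in the alias */
    if (page == ram_pages)
        return PERIPH_BUS_BASE | off;             /* the one peripheral page */
    return 0;                                     /* not described */
}
```

With this model, a RAM buffer at physical 0x00100000 becomes bus address 0x40100000 on 2835 and 0xc0100000 on 2836, which is the value a bus-mastering peripheral would need to be given.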
OK, this explains why in U-Boot, we need to OR in 0x40000000 on bcm2835 and 0xc0000000 on bcm2836; that matches the VC MMU setup.
I guess we need to fix the U-Boot mailbox driver too, and many things in the upstream RPi kernel.
I have two more questions:
1)
Do the RPi 1 and RPi 2 use different kernel binaries in the RPi Foundation's images? I'd assumed there was a single unified binary which supported both. The reason I ask is that I see:
https://github.com/raspberrypi/linux/blob/rpi-3.18.y/arch/arm/mach-bcm2708/i...
#ifdef CONFIG_BCM2708_NOL2CACHE
 #define _REAL_BUS_OFFSET UL(0xC0000000)   /* don't use L1 or L2 caches */
#else
 #define _REAL_BUS_OFFSET UL(0x40000000)   /* use L2 cache */
#endif
That's identical in the mach-bcm2709 version too. However, arch/arm/mach-bcm270[89]/Kconfig's entry for that config option:
config BCM2708_NOL2CACHE
        bool "Videocore L2 cache disable"
        depends on MACH_BCM2709
        default y
        help
          Do not allow ARM to use GPU's L2 cache. Requires disable_l2cache in config.txt.
Has "default n" for the bcm2708 version and "default y" for the bcm2709 version. If I'd noticed that difference in default value, it would have been a big clue that what I proposed in the U-Boot patch was correct! Anyway, this implies that there are separate kernel binaries for the RPi 1 and RPi 2, since otherwise those default values wouldn't work.
2)
I assume the SDHCI controller (RPi SD card, CM eMMC) is affected by this just as much; we need to use bus addresses not ARM physical addresses when programming any DMA there?
Perhaps this would explain why I had issues with the eMMC on the CM (I think only in the kernel though, whereas U-Boot may have been fine; I'll have to check).
...
So, on 2835 the ARM has a 16K L1 cache and no L2 cache. The GPU has a 128M L2 cache. The GPU's L2 cache is accessible from the ARM but it's not particularly close (i.e. not very fast). However mapping through the L2 allocating alias (0x4) was shown to be beneficial on 2835, so that is the alias we use.
The situation is different on 2836. The ARM has a 32K L1 cache and a 512M integrated/fast L2 cache. Additionally going through the smaller/slower GPU L2 is bad for performance. So, we map through the SDRAM alias (0xc) and avoid the GPU L2 cache.
I assume 128M and 512M there should be 128K and 512K?