[U-Boot] [RFC PATCH] usb: dwc2: handle bcm2835 phys->virt address translations

Stephen Warren

13 Mar 2015

7:13 a.m.

BCM2835 bus addresses use the top 2 bits to determine whether peripherals use or bypass the GPU L1 and L2 cache. BCM2835-ARM-Peripherals.pdf states that:

0: L1 & L2 cached
4: L2 cache coherent (non allocating)
8: L2 cached only
c: Direct uncached.

That document also states that "Software accessing RAM using the DMA engines must use bus addresses (base at 0xc0000000)". However, this appears to be incorrect, since it does not work in practice on the bcm2835 (although it does on bcm2836). With that value, "usb start" causes some EABI function to call raise(8), presumably due to corrupted USB IN data (the converse is true on bcm2836; a value of 4 causes signals). However, I haven't investigated the cause.

A value of 4 matches what the RPi Foundation's kernel uses; see the definition of _REAL_BUS_OFFSET in arch/arm/mach-bcm2708/include/mach/memory.h. With the code updated to implement a phys->bus translation by setting the top two bits of DWC2 DMA addresses to 4, USB keyboard support appears stable.

A similar change is made for bcm2836 (RPi 2). I can't justify this value since it doesn't match the RPi Foundation kernel. However, it does appear to work for the built-in USB Ethernet at least.

Ideally, the bcm2835 SoC support would provide some common function for any DMA-capable driver to call to perform the phys->bus translation, rather than placing ifdefs in each driver file. However, I can't find such a standard function in U-Boot.
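As an illustration only, such a common helper might look roughly like the sketch below. The names rpi_soc, rpi_phys_to_bus and the two offset macros are hypothetical and do not exist in U-Boot; the offsets are the ones used in the patch in this mail, and a real implementation would more likely select the SoC at build time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical SoC selector; not an existing U-Boot type. */
enum rpi_soc { RPI_SOC_BCM2835, RPI_SOC_BCM2836 };

/* Alias offsets as used by the patch in this thread. */
#define BCM2835_BUS_OFFSET 0x40000000u
#define BCM2836_BUS_OFFSET 0xc0000000u

/*
 * Hypothetical helper: convert an ARM physical RAM address into the
 * bus address that a DMA-capable peripheral should be programmed with.
 */
static inline uint32_t rpi_phys_to_bus(enum rpi_soc soc, uint32_t phys)
{
	/* RAM physical addresses only occupy the low 30 bits. */
	assert((phys & 0xc0000000u) == 0);

	return phys | (soc == RPI_SOC_BCM2836 ? BCM2836_BUS_OFFSET
					      : BCM2835_BUS_OFFSET);
}
```

A driver would then write rpi_phys_to_bus(soc, (uint32_t)buffer) to its DMA address register instead of open-coding the OR under an ifdef.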

I'm not sure if e.g. SDHCI needs this change too? It appears to work fine without...

Cc: Eric Anholt <eric@anholt.net>
Cc: Gordon Hollingworth <gordon@holliweb.co.uk>
Signed-off-by: Stephen Warren <swarren@wwwdotorg.org>
---
(For those CC'd: note that this is a patch for U-Boot)

 drivers/usb/host/dwc2.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/host/dwc2.c b/drivers/usb/host/dwc2.c
index e370d29ffc8e..f647461eabbb 100644
--- a/drivers/usb/host/dwc2.c
+++ b/drivers/usb/host/dwc2.c
@@ -752,6 +752,7 @@ int chunk_msg(struct usb_device *dev, unsigned long pipe, int *pid, int in,
 	uint32_t xfer_len;
 	uint32_t num_packets;
 	int stop_transfer = 0;
+	uint32_t dma_addr;
 
 	debug("%s: msg: pipe %lx pid %d in %d len %d\n", __func__, pipe, *pid,
 	      in, len);
@@ -792,7 +793,26 @@ int chunk_msg(struct usb_device *dev, unsigned long pipe, int *pid, int in,
 		if (!in)
 			memcpy(aligned_buffer, (char *)buffer + done, len);
 
-		writel((uint32_t)aligned_buffer, &hc_regs->hcdma);
+		dma_addr = (uint32_t)aligned_buffer;
+#if defined(CONFIG_BCM2836)
+		/*
+		 * BCM2836 bus addresses use the top 2 bits to determine
+		 * whether peripherals use or bypass the GPU L1 and L2 cache.
+		 * While this doesn't match the value the RPi Foundation
+		 * kernel uses, it does work in practice for U-Boot.
+		 */
+		dma_addr |= 0xc0000000;
+#elif defined(CONFIG_BCM2835)
+		/*
+		 * BCM2835 bus addresses use the top 2 bits to determine
+		 * whether peripherals use or bypass the GPU L1 and L2 cache.
+		 * This phys->virt mapping matches what the RPI Foundation's
+		 * kernel does; see the definition of _REAL_BUS_OFFSET in
+		 * arch/arm/mach-bcm2708/include/mach/memory.h.
+		 */
+		dma_addr |= 0x40000000;
+#endif
+		writel(dma_addr, &hc_regs->hcdma);
 
 		/* Set host channel enable after all other setup is complete. */
 		clrsetbits_le32(&hc_regs->hcchar, DWC2_HCCHAR_MULTICNT_MASK |
-- 
1.9.1

Marek Vasut

13 Mar

3:30 p.m.

On Friday, March 13, 2015 at 07:13:09 AM, Stephen Warren wrote:

...

BCM2835 bus addresses use the top 2 bits to determine whether peripherals use or bypass the GPU L1 and L2 cache. BCM2835-ARM-Peripherals.pdf states that:

0: L1 & L2 cached
4: L2 cache coherent (non allocating)
8: L2 cached only
c: Direct uncached.

Caches aren't working on BCM2xxx or what's the reason for this hack ? Or are these different (not on-CPU) caches we're talking about (yes, I did notice the GPU Lx cache stuff)?

Best regards, Marek Vasut

Stephen Warren

5:35 p.m.

On 03/13/2015 08:30 AM, Marek Vasut wrote:

...

On Friday, March 13, 2015 at 07:13:09 AM, Stephen Warren wrote:

...
BCM2835 bus addresses use the top 2 bits to determine whether peripherals use or bypass the GPU L1 and L2 cache. BCM2835-ARM-Peripherals.pdf states that:

0: L1 & L2 cached
4: L2 cache coherent (non allocating)
8: L2 cached only
c: Direct uncached.

Caches aren't working on BCM2xxx or what's the reason for this hack ? Or are these different (not on-CPU) caches we're talking about (yes, I did notice the GPU Lx cache stuff)?

Yes, the "GPU" has its own caches, entirely separate from the ARM core and at a different location in the system bus structure, and it seems as if at least some other peripherals other than GPU/graphics/VideoCore access DRAM via those caches too.

There are some brief details in BCM2835-ARM-Peripherals.pdf, although it isn't terribly clear.

Marek Vasut

7:13 p.m.

On Friday, March 13, 2015 at 05:35:53 PM, Stephen Warren wrote:

...

On 03/13/2015 08:30 AM, Marek Vasut wrote:

...
On Friday, March 13, 2015 at 07:13:09 AM, Stephen Warren wrote:

...
BCM2835 bus addresses use the top 2 bits to determine whether peripherals use or bypass the GPU L1 and L2 cache. BCM2835-ARM-Peripherals.pdf states that:

0: L1 & L2 cached
4: L2 cache coherent (non allocating)
8: L2 cached only
c: Direct uncached.

Caches aren't working on BCM2xxx or what's the reason for this hack ? Or are these different (not on-CPU) caches we're talking about (yes, I did notice the GPU Lx cache stuff)?

Yes, the "GPU" has its own caches, entirely separate from the ARM core and at a different location in the system bus structure, and it seems as if at least some other peripherals other than GPU/graphics/VideoCore access DRAM via those caches too.

There are some brief details in BCM2835-ARM-Peripherals.pdf, although it isn't terribly clear.

Thanks for clearing this up. I suspect there's no way to turn those caches off altogether, right ? But uh ... ew :(

Best regards, Marek Vasut

Stephen Warren

7:39 p.m.

On 03/13/2015 12:13 PM, Marek Vasut wrote:

...

On Friday, March 13, 2015 at 05:35:53 PM, Stephen Warren wrote:

...
On 03/13/2015 08:30 AM, Marek Vasut wrote:

...
On Friday, March 13, 2015 at 07:13:09 AM, Stephen Warren wrote:

...
BCM2835 bus addresses use the top 2 bits to determine whether peripherals use or bypass the GPU L1 and L2 cache. BCM2835-ARM-Peripherals.pdf states that:

0: L1 & L2 cached
4: L2 cache coherent (non allocating)
8: L2 cached only
c: Direct uncached.

Caches aren't working on BCM2xxx or what's the reason for this hack ? Or are these different (not on-CPU) caches we're talking about (yes, I did notice the GPU Lx cache stuff)?

Yes, the "GPU" has its own caches, entirely separate from the ARM core and at a different location in the system bus structure, and it seems as if at least some other peripherals other than GPU/graphics/VideoCore access DRAM via those caches too.

There are some brief details in BCM2835-ARM-Peripherals.pdf, although it isn't terribly clear.

Thanks for clearing this up. I suspect there's no way to turn those caches off altogether, right ? But uh ... ew :(

There may be. Search for disable_l2cache at http://elinux.org/RPiconfig. That option is read by the SoC's binary bootloader (which I believe 99%-100% runs on the VideoCore, not the ARM) and programmed before the ARM bootloader (U-Boot) is started.

The disadvantages of the option are:

* According to all descriptions of the option I've seen, it requires that SW that wishes to run with that option enabled must pass a different upper 2 bits of physical address to DMA engines. See for example the elinux.org link above and:

https://github.com/raspberrypi/linux/blob/rpi-3.18.y/arch/arm/mach-bcm2708/i...

https://github.com/raspberrypi/linux/blob/rpi-3.18.y/arch/arm/mach-bcm2708/K...

* It's a system-wide option without any runtime control that I'm aware of, and so would affect anything U-Boot boots such as Linux, so Linux would need to be modified too. I assume it would reduce graphics performance at least.

As such, I don't think we want to require that option.

Marek Vasut

7:49 p.m.

On Friday, March 13, 2015 at 07:39:08 PM, Stephen Warren wrote:

...

On 03/13/2015 12:13 PM, Marek Vasut wrote:

...
On Friday, March 13, 2015 at 05:35:53 PM, Stephen Warren wrote:

...
On 03/13/2015 08:30 AM, Marek Vasut wrote:

...
On Friday, March 13, 2015 at 07:13:09 AM, Stephen Warren wrote:

...
BCM2835 bus addresses use the top 2 bits to determine whether peripherals use or bypass the GPU L1 and L2 cache. BCM2835-ARM-Peripherals.pdf states that:

0: L1 & L2 cached
4: L2 cache coherent (non allocating)
8: L2 cached only
c: Direct uncached.

Caches aren't working on BCM2xxx or what's the reason for this hack ? Or are these different (not on-CPU) caches we're talking about (yes, I did notice the GPU Lx cache stuff)?

Yes, the "GPU" has its own caches, entirely separate from the ARM core and at a different location in the system bus structure, and it seems as if at least some other peripherals other than GPU/graphics/VideoCore access DRAM via those caches too.

There are some brief details in BCM2835-ARM-Peripherals.pdf, although it isn't terribly clear.

Thanks for clearing this up. I suspect there's no way to turn those caches off altogether, right ? But uh ... ew :(

There may be. Search for disable_l2cache at http://elinux.org/RPiconfig. That option is read by the SoC's binary bootloader (which I believe 99%-100% runs on the VideoCore, not the ARM) and programmed before the ARM bootloader (U-Boot) is started.

The disadvantages of the option are:

* According to all descriptions of the option I've seen, it requires that SW that wishes to run with that option enabled must pass a different upper 2 bits of physical address to DMA engines. See for example the elinux.org link above and:

https://github.com/raspberrypi/linux/blob/rpi-3.18.y/arch/arm/mach-bcm2708/include/mach/memory.h#L38

https://github.com/raspberrypi/linux/blob/rpi-3.18.y/arch/arm/mach-bcm2708/Kconfig#L43

* It's a system-wide option without any runtime control that I'm aware of, and so would affect anything U-Boot boots such as Linux, so Linux would need to be modified too. I assume it would reduce graphics performance at least.

As such, I don't think we want to require that option.

Agreed.

Best regards, Marek Vasut

Eric Anholt

6:02 p.m.

Stephen Warren swarren@wwwdotorg.org writes:

...

BCM2835 bus addresses use the top 2 bits to determine whether peripherals use or bypass the GPU L1 and L2 cache. BCM2835-ARM-Peripherals.pdf states that:

0: L1 & L2 cached
4: L2 cache coherent (non allocating)
8: L2 cached only
c: Direct uncached.

That document also states that "Software accessing RAM using the DMA engines must use bus addresses (base at 0xc0000000)". However, this appears to be incorrect, since it does not work in practice on the bcm2835 (although it does on bcm2836). With that value, "usb start" causes some EABI function to call raise(8), presumably due to corrupted USB IN data (the converse is true on bcm2836; a value of 4 causes signals). However, I haven't investigated the cause.

A value of 4 matches what the RPi Foundation's kernel uses; see the definition of _REAL_BUS_OFFSET in arch/arm/mach-bcm2708/include/mach/memory.h. With the code updated to implement a phys->bus translation by setting the top two bits of DWC2 DMA addresses to 4, USB keyboard support appears stable.

A similar change is made for bcm2836 (RPi 2). I can't justify this value since it doesn't match the RPi Foundation kernel. However, it does appear to work for the built-in USB Ethernet at least.

Ideally, the bcm2835 SoC support would provide some common function for any DMA-capable driver to call to perform the phys->bus translation, rather than placing ifdefs in each driver file. However, I can't find such a standard function in U-Boot.

Huh. Agreed that it seems like it should be 0xc top bits on both, but I guess whatever works.

It does seem like we ought to have some vtophys / vtobus functions.

Stephen Warren

15 Mar

5:04 p.m.

On 03/13/2015 12:13 AM, Stephen Warren wrote:

...

BCM2835 bus addresses use the top 2 bits to determine whether peripherals use or bypass the GPU L1 and L2 cache. BCM2835-ARM-Peripherals.pdf states that: ...

If you do end up applying this, the subject should say phys->bus not phys->virt.

Marek Vasut

7:20 p.m.

On Sunday, March 15, 2015 at 05:04:05 PM, Stephen Warren wrote:

...

On 03/13/2015 12:13 AM, Stephen Warren wrote:

...
BCM2835 bus addresses use the top 2 bits to determine whether peripherals use or bypass the GPU L1 and L2 cache. BCM2835-ARM-Peripherals.pdf states that: ...

If you do end up applying this, the subject should say phys->bus not phys->virt.

I'd say we should wait a bit until these patches stabilize a little more, don't you think so ?

Best regards, Marek Vasut

Stephen Warren

17 Mar

4:04 a.m.

On 03/15/2015 12:20 PM, Marek Vasut wrote:

...

On Sunday, March 15, 2015 at 05:04:05 PM, Stephen Warren wrote:

...
On 03/13/2015 12:13 AM, Stephen Warren wrote:

...
BCM2835 bus addresses use the top 2 bits to determine whether peripherals use or bypass the GPU L1 and L2 cache. BCM2835-ARM-Peripherals.pdf states that: ...

If you do end up applying this, the subject should say phys->bus not phys->virt.

I'd say we should wait a bit until these patches stabilize a little more, don't you think so ?

I can see the argument. That said, I don't expect anything much to "stabilize" about the patches; they appear to work!

It would be nice though if someone from the RPi Foundation could comment on the exact effect of the upper bus address bits, and why 0xc would work for RPi2 but 0x4 for the RPi 1. I wonder if the ARM cache status (enabled, disabled) interacts with the GPU cache enable in any way, e.g. burst vs. non-burst transactions on the bus or something? That's about the only reason I can see for the RPi Foundation kernel working with 0x4 bus addresses on both chips, but U-Boot needing something different on RPi2...

Dom, for reference, see: http://lists.denx.de/pipermail/u-boot/2015-March/207947.html http://lists.denx.de/pipermail/u-boot/2015-March/thread.html#207947

popcorn mix

3:57 p.m.

On 17/03/15 03:04, Stephen Warren wrote:

...

It would be nice though if someone from the RPi Foundation could comment on the exact effect of the upper bus address bits, and why 0xc would work for RPi2 but 0x4 for the RPi 1. I wonder if the ARM cache status (enabled, disabled) interacts with the GPU cache enable in any way, e.g. burst vs. non-burst transactions on the bus or something? That's about the only reason I can see for the RPi Foundation kernel working with 0x4 bus addresses on both chips, but U-Boot needing something different on RPi2...

Dom, for reference, see: http://lists.denx.de/pipermail/u-boot/2015-March/207947.html http://lists.denx.de/pipermail/u-boot/2015-March/thread.html#207947

First, remember that 2835 is a large GPU with a small ARM attached. On some platforms the ARM is not even used. The GPU boots first and may wake the arm. The GPU is the centre of the universe, and the ARM has to fit in.

Okay, I'll try to explain what goes on. Here are my definitions of some terms:

bus address: a VideoCore/GPU address. The lower 30 bits define the 1G of addressable memory. The top two bits define the caching alias.

physical address: An ARM side address given to the VC MMU. This is a 30 bit address space.

The GPU always uses bus addresses. GPU bus mastering peripherals (like DMA) use bus addresses. The ARM uses physical addresses.

VC MMU: A coarse MMU used by the arm for accessing GPU memory. Each page is 16M and there are 64 pages. This maps 30 bits of physical address to 32 bits of bus address. The setup of VC MMU is handled by the GPU and by default the mapping is:

2835: first 32 pages map physical addresses 0x00000000-0x1fffffff to bus addresses 0x40000000-0x5fffffff. The next page maps physical addresses 0x20000000 to 0x20ffffff to bus addresses 0x7e000000 to 0x7effffff.

2836: first 63 pages map physical addresses 0x00000000-0x3effffff to bus addresses 0xc0000000-0xfeffffff. The next page maps physical addresses 0x3f000000 to 0x3fffffff to bus addresses 0x7e000000 to 0x7effffff.

Bus address 0x7exxxxxx contains the peripherals. Note: the top 16M of sdram is not visible to the arm due to the mapping of the peripherals. The GPU and GPU peripherals (DMA) can see it as they use bus addresses.
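To make the default mapping concrete, here is a small model of it in C. This is only a sketch of the table described above, not code from the firmware or from U-Boot; the names vc_soc, vc_mmu_page_base, vc_mmu_translate and the VC_MMU_UNMAPPED marker are made up for illustration, and pages beyond those described in this mail are simply reported as unmapped.

```c
#include <assert.h>
#include <stdint.h>

#define VC_MMU_PAGE_SHIFT 24u          /* each page is 16M */
#define VC_MMU_NUM_PAGES  64u          /* 64 pages = 1G of physical space */
#define VC_MMU_UNMAPPED   0xffffffffu  /* marker used by this sketch only */

enum vc_soc { VC_SOC_2835, VC_SOC_2836 };

/* Bus base address of a 16M page under the default mapping. */
static uint32_t vc_mmu_page_base(enum vc_soc soc, uint32_t page)
{
	uint32_t ram_pages = (soc == VC_SOC_2835) ? 32u : 63u;
	uint32_t alias = (soc == VC_SOC_2835) ? 0x40000000u : 0xc0000000u;

	if (page < ram_pages)
		return alias + (page << VC_MMU_PAGE_SHIFT);
	if (page == ram_pages)
		return 0x7e000000u;    /* peripheral window */
	return VC_MMU_UNMAPPED;        /* not described in this thread */
}

/* Translate a 30-bit ARM physical address to a 32-bit bus address. */
static uint32_t vc_mmu_translate(enum vc_soc soc, uint32_t phys)
{
	uint32_t base;

	assert(phys < (VC_MMU_NUM_PAGES << VC_MMU_PAGE_SHIFT));

	base = vc_mmu_page_base(soc, phys >> VC_MMU_PAGE_SHIFT);
	if (base == VC_MMU_UNMAPPED)
		return VC_MMU_UNMAPPED;

	/* Keep the offset within the 16M page. */
	return base | (phys & ((1u << VC_MMU_PAGE_SHIFT) - 1u));
}
```

For example, physical 0x20200000 on 2835 and physical 0x3f200000 on 2836 both land on bus address 0x7e200000 in this model.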

The bus address cache alias bits are:

From the VideoCore processor:
0x0 L1 and L2 cache allocating and coherent
0x4 L1 non-allocating, but coherent. L2 allocating and coherent
0x8 L1 non-allocating, but coherent. L2 non-allocating, but coherent
0xc SDRAM alias. Cache is bypassed. Not L1 or L2 allocating or coherent

From the GPU peripherals (note: all peripherals bypass the L1 cache. The arm will see this view once through the VC MMU):
0x0 Do not use
0x4 L1 non-allocating, and incoherent. L2 allocating and coherent.
0x8 L1 non-allocating, and incoherent. L2 non-allocating, but coherent
0xc SDRAM alias. Cache is bypassed. Not L1 or L2 allocating or coherent

In general as long as VideoCore processor and GPU peripherals use the same alias everything works out. Mixing aliases requires flushing/invalidating for coherency and is generally avoided.
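For anyone who wants to check which alias an address uses, the split into alias bits and 30-bit offset can be sketched as below. The enum and function names are hypothetical (they are not taken from any existing header); only the bit layout comes from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* The four cache aliases, named by the top hex digit of the bus address. */
enum vc_alias {
	VC_ALIAS_0 = 0x0,
	VC_ALIAS_4 = 0x4,
	VC_ALIAS_8 = 0x8,
	VC_ALIAS_C = 0xc,
};

/* Extract the alias (bits 31:30) from a bus address. */
static inline enum vc_alias vc_bus_alias(uint32_t bus)
{
	return (enum vc_alias)((bus >> 28) & 0xcu);
}

/* Extract the 30-bit offset shared by all four aliases. */
static inline uint32_t vc_bus_offset(uint32_t bus)
{
	return bus & 0x3fffffffu;
}

/* Build a bus address from a 30-bit offset and an alias. */
static inline uint32_t vc_bus_make(uint32_t offset, enum vc_alias alias)
{
	assert((offset & 0xc0000000u) == 0);
	return offset | ((uint32_t)alias << 28);
}

/*
 * Nonzero if two bus addresses refer to the same memory through
 * different aliases, i.e. the case that needs flushing/invalidating.
 */
static inline int vc_alias_mismatch(uint32_t a, uint32_t b)
{
	return vc_bus_offset(a) == vc_bus_offset(b) &&
	       vc_bus_alias(a) != vc_bus_alias(b);
}
```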

So, on 2835 the ARM has a 16K L1 cache and no L2 cache. The GPU has a 128M L2 cache. The GPU's L2 cache is accessible from the ARM but it's not particularly close (i.e. not very fast). However mapping through the L2 allocating alias (0x4) was shown to be beneficial on 2835, so that is the alias we use.

The situation is different on 2836. The ARM has a 32K L1 cache and a 512M integrated/fast L2 cache. Additionally going through the smaller/slower GPU L2 is bad for performance. So, we map through the SDRAM alias (0xc) and avoid the GPU L2 cache.

So, what does this mean? In general if you don't use GPU peripherals or communicate with the GPU, you only care about physical addresses and it makes no difference what bus address is actually being used. The ARM just sees 1G of physical space that is always coherent. No flushing of GPU L2 cache is ever required. No need to know about aliases.

However if you do want to use GPU bus mastering peripherals (like DMA), or you communicate with the GPU (e.g. using the mailbox interface) you do need to distinguish physical and bus addresses, and you must use the correct alias.

So, on 2835 you convert from physical to bus address with
bus_address = 0x40000000 | physical_address;
And on 2836 you convert from physical to bus address with
bus_address = 0xC0000000 | physical_address;

(Note: you can get these offsets from device tree. See: https://github.com/raspberrypi/userland/commit/3b81b91c18ff19f97033e146a9f32...)

So, when using GPU DMA, the addresses used for SCB, SA (source address), DA (dest address) must never be zero. They should be bus addresses and therefore 0x4 or 0xc aliases. However the difference between a 0x0 alias and a 0x4 alias is small. Using 0x0 is wrong, may be incoherent, and may trigger exceptions on the GPU. But you may get away with it. The difference between a 0x0 alias and a 0xC alias is much larger. There is now 128K of incoherent data you may hit. You are less likely to get away with getting this wrong.

So, I don't believe there is any issue with:

...

ARM cache status (enabled, disabled) interacts with the GPU cache enable in any way, e.g. burst vs. non-burst transactions on the bus or something

but I would guess there may be a current bug/misunderstanding on Pi1 uboot that happens to be more fatal on Pi2.

Stephen Warren

6:29 p.m.

On 03/17/2015 08:57 AM, popcorn mix wrote:

...

On 17/03/15 03:04, Stephen Warren wrote:

...
It would be nice though if someone from the RPi Foundation could comment on the exact effect of the upper bus address bits, and why 0xc would work for RPi2 but 0x4 for the RPi 1. I wonder if the ARM cache status (enabled, disabled) interacts with the GPU cache enable in any way, e.g. burst vs. non-burst transactions on the bus or something? That's about the only reason I can see for the RPi Foundation kernel working with 0x4 bus addresses on both chips, but U-Boot needing something different on RPi2...

Dom, for reference, see: http://lists.denx.de/pipermail/u-boot/2015-March/207947.html http://lists.denx.de/pipermail/u-boot/2015-March/thread.html#207947

Thanks for the great explanation. I'll have to bookmark/archive it:-)

...

First, remember that 2835 is a large GPU with a small ARM attached. On some platforms the ARM is not even used. The GPU boots first and may wake the arm. The GPU is the centre of the universe, and the ARM has to fit in.

Okay, I'll try to explain what goes on. Here are my definitions of some terms:

bus address: a VideoCore/GPU address. The lower 30 bits define the 1G of addressable memory. The top two bits define the caching alias.

physical address: An ARM side address given to the VC MMU. This is a 30 bit address space.

The GPU always uses bus addresses. GPU bus mastering peripherals (like DMA) use bus addresses. The ARM uses physical addresses.

VC MMU: A coarse MMU used by the arm for accessing GPU memory. Each page is 16M and there are 64 pages. This maps 30-bits of physical address to 32-bits of bus address.

The setup of VC MMU is handled by the GPU and by default the mapping is:

2835: first 32 pages map physical addresses 0x00000000-0x1fffffff to bus addresses 0x40000000-0x5fffffff. The next page maps physical addresses 0x20000000 to 0x20ffffff to bus addresses 0x7e000000 to 0x7effffff.

2836: first 63 pages map physical addresses 0x00000000-0x3effffff to bus addresses 0xc0000000-0xfeffffff. The next page maps physical addresses 0x3f000000 to 0x3fffffff to bus addresses 0x7e000000 to 0x7effffff.

OK, this explains why in U-Boot, we need to OR in 0x40000000 on bcm2835 and 0xc0000000 on bcm2836; that matches the VC MMU setup.

I guess we need to fix the U-Boot mailbox driver too, and many things in the upstream RPi kernel.

I have two more questions:

Do the RPi 1 and RPi 2 use different kernel binaries in the RPi Foundation's images? I'd assumed there was a single unified binary which supported both. The reason I ask is that I see:

...

https://github.com/raspberrypi/linux/blob/rpi-3.18.y/arch/arm/mach-bcm2708/i...

...

#ifdef CONFIG_BCM2708_NOL2CACHE
#define _REAL_BUS_OFFSET UL(0xC0000000) /* don't use L1 or L2 caches */
#else
#define _REAL_BUS_OFFSET UL(0x40000000) /* use L2 cache */
#endif

That's identical in the mach-bcm2709 version too. However, arch/arm/mach-bcm270[89]/Kconfig's entry for that config option:

...

config BCM2708_NOL2CACHE
	bool "Videocore L2 cache disable"
	depends on MACH_BCM2709
	default y
	help
	  Do not allow ARM to use GPU's L2 cache. Requires disable_l2cache in config.txt.

Has "default n" for the bcm2708 version and "default y" for the bcm2709 version. If I'd noticed that difference in default value, it would have been a big clue that what I proposed in the U-Boot patch was correct! Anyway, this implies that there are separate kernel binaries for the RPi 1 and RPi 2, since otherwise those default values wouldn't work.

I assume the SDHCI controller (RPi SD card, CM eMMC) is affected by this just as much; we need to use bus addresses not ARM physical addresses when programming any DMA there?

Perhaps this would explain why I had issues with the eMMC on the CM (I think only in the kernel though, whereas U-Boot may have been fine; I'll have to check)

...

So, on 2835 the ARM has a 16K L1 cache and no L2 cache. The GPU has a 128M L2 cache. The GPU's L2 cache is accessible from the ARM but it's not particularly close (i.e. not very fast). However mapping through the L2 allocating alias (0x4) was shown to be beneficial on 2835, so that is the alias we use.

The situation is different on 2836. The ARM has a 32K L1 cache and a 512M integrated/fast L2 cache. Additionally going through the smaller/slower GPU L2 is bad for performance. So, we map through the SDRAM alias (0xc) and avoid the GPU L2 cache.

I assume 128M and 512M there should be 128K and 512K?

popcorn mix

6:53 p.m.

On 17/03/15 17:29, Stephen Warren wrote:

...

Do the RPi 1 and RPi 2 use different kernel binaries in the RPi Foundation's images? I'd assumed there was a single unified binary which supported both. The reason I ask is that I see:

We ship separate kernel binaries (kernel.img for 2835 and kernel7.img for 2836). kernel.img is built from bcmrpi_defconfig, and kernel7.img is built from bcm2709_defconfig

A single unified binary would sure be nice, but I think we have too many non-device-tree drivers in our kernel and not enough experience to make this happen easily. It's certainly a desirable goal (as is moving closer to the upstream mach-2835 kernel).

...

I assume the SDHCI controller (RPi SD card, CM eMMC) is affected by this just as much; we need to use bus addresses not ARM physical addresses when programming any DMA there?

Yes. Any address given to the DMA controller should be a bus address. Similarly any address exchanged with the GPU (e.g. framebuffer address from mailbox interface) should be a bus address.
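Going the other way (for instance, turning a framebuffer bus address returned over the mailbox into something the ARM can dereference) amounts to dropping the alias bits. A minimal sketch, assuming the address lies in SDRAM rather than in the 0x7exxxxxx peripheral window; rpi_bus_to_phys and the macros are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

#define VC_ALIAS_MASK        0xc0000000u  /* top two bits: cache alias */
#define VC_PERIPH_BUS_BASE   0x7e000000u  /* peripheral window on the bus */
#define VC_PERIPH_BUS_MASK   0xff000000u  /* window is 16M wide */

/*
 * Hypothetical helper: convert an SDRAM bus address (any alias) into
 * the ARM physical address of the same memory. Peripheral addresses
 * map differently per SoC, so they are rejected here.
 */
static inline uint32_t rpi_bus_to_phys(uint32_t bus)
{
	assert((bus & VC_PERIPH_BUS_MASK) != VC_PERIPH_BUS_BASE);

	return bus & ~VC_ALIAS_MASK;
}
```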

...

Perhaps this would explain why I had issues with the eMMC on the CM (I think only in the kernel though, whereas U-Boot may have been fine; I'll have to check)

Using physical addresses when bus addresses are required can almost work, but with intermittent failure cases, so yes that sounds possible.

...

I assume 128M and 512M there should be 128K and 512K?

Yes, quite right.

Stephen Warren

15 Mar

5:51 p.m.

On 03/13/2015 12:13 AM, Stephen Warren wrote:

...

BCM2835 bus addresses use the top 2 bits to determine whether peripherals use or bypass the GPU L1 and L2 cache. BCM2835-ARM-Peripherals.pdf states that:

0: L1 & L2 cached
4: L2 cache coherent (non allocating)
8: L2 cached only
c: Direct uncached.

That document also states that "Software accessing RAM using the DMA engines must use bus addresses (base at 0xc0000000)". However, this appears to be incorrect, since it does not work in practice on the bcm2835 (although it does on bcm2836). With that value, "usb start" causes some EABI function to call raise(8), presumably due to corrupted USB IN data (the converse is true on bcm2836; a value of 4 causes signals). However, I haven't investigated the cause.

I've confirmed that the raise(8) calls are due to corrupted USB IN data; the maxpacketsize field in the device descriptor is getting corrupted to 0, which in turn surely causes division by zero when calculating the number of packets in a transfer, for example.
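The failure mode is easy to reproduce in isolation: a packet-count calculation divides by the max packet size, so a descriptor corrupted to 0 traps. The sketch below is not the driver's actual code; calc_num_packets is a hypothetical helper that shows the same round-up division with a guard in front of it, and the "zero-length transfer still takes one packet" rule is an assumption about how a host controller driver would typically handle it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of packets needed for a transfer.
 * Returns 0 on success, -1 if max_packet is 0 (e.g. a corrupted
 * device descriptor), instead of dividing by zero.
 */
static int calc_num_packets(uint32_t xfer_len, uint32_t max_packet,
			    uint32_t *num_packets)
{
	assert(num_packets != NULL);

	if (max_packet == 0)
		return -1;

	if (xfer_len == 0) {
		/* Assumption: a zero-length transfer still uses one packet. */
		*num_packets = 1;
		return 0;
	}

	/* Round up without risking overflow in xfer_len + max_packet. */
	*num_packets = xfer_len / max_packet + (xfer_len % max_packet != 0);
	return 0;
}
```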

Marek Vasut

7:20 p.m.

On Sunday, March 15, 2015 at 05:51:26 PM, Stephen Warren wrote:

...

On 03/13/2015 12:13 AM, Stephen Warren wrote:

...
BCM2835 bus addresses use the top 2 bits to determine whether peripherals use or bypass the GPU L1 and L2 cache. BCM2835-ARM-Peripherals.pdf states that:

0: L1 & L2 cached
4: L2 cache coherent (non allocating)
8: L2 cached only
c: Direct uncached.

That document also states that "Software accessing RAM using the DMA engines must use bus addresses (base at 0xc0000000)". However, this appears to be incorrect, since it does not work in practice on the bcm2835 (although it does on bcm2836). With that value, "usb start" causes some EABI function to call raise(8), presumably due to corrupted USB IN data (the converse is true on bcm2836; a value of 4 causes signals). However, I haven't investigated the cause.

I've confirmed that the raise(8) calls are due to corrupted USB IN data; the maxpacketsize field in the device descriptor is getting corrupted to 0, which in turn surely causes division by zero when calculating the number of packets in a transfer, for example.

Nice progress :)

Best regards, Marek Vasut
