[U-Boot-Users] pci memory booting on ppc460

Hello, I saw a reference to PCI booting in the docs, but could not find enough detail. We would like to use PCI memory space booting on an upcoming design with the PPC460EX. Does Uboot currently support that mode of booting? Is it as easy as booting from a flash? We are developing our app with the canyonlands board, but we won't be able to try the PCI boot until we have our own board finished.
Thanks -ame

Hi Ame,
U-Boot is a bootloader. What you are describing here circumvents needing a bootloader.
I use an MPC8349EA Freescale processor, and it also has an option to boot over PCI. However, I plan to use U-Boot on Flash to boot the board. I'll tell you why ... for my hardware anyway, your reasons may end up being similar.
If you want to boot your board over PCI, then the board will most likely be a peripheral board (a host would need to boot and set up bridges, so you would have a hard time booting a host through a bridge that is not configured). The host CPU would hold the image of the kernel that you were planning to boot onto the PCI peripheral board. However, your host would need to perform all of the tasks normally performed by the bootloader: set up the memory map, memory controllers, and peripherals that Linux expects to find configured. Even trickier is setting up the kernel boot command line arguments and device tree. My guess would be that you would have to hardwire that info into your kernel image, or add a small bootloader to the kernel image to set up the arguments to the kernel proper. There is also the issue that the host CPU is running an OS with its MMU enabled, so the kernel image that it can present to the board over the PCI bus is fragmented across physical pages. The only way to make this image linear would be to reserve a chunk of memory when the OS boots, and load the kernel image there for the PCI board to boot from.
There's a lot of code in there that you would have to maintain on your lonesome. For the price of a flash device, I would recommend using U-Boot as the bootloader in Flash on your board. The PCI interface can be used to transfer a kernel image, and then that image can be booted by the bootloader. Note that this transfer can account for the fragmented image on the host by using scatter-gather DMA into the linear address space of the DDR memory on the board (since U-Boot is running with a flat memory map).
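To make the scatter-gather idea concrete, here is a minimal sketch in C. It assumes the host driver already knows the PCI bus address of each 4 KiB page backing the image. The descriptor layout (`struct sg_desc`) and the helper `build_sg_list()` are hypothetical. A real DMA engine, such as the MPC8349EA DMA controller, defines its own descriptor format in its reference manual.

```c
#include <stddef.h>
#include <stdint.h>

#define HOST_PAGE_SIZE 4096u

/* Hypothetical DMA descriptor: copy len bytes from a host PCI address
 * to a target-local DDR address. Real controllers define their own layout. */
struct sg_desc {
    uint64_t src_pci;   /* PCI bus address of one host page */
    uint32_t dst_local; /* target DDR address (flat map under U-Boot) */
    uint32_t len;       /* bytes to copy */
};

/*
 * Build one descriptor per host page so that a fragmented host buffer is
 * written into a single linear region starting at dst_base on the target.
 * pages[i] is the PCI address of the i-th 4 KiB page of the image.
 * Returns the number of descriptors written, or 0 if max_desc is too small
 * (or image_len is 0).
 */
size_t build_sg_list(const uint64_t *pages, uint32_t image_len,
                     uint32_t dst_base, struct sg_desc *out, size_t max_desc)
{
    size_t n = (image_len + HOST_PAGE_SIZE - 1) / HOST_PAGE_SIZE;
    if (n > max_desc)
        return 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t offset = (uint32_t)(i * HOST_PAGE_SIZE);
        uint32_t remaining = image_len - offset;
        out[i].src_pci = pages[i];           /* scattered on the host ... */
        out[i].dst_local = dst_base + offset; /* ... linear on the target */
        out[i].len = remaining < HOST_PAGE_SIZE ? remaining : HOST_PAGE_SIZE;
    }
    return n;
}
```

A refinement would merge runs of physically adjacent host pages into a single descriptor. The one-page-per-descriptor form keeps the mapping easy to follow.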
Save yourself a lot of trouble, and no community support, by doing things the standard way (a good sign is that you're already asking questions).
Note that if your processor has an option to keep the core in reset after reset is deasserted, make sure you implement that feature (on the MPC8349EA it's controlled by the pin strapping that selects the reset configuration word source). This allows you to enable the chip, including the PCI interface, while the core is held in reset and therefore does not try to boot. The MPC8349EA has default windows over PCI that then allow a host CPU to manipulate all the memory-mapped registers visible to the PowerPC core. I've been using this feature of the MPC8349EA processor to test my board over PCI from an x86 host. The DDR controller can be enabled/disabled, the clocks swept, and data patterns DMAed to/from PCI to confirm the memory controller configuration prior to attempting a U-Boot port.
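Here is a minimal sketch of the kind of data-pattern check described above. It assumes the target's DDR is visible to the host as a memory-mapped window (`mem` below) once the controller has been configured. The two routines are the classic walking-ones data-bus test and an address-in-address test. They are illustrative only, not the code Dave used.

```c
#include <stddef.h>
#include <stdint.h>

/* Walking-ones test on one word: catches stuck or shorted data lines.
 * Returns 0 on success, or the first pattern that read back wrong. */
uint32_t walking_ones(volatile uint32_t *word)
{
    for (uint32_t pattern = 1; pattern != 0; pattern <<= 1) {
        *word = pattern;
        if (*word != pattern)
            return pattern;
    }
    return 0;
}

/* Address-in-address test: write each word's index into it, then read
 * everything back. A stuck or shorted address line makes two indices hit
 * the same word, so the earlier value is overwritten and the readback
 * mismatches. Returns the number of mismatching words. */
size_t address_in_address(volatile uint32_t *mem, size_t nwords)
{
    size_t errors = 0;
    for (size_t i = 0; i < nwords; i++)
        mem[i] = (uint32_t)i;
    for (size_t i = 0; i < nwords; i++)
        if (mem[i] != (uint32_t)i)
            errors++;
    return errors;
}
```

On real hardware the window should be mapped uncached. Keep in mind that PCI writes are posted, so only the readback confirms the data reached DDR.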
Hope this helps.
Cheers, Dave

In message 480E32A6.8080908@ovro.caltech.edu you wrote:
...
from a flash? We are developing our app with the canyonlands board, but we won't be able to try the PCI boot until we have our own board finished.
U-Boot is a bootloader. What you are describing here circumvents needing a bootloader.
This is not necessarily true. There may be many reasons why you still want to run a boot loader (like U-Boot) on the PCI device's local processor.
If you want to boot your board over PCI, then the board will most likely be a peripheral board (a host would need to boot and set up bridges, so you would have a hard time booting a host through a bridge that is not configured). The host CPU would hold the image of the kernel that you were planning to boot onto the PCI peripheral board. However, your host would need to perform all of the tasks normally performed by the bootloader: set up the memory map, memory controllers, and peripherals that Linux expects to find configured. Even trickier is setting up the kernel boot command line arguments and device tree. My guess would be that you would have to hardwire that info into your kernel image, or add a small bootloader to the kernel image to set up the arguments to the kernel proper.
You see? There are plenty of good reasons to have a well-known, powerful boot loader available :-)
Save yourself a lot of trouble, and no community support,
Who says "no community support"? This is a perfectly legal way of using U-Boot, and where we can, we will help.
Best regards,
Wolfgang Denk

Hi Wolfgang,
This is not necessarily true. There may be many reasons why you still want to run a boot loader (like U-Boot) on the PCI device's local processor.
Fair enough.
You see? There are plenty of good reasons to have a well-known, powerful boot loader available :-)
Save yourself a lot of trouble, and no community support,
Who says "no community support"? This is a perfectly legal way of using U-Boot, and where we can, we will help.
The 'no community support' referred to maintaining out-of-tree host-side code that performed all the board-specific setup before loading the kernel, i.e., with no U-Boot interaction at all. Sorry if that was misleading.
Ok, so given this is a perfectly legal way of booting U-Boot, would you (Wolfgang) recommend it?
Let's take the MPC8349EA as the example here. The processor reset configuration can be set up for PCI boot; however, the processor will come out of reset with PCI outbound translation windows configured to fetch from PCI address 0 (I think). A boot-sequencer EEPROM on the PowerPC would not help (for setting up the outbound window), since it won't know the PCI address of the U-Boot image (let's assume the host is an x86 running Linux, for example's sake). So the x86 host would have to set up the PCI outbound translation window to point to the U-Boot image. But then you have the issue that the host's MMU will be running, so the linear U-Boot image that the host loaded into memory will in fact consist of scattered 4K pages of physical addresses as viewed over the PCI bus, i.e., the U-Boot image will be fragmented. From the perspective of the PowerPC, the PCI boot sequence expects a linear sequence of addresses, and in this particular example that will not normally be the case. The host could arrange for a block of its DDR to be reserved (e.g., using the mem= kernel argument on boot), load the U-Boot image into that block, and then point the PowerPC to boot from the PCI address of that window.
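The reservation trick comes down to a little arithmetic. This sketch assumes a host whose RAM is one contiguous bank starting at physical address 0, so that booting with mem= leaves everything above the limit unused by Linux. Machines with memory holes (most x86 boxes) would need the memmap= parameter or a check of the actual memory map instead. `plan_reservation()` is a hypothetical helper.

```c
#include <stdint.h>

struct reservation {
    uint64_t base;   /* first physical byte Linux will not use */
    uint64_t size;   /* bytes left over for the U-Boot image */
};

/* Given installed RAM and the value passed as mem=, work out the reserved
 * window and check that the image fits. Assumes a single RAM bank at 0.
 * Returns 0 on success, -1 if nothing is reserved or the image is too big. */
int plan_reservation(uint64_t ram_bytes, uint64_t mem_limit,
                     uint64_t image_bytes, struct reservation *r)
{
    if (mem_limit >= ram_bytes)
        return -1;                 /* mem= leaves nothing unused */
    r->base = mem_limit;
    r->size = ram_bytes - mem_limit;
    return image_bytes <= r->size ? 0 : -1;
}
```

For example, a host with 512 MiB booted with mem=480M leaves 32 MiB starting at physical 0x1E000000. That block is physically contiguous, unlike a user-space buffer, which could be scattered over 4 KiB pages.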
So, it's not impossible, but it does seem like more work than using Flash to boot the board. But of course there may be reasons to want to boot over PCI ...
I guess once you do this, there isn't really much difference between the U-Boot image that would have been programmed to Flash versus the one fetched over PCI, since U-Boot can still perform its stack-in-cache trick, setup DDR etc.
Interesting ... if there is work done on this, I can test any patches on the MPC8349EA.
Cheers, Dave

On Tuesday 22 April 2008, David Hawkins wrote:
So, it's not impossible, but it does seem like more work than using Flash to boot the board.
Full ACK from me. I would always recommend using a small NOR FLASH with a full-blown U-Boot that boots from NOR and does the full DDR2 setup, etc. All this will be hard to implement with all the restrictions of booting from PCI (I have never done this myself, but I suspect it will be just as limited as booting from NAND).
But of course there may be reasons to want to boot over PCI ...
Yes, what is the main reason you want to do this? As mentioned above, I recommend booting from a small NOR FLASH (512kB) and "waiting" in U-Boot for the PCI host to provide the OS/application image via PCI.
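The "wait in U-Boot" approach needs some agreement between host and target about when an image is ready. Here is one possible sketch. It assumes a small hypothetical mailbox structure at a fixed offset in target DDR, which the host fills in after transferring the image. The target checks a magic value and verifies a CRC-32 (the same IEEE polynomial that U-Boot's legacy image headers use) before booting. The layout, magic value, and names are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

#define MBOX_MAGIC 0x50434942u   /* ASCII "PCIB", hypothetical */

/* Hypothetical mailbox written by the host after the image transfer. */
struct pci_boot_mbox {
    uint32_t magic;     /* MBOX_MAGIC once the image is complete */
    uint32_t load_addr; /* where in target DDR the image was placed */
    uint32_t length;    /* image length in bytes */
    uint32_t crc;       /* CRC-32 of the image */
};

/* Bitwise CRC-32, reflected polynomial 0xEDB88320 (as in zlib). */
uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Returns 1 if the mailbox describes a complete, intact image. On the
 * target, `image` would be derived from m->load_addr; it is passed in
 * separately here so the check can run anywhere. */
int mbox_image_ok(const volatile struct pci_boot_mbox *m, const uint8_t *image)
{
    if (m->magic != MBOX_MAGIC)
        return 0;
    return crc32_ieee(image, m->length) == m->crc;
}
```

With a little-endian x86 host and a big-endian PowerPC target, both sides also have to agree on the byte order of the mailbox fields.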
Best regards, Stefan
===================================================================== DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: +49-8142-66989-0 Fax: +49-8142-66989-80 Email: office@denx.de =====================================================================

On Wed, Apr 23, 2008 at 10:26:51AM +0200, Stefan Roese wrote:
Full ACK from me. I would always recommend using a small NOR FLASH with a full-blown U-Boot that boots from NOR and does the full DDR2 setup, etc. All this will be hard to implement with all the restrictions of booting from PCI (I have never done this myself, but I suspect it will be just as limited as booting from NAND).
But of course there may be reasons to want to boot over PCI ...
Yes, what is the main reason you want to do this? As mentioned above, I recommend booting from a small NOR FLASH (512kB) and "waiting" in U-Boot for the PCI host to provide the OS/application image via PCI.
Ok, here is what I want to do. I have a board that has several PPC460s on it. Each has its own DDR2. One of them is the master and boots up normally today with U-Boot, launches Linux, etc. What I want to do is have that master load the flash binary image into memory and then lock that memory down (via whatever kernel calls are required). Then I want to configure each slave's PCI BAR registers so that it can read from this main memory, and then tell the slaves to go. My thought was that it would appear to the slaves that they are just loading from a regular flash, and this image would contain U-Boot, the kernel, and the root filesystem; basically everything a flash normally would have. It sounded reasonable at the time. We want to avoid a flash on every CPU so that we don't have to reflash that many CPUs every time, and because the extra flash parts would take up more board space. Thanks
- ame

Hi Ame,
Yes, what is the main reason you want to do this?
Here's a few ideas then.
1) Use the main host to set up DDR on each of the targets, copy a U-Boot image to each target, and then enable the targets to boot from DDR.
The target U-Boot versions would have to skip all DDR initialization, since they are running from DDR.
I believe this code would come under ...
http://www.denx.de/wiki/view/DULG/CanUBootBeConfiguredSuchThatItCanBeStarted...
Which contains the warning:
'But it is difficult, unsupported, and fraught with peril.'
Wolfgang and Stefan could probably comment more on this.
2) Configure an inbound translation window on the host to point a PCI window to its *flash*.
Configure the outbound translation windows on the targets to point to the host inbound window.
Each target would then boot from the same flash as the host. Assuming there was some I/O to distinguish the targets from the host, the code could determine what to do at runtime.
This has the disadvantage that all boards are booting from potentially slower flash memory, but that may not matter.
3) Configure the host DDR memory such that the OS does not use all installed memory, e.g., reserve enough memory for a U-Boot image.
Set up the host inbound translation window to point to that linear DDR region, and have the host copy the image there.
Set up the targets as in (2) and let them boot.
The nice thing about (2) and (3) is that the target processors are effectively in the 'virgin' state that flash booting expects, so the modifications to U-Boot required to support the booting scheme would be minimal.
4) Another option would be to place SRAM on each of the other processors, and copy the U-Boot image there. That way flash on the targets is replaced with SRAM.
However, this would require a hardware change, and is not much of an improvement.
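For ideas (2) and (3), the host has to open an inbound window onto the region holding the image. PowerQUICC-style translation windows are commonly limited to power-of-two sizes, with the base aligned to the window size. Here that rule and the 4 KiB minimum are assumptions; check the MPC8349EA reference manual for the exact constraints. A small helper, the hypothetical `inbound_window_size()`, picks the smallest window that covers the image and checks the base alignment.

```c
#include <stdint.h>

#define MIN_WINDOW 4096u  /* assumed minimum window size */

/* Smallest power-of-two window >= len (and >= MIN_WINDOW).
 * Returns 0 if no such 64-bit size exists. */
uint64_t inbound_window_size(uint64_t len)
{
    uint64_t size = MIN_WINDOW;
    while (size < len) {
        if (size > (UINT64_MAX >> 1))
            return 0;
        size <<= 1;
    }
    return size;
}

/* A power-of-two window can only start at a multiple of its size. */
int window_base_ok(uint64_t base, uint64_t size)
{
    return size != 0 && (base & (size - 1)) == 0;
}
```

Under these assumptions, a 300 KiB U-Boot image needs a 512 KiB window, so the reserved DDR region in (3) should start on a 512 KiB boundary.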
As for the root filesystem, there's more to think about there :)
I will soon have a similar situation to yours. I'll have a master CPU (an x86 host CPU in a compact PCI crate), and 15 to 20 boards in peripheral slots (MPC8349EA processors). Each processor has a Gigabit Ethernet port, so during development, I'll just use an NFS-mounted rootfs. However, once I deploy about 8 crates worth of this hardware, I don't want to have lots of GbE cables and switches. At that point I plan to change to a scheme where I create a virtual network interface over PCI. If there is stuff that needs to be stored locally, then I'll set up a RAM disk and rsync the contents from an NFS mount at boot time. The only open question with this approach is whether a virtual-network-over-PCI driver already exists. The MPC8349EA development kit comes with a driver that sounds like it does some of this, so I'll start with that.
The scheme I envisage is; the peripheral boards boot U-Boot, and that U-Boot port has a 'terminal over PCI' and 'ethernet over PCI' driver built in. Back on the host CPU, I can get to the terminals via /dev/ttyPCI# nodes, and get to each board via ethernet connections, where the slot numbers of the boards define unique MAC addresses. The U-Boots on each board can then tftp a kernel and boot. The kernel command line will use an NFS path that comes back through the x86 host CPU.
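The slot-number-to-MAC idea could look like the following sketch. It assumes the addresses are locally administered (bit 1 of the first octet set, bit 0 clear for unicast), so that no vendor OUI is needed. The prefix bytes (ASCII "PCI"), the inclusion of a crate number, and `mac_from_slot()` are all hypothetical choices, not part of any existing driver.

```c
#include <stdint.h>
#include <stdio.h>

/* Build a locally administered unicast MAC: 02:50:43:49:<crate>:<slot>. */
void mac_from_slot(uint8_t crate, uint8_t slot, uint8_t mac[6])
{
    mac[0] = 0x02;   /* locally administered (bit 1 set), unicast (bit 0 clear) */
    mac[1] = 0x50;   /* 'P' : hypothetical fixed prefix */
    mac[2] = 0x43;   /* 'C' */
    mac[3] = 0x49;   /* 'I' */
    mac[4] = crate;  /* keeps boards in different crates distinct */
    mac[5] = slot;   /* cPCI slot number */
}

/* Format as the usual colon-separated string (17 chars + NUL). */
void format_mac(const uint8_t mac[6], char out[18])
{
    snprintf(out, 18, "%02x:%02x:%02x:%02x:%02x:%02x",
             mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
}
```

For example, slot 12 in crate 3 gives 02:50:43:49:03:0c. On the U-Boot side, a string like this could be placed in the ethaddr environment variable.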
None of this is implemented, but it's only software ... right :)
Cheers, Dave

On Wednesday 23 April 2008, David Hawkins wrote:
Here's a few ideas then.
- Use the main host to set up DDR on each of the targets, copy a U-Boot image to each target, and then enable the targets to boot from DDR.
I don't see how this could be done. The SDRAM has to be setup by the CPU itself and can't be setup via another PCI host.
The target U-Boot versions would have to skip all DDR initialization, since they are running from DDR. I believe this code would come under ...
http://www.denx.de/wiki/view/DULG/CanUBootBeConfiguredSuchThatItCanBeStartedInRAM
Which contains the warning: 'But it is difficult, unsupported, and fraught with peril.' Wolfgang and Stefan could probably comment more on this.
Configure an inbound translation window on the host to point a PCI window to its *flash*.
Configure the outbound translation windows on the targets to point to the host inbound window.
This is probably only possible from the CPU itself too. And where should the code come from to do this?
Each target would then boot from the same flash as the host. Assuming there was some I/O to distinguish the targets from the host, the code could determine what to do at runtime. This has the disadvantage that all boards are booting from potentially slower flash memory, but that may not matter.
Configure the host DDR memory such that the OS does not use all installed memory, e.g., reserve enough memory for a U-Boot image.
Set up the host inbound translation window to point to that linear DDR region, and have the host copy the image there.
Set up the targets as in (2) and let them boot.
The nice thing about (2) and (3) is that the target processors are effectively in the 'virgin' state that flash booting expects, so the modifications to U-Boot required to support the booting scheme would be minimal.
I don't think this is possible. To configure the translation windows, code is needed on all PPC's.
Another option would be to place SRAM on each of the other processors, and copy the U-Boot image there. That way flash on the targets is replaced with SRAM.
However, this would require a hardware change, and is not much of an improvement.
SRAM or FLASH, no big difference here. The only advantage is that there is no need to update all the target FLASHes.
Another idea would be to put only one NOR FLASH for all target PPC's on the board. Then you "release" (put out of reset) all target PPC's from the host PPC in a sequential order. This way there should be no problem sharing the NOR FLASH.
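Stefan's sequential-release idea amounts to a simple loop on the host. This sketch makes two assumptions, both hypothetical. The first is a board register (for example in a CPLD) where bit n holds target CPU n in reset. The second is a per-CPU status word that each target's U-Boot sets once it is running from its own DDR and no longer touches the shared flash. A poll limit keeps one dead target from stalling the rest.

```c
#include <stdint.h>

#define BOOT_DONE 0x600DB007u  /* hypothetical "done with flash" marker */

/*
 * reset_ctl: hypothetical board register; bit n = 1 holds CPU n in reset.
 * status:    hypothetical per-CPU status words; CPU n writes BOOT_DONE
 *            once it has relocated out of the shared NOR flash.
 * Releases CPUs one at a time so only one fetches from flash at any
 * moment. Returns a bitmask of CPUs that never reported BOOT_DONE.
 */
uint32_t release_sequentially(volatile uint32_t *reset_ctl,
                              volatile const uint32_t *status,
                              unsigned ncpus, unsigned long max_polls)
{
    uint32_t failed = 0;
    for (unsigned cpu = 0; cpu < ncpus && cpu < 32; cpu++) {
        *reset_ctl &= ~(1u << cpu);            /* let this CPU run */
        unsigned long polls = 0;
        while (status[cpu] != BOOT_DONE) {     /* real code would delay here */
            if (++polls >= max_polls) {
                *reset_ctl |= 1u << cpu;       /* park it in reset again */
                failed |= 1u << cpu;
                break;
            }
        }
    }
    return failed;
}
```

A CPU that times out is put back into reset before the next one is released, so a hung target cannot keep driving the shared flash bus.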
Best regards, Stefan

Hi Stefan,
Here's a few ideas then.
Sorry, all of the ideas were based on the MPC8349EA, which I am currently using. I did not look at the ppc460 documents, as I was just putting ideas 'out there'.
When the MPC8349EA is booted, you can enable the peripherals while leaving the core in reset. The IMMRs are memory-mapped to PCI via a 1MB BAR, and there are several other BARs set up as 1MB windows. The IMMR registers can be accessed to set up the memory map of the target processor and the inbound translation windows. I've been using this technique to check out all of my hardware, and to test DMA over the PCI, DDR, and local buses.
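On a Linux host, one common way to reach such a BAR from user space is to mmap() the device's sysfs resource file. Here is a minimal sketch. The device path in the comment is a placeholder for your board's bus/device/function. The read helper assumes that the target's registers are big-endian while the x86 host is little-endian, and it uses the GCC/Clang byte-swap builtin. Register offsets must come from the reference manual.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `len` bytes of a PCI BAR exposed as a sysfs resource file, e.g.
 * "/sys/bus/pci/devices/0000:03:00.0/resource0" (placeholder address).
 * Returns NULL on failure. */
volatile uint8_t *map_bar(const char *path, size_t len)
{
    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    return p == MAP_FAILED ? NULL : (volatile uint8_t *)p;
}

/* Read a 32-bit big-endian register at a 4-byte-aligned offset with a
 * single 32-bit access, swapping bytes on little-endian hosts. */
uint32_t read_be32(volatile const uint8_t *base, size_t off)
{
    uint32_t raw = *(volatile const uint32_t *)(base + off);
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return __builtin_bswap32(raw);
#else
    return raw;
#endif
}
```

A single 32-bit access is used rather than four byte reads, because some device registers do not tolerate narrower accesses.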
Given that you don't think any of my suggestions are possible, I'll have to go download the ppc460 reference manual and convince myself :)
Ame, feel free to chime in if you think any of the ideas are possible.
Cheers, Dave

On Thu, Apr 24, 2008 at 08:53:53AM -0700, David Hawkins wrote:
Given that you don't think any of my suggestions are possible, I'll have to go download the ppc460 reference manual and convince myself :)
Ame, feel free to chime in if you think any of the ideas are possible.
So maybe I need to clarify some more. The PPC460 data sheet is not too clear on this yet. However, here are my thoughts on this. Let's just take the simple case as an example. We have a plurality of 460s where a single one is the master. Between the master and all the slaves is a PCI bridge. The slaves are hardwired to boot from pci bus memory -- according to the datasheet that is at a fixed address. So there does not appear to be any need to do anything to the slave upon power up. Now the master boots and then allocates a chunk of contiguous memory using a kernel driver or whatever is needed. The image is just whatever the flash image would normally contain (uboot + kernel + rootfs). The address of that chunk is then given to the pci bridge so that it can perform inbound translation from the address that the PPC slaves will use to the address where the image is physically located. Then the slaves are taken out of reset and begin reading "flash" across the pci bus which really goes through the bridge and is mapped to the DRAM on the master (or I guess it could be the flash on the master, but DRAM seemed easier since it is already running).
Ok, so how many holes does this approach have?
Thanks - ame

Hi Ame,
So maybe I need to clarify some more. The PPC460 data sheet is not too clear on this yet. However, here are my thoughts on this. Let's just take the simple case as an example. We have a plurality of 460s where a single one is the master. Between the master and all the slaves is a PCI bridge. The slaves are hardwired to boot from pci bus memory -- according to the datasheet that is at a fixed address. So there does not appear to be any need to do anything to the slave upon power up.
Really? I didn't see a comment about the fixed address when I parsed the data sheet. Where is that comment in PP460EX_DS2063.pdf, rev 1.09 April 14, 2008?
Now the master boots and then allocates a chunk of contiguous memory using a kernel driver or whatever is needed. The image is just whatever the flash image would normally contain (uboot + kernel + rootfs).
Or you could just have a u-boot image, and then use u-boot to fetch the kernel and rootfs.
The address of that chunk is then given to the pci bridge so that it can perform inbound translation from the address that the PPC slaves will use to the address where the image is physically located. Then the slaves are taken out of reset and begin reading "flash" across the pci bus which really goes through the bridge and is mapped to the DRAM on the master (or I guess it could be the flash on the master, but DRAM seemed easier since it is already running).
Ok, so how many holes does this approach have?
This seems reasonable to me.
However, without the full users manual for the 460EX, and the users manual for the pci bridge, I can't really comment more.
Cheers, Dave

On Thursday 24 April 2008, Ayman M. El-Khashab wrote:
Ok, so how many holes does this approach have?
Sounds quite reasonable. The only thing I'm unsure about here is the size of the PCI window that is mapped upon PCI booting. The 460EX users manual doesn't mention anything about this, but according to the 440EPx users manual this window has the following size:
0xfffe.0000 - 0xffff.ffff -> 128kB
Not sure if this is the same on 460EX/GT. You should probably contact AMCC and ask about this.
If 128kB is correct, then this is a little short for a full-blown U-Boot image.
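A PPC4xx core starts fetching at the top of its address space (reset vector at 0xFFFFFFFC), so a boot image has to be placed with its last word at the end of the window. The sketch below checks whether an image fits into the 128 kB 440EPx window quoted above and where it must start. The window values come from Stefan's message. Whether the 460EX uses the same window is exactly the open question, so treat the constants as assumptions.

```c
#include <stdint.h>

/* Boot window as quoted for the 440EPx: 0xFFFE0000 - 0xFFFFFFFF (128 kB).
 * Assumed, not confirmed, for the 460EX/GT. */
#define BOOT_WIN_BASE 0xFFFE0000u
#define BOOT_WIN_END  0xFFFFFFFFu   /* inclusive */

/* Returns the address where an image of `len` bytes must start so that
 * its last byte is the last byte of the window (where the reset vector
 * lives), or 0 if the image is empty or does not fit. */
uint32_t boot_image_start(uint32_t len)
{
    uint32_t win_size = BOOT_WIN_END - BOOT_WIN_BASE + 1u;  /* 0x20000 */
    if (len == 0 || len > win_size)
        return 0;
    return BOOT_WIN_END - len + 1u;
}
```

A 64 kB image would have to start at 0xFFFF0000. Anything over 128 kB does not fit at all, which is why Stefan considers this window a little short for a full-blown U-Boot.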
Best regards, Stefan

Stefan Roese wrote:
On Thursday 24 April 2008, Ayman M. El-Khashab wrote:
Ok, so how many holes does this approach have?
Sounds quite reasonable. The only thing I'm unsure about here is the size of the PCI window that is mapped upon PCI booting.
Thanks for all your help ...
I thought the answer to both questions was in footnote 3 on page 8 of the data sheet. At least that was my interpretation of the following statement: "3. When the optional boot from PCI Memory is selected, the PCI Boot ROM address space begins at 0000 000C FF00 0000 (16 MB)."
I interpreted that to mean the address was not something that needed to be configured in any sort of bar register and that the address it fetched was fixed. So the only thing that was required was to configure the inbound translation on the bridge.
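Under Ame's reading of footnote 3, the bridge only has to translate the slaves' fixed 16 MB boot space onto the master's reserved DRAM chunk. The offset arithmetic is sketched below. It assumes the boot space maps one-to-one onto a 16 MB region on the master, and `master_image_base` is a hypothetical physical address supplied by whatever driver reserved the memory. The address a slave actually presents on the PCI bus, and how the bridge window is programmed, depend on the 460EX user's manual and the bridge documentation.

```c
#include <stdint.h>

/* From the 460EX data sheet, footnote 3: PCI Boot ROM address space. */
#define PCI_BOOT_BASE 0x0000000CFF000000ull
#define PCI_BOOT_SIZE (16ull << 20)          /* 16 MB */

/* Translate an address a slave fetches from into the master's physical
 * address, given where the master placed the 16 MB boot region.
 * Returns 0 if the address is outside the boot space (so the region
 * itself must not sit at physical address 0). */
uint64_t boot_to_master(uint64_t slave_addr, uint64_t master_image_base)
{
    if (slave_addr < PCI_BOOT_BASE ||
        slave_addr >= PCI_BOOT_BASE + PCI_BOOT_SIZE)
        return 0;
    return master_image_base + (slave_addr - PCI_BOOT_BASE);
}
```

If the top of this space is where the reset vector is fetched, the last word (0x0000000CFFFFFFFC) lands 16 MB minus 4 bytes into the master's chunk. A smaller image would then need to sit at the end of that chunk, not at its start.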
- ame

On Friday 25 April 2008, Ayman El-Khashab wrote:
Sounds quite reasonable. The only thing I'm unsure about here is the size of the PCI window that is mapped upon PCI booting.
Thanks for all your help ...
I thought the answer to both questions was in footnote 3 on page 8 of the data sheet. At least that was my interpretation of the following statement: "3. When the optional boot from PCI Memory is selected, the PCI Boot ROM address space begins at 0000 000C FF00 0000 (16 MB)."
Ah, I didn't notice this.
I interpreted that to mean the address was not something that needed to be configured in any sort of bar register and that the address it fetched was fixed. So the only thing that was required was to configure the inbound translation on the bridge.
You could be right here.
Good luck with this PCI booting stuff. Please keep us informed on the progress.
Best regards, Stefan

On Fri, Apr 25, 2008 at 02:45:38PM +0200, Stefan Roese wrote:
Good luck with this PCI booting stuff. Please keep us informed on the progress.
We'll contribute back what we find on this approach. We won't be able to try it until August/September, since that is when the assembled boards will be in house.
Thanks again

Hi Ame,
Given that you don't think any of my suggestions are possible, I'll have to go download the ppc460 reference manual and convince myself :)
There does not appear to be a 460EX User's Manual on the AMCC web site. Anyone know where to get it? I created an account just in case it was not visible to guests, but no change.
The data sheet lists the 460EX is capable of operating as an agent and can perform PCI boot. However, without the user's manual, there's no way of figuring out how.
Cheers, Dave

Hi David,
On Thursday 24 April 2008, David Hawkins wrote:
Given that you don't think any of my suggestions are possible, I'll have to go download the ppc460 reference manual and convince myself :)
There does not appear to be a 460EX User's Manual on the AMCC web site. Anyone know where to get it? I created an account just in case it was not visible to guests, but no change.
It's still preliminary, so you probably need to contact your AMCC distributor/FAE to get access to it.
Best regards, Stefan

Hi Stefan,
There does not appear to be a 460EX User's Manual on the AMCC web site. Anyone know where to get it? I created an account just in case it was not visible to guests, but no change.
It's still preliminary, so you probably need to contact your AMCC distributor/FAE to get access to it.
It sounds like Ame has things under control, so I won't try getting the manual.
Ame, good luck with your board bring-up.
If your hardware guys are still working on the board, make sure they connect all the JTAG connections on those processors!
Thanks, Dave

In message 20080422161140.GA20218@crust.elkhashab.com you wrote:
Hello, I saw a reference to PCI booting in the docs, but could not find enough detail. We would like to use PCI memory space booting on an upcoming design with the PPC460EX. Does Uboot currently support that mode of booting? Is it as easy as booting
It has been done before. See for example the PN62 board which was the first board where this was implemented - more than 5 years ago and probably untested since.
from a flash? We are developing our app with the canyonlands board, but we won't be able to try the PCI boot until we have our own board finished.
Of course, adaptations will be necessary.
Best regards,
Wolfgang Denk
participants (6)
- Ayman El-Khashab
- Ayman M. El-Khashab
- ayman@austin.rr.com
- David Hawkins
- Stefan Roese
- Wolfgang Denk